Pergamon
Pattern Recognition, Vol. 30, No. 4, pp. 643-658, 1997
© 1997 Pattern Recognition Society. Published by Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0031-3203/97 $17.00+.00
PII: S0031-3203(96)00109-4

AN INTEGRATED SYSTEM FOR CONTENT-BASED VIDEO RETRIEVAL AND BROWSING

HONGJIANG ZHANG,†* JIANHUA WU, DI ZHONG and STEPHEN W. SMOLIAR
Institute of Systems Science, National University of Singapore, Heng Mui Keng Terrace, Kent Ridge, Singapore 0511, Singapore

(Received 12 June 1996; received for publication 30 July 1996)

* Author to whom correspondence should be sent.
† Current address: HP Labs, Page Mill Road, Palo Alto, CA 94304, U.S.A.

Abstract--This paper presents an integrated system solution for computer assisted video parsing and content-based video retrieval and browsing. The effectiveness of this solution lies in its use of video content information derived from a parsing process driven by visual feature analysis. That is, parsing will temporally segment and abstract a video source, based on low-level image analyses; then retrieval and browsing of video will be based on key-frame, temporal and motion features of shots. These processes and a set of tools to facilitate content-based video retrieval and browsing using the feature data set are presented in detail as functions of an integrated system.

Video parsing    Video retrieval    Video browsing    Multimedia    Database

1. INTRODUCTION

With rapid advances in communication and multimedia computing technologies, accessing mass amounts of multimedia data is becoming a reality. However, interaction with multimedia data, video in particular, is in general still not easy. For example, selection of a video clip in conventional video on demand (VOD) systems rarely involves anything better than keyword search or category browsing; and any manipulation of the video itself is limited to the lowest level of VCR control. The problem is that, from the point of view of content, the resources managed by such systems are unstructured, and therefore can be neither indexed nor accessed on the basis of structural properties. Fundamentally, apart from other requirements, what we need are video parsing tools to extract structure and content information of video. Only after such information becomes available can content-based retrieval and manipulation of video data be facilitated.

We see parsing of video data as encompassing two tasks: temporal segmentation of a video program into elemental units, and content extraction from those units, based on both video and audio semantic primitives.(1) The temporal segmentation process is analogous to sentence segmentation and paragraphing in parsing textual documents, and many effective algorithms are now available for this task.(2-7) However, fully automated content extraction is a much more difficult task, requiring both signal analysis and knowledge representation techniques; so human assistance is still needed. We thus feel the most fruitful research approach is to concentrate on facilitating tools, using low-level visual features. Such tools are clearly feasible and research in this direction should ultimately lead to an intelligent video parsing system.(8-10)

Retrieval and browsing require that the source material first be effectively indexed. While most previous research in indexing has been text-based,(11,12) content-based indexing of video with visual features is still an open research problem. Visual features can be divided into two levels: low-level image features, and semantic features based on objects and events. The semantic level includes name, appearance, motion, and temporal variation of characteristics of constituent objects, as well as relationships among different objects at different times and the contributions of all these attributes and relationships to the story being presented in a video sequence.(11,12) To date, automation of low-level feature indexing has been far more successful than that of semantic indexing in image databases,(13-15) especially in some special applications, such as databases of human faces.(15) Perhaps the biggest problem with indexing video using the low-level image features of every frame is its enormous volume; uniform subsampling(16) may reduce the data, but it risks losing important frames. We believe that a viable solution is to index representative key-frames,(17) extracted from the video sources. The problem then becomes how to obtain the key-frames from video sources automatically and on the basis of content, which is one of the key contributions of the work presented in this paper.

While we tend to think of indexing as supporting retrieval, browsing is equally significant for video source material. By browsing we mean an informal perusal of content which may lack any specific goal or focus. The task of browsing is actually very intimately related to retrieval. On the one hand, if a query is too general, browsing is the best way to examine the results. This should provide some indication of why the query was poorly expressed; so browsing also serves as an aid to formulating queries, making it easier for the user to just ask around in the process of figuring out the most appropriate query to pose.

Unfortunately, the only major technological precedent for video browsing is the VCR (even when available in the soft form for computer viewing), with its support for sequential fast forward and reverse play. Browsing a video this way is a matter of skipping frames: the faster the play speed, the larger the skip factor. The content-based browser of reference (16) takes this approach, but a uniform skip factor really does not account for video content. Furthermore, there is always the danger that some skipped frames may contain th