An attentional system combining top-down and bottom-up influences

Babak Rasolzadeh, Mårten Björkman and Jan-Olof Eklundh
NADA/CVAP
S-100 44 Stockholm, Sweden
{babak2,celle,joe}@nada.kth.se

Abstract

Attention plays an important role in human processing of sensory information as a means of focusing resources towards the most important inputs at the moment. In vision it has been argued that attentional processes are crucial for dealing with the complexity of real world scenes. The problem has often been posed in terms of visual search tasks. It has been shown that both the use of prior task and context information (top-down influences) and favoring information that stands out clearly in the visual field (bottom-up influences) can make such tasks more efficient. In generic scene analysis one presumably has a combination of these influences. In this paper we describe a computational model that performs such a combination in a principled way. The system learns an optimal representation of the influences of task and context and thereby constructs a biased saliency map representing the top-down information. This map is combined with bottom-up saliency maps in a process progressing over time as a function over the input. The system is applied to search tasks in single images as well as in real scenes, in the latter case using an active vision system capable of gaze shifting. The proposed model has the desired qualities and goes beyond earlier proposed systems.

1 Introduction

When observing a visual environment humans tend to do a subconscious ranking of the "interestingness" of the different components of that scene. This ranking depends on the observer as well as the scene. What this means in a more pragmatic sense is that our goals and desires interact with the intrinsic properties of the environment so that the ranking of components in the scene is done with respect to how they relate to their surroundings (bottom-up) and to our objectives (top-down) [10, 16]. In humans the attended region is then selected through dynamic modifications of cortical connectivity or through the establishment of specific temporal patterns of activity, under both top-down (task-dependent) and bottom-up (scene-dependent) control [2].

Current models of how this is done in the human visual system generally assume a bottom-up, fast and primitive mechanism that biases the observer towards selecting stimuli based on their saliency (most likely encoded in terms of center-surround mechanisms) and a second, slower, top-down mechanism with variable selection criteria, which directs the 'spotlight of attention' under cognitive, volitional control [20]. In computer vision, attentive processing for scene analysis initially largely dealt with salience based models, following [20] and the influential model of Koch and Ullman [13]. However, several computational approaches to selective attentive processing that combine top-down and bottom-up influences have been presented in recent years.

Koike and Saiki [14] propose that a stochastic WTA enables the saliency-based search model to cause the variation of the relative saliency to change search efficiency, due to stochastic shifts of attention. Ramström and Christensen [18] calculate feature and background statistics to be used in a game theoretic WTA framework for detection of objects. Choi et al. [4] suggest learning the desired modulations of the saliency map (based on the Itti and Koch model [15]) for top-down tuning of attention, with the aid of an ART-network. Navalpakkam and Itti [17] enhance the bottom-up salience model to yield a simple, yet powerful architecture to learn target objects from training images containing targets in diverse, complex backgrounds. Earlier versions of their model did not learn object hierarchies and could not generalize, but the current model can do that by combining object classes into a more general super-class.

Lee et al. [12] showed that an Interactive Spiking Neural Network can be used to bias the bottom-up processing towards a task (in their case face detection), but their model was limited to the influence of user provided top-down cues and could not learn the influence of context. In Frintrop's VOCUS-model [7] there are two versions of the saliency map: a top-down map and a bottom-up one. The bottom-up map is similar to that of Itti and Koch, while the top-down map is a tuned version of the bottom-up one. The total saliency map is a linear combination of the two maps using a fixed user provided weight. This makes the combination rigid and non-flexible, which may result in loss of important bottom-up information. Oliva et al. [1] show that top-down information from visual context modulates the saliency of image regions during the task of object detection. Their model learns the relationship between context features and the location of the target during past experience in order to select interesting regions of the image.

In this paper we will define the top-down information as consisting of two components: 1) task-dependent information which is usually volitional, and 2) contextual scene-dependent information. We then propose a simple, but effective, Neural Network that learns the optimal bias of the top-down saliency map, given these sources of information. The most novel part of the work is a dynamic combination of the bottom-up and top-down saliency maps. Here an information-measure (based on entropy measures) indicates the importance of each map and thus how the linear combination should be altered over time. The combination will vary over time and be governed by a differential equation that can be solved at least numerically for some special cases. Together with a mechanism for Inhibition-of-Return, this dynamic system manages to adjust itself to a balanced behavior, where neither top-down nor bottom-up information is ever neglected.
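The idea of an entropy-driven, time-varying linear combination of the two maps can be illustrated with a small sketch. This excerpt does not state the paper's actual information measure or its differential equation, so everything below is an assumption made for illustration: the Shannon entropy of the normalized map stands in for the information measure, the weight `k` relaxes towards a hypothetical target `k_target` under the toy equation dk/dt = rate * (k_target - k), and the names `map_entropy`, `combine` and `step_weight` are invented here. Inhibition-of-Return is not modelled.

```python
import math


def map_entropy(saliency):
    """Shannon entropy (in bits) of a 2-D saliency map treated as a distribution."""
    flat = [max(v, 0.0) for row in saliency for v in row]
    total = sum(flat)
    if total == 0.0:
        # An all-zero map expresses no preference: treat it as uniform.
        return math.log2(len(flat))
    return -sum((v / total) * math.log2(v / total) for v in flat if v > 0.0)


def combine(bottom_up, top_down, k):
    """Linear combination k * BU + (1 - k) * TD, element by element."""
    return [[k * b + (1.0 - k) * t for b, t in zip(brow, trow)]
            for brow, trow in zip(bottom_up, top_down)]


def step_weight(k, bottom_up, top_down, rate=0.5, dt=0.1):
    """One explicit Euler step of the toy equation dk/dt = rate * (k_target - k).

    k_target gives more weight to the map with the lower entropy (the more
    peaked one). Both the target and the equation are illustrative
    assumptions, not the model described in the paper.
    """
    h_bu = map_entropy(bottom_up)
    h_td = map_entropy(top_down)
    if h_bu + h_td == 0.0:
        k_target = 0.5
    else:
        k_target = h_td / (h_bu + h_td)
    return k + dt * rate * (k_target - k)


# Hypothetical 2x2 maps: a peaked bottom-up map and a flat top-down map.
bottom_up = [[0.1, 0.1], [0.1, 0.7]]
top_down = [[0.25, 0.25], [0.25, 0.25]]   # uniform over 4 cells: 2 bits

k = 0.5
for _ in range(200):
    k = step_weight(k, bottom_up, top_down)
combined = combine(bottom_up, top_down, k)
```

In this toy setting the weight drifts towards the more peaked bottom-up map but stays strictly between 0 and 1, so both maps still contribute to `combined`. A fixed user-provided weight, as in the VOCUS-style combination discussed above, corresponds to calling `combine` with a constant `k` and never calling `step_weight`.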