Technical Report No. 9607, Department of Statistics, University of Toronto

Factor Analysis Using Delta-Rule Wake-Sleep Learning

Radford M. Neal
Department of Statistics and Department of Computer Science
University of Toronto
radford@stat.utoronto.ca

Peter Dayan
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology
dayan@ai.mit.edu

24 July 1996

We describe a linear network that models correlations between real-valued visible variables using one or more real-valued hidden variables — a factor analysis model. This model can be seen as a linear version of the "Helmholtz machine", and its parameters can be learned using the "wake-sleep" method, in which learning of the primary "generative" model is assisted by a "recognition" model, whose role is to fill in the values of hidden variables based on the values of visible variables. The generative and recognition models are jointly learned in "wake" and "sleep" phases, using just the delta rule. This learning procedure is comparable in simplicity to Oja's version of Hebbian learning, which produces a somewhat different representation of correlations in terms of principal components. We argue that the simplicity of wake-sleep learning makes factor analysis a plausible alternative to Hebbian learning as a model of activity-dependent cortical plasticity.

1 Introduction

Activity-dependent plasticity in the vertebrate brain has typically been modeled in terms of Hebbian learning (Hebb 1949), in which weight changes are based on the covariance of pre-synaptic and post-synaptic activity (e.g., von der Malsburg 1973; Linsker 1986; Miller, Keller, and Stryker 1989). These models derive support from neurobiological evidence of long-term potentiation (see, for example, Collingridge and Bliss (1987), and for a recent review, Baudry and Davis (1994)). They have also been seen as performing a reasonable function, namely extracting the statistical structure amongst a collection of inputs in terms of principal components (Linsker 1988). In this paper, we suggest the statistical technique of factor analysis as an interesting alternative to principal components analysis, and show how to implement it using an algorithm whose demands on synaptic plasticity are as local as those of the Hebb rule.

Factor analysis is a model for real-valued data in which correlations are "explained" by postulating the presence of one or more underlying "factors". These factors play the role of "latent" or "hidden" variables, which are not directly observable, but which allow the dependencies between the "visible" variables to be expressed in a convenient way. Everitt (1984) gives a good introduction to latent variable models in general, and to factor analysis in particular. These models are widely used in psychology and the social sciences as a way of exploring whether observed patterns in data might be explainable in terms of a small number of unobserved factors. Our interest in these models stems from their potential as a way of building high-level representations from sensory data.

Oja's version of Hebbian learning (Oja and Karhunen 1985; Oja 1989, 1992) is a particularly convenient counterpoint. This rule applies to a linear unit with weight vector w that computes an output y = w^T x when presented with a real-valued input vector x (which, for convenience, is assumed to have mean zero). After each presentation of an input vector, the weights for the unit are changed by an amount given by the following proportionality:

    \Delta w \propto y (x - y w) = y x - y^2 w    (1)

The first term in this weight increment, y x, is of Hebbian form. The second term, -y^2 w, tends to push the weights towards zero, balancing the positive feedback in plain Hebbian learning, which would otherwise increase the magnitude of the weights without bound.
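As a concrete illustration of equation (1), the following short Python/NumPy sketch applies Oja's rule online to a single linear unit trained on zero-mean Gaussian inputs. The learning rate, input covariance, and number of presentations are illustrative choices of ours, not values from this report; the sketch simply shows the update y x - y^2 w in action.

```python
import numpy as np

# Minimal sketch of Oja's rule (equation 1): a single linear unit with
# weight vector w computes y = w^T x and, after each input presentation,
# is updated by
#     delta_w = eta * y * (x - y * w) = eta * (y * x - y**2 * w).
# The learning rate eta and the input distribution are assumptions made
# for illustration only.

rng = np.random.default_rng(0)

# Zero-mean Gaussian inputs; the direction of largest variance is (1, 1)/sqrt(2).
cov = np.array([[2.0, 1.5],
                [1.5, 2.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5000)

w = 0.1 * rng.standard_normal(2)   # small random initial weights
eta = 0.01                         # learning rate (illustrative)

for x in X:
    y = w @ x                      # output of the linear unit
    w += eta * y * (x - y * w)     # Hebbian term y*x minus decay term y^2 * w

# The weights end up close to a unit vector along the direction of highest
# input variance, i.e. the principal eigenvector of the input covariance.
print("learned w:", w, "  norm:", np.linalg.norm(w))
```

With a small enough learning rate, the norm of w settles near 1 and its direction matches the leading eigenvector of the input covariance, which is the convergence result discussed next.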
Wyatt and Elfadel (1995) give an explicit analysis of learning based on equation (1), showing that with reasonable starting conditions, w converges to the principal eigenvector of the covariance matrix of the inputs — that is, it converges to a unit vector pointing in the direction of highest variance in the input space. Extracting the subsidiary eigenvectors of the covariance matrix of the inputs is somewhat more challenging, requiring some form of inhibition between successive output units (Sanger 1989; Földiák 1989; Plumbley 1993).

Linsker (1988) views Hebbian learning as a way of maximising the information retained by y about x. Under the simplifying assumption that the distribution of the inputs is Gaussian, setting the output of a unit to the projection of its input onto the first principal component of the input covariance matrix conveys as much information as possible on average (see also Plumbley 1993). This goal seems reasonable for the very early stages of sensory processing, where information bottlenecks such as the optic nerve may plausibly be present. Note, however, that it implicitly assumes that all information is equally important. Maximizing information transfer seems less compelling as a goal for subsequent levels of processing, once sensory signals have reached cortex. Several other computational goals have been suggested from this stage upwards, including factorial coding (Barlow 1989), sparsification (Olshausen and Field 1995), and various methods for encouraging the cortex to respect reasonable invariances, such as translation or scale invariance for visual processing (Li and Atick 1994).

In this paper, we pursue the suggestion of Hinton and Zemel (1994) (see also Grenander 1976-1981; Mumford 1994; Dayan, Hinton, Neal, and Zemel 1995) that the cortex might be constructing a hierarchical stochastic "generative" model of its input in the top-down connections, while implementing in the bottom-up connections a "recognition" model that in a sense is the inverse of the generative model. The recognition model provides high-