Original text:

Introduction to Recommender System
Approaches of Collaborative Filtering: Nearest Neighborhood and Matrix Factorization

“We are leaving the age of information and entering the age of recommendation.”

Like many machine learning techniques, a recommender system makes predictions based on users’ historical behaviors. Specifically, it predicts a user’s preference for a set of items based on past experience. To build a recommender system, the two most popular approaches are Content-based and Collaborative Filtering.

The Content-based approach requires a good amount of information about the items’ own features, rather than using users’ interactions and feedback. For example, it can be movie attributes such as genre, year, director, and actors, or the textual content of articles, which can be extracted by applying Natural Language Processing. Collaborative Filtering, on the other hand, doesn’t need anything except users’ historical preferences on a set of items. Because it’s based on historical data, the core assumption here is that users who have agreed in the past tend to also agree in the future.

User preference is usually expressed in two categories. Explicit Rating is a rating given by a user to an item on a sliding scale, like 5 stars for Titanic. This is the most direct feedback from users to show how much they like an item. Implicit Rating suggests users’ preferences indirectly, through signals such as page views, clicks, purchase records, whether or not they listened to a music track, and so on. In this article, I will take a close look at collaborative filtering, which is a traditional and powerful tool for recommender systems.

Nearest Neighborhood

The standard method of Collaborative Filtering is known as the Nearest Neighborhood algorithm. There are user-based CF and item-based CF. Let’s first look at User-based CF. We have an n × m matrix of ratings, with user uᵢ, i = 1, ..., n and item pⱼ, j = 1, ..., m. Now we want to predict the rating rᵢⱼ if target user i did not watch/rate item j. The process is to calculate the similarities between target user i and all other users, select the top X similar users, and take the weighted average of the ratings from these X users, with the similarities as weights.

However, different people may have different baselines when giving ratings: some people tend to give high scores generally, while others are pretty strict even when they are satisfied with the items. To avoid this bias, we can subtract each user’s average rating over all items when computing the weighted average, and then add the target user’s average back.

Two ways to calculate similarity are Pearson Correlation and Cosine Similarity.

Basically, the idea is to find the users most similar to your target user (the nearest neighbors) and weight their ratings of an item to predict the target user’s rating of that item.

Without knowing anything about the items and users themselves, we consider two users similar when they give the same item similar ratings. Analogously, for Item-based CF, we say two items are similar when they receive similar ratings from the same user. Then, we make a prediction for a target user on an item by calculating the weighted average of this user’s ratings on the X most similar items. One key advantage of Item-based CF is stability: the ratings on a given item will not change significantly over time, unlike the tastes of human beings.

There are quite a few limitations of this method. It doesn’t handle sparsity well when no one in the neighborhood has rated the item you are trying to predict for the target user. Also, it’s not computationally efficient as the number of users and products grows.

Matrix Factorization

Since sparsity and scalability are the two biggest challenges for the standard CF method, a more advanced method emerged that decomposes the original sparse matrix into low-dimensional matrices with latent factors/features and less sparsity. That is Matrix Factorization.

Besides solving the issues of sparsity and scalability, there’s an intuitive explanation of why we need low-dimensional matrices to represent users’ preferences. Suppose a user gave good ratings to the movies Avatar, Gravity, and Inception. These are not necessarily 3 separate opinions; rather, they show that this user might be in favor of Sci-Fi movies, and there may be many more Sci-Fi movies that this user would like. Unlike specific movies, latent features are expressed by higher-level attributes, and the Sci-Fi category is one of the latent features in this case. What matrix factorization eventually gives us is how much a user is aligned with a set of latent features, and how much a movie fits into this set of latent features. Its advantage over standard nearest neighborhood is that even though two users haven’t rated any of the same movies, it’s still possible to find the similarity between them if they share similar underlying tastes, again latent features.

To see how a matrix is factorized, the first thing to understand is Singular Value Decomposition (SVD). Based on Linear Algebra, any real matrix R can be decomposed into 3 matrices U, Σ, and V, with R = UΣVᵀ. Continuing with the movie example, U is an n × r user-latent feature matrix and V is an m × r movie-latent feature matrix. Σ is an r × r diagonal matrix containing the singular values of the original matrix, simply representing how important a specific feature is for predicting user preference. By sorting the values of Σ in decreasing order and truncating the matrix Σ to the first k dimensions (k singular values), we can reconstruct the matrix as matrix A. The selection of k should make sure that A is able to capture most of the variance within the original matrix R, so that A is an approximation of R, A ≈ R. The difference between A and R is the error that is expected to be minimized. This is exactly the idea behind Principal Component Analysis.

When the matrix R is dense, U and V can easily be factorized analytically. However, a matrix of movie ratings is super sparse. Although there are some imputation methods to fil
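
The truncated SVD described above can be sketched with NumPy on a small, fully observed matrix. This is only an illustration under stated assumptions: the rating matrix is made-up data, `truncated_svd` is a hypothetical helper name, and the "explained" share is computed from squared singular values as one common way to judge the choice of k. It does not address the sparse case.

```python
import numpy as np

# Hypothetical, fully observed 4-user x 5-movie rating matrix (made-up data).
R = np.array([
    [5.0, 4.0, 5.0, 1.0, 2.0],
    [4.0, 5.0, 4.0, 2.0, 1.0],
    [1.0, 2.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 2.0, 4.0, 5.0],
])


def truncated_svd(R, k):
    """Return the rank-k approximation A of R and all singular values of R."""
    # full_matrices=False gives U: n x r, s: r, Vt: r x m with r = min(n, m).
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # NumPy returns the singular values already sorted in decreasing order,
    # so keeping the first k columns/rows keeps the k largest factors.
    A = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return A, s


k = 2
A, s = truncated_svd(R, k)
# Share of the squared singular values retained by the first k factors.
explained = np.sum(s[:k] ** 2) / np.sum(s ** 2)
# Frobenius norm of the difference between R and its approximation A.
error = np.linalg.norm(R - A)
print(np.round(A, 2))
print(f"explained: {explained:.3f}, error: {error:.3f}")
```

Raising `k` shrinks the error at the cost of a larger representation; with `k` equal to the number of singular values, A reproduces R up to floating-point rounding.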
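
The user-based prediction from the Nearest Neighborhood section (mean-centered ratings, weighted by similarity, with the target user's average added back) might look like the sketch below. Several choices here are assumptions, not taken from the article: the ratings are made-up data, `pearson` and `predict` are hypothetical helper names, similarity is computed only over items both users rated, the weighted sum is normalized by the sum of absolute similarities, and the prediction falls back to the target user's own mean when no neighbor has rated the item.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, np.nan = not rated.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, 5.0, 4.0, 2.0],
    [1.0, 2.0, 1.0, 5.0],
    [5.0, 5.0, 5.0, 1.0],
])


def pearson(a, b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < 2:
        return 0.0
    x = a[both] - a[both].mean()
    y = b[both] - b[both].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0


def predict(R, i, j, top_x=2):
    """Predict user i's rating of item j from the top_x most similar raters."""
    means = np.nanmean(R, axis=1)  # each user's average over the items they rated
    # Candidate neighbors: every other user who actually rated item j.
    candidates = [k for k in range(R.shape[0]) if k != i and not np.isnan(R[k, j])]
    sims = {k: pearson(R[i], R[k]) for k in candidates}
    neighbors = sorted(candidates, key=lambda k: sims[k], reverse=True)[:top_x]
    norm = sum(abs(sims[k]) for k in neighbors)
    if norm == 0:
        return float(means[i])  # assumed fallback: the user's own average
    # Weighted average of the neighbors' deviations from their own averages.
    offset = sum(sims[k] * (R[k, j] - means[k]) for k in neighbors) / norm
    return float(means[i] + offset)


print(round(predict(R, i=0, j=2), 2))
```

The fallback branch also shows the sparsity limitation in miniature: when nobody in the neighborhood has rated the item, the neighbors contribute nothing and the prediction carries no collaborative signal.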