Research and Implementation of Personalized Recommendation System Based on Hadoop Big Data Frame

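University of Electronic Science and Technology of China
Master Thesis

Thesis title: Research and Implementation of Personalized Recommendation System Based on Hadoop Big Data Frame
Major: Software Engineering
Student ID: 201321220114
Author: Deng Yulin
Supervisor: Lu Xin

Classification No.:
Security classification:
UDC (Note 1):

Thesis (title and subtitle): Research and Implementation of Personalized Recommendation System Based on Hadoop Big Data Frame
Author: Deng Yulin
Supervisor (name, title, institution): Lu Xin, Associate Professor, University of Electronic Science and Technology of China, Chengdu
Degree applied for: Master
Major: Software Engineering
Date of thesis submission: 2016.3.18
Date of thesis defense: 2016.4.21
Degree-granting institution and date: University of Electronic Science and Technology of China, June 2016
Chair of the defense committee:
Reviewers:
Note 1: Give the class number according to the Universal Decimal Classification (UDC).

Research and Implementation of Personalized Recommendation System Based on Hadoop Big Data Frame
A Master Thesis Submitted to University of Electronic Science and Technology of China
Major: Software Engineering
Author: Deng Yulin
Supervisor: Lu Xin
School: School of Information and Software Engineering

Statement of Originality
I declare that the thesis submitted is research work I carried out, and results I obtained, under the guidance of my supervisor. To the best of my knowledge, except where specifically marked and acknowledged in the text, the thesis contains no research results that others have published or written. It also contains no material used to obtain a degree or certificate from the University of Electronic Science and Technology of China or any other educational institution. Any contribution to this research by colleagues who worked with me has been clearly stated in the thesis and acknowledged with thanks.
Author's signature:    Date:

Authorization for Use of the Thesis
The author fully understands the regulations of the University of Electronic Science and Technology of China on retaining and using theses. Under these regulations, the university has the right to retain the thesis and to submit photocopies and disk copies of it to the relevant state departments or institutions. It may also allow the thesis to be consulted and borrowed. I authorize the University of Electronic Science and Technology of China to include all or part of the thesis in relevant databases for retrieval. I also authorize it to preserve and compile the thesis by photocopying, reduced-size printing, scanning, or other means of reproduction. (Confidential theses are subject to this provision after declassification.)
Author's signature:    Supervisor's signature:    Date:

Abstract
The problem of information overload is becoming increasingly prominent. There are currently three fairly mature ways of dealing with it: website navigation, search engines, and recommendation systems. Website navigation addresses information overload by collecting well-known websites and sorting them into categories. Search engines address it by building indexes over massive numbers of web pages. However, both approaches fall short when users cannot clearly state what they need, and a recommendation system can handle exactly this case. A recommendation system analyzes a user's historical behavior records and proactively recommends content the user is likely to be interested in. With the rapid growth of the Internet, however, the volume of information is increasing exponentially, and traditional recommendation systems easily hit computational bottlenecks on massive data. In addition, traditional recommendation systems do not fully account for the fact that user interests change frequently and are somewhat scattered.

To address these problems, this thesis draws on earlier recommendation system designs and targets personalized book recommendation in a search engine. It researches and implements a hybrid recommendation scheme based on latent semantic analysis and shard clustering, and uses the Hadoop big data processing framework to handle the massive data volume.

First, the thesis studies how to collect user behavior data in a search engine. It analyzes the types and characteristics of user behavior in a search engine. It then applies a suitable collection method and normalization method to each data type to complete user behavior data collection.

Second, the thesis addresses the distinctive nature of user behavior in a search engine and the volatility of user interests. It proposes a latent semantic analysis model and a shard clustering model to mine, respectively, users' long-term interests and immediate interests from big user behavior data. The latent semantic analysis model recommends based on content. It therefore alleviates the cold-start problem for both users and books and improves the coverage of the system's recommendations. The shard-clustering-based collaborative filtering model partitions user behavior into shards by attribute and content, which extracts a user's interests in different periods. This further improves recommendation performance and gives the results a degree of novelty. In addition, during shard clustering, user similarity must be computed in a search engine setting. For this problem, the thesis proposes an improved similarity measure for mixed-type data based on users' search terms.

Finally, based on the Hadoop big data processing framework, the thesis studies parallel methods for user behavior preprocessing and for the recommendation algorithms. It then completes the design and implementation of the personalized book recommendation system for the search engine. Introducing the Hadoop platform and designing parallel recommendation algorithms greatly improves the system's ability to process massive data. The latent-semantic-analysis model and the shard clustering model work together and also improve, to some extent, the precision and coverage of personalized book recommendation in the search engine. System testing and algorithm experiments demonstrate the correctness of the system.

Keywords: recommendation system, Hadoop platform, big data analysis, latent semantic analysis, personalized recommendation

ABSTRACT
The information overload problem is becoming increasingly prominent in today's world. There are three mature methods for dealing with it: site navigation, search engines, and recommendation systems. Site navigation gathers well-known sites and classifies them. A search engine builds an index over massive numbers of web pages and searches that index. However, when users cannot clearly express their needs, the former two are of limited use, whereas a recommendation system can handle such cases. A recommendation system analyzes a user's historical behavior records in order to proactively recommend content of potential interest to the user. But with the rapid development of the Internet, the amount of information is increasing exponentially. As a result, traditional recommender systems run into computational bottlenecks on massive data. In addition, traditional recommendation systems do not take into account that user interests change easily.

To solve these issues, this thesis draws on previous recommendation system designs. It researches and implements a hybrid recommendation system based on latent semantic analysis and shard clustering, with the goal of personalized book recommendation in a search engine environment. The Hadoop platform is used to solve the problem of massive data processing.

This thesis first studies methods for acquiring search engine user behavior data. It analyzes the types and characteristics of user behavior in the search engine. It then processes these data with different data collection and standardization methods, completing user behavior data collection. Secondly, user behavior in the search engine is distinctive and user interests are changeable. To handle this, the thesis proposes a latent semantic analysis model and a shard clustering model to mine users' long-term interests and immediate interests. The latent semantic analysis model generates recommendations based on content. It can therefore alleviate the cold-start problem for users and books and enhance the coverage of the recommendation system. The collaborative filtering recommendation model based on shard clustering cuts user behavior into fragments by attribute and content so as to extract user interests in different periods. This improves recommendation performance and makes the recommendation results more novel. In addition, computing user similarity is a problem during shard clustering. For this, the thesis proposes a new method for computing the similarity of mixed-type data based on the user's search terms. Finally, based on the Hadoop big data processing framework, the thesis studies parallel methods for user behavior preprocessing and for the recommendation algorithms. It then designs and implements a personalized recommendation system for books under the search engine. Using the Hadoop big data processing platform and designing parallel recommendation algorithms greatly improves the system's ability to deal with massive data. The latent semantic analysis model and the shard clustering model work together to improve the accuracy and coverage rate of the personalized recommendation system. System tests and algorithm experiments demonstrate the correctness of the algorithms.

Keywords: Recommendation system, Hadoop platform, Big data, Latent semantic analysis, Personalized recommendation

Contents
Chapter 1 Introduction
1.1 Research work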
