BACHELOR'S DEGREE THESIS OF WUHAN UNIVERSITY (武汉大学本科毕业论文)

Design and Implementation of an Organization Expert Search System (组织专家检索系统的设计与实现)

College: School of Information Management
Subject: Information Management and Information Systems
Name: Shuguang Han (韩曙光)
Directed by: Wei Lu (陆伟), Associate Professor
May 2008

Chinese Abstract

The rapid development of the Internet and the maturing of related technologies have led enterprises (organizations) to put their resources online. In response, TREC (the Text REtrieval Conference) introduced an enterprise search task, whose main goal is to help users complete specific tasks by searching enterprise data. Enterprise search may cover digital resources either outside or inside the organization, and these usually exist in heterogeneous forms such as email, database records, documents, and shared files. Organization (enterprise) expert search is an important branch of enterprise search and a currently active area of vertical information retrieval research.

This thesis summarizes the state of research on organization expert search in China and abroad and analyzes the requirements and challenges of building such a system. On that basis, using web pages inside and outside the organization, journal article databases, and other information, it designs a model of an organization expert search system that covers the whole process from data collection, cleaning, indexing, and retrieval to visualization, together with WHU-ES, an expert search platform that takes Wuhan University as its example. By dynamically defining the list of resources inside and outside the organization that characterize experts, and by setting the update cycle for those resources, the system supports dynamic resource collection, intelligent recognition of expertise, dynamic generation and analysis of expert co-occurrence cluster graphs, and automatic extraction of expert profile information (including portrait extraction and automatic identification of expert biographies). The thesis also analyzes the difficulties of building an expert search system, namely web page content extraction, overlapping expert names, and social network analysis, proposes possible solutions, and finally gives a preliminary evaluation of WHU-ES.

Keywords: expert search; expertise recognition; organization search; expert clustering

ABSTRACT

The rapid progress of the Internet and related technologies makes it much easier for us to access enterprise (or organization) documents and web pages. As a result, TREC (Text REtrieval Conference) proposed the enterprise retrieval task, whose purpose is to study enterprise search: satisfying a user who is searching the data of an organization to complete some task. The corpus combines digital resources of diverse types such as published reports, email, database records, files, and shared documents. As an important part of enterprise retrieval, organization expert search (expertise retrieval) is currently a hot area of vertical information retrieval research.

Based on an analysis of the requirements and challenges, this paper summarizes the current development of expert search and proposes a general architecture for an organization expert search system, which covers data collection, sorting, indexing, retrieval, visualization, and so on, using relevant web pages and academic databases as the data collections. We then construct an expert search system taking Wuhan University as an example, which we call WHU-ES for short. The system achieves specific functions such as the dynamic collection of diverse resources, the intelligent recognition of expertise, and the automatic extraction of expert profiles (portrait picture extraction, etc.). We also analyze difficulties such as personal name resolution, social network analysis, and content extraction, and then provide possible solutions. Finally, we give a preliminary evaluation of the expert search results.

Keywords: Expert Search; Expertise Recognition; Organization Search; Expert Clustering

Contents

Chinese Abstract
ABSTRACT
1 Introduction
1.1 Opening Remarks
1.2 Research Content of This Thesis
1.3 Innovations of the Research
1.4 Structure of This Thesis
2 State of Research in China and Abroad
2.1 The TREC Enterprise Expert Search Subtask
2.2 Overview of Existing Expert Search Systems
2.2.1 MITRE ExpertFinder
2.2.2 PeopleFinder
2.2.3 IBM SmallBlue
2.3 Other Related Research on Expert Search
3 Analysis and Design of the Organization Expert Search System
3.1 Overall Approach
3.2 System Architecture
3.2.1 Spider Module
3.2.2 Indexer Module
3.2.3 Searcher Module
3.2.4 Assistant Module
3.3 Difficulties of Expert Search Systems and Countermeasures
3.3.1 Removing Noise from Web Page Data