Hefei University of Technology
Master's Thesis
Research on Text Mining System Based on the Recruitment Information in the Web
Name: Zhong Xiaoxu (钟晓旭)
Degree applied for: Master's
Major: Computer Technology
Advisors: Hu Xuegang (胡学钢); Wu Yu (吴玉)
2010-10

ABSTRACT

With the spread of computers and the rapid development of Internet technology, the volume of recruitment information on the Web is growing rapidly, and more and more college graduates search for recruitment information online. These postings reflect employers' requirements for employees and give colleges timely knowledge of society's demand for talent, which helps with specialty settings, course arrangements, and students' on-campus learning. How to obtain the needed recruitment postings from the huge amount of Web information, and how to discover the knowledge patterns implicit in them, is therefore a significant issue worth studying, and it has made Web text mining a popular topic in text mining research.

This thesis first introduces the research background and significance of Web text mining, the state of research at home and abroad, and the content and structure of the thesis. Next, it studies the general process of Web text mining, including Web page gathering, preprocessing and Web page purification, word segmentation and feature representation, feature selection, and text clustering. It then explores correlation analysis, introducing the concepts of correlation analysis and correlation and giving the methods for calculating correlation coefficients and testing their significance. Furthermore, a Web recruitment information text mining system is developed for clustering Chinese texts, and each functional module of the system is analyzed and designed in detail. Finally, experiments are carried out on the system and their results are analyzed. The experimental results show that the system's performance indicators are satisfactory and that the system is practical.

Keywords: Web employment information; Data mining; Correlation analysis
[Chinese body text not recoverable from this extraction; the surviving section numbers, terms, citations, and equations follow.]

1.1 Data Mining [1]; Data Warehouse; Web; a figure of 80% is cited; [2]
1.2 [3] [4, 5, 6]; Megaputer; Autonomy; Concept Agents; IBM; [7]
1.3 Web; k-means
1.4 Web
1.5 Web

2.1 Web [8]
2.2 Web; Figures 2.1 and 2.2; [9] [10]
2.3 Web; Spider; HTML; pool; n; MSW
2.4 Web
2.4.1 Web; DOM [11] [12]; VIPS [13] [14]; DOM [15] [16] [17]; TABLE [18] [19]; SST
2.4.2 Web [21] [22]; MM
2.4.3 Web; VSM (Vector Space Model); Boolean Model; Probabilistic Model

In the VSM, a document is represented by its feature terms:

$$D = D(t_1, t_2, \ldots, t_k, \ldots, t_n)$$

where $t_k$ is the $k$-th feature term. With a weight attached to each term, document $d$ becomes the vector

$$V(d) = (t_1, W_1; \ldots; t_k, W_k; \ldots; t_n, W_n) \tag{2.1}$$

where $t_i$ is a feature term of $d$ and $W_i$ is the weight of $t_i$. The weights are computed with the TF-IDF formula

$$W(t, d) = \frac{tf(t, d) \times \log(N / n_t + 0.01)}{\sqrt{\sum_{t \in d} \left[ tf(t, d) \times \log(N / n_t + 0.01) \right]^2}} \tag{2.2}$$

where $W(t, d)$ is the weight of term $t$ in document $d$, $tf(t, d)$ is the frequency of $t$ in $d$, $N$ is the total number of documents, and $n_t$ is the number of documents that contain $t$ [23].

Similarity measures named: Euclidean Distance; Cosine; Minkowski Distance; Correlation; Mahalanobis Distance. The Euclidean distance is

$$D(X, Y) = \sqrt{\sum_i (X_i - Y_i)^2} \tag{2.3}$$

where $X_i$ and $Y_i$ are the $i$-th components of the vectors $X$ and $Y$.

2.4.4 Expected Cross Entropy; Information Gain; the Weight of Evidence for Text; Mutual Information; Word Frequency; Odds Ratio [24]
2.5
2.5.1 N; K; k-means; CLARA (Clustering LARge Applications); PAM (Partitioning Around Medoids); k-medoids; CLARANS (Clustering Large Applications based upon RANdomized Search) [25] [26]; k-means, with the expression nkt and its symbols n, k, t
2.5.2 CURE; BIRCH [27] [28]; CURE
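The pieces named in Sections 2.4.3 and 2.5.1 (TF-IDF weighting as in formula 2.2, Euclidean distance as in formula 2.3, and k-means) can be chained into a small clustering pipeline. The following is a minimal sketch under stated assumptions, not the thesis's actual system: the function names `tfidf_vectors`, `euclidean`, and `kmeans` and the toy job postings are hypothetical, documents are assumed to be already segmented into words, and initial centroids are drawn at random from the input points.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into length-normalised TF-IDF vectors (formula 2.2)."""
    n_docs = len(docs)
    vocab = sorted({t for doc in docs for t in doc})
    # n_t: number of documents that contain term t
    doc_freq = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = [tf[t] * math.log(n_docs / doc_freq[t] + 0.01) for t in vocab]
        norm = math.sqrt(sum(w * w for w in raw))
        vectors.append([w / norm if norm > 0 else 0.0 for w in raw])
    return vocab, vectors


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors (formula 2.3)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: assign to nearest centroid, recompute means, repeat."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical, already-segmented job postings.
    postings = [
        ["java", "developer", "spring"],
        ["java", "engineer", "spring", "java"],
        ["sales", "manager", "marketing"],
        ["sales", "marketing", "assistant"],
    ]
    vocab, vectors = tfidf_vectors(postings)
    labels, _ = kmeans(vectors, k=2)
    print(labels)
```

Because the result of k-means depends on the initial centroids, different `seed` values can give different partitions on small or poorly separated inputs; running several seeds and keeping the partition with the smallest total within-cluster distance is a common workaround.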