Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, Google Inc., Mountain View, CA, tmikolov@google.com
Kai Chen, Google Inc., Mountain View, CA, kaichen@google.com
Greg Corrado, Google Inc., Mountain View, CA, gcorrado@google.com
Jeffrey Dean, Google Inc., Mountain View, CA, jeff@google.com

arXiv:1301.3781v3 [cs.CL] 7 Sep 2013

Abstract

We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion word data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

1 Introduction

Many current NLP systems and techniques treat words as atomic units - there is no notion of similarity between words, as these are represented as indices in a vocabulary. This choice has several good reasons - simplicity, robustness and the observation that simple models trained on huge amounts of data outperform complex systems trained on less data. An example is the popular N-gram model used for statistical language modeling - today, it is possible to train N-grams on virtually all available data (trillions of words [3]).

However, the simple techniques are at their limits in many tasks. For example, the amount of relevant in-domain data for automatic speech recognition is limited - the performance is usually dominated by the size of high quality transcribed speech data (often just millions of words). In machine translation, the existing corpora for many languages contain only a few billion words or less. Thus, there are situations where simply scaling up the basic techniques will not result in any significant progress, and we have to focus on more advanced techniques.

With the progress of machine learning techniques in recent years, it has become possible to train more complex models on much larger data sets, and they typically outperform the simple models. Probably the most successful concept is to use distributed representations of words [10]. For example, neural network based language models significantly outperform N-gram models [1, 27, 17].

1.1 Goals of the Paper

The main goal of this paper is to introduce techniques that can be used for learning high-quality word vectors from huge data sets with billions of words, and with millions of words in the vocabulary. As far as we know, none of the previously proposed architectures has been successfully trained on more than a few hundred million words, with a modest dimensionality of the word vectors between 50 and 100.

We use recently proposed techniques for measuring the quality of the resulting vector representations, with the expectation that not only will similar words tend to be close to each other, but that words can have multiple degrees of similarity [20]. This has been observed earlier in the context of inflectional languages - for example, nouns can have multiple word endings, and if we search for similar words in a subspace of the original vector space, it is possible to find words that have similar endings [13, 14].

Somewhat surprisingly, it was found that similarity of word representations goes beyond simple syntactic regularities. Using a word offset technique where simple algebraic operations are performed on the word vectors, it was shown for example that vector("King") - vector("Man") + vector("Woman") results in a vector that is closest to the vector representation of the word Queen [20].
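As an illustration of the word offset technique described above, the following is a minimal sketch in Python/NumPy. It assumes word vectors are already available as a plain dictionary mapping words to arrays; the analogy helper, the toy vocabulary, and the random vectors are hypothetical placeholders added for illustration, not the trained vectors or the evaluation code used in this paper.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(vectors, a, b, c, topn=1):
    # Words whose vectors are closest to vector(b) - vector(a) + vector(c),
    # e.g. analogy(v, "Man", "King", "Woman") should rank "Queen" highly.
    target = vectors[b] - vectors[a] + vectors[c]
    scored = []
    for word, vec in vectors.items():
        if word in (a, b, c):  # exclude the three query words themselves
            continue
        scored.append((cosine(target, vec), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:topn]]

# Toy example with random 50-dimensional vectors, just to show the call shape.
rng = np.random.default_rng(0)
toy_vectors = {w: rng.normal(size=50) for w in ["King", "Man", "Woman", "Queen", "apple"]}
print(analogy(toy_vectors, "Man", "King", "Woman"))
```

With actual trained vectors, one would typically normalize all vectors once and stack them into a matrix, so the nearest-neighbour step becomes a single matrix-vector product over the whole vocabulary.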
In this paper, we try to maximize accuracy of these vector operations by developing new model architectures that preserve the linear regularities among words. We design a new comprehensive test set for measuring both syntactic and semantic regularities, and show that many such regularities can be learned with high accuracy. Moreover, we discuss how training time and accuracy depend on the dimensionality of the word vectors and on the amount of the training data.

1.2 Previous Work

Representation of words as continuous vectors has a long history [10, 26, 8]. A very popular model architecture for estimating a neural network language model (NNLM) was proposed in [1], where a feedforward neural network with a linear projection layer and a non-linear hidden layer was used to learn jointly the word vector representation and a statistical language model. This work has been followed by many others.

Another interesting NNLM architecture was presented in [13, 14], where the word vectors are first learned using a neural network with a single hidden layer. The word vectors are then used to train the NNLM. Thus, the word vectors are learned even without constructing the full NNLM. In this work, we directly extend this architecture, and focus just on the first step, where the word vectors are learned using a simple model.

It was later shown that the word vectors can be used to significantly improve and simplify many NLP applications [4, 5, 29]. Estimation of the word vectors itself was performed using different model architectures and trained on various corpora [4, 29, 23, 19, 9], and some of the resulting word vectors were made available for future research and comparison. However, as far as we know, these architectures were significantly more computationally expensive to train than the one proposed in [13], with the exception of a certain version of the log-bilinear model where diagonal weight matrices are used [23].

2 Model Architectures

Many different types of models were proposed for estimating continuous representations of words, including the well-known Latent Semantic Analysis (LSA)