Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, Google Inc., Mountain View, CA, tmikolov@google.com
Kai Chen, Google Inc., Mountain View, CA, kaichen@google.com
Greg Corrado, Google Inc., Mountain View, CA, gcorrado@google.com
Jeffrey Dean, Google Inc., Mountain View, CA, jeff@google.com

arXiv:1301.3781v3 [cs.CL] 7 Sep 2013

Abstract

We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion word data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

1 Introduction

Many current NLP systems and techniques treat words as atomic units - there is no notion of similarity between words, as these are represented as indices in a vocabulary. This choice has several good reasons - simplicity, robustness and the observation that simple models trained on huge amounts of data outperform complex systems trained on less data. An example is the popular N-gram model used for statistical language modeling - today, it is possible to train N-grams on virtually all available data (trillions of words [3]).

However, the simple techniques are at their limits in many tasks. For example, the amount of relevant in-domain data for automatic speech recognition is limited - the performance is usually dominated by the size of high quality transcribed speech data (often just millions of words). In machine translation, the existing corpora for many languages contain only a few billion words or less. Thus, there are situations where simply scaling up the basic techniques will not result in any significant progress, and we have to focus on more advanced techniques.

With the progress of machine learning techniques in recent years, it has become possible to train more complex models on much larger data sets, and they typically outperform the simple models. Probably the most successful concept is to use distributed representations of words [10]. For example, neural network based language models significantly outperform N-gram models [1, 27, 17].

1.1 Goals of the Paper

The main goal of this paper is to introduce techniques that can be used for learning high-quality word vectors from huge data sets with billions of words, and with millions of words in the vocabulary. As far as we know, none of the previously proposed architectures has been successfully trained on more than a few hundred million words, with a modest dimensionality of the word vectors between 50 and 100.

We use recently proposed techniques for measuring the quality of the resulting vector representations, with the expectation that not only will similar words tend to be close to each other, but that words can have multiple degrees of similarity [20]. This has been observed earlier in the context of inflectional languages - for example, nouns can have multiple word endings, and if we search for similar words in a subspace of the original vector space, it is possible to find words that have similar endings [13, 14].

Somewhat surprisingly, it was found that similarity of word representations goes beyond simple syntactic regularities. Using a word offset technique where simple algebraic operations are performed on the word vectors, it was shown for example that vector("King") - vector("Man") + vector("Woman") results in a vector that is closest to the vector representation of the word Queen [20].
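As an illustration of the word offset technique described above, the following is a minimal sketch in Python/NumPy. It assumes word vectors are already available as a plain dictionary mapping words to arrays; the analogy helper, the toy vocabulary, and the random vectors are hypothetical placeholders added for illustration, not the trained vectors or the evaluation code used in this paper.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(vectors, a, b, c, topn=1):
    # Words whose vectors are closest to vector(b) - vector(a) + vector(c),
    # e.g. analogy(v, "Man", "King", "Woman") should rank "Queen" highly.
    target = vectors[b] - vectors[a] + vectors[c]
    scored = []
    for word, vec in vectors.items():
        if word in (a, b, c):  # exclude the three query words themselves
            continue
        scored.append((cosine(target, vec), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:topn]]

# Toy example with random 50-dimensional vectors, just to show the call shape.
rng = np.random.default_rng(0)
toy_vectors = {w: rng.normal(size=50) for w in ["King", "Man", "Woman", "Queen", "apple"]}
print(analogy(toy_vectors, "Man", "King", "Woman"))
```

With actual trained vectors, one would typically normalize all vectors once and stack them into a matrix, so the nearest-neighbour step becomes a single matrix-vector product over the whole vocabulary.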
In this paper, we try to maximize accuracy of these vector operations by developing new model architectures that preserve the linear regularities among words. We design a new comprehensive test set for measuring both syntactic and semantic regularities, and show that many such regularities can be learned with high accuracy. Moreover, we discuss how training time and accuracy depend on the dimensionality of the word vectors and on the amount of the training data.

1.2 Previous Work

Representation of words as continuous vectors has a long history [10, 26, 8]. A very popular model architecture for estimating a neural network language model (NNLM) was proposed in [1], where a feedforward neural network with a linear projection layer and a non-linear hidden layer was used to learn jointly the word vector representation and a statistical language model. This work has been followed by many others.

Another interesting NNLM architecture was presented in [13, 14], where the word vectors are first learned using a neural network with a single hidden layer. The word vectors are then used to train the NNLM. Thus, the word vectors are learned even without constructing the full NNLM. In this work, we directly extend this architecture, and focus just on the first step, where the word vectors are learned using a simple model.

It was later shown that the word vectors can be used to significantly improve and simplify many NLP applications [4, 5, 29]. Estimation of the word vectors itself was performed using different model architectures and trained on various corpora [4, 29, 23, 19, 9], and some of the resulting word vectors were made available for future research and comparison. However, as far as we know, these architectures were significantly more computationally expensive to train than the one proposed in [13], with the exception of a certain version of the log-bilinear model where diagonal weight matrices are used [23].

2 Model Architectures

Many different types of models were proposed for estimating continuous representations of words, including the well-known Latent Semantic Analysis (LSA)