Deep Learning for Structure Prediction in Natural Language Processing
Wenzhe Pei (裴文哲)
Peking University

Outline
• Introduction & Motivation
• Deep Learning for Chinese Word Segmentation
• Deep Learning for Graph-based Dependency Parsing
• Conclusion & Future Work

Introduction & Motivation
• Structure in NLP

Introduction & Motivation
• Structure Prediction
• Key Components
– Model for f
– Decoding
– …

Introduction & Motivation
• Conventional Models for f
– CRF
– Structure Perceptron
– Structure SVM
– …
• Decoding Algorithms
– Viterbi
– CKY
– Eisner
• This is how most researchers publish papers
– …

Introduction & Motivation
• Conventional Models
– Linear (log-linear)
– Feature-based
• Pros
– Incorporate knowledge into statistical models
– Easy to explain
• Cons
– Too many features make the model tend to overfit
– Feature design sometimes requires domain knowledge

Introduction & Motivation
• How about Deep Learning?
– Naïve Bayes → HMM
– Maximum Entropy → CRF
– Perceptron → Structure Perceptron
– SVM → Structure SVM
– Neural Network → Structure Neural Network?
• In our work, we studied two tasks
– Chinese Word Segmentation (sequence structure)
– Dependency Parsing (tree structure)

Deep Learning for Chinese Word Segmentation

Deep Learning for CWS
• Chinese Word Segmentation
– A typical sequence labeling task (a toy tag-decoding sketch follows the Max-Margin Training slide below)
– Example: 我爱天安门 is tagged S S B M E, i.e. 我 / 爱 / 天安门

Deep Learning for CWS
• Conventional Neural Network Models

Deep Learning for CWS
• Model Training
– Mairgup et al. (2013): MLE-style training
– Zheng et al. (2013): perceptron-style training

Deep Learning for CWS
• Pros: minimize the effort spent on feature engineering
• Cons: hard to capture complex "interactions" between tags and contexts relying only on hidden layers
– "Interactions" in a feature-based model

Deep Learning for CWS
• Our work: Max-Margin Tensor Neural Network
• A model that can:
(1) minimize the effort of feature engineering
(2) capture more interactions between tags and context
(3) be easily generalized to other sequence modeling tasks

Max-Margin Tensor Neural Network
• Architecture of our model
– Tag Embedding
– Tensor-based Transformation
– Tensor Factorization

Max-Margin Tensor Neural Network
• Tag Embedding

Max-Margin Tensor Neural Network
• Tensor-based Transformation (see the tensor-layer sketch after this section)
– We use a 3-way tensor $V^{[1:H_2]} \in \mathbb{R}^{H_2 \times H_1 \times H_1}$ to capture the interactions between tags and contexts
– Combining the tensor product with a linear transformation, the tensor-based transformation in the hidden layer is defined as $h = \tanh\big(z^{\top} V^{[1:H_2]} z + W z + b\big)$, where $z \in \mathbb{R}^{H_1}$ is the concatenated input

Max-Margin Tensor Neural Network
• However…
– The tensor-based transformation drastically slows down the model (without considering matrix optimization algorithms): $O(H_1 H_2) \rightarrow O(H_1^2 H_2)$
– The additional tensor parameters can add millions of parameters to the model, which puts it at risk of overfitting

Max-Margin Tensor Neural Network
• Tensor Factorization (see the factorization sketch after this section)
– Each tensor slice $V^{[i]} \in \mathbb{R}^{H_1 \times H_1}$ is factorized into two low-rank matrices $P^{[i]} \in \mathbb{R}^{H_1 \times r}$ and $Q^{[i]} \in \mathbb{R}^{r \times H_1}$, where $r \ll H_1$ is the number of factors
– Factorized tensor transformation: $O(H_1^2 H_2) \rightarrow O(r H_1 H_2)$

Max-Margin Tensor Neural Network
• Max-Margin Training (a toy loss sketch follows below)
– Sentence-level score: $s(c_{1:n}, t_{1:n}, \theta) = \sum_{i=1}^{n} f_{\theta}(t_i \mid c_{i-2:i+2}, t_{i-1})$
– Objective function: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} l_i(\theta) + \frac{\lambda}{2} \lVert \theta \rVert^2$
– where $l_i(\theta) = \max_{\hat{y} \in Y(x_i)} \big( s(x_i, \hat{y}, \theta) + \Delta(y_i, \hat{y}) \big) - s(x_i, y_i, \theta)$
– Structured margin loss: $\Delta(y_i, \hat{y}) = \kappa \sum_{j=1}^{n} \mathbf{1}\{y_{i,j} \neq \hat{y}_j\}$
– Optimization: AdaGrad (Duchi et al., 2011) with minibatches
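To make the sequence-labeling formulation from the start of this section concrete, here is a minimal Python sketch of how a predicted S/B/M/E tag sequence maps back to a segmentation. The helper name tags_to_words is mine, not from the slides; the example is the slide's 我爱天安门 sentence.

```python
def tags_to_words(chars, tags):
    """Convert a B/M/E/S tag sequence back into segmented words.

    B = begin of a multi-character word, M = middle, E = end,
    S = single-character word.
    """
    words, buf = [], []
    for ch, tag in zip(chars, tags):
        buf.append(ch)
        if tag in ("E", "S"):       # a word ends at this character
            words.append("".join(buf))
            buf = []
    if buf:                         # tolerate a malformed tail
        words.append("".join(buf))
    return words

# The slides' example: 我爱天安门 -> 我 / 爱 / 天安门
print(tags_to_words("我爱天安门", ["S", "S", "B", "M", "E"]))
# ['我', '爱', '天安门']
```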
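A toy numpy sketch of the tensor-based transformation. The layer sizes are arbitrary, and the exact form $h = \tanh(z^{\top} V^{[1:H_2]} z + W z + b)$ is reconstructed from the slide's description (tensor product combined with a linear transformation) rather than quoted from it, so treat the details as assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H1, H2 = 20, 10                        # toy sizes, not the paper's

z = rng.standard_normal(H1)            # concatenated context + tag embeddings
V = rng.standard_normal((H2, H1, H1))  # 3-way tensor V^[1:H2]
W = rng.standard_normal((H2, H1))      # ordinary linear weights
b = rng.standard_normal(H2)            # bias

# Each hidden unit i adds a bilinear interaction term z^T V^[i] z
# to the usual linear transformation W z + b.
bilinear = np.einsum("i,kij,j->k", z, V, z)   # shape (H2,)
h = np.tanh(bilinear + W @ z + b)
```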
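The same bilinear term computed with the factorized tensor. The identity $z^{\top}(P^{[i]} Q^{[i]})z = (z^{\top} P^{[i]}) \cdot (Q^{[i]} z)$ is what delivers the $O(H_1^2 H_2) \rightarrow O(r H_1 H_2)$ speedup claimed on the slide; sizes and names are again illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
H1, H2, r = 20, 10, 3                   # r << H1 is the number of factors

z = rng.standard_normal(H1)
P = rng.standard_normal((H2, H1, r))    # P^[i] in R^{H1 x r}
Q = rng.standard_normal((H2, r, H1))    # Q^[i] in R^{r x H1}

# V^[i] = P^[i] Q^[i] is never materialized:
# z^T V^[i] z = (z^T P^[i]) . (Q^[i] z), costing O(r*H1) per slice
# instead of O(H1^2), i.e. O(r*H1*H2) for the whole layer.
left = np.einsum("i,kir->kr", z, P)     # z^T P^[i], shape (H2, r)
right = np.einsum("krj,j->kr", Q, z)    # Q^[i] z,   shape (H2, r)
bilinear = (left * right).sum(axis=1)   # shape (H2,)
```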
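A toy sketch of the max-margin loss $l_i(\theta)$ defined above. In practice the max over $Y(x_i)$ is found by Viterbi-style loss-augmented decoding, since $\Delta$ decomposes over positions; this sketch simply enumerates a small candidate set, and the scorer and the $\kappa$ value are placeholders of mine.

```python
def hamming_margin(y_gold, y_hat, kappa=0.2):
    """Structured margin Δ(y, ŷ) = κ · Σ_j 1{y_j ≠ ŷ_j}; κ is illustrative."""
    return kappa * sum(g != p for g, p in zip(y_gold, y_hat))

def margin_loss(score, x, y_gold, candidates, kappa=0.2):
    """l(θ) = max_{ŷ ∈ Y(x)} (s(x, ŷ) + Δ(y, ŷ)) − s(x, y).

    `score(x, y)` is any sentence-level scorer; a real implementation
    would replace the enumeration below with dynamic programming.
    """
    augmented = max(score(x, y_hat) + hamming_margin(y_gold, y_hat, kappa)
                    for y_hat in candidates)
    return augmented - score(x, y_gold)

# Toy usage with a trivial scorer that rewards B and E tags.
score = lambda x, y: sum(1.0 for t in y if t in ("B", "E"))
cands = [["S", "S"], ["B", "E"]]
print(margin_loss(score, "xy", ["S", "S"], cands))   # hinge loss >= 0
```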
Experiments
• Dataset
– PKU and MSRA datasets from the Second International Chinese Word Segmentation Bakeoff (Emerson, 2005)
• Hyper-parameters

Experiments
• Effect of different model configurations on the PKU dataset

Experiments
• Effect of character embeddings

Experiments
• Comparison with other NN models

Experiments
• Unsupervised Pre-training

Experiments
• Minimal Feature Engineering

Deep Learning for Graph-based Dependency Parsing

Deep Learning for Dependency Parsing
• Dependency Parsing

Deep Learning for Dependency Parsing
• Given a sentence x, graph-based models formulate the parsing process as a search problem

Deep Learning for Dependency Parsing
• For efficient decoding, previous work uses factorization: $s(x, y) = \sum_{c \subseteq y} \mathrm{score}(x, c)$
– c is a subgraph of the tree y (a toy first-order sketch follows at the end of this section)

Deep Learning for Dependency Parsing
• The most common choice of score function
• Problems
– A mass of features can put the model at risk of overfitting
– Feature extraction slows down parsing
– Feature design requires domain expertise

Deep Learning for Dependency Parsing
• In our work, we propose a neural network model that can
– Learn feature combinations automatically
– Exploit phrase-level information through distributed representations
– Generalize to any graph-based model (first-order, second-order, third-order, …)

Deep Learning for Dependency Parsing
• Architecture of our model

Deep Learning for Dependency Parsing
• Feature Embedding
– Only atomic features (word unigrams, POS tags, etc.)
– Use distributed representations

Deep Learning for Dependency Parsing
• Phrase Embedding (a toy sketch follows at the end of this section)
– Phrase information is very important in conventional graph-based models
– But they can only use backed-off tri-grams to avoid the data sparseness problem
– Lexical representations cannot capture syntactic and semantic similarity between phrases, e.g. "hit the ball" and "kick the football"

Deep Learning for Dependency Parsing
• Direction-specific Transformation

Deep Learning for Dependency Parsing
• Learning Feature Combinations
– New activation function: tanh-cube (sketched at the end of this section)
– Intuitively, the cube term in each hidden unit directly models feature combinations in a multiplicative way

Deep Learning for Dependency Parsing
• Previous work also tried to capture feature combinations
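A minimal sketch of the first-order instance of the factorization above, where each part c is a single head→modifier arc and the tree score is the sum of its arc scores. The function and array names are mine; in the model the arc scores would come from the neural network rather than from random numbers.

```python
import numpy as np

def tree_score(arc_scores, heads):
    """First-order factorization: s(x, y) = Σ_{(h,m) ∈ y} score(h, m).

    arc_scores[h][m] scores the arc head h -> modifier m (index 0 is
    the artificial ROOT); heads[m-1] is the head of word m in tree y.
    """
    return sum(arc_scores[h][m] for m, h in enumerate(heads, start=1))

# Toy example: a 3-word sentence with ROOT at index 0.
rng = np.random.default_rng(0)
arc_scores = rng.standard_normal((4, 4))
print(tree_score(arc_scores, heads=[0, 1, 1]))  # arcs ROOT->1, 1->2, 1->3
```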
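The slides in this extraction do not show the phrase composition function itself. One simple choice consistent with "distributed representations of phrases" is to average the word embeddings over the span, sketched below; treat the averaging and all names as assumptions rather than the model's confirmed definition.

```python
import numpy as np

def phrase_embedding(emb, sentence, lo, hi):
    """Embed sentence[lo:hi] as the average of its word embeddings,
    producing one fixed-size vector for a variable-length phrase, so
    similar phrases can land near each other in embedding space."""
    return np.mean([emb[w] for w in sentence[lo:hi]], axis=0)

# Toy usage with random 4-dimensional embeddings.
rng = np.random.default_rng(0)
words = ["hit", "the", "ball"]
emb = {w: rng.standard_normal(4) for w in words}
print(phrase_embedding(emb, words, 0, 3))
```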
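A sketch of the tanh-cube activation named on the Learning Feature Combinations slide. The form g(a) = tanh(a³ + a), applied to the pre-activation a = Wx + b, follows the associated paper as I understand it; treat the exact formula as an assumption here.

```python
import numpy as np

def tanh_cube(a):
    """tanh-cube activation: g(a) = tanh(a**3 + a).

    Expanding (w·x)^3 produces products of triples of input features,
    so each hidden unit models feature combinations multiplicatively,
    while the outer tanh keeps activations bounded.
    """
    return np.tanh(a ** 3 + a)

# A hidden layer using tanh-cube: h = g(W x + b), toy sizes.
rng = np.random.default_rng(0)
W = rng.standard_normal((5, 8))
b = rng.standard_normal(5)
x = rng.standard_normal(8)
h = tanh_cube(W @ x + b)
```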