您好,欢迎访问三七文档
Semi-MarkovConditionalRandomFieldsforInformationExtractionSunitaSarawagiIndianInstituteofTechnologyBombay,Indiasunita@iitb.ac.inWilliamW.CohenCenterforAutomatedLearning&DiscoveryCarnegieMellonUniversitywcohen@cs.cmu.eduAbstractWedescribesemi-Markovconditionalrandomfields(semi-CRFs),acon-ditionallytrainedversionofsemi-Markovchains.Intuitively,asemi-CRFonaninputsequencexoutputsa“segmentation”ofx,inwhichlabelsareassignedtosegments(i.e.,subsequences)ofxratherthantoindividualelementsxiofx.Importantly,featuresforsemi-CRFscanmeasurepropertiesofsegments,andtransitionswithinasegmentcanbenon-Markovian.Inspiteofthisadditionalpower,exactlearningandinferencealgorithmsforsemi-CRFsarepolynomial-time—oftenonlyasmallconstantfactorslowerthanconventionalCRFs.Inexperimentsonfivenamedentityrecognitionproblems,semi-CRFsgenerallyoutper-formconventionalCRFs.1IntroductionConditionalrandomfields(CRFs)arearecently-introducedformalism[12]forrepresent-ingaconditionalmodelPr(y|x),wherebothxandyhavenon-trivialstructure(oftensequential).HereweintroduceageneralizationofsequentialCRFscalledsemi-Markovconditionalrandomfields(orsemi-CRFs).Recallthatsemi-MarkovchainmodelsextendhiddenMarkovmodels(HMMs)byallowingeachstatesitopersistforanon-unitlengthoftimedi.Afterthistimehaselapsed,thesystemwilltransitiontoanewstates0,whichde-pendsonlyonsi;however,duringthe“segment”oftimebetweeniandi+di,thebehaviorofthesystemmaybenon-Markovian.Generativesemi-Markovmodelsarefairlycommonincertainapplicationsofstatistics[8,9],andarealsousedinreinforcementlearningtomodelhierarchicalMarkovdecisionprocesses[19].Semi-CRFsareaconditionallytrainedversionofsemi-Markovchains.Inthispaper,wepresentinferenceandlearningmethodsforsemi-CRFs.Wealsoarguethatsegmentsoftenhaveaclearintuitivemeaning,andhencesemi-CRFsaremorenaturalthanconventionalCRFs.Wefocushereonnamedentityrecognition(NER),inwhichasegmentcorrespondstoanextractedentity;however,similarargumentsmightbemadeforseveralothertasks,suchasgene-finding[11]orNP-chunking[16].InNER,asemi-Markovformulationallowsonetoeasilyconstructentity-levelfeatures(suchas“entitylength”and“similaritytootherknownentities”)whichcannotbeeasilyencodedinCRFs.ExperimentsonfivedifferentNERproblemsshowthatsemi-CRFsoftenoutperformconventionalCRFs.2CRFsandSemi-CRFs2.1DefinitionsACRFmodelsPr(y|x)usingaMarkovrandomfield,withnodescorrespondingtoele-mentsofthestructuredobjecty,andpotentialfunctionsthatareconditionalon(featuresof)x.Learningisperformedbysettingparameterstomaximizethelikelihoodofasetof(x,y)pairsgivenastrainingdata.OnecommonuseofCRFsisforsequentiallearningproblemslikeNPchunking[16],POStagging[12],andNER[15].FortheseproblemstheMarkovfieldisachain,andyisalinearsequenceoflabelsfromafixedsetY.Forin-stance,intheNERapplication,xmightbeasequenceofwords,andymightbeasequencein{I,O}|x|,whereyi=Iindicates“wordxiisinsideaname”andyi=Oindicatestheopposite.Assumeavectorfoflocalfeaturefunctionsf=hf1,...,fKi,eachofwhichmapsapair(x,y)andanindexitoameasurementfk(i,x,y)∈R.Letf(i,x,y)bethevectorofthesemeasurements,andletF(x,y)=P|x|if(i,x,y).Forexample,inNER,thecomponentsoffmightincludethemeasurementf13(i,x,y)=[[xiiscapitalized]]·[[yi=I]],wheretheindicatorfunction[[c]]=1ifciftrueandzerootherwise;thisimpliesthatF13(x,y)wouldbethenumberofcapitalizedwordsxipairedwiththelabelI.Followingpreviouswork[12,16]wewilldefineaconditionalrandomfield(CRF)tobeanestimatoroftheformPr(y|x,W)=1Z(x)eW·F(x,y)(1)whereWisaweightvectoroverthecomponentsofF,andZ(x)=Py0eW·F(x,y0).Toextendthistothesemi-Markovcase,lets=hs1,...,spidenoteasegmentationofx,wheresegmentsj=htj,uj,yjiconsistsofastartpositiontj,anendpositionuj,andalabelyj∈Y.Conceptually,asegmentmeansthatthetagyjisgiventoallxi’sbetweeni=tjandi=uj,inclusive.Weassumesegmentshavepositivelength,andcompletelycoverthesequence1...|x|withoutoverlapping:thatis,thattjandujalwayssatisfyt1=1,up=|x|,1≤tj≤uj≤|x|,andtj+1=uj+1.ForNER,avalidsegmentationofthesentence“IwentskiingwithFernandoPereirainBritishColumbia”mightbes=h(1,1,O),(2,2,O),(3,3,O),(4,4,O),(5,6,I),(7,7,O),(8,9,I)i,correspondingtothelabelsequencey=hO,O,O,O,I,I,O,I,Ii.Wenowassumeavectorgofsegmentfeaturefunctionsg=hg1,...,gKi,eachofwhichmapsatriple(j,x,s)toameasurementgk(j,x,s)∈R,anddefineG(x,s)=P|s|jg(j,x,s).Wealsomakearestrictiononthefeatures,analogoustotheusualMarko-vianassumptionmadeinCRFs,andassumethateverycomponentgkofgisafunctiononlyofx,sj,andthelabelyj−1associatedwiththeprecedingsegmentsj−1.Inotherwords,weassumethateverygk(j,x,s)canberewrittenasgk(j,x,s)=g0k(yj,yj−1,x,tj,uj)(2)foranappropriatelydefinedg0k.Intherestofthepaper,wewilldroptheg0notationandusegforbothversionsofthesegment-levelfeaturefunctions.Asemi-CRFisthenanestimatoroftheformPr(s|x,W)=1Z(x)eW·G(x,s)(3)whereagainWisaweightvectorforGandZ(x)=Ps0eW·G(x,s0).2.2AnefficientinferencealgorithmTheinferenceproblemforasemi-CRFisdefinedasfollows:givenWandx,findthebestsegmentation,argmaxsPr(s|x,W),wherePr(s|x,W)isdefinedbyEquation3.AnefficientinferencealgorithmissuggestedbyEquation2,whichimpliesthatargmaxsPr(s|x,W)=argmaxsW·G(x,s)=argmaxsW·Xjg(yj,yj−1,x,tj
本文标题:Semi-Markov Conditional Random Fields for Informat
链接地址:https://www.777doc.com/doc-4419493 .html