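ISSN 1000-1239, CN 11-1777/TP
Journal of Computer Research and Development, 45(7): 1221-1231, 2008
Received: 2007-12-13; revised: 2008-01-31
Funding: [...] (60496321); [...] (20070533)

Semantic Annotation of Chinese Web Pages: From Sentences to RDF Representations

Jing Tao 1, Zuo Wanli 1, Sun Jigui 1,2, and Che Haiyan 1
1 (College of Computer Science and Technology, Jilin University, Changchun 130012)
2 (Ministry of Education Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Changchun 130012)
(jingtaocst@email.jlu.edu.cn)

Abstract  The Semantic Web aims to turn the World Wide Web into a Web of data, where machines can process annotations and relations between resources, and where implicit information can be derived by using ontologies and shared vocabularies. Fulfilling this vision requires a method of automatic semantic annotation. This paper proposes a methodology for semantic annotation of Chinese Web pages that is guided by a domain ontology. It employs a statistical method together with natural language processing technology, and it realizes the mapping from sentences to RDF representations in two phases, an identification phase and a grouping phase. The major technical contributions are: a domain lexicon constructed by a statistical method, rather than a linguistic ontology, is used as the external domain knowledge; an explicit property type tagging algorithm recognizes both the instances and the properties contained in sentences, which facilitates relation extraction; after the dependency trees or dependency forests of the sentences are built, the identified instances and properties are grouped into RDF statements according to the dependency relationships among Chinese words. The experimental results show that this method is significantly more effective than a semantic annotation method based on the subject-verb-object grammatical relationship.

Keywords  natural language processing; dependency relationship; type tagging; relation extraction; ontology

Abstract (translated from the Chinese)  Realizing the vision of the Semantic Web requires automated semantic annotation methods. This paper proposes a semantic annotation method for Chinese Web pages that is guided by a domain ontology. Using statistical methods and natural language processing techniques, it takes the sentences of a document as its processing units and maps sentences to RDF representations in two phases, identification and grouping. The method has the following features: domain-relevant words are obtained statistically and used to build a domain lexicon tagging list that serves as external domain knowledge, which reduces the dependence on general-purpose linguistic ontologies; an explicit property type tagging method identifies the words in a sentence that express relations and tags them as property types, which benefits the subsequent relation extraction; the syntactic dependency tree (forest) of each sentence is built, and words are grouped according to their dependency relations to form RDF statements. Experimental results show that this method is more effective than semantic annotation based on the subject-verb-object grammatical relationship.

CLC number  TP391

[...] Berners-Lee [1] [...] the Semantic Web (semantic Web): [...] Web [...] HTML [...] (XML, RDF, OWL, Ontology) [...] OWL [...] RDF. [...]:
1) [...] RDF resources (RDF resource).
2) [...] triples (R1, P, R2) [...] RDF statements (RDF statement), where P is a binary relation between R1 and R2 [...].
These correspond to 1) type tagging (TT) and 2) relation extraction (RE). [...] information extraction (IE), ontology-based information extraction (OBIE), and natural language processing (NLP). [...]:
1) [...] Amilcare [2]. [...] S-CREAM [3] [...] OntoMat [4] [...] Amilcare [...]. [...] IE [...].
2) OBIE [...]. SemTag [5] [...] TAP [...]. KIM [6] [...]. [...] OBIE [...].
3) [...] [7-9] [...] RDF [...]. Artequakt [7] [...] iOka [8] [...]. Artequakt [...] WordNet [10] [...]. iOkra [...] NLP [...] CLO [11] [...]. [...] [7-8] [...] RelExt [9] [...]. [...] [12] [...] DOM [...].

[...]:
1) [...].
2) [...] named entity recognizer (NER). NER [...].
3) [...].
4) [...].
5) [...].

[...] RDF. [...]:
1) [...] (following the abstract: a statistically built domain lexicon serves as the external domain knowledge);
2) [...] (following the abstract: explicit property type tagging identifies the words that express relations);
3) [...] the dependency tree (forest) [...] RDF [...] knowledge triples (knowledge triple, KT).

1 [...]

1.1 [...]

[...] RDF. [...] HTML [...] RDF [...] RDF Resource [...] RDF [...].

[...] 1.3 [...]: [...]. [...] NER [...]: [...] RDF [...].

Fig. 1 The proposed framework for semantic annotation of Chinese Web pages.

Footnote 1: http://protege.stanford.edu; http:[...]

[...]

\[
\chi^2 = \frac{n\,(k_{11}k_{00} - k_{10}k_{01})^2}{(k_{11}+k_{10})(k_{01}+k_{00})(k_{11}+k_{01})(k_{10}+k_{00})} \qquad (1)
\]

Here t is a word, k_{i0} is the number of [...] in class i that do not contain t, and k_{i1} is the number of [...] in class i that contain t; n is the total number of [...], n = k_{00} + k_{01} + k_{10} + k_{11}.

[...] 928 [...]; 39567 [...]. [...] HTML [...] 832 [...] 8530 [...]. As Table 1 shows, [...] 832 [...] 734 [...] 19 [...], and its χ² value is 7936.75 [...]. [...] 12 [...]:

Table 1 Top 12 Domain Relevant Words

Word     χ² value    Positive Class    Negative Class
[...]    7936.75     734               19
[...]    5055.93     477               8
[...]    4899.58     459               4
[...]    3930.69     402               33
[...]    3316.35     316               3
[...]    3288.49     321               10
[...]    3133.53     310               13
[...]    2118.57     260               63
[...]    1897.48     186               4
[...]    1526.78     150               3
[...]    1429.76     158               20
[...]    1419.93     141               4

[...]

1.3 [...]

[...] RDF [...] RDF. [...] NER [...] EPTT. EPTT [...]: [...]; [...] 4 [...].

Algorithm 1. [...] (EPTT).
Input: [...]; Output: [...].
Begin
Step 1. [...];
Step 2. [...];
Step 3. [...] N-gram [...] (*);
Step 4. [...].
End

In EPTT, Step 1 and Step 2 [...] NER [...]. Step 3 [...] N-gram [...]. Step 4 [...] RDF. [...]: 1990126 [...]: 1990126 [...] 1990126 [...]: 1990126. 1990126: 19900126. [...] [13-19] [...].

1.4 [...]

[...] RDF [...].

1.4.1 [...]

Dependency grammar [20] [...] 1959 [...] Lucien Tesnière [...]. Tesnière [...].

1. [...]. [...] Relation(Gov, Dep), where Gov is the governing word, Dep is the dependent word, and Relation is [...]. [...] Relation(Gov, Dep).
2. [...]. [...] Gov [...] Dep [...]. [...] Gov [...].
3. [...].

[...] [21]. Following [21], [...]:
1) top, nsubj, dobj, attr;
2) tcomp;
3) range, num;
4) pobj, prep;
5) ccomp;
6) conj, cc;
7) nmod, numod;
8) dep.

[...] Fig. 2.

Fig. 2 An example of dependency tree.

[...] 2 [...]:
[conj(3,1), cc(3,2), nsubj(7,3), dep(7,4), prep(7,5), pobj(5,6), top(10,9), ccomp(7,10), conj(13,11), cc(13,12), attr(10,13)]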
In Fig. 2: [...]; ccomp [...]; nsubj, top, attr [...]; conj [...]. [...]: [...].

1.4.2 [...] RDF [...]

[...] (subject, predicate, object) [...] grammatical relationship triple (GRT), written GRT(S, vp, O) [...]. S and O [...] RDF [...], p [...] RDF [...], v [...].

Algorithm 2. [...] (DTRE).
Input: W [...]; Output: RDF [...].
Begin
Step 1. T = DependencyParse(W);
Step 2. grt = new GRT; root = Root(T); grt.v = root; EnQueue(Q, grt);
Step 3. while not empty(Q) do
        begin
          grt = DeQueue(Q); EnQueue(C, grt);
          ccomps = getCcomps(grt.v); sortBySeq(ccomps);
          EnQueue(Q, newGrt(ccomps));
        end
Step 4. for each grt in C do
        begin
          for each rel(grt.v, w) do
          begin
            if rel ∈ {nsubj, top} or w.type ∈ ontologytype then put(grt.S, w);
            if rel ∈ {dobj, attr, pobj, tcomp, range} or w.type ∈ generaltype then put(grt.O, w);
          end
          if grt.v is propertytype then grt.p = grt.v;
          if grt.v.type is not propertytype then
          begin
            p = findProperty(grt);
            if p ∈ grt.S then begin grt.S = grt.S - {p} ∪ {p.dep}; end
            if p ∈ grt.O then begin grt.O = grt.O - {p} ∪ {p.dep}; end
          end
        end
Step 5. i = 1;
        while i ≤ length(C) do
        begin
          if isNull(C[i].S) then C[i].S = C[i-1].S;
          if isNull(C[i].O) then C[i].O = C[i-1].O [...] {datetype, tcomp};
          i = i + 1;
        end
Step 6. for each grt in C do
        begin
          for each s ∈ grt.S, o [...]
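As a rough illustration of Steps 1 to 4 of Algorithm 2, the Python sketch below loads the relation list of the Fig. 2 example, finds the root, collects one triple per ccomp clause in breadth-first order, and sorts the direct dependents of each clause head into S or O by relation name alone. It is a sketch under stated assumptions, not the authors' implementation: the helper names (parse_relations, find_roots, collect_clauses, fill_subject_object) and the dictionary layout are hypothetical, the dependency parse is taken as given rather than computed, and the type-based conditions and the property handling of Step 4 are left out.

```python
import re
from collections import deque

# Relation list copied from the Fig. 2 example; the numbers are word positions.
FIG2 = ("conj(3,1), cc(3,2), nsubj(7,3), dep(7,4), prep(7,5), pobj(5,6), "
        "top(10,9), ccomp(7,10), conj(13,11), cc(13,12), attr(10,13)")

# Relation names that Step 4 of Algorithm 2 routes to S and to O.
SUBJECT_RELS = {"nsubj", "top"}
OBJECT_RELS = {"dobj", "attr", "pobj", "tcomp", "range"}


def parse_relations(text):
    """Turn 'rel(gov,dep), ...' into a list of (rel, gov, dep) tuples."""
    found = re.findall(r"(\w+)\((\d+),(\d+)\)", text)
    return [(rel, int(gov), int(dep)) for rel, gov, dep in found]


def find_roots(relations):
    """A root governs at least one word and is never a dependent.
    More than one root means the sentence parsed into a forest."""
    govs = {gov for _, gov, _ in relations}
    deps = {dep for _, _, dep in relations}
    return sorted(govs - deps)


def collect_clauses(relations, root):
    """Steps 2-3: breadth-first walk over ccomp edges, one triple per clause head."""
    queue, clauses = deque([root]), []
    while queue:
        head = queue.popleft()
        clauses.append({"v": head, "S": [], "O": []})
        ccomps = sorted(dep for rel, gov, dep in relations
                        if rel == "ccomp" and gov == head)
        queue.extend(ccomps)
    return clauses


def fill_subject_object(relations, clauses):
    """Relation-name part of Step 4 only: direct dependents of v go to S or O."""
    for grt in clauses:
        for rel, gov, dep in relations:
            if gov != grt["v"]:
                continue
            if rel in SUBJECT_RELS:
                grt["S"].append(dep)
            if rel in OBJECT_RELS:
                grt["O"].append(dep)
    return clauses


relations = parse_relations(FIG2)
roots = find_roots(relations)
clauses = fill_subject_object(relations, collect_clauses(relations, roots[0]))
print(roots)
print(clauses)
```

On the Fig. 2 relations this gives root 7 and two clauses: head 7 with S = [3] and an empty O, and head 10 with S = [9] and O = [13]. The empty O of the first clause is the kind of gap that Step 5 of the algorithm addresses.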
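Eq. (1) can be checked numerically against the first rows of Table 1. The Python sketch below is a minimal illustration under assumptions: it takes class 1 to be the positive class with a total of 832 and class 0 to be the negative class with a total of 8530, as the counts quoted in the text suggest, and the function names chi_square and chi_square_from_counts are hypothetical.

```python
# Class totals as quoted in the text (assumed: 832 positive, 8530 negative).
POSITIVE_TOTAL = 832
NEGATIVE_TOTAL = 8530


def chi_square(k11, k10, k01, k00):
    """Eq. (1). k_i1 counts class-i items that contain the word t,
    k_i0 counts class-i items that do not contain it."""
    n = k11 + k10 + k01 + k00
    numerator = n * (k11 * k00 - k10 * k01) ** 2
    denominator = (k11 + k10) * (k01 + k00) * (k11 + k01) * (k10 + k00)
    return numerator / denominator


def chi_square_from_counts(positive_with_t, negative_with_t):
    """Derive the four cells from the two per-class counts listed in Table 1."""
    return chi_square(
        positive_with_t,
        POSITIVE_TOTAL - positive_with_t,
        negative_with_t,
        NEGATIVE_TOTAL - negative_with_t,
    )


first_row = chi_square_from_counts(734, 19)
second_row = chi_square_from_counts(477, 8)
print(round(first_row, 2))
print(round(second_row, 2))
```

With these totals the first row evaluates to 7936.75, the value listed in Table 1, which supports reading k_{11} and k_{01} as the Positive Class and Negative Class columns.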