Automatical construction of Chinese knowledge graph system

E Shijia*, LIN Peiyu, XIANG Yang
(College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China)
(*Corresponding author, e-mail: eshijia1218@vip.qq.com)

Journal of Computer Applications, 2016, 36(4): 992-996, 1001. ISSN 1001-9081. CODEN JYIIDU. 2016-04-10. http//.joca.cn
Article ID: 1001-9081(2016)04-0992-05. DOI: 10.11772/j.issn.1001-9081.2016.04.0992
Received: 2015-09-06; revised: 2015-11-12.
Funding: 973 Program (2014CB340404); […] (14511108002).
Author biographies: […] (1991-), CCF member […]; […] (1993-) […]; […] (1962-), CCF member […].
CLC number: TP311.5. Document code: A.

Abstract: To solve the problem that the methods currently used to construct Chinese knowledge graph systems are time-consuming, have low accuracy and require a lot of manual intervention, an integrated, end-to-end, automatically constructed solution based on rich data from Chinese encyclopedias was proposed, and a user-oriented Chinese knowledge graph was implemented. In this solution, some property and related text information of the original encyclopedia data were scraped to the local system uninterruptedly by a custom Web crawler and saved as triples with extended attributes. Through the graph-oriented database Cayley and the document-oriented database MongoDB, the data in the archived triple files was imported into the back-end system and then converted into a huge knowledge graph system, in order to provide various services dependent on the Chinese knowledge graph in the front-end system. Compared with other knowledge graph systems, the proposed system significantly reduces the construction time; moreover, the number of entities and relations is at least 50% higher than that of other knowledge graph systems such as YAGO, HowNet and the Chinese Concept Dictionary.

Keywords: knowledge graph; Web crawler; triple file; knowledge base; graph-oriented database

0 […]
[…] [1, 2]. […] Internet Movie Database, YAGO [3-4], DBpedia [5-6], Freebase [7]. […] [8]. […] [9-10]. […] [11-12] […] [13]. […]

1 […]
[…] C1, C2, …, Cn […] Ci […] Ii […] R1, R2, …, Rm. […] "is-a" […] A […] B […]. […] "is-a" […] Ci. […] "is-a" […] A→B […] B […] A. […] (Fig. 1) […]

2 […]

2.1 […]

2.1.1 […]
[…] Python […] Scrapy [14] […] (Fig. 2). […] URL. […] HTML […] URL […] URL. […] URL. […]

2.1.2 […]
[…] (Fig. 3). […] XXX. […] Tarjan [15]. […] (Fig. 4). […]

2.2 […]
[…] [16]. […] [17]. […] 〈[…]1, […], […]2〉. […]

2.2.1 InfoBox […]
[…] InfoBox […] (Fig. 5) […] InfoBox […]:
1) 〈[…]〉
2) 〈[…] computer〉
3) 〈[…] 1946〉
4) 〈[…] ENIAC〉
5) 〈[…]〉
6) 〈[…]〉

2.2.2 […] HTML […]
[…] (Fig. 6). […] HTML […] InfoBox. […]

2.3 […]
[…] Infobox […] 〈[…]〉 […]

3 […]

3.1 […]
[…] 〈Subject, Predicate, Object, Label〉. […] Label. […] Label. […] Label […] Label. […]

3.2 […]
[…] (Fig. 7). […] Cayley […] MongoDB. […] MongoDB […] Cayley […] CRUD […] Cayley […] MongoDB. […] MongoDB […] Cayley […] "quads" […] (Table 2). […]

Table 2 […] Predicate […]
1) Predicate […] "is-a", "instance-of" […]
2) Predicate […] "attribute" […] Subject […] Predicate […] Object […]
3) […] Predicate […]

[…] (Fig. 7) […] InfoBox […]. […] MongoDB […].

4 […]
[…] Web Content Understanding-Knowledge Graph (CU-KG). […]

4.1 […]
[…] CU-KG […]. […] (Fig. 8). […] Fig. 8(b) […]

4.2 […]
[…] CU-KG […]. […] Predicate […] attribute […]. […] 〈[…], attribute, URL〉 […] Uniform Resource Locator (URL). […] (Fig. 9). […]

4.3 […]
[…] CU-KG […] "2012-08-10" […] "2012-08-13" […] "2012-08-10" […] (Fig. 10). […] "2012-08-10" […] 1 d […] (Fig. 11) […] "2012-08-13". […] "2012-08-11". […]

4.4 […]
[…] 〈[…]〉, 〈[…]〉, 〈[…]〉 […]

5 […]
[…] Freebase, WordNet. […] CU-KG. […] CU-KG […] (Table 1).

Table 1 […] CU-KG […]
WordNet 155287479887 YAGO 10546385000000 HowNet 19192446243366590— CU-KG 339961556516141

[…] CU-KG […] 50% […]. […] CU-KG […]. […] 12.5 h [18]. […] CU-KG […].

6 […]
[…] CU-KG […]. […] CU-KG […] NoSQL […]. […] CU-KG […].

References
[1] LENAT D B. CYC: a large-scale investment in knowledge infrastructure [J]. Communications of the ACM, 1995, 38(11): 33-38.
[2] SINGHAL A. Introducing the knowledge graph: things, not strings [EB/OL]. [2014-10-10]. https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html#/2012/05/introducing-knowledge-graph-things-not.html.
[3] SUCHANEK F M, KASNECI G, WEIKUM G. Yago: a core of semantic knowledge [C]// Proceedings of the 16th International Conference on World Wide Web. New York: ACM, 2007: 697-706.
[4] SUCHANEK F M, KASNECI G, WEIKUM G. Yago: a large ontology from Wikipedia and WordNet [J]. Web Semantics: Science, Services and Agents on the World Wide Web, 2008, 6(3): 203-217.
[5] AUER S, BIZER C, KOBILAROV G, et al. DBpedia: a Nucleus for a Web of Open Data [M]. Berlin: Springer, 2007: 722-735.
[6] BIZER C, LEHMANN J, KOBILAROV G, et al. DBpedia - a crystallization point for the Web of data [J]. Web Semantics: Science, Services and Agents on the World Wide Web, 2009, 7(3): 154-165.
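As an illustration of the 〈Subject, Predicate, Object, Label〉 records of Section 3.1 and the three Predicate categories of Table 2 ("is-a"/"instance-of", "attribute", and everything else), the following is a minimal Python sketch. It is not the authors' implementation. The class name `Quad`, the helper names, the tab-separated line format and the sample values are all assumptions made for illustration; in particular, the exact meaning of the Label field and the on-disk format of the archived triple files cannot be recovered from the text above.

```python
from dataclasses import dataclass

# Predicate categories named in Table 2 of the text.
HIERARCHY_PREDICATES = {"is-a", "instance-of"}
ATTRIBUTE_PREDICATE = "attribute"


@dataclass(frozen=True)
class Quad:
    """One <Subject, Predicate, Object, Label> record (hypothetical class)."""
    subject: str
    predicate: str
    obj: str
    label: str = ""  # assumption: free-form extra information, may be empty


def predicate_kind(quad: Quad) -> str:
    """Return which of the three Predicate categories a quad falls into."""
    if quad.predicate in HIERARCHY_PREDICATES:
        return "hierarchy"
    if quad.predicate == ATTRIBUTE_PREDICATE:
        return "attribute"
    return "other"


def to_line(quad: Quad) -> str:
    """Serialize a quad as one tab-separated line (assumed archive format)."""
    fields = (quad.subject, quad.predicate, quad.obj, quad.label)
    if any("\t" in f or "\n" in f for f in fields):
        raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields)


def from_line(line: str) -> Quad:
    """Parse a line produced by to_line back into a Quad."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}")
    return Quad(*parts)


if __name__ == "__main__":
    # Sample values are invented for illustration only.
    sample = Quad("ENIAC", "instance-of", "computer")
    print(predicate_kind(sample))
    print(from_line(to_line(sample)) == sample)
```

Keeping the Predicate classification separate from serialization means the same records could be routed to different back-end stores by category, which is one plausible reading of how the Cayley and MongoDB roles are split in Section 3.2.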
Document title: Automatically constructed Chinese knowledge graph system (自动化构建的中文知识图谱系统2)