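Enhancing Business Intelligence Applications through Knowledge Mining: Advanced Value-Added Applications of Statistical Analysis
From Knowledge Mining to Business Intelligence - Advanced Statistics Application

Dr. Xie Bangchang (谢邦昌)
- Chair Professor and Doctoral Supervisor, Xiamen University
- Chair Professor and Doctoral Supervisor, Capital University of Economics and Business
- Chair Professor and Doctoral Supervisor, Central University of Finance and Economics
- Chair Professor, Southwestern University of Finance and Economics
- Adjunct Professor, Renmin University of China
- Professor, Department of Statistics and Information Science and Graduate Institute of Applied Statistics, Fu Jen Catholic University
- President, Chinese Data Mining Association

Outline
- Knowledge mining (integrating data mining and text mining) and the development of business intelligence
- Knowledge mining: procedures, steps, outputs, and applications
- How to integrate data mining and text mining
- A review of technical developments in knowledge mining

The Value of Knowledge Retention
Reduce: cycle time, response time, duplicated investment, operating costs, meeting time, outside consultants, etc.
Increase: productivity and quality, transfer of corporate knowledge, fast and effective decisions, course innovation, collective problem solving, etc.

Value of Knowledge Management for U.S. Enterprises
- Retaining key competitive strategies: 52%
- Organizing and using valuable corporate data: 33%
- Managing old and new technologies: 12%
- Rapid change: 3%

Retention and Transfer of Corporate Knowledge
Factors: investment in knowledge assets; downsizing and retirement; staff rotation; productivity; duplicated capabilities; wasted effort; too many meetings; communication problems; cascading organizational goals; decision feasibility; speed; informality.

Why Is Knowledge So Urgent?
"The chief economic priority for developed countries is to raise the productivity of knowledge... The country that does this first will dominate the twenty-first century economically." (Peter F. Drucker)

From Data to Knowledge
Raw data -> Integration -> Data warehouse -> Selection/cleansing -> Target data -> Preprocessing -> Preprocessed data -> Transformation -> Transformed data -> Data mining -> Patterns -> Interpretation/evaluation -> Knowledge (understanding)

BI Architecture
- Data sources: operational DBs and other sources
- Extract, transform, load, and refresh (with a monitor & integrator and metadata) into a complete data warehouse and data marts
- An OLAP server serves the tools: 1. comprehensive performance management, 2. analysis, 3. query, 4. reports, 5. data mining

Data Mining / Exploration
- Classification (categorize your customers or clients): rule induction, neural networks, tree generators
- Prediction (forecast future sales or usage): rule induction, support vector machines, regression
- Segmentation (group similar customers or clients): COBWEB, expectation maximization, k-means
- Association (discover products that are purchased together): rough sets, Apriori, granular computing
- Sequence (find patterns and trends over time): trend functions, rule induction, neural networks
Source: Sreekumar Sukumaran and Ashish Sureka, "Gaining market intelligence from news feeds"

Integrated BI Systems
- Structured data (DBMS, file systems, XML, EA, legacy) -> ETL -> complete data warehouse
- Unstructured data (CMS, scanned documents, email) -> text tagger & annotator -> intermediate data (RDBMS, XML) -> ETL -> complete data warehouse
Source: Sreekumar Sukumaran and Ashish Sureka

Knowledge Sources and Value
[Chart: data volume and market value (0-100 scale) of unstructured vs. structured data]
"On average, professional users spend 11 hours per week looking for information. Seventy-one percent said they could not find what they were looking for." (Information Management Software, Lazard Freres & Co. LLC, February 2001)
"The volume of digitized information will double every year from 2000 to 2005 (an increase to 30 times today's volume)." (Knowledge Management vs. Information Management, Gartner Group, September 2000)
Sources: web information, news reports, patents, email, documents...

The Literature Problem
- Publication statistics: 8 TB (books), 25 TB (newspapers), 20 TB (magazines), 2 TB (journals)
- On average, 2,000 pages of scientific knowledge are added every minute
- Reading the new material would take 5 years (24 hrs/day)
How Can I Keep Up With the Literature?

Evolution
"To study history one must know in advance that one is attempting something fundamentally impossible, yet necessary and highly important." Father Jacobus, in Hesse's Magister Ludi (Das Glasperlenspiel, The Glass Bead Game)

Document Knowledge Discovery and Management Technologies
- Techniques: retrieval, document filtering, categorization, summarization, clustering, natural language processing, content analysis, extraction, mining, visualization
- Applications: extraction applications, mining applications
- Levels: information access, knowledge understanding, information structure, knowledge generation

Text Mining Pipeline
Raw text -> tokenized text -> stemming & stopword removal -> term weighting -> term-document matrix (plus metadata/annotation)

$$
W = \begin{pmatrix}
w_{11} & w_{12} & \cdots & w_{1n} \\
w_{21} & w_{22} & \cdots & w_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{m1} & w_{m2} & \cdots & w_{mn}
\end{pmatrix}
$$

Row i corresponds to document d_i (i = 1, ..., m), column j to term t_j (j = 1, ..., n), and w_ij is the weight of term t_j in document d_i. From W, term similarity, document similarity, and vector centroids support clustering and categorization; sentence selection supports summarization.

Text ETL to Mining
Unstructured data (original data):
  Call Taker: James
  Date: Aug. 30, 2002
  Duration: 10 min.
  Customer ID: ADC00123
  Q: cust sys has stopped working.
  A: checked cust bios and it need updated.
  ...
Structured data (metadata):
  [Call Taker] James  [Date] 2002/08/30  [Duration] 10 min.  [Customer ID] ADC00123
  [Noun] Customer  [Software] BIOS  [Subj...Verb] customer system..stop  [SW..Problem] BIOS..need
Linguistic analysis: tagging, dependency analysis, named entity extraction, and intention analysis, using a category dictionary and a synonym dictionary to produce category items; followed by visualization & interactive mining.
IBM TAKMI (Nasukawa, Nagano, 1999)
- Mining target: individual texts
- Mining unit: category-labeled items extracted from the texts using NLP

Text Is Tough
- Text is an abstract concept that is extremely hard to represent (AI-complete).
- It is an endless web of abstract, complex relationships among many concepts.
- One noun can stand for many different concepts (e.g., CELL, IV).
- One concept can be expressed in many ways (aliases): spaceship, flying saucer, UFO, figment of imagination.
- Concepts are hard to visualize.
- High dimensionality: an analysis may involve hundreds or thousands of dimensions.

Text Mining Is Easy
- Text is highly repetitive.
- Simple algorithms applied to very crude processing already give good results:
  - finding important phrases
  - finding meaningful related words
  - building summaries of articles
- The main problem is evaluating the results, which requires clearly defined goals and objectives.

Traditional IR-based Extraction
- Documents from a text DB are represented as doc vectors 1..n, and each is scored against a profile vector.
- If score ≥ threshold, the document is accepted; otherwise it is rejected.
- Judgments on the accepted and rejected documents drive vector learning and threshold learning, guided by a utility function.
- Initialization: vector initialization and threshold initialization; resources: ontology, lexicons.
- Approach: reuse retrieval algorithms; develop new threshold algorithms.

Luhn's Ideas
"It is here proposed that the frequency of word occurrence in an article furnishes a useful measurement of word significance. It is further proposed that the relative position within a sentence of words having given values of significance furnish a useful measurement for determining the significance of sentences. The significance factor of a sentence will therefore be based on a combination of these two measurements."

Information Extraction
Example output:
  foodscience.com-Job2
  Job Title: Ice Cream Guru
  Employer: foodscience.com
  Job Category: Travel/Hospitality
  Job Function: Food Services
  Job Location: Upper Midwest
  Contact Phone: 800-488-2611
  Date Extracted: January 8, 2001
  Source: www.foodscience.com/jobs_midwest.html
  Other Company Jobs: foodscience.com-Job1
Given:
- a source of textual documents
- a well-defined, limited query (text based)
Find:
- sentences with relevant information
- extract the relevant information and ignore non-relevant information (important!)
- link related information and output it in a predetermined format
Example input:
  Advisory Programmer - Oracle (Austin, TX)
  Response Code: 1008-0074-97-iexc-jcn
  Responsibilities: Thi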
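The "Text Mining Pipeline" slide (tokenize, remove stopwords, stem, weight terms into the matrix W, compare document vectors) and the "Traditional IR-based Extraction" slide (score each doc vector against a profile vector, accept it if the score clears a threshold) can be sketched together in Python. This is a minimal illustration under stated assumptions, not the author's method: the slides only say "term weighting", so TF-IDF with w_ij = tf_ij * log(m / df_j) is an assumed scheme; cosine similarity is an assumed scoring function; `crude_stem` is a hypothetical stand-in for a real stemmer; and building the profile as the centroid of documents judged relevant, plus the 0.3 threshold, are illustrative choices.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (assumption: real systems use much larger lists).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "has", "it"}


def tokenize(text):
    """Raw text -> tokens: lowercase, keep letter runs, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def crude_stem(token):
    """Hypothetical suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def tfidf_matrix(docs):
    """Build the m x n matrix W: rows[i][j] = w_ij for document d_i and term t_j.

    Weighting scheme (assumed): w_ij = tf_ij * log(m / df_j).
    """
    tokenized = [[crude_stem(t) for t in tokenize(d)] for d in docs]
    terms = sorted({t for toks in tokenized for t in toks})
    m = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[t] * math.log(m / df[t]) for t in terms])
    return terms, rows


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of equal-length vectors (the 'vector centroid')."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def filter_docs(rows, profile, threshold):
    """Accept document i when cosine(d_i, profile) >= threshold; reject the rest."""
    return [i for i, r in enumerate(rows) if cosine(r, profile) >= threshold]


DOCS = [
    "the system has stopped working",
    "customer bios needs an update",
    "bios update fixed the system",
    "ice cream job in the midwest",
]
terms, rows = tfidf_matrix(DOCS)
# Hypothetical profile: the centroid of documents 1 and 2, judged relevant.
profile = centroid([rows[1], rows[2]])
print(filter_docs(rows, profile, threshold=0.3))  # [1, 2]
```

Document 0 shares only the term "system" with the profile and scores about 0.07, so it is rejected; in the slide's loop, user judgments on such results would then adjust the profile vector and the threshold.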
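Luhn's two measurements on the "Luhn's Ideas" slide (word frequency, then the position of significant words within a sentence) can be sketched as a tiny extractive summarizer. The cluster rule used here (significant words separated by at most four non-significant words form a cluster, and a cluster scores the square of its significant-word count divided by the number of words it spans) follows the way Luhn's 1958 abstracting method is commonly described; it is not stated on the slide. The stopword list and the `min_freq=2` significance cutoff are assumptions for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on", "with"}


def words(text):
    return re.findall(r"[a-z]+", text.lower())


def significant_words(sentences, min_freq=2):
    """Measurement 1: frequent non-stopwords are treated as significant.

    min_freq is an assumed cutoff; Luhn also discarded very common words.
    """
    counts = Counter(w for s in sentences for w in words(s) if w not in STOPWORDS)
    return {w for w, c in counts.items() if c >= min_freq}


def sentence_score(sentence, sig, max_gap=4):
    """Measurement 2: position of significant words within the sentence.

    Significant words at most max_gap non-significant words apart form a cluster;
    a cluster scores (significant words)^2 / (words spanned), and the sentence
    takes its best cluster's score.
    """
    positions = [i for i, w in enumerate(words(sentence)) if w in sig]
    if not positions:
        return 0.0
    best, start, prev, count = 0.0, positions[0], positions[0], 1
    for p in positions[1:]:
        if p - prev - 1 <= max_gap:
            count += 1
        else:
            best = max(best, count * count / (prev - start + 1))
            start, count = p, 1
        prev = p
    return max(best, count * count / (prev - start + 1))


def summarize(text, n=1, min_freq=2):
    """Sentence selection: return the n highest-scoring sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    sig = significant_words(sentences, min_freq)
    return sorted(sentences, key=lambda s: sentence_score(s, sig), reverse=True)[:n]


TEXT = (
    "Text mining finds patterns in text. The weather was pleasant today. "
    "Mining text data reveals useful patterns for business."
)
print(summarize(TEXT))  # ['Text mining finds patterns in text.']
```

Here "text", "mining", and "patterns" are significant; the first sentence packs four of them into a six-word span (score 16/6), beating the third sentence (9/6), while the off-topic second sentence scores 0.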
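The "Text ETL to Mining" slide turns a raw call log into tagged, structured fields. A much simpler sketch of that idea follows, assuming the log keeps its "Field: value" layout: regular expressions pull out the header fields, a synonym dictionary expands abbreviations ("cust" to "customer"), and a category dictionary tags known terms ("bios" as Software). The `SYNONYMS` and `CATEGORIES` dictionaries and the `extract` function are hypothetical; this is not IBM TAKMI's implementation and performs no dependency or intention analysis.

```python
import re
from datetime import datetime

# Hypothetical dictionaries standing in for the slide's synonym and category dictionaries.
SYNONYMS = {"cust": "customer", "sys": "system"}
CATEGORIES = {"bios": "Software"}

# Header fields and Q/A lines as they appear in the sample call log.
FIELD = re.compile(r"^(Call Taker|Date|Duration|Customer ID):\s*(.+)$", re.MULTILINE)
QA = re.compile(r"^([QA]):\s*(.+)$", re.MULTILINE)


def normalize(text):
    """Lowercase, keep letter runs, and expand abbreviations via SYNONYMS."""
    return " ".join(SYNONYMS.get(w, w) for w in re.findall(r"[a-z]+", text.lower()))


def extract(record):
    """Turn one semi-structured call log into a flat dict of structured fields."""
    fields = {name: value.strip() for name, value in FIELD.findall(record)}
    if "Date" in fields:
        # "Aug. 30, 2002" -> "2002/08/30"; %b assumes English month abbreviations.
        fields["Date"] = datetime.strptime(fields["Date"], "%b. %d, %Y").strftime("%Y/%m/%d")
    for kind, text in QA.findall(record):
        fields[kind] = normalize(text)
    tokens = " ".join(fields.get(k, "") for k in ("Q", "A")).split()
    # Category tagging: attach a category label to any dictionary term found.
    fields["Tags"] = sorted({(CATEGORIES[t], t.upper()) for t in tokens if t in CATEGORIES})
    return fields


RECORD = """Call Taker: James
Date: Aug. 30, 2002
Duration: 10 min.
Customer ID: ADC00123
Q: cust sys has stopped working.
A: checked cust bios and it need updated."""

print(extract(RECORD))
```

The result carries the same header fields as the slide's structured record, the date in 2002/08/30 form, the expanded question text "customer system has stopped working", and the tag ("Software", "BIOS"). A dictionary-driven design like this is easy to audit, but the dictionaries must be maintained by hand.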