Abstract

Support vector machines (SVMs) hold an important place in machine learning theory and are widely applied to both classification and regression problems. This paper briefly introduces the basic principles of SVMs, discusses their application to text classification, and analyzes in detail how to build a text classifier with an SVM. It describes the full text classification process and the key techniques involved, including word segmentation, the vector space model (VSM), feature selection, and cross-validation for SVMs. Building on this analysis, it outlines how the text classification system was built with Microsoft Visual C++ 6.0, covering the implementation and optimization of the important classes and key processing functions, and how a dynamic link library is used to migrate from C++ to Java. Finally, it presents the experimental data and conclusions obtained from the system.

Keywords: machine learning, text classification, support vector machine (SVM)

Contents

Chapter 1 Introduction
  1.1 Overall Project Background
    1.1.1 Web-Based Information Integration System
    1.1.2 Requirements and System Architecture of the Web-Based Information Integration System
  1.2 Tasks and Goals of the Text Classification System
  1.3 Main Research Content of This Paper
Chapter 2 Related Theory
  2.1 Automatic Text Classification
  2.3 Support Vector Machine (SVM)
  2.4 Principles of SVM
    2.4.1 Linear Support Vector Machines
    2.4.2 Nonlinear Support Vector Machines
  2.5 SVM Text Classification
Chapter 3 Requirements Analysis
  3.1 The Two Phases of SVM
  3.2 Goals of the Training Phase
  3.3 Goals of the Testing Phase
  3.4 External Interfaces
Chapter 4 Overall Design and Choice of Implementation Tools
  4.1 Overall Structure
  4.2 Training Phase
    4.2.1 Word Segmentation and Word Frequency Statistics
    4.2.2 Text Vector Space Model (VSM) and Text Feature Selection
    4.2.3 Text Vectorization
    4.2.4 Text Classifier
  4.3 Testing Phase
    4.3.1 Word Segmentation and Word Frequency Statistics
    4.3.2 Text Vectorization
    4.3.3 Classification
  4.4 Choice of Implementation Tools and Cross-Language Migration
Chapter 5 Detailed Design and Implementation
  5.1 Interface Design
  5.2 Configuration File config.xml
  5.3 LIST Class
  5.4 Frequency Class
Title: Text Classification in Web Information Extraction