Principles and Applications of Business Intelligence
Introduction to Business Intelligence Methods and Applications (商务智能方法与应用)
© Liu Hongyan
School of Information Management

Chapter 4: Classification


Top ten data mining algorithms

- The k-means algorithm
- The Apriori algorithm
- Expectation–Maximization
- PageRank
- AdaBoost
- C4.5
- CART
- Naive Bayes
- k-nearest neighbor classification
- Support vector machines

Classification algorithms: C4.5, CART, Naive Bayes, k-nearest neighbor classification, Support vector machines
Decision tree classification algorithms: C4.5, CART


Outline

4.1 Concepts
4.2 Decision tree classification
4.3 Naive Bayes classification
4.4 k-nearest-neighbor classification
4.5 Measuring classification performance


4.1 Basic concepts

Classification and related basic concepts

- Classification: the process of summarizing the characteristics of objects whose classes are known and then predicting the class of objects whose class is unknown.
- A given training set is used to build a classification model (also called a classifier). The model is then used to predict the class of data tuples in the database whose class labels are unknown.
- The training data set consists of a set of database tuples, called training samples, instances, or objects.
- A sample has the form (v1, v2, …, vn; c), where vi is an attribute value and c is the class label.

Classification and related basic concepts (terms)

- Classifier
- Training dataset
- Class label attribute: each of its values is called a class label
- Attribute: describes one feature or property of an object
- Testing dataset

Is classification supervised or unsupervised learning?

- Supervised learning (classification): the training set carries class labels; new data are classified on the basis of the training set.
- Unsupervised learning (clustering): the training set has no class labels; given a set of attributes, the task is to find the classes or clusters present in the training set.

Application areas of classification algorithms

- Population, income, credit -> purchasing power
- Gender, age, marital status, income -> credit rating
- Location, product, discount -> promotion effectiveness
- Gender, income, interests -> preferred product type
- Credit scoring, marketing strategy, market forecasting, CRM

Classification and related basic concepts: training data set

Customer ID   Age     Gender   Annual income (10,000s)   Marital status   Luxury car
1             <30     F        86                        Married          No
2             <30     M        65                        Single           No
3             <30     M        90                        Divorced         No
4             <30     F        75                        Married          No
5             30-50   F        82                        Married          Yes
6             30-50   M        91                        Married          Yes
7             30-50   F        200                       Divorced         Yes
8             30-50   F        40                        Single           No
9             30-50   M        20                        Divorced         No
10            >50     F        96                        Divorced         No
11            >50     F        80                        Single           No
12            >50     M        50                        Single           Yes
13            >50     F        80                        Divorced         No
14            >50     M        92                        Divorced         Yes

Classification methods

- Lazy
- Eager
- Steps: build the model; test and use the model

Example data:

NAME      RANK             YEARS   TENURED
Mike      Assistant Prof   3       no
Mary      Assistant Prof   7       yes
Bill      Professor        2       yes
Jim       Associate Prof   7       yes
Dave      Assistant Prof   6       no
Anne      Associate Prof   3       no
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Classification: building the model

- Training data: the first six rows of the example data (Mike through Anne).
- Training Data -> Classification Algorithms -> Classifier (Model)
- Resulting classifier: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification: testing the model and predicting

- Testing data: the last four rows of the example data (Tom through Joseph).
- Testing Data -> Classifier
- Unseen data: (Jeff, Professor, 4) -> Classifier -> Tenured?

The concept and process of classification

Training set:

name   age     income   credit_rating
li     <=30    Low      fair
wang   <=30    Med      fair
hong   30-40   High     excellent
chen   >40     Med      excellent
zhao   24-35   High     excellent
…      …       …        …

Unknown data:

name   age     income   credit_rating
xin    >40     High     ?
wu     <=30    Low      ?
hu     30-40   High     ?
…      …       …        …

Classification rule: If age = "30-40" and income = High then credit_rating = excellent

[Figure: decision tree. Root: age? with branches <=30, 30-40, >40; income tests with branches high and low, med; leaves ex (excellent) and f (fair).]

[Figure: the training set feeds a learning algorithm that produces a model (a classification rule or a decision tree); the model is checked on the test set and then applied to unknown data.]

Classification techniques

- Decision tree
- Naïve Bayes
- K nearest neighbors
- Association-based classification
- Support Vector Machines
- Artificial neural networks
- Logistic Regression
- ……


4.2 Decision tree classification

4.2.1 Building a decision tree
4.2.2 Attribute types and splitting conditions
4.2.3 Pruning a decision tree

The concept of a decision tree

- Leaf nodes: classes
- All other nodes: test attributes
- Levels of the tree: the root is at level 1, the children of the root are at level 2, and so on
- Edge: a test (splitting) condition on the attribute of the node it leaves
- Node roles: root node, leaf node, parent node, child node

A decision tree is a tree structure that resembles a flowchart. The topmost node is the root. The root and every internal node represent a test of the data set on some attribute, each branch represents an outcome of that test (a subset of the data), and each leaf node represents a class or a class distribution.

Example: a decision tree that predicts whether a customer is likely to buy a computer

age     student   credit_rating   buys_computer
<30     no        fair            no
30-40   no        excellent       yes
…       …         …               …

[Figure: decision tree. Root: age? with branches <=30, 30…40, >40; internal nodes student? (no, yes) and credit-rating? (excellent, fair); leaves yes or no.]

Decision tree classification example

Training data:

Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes

Model: decision tree (splitting attributes: Refund, MarSt, TaxInc)

Refund
  Yes -> NO
  No  -> MarSt
           Married          -> NO
           Single, Divorced -> TaxInc
                                 < 80K -> NO
                                 > 80K -> YES

Applying the decision tree to classify

Test data:

Refund   Marital Status   Taxable Income   Cheat
No       Married          80K              ?

Start from the root of the tree.
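
The traversal in the Refund / MarSt / TaxInc example can be sketched in Python. This is a minimal sketch under stated assumptions, not the course's own implementation: the names `classify` and `training_accuracy` are hypothetical, the leaf labels under TaxInc (below 80K gives No, otherwise Yes) are inferred from the ten training records, and sending an income of exactly 80K to the upper branch is an assumption the slides do not settle.

```python
# Training records from the slide:
# (Tid, Refund, Marital Status, Taxable Income in K, Cheat)
TRAINING = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def classify(refund, marital_status, taxable_income_k):
    """Walk the tree from the root: Refund, then MarSt, then TaxInc."""
    # Root node: test the Refund attribute.
    if refund == "Yes":
        return "No"
    # Second level: test the marital status.
    if marital_status == "Married":
        return "No"
    # Third level (Single or Divorced): test the taxable income.
    # Assumption: exactly 80K is sent to the upper ("Yes") branch.
    return "No" if taxable_income_k < 80 else "Yes"


def training_accuracy():
    """Fraction of training records whose predicted label matches Cheat."""
    correct = sum(
        classify(refund, status, income) == cheat
        for _tid, refund, status, income, cheat in TRAINING
    )
    return correct / len(TRAINING)


if __name__ == "__main__":
    # Test record from the slide: Refund = No, Married, 80K.
    print(classify("No", "Married", 80))
    print(training_accuracy())
```

For the test record, the walk stops at the MarSt node because the record is Married, so the income threshold is never consulted and the 80K boundary assumption does not affect the result.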
Document title: Business Intelligence Classification Algorithms
Link: https://www.777doc.com/doc-465 .html