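Contact: whgong@zjut.edu.cn

Classification vs. prediction

Classification is a two-step process:
(1) Model construction: learn a classifier from training data whose class labels are known.
(2) Model usage: check the classifier on test data, then apply it to unseen data.

Step (1), training data:

NAME     RANK            YEARS  TENURED
Mike     Assistant Prof  3      no
Mary     Assistant Prof  7      yes
Bill     Professor       2      yes
Jim      Associate Prof  7      yes
Dave     Assistant Prof  6      no
Anne     Associate Prof  3      no

Learned classifier:

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Step (2), testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4). Tenured? The rule above predicts 'yes', because rank = 'professor'.

Decision tree induction: training data for "buys_computer"

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no

Decision tree for "buys_computer":

age?
  <=30:   student?
            no:  no
            yes: yes
  31…40:  yes
  >40:    credit_rating?
            excellent: no
            fair:      yes

ID3 algorithm

function ID3(R: set of non-class attributes, C: class attribute, S: training set) returns a decision tree;
begin
  If S is empty, return a single node with value Failure;
  If all records in S have the same class value, return a single node with that value;
  If R is empty, return a single node with the most frequent class value in S;
    // some records will then be misclassified
  D = the attribute in R with the largest Gain(D, S);
  {dj | j = 1, 2, .., m} = the values of attribute D;
  {Sj | j = 1, 2, .., m} = the subsets of S whose records have value dj for D;
  return a tree with root D and arcs d1, d2, .., dm leading to the subtrees
    ID3(R-{D}, C, S1), ID3(R-{D}, C, S2), .., ID3(R-{D}, C, Sm);
end ID3;

Information theory used by ID3

1. If there are n equally probable messages, each has probability p = 1/n and conveys log2(n) bits of information.
2. For a probability distribution P = (p1, p2, …, pn) over n messages, the information conveyed (the entropy of P) is E(P) = -Σ pi log2(pi), summed over i = 1..n.
3. If a set T of records is partitioned into classes C1, C2, .., Ck, then Info(T) = E(P), where P is the class distribution P = (|C1|/|T|, …, |Ck|/|T|).
4. If an attribute X partitions T into subsets T1, T2, …, Tn, the information still needed to identify the class is the weighted average of Info(Ti): Info(X,T) = Σ (|Ti|/|T|) Info(Ti), summed over i = 1..n.
5. The information gain of splitting T on X is Gain(X,T) = Info(T) - Info(X,T).

Self-information: for a source S = {E1, ..., En} with probabilities P = {p1, ..., pn}, I(ei) = -log2(pi).
With 26 equally likely symbols: I(e) = -log2(1/26) = 4.7
With 2500 equally likely symbols: I(e) = -log2(1/2500) = 11.3

Notation for the gain computation

1. S is a set of s samples in m classes Ci (i = 1, …, m). si is the number of samples in Ci, and pi, the probability that a sample belongs to Ci, is estimated by si/s. The expected information is
   I(s1, s2, …, sm) = -Σ pi log2(pi), summed over i = 1..m, with pi = si/s and s = s1 + s2 + … + sm.
2. Attribute A has v values {a1, a2, …, av} and partitions S into v subsets {S1, S2, …, Sv}, where Sj holds the samples of S whose value of A is aj. sij is the number of samples of class Ci in Sj. The entropy after splitting on A is
   E(A) = Σ ((s1j + … + smj)/s) I(s1j, …, smj), summed over j = 1..v.
3. Gain(A) = I(s1, s2, …, sm) - E(A).

Worked example: 1024 samples

The rows below are the attribute combinations of the 14-row table, each with a sample count. In I(a, b), a is the number of "yes" samples and b the number of "no" samples.

count  age     income  student  credit_rating  buys_computer
64     <=30    high    no       fair           no
64     <=30    high    no       excellent      no
128    31…40   high    no       fair           yes
60     >40     medium  no       fair           yes
64     >40     low     yes      fair           yes
64     >40     low     yes      excellent      no
64     31…40   low     yes      excellent      yes
128    <=30    medium  no       fair           no
64     <=30    low     yes      fair           yes
132    >40     medium  yes      fair           yes
64     <=30    medium  yes      excellent      yes
32     31…40   medium  no       excellent      yes
32     31…40   high    yes      fair           yes
63     >40     medium  no       excellent      no
1      >40     medium  no       excellent      yes

Class totals (m = 2): s1 = 641 (yes), s2 = 383 (no), s = s1 + s2 = 1024
p1 = s1/s = 641/1024 = 0.6260
p2 = s2/s = 383/1024 = 0.3740
I(s1, s2) = I(641, 383) = -(p1 log2(p1) + p2 log2(p2)) = 0.9537

1. age
   <=30:   I(128, 256) = 0.9183, weight (128+256)/1024 = 0.375
   31…40:  I(256, 0) = 0, weight 256/1024 = 0.25
   >40:    I(257, 127) = 0.9157, weight (257+127)/1024 = 0.375
   E(age) = 0.375*0.9183 + 0.25*0 + 0.375*0.9157 = 0.6877
   Gain(age) = I(641, 383) - E(age) = 0.9537 - 0.6877 = 0.2660

2. income
   high:    I(160, 128) = 0.9911, weight 288/1024 = 0.2813
   medium:  I(289, 191) = 0.9697, weight 480/1024 = 0.4687
   low:     I(192, 64) = 0.8113, weight 256/1024 = 0.25
   E(income) = 0.2813*0.9911 + 0.4687*0.9697 + 0.25*0.8113 = 0.9361
   Gain(income) = I(641, 383) - E(income) = 0.9537 - 0.9361 = 0.0176

3. student
   yes:  I(420, 64) = 0.5635, weight 484/1024 = 0.4727
   no:   I(221, 319) = 0.9761, weight 540/1024 = 0.5273
   E(student) = 0.4727*0.5635 + 0.5273*0.9761 = 0.7811
   Gain(student) = I(641, 383) - E(student) = 0.9537 - 0.7811 = 0.1726

4. credit_rating
   fair:       I(480, 192) = 0.8631, weight 672/1024 = 0.6563
   excellent:  I(161, 191) = 0.9948, weight 352/1024 = 0.3437
   E(credit_rating) = 0.6563*0.8631 + 0.3437*0.9948 = 0.9084
   Gain(credit_rating) = I(641, 383) - E(credit_rating) = 0.9537 - 0.9084 = 0.0453

Summary:
   E(age) = 0.6877            Gain(age) = 0.2660
   E(income) = 0.9361         Gain(income) = 0.0176
   E(student) = 0.7811        Gain(student) = 0.1726
   E(credit_rating) = 0.9084  Gain(credit_rating) = 0.0453

age has the largest gain and becomes the root. The branch age = 31…40 holds 256 samples, all "yes", so it becomes a leaf.

Branch age <=30 (384 samples, I(128, 256) = 0.9183)

1. income
   high:    I(0, 128) = 0, weight 128/384 = 0.3333
   medium:  I(64, 128) = 0.9183, weight 192/384 = 0.5
   low:     I(64, 0) = 0, weight 64/384 = 0.1667
   E(income) = 0.3333*0 + 0.5*0.9183 + 0.1667*0 = 0.4592
   Gain(income) = I(128, 256) - E(income) = 0.9183 - 0.4592 = 0.4591

2. student
   yes:  I(128, 0) = 0, weight 128/384 = 0.3333
   no:   I(0, 256) = 0, weight 256/384 = 0.6667
   E(student) = 0.3333*0 + 0.6667*0 = 0
   Gain(student) = I(128, 256) - E(student) = 0.9183 - 0 = 0.9183

Conclusion: student removes all remaining uncertainty, so it is the test on this branch (yes: yes, no: no).

Branch age >40 (384 samples, I(257, 127) = 0.9157)

1. income
   low:     I(64, 64) = 1, weight 128/384 = 0.3333
   medium:  I(193, 63) = 0.8050, weight 256/384 = 0.6667
   E(income) = 0.3333*1 + 0.6667*0.8050 = 0.8700
   Gain(income) = I(257, 127) - E(income) = 0.9157 - 0.8700 = 0.0457

2. student
   yes:  I(196, 64) = 0.8051, weight 260/384 = 0.6771
   no:   I(61, 63) = 0.9998, weight 124/384 = 0.3229
   E(student) = 0.6771*0.8051 + 0.3229*0.9998 = 0.8680
   Gain(student) = I(257, 127) - E(student) = 0.9157 - 0.8680 = 0.0477

3. credit_rating
   fair:       I(256, 0) = 0, weight 256/384 = 0.6667
   excellent:  I(1, 127) = 0.0659, weight 128/384 = 0.3333
   E(credit_rating) = 0.6667*0 + 0.3333*0.0659 = 0.0220
   Gain(credit_rating) = I(257, 127) - E(credit_rating) = 0.9157 - 0.0220 = 0.8937

Conclusion: credit_rating has the largest gain and is the test on this branch (fair: yes, excellent: no by majority, 127 to 1).

ID3 on the 14-row "buys_computer" table

Gain(age) = 0.246
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048

age has the largest gain and is selected as the root.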
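The four gains quoted above for the 14-row "buys_computer" table can be re-derived with a short Python sketch. The function names `info` and `gain` are illustrative choices, not names from the slides, and the rows are copied by hand from the table, so treat this as a check under those assumptions rather than a reference implementation.

```python
from collections import Counter
from math import log2

# (age, income, student, credit_rating, buys_computer), copied from the 14-row table
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
ATTRIBUTES = ["age", "income", "student", "credit_rating"]


def info(labels):
    """Entropy of a list of class labels: -sum(p_i * log2(p_i))."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, column):
    """Information gain of splitting `rows` on the attribute at index `column`."""
    labels = [row[-1] for row in rows]
    subsets = {}
    for row in rows:
        subsets.setdefault(row[column], []).append(row[-1])
    # Weighted average entropy of the subsets produced by the split
    expected = sum(len(s) / len(rows) * info(s) for s in subsets.values())
    return info(labels) - expected


gains = {name: gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)  # attribute ID3 would place at the root
for name, value in gains.items():
    print(f"Gain({name}) = {value:.4f}")
print("root attribute:", best)
```

The computed values should agree with the quoted gains to within rounding in the third decimal place, and `best` comes out as `age`, matching the root of the tree.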
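Decision tree produced for "buys_computer": the root tests age; the branch age <=30 tests student (no: no, yes: yes); the branch age 31…40 is the leaf "yes"; the branch age >40 tests credit_rating (excellent: no, fair: yes).

Extracting classification rules from the tree

The tree can be written as IF-THEN rules, one rule per path from the root to a leaf, e.g.

IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31…40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"

Bayesian classification

Bayes' theorem: P(h|D) = P(D|h) P(h) / P(D)
P(h) is the prior probability of hypothesis h; P(h|D) is its posterior probability given the data D.

Naive Bayes classifier

1. Each sample is an n-dimensional vector X = {x1, x2, …, xn}, where xk is the value of attribute Ak.
2. There are m classes C1, C2, …, Cm. X is assigned to class Ci if and only if P(Ci|X) > P(Cj|X) for all 1 ≤ j ≤ m, j ≠ i.
3. P(X) is the same for every class, so it is enough to maximize P(X|Ci) P(Ci).
4. P(Ci) is estimated by si/s, where si is the number of training samples in Ci and s is the total number of training samples.
5. Assuming the attributes are conditionally independent given the class, P(X|Ci) = P(x1|Ci) P(x2|Ci) … P(xn|Ci), with P(xk|Ci) = sik/si for 1 ≤ k ≤ n, where sik is the number of samples of class Ci whose value of Ak is xk.

Example (14-row "buys_computer" table)

C1: buys_computer = "yes"; C2: buys_computer = "no"
X = {age = "21" (that is, "<=30"), student = "yes", credit_rating = "fair"}

Compare P(X|C1) P(C1) with P(X|C2) P(C2):

P(C1) = 9/14 = 0.64
P(C2) = 5/14 = 0.36
P(age = "<=30" | C1) = 2/9 = 0.22
P(age = "<=30" | C2) = 3/5 = 0.60
P(student = "yes" | C1) = 6/9 = 0.67
P(student = "yes" | C2) = 1/5 = 0.20
P(credit_rating = "fair" | C1) = 6/9 = 0.67
P(credit_rating = "fair" | C2) = 2/5 = 0.40

P(X|C1) P(C1) = 0.22 * 0.67 * 0.67 * 0.64 = 0.06
P(X|C2) P(C2) = 0.60 * 0.20 * 0.40 * 0.36 = 0.02

Since 0.06 > 0.02, X is assigned to class C1.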
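The naive Bayes example above can be reproduced by counting directly in the 14-row table. The sketch below is a minimal illustration: `naive_bayes_scores` and `COLUMNS` are hypothetical names, the estimates are plain relative frequencies (sik/si) with no smoothing, and the rows are copied by hand from the table.

```python
from collections import Counter

# (age, income, student, credit_rating, buys_computer), copied from the 14-row table
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
COLUMNS = {"age": 0, "income": 1, "student": 2, "credit_rating": 3}


def naive_bayes_scores(rows, query):
    """Return P(X|Ci) * P(Ci) for every class Ci, using frequency estimates."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, s_i in class_counts.items():
        score = s_i / len(rows)  # P(Ci) = si / s
        for attribute, value in query.items():
            column = COLUMNS[attribute]
            s_ik = sum(1 for row in rows if row[-1] == label and row[column] == value)
            score *= s_ik / s_i  # P(xk | Ci) = sik / si
        scores[label] = score
    return scores


query = {"age": "<=30", "student": "yes", "credit_rating": "fair"}
scores = naive_bayes_scores(ROWS, query)
prediction = max(scores, key=scores.get)
for label, score in scores.items():
    print(f"P(X|{label}) P({label}) = {score:.4f}")
print("predicted class:", prediction)
```

Because the code multiplies unrounded fractions, its scores differ slightly from the hand calculation, which multiplies factors already rounded to two decimals; the predicted class is the same. A production classifier would normally add smoothing so that a zero count does not force a whole product to zero.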
Document title: Classification in Data Mining
Link: https://www.777doc.com/doc-5587580.html