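Page 1/92
KNN, K

Page 4/92
Given: an instance $x \in X$, where $X$ is the instance space, and a fixed set of classes $C = \{c_1, c_2, \ldots, c_n\}$.
Determine: the class of $x$, $c(x) \in C$, where $c(x)$ is a classification function with domain $X$ and range $C$.

Page 5/92
2 classes (binary); multi-class; multi-label, e.g. Reuters.

Page 6/92
A, B, C, D, E, F, G, H, I, J, K, N, O, P, Q, R, S, U, V, X
T: TB, TD, TE, TF, TG, TH, TJ, TK, TL, TM, TN, TP, TQ, TS, TU, TV

Page 8/92
Classes: Multimedia, GUI, Garb.Coll., Semantics, ML, Planning
Training data:
  Planning: planning, temporal, reasoning, plan, language, ...
  Semantics: programming, semantics, language, proof, ...
  ML: learning, intelligence, algorithm, reinforcement, network, ...
  Garb.Coll.: garbage, collection, memory, optimization, region, ...
Test data: "planning language proof intelligence"
Class groups: (AI), (Programming), (HCI), ...

Page 9/92
F1

Page 10/92
Contingency Table

                        in class    not in class
  assigned to class        a             b
  not assigned             c             d

precision = a/(a+b)
recall = a/(a+c)
fallout = b/(b+d)

Page 11/92
BEP and F. The BEP (break-even point) is the value at which p = r.
$$F_\beta(r, p) = \frac{(\beta^2 + 1)\, p\, r}{\beta^2 p + r}, \qquad F_1(r, p) = \frac{2\, p\, r}{p + r}$$
When p = r, F1 equals the BEP.

Page 12/92
macro-averaging, micro-averaging

Page 13/92
TREC; CMU, Berkeley, Cornell

Page 14/92
863

Page 15/92
1992

Page 19/92
$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}$$
$$P(H \mid E) = \frac{P(H \wedge E)}{P(E)}, \qquad P(E \mid H) = \frac{P(H \wedge E)}{P(H)}, \qquad P(H \wedge E) = P(E \mid H)\, P(H)$$

Page 20/92
Let the set of classes be $\{c_1, c_2, \ldots, c_n\}$ and let $E$ be the description of an instance, with probability $P(E)$.
$$P(c_i \mid E) = \frac{P(c_i)\, P(E \mid c_i)}{P(E)}$$
$$\sum_{i=1}^{n} P(c_i \mid E) = \sum_{i=1}^{n} \frac{P(c_i)\, P(E \mid c_i)}{P(E)} = 1$$
$$P(E) = \sum_{i=1}^{n} P(c_i)\, P(E \mid c_i)$$

Page 21/92
(cont.) Two quantities are needed: the priors $P(c_i)$ and the conditionals $P(E \mid c_i)$.
$P(c_i)$: if $n_i$ of the examples in $D$ belong to $c_i$, then $P(c_i) = n_i / |D|$.
$P(E \mid c_i)$: the instance is described by a conjunction of features, $E = e_1 \wedge e_2 \wedge \cdots \wedge e_m$.

Page 22/92
Assuming the features are independent given the class, only $P(e_j \mid c_i)$ has to be estimated:
$$P(E \mid c_i) = P(e_1 \wedge e_2 \wedge \cdots \wedge e_m \mid c_i) = \prod_{j=1}^{m} P(e_j \mid c_i)$$

Page 23/92
Naïve Bayes (training)
Let V be the vocabulary of the training set D.
For each class $c_i \in C$:
  $D_i$ is the subset of D belonging to class $c_i$
  $P(c_i) = |D_i| / |D|$
  $n_i$ is the total number of word occurrences in $D_i$
  For each word $w_j \in V$:
    $n_{ij}$ is the number of occurrences of $w_j$ in $D_i$
    $P(w_j \mid c_i) = (n_{ij} + 1) / (n_i + |V|)$

Page 24/92
Naïve Bayes (classification)
Given a test document X with n word occurrences, where $w_k$ is the word at position k of X, return the class
$$\arg\max_{c_i \in C} P(c_i) \prod_{k=1}^{n} P(w_k \mid c_i)$$

Page 25/92
Naïve Bayes example
C = {allergy, cold, well}
e1 = sneeze; e2 = cough; e3 = fever
E = {sneeze, cough, ¬fever}

  Prob             Well    Cold    Allergy
  P(ci)            0.9     0.05    0.05
  P(sneeze|ci)     0.1     0.9     0.9
  P(cough|ci)      0.1     0.8     0.7
  P(fever|ci)      0.01    0.7     0.4

Page 26/92
Naïve Bayes example (cont.)
P(well|E) = (0.9)(0.1)(0.1)(0.99)/P(E) = 0.0089/P(E)
P(cold|E) = (0.05)(0.9)(0.8)(0.3)/P(E) = 0.01/P(E)
P(allergy|E) = (0.05)(0.9)(0.7)(0.6)/P(E) = 0.019/P(E)
Most probable class: allergy
P(E) = 0.0089 + 0.01 + 0.019 = 0.0379
P(well|E) = 0.23
P(cold|E) = 0.26
P(allergy|E) = 0.50

Page 27/92
Play-tennis example: estimating P(xi|C)

  Outlook    Temperature  Humidity  Windy  Class
  sunny      hot          high      false  N
  sunny      hot          high      true   N
  overcast   hot          high      false  P
  rain       mild         high      false  P
  rain       cool         normal    false  P
  rain       cool         normal    true   N
  overcast   cool         normal    true   P
  sunny      mild         high      false  N
  sunny      cool         normal    false  P
  rain       mild         normal    false  P
  sunny      mild         normal    true   P
  overcast   mild         high      true   P
  overcast   hot          normal    false  P
  rain       mild         high      true   N

P(p) = 9/14
P(n) = 5/14

Page 28/92
outlook
  P(sunny|p) = 2/9       P(sunny|n) = 3/5
  P(overcast|p) = 4/9    P(overcast|n) = 0
  P(rain|p) = 3/9        P(rain|n) = 2/5
temperature
  P(hot|p) = 2/9         P(hot|n) = 2/5
  P(mild|p) = 4/9        P(mild|n) = 2/5
  P(cool|p) = 3/9        P(cool|n) = 1/5
humidity
  P(high|p) = 3/9        P(high|n) = 4/5
  P(normal|p) = 6/9      P(normal|n) = 1/5
windy
  P(true|p) = 3/9        P(true|n) = 3/5
  P(false|p) = 6/9       P(false|n) = 2/5

Page 29/92
Play-tennis example: classifying X
X = <rain, hot, high, false>
P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) = 3/9·2/9·3/9·6/9·9/14 = 0.010582
P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) = 2/5·2/5·4/5·2/5·5/14 = 0.018286
Since 0.018286 > 0.010582, X is assigned to class n.

Page 30/92
Joachims (1996): 20 newsgroups, 1000 articles each; 2/3 used for training, 1/3 for testing. With 20 classes, random guessing gives 5% accuracy; the classifier reached 89%.

Page 33/92
KNN: for an instance x, find its nearest neighbor x1 and assign x the class of x1 (1-NN); KNN generalizes this to the k nearest neighbors.

Page 34/92
KNN: given a training set X and a new instance y, find the k training instances most similar to y; call this set A. If A contains $n_1$ and $n_2$ instances of classes $c_1$ and $c_2$, estimate $p(c_1 \mid y)$ and $p(c_2 \mid y)$ and assign y to the more probable of $c_1$, $c_2$.
$$sim_{MAX}(y) = \max_{x \in N} sim(x, y)$$
$$A = \{\, x \in N \mid sim(x, y) = sim_{MAX}(y) \,\}$$
$$p(c_1 \mid y) = \frac{n_1}{n_1 + n_2}, \qquad p(c_2 \mid y) = \frac{n_2}{n_1 + n_2}$$

Page 35/92
kNN: with k = 1 the result is A; with k = 4, B; with k = 10, B.

Page 38/92
tf/idf

Page 40/92
KNN and NB

Page 42/92
CLS, ID3, C4.5, CART, Assistant

Page 45/92
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)

Page 47/92
NP-

Page 50/92
$$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, Entropy(S_v)$$

Page 51/92
Training set: the 14 play-tennis examples of page 27/92 (Outlook, Temperature, Humidity, Windy, Class).

Page 52/92
Values(Wind) = {Weak, Strong}
S = [9+, 5−]
S_Weak ← [6+, 2−]
S_Strong ← [3+, 3−]
$$Gain(S, Wind) = Entropy(S) - \sum_{v \in \{Weak, Strong\}} \frac{|S_v|}{|S|}\, Entropy(S_v)$$
= Entropy(S) − (8/14) Entropy(S_Weak) − (6/14) Entropy(S_Strong)
= 0.940 − (8/14) 0.811 − (6/14) 1.00
= 0.048

Page 53/92
S: [9+, 5−], E = 0.940
Humidity: high [3+, 4−], E = 0.985; normal [6+, 1−], E = 0.592
Gain(S, Humidity) = 0.940 − (7/14) 0.985 − (7/14) 0.592
Wind: weak [6+, 2−], E = 0.811; strong [3+, 3−], E = 1.000
Gain(S, Wind) = 0.940 − (8/14) 0.811 − (6/14) 1.000

Page 54/92
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
Outlook has the largest gain and becomes the root.

Page 55/92
{D1, D2, …, D14} [9+, 5−]
Outlook
  Sunny: {D1, D2, D8, D9, D11} [2+, 3−] ?
  Overcast: {D3, D7, D12, D13} [4+, 0−] Yes
  Rain: {D4, D5, D6, D10, D14} [3+, 2−] ?
S_sunny = {D1, D2, D8, D9, D11}
Gain(S_sunny, Humidity) = 0.970 − (3/5) 0.0 − (2/5) 0.0 = 0.970
Gain(S_sunny, Temperature) = 0.970 − (2/5) 0.0 − (2/5) 1.0 − (1/5) 0.0 = 0.570
Gain(S_sunny, Wind) = 0.970 − (2/5) 1.0 − (3/5) 0.918 = 0.019

Page 56/92
ID3
  Create a Root node
  A ← the attribute in Attributes that best classifies Examples
  The decision attribute of Root ← A
  For each value vi of A:
    Add a branch below Root for the test A = vi
    Let Examples_vi be the subset of Examples with value vi for A
    If Examples_vi is empty:
      add a leaf with label = the most common value of target-attribute in Examples
    Otherwise:
      add the subtree ID3(Examples_vi, target-attribute, Attributes − {A})
  Return Root

Page 57/92
C4.5: the successor to ID3.

Page 58/92
overfitting

Page 59/92
forward pruning, backward pruning

Page 62/92
{ , , , … }: Yahoo
{spam, not-spam}

Page 63/92
Text Clustering

Page 68/92
animal
  vertebrate: fish, reptile, amphib., mammal
  invertebrate: worm, insect, crustacean

Page 69/92
Agglomerative (bottom-up) vs. divisive (partitional, top-down)

Page 70/92
Hierarchical agglomerative clustering (HAC): repeatedly take the two most similar clusters $c_i$ and $c_j$ and replace them with $c_i \cup c_j$.

Page 71/92
Example: d1, d2, d3, d4, d5; clusters {d1, d2}, {d4, d5}, {d3}; then {d3, d4, d5}.

Page 72/92
Single Link; Complete Link; Group Average

Page 73/92
$c_i$, $c_j$. Single Link: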
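
The play-tennis calculation on pages 27/92 to 29/92 can be checked mechanically. The following is a minimal sketch, not the deck's own code: the names `DATA`, `train`, `score` and `classify` are illustrative assumptions, and the probabilities are plain relative frequencies without the add-one smoothing used for text on page 23/92, because the slide's worked example does not smooth either.

```python
from collections import Counter
from fractions import Fraction

# The 14 play-tennis examples: (outlook, temperature, humidity, windy, class).
DATA = [
    ("sunny", "hot", "high", "false", "N"),
    ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"),
    ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"),
    ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"),
    ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"),
    ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"),
    ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"),
    ("rain", "mild", "high", "true", "N"),
]


def train(rows):
    """Count examples per class and per (attribute index, value, class)."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = Counter()
    for *values, label in rows:
        for idx, value in enumerate(values):
            value_counts[(idx, value, label)] += 1
    return class_counts, value_counts


def score(x, label, class_counts, value_counts):
    """Return P(x | label) * P(label) from unsmoothed relative frequencies."""
    total = sum(class_counts.values())
    result = Fraction(class_counts[label], total)
    for idx, value in enumerate(x):
        result *= Fraction(value_counts[(idx, value, label)], class_counts[label])
    return result


def classify(x, class_counts, value_counts):
    """Pick the class with the largest unnormalized posterior."""
    return max(class_counts,
               key=lambda label: score(x, label, class_counts, value_counts))


class_counts, value_counts = train(DATA)
X = ("rain", "hot", "high", "false")
for label in ("P", "N"):
    print(label, float(score(X, label, class_counts, value_counts)))
print(classify(X, class_counts, value_counts))
```

Exact fractions are used so the products match the slide's 3/9·2/9·3/9·6/9·9/14 and 2/5·2/5·4/5·2/5·5/14 without rounding error; the two scores reduce to 2/189 and 16/875, which round to the 0.010582 and 0.018286 given on page 29/92.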
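
The information-gain figures on pages 52/92 to 54/92 can be recomputed from the same 14 examples. This is a sketch under stated assumptions: `entropy`, `information_gain`, `ATTRIBUTES` and `DATA` are names chosen here for illustration, and the table's Windy column (false/true) is taken to correspond to Wind = Weak/Strong in the gain formulas.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ("outlook", "temperature", "humidity", "windy")

# The 14 play-tennis examples: (outlook, temperature, humidity, windy, class).
DATA = [
    ("sunny", "hot", "high", "false", "N"),
    ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"),
    ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"),
    ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"),
    ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"),
    ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"),
    ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"),
    ("rain", "mild", "high", "true", "N"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attr_index):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    gain = entropy([row[-1] for row in rows])
    subsets = defaultdict(list)  # attribute value -> class labels of S_v
    for row in rows:
        subsets[row[attr_index]].append(row[-1])
    for subset in subsets.values():
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain


gains = {name: information_gain(DATA, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
for name, value in gains.items():
    print(f"{name}: {value:.3f}")
print("root attribute:", best)
```

At full precision the gains come out at about 0.247 (outlook), 0.152 (humidity), 0.048 (windy) and 0.029 (temperature). The small differences from the slide's 0.246 and 0.151 arise because the slide subtracts entropies already rounded to three decimals; the ranking, and therefore the choice of Outlook as the root, is unchanged.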
Document title: Text Classification and Clustering
Link: https://www.777doc.com/doc-5323008 .html