Experiment 2: Constructing the Weather Decision Tree

Input Data

No.  Outlook    Temperature  Humidity  Wind  Class
 1   sunny      hot          high      no    N
 2   sunny      hot          high      yes   N
 3   overcast   hot          high      no    P
 4   rain       mild         high      no    P
 5   rain       cool         normal    no    P
 6   rain       cool         normal    yes   N
 7   overcast   cool         normal    yes   P
 8   sunny      mild         high      no    N
 9   sunny      cool         normal    no    P
10   rain       mild         normal    no    P
11   sunny      mild         normal    yes   P
12   overcast   mild         high      yes   P
13   overcast   hot          normal    no    P
14   rain       mild         high      yes   N

The Algorithm
• Select an attribute and split the data into K subsets according to its values.
• Selection criterion: information gain,

  $\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$,
  where $\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$.

Entropy of the Original Data
• The example set C contains 14 examples: 9 positive and 5 negative. Therefore:
  M(C) = -9/14 · log2(9/14) - 5/14 · log2(5/14) = 0.940 bits

Information Gain of the Attribute "Outlook"
• Compute the entropy of each branch:
  – The "sunny" branch contains 2 positive and 3 negative examples; the expected information required is:
    M(outlook = sunny) = -2/5 · log2(2/5) - 3/5 · log2(3/5) = 0.971 bits
  – The "overcast" branch contains 4 positive and 0 negative examples:
    M(outlook = overcast) = 0
  – The "rain" branch contains 3 positive and 2 negative examples:
    M(outlook = rain) = -3/5 · log2(3/5) - 2/5 · log2(2/5) = 0.971 bits
• The expected information of the tree after splitting on "outlook" is:
  B(C, outlook) = 5/14 × 0.971 + 4/14 × 0 + 5/14 × 0.971 = 0.694 bits
• The information gain of choosing "outlook" as the test is:
  Gain(C, outlook) = M(C) - B(C, outlook) = 0.940 - 0.694 = 0.247 bits

Comparing the Information Gain of All Attributes
• Gain(C, outlook)     = M(C) - B(C, outlook)     = 0.940 - 0.694 = 0.247 bits
• Gain(C, temperature) = M(C) - B(C, temperature) = 0.940 - 0.911 = 0.029 bits
• Gain(C, humidity)    = M(C) - B(C, humidity)    = 0.940 - 0.788 = 0.152 bits
• Gain(C, wind)        = M(C) - B(C, wind)        = 0.940 - 0.892 = 0.048 bits
• "Outlook" yields the largest gain, so it becomes the root of the tree, with one branch for each of its values: "sunny", "overcast", and "rain".

Splitting the "sunny" Branch Further
• Gain(C_sunny, temperature) = M(outlook = sunny) - B(outlook = sunny, temperature) = 0.571
• Gain(C_sunny, humidity)    = M(outlook = sunny) - B(outlook = sunny, humidity)    = 0.971
• Gain(C_sunny, wind)        = M(outlook = sunny) - B(outlook = sunny, wind)        = 0.020
• "Humidity" wins; its "high" sub-branch is purely negative and its "normal" sub-branch is purely positive, so both become leaves.

Splitting the "overcast" Branch
• All examples are positive; no further splitting is needed.

Splitting the "rain" Branch
• Gain(C_rain, temperature) = M(outlook = rain) - B(outlook = rain, temperature) = 0.020
• Gain(C_rain, humidity)    = M(outlook = rain) - B(outlook = rain, humidity)    = 0.020
• Gain(C_rain, wind)        = M(outlook = rain) - B(outlook = rain, wind)        = 0.971
• "Wind" wins; its "yes" sub-branch is purely negative and its "no" sub-branch is purely positive, so both become leaves.

The Resulting Decision Tree

  outlook
  ├─ sunny    → humidity
  │            ├─ high   → N
  │            └─ normal → P
  ├─ overcast → P
  └─ rain     → wind
               ├─ yes → N
               └─ no  → P
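To double-check the arithmetic above, here is a minimal Python sketch (not part of the original handout) that recomputes M(C) and the information gain of every attribute directly from the input table. The dataset literal, the English attribute names, and the helper names entropy and gain are illustrative choices, not taken from the exercise.

```python
from math import log2

# Each row: (outlook, temperature, humidity, wind, class), transcribed
# from the input table above.
DATA = [
    ("sunny",    "hot",  "high",   "no",  "N"),
    ("sunny",    "hot",  "high",   "yes", "N"),
    ("overcast", "hot",  "high",   "no",  "P"),
    ("rain",     "mild", "high",   "no",  "P"),
    ("rain",     "cool", "normal", "no",  "P"),
    ("rain",     "cool", "normal", "yes", "N"),
    ("overcast", "cool", "normal", "yes", "P"),
    ("sunny",    "mild", "high",   "no",  "N"),
    ("sunny",    "cool", "normal", "no",  "P"),
    ("rain",     "mild", "normal", "no",  "P"),
    ("sunny",    "mild", "normal", "yes", "P"),
    ("overcast", "mild", "high",   "yes", "P"),
    ("overcast", "hot",  "normal", "no",  "P"),
    ("rain",     "mild", "high",   "yes", "N"),
]
ATTRS = ["outlook", "temperature", "humidity", "wind"]  # column names

def entropy(examples):
    """M(S) = -sum_i p_i * log2(p_i) over the class labels in S."""
    n = len(examples)
    ent = 0.0
    for label in set(e[-1] for e in examples):
        p = sum(1 for e in examples if e[-1] == label) / n
        ent -= p * log2(p)
    return ent

def gain(examples, col):
    """Gain(S, A) = M(S) - sum_v |S_v|/|S| * M(S_v) for attribute column col."""
    remainder = 0.0
    for value in set(e[col] for e in examples):
        subset = [e for e in examples if e[col] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - remainder

print(f"M(C) = {entropy(DATA):.3f} bits")              # 0.940
for col, name in enumerate(ATTRS):
    print(f"Gain(C, {name}) = {gain(DATA, col):.3f}")  # 0.247 / 0.029 / 0.152 / 0.048
```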
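The recursive procedure the walkthrough follows (test the highest-gain attribute, split, and recurse until a branch is pure) can also be sketched in a few lines. The id3 function below is an illustrative ID3-style implementation, reusing DATA, ATTRS, and gain from the sketch above; it is not code from the exercise.

```python
def id3(examples, attrs):
    """ID3-style recursion: return a class label at a pure (or
    attribute-exhausted) node, otherwise split on the attribute
    column with the highest information gain."""
    labels = [e[-1] for e in examples]
    if len(set(labels)) == 1:        # pure node -> leaf
        return labels[0]
    if not attrs:                    # no attributes left -> majority class
        return max(set(labels), key=labels.count)
    best = max(attrs, key=lambda col: gain(examples, col))
    tree = {}
    for value in set(e[best] for e in examples):
        subset = [e for e in examples if e[best] == value]
        tree[(ATTRS[best], value)] = id3(subset, [a for a in attrs if a != best])
    return tree

# Reproduces the tree derived above: outlook at the root;
# sunny -> humidity (high: N, normal: P); overcast -> P;
# rain -> wind (yes: N, no: P).
print(id3(DATA, list(range(len(ATTRS)))))
```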