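Distance Discrimination

I. Purpose and requirements of the experiment

Master the theory and methods of distance discriminant analysis, including model building and error-rate estimation, and learn to use the SAS procedures for discriminant analysis to solve practical problems.
Requirements: write the programs and analyze the results.
Content: Problem 1 is compulsory; choose one or two of Problems 2, 3, and 4.

1. Write out several distance formulas and the distance discriminant rule for two populations.

(I) Several distance formulas

1. Euclidean distance
$$d(x_i, x_j) = \Big[\sum_{k=1}^{p} (x_{ik} - x_{jk})^2\Big]^{1/2}$$

2. Absolute distance
$$d(x_i, x_j) = \sum_{k=1}^{p} |x_{ik} - x_{jk}|$$

3. Minkowski distance
$$d(x_i, x_j) = \Big[\sum_{k=1}^{p} |x_{ik} - x_{jk}|^{m}\Big]^{1/m}, \quad m \ge 1.$$
The Minkowski distance is also called the $L_m$ distance; the $L_2$ distance is the Euclidean distance and the $L_1$ distance is the absolute distance.

4. Chebyshev distance
$$d(x_i, x_j) = \max_{1 \le k \le p} |x_{ik} - x_{jk}|$$
The Chebyshev distance is the limit of the Minkowski distance as $m \to \infty$.

The distances above depend on the units of the variables. To remove this dependence, standardize the data first and compute the distances from the standardized data
$$x_{ik}^{*} = \frac{x_{ik} - \bar{x}_k}{s_k}, \quad i = 1, 2, \dots, n; \; k = 1, 2, \dots, p,$$
where
$$\bar{x}_k = \frac{1}{n}\sum_{i=1}^{n} x_{ik}, \qquad s_k^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2.$$

5. Variance-weighted distance
$$d(x_i, x_j) = \Big[\sum_{k=1}^{p} \frac{(x_{ik} - x_{jk})^2}{s_k^2}\Big]^{1/2}$$
It is easy to show that the Euclidean distance between the standardized data $x_{ik}^{*}$ is exactly the variance-weighted distance.

6. Mahalanobis distance
$$d(x_i, x_j) = \big[(x_i - x_j)^T S^{-1} (x_i - x_j)\big]^{1/2},$$
where $S$ is the sample covariance matrix computed from the samples $x_1, x_2, \dots, x_n$:
$$S = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Let $d_{ij} = d(x_i, x_j)$ and $D = (d_{ij})_{n \times n}$. This gives the matrix of pairwise distances between the $n$ samples $x_1, x_2, \dots, x_n$:
$$D = \begin{pmatrix} 0 & d_{12} & \cdots & d_{1n} \\ d_{21} & 0 & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1} & d_{n2} & \cdots & 0 \end{pmatrix}, \qquad d_{ij} = d_{ji}.$$

(II) Distance discriminant rule for two populations

1. The distance discriminant rule
Let $G_1, G_2$ be two known $p$-dimensional populations with mean vectors $\mu_1, \mu_2$ and covariance matrices $\Sigma_1, \Sigma_2$, and let $x = (x_1, x_2, \dots, x_p)^T$ be the sample to be classified. The distance discriminant rule is
$$\begin{cases} x \in G_1, & \text{if } d(x, G_1) \le d(x, G_2), \\ x \in G_2, & \text{if } d(x, G_1) > d(x, G_2). \end{cases} \tag{5.1}$$

Remark: the idea behind the Mahalanobis distance is that of maximum likelihood. Consider $p$-dimensional populations $G_1 \sim N_p(\mu_1, \Sigma)$ and $G_2 \sim N_p(\mu_2, \Sigma)$ with the same covariance matrix $\Sigma$ and densities
$$f_1(x) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\Big\{-\tfrac{1}{2}(x - \mu_1)^T \Sigma^{-1} (x - \mu_1)\Big\},$$
$$f_2(x) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\Big\{-\tfrac{1}{2}(x - \mu_2)^T \Sigma^{-1} (x - \mu_2)\Big\}.$$
Then
$$d(x, G_1) \le d(x, G_2) \iff (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) \le (x - \mu_2)^T \Sigma^{-1} (x - \mu_2) \iff f_1(x) \ge f_2(x),$$
so the distance discriminant rule becomes
$$\begin{cases} x \in G_1, & \text{if } f_1(x)/f_2(x) \ge 1, \\ x \in G_2, & \text{if } f_1(x)/f_2(x) < 1, \end{cases}$$
which agrees with the likelihood-ratio rule.

2. The case $\Sigma_1 = \Sigma_2 = \Sigma$

(1) Linear discriminant functions
The difference between the squared Mahalanobis distances from the sample $x$ to the populations $G_1, G_2$ is
$$\begin{aligned} d^2(x, G_2) - d^2(x, G_1) &= (x - \mu_2)^T \Sigma^{-1} (x - \mu_2) - (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) \\ &= (x^T \Sigma^{-1} x - 2\mu_2^T \Sigma^{-1} x + \mu_2^T \Sigma^{-1} \mu_2) - (x^T \Sigma^{-1} x - 2\mu_1^T \Sigma^{-1} x + \mu_1^T \Sigma^{-1} \mu_1) \\ &= 2\big[\mu_1^T \Sigma^{-1} x - \tfrac{1}{2}\mu_1^T \Sigma^{-1} \mu_1\big] - 2\big[\mu_2^T \Sigma^{-1} x - \tfrac{1}{2}\mu_2^T \Sigma^{-1} \mu_2\big] \\ &= 2\,[W_1(x) - W_2(x)], \end{aligned}$$
where
$$\begin{aligned} W_1(x) &= a_1^T x - b_1, & a_1 &= \Sigma^{-1}\mu_1, & b_1 &= \tfrac{1}{2}\mu_1^T \Sigma^{-1} \mu_1, \\ W_2(x) &= a_2^T x - b_2, & a_2 &= \Sigma^{-1}\mu_2, & b_2 &= \tfrac{1}{2}\mu_2^T \Sigma^{-1} \mu_2. \end{aligned} \tag{5.2}$$
$W_1(x)$ and $W_2(x)$ are called linear discriminant functions of $x$. The distance discriminant rule becomes the linear discriminant rule
$$\begin{cases} x \in G_1, & \text{if } W_1(x) \ge W_2(x), \\ x \in G_2, & \text{if } W_1(x) < W_2(x). \end{cases} \tag{5.3}$$

On the other hand,
$$\begin{aligned} d^2(x, G_2) - d^2(x, G_1) &= 2(\mu_1 - \mu_2)^T \Sigma^{-1} x + \mu_2^T \Sigma^{-1} \mu_2 - \mu_1^T \Sigma^{-1} \mu_1 \\ &= 2(\mu_1 - \mu_2)^T \Sigma^{-1} x - (\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 + \mu_2) \\ &= 2(\mu_1 - \mu_2)^T \Sigma^{-1} (x - \bar{\mu}) \\ &= 2a^T (x - \bar{\mu}) = 2W(x), \end{aligned}$$
where $\bar{\mu} = \tfrac{1}{2}(\mu_1 + \mu_2)$ and $a = \Sigma^{-1}(\mu_1 - \mu_2)$, and $W(x) = a^T (x - \bar{\mu})$ is a linear discriminant function of $x$. The rule becomes the linear discriminant rule
$$\begin{cases} x \in G_1, & \text{if } W(x) \ge 0, \\ x \in G_2, & \text{if } W(x) < 0. \end{cases} \tag{5.4}$$
$W_1(x)$, $W_2(x)$, and $W(x)$ are linear discriminant functions of $x$ and are simple to compute.

(2) Sample discriminant functions
In practice $\mu_1, \mu_2$ and the covariance matrix $\Sigma$ are unknown. Let $x_1^{(1)}, x_2^{(1)}, \dots, x_{n_1}^{(1)}$ and $x_1^{(2)}, x_2^{(2)}, \dots, x_{n_2}^{(2)}$ be training samples from the populations $G_1$ and $G_2$. The estimates of $\mu_1, \mu_2, \Sigma$ are
$$\hat{\mu}_1 = \bar{x}^{(1)} = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i^{(1)}, \qquad \hat{\mu}_2 = \bar{x}^{(2)} = \frac{1}{n_2}\sum_{i=1}^{n_2} x_i^{(2)},$$
$$\hat{\Sigma} = S = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2},$$
where $S$ is the pooled estimate of $\Sigma$ and
$$S_1 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1} (x_i^{(1)} - \bar{x}^{(1)})(x_i^{(1)} - \bar{x}^{(1)})^T, \qquad S_2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2} (x_i^{(2)} - \bar{x}^{(2)})(x_i^{(2)} - \bar{x}^{(2)})^T$$
are the sample covariance matrices of $G_1$ and $G_2$. This gives the estimates of the linear discriminant functions $W_1(x)$, $W_2(x)$, $W(x)$:
$$\begin{aligned} \hat{W}_1(x) &= \hat{a}_1^T x - \hat{b}_1, & \hat{a}_1 &= S^{-1}\bar{x}^{(1)}, & \hat{b}_1 &= \tfrac{1}{2}(\bar{x}^{(1)})^T S^{-1} \bar{x}^{(1)}, \\ \hat{W}_2(x) &= \hat{a}_2^T x - \hat{b}_2, & \hat{a}_2 &= S^{-1}\bar{x}^{(2)}, & \hat{b}_2 &= \tfrac{1}{2}(\bar{x}^{(2)})^T S^{-1} \bar{x}^{(2)}, \\ \hat{W}(x) &= \hat{a}^T (x - \bar{x}), & \hat{a} &= S^{-1}(\bar{x}^{(1)} - \bar{x}^{(2)}), & \bar{x} &= \tfrac{1}{2}(\bar{x}^{(1)} + \bar{x}^{(2)}). \end{aligned} \tag{5.5}$$
The distance discriminant rule for the two populations is
$$\begin{cases} x \in G_1, & \text{if } \hat{W}_1(x) \ge \hat{W}_2(x), \\ x \in G_2, & \text{if } \hat{W}_1(x) < \hat{W}_2(x), \end{cases} \tag{5.6}$$
or
$$\begin{cases} x \in G_1, & \text{if } \hat{W}(x) \ge 0, \\ x \in G_2, & \text{if } \hat{W}(x) < 0. \end{cases} \tag{5.7}$$

3. The case $\Sigma_1 \ne \Sigma_2$
$$d_1^2(x) = (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1), \qquad d_2^2(x) = (x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2)$$
are quadratic functions of $x$ and are called quadratic discriminant functions. The distance discriminant rule is
$$\begin{cases} x \in G_1, & \text{if } d_1^2(x) \le d_2^2(x), \\ x \in G_2, & \text{if } d_1^2(x) > d_2^2(x). \end{cases} \tag{5.8}$$
Estimating $\mu_1, \mu_2$ and $\Sigma_1, \Sigma_2$ by $\hat{\mu}_1 = \bar{x}^{(1)}$, $\hat{\mu}_2 = \bar{x}^{(2)}$, $\hat{\Sigma}_1 = S_1$, $\hat{\Sigma}_2 = S_2$ gives the sample discriminant functions
$$\hat{d}_1^2(x) = (x - \bar{x}^{(1)})^T S_1^{-1} (x - \bar{x}^{(1)}), \qquad \hat{d}_2^2(x) = (x - \bar{x}^{(2)})^T S_2^{-1} (x - \bar{x}^{(2)}),$$
and the sample discriminant rule
$$\begin{cases} x \in G_1, & \text{if } \hat{d}_1^2(x) \le \hat{d}_2^2(x), \\ x \in G_2, & \text{if } \hat{d}_1^2(x) > \hat{d}_2^2(x). \end{cases}$$

2. Exercise 5.3 in the textbook.

3. To study the operating performance of state-owned and state-holding industrial enterprises in each region of China in 2005, the 2005 economic indicators in the table below are used. The variables are:
X1: industrial value-added rate (%)
X2: contribution rate of total assets (%)
X3: asset-liability ratio (%)
X4: current-asset turnover (times)
X5: ratio of profits to industrial costs and expenses (%)
X6: overall labor productivity (10,000 yuan per person per year)
X7: product sales rate (%)
(1) Use a cluster analysis method to divide the 29 provinces and municipalities into 3 types (Guangdong and Tibet excluded).
(2) Use distance discrimination to build discriminant functions and decide which development type Guangdong and Tibet each belong to.

Table 3. 2005 economic indicators
No.  Region  X1 to X7 (the seven values of each row are undelimited in the source)
1  Beijing  26.914.531.141.886.3917.9698.99
2  Shanghai  2811.743.61.998.5727.5799.2
3  Tianjin  32.913.960.192.210.7721.27101.98
4  Hebei  30.3810.464.012.315.9611.2898.67
5  Shanxi  37.489.467.821.716.827.9397.85
6  Inner Mongolia  43.449.864.322.087.9416.3498.23
7  Liaoning  28.767.559.332.152.7814.1999.86
8  Jilin  29.488.560.572.113.4512.2999.45
9  Jiangsu  24.3411.359.672.294.8915.9799.41
10  Zhejiang  24.8513.457.412.925.2824.6299.72
11  Anhui  34.5411.262.832.186.1511.7798.89
12  Fujian  28.8711.956.162.385.7415.3899.49
13  Jiangxi  27.219.769.382.0148.8699.49
14  Shandong  36.5915.860.182.5510.8318.1799.06
15  Henan  31.910.265.622.065.348.8398.61
16  Hubei  33.279.257.341.699.0513.6899.63
17  Hunan  37.1312.767.232.074.2412.7199.52
18  Guangxi  31.6410.862.912.095.8810.4299.69
19  Hainan  35.4411.754.231.9710.9514.26101.3
20  Chongqing  25.958.258.921.583.718.3499.38
21  Sichuan  36.299.164.341.567.3111.26101.24
22  Guizhou  36.459.766.391.525.779.5299.06
23  Shaanxi  41.0115.961.881.718.9512.2898.76
24  Gansu  25.769.559.322.33.559.0298.96
25  Qinghai  38.7712.268.561.3822.441797.9
26  Ningxia  33.625.660.941.463.37999.38
27  Heilongjiang  50.135.454.52.4239.4919.8197.71
28  Yunnan  44.7620.147.441.513.4122.54100.13
29  Xinjiang  45.2123.950.583.1527.124.8399.93
Samples to be classified:
1  Guangdong  26.511353.212.396.724.3498.71
2  Tibet  55.734.725.480.9711.86.3193.68

(1) Code: use the complete-linkage (farthest-neighbor) method of hierarchical clustering to divide the 29 provinces and municipalities into three classes.
data examp3;
input province $ x1-x7;
cards;
北京 26.914.531.141.886.3917.9698.99
上海 2811.743.61.998.5727.5799.2
天津 32.913.960.192.210.7721.27101.98
河北 30.3810.464.012.315.9611.2898.67
山西 37.489.467.821.716.827.9397.85
内蒙古 43.449.864.322.087.9416.3498.23
辽宁 28.767.559.332.152.7814.1999.86
吉林 29.488.560.572.113.4512.2999.45
江苏 24.3411.359.672.294.8915.9799.41
浙江 24.8513.457.412.925.2824.6299.72
安徽 34.5411.262.832.186.1511.7798.89
福建 28.8711.956.162.385.7415.3899.49
江西 27.219.769.382.0148.8699.49
山东 36.5915.860.182.5510.8318.1799.06
河南 31.910.265.622.065.348.8398.61
湖北 33.279.257.341.699.0513.6899.63
湖南 37.1312.767.232.074.2412.7199.52
广西 31.6410.862.912.095.8810.4299.69
海南 35.4411.754.231.9710.9514.26101.3
重庆 25.958.258.921.583.718.3499.38
四川 36.299.164.341.567.3111.26101.24
贵州 36.459.766.391.525.779.5299.06
陕西 41.0115.961.881.718.9512.2898.76
甘肃 25.769.559.322.33.559.0298.96
青海 38.7712.268.561.3822.441797.9
宁夏 33.625.660.941.463.37999.38
黑龙江 50.135.454.52.4239.4919.8197.71
云南 44.7620.147.441.513.4122.54100.13
新疆 45.2123.950.58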
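The two-population rule with a common covariance matrix, (5.5) to (5.7), can be illustrated with a small sketch. It is written in Python rather than SAS so that it can be run anywhere, it uses made-up two-dimensional training samples (not the Table 3 data), and every function name in it is hypothetical. It computes the pooled covariance S, the coefficient vector a = S^{-1}(mean1 - mean2), and classifies by the sign of W(x) = a^T(x - midpoint), assigning the boundary case W(x) = 0 to G1 by assumption.

```python
# Minimal sketch of the linear distance discriminant rule for p = 2.
# G1 and G2 are invented toy samples, used only for illustration.
G1 = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
G2 = [(6.0, 6.0), (7.0, 5.0), (7.0, 7.0), (8.0, 6.0)]


def mean_vector(rows):
    """Componentwise sample mean of a list of equal-length tuples."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]


def scatter_matrix(rows, center):
    """Sum over rows of (x - center)(x - center)^T, i.e. (n - 1) * S_i."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[k] - center[k] for k in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def pooled_covariance(g1, g2):
    """S = [(n1 - 1) S1 + (n2 - 1) S2] / (n1 + n2 - 2)."""
    a1 = scatter_matrix(g1, mean_vector(g1))
    a2 = scatter_matrix(g2, mean_vector(g2))
    dof = len(g1) + len(g2) - 2
    p = len(a1)
    return [[(a1[i][j] + a2[i][j]) / dof for j in range(p)] for i in range(p)]


def inverse_2x2(s):
    """Inverse of a 2x2 matrix (this sketch only handles p = 2)."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]


def mahalanobis_sq(x, center, s_inv):
    """Squared Mahalanobis distance (x - center)^T S^{-1} (x - center)."""
    d = [x[k] - center[k] for k in range(2)]
    return sum(d[i] * s_inv[i][j] * d[j] for i in range(2) for j in range(2))


def fit_linear_rule(g1, g2):
    """Return (a, midpoint) with a = S^{-1}(mean1 - mean2)."""
    m1, m2 = mean_vector(g1), mean_vector(g2)
    s_inv = inverse_2x2(pooled_covariance(g1, g2))
    diff = [m1[k] - m2[k] for k in range(2)]
    a = [sum(s_inv[i][j] * diff[j] for j in range(2)) for i in range(2)]
    mid = [(m1[k] + m2[k]) / 2 for k in range(2)]
    return a, mid


def w_value(x, a, mid):
    """Linear discriminant function W(x) = a^T (x - midpoint)."""
    return sum(a[k] * (x[k] - mid[k]) for k in range(2))


def classify(x, a, mid):
    """Assign x to group 1 if W(x) >= 0, otherwise to group 2."""
    return 1 if w_value(x, a, mid) >= 0 else 2


S_INV = inverse_2x2(pooled_covariance(G1, G2))
A, MID = fit_linear_rule(G1, G2)
print(classify((3.0, 3.0), A, MID), classify((6.0, 5.0), A, MID))
```

For any point x, the identity d^2(x, G2) - d^2(x, G1) = 2 W(x) can be checked numerically with `mahalanobis_sq` and `w_value`, which is the link between the distance form and the linear form of the rule.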