Dimensionality Reduction Techniques (3): PCA Examples
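§6.1.6 Application Examples

Example 6.1 Analysis of the economic performance of enterprises

A company has 20 factories and wants to analyze the economic performance of each one. Based on the ratio of the production results obtained to the manpower, materials, and funds consumed, five indicators (variables) are selected for the analysis:

$x_1$: output-value rate of fixed assets;
$x_2$: labor productivity in terms of net output value;
$x_3$: working capital tied up per 100 yuan of output value;
$x_4$: profit rate per 100 yuan of output value;
$x_5$: profit rate per 100 yuan of funds.

Data on these five indicators were collected for all 20 factories at the same time; see Table 6.1.

Table 6.1 Raw data

| Sample | $x_{k1}$ | $x_{k2}$ | $x_{k3}$ | $x_{k4}$ | $x_{k5}$ | $y_1$ |
|---|---|---|---|---|---|---|
| $X_1$ | 243.87 | 16521 | 6.46 | 34.57 | 149.85 | 0.481 |
| $X_2$ | 240.31 | 8210 | 8.89 | 16.92 | 55.89 | -1.868 |
| $X_3$ | 211.15 | 15349 | 10.09 | 29.77 | 80.13 | -1.021 |
| $X_4$ | 413.18 | 16760 | 7.67 | 24.14 | 105.35 | 0.433 |
| $X_5$ | 349.60 | 7721 | 6.47 | 16.27 | 99.41 | -0.786 |
| $X_6$ | 205.47 | 8123 | 12.33 | 18.48 | 46.18 | -2.636 |
| $X_7$ | 298.11 | 13308 | 5.05 | 27.35 | 138.76 | 0.356 |
| $X_8$ | 414.94 | 13781 | 4.10 | 16.65 | 98.20 | 0.508 |
| $X_9$ | 287.25 | 14043 | 4.29 | 17.67 | 58.35 | -0.318 |
| $X_{10}$ | 308.93 | 11126 | 7.63 | 18.39 | 74.23 | -0.951 |
| $X_{11}$ | 608.40 | 22392 | 2.94 | 24.56 | 223.37 | 3.785 |
| $X_{12}$ | 433.92 | 12508 | 0.69 | 20.06 | 118.70 | 0.254 |
| $X_{13}$ | 572.63 | 12102 | 2.76 | 12.08 | 110.43 | 1.164 |
| $X_{14}$ | 533.78 | 11990 | 3.80 | 11.59 | 75.55 | 0.521 |
| $X_{15}$ | 545.70 | 9678 | 3.55 | 9.46 | 61.19 | 0.179 |
| $X_{16}$ | 284.61 | 6513 | 6.41 | 12.83 | 48.15 | -1.662 |
| $X_{17}$ | 572.07 | 18664 | 2.31 | 17.76 | 162.11 | 2.506 |
| $X_{18}$ | 409.86 | 7329 | 5.89 | 12.23 | 76.68 | -0.737 |
| $X_{19}$ | 564.02 | 14311 | 4.93 | 28.50 | 233.58 | 2.422 |
| $X_{20}$ | 221.20 | 6443 | 14.08 | 30.25 | 80.48 | -2.585 |
| $\bar{x}_i$ | 386.09 | 12403.66 | 6.32 | 19.98 | 105.33 | |
| $S_i$ | 135.57 | 4195.03 | 3.06 | 7.19 | 54.65 | |

The data in Table 6.1 can be regarded as the observed values of a sample $X_1, \dots, X_{20}$ of size $n = 20$ from the population $X = (x_1, \dots, x_5)^T$. We now find the principal components of $X$.

1. Compute the sample correlation matrix $\hat{R}$

First standardize the data in Table 6.1 using Eq. (6.22), which gives the standardized matrix

$$\tilde{X} = (\tilde{x}_{ki})_{20 \times 5}, \qquad \tilde{x}_{ki} = \frac{x_{ki} - \bar{x}_i}{S_i}.$$

Then compute the sample correlation matrix using Eq. (6.24):

$$\hat{R} = \frac{1}{19}\,\tilde{X}^T \tilde{X}.$$

The calculation gives the symmetric matrix

$$\hat{R} = \begin{pmatrix}
1 & 0.4523 & -0.7762 & -0.3495 & 0.5612 \\
0.4523 & 1 & -0.4818 & 0.4248 & 0.7346 \\
-0.7762 & -0.4818 & 1 & 0.3958 & -0.4240 \\
-0.3495 & 0.4248 & 0.3958 & 1 & 0.4951 \\
0.5612 & 0.7346 & -0.4240 & 0.4951 & 1
\end{pmatrix}.$$

2. Find the eigenvalues and eigenvectors of $\hat{R}$ and the principal components

Solving the characteristic equation of $\hat{R}$,

$$|\hat{R} - \lambda I_5| = 0,$$

gives five nonnegative eigenvalues, of which $\lambda_1 = 2.721$ and $\lambda_2 = 1.738$. Since

$$\frac{\lambda_1 + \lambda_2}{\sum_{i=1}^{5} \lambda_i} = \frac{2.721 + 1.738}{5} = 0.8918 \ge 0.85,$$

only the first two principal components are needed. The unit eigenvectors $a_1, a_2$ corresponding to $\lambda_1, \lambda_2$ are listed in Table 6.2.

Table 6.2 Unit eigenvectors, eigenvalues, and variance contributions for the first two eigenvalues

| Eigenvector | $a_{1i}$ | $a_{2i}$ | $a_{3i}$ | $a_{4i}$ | $a_{5i}$ | $\lambda_i$ | Variance contribution |
|---|---|---|---|---|---|---|---|
| $a_1$ | 0.503 | 0.500 | -0.479 | 0.060 | 0.513 | 2.721 | 0.5442 |
| $a_2$ | -0.337 | 0.292 | 0.382 | 0.746 | 0.315 | 1.738 | 0.3476 |

The first and second principal components are therefore

$$y_1 = 0.503\,\tilde{x}_1 + 0.500\,\tilde{x}_2 - 0.479\,\tilde{x}_3 + 0.060\,\tilde{x}_4 + 0.513\,\tilde{x}_5,$$

$$y_2 = -0.337\,\tilde{x}_1 + 0.292\,\tilde{x}_2 + 0.382\,\tilde{x}_3 + 0.746\,\tilde{x}_4 + 0.315\,\tilde{x}_5,$$

where $\tilde{X} = (\tilde{x}_1, \dots, \tilde{x}_5)^T$ is the standardized vector obtained from $X = (x_1, \dots, x_5)^T$ by the standardizing transformation.

3. Meaning and role of the principal components

Consider first the meaning of the first principal component $y_1$. In $y_1$ the coefficient of $\tilde{x}_4$, 0.060, is relatively very small, which shows that $x_4$ has a very small loading in $y_1$ and plays almost no role. The coefficients of $\tilde{x}_1, \tilde{x}_2, \tilde{x}_5$ are all about 0.5, so these variables raise $y_1$ markedly, while $\tilde{x}_3$, with coefficient $-0.479$, lowers it markedly: the larger $x_3$ (working capital tied up per 100 yuan of output value) is, the smaller $y_1$ becomes. $x_3$ is a variable that reflects operating capability. Hence $y_1$, while combining the information carried by the other variables, chiefly reflects the level of operating capability.

We now use the first principal component $y_1$ to evaluate the operating capability of each factory:

$$y_{k,1} = 0.503\,\tilde{x}_{k,1} + 0.500\,\tilde{x}_{k,2} - 0.479\,\tilde{x}_{k,3} + 0.060\,\tilde{x}_{k,4} + 0.513\,\tilde{x}_{k,5}, \qquad k = 1, 2, \dots, 20.$$

For factory 11, for example, $\tilde{X}_{11} = (\tilde{x}_{11,1}, \dots, \tilde{x}_{11,5})^T$ with

$$\tilde{x}_{11,i} = \frac{x_{11,i} - \bar{x}_i}{S_i}, \qquad i = 1, \dots, 5.$$

From the values in Table 6.1,

$$\tilde{x}_{11,1} = \frac{608.40 - 386.09}{135.57} = 1.640, \quad \tilde{x}_{11,2} = 2.381, \quad \tilde{x}_{11,3} = -1.105, \quad \tilde{x}_{11,4} = 0.637, \quad \tilde{x}_{11,5} = 2.343,$$

so that $y_{11,1} = 3.785$.

Similarly, for factory 6,

$$\tilde{X}_6 = (-1.332,\ -1.02,\ 1.964,\ -0.209,\ -1.082)^T, \qquad \tilde{x}_{6,i} = \frac{x_{6,i} - \bar{x}_i}{S_i}, \quad i = 1, \dots, 5,$$

so that $y_{6,1} = -2.636$.

The first principal component of every factory is listed in the right-hand column of Table 6.1. The values of $y_1$ show that factory 11 has the greatest operating capability and factory 6 the least.

Now consider the meaning of the second principal component $y_2$. In $y_2$ the coefficient of $\tilde{x}_4$ (profit rate per 100 yuan of output value) is the largest (0.746), while the absolute values of the other coefficients are all about 0.3. The larger $x_4$ is, the larger $y_2$ clearly becomes. Hence $y_2$, while combining the information carried by the other variables, chiefly reflects the level of profitability of the enterprise. For factory 11,

$$y_{11,2} = -0.337 \times 1.64 + 0.292 \times 2.381 + 0.382 \times (-1.105) + 0.746 \times 0.637 + 0.315 \times 2.343 = 0.934,$$

and for factory 6, $y_{6,2} = 0.405$. Factory 11 is thus more profitable than factory 6.

Example 6.2 Classification for garment sizing

To solve the classification problem in garment sizing, a garment factory took body measurements of a sample of adult men. Sixteen indicators were measured:

$x_1$: height; $x_2$: sitting height; $x_3$: chest circumference; $x_4$: head height;
$x_5$: trouser length; $x_6$: crotch length; $x_7$: hand length; $x_8$: collar circumference;
$x_9$: front chest; $x_{10}$: back; $x_{11}$: shoulder thickness; $x_{12}$: shoulder width;
$x_{13}$: sleeve length; $x_{14}$: rib circumference; $x_{15}$: waist circumference; $x_{16}$: calf.

The measurement data are omitted here. For the garments produced to fit the great majority of adult men, garment-sizing research has to derive from these 16 indicators the composite indicators that matter most, and batch production then follows those composite indicators. This is a principal component analysis problem.

During data processing, in order to check the repeatability of the observed results, the samples were split into two groups, I and II, of 128 people each.

1. For each group, compute the sample means $\bar{x}_i$, the sample standard deviations $S_i$, and the sample correlation matrix $\hat{R}$. The method is the same as in Example 6.1, and the results are collected in Table 6.3.

2. For each group, find the eigenvalues of $\hat{R}$, the cumulative variance contributions, and the corresponding eigenvectors, and write out the principal components. Solving the characteristic equations of the two groups,

$$|\hat{R}_{(\mathrm{I})} - \lambda I_{16}| = 0 \quad \text{and} \quad |\hat{R}_{(\mathrm{II})} - \lambda I_{16}| = 0,$$

gives in each case 16 nonnegative eigenvalues

$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{16} \ge 0.$$

The cumulative variance contributions of the two groups are then computed, and all results are listed in Table 6.4.

Table 6.3 Sample means, sample standard deviations, and sample correlation matrices

The statistics $\bar{x}_i$ and $S_i$ of Group I head the columns and those of Group II head the rows. The entries on the two sides of the diagonal are the correlation coefficients of the two groups.

| Indicator | $\bar{x}_i$ (II) | $S_i$ (II) | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ | $x_7$ | $x_8$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $\bar{x}_i$ (I) | | | 162.5 | 88.7 | 85.5 | 36.4 | 94.2 | 74.7 | 19.2 | 35.5 |
| $S_i$ (I) | | | 5.2 | 2.8 | 4.1 | 4.8 | 3.7 | 3.9 | 0.8 | 2.3 |
| Height $x_1$ | 164.5 | 6.8 | | 0.71 | 0.37 | 0.94 | 0.84 | 0.78 | 0.56 | 0.11 |
| Sitting height $x_2$ | 90.0 | 3.7 | 0.79 | | 0.30 | 0.66 | 0.46 | 0.47 | 0.35 | 0.17 |
| Chest circumference $x_3$ | 85.7 | 3.2 | 0.36 | 0.31 | | 0.33 | 0.22 | 0.10 | 0.30 | 0.50 |
| Head height $x_4$ | 38.1 | 6.5 | 0.96 | 0.74 | 0.38 | | 0.84 | 0.80 | 0.52 | 0.04 |
| Trouser length $x_5$ | 96.0 | 4.9 | 0.89 | 0.58 | 0.39 | 0.90 | | 0.84 | 0.51 | 0.02 |
| Crotch length $x_6$ | 75.5 | 4.4 | 0.79 | 0.58 | 0.30 | 0.78 | 0.79 | | 0.53 | -0.03 |
| Hand length $x_7$ | 19.4 | 1.1 | 0.76 | 0.55 | 0.35 | 0.75 | 0.74 | 0.73 | | 0.01 |
| Collar circumference $x_8$ | 35.8 | 1.6 | 0.26 | 0.19 | 0.58 | 0.25 | 0.25 | 0.18 | 0.24 | |
| Front chest $x_9$ | 36.0 | 2.6 | 0.21 | 0.07 | 0.28 | 0.20 | 0.18 | 0.18 | 0.29 | -0.04 |
| Back $x_{10}$ | 34.8 | 2.6 | 0.26 | 0.16 | 0.33 | 0.22 | 0.23 | 0.23 | 0.25 | 0.49 |
| Shoulder thickness $x_{11}$ | 12.2 | 1.1 | 0.07 | 0.21 | 0.38 | 0.08 | -0.02 | -0.00 | 0.10 | 0.44 |
| Shoulder width $x_{12}$ | 20.7 | 1.4 | 0.52 | 0.41 | 0.35 | 0.53 | 0.48 | 0.38 | 0.44 | 0.30 |
| Sleeve length $x_{13}$ | 75.1 | 3.4 | 0.77 | 0.47 | 0.41 | 0.79 | 0.79 | 0.69 | 0.79 | 0.32 |
| Rib circumference $x_{14}$ | 73.3 | 4.2 | 0.25 | 0.17 | 0.64 | 0.27 | 0.27 | 0.14 | 0.16 | 0.51 |
| Waist circumference $x_{15}$ | 86.3 | 3.7 | 0.51 | 0.35 | 0.58 | 0.57 | 0.51 | 0.26 | 0.38 | 0.51 |
| Calf $x_{16}$ | 50.1 | 2.9 | 0.27 | 0.16 | 0.51 | 0.26 | 0.23 | 0.00 | 0.12 | 0.38 |

Table 6.3 (continued)

| Indicator | $\bar{x}_i$ (II) | $x_9$ | $x_{10}$ | $x_{11}$ | $x_{12}$ | $x_{13}$ | $x_{14}$ | $x_{15}$ | $x_{16}$ |
|---|---|---|---|---|---|---|---|---|---|
| $\bar{x}_i$ (I) | | 36.3 | 34.3 | 12.1 | 20.2 | 73.7 | 74.4 | 85.0 | 49.3 |
| $S_i$ (I) | | 2.1 | 2.4 | 1.0 | 1.3 | 2.9 | 6.1 | 3.9 | 3.1 |
| Height $x_1$ | 164.5 | 0.22 | 0.32 | 0.12 | 0.34 | 0.71 | 0.39 | 0.42 | 0.15 |
| Sitting height $x_2$ | 90.0 | 0.21 | 0.26 | 0.17 | 0.28 | 0.50 | 0.28 | 0.44 | 0.15 |
| Chest circumference $x_3$ | 85.7 | 0.43 | 0.45 | 0.50 | 0.13 | 0.28 | 0.70 | 0.62 | 0.55 |
| Head height $x_4$ | 38.1 | 0.19 | 0.27 | 0.15 | 0.39 | 0.71 | 0.40 | 0.41 | 0.09 |
| Trouser length $x_5$ | 96.0 | 0.11 | 0.28 | -0.05 | 0.01 | 0.68 | 0.22 | 0.25 | -0.00 |
| Crotch length $x_6$ | 75.5 | 0.04 | 0.18 | 0.02 | 0.33 | 0.65 | 0.14 | 0.12 | -0.08 |
| Hand length $x_7$ | 19.4 | 0.19 | 0.17 | -0.09 | 0.27 | 0.48 | 0.24 | 0.26 | 0.05 |
| Collar circumference $x_8$ | 35.8 | -0.00 | 0.21 | 0.22 | -0.04 | 0.16 | 0.24 | 0.31 | 0.40 |
| Front chest $x_9$ | 36.0 | | -0.05 | 0.06 | -0.00 | 0.22 | 0.31 | 0.40 | 0.29 |
| Back $x_{10}$ | 34.8 | -0.34 | | 0.19 | 0.43 | 0.37 | 0.38 | 0.34 | 0.30 |
| Shoulder thickness $x_{11}$ | 12.2 | -0.16 | 0.23 | | 0.08 | -0.00 | 0.51 | 0.40 | 0.42 |
| Shoulder width $x_{12}$ | 20.7 | -0.05 | 0.50 | 0.24 | | 0.51 | 0.16 | 0.14 | -0.05 |
| Sleeve length $x_{13}$ | 75.1 | 0.23 | 0.34 | 0.10 | 0.62 | | 0.31 | 0.38 | 0.10 |
| Rib circumference $x_{14}$ | 73.3 | 0.21 | 0.15 | 0.31 | 0.17 | 0.26 | | 0.72 | 0.55 |
| Waist circumference $x_{15}$ | 86.3 | 0.15 | 0.29 | 0.28 | 0.41 | 0.50 | 0.63 | | 0.63 |
| Calf $x_{16}$ | 50.1 | 0.18 | 0.16 | 0.31 | 0.18 | 0.24 | 0.50 | 0.65 | |

Strictly, the number $q$ of principal components should be chosen by the rule

$$\frac{\sum_{i=1}^{q} \lambda_i}{\sum_{i=1}^{16} \lambda_i} \ge 0.85.$$

Table 6.4 shows, however, that in both groups the cumulative variance contribution of the first three eigenvalues is around 70% (70% for Group I and 65% for Group II), and that from the fourth eigenvalue onward the values are markedly smaller than the first three (all below 1). We therefore take the first three principal components.

Table 6.4 Eigenvalues of $\hat{R}$ and cumulative variance contributions

| No. | Group I $\lambda_i$ | Group I cumulative variance contribution (%) | Group II $\lambda_i$ | Group II cumulative variance contribution (%) |
|---|---|---|---|---|
| 1 | 7.03 | 44 | 6.16 | 39 |
| 2 | 2.61 | 60 | 3.03 | 57 |
| 3 | 1.66 | 70 | 1.28 | 65 |
| 4 | 0.84 | 76 | 0.99 | 72 |
| 5 | 0.77 | 81 | 0.89 | 77 |
| 6 | 0.64 | 85 | 0.69 | 81 |
| 7 | 0.58 | 88 | 0.57 | 85 |
| 8 | 0.46 | 90 | 0.55 | 88 |
| 9 | 0.36 | 93 | 0.47 | 91 |
| 10 | 0.31 | 95 | 0.40 | 94 |
| 11 | 0.24 | 96 | 0.26 | 96 |
| 12 | 0.22 | 97 | 0.22 | 97 |
| 13 | 0.17 | 99 | 0.21 | 98 |
| 14 | 0.14 | 99 | 0.15 | 99 |
| 15 | 0.07 | 100 | 0.09 | 100 |
| 16 | 0.04 | 100 | 0.05 | 100 |

Next, compute the corresponding unit eigenvectors and ...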
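The choice of the number of principal components in Example 6.2 can be illustrated with a minimal Python sketch. It uses the sixteen eigenvalues of each group as read from Table 6.4 and compares the 85% cumulative-contribution rule with the "eigenvalue greater than 1" observation made in the text. The function names are hypothetical, and normalizing by the sum of the listed (rounded) eigenvalues rather than by 16 is an assumption of this sketch, so the percentages may differ slightly from those printed in the table.

```python
import numpy as np

# Eigenvalues as read from Table 6.4 (rounded to two decimals in the source).
eig_group1 = np.array([7.03, 2.61, 1.66, 0.84, 0.77, 0.64, 0.58, 0.46,
                       0.36, 0.31, 0.24, 0.22, 0.17, 0.14, 0.07, 0.04])
eig_group2 = np.array([6.16, 3.03, 1.28, 0.99, 0.89, 0.69, 0.57, 0.55,
                       0.47, 0.40, 0.26, 0.22, 0.21, 0.15, 0.09, 0.05])


def cumulative_contribution(eigenvalues):
    """Cumulative share of total variance, eigenvalues sorted descending.

    Assumption: the total is the sum of the listed eigenvalues.
    """
    ordered = np.sort(eigenvalues)[::-1]
    return np.cumsum(ordered) / ordered.sum()


def components_for_threshold(eigenvalues, threshold=0.85):
    """Smallest q whose cumulative contribution reaches the threshold."""
    cum = cumulative_contribution(eigenvalues)
    return int(np.argmax(cum >= threshold)) + 1


def components_above_one(eigenvalues):
    """Number of eigenvalues larger than 1 (the cut-off noted in the text)."""
    return int(np.sum(eigenvalues > 1.0))


for name, eig in [("Group I", eig_group1), ("Group II", eig_group2)]:
    cum = cumulative_contribution(eig)
    print(name)
    print("  share of first three components:", round(float(cum[2]), 3))
    print("  q required by the 85% rule:", components_for_threshold(eig))
    print("  eigenvalues above 1:", components_above_one(eig))
```

With these rounded eigenvalues the 85% rule asks for more than three components in both groups, which is why the text falls back on the size of the eigenvalues and keeps three.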