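Statistical Analysis and Inference for Multivariate Data (II)
Chen Bingwei

Comparison between univariate and multivariate methods

                          Univariate                 Multivariate
Statistical description   Mean                       Mean vector
                          Variance                   Covariance matrix
                          Correlation coefficient    Correlation matrix
Statistical inference     t test                     Hotelling $T^2$ test
                          Analysis of variance       Multivariate analysis of variance

Statistical description and inference for multivariate data are expressed in vector and matrix notation.

III. Comparison of multiple groups

1. Multivariate analysis of variance

The g sample mean vectors $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_g$ are used to judge whether $H_0: \mu_1 = \mu_2 = \cdots = \mu_g$ holds. The alternative hypothesis is $H_1$: at least two of the g mean vectors differ.

Analysis of variance (univariate analysis), SPSS options
Pairwise comparisons: LSD, SNK, ...
Descriptives
Fixed and random effects
Homogeneity of variance test
Brown-Forsythe test
Welch test

Descriptives (T3)

Group   N    Mean     Std. Deviation   Std. Error   95% CI Lower   95% CI Upper   Minimum   Maximum
1       3    62.267   1.9732           1.1392       57.365         67.168         60.0      63.6
2       4    42.525   9.3511           4.6755       27.645         57.405         32.1      53.4
3       5    73.220   2.3221           1.0385       70.337         76.103         70.0      75.9
Total   12   60.250   14.7762          4.2655       50.862         69.638         32.1      75.9

Test of Homogeneity of Variances (T3)

Levene Statistic   df1   df2   Sig.
8.192              2     9     .009

ANOVA (T3)

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   2110.008         2    1055.004      32.553   .000
Within Groups    291.682          9    32.409
Total            2401.690         11

Robust Tests of Equality of Means (T3)

                 Statistic(a)   df1   df2     Sig.
Welch            35.332         2     5.064   .001
Brown-Forsythe   32.784         2     3.635   .005
a. Asymptotically F distributed.

Student-Newman-Keuls(a,b), T3, subset for alpha = .05

G      N   Subset 1   Subset 2   Subset 3
2      4   42.525
1      3              62.267
3      5                         73.220
Sig.       1.000      1.000      1.000
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 3.830.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.

Methods to use when normality fails or variances are unequal
Data transformation
Rank-sum test
Robust estimation: Welch (F'), Brown-Forsythe

Main idea of multivariate analysis of variance: the total sum of squares of deviations, SS_total, is partitioned into SS_between and SS_within, where SS_total, SS_between and SS_within are all matrices. Wilks' Lambda is computed from them, converted to an F value, and used for inference.

Example 14-4. Children with chronic gastritis were randomly divided into 3 groups: groups I and II were treatment groups and the third was a control group. The aim is to compare the effect of the treatment drugs on T-cell immune function (percentages of T3, T4 and T8 cells in peripheral blood). Table 14-5 gives the measurements for some of the children. Does T-cell immune function differ among the three groups?

Table 14-5. T-cell immune function (%) of three groups of children with chronic gastritis

      Treatment group I           Treatment group II           Control group
No.   T3     T4     T8      No.   T3     T4     T8      No.   T3     T4     T8
1     63.6   30.2   31.2    1     53.4   22.5   25.0    1     72.4   42.5   29.9
2     60.0   30.0   33.4    2     46.5   20.0   14.6    2     75.0   49.5   29.3
3     63.2   35.3   27.9    3     38.1   25.9   18.1    3     75.9   30.0   40.0
                            4     32.1   12.1   11.8    4     70.0   32.0   36.4
                                                        5     72.8   36.7   33.1

Step 1. State the hypotheses and set the significance level.
$H_0: \mu_1 = \mu_2 = \mu_3$, that is, the three treatments have the same effect on T-cell immune function.
$H_1$: the three mean vectors are not all equal.
$\alpha = 0.05$

Table 14-4. Multivariate analysis of variance table

Source of variation   Degrees of freedom        Sum of squares and cross-products (SSCP) matrix
Between groups        $g - 1$                   $H = \sum_{i=1}^{g} n_i (\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})'$
Within groups         $\sum_{i=1}^{g} n_i - g$  $E = \sum_{i=1}^{g} (n_i - 1) S_i$
Total                 $\sum_{i=1}^{g} n_i - 1$  $H + E$

H is the between-group SSCP matrix; E is the residual (error) SSCP matrix.

Step 2. Compute the statistic $\Lambda^*$ and F.

Compute the means ($n_1 = 3$, $n_2 = 4$, $n_3 = 5$; components ordered T3, T4, T8):

$\bar{X}_1 = (62.267,\ 31.833,\ 30.833)'$
$\bar{X}_2 = (42.525,\ 20.125,\ 17.375)'$
$\bar{X}_3 = (73.220,\ 38.140,\ 33.740)'$
$\bar{X} = (60.250,\ 30.558,\ 27.558)'$

$\bar{X}_1 - \bar{X} = (2.017,\ 1.275,\ 3.275)'$
$\bar{X}_2 - \bar{X} = (-17.725,\ -10.433,\ -10.183)'$
$\bar{X}_3 - \bar{X} = (12.970,\ 7.582,\ 6.182)'$

Between-group SSCP matrix H:

$$H = 3(\bar{X}_1 - \bar{X})(\bar{X}_1 - \bar{X})' + 4(\bar{X}_2 - \bar{X})(\bar{X}_2 - \bar{X})' + 5(\bar{X}_3 - \bar{X})(\bar{X}_3 - \bar{X})' = \begin{pmatrix} 2110.008 & 1239.108 & 1142.693 \\ 1239.108 & 727.703 & 671.848 \\ 1142.693 & 671.848 & 638.042 \end{pmatrix}$$

Within-group SSCP matrix E, from the group covariance matrices (the minus signs were lost in the extracted slides and are restored here from the data in Table 14-5):

$$S_1 = \begin{pmatrix} 3.893 & 2.607 & -4.033 \\ 2.607 & 9.023 & -7.737 \\ -4.033 & -7.737 & 7.663 \end{pmatrix},\quad S_2 = \begin{pmatrix} 87.443 & 27.813 & 42.267 \\ 27.813 & 34.469 & 22.461 \\ 42.267 & 22.461 & 22.483 \end{pmatrix},\quad S_3 = \begin{pmatrix} 5.392 & 3.801 & 0.932 \\ 3.801 & 63.523 & -33.387 \\ 0.932 & -33.387 & 20.283 \end{pmatrix}$$

$$E = \sum_{i=1}^{3} (n_i - 1) S_i = \begin{pmatrix} 291.683 & 103.857 & 122.463 \\ 103.857 & 375.545 & -81.639 \\ 122.463 & -81.639 & 193.907 \end{pmatrix}$$

Total SSCP matrix:

$$H + E = \begin{pmatrix} 2401.691 & 1342.965 & 1265.156 \\ 1342.965 & 1103.248 & 590.209 \\ 1265.156 & 590.209 & 831.949 \end{pmatrix}$$

Table 14-6. Multivariate analysis of variance table for Example 14-4

Source of variation   Degrees of freedom   SSCP matrix
Between groups        2                    H (given above)
Within groups         9                    E (given above)
Total                 11                   H + E (given above)

Compute the statistic $\Lambda^*$:

$$\Lambda^* = \frac{|E|}{|H + E|} = \frac{9.4961 \times 10^{6}}{1.0702 \times 10^{8}} = 0.0887$$

Table 14-7. Relationship between $\Lambda^*$ and F in common cases

Response variables   Groups       Conversion                                                                      Degrees of freedom of F
$m = 1$              $g \ge 2$    $F = \frac{\sum n_i - g}{g - 1} \cdot \frac{1 - \Lambda^*}{\Lambda^*}$                      $\nu_1 = g - 1,\ \nu_2 = \sum n_i - g$
$m = 2$              $g \ge 2$    $F = \frac{\sum n_i - g - 1}{g - 1} \cdot \frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}$    $\nu_1 = 2(g - 1),\ \nu_2 = 2(\sum n_i - g - 1)$
$m \ge 1$            $g = 2$      $F = \frac{\sum n_i - m - 1}{m} \cdot \frac{1 - \Lambda^*}{\Lambda^*}$                      $\nu_1 = m,\ \nu_2 = \sum n_i - m - 1$
$m \ge 1$            $g = 3$      $F = \frac{\sum n_i - m - 2}{m} \cdot \frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}$        $\nu_1 = 2m,\ \nu_2 = 2(\sum n_i - m - 2)$

Table 14-7 shows that when two mean vectors are compared, multivariate analysis of variance can be used as an alternative to Hotelling $T^2$.

Here m = 3 and g = 3, so the fourth formula in Table 14-7 applies (with $\Lambda^*$ rounded to 0.089):

$$F = \frac{\sum n_i - m - 2}{m} \cdot \frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}} = \frac{3 + 4 + 5 - 3 - 2}{3} \cdot \frac{1 - \sqrt{0.089}}{\sqrt{0.089}} = 5.49$$

$$\nu_1 = 2 \times 3 = 6,\quad \nu_2 = 2(3 + 4 + 5 - 3 - 2) = 14$$

Step 3. Determine the P value and draw a conclusion.
From the F table, $F_{0.01(6,14)} = 4.46$, so $P < 0.01$. Reject $H_0$ and accept $H_1$: T-cell immune function differs among the three groups of children with chronic gastritis. The group mean vectors show that T-cell immune function in both treatment groups is lower than in the control group.

Software procedure (the slide is headed "SPSS", but the program shown is SAS)

    data ex14_5;
      /* The comparison operators were lost in extraction; "<" is inferred
         from the group sizes 3, 4 and 5 (observations 1-3, 4-7, 8-12). */
      if _n_ < 4 then c = 1;
      else if _n_ < 8 then c = 2;
      else c = 3;
      input id T3 T4 T8 @@;   /* @@ reads several observations per data line */
    cards;
    1 63.6 30.2 31.2 2 60.0 30.0 33.4 …… 12 72.8 36.7 33.1
    ;
    proc glm;
      class c;
      model T3 T4 T8 = c;
      manova H = c / printe printh;   /* print the E and H SSCP matrices */
      lsmeans c / stderr pdiff;
    run;

Test statistics
(1) Wilks' Lambda = det(E) / det(H + E)
(2) Pillai's trace = trace(H (H + E)^{-1})
(3) Hotelling-Lawley trace = trace(E^{-1} H)
(4) Roy's greatest root = largest eigenvalue of E^{-1} H

MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall c Effect
H = Type III SSCP Matrix for c
E = Error SSCP Matrix
S=2  M=0  N=2.5

Statistic                Value        F Value   Num DF   Den DF   Pr > F
Wilks' Lambda            0.08873538   5.50      6        14       0.0041
Pillai's Trace           1.04916454   2.94      6        16       0.0394
Hotelling-Lawley Trace   8.71540436   9.77      6        7.7895   0.0028
Roy's Greatest Root      8.53328721   22.76     3        8        0.0003

IV. Multivariate versus univariate analysis

A multivariate analysis performs a single hypothesis test on all m response variables and draws a conclusion about the difference between groups. In most cases the conclusion of the multivariate test agrees with those of the m univariate tests: when the multivariate test rejects $H_0$, at least one of the m univariate tests also rejects $H_0$.

In theory, univariate tests cannot replace a multivariate test

(1) Suppose there are k sample mean vectors and an F test is applied to every pair. There are $k!/(2!(k-2)!)$ comparisons, and each sample mean vector is compared $k - 1$ times. If $F_{0.05,(m,\ n_1 + n_2 - m - 1)}$ is still used as the critical value, the probability of a type I error will far exceed 0.05.

For example, comparing 3 sample mean vectors takes 3 F tests. If each comparison uses $\alpha = 0.05$, the probability of not making a type I error in one comparison is $1 - 0.05 = 0.95$, so the probability of correctly accepting all 3 null hypotheses is $0.95^3 = 0.857$, and the probability of at least one type I error is $1 - 0.857 = 0.143$. Pairwise comparisons should therefore not use the F test described above.

(2) A univariate test only describes the between-group difference of one variable along a number line. It cannot reflect the difference of several variables jointly in a plane or in space, so the two kinds of test have different meanings.

Table 14-8. Birth weight and body length of two groups of newborns

        Group A                            Group B
No.     Weight (kg)   Length (cm)    No.   Weight (kg)   Length (cm)
1       3.10          46             1     4.10          60
2       3.20          50             2     3.50          48
3       3.50          62             3     3.35          50
4       3.00          46             4     3.35          49
5       3.85          67             5     3.20          48
6       3.15          48             6     3.55          50
7       3.00          46             7     3.50          60
8       3.50          55             8     3.60          56
Mean    3.29          52.50                3.52          52.63
SD*     0.30          8.11                 0.27          5.21
* Labelled "variance" in the source; the values are standard deviations.

t test, t (P):     weight 1.62 (0.13); length 0.04 (0.97)
Hotelling $T^2$:   $T^2 = 9.87$, F = 4.58, P = 0.03

Neither univariate t test is significant at the 0.05 level, yet the Hotelling $T^2$ test on weight and length jointly is.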
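The two worked results in this chapter, Wilks' $\Lambda^*$ for Example 14-4 and Hotelling $T^2$ for Table 14-8, can be checked from the raw data. The following is a minimal sketch in plain Python, not the SAS or SPSS procedure used in the slides. All function and variable names (`mean_vector`, `sscp`, `within_sscp`, `det`, and so on) are illustrative assumptions. It uses the identity that the total SSCP matrix equals H + E, the g = 3 conversion from Table 14-7, and the standard two-sample relation $F = \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)p} T^2$.

```python
from math import sqrt

def mean_vector(rows):
    """Column means of a list of equal-length observation rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def sscp(rows, center):
    """Sums of squares and cross-products of the rows about `center`."""
    p = len(center)
    return [[sum((r[a] - center[a]) * (r[b] - center[b]) for r in rows)
             for b in range(p)] for a in range(p)]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def det(m):
    """Determinant by cofactor expansion; adequate for 2x2 and 3x3."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

def within_sscp(groups):
    """E: sum over groups of the SSCP about each group's own mean."""
    total = None
    for rows in groups:
        w = sscp(rows, mean_vector(rows))
        total = w if total is None else mat_add(total, w)
    return total

# Example 14-4 (Table 14-5); columns are T3, T4, T8.
treat1 = [[63.6, 30.2, 31.2], [60.0, 30.0, 33.4], [63.2, 35.3, 27.9]]
treat2 = [[53.4, 22.5, 25.0], [46.5, 20.0, 14.6],
          [38.1, 25.9, 18.1], [32.1, 12.1, 11.8]]
control = [[72.4, 42.5, 29.9], [75.0, 49.5, 29.3], [75.9, 30.0, 40.0],
           [70.0, 32.0, 36.4], [72.8, 36.7, 33.1]]
everyone = treat1 + treat2 + control

E = within_sscp([treat1, treat2, control])
T = sscp(everyone, mean_vector(everyone))      # total SSCP, equal to H + E
wilks = det(E) / det(T)
n, m = len(everyone), 3
root = sqrt(wilks)
f_manova = (n - m - 2) / m * (1 - root) / root  # Table 14-7, case g = 3

# Table 14-8; columns are weight (kg), length (cm).
group_a = [[3.10, 46], [3.20, 50], [3.50, 62], [3.00, 46],
           [3.85, 67], [3.15, 48], [3.00, 46], [3.50, 55]]
group_b = [[4.10, 60], [3.50, 48], [3.35, 50], [3.35, 49],
           [3.20, 48], [3.55, 50], [3.50, 60], [3.60, 56]]
n1, n2, p = len(group_a), len(group_b), 2

# Pooled covariance matrix and difference of the two mean vectors.
S = [[v / (n1 + n2 - 2) for v in row]
     for row in within_sscp([group_a, group_b])]
d = [x - y for x, y in zip(mean_vector(group_a), mean_vector(group_b))]

# d' S^-1 d written out with the explicit inverse of a 2x2 matrix.
quad = (S[1][1] * d[0] ** 2 - 2 * S[0][1] * d[0] * d[1]
        + S[0][0] * d[1] ** 2) / det(S)
t2 = n1 * n2 / (n1 + n2) * quad
f_t2 = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2

print(f"Wilks' Lambda = {wilks:.4f}, F = {f_manova:.2f}")
print(f"Hotelling T2 = {t2:.2f}, F = {f_t2:.2f}")
```

The printed values can be compared with the figures quoted in the text (Λ* = 0.0887 with F of about 5.5, and T² = 9.87 with F = 4.58). The cofactor determinant is chosen only to keep the sketch dependency-free; for larger matrices a linear algebra library would be the usual choice.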
Title: Chapter 14 - Statistical Description and Inference for Multivariate Data (II)
Link: https://www.777doc.com/doc-5689623 .html