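Regression Course Project
A Regression Analysis of the Factors Influencing Domestic Tourism Spending

I. Introduction

China's tertiary sector has grown rapidly and by 2010 accounted for 43.14% of GDP. Tourism holds an important place within the tertiary sector and is closely linked to catering, accommodation, leisure, and transport. This analysis therefore examines what influences domestic tourism spending and builds a regression model for it.

II. Model Design

A multiple linear model is fitted first. If the fit is not significant, other models are tried: a log or square-root transformation, or a polynomial fit.

1. Correlation analysis. First identify the variables that are correlated with the response variable.
2. Fit the full multiple linear regression model. If the F test of the regression equation fails, look for the cause and change the model. If some coefficient tests fail, proceed to variable selection (step 3), drop some variables, and continue. If all tests are satisfactory, the model is provisionally established and step 3 is skipped.
3. Screen variables by stepwise regression and run t tests. If the results are significant, the multiple linear regression model is provisionally established. If some variables still fail the test, screen variables one at a time, using the AIC criterion among others to decide which to drop, until every variable passes its t test.
4. Regression diagnostics. Analyze the residuals: check whether they are normally distributed and whether they are autocorrelated; check for outliers and high-influence points, for heteroscedasticity, and for multicollinearity. If any of these problems is present, revise the model, reselect the variables, or add or remove observations.
5. The final model is established.

III. Data

year    income  number  expense   level    road  rail
1994   48108.5     524    195.3   320.0  111.78  5.90
1995   59810.5     629    218.7   345.1  115.70  6.24
1996   70142.5     640    256.2   377.6  118.58  6.49
1997   78060.9     644    328.1   394.6  122.64  6.60
1998   83024.3     695    345.0   417.8  127.85  6.64
1999   88479.2     719    394.0   452.3  135.17  6.74
2000   98000.5     744    426.6   491.0  140.27  6.87
2001  108068.2     784    449.5   521.2  169.80  7.01
2002  119095.7     878    441.8   557.6  176.52  7.19
2003  135174.0     870    395.7   596.9  180.98  7.30
2004  159586.8    1102    427.5   645.3  187.07  7.44
2005  183618.5    1212    436.1   695.2  334.52  7.54
2006  215883.9    1394    446.9   761.9  345.70  7.71
2007  266411.0    1610    482.6   843.4  358.37  7.80
2008  315274.7    1712    511.0   916.8  373.02  7.97
2009  341401.5    1902    535.4  1001.6  386.08  8.55
2010  403260.0    2103    598.2  1062.6  400.82  9.12

year     air  railtran  roadtran  shiptran  airtran   travel
1994  104.56    108738    953940     26165     4039   1023.5
1995  112.90    102745   1040810     23924     5117   1375.7
1996  116.65     94797   1122110     22895     5555   1638.4
1997  142.50     93308   1204583     22573     5630   2112.7
1998  150.58     95085   1257332     20545     5755   2391.2
1999  152.22    100164   1269004     19151     6094   2831.9
2000  150.29    105073   1347392     19386     6722   3175.5
2001  155.36    105155   1402798     18645     7524   3522.4
2002  163.77    105606   1475257     18693     8594   3878.4
2003  174.95     97260   1464335     17142     8759   3442.3
2004  204.94    111764   1624526     19040    12123   4710.7
2005  199.85    115583   1697381     20227    13827   5285.9
2006  211.35    125656   1860487     22047    15968   6229.7
2007  234.30    135670   2050680     22835    18576   7770.6
2008  246.18    146193   2682114     20334    19251   8749.3
2009  234.51    152451   2779081     22314    23052  10183.7
2010  276.51    167609   3052738     22392    26769  12579.8

Data source: China Statistical Yearbook 2011 (《中国统计年鉴2011》).

Variable descriptions:
Year: year.
Income: gross national income, in 100 million yuan.
Number: number of tourists.
Expense: per-capita tourism spending, in yuan.
Level: household consumption level index, with 1978 as the base year.
Road: highway mileage, in 10,000 km.
Rail: railway mileage, in 10,000 km.
Air: civil aviation route mileage, in 10,000 km.
Roadtran: highway passenger traffic, in 10,000 persons.
Railtran: railway passenger traffic, in 10,000 persons.
Shiptran: waterway passenger traffic, in 10,000 persons.
Airtran: civil aviation passenger traffic, in 10,000 persons.
Travel: total domestic tourism spending, in 100 million yuan.

IV. Regression Analysis

1. Correlation

The analysis starts with correlations, using a scatterplot matrix.

[Figure: scatterplot matrix of travel and the candidate variables]

The plot shows fairly directly that travel is strongly correlated with most of the variables. The two unclear cases are road and shiptran, so correlation tests are run for them. travel and road are linearly correlated, with correlation coefficient 0.93 and p-value = 4.563e-08. travel and shiptran are uncorrelated, with p-value = 0.9983. shiptran is therefore excluded before the regression is fitted.

2. Full regression model

Fitting the multiple regression model directly gives:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.972e+03  3.193e+03  -1.870 0.110617
income       2.151e-02  4.779e-03   4.501 0.004100 **
number       1.039e+00  1.446e+00   0.719 0.499354
expense      6.805e+00  1.124e+00   6.052 0.000922 ***
level       -5.815e+00  1.261e+00  -4.610 0.003653 **
road        -1.468e+00  1.019e+00  -1.441 0.199608
rail         6.274e+02  4.462e+02   1.406 0.209292
air         -4.155e+00  2.790e+00  -1.490 0.186935
railtran     2.524e-02  8.492e-03   2.972 0.024903 *
roadtran    -4.093e-04  4.554e-04  -0.899 0.403410
airtran      1.058e-01  1.272e-01   0.832 0.437327
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 84.55 on 6 degrees of freedom
Multiple R-squared: 0.9998, Adjusted R-squared: 0.9994
F-statistic: 2462 on 10 and 6 DF, p-value: 5.061e-10

Here R² = 0.9998 and the F test has p-value = 5.061e-10, so the regression as a whole is significant. Not all of the coefficients pass their t tests, however, so variable selection is needed.

3. Variable selection

Stepwise regression is run first, with the AIC criterion as the main basis for the decision. It removes two variables, roadtran and number, and the resulting AIC of 153.73 is the minimum. Testing the regression model again:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.393e+03  2.102e+03  -2.090 0.070022 .
income       1.898e-02  2.320e-03   8.179 3.72e-05 ***
expense      7.038e+00  9.369e-01   7.512 6.85e-05 ***
level       -5.427e+00  1.057e+00  -5.133 0.000893 ***
road        -1.460e+00  9.339e-01  -1.564 0.156518
rail         3.697e+02  2.865e+02   1.290 0.232935
air         -3.589e+00  2.496e+00  -1.438 0.188431
railtran     2.166e-02  6.843e-03   3.165 0.013295 *
airtran      2.032e-01  5.464e-02   3.719 0.005879 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 78.95 on 8 degrees of freedom
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9994
F-statistic: 3529 on 8 and 8 DF, p-value: 2.252e-13

The model has improved. The adjusted R² is 0.9994, reported as a slight improvement, which agrees with the AIC criterion. The coefficient tests also look better, but road, rail, and air still fail. Dropping one variable at a time from the regression gives:

          Df Sum of Sq    RSS    AIC
<none>                  49866 153.73
income     1    416943 466809 189.75
expense    1    351763 401629 187.19
level      1    164237 214103 176.50
road       1     15241  65107 156.26
rail       1     10380  60246 154.94
air        1     12886  62752 155.63
railtran   1     62438 112303 165.53
airtran    1     86215 136081 168.79

Dropping rail gives the smallest increase in AIC and also the smallest increase in RSS. The coefficient tests for the model without rail are:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.773e+03  5.648e+02  -3.140 0.011936 *
income       1.935e-02  2.386e-03   8.112 1.98e-05 ***
expense      7.977e+00  6.116e-01  13.043 3.77e-07 ***
level       -5.126e+00  1.069e+00  -4.797 0.000978 ***
road        -2.214e+00  7.550e-01  -2.933 0.016676 *
air         -5.129e+00  2.272e+00  -2.257 0.050398 .
railtran     1.495e-02  4.613e-03   3.241 0.010144 *
airtran      2.603e-01  3.323e-02   7.832 2.62e-05 ***

Only air fails the test at α = 0.05 (p = 0.050398). Removing air as well gives:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.450e+03  5.683e+02  -4.310  0.00154 **
income       1.834e-02  2.782e-03   6.593 6.13e-05 ***
expense      7.465e+00  6.742e-01  11.072 6.21e-07 ***
level       -5.389e+00  1.261e+00  -4.273  0.00163 **
road        -2.381e+00  8.921e-01  -2.669  0.02355 *
railtran     1.933e-02  4.970e-03   3.889  0.00301 **
airtran      2.451e-01  3.864e-02   6.343 8.42e-05 ***

All regression coefficients now pass their tests, and the regression model is provisionally established.

4. Regression diagnostics

The residuals are computed and a W test of normality is run. It gives p-value = 0.9066, so the normality hypothesis cannot be rejected. The plot of standardized residuals against fitted values is:

[Figure: standardized residuals versus fitted values]

The plot also shows that the residuals are evenly spread with no visible pattern, so the basic assumptions of linear regression are met and there is no autocorrelation. Looking further:

[Figure: four diagnostic plots]

Taking the four plots above together, observations 11 and 15 may be high-influence points. The cause still needs to be investigated; it may lie in the statistical collection process or in the analysis method. Removing them weakens the regression, so they are kept for now. Next, test ...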
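The final model in section IV.3 (travel regressed on income, expense, level, road, railtran, and airtran) can be refit as a rough check. What follows is a minimal sketch in Python with NumPy, not the code behind the report, whose printed output resembles that of R. The array `DATA` and the helper `fit_ols` are illustrative names, and the numbers are transcribed from the tables in section III, so small differences from the printed coefficients are possible.

```python
import numpy as np

# Columns: year, income, expense, level, road, railtran, airtran, travel.
# Values transcribed from the data tables in section III (assumption: the
# transcription matches the data the report actually used).
DATA = np.array([
    [1994,  48108.5, 195.3,  320.0, 111.78, 108738,  4039,  1023.5],
    [1995,  59810.5, 218.7,  345.1, 115.70, 102745,  5117,  1375.7],
    [1996,  70142.5, 256.2,  377.6, 118.58,  94797,  5555,  1638.4],
    [1997,  78060.9, 328.1,  394.6, 122.64,  93308,  5630,  2112.7],
    [1998,  83024.3, 345.0,  417.8, 127.85,  95085,  5755,  2391.2],
    [1999,  88479.2, 394.0,  452.3, 135.17, 100164,  6094,  2831.9],
    [2000,  98000.5, 426.6,  491.0, 140.27, 105073,  6722,  3175.5],
    [2001, 108068.2, 449.5,  521.2, 169.80, 105155,  7524,  3522.4],
    [2002, 119095.7, 441.8,  557.6, 176.52, 105606,  8594,  3878.4],
    [2003, 135174.0, 395.7,  596.9, 180.98,  97260,  8759,  3442.3],
    [2004, 159586.8, 427.5,  645.3, 187.07, 111764, 12123,  4710.7],
    [2005, 183618.5, 436.1,  695.2, 334.52, 115583, 13827,  5285.9],
    [2006, 215883.9, 446.9,  761.9, 345.70, 125656, 15968,  6229.7],
    [2007, 266411.0, 482.6,  843.4, 358.37, 135670, 18576,  7770.6],
    [2008, 315274.7, 511.0,  916.8, 373.02, 146193, 19251,  8749.3],
    [2009, 341401.5, 535.4, 1001.6, 386.08, 152451, 23052, 10183.7],
    [2010, 403260.0, 598.2, 1062.6, 400.82, 167609, 26769, 12579.8],
])


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2


X = DATA[:, 1:7]   # income, expense, level, road, railtran, airtran
y = DATA[:, 7]     # travel
beta, r2 = fit_ols(X, y)

names = ["(Intercept)", "income", "expense", "level", "road", "railtran", "airtran"]
for name, b in zip(names, beta):
    print(f"{name:12s} {b: .4e}")
print(f"R-squared: {r2:.4f}")
```

The sketch only reproduces the point estimates and R²; standard errors, t tests, and the diagnostic plots would need a statistics package.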
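The AIC values in the single-deletion table of section IV.3 are consistent with the form n ln(RSS/n) + 2k, where n is the number of observations and k the number of estimated coefficients; this is the form R's `extractAIC` uses for linear models. The short Python sketch below recomputes a few of them from the printed RSS values. The function name `step_aic` is illustrative, and whether the report used R is an assumption based on the look of the output.

```python
import math


def step_aic(rss: float, n: int, k: int) -> float:
    """AIC up to an additive constant: n * ln(RSS / n) + 2 * k."""
    return n * math.log(rss / n) + 2 * k


N_OBS = 17  # annual observations, 1994 to 2010

# RSS values copied from the single-deletion table in section IV.3.
# The stepwise model has 9 coefficients (intercept + 8 predictors);
# dropping one predictor leaves 8.
aic_current = step_aic(49866, N_OBS, 9)
aic_after_drop = {
    name: step_aic(rss, N_OBS, 8)
    for name, rss in {"road": 65107, "rail": 60246, "air": 62752}.items()
}

print("current model:", round(aic_current, 2))
for name, value in aic_after_drop.items():
    print(f"drop {name}:", round(value, 2))
```

Rounded to two decimals, these match the printed values 153.73, 156.26, 154.94, and 155.63, and rail is the variable whose removal raises the AIC least.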