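1. Download data Real_Estate.txt. Use R to answer the following questions.

    data = read.table("D:/R/data/Real-Estate.txt", header = TRUE)
    attach(data)

Hypothesis Test:

a) Read data into R. Conduct a test of hypothesis to determine if there is a difference in the mean selling price of homes with an attached garage and homes without a garage. Use the 0.05 significance level.

    garage <- Price[Garage == 1]
    nogarage <- Price[Garage == 0]
    a = t.test(garage, nogarage); a

            Welch Two Sample t-test

    data:  garage and nogarage
    t = 7.3521, df = 95.781, p-value = 6.566e-11
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     38.49023 66.96188
    sample estimates:
    mean of x mean of y
     238.1761  185.4500

The p-value (6.566e-11) is far below the 0.05 significance level, so the null hypothesis of equal means is rejected: the mean selling prices of homes with and without a garage differ significantly.

b) Conduct a test of hypothesis to determine if there is a difference in the variability of the selling prices of homes that have a swimming pool versus those that do not have a swimming pool. Use the 0.02 significance level.

    pool <- Price[Pool == 1]
    nopool <- Price[Pool == 0]
    b = t.test(pool, nopool, conf.level = 0.98); b

            Welch Two Sample t-test

    data:  pool and nopool
    t = -3.4773, df = 100.21, p-value = 0.0007506
    alternative hypothesis: true difference in means is not equal to 0
    98 percent confidence interval:
     -48.191980  -9.183433
    sample estimates:
    mean of x mean of y
     202.7974  231.4851

The p-value (0.0007506) is below the 0.02 significance level, so the null hypothesis is rejected: the mean selling prices of homes with and without a pool differ significantly. Note that t.test compares means, whereas the question asks about variability. A difference in variances is tested with an F test, var.test(pool, nopool, conf.level = 0.98), so the output above does not by itself answer the question about variability.

Regression:

a) Write out the regression equation. Give some interpretations to this model.

    y = Price; x1 = Bedrooms; x2 = Size; x3 = Pool; x4 = Distance; x5 = Township; x6 = Garage; x7 = Baths
    z = data.frame(y, x1, x2, x3, x4, x5, x6, x7)
    lmz = lm(y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7, data = z); lmz

    Call:
    lm(formula = y ~ ., data = z)

    Coefficients:
    (Intercept)           x1           x2           x3           x4
       62.24869      7.37550      0.03863    -19.11144     -1.01267
             x5           x6           x7
       -1.73901     35.49802     23.09255

The fitted equation is

    Price = 62.24869 + 7.37550 Bedrooms + 0.03863 Size - 19.11144 Pool - 1.01267 Distance - 1.73901 Township + 35.49802 Garage + 23.09255 Baths

Each coefficient is the estimated change in Price for a one-unit increase in that variable with the others held fixed. For example, an attached garage is associated with a price higher by 35.50, and each additional bedroom with a price higher by 7.38.

b) Determine and interpret the R2 value.

    anova(lmz)

    Analysis of Variance Table

    Response: y
              Df Sum Sq Mean Sq F value    Pr(>F)
    x1         1  50409   50409 45.4301 1.136e-09 ***
    x2         1   9955    9955  8.9718  0.003480 **
    x3         1  14754   14754 13.2964  0.000430 ***
    x4         1  12753   12753 11.4936  0.001011 **
    x5         1   1283    1283  1.1562  0.284920
    x6         1  26771   26771 24.1267 3.656e-06 ***
    x7         1   7211    7211  6.4990  0.012360 *
    Residuals 97 107631    1110
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the analysis of variance, the effect of x5 on the response is not significant.

    summary(lmz)

    Call:
    lm(formula = y ~ ., data = z)

    Residuals:
        Min      1Q  Median      3Q     Max
    -84.609 -19.391  -0.204  22.429  69.800

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 62.24869   40.91404   1.521  0.13140
    x1           7.37550    2.59002   2.848  0.00538 **
    x2           0.03863    0.01475   2.618  0.01026 *
    x3         -19.11144    7.12655  -2.682  0.00861 **
    x4          -1.01267    0.74138  -1.366  0.17512
    x5          -1.73901    2.69942  -0.644  0.52096
    x6          35.49802    7.67584   4.625 1.16e-05 ***
    x7          23.09255    9.05831   2.549  0.01236 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 33.31 on 97 degrees of freedom
    Multiple R-squared:  0.5336,    Adjusted R-squared:  0.4999
    F-statistic: 15.85 on 7 and 97 DF,  p-value: 1.008e-13

In the summary, x4 and x5 fail the t-test (p = 0.17512 and 0.52096), so later models can consider dropping them. R-squared is 0.5336, meaning the model explains about 53% of the variation in Price; the adjusted R-squared is 0.4999.

c) Develop a correlation matrix. Summarize your findings. Check the independent variables for multicollinearity.

    cor(z)

                y           x1          x2           x3         x4
    y   1.0000000  0.467377108  0.37104159 -0.294064750 -0.3470312
    x1  0.4673771  1.000000000  0.38345610 -0.005301227 -0.1533558
    x2  0.3710416  0.383456103  1.00000000 -0.200590487 -0.1171945
    x3 -0.2940648 -0.005301227 -0.20059049  1.000000000  0.1393824
    x4 -0.3470312 -0.153355767 -0.11719450  0.139382435  1.0000000
    x5  0.1281755  0.200126798  0.18464617 -0.201094525 -0.2085930
    x6  0.5262739  0.234102158  0.08302732 -0.114153335 -0.3592949
    x7  0.3821726  0.328930238  0.02436486 -0.054532583 -0.1949930
                x5          x6          x7
    y   0.12817552  0.52627394  0.38217258
    x1  0.20012680  0.23410216  0.32893024
    x2  0.18464617  0.08302732  0.02436486
    x3 -0.20109453 -0.11415333 -0.05453258
    x4 -0.20859297 -0.35929488 -0.19499297
    x5  1.00000000  0.05666783  0.04966964
    x6  0.05666783  1.00000000  0.22128891
    x7  0.04966964  0.22128891  1.00000000

Price is most strongly correlated with x6 (0.53) and x1 (0.47). The largest correlation between two independent variables is 0.38 (x1 and x2), so there is no obvious multicollinearity.

    library(car)
    vif(lmz)

          x1       x2       x3       x4       x5       x6       x7
    1.419511 1.261634 1.109844 1.223795 1.128541 1.220769 1.187750

All VIF values are below 2, which further indicates that there is no multicollinearity.

d) Conduct a global test on the set of independent variables.

e) Test each of the independent variables to determine if they differ from zero.

f) Give the ANOVA table for this regression model. Give some explanations to this ANOVA table.

g) Would you consider deleting any of the independent variables? If so, rerun the regression analysis and report the new equation.

    install.packages("leaps")
    library(leaps)
    vselect = regsubsets(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7, data = z)
    s = summary(vselect)
    l = data.frame(s$outmat, RSS = s$rss, R2 = s$rsq, cp = s$cp, BIC = s$bic); l

             selected       RSS        R2        cp       BIC
    1 ( 1 )  *         166853.2 0.2769643 49.372526 -24.74323
    2 ( 1 )  **        137932.3 0.4022891 25.308229 -40.07618
    3 ( 1 )  ***       124256.3 0.4615523 14.983049 -46.38597
    4 ( 1 )  ****      117782.6 0.4896053 11.148749 -47.35015
    5 ( 1 )  *****     109890.3 0.5238055  6.036011 -49.97876
    6 ( 1 )  ******    108091.6 0.5315997  6.415015 -47.05764
    7 ( 1 )  *******   107631.1 0.5335952  8.000000 -42.85196

The best model can be chosen by comparing RSS, cp and BIC. Smaller cp and BIC are better. RSS always decreases as variables are added, so it cannot pick a model size on its own. The one-variable model drops too many predictors to be economically meaningful and is not considered. Both cp (6.036011) and BIC (-49.97876) are smallest for the five-variable model. On balance, x4 and x5 are dropped.

Stepwise regression is then used to adjust the model.

    step(lmz, direction = "forward")

    Start:  AIC=743.91
    y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7

    Call:
    lm(formula = y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7, data = z)

    Coefficients:
    (Intercept)           x1           x2           x3           x4
       62.24869      7.37550      0.03863    -19.11144     -1.01267
             x5           x6           x7
       -1.73901     35.49802     23.09255

Forward selection starts here from the full model, so there is nothing left to add: the model is unchanged, with AIC = 743.91.

    step(lmz, direction = "both")

    Start:  AIC=743.91
    y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7

           Df Sum of Sq    RSS    AIC
    - x5    1     460.5 108092 742.36
    <none>              107631 743.91
    - x4    1    2070.2 109701 743.91
    - x7    1    7211.3 114842 748.72
    - x2    1    7604.9 115236 749.08
    - x3    1    7979.8 115611 749.42
    - x1    1    8997.9 116629 750.34
    - x6    1   23731.4 131362 762.83

    Step:  AIC=742.36
    y ~ x1 + x2 + x3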
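Part b) of the hypothesis tests asks about the variability of selling prices, which is a question about variances rather than means. A minimal sketch of how an F test for equal variances might be run with `var.test` from base R is given below. The `pool` and `nopool` vectors here are simulated stand-ins (an assumption made for illustration, since Real-Estate.txt is not reproduced in this document), so the printed numbers are not results for the homework data.

```r
# Hypothetical data: in the homework these vectors would be
# Price[Pool == 1] and Price[Pool == 0]; the values below are simulated.
set.seed(1)
pool   <- rnorm(40, mean = 200, sd = 40)  # simulated prices, homes with a pool
nopool <- rnorm(60, mean = 230, sd = 50)  # simulated prices, homes without a pool

# F test of H0: var(pool) / var(nopool) = 1 against a two-sided alternative.
# conf.level = 0.98 matches the 0.02 significance level in the question.
ftest <- var.test(pool, nopool, ratio = 1, alternative = "two.sided",
                  conf.level = 0.98)
print(ftest)

# Decision rule at the 0.02 significance level
alpha <- 0.02
reject_h0 <- ftest$p.value < alpha
```

With the real data, `reject_h0` being TRUE would indicate a significant difference in price variability between the two groups at the 0.02 level. The F test assumes both samples are roughly normal, so it is worth checking that assumption before relying on the result.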
Document title: Homework 5 (R)