logistic-regression (logistic regression analysis in R)

Logistic regression (with R)
Christopher Manning
4 November 2007

1 Theory

We can transform the output of a linear regression to be suitable for probabilities by using a logit link function on the lhs as follows:

$$\operatorname{logit} p = \log o = \log \frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k \tag{1}$$

The odds can vary on a scale of $(0, \infty)$, so the log odds can vary on the scale of $(-\infty, \infty)$, which is precisely what we get from the rhs of the linear model. For a real-valued explanatory variable $x_i$, the intuition here is that a unit additive change in the value of the variable should change the odds by a constant multiplicative amount.

Exponentiating, this is equivalent to:¹

$$e^{\operatorname{logit} p} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k} \tag{2}$$

$$o = \frac{p}{1-p} = e^{\beta_0} e^{\beta_1 x_1} e^{\beta_2 x_2} \cdots e^{\beta_k x_k} \tag{3}$$

The inverse of the logit function is the logistic function. If $\operatorname{logit}(\pi) = z$, then

$$\pi = \frac{e^z}{1 + e^z}$$

The logistic function will map any value of the right hand side ($z$) to a proportion value between 0 and 1, as shown in figure 1.

Note a common case with categorical data: if our explanatory variables $x_i$ are all binary, then for the ones that are false (0), we get $e^0 = 1$ and the term disappears. Similarly, if $x_i = 1$, $e^{\beta_i x_i} = e^{\beta_i}$. So we are left with terms for only the $x_i$ that are true (1). For instance, if $x_3, x_4, x_7 = 1$ only, we have:

$$\operatorname{logit} p = \beta_0 + \beta_3 + \beta_4 + \beta_7 \tag{4}$$

$$o = e^{\beta_0} e^{\beta_3} e^{\beta_4} e^{\beta_7} \tag{5}$$

The intuition here is that if I know that a certain fact is true of a data point, then that will produce a constant change in the odds of the outcome (“If he’s European, that doubles the odds that he smokes”).

Let $L = L(D; B)$ be the likelihood of the data $D$ given the model, where $B = \{\beta_0, \ldots, \beta_k\}$ are the parameters of the model. The parameters are estimated by the principle of maximum likelihood. Technical point: there is no error term in a logistic regression, unlike in linear regressions.

¹ Note that we can convert freely between a probability $p$ and odds $o$ for an event versus its complement: $o = \frac{p}{1-p}$ and $p = \frac{o}{o+1}$.

[Figure 1: The logistic function]

2 Basic R logistic regression models

We will illustrate with the Cedegren dataset on the website.

    > cedegren <- read.table("cedegren.txt", header=T)

You need to create a two-column matrix of success/failure counts for your response variable. You cannot just use percentages. (You can give percentages but then weight them by a count of success + failures.)

    > attach(cedegren)
    > ced.del <- cbind(sDel, sNoDel)

Make the logistic regression model. The shorter second form is equivalent to the first, but don’t omit specifying the family.

    > ced.logr <- glm(ced.del ~ cat + follows + factor(class), family=binomial(logit))
    > ced.logr <- glm(ced.del ~ cat + follows + factor(class), family=binomial)

The output in more and less detail:

    > ced.logr

    Call:  glm(formula = ced.del ~ cat + follows + factor(class), family = binomial(logit))

    Coefficients:
       (Intercept)            catd            catm            catn            catv        followsP
           -1.3183         -0.1693          0.1786          0.6667         -0.7675          0.9525
          followsV  factor(class)2  factor(class)3  factor(class)4
            0.5341          1.2704          1.0480          1.3742

    Degrees of Freedom: 51 Total (i.e. Null);  42 Residual
    Null Deviance:      958.7
    Residual Deviance:  198.6        AIC: 446.1

    > summary(ced.logr)

    Call:
    glm(formula = ced.del ~ cat + follows + factor(class), family = binomial(logit))

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -3.24384  -1.34325   0.04954   1.01488   6.40094

    Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
    (Intercept)     -1.31827    0.12221 -10.787  < 2e-16
    catd            -0.16931    0.10032  -1.688 0.091459
    catm             0.17858    0.08952   1.995 0.046053
    catn             0.66672    0.09651   6.908 4.91e-12
    catv            -0.76754    0.21844  -3.514 0.000442
    followsP         0.95255    0.07400  12.872  < 2e-16
    followsV         0.53408    0.05660   9.436  < 2e-16
    factor(class)2   1.27045    0.10320  12.310  < 2e-16
    factor(class)3   1.04805    0.10355  10.122  < 2e-16
    factor(class)4   1.37425    0.10155  13.532  < 2e-16

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 958.66  on 51  degrees of freedom
    Residual deviance: 198.63  on 42  degrees of freedom
    AIC: 446.10

    Number of Fisher Scoring iterations: 4

Residual deviance is the difference in $G^2 = -2 \log L$ between a maximal model that has a separate parameter for each cell in the model and the built model. Changes in the deviance (the difference in the quantity $-2 \log L$) for two models which can be nested in a reduction will be approximately $\chi^2$-distributed with dof equal to the change in the number of estimated parameters. Thus the difference in deviances can be tested against the $\chi^2$ distribution for significance. The same concerns about this approximation being valid only for reasonably sized expected counts (as with contingency tables and multinomials in Suppes (1970)) still apply here, but we (and most people) ignore this caution and use the statistic as a rough indicator when exploring to find good models.

We’re usually mainly interested in the relative goodness of models, but nevertheless, the high residual deviance shows that the model cannot be accepted to have been likely to generate the data (pchisq(198.63, 42) ≈ 1). However, it certainly fits the data better than the null model (which means that a fixed mean probability of deletion is used for all cells): pchisq(958.66-198.63, 9) ≈ 1.

What can we see from the parameters of this model? catd and catm have different effects, but neither is very clearly significantly different from the effect of cata (the default value). All following environments seem distinctive. For class, all of class 2–4 seem to have somewhat similar effects, and we might model class as a two-way distinction. It seems like we cannot profitably drop a whole factor, but we can test that with the anova function to give an analysis of deviance table, or the drop1 function to try dropping each factor:

    > anova(ced.logr, test="Chisq")
    Analysis of Deviance Table

    Model: binomial, link: logit

    Response: ced.del

    Terms added sequentially (first to last)

                  Df Deviance Resid. Df Resid. Dev P(>|Chi|)
    NULL                             51     958.66
    cat            4   314.88        47     643.79 6.690e-67
    follows        2   228.86        45     414.93 2.011e-50
    factor(class)  3   216.30        42     198.63 1.266e-46

    > drop1(ced.logr, test="Chis
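
To connect the fitted coefficients back to the theory in section 1, here is a minimal sketch in R. It assumes R's default treatment coding (each coefficient is an offset from the baseline level of its factor) and copies the estimates by hand from the summary(ced.logr) output, so it runs without cedegren.txt. The variable names (beta, odds.mult, z, p, o, p.fit, p.null) and the choice of cell (cat = n, follows = P, class = 2) are illustrative assumptions, not part of the original handout. The last two lines use lower.tail=FALSE to get upper-tail p-values, which are the complements of the pchisq values quoted in the text.

```r
# Estimates copied by hand from the summary(ced.logr) output.
beta <- c("(Intercept)"    = -1.31827,
          "catd"           = -0.16931,
          "catm"           =  0.17858,
          "catn"           =  0.66672,
          "catv"           = -0.76754,
          "followsP"       =  0.95255,
          "followsV"       =  0.53408,
          "factor(class)2" =  1.27045,
          "factor(class)3" =  1.04805,
          "factor(class)4" =  1.37425)

# Equation (3): each coefficient multiplies the odds by exp(beta).
odds.mult <- exp(beta)
print(round(odds.mult, 3))

# Equations (4) and (5) for one illustrative cell:
# cat = n, follows = P, class = 2 (all other dummy variables are 0).
z <- sum(beta[c("(Intercept)", "catn", "followsP", "factor(class)2")])
p <- plogis(z)      # logistic function: exp(z) / (1 + exp(z))
o <- p / (1 - p)    # odds recovered from p; equals exp(z)
print(c(logit = z, odds = o, prob = p))

# Upper-tail chi-square p-values for the two deviance comparisons.
p.fit  <- pchisq(198.63, df = 42, lower.tail = FALSE)          # model vs. maximal model
p.null <- pchisq(958.66 - 198.63, df = 9, lower.tail = FALSE)  # null model vs. model
print(c(p.fit = p.fit, p.null = p.null))
```

With the fitted model object available, exp(coef(ced.logr)) and predict(ced.logr, type="response") should give the same kind of quantities directly; the hand-copied vector is used here only to keep the sketch self-contained.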
