Least Angle Regression

Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani
Statistics Department, Stanford University
January 9, 2003

Abstract

The purpose of model selection algorithms such as All Subsets, Forward Selection, and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (“LARS”), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived.

(1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of Ordinary Least Squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods.

(2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm.

(3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a C_p estimate of prediction error; this allows a principled choice among the range of possible LARS estimates.

LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as Ordinary Least Squares applied to the full set of covariates.

1. Introduction

Automatic model-building algorithms are familiar, and sometimes notorious, in the linear model literature: Forward Selection, Backward Elimination, All Subsets regression, and various combinations are used to automatically produce “good” linear models for predicting a response y on the basis of some measured covariates x_1, x_2, ..., x_m. Goodness is often defined in terms of prediction accuracy, but parsimony is another important criterion: simpler models are preferred for the sake of scientific insight into the x-y relationship. Two promising recent model-building algorithms, the Lasso and Forward Stagewise linear regression, will be discussed here, and motivated in terms of a computationally simpler method called Least Angle Regression.

Least Angle Regression (“LARS”) relates to the classic model-selection method known as Forward Selection, or “forward stepwise regression”, described in Section 8.5 of Weisberg (1980): given a collection of possible predictors, we select the one having largest absolute correlation with the response y, say x_{j_1}, and perform simple linear regression of y on x_{j_1}. This leaves a residual vector orthogonal to x_{j_1}, now considered to be the response. We project the other predictors orthogonally to x_{j_1} and repeat the selection process. After k steps this results in a set of predictors x_{j_1}, x_{j_2}, ..., x_{j_k} that are then used in the usual way to construct a k-parameter linear model. Forward Selection is an aggressive fitting technique that can be overly greedy, perhaps eliminating at the second step useful predictors that happen to be correlated with x_{j_1}.

Forward Stagewise, as described below, is a much more cautious version of Forward Selection, which may take thousands of tiny steps as it moves toward a final model. It turns out, and this was the original motivation for the LARS algorithm, that a simple formula allows Forward Stagewise to be implemented using fairly large steps, though not as large as a classic Forward Selection, greatly reducing the computational burden. The geometry of the algorithm, described in Section 2, suggests the name “Least Angle Regression”. It then happens that this same geometry applies to another, seemingly quite different selection method called the Lasso (Tibshirani 1996). The LARS/Lasso/Stagewise connection is conceptually as well as computationally useful. The Lasso is described next, in terms of the main example used in this paper.

Table 1 shows a small part of the data for our main example.

           AGE   SEX   BMI    BP    ... Serum Measurements ...          Response
Patient    x_1   x_2   x_3    x_4   x_5    x_6    x_7   x_8   x_9   x_10    y
1          59    2     32.1   101   157    93.2   38    4     4.9   87      151
2          48    1     21.6   87    183    103.2  70    3     3.9   69      75
3          72    2     30.5   93    156    93.6   41    4     4.7   85      141
4          24    1     25.3   84    198    131.4  40    5     4.9   89      206
5          50    1     23.0   101   192    125.4  52    4     4.3   80      135
6          23    1     22.6   89    139    64.8   61    2     4.2   68      97
...        ...   ...   ...    ...   ...    ...    ...   ...   ...   ...     ...
441        36    1     30.0   95    201    125.2  42    5     5.1   85      220
442        36    1     19.6   71    250    133.2  97    3     4.6   92      57

Table 1. Diabetes study. 442 diabetes patients were measured on 10 baseline variables. A prediction model was desired for the response variable, a measure of disease progression one year after baseline.

Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline. The statisticians were asked to construct a model that predicted response y from covariates x_1, x_2, ..., x_10. Two hopes were evident here, that the model would produce accurate baseline predictions of response for future patients, and also that the form of the model would suggest which covariates were important factors in disease progression.

The Lasso is a constrained version of ordinary least squares (OLS). Let x_1, x_2, ..., x_m be n-vectors representing the covariates, m = 10 and n = 442 in the diabetes study, and y the vector of responses for the n cases. By location and scale transformations we can always assume that the covariates have been standardiz
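
As an illustration only, the classic Forward Selection procedure described in the Introduction (pick the predictor with largest absolute correlation with the current response, regress on it, take the residual as the new response, project the remaining predictors orthogonally to the chosen one, and repeat) might be sketched in Python with NumPy as follows. The function names `standardize` and `forward_selection`, the tolerance `1e-12`, and the particular standardization used (centered columns scaled to unit length, centered response) are assumptions made for this sketch, not definitions taken from the text above.

```python
import numpy as np


def standardize(X, y):
    """Center y; center each column of X and scale it to unit length.

    Assumption for this sketch: 'standardized' means mean 0 and unit norm.
    """
    Xc = X - X.mean(axis=0)
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    return Xs, y - y.mean()


def forward_selection(X, y, k):
    """Return the column indices chosen by k steps of Forward Selection."""
    X, r = standardize(np.asarray(X, dtype=float), np.asarray(y, dtype=float))
    m = X.shape[1]
    chosen = np.zeros(m, dtype=bool)   # marks predictors already selected
    selected = []
    for _ in range(k):
        norms = np.linalg.norm(X, axis=0)
        usable = (norms > 1e-12) & ~chosen
        # Absolute correlation with the current response, up to the common
        # factor 1/||r||, which does not change which predictor is largest.
        score = np.zeros(m)
        score[usable] = np.abs(X[:, usable].T @ r) / norms[usable]
        j = int(np.argmax(score))
        selected.append(j)
        chosen[j] = True
        u = X[:, j] / norms[j]
        # Simple regression on x_j: the residual becomes the new response.
        r = r - u * (u @ r)
        # Project every predictor orthogonally to x_j before the next step.
        X = X - np.outer(u, u @ X)
    return selected
```

Calling `forward_selection(X, y, k)` on an n-by-m covariate array returns the selected column indices in order of entry; fitting the final k-parameter model on those columns is left to an ordinary least squares routine. The sketch makes the greediness visible: once x_j is chosen, any predictor correlated with it loses most of its length in the projection step and is unlikely to be picked next.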