Part 2 – The simple linear regression model
STA 503

Outline
2.1 Galton's contribution
2.2 Modeling the relationship between two variables
2.3 The simple linear regression model
2.4 The method of least squares
2.5 Properties of the least-squares estimators
2.6 Fitted values and residuals
2.7 Estimation of $\sigma^2$
2.8 Estimated variances of $\hat{\beta}_0$ and $\hat{\beta}_1$
2.9 Fitting simple linear regression lines in SAS
2.10 The simple linear regression model with normal errors
2.11 Method of maximum likelihood estimation
2.12 Properties of maximum likelihood estimators
2.13 A simulation study

2.1 Galton's contribution
- Sir Francis Galton is credited with regression, although some controversy exists in this matter.
- He made significant contributions as a biologist and in the area of eugenics ("planned breeding of humans").
- His work in correlation and regression was motivated by his study of the sweet pea plant.
- He chose the sweet pea because the species self-fertilizes, which simplified the question of genetic contribution (a self-fertilized pea has only one parent plant).
- His insights came from examining plots and tabulations of the size of daughter peas against the size of mother peas.
- Source: Galton (1886).
- In his presentation of the data, he added a hand-drawn line to the plot that fit reasonably well. This line had a positive slope of 0.33.
- He noted that a natural process worked to "dampen" extreme outliers: the offspring of unusually large or small mothers tended to be less extreme.
- He coined the term "reversion" and the phrase "regression towards mediocrity"; this observation was later called "regression to the mean".
- Similar work was done regarding the physical attributes of humans.
- Source: Galton (1886).
- Much of his contribution to statistics was publicized by his student Karl Pearson.
- Source: Pearson and Lee (1903).

2.2 Modeling the relationship between two variables
- In some situations, scientific theory may suggest that two variables X and Y are functionally related, e.g., $Y = g(X)$.
- More often, the relationship between X and Y must be estimated from empirical evidence: $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, or equivalently $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, where n is the sample size.
- A statistical model assumes a particular form for the relationship between the predictor and the response, $Y_i = g(X_i) + \varepsilon_i$.
- This model represents the data with components for both systematic and random sources of variation.
- The deterministic (or systematic) part is captured by $g(X_i)$, and the random part is captured by $\varepsilon_i$.
- It is unrealistic to expect the observed values of Y to be perfectly related to the observed values of X through the function $g(\cdot)$. The term $\varepsilon_i$ expresses the fact that the relationship will not be perfect.
- In the context of this statistical model, the variable denoted Y is the response variable, also referred to as the outcome or dependent variable.
- The variable denoted X is the explanatory variable, also referred to as the predictor variable, covariate, or independent variable.
- We must be careful to distinguish between the response and explanatory variables, because our regression models do not treat the two variables symmetrically.
- Specifically, we model the probability distribution of Y for each value of X, that is, how the distribution of Y varies with X.
- What characteristics of the distribution of Y may depend on the value of X?
  - Location (e.g., does the mean of Y vary with X?)
  - Spread (e.g., standard deviation)
  - Shape (e.g., skewness)
- We must decide in advance which variable will be called "Y" and which "X".
- Examples of statistical models based on a single predictor that we will consider include:
  - The straight-line model, $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$,
  - The straight-line model with a transformed predictor, $Y_i = \beta_0 + \beta_1 \log(X_i) + \varepsilon_i$,
  - The quadratic model, $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$.
- Note that all statistical models fit to data are wrong and are only approximations to reality.
  - "All models are wrong, but some are useful." (George Box)
- What use is a model for the relationship between X and Y?
  - Description: A model can provide a simple, compact summary of the relationship, which can also be used for graphical visualization.
  - Interpretation and understanding: The parameter values and the form of the model may provide insight into the relationship.
  - Prediction: A model for the relationship of Y to X may allow prediction of a Y value given X, along with its uncertainty.
- Regression vs. causation.
  - Regression may establish a statistical relationship between Y and X.
  - Regression does not tell us whether X causes Y or Y causes X.

2.3 The simple linear regression model
- Consider the simple linear regression model for describing the relationship between X and Y:
  $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, 2, \ldots, n$,
  where $Y_i$ is the response for the ith observation, $\beta_0$ and $\beta_1$ are the regression coefficients, $X_i$ is the known (fixed) value of the predictor variable for the ith observation, and $\varepsilon_i$ is a random variable such that $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$, and $\varepsilon_i \perp \varepsilon_j$ (independent) for all $i \neq j$.
- The model therefore views $Y_i$ as the sum of a fixed (non-random) constant, $\beta_0 + \beta_1 X_i$, and an unobserved random variable, $\varepsilon_i$.
- Note that, because $\beta_0 + \beta_1 X_i$ is a fixed constant and $E(\varepsilon_i) = 0$,
  $E(Y_i) = E(\beta_0 + \beta_1 X_i + \varepsilon_i) = E(\beta_0) + E(\beta_1 X_i) + E(\varepsilon_i) = \beta_0 + \beta_1 X_i$
  and
  $\mathrm{Var}(Y_i) = \mathrm{Var}(\beta_0 + \beta_1 X_i + \varepsilon_i) = \mathrm{Var}(\varepsilon_i) = \sigma^2$.
- So, alternatively, we may state the model as: $Y_i$ is a random variable such that
  - The mean is $\beta_0 + \beta_1 X_i$,
  - The variance is $\sigma^2$, and
  - All the $Y_i$'s are independent of each other.
- As seen above, we assume a constant variance across all the $X_i$'s.
  - This assumption is needed primarily for inference and does not affect how we fit the model.
- Given that the relationship between X and Y is linear, a perfect fit is equivalent to $\varepsilon_i = 0$ for all i, that is, $\sigma^2 = 0$.
- The regression function is defined as $E(Y_i) = \beta_0 + \beta_1 X_i$.
  - This implies that the graph of $E(Y_i)$ versus $X_i$ is a straight line.
  - This relationship may be exact but is usually approximate.
- Regression coefficient interpretation:
  - $\beta_1$ = "slope", the change in $E(Y)$ when X changes by 1 unit.
  - $\beta_0$ = "intercept", $E(Y \mid X = 0)$.
    - Not interesting unless 0 is a meaningful value of X.
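Galton's observation in 2.1, that a natural process "dampens" extremes, can be reproduced with a small simulation. This is a sketch under stated assumptions, not Galton's data: mother and daughter sizes are hypothetical standardized normal variables, and each daughter's size is 0.33 times her mother's size plus independent noise, borrowing only the 0.33 slope of the hand-drawn line. The noise variance is set to $1 - 0.33^2$ so that both generations have the same spread; under that assumption, any shrinkage toward the mean comes from the slope, not from a change of scale.

```python
import random
import statistics


def simulate_regression_to_mean(n=20000, slope=0.33, seed=1):
    """Simulate hypothetical standardized mother/daughter sizes.

    Assumptions (illustrative only): mother size M ~ N(0, 1) and
    daughter size D = slope * M + noise, noise ~ N(0, 1 - slope**2),
    so that D also has variance 1.
    """
    rng = random.Random(seed)
    noise_sd = (1 - slope ** 2) ** 0.5
    mothers = [rng.gauss(0.0, 1.0) for _ in range(n)]
    daughters = [slope * m + rng.gauss(0.0, noise_sd) for m in mothers]
    return mothers, daughters


mothers, daughters = simulate_regression_to_mean()

# Select "extreme" mothers: more than 1.5 SD above the mean.
extreme = [(m, d) for m, d in zip(mothers, daughters) if m > 1.5]
mean_mother = statistics.fmean(m for m, _ in extreme)
mean_daughter = statistics.fmean(d for _, d in extreme)

print(f"extreme mothers: mean size {mean_mother:.2f} SD")
print(f"their daughters: mean size {mean_daughter:.2f} SD")
```

The daughters of the selected mothers average much closer to 0, roughly 0.33 times the mothers' average, even though the two generations have the same overall spread.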
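The question in 2.2 of which characteristics of the distribution of Y depend on X can be made concrete by summarizing Y separately at each value of X. The data in this sketch are hypothetical: at X = x, Y has mean 2 + 3x and standard deviation x, with right-skewed errors built from a shifted exponential variable. The helper `skewness` uses the simple moment ratio $m_3 / m_2^{3/2}$, one of several common skewness estimators.

```python
import random
import statistics


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean((v - mean) ** 2 for v in values)
    m3 = statistics.fmean((v - mean) ** 3 for v in values)
    return m3 / m2 ** 1.5


def conditional_summaries(data):
    """Group (x, y) pairs by x and summarize the distribution of Y at each x."""
    groups = {}
    for x, y in data:
        groups.setdefault(x, []).append(y)
    return {
        x: {
            "location": statistics.fmean(ys),  # mean of Y given X = x
            "spread": statistics.stdev(ys),    # standard deviation of Y given X = x
            "shape": skewness(ys),             # skewness of Y given X = x
        }
        for x, ys in sorted(groups.items())
    }


# Hypothetical data: mean 2 + 3x, SD equal to x, right-skewed errors
# (an Exp(1) variable shifted to mean 0, then scaled by x).
rng = random.Random(2)
data = [(x, 2 + 3 * x + x * (rng.expovariate(1.0) - 1.0))
        for x in (1, 2, 3) for _ in range(5000)]

for x, summary in conditional_summaries(data).items():
    print(x, {k: round(v, 2) for k, v in summary.items()})
```

Under these assumptions the location and the spread both change with x, while the shape (skewness near 2, the skewness of an exponential distribution) stays roughly the same.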
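The three single-predictor models listed in 2.2 differ only in their systematic part $g(X_i)$. One way to see the difference is to compute how much $g$ changes when X increases by one unit. The coefficient values `b0`, `b1`, `b2` and the helper `unit_changes` are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2 = 1.0, 2.0, -0.5


def straight_line(x):
    """Systematic part of the straight-line model: b0 + b1 * x."""
    return b0 + b1 * x


def log_predictor(x):
    """Straight line in log(x); math.log raises ValueError unless x > 0."""
    return b0 + b1 * math.log(x)


def quadratic(x):
    """Systematic part of the quadratic model: b0 + b1 * x + b2 * x**2."""
    return b0 + b1 * x + b2 * x ** 2


def unit_changes(g, xs):
    """Change in g when x increases by one unit, starting from each x in xs."""
    return [g(x + 1) - g(x) for x in xs]


xs = [1, 2, 3, 4]
for name, g in [("straight line", straight_line),
                ("log predictor", log_predictor),
                ("quadratic", quadratic)]:
    print(name, [round(d, 3) for d in unit_changes(g, xs)])
```

Only the straight-line model gives the same change (equal to `b1`) at every x; with the log-transformed predictor the change shrinks as x grows, and with the quadratic it depends on x through `b2`. The log model also requires $X_i > 0$.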
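The derivation of $E(Y_i)$ and $\mathrm{Var}(Y_i)$ in 2.3 can be checked numerically by holding the $X_i$ fixed and redrawing the errors many times. In this sketch the parameter values (`BETA0`, `BETA1`, `SIGMA`) and design points `X` are hypothetical, and the errors are drawn from a uniform distribution on $(-\sigma\sqrt{3}, \sigma\sqrt{3})$. That choice is an assumption made only to emphasize that the model as stated requires $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$, but not normality.

```python
import random
import statistics

# Hypothetical parameter values and fixed design points.
BETA0, BETA1, SIGMA = 10.0, 2.0, 3.0
X = [1.0, 2.0, 5.0, 10.0]


def draw_error(rng):
    """Mean-zero error with variance SIGMA**2.

    Uniform(-a, a) has variance a**2 / 3, so a = SIGMA * sqrt(3).
    The model only requires E = 0 and Var = SIGMA**2; normality is not assumed.
    """
    a = SIGMA * 3 ** 0.5
    return rng.uniform(-a, a)


def simulate_responses(reps=20000, seed=3):
    """Return, for each fixed X_i, `reps` independent draws of Y_i."""
    rng = random.Random(seed)
    return {x: [BETA0 + BETA1 * x + draw_error(rng) for _ in range(reps)]
            for x in X}


draws = simulate_responses()
for x, ys in draws.items():
    print(f"X={x:4.1f}  "
          f"mean(Y)={statistics.fmean(ys):6.2f} (theory {BETA0 + BETA1 * x:5.2f})  "
          f"var(Y)={statistics.variance(ys):5.2f} (theory {SIGMA ** 2:.2f})")
```

At every $X_i$ the empirical mean should be close to $\beta_0 + \beta_1 X_i$, while the empirical variance should be close to the same $\sigma^2$, matching the constant-variance assumption. Setting `SIGMA = 0.0` makes every simulated $Y_i$ fall exactly on the line, the perfect-fit case $\sigma^2 = 0$.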
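The slope and intercept interpretations in 2.3 follow directly from the regression function $E(Y) = \beta_0 + \beta_1 X$. A short derivation, valid for every value x:

```latex
\begin{aligned}
E(Y \mid X = x + 1) - E(Y \mid X = x)
  &= [\beta_0 + \beta_1 (x + 1)] - [\beta_0 + \beta_1 x] = \beta_1, \\
E(Y \mid X = 0) &= \beta_0 + \beta_1 \cdot 0 = \beta_0.
\end{aligned}
```

For instance, with hypothetical values $\beta_0 = 10$ and $\beta_1 = 2$, $E(Y \mid X = 3) = 16$ and $E(Y \mid X = 4) = 18$; the difference, 2, equals $\beta_1$.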