Variable Selection and Bayesian Model Averaging in Case-Control Studies

Valerie Viallefont, INSERM, France
Adrian E. Raftery, University of Washington
Sylvia Richardson, INSERM, France

Technical Report no. 343
Department of Statistics, University of Washington
Box 354322, Seattle, WA 98195-4322, USA.
November 1998

Abstract

Covariate and confounder selection in case-control studies is most commonly carried out using either a two-step method or a stepwise variable selection method in logistic regression. Inference is then carried out conditionally on the selected model, but this ignores the model uncertainty implicit in the variable selection process, and so underestimates uncertainty about relative risks. We report on a simulation study designed to be similar to actual case-control studies. This shows that p-values computed after variable selection can greatly overstate the strength of conclusions. For example, for our simulated case-control studies with 1,000 subjects, of variables declared to be "significant" with p-values between .01 and .05, only 49% actually were risk factors when stepwise variable selection was used.

We propose Bayesian model averaging as a formal way of taking account of model uncertainty in case-control studies. This yields an easily interpreted summary, the posterior probability that a variable is a risk factor, and our simulation study indicates this to be reasonably well calibrated in the situations simulated. The methods are applied and compared in the context of a previously published case-control study of cervical cancer.

Contents

1 Introduction
2 MATERIALS AND METHODS
2.1 Bayesian Model Averaging
2.1.1 General Principles
2.1.2 BMA Inference for an Adjusted Relative Risk
2.2 Standard Methods
2.2.1 The Two-Step Procedure
2.2.2 Stepwise Backwards Selection
2.3 Design of the Simulation Study
2.3.1 Basis for the Simulation Study Design: The Case-Control Studies in AJE, 1996
2.3.2 The Variables
3 RESULTS
3.1 Posterior Probabilities and p-Values
3.1.1 Standard Methods
3.1.2 Bayesian Model Averaging
3.2 Estimation of the Coefficients
4 APPLICATION
4.1 Classical analyses
4.2 Bayesian Model Averaging
5 DISCUSSION
Appendix: Reducing the Set of Models
References

List of Tables

1 Distribution of the values of OR's of primary interest found in 49 case-control studies, for smaller and larger studies separately
2 Design of the simulations
3 Two-Step Method: Nominal significance levels and proportions of variables actually associated with the outcome
4 Stepwise Method: Nominal significance levels and proportions of variables actually associated with the outcome
5 Bayesian Model Averaging: Posterior probabilities and proportions of variables actually associated with the outcome
6 Estimation of Logistic Regression Coefficients: Sums of Squared Errors
7 Cervical cancer study: Adjusted effects published in [9]
8 Cervical cancer study analyses

1 Introduction

Case-control studies ([1], [2]) represent a high proportion of epidemiological practice. For example, at least 49 such studies were published in the American Journal of Epidemiology alone in 1996 (see below). The aim of case-control studies is to test the existence of possible risk factors of interest, and to estimate their association with the presence or absence of a disease, after adjusting for possible confounders. A sample of $n_1$ cases and $n_2$ controls is taken, where often $n_2$ is roughly an integer multiple of $n_1$ ($n_1 \approx n_2$ is common).

Although the sample is drawn based on the disease outcome, the sampling plan is much more efficient than random sampling. Remarkably, consistent and near-optimal estimates of adjusted relative risks can be obtained if the model used is logistic regression, namely

$$\log\left[\frac{\Pr(Y=1)}{\Pr(Y=0)}\right] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_q X_q, \tag{1}$$

where $Y$ is 1 if the disease is present and 0 if it is absent, $X_1$ is a dichotomous risk factor of interest, $X_2, \ldots, X_q$ are confounders, and $\beta_0, \beta_1, \ldots, \beta_q$ are regression parameters. An attraction of this model is that the adjusted relative risk, $\exp(\beta_1)$, is the same for all values of the confounders, and does not involve any coefficients other than $\beta_1$. This greatly facilitates reporting and interpretation of the results. This adjusted relative risk can be approximated by the odds ratio, namely

$$\frac{\mathrm{Odds}(Y=1 \mid X_1=1, X_2, \ldots, X_q)}{\mathrm{Odds}(Y=1 \mid X_1=0, X_2, \ldots, X_q)}, \tag{2}$$

where Odds = Probability / (1 − Probability).

The choice of the confounders, $X_2, \ldots, X_q$, to include is a major issue. In most studies, the potential confounders are numerous, including demographic, socioeconomic, familial disease history, smoking and other lifestyle variables, as well as medical measurements; often something like 20 to 50 such variables are initially considered. It is vital not to omit important confounders, which would point towards including all confounders considered, but doing so tends to lead to inefficient estimation, both in theory (e.g. [3], chap. 9) and in practice [4]. Thus, investigators have tended to use statistical methods to choose among the many confounders indicated by substantive considerations. The two most c
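
As a small numerical illustration of the logistic model (1) and the odds ratio (2) from the Introduction, the following Python sketch checks that the odds ratio for the risk factor X1 equals exp(β1) whatever values the confounders take. The coefficient values and the helper names (disease_probability, odds, odds_ratio) are hypothetical, chosen only for illustration; they do not come from the report.

```python
import math


def disease_probability(beta, x):
    """Pr(Y=1 | x) under logistic model (1); beta[0] is the intercept."""
    linear = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-linear))


def odds(p):
    """Odds = Probability / (1 - Probability)."""
    return p / (1.0 - p)


def odds_ratio(beta, confounders):
    """Odds ratio (2): exposed (X1=1) versus unexposed (X1=0), confounders held fixed."""
    exposed = odds(disease_probability(beta, [1.0] + list(confounders)))
    unexposed = odds(disease_probability(beta, [0.0] + list(confounders)))
    return exposed / unexposed


# Hypothetical coefficients: beta_0, beta_1 (risk factor), beta_2 and beta_3 (confounders).
beta = [-2.0, 0.7, 0.3, -0.5]

# Two arbitrary confounder settings (X2, X3); the odds ratio should not depend on them.
or_a = odds_ratio(beta, [0.0, 1.0])
or_b = odds_ratio(beta, [2.5, -1.0])

print(or_a, or_b, math.exp(beta[1]))
```

The three printed values should agree up to floating-point rounding, which is the property that makes exp(β1) convenient to report as a single adjusted summary.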