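7. Categorical Variables in Regression Analysis
Zhang Lun (张伦), School of Journalism, Fudan University
2013 FIST Course: Communication Research Methods

Outline

- When the independent variable is categorical
  o Dummy coding
  o Weighted coding
  o Contrast coding
- When the dependent variable is categorical
  o Logistic regression model
    • Model building
    • Data analysis
    • Model assessment
    • Interpreting the results

Learning Objectives

- Understand the three methods for converting categorical independent variables in simple regression, and how the three methods resemble and differ from one another.
- Be able to analyze data with dummy coding and interpret the results correctly.
- Gain a preliminary grasp of the logistic regression model; be able to apply it to practical problems and interpret the results correctly.

Part 1: Categories as a Set of Independent Variables

Qualitative / Nominal Variables

- Make qualitative distinctions among the objects they describe.
  o E.g., religion, treatment groups in experiments, region of a country, ethnic group, occupation.
- Groups are mutually exclusive and exhaustive.
- The categories of a nominal variable have no quantitative relationship to one another, so we cannot ask how much Y changes on average when X changes by one unit.
- Instead, the category is the unit of comparison: we compare the categories with respect to their effect on Y.

The Basic Idea Behind Dummy Variables

- Represent categories with numbers:
  o Use k dummy variables, each taking the value 0 or 1, to represent membership in each of the k categories. A case is assigned 1 on a dummy variable when it belongs to the category that variable represents, and 0 otherwise.
- Multicollinearity: the k dummy variables are linearly dependent.
  o When one dummy variable equals 1, all the others must equal 0.
  o When all the others equal 0, the remaining one must equal 1.
  o One dummy variable must therefore be dropped, and dropping it loses no information.

Coding Systems: Representation of Categorical Variables

Represent the categorical variable (CV) quantitatively, "as numbers".

Coding system                Comparison made
Dummy-variable coding        Each group with the selected reference group
Unweighted effects coding    Each group with the unweighted group mean, rather than a selected group
Weighted effects coding      Each group with the weighted population mean
Contrast coding              Means of combined groups

- All four systems produce identical results for the overall effect of the nominal variable (R, R^2, and the F or t test of significance of the IV).
- All use g - 1 code variables to represent the g groups, each code variable representing one aspect of the distinctions among the g groups.

Dummy-variable Coding

One of the groups must be designated as the reference group.

A. Catholic as reference group
Religion      C1.p   C2.j   C3.o
Catholic       0      0      0
Protestant     1      0      0
Jewish         0      1      0
Other          0      0      1

B. Protestant as reference group
Religion      C1.c   C2.j   C3.o
Catholic       1      0      0
Protestant     0      0      0
Jewish         0      1      0
Other          0      0      1

C. Jewish as reference group
Religion      C1.c   C2.p   C3.o
Catholic       1      0      0
Protestant     0      1      0
Jewish         0      0      0
Other          0      0      1

D. Other as reference group
Religion      C1.c   C2.p   C3.j
Catholic       1      0      0
Protestant     0      1      0
Jewish         0      0      1
Other          0      0      0

C.c contrasts Catholic with the reference group; C.p contrasts Protestant with the reference group; C.j contrasts Jewish with the reference group.

Selection of Reference Group

- The reference group should serve as a useful comparison (e.g., a control group; the group expected to score highest or lowest on Y; a standard treatment).
- For clarity of interpretation of the results, the reference group should be well defined and not a "waste-basket" category (e.g., "Other").
- The reference group should not have a very small sample size relative to the other groups.

Model Estimation

$$\hat{y} = B_1 C_1 + B_2 C_2 + B_3 C_3 + B_0$$

- \hat{y}: the predicted value of the DV
- B_0: the intercept
- B_1: unstandardized regression coefficient for the first dummy code
- B_2: unstandardized regression coefficient for the second dummy code
- B_3: unstandardized regression coefficient for the third dummy code

Comparison of two groups when neither is the reference group:

$$t = \frac{B_i - B_j}{SE_{y-\hat{y}}\,\sqrt{\dfrac{1}{n_i} + \dfrac{1}{n_j}}}$$

- SE_{y-\hat{y}}: the standard error of estimate
- n_i, n_j: the sample sizes of groups i and j

Test and Result Interpretation

Number of categories: 2
  Coding scheme:   C1 = 1, C2 = 0
  Model:           Y = b0 + b1 C1
  H0:              b1 = 0
  Interpretation:  No difference in Y between categories 1 and 2.

Number of categories: 3
  Coding scheme:   C1 = 1, C2 = 1, C3 = 0
  Model:           Y = b0 + b1 C1 + b2 C2
  H0:              b1 = 0; b2 = 0
  Interpretation:  No difference in Y between categories 1 and 3; no difference between categories 2 and 3.

Number of categories: k (> 3)
  Coding scheme:   C1 = 1, C2 = 1, ..., Cg-1 = 1, Cg = 0
  Model:           Y = b0 + b1 C1 + b2 C2 + ... + bg-1 Cg-1
  H0:              b1 = 0; b2 = 0; ...; bg-1 = 0
  Interpretation:  No difference in Y between categories 1 and g; no difference between categories 2 and g; ...; no difference between categories g-1 and g.

Visualization of the Coefficients

[Figure: two panels plotting Y against the dummy code(s).]
- Two categories, Y = b0 + b1 C:
  o C = 0: Y = b0
  o C = 1: Y = b0 + b1
- Three categories, Y = b0 + b1 C1 + b2 C2:
  o C1 = C2 = 0: Y = b0
  o C1 = 1 and C2 = 0: Y = b0 + b1
  o C1 = 0 and C2 = 1: Y = b0 + b2

Model Assessment

To test the significance of R^2 against the nil hypothesis:

$$F = \frac{R^2\,(n - k - 1)}{(1 - R^2)\,k}$$

- k: the number of code variables
- The F statistic has k and n - k - 1 degrees of freedom.

Pearson Correlations of Dummy Variables with Y

- r_{Yi}: correlation of each of the g - 1 dummy codes with the DV
- r_{Yr}: correlation of the reference group with the DV
- P_r: proportion of the sample in the reference group
- P_i: proportion of the total sample in the group coded 1 on each dummy variable

$$r_{Yr} = -\,\frac{\sum_{i=1}^{g-1} r_{Yi}\,\sqrt{P_i(1 - P_i)}}{\sqrt{P_r(1 - P_r)}}$$

Correlation among Dummy-coded Variables

The correlation between two dummy codes (C_i, C_j):

$$r_{ij} = -\sqrt{\frac{P_i\,P_j}{(1 - P_i)(1 - P_j)}}$$

Significance test for a bivariate r:

$$t = \frac{r\,\sqrt{n - 2}}{\sqrt{1 - r^2}}, \qquad df = n - 2$$

Example

- IV: region (Region), population growth rate (Population_growthrate), income per capita (IncomePercapita)
- DV: adult literacy rate

Between-Subjects Factors

region   Value Label   N
1        AFR           37
2        AMR           22
3        EMR           8
4        EUR           22
5        SEAR          7
6        WPR           11

Tests of Between-Subjects Effects
Dependent Variable: Adult Literacy Rate

Source                   Type III Sum of Squares   df    Mean Square   F         Sig.
Corrected Model          31697.476 (a)             7     4528.211      21.959    .000
Intercept                117715.290                1     117715.290    570.851   .000
region                   6553.149                  5     1310.630      6.356     .000
IncomePercapita          991.287                   1     991.287       4.807     .031
population_growthrate    2606.036                  1     2606.036      12.638    .001
Error                    20414.805                 99    206.210
Total                    694564.050                107
Corrected Total          52112.282                 106

a. R Squared = .608 (Adjusted R Squared = .581)

Parameter Estimates
Dependent Variable: Adult Literacy Rate

Parameter                B          Std. Error   t        Sig.   95% CI Lower Bound   95% CI Upper Bound
Intercept                94.937     5.545        17.121   .000   83.935               105.940
[region=1]               -21.550    5.243        -4.110   .000   -31.954              -11.146
[region=2]               -1.307     5.303        -.246    .806   -11.829              9.216
[region=3]               -19.776    6.823        -2.899   .005   -33.313              -6.238
[region=4]               .005       5.930        .001     .999   -11.762              11.771
[region=5]               -11.123    7.031        -1.582   .117   -25.074              2.829
[region=6]               0 (a)      .            .        .      .                    .
IncomePercapita          .001       .001         2.193    .031   .000                 .002
population_growthrate    -6.866     1.931        -3.555   .001   -10.699              -3.034

a. This parameter is set to zero because it is redundant.

Region 6 (WPR) is the category whose parameter is set to zero, so it serves as the reference group: each [region=i] coefficient compares region i with WPR.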
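The dummy-coding procedure covered in this lecture (g - 1 code variables, a designated reference group, the F test of R^2, the t test for two non-reference groups, and the correlation between two dummy codes) can be sketched in Python. This is a minimal illustration that assumes NumPy is available; the data values and the names `religion`, `y`, `reference`, and `coded` are invented for the example and do not come from the lecture or its SPSS output.

```python
import numpy as np

# Hypothetical toy data (not from the lecture): group labels and an outcome Y.
religion = np.array(["Catholic"] * 3 + ["Protestant"] * 4 + ["Jewish"] * 2 + ["Other"] * 3)
y = np.array([10., 12., 14., 20., 22., 24., 26., 30., 34., 15., 16., 17.])

reference = "Catholic"                                      # designated reference group
coded = [g for g in np.unique(religion) if g != reference]  # the g-1 coded groups

# Dummy coding: one 0/1 column per non-reference group.
C = np.column_stack([(religion == g).astype(float) for g in coded])
X = np.column_stack([np.ones(len(y)), C])                   # column of ones carries B0

# Ordinary least squares: b[0] is B0, b[1:] are B1..B_{g-1}.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

n, k = len(y), C.shape[1]
resid = y - X @ b
ss_res = float(resid @ resid)
ss_tot = float(((y - y.mean()) ** 2).sum())
r2 = 1.0 - ss_res / ss_tot
f_stat = (r2 * (n - k - 1)) / ((1.0 - r2) * k)              # overall F test of R^2

# t for two non-reference groups i and j (here the first two coded groups).
se_est = np.sqrt(ss_res / (n - k - 1))                      # standard error of estimate
n_i, n_j = C[:, 0].sum(), C[:, 1].sum()
t_ij = (b[1] - b[2]) / (se_est * np.sqrt(1.0 / n_i + 1.0 / n_j))

# Correlation between two dummy codes, computed from group proportions alone.
p_i, p_j = n_i / n, n_j / n
r_ij = -np.sqrt(p_i * p_j / ((1.0 - p_i) * (1.0 - p_j)))

print(dict(zip(["B0"] + coded, np.round(b, 3))))
print("R2 =", round(r2, 4), "F =", round(f_stat, 3), "t_ij =", round(t_ij, 3))
```

When the dummy codes are the only predictors, as in this sketch, the intercept equals the mean of Y in the reference group and each coefficient equals the difference between a coded group's mean and the reference-group mean. Changing `reference` changes the individual coefficients but leaves R^2 and F unchanged.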
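Document title: [Fudan University First-Round FIST Program, Communication Research Methods Lecture Notes] 7 - Categorical Variables in Regression Analysis
Link: https://www.777doc.com/doc-5485879 .html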