2. Data Management and Descriptive Statistics
Winson Peng (彭泰权)
Fudan University, 2013 FIST Course: Communication Research Methods

Outline
What is Statistics?
Univariate Statistics
  o Central Tendency
  o Dispersion
  o Distribution
Data Management and Visualization in SPSS
  o Data Input
  o Data Transformation
    • RECODE
    • COMPUTE
  o Line, Pie, Histogram, Scatterplot, ...

What Is a Statistic?
A statistic is an estimated value, based on a sample, of a quantitative characteristic (i.e., a parameter) of the study population.
A descriptive statistic is concerned only with a quantitative characteristic of the sample.
An inferential statistic is concerned with the chances that a sample statistic correctly estimates a population parameter.

Key Terms
[Diagram: Population (Parameter), Sample (Statistic), Descriptive Statistic, Inferential Statistic; links labeled Sampling, Calculation, Estimation, Inference]

Descriptive Statistics
To describe
  o the distribution (mean, percentage, etc.) of some variables (e.g., age, media use, knowledge, opinion) of a sample;
  o the association (e.g., correlation) between/among variables of the sample;
  o the difference in some dependent variables between/among groups of an independent variable (e.g., control vs. experimental groups).

Inferential Statistics
To help determine:
  o if the distribution/association/difference found in the sample can be generalized to the population;
  o confidence level: the chances (e.g., 95%) that the finding from a sample is true in the population;
  o significance level: the complement of the confidence level (e.g., p < .05 corresponds to a confidence level of 95%);
  o confidence interval: the range (precision) of the prediction from sample to population.

Causal Analysis
To help determine
  o if the impact of the independent variable(s) on the dependent variable is statistically significant
    • T-test: difference between two groups;
    • Analysis of variance (ANOVA): difference among three or more groups;
    • Multiple regression: the impact of several independent variables on a dependent variable.

Univariate Analysis

Parameters vs. Statistics: Some Examples
                      Population Parameter    Sample Statistic
  Mean                $\mu$                   $\bar{x}$
  Variance            $\sigma^2$              $s^2$
  Standard Deviation  $\sigma$                $s$
  Correlation         $\rho$                  $r$

Descriptive Statistics
Univariate statistics (i.e., the distribution of a variable in a sample):
  o Central tendency: mean, percentage, etc.
  o Dispersion (spread, variability): variance, standard deviation, etc.
  o Distribution: skewness, kurtosis
Bivariate/multivariate statistics (i.e., a relationship between two or more variables in a sample):
  o Relationship between IV and DV: correlation, regression coefficient, etc.
  o Difference in DV between/among groups of IV: means, percentages, etc.

Measures of Central Tendency
Interval Scale:
  o Mean
  o Median
  o Mode
Nominal Scale:
  o Percentage (binomial)
  o Percentage (multinomial)

Mean (for Individual Cases)
  $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$    [5.1]

Mean (for Grouped Cases)
  $\bar{x} = \frac{1}{n} \sum_{j=1}^{k} f_j x_j$    [5.2]
where $x_j$ is the median of the j-th group, $f_j$ is the frequency of the j-th group, and j runs from 1 to k.

Example: Calculating the Mean from Grouped Data
  Group    Income          Median ($x_j$)   Freq. ($f_j$)   $x_j f_j$
  Low      < $1,000        $500             400             $200,000
  Middle   $1,000-2,999    $2,000           500             $1,000,000
  High     $3,000+         $4,000           100             $400,000
  Total                                     1,000           $1,600,000

  $\bar{x} = \frac{\$1{,}600{,}000}{1{,}000} = \$1{,}600$

Percentage
  $p_j = \frac{f_j}{n}$    [5.3]
where $p_j$ is the percentage of the j-th group, $f_j$ is the frequency of the group, and n is the sample size.

Measures of Distribution Shape
Skewness: the symmetry of a frequency distribution. A positive value indicates right skewness, whereas a negative value indicates left skewness. The larger the absolute value, the more asymmetric the distribution.
Kurtosis: the peakedness (as opposed to flatness) of a frequency distribution. The larger the value, the more concentrated around the mean the distribution is.
Five-number summary:
  o Minimum
  o First Quartile (25%)
  o Median (50%)
  o Third Quartile (75%)
  o Maximum

Comparing the Mode, Median, and Mean
Three factors in choosing a measure of central tendency
  1. Level of measurement
  2. Shape or form of the distribution of data
  3. Research objective

Level of Measurement
  Level of measurement   Mode   Median   Mean
  Nominal                Yes
  Ordinal                Yes    Yes
  Interval               Yes    Yes      Yes

Shape of the Distribution
In a symmetrical (unimodal) distribution, the mode, median, and mean have identical values.
In skewed data, the measures of central tendency differ.
  o Skewness is relevant only at the interval level.
The mean is heavily influenced by extreme outliers.
  o The median is the best measure in this situation.

Research Objective
The choice of reported central tendency depends on the level of precision required.
Most published research requires median and/or mean calculations.
In skewed data, the median gives a more balanced view.
For advanced statistical analyses, the mean is usually preferred.
In large datasets, the mean is the most stable measure.

Transformation of Data Distribution
Linear transformation (e.g., $x_i^* = a \pm x_i$) changes the mean of a distribution;
Nonlinear transformation changes the shape (skewness and/or kurtosis) of a distribution:
  o logarithm or square root reduces right skewness (e.g., $x_i^* = \ln(x_i)$, or $x_i^* = \sqrt{x_i}$);
  o anti-logarithm or square reduces left skewness (e.g., $x_i^* = e^{x_i}$, or $x_i^* = x_i^2$).

Graphic Presentation of Central Tendency and Distribution
Interval Scale
  o Frequency table
  o Histogram (with a continuous scale for the x-axis)
  o Line (with time on the x-axis)
Nominal Scale
  o Frequency table
  o Bar (with a discrete scale for the x-axis)
  o Pie (for 6 or fewer categories)

Frequency Table (Example)
  Value         Frequency   Percentage   Valid PCT   Cumulative PCT
  1 User        250         25%          28%         28%
  2 Potential   200         20%          22%         50%
  3 Nonuser     450         45%          50%         100%
  9 Missing     100         10%          -           -
  Total         1,000       100%         100%
"Percentage" is based on total cases; "Valid Percent" is based on valid cases; "Cumulative Percent," also based on valid cases, combines previous and current categories.

Measures of Dispersion
Measures of central tendency alone are an incomplete description.
Index of how scores are distributed around the c…
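
The arithmetic in the grouped-income example, the frequency table, and the transformation slide can be checked with a few lines of code. What follows is a minimal Python sketch, not part of the original lecture (whose outline works in SPSS). The figures for the first two checks are copied from the slides. The five-value sample used for the skewness check and the helper names frequency_table and skewness are hypothetical, chosen only for illustration. The skewness helper uses the plain moment formula; statistical packages may apply a small-sample correction, so their output can differ.

```python
import math

# --- Grouped mean, formula [5.2]: x_bar = (1/n) * sum_j f_j * x_j ---
# (label, group median x_j, frequency f_j), copied from the income example.
groups = [("Low", 500, 400), ("Middle", 2000, 500), ("High", 4000, 100)]

n = sum(f for _, _, f in groups)                  # sample size
weighted_sum = sum(x * f for _, x, f in groups)   # sum of x_j * f_j
grouped_mean = weighted_sum / n

# --- Frequency table: percentage, valid percent, cumulative percent ---
counts = {"User": 250, "Potential": 200, "Nonuser": 450}  # valid categories
missing = 100                                             # cases coded 9


def frequency_table(counts, missing):
    """Return rows of (label, freq, pct, valid_pct, cumulative_pct)."""
    valid = sum(counts.values())   # denominator for valid percent
    total = valid + missing        # denominator for plain percentage
    rows, cumulative = [], 0.0
    for label, freq in counts.items():
        pct = 100 * freq / total
        valid_pct = 100 * freq / valid
        cumulative += valid_pct
        rows.append((label, freq, pct, valid_pct, cumulative))
    return rows


table = frequency_table(counts, missing)


# --- Skewness before and after a log transformation ---
def skewness(values):
    """Moment-based skewness m3 / m2**1.5, with no small-sample correction."""
    size = len(values)
    mean = sum(values) / size
    m2 = sum((v - mean) ** 2 for v in values) / size
    m3 = sum((v - mean) ** 3 for v in values) / size
    return m3 / m2 ** 1.5


raw = [1, 10, 100, 1000, 10000]      # hypothetical right-skewed sample
logged = [math.log(v) for v in raw]  # x* = ln(x)
skew_raw = skewness(raw)
skew_log = skewness(logged)

if __name__ == "__main__":
    print(f"n = {n}, grouped mean = {grouped_mean}")
    for label, freq, pct, valid_pct, cum in table:
        print(f"{label:<10}{freq:>6}{pct:>7.0f}%{valid_pct:>7.0f}%{cum:>7.0f}%")
    print(f"skewness raw = {skew_raw:.3f}, after ln = {skew_log:.3f}")
```

The table rows exclude the missing cases from the valid and cumulative columns, which is why those columns reach 100% while the plain percentages of the three valid categories sum to only 90%. The sample for the skewness check is deliberately spaced in powers of ten, so its logarithms are evenly spaced and the right skew disappears after the transformation.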
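Title: [Fudan University First-Batch FIST Program, Communication Research Methods Lecture Notes] 2 - Data Management, Description, and Visualization Methods
Link: https://www.777doc.com/doc-5467621 .html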