Basic Statistics

Types of Data
Data
• Variable: Discrete, Continuous
• Attribute: Nominal, Ordinal

Variable and Attribute Data
Quantitative data are called variable data. These measured data answer questions such as "how long?", "what volume?" and "how much time?".
Qualitative data are called attribute data. These counted data answer questions such as "how many?" or "how often?".

Example: traveling to work
Variable data (arrival time):
Day   Time
1     8.05
2     7.55
3     8.15
4     7.45
Attribute data: for each of days 1 to 4, the arrival is recorded with a check mark as either Late or On time.

Continuous and Discrete Data
Continuous variable
- Measured on an infinitely divisible scale (decimals or a continuum).
- Data are obtained by measuring.
- Data can be displayed in a trend chart.
- E.g., temperature, time and inside diameter (ID).
Discrete variable
- Cannot be plotted on an infinitely divisible scale; data are obtained by counting.
- Data are displayed in a bar chart.
- E.g., number of defects or number of part failures: subdivisions such as 1.5 defects are meaningless.

Nominal and Ordinal Data
Nominal
- Categorical, e.g., male and female, or on time and late.
- No ordering scheme is possible.
- No value is greater than another.
- E.g., a box of cassettes contained the following colors:
  Blue     17
  Yellow   11
  Black    10
Ordinal
- Categorical and orderable.
- E.g., rating a soft drink on a scale of 1 to 10, where 10 is the highest grade.
- E.g., product defects are tabulated as follows:
  Defect class   Count
  A              1’16
  B              132
  C              942

Statistics
Statistics is the branch of science that deals with the collection, presentation, analysis and interpretation of data for the purpose of decision-making and problem-solving.
Statistics is a critical skill in quality improvement, because statistical techniques can be used to describe and understand variability.

Population vs Sample
- Population: the entire set of measurements of interest.
- Sample: a subset of data from the population.
- Parameters: numerical measures of a population (μ, σ).
- Statistics: numerical measures of a sample (x̄, s).

Parameters vs Statistics
Measure              Parameter (population)   Statistic (sample)
Mean                 μ                        x̄
Variance             σ²                       s²
Standard deviation   σ                        s

Numerical Measures
Numerical measures describe the characteristics of a data set.
Key numerical measures:
• Measures of location (central tendency)
• Measures of dispersion (variation)
• Measures of shape (distribution)

Measures of Location
• Mean
• Median
• Mode

Mean
The mean is the average of the observations in a sample of size n: x̄ = (x₁ + x₂ + … + xₙ) / n.
Example: heights (m) of the 10 employees in the SSI room:
x̄ = (1.65 + 1.68 + 1.71 + 1.65 + 1.67 + 1.65 + 1.68 + 1.62 + 1.60 + 1.65) / 10 = 16.56 / 10 = 1.656

Median
The median is the middle value of a data set sorted in either ascending or descending order. If the number of data points is even, the median lies halfway between the two middle data points, i.e., it is their average.
The advantage of the median is that it is not much influenced by very high or very low values.
1.60 1.62 1.65 1.65 1.65 1.66 1.67 1.68 1.68 1.71
Median = (1.65 + 1.66) / 2 = 1.655

Mean vs Median: Example 1
If the sample observations are 1, 3, 4, 2, 7, 8, 6, the sample mean and median are 4.4 (31/7, rounded) and 4, respectively. Both quantities give a reasonable measure of the central tendency of the data.
If the last observation is changed so that the data are 1, 3, 4, 2, 7, 8, 2450, the sample mean becomes 353.6 (2475/7, rounded) while the sample median remains unchanged at 4.

Mode
The mode is the value that occurs most frequently in a set of sample data points. The mode may be unique, or there may be more than one mode. Sometimes no mode exists.
1.60 1.62 1.65 1.65 1.65 1.66 1.67 1.68 1.68 1.71
Mode = 1.65
Like the median, the mode is not much influenced by very high or very low values.

Mode: Example 2
If the sample observations are 3, 6, 9, 3, 5, 8, 3, 4, 6, 3, 1, 10, the sample mode is 3, since it occurs four times.
If the sample observations are 3, 6, 9, 3, 5, 8, 3, 4, 6, 3, 1, 10, 6, 2, 5, 6, the sample modes are 3 and 6, since each occurs four times.
If the sample observations are 1, 3, 4, 2, 7, 6, 8, the sample mode does not exist.

Measures of Dispersion
• Range
• Variance
• Standard deviation

1) Range
- The difference between the largest and the smallest sample observations.
- Example: heights of the 10 employees in the SSI room:
1.60 1.62 1.65 1.65 1.65 1.66 1.67 1.68 1.68 1.71
Range = max value − min value = 1.71 − 1.60 = 0.11

Range: Information Loss
Analyze two sets of observations: 1, …
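The mean and median calculations worked through above can be reproduced with a short Python sketch. It keeps two lists because the ten heights on the Mean slide (four values of 1.65, summing to 16.56) differ by one value from the sorted list on the Median slide (which contains a 1.66). The helper names `sample_mean` and `sample_median` are illustrative, and the standard-library `statistics` module is used only as a cross-check.

```python
import math
import statistics

# Heights (m) of the 10 SSI-room employees, as listed on the Mean slide
heights_mean_slide = [1.65, 1.68, 1.71, 1.65, 1.67, 1.65, 1.68, 1.62, 1.60, 1.65]
# Sorted heights as printed on the Median slide (note the single 1.66)
heights_median_slide = [1.60, 1.62, 1.65, 1.65, 1.65, 1.66, 1.67, 1.68, 1.68, 1.71]


def sample_mean(xs):
    """Sum of the observations divided by the sample size n."""
    return sum(xs) / len(xs)


def sample_median(xs):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


x_bar = sample_mean(heights_mean_slide)
med = sample_median(heights_median_slide)

# Cross-check the hand-written helpers against the standard library
assert math.isclose(x_bar, statistics.mean(heights_mean_slide))
assert math.isclose(med, statistics.median(heights_median_slide))

print(f"mean   = {x_bar:.3f}")  # 16.56 / 10
print(f"median = {med:.3f}")    # (1.65 + 1.66) / 2
```

Sorting a copy inside `sample_median` leaves the caller's list untouched, so the same function works whether or not the data arrive pre-sorted.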
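Example 1 can be checked numerically as well. This sketch assumes Python's standard `statistics` module as the tool (the slides do not prescribe one) and uses the observation lists 1, 3, 4, 2, 7, 8, 6 and 1, 3, 4, 2, 7, 8, 2450, which reproduce the stated results of 31/7 ≈ 4.4 and 2475/7 ≈ 353.6.

```python
from statistics import mean, median

# Sample observations from Example 1, before and after changing the last value
original = [1, 3, 4, 2, 7, 8, 6]
changed = [1, 3, 4, 2, 7, 8, 2450]

for label, data in (("original", original), ("changed", changed)):
    # The mean uses every value, so one extreme observation pulls it far away;
    # the median depends only on the middle of the sorted data.
    print(f"{label:>8}: mean = {mean(data):.1f}, median = {median(data)}")
```

The single outlier moves the mean by roughly a factor of 80 while the median does not move at all, which is the robustness property the Median slide describes.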
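The mode examples above follow the convention that no mode exists when every value occurs only once. Python's `statistics.multimode` instead returns every value in that case, so this sketch uses a small hypothetical helper, `sample_modes`, built on `collections.Counter`. It assumes the garbled digits "110" at the end of the first list read as the two observations 1 and 10; the mode is 3 under either reading.

```python
from collections import Counter


def sample_modes(xs):
    """Return the sorted list of modes; an empty list means no mode exists.

    Follows the slide convention: if every value occurs only once, there is no mode.
    """
    if not xs:
        return []
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


examples = {
    "single mode": [3, 6, 9, 3, 5, 8, 3, 4, 6, 3, 1, 10],
    "two modes": [3, 6, 9, 3, 5, 8, 3, 4, 6, 3, 1, 10, 6, 2, 5, 6],
    "no mode": [1, 3, 4, 2, 7, 6, 8],
    "SSI heights": [1.60, 1.62, 1.65, 1.65, 1.65, 1.66, 1.67, 1.68, 1.68, 1.71],
}

for name, data in examples.items():
    modes = sample_modes(data)
    print(f"{name}: {modes if modes else 'no mode'}")
```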
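The range calculation above is a one-liner; a minimal sketch with a hypothetical `sample_range` helper, using the sorted heights printed on the Range slide:

```python
# Sorted heights (m) of the 10 SSI-room employees, as printed on the Range slide
heights = [1.60, 1.62, 1.65, 1.65, 1.65, 1.66, 1.67, 1.68, 1.68, 1.71]


def sample_range(xs):
    """Largest observation minus smallest observation."""
    return max(xs) - min(xs)


r = sample_range(heights)
# Binary floating point makes the raw difference slightly below 0.11,
# so format to two decimals for display
print(f"range = {r:.2f}")
```

Because only the two extreme values enter the calculation, every other observation can change without affecting the range.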