A General Coefficient of Similarity and Some of Its Properties
J. C. Gower
Biometrics, Vol. 27, No. 4 (Dec., 1971), pp. 857-871.
Stable URL: =0006-341X%28197112%2927%3A4%3C857%3AAGCOSA%3E2.0.CO%3B2-3
Biometrics is currently published by International Biometric Society.

A GENERAL COEFFICIENT OF SIMILARITY AND SOME OF ITS PROPERTIES

J. C. GOWER
Rothamsted Experimental Station, Harpenden, Herts., U.K.

SUMMARY

A general coefficient measuring the similarity between two sampling units is defined. The matrix of similarities between all pairs of sample units is shown to be positive semi-definite (except possibly when there are missing values). This is important for the multi-dimensional Euclidean representation of the sample and also establishes some inequalities amongst the similarities relating three individuals. The definition is extended to cope with a hierarchy of characters.

1. INTRODUCTION

A similarity coefficient measures the resemblance between two individuals based on either or both of two logically distinct kinds of information pertaining to v variables, allowing for possible missing information. First, there is information on the existence, or not, of the variables. In taxonomy, where similarity coefficients are often used, this may be the only kind of information used to build up a taxonomic classification. The taxonomist has the problem of deciding whether a character occurring in one group of organisms also occurs in another group; this is the so-called homology problem. A missing character should not be confused with missing information: a character is missing when it is known definitely not to exist. Missing information can occur, for example, with incomplete fossil material or with poor descriptions in the literature, from which the existence or otherwise of a character cannot be inferred. The other type of information pertains to observed values of qualitative or quantitative properties of existing characters. An absent character cannot have any associated properties, and this suggests that the two types of information might be viewed hierarchically, a topic returned to in section 4. A common simple situation occurs when
all information is of the presence/absence type (or comes from 2-level qualitative characters). This gives the familiar 2 × 2 association table shown in Table 1, where presence is denoted by + and absence by -. Many different coefficients have been derived from Table 1. Yule's early work on this subject was reviewed by Yates [1952]. More recently Sokal and Sneath [1963] discussed numerous association coefficients, not all of which have yet been used. We are not concerned here with recommending what coefficients should be used in different circumstances, but merely wish to describe a general coefficient that includes several existing ones as special cases and can therefore be used under many different circumstances. It is particularly suitable for inclusion in computer programs because it can cope with a variety of different data-types without any reprogramming, and also because the positive semi-definite property established in section 3 is a prerequisite for certain types of statistical and numerical analyses (Gower [1966]). This coefficient has been used since 1960 in various computer programs. To find out how it has behaved, the reader is referred to the asterisked references given at the end of this paper.

TABLE 1
NUMBERS OF CHARACTERS OCCURRING IN, OR ABSENT FROM, TWO INDIVIDUALS: a (+,+) COMMON TO BOTH INDIVIDUALS; b (-,+) AND c (+,-) OCCURRING IN ONLY ONE INDIVIDUAL; AND d (-,-) ABSENT FROM BOTH

                        Individual 1
                        +       -       Totals
    Individual 2   +    a       b       a+b
                   -    c       d       c+d
              Totals    a+c     b+d     v

2. THE DEFINITION OF SIMILARITY

2.1. Terminology

Dichotomous, qualitative, and quantitative variates are distinguished. The term dichotomous is reserved for characters that are either present or absent and whose absence in both of a pair of individuals is not taken as a match; when both levels of a two-level qualitative variate are to be treated on a par, the levels will be termed alternatives. A discussion of some of the considerations governing the choice of scoring the two levels of a response as dichotomous or as alternatives is deferred until section 4. Qualitative characters may have many levels (e.g. black, green, yellow, blue), but unlike the levels of quantitative characters they do not form an ordered set, although coded numerical values may be assigned to them for convenience in computing.

2.2. The calculation of similarity

Two individuals i and j may
be compared on a character k and assigned a score $s_{ijk}$: zero when i and j are considered different, and a positive fraction, or unity, when they have some degree of agreement or similarity. There are many ways of calculating $s_{ijk}$, some of which are described below.
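The distinction between dichotomous characters and alternatives can be made concrete with the counts a, b, c and d of Table 1. The Python sketch below is illustrative only: the function names `association_counts` and `similarity` are hypothetical, and it assumes that each character receives a score $s_{ijk}$ of 1 for a match and 0 otherwise, and that these scores are simply averaged over the characters that count as valid comparisons. Under that assumption a dichotomous character absent from both individuals is left out altogether, giving a/(a+b+c), whereas alternatives treat a (-,-) pair as a match, giving (a+d)/v.

```python
from typing import Sequence, Tuple


def association_counts(ind1: Sequence[bool], ind2: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Tally the cells a, b, c, d of the 2 x 2 association table (Table 1).

    True means a character is present (+), False that it is absent (-).
    Cells are labelled (individual 1, individual 2):
    a = (+,+), b = (-,+), c = (+,-), d = (-,-).
    """
    if len(ind1) != len(ind2):
        raise ValueError("both individuals must be scored on the same v characters")
    pairs = list(zip(ind1, ind2))
    a = sum(1 for p, q in pairs if p and q)
    b = sum(1 for p, q in pairs if not p and q)
    c = sum(1 for p, q in pairs if p and not q)
    d = sum(1 for p, q in pairs if not p and not q)
    return a, b, c, d


def similarity(ind1: Sequence[bool], ind2: Sequence[bool], dichotomous: bool) -> float:
    """Average 0/1 match scores over the characters that count as comparisons.

    Hypothetical helper; assumes scores are averaged over valid comparisons.
    dichotomous=True:  a (-,-) pair is not a match and is left out entirely.
    dichotomous=False: the two levels are alternatives, so (-,-) is a match.
    """
    a, b, c, d = association_counts(ind1, ind2)
    if dichotomous:
        matches, compared = a, a + b + c          # (-,-) pairs are excluded
    else:
        matches, compared = a + d, a + b + c + d  # all v characters are compared
    if compared == 0:
        raise ValueError("no character gives a valid comparison for this pair")
    return matches / compared


if __name__ == "__main__":
    x = [True, True, False, False, True]   # individual 1
    y = [True, False, True, False, False]  # individual 2
    print(association_counts(x, y))             # (1, 1, 2, 1)
    print(similarity(x, y, dichotomous=True))   # 0.25
    print(similarity(x, y, dichotomous=False))  # 0.4
```

Raising an error when no character yields a valid comparison is a design choice of this sketch, not something the passage prescribes; for the same pair of records, scoring the characters as alternatives rather than as dichotomous raises the similarity because the shared absence now counts as agreement.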