Introduction to Data Mining
Pang-Ning Tan, Michael Steinbach, and Vipin Kumar; Pearson Education Ltd.
Chinese translation by Fan Ming et al., Posts & Telecom Press

Chapter 8 Cluster Analysis: Basic Concepts and Algorithms

8.1 Overview

Sunday, October 20, 2019   Introduction to Data Mining

What is Cluster Analysis?
- Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
- Intra-cluster distances are minimized; inter-cluster distances are maximized.

Applications of Cluster Analysis
- Understanding: group related documents for browsing, group genes and proteins that have similar functionality, or group stocks with similar price fluctuations.
- Summarization: reduce the size of large data sets.

Discovered clusters and their industry groups:
- Cluster 1 (Technology1-DOWN): Applied-Matl-DOWN, Bay-Network-Down, 3-COM-DOWN, Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, DSC-Comm-DOWN, INTEL-DOWN, LSI-Logic-DOWN, Micron-Tech-DOWN, Texas-Inst-Down, Tellabs-Inc-Down, Natl-Semiconduct-DOWN, Oracl-DOWN, SGI-DOWN, Sun-DOWN
- Cluster 2 (Technology2-DOWN): Apple-Comp-DOWN, Autodesk-DOWN, DEC-DOWN, ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, Computer-Assoc-DOWN, Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN, Motorola-DOWN, Microsoft-DOWN, Scientific-Atl-DOWN
- Cluster 3 (Financial-DOWN): Fannie-Mae-DOWN, Fed-Home-Loan-DOWN, MBNA-Corp-DOWN, Morgan-Stanley-DOWN
- Cluster 4 (Oil-UP): Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP, Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP

[Figure: Clustering precipitation in Australia]

What is not Cluster Analysis?
- Supervised classification: class label information is available.
- Simple segmentation: e.g., dividing students into different registration groups alphabetically, by last name.
- Results of a query: the groupings are the result of an external specification.

Notion of a Cluster can be Ambiguous
How many clusters?
[Figure: one set of points, shown grouped as two, four, and six clusters]

Types of Clusterings
- A clustering is a set of clusters.
- There is an important distinction between hierarchical and partitional sets of clusters:
  - Partitional clustering: a division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
  - Hierarchical clustering: a set of nested clusters organized as a hierarchical tree.
Partitional Clustering
[Figure: Original Points; A Partitional Clustering]

Hierarchical Clustering
[Figure: Traditional Hierarchical Clustering with its Traditional Dendrogram; Non-traditional Hierarchical Clustering with its Non-traditional Dendrogram; points p1 to p4]

Other Distinctions Between Sets of Clusters
- Exclusive versus non-exclusive
  In non-exclusive clusterings, points may belong to multiple clusters. This can represent multiple classes or "border" points.
- Fuzzy versus non-fuzzy
  In fuzzy clustering, a point belongs to every cluster with some weight between 0 and 1, and the weights must sum to 1. Probabilistic clustering has similar characteristics.
- Partial versus complete
  In some cases, we only want to cluster some of the data.
- Heterogeneous versus homogeneous
  Clusters of widely different sizes, shapes, and densities.

Types of Clusters
- Well-separated clusters
- Center-based clusters
- Contiguous clusters
- Density-based clusters
- Property or conceptual clusters
- Clusters described by an objective function

Types of Clusters: Well-Separated
Well-separated clusters: a cluster is a set of points such that any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster.
[Figure: 3 well-separated clusters]

Types of Clusters: Center-Based
Center-based clusters: a cluster is a set of objects such that an object in a cluster is closer (more similar) to the "center" of its cluster than to the center of any other cluster. The center of a cluster is often a centroid, the average of all the points in the cluster, or a medoid, the most "representative" point of a cluster.
[Figure: 4 center-based clusters]

Types of Clusters: Contiguity-Based
Contiguous clusters (nearest neighbor or transitive): a cluster is a set of points such that a point in a cluster is closer (or more similar) to one or more other points in the cluster than to any point not in the cluster.
[Figure: 8 contiguous clusters]

Types of Clusters: Density-Based
Density-based clusters: a cluster is a dense region of points that is separated from other high-density regions by regions of low density. This definition is used when the clusters are irregular or intertwined, and when noise and outliers are present.
[Figure: 6 density-based clusters]
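The center-based definition and the fuzzy-versus-non-fuzzy distinction above can be illustrated with a few small functions. This is a sketch under stated assumptions: Euclidean distance, points as coordinate tuples, and hypothetical helper names. The slides only require fuzzy weights in [0, 1] that sum to 1; the specific membership formula used here is the one from fuzzy c-means, swapped in as one common way to produce such weights.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of the points; need not be an actual data point."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def medoid(cluster):
    """The data point with the smallest total distance to all other points."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))


def assign_to_nearest(points, centers):
    """Center-based rule: index of the closest center for each point."""
    return [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points]


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership weights: each in [0, 1], summing to 1.

    m > 1 is the fuzzifier; larger m gives softer (more uniform) weights.
    """
    dists = [math.dist(point, c) for c in centers]
    on_center = [d == 0.0 for d in dists]
    if any(on_center):
        # Point coincides with one or more centers: split weight among them.
        hits = sum(on_center)
        return [1.0 / hits if hit else 0.0 for hit in on_center]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]


cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (14, 0)]
print(centroid(cluster))  # pulled toward the outlier (14, 0)
print(medoid(cluster))    # an actual point near the bulk of the data
print(fuzzy_memberships((1, 0), [(0, 0), (3, 0)]))
```

Here the centroid is (4.0, 0.0), which is not one of the data points, while the medoid is (2, 0). The point (1, 0) receives weights of about 0.8 and 0.2 toward the two centers, whereas a non-fuzzy, center-based assignment would place it entirely in the first cluster.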
Types of Clusters: Conceptual Clusters
Shared property or conceptual clusters: finds clusters that share some common property or represent a particular concept.
[Figure: 2 overlapping circles]

Types of Clusters: Objective Function
Clusters defined by an objective function:
- Finds clusters that minimize or maximize an objective function.
- Enumerate all possible ways of dividing the points into clusters and evaluate the "goodness" of each potential set of clusters by using the given objective function. (NP-hard)
- Can have global or local objectives.
  - Hierarchical clustering algorithms typically have local objectives.
  - Partitional algorithms typically have global objectives.
- A variation of the global objective function approach is to fit the data to a parameterized model.
  - Parameters for the model are determined from the data.
  - Mixture models assume that the data is a "mixture" of a number of statistical distributions.

Types of Clusters: Objective Function (continued)
- Map the clustering problem to a different domain and solve a related problem in that domain.
- Proximity ma…
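The brute-force approach described under "Clusters defined by an objective function" can be sketched for tiny inputs. The slides do not fix an objective, so this sketch assumes the sum of squared Euclidean distances to each cluster's centroid (SSE), a common global objective; `sse` and `partitions` are hypothetical helper names. The generator enumerates every split of the points into exactly k non-empty clusters, whose count is the Stirling number of the second kind S(n, k), which grows roughly like k^n / k!.

```python
import math


def sse(clusters):
    """Objective function: sum of squared distances to each cluster's centroid."""
    total = 0.0
    for cluster in clusters:
        n = len(cluster)
        center = tuple(sum(coords) / n for coords in zip(*cluster))
        total += sum(math.dist(p, center) ** 2 for p in cluster)
    return total


def partitions(items, k):
    """Yield every split of items into exactly k non-empty, unlabeled groups."""
    if k == 0:
        if not items:
            yield []
        return
    if len(items) < k:
        return
    first, rest = items[0], items[1:]
    # Case 1: `first` forms a group on its own; the rest form k - 1 groups.
    for groups in partitions(rest, k - 1):
        yield [[first]] + groups
    # Case 2: the rest form k groups and `first` joins one of them.
    for groups in partitions(rest, k):
        for i in range(k):
            yield groups[:i] + [[first] + groups[i]] + groups[i + 1:]


points = [(0, 0), (0, 1), (5, 5), (5, 6), (6, 5)]
best = min(partitions(points, 2), key=sse)  # exhaustive search
print(best, sse(best))
print(sum(1 for _ in partitions(list(range(10)), 3)))  # candidates for n=10, k=3
```

The search recovers the two visually obvious groups, with an SSE of about 1.83. Even 10 points and 3 clusters already give 9,330 candidate clusterings, so exhaustive enumeration quickly becomes infeasible; this is why practical partitional algorithms use heuristics that optimize the objective only locally.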