K-means Clustering

K-means clustering is a clustering algorithm and a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
-- From Wikipedia

Algorithm Procedure

1. Randomly select K points from the data set as the initial cluster centers. (This is what the "K" in K-means refers to.)
2. Assign each point in the data set to the closest cluster, based on the Euclidean distance between the point and each cluster center.
3. Recompute each cluster's center as the average of the points in that cluster.
4. Repeat steps 2 and 3 until the new cluster centers equal the previous centers, or move by less than a specified threshold. The clustering is then finished.

Example

How can the points A, B, ..., J be grouped into two clusters?

[Figure: the ten points A to J plotted in the plane, including A(1,4) and B(2,4)]

Randomly choose A and B as the initial centers, with K = 2.

Steps 1 and 2. Here d_AB denotes the distance from A to B. Each row lists the distances from one center to every point:

Center | A     B     C     D     E     F     G     H     I     J
A      | 0     1     1     1.41  2.24  3.61  4.47  5.39  4.24  5
B      | 1     0     1.41  1     2     2.83  3.61  4.47  3.61  4.24

So A and C form one cluster, and B, D, E, F, G, H, I and J form the other.

Step 3. Each new center is the mean of the coordinates of the n points in its cluster:

$$\text{center} = \left( \frac{1}{n}\sum_{i=1}^{n} x_i,\ \frac{1}{n}\sum_{i=1}^{n} y_i \right)$$

$$\text{center}_{A,C} = \left( \frac{1+1}{2},\ \frac{4+5}{2} \right) = (1,\ 4.5), \qquad \text{center}_{B,D,E,F,G,H,I,J} = (3.75,\ 2.875)$$

The new centers of the two clusters are α(1, 4.5) and β(3.75, 2.875).

Step 2 again, with α and β as the centers and K = 2:

Center | A     B     C     D     E     F     G     H     I     J
α      | 0.5   1.12  0.5   1.12  1.8   3.91  4.72  5.59  4.61  5.32
β      | 2.97  2.08  3.48  2.75  3.58  0.91  1.53  2.41  1.89  2.25

So A, B, C, D and E form one cluster, and F, G, H, I and J form the other.

Step 3 again:

$$P = \text{center}_{A,B,C,D,E} = (1.6,\ 4.8), \qquad Q = \text{center}_{F,G,H,I,J} = (4.8,\ 1.6)$$

The new centers of the two clusters are P(1.6, 4.8) and Q(4.8, 1.6).

Step 2 again, with P and Q as the centers and K = 2:

Center | A     B     C     D     E     F     G     H     I     J
P      | 1     0.89  0.63  0.45  1.26  3.69  4.40  5.22  4.49  5.10
Q      | 4.49  3.69  5.10  4.4   5.22  0.89  0.45  1.26  1     0.63

So A, B, C, D and E again form one cluster, and F, G, H, I and J form the other.

Step 3 again:

$$M = \text{center}_{A,B,C,D,E} = (1.6,\ 4.8), \qquad N = \text{center}_{F,G,H,I,J} = (4.8,\ 1.6)$$

The new centers M and N equal the previous centers P(1.6, 4.8) and Q(4.8, 1.6), so the algorithm stops.

Final

[Figure: the final two clusters, {A, B, C, D, E} and {F, G, H, I, J}]

Clustering finished!

Disadvantages

One of the main disadvantages of k-means is that you must specify the number of clusters (K) as an input to the algorithm. As designed, the algorithm cannot determine the appropriate number of clusters, and it relies on the user to choose this number in advance.

[Figure: the same data clustered with K=2 and with K=3]
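The four-step procedure above can be sketched in Python as follows. This is a minimal illustration and not the author's code. The names `POINTS`, `random_init`, `mean` and `kmeans` are hypothetical. Only A(1,4) and B(2,4) come from the slides; the coordinates of C to J are an assumption, inferred because they reproduce every distance in the example's tables. To mirror the worked example, A and B are passed as the initial centers instead of a random pick.

```python
import math
import random

# Coordinates of the example points. A(1,4) and B(2,4) are given in the
# slides; C to J are an ASSUMPTION, inferred so that they reproduce the
# distance tables of the worked example.
POINTS = {
    "A": (1, 4), "B": (2, 4), "C": (1, 5), "D": (2, 5), "E": (2, 6),
    "F": (4, 2), "G": (5, 2), "H": (6, 2), "I": (4, 1), "J": (5, 1),
}


def random_init(points, k, seed=None):
    """Step 1: pick K distinct data points at random as the initial centers."""
    return random.Random(seed).sample(list(points.values()), k)


def mean(coords):
    """Step 3 helper: the center of a cluster is the mean of its points."""
    n = len(coords)
    return (sum(x for x, _ in coords) / n, sum(y for _, y in coords) / n)


def kmeans(points, centers, threshold=1e-9, max_iter=100):
    """Run steps 2 to 4 from the given initial centers.

    Returns (clusters, centers), where clusters[i] lists the names of the
    points assigned to centers[i].
    """
    centers = list(centers)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest center (Euclidean distance).
        clusters = [[] for _ in centers]
        for name, p in points.items():
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(name)
        # Step 3: recompute each center as the mean of its members
        # (an empty cluster simply keeps its old center).
        new_centers = [
            mean([points[n] for n in members]) if members else old
            for members, old in zip(clusters, centers)
        ]
        # Step 4: stop once no center has moved by more than the threshold.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= threshold:
            break
    return clusters, centers


# Mirror the worked example: A and B are the initial centers.
clusters, centers = kmeans(POINTS, [POINTS["A"], POINTS["B"]])
print(clusters)  # [['A', 'B', 'C', 'D', 'E'], ['F', 'G', 'H', 'I', 'J']]
print(centers)   # [(1.6, 4.8), (4.8, 1.6)]
```

For a random start as in step 1, call `kmeans(POINTS, random_init(POINTS, 2))`. The `threshold` parameter plays the role of the "specified threshold" in step 4.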
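The distance tables in the example can be recomputed with a short helper. This covers steps 1 and 2 with A and B as centers, and the second round with α and β. The sketch assumes the same inferred coordinates for C to J; `distance_table` and `assign` are hypothetical names. Distances are rounded to two decimals to match the slides, while the assignment compares the unrounded values.

```python
import math

# Hypothetical coordinates: A(1,4) and B(2,4) come from the example; C to J
# are an ASSUMPTION, inferred so that they reproduce the slides' tables.
POINTS = {
    "A": (1, 4), "B": (2, 4), "C": (1, 5), "D": (2, 5), "E": (2, 6),
    "F": (4, 2), "G": (5, 2), "H": (6, 2), "I": (4, 1), "J": (5, 1),
}


def distance_table(centers, points):
    """One row per center: Euclidean distance to every point, rounded to 2 d.p."""
    return {
        label: [round(math.dist(c, p), 2) for p in points.values()]
        for label, c in centers.items()
    }


def assign(centers, points):
    """Step 2: map each point name to the label of its nearest center."""
    return {
        name: min(centers, key=lambda label: math.dist(p, centers[label]))
        for name, p in points.items()
    }


# Round 1: A and B are the initial centers.
first = {"A": POINTS["A"], "B": POINTS["B"]}
# Round 2: the recomputed centers alpha and beta.
second = {"alpha": (1, 4.5), "beta": (3.75, 2.875)}

for centers in (first, second):
    for label, row in distance_table(centers, POINTS).items():
        print(label, row)
    print(assign(centers, POINTS))
```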
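Step 3 is plain arithmetic on cluster means. The slides give only the results for β, P and Q. The worked sums below assume coordinates for B to J that are inferred from the distance tables and are not stated in the slides, so treat them as an illustration of how those results arise:

```latex
% ASSUMED coordinates (inferred from the distance tables, not given in the slides):
% A(1,4), B(2,4), C(1,5), D(2,5), E(2,6), F(4,2), G(5,2), H(6,2), I(4,1), J(5,1)
\beta = \left( \frac{2+2+2+4+5+6+4+5}{8},\ \frac{4+5+6+2+2+2+1+1}{8} \right)
      = \left( \frac{30}{8},\ \frac{23}{8} \right) = (3.75,\ 2.875)

P = \left( \frac{1+2+1+2+2}{5},\ \frac{4+4+5+5+6}{5} \right)
  = \left( \frac{8}{5},\ \frac{24}{5} \right) = (1.6,\ 4.8)

Q = \left( \frac{4+5+6+4+5}{5},\ \frac{2+2+2+1+1}{5} \right)
  = \left( \frac{24}{5},\ \frac{8}{5} \right) = (4.8,\ 1.6)
```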
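The disadvantage can be made concrete by running the same procedure on the example data with K=2 and with K=3. Nothing in the algorithm rejects either value; it simply returns as many clusters as it is given initial centers. This is a minimal sketch: the coordinates of C to J are inferred (an assumption), and the K=3 initial centers (A, I and H) are a hypothetical choice made to keep the run deterministic.

```python
import math

# Hypothetical coordinates: A(1,4) and B(2,4) come from the example; C to J
# are an ASSUMPTION, inferred from the worked example's distance tables.
POINTS = {
    "A": (1, 4), "B": (2, 4), "C": (1, 5), "D": (2, 5), "E": (2, 6),
    "F": (4, 2), "G": (5, 2), "H": (6, 2), "I": (4, 1), "J": (5, 1),
}


def kmeans(points, centers, max_iter=100):
    """Minimal K-means from fixed initial centers; K is len(centers).

    Returns a list of clusters, each a list of point names.
    """
    centers = list(centers)
    clusters = []
    for _ in range(max_iter):
        # Assign each point to its nearest center (Euclidean distance).
        clusters = [[] for _ in centers]
        for name, p in points.items():
            k = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[k].append(name)
        # Move each center to the mean of its members; an empty cluster
        # keeps its old center.
        new_centers = []
        for members, old in zip(clusters, centers):
            if members:
                xs = [points[n][0] for n in members]
                ys = [points[n][1] for n in members]
                new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centers.append(old)
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return clusters


# K must be supplied by the caller; the algorithm cannot choose it.
for init in ("AB", "AIH"):
    print(f"K={len(init)}:", kmeans(POINTS, [POINTS[c] for c in init]))
# K=2: [['A', 'B', 'C', 'D', 'E'], ['F', 'G', 'H', 'I', 'J']]
# K=3: [['A', 'B', 'C', 'D', 'E'], ['F', 'I', 'J'], ['G', 'H']]
```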
Document title: A simple example of the k-means algorithm
Link: https://www.777doc.com/doc-7508523 .html