您好,欢迎访问三七文档
当前位置:首页 > 商业/管理/HR > 质量控制/管理 > k-mean-clustering
ResearchonK-meansclusteringalgorithmName:SunShanshanCONTENTSIntroductionClusteringK-meansClustingmethodsResultsConclusionsINTRODUCTIONWhatisclustering?Clusteringistheclassificationofobjectsintodifferentgroups,ormoreprecisely,thepartitioningofadatasetintosubsetssothatthedataineachsubset(ideally)sharesomecommontrait-oftenaccordingtosomedefineddistancemeasure.CommonDistancemeasures:•Distancemeasurewilldeterminehowthesimilarityoftwoelementsiscalculatedanditwillinfluencetheshapeoftheclusters1.TheEuclideandistance(alsocalled2-normdistance)isgivenby:1.TheManhattandistance(alsocalled1-normdistance)isgivenby:||),(1ipiiyxyxd21)||(),(ipiiyxyxdINTRODUCTION3.Themaximumnormisgivenby:||max),(1iipiyxyxdINTRODUCTIONK-meansClusting:Thek-meansalgorithmisanalgorithmtoclusternobjectsbasedonattributesintokpartitons,wherekn.Itissimilartotheexpectation-maximizationalgorithmformixturesofGaussiansinthattheybothattempttofindthecentersofnaturalclustersinthedata.Itassumesthattheobjectattributesformavectorspace.INTRODUCTIONAnalgorithmforpartitioning(orclustering)NdatapointsintoKdisjointsubsetsSjcontainingdatapointssoastominimizethesum-of-squarescriterionwherexnisavectorrepresentingthethenthdatapointandujisthegeometriccentroidofthedatapointsinSj.INTRODUCTIONSimplyspeakingk-meansclusteringisanalgorithmtoclassifyortogrouptheobjectsbasedonattributes/featuresintoKnumberofgroup.Kispositiveintegernumber.Thegroupingisdonebyminimizingthesumofsquaresofdistancesbetweendataandthecorrespondingclustercentroid.INTRODUCTIONMETHODSHowtheK-MeanClusteringalgorithmworks?METHODSStep1:Beginwithadecisiononthevalueofknumberclusters.K=2Step2:Putanyinitialpartitionthatclassifiesthedataintokclusters.Youmayassignthetrainingsamplesrand-omly,orsystematicallyasthefollowing:1.Takethefirstktrainingsampleassingle-elementclustersMETHODS2.Assigneachoftheremaining(N-k)trainingsampletotheclusterwiththenearestcentroid.Aftereachassignment,recomputehecentroidofthegainingcluster.Step3:Takeeachsampleinsequenceandcomputeitsdist-ancefromthecentroidofeachoftheclusters.Ifasampleisnotcurrentlyintheclusterwiththeclosestcentroid,switchthissampletothatclusterandupdat-ethecentroidoftheclustergainingthenewsampleandtheclusterlosingthesample.METHODSMETHODSStep4.Repeatstep3untilconvergenceisachived,thatisuntilapassthroughthetrainingsamplecausesnonewassignments.Weobtainresultthatshownintheaboveprocesses.Comparingthegroupingoflastiterationandthisiterationrevealsthattheobjectsdoesnotmovegroupanymore.Thus,thecomputationofthek-meanclusteringhasreacheditsstabilityandnomoreiterationisneeded..Itisrelativelyefficientandfast.ItcomputesresultatO(t*k*n),wherenisnumberofobjectsorpoints,kisnumberofclustersandtisnumberofiterations.RESULTSRESULTSk-meansclusteringcanbeappliedtomachinelearningordataminingUsedonacousticdatainspeechunderstandingtoconvertwaveformsintooneofkcategories(knownasVectorQuantizationorImageSegmentation).AlsousedforchoosingcolorpalettesonoldfashionedgraphicaldisplaydevicesandImageQuantization.CONCLUSIONK-meansalgorithmisusefulforundirectedknowledgediscoveryandisrelativelysimple.K-meanshasfoundwidespreadusageinlotoffields,rangingfromunsupervisedlearningofneuralnetwork,Patternrecognitions,Classificationanalysis,Artificialintelligence,imageprocessing,machinevision,andmanyothers.
本文标题:k-mean-clustering
链接地址:https://www.777doc.com/doc-3244451 .html