k-mean-clustering

ResearchonK-meansclusteringalgorithmName:SunShanshanCONTENTSIntroductionClusteringK-meansClustingmethodsResultsConclusionsINTRODUCTIONWhatisclustering?Clusteringistheclassificationofobjectsintodifferentgroups,ormoreprecisely,thepartitioningofadatasetintosubsetssothatthedataineachsubset(ideally)sharesomecommontrait-oftenaccordingtosomedefineddistancemeasure.CommonDistancemeasures:•Distancemeasurewilldeterminehowthesimilarityoftwoelementsiscalculatedanditwillinfluencetheshapeoftheclusters1.TheEuclideandistance(alsocalled2-normdistance)isgivenby:1.TheManhattandistance(alsocalled1-normdistance)isgivenby:||),(1ipiiyxyxd21)||(),(ipiiyxyxdINTRODUCTION3.Themaximumnormisgivenby:||max),(1iipiyxyxdINTRODUCTIONK-meansClusting:Thek-meansalgorithmisanalgorithmtoclusternobjectsbasedonattributesintokpartitons,wherekn.Itissimilartotheexpectation-maximizationalgorithmformixturesofGaussiansinthattheybothattempttofindthecentersofnaturalclustersinthedata.Itassumesthattheobjectattributesformavectorspace.INTRODUCTIONAnalgorithmforpartitioning(orclustering)NdatapointsintoKdisjointsubsetsSjcontainingdatapointssoastominimizethesum-of-squarescriterionwherexnisavectorrepresentingthethenthdatapointandujisthegeometriccentroidofthedatapointsinSj.INTRODUCTIONSimplyspeakingk-meansclusteringisanalgorithmtoclassifyortogrouptheobjectsbasedonattributes/featuresintoKnumberofgroup.Kispositiveintegernumber.Thegroupingisdonebyminimizingthesumofsquaresofdistancesbetweendataandthecorrespondingclustercentroid.INTRODUCTIONMETHODSHowtheK-MeanClusteringalgorithmworks?METHODSStep1:Beginwithadecisiononthevalueofknumberclusters.K=2Step2:Putanyinitialpartitionthatclassifiesthedataintokclusters.Youmayassignthetrainingsamplesrand-omly,orsystematicallyasthefollowing:1.Takethefirstktrainingsampleassingle-elementclustersMETHODS2.Assigneachoftheremaining(N-k)trainingsampletotheclusterwiththenearestcentroid.Aftereachassignment,recomputehecentroidofthegainingcluster.Step3:Takeeachsampleinsequenceandcomputeitsdist-ancefromthecentroidofeachoftheclusters.Ifasampleisnotcurrentlyintheclusterwiththeclosestcentroid,switchthissampletothatclusterandupdat-ethecentroidoftheclustergainingthenewsampleandtheclusterlosingthesample.METHODSMETHODSStep4.Repeatstep3untilconvergenceisachived,thatisuntilapassthroughthetrainingsamplecausesnonewassignments.Weobtainresultthatshownintheaboveprocesses.Comparingthegroupingoflastiterationandthisiterationrevealsthattheobjectsdoesnotmovegroupanymore.Thus,thecomputationofthek-meanclusteringhasreacheditsstabilityandnomoreiterationisneeded..Itisrelativelyefficientandfast.ItcomputesresultatO(t*k*n),wherenisnumberofobjectsorpoints,kisnumberofclustersandtisnumberofiterations.RESULTSRESULTSk-meansclusteringcanbeappliedtomachinelearningordataminingUsedonacousticdatainspeechunderstandingtoconvertwaveformsintooneofkcategories(knownasVectorQuantizationorImageSegmentation).AlsousedforchoosingcolorpalettesonoldfashionedgraphicaldisplaydevicesandImageQuantization.CONCLUSIONK-meansalgorithmisusefulforundirectedknowledgediscoveryandisrelativelysimple.K-meanshasfoundwidespreadusageinlotoffields,rangingfromunsupervisedlearningofneuralnetwork,Patternrecognitions,Classificationanalysis,Artificialintelligence,imageprocessing,machinevision,andmanyothers.

k-mean-clustering

免费阅读已结束，点击付费阅读剩下 ... 页

阅读已结束，您可以下载文档离线阅读

河南民安物业公司制度汇编

以现代物流理念经营铁路行包运输

美国投资银行公司价值评估全套教程第一章

世界十大汽车公司

美国喜达屋酒店亚太区机电系统标准

落实《长春市特殊教育三年发展规划》

演讲稿写作与演讲技巧(1)

资源与运营管理模拟试题1

南昌智通广场商业项目推广执行竞稿策略_尚驰_73P

国土资源部油气资源战略研究中心

相关文档

相关搜索