1  Cluster Analysis of Microarray Data
4/13/2009
Copyright © 2009 Dan Nettleton

2  Clustering
• Group objects that are similar to one another together in a cluster.
• Separate objects that are dissimilar from each other into different clusters.
• The similarity or dissimilarity of two objects is determined by comparing the objects with respect to one or more attributes that can be measured for each object.

3  Data for Clustering

                 attribute
object     1     2     3    ...    m
   1      4.7   3.8   5.9   ...   1.3
   2      5.2   6.9   3.8   ...   2.9
   3      5.8   4.2   3.9   ...   4.4
   .       .     .     .    ...    .
   n      6.3   1.6   4.7   ...   2.0

4  Microarray Data for Clustering
(Same table as slide 3.) Objects: genes. Attributes: time points. Entries: estimated expression levels.

5  Microarray Data for Clustering
(Same table as slide 3.) Objects: genes. Attributes: tissue types. Entries: estimated expression levels.

6  Microarray Data for Clustering
(Same table as slide 3.) Objects: genes. Attributes: treatment conditions. Entries: estimated expression levels.

7  Microarray Data for Clustering
(Same table as slide 3.) Objects: samples. Attributes: genes. Entries: estimated expression levels.

8  Clustering: An Example Experiment
• Researchers were interested in studying gene expression patterns in developing soybean seeds.
• Seeds were harvested from soybean plants at 25, 30, 40, 45, and 50 days after flowering (daf).
• One RNA sample was obtained for each level of daf.

9  An Example Experiment (continued)
• Each of the 5 samples was measured on two two-color cDNA microarray slides using a loop design.
• The entire process was repeated on a second occasion to obtain a total of two independent biological replications.

10  Diagram Illustrating the Experimental Design
[Figure: the five daf levels 25, 30, 40, 45, 50, shown once for Rep 1 and once for Rep 2.]

11  The daf means estimated for each gene from a mixed linear model analysis provide a useful summary of the data for cluster analysis.
[Figure: Normalized Data for One Example Gene. Labels: daf, Normalized Log Signal, Estimated Means + or - 1 SE.]

12  400 genes exhibited significant evidence of differential expression across time (p-value < 0.01, FDR = 3.2%). We will focus on clustering their estimated mean profiles.
[Figure: Normalized Data for One Example Gene. Labels: daf, Normalized Log Signal, Estimated Means + or - 1 SE.]

13  We build clusters based on the most significant genes rather than on all genes because...
• Much of the variation in expression is noise rather than biological signal, and we would rather not build clusters on the basis of noise.
• Some clustering algorithms will become computationally expensive if there are a large number of objects (gene expression profiles in this case) to cluster.

14  Estimated Mean Profiles for Top 36 Genes
[Figure]

15  Dissimilarity Measures
• When clustering objects, we try to put similar objects in the same cluster and dissimilar objects in different clusters.
• We must define what we mean by dissimilar.
• There are many choices.
• Let x and y denote m-dimensional objects:
      x = (x_1, x_2, ..., x_m)
      y = (y_1, y_2, ..., y_m)
  e.g., estimated means at m = 5 time points for a given gene.

16  Parallel Coordinate Plots
[Figure: two panels. Scatterplot with axes x1 and x2; Parallel Coordinate Plot with axes Coordinate and Value.]

17  These are parallel coordinate plots that each show one point in 5-dimensional space.

18  Euclidean Distance

\[ d_E(x, y) = \|x - y\| = \sqrt{\sum_{j=1}^{m} (x_j - y_j)^2} \]

1-Correlation

\[ d_{cor}(x, y) = 1 - r_{xy} = 1 - \frac{\sum_{j=1}^{m} (x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{m} (x_j - \bar{x})^2 \, \sum_{j=1}^{m} (y_j - \bar{y})^2}} \]

19  Euclidean Distance
[Figure: Scatterplot with axes x1 and x2, marking d_E(red, green), d_E(black, red), and d_E(black, green).]

20  1-Correlation Dissimilarity
[Figure: Parallel Coordinate Plot with axes Coordinate and Value.]
The black and green objects are close together and far from the red object.

21  Relationship between Euclidean Distance and 1-Correlation Dissimilarity

Let \tilde{x}_j = (x_j - \bar{x}) / s_x and let \tilde{x} = (\tilde{x}_1, \tilde{x}_2, ..., \tilde{x}_m).
Let \tilde{y}_j = (y_j - \bar{y}) / s_y and let \tilde{y} = (\tilde{y}_1, \tilde{y}_2, ..., \tilde{y}_m).
Here s_x and s_y are the sample standard deviations (divisor m - 1), so that \sum_j \tilde{x}_j^2 = \sum_j \tilde{y}_j^2 = m - 1 and \sum_j \tilde{x}_j \tilde{y}_j = (m - 1) r_{xy}.

\[
\|\tilde{x} - \tilde{y}\|^2
  = \sum_{j=1}^{m} (\tilde{x}_j - \tilde{y}_j)^2
  = \sum_{j=1}^{m} (\tilde{x}_j^2 + \tilde{y}_j^2 - 2 \tilde{x}_j \tilde{y}_j)
  = \sum_{j=1}^{m} \tilde{x}_j^2 + \sum_{j=1}^{m} \tilde{y}_j^2 - 2 \sum_{j=1}^{m} \tilde{x}_j \tilde{y}_j
  = 2 (m - 1)(1 - r_{xy})
\]

22  Thus Euclidean distance for standardized objects is proportional to the square root of the 1-correlation dissimilarity.
• We will standardize our mean profiles so that each profile has mean 0 and standard deviation 1 (i.e., we will convert each x to \tilde{x}).
• We will cluster based on the Euclidean distance between standardized profiles.
• Original mean profiles with similar patterns are "close" to one another using this approach.

23  Clustering methods are often divided into two main groups.
   1. Partitioning methods that attempt to optimally separate n objects into K clusters.
   2. Hierarchical methods that produce a nested sequence of clusters.

24  Some Partitioning Methods
   1. K-Means
   2. K-Medoids
   3. Self-Organizing Maps (SOM) (Kohonen, 1990; Tamayo, P. et al., 1999)

25  K Medoids Clustering
   0. Choose K of the n objects to represent K cluster centers (a.k.a. medoids).
   1. Given a current set of K medoids, assign each object to the nearest medoid to produce an assignment of objects to K clusters.
   2. For a given assignment of objects to K clusters, find the new medoid for each cluster by finding the object in the cluster that is the closest on average to all other objects in its cluster.
   3. Repeat steps 1 and 2 until the cluster assignments do not change.

26  Example of K Medoids Clustering
[Figure]

27  Start with K Medoids
[Figure]

28-31  Assign Each Point to Closest Medoid
[Figures: successive assignment steps]

32  Find New Medoid for Each Cluster
New medoids have smallest average distance to other points in their cluster.

33-34  Reassign Each Point to Closest Medoid
[Figures: successive reassignment steps]

35  Find New Medoid for Each Cluster
[Figure]

36  Reassign Each Point to Closest Medoid
No reassignment is needed, so the procedure stops.

37  Public Cluster 1 of 3
[Figure: labels DAF, Standardized Me…
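
The K-medoids steps on slide 25, applied to standardized profiles as on slide 22, can be sketched in Python roughly as follows. This is a minimal illustration rather than the software used for the slides: the function names `standardize` and `k_medoids`, the `initial_medoids` argument, the `max_iter` cap, and the six toy profiles are all assumptions made up for the example. It also checks numerically the slide 21 identity, squared Euclidean distance between standardized profiles = 2(m - 1)(1 - r).

```python
import numpy as np


def standardize(profiles):
    """Give each row (profile) mean 0 and sample standard deviation 1."""
    profiles = np.asarray(profiles, dtype=float)
    centered = profiles - profiles.mean(axis=1, keepdims=True)
    return centered / profiles.std(axis=1, ddof=1, keepdims=True)


def k_medoids(dist, initial_medoids, max_iter=100):
    """Alternate steps 1 and 2 on an n-by-n dissimilarity matrix `dist`."""
    medoids = np.array(initial_medoids)  # step 0: K chosen objects
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each object to its nearest medoid.
        new_labels = np.argmin(dist[:, medoids], axis=1)
        # Step 3: stop once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 2: the new medoid of a cluster is the member with the
        # smallest total (hence average) distance to the other members.
        for k in range(len(medoids)):
            members = np.flatnonzero(labels == k)
            if members.size == 0:
                continue  # assumption: keep the old medoid if a cluster empties
            within = dist[np.ix_(members, members)]
            medoids[k] = members[np.argmin(within.sum(axis=1))]
    return medoids, labels


# Toy data (invented): three rising and three falling profiles, m = 5.
profiles = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.1, 6.0, 8.2, 10.0],
    [0.5, 1.1, 1.4, 2.1, 2.4],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [9.0, 7.2, 5.1, 2.9, 1.0],
    [3.0, 2.6, 2.1, 1.4, 1.1],
])
m = profiles.shape[1]

z = standardize(profiles)
# Pairwise Euclidean distances between standardized profiles.
dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)

# Slide 21: squared distance equals 2(m - 1)(1 - r) for every pair.
r = np.corrcoef(profiles)
assert np.allclose(dist ** 2, 2 * (m - 1) * (1 - r))

medoids, labels = k_medoids(dist, initial_medoids=[0, 3])
print("medoids:", medoids)
print("labels: ", labels)
```

Because the distances are computed on standardized profiles, profiles 1 and 2 land in the same cluster as profile 0 even though their raw scales differ. The result of K-medoids generally depends on the starting medoids, so in practice several starting choices might be compared.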