Cluster-Analysis-of: Cluster Analysis (125-page PPT)

Cluster Analysis of Microarray Data
4/13/2009
Copyright © 2009 Dan Nettleton

Clustering
• Group objects that are similar to one another together in a cluster.
• Separate objects that are dissimilar from each other into different clusters.
• The similarity or dissimilarity of two objects is determined by comparing the objects with respect to one or more attributes that can be measured for each object.

Data for Clustering
(rows are objects, columns are attributes)

object   attribute 1   attribute 2   attribute 3   ...   attribute m
1        4.7           3.8           5.9           ...   1.3
2        5.2           6.9           3.8           ...   2.9
3        5.8           4.2           3.9           ...   4.4
...      ...           ...           ...           ...   ...
n        6.3           1.6           4.7           ...   2.0

Microarray Data for Clustering
The same layout applies to microarray data. The entries are estimated expression levels, and the objects and attributes can be:
• genes (objects) measured at several time points (attributes);
• genes measured across tissue types;
• genes measured under several treatment conditions;
• samples (objects) measured across genes (attributes).

Clustering: An Example Experiment
• Researchers were interested in studying gene expression patterns in developing soybean seeds.
• Seeds were harvested from soybean plants at 25, 30, 40, 45, and 50 days after flowering (daf).
• One RNA sample was obtained for each level of daf.

An Example Experiment (continued)
• Each of the 5 samples was measured on two two-color cDNA microarray slides using a loop design.
• The entire process was repeated on a second occasion to obtain a total of two independent biological replications.
[Figure: Diagram illustrating the experimental design; a loop design over 25, 30, 40, 45, and 50 daf for Rep 1 and for Rep 2]

The daf means estimated for each gene from a mixed linear model analysis provide a useful summary of the data for cluster analysis.
[Figure: Normalized data for one example gene; x-axis daf, y-axis normalized log signal; estimated means ± 1 SE]

400 genes exhibited significant evidence of differential expression across time (p-value threshold 0.01, FDR = 3.2%). We will focus on clustering their estimated mean profiles.

We build clusters based on the most significant genes rather than on all genes because:
• Much of the variation in expression is noise rather than biological signal, and we would rather not build clusters on the basis of noise.
• Some clustering algorithms become computationally expensive when there is a large number of objects (gene expression profiles in this case) to cluster.

[Figure: Estimated mean profiles for the top 36 genes]

Dissimilarity Measures
• When clustering objects, we try to put similar objects in the same cluster and dissimilar objects in different clusters.
• We must define what we mean by dissimilar, and there are many choices.
• Let x and y denote m-dimensional objects, x = (x_1, x_2, ..., x_m) and y = (y_1, y_2, ..., y_m); e.g., the estimated means at m = 5 time points for a given gene.

Parallel Coordinate Plots
[Figure: points shown as a scatterplot (axes x1, x2) and as a parallel coordinate plot (axes Coordinate, Value)]
Each of these parallel coordinate plots shows one point in 5-dimensional space, drawn as a line connecting its coordinate values.

Euclidean Distance
$$d_E(x, y) = \|x - y\| = \left[ \sum_{j=1}^{m} (x_j - y_j)^2 \right]^{1/2}$$

1 - Correlation
$$d_{cor}(x, y) = 1 - r_{xy} = 1 - \frac{\sum_{j=1}^{m} (x_j - \bar{x})(y_j - \bar{y})}{\left[ \sum_{j=1}^{m} (x_j - \bar{x})^2 \sum_{j=1}^{m} (y_j - \bar{y})^2 \right]^{1/2}}$$

Euclidean Distance (example)
[Figure: scatterplot (axes x1, x2) of a black, a red, and a green point, with d_E(red, green), d_E(black, red), and d_E(black, green) marked]

1 - Correlation Dissimilarity (example)
[Figure: parallel coordinate plot (axes Coordinate, Value) of the same three objects]
The black and green objects are close together and far from the red object.

Relationship between Euclidean Distance and 1 - Correlation Dissimilarity
Let
$$\tilde{x}_j = \frac{x_j - \bar{x}}{s_x}, \quad \tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_m); \qquad \tilde{y}_j = \frac{y_j - \bar{y}}{s_y}, \quad \tilde{y} = (\tilde{y}_1, \tilde{y}_2, \ldots, \tilde{y}_m),$$
where s_x and s_y are the sample standard deviations (divisor m - 1), so that the sum of the squared standardized values is m - 1 and the sum of their cross-products is (m - 1) r_xy. Then
$$\|\tilde{x} - \tilde{y}\|^2 = \sum_{j=1}^{m} (\tilde{x}_j - \tilde{y}_j)^2 = \sum_{j=1}^{m} \tilde{x}_j^2 + \sum_{j=1}^{m} \tilde{y}_j^2 - 2 \sum_{j=1}^{m} \tilde{x}_j \tilde{y}_j = (m-1) + (m-1) - 2(m-1) r_{xy} = 2(m-1)(1 - r_{xy}).$$

Thus the Euclidean distance between standardized objects is proportional to the square root of the 1 - correlation dissimilarity:
$$d_E(\tilde{x}, \tilde{y}) = \sqrt{2(m-1)} \, \sqrt{d_{cor}(x, y)}.$$
• We will standardize our mean profiles so that each profile has mean 0 and standard deviation 1 (i.e., we will convert each x to x̃).
• We will cluster based on the Euclidean distance between standardized profiles.
• Original mean profiles with similar patterns are "close" to one another under this approach.

Clustering methods are often divided into two main groups:
1. Partitioning methods that attempt to optimally separate n objects into K clusters.
2. Hierarchical methods that produce a nested sequence of clusters.

Some Partitioning Methods
1. K-Means
2. K-Medoids
3. Self-Organizing Maps (SOM) (Kohonen, 1990; Tamayo, P. et al., 1999)

K-Medoids Clustering
0. Choose K of the n objects to represent K cluster centers (a.k.a. medoids).
1. Given a current set of K medoids, assign each object to the nearest medoid to produce an assignment of objects to K clusters.
2. For a given assignment of objects to K clusters, find the new medoid for each cluster by finding the object in the cluster that is closest on average to all other objects in its cluster.
3. Repeat steps 1 and 2 until the cluster assignments do not change.

Example of K-Medoids Clustering
[Figure sequence: two-dimensional points clustered step by step]
• Start with K medoids.
• Assign each point to the closest medoid.
• Find the new medoid for each cluster; the new medoids have the smallest average distance to the other points in their cluster.
• Reassign each point to the closest medoid.
• Find the new medoid for each cluster.
• Reassign each point to the closest medoid. No reassignment is needed, so the procedure stops.

[Figure: Cluster 1 of 3; axes DAF and Standardized Me…]
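The two dissimilarity measures and the standardization identity above can be checked numerically. The sketch below is a minimal illustration in plain Python, assuming sample standard deviations with divisor m - 1 (the convention under which d_E(x̃, ỹ)² = 2(m - 1)(1 - r_xy) holds). The function names and the two 5-point profiles are hypothetical and are not data from the soybean experiment.

```python
import math


def mean(x):
    return sum(x) / len(x)


def euclidean(x, y):
    # d_E(x, y) = ||x - y|| = sqrt(sum_j (x_j - y_j)^2)
    return math.sqrt(sum((xj - yj) ** 2 for xj, yj in zip(x, y)))


def correlation(x, y):
    # Pearson correlation r_xy between two m-dimensional objects
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xj - xbar) * (yj - ybar) for xj, yj in zip(x, y))
    sxx = sum((xj - xbar) ** 2 for xj in x)
    syy = sum((yj - ybar) ** 2 for yj in y)
    return sxy / math.sqrt(sxx * syy)


def one_minus_cor(x, y):
    # d_cor(x, y) = 1 - r_xy
    return 1.0 - correlation(x, y)


def standardize(x):
    # x~_j = (x_j - xbar) / s_x, where s_x is the sample SD (divisor m - 1)
    xbar = mean(x)
    s = math.sqrt(sum((xj - xbar) ** 2 for xj in x) / (len(x) - 1))
    return [(xj - xbar) / s for xj in x]


if __name__ == "__main__":
    # Hypothetical mean profiles at m = 5 time points
    x = [1.0, 2.0, 4.0, 3.0, 5.0]
    y = [2.0, 1.0, 3.0, 5.0, 4.0]
    m = len(x)
    lhs = euclidean(standardize(x), standardize(y)) ** 2
    rhs = 2 * (m - 1) * one_minus_cor(x, y)
    print(lhs, rhs)  # the two values agree up to floating-point rounding
```

For these two profiles r_xy = 0.6, so both printed values should be about 3.2. Standardizing first and then using Euclidean distance therefore ranks pairs of profiles exactly as the 1 - correlation dissimilarity does.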
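The K-medoids steps 0 to 3 listed above can be sketched as follows. This is a minimal illustration, not the implementation used in the soybean analysis. The function name `k_medoids`, the `max_iter` safeguard, tie-breaking toward the lower index, and the toy points are all assumptions introduced here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def k_medoids(
    points: List[Point],
    init_medoids: List[int],
    dist: Callable[[Point, Point], float] = math.dist,
    max_iter: int = 100,
) -> Tuple[List[int], List[int]]:
    """Return (medoid indices, cluster label of each point)."""
    # Step 0: the K initial medoids are supplied as indices into `points`.
    medoids = list(init_medoids)
    labels: List[int] = []
    # max_iter guards against cycling (an assumption, not part of the slides).
    for _ in range(max_iter):
        # Step 1: assign each object to its nearest medoid (ties go to the lower index).
        new_labels = [
            min(range(len(medoids)), key=lambda k: dist(p, points[medoids[k]]))
            for p in points
        ]
        # Step 3: stop once the cluster assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2: the new medoid is the member closest on average to the other
        # members. Minimizing the total over all members (self-distance is 0)
        # picks the same object, because the divisor |C| - 1 is the same for
        # every candidate in the cluster.
        for k in range(len(medoids)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if members:  # keep the old medoid if a cluster ends up empty
                medoids[k] = min(
                    members,
                    key=lambda i: sum(dist(points[i], points[j]) for j in members),
                )
    return medoids, labels


if __name__ == "__main__":
    # Hypothetical 2-D points forming two well-separated groups; both initial
    # medoids deliberately start in the first group.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    medoids, labels = k_medoids(pts, init_medoids=[0, 1])
    print(medoids, labels)  # → [0, 3] [0, 0, 0, 1, 1, 1]
```

To mirror the approach on the slides, `points` would be the standardized mean profiles and `dist` the Euclidean distance (the default here). The final clusters can depend on the medoids chosen in step 0, so in practice one might compare results from several starting sets.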
