Data Mining of GeneChips®
Devang Shah
INT 3470 Course Project
March 14, 2002

Outline
• Background
– Molecular Biology 101
– GeneChip® Technology
  • Synthesis, Sample Preparation, Scanning
  • Data Flow and Management
  • Multidrug Resistance
• Preliminary Data Analysis
– Affymetrix Microarray Suite 5.0
  • Scoring and Composite Analysis
  • Statistical Algorithms (Absolute/Comparison Analysis)
• Data Mining Tool 2.0 (Preview)
– SOM Algorithm
– Correlation Coefficient Clustering

Molecular Biology 101
• A-T
• G-C
• Traditional methods result in "one gene at a time" analysis
• The need for higher throughput has led to the development of two (competing) technologies

cDNA Spotted Array
• cDNA microarrays use various technologies (spotting, piezoelectric, drop-touch) to create arrays of entire genomes using 500-5000 bp oligos representing entire genes
• Microarrays are scanned using laser microscopy with fluorescently labeled DNA

GeneChip® Technology
• Uses photolithography and combinatorial chemistry to synthesize microarrays consisting of ~25-mer oligos representing smaller regions of various genes
– Probe Cell (PM, perfect match, or MM, mismatch)
– Probe Pair (PM + MM)
– Probe Set (~18 pairs)
• A typical experiment involves isolation of mRNA, synthesis of cDNA, fragmentation, labeling, and hybridization
• Scanning produces a raw image file (*.DAT); the analysis output is stored in a *.CHP file

Data Management
Bassett DE Jr, Eisen MB, Boguski MS. Gene expression informatics--it's all in your mine. Nat Genet 1999 Jan;21(1 Suppl):51-5.

Use in Functional Genomics
• Antimicrobial resistance in bacteria
– Multidrug pumps (MDRs)
  • Facilitate extrusion of (toxic) amphipathic cations
– TolC (E. coli), NorA (S. aureus)
– Evolutionary paradigm:
  • Why do plants continue to produce amphipathic compounds if they are extruded?
  • Perhaps they produce MDR inhibitors that, when combined with these normally ineffective antimicrobials, provide a strong synergistic effect

Methods
• E. coli TolC mutants were challenged with these compounds for several hours and then assayed for gene expression
• S. aureus (WT + NorA mutant) susceptibility to various plant-derived compounds was tested in the presence of 5'-methoxyhydnocarpin (an MDR inhibitor), and potential antimicrobials were isolated
Lewis et al. (2000). Proc. Natl. Acad. Sci. USA 97:1433-1437.

Preliminary Data Analysis
• Average probe cell intensity is calculated from the 75th percentile of 36 pixels
– An Absolute Call is based on a decision matrix employing the Positive Fraction, Pos/Neg Ratio, and Log Average Ratio
• Background is calculated separately in each of 16 sectors: the lowest 2% of probe cells are averaged and subtracted from all probes within that sector
• Noise (Q) is calculated from the N cells used in the background calculation
$$Q = \frac{1}{N}\sum_{i}\frac{\mathrm{stdev}_i}{\sqrt{\mathrm{pixel}_i}} \times SF \times NF$$

Comparison Analysis
• Global Normalization
– Multiplying the experimental file by a normalization factor (NF) so that it has the same average intensity as the baseline
• Global Scaling
– Multiplying BOTH data sets by scaling factors (SFs) so that their average intensities equal a target intensity set by the user
• Fold Change
$$FC = \frac{\Delta AvgDiff}{\max\left[\min\left(AvgDiff_{exp},\, AvgDiff_{base}\right),\; Q_m \times Q_c\right]} + \begin{cases} +1 & \text{if } AvgDiff_{exp} \ge AvgDiff_{base} \\ -1 & \text{if } AvgDiff_{exp} < AvgDiff_{base} \end{cases}$$

Results
[Figure: change (log ratio) for TolC + coumestrol, axis from 0 to 8. Probe sets shown: yhcN_b3238_st, marB_b1532_st, spy_b1743_st, yhbW_b3160_st, umuD_b1183_st, sseB_b2522_st, yebG_b1848_st, ybjC_b0850_st, dinD_b3645_st, marR_b1530_st, marA_b1531_st, acs_b4069_st, b2889_st, yigN_b3832_st, sulA_b0958_st]

Cluster Analysis
• Cluster analysis helps identify gene expression patterns in large data sets and groups of genes with similar expression profiles
• Affymetrix DMT offers 2 methods
– Self-Organizing Map (SOM) (Tamayo et al., PNAS 1999)
  • Centroid/K-means analysis (Golub et al., Science 1999)
– Correlation Coefficient Clustering
• Other methods
– Hierarchical agglomeration (Eisen et al., PNAS 1998)
– Super-paramagnetic (Getz et al., Physica A 2000)

SOM Clustering
• Iterative process based on a number of genes (points) in k experiments (dimensions)
• Initially, a grid of centroids is placed in the k-dimensional space
• The number of centroids, each representing a cluster, is determined by the number of rows and columns set by the user (e.g., 3 x 2 = 6)
• The algorithm then adjusts the centroid positions towards clusters of points (two-dimensional)
• Each iteration moves the centroid closer to the target point(s)
• N = the node being updated
• P = the data point being considered
• f_i(N) = position of N at iteration i
• N_p = the target node
• α = distance N moves toward P
$$f_{i+1}(N) = f_i(N) + \alpha\left(d(N, N_p),\, i\right)\left(P - f_i(N)\right)$$

SOM Clustering
[Figures: SOM clustering at iterations 0, 1, 2, and 3. Ref. Gaddy Getz, Weizmann Institute, Israel]

Correlation Coefficient
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}, \qquad \mathrm{Cov}(X,Y) = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - X_m\right)\left(Y_i - Y_m\right)$$
• (X, Y)_m = mean AvgDiff for probe sets across all analyses
• (X, Y)_i = mean AvgDiff for probe set from analysis i
• ρ can range from -1 to +1

Example of Clustering
(Bassett DE Jr, Eisen MB, Boguski MS. Nat Genet 1999 Jan;21(1 Suppl):51-5.)

Conclusions
• SOM Clustering
– Faster than most other clustering algorithms (based on distance)
– Could potentially break up larger clusters
– Results depend on initial centroid location
• Correlation Coefficient
– Fast, but will only determine clusters with a positive relationship
• Choose your method based on your needs
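
The difference between Global Normalization and Global Scaling on the Comparison Analysis slide can be illustrated with a minimal Python sketch. The function names and sample intensities are hypothetical, and a plain arithmetic mean stands in for whatever averaging the Affymetrix software applies, so this is an illustration of the idea, not of the product's implementation.

```python
from statistics import mean


def global_normalization(baseline, experiment):
    """Scale only the experiment so its average matches the baseline's."""
    nf = mean(baseline) / mean(experiment)  # normalization factor (NF)
    return [x * nf for x in experiment], nf


def global_scaling(baseline, experiment, target):
    """Scale BOTH data sets so each averages to the user-set target."""
    sf_base = target / mean(baseline)   # scaling factor (SF) for the baseline
    sf_exp = target / mean(experiment)  # scaling factor (SF) for the experiment
    scaled_base = [x * sf_base for x in baseline]
    scaled_exp = [x * sf_exp for x in experiment]
    return scaled_base, scaled_exp


# Made-up intensities, for illustration only.
baseline = [100.0, 200.0, 300.0]
experiment = [50.0, 100.0, 150.0]

normalized_exp, nf = global_normalization(baseline, experiment)
scaled_base, scaled_exp = global_scaling(baseline, experiment, target=150.0)
```

Normalization leaves the baseline untouched and moves only the experiment, while scaling moves both arrays to a common target, which makes results comparable across many experiments rather than just one pair.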
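
The noise (Q) and Fold Change formulas from the Preliminary Data Analysis and Comparison Analysis slides can be written out as a short Python sketch. The function and parameter names are hypothetical, and the +1/-1 offset follows the usual reading of the fold-change formula (positive when the experiment is at least as high as the baseline); treat both as assumptions rather than as the software's code.

```python
import math


def noise_q(stdevs, pixel_counts, sf=1.0, nf=1.0):
    """Q = (1/N) * sum(stdev_i / sqrt(pixel_i)) * SF * NF over background cells."""
    n = len(stdevs)
    total = sum(s / math.sqrt(p) for s, p in zip(stdevs, pixel_counts))
    return total / n * sf * nf


def fold_change(avg_diff_base, avg_diff_exp, q_m, q_c):
    """Fold change with a noise floor Q_m * Q_c in the denominator."""
    delta = avg_diff_exp - avg_diff_base
    # The noise floor keeps a near-zero AvgDiff from inflating the ratio.
    denominator = max(min(avg_diff_base, avg_diff_exp), q_m * q_c)
    offset = 1.0 if avg_diff_exp >= avg_diff_base else -1.0
    return delta / denominator + offset


# Made-up values, for illustration only.
up = fold_change(100.0, 300.0, q_m=2.0, q_c=5.0)      # 3.0
down = fold_change(300.0, 100.0, q_m=2.0, q_c=5.0)    # -3.0
floored = fold_change(2.0, 50.0, q_m=2.0, q_c=5.0)    # denominator is the noise floor, 10
q = noise_q([6.0, 12.0], [36, 36])                    # 1.5
```

The offset makes the scale symmetric: a threefold increase and a threefold decrease come out as +3 and -3 rather than 3 and 1/3.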
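
The SOM update rule from the SOM Clustering slides, where node N moves toward data point P by an amount α that depends on its grid distance to the target node N_p and on the iteration, can be sketched as follows. The slides do not define α or the initialization, so the `alpha` schedule (linear decay, half rate for adjacent grid nodes), the random initialization from data points, and all names here are assumptions chosen for illustration.

```python
import math
import random


def alpha(grid_dist, i, n_iter, rate0=0.5):
    """Hypothetical alpha(d(N, N_p), i): decays linearly over iterations;
    the target node moves at the full rate, adjacent grid nodes
    (distance 1) at half of it, and more distant nodes do not move."""
    rate = rate0 * (1.0 - i / n_iter)
    if grid_dist == 0:
        return rate
    if grid_dist <= 1.0:
        return rate / 2.0
    return 0.0


def update(f_n, p, a):
    """f_{i+1}(N) = f_i(N) + a * (P - f_i(N)), applied per coordinate."""
    return [f + a * (x - f) for f, x in zip(f_n, p)]


def train_som(points, rows, cols, n_iter=500, seed=0):
    """Return a dict mapping each grid node (row, col) to its centroid."""
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumption: initial centroids are copies of randomly chosen data points.
    pos = {n: list(rng.choice(points)) for n in nodes}
    for i in range(n_iter):
        p = rng.choice(points)  # data point P considered at this iteration
        # Target node N_p: the node whose centroid is currently closest to P.
        n_p = min(nodes, key=lambda n: math.dist(pos[n], p))
        for n in nodes:
            a = alpha(math.dist(n, n_p), i, n_iter)  # d(N, N_p) on the grid
            pos[n] = update(pos[n], p, a)
    return pos


# Four made-up genes measured in k = 2 experiments, on a 3 x 2 grid.
genes = [[0.0, 0.1], [0.2, 0.0], [9.8, 10.0], [10.0, 9.9]]
centroids = train_som(genes, rows=3, cols=2)
```

Because the starting centroids are drawn at random, a different `seed` can give a different final map, which is the dependence on initial centroid location noted in the Conclusions.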
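
The correlation coefficient from the Correlation Coefficient slide, and the reason it only finds positively related clusters, can be shown with a small Python sketch. `pearson` follows the slide's formula; `correlation_clusters` is a hypothetical greedy grouping written for illustration and is not claimed to be the algorithm DMT uses. The profile names, values, and threshold are made up.

```python
import math


def pearson(x, y):
    """rho = Cov(X, Y) / (sigma_X * sigma_Y); assumes non-constant profiles."""
    n = len(x)
    x_m = sum(x) / n
    y_m = sum(y) / n
    cov = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y)) / n
    sigma_x = math.sqrt(sum((xi - x_m) ** 2 for xi in x) / n)
    sigma_y = math.sqrt(sum((yi - y_m) ** 2 for yi in y) / n)
    return cov / (sigma_x * sigma_y)


def correlation_clusters(profiles, threshold=0.9):
    """Greedy grouping (illustrative assumption): a profile joins the first
    cluster whose seed it correlates with at or above the threshold."""
    clusters = []
    for name, profile in profiles.items():
        for cluster in clusters:
            seed = profiles[cluster[0]]
            if pearson(seed, profile) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no match: start a new cluster
    return clusters


# Made-up AvgDiff profiles across four analyses.
profiles = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],   # rises with "a"
    "c": [4.0, 3.0, 2.0, 1.0],   # mirror image of "a"
    "d": [8.0, 6.0, 4.0, 2.0],   # falls with "c"
}
clusters = correlation_clusters(profiles)
```

Profiles "a" and "c" are perfectly anti-correlated (ρ close to -1), so a positive threshold places them in separate clusters even though they are tightly related.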