Data Mining of GeneChips (PPT)

Data Mining of GeneChips®
Devang Shah
INT 3470 Course Project
March 14, 2002

Outline
• Background
  – Molecular Biology 101
  – GeneChip® Technology
    • Synthesis, Sample Preparation, Scanning
    • Data Flow and Management
    • Multidrug Resistance
• Preliminary Data Analysis
  – Affymetrix Microarray Suite 5.0
    • Scoring and Composite Analysis
    • Statistical Algorithms (Absolute/Comparison Analysis)
• Data Mining Tool 2.0 (Preview)
  – SOM Algorithm
  – Correlation Coefficient Clustering

Molecular Biology 101
• A-T
• G-C
• Traditional methods result in "one gene at a time" analysis
• The need for higher throughput has led to the development of two (competing) technologies

cDNA Spotted Array
• cDNA microarrays utilize various technologies (spotting, piezoelectric, drop-touch) to create arrays of entire genomes using 500-5000 bp oligos representing entire genes
• Microarrays are scanned using laser microscopy with fluorescently labeled DNA

GeneChip® Technology
• Utilizes photolithography and combinatorial chemistry to synthesize microarrays consisting of ~25-mer oligos representing smaller regions of various genes
  – Probe Cell (PM or MM, i.e. perfect match or mismatch)
  – Probe Pair (PM + MM)
  – Probe Set (~18 Pairs)
• Typical experiment involves isolation of mRNA, synthesis of cDNA, fragmentation, labeling, and hybridization
• Scan results in a raw image file (*.DAT); the analyzed output is saved as *.CHP

Data Management
[Figure from: Bassett DE Jr, Eisen MB, Boguski MS. Gene expression informatics--it's all in your mine. Nat Genet 1999 Jan;21(1 Suppl):51-5.]

Use in Functional Genomics
• Antimicrobial resistance in bacteria
  – Multi-drug pumps (MDRs)
    • Facilitate extrusion of amphipathic cations (toxic)
    • TolC (E. coli), NorA (S. aureus)
  – Evolutionary paradigm:
    • Why do plants continue to produce amphipathic compounds if they are extruded?
    • Perhaps they produce MDR inhibitors that, when combined with these normally ineffective antimicrobials, provide a strong synergistic effect

Methods
• E. coli TolC mutants were challenged with these compounds for several hours and then assayed for gene expression
• S. aureus (WT + NorA mutant) susceptibility to various plant-derived compounds was tested in the presence of 5'-methoxyhydnocarpin (MDR inhibitor) and potential antimicrobials isolated
Lewis et al. (2000). Proc. Natl. Acad. Sci. USA 97:1433-1437.

Preliminary Data Analysis
• Average probe cell intensity is calculated based upon the 75th percentile of 36 pixels
  – An Absolute Call is based on a decision matrix employing the Positive Fraction, Pos/Neg Ratio, and Log Average Ratio
• Background is calculated from 1 of 16 sectors, where the lowest 2% of probe cells are averaged and subtracted from all probes within that sector
• Noise (Q) is calculated from the cells used in the background calculation

$$Q = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{stdev}_i}{\sqrt{\mathrm{pixel}_i}} \times SF \times NF$$

where N is the number of background cells, stdev_i is the standard deviation of the pixel intensities in cell i, pixel_i is the number of pixels in cell i, SF is the scaling factor, and NF is the normalization factor.

Comparison Analysis
• Global Normalization
  – Multiplying the average intensity of the experimental file by an NF, resulting in the same average intensity as the baseline
• Global Scaling
  – Multiplying BOTH datasets by SFs, resulting in average intensities equal to a target intensity set by the user
• Fold Change

$$FC = \frac{\Delta AvgDiff}{\max\left(\min\left(AvgDiff_{base},\, AvgDiff_{exp}\right),\; Q_m \times Q_c\right)} + \begin{cases} +1 & \text{if } AvgDiff_{exp} \ge AvgDiff_{base} \\ -1 & \text{if } AvgDiff_{exp} < AvgDiff_{base} \end{cases}$$

Results
[Figure: bar chart, TolC + Coumestrol. Axis: Change (Log ratio), 0 to 8. Probe sets shown: yhcN_b3238_st, marB_b1532_st, spy_b1743_st, yhbW_b3160_st, umuD_b1183_st, sseB_b2522_st, yebG_b1848_st, ybjC_b0850_st, dinD_b3645_st, marR_b1530_st, marA_b1531_st, acs_b4069_st, b2889_st, yigN_b3832_st, sulA_b0958_st]

Cluster Analysis
• Cluster analysis helps identify gene expression patterns in large datasets and groups genes with similar expression profiles
• Affymetrix DMT offers 2 methods
  – Self Organizing Map (SOM) (Tamayo et al., PNAS 1999)
    • Centroid/K-means Analysis (Golub et al., Science 1999)
  – Correlation Coefficient Clustering
• Other methods
  – Hierarchical Agglomeration (Eisen et al., PNAS 1998)
  – Super-Paramagnetic (Getz et al., Physica A 2000)

SOM Clustering
• Iterative process based on a number of genes (points) in k experiments (dimensions)
• Initially a grid of centroids is placed onto the k-dimensional space
• The number of centroids, each representing a cluster, is determined by the number of rows and columns set by the user (e.g. 3 x 2 = 6)
• The algorithm then adjusts the centroid positions towards clusters of points (illustrated in two dimensions)
• Each iteration moves the centroid closer to the target point(s)

SOM Clustering

$$f_{i+1}(N) = f_i(N) + \alpha\left(d(N, N_p),\, i\right)\left(P - f_i(N)\right)$$

• N = the node being updated
• P = the data point being considered
• f_i(N) = position of N at iteration i
• N_p = the target node (the node closest to P)
• α = distance N moves toward P, as a fraction of the remaining gap

SOM Clustering
[Figures: SOM clustering at Iteration = 0, 1, 2, and 3. Ref. Gaddy Getz, Weizmann Institute, Israel]

Correlation Coefficient

$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}, \qquad \mathrm{Cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (X_i - X_m)(Y_i - Y_m)$$

• X_m, Y_m = mean AvgDiff for probe sets X and Y across all analyses
• X_i, Y_i = AvgDiff for probe sets X and Y from analysis i
• ρ can range from -1 to +1

Example of Clustering
[Figure from: Bassett DE Jr, Eisen MB, Boguski MS. Gene expression informatics--it's all in your mine. Nat Genet 1999 Jan;21(1 Suppl):51-5.]

Conclusions
• SOM Clustering
  – Faster than most other clustering algorithms (based on distance)
  – Could potentially break larger clusters
  – Results depend on initial centroid location
• Correlation Coefficient
  – Fast, but will only determine clusters with a positive relationship
• Choose your method based on your needs
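
Sketch: global scaling and global normalization
The two adjustments described on the Comparison Analysis slide can be illustrated with a few lines of Python. This is a minimal sketch, not the Microarray Suite implementation: it assumes a plain arithmetic mean as the "average intensity", and the function names are hypothetical.

```python
from statistics import mean


def scaling_factor(intensities, target):
    """SF that brings the average intensity of one dataset to the user-set target."""
    return target / mean(intensities)


def normalization_factor(experiment, baseline):
    """NF that brings the experiment's average intensity to the baseline's average."""
    return mean(baseline) / mean(experiment)


def apply_factor(intensities, factor):
    """Multiply every intensity by the same factor (SF or NF)."""
    return [value * factor for value in intensities]
```

With global scaling, `scaling_factor` is computed separately for the baseline and the experiment against the same target; with global normalization, only the experiment is multiplied, by `normalization_factor`.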
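
Sketch: noise (Q) and fold change
The following is a hedged illustration of the noise and fold change calculations, reading the slide formulas as Q = (1/N) Σ stdev_i / sqrt(pixel_i) × SF × NF and FC = ΔAvgDiff / max(min(AvgDiff_base, AvgDiff_exp), Q_m × Q_c) ± 1, with the sign of the final term following the direction of change. That reading is an assumption about the original equations, and the function and parameter names are hypothetical.

```python
import math


def noise_q(stdevs, pixel_counts, sf=1.0, nf=1.0):
    """Noise Q: mean of stdev_i / sqrt(pixel_i) over the background cells, times SF and NF."""
    n = len(stdevs)
    total = sum(s / math.sqrt(p) for s, p in zip(stdevs, pixel_counts))
    return total / n * sf * nf


def fold_change(avg_diff_base, avg_diff_exp, q_m, q_c):
    """Fold change with a noise floor of q_m * q_c in the denominator."""
    delta = avg_diff_exp - avg_diff_base
    denominator = max(min(avg_diff_base, avg_diff_exp), q_m * q_c)
    # +1 for an increase (or no change), -1 for a decrease.
    sign = 1.0 if avg_diff_exp >= avg_diff_base else -1.0
    return delta / denominator + sign
```

The noise floor keeps the ratio from blowing up when the smaller AvgDiff is close to zero or negative: the denominator then falls back to Q_m × Q_c.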
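
Sketch: SOM update rule
The SOM Clustering slides describe moving each node N toward a data point P by a fraction α that depends on the grid distance d(N, N_p) and the iteration i, i.e. f_{i+1}(N) = f_i(N) + α(d(N, N_p), i)(P - f_i(N)). The sketch below applies that rule in pure Python. The α schedule (linear decay, zero outside a shrinking grid radius) and the initialisation from randomly chosen data points are assumptions made for illustration, not the settings used by DMT.

```python
import math
import random


def alpha(grid_dist, i, n_iter, alpha0=0.5, radius0=1.5):
    """Assumed schedule: decays linearly with i, zero outside a shrinking grid radius."""
    frac = 1.0 - i / n_iter
    return alpha0 * frac if grid_dist <= radius0 * frac else 0.0


def update_node(f, p, a):
    """f_{i+1}(N) = f_i(N) + a * (P - f_i(N))."""
    return [fk + a * (pk - fk) for fk, pk in zip(f, p)]


def train_som(points, rows, cols, n_iter=500, seed=0):
    """Fit a rows x cols grid of nodes to points given as k-dimensional tuples."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumed initialisation: each node starts at a randomly chosen data point.
    pos = [list(rng.choice(points)) for _ in grid]
    for i in range(n_iter):
        p = rng.choice(points)
        # N_p: the node whose current position is closest to P.
        n_p = min(range(len(grid)), key=lambda n: math.dist(pos[n], p))
        for n in range(len(grid)):
            a = alpha(math.dist(grid[n], grid[n_p]), i, n_iter)
            pos[n] = update_node(pos[n], p, a)
    return grid, pos


def assign(points, pos):
    """Cluster label for each point: index of the nearest node."""
    return [min(range(len(pos)), key=lambda n: math.dist(pos[n], p)) for p in points]
```

Calling `train_som(points, rows=3, cols=2)` gives six nodes, matching the 3 x 2 = 6 example; `assign` then maps each gene's expression profile to its nearest node.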
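
Sketch: correlation coefficient between two probe sets
The correlation coefficient from the Correlation Coefficient slide, ρ = Cov(X, Y) / (σ_X σ_Y) with Cov(X, Y) = (1/N) Σ (X_i - X_m)(Y_i - Y_m), can be computed directly from two expression profiles. The grouping helper below is a hypothetical, simplified stand-in (pick a seed profile and keep everything correlated above a threshold); it is not the clustering procedure implemented in DMT.

```python
import math


def pearson(x, y):
    """Correlation coefficient of two equally long profiles (population form, 1/N)."""
    n = len(x)
    x_m = sum(x) / n
    y_m = sum(y) / n
    cov = sum((xi - x_m) * (yi - y_m) for xi, yi in zip(x, y)) / n
    sd_x = math.sqrt(sum((xi - x_m) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - y_m) ** 2 for yi in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def correlated_with(seed_name, profiles, threshold=0.9):
    """Names of profiles whose correlation with the seed is at least the threshold."""
    seed = profiles[seed_name]
    return [name for name, profile in profiles.items()
            if name != seed_name and pearson(seed, profile) >= threshold]
```

Because only profiles with ρ at or above a positive threshold are kept, a perfectly anti-correlated gene (ρ = -1) is never grouped with the seed, which is the limitation noted in the Conclusions.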
