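1-1
Chapter 7 Bioinformatics

1-2
The Development of Bioinformatics
• 1990s:
– Human Genome Project
• 1982:
– The U.S. National Institutes of Health (NIH) established GenBank
• 1988:
– NCBI (National Center for Biotechnology Information) was established

1-3
Definition of Bioinformatics
• Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

1-4

1-5
Why use bioinformatics?
• An explosive growth in the amount of biological information
• A more global perspective in experimental design
• Data mining: the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better-characterized organisms.

1-6
Categories of Biological Information
• Biological information can be roughly divided into four categories:
– Macroscopic and microscopic information on the structure, morphology, color, and similar traits of organisms
– Information on the genetic material of organisms: DNA and genome sequences and their properties
– Information on the structure and properties of biological macromolecules such as proteins and carbohydrates
– Other biochemical, physiological, genetic, and evolutionary characteristics of organisms

1-7
Types of bioinformatics tools
• Database
• Software
• Web resource
• Algorithms
• Image and signal processing
• Computer architecture and database management
• Computer languages
• Programming
• Artificial intelligence and information theory
• Design and simulation
• Numerical analysis
• Statistics
• Software engineering and automation

1-8
Major Bioinformatics Websites
• NCBI (National Center for Biotechnology Information)
• ExPASy (Expert Protein Analysis System)
• EMBnet (European Molecular Biology network)

1-9
Major Nucleic Acid and Protein Databases
• GenBank (USA), EMBL (Europe), and DDBJ (Japan)
• PDB/RCSB (Protein Data Bank), PIR (Protein Information Resource), Pfam (Protein Family database)

1-10

1-11
Online Workbenches for Analyzing Biological Information
• EMBOSS (European Molecular Biology Open Software Suite)
• SDSC Biology Workbench

1-12
Applications of Bioinformatics
• (1) Data acquisition and processing
• (2) Gene mapping
• (3) Genome mapping and comparison
• (4) Molecular model building and simulation
• (5) Comparison of DNA and protein sequences and structures
• (6) Macromolecular structure prediction and drug design
• (7) Molecular evolution and related fields

1-13

1-14

1-15

1-16
DNA Sequencing
• Acquire: sample information, chromatograms, assembled data
• Store: data and information, backup data
• Analyze: quality assessment, filter and assemble data
• Predict and discover gene function
• Study genetic variation and gene expression
• Distribute: data to collaborators and customers
• Research findings to the scientific community

1-17
Discovering New Genes: The Traditional Approach
• Find affected (mutant) individuals
• Compare how gene expression differs between normal and mutant individuals
• Search for the mutated gene
• Identify the disease-causing gene

1-18
Discovering New Genes: Genomics
• Sequence data
• Gene finding
• Function prediction
• Novel gene?

1-19
Discovering Disease-Causing Genes: Genetic Linkage
• Individuals with a genetic disease
• Use the family pedigree to find the relationship between genetic markers and inheritance of the disease
• Find the disease-causing gene
• Find markers associated with the disease-causing gene

1-20

1-21
Some Problems in Bioinformatics
• Sequence comparison
• Fragment assembly of DNA sequences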
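• Physical mapping
• Evolutionary trees
• Molecular structure prediction

1-22
Sequence Comparison
• Goals:
– Database search: given a sequence S and a set of sequences G, find all the sequences in G that are similar to S.
– Similarity: find which parts of the sequences are alike and which parts differ.
  - Sequence alignment (global alignment)
  - Local alignment

1-23
Sequence Alignment
• Global alignment
• Local alignment

1-24
Longest Common Subsequence (1)
• Find a longest common subsequence (LCS) of two strings.
string1: TAGTCACG
string2: AGACTGTC
LCS:     AGACG
• Dynamic programming, where c_{i,j} is the LCS length of the first i characters of string1 (a) and the first j characters of string2 (b), with c_{i,0} = c_{0,j} = 0:
\[
c_{i,j} = \max
\begin{cases}
c_{i-1,j-1} + 1 & \text{if } a_i = b_j \\
c_{i-1,j} + 0 & \text{if } a_i \neq b_j \\
c_{i,j-1} + 0 & \text{if } a_i \neq b_j
\end{cases}
\]

1-25
Longest Common Subsequence (2)
Rows: TAGTCACG. Columns: AGACTGTC.

      -  A  G  A  C  T  G  T  C
  -   0  0  0  0  0  0  0  0  0
  T   0  0  0  0  0  1  1  1  1
  A   0  1  1  1  1  1  1  1  1
  G   0  1  2  2  2  2  2  2  2
  T   0  1  2  2  2  3  3  3  3
  C   0  1  2  2  3  3  3  3  4
  A   0  1  2  3  3  3  3  3  4
  C   0  1  2  3  4  4  4  4  4
  G   0  1  2  3  4  4  5  5  5

The bottom-right entry gives the LCS length, 5.

1-26
Edit Distance (1)
• Find a cheapest sequence of edit operations that transforms one string into the other.
TAGTCACG
AGACTGTC
Operations: DMMDDMMIMII (D = delete, M = match, I = insert)
\[
c_{i,j} = \min
\begin{cases}
c_{i-1,j-1} + 0 & \text{if } a_i = b_j \quad \text{(Match)} \\
c_{i-1,j} + \mathrm{dist}(a_i, -) & \text{(Delete } a_i\text{)} \\
c_{i,j-1} + \mathrm{dist}(-, b_j) & \text{(Insert } b_j\text{)}
\end{cases}
\]

1-27
Edit Distance (2)
TAGTCACG
AGACTGTC

1-28
Similarity
• Two sequences s1 and s2.
• p is the match value if a_i = b_j; otherwise it is the mismatch value.
• g is the gap penalty.
\[
c_{i,j} = \max
\begin{cases}
c_{i-1,j-1} + p \\
c_{i-1,j} + g \\
c_{i,j-1} + g
\end{cases}
\]

1-29
Sequence Alignment
a = TAGTCACG
b = AGACTGTC

Alignment 1:
----TAGTCACG
AGACT-GTC---

Alignment 2:
TAGTCAC-G--
-AG--ACTGTC

• Which one is better?

1-30
Sequence Alignment Formula
\[
c_{0,0} = 0, \qquad c_{i,0} = -i, \qquad c_{0,j} = -j
\]
\[
c_{i,j} = \max
\begin{cases}
c_{i-1,j-1} + 2 & \text{if } a_i = b_j \\
c_{i-1,j-1} - 1 & \text{if } a_i \neq b_j \\
c_{i-1,j} - 1 \\
c_{i,j-1} - 1
\end{cases}
\]

1-31
Sequence Alignment Example
TAGTCAC-G--
-AG--ACTGTC

      -   A   G   A   C   T   G   T   C
  -   0  -1  -2  -3  -4  -5  -6  -7  -8
  T  -1  -1  -2  -3  -4  -2  -3  -4  -5
  A  -2   1   0   0  -1  -2  -3  -4  -5
  G  -3   0   3   2   1   0   0  -1  -2
  T  -4  -1   2   2   1   3   2   2   1
  C  -5  -2   1   1   4   3   2   1   4
  A  -6  -3   0   3   3   3   2   1   3
  C  -7  -4  -1   2   5   4   3   2   3
  G  -8  -5  -2   1   4   4   6   5   4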
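
The three dynamic-programming recurrences on slides 1-24 through 1-31 can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the function names are hypothetical, the edit-distance version assumes a unit cost for each insert and delete (the slides leave dist unspecified) and allows no substitution, and the alignment scores (+2 match, -1 mismatch, -1 gap) are the values read from slide 1-30.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence (slides 1-24, 1-25)."""
    c = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[-1][-1]


def indel_distance(a: str, b: str, indel_cost: int = 1) -> int:
    """Edit distance with match, delete, and insert only (slide 1-26).

    Assumption: every insert or delete costs `indel_cost`.
    """
    c = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        c[i][0] = i * indel_cost          # delete the first i characters of a
    for j in range(1, len(b) + 1):
        c[0][j] = j * indel_cost          # insert the first j characters of b
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            best = min(c[i - 1][j], c[i][j - 1]) + indel_cost
            if a[i - 1] == b[j - 1]:      # a match costs nothing
                best = min(best, c[i - 1][j - 1])
            c[i][j] = best
    return c[-1][-1]


def global_alignment_score(a: str, b: str, match: int = 2,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score (slides 1-28, 1-30, 1-31)."""
    c = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        c[i][0] = i * gap                 # c[i][0] = -i with gap = -1
    for j in range(1, len(b) + 1):
        c[0][j] = j * gap                 # c[0][j] = -j with gap = -1
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            p = match if a[i - 1] == b[j - 1] else mismatch
            c[i][j] = max(c[i - 1][j - 1] + p,
                          c[i - 1][j] + gap,
                          c[i][j - 1] + gap)
    return c[-1][-1]
```

For the slide strings TAGTCACG and AGACTGTC, `lcs_length` returns 5 and `global_alignment_score` returns 4, which agree with the bottom-right entries of the tables on slides 1-25 and 1-31. `indel_distance` returns 6, which matches the three deletions and three insertions in the operation string DMMDDMMIMII.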
Document title: The Development of Bioinformatics
Link: https://www.777doc.com/doc-295336.html