Chapter 6 Sequence Similarity Searching

Outline
I. Tasks and purposes of sequence similarity searching
   1. Tasks of sequence similarity searching
   2. Purposes of sequence similarity searching
II. Homology and similarity
III. BLAST analysis of sequences
IV. Specialized BLAST servers

I. Tasks and purposes of sequence similarity searching

1. Tasks of sequence comparison:
   discover similarities between sequences
   identify differences between sequences
2. Purposes:
   similar sequences suggest similar structures and similar functions
   determine whether sequences are homologous
   infer the evolutionary relationships between sequences

II. Homology and similarity

1. Homology: descent from a common ancestor
   orthologous
   paralogous
2. Similarity
   homologous sequences are generally similar
   similar sequences are not necessarily homologous

It is generally considered that a match has potential biological significance when protein sequences share 25% or higher identity over a region of about 80 amino acids or more, or when DNA sequences share 75% or higher identity.

III. BLAST analysis of sequences

BLAST (Basic Local Alignment Search Tool) allows rapid sequence comparison of a query sequence against a database. The BLAST algorithm is fast, accurate, and web-accessible.

Website of BLAST
(BLAST 2.0)
(WU-Blast2)
(WU-Blast2)

Why use BLAST?
BLAST searching is fundamental to understanding the relatedness of any favorite query sequence to other known proteins or DNA sequences.
Applications include
• identifying orthologs and paralogs
• discovering new genes or proteins
• discovering variants of genes or proteins
• investigating expressed sequence tags (ESTs)
• exploring protein structure and function

Four components to a BLAST search
(1) Choose the sequence (query)
(2) Select the BLAST program
(3) Choose the database to search
(4) Choose optional parameters
Then click "BLAST"

Step 1: Choose your sequence
Sequence can be input in FASTA format, in plain text format, or as an accession number.
Example of the FASTA format for a BLAST query (figure not preserved)

Step 2: Choose the BLAST program
blastn (nucleotide BLAST)
blastp (protein BLAST)
blastx (translated BLAST)
tblastn (translated BLAST)
tblastx (translated BLAST)

Choose the BLAST program
Program    Input      Database   Comparisons
blastn     DNA        DNA        1
blastp     protein    protein    1
blastx     DNA        protein    6
tblastn    protein    DNA        6
tblastx    DNA        DNA        36

DNA potentially encodes six proteins
Forward-strand frames: 5' CATCAA, 5' ATCAAC, 5' TCAACT
Reverse-strand frames: 5' GTGGGT, 5' TGGGTA, 5' GGGTAG
5' CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3'
3' GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5'

Step 3: Choose the database
nr = non-redundant (most general database)
dbest = database of expressed sequence tags
dbsts = database of sequence tagged sites
gss = genomic survey sequences
htgs = high throughput genomic sequence

Step 4a: Select optional search parameters
CD search
BLASTN searching
Entrez
Filter
Expect
Word size (increasing this value speeds up the search)
Organism

BLAST: optional parameters
You can...
• choose the organism to search
• turn filtering on/off
• change the expect (E) value
• change the word size
• change the output format

Filtering

Step 4b: Optional formatting parameters
Alignment view
Descriptions
Alignments

Labels on the results page: program, query, database, taxonomy

BLAST format options
BLAST format options: multiple sequence alignment

Annotations on the search summary:
threshold score = 11
EVD parameters
BLOSUM matrix
Effective search space = mn = length of query x db length
10.0 is the E value
gap penalties
cut-off parameters

We will get to the bottom of a BLAST search in a few minutes…

BLASTP
Searching with a multidomain protein, pol
Searching bacterial sequences with pol
BLAST program selection guide

Pig growth hormone mRNA
Sequence ID: gb|M22761.1|PIGGHMA  Length: 878  Number of Matches:

Query  1    tttttttttttGGTGGGGAAGAGGACTTTTATTGGGATGTTAGTGGGGGACTCCAGGGAA  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1    TTTTTTTTTTTGGTGGGGAAGAGGACTTTTATTGGGATGTTAGTGGGGGACTCCAGGGAA  60

Query  61   CA-C-AACACTAGGACCCAGCTCCCCAGACCACTCAGGGACCTGTGGACAGCTCAGCTCA  118
            || | |||||||||||||||||||||||||||||||||||||||||||     |||||||
Sbjct  61   CAACAAACACTAGGACCCAGCTCCCCAGACCACTCAGGGACCTGTGGA-----CAGCTCA  115

Query  119  CCGGCTGTGATGGCTGCAGGCCCTCGGACCTCCGTGCTCCTGGCTTTCGCCCTGCTCTGC  178
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  116  CCGGCTGTGATGGCTGCAGGCCCTCGGACCTCCGTGCTCCTGGCTTTCGCCCTGCTCTGC  175

Query  179  CTGCCCTGGACTCAGGAGGTGGGCGCCTTGGGAGCCATGCCCTTGTCCAGCCTATTTGCC  238
            |||||||||||||||||||||||||||||   ||||||||||||||||||||||||||||
Sbjct  176  CTGCCCTGGACTCAGGAGGTGGGCGCCTTCCCAGCCATGCCCTTGTCCAGCCTATTTGCC  235

Query  239  AACGCCGTGCTCCGGGCCCAGCACCTGCACCAACTGGCTGCCGACACCTACAAGGAGTTT  298
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  236  AACGCCGTGCTCCGGGCCCAGCACCTGCACCAACTGGCTGCCGACACCTACAAGGAGTTT  295

Protein alignment (excerpt):

Query  421  ADLLCLDQKNQNNSPSNDAAPATQQPSVILAEENKPRPLIISGTDSTHQTAHT--QLSNP  478
            AD LCLDQKN NNSPSNDAAP +QQPSV+L EENKPR L+  GT+STHQ  HT  QLSNP
Sbjct  421  ADRLCLDQKNLNNSPSNDAAPDSQQPSVLLGEENKPRSLLTGGTESTHQAGHTQQQLSNP  480

Query  479  SSLANIDFYAQVSDITPAGSVVLSPGQKNKAGISQCDMHLEVVSPCPANFIMDNAYFCEA  538
            SSLANIDFYAQVSDITPAGSVVLSPGQKNKAG+SQCDMH EVVSPC ANFIMDNAYFCEA
Sbjct  481  SSLANIDFYAQVSDITPAGSVVLSPGQKNKAGMSQCDMHPEVVSPCQANFIMDNAYFCEA  540

Query  539  DAKKCIAMAPHVEVESRVAPSFNQEDIYITTESLTTTAGRSGTAECAPSSEMPVPDYTSI  598
            DAKKCIA+APHVE ES   PSFNQEDIYITTESLTTTAGRSGTAE  PSSEMPVPDYTSI
Sbjct  541  DAKKCIALAPHVEAESHAEPSFNQEDIYITTESLTTTAGRSGTAERVPSSEMPVPDYTSI  600

Query  599  HIVQSPQGLVLNATALPLPDKEFLSSCGYVSTDQLNKIMP  638
            HIVQSPQGLVLNATALPLPDKEFLSSCGYVSTDQLNKIMP
Sbjct  601  HIVQSPQGLVLNATALPLPDKEFLSSCGYVSTDQLNKIMP  640
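
The point under Step 2 that a DNA sequence potentially encodes six proteins can be illustrated with a short Python sketch. It is not part of the original slides: the function names `reverse_complement` and `six_frames` are hypothetical, and the code assumes the input contains only the bases A, C, G and T. It lists the three reading frames of the given strand and the three of its reverse complement, using the 50-base example sequence from the slide.

```python
# Minimal sketch (illustrative names, not from the slides): enumerate the
# six reading frames of a DNA sequence, three per strand.
# Assumption: the input contains only A, C, G and T (either case).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, read 5' to 3' on the opposite strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict:
    """Map frame labels (+1, +2, +3, -1, -2, -3) to the bases read in that frame."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        # Each frame starts one base later than the previous one.
        frames[f"+{offset + 1}"] = forward[offset:]
        frames[f"-{offset + 1}"] = reverse[offset:]
    return frames


if __name__ == "__main__":
    # Top strand of the example shown on the slide.
    top = "CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"
    for label, bases in six_frames(top).items():
        # Print the first two codons of each frame.
        print(label, "5'", bases[:3], bases[3:6])
```

A translated search such as blastx compares all six frames of the query with the protein database, which is why the program table lists 6 comparisons for blastx and tblastn and 36 (6 x 6) for tblastx.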
Document title: Sequence Similarity Searching
Link: https://www.777doc.com/doc-5219005.html