Bioinformatics, Chapter 2: Sequence Query and Submission

I. Querying sequence information
   1. Accession numbers
   2. Four ways to obtain sequences
II. Submitting sequence information

I. Querying sequence information

Accession numbers are labels for sequences

NCBI includes databases (such as GenBank) that contain information on DNA, RNA, or protein sequences. You may want to acquire information beginning with a query such as the name of a protein of interest, or the raw nucleotides comprising a DNA sequence of interest. DNA sequences and other molecular data are tagged with accession numbers that are used to identify a sequence or other record relevant to molecular data.

What is an accession number?
An accession number is a label used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence.

Examples (all for retinol-binding protein, RBP4):
X02775        GenBank genomic DNA sequence
NT_030059     Genomic contig
rs7079946     dbSNP (single nucleotide polymorphism)
N91759.1      An expressed sequence tag (1 of 170)
NM_006744     RefSeq DNA sequence (from a transcript)
NP_006735     RefSeq protein
AAC02945      GenBank protein
Q28369        SwissProt protein
1KT7          Protein Data Bank structure record

[Figure: DNA, RNA and protein, labeled with the example accessions rs7079946 and NP_006735]

Four ways to access DNA and protein sequences
[1] Entrez Gene with RefSeq
[2] UniGene
[3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI)
[4] ExPASy Sequence Retrieval System (separate from NCBI)

[1] Entrez Gene with RefSeq
Entrez Gene is a great starting point: it collects key information on each gene/protein from major databases, and it covers all major organisms.
RefSeq provides a curated, optimal accession number for each DNA (NM_006744) or protein (NP_006735).

[Screenshots: Entrez Gene entry for RBP4, top, middle and bottom of page. Note that links to many other RBP4 database entries are available.]

NCBI's important RefSeq project: best representative sequences
RefSeq (accessible via the main page of NCBI) provides an expertly curated accession number that corresponds to the most stable, agreed-upon "reference" version of a sequence. RefSeq identifiers include the following formats:
Complete genome        NC_######
Complete chromosome    NC_######
Genomic contig         NT_######
mRNA (DNA format)      NM_######   e.g. NM_006744
Protein                NP_######   e.g. NP_006735

[2] UniGene
[Figure: DNA, RNA, complementary DNA (cDNA), protein]

UniGene: unique genes via ESTs
UniGene clusters contain many expressed sequence tags (ESTs), which are DNA sequences (typically 500 base pairs in length) corresponding to the mRNA from an expressed gene. ESTs are sequenced from a complementary DNA (cDNA) library, and UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene you get information on its abundance and its regional distribution.

UniGene clusters (human):
Cluster size       Number of clusters
1                  8,100
2                  38,200
3-4                23,300
5-8                12,000
9-16               5,600
17-32              3,700
500-1000           1,050
2000-4000          100
8000-16,000        12
16,000-30,000      2

Conclusion: UniGene is a useful tool to look up information about expressed genes. It displays information about the abundance of a transcript (expressed gene), as well as its regional distribution of expression (e.g. brain vs. liver).

[3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI)
Using Ensembl to access protein and DNA sequences

[4] ExPASy Sequence Retrieval System (separate from NCBI)
Using ExPASy to access protein and DNA sequences: click the SRS retrieval system.

Example of how to access sequence data: HIV-1 pol
There are many possible approaches. Begin at the main page of NCBI and type an Entrez query: hiv-1 pol
Searching for HIV-1 pol: following the "genome" link yields a manageable three results.
For the Entrez query hiv-1 pol there are about 40,000 nucleotide or protein records (and 100,000 records for a search for "hiv-1"), but these can easily be reduced in two steps:
-- specify the organism, e.g. hiv-1[organism]
-- limit the output to RefSeq!
There is only 1 RefSeq entry among over 100,000 nucleotide entries for HIV-1.

II. Submitting sequence information

"We used to think our fate was in our stars. Now we know, in large measure, our fate is in our genes."
-- James Watson
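Returning to the RefSeq identifier formats listed in Part I: because each RefSeq accession begins with a two-letter prefix and an underscore, the record type can be read directly off the prefix. The Python sketch below assumes only the four prefixes given in these notes (NC_, NT_, NM_, NP_); the function name `classify_refseq` is hypothetical, and any other two-letter RefSeq-style prefix is simply reported as "other RefSeq type".

```python
import re

# Prefixes taken from the RefSeq formats listed in these notes; real RefSeq
# uses more prefixes, which this sketch reports as "other RefSeq type".
REFSEQ_TYPES = {
    "NC_": "complete genome or chromosome",
    "NT_": "genomic contig",
    "NM_": "mRNA (DNA format)",
    "NP_": "protein",
}

# Two capital letters, an underscore, digits, and an optional ".version".
REFSEQ_PATTERN = re.compile(r"([A-Z]{2}_)\d+(?:\.\d+)?")


def classify_refseq(accession: str) -> str:
    """Return the RefSeq record type implied by an accession's prefix."""
    match = REFSEQ_PATTERN.fullmatch(accession.strip().upper())
    if match is None:
        return "not RefSeq"
    return REFSEQ_TYPES.get(match.group(1), "other RefSeq type")


if __name__ == "__main__":
    # Example accessions for RBP4 taken from the examples list.
    for acc in ["NM_006744", "NP_006735", "NT_030059", "X02775", "Q28369"]:
        print(acc, "->", classify_refseq(acc))
```

Running the script prints each example accession with its inferred type; X02775 and Q28369 come back as "not RefSeq", consistent with their GenBank and SwissProt origins in the examples list.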
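The two-step narrowing in the HIV-1 pol example (specify the organism, then limit the output to RefSeq) amounts to adding terms to the Entrez query. The sketch below only builds query strings and a URL, and sends no request. It assumes the `refseq[filter]` search term and NCBI's E-utilities esearch endpoint, neither of which appears in these notes; the helper names `build_query` and `esearch_url` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities esearch (not given in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(text: str, organism: Optional[str] = None,
                refseq_only: bool = False) -> str:
    """Join the free-text query with optional restrictions using AND."""
    parts = [text]
    if organism:
        # Step 1: restrict to one organism with the [organism] field tag.
        parts.append(f"{organism}[organism]")
    if refseq_only:
        # Step 2: keep only RefSeq records (assumed filter term).
        parts.append("refseq[filter]")
    return " AND ".join(parts)


def esearch_url(db: str, term: str) -> str:
    """Build (but do not fetch) an esearch URL for the given database."""
    return f"{ESEARCH_URL}?{urlencode({'db': db, 'term': term})}"


if __name__ == "__main__":
    query = build_query("hiv-1 pol", organism="hiv-1", refseq_only=True)
    print(query)
    print(esearch_url("nucleotide", query))
```

For example, `build_query("hiv-1 pol", organism="hiv-1", refseq_only=True)` returns `hiv-1 pol AND hiv-1[organism] AND refseq[filter]`; keeping each restriction optional mirrors the notes, where each step can be applied on its own.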
Title: 02 Sequence Query and Submission
Link: https://www.777doc.com/doc-3150614 .html