2013 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 10, Lecture 19
Nick Stoler
The Huck Institutes of the Life Sciences
Penn State

Sequence data to genotypes
● A common sequencing workflow
  - Sequencing reads (FASTQ): a list of short sequences
  - Alignments (SAM/BAM): a list of short sequences and where they are in the genome
  - Variant calls (VCF): a list of locations in the genome and what the base is at each

What are variant calls?
● Naive variant calling
  - Check all the reads that cover base chr1:291
  - Add up the bases at chr1:291
  - e.g. 10 A's, 2 G's
    ∙ Is this an A/G heterozygous site or two sequencing errors?
● Actual variant callers
  - Estimate likelihood of a variant site vs a sequencing error
    ∙ Sequencing error rate
    ∙ Quality scores

VCF: Variant Call Format
● Represent a list of locations and the variant call at each
  - Simple, right?
● Yes and no.
  - Simple foundation
    ∙ Location and base
  - Complex “bonus features”
    ∙ Indels, structural variants, etc.
    ∙ Multiple samples
    ∙ Haplotype phasing

VCF: The simple part
● location, reference base, your base
  - CHROM/POS, REF, ALT
  - a lot like wgsim's mutations.txt

VCF: The rest

VCF: The full column list
● Variant call confidence
  - like Phred score and MAPQ

VCF: Multiple variants
● What if your reads have more than 1 base at one location?
  - wgsim's mutations.txt
    ∙ IUPAC notation
● VCF just gives comma-separated lists
    REF  ALT
    A    A,C

VCF: Complex variants
● Can show short indels
    REF  ALT
    C    CT   (insert T)
    ACG  A    (delete CG)

VCF: Multiple samples
● VCF can have a variable number of columns!
● Column headings are the sample names

● VCF can represent SNV calls
● and much, much more
  - Indels (G → GC)
  - Multiple variants per site (in ALT column)
  - Multiple samples (SAMPLE columns)
● Check poster for quick overview
● Check full specification for details

● Samtools mpileup → BCF
  - BCF is to VCF as BAM is to SAM
    ∙ (roughly)
  - The BCF doesn't hold actual calls
    ∙ encodes likelihoods for all variants
● Bcftools view → VCF
  - Performs the actual variant calling

  -u: uncompressed output
  -D: include read depth in output
  -f: use ../refs/sc.fa as reference
  -v: only output non-reference sites
  -c: do SNP calling
  -g: call genotypes at variant sites

Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (2011) 27(21):2987-2993.

More mpileup tricks
● Combine multiple BAM files into one BCF
● Only include one region

Homework 19
● Take your mutations.txt file from wgsim (or create another one) and create a partial VCF file from the first 10 lines (but skip ones with indels)
  - Only the last header line (#CHROM)
  - Only the first 5 columns
  - Refer to IUPAC nucleic acid codes for non-ACGT bases
    ∙ means it generated reads with both A and T at this location
● Use samtools/bcftools to create a full VCF file from the alignments you created in the previous homework
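
The question raised by the naive variant calling example (10 A's and 2 G's at chr1:291: heterozygous site or two sequencing errors?) can be made concrete with a small likelihood comparison. The sketch below is an illustration only, not the model samtools/bcftools actually uses: it assumes a single fixed error rate of 0.01, ignores per-base quality scores, and treats a heterozygous site as a plain 50/50 coin flip. The function names are made up for this example.

```python
from collections import Counter
from math import comb, log10


def count_bases(pileup_bases):
    """Naive step: tally the bases observed at one position."""
    return Counter(pileup_bases.upper())


def genotype_log10_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Binomial log10-likelihood of the counts under three genotypes.

    Assumption (not from the lecture): every read has the same error
    rate, which must lie strictly between 0 and 1.
    """
    n = ref_count + alt_count
    # Probability that a single read shows the ALT base, per genotype.
    p_alt_by_genotype = {
        "hom_ref": error_rate,        # ALT reads can only be errors
        "het": 0.5,                   # simplified: half the reads are ALT
        "hom_alt": 1.0 - error_rate,  # REF reads can only be errors
    }
    likelihoods = {}
    for genotype, p_alt in p_alt_by_genotype.items():
        likelihoods[genotype] = (
            log10(comb(n, alt_count))
            + alt_count * log10(p_alt)
            + ref_count * log10(1.0 - p_alt)
        )
    return likelihoods


def most_likely_genotype(ref_count, alt_count, error_rate=0.01):
    """Return the genotype label with the highest likelihood."""
    likelihoods = genotype_log10_likelihoods(ref_count, alt_count, error_rate)
    return max(likelihoods, key=likelihoods.get)


counts = count_bases("AAAAAAAAAAGG")  # the slide's example: 10 A's, 2 G's
print(counts["A"], counts["G"])
print(most_likely_genotype(counts["A"], counts["G"]))
```

Under these simplified assumptions the 10/2 split favors the heterozygous call, while an 11/1 split favors homozygous reference with one sequencing error. Real callers also weigh each base's quality score, so their answers can differ.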
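
To see how the VCF pieces described above fit together (CHROM/POS, REF, ALT, comma-separated ALT lists, short indels, and one extra column per sample), here is a minimal reader sketch. It is not a full VCF parser: it only looks at the first five columns and the sample columns, and the records and sample names (sampleA, sampleB) are invented for illustration.

```python
def classify_allele(ref, alt):
    """Label one ALT allele relative to REF by comparing lengths."""
    if alt == ".":
        return "no_alt"
    if alt.startswith("<"):
        return "symbolic"  # e.g. structural variants; not handled here
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"


def parse_vcf(lines):
    """Return (sample_names, records) from an iterable of VCF lines."""
    sample_names = []
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns after the 9 fixed ones are the sample names.
            sample_names = fields[9:]
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        alts = alt.split(",")  # multiple variants at one site
        records.append({
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alts": alts,
            "types": [classify_allele(ref, a) for a in alts],
            "samples": dict(zip(sample_names, fields[9:])),
        })
    return sample_names, records


# Hypothetical example data, tab-separated as VCF requires.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "sampleA", "sampleB"]
rows = [
    ["chr1", "291", ".", "A", "G", "50", ".", ".", "GT", "0/1", "0/0"],
    ["chr1", "300", ".", "C", "CT", "40", ".", ".", "GT", "1/1", "0/1"],
    ["chr1", "310", ".", "ACG", "A", "35", ".", ".", "GT", "0/1", "0/1"],
    ["chr1", "320", ".", "A", "C,G", "60", ".", ".", "GT", "1/2", "0/1"],
]
example_lines = ["\t".join(header)] + ["\t".join(r) for r in rows]

sample_names, records = parse_vcf(example_lines)
for rec in records:
    print(rec["chrom"], rec["pos"], rec["ref"], rec["alts"], rec["types"])
```

The length comparison in classify_allele mirrors the slide's indel examples: C to CT is an insertion of T, and ACG to A is a deletion of CG.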