2013 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 6, Lecture 12
István Albert
Biochemistry and Molecular Biology and Bioinformatics Consulting Center, Penn State

Install bioawk
• bioawk - a version of awk that is bioinformatics aware
• Can read fastq files one record at a time
• Can reverse complement sequences, understands many bioinformatics file formats

bioawk examples
See more details and examples on the course webpage

Sequencing technologies
• DNA → library prep (fragments) → sequencing
• DNA is double stranded
• Sequencing operates on a single strand, thus each strand will be sequenced separately

Paired end sequencing
• More information: connect reads that belong to the original fragment
• Nomenclature: paired-end and mated-pairs are different technologies
• The technology is vendor specific with quirks and tacit assumptions

Paired end (PE) sequencing (most common)
[Figure: a DNA fragment with its forward and reverse strands, the sequencing direction on each strand, and the insert size]
• Sequences both ends of the same DNA fragment
• We end up with two reads that are known to have come from the different strands of the same DNA fragment
• Insert sizes 200-600bp

Paired end (PE) sequencing: short fragments, long reads
[Figure: the same DNA fragment, with the two reads overlapping]
• Same setup as above: both ends of the same DNA fragment are sequenced, insert sizes 200-600bp
• The two reads overlap
• Read merging/stitching

Mate-pair (MP) sequencing
• Mated pair insert sizes → 2000-5000bp long (may change as new protocols are developed)
• SOLiD Mate-Pair protocol
[Figure: reads F3 and R3, same strand]

Dealing with paired data
• Make sure to understand which parts of the DNA fragments have been sequenced.
• Consult your sequencing operator for details on the library preparation.
• When in doubt you can operate in single end mode, then visualize the results (covered in later lectures)
• Verify how the pairs are located relative to one another. (sanity check)
• Consult vendor materials → comprehensive but will also contain a lot of details that are not relevant

More strategies
• Just about all aligners can deal with standard paired end (PE) sequencing data
• A few can deal with mate-pair (MP) and their variations → see Novoalign, check vendor recommended tools
• Finally you may turn the pairs into standard PE by reverse complementing the proper reads.

Competing representations
SE - single end reads, PE - paired end reads
Paired end reads come in either
• two files with the exact same number of lines and IDs, where a pair is present on the same line
• a single file where pairs are consecutive records (interleaved)

The read order is now also essential
• Regardless of representation one now needs to ensure that the order of reads will keep matching
• Read removal needs to take place on both files, or on both records of the pair if the file is interleaved.

Quick PE checklist
• How are my pairs oriented?
• How is the data formatted?
  – are the reads in the same file (interleaved?)
  – are the reads in separate files?
  – what is the naming convention?
  – what is the expected insert (fragment) size and its distribution (minimum, maximum insert sizes)

Summary: paired end vs mated pairs
• Paired ends is supported by some technologies where it is possible to sequence from both ends of a clone.
• Mate pairs involves making circular fragments using a linker sequence, fragmenting them around the linker, and then sequencing the result
• The distance between mate pairs is much longer (2-5kb), while paired-end fragments are rarely more than 500bp apart
• The technologies keep evolving within a year → make sure to ask questions from the facility managers!

Optional: install Trimmomatic
• It is a great tool to deal with paired end reads
• Lacks some options that cutadapt has
• But it has options cutadapt does not directly support

Homework 12
• Download dataset lect12.tar.gz
• It contains a paired end read dataset
• Using techniques learned in the past two lectures identify problems that the data may have and improve its overall quality
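
Example: keeping pairs in sync
The rules from the slides on read order and on reverse complementing can be sketched in a few lines of Python. This is a minimal illustration and not part of the original lecture. The function names (read_fastq, base_id, reverse_complement, filter_pairs) and the min_len threshold are hypothetical. The naming convention is an assumption as well: read names are taken to end in /1 and /2, or to carry the mate number after a space. Check what your own data uses before relying on it.

```python
import io
from itertools import zip_longest

def read_fastq(handle):
    """Yield (name, sequence, quality) from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline()
        if not header.strip():          # end of file (or trailing blank line)
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()               # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual

def base_id(name):
    """Strip the mate marker so both reads of a pair give the same ID.

    ASSUMPTION: names look like 'frag1/1' and 'frag1/2', or 'frag1 1:N:0:...'.
    """
    name = name.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Reverse complement a sequence.

    When applied to a FASTQ record, the quality string must be reversed too.
    """
    return seq.translate(COMPLEMENT)[::-1]

def filter_pairs(fwd, rev, min_len=20):
    """Yield pairs where BOTH reads are at least min_len long.

    A failing read removes its mate as well, so the two outputs stay in the
    same order. Raises ValueError if the inputs are not properly paired.
    """
    for r1, r2 in zip_longest(read_fastq(fwd), read_fastq(rev)):
        if r1 is None or r2 is None:
            raise ValueError("files contain different numbers of records")
        if base_id(r1[0]) != base_id(r2[0]):
            raise ValueError(f"pairs out of sync: {r1[0]} vs {r2[0]}")
        if len(r1[1]) >= min_len and len(r2[1]) >= min_len:
            yield r1, r2

# Tiny in-memory demo: frag2 has a 2 bp forward read, so the whole pair is dropped.
R1 = "@frag1/1\nACGTACGT\n+\nIIIIIIII\n@frag2/1\nAC\n+\nII\n"
R2 = "@frag1/2\nTTGCAAGC\n+\nIIIIIIII\n@frag2/2\nGGGGCCCC\n+\nIIIIIIII\n"

for read1, read2 in filter_pairs(io.StringIO(R1), io.StringIO(R2), min_len=4):
    print(read1[0], read2[0])
```

For real data, a dedicated tool such as Trimmomatic in paired end mode is the safer choice. The sketch only makes the bookkeeping explicit: every filtering decision is taken per pair, never per read.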