2013 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 6, Lecture 11
István Albert
Biochemistry and Molecular Biology and Bioinformatics Consulting Center
Penn State

Tool installation
• A bioinformatician's job requires them to evaluate and install a large number of tools
• The ease of installation usually correlates with the quality of the tool
• Documentation is essential; otherwise the tool is no more than a black box

Package manager for the Mac: Homebrew
It allows you to install some libraries and tools that will be required later.
Linux already has package managers: apt-get, yum, etc.

Steps to installing tools
Determine the distribution type
1. Executable (binary) code. Download the code and you are done. Easy to install, but may not be optimized for your system
2. Source (text) code. Download the code and compile it (see next slides)

Determine the type of the source code
1. Source is of a compiled language that will be turned into a binary program (typically C but could be others)
2. Source is of an interpreted language that will run the code: java, perl, python, ruby

Checklist for source code that needs compilation
1. Does it have a configure script? If yes, then run it:
   ./configure
2. Now run make:
   make
Ideally you should be done. This will create the binary.
(The program may need library dependencies. Then those need to be installed as above.)

Checklist for interpreted languages
1. You need to have the language installed. Most modern computers have perl, python, java installed by default.
2. The source code may have "dependencies", a much dreaded word that could lead to a lengthy procedure of downloading other code that in turn may depend on other code, and so on.

Automated installation
• Language specific: will require installing a language specific package manager
• Python has easy_install and pip, Perl has CPAN, ruby has gem
   easy_install package-name
   or
   pip install package-name
Installing good tools is very easy; not so good ones are mini puzzles; badly designed tools are incredibly frustrating.

Quality Control and Filtering
• Removing or altering the data based on objective measures
• Isn't that data massaging?
• Good question: one needs to be very careful not to bias the data

Understanding sequencing
• Library prep has many steps
• Sequencing may introduce artifacts
• Always try to understand what the instrument does and what may happen when things are not optimal
• See Short Guide to Illumina sequencing on the webpage

Random DNA fragment sequencing with Illumina
[Diagram with the labels: Fragmentation, Adapter ligation, Sequencer, One read]

FastQC report shows biases
Our job is to fix this and we need to install tools for that.

Quality control operations
• Modify the FASTQ records to remove data that was labeled as being inaccurate
Typical operations are to
• remove (discard) reads - careful with this!
• shorten reads (trimming) by quality or by removing patterns

Fastq Quality Shootout
Biostar Question of the Day

Tool List
• Seqtk: fastest tool
• Cutadapt: adaptor cutting
• NGS Toolkit: perl, has good manual
• Trimmomatic: java, somewhat obscure usage
• Prinseq: beautiful manual and website, appears to be slow
• Biopieces: it is not a tool, it is more of a life-style. Lots of installation steps.

Installation
One time tasks (see code repository):
• Mac: install Homebrew
• Using Homebrew, install git
• Using easy_install, install pip
Use git and pip to install tools in the future
• clone the seqtk repository with git and make it, link to ~/bin
• pip install cutadapt, does not need to be linked

Other tools
• See the Shootout: serves as supplementary information
• Quality control often goes way beyond read manipulation and can be thought of as a pre-analysis; at that point it should not be called QC though.
• Some tools may have particular features that directly apply to your research

Homework 11
• Install cutadapt and seqtk
• Use data sample1.fq and sample2.fq distributed with Lecture 10
• Remove adapters with cutadapt and/or trim your sequences with seqtk (you may use other tools as well)
• Run FASTQC on the cut/trimmed data. Select a plot from each report and explain the differences that you see.
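
The two typical operations named under "Quality control operations" (shortening reads by quality and discarding reads) can be illustrated with a minimal Python sketch. This is not the algorithm used by seqtk, cutadapt, or any other tool in the list; it is a deliberately simple rule that cuts bases from the 3' end while their quality is below a threshold and then drops reads that became too short. The function names, the threshold of 20, the minimum length of 5, and the Phred+33 quality encoding are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record type for this sketch: (header, sequence, quality string)
Record = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below min_q.

    Assumes Phred+33 encoding (score = ord(char) - 33).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def quality_filter(records: Iterable[Record], min_q: int = 20, min_len: int = 5) -> Iterator[Record]:
    """Trim every read, then discard reads shorter than min_len."""
    for header, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            yield header, seq, qual


# Made-up example data: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
example = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIII##",
    "@read2", "ACGTAC", "+", "I#####",
]

kept = list(quality_filter(parse_fastq(example)))
for header, seq, qual in kept:
    print(header, seq, qual)
```

With the made-up reads above, read1 loses its two low-quality trailing bases and read2 is discarded because only one base survives trimming. The second outcome is the "careful with this!" case: discarding reads changes the composition of the data set, so the thresholds deserve a deliberate choice.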