2013 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 3, Lecture 6
István Albert
Bioinformatics Consulting Center, Penn State

Setup: facilitates later work
1. Create a folder (directory) for source code, src, and one for shortcuts to binary programs, bin.
2. Put everything that you download into src; we will link to the binaries via the bin folder.
3. Optional: add the bin folder to the PATH environment variable:
   export PATH=$PATH:~/bin

BLAST: Basic Local Alignment Search Tool
It is a heuristic search: it may not find all matches (although misses are rare), but it is very fast and efficient at searching for alignments in large amounts of data.
We'll demonstrate the setup with the BLAST+ command-line tools.

Download BLAST+
There are two versions: BLAST+ (new BLAST) and legacy BLAST (old BLAST). Tutorials don't always label them properly.

Terminology
• New BLAST+ → uses programs such as makeblastdb, blastn and blastp, and has search tasks such as megablast
• Old BLAST → uses programs such as formatdb, blastall and megablast, and has search strategies such as blastn and blastp
(If you find the above confusing, you are not alone.)

Get the binary for your system
Download and unpack it, then move the entire BLAST+ directory into the src folder you set up before.

The list of BLAST+ tools

Link programs into the bin folder
ln -s → creates "symbolic links" (shortcuts)
Make sure to fully specify both paths: ln -s source destination
You can now run the tool through its link in the bin folder.

Easier to run the tools you need
Every command linked into your bin folder can now be run via ~/bin/command.
You will need to link the rest of the commands into the bin folder as needed.
Alternatively, alter the PATH variable:
   export PATH=$PATH:~/src/ncbi-blast-2.2.28+/bin/

Running BLAST+
• We need a target database that we want to search (we can download target databases or make one from existing data).
• We need a query that we will search with.
• We need a search strategy → what is it that we are looking for?

Prepare a target database
• Extract the sequence part from the file in lecture 2. This will extract the FASTA sequence from the GFF file.
Here is our sequence file in FASTA format.

Formatting our target database
Keep reference genomes in separate folders. Tools usually create a large number of additional files that would clutter your workspace.
Link the makeblastdb program into the bin folder, then create the BLAST-searchable database.
Indexing (creating a database) → reformatting the data to support fast searching.

Create a small dataset and run the BLAST search on it
There are different search strategies. The one that we use searches nucleotides against nucleotides: blastn (we'll cover other options in the next lectures).
Extract the first three lines from the yeast genome and make that into a query.
This should work well; after all, we are searching the big data with a small part of itself.
The default BLAST results file is optimized for understanding the alignments. We will learn more about how to read BLAST output in the next lectures.

Default BLAST results
lowercase letters = low-complexity regions

You can change the output: tab delimited

Homework 6
• With a method of your choice, create a FASTA file that contains three (non-identical) records. The sequence of each record should be at least 80 bases long.
• Use this file as the query and use blastn to align it against the yeast genome. Report:
  1. how many alignments you get
  2. how many chromosomes had hits
  3. the lowest percent identity that was still reported
(If you get no alignments, you need to create different query sequences.)
Tip: using the >> characters to redirect (instead of >) will append to a file.
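The Setup slide at the start of this lecture comes down to a few shell commands. This is a minimal sketch that assumes a bash-like shell. It uses $HOME where the slide writes ~; the two mean the same here, but $HOME also expands inside double quotes.

```shell
# Folders used throughout the course (the slides write them as ~/src and ~/bin):
#   src holds everything you download, e.g. the unpacked BLAST+ directory
#   bin holds symbolic links (shortcuts) to the programs you run
mkdir -p "$HOME/src" "$HOME/bin"

# Optional: also search ~/bin for commands in this shell session.
# Same effect as the slide's  export PATH=$PATH:~/bin
export PATH="$PATH:$HOME/bin"

# List the directories that are now searched, one per line.
echo "$PATH" | tr ':' '\n'
```

The export line only affects the current terminal session. To make it permanent, you could add it to a shell start-up file such as ~/.bashrc (assuming bash is your shell).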
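For the "Link programs into the bin folder" slide, a loop can create several links at once, with both paths fully specified. The sketch assumes BLAST+ 2.2.28+ was unpacked to ~/src/ncbi-blast-2.2.28+, the version named in the PATH example. The three tools in the loop are only an illustration.

```shell
# Assumed location of the unpacked BLAST+ download; adjust to your version.
BLAST_HOME="$HOME/src/ncbi-blast-2.2.28+"

mkdir -p "$HOME/bin"

# ln -s SOURCE DESTINATION, both given as full paths.
# -f replaces an existing link, so the loop can safely be re-run.
for tool in makeblastdb blastn blastp; do
    ln -sf "$BLAST_HOME/bin/$tool" "$HOME/bin/$tool"
done

# Each entry in ~/bin is now a shortcut pointing back into ~/src.
ls -l "$HOME/bin"
```

ln -s does not check that the source exists, so a typo silently produces a broken link. Running ~/bin/blastn -version afterwards is a quick way to confirm the link reaches a working program.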
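The next two steps extract the sequence from the lecture 2 GFF file and cut out a three-line query; they could look like the sketch below. It assumes the GFF file stores its sequences after a ##FASTA line, which the GFF3 format allows. gff_to_fasta is a hypothetical helper name, and demo.gff is a small made-up stand-in for the real file.

```shell
# Hypothetical helper: print every line after the ##FASTA directive of a GFF3 file.
gff_to_fasta() {
    awk 'in_fasta { print } /^##FASTA/ { in_fasta = 1 }' "$1"
}

# Made-up miniature GFF file so the sketch runs on its own.
{
    printf '##gff-version 3\n'
    printf 'chrI\tdemo\tgene\t1\t20\t.\t+\t.\tID=gene1\n'
    cat <<'EOF'
##FASTA
>chrI
ACGTACGTACGTACGTACGT
TTGACCATGGTTGACCATGG
GGGCCCAAATTTGGGCCCAA
>chrII
CCCCGGGGAAAATTTTCCCC
EOF
} > demo.gff

# Keep only the FASTA part: the annotation lines are dropped.
gff_to_fasta demo.gff > demo.fa

# The query from the slides: the first three lines of the genome file
# (one header line plus two sequence lines).
head -n 3 demo.fa > query.fa
cat query.fa
```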
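The database and search steps can be wrapped in three small shell functions. The function and file names (build_db, search_default, search_tabular, sc.fa, refs/sc) are hypothetical. The makeblastdb and blastn options used (-in, -dbtype nucl, -out, -query, -db, -outfmt 6) are standard BLAST+ options. Defining the functions does not call BLAST, and the commented lines at the end show the order of the steps.

```shell
# Hypothetical wrappers around the BLAST+ steps from the slides.
# Assumes makeblastdb and blastn can be found (linked into ~/bin or on PATH).

# Index a genome FASTA file as a nucleotide BLAST database.
#   $1 = genome FASTA file, $2 = database name (path prefix of the index files)
build_db() {
    makeblastdb -in "$1" -dbtype nucl -out "$2"
}

# Nucleotide-vs-nucleotide search with the default, alignment-style report.
#   $1 = query FASTA, $2 = database name given to build_db, $3 = output file
# Note: blastn runs "-task megablast" unless told otherwise; adding
# "-task blastn" gives the slower, more sensitive classic search.
search_default() {
    blastn -query "$1" -db "$2" -out "$3"
}

# The same search written as a tab-delimited table, one alignment per line.
search_tabular() {
    blastn -query "$1" -db "$2" -outfmt 6 -out "$3"
}

# Example order of the steps (sc.fa and refs/sc are hypothetical names):
#   mkdir -p refs/sc && cp sc.fa refs/sc/    # one folder per reference genome
#   build_db refs/sc/sc.fa refs/sc/sc
#   head -n 3 sc.fa > query.fa
#   search_default query.fa refs/sc/sc results.txt
#   search_tabular query.fa refs/sc/sc results.tab
```

-outfmt 6 is the tab-delimited format from the last slide. Its default columns are:
qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore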
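For the first homework step, one possible method is to cut three different 80-base pieces out of a genome file and append them to the query file with >>, as the tip suggests. This is a sketch under stated assumptions. take_bases is a hypothetical helper, and the start positions are arbitrary. genome_demo.fa is a made-up stand-in so the sketch runs on its own; with the real yeast genome you would pass its FASTA file instead.

```shell
# Made-up 200-base "genome" so the sketch runs without the yeast data.
cat > genome_demo.fa <<'EOF'
>demo
ATGGCGTACGTTAGCCATGACGTTAGGCATCCGATTGACAGGTACCTTGA
CAGTTGCAATGCCGTAGGATTACCGGATCAATGCGTTCAGGATCCATTGC
TTGACGGCATAGCTAGGTCCATTGCAGTCAGGCTTAACGTCCATGGATCA
GCATTCGGAATCCGTAGCTTAGGCATGCAATTCGGACCTAGTACGATCGG
EOF

# Hypothetical helper: print LEN bases starting at 1-based position START
# from the first record of a FASTA file.
#   $1 = FASTA file, $2 = START, $3 = LEN
take_bases() {
    awk -v start="$2" -v len="$3" '
        /^>/ { if (seen++) exit; next }   # stop at the second record
        { seq = seq $0 }                  # join the sequence lines
        END { print substr(seq, start, len) }' "$1"
}

# ">" creates (or empties) the file; every later ">>" appends to it.
echo ">q1" > homework.fa
take_bases genome_demo.fa 1 80 >> homework.fa
echo ">q2" >> homework.fa
take_bases genome_demo.fa 41 80 >> homework.fa
echo ">q3" >> homework.fa
take_bases genome_demo.fa 81 80 >> homework.fa

cat homework.fa
```

Using > on the later lines instead of >> would overwrite the file each time, leaving only the last record.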
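The three numbers the homework asks for can be read off a tab-delimited (-outfmt 6) result file. The sketch below assumes the default column order, where column 2 is the subject (chromosome) id and column 3 is the percent identity. summarize_hits is a hypothetical helper, and it counts every table line as one alignment (one HSP).

```shell
# Hypothetical helper: summarize a blastn table written with -outfmt 6.
# Default columns: qseqid sseqid pident length mismatch gapopen
#                  qstart qend sstart send evalue bitscore
summarize_hits() {
    awk -F'\t' '
        /^#/ { next }                       # skip comment lines, e.g. from -outfmt 7
        NF > 0 {
            n++                             # each line is one alignment (HSP)
            hit[$2] = 1                     # column 2: subject, here a chromosome
            p = $3 + 0                      # column 3: percent identity
            if (n == 1 || p < min) { min = p; min_txt = $3 }
        }
        END {
            c = 0
            for (s in hit) c++
            printf "alignments: %d\n", n
            printf "chromosomes: %d\n", c
            printf "lowest pident: %s\n", min_txt
        }' "$1"
}

# Example usage (results.tab is a hypothetical blastn -outfmt 6 output file):
#   summarize_hits results.tab
```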