您好,欢迎访问三七文档
当前位置:首页 > 商业/管理/HR > 资本运营 > DNA序列拼接中deBruijn图结构的研究-王东阳
DNA。DNAEuler。EulerdeBruijndeBruijnk-merk-merk-1k-mer。k-merk-2deBruijn。k-merdeBruijn。TP309.2A2095-2163201102-0020-07StudyofDeBruijnGraphforDNASequenceAssemblyWANGDongyangRENShijunWANGYadongAbstract:DNAsequencingisoneofthemostbasicdirectionsofbioinformaticsresearch.However,mostgenomesarenotaone-timegain.SoDNAassemblytechniqueisusedtosplicethefragmentobtainedinexperiments.Recently,thefragmentsobtainedinexperimentsbecomeshorter.TheEulerPathalgorithmhasmoreadvantagestodealwiththeseshorterfragments,theconstructionofdeBruijngraphisakeystepoftheEulerPathalgorithm.k-1basepairoverlapisalwaysmadebetweentwok-mers.Butthestudyofthispaperfindsthatiflessthank-2basepairoverlapbetweentwok-mersismade,theconstructionofdeBruijngraphwillbechangedstrongly.Thispapermakesadetailedanalysisoftheseeffects,anddesignsanexperimenttoverifytheanalysis.Theresultoftheexperimentshowsthatthedislocationofk-merswillsignificantlyaffecttheconstructionofdeBruijngraph.Keywords:2011-07-221985-1962-、、1964-),,,,,,:。0、、、,,DNA,。,,DNA(basepair,bp),DNADNA。DNA:readF,DNAS,DNAFS。,:HamiltonEuler。HamiltonNP-,。EulerDNAdeBruijnEuler[1-4],Pevzner[5]SBH(SequencingbyHybridization)。deBruijn:(1)readF=邀f1,f2,…,fn妖,read(k-mer)。k-merk,read,read,k,,k,,k-merread,k-merdeBr-uijn。(2)k-meruv,uk-1vk-1,uv[6]。deBruijn,Euler。deBruijn,Fread,k-mer3,read:AGATACTACTTCAread(1)k-mer,1。DNAdeBruijnDNAEulerdeBruijnBioinformatics;GenomeSequencing;DNAAssembly;EulerPath;DeBruijnGraph150001(SchoolofComputerScienceandTechnology,HarbinInstituteofTechnology,Harbin150001,China)INTELLIGENTCOMPUTERANDAPPLICATIONS1220118Vol.1No.2Aug.2011AGAGATATATACACTCTTTCATTC1deBruijnReadAGATACTk-mer:AGAGATATATACACT1readk-merATCTTCTTCTTCTTCCTCCGCTTATTATTATTATTCTTCC2TipAAGAAGACGACTACTGCGACACTCTCGACTCG4repeatTCTTCTTATTATTATTATTCTTCGATCTTCTACTAATAATAATT3Bubble,read,(2)k-mer,1deBruijn。,Euler:EULER-SR[7]、Vel-vet[8]、[9]IDBA[10]。Euler,deBruijn,,deBruijn,k-mer,deBruijn,。deBruijn,k-mer,deBruijn,,。,k-merdeBruijn。1deBruijn,EulerdeBruijn,,。1.1(1)Tip[8]read,。read。,read:ATCTTCCGATCTTATTCCread,,k-mer4,deBruijn2。2,Euler,。(2)Bubble[8]read,。,readTip。read:ATCTTATTCGATCTAATTCGread,,k-mer4,deBruijn3。,Euler,。1.2(repeat)AAGACTCGACTG,k-mer4,deBruijn4。,,,,。Myers[11]mate-pair,mate-pairread,,()。,mate-pair,。2deBruijndeBruijn,·21·2DNAdeBruijnTCTCTCTGCTGTGTATATCT98deBruijnATTCATTCTCTCTCTGATCTGTATCTGT8deBruijnATTCTTCTTCTCCTCTTCTGCTGTATCTTATCGTATTGTA7deBruijnAAGAGACTCTCGCGACACTG62RepeatATCTCTTCTCCGCTTATATTTTCC52TipRead:AGATACTk-mer:AGAATAACT2k-merreadk-merdeBruijn,k-merk-merk-1,k-mer,,,k-mer,k-merk-2,deBruijn。,k-mer,2(read1),deBruijn,,deBruijn。read。k-mer4deBruijn,read,。(1)Tipread:ATCTTCCGATCTTATTCCdeBruijn5。,5,,read,,,。(2)BubbleBubbleread,BubbleTip,,。(3)Repeatread:AAGACTCGACTGdeBruijn6。,,,6Repeat,,,,。,,deBruijn,,TipBubble,Repeat。deBruijn。,,,k-mer,,,read:ATTCTCTGTCTGTATCTdeBruijn7。ATTCTCTGTATCT。,readdeBruijn8。readk-mer,。read,readk-merread,k-merread,k-mer,k-mer,8,k-merTCTGk-merCTGT,,,k-merTCTGk-merCTGT,k-mer,deBruijn:ATTCTCTGTATCT,。,,k-merk-1,k-1·22·TipBubbleRepeat10deBruijnmerk-2,,k-mer,,repeat,。3deBruijn,Repeat,,,:N:readL:read,readK:k-mer,i:,deBruijnrepeat,:,,,,,,Repeat,TipBubble,,repeat,f(i),f(i)repeat。f(i)f(i)。3.1deBruijn1,deBruijnreadk-mer,readL-K+1k-mer;2,k-merk-mer,readk-mer(L-K+1)2;,read,i1k-mer(i-1)(L-K+1)ik-mer,k-mer(L-K+1)i。Nreadk-merN(L-K+1)i,k-mer,N(L-K+1)i,,Nread。,:,。,an,,,an+1,,k-merk-mer,,14k,an14k,4k-an4k(k4k)。,,k-mer:an+1=an4kan+(4k-an4k)(an+1)(1):an+1=(1-14k)an+1(2)a1=1,an:an=4k+(1-4k)(1-14k)n-1(3)n。Nread,N(L-K+1)ik-mer,:aN(L-K+1)i=4k+(1-4k)(1-14k)[N(L-K+1)i-1](4)3.2deBruijn,read,k-mer,:(1)k-merreadk-mer,k-merk-i;(2)k-merreadk-mer,k-ireadk-merk-merk-i。k-mer,,Nreadk-mer()(,(4)aN(L-K+1)i),N(L-K+1)i-aN(L-K+1)i,:N(L-K+1)i-aN(L-K+1)i=N(L-K+1)i-[4k+(1-4k)(1-14k)[N(L-K+1)i-1]](5)N(L-K+1)i1,:·23·2DNAdeBruijnN(L-K+1)i-aN(L-K+1)i=N(L-K+1)i-[4k+(1-4k)(1-14k)N(L-K+1)i](6)readk-mer。,readk-merreadk-mer。,:k-mer,readk-merreadk-mer,,readk-mer。k-merk-merk-i,14k-i,,k-mer,,,(N-1)(L-K+1)i(),deBruijn,14k14k-i,readN(L-K+1)i(N-1)(L-K+1)i,readk-merreadk-mer:h(i)=4k-i+(1-4k-i)(1-14k-i)[(N-1)(L-K+1)i-1](7)(N-1)(L-K+1)i1,Nread,1,Nread:h(i)=N[4k-i+(1-4k-i)(1-14k-i)N(L-K+1)i](8),i,,:f(i)=N(L-K+1)i-[4k+(1-4k)(1-14k)N(L-K+1)i]+N[4k-i+(1-4k-i)(1-14k-i)N(L-K+1)i]i,f(i)。f(i),,,A=N(L-K+1),T=4k,T1,:f(i)=Ai-[T-T(T-1T)Ai]+N[T4-i+(1-T4-i)(1-4iT)Ai]g(i)=(1-T4-i)(1-4iT)Ai,g'(i)=(1-T4-i)(1-4iT)Ai[Tln44i-T+A4iln4(4i-T)i-Aln(1-4iT)i2],f(i)f'(i)=-Ai2-T(T-1T)Ailn(T-1T)Ai2-NT4-iln4+N(1-T4-i)(1-4iT)Ai[Tln44i-T+A4iln4(4i-T)i-Aln(1-4iT)i2]f(i),,f'(1)f'(k-1),f(i)。f'(1)=-A-(1-1T)AA-NTln44+N(1-T4)(1-4T)A[Tln44-T+4ln4A4-T-Aln(1-4T)]1T4,(1-T4)-T4,1-4T1,ln(1-4T)0,T4,4-T-T,:f'(1)=-A-(1-1T)AA-NTln44[1-(1-1T)A-AT]A=N(L-K+1),T=4k,Nread,Lread,Kk-mer,k10,TA,,1-(1-1T)A-AT>0,f'(1)<0。f(i)。f'(k-1)=-A(k-1)2-(1-1T)Ak-1A(k-1)2-4Nln4-3N(34)Ak-1[-43ln4-Aln43(k-1)-Aln34(k-1)2]f'(k-1)Ak。,,f(i),read,readk-mer。4:f(i):read,read,read,read,read;readL;k-merk。:i(0<i<k,),,f(i)。:forifrom1tok-1,。forto,,read,readik-mer,,read,k-mer。1·24·2E7300400GBWindowsVistaHomeBasic3ideBruijnfiread36,k-mer18read36,k-mer20119102
本文标题:DNA序列拼接中deBruijn图结构的研究-王东阳
链接地址:https://www.777doc.com/doc-5660068 .html