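上海交通大学 (Shanghai Jiao Tong University) Master's Thesis
DNA Sequencing Analysis and Classification of Microbial Communities
Author: 皮雄军 (Pi Xiongjun)
Degree applied for: Master
Major: Computer Software and Theory
Advisor: 张丽清 (Zhang Liqing)
Date: 2006-01-01

ABSTRACT

Bioinformatics is an interdisciplinary field that has developed rapidly in recent years. It emerged because traditional biological disciplines could not cope with massive volumes of biological data. It combines the tools of mathematics, computer science, and biology, and covers every aspect of handling biological information: acquiring, processing, storing, analyzing, and interpreting biological data in order to clarify the biological significance beneath the data, which will certainly benefit the development of biology.

Microbial community sequencing is a fast-developing research field. By extracting microbial community samples from environments such as the sea or groundwater and analyzing the genomes of the microorganisms, researchers have discovered many new genes. Exploring the relationships among these genes and their effects on the environment is of great significance for environmental protection and ecosystems.

Because microbial community sequencing differs greatly from traditional sequencing, its problems cannot be solved by traditional methods. In traditional sequencing, high precision is important, so coverage is high, typically 8-10. In community sequencing, the large number of species and the high cost of sequencing mean that not all species can be sequenced, and many of the contigs produced by assembly do not overlap. Nevertheless, clustering the fragments from the same genome into one group is biologically significant.

A number of string and text processing techniques have been introduced into DNA sequence analysis, since a DNA sequence can be regarded as text over a four-letter alphabet. A number of feature extraction methods have been developed so far, many of them based on word frequency vectors. For the concrete problem in our project, we use relative entropy as a metric between DNA sequences.

In this thesis, we develop a novel method for blind DNA clustering. First, we compute the entropy vector between DNA sequences; then a BP neural network is applied to obtain the similarity between sequences; finally, we adopt a clustering method based on K-Means to group the sequences from the same microorganism. Experiments show that our algorithm performs well.

KEYWORDS: microorganism community, neural network, clustering, relative entropy

ABBREVIATIONS

EST: Expressed Sequence Tag
ANN: Artificial Neural Network
BP: Back Propagation
RBF: Radial Basis Function
PNN: Probabilistic Neural Network
SVM: Support Vector Machine
LMS: Least Mean Square
LS: Least Squares
RLS: Recursive Least Squares
OLS: Orthogonal Least Squares
SLT: Statistical Learning Theory
ERM: Empirical Risk Minimization
LOOCV: Leave-One-Out Cross Validation
LKOCV: Leave-K-Out Cross Validation
FNN: Feedforward Neural Network
KNN: K-Nearest Neighbors
PDF: Probability Density Function
SOM: Self-Organizing Map
KLD: Kullback-Leibler (K-L) Divergence
NCBI: National Center for Biotechnology Information
A: Adenine
G: Guanine
C: Cytosine
T: Thymine
DNA: Deoxyribonucleic Acid
BLAST: Basic Local Alignment Search Tool

1.3.1 Coverage

The coverage of a sequencing project is $C = N \cdot r / G$, where $N$ is the number of reads, $r$ is the read length, and $G$ is the genome length. The probability $P$ that a given base is covered by at least one read is

$$P = 1 - (1 - r/G)^{N} \tag{1.1}$$

1.4 Organization

The thesis has six chapters. Chapter 2 covers the representation of DNA sequences. Chapter 3 covers classifiers: SVM, BP, RBF, and PNN. Chapter 4 covers clustering algorithms: K-Means, SOM, and CURE. Chapter 5 presents DNA clustering with BP and K-Means.

2.1 DNA

DNA is built from four nucleotides: G (guanine), C (cytosine), A (adenine), and T (thymine).

2.2.1.1 Binary representation

Each nucleotide is encoded as a four-dimensional indicator vector, for example $(1, 0, 0, 0)$ or $(0, 0, 1, 0)$ [28]. A DNA sequence of length $W$ then becomes a vector of dimension $4 \times W$.

2.2.1.2 Complex representation

The four nucleotides are mapped to complex numbers [3,9]:

$$A = 1 + j, \quad T = 1 - j, \quad C = -1 - j, \quad G = -1 + j$$

2.2.3.1 Word frequency

Let $X$ be a DNA sequence of length $n$ over the alphabet $\{A, G, C, T\}$ [4]. The set of all words of length $L$ is $W_L = \{w_{L,1}, w_{L,2}, \dots, w_{L,K}\}$ with $K = 4^{L}$. A window of length $L$ slides along $X$ one position at a time, which gives $n - L + 1$ words. The word counts and word frequencies are

$$c^{X}_{L} = (c^{X}_{L,1}, c^{X}_{L,2}, \dots, c^{X}_{L,K}) \tag{2.1}$$

$$p^{X}_{L} = \frac{c^{X}_{L}}{n - L + 1} \tag{2.2}$$

For example, with $X = \text{ATATAC}$, $n = 6$, and $L = 3$:

$$W_3 = \{\text{ATA}, \text{TAT}, \text{TAC}, \text{AAA}, \dots\} \tag{2.3}$$

$$c^{X}_{3} = \{2, 1, 1, 0, \dots\} \tag{2.4}$$

$$p^{X}_{3} = \{0.5, 0.25, 0.25, 0, \dots\} \tag{2.5}$$

2.2.3.2 Distance between DNA sequences

A distance $d(X, Y)$ must satisfy:
1) $d(X, Y) \ge 0$, and $d(X, Y) = 0$ if and only if $X = Y$
2) $d(X, Y) = d(Y, X)$
3) $d(X, Y) + d(Y, Z) \ge d(X, Z)$

2.2.3.3 Euclidean distance

Blaisdell introduced a word-count-based distance between DNA sequences in 1986 [24]:

$$d^{E}_{L}(X, Y) = (c^{X}_{L} - c^{Y}_{L})^{T} (c^{X}_{L} - c^{Y}_{L}) = \sum_{i=1}^{K} (c^{X}_{L,i} - c^{Y}_{L,i})^{2} \tag{2.6}$$

2.2.3.4 Mahalanobis distance

The Mahalanobis distance between DNA sequences [24,2] is

$$d^{M}_{L}(X, Y) = (c^{X}_{L} - c^{Y}_{L})^{T} S^{-1} (c^{X}_{L} - c^{Y}_{L}) \tag{2.7}$$

where $S$ is the covariance matrix of the length-$L$ word counts.

2.2.3.5 Angle distance

The distance is the angle between the two word-count vectors [13,9]:

$$d^{\cos}_{L}(X, Y) = \theta_{XY} \tag{2.8}$$

$$\cos \theta_{XY} = \frac{(c^{X}_{L})^{T} c^{Y}_{L}}{\lVert c^{X}_{L} \rVert \, \lVert c^{Y}_{L} \rVert} \tag{2.9}$$

2.2.3.6 Kolmogorov complexity distance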
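Using GenCompress to approximate Kolmogorov complexity, the distance between two DNA sequences is [6]

$$d_{KC}(X, Y) = 1 - \frac{K(X) - K(X \mid Y)}{K(XY)} \tag{2.10}$$

where $K(X)$ is the Kolmogorov complexity of $X$, $K(X \mid Y)$ is the conditional complexity of $X$ given $Y$, and $K(XY)$ is the complexity of the concatenation.

2.2.3.7 Relative entropy

The relative entropy (K-L divergence, KLD) of $X$ with respect to $Y$ is [14]

$$KL(X \mid Y) = \sum_{i=1}^{4^{L}} p^{X}_{L,i} \log \frac{p^{X}_{L,i}}{p^{Y}_{L,i}} \tag{2.11}$$

with $KL(X \mid Y) \ge 0$, and $KL(X \mid Y) = 0$ when $X = Y$ [24].

Chapter 3 covers the BP, RBF, and PNN neural networks and the support vector machine (SVM).

3.1.1 Artificial neural network (ANN)

[Figure 3.1: neuron model with inputs $X_i$ ($i = 1, 2, 3, \dots, n$), weights $w_i$, bias $b$, activation function $f(\cdot)$, and output $y$.]

$$I = \sum_{i=1}^{n} w_i X_i - \theta \tag{3.1}$$

$$y = f(I) \tag{3.2}$$

Common choices of the activation function $f(x)$:

i. Threshold functions. When the output $y$ takes the values 0 and 1, $f(x)$ is the step function of Figure 3.2(a):

$$f(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{3.3}$$

When $y$ takes the values $-1$ and 1, $f(x)$ is the sign function of Figure 3.2(b):

$$\operatorname{sgn}(x) = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases} \tag{3.4}$$

[Figure 3.2: (a) step function; (b) sign function.]

ii. S-shaped (sigmoid) functions, with output in $(0, 1)$ or $(-1, 1)$. The parameter $\beta$ controls the steepness of the curve, as shown in Figure 3.3:

$$f(x) = \frac{1}{1 + \exp(-\beta x)} \tag{3.5}$$

[Figure 3.3: sigmoid function $f(x) = 1 / (1 + \exp(-\beta x))$ for $\beta = 2$, $\beta = 1$, and $\beta = 0.5$.]

3.1.2

[Figure 3.4: BP network with input layer, hidden layer, and output layer.]

3.1.3 BP classifier

A three-layer BP network with sigmoid units is used to classify DNA sequences (Figure 3.4). For an $m$-class problem the network has $m$ output nodes, and the target output of a training sample has a single 1 with 0 elsewhere:

$$\text{out} = [0, 0, \dots, 1, \dots, 0, 0]^{T} \tag{3.6}$$

With outputs $O_1, O_2, \dots, O_J$ and a threshold $T$, the decision conditions are

$$|O_j - 1| \;[\dots]\; T \tag{3.7}$$

$$|O_j - 1| \le T \tag{3.8}$$

3.1.3.1 BP learning algorithm

BP (Back Propagation) learning works with the following quantities. The input pattern vectors are

$$A_k = (a^{k}_{1}, a^{k}_{2}, \dots, a^{k}_{n}) \quad (k = 1, 2, \dots, m) \tag{3.9}$$

where $m$ is the number of training patterns and $n$ is the number of input units. The corresponding desired outputs are

$$Y_k = (y^{k}_{1}, y^{k}_{2}, \dots, y^{k}_{q}) \tag{3.10}$$

where $q$ is the number of output units. Following the M-P neuron model, the net input of hidden unit $j$ is

$$s^{k}_{j} = \sum_{i=1}^{n} \omega^{t}_{ij} a^{k}_{i} - \theta_j \quad (j = 1, 2, \dots, p) \tag{3.11}$$

where $\omega^{t}_{ij}$ is a connection weight, $\theta_j$ is a threshold, and $p$ is the number of hidden units. Passing $s^{k}_{j}$ through the S function gives the hidden-unit output

$$b^{k}_{j} = f(s^{k}_{j}) = \frac{1}{1 + e^{-s^{k}_{j}}}$$

(j = ...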
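The word frequencies of Eqs. (2.1)-(2.2) and the relative entropy of Eq. (2.11) feed the later BP and K-Means stages, so a small sketch may help make them concrete. The Python below is only an illustration under stated assumptions: the function names (`word_counts`, `word_frequencies`, `relative_entropy`) and the `pseudocount` smoothing are hypothetical and do not come from the thesis. The smoothing is added here because Eq. (2.11) is undefined when a word occurs in $X$ but not in $Y$; how the thesis handles that case is not recoverable from the text.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "AGCT"


def word_counts(seq, L):
    """Count all K = 4**L words of length L, as in Eq. (2.1).

    The window slides one base at a time, giving n - L + 1 windows.
    """
    if len(seq) < L:
        raise ValueError("sequence is shorter than the word length")
    seen = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    words = ("".join(w) for w in product(ALPHABET, repeat=L))
    return {w: seen.get(w, 0) for w in words}


def word_frequencies(seq, L):
    """Divide the counts by the number of windows, as in Eq. (2.2)."""
    windows = len(seq) - L + 1
    return {w: c / windows for w, c in word_counts(seq, L).items()}


def relative_entropy(x, y, L, pseudocount=1.0):
    """KL(X|Y) of Eq. (2.11) on smoothed word frequencies.

    ASSUMPTION: the pseudocount is not in the thesis; it keeps every
    probability positive so that the logarithm is always defined.
    """
    cx, cy = word_counts(x, L), word_counts(y, L)
    k = 4 ** L
    nx = sum(cx.values()) + pseudocount * k
    ny = sum(cy.values()) + pseudocount * k
    total = 0.0
    for w in cx:
        px = (cx[w] + pseudocount) / nx
        py = (cy[w] + pseudocount) / ny
        total += px * math.log(px / py)
    return total


if __name__ == "__main__":
    # Worked example of Eqs. (2.3)-(2.5): X = ATATAC, L = 3.
    freqs = word_frequencies("ATATAC", 3)
    print(freqs["ATA"], freqs["TAT"], freqs["TAC"], freqs["AAA"])
    print(relative_entropy("ATATAC", "GGGCCC", 2))
```

Note that relative entropy is not symmetric, so `relative_entropy(x, y, L)` and `relative_entropy(y, x, L)` generally differ; a symmetrized variant would be a further design choice outside what the text specifies.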
Document title: Master's Thesis: DNA Sequencing Analysis and Classification of Microbial Communities
Link: https://www.777doc.com/doc-295833 .html