Huazhong University of Science and Technology
Master's Thesis

Tag SNP Prediction Based on Linkage Disequilibrium

Name: Fang Zhexiang (方哲翔)
Degree applied for: Master
Major: Bioinformatics Technology
Supervisor: Zhou Yanhong (周艳红)
Date: 2007-11-10

Abstract

Single nucleotide polymorphism (SNP) is the most common type of genetic variant in the human genome, accounting for more than 90% of human genetic polymorphisms. SNPs and haplotypes in the human genome have been widely used in the identification of disease-associated genes and in pharmacogenomics. Recent studies have revealed that many SNPs are strongly correlated, which makes it feasible to choose a small number of SNPs that carry almost all of the genetic information. The use of tag SNPs can greatly reduce genotyping expense and promises to improve the effectiveness of association studies. How to predict tag SNPs effectively has therefore become an important problem in bioinformatics, and this thesis presents a study of tag SNP prediction.

A computational program, tagSNPPRE, is developed to predict tag SNPs from a genotype dataset. First, it partitions haplotype blocks based on the linkage disequilibrium of pairwise SNPs and uses a hybrid greedy-exhaustive approach to work out all potential tag SNP sets. It then predicts the best tag SNP set among the potential sets using three statistical features (%genotype, MAF and HWPval). Testing results on a widely used genotype dataset demonstrate that tagSNPPRE has better prediction accuracy.

In order to mine new features for the prediction of tag SNPs, a secondary local SNP database is preliminarily constructed. The database is built on the information and biological data provided by dbSNP. The downloaded primary data is analyzed and processed, and then loaded into the local database for further research. The preliminary construction of the SNP database is complete, and a search service is provided.

Keywords: single nucleotide polymorphism; tag SNP; linkage disequilibrium; haplotype block; haplotype

1

1.1 (1) (90608020); (2) (20050487037); (3) (505010); (4); (5).

1.2 DNA; 99.9%; 0.1%; single nucleotide polymorphism (SNP); 90% [1~7]; pharmacogenomics [8,9]; tag SNPs [10~13], [14,15].

1.3
1.3.1 SNPs [16~25]; r^2.
1.3.2 [26]; Clark; Excoffier; Hardy-Weinberg equilibrium (H-W); expectation maximization (EM); Stephens; Markov chain Monte Carlo (MCMC); SSD (Stephens-Smith-Donnelly).
1.3.3 Patil [1]; Johnson [10]; Cardon [11], proportion of diversity explained (PDE); Carlson [14], ldselect; Zhang [12].

1.4 Three features: (%genotype), (MAF) and the Hardy-Weinberg P value (HWPval); tagSNPPRE; the Daly dataset; Tagger [15]; NCBI dbSNP.

1.5 SNP; tagSNPPRE.

2

2.1 SNP; DNA.

2.2 SNP; DNA; 90%; alleles A and G; Figure 2.1 (a 4 bp DNA segment, SNP with alleles A/G); genotypes AA, AG and GG, of which AA and GG are homozygous and AG is heterozygous; genotyping; 1000; 1%; haplotype; 5%. Figure 2.2: DNA segments with 6 SNPs and 3 haplotypes, ACATGT, ACCGCT and GTCGGA; SNP 1 and SNP 4 alone give AT, AG and GG, which distinguish the 3 haplotypes.

2.3 [27~29]; [30]. For two SNPs with two alleles each there are 4 haplotype frequencies, f_{11}, f_{12}, f_{21} and f_{22}. The allele frequencies are

\[ f_{1+} = f_{11} + f_{12}, \tag{2.1} \]
\[ f_{+1} = f_{11} + f_{21}, \tag{2.2} \]
\[ f_{2+} = f_{21} + f_{22}, \tag{2.3} \]
\[ f_{+2} = f_{12} + f_{22}, \tag{2.4} \]

and the disequilibrium coefficient is

\[ D = f_{11} - f_{1+} \cdot f_{+1} = f_{11} \cdot f_{22} - f_{12} \cdot f_{21}. \tag{2.5} \]

Two measures are derived from D: r^2 and Lewontin's D′.

\[ r^2 = \frac{D^2}{f_{1+} \cdot f_{2+} \cdot f_{+1} \cdot f_{+2}}. \tag{2.6} \]

When D > 0,

\[ D' = \frac{D}{\min(f_{1+} \cdot f_{+2},\; f_{+1} \cdot f_{2+})}. \tag{2.7} \]

When D < 0,

\[ D' = \frac{D}{\min(f_{1+} \cdot f_{+1},\; f_{2+} \cdot f_{+2})}. \tag{2.8} \]

D′; 0; 1; r^2.

2.4 2001, Daly; 5q31, 500 kb; 3~92 kb; 2~4; 90%; SNP; Jeffreys, single sperm typing, MHC; SNP [31~34].
2.4.1 Patil; 80%; SNP.
2.4.2 Gabriel, D′; Wang, four-gamete test (FGT); SNP; [35]; D′.

2.5 SNP; r^2.

3

3.1 SNP; %genotype.

3.2 G; M; N SNPs s_1, s_2, ..., s_N; r^2, with 0 ≤ r^2(s_i, s_j) ≤ 1; threshold r_0 on r^2 between a_i and a_j; r_0 = 0.5, 0.8; T; G; two SNP sets P and Q. For a_m in P,

\[ C_{a_m} = \{\, a : a \in Q,\; r^2(a, a_m) \ge r_0 \,\}, \]

with size |C_{a_m}|. P = G\T; [36]; r^2(s_i, s_j); Breadth First Search (BFS).

3.3 SNP; r^2(s_i, s_j), 0.8; N(s_i); G; NUM.

3.3.1
1. T = ∅, P = Q = G.
2. For each a_m in P, compute |C_{a_m}|.
3. If |C_{a_m}| = 0, a_m; T; Q.
4. In P, the SNP a_max with the largest |C_{a_m}|; a_max; T; Q.
5. Repeat steps 2-4; Q.

3.3.2 Breadth First Search; G_i; G; G_j; K_i = 1; P_i, Q_i; K_i = K_i + 1; {T_ij, j = 1, ..., J_i}; J_i; K_i; K_i - 1; NUM.

3.4 Features: %genotype, MAF and HWPval. %genotype: 0; 1; 75%. MAF: 5%. HWPval: Hardy-Weinberg P value (H-W); 0.01.

3.5 tagSNPPRE; Figure 3.1; LDA [37,38].

3.6 tagSNPPRE; %genotype.

4

4.1 tagSNPPRE; %MinNum; %HaploRecom.

4.2 Daly [39,40]; (HapBlock, STAMPA); 5q31; 500 kbp of DNA; 103 SNPs; 387 individuals; 129 trios.

4.3 tagSNPPRE; %MinNum and %HaploRecom [41~43]; Gevalt [44]; Haploview [45]; Tagger; Tables 4.1 and 4.2.

Table 4.1 Tagger and tagSNPPRE (Haploview): 103 and 103 SNPs; 23 and 22 tag SNPs; 16; 4.

Table 4.2 tagSNPPRE: 181825101437162446126103

Table 4.3 tagSNPPRE and Tagger

Block     Measure        Tagger    tagSNPPRE
1         %MinNum        12.50%    12.50%
          %HaploRecom
2         %MinNum        20.00%    20.00%
          %HaploRecom
3         %MinNum        14.28%    14.28%
          %HaploRecom
4         %MinNum        18.64%    18.64%
          %HaploRecom    87.21%    90.79%
Overall   %MinNum        17.72%    17.72%
          %HaploRecom    90.39%    91.58%

Gevalt Ta
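The pairwise r^2 of equation (2.6) and the greedy step listed in 3.3.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tagSNPPRE implementation: it assumes phased haplotypes coded 0/1 per chromosome, the names `r_squared` and `greedy_tag_snps` are hypothetical, the threshold defaults to r_0 = 0.8, and SNPs with no partner above the threshold are picked up by the same loop rather than by a separate step. The exhaustive breadth-first stage of 3.3.2 is not covered.

```python
from itertools import combinations


def r_squared(hap_a, hap_b):
    """Pairwise r^2 between two biallelic SNPs (assumed input: 0/1 alleles per chromosome)."""
    n = len(hap_a)
    f_1p = sum(hap_a) / n                      # f_{1+}: allele 1 at the first SNP
    f_p1 = sum(hap_b) / n                      # f_{+1}: allele 1 at the second SNP
    f_11 = sum(1 for x, y in zip(hap_a, hap_b) if x == 1 and y == 1) / n
    d = f_11 - f_1p * f_p1                     # D, equation (2.5)
    denom = f_1p * (1 - f_1p) * f_p1 * (1 - f_p1)
    return 0.0 if denom == 0 else d * d / denom  # r^2, equation (2.6)


def greedy_tag_snps(haplotypes, r0=0.8):
    """Greedy tag selection. haplotypes: dict mapping SNP name -> list of 0/1 alleles."""
    snps = list(haplotypes)
    # neighbours[s]: SNPs whose r^2 with s reaches the threshold (s itself excluded)
    neighbours = {s: set() for s in snps}
    for a, b in combinations(snps, 2):
        if r_squared(haplotypes[a], haplotypes[b]) >= r0:
            neighbours[a].add(b)
            neighbours[b].add(a)

    uncovered = set(snps)  # plays the role of Q
    tags = []              # plays the role of T
    while uncovered:
        # choose the SNP that covers the most still-uncovered SNPs (itself included)
        best = max(snps, key=lambda s: len((neighbours[s] | {s}) & uncovered))
        tags.append(best)
        uncovered -= neighbours[best] | {best}
    return tags


# Hypothetical toy data: 8 chromosomes, 4 SNPs
example = {
    "s1": [0, 0, 1, 1, 0, 1, 0, 1],
    "s2": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to s1
    "s3": [1, 1, 0, 0, 1, 0, 1, 0],  # complement of s1
    "s4": [0, 1, 0, 1, 0, 1, 0, 1],  # weakly correlated with s1
}
print(greedy_tag_snps(example))
```

In the toy data, s1, s2 and s3 are in complete LD with one another, so a single tag covers all three, while s4 falls below the threshold and needs its own tag. Ties are broken by input order, which is a design choice of this sketch only.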