Sensitive and accurate peptide identification with Mascot Percolator
Markus Brosch
mb8@sanger.ac.uk
Wednesday, 10 June 2009

References
L. Käll, J. D. Storey, M. J. MacCoss, W. S. Noble, J Proteome Res 7, 29 (2008).
L. Käll, J. D. Storey, M. J. MacCoss, W. S. Noble, J Proteome Res 7, 40 (2008).
M. Brosch, J. Choudhary, in Scoring and validation of tandem MS peptide identification methods, Eds. (Humana Press, 2009).
J. D. Storey, R. Tibshirani, Proc Natl Acad Sci USA 100, 9440 (2003).

Terminology: FPR, FDR, PEP

[Figure: density estimate of the data, showing total, incorrect and correct PSMs. x-axis: Mascot score; y-axis: Frequency.]

Terminology: FDR & PEP

[Figure: the same density estimate (total, incorrect and correct PSMs), annotated with B', B, A, b and a. x-axis: Score; y-axis: Frequency.]

$\mathrm{FPR} = \dfrac{B}{B' + B}$

$\mathrm{FDR} = \dfrac{B}{A} = \dfrac{1}{A} \sum_{i=1}^{A} \mathrm{PEP}_i$

$\mathrm{PEP} = \dfrac{b}{a}$

Terminology: FDR vs q-value

Figure: L. Käll, J. D. Storey, M. J. MacCoss, W. S. Noble, J Proteome Res 7, 29 (2008).

… (i.e., many of the PSMs are correct), the accepted method for multiple testing correction is to estimate the false discovery rate (FDR) [10, 11]. Storey and Tibshirani [12] provide a description of FDR methods that is accessible to nonstatisticians and that includes more recent developments. In our case, the FDR associated with a particular score threshold is defined as the expected percentage of accepted PSMs that are incorrect, where an "accepted PSM" is one that scores above the threshold. (Many proteomics papers incorrectly refer to this quantity as the "false positive rate." However, other scientific fields define the false positive rate as the fraction of true null tests that are called significant [13–17], whereas the false discovery rate is defined as the fraction of true null tests among all of those that are called significant.) For example, at an FDR of 1%, if we accept 500 PSMs, then we expect five of those matches to be incorrect.

The simplest way to calculate the FDR is analogous to the calculation of p-values, above. For a given score threshold, we count the number of decoy PSMs above the threshold and the number of target PSMs above the threshold. We can now estimate the FDR by simply computing the ratio of these two values. For example, at a score threshold of 3.0, we observe 3849 accepted target PSMs and 219 accepted decoy PSMs, yielding an estimated FDR of 5.7%. Figure 4 plots the number of accepted PSMs as a function of the estimated FDR, and the series labeled "Simple FDR" was computed using the ratio of accepted decoys versus accepted targets.

Estimating the Percentage of Incorrect Target PSMs

A slightly more sophisticated method for calculating the FDR takes into account the observation that, whereas all decoy PSMs are incorrect by construction, not all target PSMs are correct. Ideally, the presence of these incorrect target PSMs should be factored into the FDR calculation. For example, suppose that among 10000 target PSMs, 8000 are incorrect and 2000 are correct. We would like to know the 8000 quantity so that we can adjust our FDR estimates.

Figure 2 shows that the distributions of scores assigned to target and decoy PSMs are similar, except that the target PSM score distribution has a heavier tail to the right. This tail arises because the set of target PSMs is comprised of a mixture of correct and incorrect PSMs. Figure 5 shows simulated distributions that illustrate the underlying phenomenon. For this simulation, we assume that our PSM score function follows a normal distribution, and we set the standard deviation to 0.7. (The assumption of normality is for the purposes of illustration only; the methods we describe here do not require any particular form of distribution, nor do we assume that XCorr is normally distributed.) For incorrect PSMs, we set the mean of the distribution to 1.0, and for correct PSMs, we change the mean to 3.0. Our simulated data set contains 10000 decoy PSMs, 8000 incorrect target PSMs, and 2000 correct target PSMs. The figure shows the resulting decoy score distribution (black line), the target score distribution (blue line), and its two component distributions (dotted and dashed blue lines).

In this simulated data set, the percentage of incorrect targets (PIT) is 80%. This PIT is equivalent to the ratio of the area under the dotted blue line (the incorrect target PSMs) to the area under the solid black line (the decoy PSMs). The PIT is important because it allows us to reduce the estimated FDR associated with a given set of accepted target PSMs. In our simulation, if we accept X decoy PSMs with scores above a certain threshold, then we expect to find 0.8X incorrect target PSMs above the same threshold. A more accurate estimate of the FDR, therefore, is to multiply the previous estimate, the …

Figure 4. Mapping from the number of identified PSMs to the estimated false discovery rate. (A) The figure plots the number of PSMs above the threshold as a function of the estimated false discovery rate. Two different methods for computing the FDR are plotted, with and without an estimate of the percentage of incorrect target PSMs (PIT). The vertical line corresponds to an XCorr of 3.0. (B) A zoomed-in version of panel A, with the estimated FDR shown as a dotted line and the q-value shown as a solid line.

Figure 5. Simulated target and decoy PSM score distributions.

Assigning Significance to Peptides. perspectives. Journal of Proteome Research • Vol. xxx, No. xx, XXXX C

[Slide annotations: decreasing score threshold; new TPs; new FPs.]

Target/Decoy database searching

[Figure: Mascot score distributions. x-axis: Mascot score; y-axis: Frequency. Series: Target, Decoy, Target − Decoy; the score cutoff is marked.]

Target/Decoy concept: R. E. Moore, M. K. Young, …
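
The target/decoy FDR estimates described in the excerpt can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by Mascot Percolator or by the cited papers. The function names (simple_fdr, pit_adjusted_fdr, fdr_curve, simulate_scores) are hypothetical, a PSM counts as accepted when its score is at or above the threshold, the FDR is capped at 1.0, and the q-value of a threshold is taken as the minimum FDR over all thresholds at or below it. The simulation uses the parameters given in the text (standard deviation 0.7, means 1.0 and 3.0, 10000 decoy, 8000 incorrect target and 2000 correct target PSMs).

```python
import random
from bisect import bisect_left


def simple_fdr(n_targets, n_decoys):
    """Simple FDR: accepted decoys divided by accepted targets."""
    if n_targets == 0:
        return 0.0
    return n_decoys / n_targets


def pit_adjusted_fdr(n_targets, n_decoys, pit):
    """Simple FDR scaled by the percentage of incorrect targets (PIT, 0..1)."""
    return pit * simple_fdr(n_targets, n_decoys)


def fdr_curve(target_scores, decoy_scores, pit=1.0):
    """Return (threshold, n_targets, fdr, q_value) rows, strictest threshold first.

    Every distinct target score is used as a threshold. Assumption: the
    q-value is the smallest FDR over all thresholds that still accept the PSM.
    """
    targets = sorted(target_scores)
    decoys = sorted(decoy_scores)
    rows = []
    # Walk thresholds from most permissive to strictest so that a running
    # minimum of the FDR gives the q-value.
    for threshold in sorted(set(targets)):
        n_t = len(targets) - bisect_left(targets, threshold)
        n_d = len(decoys) - bisect_left(decoys, threshold)
        fdr = min(1.0, pit_adjusted_fdr(n_t, n_d, pit))
        q_value = fdr if not rows else min(fdr, rows[-1][3])
        rows.append((threshold, n_t, fdr, q_value))
    rows.reverse()
    return rows


def simulate_scores(n_decoy=10000, n_incorrect=8000, n_correct=2000, seed=0):
    """Simulated scores with the parameters quoted in the text (sd 0.7)."""
    rng = random.Random(seed)
    decoys = [rng.gauss(1.0, 0.7) for _ in range(n_decoy)]
    targets = [rng.gauss(1.0, 0.7) for _ in range(n_incorrect)]
    targets += [rng.gauss(3.0, 0.7) for _ in range(n_correct)]
    return targets, decoys
```

With the counts quoted in the text, simple_fdr(3849, 219) evaluates to roughly 0.057, the 5.7% given for a score threshold of 3.0. Calling fdr_curve(*simulate_scores(), pit=0.8) applies the 80% PIT of the simulated data set; unlike the raw FDR, the q-value column never decreases as the threshold is lowered.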