Probabilities of spurious connections in gene networks: Application to expression time series

David R. Bickel*
Office of Biostatistics and Bioinformatics
Medical College of Georgia
Augusta, GA 30912-4900

Motivation: The reconstruction of gene networks from gene expression microarrays is gaining popularity as methods improve and as more data become available. The reliability of such networks could be judged by the probability that a connection between genes is spurious, resulting from chance fluctuations rather than from a true biological relationship.

Results: Unlike the false discovery rate and positive false discovery rate, the decisive false discovery rate (dFDR) is exactly equal to a conditional probability without assuming independence or the randomness of hypothesis truth values. This property is useful not only in the common application to the detection of differential gene expression, but also in determining the probability of a spurious connection in a reconstructed gene network. Estimators of the dFDR can estimate each of three probabilities:
1. The probability that two genes that appear to be associated with each other lack such association.
2. The probability that a time ordering observed for two associated genes is misleading.
3. The probability that a time ordering observed for two genes is misleading, either because they are not associated or because they are associated without a lag in time.
The first probability applies to both static and dynamic gene networks, and the other two only apply to dynamic gene networks.

Availability: Cross-platform software for network reconstruction, probability estimation, and plotting is free from ://@prueba.info

*Address after March 31, 2004: Pioneer, Bioinformatics and Discovery Research, 7250 NW 62nd Ave., P.O. Box 552, Johnston, IA 50131-0552

1 Introduction

Variations of the false discovery rate of Benjamini and Hochberg (1995) have been successfully applied to the problem of detecting differential gene expression between two or more groups on the basis of microarray data (Efron et al. 2001; Efron and Tibshirani 2002; Müller et al. 2002; Pepe et al. 2003; Storey 2002, 2003; Bickel 2004a, b). In this context, a discovery of differential expression is the rejection of the null hypothesis that a gene is not differentially expressed, and such a discovery is false if there really is no differential expression, i.e., if the rejected null hypothesis is true. Conversely, a nondiscovery of differential expression is the nonrejection of a null hypothesis, so a nondiscovery is false if there is differential expression, i.e., if the nonrejected null hypothesis is false. False discovery rate methods not only tend to have much lower false nondiscovery rates than methods of controlling family-wise error rates, but they have a simpler interpretation: depending on which mathematical definition of the false discovery rate is used, it is either exactly equal to a probability under general conditions (Efron and Tibshirani 2002; Fernando et al. 2004), or it is approximately equal to a probability under more restrictive conditions (Efron et al. 2001; Storey 2002, 2003). The probability in question is the probability that a gene considered differentially expressed is not really differentially expressed. More generally, it is the probability that a discovery is false, which, by definition, is the probability that a rejected null hypothesis is true. This property of the false discovery rate will be used to answer the question, What is the probability that a given relationship in a gene network reconstructed from gene expression data is spurious? Thus, it will be seen that the methodology of false discovery rates not only aids in the detection of differential expression, but also in another important use of microarray technology, that of reverse-engineering regulatory networks of genes.

Many models of gene networks have been applied to the analysis of microarray data (De Jong 2002), and the following false discovery rate methods are suitable for all gene networks that specify definite relationships between genes, including any network that can be represented as an undirected or directed graph. Those methods will be illustrated with a network that can be seen as a dynamic generalization of networks constructed by considering two genes to be connected if the absolute value of the correlation coefficient between their expression values is sufficiently high. There are several variations of such networks (Butte et al. 2000; Rho, Jeong, and Kahng 2003), all of which fall in the general class of spatial networks (Herrmann, Berthélemy, and Provero 2003), an example of a type of network that could benefit from the proposed methods.

Let $x_1(t)$ and $x_2(t)$ represent the expression values of the first and second gene at time $t$, respectively. (There are many possible definitions of expression value; some possibilities are given below, but here it is just assumed that the expression value tends to increase with the amount of mRNA.) Without loss of generality, define the coexpression value to be the absolute value of the linear correlation coefficient. Then, if $x_1(t)$ and $x_2(t)$ are weakly stationary for all $t$, the coexpression value between the two genes at a lag time of $\tau$ is

$$r_{1,2}(\tau) = \left| r\big(x_1(0), x_2(\tau)\big) \right| = \left| r\big(x_1(-\tau), x_2(0)\big) \right|. \tag{1}$$

If $r_{1,2}(\tau) > 0$, then the two genes are said to be coregulated. (Alternately, they are coregulated only if $r_{1,2}(\tau)$ is greater than or equal to the minimum coregulation that is biologically meaningful in the sense of Bickel (2004a).) Let $\tau_{\text{optimal}}$ be the time lag at which the coregu
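
As an illustration of the lagged coexpression value in Equation (1), the following minimal Python sketch estimates it from two equally spaced expression time series. It rests on assumptions not stated in the text: the correlation is replaced by the sample Pearson correlation over all available pairs $(x_1(t), x_2(t+\tau))$, which is a plausible estimator only under the weak stationarity assumed above; the function names `pearson` and `coexpression` are hypothetical; and the data are made up for the example.

```python
import math


def pearson(a, b):
    """Sample linear (Pearson) correlation of two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


def coexpression(x1, x2, lag):
    """Estimate |r(x1(t), x2(t + lag))| from equally spaced samples.

    Assumption: x1 and x2 have the same length and sampling times, and
    the series are weakly stationary, so pooling pairs over t is sensible.
    """
    n = len(x1)
    if lag >= 0:
        a, b = x1[:n - lag], x2[lag:]   # pair x1(t) with x2(t + lag)
    else:
        a, b = x1[-lag:], x2[:n + lag]  # negative lag: x2 leads x1
    return abs(pearson(a, b))


# Made-up toy data: gene 2 repeats gene 1 one time step later.
x1 = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 1.0]
x2 = [0.0] + x1[:-1]

for lag in (-1, 0, 1):
    print(lag, round(coexpression(x1, x2, lag), 3))
```

With these toy series, the value at a lag of one time step equals 1 up to rounding, because the second series is an exact delayed copy of the first; the values at the other lags are smaller. A static correlation network corresponds to using only `lag = 0` and connecting two genes when the value exceeds a chosen threshold.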