ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context

Adam A. Margolin1,2, Ilya Nemenman2, Katia Basso3, Chris Wiggins2,4, Gustavo Stolovitzky5, Riccardo Dalla Favera3, Andrea Califano1,2,*

1 Department of Biomedical Informatics, 2 Joint Centers for Systems Biology, 3 Institute for Cancer Genetics, 4 Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10032
5 IBM T.J. Watson Research Center, Yorktown Heights, N.Y. 10598

* Corresponding author: 1130 St. Nicholas Avenue Room 910, New York, NY 10032.

Email addresses: AAM: adam@dbmi.columbia.edu, IN: ilya.nemenman@columbia.edu, KB: kb451@columbia.edu, CW: chw2@columbia.edu, GS: gustavo@us.ibm.com, RDF: rd10@columbia.edu, AC: califano@c2b2.columbia.edu

Abstract

Background

Elucidating gene regulatory networks is crucial for understanding normal cell physiology and complex pathologic phenotypes. Existing computational methods for the genome-wide reverse engineering of such networks have been successful only for lower eukaryotes with simple genomes. Here we present ARACNE, a novel algorithm, using microarray expression profiles, specifically designed to scale up to the complexity of regulatory networks in mammalian cells, yet general enough to address a wider range of network deconvolution problems. This method uses an information theoretic approach to eliminate the majority of indirect interactions inferred by co-expression methods.

Results

We prove that ARACNE reconstructs the network exactly (asymptotically) if the effect of loops in the network topology is negligible, and we show that the algorithm works well in practice, even in the presence of numerous loops and complex topologies. We assess ARACNE's ability to reconstruct transcriptional regulatory networks using both a realistic synthetic dataset and a microarray dataset from human B cells. On synthetic datasets ARACNE achieves very low error rates and outperforms established methods, such as Relevance Networks and Bayesian Networks. Application to the deconvolution of genetic networks in human B cells demonstrates ARACNE's ability to infer validated transcriptional targets of the c-MYC proto-oncogene. We also study the effects of mis-estimation of mutual information on network reconstruction, and show that algorithms based on mutual information ranking are more resilient to estimation errors.

Conclusions

ARACNE shows promise in identifying direct transcriptional interactions in mammalian cellular networks, a problem that has challenged existing reverse engineering algorithms. This approach should enhance our ability to use microarray data to elucidate functional mechanisms that underlie cellular processes and to identify molecular targets of pharmacological compounds in mammalian cellular networks.

Background

Cellular phenotypes are determined by the dynamical activity of large networks of co-regulated genes. Thus dissecting the mechanisms of phenotypic selection requires elucidating the functions of the individual genes in the context of the networks in which they operate. Because gene expression is regulated by proteins, which are themselves gene products, statistical associations between gene mRNA abundance levels, while not directly proportional to activated protein concentrations, should provide clues towards uncovering gene regulatory mechanisms. Consequently, the advent of high throughput microarray technologies to simultaneously measure mRNA abundance levels across an entire genome has spawned much research aimed at using these data to construct conceptual gene network models to concisely describe the regulatory influences that genes exert on each other.

Genome-wide clustering of gene expression profiles [1] provides an important first step towards this goal by grouping together genes that exhibit similar transcriptional responses to various cellular conditions, and are therefore likely to be involved in similar cellular processes. However, the organization of genes into co-regulated clusters provides a very coarse representation of the cellular network. In particular, it cannot separate statistical interactions that are irreducible (i.e., direct) from those arising from cascades of transcriptional interactions that correlate the expression of many non-interacting genes. More generally, as appreciated in statistical physics, long range order (i.e., high correlation among non-directly interacting variables) can easily result from short range interactions [2]. Thus correlations, or any other local dependency measure, cannot be used as the only tool for the reconstruction of interaction networks without additional assumptions.

Within the last few years a number of sophisticated approaches for the reverse engineering of cellular networks (also called deconvolution) from gene expression data have emerged (reviewed in [3]). Their goal is to produce a high-fidelity representation of the cellular network topology as a graph, where genes are represented as vertices and are connected by edges representing direct regulatory interactions. The criteria for defining an edge, as well as its biological interpretation, remain imprecise and vary between applications. For example, graphical modeling [4] defines edges as parent-child relationships between mRNA abundance levels that are most likely to explain the data, integrative methods [5] use independent experimental clues to define edges as those showing evidence of physical interactions, and other statistical/information theoretical methods [6] identify edges with the strongest statistical associations between mRNA abundance levels. All available approaches suffer to various degrees from problems such as overfitting, high computational