BICA AND RANDOM SUBSPACE ENSEMBLES FOR DNA MICROAR

March 22, 2006 17:17 Proceedings Trim Size: 9in x 6in apolloni-cibb06-final

BICA AND RANDOM SUBSPACE ENSEMBLES FOR DNA MICROARRAY-BASED DIAGNOSIS

B. APOLLONI AND G. VALENTINI
Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy
{apolloni,valentini}@dsi.unimi.it

A. BREGA
Dipartimento di Matematica "F. Enriquez", Università degli Studi di Milano, Via Saldini 50, 20133 Milano, Italy
andrea.brega@unimi.it

We compare two ensemble methods to classify DNA microarray data. The methods use different strategies to face the curse of dimensionality plaguing these data. One of them projects data along random coordinates, the other compresses them into independent boolean variables. Both result in random feature extraction procedures, feeding SVMs as base learners for a majority voting ensemble classifier. The classification capabilities are comparable, degrading on instances that are acknowledged as anomalous in the literature.

1. Introduction

The traditional taxonomy of malignancies, based on their morphological, histopathological, and clinical characteristics, may sometimes be ineffective for a correct diagnosis and prognosis of tumors [1]. Indeed, a more refined diagnosis may be achieved by exploiting the genome-wide bio-molecular characteristics of tumors, using high-throughput bio-technologies based on large-scale hybridization techniques (e.g. DNA microarray) [5].

One of the main drawbacks of DNA microarray data is their very high dimensionality and low cardinality. It is well known that in these cases the curse of dimensionality problem arises. Hence several works pointed out the importance of feature selection methods to reduce the dimensionality of the input space [7]. An alternative approach is represented by data compression techniques that can reduce the dimensionality of the data while approximately preserving their information content. As for their processing, several authors recently proposed applying ensemble methods to improve the performance of state-of-the-art classification algorithms in the context of gene expression data analysis [4].

In this paper we compare two ensemble methods based on data-compression techniques for DNA-microarray-based diagnosis. The first one exploits random projections to lower-dimensional subspaces [8], while the second performs data compression through a Boolean Independent Component Analysis (BICA) algorithm [13]. While the first method has already been applied to gene expression data analysis [3], BICA has never previously been applied to DNA microarray data analysis. In the next two sections we introduce the methods, and in Sect. 4 we experimentally analyze the effectiveness of the two approaches, applying them to DNA microarray-based diagnosis of tumors.

2. RSE: Random Subspace Ensemble

The reduction of the dimensionality in the context of supervised analysis of data is usually pursued through feature selection methods. Many methods can be applied, ranging from filter methods and wrapper methods to information-theory-based techniques and "embedded" methods (see e.g. [6] for a recent review).

We recently experimented with a different approach [3] based on random subspace ensemble methods [8]. For a fixed $n$, $n$ features (genes) are randomly selected according to the uniform distribution. Then the data of the original $d$-dimensional training set is projected to the selected $n$-dimensional subspace. The resulting data set is used to train a suitable base learner, and this process is repeated $\nu$ times, giving rise to an ensemble of $\nu$ learning machines trained on different randomly selected subsets of features. The resulting classifiers are then combined by majority voting.

This method avoids some of the computational difficulty of feature selection (feature selection is an NP-hard problem), and a parallel implementation can be provided in a natural way. However, feature selection methods can explicitly select sets of relevant features, while this information cannot be directly obtained through RS ensembles. On the other hand, with different random projections of the data we can improve diversity between base learners [9], while the overall accuracy of the ensemble can be enhanced through aggregation techniques. As a consequence, the performance of a given classification algorithm may be enhanced. A high-level pseudo-code of the method is summarized in Fig. 1.

    Random Subspace Ensemble Algorithm
    Input:
    - A data set $D = \{(x_j, t_j) \mid 1 \le j \le m\}$, $x_j \in X \subset \mathbb{R}^d$, $t_j \in C = \{1, \dots, k\}$
    - a learning algorithm $L$
    - subspace dimension $n < d$
    - number of the base learners $\nu$
    Output:
    - Final hypothesis $h_{ran}: X \to C$ computed by the ensemble.
    begin
      for $i = 1$ to $\nu$
      begin
        $D_i$ = Subspace projection($D$, $n$)
        $h_i = L(D_i)$
      end
      $h_{ran}(x) = \arg\max_{t \in C} \mathrm{card}(\{i \mid h_i(x) = t\})$
    end.

Figure 1. High-level pseudo-code of the RSE method

In particular, the Subspace projection procedure selects an $n$-subset $A = \{\alpha_1, \dots, \alpha_n\}$ from $\{1, 2, \dots, d\}$, and returns as output the new data set $D_i = \{(P_A(x_j), t_j) \mid 1 \le j \le m\}$, where $P_A(x_1, \dots, x_d) = (x_{\alpha_1}, \dots, x_{\alpha_n})$. The new data set $D_i$ is then given as input to a learning algorithm $L$, which outputs a classifier $h_i$. All the classifiers obtained are finally aggregated through majority voting, where $\mathrm{card}(\cdot)$ measures the cardinality of a set.

3. BICA network

A suitable way of taking decisions based on data is to split the decision process into two steps. The first is devoted to preprocessing data in a feasible way such that they can be interpreted in the second one. As for the former, it mirrors real vectors into boolean ones, which should reflect relevant features of the original data patterns. Stressing the fact that independence is a property of the representation of the data that we use, we searc
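
As an illustration of the Random Subspace Ensemble procedure of Sect. 2 (Fig. 1), the following Python sketch draws a random feature subset, trains a base learner on the projected data, repeats this a fixed number of times, and aggregates by majority voting. It is only a minimal sketch under stated assumptions: the function names, the toy data set, and the nearest-centroid base learner are hypothetical choices made here for illustration (the paper feeds SVMs as base learners), and ties in the vote are broken arbitrarily.

```python
import random
from collections import Counter


def subspace_projection(data, n, rng):
    """Select n of the d feature indices uniformly at random and project the data."""
    d = len(data[0][0])
    idx = sorted(rng.sample(range(d), n))  # the n-subset A of feature indices
    projected = [([x[a] for a in idx], t) for x, t in data]
    return idx, projected


def nearest_centroid_learner(data):
    """Stand-in for the learning algorithm L (assumption: the paper uses SVMs)."""
    sums, counts = {}, Counter()
    for x, t in data:
        counts[t] += 1
        acc = sums.setdefault(t, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {t: [s / counts[t] for s in acc] for t, acc in sums.items()}

    def h(x):
        # Predict the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda t: sum((a - b) ** 2 for a, b in zip(x, centroids[t])),
        )

    return h


def random_subspace_ensemble(data, learner, n, nu, seed=0):
    """Train nu base learners on random n-dimensional subspaces; vote by majority."""
    rng = random.Random(seed)
    members = []
    for _ in range(nu):
        idx, d_i = subspace_projection(data, n, rng)
        members.append((idx, learner(d_i)))

    def h_ran(x):
        # Each member sees only its own feature subset of x.
        votes = Counter(h([x[a] for a in idx]) for idx, h in members)
        return votes.most_common(1)[0][0]

    return h_ran


# Toy data (hypothetical): d = 6 features, two well-separated classes.
train = [
    ([0.0, 0.1, 0.2, 0.0, 0.1, 0.3], 1),
    ([0.2, 0.0, 0.1, 0.3, 0.0, 0.1], 1),
    ([5.0, 5.1, 4.9, 5.2, 5.0, 4.8], 2),
    ([4.9, 5.0, 5.1, 5.0, 5.2, 5.1], 2),
]
h_ran = random_subspace_ensemble(train, nearest_centroid_learner, n=3, nu=5)
print(h_ran([0.1] * 6), h_ran([5.0] * 6))
```

Because each base learner is trained independently of the others, the loop over the ensemble members could be distributed across processes, which reflects the remark in Sect. 2 that a parallel implementation is natural.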

