您好,欢迎访问三七文档
ParallelAlgorithmsforForwardandBackSubstitutioninDirectSolutionofSparseLinearSystems∗ANSHULGUPTAIBMT.J.WATSONRESEARCHCENTERYORKTOWNHEIGHTS,NY10598ANSHUL@WATSON.IBM.COMVIPINKUMARDEPARTMENTOFCOMPUTERSCIENCEUNIVERSITYOFMINNESOTAMINNEAPOLIS,MN55455KUMAR@CS.UMN.EDU∗ThisworkwassupportedbyIST/BMDOthroughArmyResearchOfficecon-tractDA/DAAH04-93-G-0080,NSFgrantNSG/1RI-9216941,andbyArmyHighPerformanceComputingResearchCenterundertheauspicesoftheDepartmentoftheArmy,ArmyResearchLaboratorycooperativeagreementnumberDAAH04-95-2-0003/contractnumberDAAH04-95-C-0008,thecontentofwhichdoesnotnecessarilyreflectthepositionorthepolicyofthegovernment,andnoofficialendorsementshouldbeinferred.AccesstocomputingfacilitieswereprovidedbyMinnesotaSupercomputerInstitute,CrayResearchInc.andbythePitts-burghSupercomputingCenter.Relatedpapersareavailablevia://:reordering,symbolicfac-torization,numericalfactorization,andsolvingthelower-andupper-triangularsystemsresultingfromfactorization.Sincenumericalfac-torizationiscomputationallythemostexpensivephase,asignificantresearchefforthasbeendirectedtowardsdevelopingefficientandscalableparallelsparsefactorizationalgorithms.Wehaverecentlyproposed[4]aparallelsparseCholeskyfactorizationalgorithmthatisoptimallyscalableforawideclassofproblems.ExperimentshaveshownthatthisalgorithmcaneasilyspeedupCholeskyfactorizationbyafactorofatleastafewhundredonupto1024processors.Withsuchspeedupsinnumericalfactorization,itisimperativethatthere-mainingphasesofthesolutionprocessbeparallelizedeffectivelyinordertoscaletheperformanceoftheoverallsolver.Furthermore,withoutanoverallparallelsolver,thesizeofthesparsesystemsthatcanbesolvedmaybeseverelyrestrictedbytheamountofmemoryavailableonauniprocessorsystem.Inthispaper,weaddresstheproblemofperformingthefinalphaseofforwardandbackwardsubstitutioninparallelonadistributedmem-orymultiprocessor.Wepresentadetailedanalysisoftheparallelcomplexityandscalabilityofparallelalgorithmdescribedbrieflyin[5]toobtainasolutiontothesystemofsparselinearequationsoftheformsLY=BandUX=Y,whereLisalowertriangularmatrixandUisanuppertriangularmatrix.HereLandUareobtainedfromthenumericalfactorizationofasparsecoefficientmatrixAoftheoriginalsystemAX=Btobesolved.IfA,L,andUareN×Nmatrices,2thenX,Y,andBareN×mmatrices,wheremisthenumberofright-handsidevectorsforwhichthesolutiontothesparselinearsystemwithAasthecoefficientmatrixisdesired.Ouranalysisandexperi-mentsshowthat,althoughnotasscalableasthebestparallelsparseCholeskyfactorizationalgorithms,parallelsparsetriangularsolverscanyieldreasonablespeedupsinruntimeonhundredsofprocessors.Wealsoshowthatforawideclassofproblems,thesparsetriangularsolversdescribedinthispaperareoptimalandareasymptoticallyasscalableasadensetriangularsolver.Forasingleright-handside(m=1),ourexperimentsshowa256-processorperformanceofupto435MFLOPSonaCrayT3D,onwhichthesingle-processorperformanceforthesameproblemis≈8.6MFLOPS.Withm=30,themaximumsingle-processorand256-processorperformanceobservedinourexperimentsis≈30MFLOPSand≈3050MFLOPS,respectively.Tothebestofourknowledge,thisisthehighestperformanceandspeedupforthisproblemreportedonanymassivelyparallelcomputer.Inadditiontotheperformanceandscalabilityanalysisofparallelsparsetriangularsolvers,wediscusstheredistributionofthetriangu-larfactormatrixamongtheprocessorsbetweennumericalfactoriza-tionandtriangularsolution,anditsimpactonperformance.In[4],wedescribeanoptimaldata-distributionschemeforCholeskyfactoriza-tionofsparsematrices.ThisdistributionleavesgroupsofconsecutivecolumnsofLwithidenticalpatternofnon-zeros(henceforthcalledsupernodes)withatwo-dimensionalpartitioningamonggroupsofprocessors.However,thisdistributionisnotsuitableforthetriangularsolvers,whicharescalableonlywithaone-dimensionalpartition-ingofthesupernodalblocksofL.Weshowthatifthesupernodesaredistributedinasubtree-to-subcubemanner[2]thenthecostofconvertingthetwo-dimensionaldistributiontoaone-dimensionaldis-tributionisonlyaconstanttimesthecostofsolvingthetriangularsystems.Fromourexperiments,weobservedthatthisconstantisfairlysmallontheCrayT3D—atmost0.9forasingleright-handsid
本文标题:Parallel Algorithms for Forward and Back Substitut
链接地址:https://www.777doc.com/doc-5085513 .html