您好,欢迎访问三七文档
SuperLUDIST:AScalableDistributed-MemorySparseDirectSolverforUnsymmetricLinearSystemsXIAOYES.LILawrenceBerkeleyNationalLaboratoryandJAMESW.DEMMELUniversityofCaliforniaatBerkeleyWepresentthemainalgorithmicfeaturesinthesoftwarepackageSuperLUDIST,adistributed-memorysparsedirectsolverforlargesetsoflinearequations.Wegiveindetailourparallelizationstrategies,withafocusonscalabilityissues,anddemonstratethesoftware’sparallelperformanceandscalabilityoncurrentmachines.ThesolverisbasedonsparseGaussianelimination,withaninnovativestaticpivotingstrategyproposedearlierbytheauthors.Themainadvantageofstaticpivotingoverclassicalpartialpivotingisthatitpermitsapriorideterminationofdatastructuresandcommunicationpatterns,whichletsusexploittechniquesusedinparallelsparseCholeskyalgorithmstobetterparallelizebothLUdecompositionandtriangularsolutiononlarge-scaledistributedmachines.CategoriesandSubjectDescriptors:G.1.3[NumericalAnalysis]:NumericalLinearAlgebra—Sparse,structured,andverylargesystems(directanditerativemethods);G.4[MathematicalSoftware]:MathematicalSoftware—ParallelandvectorimplementationsGeneralTerms:Algorithms,PerformanceAdditionalKeyWordsandPhrases:Sparsedirectsolver,supernodalfactorization,parallelism,distributed-memorycomputers,scalability1.INTRODUCTIONParallelizingsparsedirectsolvershasbeenanactiveresearchareainthepastdecade.Ourgoalistoimplementasparsedirectsolverfornonsymmetricma-tricesasscalablyaspossibleondistributedmemorymachines.ThisworkwassupportedinpartbytheNationalEnergyResearchScientificComputingCenter(NERSC),whichissupportedbytheDirector,OfficeofAdvancedScientificComputingResearch,DivisionofMathematical,Information,andComputationalSciencesoftheU.S.DepartmentofEnergyundercontractnumberDE-AC03-76SF00098,andwassupportedinpartbytheNationalScienceFoundationCooperativeAgreementNo.ACI-9619020,NSFGrantNo.ACI-9813362,andDepartmentofEnergyGrantNos.DE-FG03-94ER25219andDE-FC03-98ER25351,andUTSub-contractNo.ORA4466fromARPAContractNo.DAAL03-91-C0047.Authors’addresses:X.S.Li,LawrenceBerkeleyNationalLab,MS50F-1650,OneCyclotronRd.,Berkeley,CA94720;email:xsli@lbl.gov;J.W.Demmel,ComputerScienceDivision,UniversityofCalifornia,Berkeley,Berkeley,CA94720;email:demmel@cs.berkeley.edu.Permissiontomakedigital/hardcopyofpartorallofthisworkforpersonalorclassroomuseisgrantedwithoutfeeprovidedthatthecopiesarenotmadeordistributedforprofitorcommercialadvantage,thecopyrightnotice,thetitleofthepublication,anditsdateappear,andnoticeisgiventhatcopyingisbypermissionofACM,Inc.Tocopyotherwise,torepublish,topostonservers,ortoredistributetolistsrequirespriorspecificpermissionand/orafee.C°2003ACM0098-3500/03/0600-0110$5.00ACMTransactionsonMathematicalSoftware,Vol.29,No.2,June2003,Pages110–140.SuperLUDIST:AScalableSparseDirectSolver²111Itisimportanttosayexactlywhatwemeanbyscalability,becauseithassomereasonablesoundingbutunachievableinterpretations.Forinstance,ifthen-by-nmatrixequationtobesolvedarisesfromadifferentialequationlikeLaplace’sequation,thenwecannotaspiretoachievetheO(n)complexityofmethodslikemultigrid.Wealsodonotclaimlinearspeedupsforfixedproblemsizes,sincethisdependssomuchontheparticularsparsematrixstructure.However,wedocomeclosetolinearspeedupsforconstant-work-per-processorscalingonreasonablemodelproblems(seeSection4.4).Moreprecisely,forusscalabilitywillmean“asscalableassolvingasym-metricpositivedefinite(spd)linearsystembyasparsedirectmethod,”ormorebriefly“asscalableassparseCholesky.”Thereasonforthisisthatthenonsym-metricproblemisstrictlymoredifficultthanthespdcase,sothatwecannothopetodobetteringeneral.OurclaimofscalabilityisbasedonourabilitytouseallthetechniquesexploitedtoparallelizesparseCholesky(seebelow).Thepricewepayisaverysmallprobabilityofnumericalinstability.Wenotethatthisnumericalinstabilityneveroccurredonourextensivetestsetforthedefaultparametersettingsofourcode,andinanyeventisalwaysdetectedandreportedbythecode.TheadvantageofsparseCholeskyoverthenonsymmetriccaseisthatpivotscanbechoseninanyorderfromthemaindiagonalwhileguaranteeingstabil-ity.Thisletsusperformpivotchoicebeforenumericalfactorizationbegins,inordertominimizefill-in,maximizeparallelism,precomputethenonzerostructureoftheCholeskyfactor,andoptimizethetwo-dimensional(2D)dis-tributeddatastructuresandcommunicationpattern.Researchershavebeenquitesuccessfulinachieving“scalable”performanceforsparseCholeskyfactor-ization;availablecodesincludeCAPSS[HeathandRaghavan1997],MUMPS-SYM[Amestoyetal.2001a],PaStix[Henonetal.1999],PSLDLT[Rothberg1996],andPSPACES[Guptaetal.1997].Incontrast,fornonsymmetricorindefinitesystems,fewdistributed-memorycodesexist.TheyaremorecomplicatedthanCholeksyforatleasttworeasons.Firstandforemost,somekindofnumericalpivotingisnecessaryforstability.Classicalpartialpivoting[GolubandVanLoan1996]orthesparsevariantofthresholdpivoting[Duffetal.1986]typicallycausethefill-insandwork-loadtobegenerateddynamicallyduringfactorization.Therefore,wemustei-therdesigndynamicdatastructuresandalgorithmstoaccommodatethesefill-ins[Amestoyetal.2001a],orelseusestaticdatastructureswhichcangrosslyov
本文标题:SuperLU DIST A scalable distributed-memory sparse
链接地址:https://www.777doc.com/doc-6238644 .html