SuperLU_DIST: A scalable distributed-memory sparse

SuperLU_DIST: A Scalable Distributed-Memory Sparse Direct Solver for Unsymmetric Linear Systems

XIAOYE S. LI
Lawrence Berkeley National Laboratory
and
JAMES W. DEMMEL
University of California at Berkeley

We present the main algorithmic features in the software package SuperLU_DIST, a distributed-memory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with a focus on scalability issues, and demonstrate the software's parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of data structures and communication patterns, which lets us exploit techniques used in parallel sparse Cholesky algorithms to better parallelize both LU decomposition and triangular solution on large-scale distributed machines.

Categories and Subject Descriptors: G.1.3 [Numerical Analysis]: Numerical Linear Algebra - Sparse, structured, and very large systems (direct and iterative methods); G.4 [Mathematical Software]: Mathematical Software - Parallel and vector implementations

General Terms: Algorithms, Performance

Additional Key Words and Phrases: Sparse direct solver, supernodal factorization, parallelism, distributed-memory computers, scalability

1. INTRODUCTION

Parallelizing sparse direct solvers has been an active research area in the past decade. Our goal is to implement a sparse direct solver for nonsymmetric matrices as scalably as possible on distributed memory machines.

This work was supported in part by the National Energy Research Scientific Computing Center (NERSC), which is supported by the Director, Office of Advanced Scientific Computing Research, Division of Mathematical, Information, and Computational Sciences of the U.S. Department of Energy under contract number DE-AC03-76SF00098, and was supported in part by the National Science Foundation Cooperative Agreement No. ACI-9619020, NSF Grant No. ACI-9813362, and Department of Energy Grant Nos. DE-FG03-94ER25219 and DE-FC03-98ER25351, and UT Subcontract No. ORA4466 from ARPA Contract No. DAAL03-91-C0047.

Authors' addresses: X. S. Li, Lawrence Berkeley National Lab, MS 50F-1650, One Cyclotron Rd., Berkeley, CA 94720; email: xsli@lbl.gov; J. W. Demmel, Computer Science Division, University of California, Berkeley, Berkeley, CA 94720; email: demmel@cs.berkeley.edu.

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 2003 ACM 0098-3500/03/0600-0110 $5.00

ACM Transactions on Mathematical Software, Vol. 29, No. 2, June 2003, Pages 110–140.

It is important to say exactly what we mean by scalability, because it has some reasonable sounding but unachievable interpretations. For instance, if the n-by-n matrix equation to be solved arises from a differential equation like Laplace's equation, then we cannot aspire to achieve the O(n) complexity of methods like multigrid. We also do not claim linear speedups for fixed problem sizes, since this depends so much on the particular sparse matrix structure. However, we do come close to linear speedups for constant-work-per-processor scaling on reasonable model problems (see Section 4.4).

More precisely, for us scalability will mean “as scalable as solving a symmetric positive definite (spd) linear system by a sparse direct method,” or more briefly “as scalable as sparse Cholesky.” The reason for this is that the nonsymmetric problem is strictly more difficult than the spd case, so that we cannot hope to do better in general. Our claim of scalability is based on our ability to use all the techniques exploited to parallelize sparse Cholesky (see below). The price we pay is a very small probability of numerical instability. We note that this numerical instability never occurred on our extensive test set for the default parameter settings of our code, and in any event is always detected and reported by the code.

The advantage of sparse Cholesky over the nonsymmetric case is that pivots can be chosen in any order from the main diagonal while guaranteeing stability. This lets us perform pivot choice before numerical factorization begins, in order to minimize fill-in, maximize parallelism, precompute the nonzero structure of the Cholesky factor, and optimize the two-dimensional (2D) distributed data structures and communication pattern. Researchers have been quite successful in achieving “scalable” performance for sparse Cholesky factorization; available codes include CAPSS [Heath and Raghavan 1997], MUMPS-SYM [Amestoy et al. 2001a], PaStix [Henon et al. 1999], PSLDLT [Rothberg 1996], and PSPACES [Gupta et al. 1997].

In contrast, for nonsymmetric or indefinite systems, few distributed-memory codes exist. They are more complicated than Cholesky for at least two reasons. First and foremost, some kind of numerical pivoting is necessary for stability. Classical partial pivoting [Golub and Van Loan 1996] or the sparse variant of threshold pivoting [Duff et al. 1986] typically cause the fill-ins and workload to be generated dynamically during factorization. Therefore, we must either design dynamic data structures and algorithms to accommodate these fill-ins [Amestoy et al. 2001a], or else use static data structures which can grossly ov
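
The point made above for sparse Cholesky, that the pivot order can be fixed and the nonzero structure of the factor precomputed before any numerical work, can be illustrated with a small Python sketch. It is not code from the package described here: the function names symbolic_cholesky and fill_in and the set-of-positions input format are assumptions made for this illustration, and exact numerical cancellation is ignored. The sketch works on the sparsity pattern only and shows how two orderings of the same 4-by-4 arrow matrix give different amounts of fill-in.

```python
def symbolic_cholesky(lower_pattern, n):
    """Predict the nonzero structure of the Cholesky factor from the pattern alone.

    lower_pattern: iterable of (i, j) with i > j, the strictly lower-triangular
    nonzero positions of a symmetric n-by-n matrix (hypothetical input format).
    Returns a list whose j-th entry is the set of row indices i > j that are
    nonzero in column j of the factor.
    """
    cols = [set() for _ in range(n)]
    for i, j in lower_pattern:
        cols[j].add(i)
    for j in range(n):
        rows = sorted(cols[j])
        # Eliminating column j updates entry (r2, r1) for every pair r1 < r2
        # of rows below the diagonal, so each such position becomes nonzero.
        for a, r1 in enumerate(rows):
            for r2 in rows[a + 1:]:
                cols[r1].add(r2)
    return cols


def fill_in(lower_pattern, n):
    """Number of nonzeros created in the factor that were not in the input."""
    factor = symbolic_cholesky(lower_pattern, n)
    return sum(len(c) for c in factor) - len(set(lower_pattern))


# Arrow matrix with the dense row/column ordered first: the factor fills in.
dense_first = {(1, 0), (2, 0), (3, 0)}
# Same matrix with the dense row/column ordered last: no fill-in at all.
dense_last = {(3, 0), (3, 1), (3, 2)}
print(fill_in(dense_first, 4), fill_in(dense_last, 4))
```

Because nothing in this computation depends on the numerical values, the result is known before factorization starts. With partial or threshold pivoting, the row chosen at each step depends on the values, so the same prediction cannot be made exactly in advance.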
