ARTICLE IN PRESS

Please cite this article in press as: M. Baboulin et al., Accelerating scientific computations with mixed precision algorithms, Computer Physics Communications (2008), doi:10.1016/j.cpc.2008.11.005

Computer Physics Communications
Contents lists available at ScienceDirect

Accelerating scientific computations with mixed precision algorithms ✩

Marc Baboulin (a), Alfredo Buttari (b), Jack Dongarra (c,d,e), Jakub Kurzak (c,∗), Julie Langou (c), Julien Langou (f), Piotr Luszczek (g), Stanimire Tomov (c)

(a) Department of Mathematics, University of Coimbra, Coimbra, Portugal
(b) French National Institute for Research in Computer Science and Control, Lyon, France
(c) Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
(d) Oak Ridge National Laboratory, Oak Ridge, TN, USA
(e) University of Manchester, Manchester, United Kingdom
(f) Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, USA
(g) MathWorks, Inc., Natick, MA, USA

Article info

Article history:
Received 2 September 2008
Received in revised form 9 November 2008
Accepted 10 November 2008

PACS: 02.60.Dc

Keywords: Numerical linear algebra; Mixed precision; Iterative refinement

Abstract

On modern architectures, 32-bit operations often run at least twice as fast as 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.

Program summary

Program title: ITER-REF
Catalogue identifier: AECO_v1_0
Program summary URL:
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence
No. of lines in distributed program, including test data, etc.: 7211
No. of bytes in distributed program, including test data, etc.: 41 862
Distribution format: tar.gz
Programming language: FORTRAN 77
Computer: desktop, server
Operating system: Unix/Linux
RAM: 512 Mbytes
Classification: 4.8
External routines: BLAS (optional)

Nature of problem: On modern architectures, 32-bit operations often run at least twice as fast as 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution.

Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability, resulting in a factorization PA = LU, where P is a permutation matrix. The solution of the system is obtained by first solving Ly = Pb (forward substitution) and then solving Ux = y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. To improve the computed solution, an iterative process can be applied that produces a correction to the computed solution at each iteration; this yields the method commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision.

Running time: seconds/minutes

✩ This paper and its associated computer program are available via the Computer Physics Communications homepage on ScienceDirect.
∗ Corresponding author. E-mail address: kurzak@eecs.utk.edu (J. Kurzak).
0010-4655/$ – see front matter. Published by Elsevier B.V. doi:10.1016/j.cpc.2008.11.005

1. Introduction

On modern architectures, 32-bit operations often run at least twice as fast as 64-bit operations. There are two reasons for this. Firstly, 32-bit floating point arithmetic is usually twice as fast as 64-bit floating point arithmetic on most modern processors. Secondly, the number of bytes moved through the memory system is halved. In Table 1, we provide some hardware numbers that support these claims. On AMD Opteron 246, IBM PowerPC 970, and Intel Xeon 5100, the single precision peak is twice the double precision peak. On the STI Cell BE, the single precision peak is fourteen times the double precision peak. Not only single precision is faster than doubl
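The solution method described in the program summary (an LU factorization PA = LU, forward and backward substitution, then iterative refinement) can be illustrated with a minimal sketch. This is not the distributed FORTRAN 77 program: it is a pure Python illustration in which single precision is simulated by rounding every intermediate result to 32 bits, and all function names, the tolerance, and the iteration limit are hypothetical choices made for the example.

```python
import struct


def f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float (simulated single precision)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def lu_factor_f32(a):
    """Factor PA = LU with partial row pivoting, rounding every result to 32 bits."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))  # perm[i] = original index of the row now in position i
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))  # pivot row
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])  # multiplier, stored in L
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Solve Ly = Pb (forward) and then Ux = y (backward) in simulated single precision."""
    n = len(lu)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] = f32(x[i] - f32(lu[i][j] * x[j]))
        x[i] = f32(x[i] / lu[i][i])
    return x


def residual(a, x, b):
    """Compute r = b - A x in 64-bit arithmetic."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]


def mixed_precision_solve(a, b, tol=1e-14, max_iter=30):
    """Factor once in single precision, then refine the solution in double precision.

    tol and max_iter are illustrative defaults, not values taken from the paper.
    Returns the solution and the number of refinement steps performed.
    """
    lu, perm = lu_factor_f32(a)
    x = lu_solve_f32(lu, perm, b)  # initial single precision solution
    b_norm = max(abs(v) for v in b)
    for it in range(max_iter):
        r = residual(a, x, b)  # 64-bit residual
        if max(abs(v) for v in r) <= tol * b_norm:
            return x, it
        z = lu_solve_f32(lu, perm, r)  # correction from the 32-bit factors
        x = [xi + zi for xi, zi in zip(x, z)]  # 64-bit update
    return x, max_iter


# Small, well-conditioned example system (values chosen only for illustration).
A = [[4.0, 1.0, 0.3], [1.0, 5.0, 0.7], [0.3, 0.7, 6.0]]
b = [1.0, 2.0, 3.0]
x, steps = mixed_precision_solve(A, b)
print("refinement steps:", steps)
print("max residual:", max(abs(v) for v in residual(A, x, b)))
```

The expensive step, the factorization, runs entirely in the lower precision; only the residual and the solution update use 64-bit arithmetic. For an ill-conditioned matrix the refinement loop in this sketch may fail to converge, which matches the caveat in the program summary.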