Accelerating-Scientific-Computations-with-Mixed-Pr

ARTICLE IN PRESS COMPHY:3672
Please cite this article in press as: M. Baboulin et al., Accelerating scientific computations with mixed precision algorithms, Computer Physics Communications (2008), doi:10.1016/j.cpc.2008.11.005

Computer Physics Communications

Accelerating scientific computations with mixed precision algorithms ✩

Marc Baboulin (a), Alfredo Buttari (b), Jack Dongarra (c,d,e), Jakub Kurzak (c,*), Julie Langou (c), Julien Langou (f), Piotr Luszczek (g), Stanimire Tomov (c)

(a) Department of Mathematics, University of Coimbra, Coimbra, Portugal
(b) French National Institute for Research in Computer Science and Control, Lyon, France
(c) Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
(d) Oak Ridge National Laboratory, Oak Ridge, TN, USA
(e) University of Manchester, Manchester, United Kingdom
(f) Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, USA
(g) MathWorks, Inc., Natick, MA, USA

✩ This paper and its associated computer program are available via the Computer Physics Communications homepage on ScienceDirect.
* Corresponding author. E-mail address: kurzak@eecs.utk.edu (J. Kurzak).
0010-4655/$ - see front matter. Published by Elsevier B.V. doi:10.1016/j.cpc.2008.11.005

Article history: Received 2 September 2008. Received in revised form 9 November 2008. Accepted 10 November 2008.
PACS: 02.60.Dc
Keywords: Numerical linear algebra; Mixed precision; Iterative refinement

Abstract

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.

Program summary

Program title: ITER-REF
Catalogue identifier: AECO_v1_0
Program summary URL:
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence
No. of lines in distributed program, including test data, etc.: 7211
No. of bytes in distributed program, including test data, etc.: 41862
Distribution format: tar.gz
Programming language: FORTRAN 77
Computer: desktop, server
Operating system: Unix/Linux
RAM: 512 Mbytes
Classification: 4.8
External routines: BLAS (optional)

Nature of problem: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution.

Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability, resulting in a factorization PA = LU, where P is a permutation matrix. The solution for the system is achieved by first solving Ly = Pb (forward substitution) and then solving Ux = y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. In order to improve the computed solution, an iterative process can be applied, which produces a correction to the computed solution at each iteration, which then yields the method that is commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision.

Running time: seconds/minutes

Published by Elsevier B.V.

1. Introduction

On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. There are two reasons for this. Firstly, 32-bit floating point arithmetic is usually twice as fast as 64-bit floating point arithmetic on most modern processors. Secondly, the amount of bytes moved through the memory system is halved. In Table 1, we provide some hardware numbers that support these claims. On AMD Opteron 246, IBM PowerPC 970, and Intel Xeon 5100, the single precision peak is twice the double precision peak. On the STI Cell BE, the single precision peak is fourteen times the double precision peak. Not only single precision is faster than doubl
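
The solution method outlined in the program summary (LU factorization with partial pivoting, forward and backward substitution, then iterative refinement) can be illustrated with a minimal sketch. This is not the ITER-REF program, which is written in FORTRAN 77. The function names, the tolerance, and the iteration limit are hypothetical choices made for illustration. The sketch assumes a nonsingular, reasonably well-conditioned dense matrix, emulates 32-bit arithmetic by rounding every intermediate result through the standard `struct` module, and assumes the usual split in which the factorization and triangular solves run in single precision while the residual and the update of the solution run in double precision.

```python
import struct


def f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float (emulation)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_single(A):
    """Factor PA = LU with partial row pivoting in emulated single precision.

    Returns the packed factors (unit lower triangle L below the diagonal,
    U on and above it) and the row permutation as a list of original indices.
    """
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: move the largest entry of column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve_single(LU, perm, b):
    """Solve Ly = Pb (forward), then Ux = y (backward), in emulated single precision."""
    n = len(LU)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):                      # forward substitution, L has unit diagonal
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):            # backward substitution
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y


def residual(A, x, b):
    """Compute r = b - A x in double precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]


def mixed_precision_solve(A, b, tol=1e-14, max_iter=30):
    """Single precision LU solve followed by double precision iterative refinement.

    Returns the solution and the number of refinement steps that were applied.
    """
    LU, perm = lu_factor_single(A)          # the expensive step, done once in single
    x = lu_solve_single(LU, perm, b)        # initial single precision solution
    for it in range(max_iter):
        r = residual(A, x, b)               # double precision residual
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            return x, it
        z = lu_solve_single(LU, perm, r)    # correction from the existing factors
        x = [xi + zi for xi, zi in zip(x, z)]   # double precision update
    return x, max_iter


if __name__ == "__main__":
    A = [[4.1, 1.3, 2.7], [1.3, 5.9, 3.3], [2.7, 3.3, 6.7]]
    b = [1.0, 2.0, 3.0]
    x, steps = mixed_precision_solve(A, b)
    print("solution:", x, "refinement steps:", steps)
```

The factorization is computed only once, and each refinement step reuses it for a cheap pair of triangular solves. This is why the approach can retain most of the speed of single precision while the residual check is carried out in double precision. For ill-conditioned matrices the loop may fail to converge, in which case the sketch simply stops after `max_iter` steps.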
