Optimizing File Replication over Limited-Bandwidth Networks using Remote Differential Compression

Dan Teodosiu, Nikolaj Bjørner, Yuri Gurevich, Mark Manasse, Joe Porkka
{danteo, nbjorner, gurevich, manasse, jporkka}@microsoft.com
Microsoft Corporation

Abstract—Remote Differential Compression (RDC) protocols can efficiently update files over a limited-bandwidth network when two sites have roughly similar files; no site needs to know the content of another's files a priori. We present a heuristic approach to identify and transfer the file differences that is based on finding similar files, subdividing the files into chunks, and comparing chunk signatures. Our work significantly improves upon previous protocols such as LBFS and RSYNC in three ways. Firstly, we present a novel algorithm to efficiently find the client files that are the most similar to a given server file. Our algorithm requires 96 bits of meta-data per file, independent of file size, and thus allows us to keep the metadata in memory and eliminate the need for expensive disk seeks. Secondly, we show that RDC can be applied recursively to signatures to reduce the transfer cost for large files. Thirdly, we describe new ways to subdivide files into chunks that identify file differences more accurately. We have implemented our approach in DFSR, a state-based multi-master file replication service shipping as part of Windows Server 2003 R2. Our experimental results show that similarity detection produces results comparable to LBFS while incurring a much smaller overhead for maintaining the metadata. Recursive signature transfer further increases replication efficiency by up to several orders of magnitude.

I. INTRODUCTION

As the amount of data shared over the Internet continues to grow rapidly, users still experience high costs and long delays in transferring large amounts of information across the network. However, a large fraction of the transmitted information is often redundant, because the recipient may already have stored a similar (if not identical) copy of the data. For instance, consider a group of people collaborating over email to produce a large PowerPoint presentation, sending it back and forth as an attachment each time they make changes. An analysis of typical incremental changes shows that very often just a small fraction of the file changes. Therefore, a dramatic reduction in bandwidth can be achieved if just the differences are communicated across the network. A change affecting 16 KB in a 3.5 MB file requires about 3 s to transmit over a 56 Kbps modem, compared to 10 minutes for a full transfer.

Delta-compression utilities such as diff [1][11][13], vcdiff [17], xdelta [21], BSDiff [25], or zdelta [32] may be used to produce a succinct description of the differences between two files if both files (the old and the new version) are locally available to a sender. However, in many distributed systems this assumption may be overly restrictive, since it is difficult or infeasible to know which old copies of the files (if any) other nodes hold. (A notable exception is software distribution, where the sender may store the previous versions of the binaries and pre-compute differences.)

A different class of protocols can be used if the old and the new version of the file are at opposite ends of a slow network connection. These Remote Differential Compression (RDC) protocols heuristically negotiate a set of differences between a recipient and a sender that have two sufficiently similar versions of the same file. While not as precise as local delta compression, RDC may help to greatly reduce the total amount of data transferred.

In the Low Bandwidth File System (LBFS) [24], an RDC protocol is used to optimize the communication between a sender and a recipient by having both sides subdivide all of their files into chunks and compute strong checksums, or signatures, for each chunk. When a client needs to access or copy a file from the server, the server first transmits the list of signatures for that file to the client. The client then determines which of its old chunks may be used to reconstruct the new file and requests the missing chunks. The key to this protocol is that the files are divided independently on the client and server, by determining chunk boundaries from data features. Compared to chunking at fixed boundaries (an approach used by RSYNC [25][34]), this data-dependent chunking opens up other applications, such as using a system-wide database of chunk signatures on the client.

This paper builds upon the LBFS approach in the context of DFSR, a scalable state-based multi-master file synchronization service that is part of Windows Server 2003 R2. Some of the primary uses of DFSR include the distribution of content from a small number of hubs to a large number of spoke nodes, the collection of content from spokes back to the hubs for backup and archival purposes, and ad-hoc collaboration between spokes. Hubs and spokes may be arranged in a user-defined and dynamically modifiable topology, ranging up to a few thousand nodes. In most actual configurations, spokes will be geographically distributed and will often have a limited-bandwidth connection to the rest of the system; satellite links or even modem-based connections are not uncommon. Therefore, efficient use of connection bandwidth is one of the foremost customer requirements for DFSR.

In our RDC implementation, we significantly improve upon LBFS as well as other similar protocols, such as the widely used RSYNC protocol [25][34], in three different ways. The first contribution is a novel and very efficient way for allowing a client to locate a set of files that are likely to be similar to the file that needs to be transferred from a server. Once this set of similar files has been found, the client may reuse any chunks from these files during the RDC protocol. Note that in the cont
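The LBFS-style exchange described in the introduction (both sides chunk their files at data-dependent boundaries, the server sends the signature list, and the client requests only the chunks it lacks) can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the DFSR or LBFS implementation. A simple polynomial rolling hash over a 16-byte window stands in for the Rabin fingerprints LBFS uses. SHA-1 is an assumed choice of signature. The window, mask, and minimum/maximum chunk sizes are illustrative values far smaller than a real deployment would use. All function names (`chunk_boundaries`, `server_signatures`, `client_request`, `server_send`, `client_reconstruct`) are hypothetical.

```python
import hashlib
import random

# Illustrative parameters (assumptions, far smaller than real deployments).
WINDOW = 16            # bytes covered by the rolling hash
BASE = 257             # polynomial base
MOD = (1 << 31) - 1    # prime modulus keeping hash values bounded
MASK = 0x3F            # cut where the low 6 bits are zero (~1 in 64 positions)
MIN_CHUNK, MAX_CHUNK = 16, 1024


def chunk_boundaries(data: bytes) -> list[int]:
    """Return end offsets of content-defined chunks of `data`."""
    ends: list[int] = []
    start = 0
    h = 0  # never reset at cuts, so it depends only on the last WINDOW bytes
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + b) % MOD                    # shift in the newest byte
        length = i + 1 - start
        # Cut on a data feature (hash pattern), bounded by min/max chunk sizes.
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            ends.append(i + 1)
            start = i + 1
    if start < len(data):
        ends.append(len(data))  # trailing partial chunk
    return ends


def chunks(data: bytes) -> list[bytes]:
    """Split `data` at the content-defined boundaries."""
    out, start = [], 0
    for end in chunk_boundaries(data):
        out.append(data[start:end])
        start = end
    return out


def signature(chunk: bytes) -> bytes:
    """Strong checksum of a chunk (SHA-1 is an assumed choice)."""
    return hashlib.sha1(chunk).digest()


def server_signatures(new_file: bytes) -> list[bytes]:
    """Server: ordered signature list for the new version of the file."""
    return [signature(c) for c in chunks(new_file)]


def client_request(sigs: list[bytes], old_file: bytes):
    """Client: index its old chunks and list the signatures it lacks."""
    have = {signature(c): c for c in chunks(old_file)}
    missing = [s for s in dict.fromkeys(sigs) if s not in have]  # unique, in order
    return have, missing


def server_send(new_file: bytes, missing: list[bytes]) -> dict[bytes, bytes]:
    """Server: ship only the chunks the client asked for."""
    wanted = set(missing)
    sent = {}
    for c in chunks(new_file):
        s = signature(c)
        if s in wanted:
            sent[s] = c
    return sent


def client_reconstruct(sigs, have, received) -> bytes:
    """Client: rebuild the new file from local and received chunks."""
    return b"".join(have[s] if s in have else received[s] for s in sigs)


if __name__ == "__main__":
    rng = random.Random(0)
    old = bytes(rng.getrandbits(8) for _ in range(20000))
    new = old[:9000] + b"a small edit" + old[9000:]
    sigs = server_signatures(new)
    have, missing = client_request(sigs, old)
    received = server_send(new, missing)
    assert client_reconstruct(sigs, have, received) == new
    sent = sum(len(c) for c in received.values())
    print(f"{len(sigs)} chunks, {len(missing)} missing, {sent} of {len(new)} bytes sent")
```

The boundary test looks only at the bytes inside the rolling window, so an insertion disturbs only the chunks near the edit. Boundaries further on resynchronize, and the client can reuse most of its old chunks. With naive fixed-offset blocks compared position by position, every block after the insertion point would differ.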