The Google File System
By Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
(Presented at SOSP 2003)

Introduction
Google: search engine.
Applications process lots of data.
Need a good file system.
Solution: the Google File System (GFS).

Motivational Facts
More than 15,000 commodity-class PCs.
Multiple clusters distributed worldwide.
Thousands of queries served per second.
One query reads hundreds of MB of data.
One query consumes tens of billions of CPU cycles.
Google stores dozens of copies of the entire Web!
Conclusion: Google needs a large, distributed, highly fault-tolerant file system.

Topics
Design Motivations
Architecture
Read/Write/Record Append
Fault-Tolerance
Performance Results

Design Motivations
1. Fault tolerance and auto-recovery need to be built into the system.
2. Standard I/O assumptions (e.g. block size) have to be re-examined.
3. Record appends are the prevalent form of writing.
4. Google applications and GFS should be co-designed.

GFS Architecture (Analogy)
On a single-machine FS:
  An upper layer maintains the metadata.
  A lower layer (i.e. the disk) stores the data in units called "blocks".
In GFS:
  A master process maintains the metadata.
  A lower layer (i.e. a set of chunkservers) stores the data in units called "chunks".

GFS Architecture
[Figure: the client sends a request for metadata to the master (which holds the metadata) and receives a metadata response; it sends read/write requests to chunkservers, each storing chunks on a local Linux file system, and receives read/write responses.]

What is a chunk?
Analogous to a block, except larger.
Size: 64 MB!
Stored on a chunkserver as a file.
A chunk handle (~ chunk file name) is used to reference the chunk.
Each chunk is replicated across multiple chunkservers.
Note: There are hundreds of chunkservers in a GFS cluster, distributed over multiple racks.

What is a master?
A single process running on a separate machine.
Stores all metadata:
  File namespace
  File-to-chunk mappings
  Chunk location information
  Access control information
  Chunk version numbers
  Etc.

Master-Chunkserver Communication
The master and chunkservers communicate regularly so that the master can obtain their state:
  Is the chunkserver down?
  Are there disk failures on the chunkserver?
  Are any replicas corrupted?
  Which chunk replicas does the chunkserver store?
The master sends instructions to chunkservers:
  Delete an existing chunk.
  Create a new chunk.

Serving Requests
The client retrieves the metadata for an operation from the master.
Read/write data flows directly between the client and the chunkservers.
The single master is not a bottleneck, because its involvement in read/write operations is minimized.

Overview
Design Motivations
Architecture
  Master
  Chunkservers
  Clients
Read/Write/Record Append
Fault-Tolerance
Performance Results

And now for the meat…

Read Algorithm
[Figure, steps 1 to 3: application → GFS client: (filename, byte range); GFS client → master: (filename, chunk index); master → GFS client: (chunk handle, replica locations).]
[Figure, steps 4 to 6: GFS client → one of several chunkservers: (chunk handle, byte range); chunkserver → GFS client: (data from file); GFS client → application: (data from file).]
1. The application originates the read request.
2. The GFS client translates the request from (filename, byte range) → (filename, chunk index) and sends it to the master.
3. The master responds with the chunk handle and replica locations (i.e. the chunkservers where the replicas are stored).
4. The client picks a location and sends the (chunk handle, byte range) request to that location.
5. The chunkserver sends the requested data to the client.
6. The client forwards the data to the application.

Read Algorithm (Example)
[Figure, steps 1 to 3: Indexer → GFS client: (crawl_99, 2048 bytes); GFS client → master: (crawl_99, index: 3); master → GFS client: (ch_1003, {chunkservers: 4, 7, 9}). The master's table for crawl_99 lists Ch_1001 {3, 8, 12}, Ch_1002 {1, 8, 14}, Ch_1003 {4, 7, 9}.]
Calculating the chunk index from the byte range:
(Assumption: the file position is 201,359,161 bytes.)
Chunk size = 64 MB = 1024 × 1024 × 64 bytes = 67,108,864 bytes.
201,359,161 bytes = 67,108,864 × 3 + 32,569 bytes.
So the client translates the 2048-byte range to chunk index 3 (counting from 0). The range covers offsets 32,569 to 34,616 within that chunk, so it does not cross into the next chunk.
[Figure, steps 4 to 6: the GFS client, holding (ch_1003, {chunkservers: 4, 7, 9}), contacts one of chunkservers #4, #7, #9; 2048 bytes of data return to the client and are then forwarded to the indexer.]
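The client-side part of the read algorithm (steps 2 to 4) can be sketched in Python using the numbers from the example. This is a minimal sketch under stated assumptions: chunk indices count from 0 and come from integer division by the 64 MB chunk size; the master is a hypothetical dictionary holding only the mapping shown in the example; the client picks a replica at random (the slides only say it "picks a location"); and the names `to_chunk_request`, `master_lookup`, and `plan_read` are made up for illustration, not taken from GFS.

```python
import random

# Chunk size used throughout the slides: 64 MB.
CHUNK_SIZE = 64 * 1024 * 1024  # 67,108,864 bytes

# Hypothetical master state: only the mapping returned in the example.
MASTER_TABLE = {
    ("crawl_99", 3): ("ch_1003", [4, 7, 9]),
}


def to_chunk_request(filename, offset, length):
    """Step 2: (filename, byte range) -> (filename, chunk index).

    The chunk index counts from 0. This sketch rejects ranges that cross a
    chunk boundary instead of splitting them into several requests.
    """
    index, offset_in_chunk = divmod(offset, CHUNK_SIZE)
    if offset_in_chunk + length > CHUNK_SIZE:
        raise ValueError("byte range spans more than one chunk")
    return filename, index, offset_in_chunk


def master_lookup(filename, index):
    """Step 3: the master answers with (chunk handle, replica locations)."""
    return MASTER_TABLE[(filename, index)]


def plan_read(filename, offset, length, rng=random):
    """Steps 2-4: pick a chunkserver and the (handle, byte range) to send it."""
    name, index, offset_in_chunk = to_chunk_request(filename, offset, length)
    handle, replicas = master_lookup(name, index)
    server = rng.choice(replicas)  # assumption: any replica will do
    # Byte range within the chunk, as a half-open (start, end) pair.
    return server, handle, (offset_in_chunk, offset_in_chunk + length)


server, handle, byte_range = plan_read("crawl_99", 201_359_161, 2048)
print(handle, byte_range, server in (4, 7, 9))  # ch_1003 (32569, 34617) True
```

Ranges that cross a chunk boundary would need one request per chunk; the sketch simply rejects them to stay short.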
Write Algorithm
[Figures, steps 1 to 9: data flow among the application, GFS client, master, primary replica, and secondary replicas.]
1. The application originates the write request.
2. The GFS client translates the request from (filename, data) → (filename, chunk index) and sends it to the master.
3. The master responds with the chunk handle and the (primary + secondary) replica locations.
4. The client pushes the write data to all locations. The data is stored in the chunkservers' internal buffers.
5. The client sends the write command to the primary.
6. The primary determines a serial order for the data instances stored in its buffer and writes the instances to the chunk in that order.
7. The primary sends the serial order to the secondaries and tells them to perform the write.
8. The secondaries respond to the primary.
9. The primary responds back to the client.
Note: If the write fails at one of the chunkservers, the client is informed and retries the write.

Record Append Algorithm
Important operation at Google:
  Merging results from multiple machines into one file.
  Using a file as a producer-consumer queue.
1. The application originates the record append request.
2. The GFS client translates the request and sends it to the master.
3. The master responds with the chunk handle and the (primary + secondary) replica locations.
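Steps 4 to 9 of the write algorithm separate moving the data from ordering it: every replica buffers the pushed data, and only the primary decides the order in which buffered data reaches the chunk. The toy model below sketches why that keeps replicas identical when two overlapping writes reach the replicas in different orders. `Replica`, `write_command`, the 16-byte chunk, and "order by arrival in the primary's buffer" are all assumptions for illustration, not GFS's actual interfaces; there is no master and no error recovery in this model.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One chunk replica on a chunkserver (toy model; names are hypothetical)."""
    name: str
    chunk: bytearray = field(default_factory=lambda: bytearray(16))  # tiny stand-in for a 64 MB chunk
    buffer: dict = field(default_factory=dict)  # data_id -> (offset, data), filled in step 4
    healthy: bool = True

    def push(self, data_id, offset, data):
        # Step 4: pushed data is only buffered, not yet written to the chunk.
        self.buffer[data_id] = (offset, data)

    def apply(self, order):
        # Steps 6-7: write buffered instances to the chunk in the given serial order.
        if not self.healthy:
            return False
        for data_id in order:
            offset, data = self.buffer.pop(data_id)
            self.chunk[offset:offset + len(data)] = data
        return True


def write_command(primary, secondaries):
    """Steps 5-9: the primary fixes one serial order and every replica follows it."""
    order = list(primary.buffer)  # assumption: primary orders by arrival in its own buffer
    ok = primary.apply(order)                     # step 6
    acks = [s.apply(order) for s in secondaries]  # steps 7-8
    return ok and all(acks)  # step 9: False tells the client to retry (no rollback here)


primary = Replica("primary")
secondaries = [Replica("secondary-1"), Replica("secondary-2")]
replicas = [primary, *secondaries]

# Two overlapping writes; secondary-2 receives the pushes in the opposite order.
pushes = [("a", 0, b"AAAAAAAA"), ("b", 4, b"BBBBBBBB")]
for r in replicas:
    for data_id, offset, data in (reversed(pushes) if r is secondaries[1] else pushes):
        r.push(data_id, offset, data)

print(write_command(primary, secondaries))               # True
print(all(r.chunk == primary.chunk for r in replicas))   # True
print(bytes(primary.chunk))  # b'AAAABBBBBBBB\x00\x00\x00\x00'
```

Ordering by arrival at the primary is just one choice; what matters is that the order comes from the primary alone. Had secondary-2 applied its own buffer order instead, its chunk would start with `b'AAAAAAAABBBB'` rather than `b'AAAABBBBBBBB'`, diverging from the other replicas.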