大数据架构与应用——Storm计算平台56

ApacheStorm——开发实践案例：Usingcheck-instobuildaheatmapofbarsTopology设计开发案例：Usingcheck-instobuildaheatmapofbarsTopology设计开发CheckinsGeocodeLookupHeatMapBuilderPersistor案例：Usingcheck-instobuildaheatmapofbarsTopology设计开发CheckinsTopology设计开发GeocodeLookupTopology设计开发PersistorTopology设计开发Topology设计开发PerformanceTuningbuilder.setSpout(checkins,newCheckins(),4);Topology设计开发PerformanceTuningbuilder.setBolt(geocode-lookup,newGeocodeLookup(),8);Topology设计开发Executorsandtasksbuilder.setBolt(geocode-lookup,newGeocodeLookup(),8)builder.setBolt(geocode-lookup,newGeocodeLookup(),8).setNumTasks(8)TopologystreamgroupingsShufflegroupingdistributeoutgoingtuplesfromonecomponenttothenextinamannerthat’srandombutevenlyspreadout.Fieldsgroupingnsuretupleswiththesamevaluesforaselectedsetoffieldsalwaysgotothesameinstanceofthenextbolt.Topology设计开发DesignbybreakdownintofunctionalcomponentsTopologyDesignTopology设计开发TopologyDesignDesignbybreakdownintocomponentsatpointsofrepartitionTopology设计开发Simplestfunctionalcomponentsvs.lowestnumberofrepartitionsTopology设计开发Reliability？可靠的消息处理案例：e-commercecreditcardauthorizationflow可靠的消息处理案例：e-commercecreditcardauthorizationflow可靠的消息处理AuthorizeCreditCard可靠的消息处理ProcessedOrderNotification可靠的消息处理TupleTree可靠的消息处理BaseBasicBolt—Implicitanchoring,acking,andfailingAnchoringoutgoingordertuplewillbeautomaticallyanchoredtotheincomingordertuple.outputCollector.emit(newValues(order));AckingthetuplethatwassenttoitwillbeautomaticallyackedFailingthrowingaFailedExceptionorReportedFailed-Exception可靠的消息处理BaseRichBolt—Explicitanchoring,acking,andfailingAnchoringoutputCollector.emit(tuple,newValues(order))AckingoutputCollector.ack(tuple)FailingoutputCollector.fail(tuple);可靠的消息处理可靠的消息处理HandlingfailuresandknowingwhentoretryRetriableNonretriableUnknownerrors可靠的消息处理Spout’sroleinguaranteedmessageprocessingStorm调用nextTuple()获取一个新tuple.Spout使用SpoutOutputCollector向下游提供tuple.当提供tuple时,Spout通过一个messageId唯一标识这个tuple.spoutOutputCollector.emit(tuple,messageId);当tuple发送到下游bolts时，Storm开始跟踪这个tuple树.跟踪是通过下游bolts的锚定（anchoring）和应答（acking）实现。当Storm侦测到一个tuple被成功处理,Storm会调用spout的ack(ObjectmessageId)方法.如果一个tuple超时或者下游bolt使之失败（fail），Storm会调用spout的fail(ObjectmessageId)方法.可靠的消息处理Spout’sroleinguaranteedmessageprocessing可靠的消息处理RabbitMQSpout可靠的消息处理RabbitMQSpout可靠的消息处理DegreesofreliabilityinStormAt-most-onceprocessingAt-least-onceprocessingExactly-onceprocessing可靠的消息处理Betterat-least-onceprocessing可靠的消息处理NimbusandSupervisorsinStormclusterStormClusterZookeeperClusterinStormclusterStormClusterWorkernodeStormClusterQuestionAnswerWhatifaworkernodedies?Supervisorwillrestartitandnewtaskswillbeassignedtoit.Alltuplesthatweren’tfullyackedattimeofdeathwillbefullyreplayedbythespout.Thisiswhythespoutneedstosupportreplaying(reliablespout)andthedatasourcebehindthespoutalsoneedstobereliable(supportingreplay).Whatifaworkernodecontinuouslyfailstostartup?Nimbuswillreassigntaskstoanotherworker.Whatifanactualmachinethatrunsworkernodesdies?Nimbuswillreassignthetasksonthatmachinetohealthymachines.StormClusterQuestionAnswerWhatifNimbusdies?BecauseNimbusisbeingrunundersupervision(usingatoollikedaemontoolsormonit),itshouldgetrestartedandcontinueprocessinglikenothinghappened.WhatifaSupervisordies?BecauseSupervisorsarebeingrunundersupervision(usingatoollikedaemontoolsormonit),theyshouldgetrestartedlikenothinghappened.IsNimbusasinglepointoffailure?Notnecessarily.Supervisorsandworkernodeswillcontinuetoprocess,butyoulosetheabilitytoreassignworkerstoothermachinesordeploynewtopologies.StormClusterTridentinStormTridentinStormTridentisahigh-levelAPIonStormNativeStormvsTridentHOWWHATTridentinStormTridentinStormTridentoperationsFunctions—Operateonanincomingtupleandemitoneormorecorrespondingtuples.Filters—Decidetokeeporfilteroutanincomingtuplefromthestream.Splits—Splittingastreamwillresultinmultiplestreamswiththesamedataandfields.Merges—Streamscanbemergedonlyiftheyhavethesamefields(samefieldnamesandsamenumberoffields).Joins—Joiningisfordifferentstreamswithmostlydifferentfields,exceptforoneormorecommonfield(s)tojoinon(similartoaSQLjoin).Grouping—Groupbyspecificfield(s)withinapartition(moreonpartitionslater).Aggregation—Performcalculationsforaggregatingsetsoftuples.Stateupdater—Persisttuplesorcalculatedvaluestoadatastore.Statequerying—Queryadatastore.Repartitioning—Repartitionthestreambyhashingonspecificfield(s)(similartoafieldsgrouping)orinarandommanner(similartoashufflegrouping).TridentinStormStreamsdiff:NativevsTridentTridentinStormWhatiskafka?It’sapublish-subscribemessagebroker,rethoughtasadistributedcommitlog.It’sadistributed,partitioned,replicatedcommitlogservicethatprovidesthefunctionalityofamessagingsystem,butwithauniquedesign.PartitioningfordistributingaKafkatopicModelingstorageasacommitlogFunctionaladvantagesofKafka消息不会被立即丢弃，所以消费者(consumer)可以决定何时推进消息的偏移量(offset)。因此在kafka中很容易重试消息。类似的，如果队列中积累了大量的消息，而这些消息已经过时，消费者可以推进一个大的偏移量，忽略掉这部分过期的消息。如果消费者需要按批处理消息，可以在一个partition中按批推进偏移量。如果存在不同的应用需要从同一个topic中订阅相同的消息，消费者可以从topic的相同分区中获取这些消息。实现该功能的机制是因为消息在被一个消费者读取后不会立即丢弃，而每个消费者在partition中控制自己的偏移量。如果想确保每条消息只被一个消费

大数据架构与应用——Storm计算平台56

免费阅读已结束，点击付费阅读剩下 ... 页

阅读已结束，您可以下载文档离线阅读

Mobile最佳移动数据应用-MicrosoftCor

重庆房地产案名集

第一篇民用建筑设计(48学时)

城市轨道交通票务管理单元2自动售检票系统

春亿农业科技商业计划书（PPT39页)

数控加工工艺与编程教案（DOC100页）

董事会1

施工方城农村集体土地使用确权登记发证项目

某物业公司绩效考核与薪酬体系方案（PPT 49页）

《会计学》习题

相关文档

相关搜索

大数据架构与应用——Storm计算平台56

免费阅读已结束，点击付费阅读剩下 ... 页

阅读已结束，您可以下载文档离线阅读

Mobile最佳移动数据应用-MicrosoftCor

重庆房地产案名集

第一篇 民用建筑设计(48学时)

城市轨道交通票务管理单元2自动售检票系统

春亿农业科技商业计划书（PPT39页)

数控加工工艺与编程教案（DOC100页）

董事会1

施工方城农村集体土地使用确权登记发证项目

某物业公司绩效考核与薪酬体系方案（PPT 49页）

《会计学》习题

相关文档

相关搜索

第一篇民用建筑设计(48学时)