您好,欢迎访问三七文档
当前位置:首页 > IT计算机/网络 > 数据挖掘与识别 > 大数据架构与应用——Storm计算平台56
ApacheStorm——开发实践案例:Usingcheck-instobuildaheatmapofbarsTopology设计开发案例:Usingcheck-instobuildaheatmapofbarsTopology设计开发CheckinsGeocodeLookupHeatMapBuilderPersistor案例:Usingcheck-instobuildaheatmapofbarsTopology设计开发CheckinsTopology设计开发GeocodeLookupTopology设计开发PersistorTopology设计开发Topology设计开发PerformanceTuningbuilder.setSpout(checkins,newCheckins(),4);Topology设计开发PerformanceTuningbuilder.setBolt(geocode-lookup,newGeocodeLookup(),8);Topology设计开发Executorsandtasksbuilder.setBolt(geocode-lookup,newGeocodeLookup(),8)builder.setBolt(geocode-lookup,newGeocodeLookup(),8).setNumTasks(8)TopologystreamgroupingsShufflegroupingdistributeoutgoingtuplesfromonecomponenttothenextinamannerthat’srandombutevenlyspreadout.Fieldsgroupingnsuretupleswiththesamevaluesforaselectedsetoffieldsalwaysgotothesameinstanceofthenextbolt.Topology设计开发DesignbybreakdownintofunctionalcomponentsTopologyDesignTopology设计开发TopologyDesignDesignbybreakdownintocomponentsatpointsofrepartitionTopology设计开发Simplestfunctionalcomponentsvs.lowestnumberofrepartitionsTopology设计开发Reliability?可靠的消息处理案例:e-commercecreditcardauthorizationflow可靠的消息处理案例:e-commercecreditcardauthorizationflow可靠的消息处理AuthorizeCreditCard可靠的消息处理ProcessedOrderNotification可靠的消息处理TupleTree可靠的消息处理BaseBasicBolt—Implicitanchoring,acking,andfailingAnchoringoutgoingordertuplewillbeautomaticallyanchoredtotheincomingordertuple.outputCollector.emit(newValues(order));AckingthetuplethatwassenttoitwillbeautomaticallyackedFailingthrowingaFailedExceptionorReportedFailed-Exception可靠的消息处理BaseRichBolt—Explicitanchoring,acking,andfailingAnchoringoutputCollector.emit(tuple,newValues(order))AckingoutputCollector.ack(tuple)FailingoutputCollector.fail(tuple);可靠的消息处理可靠的消息处理HandlingfailuresandknowingwhentoretryRetriableNonretriableUnknownerrors可靠的消息处理Spout’sroleinguaranteedmessageprocessingStorm调用nextTuple()获取一个新tuple.Spout使用SpoutOutputCollector向下游提供tuple.当提供tuple时,Spout通过一个messageId唯一标识这个tuple.spoutOutputCollector.emit(tuple,messageId);当tuple发送到下游bolts时,Storm开始跟踪这个tuple树.跟踪是通过下游bolts的锚定(anchoring)和应答(acking)实现。当Storm侦测到一个tuple被成功处理,Storm会调用spout的ack(ObjectmessageId)方法.如果一个tuple超时或者下游bolt使之失败(fail),Storm会调用spout的fail(ObjectmessageId)方法.可靠的消息处理Spout’sroleinguaranteedmessageprocessing可靠的消息处理RabbitMQSpout可靠的消息处理RabbitMQSpout可靠的消息处理DegreesofreliabilityinStormAt-most-onceprocessingAt-least-onceprocessingExactly-onceprocessing可靠的消息处理Betterat-least-onceprocessing可靠的消息处理NimbusandSupervisorsinStormclusterStormClusterZookeeperClusterinStormclusterStormClusterWorkernodeStormClusterQuestionAnswerWhatifaworkernodedies?Supervisorwillrestartitandnewtaskswillbeassignedtoit.Alltuplesthatweren’tfullyackedattimeofdeathwillbefullyreplayedbythespout.Thisiswhythespoutneedstosupportreplaying(reliablespout)andthedatasourcebehindthespoutalsoneedstobereliable(supportingreplay).Whatifaworkernodecontinuouslyfailstostartup?Nimbuswillreassigntaskstoanotherworker.Whatifanactualmachinethatrunsworkernodesdies?Nimbuswillreassignthetasksonthatmachinetohealthymachines.StormClusterQuestionAnswerWhatifNimbusdies?BecauseNimbusisbeingrunundersupervision(usingatoollikedaemontoolsormonit),itshouldgetrestartedandcontinueprocessinglikenothinghappened.WhatifaSupervisordies?BecauseSupervisorsarebeingrunundersupervision(usingatoollikedaemontoolsormonit),theyshouldgetrestartedlikenothinghappened.IsNimbusasinglepointoffailure?Notnecessarily.Supervisorsandworkernodeswillcontinuetoprocess,butyoulosetheabilitytoreassignworkerstoothermachinesordeploynewtopologies.StormClusterTridentinStormTridentinStormTridentisahigh-levelAPIonStormNativeStormvsTridentHOWWHATTridentinStormTridentinStormTridentoperationsFunctions—Operateonanincomingtupleandemitoneormorecorrespondingtuples.Filters—Decidetokeeporfilteroutanincomingtuplefromthestream.Splits—Splittingastreamwillresultinmultiplestreamswiththesamedataandfields.Merges—Streamscanbemergedonlyiftheyhavethesamefields(samefieldnamesandsamenumberoffields).Joins—Joiningisfordifferentstreamswithmostlydifferentfields,exceptforoneormorecommonfield(s)tojoinon(similartoaSQLjoin).Grouping—Groupbyspecificfield(s)withinapartition(moreonpartitionslater).Aggregation—Performcalculationsforaggregatingsetsoftuples.Stateupdater—Persisttuplesorcalculatedvaluestoadatastore.Statequerying—Queryadatastore.Repartitioning—Repartitionthestreambyhashingonspecificfield(s)(similartoafieldsgrouping)orinarandommanner(similartoashufflegrouping).TridentinStormStreamsdiff:NativevsTridentTridentinStormWhatiskafka?It’sapublish-subscribemessagebroker,rethoughtasadistributedcommitlog.It’sadistributed,partitioned,replicatedcommitlogservicethatprovidesthefunctionalityofamessagingsystem,butwithauniquedesign.PartitioningfordistributingaKafkatopicModelingstorageasacommitlogFunctionaladvantagesofKafka消息不会被立即丢弃,所以消费者(consumer)可以决定何时推进消息的偏移量(offset)。因此在kafka中很容易重试消息。类似的,如果队列中积累了大量的消息,而这些消息已经过时,消费者可以推进一个大的偏移量,忽略掉这部分过期的消息。如果消费者需要按批处理消息,可以在一个partition中按批推进偏移量。如果存在不同的应用需要从同一个topic中订阅相同的消息,消费者可以从topic的相同分区中获取这些消息。实现该功能的机制是因为消息在被一个消费者读取后不会立即丢弃,而每个消费者在partition中控制自己的偏移量。如果想确保每条消息只被一个消费
本文标题:大数据架构与应用——Storm计算平台56
链接地址:https://www.777doc.com/doc-28366 .html