Flume Overall Cluster Construction Plan

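Flume Cluster

Introduction to Flume

Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data. It supports customizable data senders for collecting data, and it can apply simple processing to the data and write it to a variety of (customizable) data receivers.

Logical architecture of Flume

Logically, Flume has a three-tier architecture: agent, collector, and storage.

agent: Collects data. The agent is where data flows originate in Flume, and it forwards the flows it produces to a collector.
collector: Aggregates the data from multiple agents and loads it into storage.
storage: The storage system. It can be an ordinary file, or HDFS, Hive, HBase, and so on.
Master: Manages and coordinates the configuration of agents and collectors. It is the controller of the Flume cluster.

The most important abstraction in Flume is the data flow, which describes the path data takes from where it is produced, through transport and processing, to where it is finally written. For an agent, the data flow configuration states where to get data and which collector to send it to. For a collector, it states that the collector receives the data sent by agents and forwards it to the specified target machine.

Features of Flume

• Reliability: Flume offers three reliability options: End-to-end, Store on failure, and Best effort. End-to-end uses an on-disk log plus acknowledgements from the receiving end to guarantee that data accepted by Flume eventually reaches its destination, but it is the slowest option. Store on failure keeps data on the local disk while the destination is unavailable; it is faster than End-to-end, but logs can be lost. Best effort gives no QoS guarantee; it is the fastest option, and log delivery is not guaranteed.
• Scalability: Flume's three major components (collector, master, and storage tier) are all scalable. Because event processing in Flume does not need to carry state, scaling out is easy to achieve.
• Manageability: The master can manage the nodes of a Flume cluster dynamically. With multiple masters, Flume uses ZooKeeper and gossip to keep configuration data consistent.
• Extensibility: Flume is written in Java, so users can add new functionality to it. By subclassing Source, users can implement their own data input; by subclassing Sink, they can write data to a specific target; and with SinkDecorator, they can pre-process data.

Note: Flume depends on Hadoop and ZooKeeper only at the jar level. It does not require the Hadoop and ZooKeeper services to be running when Flume starts.

Distributed installation of Flume (the installation procedure currently used on the cluster)

To deploy Flume on the cluster, follow these steps:
1. Install Flume on every machine in the cluster.
2. Choose one or more nodes to act as master.
3. Modify the static configuration files.
4. Start a master on at least one machine, and start a Flume node on every node.

The following sections describe how to edit the configuration files by hand to specify the master for the nodes in the cluster and how to set default values for parameters. The later part describes, for data flow configuration in larger systems, how to expand capacity by adding collectors and how to improve reliability by adding more masters.

Note: The network of the whole Flume cluster must be stable and reliable. Otherwise hard-to-diagnose errors appear (for example, an agent cannot send data to its collector).

Install Flume on every machine in the cluster

Environment:
Operating system: CentOS 5.6
Hadoop version: 0.20.2
JDK version: jdk1.6.0_26
Flume version: flume-0.9.4

Step 1: Download Flume. The servers currently run flume-distribution-0.9.4, and the installation package is stored under the /data/sysdir/install_tar folder.

Step 2: Extract the Flume package into the /data/sysdir folder. On the command line, enter:

    tar zxvf /data/sysdir/install_tar/flume-distribution-0.9.4.tar.gz -C /data/sysdir

Step 3: Edit the /etc/profile file and add:

    export FLUME_HOME=/data/sysdir/flume-distribution-0.9.4
    export PATH=.:$PATH:$FLUME_HOME/bin

Step 4: Verify the installation. After installation, run the flume command. It prints the following output:

    usage: flume command [args...]
    commands include:
      dump            Takes a specified source and dumps to console
      node            Start a Flume node/agent (with watchdog)
      master          Start a Flume Master server (with watchdog)
      version         Dump flume build version information
      node_nowatch    Start a flume node/agent (no watchdog)
      master_nowatch  Start a Flume Master server (no watchdog)
      class <class>   Run specified fully qualified class using Flume environment (no watchdog)
                      for example: flume com.cloudera.flume.agent.FlumeNode
      shell           Start the flume shell
      killmaster      Kill a running master

The Flume configuration files are located under $FLUME_HOME/conf.

Choose one or more nodes to act as master

You can define a single master for the cluster, or choose several nodes as masters to improve availability.

Single-master mode: easy to manage, but weak in fault tolerance and scalability.
Multi-master mode: usually runs three or five masters, and tolerates faults well.

The original text reads:

Standalone mode - this is where the Master runs on a single machine. This is easy to administer, and simple to set-up, but has disadvantages when it comes to scalability and fault-tolerance.
Distributed mode - this is where the Master is configured to run on several machines - usually three or five. This option scales to serve many Flows, and also has good fault-tolerance properties.

How to choose the number of Flume masters

The original text reads:

The distributed Flume Master will continue to work correctly as long as more than half the physical machines running it are still working and haven’t crashed. Therefore if you want to survive one fault, you need three machines (because 3 - 1 = 2 > 3/2). For every extra fault you want to tolerate, add another two machines, so for two faults you need five machines. Note that having an even number of machines doesn’t make the Flume Master any more fault-tolerant - four machines only tolerate one failure, because if two were to fail only two would be left functioning, which is not more than half of four. Common deployments should be well served by three or five machines.

In short, the distributed master keeps working correctly only while the number of healthy masters is more than half of the total number of masters.

The Flume master has two main roles. The original text reads:

The Master has two main jobs to perform. The first is to keep track of all the nodes in a Flume deployment and to keep them informed of any changes to their configuration. The second is to track acknowledgements from the end of a Flume flow that is operating in reliable mode so that the source at the top of that flow knows when to stop transmitting an event.

That is, the master first tracks the configuration of every node and notifies nodes of configuration changes. Second, for a flow running in reliable (E2E) mode, it tracks the acknowledgements coming from the end of the flow, so that the source at the head of the flow knows when it can stop transmitting an event.

Current Flume master selection on the cluster

10.168.0.174, 10.168.0.181, and 10.168.0.188 serve as Flume masters.

Modify the static configuration files

Site-specific settings for Flume nodes and masters are configured in conf/flume-site.xml on each cluster node. If this file does not exist, properties take their default values from conf/flume-conf.xml. The following example sets the master name on a Flume node, so that the node looks for the Flume Master on the machine called "master".

conf/flume-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>flume.master.servers</name>
        <value>master</value>
      </property>
    </configuration>

With multiple masters, the following configuration is needed:

    <property>
      <name>flume.master.servers</name>
      <value>hadoopmaster.com,hadoopedge.com,datanode4.com</value>
      <description>A comma-separated list of hostnames, one for each machine in the Flume Master.</description>
    </property>
    <property>
      <name>flume.master.store</name>
      <value>zookeeper</value>
      <description>How the Flume Master stores node configurations. Must be either 'zookeeper' or 'memory'.</description>
    </property>
    <property>
      <name>flume.master.serverid</name>
      <value>2</value>
      <description>The unique identifier for a machine in a Flume Master ensemble. Must be differen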
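
The master-count rule quoted earlier (the distributed master keeps working only while more than half of its machines are up) reduces to simple majority arithmetic. The sketch below is illustrative only and is not part of Flume; the function names required_masters, tolerated_faults, and has_quorum are hypothetical helpers introduced here to check a planned ensemble size.

```python
def required_masters(faults_to_tolerate: int) -> int:
    """Smallest ensemble size that still has a strict majority alive
    after the given number of master failures (2f + 1)."""
    if faults_to_tolerate < 0:
        raise ValueError("faults_to_tolerate must be non-negative")
    return 2 * faults_to_tolerate + 1


def tolerated_faults(masters: int) -> int:
    """Number of failures an ensemble of the given size can survive."""
    if masters < 1:
        raise ValueError("masters must be at least 1")
    return (masters - 1) // 2


def has_quorum(total: int, alive: int) -> bool:
    """True when strictly more than half of the masters are still running."""
    return 2 * alive > total


if __name__ == "__main__":
    # Compare odd and even ensemble sizes, as discussed in the quoted text.
    for size in (1, 3, 4, 5):
        print(size, "masters tolerate", tolerated_faults(size), "failure(s)")
```

Under this rule the three masters chosen above (10.168.0.174, 10.168.0.181, 10.168.0.188) survive the loss of one machine, and adding a fourth would not raise that number.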
