Distributed Hadoop and Spark Cluster Setup
1. Set the root password and log in as root. Configure the login screen as follows:

sudo -s
gedit /etc/lightdm/lightdm.conf

[SeatDefaults]
greeter-session=unity-greeter
user-session=ubuntu
greeter-show-manual-login=true
allow-guest=false

Enable the root account (Ubuntu disables it by default):

sudo passwd root

Set the password, reboot the system, choose "Login", enter "root" and then the password to log in.

2. Configure /etc/hosts and /etc/hostname on every machine, install ssh, and set up passwordless login among the three machines. In /etc/hostname, set the hostnames of the three machines to SparkMaster, SparkWorker1 and SparkWorker2. In /etc/hosts on every machine, map the IP addresses to the hostnames as follows:

127.0.0.1 localhost
192.168.32.131 SparkMaster
192.168.32.132 SparkWorker1
192.168.32.133 SparkWorker2
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Use ifconfig to look up each machine's IP address, and run ping SparkWorker1 to check that the name resolution works.

Next, configure passwordless SSH login:

1) apt-get install ssh
2) /etc/init.d/ssh start, to start the service
3) ps -e | grep ssh, to verify the service is running
4) Generate the key pair for passwordless login:

ssh-keygen -t rsa -P ""

This creates two files in /root/.ssh: id_rsa (the private key) and id_rsa.pub (the public key). Append the public key to authorized_keys:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Copy id_rsa.pub from SparkWorker1 and SparkWorker2 to SparkMaster with scp.

On SparkWorker1:
scp ~/.ssh/id_rsa.pub root@SparkMaster:~/.ssh/id_rsa.pub.SparkWorker1

On SparkWorker2:
scp ~/.ssh/id_rsa.pub root@SparkMaster:~/.ssh/id_rsa.pub.SparkWorker2

Then append both public keys to authorized_keys on SparkMaster:

cd ~/.ssh
cat id_rsa.pub.SparkWorker1 >> authorized_keys
cat id_rsa.pub.SparkWorker2 >> authorized_keys

Finally, copy SparkMaster's authorized_keys into the .ssh directory of SparkWorker1 and SparkWorker2:

scp authorized_keys root@SparkWorker1:~/.ssh/authorized_keys
scp authorized_keys root@SparkWorker2:~/.ssh/authorized_keys

Passwordless SSH is now configured. Verify it with:

ssh SparkMaster
ssh SparkWorker1
ssh SparkWorker2

Each machine can now log in to the others without a password.

3. Configure the Java environment. On SparkMaster, using jdk-8u25-linux-i586.tar.gz:

mkdir /usr/lib/java
cd /usr/lib/java
tar -zxvf jdk-8u25-linux-i586.tar.gz
gedit ~/.bashrc

Add the following at the end (all of these variables will be used in later steps):

#JAVA
export JAVA_HOME=/usr/lib/java/jdk1.8.0_25
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.0
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export SCALA_HOME=/usr/lib/scala/scala-2.11.4
export SPARK_HOME=/usr/local/spark/spark-1.2.0-bin-hadoop2.4
export IDEA_HOME=/usr/local/idea/idea-IC-139.659.2
export PATH=${IDEA_HOME}/bin:${SPARK_HOME}/bin:${SCALA_HOME}/bin:${HADOOP_HOME}/bin:${JAVA_HOME}/bin:$PATH

Run source ~/.bashrc to make the settings take effect; java -version then prints the version number and confirms that the installation succeeded.

Configure SparkWorker1 and SparkWorker2 in the same way, or copy everything over with scp (create /usr/lib/java on each worker first):

scp -r /usr/lib/java/jdk1.8.0_25 root@SparkWorker1:/usr/lib/java/
scp -r /usr/lib/java/jdk1.8.0_25 root@SparkWorker2:/usr/lib/java/
scp ~/.bashrc root@SparkWorker1:~/.bashrc
scp ~/.bashrc root@SparkWorker2:~/.bashrc

After copying, run source ~/.bashrc on SparkWorker1 and SparkWorker2 as well. (A quick checkpoint for all three nodes is sketched below, after the slaves configuration in step 4.)

4. Configure the Hadoop environment. On SparkMaster, using hadoop-2.6.0.tar.gz:

mkdir /usr/local/hadoop
cd /usr/local/hadoop
tar -zxvf hadoop-2.6.0.tar.gz
cd hadoop-2.6.0
mkdir dfs
cd dfs
mkdir name
mkdir data
cd ..
mkdir tmp

Next, modify Hadoop's configuration files. First enter the Hadoop 2.6.0 configuration directory:

cd etc/hadoop

Step 1: edit hadoop-env.sh and point JAVA_HOME at the JDK installed above:

# The java implementation to use.
export JAVA_HOME=/usr/lib/java/jdk1.8.0_25

Step 2: edit yarn-env.sh and set JAVA_HOME in the same way:

# some Java parameters
export JAVA_HOME=/usr/lib/java/jdk1.8.0_25
if [ "$JAVA_HOME" != "" ]; then
  #echo "run java in $JAVA_HOME"
  JAVA_HOME=$JAVA_HOME
fi

Step 3: edit mapred-env.sh and set JAVA_HOME as follows:

# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/usr/lib/java/jdk1.8.0_25
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA

Step 4: edit slaves and set the slave nodes of the Hadoop cluster to SparkWorker1 and SparkWorker2:

SparkWorker1
SparkWorker2
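Before editing the XML configuration files in steps 5 and 6, it is worth checking that the passwordless SSH setup from step 2 and the JDK from step 3 actually work on every node. A minimal sketch, assuming the hostnames and JDK path used above:

for host in SparkMaster SparkWorker1 SparkWorker2; do
  echo "==== $host ===="
  # BatchMode=yes makes ssh fail instead of prompting for a password,
  # so a broken key setup shows up immediately
  ssh -o BatchMode=yes root@"$host" '/usr/lib/java/jdk1.8.0_25/bin/java -version' \
    || echo "passwordless SSH or the JDK is not set up correctly on $host"
done

If a host prompts for a password or reports an error here, repeat the authorized_keys steps from step 2 or the JDK copy from step 3 for that machine before continuing.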
Step 5: edit core-site.xml as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://SparkMaster:9000</value>
    <description>The name of the default file system</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/tmp</value>
    <description>A base for other temporary directories</description>
  </property>
</configuration>

fs.defaultFS tells every node to reach HDFS through the NameNode on SparkMaster at port 9000, and hadoop.tmp.dir points at the tmp directory created in step 4.

Step 6: edit hdfs-site.xml.
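The contents of hdfs-site.xml are not listed above. As a sketch only, a minimal hdfs-site.xml for this layout could point HDFS at the dfs/name and dfs/data directories created in step 4; the property values below, including the replication factor, are assumptions rather than settings taken from this guide:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Assumed: where the NameNode keeps its metadata (directory created in step 4) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hadoop-2.6.0/dfs/name</value>
  </property>
  <!-- Assumed: where each DataNode keeps its block data (directory created in step 4) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hadoop-2.6.0/dfs/data</value>
  </property>
  <!-- Assumed replication factor: one copy on each of the two workers -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

With a replication factor of 2, every HDFS block is stored on both SparkWorker1 and SparkWorker2, so either worker can fail without data being lost.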