CDH Cluster Installation

1. Install tools

mkdir -p /data/software/
cd /data/software/
wget http://www.theether.org/pssh/pssh-1.4.3.tar.gz
tar -zxvf pssh-1.4.3.tar.gz
cd pssh-1.4.3/
python setup.py install

Preparation

  • Check the server OS version

    [root@xxx software]#  cat /etc/redhat-release
    CentOS Linux release 7.2.1511 (Core)
  • First, set the hostname on each server

    hostnamectl set-hostname  hadoop1
    localectl set-locale LANG=zh_CN.utf8
    .....
  • Generate an SSH key pair on each server

    ssh-keygen -t rsa
  • Edit /etc/hosts

    vim /etc/hosts
  • Configure passwordless login
    Append the Cloudera Manager Server's public key to /root/.ssh/authorized_keys on each server

  • Synchronize time (optional; these hosts run on Tencent Cloud)

    # crontab -e
    */20 * * * * /usr/sbin/ntpdate ntpupdate.tencentyun.com >/dev/null &
  • Add users and groups

    useradd --system --home=/data/work/cm-5.7.2 --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm

    id cloudera-scm
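    The following steps are optional (the sqoop group must exist before useradd can reference it):
    groupadd sqoop
    useradd sqoop2 -g sqoop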
    groupadd hdfs
    groupadd hadoop
    useradd hdfs -g hadoop -d /var/lib/hadoop-hdfs/ -c 'Hadoop HDFS'
  • Adjust some kernel settings [probably not required]
    Run these commands as root

    echo 10 > /proc/sys/vm/swappiness
    echo never > /sys/kernel/mm/transparent_hugepage/defrag
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
  • Install JDK 8 and configure the environment

    [root@hadoop001 work]# tar -cvf jdk8.tar /data/work/java1.8
    [root@hadoop001 work]# scp jdk8.tar root@hadoop019:/data/software/
    [root@hadoop019 work]# vim /etc/profile
    export JAVA_HOME=/data/work/jdk/jdk1.8.0_111/
    export JRE_HOME=/data/work/jdk/jdk1.8.0_111/jre
    export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
    export LD_LIBRARY_PATH=/usr/lib/oracle/11.2/client64/lib/:$LD_LIBRARY_PATH
    export PATH=$PATH:$JAVA_HOME/bin:$MVN_HOME/bin:$JRE_HOME/bin
    [root@hadoop019 work]# source /etc/profile
    [root@hadoop019 work]# java -version
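
The /etc/hosts and passwordless-login steps above both come down to appending a line to a file on every server. A minimal sketch of an idempotent append follows, so that re-running it does not duplicate entries. The helper name append_once, the IP addresses, and the host name hadoop2 are illustrative assumptions, not values from this setup; the demo writes to a temporary file rather than the real /etc/hosts.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo against a scratch file; the addresses below are placeholders.
demo_hosts=$(mktemp)
append_once "10.0.0.1 hadoop1" "$demo_hosts"
append_once "10.0.0.2 hadoop2" "$demo_hosts"
append_once "10.0.0.1 hadoop1" "$demo_hosts"   # repeated call changes nothing
cat "$demo_hosts"
```

On a real node the same helper could be pointed at /etc/hosts, or at /root/.ssh/authorized_keys with the Cloudera Manager Server's public key as the line.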

Mount the disk

List all disks: fdisk -l
Show current filesystem usage: df -hT
Format the disk: mkfs.xfs /dev/vdb
Mount the disk:
mkdir /data
mount /dev/vdb /data
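
A mount made with the mount command alone does not survive a reboot. One common way to make it persistent is an /etc/fstab entry; the sketch below only prints such an entry for review and does not modify any file. The helper name fstab_line and the choice of the "defaults" mount options are assumptions.

```shell
# Print an /etc/fstab entry: device, mount point, filesystem type,
# mount options, dump flag, fsck order.
fstab_line() {
    printf '%s %s %s defaults 0 0\n' "$1" "$2" "$3"
}

# Entry matching the device and mount point used above.
fstab_line /dev/vdb /data xfs
```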

Download Cloudera Manager

Download the matching cloudera-manager-centos7-cm*.tar.gz into /opt/cloudera-manager/
Download location: http://archive.cloudera.com/cm5/cm/5/

wget https://archive.cloudera.com/cm5/cm/5/cloudera-manager-centos7-cm5.16.1_x86_64.tar.gz
wget http://archive.cloudera.com/cdh5/parcels/5.16.1/CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel
wget http://archive.cloudera.com/cdh5/parcels/5.16.1/CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel.sha1
wget http://archive.cloudera.com/cdh5/parcels/5.16.1/manifest.json
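
Since the parcel is large and ships with a .sha1 file, it may be worth checking the download before going further. This is a sketch under the assumption that the .sha1 file holds the hex digest as its first whitespace-separated field; verify_sha1 is a hypothetical helper name.

```shell
# Compare a file's SHA-1 digest with the first field of its .sha1 companion.
verify_sha1() {
    file=$1
    expected=$(awk '{print $1; exit}' "$file.sha1")
    actual=$(sha1sum "$file" | awk '{print $1}')
    [ "$expected" = "$actual" ]
}

# Only runs the check when both downloaded files are in the current directory.
parcel=CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel
if [ -f "$parcel" ] && [ -f "$parcel.sha1" ]; then
    if verify_sha1 "$parcel"; then
        echo "checksum OK"
    else
        echo "checksum MISMATCH"
    fi
fi
```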

Install third-party dependencies

yum install chkconfig python bind-utils psmisc libxslt zlib sqlite fuse fuse-libs redhat-lsb cyrus-sasl-plain cyrus-sasl-gssapi

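Create the databases

The following commands create the databases that the individual services need. I prefer to create them up front whether or not they are needed yet, so that a service can be deployed directly later without coming back to create its database.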
create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database monitor DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
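
The five statements differ only in the database name, so they can be generated from a list. A minimal sketch: create_db_sql is a hypothetical helper, and IF NOT EXISTS is an addition here so that the output can be re-applied safely.

```shell
# Emit one CREATE DATABASE statement per name, with the same charset
# and collation as the statements above.
create_db_sql() {
    for db in "$@"; do
        printf 'CREATE DATABASE IF NOT EXISTS %s DEFAULT CHARSET utf8 COLLATE utf8_general_ci;\n' "$db"
    done
}

# Prints the SQL for review.
create_db_sql hive amon hue monitor oozie
```

To apply it, the output could be piped into the client, for example: create_db_sql hive amon hue monitor oozie | mysql -uroot -p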

Extract the archive and edit the configuration file

tar -zxvf cloudera-manager-centos7-cm5.16.1_x86_64.tar.gz

vim cm-5.16.1/etc/cloudera-scm-agent/config.ini
Set server_host to the address of the Cloudera Manager Server
server_host=hadoop1
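
When the same change has to be made on many agents, editing config.ini by hand in vim gets tedious. A sketch of a non-interactive alternative using GNU sed follows. set_server_host is a hypothetical helper, and it assumes the file already contains a line starting with server_host= and that the host is a plain name without "/" characters.

```shell
# Rewrite the server_host line of an agent config.ini in place (GNU sed).
set_server_host() {
    file=$1
    host=$2
    sed -i "s/^server_host=.*/server_host=$host/" "$file"
}

# Apply it only if the extracted config file is present.
conf=cm-5.16.1/etc/cloudera-scm-agent/config.ini
if [ -f "$conf" ]; then
    set_server_host "$conf" hadoop1
    grep '^server_host=' "$conf"
fi
```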

Set up the parcel repository directory

[root@hadoop1 cloudera-manager]# mkdir -p /opt/cloudera/parcel-repo
[root@hadoop1 cloudera-manager]# chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
[root@hadoop1 cloudera-manager]# cd /opt/cloudera/parcel-repo
[root@hadoop1 parcel-repo]# cp /data/work/cloudera-manager/
CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel cloudera/ cm-5.16.1/
CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel.sha1 cloudera-manager-centos7-cm5.16.1_x86_64.tar.gz manifest.json
[root@hadoop1 parcel-repo]# cp /data/work/cloudera-manager/CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel*
cp: overwrite "/data/work/cloudera-manager/CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel.sha1"? ^C
[root@hadoop1 parcel-repo]# cp /data/work/cloudera-manager/CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel* ./
[root@hadoop1 parcel-repo]# ls
CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel.sha1
[root@hadoop1 parcel-repo]# cp /data/work/cloudera-manager/manifest.json ./
[root@hadoop1 parcel-repo]# ls
CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel CDH-5.16.1-1.cdh5.16.1.p0.3-el7.parcel.sha1 manifest.json

Initialize the database
Run this step on the master node. The scm_prepare_database.sh script sets up the database; the command is:

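[root@cdh01 ~]# /opt/cloudera-manager/cm-5.7.2/share/cmf/schema/scm_prepare_database.sh mysql -hcdh01 -uroot -proot --scm-host cdh01 scmdbn scmdbu scmdbp
Note: this script creates and configures the database that the Cloudera Manager Server (CMS) needs. The arguments are:

mysql: the database type is MySQL. If the installation uses Oracle, change this argument to oracle.

-hcdh01: the database runs on host cdh01, that is, on the master node.

-uroot: connect to MySQL as root. -proot: the MySQL root password is root.

--scm-host cdh01: the host running the CMS, usually the same host on which MySQL is installed.

The last three arguments are the database name, the database user name, and the database password.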
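
Because the argument order matters, it can help to name each value before assembling the command. The sketch below uses the values from the command above; the variable names and the prepare_cmd helper are illustrative only. It prints the command for review instead of running it.

```shell
# One named variable per argument, in the order the script expects them.
SCRIPT=/opt/cloudera-manager/cm-5.7.2/share/cmf/schema/scm_prepare_database.sh
DB_TYPE=mysql          # database type
DB_HOST=cdh01          # host where the database runs
DB_ADMIN_USER=root     # MySQL account used to create the schema
DB_ADMIN_PASS=root     # its password
SCM_HOST=cdh01         # host running the Cloudera Manager Server
SCM_DB_NAME=scmdbn     # database name
SCM_DB_USER=scmdbu     # database user name
SCM_DB_PASS=scmdbp     # database password

# Assemble the full command line as a single string.
prepare_cmd() {
    printf '%s %s -h%s -u%s -p%s --scm-host %s %s %s %s\n' \
        "$SCRIPT" "$DB_TYPE" "$DB_HOST" "$DB_ADMIN_USER" "$DB_ADMIN_PASS" \
        "$SCM_HOST" "$SCM_DB_NAME" "$SCM_DB_USER" "$SCM_DB_PASS"
}

prepare_cmd
```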

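When the command completes successfully, the output looks like this:


This is the step that explains why the JAR had to be renamed earlier.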

Set up the CDH directory on the worker nodes

Create the parcels directory on every node:

mkdir -p /opt/cloudera/parcels
chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
Explanation: Cloudera Manager takes the CDH parcel from /opt/cloudera/parcel-repo on the master node, then distributes, unpacks, and activates it under /opt/cloudera/parcels on each node.
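
Since the directory has to exist on every node, this is a natural place to use the pssh tool installed at the start. A rough sketch, assuming pssh's -h (hosts file) and -l (login user) options behave as in its README; the node names hadoop2 and hadoop3 and the RUN_REMOTE switch are hypothetical. The remote command only runs when pssh is installed and RUN_REMOTE=1 is set.

```shell
# Write one hostname per line, the format read by pssh's -h option.
write_hosts_file() {
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

# Placeholder node list; replace with the cluster's real host names.
nodes_file=$(mktemp)
write_hosts_file "$nodes_file" hadoop1 hadoop2 hadoop3

# Guarded so that nothing is executed remotely by accident.
if command -v pssh >/dev/null 2>&1 && [ "${RUN_REMOTE:-0}" = 1 ]; then
    pssh -h "$nodes_file" -l root \
        "mkdir -p /opt/cloudera/parcels && chown cloudera-scm:cloudera-scm /opt/cloudera/parcels"
fi
```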

Start the Agent

/opt/cloudera-manager/cm-5.11.0/etc/init.d/cloudera-scm-agent start

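Configure in the admin console

  • Log in at http://hadoop1:7180
  • Click "Hosts" -> "All Hosts" to see every node, including the one just added. The new node's status is red at first; under normal conditions it turns green (healthy) after about 2 minutes.
  • Click "Add Hosts to Cluster" in the upper right, then click Continue
  • Enter hadoop05 and click Search; the following screen appears
  • Click Currently Managed Hosts (1)
  • Follow the prompts for the remaining steps

Assign roles to the new node

For example: go back to the home page, click HDFS, click Instances, select the newly added node, click Add Role Instances, and follow the prompts.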

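CDH Cluster Expansion Notes

The OS is CentOS 7.5. This adds server hadoop019 to the cluster (the servers in the screenshots below are named hadoop04…; this is the rough procedure and will be fleshed out later).

Preparation

  • Check the server OS version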

    [root@VM_10_10_centos ~]# cat /etc/redhat-release
    CentOS Linux release 7.5.1804 (Core)
  • First, set the hostname on each new server

    hostnamectl set-hostname  hadoop019
    localectl set-locale LANG=zh_CN.utf8


Querying the Remote URL with Git

If you want only the remote URL, or referential integrity has been broken:

git config --get remote.origin.url

If you require full output or referential integrity is intact:

git remote show origin

When using git clone (from GitHub, or any source repository for that matter) the default name for the source of the clone is “origin”. Using git remote show will display the information about this remote name. The first few lines should show:

C:\Users\jaredpar\VsVim> git remote show origin
* remote origin
Fetch URL: git@github.com:jaredpar/VsVim.git
Push URL: git@github.com:jaredpar/VsVim.git
HEAD branch: master
Remote branches:

If you want to use the value in a script, use the first command listed in this answer.
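
As a small illustration of the scripted case, the sketch below wraps the first command in a function and captures its output. remote_url is a hypothetical helper name; git -C is used so the function can be pointed at any working copy.

```shell
# Print the URL of the "origin" remote for the repository at the given path
# (defaults to the current directory). Exits non-zero if no origin is set.
remote_url() {
    git -C "${1:-.}" config --get remote.origin.url
}

# Capture the value and branch on whether an origin remote exists.
if url=$(remote_url . 2>/dev/null); then
    echo "origin is $url"
else
    echo "no origin remote configured"
fi
```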

Nginx Study Notes 2: Load Balancing

Introduction
Load balancing across multiple application instances is a commonly used technique for optimizing resource utilization, maximizing throughput, reducing latency, and ensuring fault-tolerant configurations.

It is possible to use nginx as a very efficient HTTP load balancer to distribute traffic to several application servers and to improve performance, scalability and reliability of web applications with nginx.

Load balancing methods
The following load balancing mechanisms (or methods) are supported in nginx:

  • round-robin: requests to the application servers are distributed in a round-robin fashion,
  • least-connected: the next request is assigned to the server with the least number of active connections,
  • ip-hash: a hash function is used to determine what server should be selected for the next request (based on the client's IP address).

Default load balancing configuration
The simplest configuration for load balancing with nginx may look like the following:

http {
    upstream myapp1 {
        server srv1.example.com;
        server srv2.example.com;
        server srv3.example.com;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}
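
The configuration above uses round-robin because no method is named. In nginx the other two methods are selected by a single directive inside the upstream block, least_conn or ip_hash. The sketch below is a small shell generator, with upstream_block as a hypothetical helper, that prints an upstream block for a chosen method so the variants can be compared side by side.

```shell
# Print an nginx upstream block.
# $1 = upstream name, $2 = method ("" for round-robin, "least_conn", or "ip_hash"),
# remaining arguments = backend servers.
upstream_block() {
    name=$1
    method=$2
    shift 2
    printf 'upstream %s {\n' "$name"
    if [ -n "$method" ]; then
        printf '    %s;\n' "$method"
    fi
    for srv in "$@"; do
        printf '    server %s;\n' "$srv"
    done
    printf '}\n'
}

# Least-connected variant of the upstream shown above.
upstream_block myapp1 least_conn srv1.example.com srv2.example.com srv3.example.com
```

The printed block would replace the upstream section inside http { }; running nginx -t afterwards checks the resulting configuration for syntax errors.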
