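HDFS: the Hadoop Distributed File System
Distributed file systems
A distributed file system effectively solves the problem of storing and managing large volumes of data
– It extends a file system fixed at a single location to any number of locations and file systems
– Many nodes together form a file system network
– Nodes can sit in different places and communicate and transfer data over the network
– Users do not need to know which node stores the data or which node it is read from; they manage and store data as they would on a local file system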
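HDFS roles and concepts
• HDFS is the foundation of data storage and management in the Hadoop ecosystem. It is a highly fault-tolerant system designed to run on low-cost commodity hardware.
• Roles and concepts
– Client
– NameNode
– Secondary NameNode
– DataNode
• NameNode
– The master node: manages the HDFS namespace and the block mapping information, configures the replication policy, and handles all client requests.
• Secondary NameNode
– Periodically merges the fsimage and the edits log and pushes the result to the NameNode
– Can help recover the NameNode in an emergency
– However, the Secondary NameNode is not a hot standby for the NameNode.
• DataNode
– The data storage node: stores the actual data
– Reports storage information to the NameNode.
• Client
– Splits files into blocks
– Accesses HDFS
– Interacts with the NameNode to obtain file location information
– Interacts with DataNodes to read and write data.
• Block
– Each block is 128 MB by default in Hadoop 2.x (64 MB in Hadoop 1.x)
– Each block can have multiple replicas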
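As a rough illustration of how a file maps onto blocks and replicas, the sketch below computes the block count and the raw storage a file consumes. The function names block_count and raw_usage_mb are made up for this example, and the block size is passed explicitly because the effective value depends on the Hadoop version and on dfs.blocksize.

```shell
#!/bin/bash
# block_count SIZE_MB BLOCK_MB
# Prints how many HDFS blocks a file of SIZE_MB megabytes occupies
# when the block size is BLOCK_MB megabytes (ceiling division).
block_count() {
    local size_mb=$1 block_mb=$2
    echo $(( (size_mb + block_mb - 1) / block_mb ))
}

# raw_usage_mb SIZE_MB REPLICAS
# Prints the raw storage consumed across the cluster. Each block is
# stored REPLICAS times; the last block only takes the space it needs,
# so the total is simply size times replicas.
raw_usage_mb() {
    local size_mb=$1 replicas=$2
    echo $(( size_mb * replicas ))
}

block_count 200 64    # a 200 MB file with 64 MB blocks -> 4
raw_usage_mb 200 2    # the same file kept in 2 replicas -> 400
```

With a 128 MB block size the same 200 MB file would occupy 2 blocks.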
Deploying the HDFS distributed file system
Preparing the lab environment:
# vim /etc/hosts
.. ..
192.168.4.1 master
192.168.4.2 node1
192.168.4.3 node2
192.168.4.4 node3
# sed -ri "/^Host \*/aStrictHostKeyChecking no" /etc/ssh/ssh_config
# ssh-keygen
# for i in {1..4}
> do
> ssh-copy-id 192.168.4.${i}
> done
# for i in {1..4} // sync the local hosts file to every machine
> do
> rsync -a /etc/hosts 192.168.4.${i}:/etc/hosts
> done
# rm -rf /etc/yum.repos.d/*
# vim /etc/yum.repos.d/yum.repo // configure the network yum repository
[yum]
name=yum
baseurl=http://192.168.4.254/rhel7
gpgcheck=0
# for i in {2..4}
> do
> ssh 192.168.4.${i} "rm -rf /etc/yum.repos.d/*"
> rsync -a /etc/yum.repos.d/yum.repo 192.168.4.${i}:/etc/yum.repos.d/
> done
# for i in {1..4}
> do
> ssh 192.168.4.${i} 'sed -ri "s/^(SELINUX=).*/\1disabled/" /etc/selinux/config ; yum -y remove firewalld'
> done
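// Reboot all machines
Building a fully distributed cluster
System plan:
Host                  Roles                          Software
192.168.4.1 master    NameNode, SecondaryNameNode    HDFS
192.168.4.2 node1     DataNode                       HDFS
192.168.4.3 node2     DataNode                       HDFS
192.168.4.4 node3     DataNode                       HDFS
Install the Java environment and the debugging tool jps on all systems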
# for i in {1..4}
> do
> ssh 192.168.4.${i} "yum -y install java-1.8.0-openjdk-devel.x86_64"
> done
# which java
/usr/bin/java
# readlink -f /usr/bin/java
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre/bin/java
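The path printed by readlink can be turned into a JAVA_HOME value by stripping the trailing /bin/java. A minimal sketch of that step; the function name java_home_from is invented for this example.

```shell
#!/bin/bash
# java_home_from RESOLVED_JAVA_PATH
# Strips the trailing /bin/java from a fully resolved java binary path,
# leaving the directory that JAVA_HOME should point to.
java_home_from() {
    local resolved=$1
    echo "${resolved%/bin/java}"
}

# Typical use on a machine with Java installed:
#   java_home_from "$(readlink -f "$(which java)")"
# Here the path from the transcript above is used as input.
java_home_from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre/bin/java
```

Deriving the value this way avoids hard-coding a version-specific path, which changes whenever the OpenJDK package is updated.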
Installing Hadoop
# tar -xf hadoop-2.7.3.tar.gz
# mv hadoop-2.7.3 /usr/local/hadoop
Editing the configuration
# cd /usr/local/hadoop/
# sed -ri "s;(export JAVA_HOME=).*;\1/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre;" etc/hadoop/hadoop-env.sh
# sed -ri "s;(export HADOOP_CONF_DIR=).*;\1/usr/local/hadoop/etc/hadoop;" etc/hadoop/hadoop-env.sh
# sed -n "25p;33p" etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-3.b17.el7.x86_64/jre
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
// Parameter reference: http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/core-default.xml
# vim etc/hadoop/core-site.xml
.. ..
<configuration>
<property>
<name>fs.defaultFS</name> <!-- default file system -->
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name> <!-- base directory under which Hadoop stores its data -->
<value>/var/hadoop</value>
</property>
</configuration>
// Create the base directory on all machines
# for i in {1..4}
> do
> ssh 192.168.4.${i} "mkdir /var/hadoop"
> done
// Parameter reference: http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
# vim etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.http-address</name> <!-- NameNode web UI address -->
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name> <!-- Secondary NameNode web UI address -->
<value>master:50090</value>
</property>
<property>
<name>dfs.replication</name> <!-- number of replicas kept for each block -->
<value>2</value>
</property>
</configuration>
# vim etc/hadoop/slaves // hosts on which the DataNodes run
node1
node2
node3
Once configuration is complete, copy the hadoop directory to all machines
# for i in {2..4}
> do
> rsync -azSH --delete /usr/local/hadoop 192.168.4.${i}:/usr/local/ -e "ssh"
> done
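// Format HDFS on the NameNode
# ./bin/hdfs namenode -format
The message "successfully formatted." indicates that formatting succeeded
// If no errors were reported, start the cluster
# ./sbin/start-dfs.sh
After startup, run jps on the NameNode and on each DataNode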
# for i in master node{1..3}
> do
> echo $i
> ssh ${i} "jps"
> done
master
4562 SecondaryNameNode
4827 NameNode
5149 Jps
node1
3959 DataNode
4105 Jps
node2
3957 Jps
3803 DataNode
node3
3956 Jps
3803 DataNode
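Reading the jps listings by eye works for four machines; for a scripted check, a small filter such as the following could be used. This is only a sketch: has_daemon is a hypothetical helper name, and it assumes jps prints one "PID Name" pair per line, as in the output above.

```shell
#!/bin/bash
# has_daemon NAME
# Reads jps output on stdin and succeeds (exit status 0) if a JVM
# whose name is exactly NAME appears in the second column.
has_daemon() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Sample copied from the master's jps output above.
sample='4562 SecondaryNameNode
4827 NameNode
5149 Jps'

if printf '%s\n' "$sample" | has_daemon NameNode; then
    echo "NameNode is running"
fi
```

Against the live cluster the same filter would be fed from ssh, for example: ssh node1 jps | has_daemon DataNode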
# ./bin/hdfs dfsadmin -report // list the DataNodes that registered successfully
Configured Capacity: 160982630400 (149.93 GB)
Present Capacity: 150644051968 (140.30 GB)
DFS Remaining: 150644039680 (140.30 GB)
DFS Used: 12288 (12 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (3):
Name: 192.168.4.2:50010 (node1)
Hostname: node1
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3446755328 (3.21 GB)
DFS Remaining: 50214117376 (46.77 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.58%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Jan 29 21:17:39 EST 2018
Name: 192.168.4.4:50010 (node3)
Hostname: node3
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3445944320 (3.21 GB)
DFS Remaining: 50214928384 (46.77 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.58%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Jan 29 21:17:39 EST 2018
Name: 192.168.4.3:50010 (node2)
Hostname: node2
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3445878784 (3.21 GB)
DFS Remaining: 50214993920 (46.77 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.58%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Jan 29 21:17:39 EST 2018
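The report is long, and usually only the number of live DataNodes and their hostnames matter. The sketch below pulls both out of the report text. The function names are invented for this example, and the parsing assumes the line shapes shown in the output above ("Live datanodes (N):" and "Hostname: name").

```shell
#!/bin/bash
# live_datanodes: reads `hdfs dfsadmin -report` output on stdin and
# prints the number from the "Live datanodes (N):" line.
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# datanode_hosts: prints the hostname of every DataNode in the report.
datanode_hosts() {
    awk '$1 == "Hostname:" { print $2 }'
}

# Abbreviated sample in the same shape as the report above.
sample='Live datanodes (3):
Name: 192.168.4.2:50010 (node1)
Hostname: node1
Name: 192.168.4.4:50010 (node3)
Hostname: node3
Name: 192.168.4.3:50010 (node2)
Hostname: node2'

printf '%s\n' "$sample" | live_datanodes
printf '%s\n' "$sample" | datanode_hosts
```

On the master the input would come from the real command: ./bin/hdfs dfsadmin -report | live_datanodes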
Basic HDFS usage
The basic HDFS commands are almost identical to their shell counterparts
# ./bin/hadoop fs -ls hdfs://master:9000/
# ./bin/hadoop fs -mkdir /test
# ./bin/hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2018-01-29 21:35 /test
# ./bin/hadoop fs -rmdir /test
# ./bin/hadoop fs -mkdir /input
# ./bin/hadoop fs -put *.txt /input // upload files
# ./bin/hadoop fs -ls /input
Found 3 items
-rw-r--r-- 2 root supergroup 84854 2018-01-29 21:37 /input/LICENSE.txt
-rw-r--r-- 2 root supergroup 14978 2018-01-29 21:37 /input/NOTICE.txt
-rw-r--r-- 2 root supergroup 1366 2018-01-29 21:37 /input/README.txt
# ./bin/hadoop fs -get /input/README.txt /root/ // download a file
# ls /root/README.txt
/root/README.txt
Adding a node to HDFS
– 1. Set up the complete Hadoop environment on the new node: hostname, passwordless SSH login, SELinux and iptables disabled, Java installed
[root@newnode ~]# yum -y install java-1.8.0-openjdk-devel.x86_64
[root@master ~]# cat /etc/hosts
192.168.4.1 master
192.168.4.2 node1
192.168.4.3 node2
192.168.4.4 node3
192.168.4.5 newnode
– 2. Add the new node to the slaves file on the NameNode
[root@master ~]# cd /usr/local/hadoop/etc/hadoop/
[root@master hadoop]# echo newnode >> slaves
– 3. Copy the NameNode's configuration files to the configuration directory on the other nodes
# cat /root/rsyncfile.sh
#!/bin/bash
for i in node{1..3}
do
rsync -azSH --delete /usr/local/hadoop/etc/hadoop ${i}:/usr/local/hadoop/etc/ -e 'ssh' &
done
wait
[root@master hadoop]# bash /root/rsyncfile.sh
– 4. Copy the Hadoop directory from the master to the new node
[root@newnode ~]# rsync -azSH --delete master:/usr/local/hadoop /usr/local
– 5. Start the DataNode on the new node
[root@newnode ~]# cd /usr/local/hadoop/
[root@newnode hadoop]# ./sbin/hadoop-daemon.sh start datanode
[root@newnode hadoop]# jps
4007 Jps
3705 DataNode
– 6. Check the cluster status
[root@master hadoop]# cd /usr/local/hadoop/
[root@master hadoop]# ./bin/hdfs dfsadmin -report
Safe mode is ON
Configured Capacity: 268304384000 (249.88 GB)
Present Capacity: 249863049216 (232.70 GB)
DFS Remaining: 249862311936 (232.70 GB)
DFS Used: 737280 (720 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (5):
...
Name: 192.168.4.5:50010 (newnode)
Hostname: newnode
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 3662835712 (3.41 GB)
DFS Remaining: 49998036992 (46.56 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.17%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jan 28 20:30:23 EST 2018
...
– 7. Set the balancer bandwidth and rebalance the data
[root@master hadoop]# ./bin/hdfs dfsadmin -setBalancerBandwidth 67108864
[root@master hadoop]# ./sbin/start-balancer.sh -threshold 5
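The two numbers in step 7 are easier to read once their units are spelled out: -setBalancerBandwidth takes bytes per second, and -threshold is the allowed gap, in percentage points, between a DataNode's utilization and the cluster average. The helpers below are a sketch with made-up names that use whole-number percentages for simplicity.

```shell
#!/bin/bash
# mb_to_bytes MB
# Converts megabytes to bytes, the unit -setBalancerBandwidth expects
# (interpreted as bytes per second).
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# within_threshold NODE_PCT CLUSTER_PCT THRESHOLD
# Succeeds if the node's utilization is no more than THRESHOLD
# percentage points away from the cluster-wide utilization.
within_threshold() {
    local node_pct=$1 cluster_pct=$2 threshold=$3
    local diff=$(( node_pct - cluster_pct ))
    (( diff < 0 )) && diff=$(( -diff ))
    (( diff <= threshold ))
}

mb_to_bytes 64    # -> 67108864, the value used in step 7

if within_threshold 42 40 5; then
    echo "node is balanced"
fi
```

With these helpers the bandwidth could be set as: ./bin/hdfs dfsadmin -setBalancerBandwidth "$(mb_to_bytes 64)"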
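Removing a node
– Configure hdfs-site.xml on the NameNode
– dfs.replication: the replica count; enough DataNodes must remain after the removal to hold that many replicas
– Add the dfs.hosts.exclude setting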
[root@master hadoop]# vim etc/hadoop/hdfs-site.xml
...
<property>
<name>dfs.hosts.exclude</name>
<value>/usr/local/hadoop/etc/hadoop/exclude</value>
</property>
...
– Create the exclude file and list the nodes to remove in it (one hostname or IP per line)
[root@master hadoop]# vim etc/hadoop/slaves
node1
node2
node3
[root@master hadoop]# vim etc/hadoop/exclude
newnode
# cat /root/rsyncfile.sh
#!/bin/bash
for i in node{1..3} newnode
do
rsync -azSH --delete /usr/local/hadoop/etc/hadoop ${i}:/usr/local/hadoop/etc/ -e 'ssh' &
done
wait
[root@master hadoop]# bash /root/rsyncfile.sh
[root@master hadoop]# ./bin/hdfs dfsadmin -refreshNodes
[root@master hadoop]# ./bin/hdfs dfsadmin -report
...
Name: 192.168.4.6:50010 (newnode)
Hostname: newnode
Decommission Status : Decommission in progress // data is being migrated
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 3662950400 (3.41 GB)
DFS Remaining: 49997914112 (46.56 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.17%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jan 28 20:52:01 EST 2018
...
[root@master hadoop]# ./bin/hdfs dfsadmin -report
...
Name: 192.168.4.6:50010 (newnode)
Hostname: newnode
Decommission Status : Decommissioned // final state
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 3662950400 (3.41 GB)
DFS Remaining: 49997914112 (46.56 GB)
DFS Used%: 0.00%
DFS Remaining%: 93.17%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Jan 28 20:52:43 EST 2018
...
// Stop the DataNode only after its status has changed to Decommissioned
[root@newnode hadoop]# ./sbin/hadoop-daemon.sh stop datanode
[root@newnode hadoop]# jps
4045 Jps
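Because the DataNode must not be stopped before it reaches Decommissioned, it can help to read the status from the report programmatically instead of re-running the report by hand. The following is a sketch under the assumption that each node's section contains a "Hostname:" line followed by a "Decommission Status :" line, as in the output above; decommission_status is a hypothetical helper name.

```shell
#!/bin/bash
# decommission_status HOST
# Reads `hdfs dfsadmin -report` output on stdin and prints the
# decommission status of the DataNode whose hostname is HOST.
decommission_status() {
    local host=$1
    awk -v host="$host" '
        $1 == "Hostname:" { current = $2 }
        /^Decommission Status/ && current == host {
            sub(/^Decommission Status *: */, "")
            print
            exit
        }
    '
}

# Abbreviated sample in the same shape as the report above.
sample='Name: 192.168.4.2:50010 (node1)
Hostname: node1
Decommission Status : Normal
Name: 192.168.4.6:50010 (newnode)
Hostname: newnode
Decommission Status : Decommission in progress'

printf '%s\n' "$sample" | decommission_status newnode

# Possible polling loop on the master (not run here):
#   until [ "$(./bin/hdfs dfsadmin -report | decommission_status newnode)" = "Decommissioned" ]
#   do sleep 10; done
```

The loop only returns once the status reads Decommissioned, at which point stopping the DataNode is safe.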