How Hive Handles DML Data Operations, Partitioned Tables, and Bucketed Tables

Published: 2021-12-16 14:12:28  Source: 億速云 (Yisu Cloud)  Author: 小新  Category: Big Data

This article covers how Hive handles DML data operations (importing and exporting data), partitioned tables, and bucketed tables.

1. DML Data Operations

1.1 Importing Data

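1. Importing with load data
	load data [local] inpath 'data_path' [overwrite] into table table_name [partition (partcol1=val1,…)];
		#[local]: without it, the path refers to HDFS and the file is moved into the table's directory; with it, the path refers to the local filesystem and the file is copied.
		#[overwrite]: with it, the load replaces the data already in the table (or partition); without it, the new data is appended.
		#[partition (partcol1=val1,…)]: the target partition (covered in section 2.1).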
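The two switches are easiest to see on a plain filesystem. The Python sketch below is a hypothetical model, not Hive code: it treats a table as a directory of data files (true for a plain text table), copies the source when `local` is set, moves it otherwise (as a load from HDFS does), and clears the directory first when `overwrite` is set. The `_copy_N` renaming of a clashing file name is an assumption made for illustration.

```python
import shutil
from pathlib import Path


def load_data(src: Path, table_dir: Path, local: bool, overwrite: bool) -> list[str]:
    """Hypothetical model of LOAD DATA [LOCAL] INPATH ... [OVERWRITE] INTO TABLE.

    A table is modeled as a directory of data files (an assumption that
    fits a plain text table). Returns the file names now in the table.
    """
    table_dir.mkdir(parents=True, exist_ok=True)
    if overwrite:
        # OVERWRITE: files already in the table directory are removed first.
        for f in list(table_dir.iterdir()):
            if f.is_file():
                f.unlink()
    target = table_dir / src.name
    n = 1
    while target.exists():
        # Appending a file whose name is already taken: pick a new name
        # (the _copy_N scheme is an assumption).
        target = table_dir / f"{src.stem}_copy_{n}{src.suffix}"
        n += 1
    if local:
        shutil.copy(src, target)  # LOCAL: the source file is copied
    else:
        shutil.move(str(src), str(target))  # HDFS path: the source file is moved
    return sorted(f.name for f in table_dir.iterdir())
```

Loading the same local file twice without overwrite leaves two files in the table directory; a non-local load with overwrite leaves only the newly moved file and removes it from its source location.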
		
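Tip: set hive.exec.mode.local.auto=true; lets Hive run MR jobs in local mode. Hive only switches to local mode when the job is small enough (for example, when its total input is small); otherwise the job still runs on the cluster.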
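A rough model of that decision, assuming the default limits described in the local-mode section of the Hive documentation (about 128 MB of total input, at most 4 input files, at most one reducer). The function name and the threshold constants are illustrative and may differ across Hive versions.

```python
# Assumed defaults; check your Hive version's configuration before relying on them.
MAX_INPUT_BYTES = 128 * 1024 * 1024
MAX_INPUT_FILES = 4


def runs_locally(auto_enabled: bool, input_bytes: int, input_files: int, reducers: int) -> bool:
    """Rough, hypothetical model of when hive.exec.mode.local.auto=true picks local mode."""
    if not auto_enabled:
        return False  # auto local mode is off: always submit to the cluster
    return (
        input_bytes <= MAX_INPUT_BYTES
        and input_files <= MAX_INPUT_FILES
        and reducers <= 1
    )
```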


-----------------------------------------------------------
2. Inserting data with a query (Insert)

	2.1 Insert new rows directly
		insert into student values(1,'aa');

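	2.2 Insert the result of a query into a table (note: the query result must match the target table's columns in both number and type)
		insert into table table_name select_statement;       #appends to the existing data
		insert overwrite table table_name select_statement;  #replaces the existing data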
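The difference between into and overwrite, together with the column-count rule, can be sketched with a hypothetical in-memory `Table` class (not a Hive API). Only the number of columns is checked here; a real Hive insert also checks that the types are compatible.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical in-memory stand-in for a Hive table."""
    columns: list[str]
    rows: list[tuple] = field(default_factory=list)


def insert(table: Table, result_rows: list[tuple], overwrite: bool = False) -> None:
    # Validate first: every result row must supply one value per target column.
    for row in result_rows:
        if len(row) != len(table.columns):
            raise ValueError(f"expected {len(table.columns)} columns, got {len(row)}")
    if overwrite:
        table.rows = list(result_rows)  # INSERT OVERWRITE TABLE: replace
    else:
        table.rows.extend(result_rows)  # INSERT INTO TABLE: append
```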


--------------------------------------------------------------
3. Creating a table from a query and loading its data (As Select)
	create table if not exists table_name
	as select_statement;
	
	
	
----------------------------------------------------------------
4. Pointing a new table at its data with location (the table reads whatever files already exist under that directory)
	create table if not exists student3(
	id int,
	name string
	)
	row format delimited fields terminated by '\t'
	location '/input';
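Because the table simply reads whatever files sit under its location, its contents can be modeled by scanning that directory. The Python sketch below is a hypothetical reader for the `(id int, name string)` layout above: one line per row, fields split on the tab delimiter, with missing or unparseable fields returned as `None` (Hive shows them as NULL). Skipping names that start with `_` or `.` mirrors the usual Hadoop convention for hidden files such as `_SUCCESS`.

```python
from pathlib import Path


def read_delimited_table(location: Path, types=(int, str), sep: str = "\t") -> list[tuple]:
    """Hypothetical model of reading a text table whose data lives under `location`."""
    rows = []
    files = sorted(
        p for p in location.iterdir()
        if p.is_file() and not p.name.startswith(("_", "."))  # skip hidden files
    )
    for f in files:
        for line in f.read_text().splitlines():
            parts = line.split(sep)
            row = []
            for i, cast in enumerate(types):
                try:
                    row.append(cast(parts[i]))
                except (IndexError, ValueError):
                    row.append(None)  # missing or bad field -> NULL
            rows.append(tuple(row))
    return rows
```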


--------------------------------------------------------------------
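5. Importing data with import (only data written by export can be imported)
	Note: the target table must not already exist, otherwise an error is raised.
	import table db_name.table_name from 'hdfs_export_path';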

1.2 Exporting Data

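1. Exporting with insert
	insert overwrite [local] directory 'path'
	row format delimited fields terminated by '\t' #field delimiter
	select_statement;
	#local: with it, the output path is on the local filesystem; without it, the output path is on HDFS.
	#overwrite: anything already in the target directory is deleted before the result is written.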

    Examples:
	insert overwrite local directory '/opt/module/hive/datas2' 
	row format delimited fields terminated by '\t'
	select * from db4.student3;

	insert overwrite directory '/output' 
	row format delimited fields terminated by '\t'
	select * from db4.student3;
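Because the statement overwrites the directory, pointing it at a directory that holds other data loses that data. The hypothetical Python sketch below models this: it clears the target, then writes the rows tab-separated with NULLs as `\N` (Hive's default text representation). The single output file name `000000_0` is an assumption based on what one task typically produces.

```python
import shutil
from pathlib import Path


def insert_overwrite_directory(rows, out_dir: Path, sep: str = "\t") -> Path:
    """Hypothetical model of INSERT OVERWRITE [LOCAL] DIRECTORY with a delimited row format."""
    if out_dir.exists():
        shutil.rmtree(out_dir)  # OVERWRITE: existing contents are deleted first
    out_dir.mkdir(parents=True)
    out_file = out_dir / "000000_0"  # single-task output file name (assumed)
    with out_file.open("w") as fh:
        for row in rows:
            # NULLs are written as \N, Hive's default text-format NULL marker
            fh.write(sep.join("\\N" if v is None else str(v) for v in row) + "\n")
    return out_file
```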


-------------------------------------------------------------------
2. Exporting to the local filesystem with Hadoop commands

	hadoop fs -get 'table_data_path'  'local_path'
	hdfs dfs -get 'table_data_path'  'local_path'
	Inside the Hive CLI: dfs -get 'table_data_path'  'local_path';


--------------------------------------------------------------------
3. Exporting with the Hive shell
	bin/hive -e 'select * from table_name;' > local_file


--------------------------------------------------------------------
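4. Exporting to HDFS with export

	export table db_name.table_name to 'hdfs_path';
	#export writes both the data and the table's metadata, which is what import needs to recreate the table.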
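Export and import form a pair: export writes the data files together with a metadata file describing the table, and import reads that metadata to recreate the table, which is why only exported data can be imported. The Python sketch below is a hypothetical model of that round trip; the `_metadata` file and `data` directory names follow the layout Hive's export usually produces, but the JSON contents here are made up, and the "table must not exist" check follows the note in section 1.1.

```python
import json
import shutil
from pathlib import Path


def export_table(name: str, columns: list[str], table_dir: Path, export_dir: Path) -> None:
    """Hypothetical model of EXPORT TABLE: metadata and data written side by side."""
    export_dir.mkdir(parents=True, exist_ok=True)
    meta = {"table": name, "columns": columns}  # made-up metadata format
    (export_dir / "_metadata").write_text(json.dumps(meta))
    shutil.copytree(table_dir, export_dir / "data")


def import_table(name: str, export_dir: Path, warehouse: Path) -> tuple[Path, list[str]]:
    """Hypothetical model of IMPORT TABLE name FROM export_dir."""
    # Only EXPORT output has a _metadata file; anything else fails here
    # with FileNotFoundError.
    meta = json.loads((export_dir / "_metadata").read_text())
    target = warehouse / name
    if target.exists():
        raise FileExistsError(f"table {name} already exists")
    shutil.copytree(export_dir / "data", target)
    return target, meta["columns"]  # the schema comes from the metadata
```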


--------------------------------------------------------------------
5. Exporting with Sqoop
	Covered in a later article.

2. Partitioned Tables and Bucketed Tables

2.1 Partitioned Tables

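I. Creating a partitioned table
	create table table_name(
		deptno int, dname string, loc string
	)
	partitioned by (col_name col_type) #partition column: must not be one of the regular columns above; each value gets its own sub-directory, e.g. day=20200401
	row format delimited fields terminated by '\t';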

   Example:
	create table dept_partition(
	deptno int, dname string, loc string
	)
	partitioned by (day string)
	row format delimited fields terminated by '\t';


---------------------------------------------------------------------------------
II. Partition operations:

	1. Add partitions (separate several partitions with spaces)
	alter table table_name add partition(part_col='value') partition(part_col='value') ...;
	
	2. Show partitions
	show partitions 表名;
	
	3. Drop partitions (separate several partitions with commas)
	alter table table_name drop partition(part_col='value'), partition(part_col='value') ...;
	
	4. Load data into a partitioned table
	load data [local] inpath 'path' [overwrite] into table table_name partition(part_col='value');
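The four operations above can be mirrored by a hypothetical in-memory catalog for a single partition column (not the metastore API). Unlike Hive, it silently ignores adding a partition that already exists or dropping one that does not.

```python
class PartitionCatalog:
    """Hypothetical stand-in for one table's partition list in the metastore."""

    def __init__(self, partition_col: str):
        self.col = partition_col
        self.partitions: dict[str, list[str]] = {}  # partition value -> data files

    def add(self, *values: str) -> None:
        for v in values:
            self.partitions.setdefault(v, [])  # ALTER TABLE ... ADD PARTITION

    def drop(self, *values: str) -> None:
        for v in values:
            self.partitions.pop(v, None)  # ALTER TABLE ... DROP PARTITION

    def show(self) -> list[str]:
        # SHOW PARTITIONS lists partitions as col=value
        return [f"{self.col}={v}" for v in sorted(self.partitions)]

    def load(self, value: str, file_name: str, overwrite: bool = False) -> None:
        files = self.partitions.setdefault(value, [])  # LOAD creates a missing partition
        if overwrite:
            files.clear()
        files.append(file_name)
```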


---------------------------------------------------------------------------------------
III. Creating a two-level partitioned table
	create table table_name(
	deptno int, dname string, loc string
	)
	partitioned by (col_name1 col_type, col_name2 col_type, ...)
	row format delimited fields terminated by '\t';

   Example:
	create table dept_partition2(
	deptno int, dname string, loc string
	)
	partitioned by (day string, hour string)
	row format delimited fields terminated by '\t';


   Loading data into the two-level partitioned table (load creates the partition if it does not exist yet):
	load data local inpath '/opt/module/hive/datas/dept_20200401.log' into table
	dept_partition2 partition(day='20200401', hour='12');

	load data local inpath '/opt/module/hive/datas/dept_20200402.log' into table
	dept_partition2 partition(day='20200401', hour='13');
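Each partition column adds one directory level named `col=value`, in the order given in `partitioned by`. A minimal Python sketch with a hypothetical `partition_path` helper; the example assumes the table lives under the default warehouse path `/user/hive/warehouse` (your `hive.metastore.warehouse.dir` may differ), and unlike real Hive it does not escape special characters in values.

```python
from pathlib import PurePosixPath


def partition_path(table_location: str, spec: list[tuple[str, str]]) -> str:
    """Directory holding one partition: one col=value level per partition column."""
    path = PurePosixPath(table_location)
    for col, value in spec:  # same order as in PARTITIONED BY
        path /= f"{col}={value}"
    return str(path)
```

For the first load above, `partition_path("/user/hive/warehouse/dept_partition2", [("day", "20200401"), ("hour", "12")])` gives `/user/hive/warehouse/dept_partition2/day=20200401/hour=12`.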


---------------------------------------------------------------
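IV. Associating data with partitions
	If you create a partition directory and upload files into it directly (for example with hadoop fs -mkdir and hadoop fs -put), the metastore does not know about that partition yet, so queries do not see the data. There are three ways to fix this:

	1. Method 1: run the repair command
		msck repair table table_name;

	2. Method 2: add the partition after uploading the data
		alter table table_name add partition(col_name='value');

	3. Method 3: create the directory, then load the data into the partition (load creates the partition)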
		load data local inpath '/opt/module/hive/datas/dept_20200402.log' into table
		dept_partition2 partition(day='20200401', hour='13');
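Conceptually, method 1 scans the table directory for `col=value` sub-directories and registers the ones the metastore does not know yet. The hypothetical Python sketch below performs only the discovery step on a list of directory paths; it illustrates the idea and is not how `msck` is implemented.

```python
import re

# One path level of the form col=value.
_LEVEL = re.compile(r"^([^=/]+)=([^/]+)$")


def find_missing_partitions(table_location: str, dirs: list[str],
                            partition_cols: list[str],
                            known: set[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """Return partition value tuples found on disk but absent from `known`."""
    found = []
    prefix = table_location.rstrip("/") + "/"
    for d in dirs:
        if not d.startswith(prefix):
            continue
        levels = d[len(prefix):].strip("/").split("/")
        if len(levels) != len(partition_cols):
            continue  # not a complete partition path
        values = []
        for level, col in zip(levels, partition_cols):
            m = _LEVEL.match(level)
            if not m or m.group(1) != col:
                break  # wrong column name or not col=value
            values.append(m.group(2))
        else:
            if tuple(values) not in known:
                found.append(tuple(values))
    return sorted(found)
```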

2.2 Bucketed Tables

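I. Creating a bucketed table:
	create table table_name(id int, name string)
	clustered by(id) #id: the bucketing column; rows are assigned to buckets by their id value (unlike a partition column, it must be one of the table's columns)
	into num_buckets buckets
	row format delimited fields terminated by '\t';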

   Example:
	create table stu_buck(id int, name string)
	clustered by(id) 
	into 4 buckets
	row format delimited fields terminated by '\t';

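   Notes:
	 1. In newer Hive versions, loading data into a bucketed table runs an MR job,
		so the file to load is best placed on HDFS.

	 2. The number of reduce tasks should equal the number of buckets, because each reduce task writes one bucket file (for example, set mapreduce.job.reduces=-1 and let the job choose the reducer count).

	 3. Bucketing rule: the hashCode of the bucketing column's value, modulo the number of buckets, decides which bucket a row goes into.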
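Rule 3 can be sketched in Python as below. The sketch assumes a Java-style hashCode (the value itself for int, `String.hashCode()` for strings), masks off the sign bit so negative hashes still map to a valid bucket, and treats each bucket list as the file one reduce task would write. The exact hash function depends on the Hive version, so treat the bucket numbers as illustrative.

```python
def java_string_hash(s: str) -> int:
    """Java String.hashCode(): s[0]*31^(n-1) + ... + s[n-1] in 32-bit signed
    arithmetic (exact for BMP characters, which are single UTF-16 units)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(value, num_buckets: int) -> int:
    """hash(value) % num_buckets, with the sign bit masked off first.
    For int columns the hash is taken to be the value itself."""
    h = value if isinstance(value, int) else java_string_hash(str(value))
    return (h & 0x7FFFFFFF) % num_buckets


def distribute(ids: list[int], num_buckets: int) -> list[list[int]]:
    """One list per bucket; each list plays the role of one reducer's output file."""
    buckets = [[] for _ in range(num_buckets)]
    for i in ids:
        buckets[bucket_for(i, num_buckets)].append(i)
    return buckets
```

With the 4-bucket `stu_buck` table, ids 1001 to 1005 would land as `[[1004], [1001, 1005], [1002], [1003]]` under this model.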

