您好,登錄后才能下訂單哦!
這篇文章給大家分享的是有關Hive數據定義語言DDL有哪些的內容。小編覺得挺實用的,因此分享給大家做個參考,一起跟隨小編過來看看吧。
一、數據庫相關操作(DATABASE)
1.1 創建數據庫(create)
CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
默認地址:/user/hive/warehouse/db_name.db/table_name/partition_name/…
1.2 使用數據庫(use)
USE database_name;
注意:只有use時不用寫DATABASE關鍵字
1.3 刪除數據庫(drop)
DROP DATABASE [IF EXISTS] database_name [RESTRICT|CASCADE];
1.4 修改數據庫(alter)
1.4.1 修改數據庫屬性
ALTER DATABASE database_name SET DBPROPERTIES (property_name=property_value, …);
1.4.2 修改數據庫所有者
ALTER DATABASE database_name SET OWNER [USER|ROLE] user_or_role;
二、新建和刪除數據表(Create/Drop/Truncate Table)
2.1 創建數據表(create)
# 手動建表
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
# 復制表結構
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
#數據類型
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
# 行列分隔符
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
# 文件存儲格式
file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
2.1.1 行格式,文件存儲格式,SerDe
ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘\001’ STORED AS SEQUENCEFILE;
ROW FORMAT SERDE … STORED AS SEQUENCEFILE;
RegEx SerDe
CREATE TABLE apachelog (
host STRING,
identity STRING,
user STRING,
time STRING,
request STRING,
status STRING,
size STRING,
referer STRING,
agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^]*) ([^]*) ([^]*) (-|\\[^\\]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?"
)
STORED AS TEXTFILE;
2.1.2 分區表(partitioned by)
CREATE TABLE page_view(
viewTime INT
,userid BIGINT
,page_url STRING
,referrer_url STRING
,ip STRING COMMENT 'IP Address of the User'
)
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS SEQUENCEFILE;
2.1.3 外部表(external)
可以自定義HDFS存儲地址,drop表時數據不刪除,還是要指定分隔符的
CREATE EXTERNAL TABLE page_view(
viewTime INT
,userid BIGINT
,page_url STRING
, referrer_url STRING
,ip STRING COMMENT 'IP Address of the User'
,country STRING COMMENT 'country of origination'
)
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
STORED AS TEXTFILE
LOCATION '<hdfs_location>';
2.1.4 利用查詢結果建表(Create Table As Select (CTAS))
根據查詢結果的列和列類型建表,可以自己指定列分隔符和文件存儲格式
CREATE TABLE new_key_value_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT
(key % 1024) new_key
, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;
2.1.5 復制已有表結構(Create Table Like)
復制已有表的表結構,不復制數據(屬性一樣,僅表名不同)
CREATE TABLE empty_key_value_store
LIKE key_value_store;
2.1.6 分桶排序的表(Bucketed Sorted Tables)
CREATE TABLE page_view(
viewTime INTviewTime INT
,userid BIGINT
,page_url STRING
,referrer_url STRING
,ip STRING COMMENT 'IP Address of the User'
,ip STRING COMMENT 'IP Address of the User'
)
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
STORED AS SEQUENCEFILE;
上表按userid分桶,桶內按viewTime升序排列,可以更有效的取樣和讓內部操作更加了解數據結構,從而提高運算性能。
建表時的CLUSTERED BY和SORTED BY語句只對數據讀取有作用,對寫入沒有作用,所以在寫入數據時需要手動指定reduce數等于分桶數并且使用CLUSTERED BY和SORTED BY語句。
簡便方法是set hive.enforce.bucketing = true;這樣寫入數據時就不用那么麻煩了,系統將會自動指定reduce數等于分桶數和使用CLUSTERED BY和SORTED BY語句。分桶規則依賴于hash函數。
例如:建表語句
CREATE TABLE user_info_bucketed(
user_id BIGINT
,firstname STRING
,lastname STRING
)
COMMENT 'A bucketed copy of user_info'
PARTITIONED BY(ds STRING)
CLUSTERED BY(user_id) INTO 256 BUCKETS;
插入語句:
set hive.enforce.bucketing = true;
FROM user_id
INSERT OVERWRITE TABLE user_info_bucketed
PARTITION (ds='2009-02-25')
SELECT userid, firstname, lastname WHERE ds='2009-02-25';
2.1.7 傾斜表(Skewed Tables)
某列的幾個值出現頻率非常高,所以相比于其它值他們的運算非常慢,造成傾斜,Hive可以指定將特定的值單獨存儲到獨立文件中來提高性能。
單列舉例:
CREATE TABLE list_bucket_single (key STRING, value STRING)
SKEWED BY (key) ON (1,5,6) [STORED AS DIRECTORIES];
多列舉例:
CREATE TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING)
SKEWED BY (col1, col2) ON (('s1',1), ('s3',3), ('s13',13), ('s78',78)) [STORED AS DIRECTORIES];
2.1.8 臨時表(Temporary Tables)
臨時表僅這個session可見,存儲在用戶臨時目錄,session結束后刪除。
如果與永久表重名,則查詢時默認為臨時表,直到不重名或者刪除臨時表。
2.2 刪除數據表(drop)
刪除外表不刪數據
刪除內表數據移到用戶垃圾箱(可以重建metadata并移回數據來恢復)
指定PURGE強制刪除不可恢復
DROP TABLE [IF EXISTS] table_name [PURGE];
2.3 清空數據表(Truncate Table)
清空表數據,指定partition時只清空特定partition的數據。
TRUNCATE TABLE table_name [PARTITION (partition_column = partition_col_value, partition_column = partition_col_value, ...)];
三、修改數據表(Alter Table/Partition/Column)
3.1 修改表(Alter Table)
3.1.1 重命名表(Rename Table)
ALTER TABLE table_name RENAME TO new_table_name;
3.1.2 修改表屬性(Alter Table Properties)
ALTER TABLE table_name SET TBLPROPERTIES (property_name = property_value, property_name = property_value, ... );
3.1.3 修改表注釋(Alter Table Comment)
ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);
3.1.4 修改SerDe屬性(Add SerDe Properties)
ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES (property_name = property_value, property_name = property_value, ... )];
ALTER TABLE table_name [PARTITION partition_spec] SET SERDEPROPERTIES (property_name = property_value, property_name = property_value, ... );
屬性的名和值都需要加引號
ALTER TABLE table_name SET SERDEPROPERTIES ('field.delim' = ',');
3.1.5 修改表存儲屬性(Alter Table Storage Properties)
ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)]
INTO num_buckets BUCKETS;
只修改metadata,用戶需要自己確保實際數據的格式與改完后的metadata相符
3.1.6 修改傾斜表屬性(Alter Table Skewed or Stored as Directories)
改為傾斜表
ALTER TABLE table_name SKEWED BY (col_name1, col_name2, ...)
ON ([(col_name1_value, col_name2_value, ...) [, (col_name1_value, col_name2_value), ...]
[STORED AS DIRECTORIES];
改為非傾斜表
ALTER TABLE table_name NOT SKEWED;
改為不單獨存儲傾斜列
ALTER TABLE table_name NOT STORED AS DIRECTORIES;
修改傾斜表位置
ALTER TABLE table_name SET SKEWED LOCATION (col_name1="location1" [, col_name2="location2", ...] );
3.2 修改分區(Alter Partition)
3.2.1 增加分區(Add Partitions)
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec
[LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...;
partition_spec:
: (partition_column = partition_col_value, partition_column = partition_col_value, ...)
修改分區僅修改metadata,實際數據需要用戶自己修改
3.2.2 動態分區(Dynamic Partitions)
3.2.3 重命名分區(Rename Partition)
ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;
相當于更改了分區對應的那個列的值
3.2.4 交換分區(Exchange Partition)
ALTER TABLE table_name_1 EXCHANGE PARTITION (partition_spec) WITH TABLE table_name_2;
-- multiple partitions
ALTER TABLE table_name_1 EXCHANGE PARTITION (partition_spec, partition_spec2, ...) WITH TABLE table_name_2;
將分區從一個表遷移到另一個表,要求兩個表結構一致且目標表沒有這個分區
3.2.5 修復分區(MSCK REPAIR TABLE)
當我們手動傳數據到hdfs作為一個分區時需要在metadata進行設置以便能夠識別
MSCK REPAIR TABLE table_name;
3.2.6 刪除分區(Drop Partitions)
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
[IGNORE PROTECTION] [PURGE];
刪除分區時同時刪除metadata和data,刪除的data到用戶垃圾箱
3.3 修改表或者分區(Alter Either Table or Partition)
3.3.1 修改文件格式(Alter Table/Partition File Format)
ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT file_format;
3.3.2 修改存儲位置(Alter Table/Partition Location)
ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "new location";
3.4 修改列(Alter Column)
列名是大小寫敏感的
3.4.1 修改列名/數據類型/位置/注釋(Change Column Name/Type/Position/Comment)
ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type
[COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];
例子:
CREATE TABLE test_change (a int, b int, c int);
# First change column a's name to a1.
ALTER TABLE test_change CHANGE a a1 INT;
# Next change column a1's name to a2, its data type to string, and put it after column b.
ALTER TABLE test_change CHANGE a1 a2 STRING AFTER b;
# The new table's structure is: b int, a2 string, c int.
# Then change column c's name to c1, and put it as the first column.
ALTER TABLE test_change CHANGE c c1 INT FIRST;
# The new table's structure is: c1 int, b int, a2 string.
# Add a comment to column a1
ALTER TABLE test_change CHANGE a1 a1 INT COMMENT 'this is column a1';
3.4.2 添加列(Add Columns)
ALTER TABLE table_name
[PARTITION partition_spec]
ADD COLUMNS (col_name data_type [COMMENT col_comment], ...)
[CASCADE|RESTRICT]
添加的列位于普通列最后位置,在分區列之前
四、視圖的相關操作(Create/Drop/Alter View)
4.1 創建視圖(Create View)
CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ...;
視圖僅是邏輯對象,不是實際存儲,查詢時相當于先查出視圖再進一步查詢
不指定列名時按select的列來算
一旦創建即被凍結,表再變化不會影響視圖變化
只讀不能插入數據
SHOW CREATE TABLE 用來查看建視圖的語句
例子:
CREATE VIEW onion_referrers(url COMMENT 'URL of Referring page')
COMMENT 'Referrers to The Onion website'
AS
SELECT DISTINCT referrer_url
FROM page_view
WHERE page_url='http://www.theonion.com';
4.2 刪除視圖(Drop View)
DROP VIEW [IF EXISTS] view_name;
例子:
DROP VIEW onion_referrers;
4.3 修改視圖屬性(Alter View Properties)
ALTER VIEW view_name SET TBLPROPERTIES table_properties;
table_properties:
: (property_name = property_value, property_name = property_value, ...)
4.4 修改已有視圖(Alter View As Select)
ALTER VIEW view_name AS select_statement;
只能用于沒有分區的視圖
五、索引的相關操作
5.1 創建索引(Create Index)
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS index_type
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
[COMMENT "index comment"];
5.2 刪除索引(Drop Index)
DROP INDEX [IF EXISTS] index_name ON table_name;
5.3 修改索引(ALTER INDEX)
ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD;
六、方法的相關操作(Create/Drop/Reload Function)
6.1 臨時方法(Temporary Functions)
6.1.1 創建臨時方法(Create Temporary Function)
CREATE TEMPORARY FUNCTION function_name AS class_name;
利用類名創建臨時方法,在當前session有效,類名可以先add jar
6.1.2 刪除臨時方法(DROP TEMPORARY FUNCTION)
DROP TEMPORARY FUNCTION [IF EXISTS] function_name;
6.2 永久方法(Permanent Functions)
6.2.1 創建永久方法(Create Function)
CREATE FUNCTION [db_name.]function_name AS class_name
[USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ];
需要限制到數據庫,如果沒有寫數據庫,默認當前數據庫
需要先添加進去并用using訪問
6.2.2 刪除永久方法(DROP FUNCTION)
DROP FUNCTION [IF EXISTS] function_name;
6.2.3 重載永久方法(RELOAD FUNCTION)
RELOAD FUNCTION;
七、用戶權限相關操作(Create/Drop/Grant/Revoke Roles and Privileges)
八、查看相關信息(Show)
8.1 查看數據庫(Show Databases)
SHOW (DATABASES|SCHEMAS) [LIKE 'identifier_with_wildcards'];
用”*”表示任意個字符,”|”表示或
8.2 查看表/分區/索引(Show Tables/Partitions/Indexes)
8.2.1 查看表(Show Tables)
SHOW TABLES [IN database_name] [like 'identifier_with_wildcards'];
in語句表示數據庫,不寫默認當前數據庫
用”*”表示任意個字符,”|”表示或
8.2.2 查看分區(Show Partitions)
SHOW PARTITIONS table_name;
也可以指定分區中的一個或幾個字段
SHOW PARTITIONS table_name PARTITION(ds='2010-03-03');
SHOW PARTITIONS table_name PARTITION(hr='12');
SHOW PARTITIONS table_name PARTITION(ds='2010-03-03', hr='12');
8.2.3 查看表和分區的擴展信息(Show Table/Partition Extended)
SHOW TABLE EXTENDED [IN|FROM database_name] LIKE 'identifier_with_wildcards' [PARTITION(partition_spec)];
extended會給出表的文件信息和文件大小、修改時間等信息
如果指定了partition就不能用表的正則匹配了,只能指定具體的表
8.2.4 查看表屬性(Show Table Properties)
SHOW TBLPROPERTIES tblname;
SHOW TBLPROPERTIES tblname("foo");
第一種方式給出表的所有屬性值
第二種方式給出表的指定的屬性值
8.2.5 查看表的創建信息(Show Create Table)
SHOW CREATE TABLE ([db_name.]table_name|view_name);
既能看table,也能看view
8.2.6 查看索引(Show Indexes)
SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name];
8.3 查看列(Show Columns)
SHOW COLUMNS (FROM|IN) table_name [(FROM|IN) db_name];
8.4 查看方法(Show Functions)
SHOW FUNCTIONS "a.*";
查看所有方法時用”.*”
8.5 查看用戶和權限(Show Granted Roles and Privileges)
8.6 查看鎖(Show Locks)
SHOW LOCKS <table_name>;
SHOW LOCKS <table_name> EXTENDED;
SHOW LOCKS <table_name> PARTITION (<partition_spec>);
SHOW LOCKS <table_name> PARTITION (<partition_spec>) EXTENDED;
SHOW LOCKS <DATABASE> database_name;
8.7 查看配置信息(Show Conf)
SHOW CONF <configuration_name>;
不會列出當前配置的值,如果需要請使用set命令
8.8 查看事務信息(SHOW TRANSACTIONS)
SHOW TRANSACTIONS;
8.9 查看壓縮信息(SHOW COMPACTIONS)
SHOW COMPACTIONS;
九、描述相關信息(Describe)
9.1 描述數據庫(Describe Database)
DESCRIBE DATABASE [EXTENDED] db_name;
extended給出數據庫的屬性信息
9.2 描述表/視圖/列(Describe Table/View/Column)
沒有指定數據庫時
DESCRIBE [EXTENDED|FORMATTED]
table_name[.col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];
指定數據庫時
DESCRIBE [EXTENDED|FORMATTED]
[db_name.]table_name[ col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ];
9.2.1 展示列的統計信息(Display Column Statistics)
DESCRIBE FORMATTED [db_name.]table_name column_name;
DESCRIBE FORMATTED [db_name.]table_name column_name PARTITION (partition_spec);
9.3 描述分區信息(Describe Partition)
沒有指定數據庫時
DESCRIBE [EXTENDED|FORMATTED] table_name[.column_name] PARTITION partition_spec;
指定數據庫時
DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name [column_name] PARTITION partition_spec;
例子:
DESCRIBE page_view PARTITION (ds='2008-08-08');
感謝各位的閱讀!關于“Hive數據定義語言DDL有哪些”這篇文章就分享到這里了,希望以上內容可以對大家有一定的幫助,讓大家可以學到更多知識,如果覺得文章不錯,可以把它分享出去讓更多的人看到吧!
免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。