查看: 2947|回复: 1

[转载]Hive基本命令整理

[复制链接]
论坛徽章:
3
Hadoop研习者初级
日期:2012-11-18 23:09:43R研习者中级
日期:2014-09-19 14:21:40scala徽章
日期:2014-11-06 14:56:26
发表于 2012-11-18 22:49 | 显示全部楼层 |阅读模式
创建表:
hive> CREATE TABLE pokes (foo INT, bar STRING);
        Creates a table called pokes with two columns, the first being an integer and the other a string
创建一个新表,结构与其他一样
hive> create table new_table like records;
创建分区表:
hive> create table logs(ts bigint,line string) partitioned by (dt String,country String);
加载分区表数据:
hive> load data local inpath '/home/Hadoop/input/hive/partitions/file1' into table logs partition (dt='2001-01-01',country='GB');
展示表中有多少分区:
hive> show partitions logs;
展示所有表:
hive> SHOW TABLES;
        lists all the tables
hive> SHOW TABLES '.*s';
lists all the table that end with 's'. The pattern matching follows Java regular
expressions. Check out this link for documentation http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
显示表的结构信息
hive> DESCRIBE invites;
        shows the list of columns
更新表的名称:
hive> ALTER TABLE source RENAME TO target;
添加新一列
hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');

删除表:
hive> DROP TABLE records;
删除表中数据,但要保持表的结构定义
hive> dfs -rmr /user/hive/warehouse/records;
从本地文件加载数据:
hive> LOAD DATA LOCAL INPATH '/home/hadoop/input/ncdc/micro-tab/sample.txt' OVERWRITE INTO TABLE records;
显示所有函数:
hive> show functions;
查看函数用法:
hive> describe function substr;
查看数组、map、结构
hive> select col1[0],col2['b'],col3.c from complex;

内连接:
hive> SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
查看hive为某个查询使用多少个MapReduce作业
hive> Explain SELECT sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
外连接:
hive> SELECT sales.*, things.* FROM sales LEFT OUTER JOIN things ON (sales.id = things.id);
hive> SELECT sales.*, things.* FROM sales RIGHT OUTER JOIN things ON (sales.id = things.id);
hive> SELECT sales.*, things.* FROM sales FULL OUTER JOIN things ON (sales.id = things.id);
in查询:Hive不支持,但可以使用LEFT SEMI JOIN
hive> SELECT * FROM things LEFT SEMI JOIN sales ON (sales.id = things.id);

Map连接:Hive可以把较小的表放入每个Mapper的内存来执行连接操作
hive> SELECT /*+ MAPJOIN(things) */ sales.*, things.* FROM sales JOIN things ON (sales.id = things.id);
INSERT OVERWRITE TABLE ..SELECT:新表预先存在
hive> FROM records2
    > INSERT OVERWRITE TABLE stations_by_year SELECT year, COUNT(DISTINCT station) GROUP BY year
    > INSERT OVERWRITE TABLE records_by_year SELECT year, COUNT(1) GROUP BY year
    > INSERT OVERWRITE TABLE good_records_by_year SELECT year, COUNT(1) WHERE temperature != 9999 AND (quality = 0 OR quality = 1 OR quality = 4 OR quality = 5 OR quality = 9) GROUP BY year;   
CREATE TABLE ... AS SELECT:新表表预先不存在
hive>CREATE TABLE target AS SELECT col1,col2 FROM source;
创建视图:
hive> CREATE VIEW valid_records AS SELECT * FROM records2 WHERE temperature !=9999;
查看视图详细信息:
hive> DESCRIBE EXTENDED valid_records;

union操作:
hive不支持顶层union,只能将union封装在子查询中;且必须为union的查询输出定义别名
回复

使用道具 举报

新浪微博达人勋 daniel  未实名认证
论坛徽章:
18
R研习者初级
日期:2012-07-18 17:56:36spark徽章
日期:2016-05-26 12:36:59Excel徽章
日期:2016-05-19 11:20:46机器学习徽章
日期:2015-12-24 16:09:22数据陷阱解读徽章
日期:2015-08-13 15:21:45scala徽章
日期:2015-08-13 11:58:20数据治理徽章
日期:2015-06-18 11:35:54Java徽章
日期:2014-11-06 17:45:56物流徽章
日期:2014-11-06 14:49:42Oracle研习者初级
日期:2014-09-19 13:51:34Oracle研习者高级
日期:2013-08-25 14:24:26Oracle研习者高级
日期:2013-08-25 14:23:53
发表于 2012-11-19 01:55 | 显示全部楼层
果然类 SQL。
回复 支持 反对

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册 新浪微博登陆

本版积分规则

 

GMT+8, 2019-6-17 11:32 , Processed in 0.113319 second(s), 32 queries .

关闭

扫一扫加入
本版微信群