Importing data from HDFS into a Hive table

After a file has been imported (saved) into HDFS, you need to create a table that maps onto it before you can show tables.

Now suppose the file has already been imported into this HDFS directory: /apps/hive/warehouse/db_name.db/tb_name (it could also be some other file, such as a csv or txt file, e.g. /username/test/test.txt)
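For reference, one way such data might have been produced is by writing '\x01'-delimited text from Spark; a minimal sketch, where the sample rows and the partition value day=20201201 are made up for illustration:

from pyspark.sql import SparkSession

# Write a couple of sample rows as '\x01'-delimited text files into the
# HDFS directory that the examples below assume already contains data.
spark = SparkSession.builder.appName("write-sample-data").getOrCreate()

df = spark.createDataFrame([("1", "alice"), ("2", "bob")], ["id", "name"])

(df.write
   .mode("overwrite")
   .option("sep", "\x01")  # same field delimiter ('\1') used by the tables below
   .csv("/apps/hive/warehouse/db_name.db/tb_name/day=20201201"))  # hypothetical partition value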

  • Method 1: create an external partitioned table

First, create an external partitioned table whose columns match the fields of the file in HDFS:

create external table if not exists db_name.tb_name_2(id string, name string) partitioned by (day string) row format delimited fields terminated by '\1' STORED AS textfile LOCATION '/apps/hive/warehouse/db_name.db/tb_name/day=xxxxxxxx'; # LOCATION points all the way down to the partition directory (here the partition column is day); if the table you point to is not partitioned, pointing to the table directory is enough. Either way, make sure the directory itself contains the data.

Then add the partition:

alter table db_name.tb_name_2 add partition (day=xxxxxxxx) location '/apps/hive/warehouse/db_name.db/tb_name/day=xxxxxxxx';
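To check that the table and its partition are visible, here is a quick spark-sql sketch; the Hive-enabled SparkSession and the partition value day=20201201 are assumptions:

from pyspark.sql import SparkSession

# Verify that the external table and its newly added partition can be queried.
spark = SparkSession.builder.appName("check-external-table").enableHiveSupport().getOrCreate()

spark.sql("SHOW PARTITIONS db_name.tb_name_2").show()
spark.sql("SELECT COUNT(*) FROM db_name.tb_name_2 WHERE day='20201201'").show()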

  • Method 2: create an external non-partitioned table, with LOCATION pointing directly at the data

If the data you are pointing at belongs to a non-partitioned table, point LOCATION directly at that table's directory:

create external table if not exists db_name.tb_name_2(id string, name string) row format delimited fields terminated by '\1' STORED AS textfile LOCATION '/apps/hive/warehouse/db_name.db/tb_name';

If the data you are pointing at is only one partition of a table, point LOCATION directly at that partition directory (even though the external table being created is not a partitioned table):

create external table if not exists db_name.tb_name_2(id string, name string) row format delimited fields terminated by '\1' STORED AS textfile LOCATION '/apps/hive/warehouse/db_name.db/tb_name/day=xxxxxxxx';

  • Method 3: create an internal (managed) table: create the table first, then load the data

Create the table: create table if not exists db_name.tb_name_2(id string, name string) row format delimited fields terminated by '\1' STORED AS textfile;

Load the data: load data inpath '/apps/hive/warehouse/db_name.db/tb_name' into table db_name.tb_name_2; (note that load data inpath moves the files from the source HDFS directory into the table's directory rather than copying them)

  • Method 4: if you would rather avoid the hassle of the command line, you can use Python code to create the table and run the insert statements (spark-sql), as sketched below
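A minimal spark-sql sketch of this approach, assuming a Hive-enabled SparkSession; the application name, the sample rows and the partition value day=20201201 are made up for illustration, while the table and column names follow the examples above:

from pyspark.sql import SparkSession

# Create the partitioned table and insert rows through spark-sql
# instead of typing the statements in the Hive CLI.
spark = SparkSession.builder.appName("hive-table-via-spark-sql").enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS db_name.tb_name_2 (id STRING, name STRING)
    PARTITIONED BY (day STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\1'
    STORED AS TEXTFILE
""")

spark.sql("""
    INSERT INTO TABLE db_name.tb_name_2 PARTITION (day='20201201')
    VALUES ('1', 'alice'), ('2', 'bob')
""")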

Internal vs. external tables:

An internal table is not declared with the external keyword; an external table is.

Differences:

Data in an internal table is managed by Hive itself, while data in an external table is managed by HDFS.

Internal table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse), while the storage location of external table data is chosen by the user.

Dropping an internal table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS are not removed.

Changes to an internal table are synchronized to the metadata directly, whereas after changes to an external table's structure or partitions (for example, partition directories added directly on HDFS) the table needs to be repaired (MSCK REPAIR TABLE table_name;).
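For example, if a new partition directory is created for the external table directly on HDFS, the metastore will not see it until the table is repaired; a minimal spark-sql sketch:

from pyspark.sql import SparkSession

# Register partition directories that were added directly on HDFS.
spark = SparkSession.builder.appName("repair-partitions").enableHiveSupport().getOrCreate()

spark.sql("MSCK REPAIR TABLE db_name.tb_name_2")
spark.sql("SHOW PARTITIONS db_name.tb_name_2").show()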


