Hive Partitioning in Big Data
Partition is horizontally dividing the data into number of slice in equal and manageable manner.
Every partition is stored as directory within data warehouse table.
In data warehouse, this partition concept is common, but there are two types of Partitions are available in data warehouse concepts.
1. SQL Partition
2. Hive Partition
To more information visit OnlineITGuru big data hadoop course tutorial blog
1. SQL Partition:-
The main work of SQL Partition is dividing the large amount of data into number of slices by based on table single column only.
SQL Partition is very hard comparing to Hive Partition because SQL server only support one column for Partition.
2. Hive Partition:-
The main work of Hive Partition is also same as SQL Partition, but the main difference between SQL Partition and Hive Partition is SQL Partition only supports single column in table. While in Hive Partition, it supports multiple columns in a table.
Using Partitions can make it faster to do queries on slices of the data.
In Hive Partition concept there is two different type of partitions:-
i. Static Partition
ii. Dynamic Partition
i. Static Partition
In Static Partition, we statically add a partition in table and move the file into the partition of the table. We need to specify the partition column value in each and every LOAD statement.
Creation of Partitioned Table:-
Syntax:-
CREATE TABLE TableName(ColName1 DataType1, ColName2 DataType2)
PARTITIONED BY (PartitionColumn1 DataType2, PartitionColumn2 DataType2)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY 'delimiter character'
STORED AS <fileformat>;
hive (hivedb)> create table statictable (id int,name string,salary int) > partitioned by (city string) > row format delimited > fields terminated by '\t' > stored as textfile; OK Time taken: 23.871 seconds
In description, we can see the Partitioned column "city"
hive (hivedb)> describe statictable; OK id int name string salary int city string # Partition Information # col_name data_type comment city string Time taken: 1.498 seconds, Fetched: 9 row(s)
Loading the data in the table:-
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data1.log' into table statictable PARTITION(city= 'Pune'); Loading data to table hivedb.statictable partition (city=Pune) OK Time taken: 15.667 seconds hive (hivedb)> load data local inpath '/home/hduser/HIVE/data2.log' into table statictable PARTITION(city= 'Mumbai'); Loading data to table hivedb.statictable partition (city=Mumbai) OK Time taken: 2.727 seconds hive (hivedb)> load data local inpath '/home/hduser/HIVE/data3.log' into table statictable PARTITION(city= 'Nagpur'); Loading data to table hivedb.statictable partition (city=Nagpur) OK Time taken: 1.76 seconds
To view the created Partitions:-
hive (hivedb)> show partitions statictable; OK city=Mumbai city=Nagpur city=Pune Time taken: 0.939 seconds, Fetched: 3 row(s)
To view the content of the table:-
hive (hivedb)> select * from statictable; OK 200 Sonu 5000 Mumbai 201 Nik 3000 Mumbai 202 Sam 4000 Mumbai 203 Chotu 4000 Mumbai . . . 224 Suruchi 42500 Pune 225 Vaibhav 28500 Pune 226 Kaustubh 57500 Pune 227 Jitesh 33500 Pune 228 Mayur 44500 Pune 229 Mohit 41500 Pune Time taken: 0.677 seconds, Fetched: 80 row(s)
Adding a Partition:-
Syntax:- ALTER TABLE TableName ADD PARTITION PartitionName;
hive (hivedb)> alter table statictable ADD PARTITION (city='Chennai'); OK Time taken: 2.483 seconds hive (hivedb)> show partitions statictable; OK city=Chennai city=Mumbai city=Nagpur city=Pune Time taken: 0.561 seconds, Fetched: 4 row(s)
New Partition is added "city=Chennai"
Renaming a Partition:-
Syntax:- ALTER TABLE PARTITION ExistingPartition RENAME TO PARTITION NewPartition
hive (hivedb)> alter table statictable PARTITION (city='Chennai') RENAME TO PARTITION (city='Indore'); OK Time taken: 2.964 seconds hive (hivedb)> show partitions statictable; OK city=Indore city=Mumbai city=Nagpur city=Pune Time taken: 0.549 seconds, Fetched: 4 row(s)
Partition is renamed from Chennai to Indore
Dropping a Partition:-
Syntax:- ALTER TABLE TableName DROP PARTITION PartitionName;
hive (hivedb)> alter table statictable drop PARTITION (city='Indore'); Dropped the partition city=Indore OK Time taken: 5.042 seconds hive (hivedb)> show partitions statictable; OK city=Mumbai city=Nagpur city=Pune Time taken: 0.463 seconds, Fetched: 3 row(s)
Creation of Partition on multiple columns:-
hive (hivedb)> create table multiplepartitiontable (id int,name string,salary int) > partitioned by (city string,dated string) > row format delimited > fields terminated by '\t' > stored as textfile; OK Time taken: 0.637 seconds
In description, we can see the Partitioned column "city and dated" columns
hive (hivedb)> describe multiplepartitiontable; OK id int name string salary int city string dated string # Partition Information # col_name data_type comment city string dated string Time taken: 0.137 seconds, Fetched: 11 row(s)
Loading the data in the table:-
hive (hivedb)> load data local inpath '/home/hduser/HIVE/data1.log' into table multiplepartitiontable PARTITION(city= 'Pune',dated='01/02/2017'); Loading data to table hivedb.multiplepartitiontable partition (city=Pune, dated=01/02/2017) OK Time taken: 1.88 seconds hive (hivedb)> load data local inpath '/home/hduser/HIVE/data2.log' into table multiplepartitiontable PARTITION(city= 'Pune',dated='02/02/2017'); Loading data to table hivedb.multiplepartitiontable partition (city=Pune, dated=02/02/2017) OK Time taken: 1.478 seconds hive (hivedb)> load data local inpath '/home/hduser/HIVE/data3.log' into table multiplepartitiontable PARTITION(city= 'Nagpur',dated='03/02/2017'); Loading data to table hivedb.multiplepartitiontable partition (city=Nagpur, dated=03/02/2017) OK Time taken: 1.394 seconds hive (hivedb)> load data local inpath '/home/hduser/HIVE/data2.log' into table multiplepartitiontable PARTITION(city= 'Mumbai',dated='03/02/2017'); Loading data to table hivedb.multiplepartitiontable partition (city=Mumbai, dated=03/02/2017) OK Time taken: 1.466 seconds
To see the created Partitions:-
hive (hivedb)> show partitions multiplepartitiontable; OK city=Mumbai/dated=03%2F02%2F2017 city=Nagpur/dated=03%2F02%2F2017 city=Pune/dated=01%2F02%2F2017 city=Pune/dated=02%2F02%2F2017 Time taken: 0.198 seconds, Fetched: 4 row(s)
To view the content of the table:-
hive (hivedb)> select * from multiplepartitiontable; OK 200 Sonu 5000 Mumbai 03/02/2017 201 Nik 3000 Mumbai 03/02/2017 202 Sam 4000 Mumbai 03/02/2017 203 Chotu 4000 Mumbai 03/02/2017 204 Aju 5000 Mumbai 03/02/2017 . . . . 304 Chiku 1050 Pune 02/02/2017 305 Monu 3550 Pune 02/02/2017 306 Chutki 5550 Pune 02/02/2017 307 Tunu 3550 Pune 02/02/2017 308 Laddu 4350 Pune 02/02/2017 309 Tinku 4550 Pune 02/02/2017 310 Piku 2450 Pune 02/02/2017 Time taken: 4.57 seconds, Fetched: 101 row(s)
To learn complete course visit ITGuru big data online training Blog.
No comments: