Compound data type

In addition to string, Hive supports struct, Map, array, unionType, and other complex data types. Arraye and Map are more commonly used in enterprises.

The data type describe Syntax examples
STRUCT Like structs or “objects” in C, you can access the contents of an element through the “dot” notation. struct{‘John’, ‘Doe’}
MAP A MAP is a collection of key-value pairs of elements that can be accessed using keys. map(‘fisrt’, ‘John’, ‘last’, ‘Doe’)
ARRAY An array is a collection of variables with the same data type and name. Array(‘John’, ‘Doe’)

1. The Array type

The ARRAY type is made up of a series of elements of the same data type that can be accessed by subscripts. For example, if there is an ARRAY variable fruits, which consists of [‘apple’,’orange’,’mango’], then fruits[1] can be used to access the element orange, because the subscript of ARRAY type starts from 0.

Create a database table with array as the data type

create table person(name string,work_locations array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ', ';
Copy the code

data

biansutao beijing,shanghai,tianjin,hangzhou
linan changchu,chengdu,wuhan
Copy the code

Data warehousing

LOAD DATA LOCAL INPATH '/home/hadoop/person.txt' OVERWRITE INTO TABLE person;
Copy the code

The query

hive> select * from person;
biansutao       ["beijing","shanghai","tianjin","hangzhou"]
linan   ["changchu","chengdu","wuhan"]
Time taken: 0.355 seconds
hive> select name from person;
linan
biansutao
Time taken: 12.397 seconds
hive> select work_locations[0] from person;
changchu
beijing
Time taken: 13.214 seconds
hive> select work_locations from person;   
["changchu","chengdu","wuhan"]
["beijing","shanghai","tianjin","hangzhou"]
Time taken: 13.755 seconds
hive> select work_locations[3] from person;
NULL
hangzhou
Time taken: 12.722 seconds
hive> select work_locations[4] from person;
NULL
NULL
Time taken: 15.958 seconds
Copy the code

In Hive, the array name is the same as that in Java. The difference is that there is no index out of bounds exception in Hive. If an index out of bounds exception occurs, only NULL is returned. Hive also provides a function array_contains to determine whether an array contains a particular value.

select * from person where array_contains(work_locations,'beijing');
Copy the code

2. The Map types

A MAP contains key->value key-value pairs that allow access to elements through keys. For example, “userlist” is a map type, where username is key and password is value; Userlist [‘username’] = userlist[‘username’];

Create a database table

create table score(name string, score map<string,int>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ', '
MAP KEYS TERMINATED BY ':';
Copy the code

Data to be stored

Biansutao 'mathematics' : 80,' language ', 89, 'English' : 95 jobs' language ': 60,' mathematics', 80, 'English', 99Copy the code

Data warehousing

LOAD DATA LOCAL INPATH '/home/hadoop/score.txt' OVERWRITE INTO TABLE score;
Copy the code

The query

hive> select * fromscore; Biansutao {" math ":80, "Chinese".89, "English".95} jobs {" language ":60, "mathematics".80, "English".99}
Time taken: 0.665 seconds
hive> select name from score;
jobs
biansutao
Time taken: 19.778 seconds
hive> select t.score fromscore t; {" Chinese ":60, "mathematics".80, "English".99} {" math ":80, "Chinese".89, "English".95}
Time taken: 19.353 seconds
hive> select t.score['Chinese'] from score t;
60
89
Time taken: 13.054 seconds
hive> select t.score['English'] from score t;
99
95
Time taken: 13.769 seconds
Copy the code

Change the delimiter of the map field

Storage Desc Params:	 	 
	colelction.delim    	##                  
	field.delim         	\t                  
	mapkey.delim        	=                   
	serialization.format	\t                  
Copy the code

Display the table attributes by displaying desc formatted tableName. Hive 2.1.1: colelction. Delim: colelction. Delim: colelction.

alter table t8 set serdepropertyes('colelction.delim'=', ');
Copy the code

3. The Struct type

Structs can contain elements of different data types. These elements can be obtained by “point syntax”, such as user is a STRUCT type, then the address of the user can be obtained by user-.address. Kind of like objects in Java

Create table

CREATE TABLE test(id int,course struct<course:string,score:int>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ', ';
Copy the code

data

1 english,80
2 math,89
3 chinese,95
Copy the code

Put in storage

LOAD DATA LOCAL INPATH '/home/hadoop/test.txt' OVERWRITE INTO TABLE test;
Copy the code

The query

hive> select * from test;
OK
1       {"course":"english","score":80}
2       {"course":"math","score":89}
3       {"course":"chinese","score":95}
Time taken: 0.275 seconds
hive> select course from test;
{"course":"english","score":80}
{"course":"math","score":89}
{"course":"chinese","score":95}
Time taken: 44.968 seconds
select t.course.course from test t; 
english
math
chinese
Time taken: 15.827 seconds
hive> select t.course.score from test t;
80
89
95
Time taken: 13.235 seconds
Copy the code

**size(Map) function: ** To obtain the Map length. Return value type: int

**map_keys(Map) function: ** To obtain all keys in the Map; Return value type: array

**map_values(Map) function: ** To obtain all values in the Map; Return value type: array

Check whether a map contains a key value: array_contains(map_keys(t.params),’k0′);

4. Uniontype type

Uniontype can be understood as generic. At the same time, one element in the union is in effect. The elements in unionType share memory. Create_union (tag, val1, val2) tag is a number that starts at 0 and must be less than the number of arguments behind it. Uniontype means that this field can store any of the types you define. For example, I have a field that is uni unionType

which means uni can store any of the three types of

but only one at a time, I can’t say I have a field

at the same time because that doesn’t satisfy the first normal form.
,>
,>
,>

Before we start, we’ll introduce a function create_union. You can call it desc function extended create_union; What is the definition of this function

+--------------------------------------------------------------------------------------------+
|                      tab_name                                                              |
+--------------------------------------------------------------------------------------------+
| create_union(tag, obj1, obj2, obj3, ...) - Creates a union with the object for given tag   |
| Example:                                                                                   |
|   > SELECT create_union(1, 1, "one") FROM src LIMIT 1;                                     |
|   one                                                                                      |
| Function class:org.apache.hadoop.hive.ql.udf.generic.GenericUDFUnion                       |
| Function type:BUILTIN                                                                      |
+--------------------------------------------------------------------------------------------+
Copy the code

This function returns a value of type unionType, just as the date function returns a value of type date. So how does this function work? The first argument you can think of as the index of the array, the rest of the arguments you can think of as the array, and the unionType that returns is the value of the index specified in the array, but the return value

It’s going to have a subscript

select  create_union(0.10.10.0.'10.1');
Copy the code

The return value is a map-like structure, where 0 is the subscript and 10 is the value at subscript 0.

Let’s do a full demo

create table uniontab(
    arr array<string>,
    mp MAP<INT, string>,
    stru struct<name:string, age:int>,
    uni uniontype<int.decimal(2.0), string>
)
row format 
  delimited fields terminated by ', '
  collection items terminated by The '#'
  map keys terminated by ':'
  lines terminated by '\n';
Copy the code

Next we insert a piece of data, as shown below, and you will see an error

insert into uniontab select array('d','e','f'), map(1,'zjx',2,'wly'), named_struct('name', 'wly', 'age', 17), Create_union (0,10.0, '10.1');Copy the code

Cannot convert column 3 from uniontype<int,string> to uniontype<int,decimal(2,0),string>. (state=42000,code=10044)

Uniontype

cannot be converted to uniontype

Create_union (0,10.0, ‘10.1’) is of type uniontype

which cannot be converted to our table definition uniontype

, Uniontype

= unionType

= unionType

= unionType

= unionType

= unionType

= unionType

= unionType

= unionType
,string>
,string>
,string>
,string>
,string>
,string>
,string>
,string>
,decimal(2,0),string>
,string>
,decimal(2,0),string>
,string>

insert into uniontab select array('d'.'e'.'f'), map(1.'zjx'.2.'wly'), named_struct('name'.'wly'.'age'.17), create_union(0.2.10.0.'101.1');
Copy the code

That’s right up there

5. Combined complex data types are not supported

We might sometimes want to create a complex data set type, such as the following a field, which is itself a Map with a string key and an Array value.

Build table

create table test1(id int,a MAP<STRING,ARRAY<STRING>>)
row format delimited fields terminated by '\t' 
collection items terminated by ', '
MAP KEYS TERMINATED BY ':';
Copy the code

Import data

1 English :80,90,70 2 math:89,78,86 3 Chinese :99,100,82 LOAD DATA LOCAL INPATH '/home/hadoop/test1.txt' OVERWRITE INTO TABLE test1;Copy the code

Here is the query data:

hive> select * from test1;
OK
1	{"english":["80"],"90":null, "70" :null}
2	{"math":["89"],"78":null, "86" :null}
3	{"chinese":["99"],"100":null, "82" :null}
Copy the code

We want “English “:[“80”, “90”, “70”]. In fact, 90 and 70 are also used as Map keys. Value is null. The data elements are terminated by “, “and the separators are “,”. The data is terminated by “map, struct, array”. Three key-value pairs are separated by commas; The MAP KEYS TERMINATED BY ‘:’ defines the key-value separator in the MAP to be “:”. The first “English” is readable and the following “value” is set to null.

conclusion

Complex types commonly used much, but sometimes will still use, because may be related to history, or are the builders of the number of warehouse, because the complex type is against our definition of a database, although is not the same as the number of warehouse and database, but the three paradigms in the long run, the influence of the table on the definition of building table can think about when they really need this kind of complex data types

type The name of the Create a function access The introduction of version
Array An array type array(‘d’,’e’,’f’) The subscript 0.14
Map K – v type map(1,’zjx’,2,’wly’) The field names 0.14
Struct Structural type named_struct(‘name’, ‘wly’, ‘age’, 17) . 0.0
Uniontype The joint type Create_union (1,10.0, ‘10.1’); 0.7.0

The type of Uniontype is not easy to understand, and there is not much online information, so I focus on explaining it, we can experience more.

There are also many functions derived for compound data types, you can refer to them when you need them. Later I will add them to this article and keep improving them. If you think there is anything that needs to be improved, you can point it out