This is the 20th day of my participation in the August More Text Challenge

Hive data type

1.1 Basic data types

Hive Data Types Java data types The length of the example
TINYINT byte 1byte signed integer 20
SMALINT short 2byte Signed integer 20
INT int 4byte Signed integer 20
BIGINT long 8byte signed integer 20
BOOLEAN boolean Boolean type, true or false TRUE FALSE
FLOAT float Single-precision floating point number 3.14159
DOUBLE double A double – precision floating – point number 3.14159
STRING string Character series. Character sets can be specified. You can use single or double quotation marks. ‘Now is the time’ “For all Good Men”
TIMESTAMP Time to type
BINARY An array of bytes

The Hive String type is equivalent to the database vARCHar type. This type is a variable String, but it cannot state how many characters it can store. It can theoretically store up to 2GB of characters.

1.2 Collection data types

The data type describe Syntax examples
STRUCT Like structs in C, you can access element content through “dot” notation. For example, if the data type of a column is STRUCT{first STRING, last STRING}, the first element can be referenced by the field.first. 2 Struct () e.g. struct<street:string, city:string>
MAP A MAP is a collection of key-value pairs of tuples that can be accessed using array notation. For example, if the data type of a column is MAP, where the key -> value pairs are ‘first’ -> ‘John’ and ‘last’ -> ‘Doe’, the last element can be retrieved by the field name [‘ last ‘] Map () for example map<string, int>
ARRAY An array is a collection of variables of the same type and name. These variables are called elements of the array, and each element of the array has a number starting from zero. For example, if the array value is [‘ John ‘, ‘Doe’], the second element can be referenced by the array name [1]. Array () such as Array

Hive provides three complex data types: Array, Map, and struct. Array and Map are similar to Java’s array and Map, while Struct and C’s Struct are similar in that they encapsulate collections of named word fields, allowing arbitrary levels of nesting of complex data types

Case practice:

  • 1. If a table has the following row, we use JSON format to represent the data result.

    {"name": "songsong", "friends": ["bingbing", "lili"], // list Array, "children": {// key values Map, "xiao Song ": Struct, "street": "hui long guan", "city": "Beijing"}}Copy the code
  • 2. Based on the above data structure, we created corresponding tables in Hive and imported data

    Create a local test file named test.txt

     songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijingyangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
    Copy the code

    Note: Relationships between elements in maps, structs, and arrays can all be represented by the same character, using “_” here.

  • 3. Create the test table on Hive

     create table test(  name string,    friends array<string>,  children map<string, int>,  address struct<street:string, city:string>)row format delimited fields terminated by ','collection items terminated by '_'map keys terminated by ':'lines terminated by '\n';
    Copy the code

    Field parsing:

    • Row format DELIMited fields terminated by ‘,’ — column separator
    • Collection items terminated by ‘_’ –MAP STRUCT and ARRAY separators
    • Map keys terminated by ‘:’ — The separator of key and value in map
    • lines terminated by ‘\n’; — Line separator
  • 4. Import text data into the measurement table

     load data local inpath '/opt/module/hive/datas/test.txt' into table test;
    Copy the code
  • 5. Access data in three sets of columns. The following are array, map and struct access methods respectively

    hive (default)> select friends[1],children['xiao song'],address.city from testwhere name="songsong"; Ok_c0_c1 Citylili 18 beijingTime taken: 0.076 seconds, 1 row(s)Copy the code

1.3 Type Conversion

Hive atomic data types can be converted implicitly, similar to Java type conversion. For example, if an expression uses an INT, TINYINT is automatically converted to an INT, but Hive does not convert an INT to TINYINT.

1.3.1 Implicit Type Conversion rules
  • 1. Any integer type can be implicitly converted to a wider range of types, such as tinyint to int and int to bigint
  • 2. All integer types, floats, and strings can be implicitly converted to DOUBLE
  • 3, TINYINT, smallint, int can be converted to Float
  • Boolean types cannot be converted to any other type
1.3.2 Data type conversion can be performed using cast operation display

For example, cast (‘ 1 ‘as INT) converts string 1 to the integer 1. If the cast force fails, such as cast (‘x’ as INT), the expression returns null.

0: jdbc:hive2://hadoop102:10000> select '1'+2, cast('1'as int) + 2; + -- -- -- -- -- - + -- -- -- -- -- - + - + | _c0 | _c1 | + -- -- -- -- -- - + -- -- -- -- -- - + - + 3.0 | | 3 | + -- -- -- -- -- - + -- -- -- -- -- - + - +Copy the code