Understanding Hive data types is the foundation of Hive programming. When using Hive to build tables, you must first understand the common data types and the types of data that can be stored in Hive. Hive supports most of the basic data types in relational databases, and also supports three set types (STRUCT,MAP, and ARRAY) that are rare in relational databases. However, one of the best ways to learn about technology is to look at official documentation.

Hive about data typesThis section describes data types on the Hive official website

Hive Data types and Usage

The following describes the hive data types supported by Hive v0.80 and later. Similar to relational databases, Hive supports integer and floating point data of various lengths, as well as Boolean and variable length strings. Hive implements float,double, and other data types in the same way as the Java implementation. Hive implements Java interfaces, so it also specifies the memory usage of data types.

1. Numerical type (bold is commonly used)

In Hive, the integer type is INT by default. If a number is larger than an INT range, it is automatically interpreted as BIGINT or a suffix is used to describe 100Y,100S, or 100L(tinyInt, SmallInt, BIGINT).

type Support.
TINYINT 1-byte signed integer, from -128 to 127
SMALLINT 2-byte signed integer, from -32,768 to 32,767
INT/INTEGER 4-byte signed integer, from -2,147,483,648 to 2,147,483,647,INTERGER
BIGINT 8 – byte signed integer, from 9223372036854775808 to 9223372036854775807 for additional use int
FLOAT 4-byte Single precision Floating point number, commonly used single precision 3.14159
DOUBLE 8-byte Double precision floating point number specifies the floating point with a larger storage capacity. 3.114159
DECIMAL Hive’s decimal is based on Java BigDecimal, which is used in Java to represent decimal types with arbitrary precision. All regular number operations (such as +, -, *, /) and related UDFs (such as Floor, Ceil, Round, and so on) use and support Decimal. 2. Decimal can be converted to and from other numeric types, and Decimal supports both scientific and non-scientific notation. Therefore, whether your data set contains data such as 4.004E + 3 (scientific notation) or 4004 (non-scientific notation), or a combination of the two, you can use Decimal.3. Starting with Hive 0.13, users can use the DECIMAL(Precision, scale) syntax to define the precision and scale of DECIMAL data types when creating tables. If precision is not specified, the default is 10. If scale is not specified, it defaults to 0 (no decimal places).

Decimal uses a demonstration of the numeric type:

1.decimalThe use ofdecimal(precision, scale), preceded by the integer part and followed by the decimal part. If the integer part is not defined, the default length is10If the decimal part is not specified, the default value is0.If the length is exceeded, it will be truncated. The default length for the following decimal places is0, is rounded off, the whole number is in1. hive (fdm_sor)> select CAST(1234567891.523456 AS DECIMAL) from aaaa limit 1;  
1234567892

2.If the length of data to be converted exceedsdecimalIf length is specified, the result is not truncated and is directlynullValue, thoughdecimalThe maximum storage length is38Decimal place. The default integer length is10, now need to convert data integer length exceeds10, the entire result isNULL
hive(fdm_sor)> select CAST(12345678910.523456 AS DECIMAL) from aaaa limit 1;
NULL

3.usedecimalConvert other types of data todecimal, specifies the length to be rounded off. hive (fdm_sor)> select CAST(123456789.1234567 AS DECIMAL(20.5)) from aaaa limit 1; 
123456789.12346
Copy the code

2. Character data type

Varchar, char, string

1.varcharFixed length. Specify length when using. String hive (fdM_SOR)> select CAST("ABCDEFGHICD" AS VARCHAR(10)) from aaaa limit 1; 
ABCDEFGHIC

2.varcharWhen creating a table, specify the length; otherwise, an error occurs. If the length is small, data insertion will be truncated directly. Hive (FDM_SOR)> create table tmp_varchar(id varchar(10)) 
              > ;

hive (fdm_sor)> insert overwrite table tmp_varchar  
              > select "123456789122" from aaaa;  
tmp_varchar.id
1234567891
1234567891
1234567891
1234567891


3.In the same way,charThe use of the also needs to specify the length, justcharThe length of the thanvarcharIt's a lot smaller. It's a little bit like a databaseCREATE TABLE foo (bar CHAR(10))

4.In general, use string if you can use string.Copy the code

3. Date-based data type

type Directions for use
TIMESTAMP 1. Start from Hive 0.8.0. Used to represent UTC time (time standard time). Provides convenient UDFs (to_UTC_TIMESTAMP, from_UTC_timestamp) for time zone conversion. 2. All existing date-time UDFs (month, day, year, hour, etc.) use TIMESTAMP data type. Second, TIMESTAP supports integer, floating point, and string data. Specific use later introduced in the actual development of the use is not particularly much.
DATE Start Hive 0.12.0 DATE Indicates a specific year/month/day in the format of YYYY-MM-DD. For example, DATE ‘2013-01-01’. Date types have no time component. Values supported for the Date type range from 0000-01-01 to 9999-12-31, depending on the original support for the Java Date type.Date types can only be converted between Date, Timestamp, or String types.
INTERVAL Started with Hive 1.2.0, not used much in actual development.
--1. Use timestamp to create a date-type field, which can store floating point, integer, or string time data
create table fdm_sor.tmp_aaaaa(
a int,
b bigint,
c timestamp
)
--2. Test data
hive (fdm_sor)> insert overwrite  table fdm_sor.tmp_aaaaa   -- Stores string time
              > select 1.2, "the 2019-05-22 21:23:34"from fdm_sor.tmp_aaaaa limit 3;
hive (fdm_sor)> select * from fdm_sor.tmp_aaaaa;
OK
1	2	2019- 05- 22 21:23:34
1	2	2019- 05- 22 21:23:34

hive (fdm_sor)> insert overwrite  table fdm_sor.tmp_aaaaa  -- Store numerical data
              > select 3.4.12334324 from fdm_sor.tmp_aaaaa limit 2;
hive (fdm_sor)> select * from fdm_sor.tmp_aaaaa;
OK
3	4	1970- 01- 01 11:25:34.324
3	4	1970- 01- 01 11:25:34.324
Copy the code

4. Other data types

  • BOOLEAN Type: TRUE or FALSE
  • BINARY: an array of bytes, similar to the varbinary of a database.

5 Compound data type, not commonly used

arrays ARRAY(data_type)
maps MAP(primitive_type, data_type)
structs STRUCTcol_name: data_type [COMMENT col_comment],…
union UNIONTYPE (data_type, data_type,…).

2.Hive conversion between different data types

1. Conversions between data types

Hive is a Java implementation development, bottom hive a lot of data type is also consistent with Java, so for the same type of data type conversion also follow the Java automatically “transformation” up rules, the type of data in the low and high logic operation, type of data, high implicit automatically converted into a type of data types. And then we do the calculation. For example, a comparison between 1 and 1.23 will automatically convert 1 to 1.0 for computational comparison.

2. Conversion of different data types

When comparing data of different data types in Hive, you need to use the cast function to manually convert data.

The cast function uses: cast(valule as type), where value is the data to be converted, as is the fixed keyword, and type is the type to be converted

>select 
cast(" 1223"as double),
cast(" 456.23"as int),
cast(" 1.99"as int),
cast("abc" as int) 
from aaaa;
>1223.0.456.1.NULL
Copy the code

Note: the above results show that the cast conversion function is not universal, only the “stored type value” meets the conversion conditions. Otherwise, the result is NULL. For example, when “ABC” is converted to double, ABC is clearly not a number, so the conversion fails and the result is NULL. In addition, if cast is used to convert high-type data into low-type data, the cast function will intercept directly, resulting in loss of data accuracy and even wrong results.