“This is the 19th day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021”

Chapter 80 SQL command UNION

Combine two or more SELECT statements.

The outline

select-statement {UNION [ALL] [%PARALLEL] select-statement}

select-statement {UNION [ALL]  [%PARALLEL] (query)}

(query) {UNION [ALL]  [%PARALLEL] select-statement}

(query) {UNION [ALL]  [%PARALLEL] (query)}
Copy the code

parameter

  • ALL– Optional – Keyword literal.

If specified, duplicate data values are returned. If omitted, duplicate data values are suppressed.

  • %PARALLEL– optional –%PARALLELThe keyword.

If specified, each side of the union runs in parallel as a separate process.

  • select-statement– Retrieves data from a databaseSELECTStatements.
  • query– Queries that combine one or more SELECT statements.

describe

A UNION combines two or more queries into a single query that retrieves data into a result. A query composed of a UNION can be a simple query consisting of a single SELECT statement or a composite query.

In order to achieve union between SELECT statements, the number of columns specified in each branch must match. Specifying a SELECT with a different number of columns causes a SQLCODE -9 error. You can specify NULL columns in one SELECT to be paired with data columns in another SELECT to match the number of columns. Such as:

SELECT Name,Salary,BirthDate
FROM Sample.Employee
UNION ALL
SELECT Name,NULL,BirthDate
FROM Sample.Person
Copy the code

SQL determines the result column data types by automatically evaluating all branches of the UNION query and returning the highest priority data type :VARCHAR, DOUBLE, NUMERIC, BIGINT, INTEGER, SMALLINT, TINYINT. Other data types, such as DATE, are not assigned priority. For example, the following program returns the data type TINYINT, even though the DATE data type has higher priority in other contexts.

SELECT MyTinyIntField FROM Table1
     UNION ALL
SELECT MyDateField FROM Table2
Copy the code

If you want to return a data type other than the one listed, you must use an explicit CAST statement, as in the following example:

SELECT CAST(MyTinyInt AS DATE) FROM Table1
     UNION ALL
SELECT MyDateField FROM Table2
Copy the code

If the columns in the union branch differ in length, precision, or scale, the maximum value is assigned to the resulting column.

The resulting column name is taken from the name of the column (or column alias) in the first branch of the union. In cases where the corresponding columns in both branches do not have the same name, it may be useful to use the same column alias to identify the result column in all branches.

If any columns in any UNION branch are empty, the resulting column metadata is reported as empty.

The string field in the UNION result has the collation type of the corresponding SELECT field, but if the field collation does not match, the exact collation is assigned.

UNION and UNION ALL

Plain UNION eliminates duplicate rows in the result (all values are the same). UNION ALL preserves duplicate rows in the result.

Fields with different precision do not have the same value. For example, the values 33(NUMERIC(9)) and 33.00(NUMERIC(9,2)) are not considered identical.

Fields with different collations do not have the same value. For example, MyStringField and %SQLUPPER(MyStringField) are not considered the same, even if both values are capitalized.

TOP and ORDER BY clauses

The UNION statement can end with the ORDER BY clause, which sorts the results. The ORDER BY applies to the entire statement; It must be part of the outermost query, not a subquery. It does not have to be paired with the TOP clause. The following example shows the use of ORDER BY: two SELECT statements SELECT data, the data is combined BY UNION, and ORDER BY sorts the results:

SELECT Name,Home_Zip FROM Sample.Person
  WHERE Home_Zip %STARTSWITH 9
UNION
SELECT Name,Office_Zip FROM Sample.Employee
  WHERE Office_Zip %STARTSWITH 8
ORDER BY Home_Zip 
Copy the code

Using a column number that does not correspond to a SELECT list column in ORDER BY causes a SQLCODE -5 error. Using a column name in ORDER BY that does not correspond to a SELECT list column causes a SQLcode-6 error.

The UNION SELECT statement (or both) can also contain an ORDER BY clause, but it must be paired with the TOP clause. The ORDER BY is used to determine which rows are selected BY the TOP clause. The following example shows the use of ORDER BY: Both SELECT statements use ORDER BY to sort their rows, which determines which rows are selected as the top row. The selected data is combined by UNION, and then the final ORDER by sorts the results:

SELECT TOP 5 Name,Home_Zip FROM Sample.Person
  WHERE Home_Zip %STARTSWITH 9
  ORDER BY Name
UNION
SELECT TOP 5 Name,Office_Zip FROM Sample.Employee
  WHERE Office_Zip %STARTSWITH 8
  ORDER BY Office_Zip
ORDER BY Home_Zip
Copy the code

TOP can be applied to the first SELECT in the union, or to the result of the union, depending on the position of the ORDER BY clause:

  • TOP... ORDER BYApplied to theUNIONResults: IfUNIONLocated in theFROMClause, thenTOPandORDER BYWill apply toUNIONResults. Such as:
SELECT TOP 10 Name,Home_Zip
  FROM (SELECT Name,Home_Zip FROM Sample.Person
          WHERE Name %STARTSWITH 'A'
        UNION
        SELECT Name,Home_Zip FROM Sample.Person
          WHERE Home_Zip %STARTSWITH 8)
ORDER BY Home_Zip
Copy the code

  • TOPThis applies to the first oneSELECT;ORDER BYApply toUNIONThe results. Such as:
SELECT TOP 10 Name,Home_Zip 
  FROM Sample.Person
  WHERE Name %STARTSWITH 'A'
UNION
SELECT Name,Home_Zip FROM Sample.Person
  WHERE Home_Zip %STARTSWITH 8
ORDER BY Home_Zip
Copy the code

Enclose parentheses

UNION supports optional parentheses for one or both SELECT statements, or for the entire UNION statement. One or more pairs of parentheses can be specified. Here are all valid uses of parentheses:

(SELECT...).UNION SELECT. (SELECT...).UNION (SELECT...). ((SELECT...). )UNION ((SELECT...). ) (SELECT.UNION SELECT...). ((SELECT...).UNION (SELECT...). )Copy the code

A separate cache query is generated each time parentheses are used.

The UNION/OR optimization

By default, SQL automatic optimization converts UNION subqueries to OR conditions as it sees fit. This UNION/OR transformation allows EXISTS and other low-level predicates to be migrated to top-level conditions so that they can be used in the IRIS query optimizer index. This default conversion is desirable in most cases. However, in some cases, this UNION/OR transformation can be costly. %NOUNIONOROPT query optimization option disables this automatic UNION/OR conversion for all conditions in the WHERE clause associated with the FROM clause. Therefore, in complex queries, automatic UNION/OR optimization can be disabled for one subquery and allowed for the others.

If a condition containing a subquery is applied to the UNION, the condition is applied within each UNION operand, not at the end. This allows subquery optimization to be applied on each UNION operand. For a description of subquery optimization options, see the FROM clause. In the following example, the WHERE clause condition applies to each subquery in the union, not the result of the union:

SELECT Name,Age FROM 
  (SELECT Name,Age FROM Sample.Person
   UNION SELECT Name,Age FROM Sample.Employee)
WHERE Age IN (SELECT TOP 5 Age FROM Sample.Employee WHERE Age>55 ORDER BY Age)
Copy the code

Combine all aggregation optimizations

The automatic SQL optimization of UNION ALL pushes the top-level aggregation into the branch of the UNION. Whether or not the %PARALLEL keyword is used, performance can be significantly improved, such as:

SELECT COUNT(*) FROM (SELECT item1 FROM table1 UNION ALL SELECT item2 FROM table2) 
Copy the code

Optimization:

SELECT SUM(y) FROM (SELECT COUNT(*) AS y FROM table1 UNION ALL SELECT COUNT(*) AS y FROM table2) 
Copy the code

This optimization applies to all top-level aggregation functions (not just COUNT), including queries with multiple top-level aggregation functions. To apply this optimization, the external query must be an “onerow” query with no WHERE or GROUP BY clause, it cannot reference %VID, and the UNION ALL must be the only stream in its FROM clause. Aggregation cannot be nested. Any used aggregation function cannot use %FOREACH() grouping or DISTINCT.

Parallel processing

The keyword %PARALLEL supports PARALLEL and distributed processing on multiprocessor systems. It enables IRIS to perform parallel processing on UNION queries, assigning each query to a separate process on the same machine. In some cases, the procedure sends the query to another machine for processing. These processes communicate through pipes, and IRIS creates one or more temporary files to hold subquery results. The main process combines the result rows and returns the final result. Compare Show plans with and without the %Parallel keyword. To determine the number of processors on the current SYSTEM, use the % system.util.numberofCPU () method.

In general, the more effort it takes to generate each row, the more advantageous %Parallel becomes.

Specifying the %PARALLEL keyword disables automatic merging or optimization.

The following example shows the use of the %PARALLEL keyword:

SELECT Name FROM Sample.Employee WHERE Name %STARTSWITH 'A'
UNION %PARALLEL
SELECT Name FROM Sample.Person WHERE Name %STARTSWITH 'A'
ORDER BY Name
Copy the code

SELECT Name FROM Sample.Employee WHERE Name %STARTSWITH 'A'
UNION ALL %PARALLEL
SELECT Name FROM Sample.Person WHERE Name %STARTSWITH 'A'
ORDER BY Name
Copy the code

%PARALLEL for SELECT queries and their subqueries. %PARALLEL cannot be used for INSERT command subqueries.

Adding the %PARALLEL keyword may not apply to all UNION queries and may result in an error. The following SQL constructs typically do not support UNION %PARALLEL execution: external joins, related fields, in-predicate conditions containing subqueries, or collection predicates. The for SOME predicate supports UNION %PARALLEL, but the for SOME %ELEMENT collection predicate does not. To determine whether a UNION query can successfully use %PARALLEL, test each branch of the UNION separately. Test each branch query separately by adding the FROM %PARALLEL keyword. If one of the FROM %PARALLEL queries generates a query plan that does not show parallelization, the UNION query will not support %PARALLEL.

UNION ALLAnd aggregate function

SQL automatic optimization pushes the UNION ALL aggregate function into the UNION branch subquery. SQL evaluates the aggregate value of each subquery and then combines the results to return the original aggregate value. Such as:

SELECT COUNT(Name) FROM (SELECT Name FROM Sample.Person
                    UNION ALL SELECT Name FROM Sample.Employee)
Copy the code

Optimization:

SELECT SUM(y) FROM (SELECT COUNT(Name) AS y FROM Sample.Person
                    UNION ALL SELECT COUNT(Name) AS y FROM Sample.Employee)
Copy the code

This can lead to substantial performance improvements. This optimization is applied whether or not the %PARALLEL keyword is used. This optimization applies to multiple aggregation functions.

This optimization transformation only occurs in the following cases:

  • External queryFROMThe clause must contain only oneUNION ALLStatements.
  • External queries cannot containWHEREClause orGROUP BYClause.
  • External queries cannot contain%VID(View ID) field.
  • Aggregate functions cannot containDISTINCTor%FOREACHThe keyword.
  • Aggregate functions cannot be nested.

The sample

The following example creates a result with a row for each Name in the two tables; If Name is found in both tables, two rows are created. When Name is an employee, it lists the location of the office and connects the word “office” to the state, along with the title of the employee. When Name is a person, it lists the primary location, concatenates the word “home” to status, and < NULL > denotes the title. The ORDER BY clause operates on the result; The merged rows are sorted by name:

SELECT Name,Office_State||' office' AS State,Title 
FROM Sample.Employee
UNION
SELECT Name,Home_State||' home'.NULL
FROM Sample.Person
ORDER BY Name
Copy the code

The following two examples show the effect of the ALL keyword. In the first example, UNION returns only unique values. In the second example, UNION ALL returns ALL values, including duplicates:

SELECT Name
FROM Sample.Employee
WHERE Name %STARTSWITH 'A'
UNION
SELECT Name
FROM Sample.Person
WHERE Name %STARTSWITH 'A'
ORDER BY Name
Copy the code

SELECT Name
FROM Sample.Employee
WHERE Name %STARTSWITH 'A'
UNION ALL
SELECT Name
FROM Sample.Person
WHERE Name %STARTSWITH 'A'
ORDER BY Name
Copy the code