This is the 15th day of my participation in Gwen Challenge.

Hello, I am Wukong.

Summary:

This article is a summary of the operation basis of union, intersection and difference set in set operation.

There are three kinds of set operations:

1. Union operation

2. Intersection operation

3. Difference set operation

Here is a mind map for set operations:

 

Why set operations are used:

1. More convenient in set operations than join queries and EXISTS/NOT EXISTS.

The following SQL script will help you create databases, create tables, and insert data.

Download the script file: tsqlals2008.zip

Set operation

1. Set operation

(1) Operations performed on two or more sets of input.

(2) Multiple sets: An intermediate result set generated by two input queries that may contain duplicate records.

(3) T-SQL supports three set operations: UNION (UNION), intersection (INTERSECT), difference (EXCEPT)

2. Grammar

Basic format of set operation:

Enter query 1

< set operator >

Input query 2

[ORDER BY]

3. The required

(1) The input query cannot contain the ORDER BY sentence;

(2) An ORDER BY sentence can be optionally added for the whole set operation result;

(3) Each single query can contain all logical query processing stages (processing the ORDER BY sentence that controls the ORDER);

(4) Both queries must contain the same number of columns;

(5) The corresponding column must have compatible data types. Compatible data types: Lower-priority data types must be implicitly converted to higher-level data types. For example, if the first column of input query 1 is int and the first column of input query 2 is float, the lower data type int can be implicitly converted to a higher float. If the first column of the input query 1 is of type CHAR and the first column of the input query 2 is of type datetime, the conversion fails: The conversion fails when converting date and/or time from a string;

(6) The column name in the set operation result is determined by the input query 1. If the result column is to be allocated to the result, the corresponding alias should be allocated to the input query 1;

(7) In the set operation, when the row is compared, the set operation considers the two NULL equal;

(8) UNION Supports both DISTINCT and ALL. The statement with DISTINCT cannot be displayed. If ALL is not specified, DISTINCT is used by default.

(9) INTERSET and EXCEPT use DISTINCT by default. ALL is not supported.

2. UNION set operation

1. Venn diagram of union

Union: The union of two sets is A set that contains all the elements of sets A and B.

The shaded area in the figure represents the union of sets A and B

2.UNION ALL set operation

Query1 UNION ALL Query2 returns (m+n) rows, Query1 UNION ALL Query2 returns (m+n) rows;

(2) UNION ALL does not delete duplicate rows, so it results in multiple sets rather than a true set;

(3) The same line may appear more than once in the result.

3.UNION DISTINCT set operation

(1) If Query1 returns M rows, Query2 returns N rows, and Query1 and Query2 have the same H rows, then Query1 UNION Query2 returns (m+ n-H) rows;

(2) UNION deletes duplicate rows, so its result is a set;

(3) The same row appears only once in the result.

(4) The statement DISTINCT cannot be specified. If ALL is not specified, DISTINCT is used by default.

(5) When Query1 and Query2 compare whether a row record is equal, the NULL column is considered to be equal.

INTERSECT set operation

1. Venn diagram of intersection

Intersection: The intersection of two sets (denoted as sets A and B) is the set consisting of all elements belonging to both A and B.

The shaded areas in the figure represent the intersection of sets A and B

2.INTERSECT DISTINCT set operation

(1) hypothesis Query1 return m line, Query2 return n lines, have the same h Query1 and Query2 line, the Query1 intersects Query2 return h line;

INTERSECT set operation logically first deletes duplicate rows in two input sets (turning the sets into sets) and then returns rows that only appear in both sets;

INTERSECT deletes duplicate rows, so its result is a set;

(4) The same row appears only once in the result.

(5) DISTINCT clauses cannot be specified. If ALL is not specified, DISTINCT is used by default.

(6) When Query1 and Query2 compare whether a row record is equal, the NULL column is considered to be equal.

The INTERSECT set operation can be replaced by an inner join or an EXISTS predicate, but NULL must be handled, otherwise when NULL values are compared by both methods, the result will be UNKNOWN and such rows will be filtered.

3.INTERSECT ALL set operation

(1) ANSI SQL supports the INTERSECT set operation with the ALL option, but SQL Server2008 does not currently implement this operation. An alternative for the T-SQL implementation is provided later;

(2) Assuming that Query1 returns M rows and Query2 returns N rows, if row R appears x times in Query1 and y times in Query2, then row R should appear minimum(x, y) times after the INTERSECT ALL operation.

INTERSECT ALL set operation for T-SQL implementation is provided below: common table expression + ranking function

WITH    INTERSECT_ALL
          AS ( 
				SELECT   ROW_NUMBER(a)OVER ( PARTITION BY country, region, city ORDER BY ( SELECT
                                                              0))AS rownum ,
                        country ,
                        region ,
                        city
               FROM     HR.Employees
               INTERSECT
               SELECT   ROW_NUMBER(a)OVER ( PARTITION BY country, region, city ORDER BY ( SELECT
                                                              0))AS rownum ,
                        country ,
                        region ,
                        city
               FROM     Sales.Customers
             )
    SELECT  country ,
            region ,
            city
    FROM    INTERSECT_ALL
Copy the code

The results are as follows:

Where UK NULL London has four repeated rows,

Using ORDER BY (SELECT < constant >) in the OVER clause of the sort function tells SQL Server not to care about the ORDER of the rows.

EXCEPT (difference set) set operation

1. Venn diagram of difference set

Difference set: The set of two sets (denoted as sets A and B) consisting of all elements that belong to set A but not to set B.

The shaded areas in the figure represent the difference between sets A and B

2.EXCEPT DISTINCT set arithmetic

(1) Assuming that Query1 returns M rows and Query2 returns N rows, and that Query1 and Query2 have the same H rows, Query1 INTERSECT Query2 returns m-H rows, while Query2 INTERSECT Query1 returns n-H rows

(2) An EXCEPT set operation logically removes duplicate rows from two input sets (turning sets into sets) and then returns only rows in the first set and not all rows in the second set.

(3) EXCEPT removes duplicate rows, so its result is a collection;

(4) EXCEPT is asymmetric, and the result of the difference set depends on the context of the two queries.

(5) The same row appears only once in the result.

(6) DISTINCT clauses cannot be specified. If ALL is not specified, DISTINCT is used by default.

(7) When Query1 and Query2 compare whether a row record is equal, the NULL column is considered to be equal.

INTERSECT set operation can be replaced by left outer join or NOT EXISTS predicate, but NULL must be handled, otherwise when NULL values are compared by both methods, the result will be UNKNOWN and such rows will be filtered out.

EXCEPT ALL set operation

(1) ANSI SQL supports EXCEPT set arithmetic with ALL option, but SQL Server2008 does not currently implement this arithmetic. An alternative for the T-SQL implementation is provided later;

(2) Assuming that Query1 returns M rows and Query2 returns N rows, if row R occurs x times in Query1 and y times in Query2, and x>y, then row R should occur x-y times after the EXCEPT ALL operation.

The following provides an EXCEPT ALL set operation for a T-SQL implementation: common table expression + ranking function

WITH    INTERSECT_ALL
          AS(SELECT   ROW_NUMBER(a)OVER ( PARTITION BY country, region, city ORDER BY ( SELECT
                                                              0))AS rownum ,
                        country ,
                        region ,
                        city
               FROM     HR.Employees
               EXCEPT
               SELECT   ROW_NUMBER(a)OVER ( PARTITION BY country, region, city ORDER BY ( SELECT
                                                              0))AS rownum ,
                        country ,
                        region ,
                        city
               FROM     Sales.Customers
             )
    SELECT  country ,
            region ,
            city
    FROM    INTERSECT_ALL
Copy the code

The results are as follows:

Priority of set operation

1.INTERSECT>UNION=EXCEPT

2. Calculate the INTERSECT first, and then process the same operations of priority from left to right.

3. Parentheses can be used to control the precedence of the set operation, which has the highest precedence.

Six, special treatment

1. Only ORDER BY can be directly applied to the result of set operation;

2. Other stages, such as table operators, WHERE, GROUP BY, and HAVING, do not support direct application to the result of a set operation. Table expressions can be used to avoid this limitation. For example, define a table expression based on a query containing a set operation, and then apply any required logical query processing to the table expression in an external query;

3. The ORDER BY clause cannot be applied directly to a single query in a set operation. The TOP+ORDER BY clause + table expression can be used to avoid this limitation. For example, define a table expression based on the TOP query and then participate in the set operation through an external query that uses the table expression.

7. Exercises

1. Write a query that returns customers and employees with order activity in January 2008 and no order activity in February 2008.

Expected Results:

EXCEPT

(1) First use query 1 to query the customers and employees who had order activities in January 2008

(2) Use query 2 to query customers and employees for order activity in February 2008

(3) Query customers and employees with order activity in January 2008 and no order activity in February 2008 with difference set operator

SELECT  custid ,
        empid
FROM    Sales.Orders
WHERE   orderdate > = '20080101'
        AND orderdate < '20080201'
EXCEPT
SELECT  custid ,
        empid
FROM    Sales.Orders
WHERE   orderdate > = '20080201'
        AND orderdate < '20080301'` ` `sql

**Scheme 2:NOT EXISTS**Ensure that custid and EMPID cannot benullTo useNOT EXISTSQuery if cusTID or empID is presentnullIf the value exists, it cannot be usedNOT EXISTSQuery because compareNULLThe result of the value is zeroUNKNOWNIt works like thisNOT EXISTSThe rows of the subquery returned by the query are filtered out, so there are more external queries at the endNULLValue of the row, the final query results will be moreNULLThe value of the line.Copy the code

SELECT custid , empid FROM Sales.Orders AS O1 WHERE orderdate >= ‘20080101’ AND orderdate < ‘20080201’ AND NOT EXISTS ( SELECT * FROM Sales.Orders WHERE orderdate >= ‘20080201’ AND orderdate < ‘20080301’ AND custid = o1.custid AND empid = o1.empid ) ORDER BY O1.custid ,

    O1.empid
Copy the code
If I insert two rows into the sales.Orders table: insert cutid=NULL,empid=1,orderdate='20080101'` ` `sql
INSERT  INTO [TSQLFundamentals2008].[Sales].[Orders]
        ( [custid] ,
          [empid] ,
          [orderdate] ,
          [requireddate] ,
          [shippeddate] ,
          [shipperid] ,
          [freight] ,
          [shipname] ,
          [shipaddress] ,
          [shipcity] ,
          [shipregion] ,
          [shippostalcode] ,
          [shipcountry]
        )
VALUES  ( NULL ,
          1 ,
          '20080101' ,
          '20080101' ,
          '20080101' ,
          1 ,
          1 ,
          'A' ,
          '20080101' ,
          'A' ,
          'A' ,
          'A' ,
          'A'
        )
GO
 
Copy the code

Insert cutid = NULL, empid = 1, orderdate = ‘20080201’

INSERT  INTO [TSQLFundamentals2008].[Sales].[Orders]
        ( [custid] ,
          [empid] ,
          [orderdate] ,
          [requireddate] ,
          [shippeddate] ,
          [shipperid] ,
          [freight] ,
          [shipname] ,
          [shipaddress] ,
          [shipcity] ,
          [shipregion] ,
          [shippostalcode] ,
          [shipcountry]
        )
VALUES  ( NULL ,
          1 ,
          '20080201' ,
          '20080101' ,
          '20080101' ,
          1 ,
          1 ,
          'A' ,
          '20080101' ,
          'A' ,
          'A' ,
          'A' ,
          'A'
        )
GO
Copy the code

If cutid=NULL and empID =1, select cutid=NULL and empID =1

If cutid=NULL and empID =1, cutid=NULL and empID =1 will not be filtered out

The following solution solves the above problem by handling cutid=NULL, or empID =NULL. Return to 50 lines

SELECT  custid ,
        empid
FROM    Sales.Orders AS O1
WHERE   orderdate > = '20080101'
        AND orderdate < '20080201'
        AND NOT EXISTS ( SELECT *
                     FROM   Sales.Orders
                     WHERE  orderdate > = '20080201'
                            AND orderdate < '20080301'
                            AND ( custid = o1.custid
                                  AND empid = o1.empid
                                )
                            OR ( o1.custid IS NULL
                                 AND custid IS NULL
                                 AND empid = o1.empid
                               )
                            OR ( o1.custid IS NULL
                                 AND custid IS NULL
                                 AND o1.empid IS NULL
                                 AND custid IS NULL
                               )
                            OR ( custid = o1.custid
                                 AND o1.empid IS NULL
                                 AND custid IS NULL))Copy the code

2. Write a query that returns customers and employees with order activity in January 2008 and February 2008.

Expected Results:

 

Option 1: INTERSECT

(1) First use query 1 to query the customers and employees who had order activities in January 2008

(2) Use query 2 to query customers and employees for order activity in February 2008

(3) Use intersection operator to query customers and employees with order activity in January 2008 and February 2008

SELECT  custid ,
        empid
FROM    Sales.Orders
WHERE   orderdate > = '20080101'
        AND orderdate < '20080201'
INTERSECT
SELECT  custid ,
        empid
FROM    Sales.Orders
WHERE   orderdate > = '20080201'
        AND orderdate < '20080301'
Copy the code

Scheme 2: EXISTS

The value of CUStid and empID must not be null in order to perform a query in EXISTS. If there is a null value in CUStid or empID, the result of comparing null values in EXISTS is UNKNOWN. Such rows are filtered out for subqueries returned by an EXISTS query, so the final external query will have fewer rows with NULL values and the final query result will have fewer rows with NULL values.

SELECT  custid ,
        empid
FROM    Sales.Orders AS O1
WHERE   orderdate > = '20080101'
        AND orderdate < '20080201'
        AND EXISTS ( SELECT *
                     FROM   Sales.Orders
                     WHERE  orderdate > = '20080201'
                            AND orderdate < '20080301'
                            AND custid = o1.custid
                            AND empid = o1.empid )
ORDER BY O1.custid ,
        O1.empid
Copy the code

If I insert two rows into the sales.Orders table:

Insert cutid = NULL, empid = 1, orderdate = ‘20080101’

Insert cutid = NULL, empid = 1, orderdate = ‘20080201’

Empid =1; cutid=NULL; empID =1; cutid=NULL; empID =1

Cutid =NULL and EMPID =1 will be filtered out

The following solution solves the above problem by handling cutid=NULL, or empID =NULL. Return 6 lines.

SELECT  custid ,
        empid
FROM    Sales.Orders AS O1
WHERE   orderdate > = '20080101'
        AND orderdate < '20080201'
        AND EXISTS ( SELECT *
                     FROM   Sales.Orders
                     WHERE  orderdate > = '20080201'
                            AND orderdate < '20080301'
                            AND ( custid = o1.custid
                                  AND empid = o1.empid
                                )
                            OR ( o1.custid IS NULL
                                 AND custid IS NULL
                                 AND empid = o1.empid
                               )
                            OR ( o1.custid IS NULL
                                 AND custid IS NULL
                                 AND o1.empid IS NULL
                                 AND custid IS NULL
                               )
                            OR ( custid = o1.custid
                                 AND o1.empid IS NULL
                                 AND custid IS NULL))Copy the code

Write a query to return customers and employees who had order activity in January 2008 and February 2008, but no order activity in 2007

Expected Results:

INTERSECT + EXCEPT

SELECT  custid ,
        empid
FROM    Sales.Orders
WHERE   orderdate > = '20080101'
        AND orderdate < '20080201'
INTERSECT
SELECT  custid ,
        empid
FROM    Sales.Orders
WHERE   orderdate > = '20080201'
        AND orderdate < '20080301'
EXCEPT
SELECT  custid ,
        empid
FROM    Sales.Orders
WHERE   orderdate > = '20070101'
        AND orderdate < '20080101'
Copy the code

Scheme 2: EXISTS + NOT EXISTS

SELECT  custid ,
        empid
FROM    Sales.Orders AS O1
WHERE   orderdate > = '20080101'
        AND orderdate < '20080201'
        AND EXISTS ( SELECT *
                     FROM   Sales.Orders
                     WHERE  orderdate > = '20080201'
                            AND orderdate < '20080301'
                            AND ( custid = o1.custid
                                  AND empid = o1.empid
                                )
                            OR ( custid = o1.custid
                                 AND custid IS NULL
                                 AND empid = o1.empid
                               )
                            OR ( custid = o1.custid
                                 AND custid IS NULL
                                 AND empid = o1.empid
                                 AND custid IS NULL
                               )
                            OR ( custid = o1.custid
                                 AND empid = o1.empid
                                 AND custid IS NULL
                               )
                            OR ( custid = o1.custid
                                 AND custid IS NULL
                                 AND empid = o1.empid
                                 AND custid IS NULL))AND NOT EXISTS ( SELECT *
                         FROM   Sales.Orders
                         WHERE  orderdate > = '20070101'
                                AND orderdate < '20080101'
                                AND ( custid = o1.custid
                                      AND empid = o1.empid
                                    )
                                AND ( custid = o1.custid
                                      AND empid = o1.empid
                                    )
                                OR ( custid = o1.custid
                                     AND custid IS NULL
                                     AND empid = o1.empid
                                   )
                                OR ( custid = o1.custid
                                     AND custid IS NULL
                                     AND empid = o1.empid
                                     AND custid IS NULL
                                   )
                                OR ( custid = o1.custid
                                     AND empid = o1.empid
                                     AND custid IS NULL
                                   )
                                OR ( custid = o1.custid
                                     AND custid IS NULL
                                     AND empid = o1.empid
                                     AND custid IS NULL))ORDER BY O1.custid ,
        O1.empid
Copy the code

References:

SQL2008 technical Insider: T-SQL Language Basics

Welcome to My Github Star: github.com/Jackson0714…

Welcome to follow my official account: “Wukong Chat Framework”

About the author: 8-year Veteran of Internet workplace | full stack engineer | super dad after 90 | open source practitioner | owner of public number 10,000 fans original number. Blue Bridge signed the author, the author of “JVM performance tuning practice” column, handwritten a set of 70,000 words SpringCloud practice summary and 30,000 words distributed algorithm summary. Welcome to follow my public account “Wukong Chat framework”, free access to learning materials.

I am Wukong, strive to become stronger, become super Saiya people!