This is the 21st day of my participation in Gwen Challenge.

Hello, I am Wukong.

Star: github.com/Jackson0714… Personal website: www.passjava.cn

Database topics

  • Why NoSQL when you have MySQL?
  • TempDB usage and performance issues
  • Table expressions (1)
  • Table expressions (2)
  • Explain SQL set operations in detail
  • Explain join queries in SQL
  • Explain subqueries in SQL

An overview of the

This article is mainly a summary of the basics of subqueries.

 

Key words:

External query: The query result set is returned to the caller

Internal query: The query result set is returned to the external query.

Independent subquery: Independent subquery A subquery that is independent of its external query and can run a separate subquery. Logically, an independent subquery is executed once before an external query is executed, and then the external query continues the query using the results of the subquery.

Related subquery: a subquery that references a table that appears in an external query. The query depends on the external query and cannot call it independently. Logically, the subquery evaluates each external row individually.

Scalar quantum query: a subquery that returns a single value. A scalar query can appear anywhere a single value is expected to be used in an external query.

Multivalued subquery: in a column

Why use subqueries?

You can avoid the need to split operations into multiple steps in a query solution and store intermediate query results in variables.

Independent subquery

1. Independent scale quantum query

Example: Return the maximum employee information for empID from the hr.employees table.

There are two steps:

A. Define a variable maxID. Query the empID of the employee with the largest EMPID through independent scalar quantum query, and then save this EMPID into the variable @maxID

B. Select empID = @maxID from WHERE

DECLARE @maxid AS INT = ( SELECT    MAX(empid)
                          FROM      HR.Employees
                        )
SELECT  *
FROM    HR.Employees
WHERE   empid = @maxid
Copy the code

A simpler approach is to nest a subquery that requires only one query statement to query for the maximum employee information for empID

SELECT  *
FROM    HR.Employees
WHERE   empid = ( SELECT    MAX(empid)
                  FROM      HR.Employees
                )
Copy the code

Note:

  1. A valid scalar quantum query cannot return more than one value, and if it returns more than one value, it may fail at run time.

  2. If the scalar query does not return any value, the result is converted to NULL, and the NULL row is compared to UNKNOWN. The query filter does not return any rows that the filter expression evaluates to UNKNOWN.

2. Independent multi-value subquery

(1) The syntax format of multi-value sub-query

< scalar expressions, > IN (< multivalued subquery, >)

Example: Return information that title contains orders processed by employees of the Manager

Scheme 1: independent multi – valued sub-query

SELECT  *
FROM    Sales.Orders
WHERE   empid IN ( SELECT   empid
                   FROM     HR.Employees
                   WHERE    HR.Employees.title LIKE '%Manager' )
Copy the code

Scheme two: inner join query

SELECT  *
FROM    Sales.Orders
        INNER JOIN HR.Employees ON Sales.Orders.empid = HR.Employees.empid
WHERE   HR.Employees.title LIKE '%Manager'
Copy the code

Similarly, many places can solve problems with either subqueries or join queries. Sometimes the database engine interprets two queries the same way, but in other cases it interprets them differently. You can solve the problem with one query first, and then try to replace the subquery with a join or subquery with a join if the performance is not good.

3. Distinct keyword of the subquery

When we want to remove duplicate values from a subquery, we think that we don’t need to specify the DISTINCT keyword in the subquery, which is not necessary because the database engine will help us remove duplicate values without having to display the distinct keyword.

Second, related sub-query

1. Related sub-query

What is a dependent subquery: A column that references a table that appears in an external query, depends on the external query, and cannot run a subquery independently. Logically, the subquery evaluates each external row individually.

Example: Query each customer to return all orders placed on the last day of his participation.

Expected Results:

Number of affected rows: 90

  1. First, the maximum order date is queried with independent scalar quantum query and returned to the external query
SELECT  MAX(orderdate)
FROM    sales.Orders AS O2
Copy the code
  1. The external query is filtered with o1.orderdate to filter out orders equal to the maximum orderdate

  2. Because we need to query every customer’s order, we change the independent standard quantum query into a related sub-query, and associate the sub-query o2. cusTID with the external query o1. custid.

For each row in O1, the subquery is responsible for returning the maximum order date for the current customer. If the order date of a row in O1 matches the order date returned by the subquery, the order date in O1 is the current customer’s maximum order date, in which case the query returns the row in the O1 table.

SELECT  MAX(orderdate)
FROM    sales.Orders AS O2
WHERE   O2.custid = O1.custid
Copy the code

Combine the above steps to get the following query statement:

SELECT  orderid,orderdate,custid
FROM    sales.Orders AS O1
WHERE   O1.orderdate = ( SELECT MAX(orderdate)
                         FROM   sales.Orders AS O2
                         WHERE  O2.custid = O1.custid
                       )
Copy the code

2. EXISTS predicates

  1. < external query > WHERE EXISTS
  2. Its input is a subquery: the predicate returns TRUE if the subquery can return any rows, and FALSE otherwise.
  3. If there are multiple sub-query results, the SQL SERVER engine returns one record immediately. This method is called short-circuit processing.
  4. The Exist predicate only cares about the existence of matching rows, not the columns specified in the SELECT list, so using SELECT * FROM TABLE has no negative effect, but in order to expandThe column names of the code have less overhead, but are still recommendedWildcard, query statements should be kept as natural and intuitive as possible, unless there is a very compelling reason to sacrifice code in this area.
  5. A NOT EXISTS predicate is the opposite of an EXISTS predicate

3. Exercises

1. Write a query that returns all Orders generated on the last day of activity in the Orders table.

Expected Results:

An independent subquery returns the date of the last day, and an external query filters out all orders whose order date equals the last day.

SELECT  orderid ,
        orderdate ,
        custid ,
        empid
FROM    Sales.Orders
WHERE   orderdate = ( SELECT    MAX(orderdate)
                      FROM      Sales.Orders

                    )
Copy the code

2. Query all orders placed by the customer with the largest number of orders.

Expected Results:

Independent subquery (); independent subquery ();

(1) First use sub-query to query the customer ID with the largest number of orders

(2) Then the ID is returned to the external query, which filters out all orders placed by the customer through the customer ID

Scheme 1: independent scalar quantum query

SELECT  custid ,
        orderid ,
        orderdate ,
        empid
FROM    Sales.Orders
WHERE   custid = ( SELECT TOP ( 1 ) WITH TIES
                            O.custid
                   FROM     Sales.Orders AS O
                   GROUP BY custid
                   ORDER BY COUNT(*) DESC

                 )
Copy the code

Note:

TOP ( 1 ) WITH TIES O.custid
Copy the code

Find all sorted rows equal to the first record, O.custid

Because the total order number of the customer with the most orders is 31 and there is only one customer (CusTID =71), the final query result shows only all orders placed by the customer with CusTID =71.

Select employees who have not processed orders since May 1st, 2008.

Expected Results:

This question examines the use of independent subqueries, which can also use two steps to query the results.

(1) First use the subquery to return all employees who have processed orders after May 1, 2008 (including this date), and return the empIDS of these employees to the external query

(2) Then the external query uses NOT IN to filter out all employees who have NOT processed orders since May 1, 2008 (inclusive)

Scheme 1: independent scale quantum query + NOT IN

SELECT  *
FROM    HR.Employees
WHERE   empid NOT IN ( SELECT   empid
                       FROM     Sales.Orders
                       WHERE    orderdate > = '20080501' )
Copy the code

4. Query the customer who placed an order in 2007 but did not place an order in 2008

Expected output:

 

Scheme 1: internal join + independent scalar quantum query

  1. Query Collection1 from 20070101 to 20071231
SELECT DISTINCT C.custid,companyname FROM Sales.Orders O
	INNER JOIN Sales.Customers AS C ON C.custid = O.custid
	WHERE (orderdate < = '20071231' AND orderdate > = '20070101')
Copy the code
  1. Query all customers who placed orders from 20080101 to 20081231 with Collection2
SELECT C.custid FROM Sales.Orders O
	INNER JOIN Sales.Customers AS C ON C.custid = O.custid
	WHERE (orderdate < = '20081231' AND orderdate > = '20080101')
Copy the code

3. The subset of Collection1 that does not contain Collection2 is the customer that placed an order in 2007 and placed an order in 2008

SELECT DISTINCT C.custid,companyname FROM Sales.Orders O
	INNER JOIN Sales.Customers AS C ON C.custid = O.custid
	WHERE (orderdate < = '20071231' AND orderdate > = '20070101')
	AND C.custid NOT IN 
(
	SELECT C.custid FROM Sales.Orders O
	INNER JOIN Sales.Customers AS C ON C.custid = O.custid
	WHERE (orderdate < = '20081231' AND orderdate > = '20080101'))Copy the code

Scheme 2: The related subquery EXISTS+NOT EXISTS

  1. Query Collection1 from 20070101 to 20071231

  2. Query all customers who placed orders from 20080101 to 20081231 with Collection2

3. The subset of Collection1 that does not contain Collection2 is the customer that placed an order in 2007 and placed an order in 2008

SELECT  C.custid ,
        companyname
FROM    Sales.Customers AS C
WHERE   EXISTS ( SELECT *
                 FROM   Sales.Orders AS O
                 WHERE  O.custid = C.custid
                        AND ( orderdate < = '20071231'
                              AND orderdate > = '20070101'))AND NOT EXISTS ( SELECT *
                         FROM   Sales.Orders AS O
                         WHERE  O.custid = C.custid
                                AND ( orderdate < = '20081231'
                                      AND orderdate > = '20080101'))Copy the code

According to scheme 1 and 2, we can conclude that INNER JOIN+ independent subquery can be replaced by Exists+ related subquery

5. Query the customer who ordered Product No. 12

Expected Results:

Scheme 1: internal join multiple tables

SELECT DISTINCT
        C.custid ,
        companyname
FROM    Sales.Customers AS C
        INNER JOIN Sales.Orders AS O ON C.custid = O.custid
        INNER JOIN Sales.OrderDetails AS D ON O.orderid = D.orderid
WHERE   D.productid = '12'
Copy the code

Scheme 2: Nested related subqueries

SELECT  C.custid ,
        companyname
FROM    Sales.Customers AS C
WHERE   EXISTS ( SELECT *
                 FROM   Sales.Orders AS O
                 WHERE  O.custid = C.custid
                        AND EXISTS ( SELECT *
                                     FROM   Sales.OrderDetails AS D
                                     WHERE  D.orderid = O.orderid
                                            AND D.productid = '12'))Copy the code

References:

SQL2008 technical Insider: T-SQL Language Basics

Welcome to follow my official account: “Wukong Chat Framework”

About the author: 8-year Veteran of Internet workplace | full stack engineer | super dad after 90 | open source practitioner | owner of public number 10,000 fans original number. Blue Bridge signed the author, the author of “JVM performance tuning practice” column, handwritten a set of 70,000 words SpringCloud practice summary and 30,000 words distributed algorithm summary. Welcome to follow my public account “Wukong Chat framework”, free access to learning materials.

I am Wukong, strive to become stronger, become super Saiya people!