Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

Subqueries join with tables

  • Subqueries (nested SQL)

SELECT statements are SQL queries. All the SELECT statements we’ve seen so far have been simple queries, that is, a single statement that retrieves data from a single database table.

SQL also allows you to create subqueries, which are queries nested within other queries.

Use subqueries for filtering

The order is stored in two tables. The Orders table stores a row for each order that contains the order number, customer ID, and order date. The items for each order are stored in the associated OrderItems table. The Orders table does not store customer information. It stores only the customer ID.

The actual customer information is stored in the Customers table.

Now, if I need to list all the customers who ordered the item TNT2, how do I retrieve it?

--(1) Retrieve all order numbers containing item TNT2.
 select order_num from orderitems where prod_id = 'TNT2';
+-----------+
| order_num |
+-----------+
|     20005 |
|     20007 |
+-----------+
--(2) Retrieve the ids of all customers with the order numbers listed in the previous step
 select cust_id from orders where order_num IN (20005.20007);
+---------+
| cust_id |
+---------+
|   10001 |
|   10004 |
+---------+
--(3) Retrieve customer information for all customer ids returned in the previous step.
 select cust_name,cust_contact from customers where cust_id in (10001.10004);
+----------------+--------------+
| cust_name      | cust_contact |
+----------------+--------------+
| Coyote Inc.    | Y Lee        |
| Yosemite Place | Y Sam        |
+----------------+--------------+
Copy the code

Instead of hard-coding the data returned by the SQL, we can convert the WHERE clauses into subqueries:

 select cust_name,cust_contact 
 from customers 
 where cust_id in (select cust_id 
 					from orders 
 					where order_num IN (select order_num 
 										from orderitems 
 										where prod_id = 'TNT2'));
+----------------+--------------+
| cust_name      | cust_contact |
+----------------+--------------+
| Coyote Inc.    | Y Lee        |
| Yosemite Place | Y Sam        |
+----------------+--------------+
MySQL actually has to execute 3 SELECT statements in order to execute the above SELECT statement.
The innermost subquery returns a list of order numbers used in the WHERE clause of the outer subquery.
The outer subquery returns a list of customer ids used in the WHERE clause of the outermost query.
The outermost query does return the required data.
Copy the code

The code presented here works and gets the desired result.

However, using subqueries is not always the most efficient way to perform this type of data retrieval.

Use subqueries as computed fields

Another way to use subqueries is to create computed fields.

Suppose we need to display the total number of orders for each customer in the Customers table. The order is stored in the Orders table with the corresponding customer ID.

Retrieve the list of customers from the Customers table.
 select cust_id,cust_name from customers ;
+---------+----------------+
| cust_id | cust_name      |
+---------+----------------+
|   10001 | Coyote Inc.    |
|   10002 | Mouse House    |
|   10003 | Wascals        |
|   10004 | Yosemite Place |
|   10005 | E Fudd         |
+---------+----------------+

-- (2) Count the number of orders in the Orders table for each customer retrieved.
 select count(*) as orders  from orders  where cust_id = 10001;
+--------+
| orders |
+--------+
|      2 |
+--------+
Copy the code

To perform a COUNT() calculation for each customer, COUNT() should be treated as a subquery.

select cust_id,cust_name,
		(select count(*) 
         	from orders
         	where orders.cust_id = customers.cust_id) as orders 
         )
from customers 
order by cust_name;
+---------+----------------+--------+
| cust_id | cust_name      | orders |
+---------+----------------+--------+
|   10001 | Coyote Inc.    |      2 |
|   10005 | E Fudd         |      1 |
|   10002 | Mouse House    |      0 |
|   10003 | Wascals        |      6 |
|   10004 | Yosemite Place |      1 |
+---------+----------------+--------+
Copy the code

Orders is a computed field that is established by the subquery in parentheses. This subquery is executed once for each customer retrieved. In this example, the subquery was executed five times because five customers were retrieved.

Note: The WHERE clause in the subquery is slightly different from the previous WHERE clause because it uses fully qualified column names. This type of subquery is called a related subquery. This syntax must be used whenever column names can be ambiguous (table names and column names are separated by a period). Because there are two CUST_ID columns, one in Customers and one in Orders, you need to compare the two columns to correctly match the orders with their corresponding customers. If the column names are not fully qualified, MySQL will assume that you are comparing cust_ID in the Orders table to yourself.

  • A relational table

    One of the most powerful features of SQL is the ability to join tables in the execution of a data retrieval query.

    Before you can use joins effectively, you must understand some of the basics of relational tables and relational database design.

    Suppose you have a database table that contains a catalog of products in a row for each category.
    The information to be stored for each item includes product description and price, as well as information about the supplier that produced the product.Product list: Product, description, price, Supplier name, Supplier address, Supplier contact information A6... . Faw Audi... .520li .. . BMW... . .Now, if there are multiple items produced by the same supplier, where will the supply be stored
    -- What about the business information (supplier name, address, contact information, etc.)?Product, description, price, supplier name, supplier address, supplier contact information A6... . Faw Audi... .520li .. . BMW... . A8 .. . Audi... .Copy the code

    Having the same data more than once is never a good thing, and this factor is fundamental to relational database design.

    Relational tables are designed to ensure that information is decomposed into multiple tables, one table for each type of data.

    The tables are related to each other through some common values, known as relational in relationship design.

    In this example, two tables can be created, one to store vendor information and the other to store product information.

    The VENDORS table contains all vendor information
    |vend_id | vend_name | vend_address| vend_city ....
    
    The -- Products table stores only product information, and it stores no vendor information other than vendor ids (vendors' primary key).
    prod_id | vend_id | prod_name | prod_price | prod_desc
    
    Copy the code

    The vendors table’s primary key, also known as the foreign key of Products, associates the Vendors table with the Products table and uses the vendor ID to find out the vendors’ details from the Vendors table. The benefits of this are as follows:

    • Supplier information is not repeated, so as not to waste time and space;
    • If vendor information changes, only individual records in the VENDORS table can be updated, leaving the data in the vendors table unchanged;
    • Since there is no duplication of data, it is obvious that the data is consistent, which makes processing the data easier

    Relational data can be stored efficiently and easily processed. Therefore, relational databases scale much better than non-relational databases.

  • Table joins

    If data is stored in multiple tables, how can a single SELECT statement retrieve the data?

    The answer is to use joins. Simply put, a join is a mechanism for associating tables in a SELECT statement, hence the name join.

    Using special syntax, you can join multiple tables to return a set of output, joining the correct rows in the associated table at run time.

    Join creation is simple, specifying all the tables to join and how they are related.
    select vend_name,prod_name,prod_price 
    from vendors,products 
    where vendors.vend_id = products.vend_id 
    order by vend_name,prod_name;
    +-------------+----------------+------------+
    | vend_name   | prod_name      | prod_price |
    +-------------+----------------+------------+
    | ACME        | Bird seed      |      10.00 |
    | ACME        | Carrots        |       2.50 |
    | ACME        | Detonator      |      13.00 |
    | ACME        | Safe           |      50.00 |
    | ACME        | Sling          |       4.49 |
    | ACME        | TNT (1 stick)  |       2.50 |
    | ACME        | TNT (5 sticks) |      10.00 |
    | Anvils R Us | . 5 ton anvil   |       5.99 |
    | Anvils R Us | 1 ton anvil    |       9.99 |
    | Anvils R Us | 2 ton anvil    |      14.99 |
    | Jet Set     | JetPack 1000   |      35.00 |
    | Jet Set     | JetPack 2000   |      55.00 |
    | LT Supplies | Fuses          |       3.42 |
    | LT Supplies | Oil can        |       8.99 |
    +-------------+----------------+------------+
    14 rows in set (0.00 sec)
    
    The two tables are joined correctly with the WHERE clause, which indicates that MySQL matches vend_id in the VENDORS table and vend_id in the Products table.
    -- You can see that the two columns to match are specified as vendors. Vend_id and products.vend_id. This fully qualified column name is needed here, because if only vend_id is given, MySQL doesn't know which one it refers to (there are two of them, one in each table).
    Fully qualified column names (table and column names separated by a dot) must be used when the referenced column may be ambiguous.
    Copy the code

    When joining two tables, what you’re really doing is pairing each row in the first table with each row in the second table.

    As a filter condition, the WHERE clause contains only those rows that match the given condition (in this case, the join condition).

    Can you imagine the above SQL without the WHERE condition?

    select vend_name,prod_name,prod_price from vendors,products

    Without the WHERE condition, each row in the first table pairs with each row in the second table, regardless of whether they logically match

    The result returned by a table relation with no join condition is the Cartesian product. The number of rows retrieved will be the number of rows in the first table multiplied by the number of rows in the second table.

    Don’t forget the WHERE clause

    You should ensure that all joins have a WHERE clause, otherwise MySQL will return much more data than it wants.

    Similarly, the WHERE clause should be guaranteed to be correct. Incorrect filtering criteria will cause MySQL to return incorrect data

    In fact, a slightly different syntax can be used for this type of join to explicitly specify the type of join.

    select vend_name,prod_name,prod_price from vendors inner join products on vendors.vend_id = products.vend_id;

    The relationship between the two tables is part of the FROM clause and is specified as an INNER JOIN.

    When using this syntax, join conditions are given in a specific ON clause instead of a WHERE clause.

    The actual conditions passed to ON are the same as those passed to WHERE.

    The SQL specification prefers the INNER JOIN syntax.

  • Join multiple tables

    SQL has no limit on the number of tables that can be joined in a SELECT statement.

    The basic rules for creating joins are the same. First list all the tables, then define the relationships between the tables.

    select prod_name,vend_name,prod_price,quantity 
    from orderitems,products,vendors 
    	where products.vend_id = vendors.vend_id 
    	and orderitems.prod_id = products.prod_id 
    	and order_num = 20005;
    Copy the code

    MySQL associates each table specified at run time to handle joins. This process can be very resource-intensive, so care should be taken not to join unnecessary tables. The more tables are joined, the more performance degrades.

  • Use the table alias AS

    In addition to being used for column names and computed fields, SQL also allows you to alias table names.

    There are two main reasons for doing this:

    • Shorten SQL statements;

    • Allows the same table to be used multiple times in a single SELECT statement

    It should be noted that table aliases are only used in query execution. Unlike column aliases, table aliases are not returned to the client

  • Since the coupling

    Let’s say you find a problem with an item whose ID is DTNTR and want to know if the supplier that made the item has problems with other items. This query requires that you first find the vendor that produces the item with ID DTNTR, and then find out what other items that vendor produces.

     -- Use subqueries (nested queries)
     select prod_id,prod_name 
     from products 
     	where vend_id = (select vend_id from products where prod_id = 'DTNTR');
    +---------+----------------+
    | prod_id | prod_name      |
    +---------+----------------+
    | DTNTR   | Detonator      |
    | FB      | Bird seed      |
    | FC      | Carrots        |
    | SAFE    | Safe           |
    | SLING   | Sling          |
    | TNT1    | TNT (1 stick)  |
    | TNT2    | TNT (5 sticks) |
    +---------+----------------+
    
    -- Same query with join:
     select p1.prod_id,p1.prod_name 
     from products as p1,products as p2 
     	where p1.vend_id = p2.vend_id and p2.prod_id = 'DTNTR';
    +---------+----------------+
    | prod_id | prod_name      |
    +---------+----------------+
    | DTNTR   | Detonator      |
    | FB      | Bird seed      |
    | FC      | Carrots        |
    | SAFE    | Safe           |
    | SLING   | Sling          |
    | TNT1    | TNT (1 stick)  |
    | TNT2    | TNT (5 sticks) |
    +---------+----------------+
    The two tables needed in this query are actually the same tables, so the Products table appears twice in the FROM clause. Although this is perfectly legal, a reference to Products is ambiguous because MySQL doesn't know which instance of the Products table you are referring to.
    To resolve this problem, table aliases are used. The first occurrence of products is alias P1 and the second occurrence is alias P2. You can now use these aliases as table names.
    For example, the SELECT statement explicitly gives the full name of the desired column using the P1 prefix. If not, MySQL will return an error because there are two columns named prod_id and prod_NAME, respectively. MySQL does not know which column it wants (even though they are actually the same column). WHERE (by matching vend_id in P1 and vend_id in P2) joins the two tables first, then filters the data against prod_ID in the second table to return the required data
    Copy the code

    Self-joins are often used as external statements to replace the subquery statements used to retrieve data from the same table.

    Although the end result is the same, sometimes processing joins is much faster than processing subqueries.

  • External links

    Many joins associate rows in one table with rows in another table. But sometimes you need to include rows that have no associated rows.

    For example, you might need to use joins to do the following:

    • Count how many orders have been placed by each customer, including those that have not been placed so far;
    • List all products and order quantity, including products that no one ordered;
    • Calculate average sales size, including customers who have not placed orders so far

    In the example above, the join contains rows that have no associated rows in related tables. This type of join is called an outer join.

    -- Internal connection. It retrieves all customers and their orders:
    select customers.cust_id,orders.order_num from customers inner join orders on customers.cust_id = orders.cust_id;
    +---------+-----------+
    | cust_id | order_num |
    +---------+-----------+
    |   10001 |     20005 |
    |   10001 |     20009 |
    |   10003 |     20006 |
    |   10004 |     20007 |
    |   10005 |     20008 |
    +---------+-----------+
    5 rows in set (0.00 sec)
    
    External join syntax is similar. Retrieve all customers, including those without orders
     select customers.cust_id,orders.order_num from customers left join orders on customers.cust_id = orders.cust_id;
    +---------+-----------+
    | cust_id | order_num |
    +---------+-----------+
    |   10001 |     20005 |
    |   10001 |     20009 |
    |   10002 |      NULL |
    |   10003 |     20006 |
    |   10004 |     20007 |
    |   10005 |     20008 |
    +---------+-----------+
    6 rows in set (0.00 sec)
    Copy the code

    Aggregation functions can also be easily used with other joins.

    If you want to retrieve all customers and the number of orders placed by each customer, the code below uses the COUNT() function to do this

    Includes customers who didn’t place any orders.

     select customers.cust_name,customers.cust_id,count(orders.order_num) as num_ord from customers left join orders on customers.cust_id = orders.cust_id group by customers.cust_id;
    +----------------+---------+---------+
    | cust_name      | cust_id | num_ord |
    +----------------+---------+---------+
    | Coyote Inc.    |   10001 |       2 |
    | Mouse House    |   10002 |       0 |
    | Wascals        |   10003 |       1 |
    | Yosemite Place |   10004 |       1 |
    | E Fudd         |   10005 |       1 |
    +----------------+---------+---------+
    Copy the code
    • Ensure that the correct join conditions are used, otherwise incorrect data will be returned.
    • You should always provide the join condition, otherwise you get the Cartesian product.
    • You can have multiple tables in a join, and you can even use different join types for each join. While this is legal and generally useful, you should test each join separately before testing them together. This will make troubleshooting easier
  • Combined query UNION

    MySQL also allows you to execute multiple queries (multiple SELECT statements) and return the results as a single query result set.

    These combined queries are often called union or compound Queries.

    The UNION rules

    • A UNION must consist of two or more SELECT statements separated by the keyword UNION (therefore, if you combine four SELECT statements, you will use three UNION keywords).
    • Each query in the UNION must contain the same column, expression, or aggregate function (although the columns need not be listed in the same order)
    • Column data types must be compatible: the types do not have to be identical, but they must be types that the DBMS can implicitly convert (for example, different numeric types or different date types).
    Let's say you need a list of all items with a price less than or equal to 5, and you also want to include all items made by suppliers 1001 and 1002.
    
    -- Query the first result
     select vend_id,prod_id,prod_price from products where prod_price < = 5;
    +---------+---------+------------+
    | vend_id | prod_id | prod_price |
    +---------+---------+------------+
    |    1003 | FC      |       2.50 |
    |    1002 | FU1     |       3.42 |
    |    1003 | SLING   |       4.49 |
    |    1003 | TNT1    |       2.50 |
    +---------+---------+------------+
    4 rows in set (0.00 sec)
    
    -- Query the second result
     select vend_id,prod_id,prod_price from products where vend_id in(1001.1002);
    +---------+---------+------------+
    | vend_id | prod_id | prod_price |
    +---------+---------+------------+
    |    1001 | ANV01   |       5.99 |
    |    1001 | ANV02   |       9.99 |
    |    1001 | ANV03   |      14.99 |
    |    1002 | FU1     |       3.42 |
    |    1002 | OL1     |       8.99 |
    +---------+---------+------------+
    5 rows in set (0.00 sec)
    
    
    Use union to execute two SQL statements together
     select vend_id,prod_id,prod_price from products where prod_price < = 5 
     union 
     select vend_id,prod_id,prod_price from products where vend_id in(1001.1002);
    +---------+---------+------------+
    | vend_id | prod_id | prod_price |
    +---------+---------+------------+
    |    1003 | FC      |       2.50 |
    |    1002 | FU1     |       3.42 |
    |    1003 | SLING   |       4.49 |
    |    1003 | TNT1    |       2.50 |
    |    1001 | ANV01   |       5.99 |
    |    1001 | ANV02   |       9.99 |
    |    1001 | ANV03   |      14.99 |
    |    1002 | OL1     |       8.99 |
    +---------+---------+------------+
    8 rows in set (0.09 sec)
    
    This statement consists of the previous two SELECT statements separated by the UNION keyword.
    -- UNION instructs MySQL to execute two SELECT statements and combine the output into a single query result set
    Here is the same result, using multiple conditions for where
    select vend_id,prod_id,prod_price from products where prod_price < = 5 or vend_id in (1001.1002);
    +---------+---------+------------+
    | vend_id | prod_id | prod_price |
    +---------+---------+------------+
    |    1001 | ANV01   |       5.99 |
    |    1001 | ANV02   |       9.99 |
    |    1001 | ANV03   |      14.99 |
    |    1003 | FC      |       2.50 |
    |    1002 | FU1     |       3.42 |
    |    1002 | OL1     |       8.99 |
    |    1003 | SLING   |       4.49 |
    |    1003 | TNT1    |       2.50 |
    +---------+---------+------------+
    8 rows in set (0.00 sec)
    
    In this simple example, using UNION might be more complicated than using the WHERE clause.
    But for more complex filtering conditions, or cases where data is retrieved from multiple tables rather than a single table, using a UNION might make processing simpler.
    
    Copy the code
    • Now consider the following question: How many pieces of data do the statements above return?

    • The first SQL returned 4 rows, the second SQL returned 5 rows, so how many rows did the union return?

      The UNION automatically removes duplicate rows from the query result set (in other words, it behaves the same as using multiple WHERE clause conditions in a single SELECT statement).

      This is the default behavior of the UNION, but you can change it if you want. If you want to return ALL matching rows, you can use UNION ALL instead of UNION

    • Sort the combined query results

      The output of the SELECT statement is ordered BY the ORDER BY clause. When combining queries with a UNION, only one ORDER BY clause can be used, which must appear after the last SELECT statement.

      There is no such thing as ordering one part of a result set in one way and ordering another part in another way, so multiple ORDER BY clauses are not allowed.

   select vend_id,prod_id,prod_price from products where prod_price < = 5 union select vend_id,prod_id,prod_price from products where vend_id in(1001.1002) order by  prod_price;
  +---------+---------+------------+
  | vend_id | prod_id | prod_price |
  +---------+---------+------------+
  |    1003 | FC      |       2.50 |
  |    1003 | TNT1    |       2.50 |
  |    1002 | FU1     |       3.42 |
  |    1003 | SLING   |       4.49 |
  |    1001 | ANV01   |       5.99 |
  |    1002 | OL1     |       8.99 |
  |    1001 | ANV02   |       9.99 |
  |    1001 | ANV03   |      14.99 |
  +---------+---------+------------+
  8 rows in set (0.00 sec)
  
  -- This UNION uses the ORDER BY clause after the last SELECT statement.
  Although the ORDER BY clause appears to be just part of the last SELECT statement, MySQL actually uses it to sort all the results returned BY all SELECT statements.
Copy the code

Finally, welcome to pay attention to my personal wechat public account “Little Ape Ruochen”, get more IT technology, dry goods knowledge, hot news