SQL complex queries refer to subqueries.

Why are subqueries called complex queries? Because subqueries are equivalent to query nested queries, and because nesting results in almost infinite complexity (infinite nesting), they are called complex queries. Here is a simple example of a subquery:

SELECT pv FROM (
  SELECT pv FROM test
)
Copy the code

Mysql > SELECT pv FROM test; mysql > SELECT pv FROM test; mysql > SELECT pv FROM test; So complex queries are not necessarily complex, and you may even write complex queries equivalent to ordinary queries. Avoid this meaningless behavior.

We’ll also take this opportunity to understand why subqueries can do this.

Understand the nature of queries

When we look up a table, what does the database think we’re looking up?

This is important because both statements are legal:

SELECT pv FROM test

SELECT pv FROM (
  SELECT pv FROM test
)
Copy the code

Why can a database treat a subquery as a table? To understand these concepts, it is necessary to abstract the query content: any query location is one or more records.

SELECT pv FROM test (pv FROM test), pv FROM test (pv FROM test), pv FROM test (pv FROM test)

Not only can “FROM” be followed BY a single entry or multiple entries, but also SELECT, GROUP BY, WHERE, and HAVING can be followed BY multiple entries.

At this point, it’s easy to understand the subquery variations, such as WHERE or GROUP BY, in a subquery, because the query results in multiple records anyway:

SELECT sum(people) as allPeople, sum(gdp), city FROM (
  SELECT people, gdp, city FROM test
  GROUP BY city
  HAVING sum(gdp) > 10000
)
Copy the code

This example has some business implications. Subqueries are executed from the inside out, so we look at the internal logic first: filter out the population breakdown of all regions with a total GDP of more than 10,000, grouped by city. The outer query then adds up the total population, so that we can compare the total population and GDP of each region with GDP over 10,000, which is convenient for comparison of these key cities.

This example is still a bit unnatural, because we don’t need to write complex queries, but simple queries are equivalent:

SELECT sum(people) as allPeople, sum(gdp), city FROM test
GROUP BY city
HAVING sum(gdp) > 10000
Copy the code

So why bother? Because the real use of complex queries is not here.

view

Because subqueries exist, it is possible to extract subqueries in a similar way to extracting variables. The extracted abstraction is the view:

CREATE VIEW my_table(people, gdp, city)
AS
SELECT sum(people) as allPeople, sum(gdp), city FROM test
GROUP BY city
HAVING sum(gdp) > 10000

SELECT sum(people) as allPeople, sum(gdp), city FROM my_table
Copy the code

The advantage is that this view can be reused by multiple SQL statements, which not only makes it more maintainable, but also requires only one query when executed.

Note that SELECT can be used with any view, but INSERT, DELETE, UPDATE for a view must satisfy the following conditions:

  1. DISTINCT deduplication is not used.
  2. FROM a single table.
  3. GROUP BY and HAVING are not used.

Because the above modes all lead to the view as aggregated data, it is not convenient to do other operations than lookup.

Another knowledge point is the MATERIALIZED view, that is, the use of MATERIALIZED description view:

CREATE MATERIALIZED VIEW my_table(people, gdp, city)
AS.Copy the code

This view will fall off the disk, why support this feature? As a temporary table, a common view cannot use indexes and other optimization methods, resulting in low query performance. Therefore, materialized view is a common performance optimization method.

When it comes to performance optimizations, there are some common ideas that spread read complexity over write, such as aggregating new table drops in advance or solidifying CASE statements into fields.

Scalar quantum query

WHERE can also be queried with subqueries, such as:

SELECT city FROM test
WHERE gdp > (
  SELECT avg(gdp) from test
)
Copy the code

This allows you to find cities with above-average GDP.

So why can’t I just write it this way?

SELECT city FROM test
WHERE gdp > avg(gdp) WHERE cannot use aggregate functions
Copy the code

It looks nice, but in fact we introduced in the first article WHERE cannot be followed by an aggregate query, because that would aggregate the entire parent query. So why do subqueries work? Since the subquery is aggregated as a subquery, the parent query is not aggregated, so this fits our intent.

So where the above example is not appropriate, using AVG (GDP) directly on the current query will lead to aggregation, and we don’t want to aggregate the current query, but we want to aggregate the average GDP, so we need to use sub-query!

In retrospect, why is this section called scaled-quantum queries? Scalars are single values, and since AVG (GDP) aggregates only one value, WHERE can be used as a single value. On the other hand, if a subquery does not use an aggregate function or GROUP BY, then you cannot use WHERE >, but you can use WHERE IN, which involves thinking about single versus multiple records, which we’ll move on to the next section.

Single and multiple records

The introduction to scalar quantum queries stated that WHERE > must be a single value. It is possible to use a subquery WHERE to return multiple records, as long as the appropriate conditional statement is used, such as IN:

SELECT area FROM test
WHERE gdp IN (
  SELECT max(gdp) from test
  GROUP BY city
)
Copy the code

IN the example above, the sub-query groups by city and finds the record with the largest GDP IN each group. Therefore, if the data granularity is region, then we find the records with the largest GDP IN each city, and then the parent query finds the complex results that match GDP by WHERE IN. So we ended up listing the regions with the largest GDP in each city.

But in fact, the WHERE > statement and the plural query results will not error, but does not make any sense, so we need to understand whether the query results are single or multiple, and select the appropriate conditions when determining WHERE. The syntax for the plural query results is WHERE IN, WHERE SOME, and WHERE ANY.

Associated subquery

The so-called associated sub-query, that is, there is an association between the father and son query, in this case, the sub-query must not be independently executed first, after all, there is an association with the parent query, so the associated sub-query is to execute the outer layer query, and then execute the inner layer query. Note that for each parent query, the child query is executed once, so performance is not high (of course, SQL will cache the results of the child query with the same parameters).

So what is this correlation? What is associated is the condition that the child query executes when the parent query of each row. This may be a little convoluted, but here’s an example:

SELECT * FROM test where gdp > (
  select avg(gdp) from test
  group by city
)
Copy the code

For this example, we want to find the average GDP that is greater than the average GDP grouped by cities, for example, Beijing is compared by Beijing and Shanghai is compared by Shanghai. SQL does not know how to compare a parent query to the same city, so simply add a WHERE condition to create an associative subquery:

SELECT * FROM test as t1 where gdp > (
  select avg(gdp) from test as t2 where t1.city = t2.city
  group by city
)
Copy the code

The result of the sub-query is recalculated and the average value is limited to the same cities every time the condition of WHERE GDP > is judged.

conclusion

Learn to flexibly use the parent and child query, to master the complex query.

SQL first citizen is a set, so the so-called father and son query is a flexible combination of the father and son set, these sets can appear in almost any position, according to the number of sets, whether aggregation, associated conditions, derived scalar query, associated sub-query.

More in-depth understanding requires a lot of practice cases, but it’s not all that different, and once you’ve mastered complex queries, you’ll be able to understand most SQL cases.

The discussion address is: intensive reading SQL Complex query · Issue #403 · ascoders/weekly

If you’d like to participate in the discussion, pleaseClick here to, with a new theme every week, released on weekends or Mondays. Front end Intensive Reading – Helps you filter the right content.

Copyright Notice: Freely reproduced – Non-commercial – Non-derivative – Remain signed (Creative Commons 3.0 License)