Many programmers see SQL as a threat. SQL is one of the few declarative languages that operates in a completely different way from command-line languages, object-oriented programming languages, and even functional languages (although some consider SQL to be a functional language as well).

We write SQL every day and use it in open source software jOOQ. I want to introduce the beauty of SQL to those who still struggle with it, so this article has been written specifically for the following readers:

1. People who use SQL in their work but don’t fully understand it.

2. People who can skillfully use SQL but do not understand its syntax logic.

3. People who want to teach others SQL.

This article focuses on the SELECT sentence. Other Data Manipulation Language (DML) commands will be introduced in other articles.

Fully Understand SQL in 10 easy steps

SQL is a declarative language

The first thing to keep in mind is this concept: “declaration.” SQL is an example of what you want from raw data, rather than telling the computer how to get it. Isn’t that great?

To put it simply, SQL declares attributes of a result set, and the computer uses SQL to select data from the database that matches the declaration, rather than telling the computer what to do.

SELECT first_name, last_name FROM employees WHERE salary > 100000

The above example is easy to understand, we don’t care where the employee records come from, all we need is the data of the people with high salaries.

Where do we learn this?

If SQL is so simple, what makes people “smell bad”? The main reason is that we subconsciously think in terms of imperative programming. Something like this: “Computer, do this step first, then that step, but first check if conditions A and B are met.” For example, using variables to pass arguments, using looping statements, iterating, calling functions, and so on, are all thought habits of imperative programming.

SQL syntax is not executed in syntactic order

A feature of SQL statements that confuses most people is that the order in which SQL statements are executed does not match the syntax order of the statements. The syntax order of SQL statements is:

  • SELECT[DISTINCT]
  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • UNION
  • ORDER BY

SQL statements are executed in a different order than they are in the order in which they are executed:

  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • SELECT
  • DISTINCT
  • UNION
  • ORDER BY

There are three things to note about the order in which SQL statements are executed:

1, FROM is the first step of SQL statement execution, not SELECT. The first step for a database to execute an SQL statement is to load data from the hard disk into a data buffer so that it can be manipulated. (Translator’s Note: The first thing that happens is loading data from The disk into memory, in order to operate on such data. In common databases, such as Oracle, data is extracted from the hard disk into a data buffer.)

SELECT is executed after most of the statements have been executed, strictly after FROM and GROUP BY. It’s important to understand this, which is why you can’t use a field in WHERE that you aliased in SELECT as a judgment condition.

SELECT A.x + A.y AS z FROM A WHERE z = 10Copy the code

If you want to reuse the alias Z, you have two options. Or we can rewrite the expression z represents:

SELECT A.x + A.y AS z
FROM A
WHERE (A.x + A.y) = 10
Copy the code

… Or resort to derived tables, generic data expressions, or views to avoid alias reuse. See the examples below.

3. UNION always precedes ORDER BY, both syntactically and in ORDER of execution. Many people think that every UNION segment can be sorted using ORDER BY, but this is not true based on the SQL language standard and how SQL is executed in different databases. Although some databases allow SQL statements to sort subqueries or derived tables, this does not mean that the order remains the same after the UNION operation.

Note: Not all databases parse SQL statements the same way. MySQL, PostgreSQL, and SQLite don’t do this in the second point above.

What have we learned?

Since not all databases execute SQL predictions in this way, what are we learning? The lesson is to always remember that the syntax order of SQL statements is not the same as the order in which they are executed, so that we can avoid common errors. If you can remember the difference between the syntax order of SQL statements and the order in which they are executed, you can easily understand some common SQL problems.

Of course, a language is programmer friendly if its syntax order is designed to directly reflect the order in which its statements are executed, and this programming language design concept has been applied by Microsoft in LINQ.

3, SQL language is the core of the table references (table References)

Because of the difference between the syntax order and execution order of SQL statements, many students will think that the field information in SELECT is the core of SQL statements. The real core is the reference to the table.

According to SQL, the FROM statement is defined as:

<from clause> ::= FROM <table reference> [ { <comma> <table reference> }... ]

The “output” of the FROM statement is a union table FROM the union of all referenced tables on a dimension. Let’s take our time:

FROM a, b

The output of the above FROM statement is a join table that joins tables A and B. If table A has three fields and table B has five fields, the “output table” has eight (=5+3) fields.

The data in this joint table is ab, the Cartesian product of a and B. In other words, every entry in table A is paired with every entry in table B. If table A has three data items and table B has five data items, then the combined table will have 15 (=53) data items.

The result of the FROM output is filtered BY the WHERE statement and processed BY the GROUP BY statement to form a new output. And we’ll come back to that.

If we look at it from the point of view of set theory (relational algebra), a table of a database is a set of relationships of data elements, and each SQL statement changes one or more of these relationships, resulting in new relationships of data elements (that is, new tables).

What have we learned?

When you think about a problem in terms of a table, it’s easy to understand how the data moves through the pipeline of SQL statements.

4. Flexible table references make SQL statements more powerful

Flexible table references can make SQL statements more powerful. A simple example is the use of joins. Strictly speaking, the JOIN statement is not part of the SELECT, but is a special table reference statement. The join definition of a table in the SQL language standard is as follows:

<table reference> ::=
    <table name>
  | <derived table>
  | <joined table>
Copy the code

Take the previous example:

FROM a, b

A may enter the following connections:

a1 JOIN a2 ON a1.id = a2.id

Put it in the previous example and it becomes:

FROM a1 JOIN a2 ON a1.id = a2.id, b

Although it is not common practice to join one join table with another by commas, you can do it. As a result, the final output table has a1+ A2 + B fields.

(translator’s Note: The degree of the degree is the degree of the degree.) If we view a table, we can imagine that each table is composed of two horizontal and vertical dimensions. The horizontal dimension is called columns. The vertical dimension represents each piece of data (record).

The use of derived tables in SQL statements is even more powerful than table joins, which we’ll cover next.

What have we learned?

Thinking in terms of table references makes it easy to understand how data is processed by SQL statements and helps you understand what complex table references do.

More importantly, understand that a JOIN is the keyword that builds the JOIN table, not part of the SELECT statement. Some databases allow joins for INSERTS, updates, and deletes.

5. Table joins are recommended for SQL statements

Let’s take a look at this sentence:

FROM a, b

Advanced SQL programmers may learn to advise you not to use commas to JOIN tables instead of joins. This will improve the readability of your SQL statements and avoid errors.

Using commas to simplify SQL statements can sometimes cause mental confusion. Consider the following statement:

FROM a, b, c, d, e, f, g, h
WHERE a.a1 = b.bx
AND a.a2 = c.c1
AND d.d1 = b.bc
-- etc...
Copy the code

The benefits of using the JOIN statement are as follows:

  • Security. The JOIN is very close to the table to JOIN so that errors are avoided.

  • More JOIN methods, the JOIN statement can distinguish between outer JOIN and inner JOIN.

What have we learned?

Always JOIN tables using joins and never use commas after FROM.

6. Different join operations in SQL statements

There are five basic ways to join tables in SQL statements:

  • EQUI JOIN
  • SEMI JOIN
  • ANTI JOIN
  • CROSS JOIN
  • DIVISION

EQUI JOIN

This is the most common type of JOIN operation, and it contains two types of JOIN:

  • INNER JOIN (or JOIN)
  • OUTER JOIN (including: LEFT, RIGHT, FULL OUTER JOIN)

The difference is best illustrated by an example:

-- This table reference contains authors and their books.
-- There is one record for each book and its author.
-- authors without books are NOT included
author JOIN book ON author.id = book.author_id

-- This table reference contains authors and their books
-- There is one record for each book and its author.
-- ... OR there is an "empty" record for authors without books
-- ("empty" meaning that all book columns are NULL)
author LEFT OUTER JOIN book ON author.id = book.author_id

Copy the code

SEMI JOIN

This connection can be represented IN SQL IN one of two ways: IN or EXISTS. SEMI is Latin for “half”. This joins only part of the target table. What does that mean? Think again about the link above about author and title. Let’s imagine a situation where we don’t need the author/title combination, just the author information for those books in the title list. So we could write:

-- Using IN
FROM author
WHERE author.id IN (SELECT book.author_id FROM book)

-- Using EXISTS
FROM author
WHERE EXISTS (SELECT 1 FROM book WHERE book.author_id = author.id)

Copy the code

While there are no strict rules about when you should use IN and when you should use EXISTS, there are a few things you should know:

  • IN is more readable than EXISTS

  • EXISTS is more expressive than IN (more suitable for complex statements)

  • There is no performance difference between the two (although for some databases the performance difference can be significant)

Since you can also get the author information for the book in the title table using an INNER JOIN, many beginners will think that you can do a DISTINCT de-duplication and write the SEMI JOIN statement like this:

-- Find only those authors who also have books
SELECT DISTINCT first_name, last_name
FROM author
JOIN book ON author.id = book.author_id
Copy the code

This is a bad way to write for several reasons:

  • SQL statements perform poorly because a DISTINCT operation requires the database to repeatedly read data from disk into memory. (Translator’s note: DISTINCT is an expensive operation, but each database may operate DISTINCT differently.)

  • This is not entirely correct: while it may not be a problem today, as SQL statements become more complex, it becomes very difficult to reproduce the correct results.

More on the dangers of DISTINCT abuse can be found in this blog post

(blog.jooq.org/2013/07/30/…). .

ANTI JOIN

This is the opposite of a SEMI JOIN. This connection can be used by adding a NOT keyword before either IN or EXISTS. For example, let’s list authors who don’t have books in the title list:

-- Using IN
FROM author
WHERE author.id NOT IN (SELECT book.author_id FROM book)

-- Using EXISTS
FROM author
WHERE NOT EXISTS (SELECT 1 FROM book WHERE book.author_id = author.id)

Copy the code

SEMI JOIN is also a good reference for performance, readability, and expressiveness.

This post explains how to handle NULL when using NOT IN, but I won’t go into detail because it’s a bit of a departure from this topic

(blog.jooq.org/2012/01/27/…). .

CROSS JOIN

The join process is the product of two joined tables: each entry in the first table corresponds to each entry in the second table. We’ve seen that before, that’s how a comma is used in a FROM statement. In practice, CROSS joins are rarely used, but when they are, you can use SQL statements like this:

author CROSS JOIN book
Copy the code

DIVISION

DIVISION is indeed a freak. In short, if JOIN is a multiplication operation, then DIVISION is the inverse of JOIN. The relationships of divisions are difficult to express in SQL, since this is a beginner’s guide and explaining divisions is beyond our purpose. But those of you who are interested can still look at these three articles

(blog.jooq.org/2012/03/30/…).

(en.wikipedia.org/wiki/Relati…).

(www.simple-talk.com/sql/t-sql-p)… .

Recommended reading →_→Draw diagrams to explain SQL union statements”

What have we learned?

Learned a lot! Let’s go over it again in our mind. SQL is a reference to a table, and JOIN is a complex way of referring to a table. However, there is a difference between the expression of SQL language and the actual logical relations we need. Not all logical relations can find the corresponding JOIN operation, so we need to accumulate and learn relational logic in daily life, so that you can choose the appropriate JOIN operation in the future SQL statement.

SQL derived tables as variables

Previously, we learned that SQL is a declarative language and that SQL statements cannot contain variables. But you can write statements that look like variables. These are called derived tables:

To put it bluntly, a derived table is a subquery in parentheses:

-- A derived table
FROM (SELECT * FROM author)
Copy the code

Note that sometimes we can define a related name (what we call an alias) for a derived table.

-- A derived table with an alias
FROM (SELECT * FROM author) a
Copy the code

Derived tables can effectively avoid problems caused by SQL logic. For example, if you want to reuse the results of a query using SELECT and WHERE statements, you can write:

-- Get authors' first and last names, and their age in days SELECT first_name, last_name, age FROM ( SELECT first_name, last_name, current_date - date_of_birth age FROM author ) -- If the age is greater than 10000 days WHERE age > 10000Copy the code

It is important to note that in some databases, as well as in the SQL: 1990 standard, derived tables are classified as the next level, common Table experssion. This allows you to reuse derived tables multiple times in a SELECT statement. The above example is (almost) equivalent to the following statement:

WITH a AS (
  SELECT first_name, last_name, current_date - date_of_birth age
  FROM author
)
SELECT *
FROM a
WHERE age > 10000
Copy the code

Of course, you can also create a separate view for “A” so that you can reuse the derived table in a wider scope. More information you can read the following article (en.wikipedia.org/wiki/View_%…). .

What have we learned?

We reiterate that SQL statements are generally references to tables, not to fields. To take advantage of this, don’t be afraid to use derived tables or other more complex statements.

SQL > select * from GROUP BY

Let’s recall the previous FROM statement:

FROM a, b

Now we apply GROUP BY to the above statement:

GROUP BY A.x, A.y, B.z

The result of the above statement is a reference to a new table with three fields. When you apply GROUP BY, SELECT columns that do not use aggregate functions after GROUP BY. When you use GROUP BY, the number of columns that you can operate on at the next logical level is reduced, including columns in SELECT.

  • Note that other fields can use aggregate functions:
SELECT A.x, A.y, SUM(A.z)
FROM A
GROUP BY A.x, A.y
Copy the code
  • It’s also worth noting that MySQL doesn’t adhere to this standard, which is really confusing. This is not to say that MySQL doesn’t have GROUP BY functionality. But don’t be fooled BY MySQL. GROUP BY changes the way tables are referenced. You can refer to a field in SELECT and GROUP it in GROUP BY like this.

What have we learned?

GROUP BY, again, operates on a table reference to convert it to a new reference.

A SELECT statement is essentially a mapping of relationships

I personally like the word “mapping,” especially in relation to relational algebra. Projection projection projection projection projection projection projection projection projection projection projection projection projection projection projection projection Once you have created a reference to a table, modified and morphed it, you can map it step by step to another model. The SELECT statement is like a “projector” and can be thought of as a function that logically transforms the data in the source table into the data in the target table.

SELECT statements allow you to manipulate each field and generate the data you need through complex expressions.

SELECT statements have many special rules, and you should be familiar with at least a few of them:

  1. You can only use fields that can be referenced by a table;
  2. If you have a GROUP BY statement, you can only use the fields following the GROUP BY statement or aggregate functions.
  3. When you do not have GROUP BY in your statement, you can use window functions instead of aggregate functions.
  4. When you do not have GROUP BY in your statement, you cannot use aggregate functions and other functions at the same time;
  5. There are ways to encapsulate ordinary functions in aggregate functions;

Some of the more complex rules are enough for another article. For example: why can’t you use both ordinary and aggregate functions in a SELECT statement without GROUP BY? (Clause 4 above)

Here’s why:

  1. Intuitively, this does not make logical sense.
  2. If intuition doesn’t convince you, grammar certainly will. The SQL: 1999 standard introduced GROUPING SETS, and the SQL: 2003 standard introduced group SETS: Group BY(). Whenever you have an aggregate function and there is no explicit GROUP BY statement, an ambiguous, empty GROUPING SET is applied. Thus, the original rules of logical order are broken, and mapping (i.e., SELECT) relationships affect logical relationships first, and syntactic relationships second. (Translator’s Note: This passage is very difficult to understand. In both aggregate function and ordinary function SQL statement, if there is no GROUP BY grouping, SQL statement default view of the whole table as a GROUP, when the aggregate function for a field of aggregation statistics, each record in the table reference lost meaning, all the data are aggregated into a statistical value, It doesn’t make sense for you to use other functions for each record at this point.

Confused? Yes, me too. Let’s go back to the obvious.

What have we learned?

The SELECT statement is probably the hardest part of SQL statements, even though it looks simple. All the other statements do is refer to different forms of the table. The SELECT statement consolidates these references, mapping the source table to the target table using a logical rule. The process is reversible, and we can clearly see where the target table data came from.

To learn SQL well, you need to understand the SELECT statement before you use it. Although SELECT is the first keyword in the syntax structure, it should be the last one you learn.

10, several simple keywords in SQL statements: DISTINCT, UNION, ORDER BY, and OFFSET

After learning the complex SELECT Henan opera, let’s look at something simple:

  • Set operations (DISTINCT and UNION)

  • ORDER BY, OFFSET… The FETCH)

Set operation:

Set operations are operations on sets, which in fact refer to operations on tables. Conceptually, they are well understood:

  • DISTINCT De-duplicates data after mapping

  • UNION concatenates the two subqueries and deduplicates them

  • UNION ALL concatenates two subqueries without reweighting

  • EXCEPT removes the result in the second word query from the first child query

  • INTERSECT preserves the results of both subqueries and de-duplicates them

Ordering operation:

Sorting has nothing to do with logic. This is an SQL-specific feature. The sort operation is performed not only at the end of the SQL statement, but also during the execution of the SQL statement. Use ORDER BY and OFFSET… FETCH is the most efficient way to ensure that data is sorted in order. All other sorting methods have a degree of randomness, although the results they produce are reproducible.

OFFSET… SET is a statement with no uniform syntax. Different databases have different expressions, such as MySQL and PostgreSQL. OFFSET, SQL Server, and Sybase TOP… START the AT, etc. About OFFSET.. See this article for different FETCH syntax

(www.jooq.org/doc/3.1/man)… .

Let’s use SQL in our work!

Just like any other language, learning SQL well requires a lot of practice. The 10 simple steps above will help you gain a better understanding of the SQL statements you write every day. On the other hand, you can learn a lot from common mistakes. The following two articles introduce some common SQL mistakes made by JAVA and other developers:

  • 10 Common Mistakes Java Developers Make when Writing SQL
  • 10 More Common Mistakes Java Developers Make when Writing SQL