In the last article, we learned how SQL queries are executed and what you need to be aware of when writing SQL query statements.

Next, we’ll learn more about query approaches and query optimization.

 

Set-based versus procedural approaches to querying

Implicit in the anti-patterns discussed earlier is the fact that set-based and procedural approaches to building queries differ.

  • The procedural approach to querying is very similar to programming: you tell the system what needs to be done and how to do it. Examples from the previous article include querying the database by executing one function and then calling another, or obtaining the final result with logic that involves loops, conditions, and user-defined functions (UDFs). With this approach, you often end up requesting subsets of data layer by layer. It is also often called step-by-step or row-by-row querying.
  • The set-based approach simply specifies what to do. All you do is state the conditions and requirements for the result you want from the query. When retrieving data, you don’t need to concern yourself with the internal mechanics of how the query is implemented: the database engine determines the best algorithms and logic to execute it.

Because SQL is set-based, the set-based approach is usually more efficient than the procedural one, which explains why SQL can in some cases work faster than application code.
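The contrast between the two approaches can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the `orders` table, its data, and both function names are hypothetical and do not come from the article, and SQLite stands in for whatever database you actually use.

```python
import sqlite3

# Hypothetical table for illustration; not part of the article's examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ann", 10.0), ("bob", 25.0), ("ann", 5.0), ("cat", 40.0), ("bob", 15.0)],
)


def totals_procedural(conn):
    """Procedural style: fetch every row, then sum and filter in a loop."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    return {c: t for c, t in totals.items() if t > 20}


def totals_set_based(conn):
    """Set-based style: describe the result and let the engine work out how."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer HAVING SUM(amount) > 20"
    )
    return dict(rows)


print(totals_procedural(conn))
print(totals_set_based(conn))
```

Both functions return the same totals, but only the second leaves the choice of algorithm to the database engine.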

The set-based approach is also a skill that employers in the data science industry expect you to master! You will often need to switch between the two approaches. If you find a procedural query among your queries, consider whether that part should be rewritten.

 

From query to execution plan

Anti-patterns are not static: they evolve as you grow as a SQL developer. Avoiding query anti-patterns and rewriting queries can therefore be a difficult task, so you will often need tools to optimize your queries in a more structured way.

Thinking about performance requires not only a more structured approach, but also a more in-depth approach.

This structured and in-depth approach is primarily based on the query plan. Recall that the query is first parsed into a “parse tree”; the resulting query plan defines exactly which algorithm is used for each operation and how the execution of the operations is coordinated.

 

Query optimization

When tuning a query, you will most likely need to manually examine the plan generated by the optimizer. In that case, you analyze your query again by looking at its query plan.

To get hold of the query plan, you need the tools provided by your database management system. Here are some tools you can use:

  • Some packages include tools that generate a graphical representation of a query plan.
  • Other tools provide a textual description of the query plan.

Note that PostgreSQL distinguishes between EXPLAIN and EXPLAIN ANALYZE. EXPLAIN only gives you a description of how the planner intends to execute the query, without running it. EXPLAIN ANALYZE also executes the query and returns a report that compares the estimated plan with what actually happened. In general, an actual execution plan is obtained by really running the query, whereas an estimated execution plan is worked out without executing it. Although the two are logically equivalent, the actual execution plan is more useful because it contains additional details and statistics about what actually happened when the query was executed.

Next you’ll learn more about EXPLAIN and ANALYZE, and how to use them to better understand your query plan and query performance. The examples use two tables: one_million and half_million.

You can retrieve the current information for the one_million table with EXPLAIN: put it right at the top of your query, and when you run the statement it returns the query plan instead of the query result:

EXPLAIN
SELECT *
FROM one_million;

QUERY PLAN
_________________________________________________
Seq Scan on one_million
(cost=0.00..18584.82 rows=1025082 width=36)
(1 row)

In the above example, the estimated cost of the query is 0.00..18584.82 (the startup cost and the total cost), the estimated number of rows is 1025082, and the estimated average row width is 36 bytes.
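To make the reading of that bracketed estimate concrete, here is a small Python sketch that splits a plan line into its four numbers. The helper name `parse_estimate` and the regular expression are assumptions made for illustration; they are not part of PostgreSQL or of any library. The cost figures are in the planner's own arbitrary units, not milliseconds.

```python
import re

# Pattern for the bracketed estimate printed after each plan node.
ESTIMATE = re.compile(r"cost=(\d+\.\d+)\.\.(\d+\.\d+) rows=(\d+) width=(\d+)")


def parse_estimate(line):
    """Split a plan line into its four estimated quantities (hypothetical helper)."""
    match = ESTIMATE.search(line)
    if match is None:
        raise ValueError(f"no cost estimate found in: {line!r}")
    startup, total, rows, width = match.groups()
    return {
        "startup_cost": float(startup),  # cost before the first row can be returned
        "total_cost": float(total),      # cost to return all rows
        "rows": int(rows),               # estimated number of rows output
        "width": int(width),             # estimated average row size in bytes
    }


plan_line = "Seq Scan on one_million (cost=0.00..18584.82 rows=1025082 width=36)"
estimate = parse_estimate(plan_line)
print(estimate)

# Rough size of the result set implied by the estimate, in bytes.
print(estimate["rows"] * estimate["width"])
```

Multiplying rows by width gives a quick feel for how much data the planner expects the node to produce.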

You can also use ANALYZE to update the statistics that the planner relies on:

ANALYZE one_million;
EXPLAIN
SELECT *
FROM one_million;

QUERY PLAN
_________________________________________________
Seq Scan on one_million
(cost=0.00..18334.00 rows=1000000 width=37)
(1 row)

In addition to EXPLAIN and ANALYZE, you can also use EXPLAIN ANALYZE to retrieve the actual execution time:

EXPLAIN ANALYZE
SELECT *
FROM one_million;

QUERY PLAN
___________________________________________________
Seq Scan on one_million
(cost=0.00..18334.00 rows=1000000 width=37)
(actual time=0.015..1207.019 rows=1000000 loops=1)
Total runtime: 2320.146 ms
(2 rows)

The downside of EXPLAIN ANALYZE is that the query really is executed, which is worth keeping in mind!

All the algorithms we have seen so far are sequential scans, or full table scans: a way of scanning a table in which each row is read in sequential (serial) order and its columns are checked against the query condition. In terms of performance, a sequential scan is not the best execution plan, because the entire table has to be read. It is not too bad, though: sequential reads are reasonably fast even on slow disks.

Here is an example of another algorithm:

EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter=half_million.counter);

QUERY PLAN
_____________________________________________________________
Hash Join (cost=15417.00..68831.00 rows=500000 width=42)
(actual time=1241.471..5912.553 rows=500000 loops=1)
Hash Cond: (one_million.counter = half_million.counter)
    -> Seq Scan on one_million (cost=0.00..18334.00 rows=1000000 width=37) (actual time=0.007..1254.027 rows=1000000 loops=1)
    -> Hash (cost=7213.00..7213.00 rows=500000 width=5) (actual time=1241.251..1241.251 rows=500000 loops=1)
       Buckets: 4096 Batches: 16 Memory Usage: 770kB
        -> Seq Scan on half_million (cost=0.00..7213.00 rows=500000 width=5)
(actual time=0.008..601.128 rows=500000 loops=1)
Total runtime: 6468.337 ms

We can see that the query optimizer has selected a Hash Join. Keep this in mind, because we will need it to evaluate the time complexity of the query. Note that there is no index on half_million.counter in the above example; the following example adds one:

CREATE INDEX ON half_million(counter);
EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter=half_million.counter);

QUERY PLAN
______________________________________________________________
Merge Join (cost=4.12..37650.65 rows=500000 width=42)
(actual time=0.033..3272.940 rows=500000 loops=1)
Merge Cond: (one_million.counter = half_million.counter)
    -> Index Scan using one_million_counter_idx on one_million (cost=0.00..32129.34 rows=1000000 width=37) (actual time=0.011..694.466 rows=500001 loops=1)
    -> Index Scan using half_million_counter_idx on half_million (cost=0.00..14120.29 rows=500000 width=5)
(actual time=0.010..683.674 rows=500000 loops=1)
Total runtime: 3833.310 ms
(5 rows)

With the index in place, the query optimizer has now decided to use a Merge Join, with Index Scans on both tables.
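The two join algorithms that appear in these plans can be sketched in a few lines of Python. This is a simplified, in-memory illustration of the general techniques and not how PostgreSQL implements them: the function names and the tiny stand-in tables are assumptions, and real engines also handle spilling to disk, batching, and much more.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Hash join: build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for rrow in right:  # build phase: one pass over `right`
        table[rrow[key]].append(rrow)
    return [
        {**lrow, **rrow}
        for lrow in left  # probe phase: one pass over `left`
        for rrow in table.get(lrow[key], [])
    ]


def merge_join(left, right, key):
    """Merge join: walk two inputs that are already sorted on the join key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair the current left row with every right row sharing this key.
            k = j
            while k < len(right) and right[k][key] == lk:
                out.append({**left[i], **right[k]})
                k += 1
            i += 1
    return out


# Tiny stand-ins for the article's tables, both sorted on `counter`.
one_million = [{"counter": n, "a": f"row-{n}"} for n in range(1, 11)]
half_million = [{"counter": n, "b": n * n} for n in range(2, 11, 2)]

hashed = hash_join(one_million, half_million, "counter")
merged = merge_join(one_million, half_million, "counter")
print(hashed == merged, len(merged))
```

The merge join only works because both inputs arrive sorted on the join key, which is what the index scans provide in the plan above; the hash join needs no ordering but has to build its hash table first.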

Note the difference between an index scan and a full table scan (sequential scan): the latter, also known as a “table scan”, reads every row in the table, while the former uses the index pages to locate the matching rows.
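You can watch a planner switch from a full scan to an index lookup without a PostgreSQL server by using SQLite from Python. Note the swap: this sketch uses SQLite's `EXPLAIN QUERY PLAN`, whose output format differs from PostgreSQL's EXPLAIN, and the table contents, the `payload` column, and the helper `plan_for` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE half_million (counter INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO half_million VALUES (?, ?)",
    ((n, f"row-{n}") for n in range(1000)),
)


def plan_for(conn, sql):
    """Return SQLite's plan description for `sql`; the query itself is not run."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable detail text.
    return "; ".join(row[-1] for row in rows)


query = "SELECT * FROM half_million WHERE counter = 42"

before = plan_for(conn, query)
conn.execute("CREATE INDEX half_million_counter_idx ON half_million(counter)")
after = plan_for(conn, query)

print("without index:", before)
print("with index:   ", after)
```

Without the index the plan reports a scan of the whole table; once the index exists, it reports a search that uses `half_million_counter_idx`. The exact wording of the detail text varies between SQLite versions.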

 

That’s the end of the second part of the tutorial. Stay tuned for the final article in the “How to Write Better SQL Queries” series.

The original link: http://www.kdnuggets.com/2017/08/write-better-sql-queries-definitive-guide-part-2.html

If you reproduce this article, please credit the source: Grape City Controls

 


 



This article is published by the Grape City Controls technology development team.
