• Methods to improve query speed when handling more than a million rows of data
• Reasons for slow query speed
• You can optimize the query in the following ways

MySQL million-level data query optimization


Methods to improve query speed when handling more than a million rows of data:

  1. Avoid using != or <> operators in the WHERE clause; otherwise the engine will abandon the index and perform a full table scan.
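For example, a minimal sketch (assuming a hypothetical table t with an indexed integer column num): a <> condition can often be rewritten as two ranges combined with UNION ALL, so each branch can seek on the index:

```sql
-- Abandons the index: <> forces a full table scan
SELECT id FROM t WHERE num <> 10;

-- Index-friendly rewrite: each branch can seek on the index on num
SELECT id FROM t WHERE num < 10
UNION ALL
SELECT id FROM t WHERE num > 10;
```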

  2. Optimize queries to avoid full table scans; first consider building indexes on the columns used in WHERE and ORDER BY clauses.

  3. Avoid NULL tests on fields in the WHERE clause; they cause the engine to abandon the index and perform a full table scan, as in: select id from t where num is null. It is better to give num a default value of 0, make sure the column never stores NULL, and query with: select id from t where num = 0.

  4. Avoid joining conditions with OR in the WHERE clause, which can also make the engine abandon the index for a full table scan. Instead of select id from t where num=10 or num=20, use: select id from t where num=10 union all select id from t where num=20.

  5. A leading wildcard forces a full table scan: select id from t where name like '%abc%'. To keep efficiency, consider full-text search instead.

  6. Use IN and NOT IN with caution, or the engine may scan the whole table. For a continuous range, prefer BETWEEN: instead of select id from t where num in (1,2,3), use select id from t where num between 1 and 3.

  7. Avoid expression operations on fields in the WHERE clause as much as possible; they make the engine abandon the index for a full table scan. Instead of select id from t where num/2 = 100, use: select id from t where num = 100*2.

  8. Avoid applying functions to fields in the WHERE clause, which will cause the engine to abandon the index and perform a full table scan. For example, instead of select id from t where substring(name,1,3) = 'abc' (ids whose name starts with 'abc') or select id from t where datediff(day, createdate, '2005-11-30') = 0 (ids created on 2005-11-30), use: select id from t where name like 'abc%' and select id from t where createdate >= '2005-11-30' and createdate < '2005-12-01'.

  9. Do not perform functions, arithmetic operations, or other expression operations on the left side of the "=" in the WHERE clause, or the system may not be able to use the index.

  10. When an indexed field is used as a condition and the index is a compound index, the first column of the index must appear in the condition or the index will not be used; keep the order of fields as consistent as possible with the index order.
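A short illustration (hypothetical table and index names): with a compound index on (col1, col2), only conditions that include the leading column col1 can seek on it:

```sql
CREATE INDEX ix_t_col1_col2 ON t (col1, col2);

-- Uses the index: the leading column col1 appears in the condition
SELECT id FROM t WHERE col1 = 5 AND col2 = 10;
SELECT id FROM t WHERE col1 = 5;

-- Cannot seek on the index: the leading column is missing
SELECT id FROM t WHERE col2 = 10;
```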

  11. Do not write meaningless queries. Generating an empty table structure with select col1, col2 into #t from t where 1=0 returns no rows but still consumes system resources; use create table #t(...) instead.

  12. In many cases EXISTS is a good replacement for IN: instead of select num from a where num in (select num from b), use select num from a where exists (select 1 from b where num = a.num).

  13. Not all indexes are effective for every query. SQL query optimization is based on the data in the table; when an index column contains a large amount of duplicate data, the query may not use the index at all.

  14. More indexes are not always better. Indexes improve the efficiency of SELECTs but reduce the efficiency of INSERTs and UPDATEs. Keep the number of indexes on a table to six or fewer; beyond that, reconsider whether indexes on rarely used columns are really necessary.

  15. Avoid updating clustered index columns as much as possible, because the order of the clustered index is the physical storage order of the table's rows; a change forces the entire table's rows to be re-sequenced, which consumes considerable resources. If the application frequently updates clustered index columns, consider building the index as a non-clustered index instead.

  16. Use numeric fields where possible. If a field holds only numeric information, do not make it a character field: character fields reduce query and join performance and increase storage overhead, because the engine compares each character of a string one by one during queries and joins, whereas a numeric type needs only a single comparison.

  17. Use varchar/nvarchar instead of char/nchar whenever possible: variable-length fields take less storage space, and searching within a smaller field is clearly more efficient.

  18. Never write select * from t anywhere; replace * with the specific list of fields, and do not return any field that is not needed.

  19. Use table variables instead of temporary tables whenever possible. If a table variable holds a lot of data, though, be aware that its indexes are very limited (only a primary key index).

  20. Avoid frequent creation and deletion of temporary tables, to reduce the consumption of system table resources.

  21. Temporary tables are not unusable; used appropriately they can make certain routines more efficient, for example when a large table or a data set in a frequently used table must be referenced repeatedly. For one-off operations, however, an export table is preferable.

  22. When creating a temporary table, if a large amount of data is inserted at once, use SELECT INTO instead of CREATE TABLE to avoid generating a large amount of log and to improve speed; if the amount of data is small, CREATE TABLE first and then INSERT, to ease the load on the system tables.
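A sketch of both cases (table and column names are made up):

```sql
-- Large one-time load: SELECT INTO is minimally logged and fast
SELECT id, name, salary
INTO #big_tmp
FROM employee
WHERE salary > 3000;

-- Small load: create the table first, then insert,
-- to reduce pressure on the system tables
CREATE TABLE #small_tmp (id INT, name VARCHAR(50));
INSERT INTO #small_tmp (id, name)
SELECT id, name FROM employee WHERE id < 100;
```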

  23. If temporary tables are used, explicitly delete them all at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE, so that the system tables are not locked for a long time.
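At the end of the stored procedure, for example (continuing the hypothetical temp table above):

```sql
-- Release the data pages first, then drop the object,
-- so the system tables are not held locked for long
TRUNCATE TABLE #small_tmp;
DROP TABLE #small_tmp;
```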

  24. Avoid cursors, because they are inefficient; if a cursor operates on more than 10,000 rows, consider rewriting it.

  25. Before reaching for a cursor-based or temporary-table approach, look for a set-based solution to the problem; set-based methods are usually more efficient.

  26. As with temporary tables, cursors are not unusable. Using a FAST_FORWARD cursor on a small data set is often better than other row-by-row processing methods, especially when several tables must be referenced to obtain the data. Routines that compute "totals" in the result set are usually faster than doing the same with cursors. If development time permits, try both the cursor-based and the set-based approach and keep whichever works better.

  27. Put SET NOCOUNT ON at the beginning and SET NOCOUNT OFF at the end of all stored procedures and triggers; there is no need to send a DONE_IN_PROC message to the client after each statement executes.
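A minimal skeleton of the pattern (procedure name and body are placeholders):

```sql
CREATE PROCEDURE p_demo
AS
BEGIN
    SET NOCOUNT ON;   -- stop sending DONE_IN_PROC after every statement

    UPDATE employee SET salary = salary * 1.1 WHERE deptid = 1;
    SELECT id, salary FROM employee WHERE deptid = 1;

    SET NOCOUNT OFF;  -- restore the default before returning
END
```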

  28. Do not return a large amount of data to the client; if the amount of data is too large, consider whether the corresponding requirement is reasonable.

  29. Avoid large transaction operations as far as possible, to improve system concurrency.


Reasons for slow query speed:

  1. There is no index, or the index is not used (this is the most common cause of slow queries, and a programming flaw)

  2. Low I/O throughput, creating a bottleneck.

  3. The query is not optimized because no computed columns are created.

  4. Out of memory

  5. Slow network speed

  6. The amount of data returned by the query is too large (use multiple queries or other methods to reduce the data volume)

  7. Locks or deadlocks (this is also the most common problem with slow queries and is a programming flaw)

  8. Read/write contention for resources (check the active users with sp_lock and sp_who)

  9. Unnecessary rows and columns are returned

  10. The query statement is poorly written and not optimized


You can optimize the query in the following ways:

  1. Put data, logs, and indexes on different I/O devices to increase read speed. (Previously tempdb could be placed on RAID 0; SQL 2000 no longer supports this.) The larger the data volume, the more important improving I/O becomes.

  2. Split tables vertically and horizontally to reduce table size (see sp_spaceused).

  3. Upgrade the hardware

  4. Build and optimize indexes according to the query conditions, optimize the access methods, and limit the size of the result set. Make sure the fill factor is appropriate (the default value of 0 is usually best) and keep indexes small: build them on columns with few bytes (see CREATE INDEX), and do not build a single index on a field with few distinct values, such as a gender field. For example:
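A sketch with hypothetical names: keep the key small, and leave the fill factor at its default unless the column is update-heavy:

```sql
-- Small, selective integer key; the default fill factor (0) is usually best
CREATE INDEX ix_orders_custid ON orders (customer_id);

-- Set a fill factor explicitly only where page splits are a real problem
CREATE INDEX ix_orders_date ON orders (order_date) WITH FILLFACTOR = 90;
```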

  5. Improve network speed;

  6. Expand server memory. Windows 2000 and SQL Server 2000 can support 4-8 GB of memory. Configure virtual memory according to the services running concurrently on the machine: when running Microsoft SQL Server 2000, consider setting virtual memory to 1.5 times the installed physical memory; if full-text retrieval is also installed and you plan to run the Microsoft Search service for full-text indexing and queries, consider configuring virtual memory to at least 3 times the physical memory. Set the SQL Server max server memory configuration option to 1.5 times physical memory (half of the virtual memory setting).
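On SQL Server the memory cap can be set with sp_configure; the value below is purely illustrative:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- cap the buffer pool at 1.5x physical memory, e.g. 3072 MB for a 2 GB box
EXEC sp_configure 'max server memory', 3072;
RECONFIGURE;
```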

  7. Increase the number of server CPUs. Understand, though, that parallel processing needs more resources, such as memory, than serial processing; whether to run in parallel or serially is evaluated automatically by MS SQL. A single task can be broken into multiple subtasks that run on different processors; for example, sorting for delayed queries, joins, scans, and GROUP BY clauses can execute simultaneously. SQL Server determines the optimal level of parallelism from the system load; complex queries that consume a lot of CPU benefit most from parallel processing. However, UPDATE, INSERT, and DELETE operations cannot be executed in parallel.

  8. If you query with LIKE, a plain index is not enough; a full-text index helps but costs space. like 'a%' uses the index; like '%a' does not; and with like '%a%' the query time is proportional to the total length of the field values, so the column should be VARCHAR rather than CHAR. Build full-text indexes on fields whose values are very long.

  9. Separate the DB server from the application server; separate OLTP from OLAP.

  10. Distributed partitioned views can be used to implement a federation of database servers. A federation is a group of separately managed servers that cooperate to share the processing load of the system. This mechanism of partitioning data across a federation of database servers can scale out a set of servers to support the processing needs of a large, multi-tier web site. For more information, see Designing Federated Database Servers (see the SQL help topic "partitioned view"). A sketch follows the two steps below:

    1. Before a partitioned view can be implemented, the table must first be partitioned horizontally

    2. After the member tables are created, a distributed partitioned view with the same name is defined on each member server, so that queries referencing the view name can run on any member server. The system behaves as if every member server held a copy of the original table, although each server has only one member table and one distributed partitioned view. The location of the data is transparent to the application.
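A simplified sketch of such a view (server, database, and table names are hypothetical; each member table holds one horizontal partition, enforced by a CHECK constraint on the key range so the optimizer can route queries):

```sql
-- Defined identically on every member server:
CREATE VIEW customers_all
AS
SELECT * FROM customers_33                      -- local partition, keys 1-32999
UNION ALL
SELECT * FROM Server2.sales.dbo.customers_66    -- keys 33000-65999
UNION ALL
SELECT * FROM Server3.sales.dbo.customers_99;   -- keys 66000-99999
```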

  11. Rebuild indexes with DBCC DBREINDEX or DBCC INDEXDEFRAG, and shrink data and log files with DBCC SHRINKDATABASE or DBCC SHRINKFILE (examples follow this list). Set the log to shrink automatically, but do not enable automatic file growth for large databases, as it degrades server performance. Note how a DBMS processes a query plan:

    1. Lexical and syntax check of the query statement

    2. Submit statements to the DBMS’s query optimizer

    3. The optimizer does algebraic and access path optimization

    4. The query plan is generated by the precompiled module

    5. It is then submitted to the system for processing and execution at the appropriate time

    6. SQL Server data storage structure: a page is 8 KB (8060 bytes usable); 8 pages form an extent; data is stored as B-trees.
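The maintenance commands mentioned in item 11 look like this (database, table, and file names are placeholders):

```sql
DBCC DBREINDEX ('dbo.orders');              -- rebuild all indexes on a table
DBCC INDEXDEFRAG (mydb, 'dbo.orders', 1);   -- defragment one index online
DBCC SHRINKDATABASE (mydb);                 -- shrink data and log files
DBCC SHRINKFILE (mydb_log, 100);            -- shrink one file to ~100 MB
```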

  12. ROLLBACK rolls everything back; COMMIT commits the current transaction. There is no need to put transactions inside dynamic SQL; if you must, write them outside: begin tran exec(@s) commit tran, or write the dynamic SQL as a function or stored procedure.

  13. In SELECT queries, use WHERE to limit the number of rows returned and avoid table scans. Returning unnecessary data wastes the server's I/O resources, burdens the network, and reduces performance. If the table is large, it is locked during the scan, blocking other connections, with serious consequences.

  14. SQL comments have no effect on execution

  15. Avoid cursors as much as possible; they consume a lot of resources. If row-by-row processing is required, prefer non-cursor techniques such as client-side loops, temporary tables, table variables, subqueries, and CASE statements. Cursors can be categorized by the fetch options they support: a forward-only cursor must fetch rows in order from the first to the last, FETCH NEXT being the only allowed fetch operation and the default; a scrollable cursor can fetch any row at any position. Cursor technology became quite powerful under SQL 2000, which is designed to support loops.

There are four concurrency options:

READ_ONLY: updates through the cursor are not allowed, and no locks are held on the rows that make up the result set.

OPTIMISTIC WITH VALUES: optimistic concurrency control is a standard part of transaction control theory. It is used where there is only a small chance that a second user will update a row in the interval between the cursor being opened and the row being updated. When a cursor is opened with this option, no locks control its rows, which helps maximize throughput. If the user tries to modify a row, the row's current values are compared with the values fetched when the row was last read: if any value has changed, the server knows that someone else has updated the row and returns an error; if the values are the same, the server performs the modification.

OPTIMISTIC WITH ROW VERSIONING: this optimistic concurrency control option is based on row versioning. The table must carry a version identifier that the server can use to determine whether the row has changed since it was read into the cursor. In SQL Server this capability is provided by the TIMESTAMP data type, a binary number indicating the relative order of changes in the database. Each database has a global current timestamp value, @@DBTS. Whenever a row with a TIMESTAMP column is changed in any way, SQL Server first stores the current @@DBTS value in the timestamp column and then increments @@DBTS. If a table has a TIMESTAMP column, the timestamps are recorded at row level, and the server can compare a row's current timestamp value with the value stored when the row was last fetched to decide whether it has been updated; it need not compare every column, just the TIMESTAMP column. If an application requires row-versioning optimistic concurrency on a table without a TIMESTAMP column, the cursor falls back to value-based optimistic concurrency.

SCROLL LOCKS: this option implements pessimistic concurrency control, in which the application tries to lock database rows as they are read into the cursor's result set. With a server cursor, an update lock is placed on each row as it is read in. If the cursor is opened within a transaction, the transaction update lock is held until the transaction is committed or rolled back, and the cursor lock is dropped when the next row is fetched; if the cursor is opened outside a transaction, the lock is discarded as soon as the next row is fetched. Therefore, whenever full pessimistic concurrency control is needed, the cursor should be opened inside a transaction. An update lock prevents any other task from acquiring an update or exclusive lock, and thereby prevents other tasks from updating the row; it does not block shared locks, however, so it does not stop other tasks from reading the row unless the second task also demands a read with an update lock.

Scroll locks: these cursor concurrency options can generate scroll locks, depending on the lock hints in the SELECT statement that defines the cursor. A scroll lock is acquired on each row at fetch time and held until the next fetch or until the cursor is closed, whichever comes first. At the next fetch the server acquires scroll locks for the newly fetched rows and releases the scroll locks on the previously fetched rows. Scroll locks are independent of transaction locks and can be held past a commit or rollback; if the option to close cursors at commit is off, a COMMIT does not close any open cursors, and the scroll locks are retained past the COMMIT to preserve the isolation of the fetched data.

The type of scroll lock acquired depends on the cursor concurrency option and the lock hint in the cursor's SELECT statement:

| Lock hint | Read-only | Optimistic with values | Optimistic with row versioning | Scroll locks |
| --- | --- | --- | --- | --- |
| No hint | Unlocked | Unlocked | Unlocked | Update |
| NOLOCK | Unlocked | Unlocked | Unlocked | Unlocked |
| HOLDLOCK | Shared | Shared | Shared | Update |
| UPDLOCK | Error | Update | Update | Update |
| TABLOCKX | Error | Unlocked | Unlocked | Update |
| Other | Unlocked | Unlocked | Unlocked | Update |

*Specifying the NOLOCK hint makes the table on which it is specified read-only within the cursor.
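Where a cursor really is unavoidable on a small set, a minimal FAST_FORWARD pattern looks like this (table and column names are hypothetical):

```sql
DECLARE @id INT, @name VARCHAR(50);

DECLARE c CURSOR FAST_FORWARD FOR
    SELECT id, name FROM employee WHERE deptid = 1;

OPEN c;
FETCH NEXT FROM c INTO @id, @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- row-by-row work goes here
    FETCH NEXT FROM c INTO @id, @name;
END
CLOSE c;
DEALLOCATE c;
```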

  16. Use Profiler to trace queries, measure the time each query needs, and locate the SQL problems; use the Index Tuning Wizard to optimize indexes.

  17. Note the difference between UNION and UNION ALL: UNION ALL is better, because it skips the duplicate-elimination step.

  18. Be careful with DISTINCT; do not use it when you do not have to, since like UNION it slows the query down. Duplicate records are usually not a problem in queries.

  19. Do not return unwanted rows or columns when querying

  20. Use sp_configure 'query governor cost limit' or SET QUERY_GOVERNOR_COST_LIMIT to limit the resources a query may consume: when the estimated cost of a query exceeds the limit, the server cancels it automatically, killing it before it ever runs. SET LOCK_TIMEOUT sets the lock wait time.
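For example (both limits are illustrative):

```sql
-- cancel queries whose estimated cost exceeds 300 cost units
SET QUERY_GOVERNOR_COST_LIMIT 300;

-- stop waiting for locks after 5 seconds (value in milliseconds)
SET LOCK_TIMEOUT 5000;
```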

  21. Use SELECT TOP 100 / TOP 10 PERCENT to limit the number of rows returned to the user, or SET ROWCOUNT to limit the rows an operation affects.

  22. Before SQL 2000, avoid in general: "IS NULL", "<>", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and "LIKE '%500'", because they do not use the index and result in full table scans. Do not apply functions such as CONVERT or SUBSTRING to a column name in the WHERE clause; if a function must be used, create a computed column and index it instead. Sometimes a workaround is possible: change WHERE SUBSTRING(firstname,1,1) = 'm' to WHERE firstname LIKE 'm%' (an index scan); always keep functions and column names apart. Also, do not build too many or too large indexes. NOT IN scans the table multiple times; replace it with EXISTS, NOT EXISTS, IN, or LEFT OUTER JOIN (especially LEFT OUTER JOIN). EXISTS is faster than IN, and the slowest is the NOT operation. If a column's values include NULL, its index used not to work; the SQL 2000 optimizer can now handle it: IS NULL, "NOT", "NOT EXISTS", and "NOT IN" can be optimized, while "<>" still cannot and does not use the index.

  23. Use Query Analyzer to inspect the query plan of SQL statements and evaluate whether they are well optimized. On average, 20% of the code consumes 80% of the resources; our optimization effort should focus on those slow spots.

  24. If a query with IN does not use the index, you can force one with an explicit index hint: SELECT * FROM PersonMember (INDEX = IX_Title) WHERE processid IN ('male', 'female').

  25. Pre-compute results that will be queried and store them in a table, then simply SELECT them at query time. This was the most important technique before SQL 7.0, for example for calculating hospital admission fees.
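A sketch of the idea with a hypothetical fee schema: refresh a summary table periodically, then read it directly:

```sql
-- Recompute once (e.g. in a nightly job), not on every query
TRUNCATE TABLE fee_summary;
INSERT INTO fee_summary (patient_id, total_fee)
SELECT patient_id, SUM(fee)
FROM admission_fee
GROUP BY patient_id;

-- Queries read the precomputed value instead of aggregating
SELECT total_fee FROM fee_summary WHERE patient_id = 42;
```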

  26. MIN() and MAX() can use appropriate indexes

  27. As a rule, put the logic as close to the database as possible, so prefer Defaults, then Rules, Triggers, and Constraints. Not only is the maintenance effort small and the quality of the written programs high, but execution is fast as well.

  28. When you INSERT large binary values into an image column, use a stored procedure; never use an inline INSERT. Otherwise the application first converts the binary value to a character string (doubling its size), and the server converts it back to binary on receipt. A stored procedure avoids this: create procedure p_insert as insert into table(fimage) values (@image), called from the application with @image passed as a parameter.

  29. BETWEEN is sometimes faster than IN, and it can locate a range via the index more quickly. The difference is visible through the query optimizer: select * from chineseresume where title in ('male','female') and select * from chineseresume where title between 'male' and 'female' return the same result, but because IN may be evaluated multiple times, it is sometimes slower.

  30. Creating indexes on global or local temporary tables when necessary can sometimes speed things up, but not always, because creating an index also has a cost; such indexes are created just as on a real table.

  31. Do not open transactions that do no useful work, for example while generating reports; that wastes resources. Open a transaction only when it is necessary.

  32. Conditions joined with OR can be split into multiple queries joined by UNION. Their speed depends only on whether indexes are used; if the query needs a compound index, UNION ALL is more efficient. Multiple OR clauses do not use the index; rewrite them in UNION form and then try to match them to indexes. The key question is whether indexes are used. For example:
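A sketch, assuming separate indexes on num and on name:

```sql
-- OR across two columns often prevents index use
SELECT id FROM t WHERE num = 10 OR name = 'abc';

-- Rewritten so each branch can use its own index;
-- UNION removes duplicates, UNION ALL is faster when duplicates are acceptable
SELECT id FROM t WHERE num = 10
UNION
SELECT id FROM t WHERE name = 'abc';
```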

  33. Use views as little as possible; they are inefficient. Operating on a view is slower than operating directly on a table; stored procedures can be used instead. In particular, never nest views: nesting makes it harder to find the source of the data. Look at views for what they are: optimized SQL stored on the server with a query plan already generated. When retrieving data from a single table, do not use a view that spans multiple tables; retrieve directly from the table, or from a view that contains only that table, otherwise unnecessary overhead is added and the query is disturbed. To speed up view queries, MS SQL provides indexed views.

  34. Do not use DISTINCT and ORDER BY when they are not necessary; they add extra overhead and can be performed on the client side instead. The same goes for UNION versus UNION ALL.

```sql
SELECT TOP 20 ad.companyname, comid, position, ad.referenceid, worklocation,
       CONVERT(varchar(10), ad.postDate, 120) AS postDate1, workyear, degreedescription
FROM jobcn_query.dbo.COMPANYAD_query ad
WHERE referenceID IN ('JCNAD00329667', 'JCNAD132168', 'JCNAD00337748', 'JCNAD00338345',
                      'JCNAD00333138', 'JCNAD00303570', 'JCNAD00303569', 'JCNAD00303568',
                      'JCNAD00306698', 'JCNAD00231935', 'JCNAD00231933', 'JCNAD00254567',
                      'JCNAD00254585', 'JCNAD00254608', 'JCNAD00254607', 'JCNAD00258524',
                      'JCNAD00332133', 'JCNAD00268618', 'JCNAD00279196', 'JCNAD00268613')
ORDER BY postdate DESC
```
  35. In the list of values after IN, put the most frequently occurring values first and the least frequent last, to reduce the number of comparisons.

  36. SELECT INTO locks system tables (sysobjects, sysindexes, and so on), blocking other connections' access. Create temporary tables with explicit declaration statements rather than SELECT INTO. To see the locking, run on one connection:

```sql
DROP TABLE t_lxh
BEGIN TRAN
SELECT * INTO t_lxh FROM chineseresume WHERE name = 'XYZ'
-- COMMIT
```

and, while that transaction is still open, on another connection:

```sql
SELECT * FROM sysobjects
```

As you can see, the second query blocks: SELECT INTO locks the system tables, and CREATE TABLE locks them as well (both for temporary and for system tables). So never use SELECT INTO inside a transaction! If a temporary table is needed frequently, use a real table, or a table variable, instead.

  37. Rows can usually be eliminated before the GROUP BY and HAVING clauses, so try not to use those clauses to do that work. The optimal execution order is: the WHERE clause selects all appropriate rows, GROUP BY groups and aggregates them, and HAVING eliminates unwanted groups; then GROUP BY and HAVING have little left to do and the query is fast. Grouping large row sets and applying HAVING to them is very resource-intensive. If the purpose of the GROUP BY is merely deduplication rather than computation, DISTINCT is faster.
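A small illustration (hypothetical table):

```sql
-- Slower: every group is built first, then most are discarded
SELECT deptid, COUNT(*) AS n
FROM employee
GROUP BY deptid
HAVING deptid > 10;

-- Faster: WHERE eliminates the rows before grouping
SELECT deptid, COUNT(*) AS n
FROM employee
WHERE deptid > 10
GROUP BY deptid;
```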

  38. Updating multiple records at once is faster than updating one record at a time, which means batch processing is better

  39. Use temporary tables sparingly; replace them with result sets and table variables where possible. Table variables are better than temporary tables.

  40. Under SQL 2000, computed fields can be indexed when the following conditions are met (see the sketch after these conditions):

    1. The computed field's expression is deterministic
    2. The TEXT, NTEXT, and IMAGE data types are not used
    3. ANSI_NULLS = ON, ANSI_PADDING = ON, ... are configured
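Under those conditions an indexed computed column looks roughly like this (hypothetical columns; other SET options such as QUOTED_IDENTIFIER and ARITHABORT must also be ON):

```sql
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;

-- deterministic expression over non-TEXT/NTEXT/IMAGE columns
ALTER TABLE orders ADD total AS (price * quantity);
CREATE INDEX ix_orders_total ON orders (total);
```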
  41. Try to put data-processing work on the server to reduce network overhead, for example by using stored procedures. A stored procedure is SQL that has been compiled, optimized, organized into an execution plan, and stored in the database, a body of control-flow language, so it is naturally fast. Dynamic SQL that executes repeatedly can use a temporary stored procedure placed in tempdb (like a temporary table). In the past SQL Server did not support complex mathematical calculations, so that work had to be done in another layer at extra network cost; SQL 2000 supports UDFs and now handles complex calculations, but do not make a function's return value too large, as that is very expensive. User-defined functions execute like cursors and consume a lot of resources; if a large result must be returned, use a stored procedure instead.

  42. Do not waste resources by calling the same function repeatedly in one statement; putting the result in a variable first is faster.

  43. SELECT COUNT(*) is inefficient; try to rework such queries (EXISTS is often faster). Also note that select count(Field of null) from Table and select count(Field of NOT null) from Table return different values.
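The difference in a nutshell (bonus is assumed to be a nullable column):

```sql
SELECT COUNT(*) FROM employee;      -- counts every row

SELECT COUNT(bonus) FROM employee;  -- counts only rows where bonus IS NOT NULL,
                                    -- so the results differ when bonus has NULLs
```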

  44. When the server has enough memory, configure the number of threads = maximum number of connections + 5 for maximum efficiency; otherwise configure the number of threads < maximum number of connections, enabling SQL Server's thread pool. If the number is still set to maximum connections + 5, server performance is severely harmed.

  45. Access your tables in a consistent order. If one stored procedure locks table A and then table B, they should be locked in that order in all stored procedures. If some procedure (inadvertently) locks table B first and then table A, this may cause a deadlock. Deadlocks are hard to detect if the locking order has not been designed in detail in advance.

  46. Monitor the load on the hardware with SQL Server Performance Monitor. Memory: Page Faults/sec: if this value rises occasionally, some thread is competing for memory; if it stays high, memory is probably the bottleneck. Process:

    1. % DPC Time: the percentage of time the processor spent receiving and servicing deferred procedure calls (DPCs) during the sample interval. DPCs run at a lower priority than standard interrupts, and since they execute in privileged mode, the percentage of DPC time is included in the privileged time percentage; the two are counted separately, however, and DPCs are not included in the interrupt counts. This total shows average busy time as a percentage of instance time.

    2. % Processor Time: if this value continuously exceeds 95%, the CPU is the bottleneck; consider adding a processor or switching to a faster one.

    3. % Privileged Time: the percentage of non-idle processor time spent in privileged mode. (Privileged mode is a processing mode designed for operating-system components and hardware-manipulating drivers; it allows direct access to hardware and all memory. The other mode is user mode, a restricted processing mode designed for applications, environment subsystems, and integral subsystems; the operating system switches application threads into privileged mode to access operating-system services.) % Privileged Time includes the time spent servicing interrupts and DPCs. A high privileged-time ratio may be caused by a large number of interrupts from a failing device. This counter shows average busy time as a portion of the sample time.

    4. % User Time: reflects CPU-consuming database operations such as sorting and executing aggregate functions. If the value is high, consider adding indexes, using simpler table joins, horizontally splitting large tables, and similar measures to bring it down. Physical Disk: Current Disk Queue Length: the value should not exceed 1.5 to 2 times the number of disks; to improve performance, add disks. SQLServer: Cache Hit Ratio: the higher the better; if it stays below 80%, consider adding memory. Note that this value accumulates from SQL Server startup, so after the system has been running for a while it no longer reflects the current state.

  47. Match literal types to column types. In a query such as select ... from salary where salary > 3000, if salary is a float column the optimizer converts the constant as Convert(float, 3000), since 3000 is an integer; we should write 3000.0 at programming time rather than wait for the DBMS to convert at run time. The same applies to character and integer data conversion.
