Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

Chapter 29 SQL Commands DISTINCT

Specifies a SELECT clause that returns only different values.

The outline

SELECT [DISTINCT [BY (item {,item2})] ]  |  [ALL]
  select-item {,select-item2}
Copy the code

parameter

  • DISTINCT– Optional – Returns a unique line with the value of the combination option.
  • DISTINCT BY (item {,item2})– Optional – Returns the value of the selection item of the row that is unique by value.
  • ALL– Optional – Returns all rows in the result set. Default Settings.

describe

The optional DISTINCT clause appears after the SELECT keyword and before the optional TOP clause and the first select-item.

The DISTINCT clause applies to the result set of a SELECT statement. It limits the number of rows returned per different (unique) value to one arbitrary row. If a DISTINCT clause is not specified, all rows that meet the selection criteria are displayed by default. The ALL clause is the same as the DEFAULT clause without specifying it; If you specify ALL, SELECT returns ALL rows in the table that meet the selection criteria.

DISTINCT clauses have two forms:

  • SELECT DISTINCT: Returns one row for each unique combination of the selection values. You can specify one or more options. For example, the following query returns a row containingHome_StateandAgeOf each unique combination of valuesHome_StateandAgeValue:
SELECT DISTINCT Home_State,Age FROM Sample.Person
Copy the code
  • SELECT DISTINCT BY(Item): Returns one row for each unique combination of item values. You can specify a single item or a comma-separated list of items. The specified item or list of items must be enclosed in parentheses. Spaces can be specified or omitted between the BY keyword and parentheses. The selection list can (but does not have to) include the specified items. For example, the following query returns a row containingHome_StateandAgeOf each unique combination of valuesNameandAgeValue:
SELECT DISTINCT BY (Home_State,Age) Name,Age FROM Sample.Person
Copy the code

Project fields must be specified by column name. Valid values include the following: column name (DISTINCT BY(City)); %ID(returns all rows); A scalar function that specifies column names (DISTINCT BY(ROUND(Age, -1); Specify a sorting function (DISTINCT BY(%Exact(City)) for column names. Cannot specify fields by column alias; Attempting to do so generates a SQLCODE-29 error. Cannot specify fields by column number; This is interpreted as text and a line is returned. Specifying a literal as an item value in a DISTINCT clause returns 1 line; Which row to return is uncertain. Therefore, specifying 7, ‘Chicago’, ‘ ‘, 0, or NULL all returns 1 row. However, if you specify a literal as an item value in a comma-separated list, the literal is ignored, and DISTINCT selects a row for each unique combination of the specified field name.

The DISTINCT clause is applied before the TOP clause. If both are specified, SELECT returns only rows with a unique value, which is the number of uniquely valued rows specified in the TOP clause.

If the column specified in the DISTINCT clause contains NULL(no value) rows, DISTINCT returns a NULL row as a DISTINCT(unique) value, as shown in the following example:

SELECT DISTINCT FavoriteColors FROM Sample.Person
Copy the code
SELECT DISTINCT BY (FavoriteColors) Name,FavoriteColors FROM Sample.Person
ORDER BY FavoriteColors
Copy the code

The DISTINCT clause makes no sense in simple embedded SQL queries, because in this type of embedded SQL, SELECT always returns only one row of data. However, embedded SQL cursor based queries can return multiple rows of data; In a cursor based query, the DISTINCT clause returns only unique value rows.

DISTINCT and ORDER BY

The DISTINCT clause is applied before the ORDER BY clause. Therefore, the combination of DISTINCT and ORDER BY first selects any row that satisfies the DISTINCT clause, and then sorts those rows according to the ORDER BY clause.

DISTINCT and GROUP BY

The DISTINCT and GROUP BY records are grouped BY a specified field (or fields) and return a record for each unique value of that field. One important difference between them is that DISTINCT evaluates the aggregation function before grouping. GROUP BY computes the aggregation function after grouping. The following example shows the difference:

SELECT DISTINCT BY (ROUND(Age,- 1)) Age,AVG(Age) AS AvgAge FROM Sample.Person
 /* Avg(Age) returns the average of all ages in the table */
Copy the code
SELECT Age,AVG(Age) AS AvgAge FROM Sample.Person GROUP BY ROUND(Age,- 1)
 /* Avg(Age) returns the average Age of each Age group */
Copy the code

The DISTINCT clause can be specified with one or more aggregate function fields, although this is rarely meaningful because aggregate functions return a single value. Therefore, the following example returns a single line:

SELECT DISTINCT BY (AVG(Age)) Name,Age,AVG(Age) FROM Sample.Person
Copy the code

Note: If you use the DISTINCT clause of an aggregate function as a unique item or option with the GROUP BY clause, the DISTINCT clause is ignored. Subqueries can be used to achieve the desired combination of DISTINCT, aggregate functions, and GROUP BY.

Letter case with DISTINCT optimization

Group string values differently based on the collation type defined for the field. By default, string data type fields are defined using the SQLUPPER collation, which is case insensitive.

If the field/property collation type is SQLUPPER, grouped field values are returned in all uppercase letters. To group values by original alphabetic case, or to display the return values of grouped fields in original alphabetic case, use the %Exact collation function. This is shown in the following examples, which assume that the Home_City field is defined using the collation type SQLUPPER and contains the values’ New York ‘and’ New York ‘:

SELECT DISTINCT BY (Home_City) Name,Home_City FROM Sample.Person
/* Combining Home_City values with their uppercase values returns the name of each grouped city in uppercase letters. So, return 'New York'. */
Copy the code
SELECT DISTINCT BY (Home_City) Name,%EXACT(Home_City) FROM Sample.Person
/* Combining the Home_City value with its uppercase value returns the name of the city in each group (original letter case). Therefore, you can return 'New York' or 'New York', but not both. * /
Copy the code
SELECT DISTINCT BY (%EXACT(Home_City)) Name,Home_City FROM Sample.Person
/* Combining Home_City values by their original letter case returns the name of each group's city (original letter case). So both 'New York' and 'New York' return. No optimizations are used. */
Copy the code

You can use the management portal to optimize query performance for queries that contain DISTINCT clauses. Select System administration, configuration, SQL and object Settings, SQL. Viewing and editing GROUP BY and DISTINCT queries must generate raw value options. (This optimization also applies to the GROUP BY clause.) . The default value is No.

This default setting groups letter values by uppercase sort for letter values. This optimization makes use of the index of the selected field. Therefore, it only makes sense if one or more selected fields have indexes. It sorts the field values stored in the index; The alphabetic string is returned in all uppercase letters. You can set this system-wide option and then use the %exact collation function to override it for a particular query to preserve case.

You can also use $SYSTEM. SQL. Util. SetOption () method is rapid to distinguish the option within the scope of the SYSTEM is set up this option. To determine the current Settings, please call $SYSTEM. SQL. CurrentSettings (), it shows open different optimization Settings; The default value is 1.

Other uses of DISTINCT

  • Flow field:DISTINCTConvection fieldOIDDo not operate on its actual data. Because of all the stream fieldsOIDAre all unique values, soDISTINCTThere is no effect on the actual stream field duplicate data values.DISTINCT BY(StreamField)Reduces the number of records with an empty stream field to one empty record.
  • Asterisk syntax:DISTINCT*The syntax is legal, but meaningless, because by definition all rows contain some different unique identifier. Different from (*) syntax is illegal.
  • Subquery: Used in subqueriesDISTINCTThe clause is legal, but meaningless, because the subquery returns a single value.
  • Unselected row data:DISTINCTClauses can be associated with those that do not access any table dataSELECTUse together. ifSELECTcontainsFROMClause, specified on a lineDISTINCTThe result will contain these non-table values; If not specifiedDISTINCT(orTOP),SELECTWill produce withFROMThe number of rows in a clause table with the same number of rows. ifSELECTDoes not containFROMClause, it isDISTINCTIt’s legal, but it doesn’t make sense.
  • Aggregate functions: Can be used in aggregate functionsDISTINCTClause to select only the different (unique) field values to be included in the aggregation. withSELECT DISTINCTClause is different in aggregate functionDISTINCTDo not includeNULLAs aDISTINCT(Unique) value. Please note that,MAXandMINAggregation function analysisDISTINCTClause syntax has no errors, but this syntax does nothing.

DISTINCT and % ROWID

Specifying the DISTINCT keyword causes a cursor based embedded SQL query to leave the %ROWID variable unset. Even though DISTINCT does not limit the number of rows returned, it does not set %ROWID. The following example shows this:

ClassMethod Distinct(a)
{
	s %ROWID = 999
	&sql(
		DECLARE EmpCursor3 CURSOR FOR 
			SELECT DISTINCT Name, Home_State
			INTO :name,:state FROM Sample.Person
			WHERE Home_State %STARTSWITH 'M'
	)
	&sql(
		OPEN EmpCursor3
	)
	q:(SQLCODE '= 0) for { &sql( FETCH EmpCursor3 ) q:SQLCODE w ! ,"RowID: ",%ROWID," row count: ",%ROWCOUNT w " Name=",name," State=",state } &sql( CLOSE EmpCursor3 ) }Copy the code

This change in query behavior only applies to embedded SQL SELECT queries based on cursors. Dynamic SQL SELECT queries and non-cursor embedded SQL SELECT queries are never set to %ROWID.

DISTINCT and transaction processing

Specifying the DISTINCT keyword causes the query to retrieve all current data, including data that has not yet been committed by the current transaction. Ignore the READ COMMITTED isolation mode parameter of a transaction (if set); Retrieves all data in READ UNCOMMITTED mode.

The sample

The following query returns one row for each different Home_State value:

SELECT DISTINCT Home_State FROM Sample.Person
ORDER BY Home_State
Copy the code

The following query returns one row for each different Home_State value, but returns the other fields of that row. It is impossible to predict which line is retrieved:

SELECT DISTINCT BY (Home_State) %ID,Name,Home_State,Office_State FROM Sample.Person
ORDER BY Home_State
Copy the code

The following query returns one row for each different combination of Home_State and Office_State values. Depending on the data, it will either return more rows or the same number of rows as in the previous example:

SELECT DISTINCT BY (Home_State,Office_State) %ID,Name,Home_State,Office_State FROM Sample.Person
ORDER BY Home_State,Office_State
Copy the code

The following query returns a single line for each DISTINCT name length using DISTINCT BY:

SELECT DISTINCT BY ($LENGTH(Name)) Name,$LENGTH(Name) AS lname
FROM Sample.Person
ORDER BY lname
Copy the code

The following query uses DISTINCT BY to return a row for each DISTINCT first element of the FavoriteColors %List value. It lists the different peers whose FavoriteColors are empty:

SELECT DISTINCT BY ($LIST(FavoriteColors,1)) Name,FavoriteColors,$LIST(FavoriteColors,1) AS FirstColor
FROM Sample.Person
Copy the code

The following query returns the top 20 different Home_State values retrieved from sample.person in ascending order by sorting order. The “top” row reflects the ORDER BY clause ordering of all rows in Sample.Person.

SELECT DISTINCT TOP 20 Home_State FROM Sample.Person ORDER BY Home_State
Copy the code

The following queries use DISTINCT in both the main query and the WHERE subsentence query. It returns the first 20 different Home_State values in sample. Person, which are also in sample.employee. If a subquery DISTINCT is not provided, it retrieves a DISTINCT Home_State value in sample.person that matches the randomly selected Home_State value in sample.employee:

SELECT DISTINCT TOP 20 Home_State FROM Sample.Person 
WHERE Home_State IN(SELECT DISTINCT TOP 20 Home_State FROM Sample.Employee)
ORDER BY Home_State
Copy the code

The following query returns the top 20 FavoriteColore values. This reflects the ORDER BY clause ordering of all rows in Sample.Person. As we all know, the FavoriteColors field has NULL, so the different rows whose FavoriteColors are NULL appear at the top of the collation sequence.

SELECT DISTINCT BY (FavoriteColors) TOP 20 FavoriteColors,Name FROM Sample.Person 
      ORDER BY FavoriteColors
Copy the code

Also note that in the previous example, because FavoriteColors is a list field, the categorization sequence includes element length bytes. Therefore, different list values starting with a three-letter element (red) are listed before list values starting with a four-letter element (blue).