Chapter 3 Optimization Table (2)

Adjust the table to calculate values

Tuning table operations calculate and set table statistics based on representative data in the table:

  • ExtentSize, which may or may not be the actual number of rows in the table (row count).
  • The selectivity of each property (field) in the table.

You can selectively prevent selective evaluation of individual attributes.

  • The outlier selectivity of attributes where one value occurs more commonly than the others.

Effective queries can be optimized using outliers.

  • A comment for each attribute that identifies some attribute characteristics.
  • Average field size per property.
  • Table of SQLMap Name,BlockCountandSource of BlockCount.

Extent size and line count

When you run the Tune Table tool from the administrative portal, ExtentSize is the actual count of the current row in the Table. By default, the GatherTableStats() method also uses the actual number of rows as ExtentSize. When a table contains a large number of rows, it is best to perform analysis on a small number of rows. You can use the SQL Tune table command and specify %SAMPLE_PERCENT to perform analysis on only a percentage of the total line. You can use this option to improve performance when running against a table with a large number of rows. This %SAMPLE_PERCENT value should be large enough to sample representative data. If ExtentSize<1000, TUNE TABLE analyzes all rows regardless of the %SAMPLE_PERCENT value.

The specified ExtentSize can be smaller or larger than the actual number of rows. However, ExtentSize should not significantly exceed the actual number of rows in the current table data. When ExtentSize is specified, the Tuning Table implements ids for this number of rows and then performs sampling. If ExtentSize greatly exceeds the actual number of rows, most sampled row ids will not correspond to the actual row data. If this is the case, field selectivity cannot be calculated; Instead, Tune Table lists the specified ExtentSize as the calculated ExtentSize and a smaller number as the SAMPLESIZE; Tune Table returns

for these non-existent computed values.

You can set ExtentSize to 0. You may need to do this if the table is never intended to populate data but is used for other purposes, such as query joins. When ExtentSize is set to 0, InterSystems IRIS sets the selectivity for each field to 100% and the average field size for each field to 0.

Selectivity and outlier selectivity

Tune Table calculates the selectivity of each attribute (field) value as a percentage. It does this by sampling the data, so selectivity is always an estimate rather than an exact value. Selectivity is based on the assumption that all attribute values are (or may be) equally likely. This is a reasonable assumption for most of the data. For example, in a general population table, most data values are typical: any particular birth date will appear in about 0.27% of the data (1 in 365); About half are women and half are men (50%). Fields defined as Unique have a selectivity of 1(should not be confused with a selectivity of 1.0000(1%)). For most properties, the selectivity percentage is sufficient.

For some attributes, Tune Table also calculates outlier selectivity. This is the percentage of a single attribute value, which appears more frequently in the example than other data values. The tuning table returns outlier selectivity only if the frequency of one data value is significantly different from the frequency of other data values. Tune Table returns at most one outlier for the Table, regardless of the distribution of data values. If an outlier is selected, the tuning table displays that value as an outlier. NULL means < NULL >.

If TuneTable returns outlier selectivity, the normal selectivity is still the percentage of each non-outlier data value in the entire rowset. For example, if 11 different values are detected out of 1000 randomly selected values, and one of them is an outlier, the selectivity is 1 in 11(9.09%) : the average probability of each item appearing is 1 in 11. If the outlier selectivity is 80% and the regular selectivity is 1%, about 20((1-0.80)/0.01) additional non-outliers can be found in addition to outliers.

If the optimization table initial sampling returns only a single value, but additional sampling returns multiple different values, these sampling results modify the normal selectivity. For example, an initial random sampling of 990 values detects only one value, but subsequent sampling detects 10 single instances of other different values. In this case, the initial outlier affects the selectivity value, which is now set to 1/1000(0.1%) because each of the 10 non-outliers occurs only once in 1000 records.

The most common example of outlier selectivity is a property that allows NULL. If the number of records with NULL for a feature significantly exceeds the number of records with any particular data value for that feature, NULL is an outlier. The following is the FavoriteColors field selectivity and outlier selectivity:

SELECTIVITY of FIELD FavoriteColors
   CURRENT =     1.8966%
   CALCULATED =  1.4405%
   CURRENT OUTLIER = 45.0000%, VALUE = <Null>
   CALCULATED OUTLIER = 39.5000%, VALUE = <Null> 
Copy the code

If a field contains only one different value (all rows have the same value), the selectivity of that field is 100%. Values with 100% selectivity are not considered outliers. Tuning tables sample data to establish selectivity and outlier selection values. To determine this, the preference table first tests a small number or a few records, and if they all have the same field value, it will test up to 100,000 randomly selected records to support the assumption that all values of non-indexed fields are the same. Only if the field is indexed, the field is the first field in the index, and the field and the index have the same collation type can the optimization table fully determine whether all values of the field are the same.

  • If unindexed fields are known to have been tested100000Bar other values that may not be detected in randomly selected records should be manually set for selectivity and outlier selectivity.
  • If a non-indexed field is known to have no other value, you can specify it manually100%The selectivity, remove any outlier selectivity, and setCALCSELECTIVITY=0To prevent the preference table from trying to calculate selectivity or specifying this value as an outlier.

To modify these selectivity, outlier selectivity, and outlier calculation values, select a single field from the tuning table display. These values for the field are displayed in the details area on the right side of the display. The selectivity, outlier selectivity, and/or outliers can be modified to better fit the expected complete data set.

  • Selectivity can be specified with a percent sign (%), can also be specified as integer rows (without a percent sign). If the number of rows is specified as an integer, InterSystems IRIS will use the range size to calculate the selectivity percentage.
  • You can specify outlier selectivity and outliers for fields that previously had no outliers. Optionally specify outliers with a percent sign (%). If only outlier selectivity is specified, the Tune Table assumes that the outlier is<Null>. If only an outlier is specified, the tuning table will not hold this value unless outlier selectivity is also specified.

The CALCSELECTIVITY parameter does not calculate selectivity

In some cases, you may not want to optimize the selectivity of the table tool’s calculation properties. To prevent calculation selectivity, specify the value of the attribute’s CALCSELECTIVITY parameter to 0(the default is 1). In Studio, CALCSELECTIVITY can be set on the Property Parameters page of the New Property Wizard, or in the property parameter list in the inspector (you may need to shrink and reexpand the property parameter list to display it).

One case where CALCSELECTIVITY=0 should be specified is if the field is not indexed, then it is known to contain only one value in all rows (selectivity =100%).

Optimization of outliers

By default, the query optimizer assumes that queries do not select outliers. For example, a query typically selects a specific field value and returns a small number of records from the database, rather than a large number of records where the field value is an outlier. The query optimizer always uses selectivity to construct the query plan, except when performing operations that require consideration of outlier selectivity.

Depending on the selected outlier, the following actions can be performed to adjust query optimization:

  • If the outlier is<null>, is queryingWHEREClause to specify one for this fieldis nulloris NOT nullConditions.

This causes the query optimizer to use outlier selectivity when constructing the query.

  • If the outlier is a data value, the query optimizer assumes that the selected field value is not an outlier.

For example, an employee record for a company headquartered in Massachusetts might have an outlier for the Office_State field MA (Massachusetts). The optimizer assumes that the query will not select ‘MA’ because this will return most records in the database. However, if you are writing a query to select an outlier, you can notify the optimizer by wrapping the outlier in double parentheses. When querying on this field, specify a WHERE clause as follows :WHERE Office_State=((‘MA’)). This technique inhibits literal substitution and forces the query optimizer to use outlier selectivity when building query plans. This syntax is required for dynamic SQL queries, as well as queries written outside of InterSystems IRIS provided using ODBC/JDBC. This is not required for class queries, embedded SQL queries, or queries accessed through views.

  • Configure system-wide optimization queries based on parameter value SQL Settings.

This option sets up the appropriate combination of run-time plan selection (RTPC) optimization and deviation query as outlier (BQO) optimization for outliers. Note that changing this configuration setting clears all cache queries in all namespaces. Using the management portal, select System Administration, Configuration, SQL and Object Settings, SQL to view and change this option. The options available are:

  • Assume that the query parameter values are not field outliers (BQO=OFF.RTPC=OFF, initial default)
  • Assume that query parameter values often match field outliers (BQO=ON.RTPC=OFF)
  • Tuning actual query parameter values at run time (BQO=OFF.RTPC=ON)

To determine the current Settings, call $SYSTEM. SQL. CurrentSettings ().

  • Override the system-wide configuration Settings for the query.

You can override the RTPC of a single query by specifying the %NORUNTIME RESTRICT keyword. SELECT Name,HaveContactInfo FROM t1 WHERE HaveContactInfo=? SELECT %NORUNTIME Name,HaveContactInfo FROM t1 WHERE HaveContactInfo=? The RTPC is overridden to produce a standard query plan.

You can override biased queries as outliers for a single query by specifying the comment option /*#OPTIONS {“BiasAsOutlier”:1} */.

Note column

The PORTAL Optimization table information option displays a comment column for each field. The values in this field are system defined and cannot be modified. They include the following:

  • RowIDFields: A table has oneRowIDIs defined by the system. Its name is usually ID, but it can have different system-assigned names. Since all of its values are (by definition) unique, its selectivity is always 1. If the class definition includesSqlRowIdPrivate,NotesThe column values forRowIDFields,HiddenField.
  • Hidden fields: The hidden fields are defined as private,SELECT*Don’t show. By default,CREATE TABLEwillRowIDThe field is defined to be hidden; You can specify%PUBLICROWIDKeyword to enableRowIDNot hidden and open. By default, tables defined by persistent class definitions willRowIDDefined as not hidden; You can specifySqlRowIdPrivatewillRowIDDefined as hidden and private. The container field is defined to be hidden.
  • Stream field: Represents a field defined using a stream data type, which can be a character stream (CLOB), or a binary stream (BLOB). The stream file has no average field size.
  • Parent reference field: field that references the parent table.

An identity field, ROWVERSION field, sequence field, or UNIQUEIDENTIFIER(GUID) field is not identified in the comment column.

Average field size

Run the tuning table to calculate the average field size (in characters) for all non-streamed fields based on the current table data set. This is the same as AVG($length(Field)) (unless otherwise noted), rounded to two decimal places. You can change the average field size of individual fields to reflect the expected average size of field data.

  • NULL:$LENGTHFunction willNULLField is treated as having length 0, so 0 is treatedNULLFields are averaged. This can cause the average field size to be less than one character.
  • Empty column: If the column contains no data (all rows have no field value), the average field size value is 1, not 0. For columns that do not contain data,AVG($length(Field))To zero.
  • ExtentSize=0Will:ExtentSizeWhen set to 0, the average field size for all fields is reset to 0.
  • Logical field values: The average field size is always calculated based on the logical (internal) value of the field.
  • List fields: InterSystems IRIS List fields are calculated based on their logical (internal) code values. This encoding length is greater than the total length of the elements in the list.
  • Container field: The container field of a collection is greater than the total length of its collection objects. For example, in theSample.Person,HomeContainer fieldAverage FieldThe Size is larger thanHome_Street,Home_City,Home_StateandHome_ZipThe sum of the average field size.
  • Stream field: A stream field does not have an average field size.

If the attribute parameter CALCSELECTIVITY is set to 0, the tuning table does not calculate the average field size for the attribute/field.

You can modify the average field size calculation by selecting a single field from the tuning table display. This displays the value of the field in the details area on the right side of the display. You can change the Average field size to a value that better fits the expected full data set. Since the optimization table does not perform validation when this value is set, you should ensure that the field is not a stream field and that the value specified is not greater than the maximum field size (MaxLen).

The average field size is also shown in the Management Portal Directory Details TAB field options table. The tuning table must have been run for the field options table to display the average field size values.

Map BlockCount TAB

The Tuning table Map BlockCount TAB shows the SQL Map name, BlockCount(as a positive integer), and source of the BlockCount. The source of the block count can be defined in the class definition, estimated by the class compiler, or measured by TuneTable. Run tuning table changes estimated by the class compiler to TuneTable measurements; It does not affect the values defined in the class definition.

You can modify the BlockCount calculation value by selecting a single SQL map name from the tuning table display. This displays the block count for the map name in the details area on the right side of the display. The block count can be modified to a value that better fits the expected full data set. Because Tune Table does not perform validation when setting this value, you should ensure that the block count is a valid value. Modifying BlockCount changes the source of the BlockCount to that defined in the class definition.

Export and re-import tuning table statistics

You can export tuning table statistics from a table or set of tables, and then import these tuning table statistics into a table or set of tables. Here are three situations in which you might want to perform this export/import. (For simplicity, these describe exporting/importing statistics from a single table; In practice, statistics are usually exported/imported from multiple interrelated tables):

  • Model production systems: Production tables are completely populated with actual data and usedTune tableOptimize.

In the test environment, the tables were created with the same table definition, but with much less data. Production tables can be optimized for modeling on test tables by exporting tuning table statistics from production tables and importing them into test tables.

  • To replicate the production system: The production Table is fully populated with real data and optimized using tune Table. A second production table with the same table definition will be created. (For example, a production environment with its backup environment, or multiple identical table definitions, each containing patient records from a different hospital.) . By exporting tuning table statistics from the first table and importing it into the second table, you can provide the same optimizations for the second table as for the first, without having to run the tuning table a second time or wait for the second table to fill in representative data.
  • To revert to the previous set of statistics: You can create optimized statistics for the Table by running tune Table or explicitly setting statistics. By exporting these statistics, you can keep them when you try other statistics Settings. Once the optimal set of statistics has been determined, they can be imported back into the table.

Can use $SYSTEM. SQL. Stats. Table. The Export () method will be tuning Table statistics Export to XML file. This method can export optimized table statistics for one, more, or all tables in a namespace, as shown in the following example:

  DO $SYSTEM.SQL.Stats.Table.Export("C:\AllStats.xml")
  /* Export TuneTable statistics for all schemas/tables in the current namespace */
Copy the code
  DO $SYSTEM.SQL.Stats.Table.Export("C:\SampleStats.xml"."Sample")
  /* Export tunable statistics for all tables in the Sample schema */
Copy the code
  DO $SYSTEM.SQL.Stats.Table.Export("C:\SamplePStats.xml"."Sample"."P*")
  /* Export adjustable statistics for all tables in the Sample schema that start with the letter "P" */
Copy the code
  DO $SYSTEM.SQL.Stats.Table.Export("C:\SamplePersonStats.xml"."Sample"."Person")
  /* Export the adjustable statistics for Sample Person */
Copy the code

Can use $SYSTEM. SQL. Stats. Table. The Import () method to Import use $SYSTEM. SQL. Stats. Table. The Import export () method of tuning Table statistics.

$SYSTEM. SQL. Stats. Table. The Import () there is a KeepClassUpToDate Boolean options. If it is true (and the update is true), $SYSTEM. SQL. Stats. Table. The Import () will update with new EXTENTSIZE and selective values class definition, but class definition will be kept up to date. However, in many cases, it is best to recompile the class after tuning the class table so that the queries in the class definition can be recompiled and the SQL query optimizer can use the updated data statistics. The default is FALSE(0). Note that if the class is deployed, the class definition is not updated.

$SYSTEM. SQL. Stats. Table. The Import () there is a ClearCurrentStats Boolean options. $SYSTEM. If it is TRUE, SQL. Stats. Table. The Import () before the Import statistics from the existing Table clear all previous section size, selective, piece count and other tuning Table statistics. You can use this method if you want to completely clear table states that are not specified in the import file, rather than having them defined in the persistent class of the table. The default is FALSE(0).

If $SYSTEM. SQL. Stats. Table. The Import () didn’t find the corresponding Table, it will skip the form and continue to Import file specified in the next Table. If a table is found, but some fields are not, those fields will be skipped.

BlockCount mapped in the class store definition cannot be inherited. BlockCount can only appear in the storage definition of the class from which the map originated. If the map is derived from a superclass, $SYSTEM. SQL. Stats. Table. The Import () only set the projection Table BlockCount metadata, rather than setting the class stored BlockCount metadata.