TA-Lib is a Python library, implemented in C, that wraps many commonly used technical analysis indicators for financial trading. To make these indicators easy to calculate in DolphinDB, we reimplemented the TA-Lib indicator functions in DolphinDB script and packaged them as the DolphinDB TA module (ta.dos). The TA module requires DolphinDB Database Server 1.10.3 or later.

1. Naming and usage conventions for functions and parameters

  • In contrast to ta-lib, where all function names are uppercase and all parameter names are lowercase, the TA module uses camel case naming for both function and parameter names.

For example, the syntax of DEMA in ta-lib is DEMA(close, timeperiod=30). The corresponding function in the TA module is dema(close, timePeriod).

  • Some functions in ta-lib have optional arguments. In the TA module, all parameters are mandatory.

  • To get meaningful results, the parameter timePeriod in the TA module is required to be at least 2.

2. Usage examples

2.1 Using indicator functions directly in scripts

Apply the wma function of the TA module directly to a vector:

use ta
close = 7.2 6.97 7.08 6.74 6.49 5.9 6.26 5.9 5.35 5.63
x = wma(close, 5);
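For readers coming from TA-Lib, the following Python sketch shows what wma(close, 5) computes. It assumes TA-Lib's usual WMA definition: linear weights 1, 2, ..., timePeriod, with the most recent value weighted most heavily, and nulls at the first timePeriod - 1 positions. The name wma_sketch is hypothetical and belongs to neither the TA module nor TA-Lib.

```python
import numpy as np


def wma_sketch(close, time_period):
    """Linearly weighted moving average (weights 1..time_period).

    Assumes no NaNs in the input; the first time_period - 1 outputs are NaN
    because the window is not yet full.
    """
    x = np.asarray(close, dtype=float)
    weights = np.arange(1, time_period + 1, dtype=float)  # oldest -> newest
    out = np.full(x.size, np.nan)
    if x.size >= time_period:
        # np.convolve flips its second argument, so pass the weights reversed
        # to get sum(weights[k] * x[i - time_period + 1 + k]) for each window.
        out[time_period - 1:] = np.convolve(x, weights[::-1], mode="valid") / weights.sum()
    return out


close = [7.2, 6.97, 7.08, 6.74, 6.49, 5.9, 6.26, 5.9, 5.35, 5.63]
x = wma_sketch(close, 5)
print(x)
```

The fifth value is (1*7.2 + 2*6.97 + 3*7.08 + 4*6.74 + 5*6.49) / 15 = 6.786.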

2.2 Grouped calculation in SQL statements

Users often need to calculate an indicator separately for each group of data in a table. In the following example, we construct a table containing two stocks:

close = 7.2 6.97 7.08 6.74 6.49 5.9 6.26 5.9 5.35 5.63 3.81 3.935 4.04 3.74 3.7 3.33 3.64 3.31 2.69 2.72
date = (2020.03.02 + 0..4 join 7..11).take(20)
symbol = take(`F,10) join take(`GPRO,10)
t = table(symbol, date, close)

Use the wma function of the TA module to calculate the WMA for each stock:

update t set wma = wma(close, 5) context by symbol
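context by applies the function to each symbol's rows separately and writes the results back in the original row order, so a window never mixes data from two stocks. The Python sketch below imitates that behavior for this two-stock table. The helper names group_apply and wma_sketch are hypothetical, and the WMA is assumed to use linear weights 1..timePeriod, as in TA-Lib.

```python
import numpy as np


def wma_sketch(close, time_period):
    # Linear weights 1..time_period; the first time_period - 1 outputs are NaN.
    x = np.asarray(close, dtype=float)
    w = np.arange(1, time_period + 1, dtype=float)
    out = np.full(x.size, np.nan)
    if x.size >= time_period:
        out[time_period - 1:] = np.convolve(x, w[::-1], mode="valid") / w.sum()
    return out


def group_apply(keys, values, func):
    """Apply func to each group's values and write results back by row position."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    out = np.full(values.size, np.nan)
    for key in np.unique(keys):
        rows = np.flatnonzero(keys == key)  # row positions of this group, in table order
        out[rows] = func(values[rows])
    return out


close = [7.2, 6.97, 7.08, 6.74, 6.49, 5.9, 6.26, 5.9, 5.35, 5.63,
         3.81, 3.935, 4.04, 3.74, 3.7, 3.33, 3.64, 3.31, 2.69, 2.72]
symbol = ["F"] * 10 + ["GPRO"] * 10
wma = group_apply(symbol, close, lambda c: wma_sketch(c, 5))
print(wma)
```

The first four GPRO rows are null again, exactly as they would be if GPRO were processed on its own.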

2.3 Functions that return multiple columns

Some functions, such as bBands, return multiple columns.

Example of direct use:

high, mid, low = bBands(close, 5, 2, 2, 2);

Example of use in an SQL statement:

close = 7.2 6.97 7.08 6.74 6.49 5.9 6.26 5.9 5.35 5.63 3.81 3.935 4.04 3.74 3.7 3.33 3.64 3.31 2.69 2.72
date = (2020.03.02 + 0..4 join 7..11).take(20)
symbol = take(`F,10) join take(`GPRO,10)
t = table(symbol, date, close)
select *, bBands(close, 5, 2, 2, 2) as `high`mid`low from t context by symbol

The output is:

symbol date       close high     mid      low
------ ---------- ----- -------- -------- --------
F      2020.03.02 7.2
F      2020.03.03 6.97
F      2020.03.04 7.08
F      2020.03.05 6.74
F      2020.03.06 6.49  7.292691 6.786    6.279309
F      2020.03.09 5.9   7.294248 6.454    5.613752
F      2020.03.10 6.26  7.134406 6.328667 5.522927
F      2020.03.11 5.9   6.789441 6.130667 5.471892
F      2020.03.12 5.35  6.601667 5.828    5.054333
F      2020.03.13 5.63  6.319728 5.711333 5.102939
GPRO   2020.03.02 3.81
GPRO   2020.03.03 3.935
GPRO   2020.03.04 4.04
GPRO   2020.03.05 3.74
GPRO   2020.03.06 3.7   4.069365 3.817333 3.565302
GPRO   2020.03.09 3.33  4.133371 3.645667 3.157962
GPRO   2020.03.10 3.64  4.062941 3.609333 3.155726
GPRO   2020.03.11 3.31  3.854172 3.482667 3.111162
GPRO   2020.03.12 2.69  3.915172 3.198    2.480828
GPRO   2020.03.13 2.72  3.738386 2.993333 2.24828
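The numbers in this output are consistent with the following reading of bBands(close, 5, 2, 2, 2), which the Python sketch below assumes. The middle band is a WMA, since maType 2 denotes WMA in TA-Lib's moving-average type list. The upper and lower bands add and subtract 2 times the population standard deviation of the same 5-value window. The name bbands_sketch is hypothetical, and only the WMA case is sketched.

```python
import numpy as np


def bbands_sketch(close, time_period, nbdev_up, nbdev_dn):
    """Bollinger Bands with a WMA middle band (maType 2 only).

    Returns (high, mid, low); positions before the first full window are NaN.
    """
    x = np.asarray(close, dtype=float)
    w = np.arange(1, time_period + 1, dtype=float)  # WMA weights, oldest -> newest
    high = np.full(x.size, np.nan)
    mid = np.full(x.size, np.nan)
    low = np.full(x.size, np.nan)
    for i in range(time_period - 1, x.size):
        window = x[i - time_period + 1:i + 1]
        m = np.dot(w, window) / w.sum()  # middle band: WMA of the window
        sd = window.std()                # population standard deviation (ddof=0)
        mid[i] = m
        high[i] = m + nbdev_up * sd
        low[i] = m - nbdev_dn * sd
    return high, mid, low


close_f = [7.2, 6.97, 7.08, 6.74, 6.49, 5.9, 6.26, 5.9, 5.35, 5.63]
high, mid, low = bbands_sketch(close_f, 5, 2, 2)
print(high[4], mid[4], low[4])
```

For stock F on 2020.03.06 this reproduces 7.292691, 6.786 and 6.279309 up to rounding.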

3. Performance comparison

When used directly, the functions in the TA module are on average about as fast as their TA-Lib counterparts. For grouped calculations, however, the TA module is much faster than TA-Lib. In this section, we use the WMA function as an example for the performance comparison.

3.1 Performance comparison for direct use

In DolphinDB:

use ta
close = 7.2 6.97 7.08 6.74 6.49 5.9 6.26 5.9 5.35 5.63
close = take(close, 1000000)
timer x = wma(close, 5);

Applying the wma function of the TA module directly to a vector of length 1,000,000 takes 3 ms.

The corresponding Python code is as follows:

import time
import numpy as np
import talib

close = np.array([7.2, 6.97, 7.08, 6.74, 6.49, 5.9, 6.26, 5.9, 5.35, 5.63, 5.01, 5.01, 4.5, 4.47, 4.33])
close = np.tile(close, 100000)

start_time = time.time()
x = talib.WMA(close, 5)
print("--- %s seconds ---" % (time.time() - start_time))

TA-Lib's WMA function takes 11 ms, about 3.7 times as long as the TA module.

3.2 Performance comparison for grouped calculation

In DolphinDB, construct a table with 1,000,000 rows covering 1,000 stocks:

n = 1000000
close = rand(1.0, n)
date = take(2017.01.01 + 1..1000, n)
symbol = take(1..1000, n).sort!()
t = table(symbol, date, close)
timer update t set wma = wma(close, 5) context by symbol;

Calculating the WMA for every stock with the wma function of the TA module takes 17 ms.

The corresponding Python code is as follows:

import time
import numpy as np
import pandas as pd
import talib

close = np.random.uniform(size=1000000)
symbol = np.sort(np.tile(np.arange(1, 1001), 1000))
date = np.tile(pd.date_range('2017-01-02', '2019-09-28'), 1000)
df = pd.DataFrame(data={'symbol': symbol, 'date': date, 'close': close})

start_time = time.time()
df["wma"] = df.groupby("symbol").apply(lambda g: talib.WMA(g.close, 5)).to_numpy()
print("--- %s seconds ---" % (time.time() - start_time))

Calculating the WMA for every stock with TA-Lib's WMA function takes 535 ms, about 31.5 times as long as the TA module.

4. Vectorization

Like those in TA-Lib, all functions in the TA module are vector functions: the input is a vector and the output is a vector of the same length. TA-Lib is implemented in C and is highly efficient. The TA module is implemented in DolphinDB script, but it makes full use of built-in vectorized functions and higher-order functions to avoid loops. Of the 57 functions implemented so far, 28 are faster than their TA-Lib counterparts, the fastest by a factor of about three. The other 29 are slower than TA-Lib, but even the slowest reaches at least 1/3 of TA-Lib's speed.

The functions in the TA module are also concise. ta.dos has 765 lines in total, about 14 lines per function on average. Excluding comments, blank lines, the first and last lines of each function definition, and the routine code that skips null values at the start of the input, the core of each function is only about four lines. Browsing the source of the TA module is a good way to learn how to write efficient vectorized code in DolphinDB script.

4.1 Handling of null values

If TA-Lib's input vector starts with null values, the calculation starts from the first non-null position, and the TA module uses the same strategy. In rolling and cumulative window functions, the output is null at each initial position where the window is not yet full; here TA-Lib and the TA module agree. After that, however, a null value in the input may cause TA-Lib to return null at that position and at every position after it. In the TA module, null values do not affect the result unless a window contains too few non-null values to compute the indicator (for example, only one non-null value when calculating a variance).

// DolphinDB
close = [99.9, NULL, 84.69, 31.38, 60.9, 83.3, 97.26, 98.67]
ta::var(close, 5, 1)
// [,,,,670.417819,467.420569,539.753584,644.748976]

# TA-Lib in Python
import numpy as np
import talib
close = np.array([99.9, np.nan, 84.69, 31.38, 60.9, 83.3, 97.26, 98.67])
talib.VAR(close, 5, 1)
# array([nan, nan, nan, nan, nan, nan, nan, nan])

In the population variance calculation above, the second element of close is null, so the outputs of the TA module and TA-Lib differ: TA-Lib returns nothing but nulls. If the null is replaced with 81.11, the TA module and TA-Lib produce the same result, and adding a null before the first element 99.9 still leaves the two results identical. In short, when only the first K elements of the input are null, the TA module and TA-Lib produce exactly the same output.
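The following Python sketch reproduces the TA module's output in the example above under two stated assumptions. First, the output is null until the first full window that starts at or after the first non-null element. Second, each later window's population variance is computed over its non-null values only, and is null when fewer than two remain. It is an illustration, not the TA module's code, and the name ta_var_sketch is hypothetical.

```python
import numpy as np


def ta_var_sketch(close, time_period):
    """Null-tolerant rolling population variance (NaN stands for null)."""
    x = np.asarray(close, dtype=float)
    out = np.full(x.size, np.nan)
    non_null = np.flatnonzero(~np.isnan(x))
    if non_null.size == 0:
        return out
    b = non_null[0]                        # first non-null position
    for i in range(b + time_period - 1, x.size):
        window = x[i - time_period + 1:i + 1]
        vals = window[~np.isnan(window)]   # skip nulls inside the window
        if vals.size >= 2:                 # assumption: need at least two values
            out[i] = vals.var()            # population variance (ddof=0)
    return out


close = [99.9, np.nan, 84.69, 31.38, 60.9, 83.3, 97.26, 98.67]
result = ta_var_sketch(close, 5)
print(result)
```

The first value, 670.417819, is the population variance of the four non-null values 99.9, 84.69, 31.38 and 60.9.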

4.2 Iterative calculation

Many technical indicators are calculated iteratively: the current value depends on the previous value and the current input, i.e. r[n] = coeff * r[n-1] + input[n]. DolphinDB provides the function iterate for this type of calculation, so that no loop has to be written in script.
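The recurrence can be sketched in plain Python as follows. This is not DolphinDB's implementation. The function returns r[0], r[1], ... given an initial value r[-1] = init. The output convention (same length as the input, init itself not included) is inferred from how the ema implementation in this section appends init separately before the iterate result. The name iterate_sketch is hypothetical.

```python
def iterate_sketch(init, coeff, inputs):
    """Evaluate r[n] = coeff * r[n-1] + inputs[n] with r[-1] = init."""
    result = []
    prev = init
    for value in inputs:
        prev = coeff * prev + value  # each step reuses the previous result
        result.append(prev)
    return result


print(iterate_sketch(1.0, 0.5, [1.0, 1.0, 1.0]))  # [1.5, 1.75, 1.875]
```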

def ema(close, timePeriod) {
1	n = close.size()
2	b = ifirstNot(close)
3	start = b + timePeriod
4	if(b < 0 || start > n) return array(DOUBLE, n, n, NULL)
5	init = close.subarray(:start).avg()
6	coeff = 1 - 2.0/(timePeriod+1)
7	ret = iterate(init, coeff, close.subarray(start:)*(1 - coeff))
8	return array(DOUBLE, start - 1, n, NULL).append!(init).append!(ret)
}

Taking ema as an example: line 5 calculates the mean of the first window as the initial value of the iteration, line 6 defines the iteration coefficient, and line 7 calculates the EMA sequence with the iterate function. The built-in iterate function is very efficient. For a vector of 1,000,000 elements and a window length of 10, computing the EMA takes 7.4 ms with TA-Lib but only 5.0 ms with the TA module.
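The following Python sketch follows the same steps as the DolphinDB ema above (the comments give the corresponding line numbers), with an explicit loop standing in for iterate. It assumes NaN marks nulls and that nulls occur only at the start of the input. The name ema_sketch is hypothetical.

```python
import numpy as np


def ema_sketch(close, time_period):
    x = np.asarray(close, dtype=float)
    n = x.size                                   # line 1
    out = np.full(n, np.nan)
    non_null = np.flatnonzero(~np.isnan(x))
    if non_null.size == 0:                       # line 4: all null (b < 0)
        return out
    b = non_null[0]                              # line 2: first non-null position
    start = b + time_period                      # line 3
    if start > n:                                # line 4: not enough data
        return out
    init = np.nanmean(x[:start])                 # line 5: mean of the first window
    coeff = 1 - 2.0 / (time_period + 1)          # line 6
    out[start - 1] = init
    prev = init
    for i in range(start, n):                    # line 7: r[i] = coeff*r[i-1] + (1-coeff)*x[i]
        prev = coeff * prev + (1 - coeff) * x[i]
        out[i] = prev
    return out                                   # line 8: nulls, then init, then the recurrence


print(ema_sketch([1.0, 2.0, 3.0, 4.0, 5.0], 3))
```

With timePeriod 3 the coefficient is 0.5, so the values after the initial mean of 2.0 are 3.0 and 4.0.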

4.3 Using sliding window functions

Most technical indicators are calculated over a sliding window. DolphinDB already provides built-in functions for the basic sliding-window metrics, including mcount, mavg, msum, mmax, mmin, mimax, mimin, mmed, mpercentile, mrank, mmad, mbeta, mcorr, mcovar, mstd and mvar. These functions are well optimized, and most of them have O(n) complexity, independent of the window length. More complex sliding indicators can be built by combining or transforming these basic functions. For example, ta::var is the population variance, whereas DolphinDB's built-in mvar is the sample variance, so the result must be adjusted:

def var(close, timePeriod, nddev){
	n = close.size()
	b = close.ifirstNot()
	if(b < 0 || b + timePeriod > n) return array(DOUBLE, n, n, NULL)
	mobs = mcount(close, timePeriod)
	// population variance = sample variance * (obs - 1) / obs
	return (mvar(close, timePeriod) * (mobs - 1) \ mobs).fill!(timePeriod - 1 + 0:b, NULL)
}
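To illustrate how a sliding-window function can run in O(n) regardless of window length, here is a Python sketch that computes a moving sample variance from cumulative sums. It then applies the same (obs - 1)/obs adjustment as var above to obtain the population variance. This is one possible technique chosen for illustration, not DolphinDB's internal code. It assumes no nulls, so obs equals the window length, and both function names are hypothetical.

```python
import numpy as np


def moving_sample_var(x, window):
    """Moving sample variance in O(n) using cumulative sums (assumes no NaNs)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.size, np.nan)
    if window < 2 or x.size < window:
        return out
    c1 = np.concatenate(([0.0], np.cumsum(x)))      # prefix sums
    c2 = np.concatenate(([0.0], np.cumsum(x * x)))  # prefix sums of squares
    s1 = c1[window:] - c1[:-window]                  # sum of each full window
    s2 = c2[window:] - c2[:-window]                  # sum of squares of each full window
    out[window - 1:] = (s2 - s1 * s1 / window) / (window - 1)
    return out


def population_var(x, window):
    """Population variance = sample variance * (obs - 1) / obs, with obs == window here."""
    return moving_sample_var(x, window) * (window - 1) / window


close = [7.2, 6.97, 7.08, 6.74, 6.49, 5.9, 6.26, 5.9, 5.35, 5.63]
pop = population_var(close, 5)
print(pop)
```

Each window costs a constant number of operations, whatever its length. The plain sum-of-squares formula can lose precision when the mean is large relative to the spread, so it is only a sketch of the complexity argument.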

Now consider a more complex example, the implementation of linearreg_slope. linearreg_slope computes the beta of close against the sequence 0..(timePeriod - 1). At first glance this indicator cannot be vectorized: it seems the data in each window must be extracted and the beta computed window by window. However, the independent variable here is special. It is a fixed arithmetic sequence, so the beta of each subsequent window can be computed incrementally. Since beta(A,B) = (sumAB - sumA*sumB/obs)/varB, and varB and sumB are fixed, only the calculations of sumAB and sumA need to be optimized as the window slides. After simplification, the change in sumAB between two consecutive windows can be computed in a vectorized way (line 10). Line 12 calculates sumAB for the first window, and sumABDelta.cumsum() in line 13 then yields sumAB for all windows at once.
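The formula in line 10 can be checked with a short derivation. Write A for close and T for timePeriod. S_i denotes sumAB for the window ending at position i, which pairs A_{i-T+1}, ..., A_i with 0, ..., T-1, and M_i denotes msumA at position i:

```latex
\begin{aligned}
S_i &= \sum_{k=0}^{T-1} k\,A_{i-T+1+k}, \qquad M_i = \sum_{k=0}^{T-1} A_{i-T+1+k} \\
S_{i-1} &= \sum_{k=0}^{T-1} k\,A_{i-T+k} = \sum_{k=-1}^{T-2} (k+1)\,A_{i-T+1+k} \\
S_i - S_{i-1} &= (T-1)\,A_i - \sum_{k=0}^{T-2} A_{i-T+1+k} = (T-1)\,A_i - \left(M_{i-1} - A_{i-T}\right) \\
              &= (T-1)\,A_i + A_{i-T} - M_{i-1} \\
\beta_i &= \frac{S_i - M_i \cdot \mathrm{sumB} / \mathrm{obs}_i}{\mathrm{varB}}, \qquad
\mathrm{sumB} = \sum_{k=0}^{T-1} k, \qquad
\mathrm{varB} = \sum_{k=0}^{T-1} k^2 - \frac{\mathrm{sumB}^2}{T}
\end{aligned}
```

The last form of the difference corresponds to line 10: close.move(timePeriod) supplies A_{i-T} and msumA.prev() supplies M_{i-1}. The cumulative sum in line 13, seeded with the first window's value from line 12, then turns these differences into S_i for every window.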

def linearreg_slope(close, timePeriod){
1	n = close.size()
2	b = close.ifirstNot()
3	start = b + timePeriod
4	if(b < 0 || start > n) return array(DOUBLE, n, n, NULL)
5	x = 0 .. (timePeriod - 1)
6	sumB = sum(x).double()
7	varB = sum2(x) - sumB*sumB/timePeriod
8	obs = mcount(close, timePeriod)
9	msumA = msum(close, timePeriod)
10	sumABDelta = (timePeriod - 1) * close + close.move(timePeriod) - msumA.prev() 
11	sumABDelta[timePeriod - 1 + 0:b] = NULL
12	sumABDelta[start - 1] =  wsum(close.subarray(b:start), x)
13	return (sumABDelta.cumsum() - msumA * sumB/obs)/varB
}
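The Python sketch below transcribes the same incremental idea with NumPy and can be checked against a per-window least-squares fit from np.polyfit. It assumes the input contains no nulls, so obs equals timePeriod. The name linearreg_slope_sketch is hypothetical, and the comments map each step to the corresponding line of the DolphinDB code.

```python
import numpy as np


def linearreg_slope_sketch(close, time_period):
    """Rolling OLS slope of close against 0..time_period-1 (assumes no NaNs)."""
    a = np.asarray(close, dtype=float)
    n, t = a.size, time_period
    out = np.full(n, np.nan)
    if n < t:
        return out
    x = np.arange(t, dtype=float)
    sum_b = x.sum()                                   # line 6
    var_b = (x * x).sum() - sum_b * sum_b / t         # line 7
    c = np.concatenate(([0.0], np.cumsum(a)))
    msum_a = c[t:] - c[:-t]                           # line 9: msum_a[j] covers a[j:j+t]
    sum_ab = np.empty(n - t + 1)
    sum_ab[0] = np.dot(x, a[:t])                      # line 12: first window, directly
    i = np.arange(t, n)                               # end positions of later windows
    delta = (t - 1) * a[i] + a[i - t] - msum_a[:-1]   # line 10
    sum_ab[1:] = sum_ab[0] + np.cumsum(delta)         # line 13: cumulative sum
    out[t - 1:] = (sum_ab - msum_a * sum_b / t) / var_b
    return out


line = 3.0 * np.arange(20) + 1.0
print(linearreg_slope_sketch(line, 5))
```

Because the cumulative sum carries rounding error forward, results on very long inputs can drift slightly from a direct per-window fit.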

We calculated the linearreg_slope sequence for a vector of 1,000,000 elements. With a window length of 10, TA-Lib takes 13 ms and the TA module 14 ms, which is almost the same and no small achievement for an implementation in script. When the window length increases to 20, TA-Lib's time rises to 22 ms while the TA module still takes 14 ms. This indicates that TA-Lib loops over the windows and computes each one separately, whereas the vectorized implementation in the TA module runs in time independent of the window length.

4.4 Techniques for reducing data copying

Operations such as slicing, joining and appending vectors can copy large amounts of data, and copying often takes longer than many simple calculations. The following practical examples show some techniques for reducing data copying.

4.4.1 Using the vector view subarray to reduce data copying

Slicing a vector directly into a sub-window creates a new vector and copies the data, which costs extra memory and time. For this reason DolphinDB introduced a data structure called subarray. A subarray is a view of the original vector: it stores only a pointer to the original vector plus the start and end positions. No memory is allocated for a new vector, so no data is copied. All read-only vector operations can be applied directly to a subarray, and both the ema and linearreg_slope implementations use it. In the following example, 100 slice operations on a vector of one million elements take 62 ms, or 0.62 ms per operation. Since computing the EMA of a one-million-element vector in 4.2 takes only 5 ms, saving 0.62 ms is significant.

close = rand(1.0, 1000000)
timer(100) close[10:]
// Time elapsed: 62 ms
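NumPy offers a close analogy in Python, though it says nothing about how DolphinDB implements subarray. Basic slicing returns a view that shares memory with the original array, while .copy() allocates new memory and copies the data. One difference is that a NumPy view is writable, whereas subarray is described above only in terms of read-only operations.

```python
import numpy as np

close = np.random.default_rng(0).random(1_000_000)

view = close[10:]            # basic slicing: a view onto close's buffer, no copy
copied = close[10:].copy()   # explicit copy: allocates and fills a new buffer

print(np.shares_memory(close, view))    # True
print(np.shares_memory(close, copied))  # False
print(view.base is close)               # True
```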

4.4.2 Specifying vector capacity to avoid reallocation

When data is appended to the end of a vector and there is not enough space, a larger block of memory must be allocated, the old data copied into it, and the old memory freed. This can be time-consuming for large vectors. If the final length of a vector is known in advance, specifying it as the vector's capacity avoids this expansion. The built-in function array(dataType, [initialSize], [capacity], [defaultValue]) accepts a capacity when the vector is created. For example, line 8 of ema creates a vector with capacity n and then appends the computed results to it.
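The effect of specifying a capacity can be sketched in Python with a hypothetical growable buffer. GrowableBuffer and build_result are illustrative names, not a DolphinDB or NumPy API. The buffer tracks its size and capacity separately and counts how often it has to reallocate and copy; its doubling growth policy is an assumption chosen for illustration. The usage follows the layout of line 8 of ema: a block of nulls, then init, then the iterate results.

```python
import numpy as np


class GrowableBuffer:
    """Hypothetical float buffer with separate size and capacity."""

    def __init__(self, capacity=1):
        self._data = np.empty(max(capacity, 1))
        self.size = 0
        self.reallocations = 0

    def append(self, values):
        values = np.atleast_1d(np.asarray(values, dtype=float))
        needed = self.size + values.size
        if needed > self._data.size:
            # Out of space: allocate a larger block and copy the old contents.
            new_data = np.empty(max(needed, 2 * self._data.size))
            new_data[:self.size] = self._data[:self.size]
            self._data = new_data
            self.reallocations += 1
        self._data[self.size:needed] = values
        self.size = needed
        return self

    def to_array(self):
        return self._data[:self.size].copy()


def build_result(capacity, start, init, ret):
    # Same layout as line 8 of ema: (start - 1) nulls, then init, then ret.
    buf = GrowableBuffer(capacity)
    buf.append(np.full(start - 1, np.nan)).append(init).append(ret)
    return buf


n, start = 10, 4
ret = np.linspace(1.0, 2.0, n - start)
presized = build_result(n, start, 0.5, ret)  # capacity known in advance
grown = build_result(1, start, 0.5, ret)     # starts too small
print(presized.reallocations, grown.reallocations)  # prints: 0 3
```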

5. DolphinDB TA indicator list

Overlap Studies

Momentum Indicators

Volume Indicators

Volatility Indicators

Price Transform

Statistic Functions

Other Functions

  • TA-Lib's Math Transform and Math Operators functions can be replaced with the corresponding DolphinDB built-in functions. For example, TA-Lib's SQRT, LN and SUM functions can be replaced with DolphinDB's sqrt, log and msum.
  • The following TA-Lib functions are not yet implemented in the TA module: all Pattern Recognition and Cycle Indicators functions, HT_TRENDLINE (Hilbert Transform - Instantaneous Trendline), ADOSC (Chaikin A/D Oscillator), MAMA (MESA Adaptive Moving Average), SAR (Parabolic SAR) and SAREXT (Parabolic SAR - Extended).

6. Roadmap

  • The indicator functions that are not yet implemented will be added in the next version, expected to be completed in April 2020.
  • DolphinDB user-defined functions currently do not support default parameter values or keyword arguments in function calls. Both will be supported in DolphinDB Server 1.20.0, at which point the TA module functions will have the same default parameters as TA-Lib.
  • The TA module must currently be loaded with use ta, which is inconvenient in interactive queries. DolphinDB Server 1.20 will allow modules to be preloaded during system initialization, so that TA module functions have the same status as built-in functions and no longer need to be loaded explicitly.