Write in front: The algorithm is improved Based on the Up-growth algorithm, and a new episod-utility model IV-Uber (InVestment by Utility-based Episode Rules Model) is proposed to predict stocks. This model can get better results than UP-Span and TSpan algorithms.

Discovering utility-based episode rules in complex event sequence

The sample

define

Most of the content can be referred to the UP-span algorithm, here are just a few additions:

  • User-specified Window size (WS) refers to the preset window size, namely, the time span.
  • Where =<(SEh,Th), (SEh+1,Th+1)… , (SEh + WS – 1, Th + WS – 1) > W ^ {} \ rm h = < (SE_ {\} rm h, T_} {\ rm h), \, (SE_ {\ rm h + 1}, T_ {\ rm h + 1}), \ \ ldots, \, (SE_ {\} rm h + WS – 1, T_} {\ rm h + WS – 1) > Wh = < (SEh, Th), (SEh + 1, + 1) Th… , (SEh + WS – 1, Th + WS – 1) >, in which 1 h or less or less (∣ ∣ C – W ⁣ S + 1) 1 \ \ le le h (\ mid/mid – W/C! S+1) 1≤ H ≤(∣C∣−WS+1), ∣C∣ mid C \mid∣C∣ is the time span of a complex sequence of events
  • In this paper, event length is defined as time span, which is the definition in most of the papers related to plot utility mining
  • Simultaneous concatenation is expressed as α□β \alpha \, \square \, \betaα□β. Serial concatenation is denoted as α _ β\alpha \, \cdot \, \betaα _ β
  • ET ⁣Set(α)={Te1,… Ten} ET \! Set (\ alpha) = \ lbrace T_} {\ rm e ^ 1, \ \ ldots \, T_} {\ rm e ^ {\ rm n} \ rbraceETSet (alpha) = {Te1,… Ten} (this point was not in the article before)
  • A window-based occurrence is, as the name suggests, a window larger than the time span of the event α\alphaα. There can be many of these, denoted by ∣woSet(α)∣ mid woSet(\alpha) \mid∣woSet(α)∣
  • The utility value of the complex sequence of events CUWS = ∑ I = WS + 1 ∣ u – SEi ∣ u (u – SEi, Ti) CU_ {\ rm WS} = \ sum_ + 1} {\ rm I = WS ^ {/ rm/mid u – SE_ {I} \ rm \ mid} u (u – SE_ {I} \ rm, Rm, T_ \ {\} I) CUWS = ∑ I = WS + 1 ∣ u – SEi ∣ u (u – SEi, Ti), but this why set? (Note that u− seiu-se_ {\rm I}u−SEi refers to the total utility value at a point in time, because “−-−” is a hyphen, not a minus sign for writing reasons)
  • The maximum utility is defined as MU ⁣R(X)=∑ I =1ku(uS ⁣Epi+1, PI +1)MU\! R(X) = \sum_{\rm i=1}^{\rm k}u(uS\! E_ {\ rm p_ {\ rm I + 1}}, \, p_ {I} \ rm + 1) MUR ku (X) = ∑ I = 1 (uSEpi + 1, PI + 1), the role is to provide closed down (downward closure property), namely the subset is not efficient, Then the extended set must not be efficient

algorithm

UBER-Mine

The main idea is to start from the plot of length 1, find out all efficient 1-episodes and then string and combine them to get the high-order efficient plot

MineSimuUBER

When the combined plot has other plots happening up to the time point (because the time span is added here, the algorithm gradually expands from a single time point to a time period), the two plots can be regarded as simultaneous in this time period

MineSerialUBER

The window length is a span of time, and events occurring within this span can be used as serial objects

CalculateUtility

GenUBER

The core lies in the application of up-growhth algorithm. It should be noted that in order to construct UR-tree, this algorithm will modify the preceding concatenation method to a certain extent, and make the time point as the retrieval condition to construct header-table, and then store all key information into UR-tree

conclusion

The algorithm is improved based on the up-growth algorithm, which shows that the Tree structure has strong expansibility again. By setting different indexes to build tables, all key information is stored in the Tree to achieve the purpose of saving time. Personally, I think that the preparatory work of uber-Mine algorithm for event sequence in the early stage is too long and tedious. Although the downward closed nature is also used for pruning, the operation workload will be further increased due to the complex tree-building process of up-growth algorithm itself. Is it possible to set other index variables to increase efficiency? Or do you use some other strategy that optimizes up-growth? (Some policies are based on transaction Databases, so they may not be applicable to sequence databases.)