UMEpi algorithm
Write above: This algorithm is the first one that can fully mine all efficient plots. It is slightly different from upspan in definition, but it is this little difference that leads to a big change in the accuracy of mining results
UtilityDriven Mining of High Utility Episodes
The sample
The utility value table
Complex sequence of events
define
Most of the basic definitions can refer to the UPspan algorithm, and the following are some supplementary and key definitions.

The Occurrence period is defined in the same way as in the upspan, which is added here. A complex plot of alpha = } {A, D  B \ alpha = \ big \ lbrace A, D \ \ rbrace \ rightarrow B big alpha = ⟨ ⟩ {} A, D, B, It is called complex because it contains A set of parallel occurrences ({A,D}\lbrace A,D \rbrace{A,D}) and A set of serial occurrences ({A,D},{B}\lbrace A,D \rbrace, \lbrace B \rbrace{A,D},{B}), Period of time the incident set as occSet (alpha) occSet (\ alpha) occSet (alpha) = {(T1, T2), [T1, T5], [T1, T6], [T3 and T5], [T3, T6]} \ lbrace [T_1, T_2], [T_1, T_5], [T_1, T_6], [T_3, T_5], [T_3, T_6] \ rbrace {[T1, T2], [T1, T5], [T1, T6], [T3 and T5], [T3, T6]}, it is important to note that Events AAA and DDD must occur simultaneously (i.e., at the same point in time) or they are serial events, which is inconsistent with the original scenario α\alphaα

The Maximum time duration is similar to the sliding window in the early plot mining algorithm. It is a time constraint (MTD) that needs to be specified by the user, so as to avoid the increase of calculation amount and the decrease of correlation between events due to the long time span. For any event α\alphaα, the time span Te−Ts≤MTDT_{\rm e} t_ {\rm s} \le MTDTe−Ts≤MTD, it is obvious that the minimum occurrence time span Mo (α) Mo (\alpha) Mo (α) always satisfies this inequality.
Ps. It should be noted here that the definition in upspan is Te−Ts+1≤MTDT_{\rm e} t_ {\rm s} +1 \le MTDTe−Ts+1≤MTD. The reason why the UMEpi algorithm is designed this way is to provide a more accurate boundary and to be easier to understand

There are two formulas for calculating episoderweighted utility:
 The UP  Span: The EWU (alpha, mo (alpha)) = ∑ I = 1 k  1 u (Ei, Ti) + ∑ I = TeTs + MTDu (SEi, Ti) EWU (\ alpha, mo (\ alpha)) = \ sum_} {\ rm I = 1 ^ {} \ rm k  1 u (E_ {I} \ rm, T_ {I} \ rm) + \ sum_ {\ rm I = T_e} ^ + MTD} {T_s u (SE_ {I} \ rm, T_ {I} \ rm) EWU (alpha, mo (alpha)) = ∑ I = 1 k  1 u (Ei, Ti) + ∑ I = TeTs + MTDu (SEi, Ti)
 TSpan: The EWU (alpha, mo (alpha)) = ∑ I = 1 ku (Ei, Ti) + ∑ I = TeTs + MTDu (SEi, Ti) EWU (\ alpha, mo (\ alpha)) = \ sum_} {\ rm I = 1 ^ {} \ rm k u (E_ {I} \ rm, T_ {I} \ rm) + \ sum_ {\ rm I = T_e} ^ + MTD} {T_s u (SE_ {I} \ rm, T_ {I} \ rm) EWU (alpha, mo (alpha)) = ∑ I = 1 ku (Ei, Ti) + ∑ I = TeTs + MTDu (SEi, Ti)
The alpha = (E1),... , (Ek), Ti (1 I k or less) or less \ alpha = \ big (E_1), \ ldots, (E_} {\ rm k) \ big and \; T_{\rm I}(1 \le I \le k)α=⟨(E1),... ,(Ek)⟩,Ti(1≤ I ≤ K), TSpan doublecounting event sets at point in time TeT_{\rm e}Te, resulting in inaccuracy of EWU, for example:
Set of alpha = } {A, D  B , mo (alpha) = \ alpha = \ [T1, T2] big \ lbrace A, D \ \ rbrace \ rightarrow B big and \, mo (\ alpha) = [T_1, T_2]α=⟨{A,D}→B⟩,mo(α)=[T1,T2], the following calculation can be obtained
$UPSpan: EWU(\alpha, mo(\alpha)) = u(\lbrace A,D \rbrace, T_1)+u(\lbrace B,C \rbrace, T_2)+u(\lbrace A,D,E \rbrace, T_3) \\ TSpan: EWU(\alpha, mo(\alpha)) = u(\lbrace A,D \rbrace, T_1)+u(B, T_2)+u(\lbrace B,C \rbrace, T_2)+u(\lbrace A,D,E \rbrace, T_3)$It is obvious that the event BBB occurring at the time point T2T_2T2 is calculated repeatedly, which leads to the larger calculation of the final result. UMEpi algorithm adopts the calculation method of UPspan
It should be noted that EWU is different from TWU and SWU in that it can only be seen as local pruning. Instead of the other two which can be pruned in place from the beginning (because of occurrence and expansion modes) for example MTD=2, minUtil=50%, TU=94MTD =2, \; minUtil = 50\%, \; TU = 94 MTD = 2, minUtil = 50%, TU = 94 plots of alpha = A, B, D \ alpha = \ big A, B, D \ big alpha = ⟨ A, B, D ⟩ EWU (alpha) EWU (\ alpha) EWU (alpha) = $24 an inefficient use, should be pruned, But its extension plot
\big →b→{a,b,d}⟨E→B→{A,B,D}⟩($48), B {D, E}} {A, B, D \ big \ lbrace D, E \ rbrace \ \ rightarrow \ lbrace rightarrow B A, B, D \ rbrace \ big ⟨ B {D, E} {A, B, D} ⟩ ($51) It's an efficient plot. Therefore, in the plot utility mining, there is a phenomenon that "inefficient plot can be expanded into efficient plot", which is also the difficulty in the mining process 
(Remaining Utility of an episode)ru(α,Ti)=∑e 'hosted α∧α≺e' u(e ',Ti)ru(\alpha, T_{\rm i}) = \sum_{\rm e^\prime \not\in \alpha \land \alpha \prec e^\prime}u(e^\prime, T_{\rm I})ru(α,Ti)=∑e '∈α∧α≺e' u(e ',Ti) where e 'e^\primee' is a event after the event α\alphaα on the time point TiT_{\rm I}Ti, which is the key of EWU calculation, Since EWU can only be counted as a boundary value if residual scenario utility (a potentially extensible set of events) is taken into account, Moreover, it should be noted that the time span of calculating the sum of residual utility of a certain plot of Z in a complex sequence of events is Te+1≤Ti≤Ts+MTDT_{\rm e}+1 \le T_{\rm I} \le T_{\rm s} + MTDTe+1≤Ti≤Ts+MTD
According to the residual utility of the plot, we can optimize EWU and obtain the new formula:
$EWU(\alpha, mo(\alpha)) = u(\alpha, mo(\alpha)) + ru(E_{\rm k}, T_{\rm e}) + \sum_{\rm i=T_{e}+1}^{T_{\rm s}+MTD}tu(T_{\rm i}) \\ = \sum_{\rm i=1}^{\rm k}u(E_{\rm i}, T_{\rm i}) + ru(E_{\rm k}, T_{\rm e}) + \sum_{\rm i=T_{\rm e}+1}^{T_{\rm s}+MTD}tu(T_{\rm i})$The Te Ti Ts] + MTDT_ or less or less {\} rm e \ le T_ {I} \ rm \ le T_ rm {\ s]} + MTDTe Ti Ts] + MTD or less or less

Given a complex event sequence SSS, maximum time MTD, and userset threshold minUtil, When an event (set) in SSS satisfies u(α)≥minUtil×TUu(\alpha) \ge minUtil \times TUu(α)≥minUtil×TU, we call the event (set) HUE
strategy
EpisodeWeighted Downward Closure Property

Lexicographic sequence tree (Lexicographic sequence tree) is based on the root node of the prefix tree is empty, and all the children of the parent node (also called the prefix node) are formed by concatenation (string, and). These children nodes are ordered between them
Ps. AndEHINThe spatial retrieval enumeration tree is consistent
In the dictionary sequence tree we can use u(γ)≤EWU(γ)≤EWU(α)≤minUtil×TUu(\gamma) \le EWU(\gamma) \le EWU(\gamma) \le EWU(\alpha) \le minUtil \times TUu(γ)≤EWU(γ)≤EWU(α)≤minUtil×TU, where γ\gammaγ is the extended plot α\alphaα (the extensions are divided into Simultconnection and serialconnection), The derivation can be proved by comparing their definitions
algorithm
The algorithm is also divided into two steps: 1) scan to generate firstorder HUEs; 2) The Huespan method is constantly called to generate highorder HUEs with loworder combination
The Span of SimultHUE procedure:
The Span of SerialHUE procedure:
conclusion
The main contributions of this paper are as follows: 1) It is demonstrated that lowweight plots can also be combined with highweight plots to generate highweight plots, which has not been fully considered by previous HUEM algorithms; 2) A more compact EWU is redefined using the residual utility value, which greatly improves the computing speed; 3) Compared with upspan algorithm, moSet(α)moSet(\alpha)moSet(α) stores all the occurrence subsequences of the plot α\alphaα (i.e., potentially combinable episodes).