If ice cream is not sold well in the winter, it is sold well in the summer. Each item will have a different sales performance at different times, and that is the problem of research in this field. On-shelf refers to the first period, the second period, the third period… , there is no sequence between each cycle. If an item set occurs in multiple cycles, then it is denoted as (X, 1), (X, 2).

FOSHU: Faster on-shelf High Utility Itemset Mining — with or without Negative Unit Profit

The sample

Transaction Database

External utility of Item

define

  • Investigating the set I = {x1, x2,… ,xn}I = \{x_1, x_2, \dots, x_{\rm n}\}I={x1,x2… ,xn}, dataset D={T1,T2… ,Tj}D = \{T_1, T_2, \dots, T_{\rm j}\}D={T1,T2… Tj}, where Ti∈DT_{\rm I} \in DTi∈D is called transaction item, each transaction item has a unique number TIDTIDTID, each item has its own unique external utility value P (xi)p(x_{\rm I})p(xi) (can be seen as a single item profit, can be positive or negative), Each item may have a different internal utility value q(xi,Tj)q(x_{\rm I}, T_{\rm j}) Q (xi,Tj) in different transaction sets

  • In particular, in this paper, a set of positive numbers such as PEPEPE is used as a time range, and each transaction is associated with a time range PT (Tj)∈PEpt(T_{\rm j}) \in PEpt(Tj)∈PE, which means that the transaction set exists in these time ranges

  • Utility of item(set) : In transaction item TjT_{\rm j}Tj, for item xi∈Ix_{\rm I} \in Ixi∈I, Its utility value calculation formula for u (I, Tj) = q (I, Tj * p (I)) u (I, T_ {\ rm j}) = q (I, T_ {\ rm j} \ times p (I)) u (I, Tj) = q (I, Tj * p (I)); Similarly, for item set XXX, The calculation formula is u(X,Tj)=∑xi∈X and I∧ XI ∈Tju(xi,Tj)u(X, T_{\rm j}) = \sum_{x_{\rm i} \in X \subseteq I \land x_{\rm i} \in T_{\rm j}}u(x_{\rm i}, T_ {\ rm j}) u (X, Tj) = ∑ I Sunday afternoon xi xi ⊆ ∈ X ∈ Tju (xi, Tj). Utility is a measure of how important an item is (in the case of goods, profit).

  • Time period of item set: Refers to the time at which the item was sold, Defined as PI (X) = {pt (Tj) ∣ Tj ∈ X D Sunday afternoon ⊆ Tj} PI (X) = \ {pt (T_ {\ rm j}) | T_ {\ rm j} \ \ land in D X \ subseteq T_ {\ rm J} \} PI (X) = {pt (Tj) ∣ Tj Sunday afternoon X ∈ D ⊆ Tj}

  • In some period h∈ PI (X)h \in PI (X)h∈ PI (X), Itemsets XXX is defined as the utility value of u (X, h = ∑ h = PI (Tj) u (X, Tj) u (X, h = \ sum_ {h = PI (T_ {\ rm j})} u (X, T_ {\ rm j}), u (X, h = ∑ h = PI (Tj) u (X, Tj), in particular, Computing XXX all exist in the data set – + DDD period of utility value and the formula is defined as u h (X) = ∑ ∈ PI u (X, h), u (X) (X) = \ sum_ \ {h in PI (X) u (X, h)} u h (X) = ∑ ∈ PI (X) u (X, h)

    Such as: In Table 1. We can calculate that the item set {c,e}\{c, e\}{c,e} appears in periods 1, 2, 3 and twice in period 2, So u can get the following calculation process ({c, e}) = u ({c, e}, 1) + u ({c, e}, 2) + u ({c, e}, 2) + u ({c, e}, 3) u (\ {c, e \}) = u (\ {c, e \}, 1) + u (\ {c, e \}, 2) + u (\ {c, E \}, 2) + u (\ {c, e \}, 3) u ({c, e}) = u ({c, e}, 1) + u ({c, e}, 2) + u ({c, e}, 2) + u ({c, e}, 3) = 12 + 2 + 6 + 11 = 31 (in period 2 there are two different trading items, So itemsets calculate different utility values.)

  • A transaction contains many different items. The sum of the utility values of these items is the utility value (TU) of the transaction. TU can be used to calculate the TWU of each item. TU (Tj) = ∑ xi ∈ Tju (xi, Tj) TU (T_ {\ rm j}) = \ sum_ {x_ {I} \ rm \ in T_ {\ rm j}} u (x_ {I} \ rm, T_ {\ rm j}) TU (Tj) = ∑ xi ∈ Tju (xi, Tj) XXX for itemsets in all cycle utility value defined by the to h (X) = ∑ ∈ PI Sunday afternoon h (X) = pt (Tj) TU (Tj) to (X) = \ sum_ {h \ in PI \ land h (X) Rm = pt (T_ {\ j})} TU (T_ {\ rm j}) to h (X) = ∑ ∈ PI Sunday afternoon h (X) = pt (Tj) TU (Tj) the = = relative utility value (relative to the utility) = = is defined as ru (X) = u (X)/to (X) ∣ ru (X) = U (X)/to (X) | ru (X) = u (X)/to ∣ if to (X) (X) indicates a 0 to \ not (X) = 0 to  (X) = 0 relative utility value of response is the profit (loss) of the proportion of total, it can point to different itemsets sales performance, It is convenient for retailers to organize the goods on the shelves at different times. The following table can be obtained:

  • High on-shelf Utility Itemset (HOU) : If an item set XXX ru(X)≥minUtilru(X) \ge minUtilru(X)≥minUtil where 0≤minUtil≤10 \le minUtil \le 10≤minUtil≤1, If minUtil=0.43minUtil =0.43minUtil =0.43, HOUs can be calculated as follows according to the sample:

  • Transaction -weighted utilization is the same problem as HUI Mining, which requires anti-monotonicity of TWUTWUTWU to perform tailoring. The definition formula is: TWU (X, h) = ∑ X ⊆ Tj Sunday afternoon h = pt (Tj) TU (Tj) TWU (X, H) = \ sum_ \ {X subseteq T_ {\ rm j} \ land h = pt (T_ {\ rm j})} TU (T_ {\ rm j}) TWU (X, h) = ∑ X ⊆ Tj Sunday afternoon h = pt (Tj) TU (Tj)

  • Utility was defined as the office for a period of time HHH (h) = ∑ h = pt (Tj) TU (Tj) office (h) = \ sum_ {h = pt (T_ {\ rm j})} TU (T_ {\ rm j}) office (h) = ∑ h = pt (Tj) TU (Tj), So in this time period itemsets XXX relative utility value is defined as: ru (X, h) = u (X, h)/office (h) ru (X, h) = u (X, h)/office (h) ru (X, h) = u (X, h)/office (h)

    • TWU(X,h)≥u(X,h)TWU(X, h) \ge u(X,h)TWU(X, h)≥u(X,h), so TWUTWUTWU can be used as the upper boundary of the item set in a certain period of time for pre-determination
    • TWU(X,h)≥TWU(Y,h)TWU(X, h) \ge TWU(Y,h)TWU(X, h)≥TWU(Y,h), TWU(X,h), TWU(X,h)≥TWU(Y,h), In other words, if item set XXX is not an efficient item set within the time period HHH, then its subset must not be either
    • In a certain period of time, the TWUTWUTWU of item set XXX divided by the total utility of this period is greater than or equal to the relative utility value of item set in this period, namely: TWU (X, h)/office (h) or greater ru (X, h) TWU (X, h)/office (h) \ ge ru (X, h) TWU (X, h)/office (h) or greater ru (X, h)
    • If the item set XXX does not exist in the time period HHH, then the inequality TWU(X,h)/pto(h)≥minUtilTWU(X, h)/pto(h) \ge minUtilTWU(X,h)/pto(h)≥minUtil holds, then the item set XXX cannot be HOU, otherwise, It could be. It needs more testing
  • Similarly, to solve the problem of disutility items, the TU needs to be redesigned, The RTU (Tj) = ∑ xi ∈ Tj Sunday afternoon p (xi) > 0 u (xi, Tj) RTU (T_ {\ rm j}) = \ sum_ {x_ {I} \ rm \ in T_ {\ rm j} \ land p (x_ {I} \ rm) > 0} u (x_ {I} \ rm, T_ {\ rm j}) RTU (Tj) = ∑ xi ∈ Tj Sunday afternoon p (xi) > 0 u (xi, Tj)

  • Redefined transaction-weighted utilization as above, item set X has the formula in time period H: RTW (X, h) = ∑ X ⊆ Tj Sunday afternoon h = pt (Tj) RTU (Tj) RTW (X, H) = \ sum_ \ {X subseteq T_ {\ rm j} \ land h = pt (T_ {\ rm j})} RTU (T_ {\ rm j}) RTW (X, h) = ∑ X ⊆ Tj Sunday afternoon h = pt (Tj) RTU (Tj)

  • The utility-list takes item set XXX as the core and records information related to the item set without scanning the data set again (Ps. “≻\succ≻” is still in order, depending on the actual situation). The form is (tid,iutil,rutil)(tid, iutil,rutil)(tid, iutil,rutil) Which iutil (X) = u (X, Ttid) iutil (X) = u (X, T_ {\ rm dar}) iutil (X) = u (X, Ttid), Util (X) = ∑ xi ∈ Ttid Sunday afternoon xi ≻ xj, ∀ xj ∈ Xu (xi, Ttid) util (X) = \ sum_ {x_ {I} \ rm \ in T_ {\ rm dar} \ land x_ {I} \ rm \ succ x_ {\ rm j}. \forall x_{\rm j} \in X}u(x_{\rm I}, T_{\rm tid})util(X)=∑xi∈Ttid∧xi≻xj,∀xj∈Xu(xi,Ttid)

    • Itemsets XXX have u (X) = ∑ X ⊆ Tjituil u (X) (X) = \ sum_ \ {X subseteq T_ {\ rm j}} ituil u (X) (X) = ∑ X ⊆ Tjituil (X)
    • Itemsets XXX has ∑ X ⊆ Tjiutil ∑ (X) + X ⊆ Tjrutil p u (X) (X) \ sum_ \ {X subseteq T_ {\ rm j}} iutil (X) + \ sum_ \ {X subseteq T_ {\ rm j}} rutil \ ge (X) U(X) ∑X⊆Tjiutil(X)+∑X⊆Tjrutil(X)≥ U(X), and the formula is more compact than TWU(X)TWU(X)TWU(X)
  • For item set XXX, there is one on time period HHH


    s u m I U t i l ( X . h ) + s u m R U t i l ( X . h ) p u ( X . h ) s u m I U t i l ( X . h ) + s u m R U t i l ( X . h ) Or less T W U ( X . h ) s u m I U t i l ( X . h ) + s u m R U t i l ( X . h ) p s u m I U t i l ( Y . h ) + s u m R U t i l ( Y . h ) [ X Y ] ( s u m I U t i l ( X . h ) + s u m R U t i l ( X . h ) ) / p t o ( h ) p r u ( X . h ) sumIUtil(X, h) + sumRUtil(X, h) \ge u(X, h) \\ sumIUtil(X, h) + sumRUtil(X, h) \le TWU(X, h) \\ sumIUtil(X, h) + sumRUtil(X, h) \ge sumIUtil(Y, h) + sumRUtil(Y, h) [X \subset Y] \\ (sumIUtil(X, h) + sumRUtil(X, h)) / pto(h) \ge ru(X, h)
  • In constructing the three-element and above utility-list algorithm, set the disutility items to be sorted after the positive utility items forever, and use up(X)up(X)up(X) to represent the set composed of all positive utility items in item set XXX. UN (X) UN (X) UN (X) represents the set composed of all disutility terms in item set XXX, then it has the following properties:

    • U (X, h) u or less (up (X), h), u (X, h) \ le u (up (X), h), u (X, h), u (up (X), h) or less, because u (X, h), u (X, h), u (X, h) may also contain the negative effects of negative utility value
    • U (up (X ∪} {z), h) or less u (up (X), h), u (up \ cup \ {\} z (X), h) \ le u (up (X), h), u (up (X ∪} {z), h), u (up (X), h) or less, every increase in length and easy to think of itemsets, The number of times that can occur in the transaction must be less than the number before each increase
    • When inequality ru itemsets XXX (the up (X), h) = u (up (X), h)/office (h) < minUtilru (up (X), h) = u (up (X), H)/pto(h)< minUtilru(up(X),h)=u(up(X),h)/pto(h)
    • When the item set XXX is sumIPUtil(X,h)+sumRUtil(X,h)/pto(h)

      0∧X star Tj =pt(Tj)iputil(X,h)sumIPUtil(X, h)= \sum_{u(X, T_{\rm j}) > 0 \land X \subseteq T_{\rm j} \land h = pt(T_{\rm j})}iputil(X, H) sumIPUtil (X, h) = ∑ u (X, Tj) > 0 Sunday afternoon X ⊆ Tj Sunday afternoon h = pt (Tj) iputil (X, h))
      (x,>

algorithm

The main method

Deep traversal looking for HOUs

Construct 3 yuan and above utility-list

conclusion

This algorithm is based on FHM’s design and some techniques used by FHN to deal with negative items. The key point is to understand why on-shelf is discussed and what is the significance of its application.