Original link: tecdat.cn/?p=7947

Original source: Tuo Duan Data Tribe WeChat public account

 

Association rule mining is a technique for identifying potential relationships between different items. Take a supermarket, where customers can buy a wide variety of goods. There is often a pattern to what customers buy: mothers with babies buy baby products such as milk and diapers, teenage girls may buy cosmetics, and bachelors may buy beer and chips. In short, transactions involve patterns. More profit can be generated if the relationships between items purchased in different transactions can be identified.

For example, if items A and B are frequently purchased together, several steps can be taken to increase profits. Such as:

  1. A and B can be placed together, so that when a customer buys one of the products, he does not have to go far to buy the other.
  2. People who buy one product can be targeted to buy another through an advertising campaign.
  3. If customers buy both products, discounts can be offered on those products.
  4. Both A and B can be packaged together.

The process of identifying associations between products is called association rule mining.

Apriori algorithm for mining association rules

Different statistical algorithms have been developed to implement association rule mining, and Apriori is one such algorithm. In this article, we will examine the theory behind the Apriori algorithm and later implement it in Python.

Theory of the Apriori algorithm

Support

Support refers to the default popularity of an item, and can be calculated by dividing the number of transactions containing a particular item by the total number of transactions. Suppose we want to find the support for item B. It can be calculated as:

Support(B) = (Transactions containing (B))/(Total Transactions)

For example, if 100 out of 1000 transactions contain Ketchup, the support for the item Ketchup can be calculated as:

Support(Ketchup) = (Transactions containing Ketchup)/(Total Transactions)

Support(Ketchup) = 100/1000
                 = 10%

Confidence

Confidence refers to the likelihood that item B is also bought if item A is bought. It can be calculated by dividing the number of transactions in which A and B are bought together by the total number of transactions in which A is bought. Mathematically, it can be expressed as:

Confidence(A→B) = (Transactions containing both (A and B))/(Transactions containing A)

Coming back to our problem, suppose we had 50 transactions in which Burger and Ketchup were bought together, and 150 transactions in which Burger was bought. Then the likelihood of buying Ketchup when buying a Burger can be expressed as the confidence of Burger → Ketchup and written mathematically as:

Confidence(Burger→Ketchup) = (Transactions containing both (Burger and Ketchup))/(Transactions containing Burger)
                           = 50/150
                           = 33.3%

Lift

Lift(A→B) refers to the increase in the ratio of the sale of B when A is sold. Lift(A→B) can be calculated by dividing Confidence(A→B) by Support(B). Mathematically, it can be expressed as:

Lift(A→B) = Confidence(A→B)/Support(B)

Back to our Burger and Ketchup problem, Lift(Burger -> Ketchup) can be calculated as:

Lift(Burger→Ketchup) = Confidence(Burger→Ketchup)/Support(Ketchup)
                     = 33.3%/10%
                     = 3.33

Lift basically tells us that customers who buy a Burger are 3.33 times more likely to also buy Ketchup than the default likelihood of a Ketchup sale.
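These three metrics are easy to verify by hand. Here is a minimal sketch in Python, using a small hypothetical list of transactions (not the store dataset used later in this article):

```python
# Hypothetical toy transactions; each inner list is one basket.
transactions = [
    ['burger', 'ketchup', 'cola'],
    ['burger', 'ketchup'],
    ['burger', 'fries'],
    ['ketchup'],
    ['burger'],
]

n = len(transactions)
support_ketchup = sum('ketchup' in t for t in transactions) / n
support_burger = sum('burger' in t for t in transactions) / n
support_both = sum({'burger', 'ketchup'} <= set(t) for t in transactions) / n

confidence = support_both / support_burger   # Confidence(Burger -> Ketchup)
lift = confidence / support_ketchup          # Lift(Burger -> Ketchup)

print(support_ketchup, confidence, lift)     # 0.6 0.5 0.833...
```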

Steps involved in the Apriori algorithm

For large datasets, there may be hundreds of items across thousands of transactions. The Apriori algorithm tries to extract rules for every possible combination of items. For instance, lift can be calculated for the combination of items 1 and 2, items 1 and 3, items 1 and 4, then items 2 and 3, items 2 and 4, then items 1, 2 and 3; similarly items 1, 2 and 4, and so on. As you can see, the number of combinations grows very quickly.
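To get a feel for this combinatorial growth, here is a quick illustrative calculation of how many two- and three-item combinations a hypothetical store with just 100 items would already produce:

```python
from math import comb

# Number of candidate 2- and 3-item combinations for 100 distinct items.
print(comb(100, 2))  # 4950
print(comb(100, 3))  # 161700
```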

 

Implement Apriori algorithm with Python

Enough theory; it is time to see the Apriori algorithm in action. In this section, we will use the Apriori algorithm to find rules that describe the relationships between different products, given 7,500 transactions collected over the course of a week at a French retail store.

Another interesting point is that we do not need to write a script to calculate support, confidence, and lift for all possible combinations of items. We will use an off-the-shelf library in which all of that code has already been implemented.

I’m referring to the Apyori library, where you can find the source. I recommend that you download and install the Python library in its default path before proceeding.

Note: All scripts in this article have been executed using the Spyder IDE for Python.

To implement the Apriori algorithm in Python, perform the following steps:

Import libraries

As always, the first step is to import the required libraries.
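For this walkthrough, a minimal set of imports might look like this (assuming the Apyori library is installed, e.g. with pip install apyori):

```python
import pandas as pd          # for loading the CSV dataset
from apyori import apriori   # the Apriori implementation used below
```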

Import data set

Now let’s import the data set and execute the following script:

store_data = pd.read_csv('D:\\Datasets\\store_data.csv')

Let’s call the head() function to see the data set:

store_data.head()

If we look closely at the first few rows, we can see that what pandas treated as the header row is actually the first transaction. Each row corresponds to a transaction, and each column corresponds to an item purchased in that particular transaction. NaN indicates that the item represented by that column was not purchased in that specific transaction.

This dataset has no header row, so we need to tell pandas not to treat the first row as one:

store_data = pd.read_csv('D:\\Datasets\\store_data.csv', header=None)

Now execute head() again:

store_data.head()

Now we will use the Apriori algorithm to find out which items are usually sold together, so that the store owner can take action to place related items together or advertise them together.

Data preprocessing

The Apriori library we are going to use requires our dataset to be in the form of a list of lists, where the whole dataset is one big list and each transaction in the dataset is an inner list within that outer list.
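A minimal sketch of this conversion (assuming 7,501 rows and at most 20 items per transaction, in line with the dataset described above; note that empty NaN cells come through as the string 'nan'):

```python
records = []
for i in range(0, 7501):
    # Gather the items of transaction i as strings; NaN cells become 'nan'.
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])
```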

Applying Apriori

The next step is to apply the Apriori algorithm to the dataset. To do this, we can use the apriori class imported from the Apyori library.

 
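A minimal sketch of this step (the threshold values for min_support, min_confidence, and min_lift below are assumptions, chosen to reproduce the results reported in the next section):

```python
rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3)
association_rules = list(rules)
```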

In the second line, we convert the rules found by the Apriori class to list because it makes it easier to see the results.

View the results

First, let’s find the total number of rules mined by apriori. Execute the following script:
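A one-line check on the association_rules list built above:

```python
print(len(association_rules))
```

The script above should return 48; there is one entry in the list for each rule.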

Let’s print the first item in the association_rules list to see the first rule. Execute the following script:
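```python
print(association_rules[0])
```

The output should look like this: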

RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)])

Each entry in the list is a record that itself contains three items. The first of these shows the grocery items in the rule.

For example, starting with item 1, we can see that light cream and chicken are usually bought together. This makes sense because light cream buyers are careful about what they eat, so they are more likely to buy chicken or white meat than red meat or beef. Or it could mean light cream is commonly used in chicken recipes.

The support value for the first rule is 0.0045. This figure is calculated by dividing the number of transactions containing light cream by the total number of transactions. The confidence level for the rule is 0.2905, which shows that out of all the transactions containing light cream, 29.05% also contain chicken. Finally, the lift of 4.84 tells us that chicken is 4.84 times more likely to be bought by customers who buy light cream, compared to the default likelihood of a chicken sale.

The following script displays the rule, the support, the confidence, and the lift for each rule in a clearer way:
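A sketch of such a loop, reading the fields of each apyori RelationRecord (note that for two-item rules the printed arrow direction simply follows the itemset’s iteration order, as in the output below; the true antecedent and consequent live in ordered_statistics):

```python
for item in association_rules:
    # item.items is the frozenset of products participating in the rule.
    pair = [x for x in item.items]
    print("Rule: " + pair[0] + " -> " + pair[1])
    print("Support: " + str(item.support))
    # Confidence and lift of the first ordered statistic for this rule.
    print("Confidence: " + str(item.ordered_statistics[0].confidence))
    print("Lift: " + str(item.ordered_statistics[0].lift))
    print("=====================================")
```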

If you execute the above script, you will see all the rules returned by the Apriori class. The first four rules are as follows:

Rule: light cream -> chicken
Support: 0.004532728969470737
Confidence: 0.29059829059829057
Lift: 4.84395061728395
=====================================
Rule: mushroom cream sauce -> escalope
Support: 0.005732568990801126
Confidence: 0.3006993006993007
Lift: 3.790832696715049
=====================================
Rule: escalope -> pasta
Support: 0.005865884548726837
Confidence: 0.3728813559322034
Lift: 4.700811850163794
=====================================
Rule: ground beef -> herb & pepper
Support: 0.015997866951073192
Confidence: 0.3234501347708895
Lift: 3.2919938411349285
=====================================

We’ve already discussed the first rule. Now let’s discuss the second one. The second rule states that mushroom cream sauce and escalope are frequently bought together. The support for this rule is 0.0057.

Conclusion

Association rule mining algorithms such as Apriori are useful for finding simple associations between data items. They are easy to implement, and their results are easy to explain.