What are the data analysis requirements in business scenarios?

• Description: describing the business with data (business indicators, statistical methods)

• Prediction: user profiling, sales forecasting, inventory forecasting, predictive maintenance

• Explanation: understanding causes and identifying explanatory features

• Automation: running automatically 24/7 as a continuous process, not a one-off analysis

Kraljic Model (Data-driven procurement positioning)

Strategic Items

Procurement items that are critical to the production process and often carry high supply risk because supply is scarce or transportation is difficult

Position of buyer and seller: balance of power, high interdependence

Procurement strategy recommendations: strategic alliances, close relationships, early supplier involvement, co-creation, serious consideration of vertical integration, and a focus on long-term value

Bottleneck Items

Procurement items that can be supplied only by a specific supplier, are difficult to transport, and have a low financial impact

Position of buyer and seller: the seller holds the initiative (seller-dominated), and the two parties are interdependent

Procurement strategy recommendations: volume insurance contracts, vendor-managed inventory (VMI), keeping extra safety stock, sourcing potential alternative suppliers
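
The positioning above can be sketched in code. This is a minimal illustration, assuming each item has already been scored for supply risk and financial impact on a 0 to 1 scale; the `Item` class, the 0.5 threshold, and the sample items are hypothetical. The "leverage" and "non-critical" labels are the other two quadrants of the standard Kraljic matrix, which these notes do not cover.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    supply_risk: float       # 0.0 (low) to 1.0 (high); hypothetical score
    financial_impact: float  # 0.0 (low) to 1.0 (high); hypothetical score


def kraljic_quadrant(item: Item, threshold: float = 0.5) -> str:
    """Place an item in a Kraljic quadrant using an assumed cut-off."""
    high_risk = item.supply_risk >= threshold
    high_impact = item.financial_impact >= threshold
    if high_risk and high_impact:
        return "strategic"
    if high_risk:
        return "bottleneck"
    if high_impact:
        return "leverage"
    return "non-critical"


# Made-up items, for illustration only.
items = [
    Item("custom chip", 0.9, 0.8),
    Item("special gasket", 0.8, 0.2),
    Item("steel sheet", 0.2, 0.9),
    Item("office paper", 0.1, 0.1),
]
quadrants = {item.name: kraljic_quadrant(item) for item in items}
print(quadrants)
```

In practice the two scores would come from procurement data (spend share, number of qualified suppliers, lead times), and the cut-off would be tuned to the business rather than fixed at 0.5.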

RFM user value model

RFM indicators:

Recency: the time elapsed since the last purchase

Frequency: the number of purchases made within a period of time

Monetary: the amount of money spent over a period of time, such as a year

For example, user A purchased a product in the store on August 22nd, and the previous purchase was on August 15th. What is Recency?

The higher the score on each indicator, the higher the user value (for Recency, a shorter interval earns a higher score). Use the three indicators as the X, Y, and Z axes and split each into high and low, dividing the space into eight segments for analysis
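
A minimal sketch of the RFM calculation and the eight-way split, using only the Python standard library. The function names, the cut-off values, the purchase amounts, and the year are assumptions for illustration; only the August 15th and August 22nd dates come from the example above.

```python
from datetime import date


def rfm(purchases, reference_date):
    """Compute (recency_days, frequency, monetary) for one user.

    purchases: list of (purchase_date, amount) pairs before reference_date.
    """
    last_purchase = max(d for d, _ in purchases)
    recency_days = (reference_date - last_purchase).days
    frequency = len(purchases)
    monetary = sum(amount for _, amount in purchases)
    return recency_days, frequency, monetary


def segment(recency_days, frequency, monetary, r_cut, f_cut, m_cut):
    """Return a 3-letter code (R, F, M), each 'H' or 'L': 8 possible segments."""
    r_flag = "H" if recency_days <= r_cut else "L"  # a recent purchase scores high
    f_flag = "H" if frequency >= f_cut else "L"
    m_flag = "H" if monetary >= m_cut else "L"
    return r_flag + f_flag + m_flag


# Hypothetical history for user A; the year and amounts are assumptions.
history = [(date(2023, 8, 1), 30.0), (date(2023, 8, 15), 50.0)]
r, f, m = rfm(history, reference_date=date(2023, 8, 22))
code = segment(r, f, m, r_cut=30, f_cut=2, m_cut=100.0)
print(r, f, m, code)
```

With August 22nd as the reference date and August 15th as the last purchase, Recency comes out as 7 days. The cut-offs would normally be chosen from the data, for example the mean or median of each indicator.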

Intelligent supply chain

Supply chain data exploration

1) Check for missing fields => fill in the missing data

2) How are the features correlated? => plot a correlation heatmap

3) Explore sales volume (the Sales per Customer field)

By Market and Order Region

By Category Name

Trends across time dimensions (year, month, week, hour)

How does Product Price correlate with Sales per Customer?
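
The exploration steps above can be sketched with the standard library alone. The rows below are made-up data; only the field names (Market, Product Price, Sales per Customer) come from the notes, and the helper functions are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Made-up rows for illustration; None marks a missing value.
rows = [
    {"Market": "Europe", "Product Price": 10.0, "Sales per Customer": 18.0},
    {"Market": "Europe", "Product Price": 20.0, "Sales per Customer": 41.0},
    {"Market": "LATAM", "Product Price": 30.0, "Sales per Customer": None},
    {"Market": "LATAM", "Product Price": 40.0, "Sales per Customer": 79.0},
]


def missing_counts(rows):
    """Step 1: count missing values per field."""
    counts = defaultdict(int)
    for row in rows:
        for key, value in row.items():
            if value is None:
                counts[key] += 1
    return dict(counts)


def pearson(xs, ys):
    """Step 2: Pearson correlation between two numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def mean_by(rows, group_key, value_key):
    """Step 3: average a value within each group, e.g. per Market."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {g: sum(v) / len(v) for g, v in groups.items()}


missing = missing_counts(rows)
# Here incomplete rows are dropped; filling them in is the alternative.
complete = [r for r in rows if None not in r.values()]
corr = pearson([r["Product Price"] for r in complete],
               [r["Sales per Customer"] for r in complete])
by_market = mean_by(complete, "Market", "Sales per Customer")
print(missing, round(corr, 3), by_market)
```

On a real dataset the same steps are usually done with pandas (`DataFrame.isnull().sum()`, `DataFrame.corr()`, `DataFrame.groupby()`) and the correlation matrix is drawn with a heatmap function such as `seaborn.heatmap`.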

Use RFM to segment users into tiers

Predict fraud

Predict fraudulent orders, i.e. orders with Order Status = 'SUSPECTED_FRAUD'
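
A minimal sketch of how the prediction target could be built: a binary label derived from the Order Status field. The sample orders, the `Order Id` key, and the other status values are hypothetical; only 'SUSPECTED_FRAUD' comes from the notes.

```python
def fraud_label(order_status: str) -> int:
    """Return 1 for a suspected-fraud order, 0 otherwise."""
    return 1 if order_status == "SUSPECTED_FRAUD" else 0


# Made-up orders for illustration.
orders = [
    {"Order Id": 1, "Order Status": "COMPLETE"},
    {"Order Id": 2, "Order Status": "SUSPECTED_FRAUD"},
    {"Order Id": 3, "Order Status": "PENDING"},
    {"Order Id": 4, "Order Status": "COMPLETE"},
]
labels = [fraud_label(order["Order Status"]) for order in orders]
fraud_rate = sum(labels) / len(labels)
print(labels, fraud_rate)
```

Fraud labels are typically rare, so checking the positive rate first helps decide whether class weighting or resampling is needed before training a classifier.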

Forecast Sales performance

GBDT vs. XGBoost

GBDT is a machine learning algorithm, and XGBoost is an engineering implementation of that algorithm

XGBoost adds regularization terms to the objective to control model complexity, which helps prevent overfitting and improves generalization

GBDT uses only the first derivative of the loss function during training, whereas XGBoost applies a second-order Taylor expansion to the loss function and uses both the first and second derivatives

Traditional GBDT uses all the data in each iteration, whereas XGBoost adopts a strategy similar to random forest and supports subsampling of the data (rows and feature columns)

Traditional GBDT is not designed to handle missing values, whereas XGBoost automatically learns a default split direction for them
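
To make the second-order and regularization points concrete, here is a simplified sketch, not XGBoost's actual code. With per-sample first derivatives g and second derivatives h of the loss, and an L2 penalty lambda on leaf weights, the second-order approximation gives the optimal weight of a leaf as w* = -sum(g) / (sum(h) + lambda). The function names are hypothetical, and squared-error loss is assumed for simplicity.

```python
def grad_hess_squared_error(y_true, y_pred):
    """First and second derivatives of 0.5 * (y_pred - y_true)^2 w.r.t. y_pred."""
    grads = [p - y for y, p in zip(y_true, y_pred)]
    hess = [1.0 for _ in y_true]
    return grads, hess


def leaf_weight(grads, hess, reg_lambda):
    """Optimal leaf weight from the second-order approximation with L2 penalty."""
    return -sum(grads) / (sum(hess) + reg_lambda)


# Three samples that fall into the same leaf; current prediction is 0.
y_true = [3.0, 5.0, 4.0]
y_pred = [0.0, 0.0, 0.0]
g, h = grad_hess_squared_error(y_true, y_pred)

w_plain = leaf_weight(g, h, reg_lambda=0.0)  # no regularization: mean residual
w_reg = leaf_weight(g, h, reg_lambda=1.0)    # regularization shrinks the weight
print(w_plain, w_reg)
```

Without regularization the leaf simply predicts the mean residual (4.0 here); with lambda = 1 the weight shrinks to 3.0, which is how the penalty discourages leaves from fitting small groups of samples too closely.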

How to deal with overfitting in neural networks