Author: Emma

0. Introduction

Every morning when I wake up, I face a difficult question: what should I wear today? Many options come to mind, but none of them is satisfying, and I often drift back to sleep thinking about them. Twenty minutes later I wake up again, grab a T-shirt and shorts, and rush out, dripping toothpaste on myself in the hurry.

So in the eyes of my colleagues I have always been a slovenly female programmer. I have not let anyone down: I often show up in a company T-shirt and slippers, and I joke that I am married anyway, so there is little point in dressing up for you all.

But I never give up. Every morning I still ask: what will I wear today? That is the question. It feels like a problem I can never solve, yet I can't get around it.

How to solve it

Having done a lot of data analysis and recommendation work, I feel the urge to collect data whenever I see a problem. So I came up with this fanciful idea: use data analysis to solve the problem that bothers me every morning, so that I can go to work happy and confident.

Here is the overall process I use at work to solve problems with data:

  1. Clearly define the problem to be solved.
  2. Data collection, data cleaning.
  3. Define indicators and perform statistical calculations.
  4. Drill down into the indicators by segment, compare the segments, and observe the data to draw conclusions.
  5. Take out some typical cases for specific analysis.
  6. Based on the conclusions from steps 4 and 5, optimize the strategy.
  7. With the optimized strategy in place, keep observing the indicators defined in step 3.

There will be a lot of details along the way, such as checking whether the indicators meet expectations, and forming and verifying hypotheses about the problems encountered.

Write it down, stick it on the wall, and do it. Every time I start a new project like this, I feel a mixture of excitement and tension.

In data analysis, exciting ideas keep coming to mind. They need to be sorted out, otherwise it is easy to get lost halfway. You never know until you see the data. Does the data come out as expected? If not, why? And what hypotheses and tests should follow?

As a result, there is sometimes excitement and often unavoidable disappointment. What I fear most is not a conclusion that fails to meet expectations, but searching for a long time without finding any useful conclusion. Then we can only accept that there is no conclusion for now. Keep the data in mind, and it may inspire something later.

What a collision of logic and reason and inspiration!

1. Define the problem that needs to be solved

It's not that I have no clothes. I don't have a huge number, but I have a fair amount. When I first started earning my own money, I also "splurged" on a lot of Taobao best-sellers. But the feeling of having nothing to wear never seems to go away.

To sort it out:

  1. I often feel dissatisfied with the clothes I have to choose from.
  2. I don't know what to buy; I seem to keep buying, yet it is never enough.

From a recommendation strategy perspective, we can think of the wardrobe as our candidate pool. The various occasions and seasons in life stand for users with different characteristics and different demands (in fact they are all me; I just change under different circumstances!).

For example: (weekday, going to work, spring, want to exercise after work, hoping for something simple and bright, sequence of outfits worn in the past few days (XXXXX), sequence of clothes in the wash (XXXXX)) or (weekend, taking the kids to the park, summer, will be running, jumping, and taking photos, hoping for something easy to move in and photogenic, ...). Recommendation effect, judged by my own feelings: I agonize for a long time or feel that I don't have enough clothes. This shows that the effect needs to be improved.

Here, both the clothing selection strategy and the evaluation indicator (whether the choice feels right to me) are fairly subjective and hard to quantify. After all, women are so complicated that I can't even understand myself.

And every time we feel bad about our clothes, we think it is because we don't have enough of them. So the problem to solve is: with the distribution strategy and the evaluation indicator held fixed, how do we optimize the pool to improve the effect? Of course, since the pool was also built from my own purchase decisions, the problem becomes: how do I optimize the strategy for building the pool (buying clothes)? After all, buying clothes often takes longer than putting them on. Having a clear idea of what kinds of clothes I need would save a lot of effort.

2. Collect and clean data

Basic data construction and cleaning. Clean data is always important.

2.1 Basic data construction

Basic data: each piece of clothing and its associated attributes. The attributes make later statistics and drill-downs easier. Each piece of clothing is photographed for case-by-case analysis. Of the whole weekend this analysis took me, 80% of the work was here. I laid out every piece of clothing in the closet, photographed it, tagged it with some labels, and entered it into an Excel sheet.

Given the goal of the analysis, the labels are based mainly on the factors considered when buying clothes, the deciding factors when wearing them, and how much they end up being worn. The labels are:

  • Type: vest, short sleeves, pajamas, hoodie, jumpsuit, etc.
  • Season: spring and autumn, summer, winter
  • Purchase time: student days, after starting work, within the past year
  • Purchase channel: shopping mall, Taobao, gift from others
  • Color: floral, gray, striped, …
  • Distinctiveness: special, slightly distinctive, normal
  • Wear frequency: high, medium, low, gradually declining, no longer want to wear

In fact I wanted to tag more, such as who I was shopping with, the main purpose of the purchase, and whether I tried the item on. But I was running out of energy, and it was tiring to recall the past life of every piece of clothing.
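A minimal sketch of how one row of such a sheet could be represented and checked. The class name, field names, and the shorthand values "declining" and "retired" (for "gradually low" and "no longer want to wear") are illustrative assumptions, not the layout of the actual Excel file.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values taken from the label list in the text; spellings are assumed.
SEASONS = {"spring/autumn", "summer", "winter"}
FREQUENCIES = {"high", "medium", "low", "declining", "retired"}


@dataclass(frozen=True)
class Garment:
    """One tagged piece of clothing (hypothetical schema)."""
    name: str
    kind: str             # e.g. "hoodie", "jumpsuit", "pajamas"
    season: str           # one of SEASONS
    bought: str           # "student days", "after work", "within a year"
    channel: str          # "mall", "Taobao", "gift"
    color: str            # e.g. "floral", "gray", "striped"
    distinctiveness: str  # "special", "slightly distinctive", "normal"
    frequency: str        # one of FREQUENCIES
    photo: Optional[str] = None  # path to the photo, if any


def validate(g: Garment) -> List[str]:
    """Return a list of problems found in the tags (empty if none)."""
    problems = []
    if g.season not in SEASONS:
        problems.append(f"unknown season: {g.season}")
    if g.frequency not in FREQUENCIES:
        problems.append(f"unknown frequency: {g.frequency}")
    return problems


hoodie = Garment("gray hoodie", "hoodie", "spring/autumn", "after work",
                 "mall", "gray", "normal", "high")
print(validate(hoodie))
```

Fixing the allowed values up front makes typos in the sheet visible before any statistics are computed.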

2.2 Dirty data processing

If you don't sample the data first or run some simple checks, it is easy to end up with dirty data. Even a very small number of records with very unusual values tends to skew indicators like the mean. I removed some clothes, mainly of two kinds: those that elders felt suited me and insisted on giving me, and those bought for a special event that can't be worn a second time, such as performance costumes. I didn't choose these clothes on my own initiative, so I leave them out of the analysis.
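The exclusion rule above can be sketched as a simple filter. The flags `self_chosen` and `one_off` are hypothetical columns introduced for illustration, and the three rows are made-up examples.

```python
# Hypothetical rows; the flags are assumed columns, not part of the label list.
wardrobe = [
    {"name": "gray hoodie", "self_chosen": True, "one_off": False},
    {"name": "sweater from an elder", "self_chosen": False, "one_off": False},
    {"name": "performance costume", "self_chosen": True, "one_off": True},
]


def is_clean(item):
    """Keep only clothes chosen on my own initiative that can be worn repeatedly."""
    return item["self_chosen"] and not item["one_off"]


clean = [item for item in wardrobe if is_clean(item)]
excluded = [item["name"] for item in wardrobe if not is_clean(item)]

print(len(clean), "kept;", "excluded:", excluded)
```

Keeping the list of excluded names makes the cleaning step easy to review later.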

3. Define indicators for statistical calculation

3.1 Quantity

Quantity is simple and intuitive, and it is also the most important indicator of a recommendation pool. After all, our "never enough clothes" complaint is about quantity. Comparison and segmentation are the main tools here. Since the total is certainly large, the feeling of not having enough must be concentrated in certain segments of the labels. Segmentation and comparison are about finding those labels. First, let's look at the total.

Actually, I can't tell whether it is a lot or a little. This is one of the problems in data analysis: many numbers need an overall average or a comparison before you can judge their size. For some business data, long-term observation gives you a rough sense of the mean and the distribution, so you can judge the size at a glance. For example, the click-through rate of mobile feed ads is generally 1%+, and the penetration rate of each tab in Cloud Music is known in advance. But I have no data on how many clothes other people own or how they are distributed, so I can only make a rough estimate. There are 99 pieces in total, with tops and pants, outerwear and underwear all included. Across three seasons, that is about 30 pieces per season; split evenly between upper and lower body, that is about 15 of each per season. 15 pieces for 4 months is not a lot (scratches head guiltily), or at least not outrageous.
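The back-of-the-envelope estimate can be written out explicitly. The only inputs are the 99 pieces and the three seasons from the text; exact division gives slightly larger values than the rounded figures of about 30 and 15.

```python
total_pieces = 99  # total count stated in the text
seasons = 3        # spring/autumn, summer, winter

months_per_season = 12 / seasons  # 4.0 months per season
per_season = total_pieces / seasons  # 33.0 pieces per season
per_body_half = per_season / 2       # 16.5 tops or bottoms per season

print(months_per_season, per_season, per_body_half)
```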

Simple drill-down and comparison of the quantity indicator: a very simple and easy way to draw conclusions

Summer has the most clothes and winter the fewest, which matches the climate in the south. When we look at each number, we already have a rough estimate in mind. For example, for the breakdown by season, the climate lets us guess in advance that summer will have the most. When the data is in line with our expectations, that also validates the accuracy of the data. When it is not, it needs attention and further checks.

Broken down by purchase time, the vast majority of clothes were bought within the past 10 years. Thirty-three percent of the clothes were new, while 22 percent were seven years old. There are a few clothes from my undergraduate days that are more than 10 years old. I guess I haven't put on much weight.

The distribution of wear frequency, from low to high, is concentrated at the low end (the left side). It is true that many clothes are worn infrequently. The goal is to shift the distribution to the right.

Most clothes were bought in shopping malls; I like the straightforwardness of seeing something I like and taking it home.

The lack of formal clothes is related to my temperament, and I have no need for formal wear. This is in line with expectations.
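A minimal sketch of this kind of single-dimension breakdown, using only the standard library. The four rows and the field names are made up for illustration and do not reproduce the real wardrobe data.

```python
from collections import Counter

# Hypothetical rows; only the fields needed for the breakdown are shown.
wardrobe = [
    {"season": "summer", "channel": "mall"},
    {"season": "summer", "channel": "Taobao"},
    {"season": "spring/autumn", "channel": "mall"},
    {"season": "winter", "channel": "mall"},
]


def share_by(items, key):
    """Return the share of items per label of `key`, largest first."""
    counts = Counter(item[key] for item in items)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


print(share_by(wardrobe, "season"))
print(share_by(wardrobe, "channel"))
```

The same function works for any label column, so each dimension can be compared against the expectation held before looking at the data.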

Simple crosses between dimensions yield some further conclusions

The low-frequency problem is most serious for spring clothes: there are fewer of them that I like. Winter clothes are still worn fairly often.

Crossing occasion with season shows that summer really is a romantic season, with more holiday-style clothes. One formal outfit for each of the three seasons is just right and enough. Next time I see something more formal, I don't need to agonize over it.

Crossing occasion with distinctiveness: there are more distinctive clothes for holidays and more ordinary clothes for weekdays. That is reasonable.
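Crossing two dimensions amounts to counting items per pair of labels. A minimal sketch follows; the `occasion` field and the sample rows are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical rows with an assumed "occasion" label.
wardrobe = [
    {"occasion": "holiday", "season": "summer"},
    {"occasion": "holiday", "season": "summer"},
    {"occasion": "weekday", "season": "summer"},
    {"occasion": "weekday", "season": "winter"},
    {"occasion": "formal", "season": "winter"},
]


def cross_count(items, row_key, col_key):
    """Count items for every (row label, column label) pair that occurs."""
    return Counter((item[row_key], item[col_key]) for item in items)


table = cross_count(wardrobe, "occasion", "season")
for (occasion, season), n in sorted(table.items()):
    print(f"{occasion:8s} {season:8s} {n}")
```

A `Counter` returns 0 for pairs that never occur, which is convenient when looking for empty cells such as a season with no formal outfit.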

Clothes also have an attribute that cannot be ignored: how well they combine. Clothes that can't be matched with anything are also a big headache when choosing. So I analyze the top-to-bottom ratio, leaving out dresses, jumpsuits, and other items that don't need to be paired.

The imbalances in the top/bottom pairing are:

  • Spring: 11.5 tops for every pair of pants
  • Jeans, which go with everything, are very scarce and need to be replenished
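The top-to-bottom ratio can be sketched as below. The `part` field ("top", "bottom", "one-piece") is a hypothetical label, and the counts are made up; they are not the numbers behind the 11.5 figure.

```python
def top_bottom_ratio(items, season):
    """Tops per bottom for one season, ignoring one-piece items."""
    in_season = [item for item in items if item["season"] == season]
    tops = sum(1 for item in in_season if item["part"] == "top")
    bottoms = sum(1 for item in in_season if item["part"] == "bottom")
    if bottoms == 0:
        return float("inf")  # tops exist but nothing to pair them with
    return tops / bottoms


# Hypothetical spring wardrobe: 3 tops, 2 bottoms, 1 dress.
wardrobe = (
    [{"season": "spring/autumn", "part": "top"}] * 3
    + [{"season": "spring/autumn", "part": "bottom"}] * 2
    + [{"season": "spring/autumn", "part": "one-piece"}]
)

print(top_bottom_ratio(wardrobe, "spring/autumn"))
```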

The analysis of the quantity indicator gave me a better understanding of my own wardrobe: which categories need replenishing and which are already plentiful.

Besides quantity, quality is very important. Most girls are constantly buying clothes to some degree, so why do they keep buying and still feel they never have enough to wear?

Focus on the clothes you don’t want to wear anymore. Learn from your failures.

3.2 Quality

Define: elimination rate = number of clothes I no longer want to wear / number of all clothes

The biggest pain in my heart is "clothes I bought but have hardly worn". They take up space, go unworn, and cost money, and on top of that I get told: look at all those clothes in the closet, how can you say you have nothing to wear!

Analyzing what the clothes with a high elimination rate have in common helps me avoid stepping on landmines, and gives me some guidance the next time I hesitate over a purchase. As before, segmentation by dimension and comparison are the main tools. The overall elimination rate is 30%. Nearly a third of the clothes are effectively wasted, which is a high proportion.
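The elimination rate defined above is a simple ratio. A minimal sketch, assuming each item carries a wear-frequency label and that "retired" stands for "no longer want to wear" (the label spelling and the sample rows are assumptions):

```python
def elimination_rate(items):
    """Share of items tagged as no longer wanted."""
    if not items:
        raise ValueError("elimination rate is undefined for an empty wardrobe")
    retired = sum(1 for item in items if item["frequency"] == "retired")
    return retired / len(items)


# Hypothetical sample: 2 of 5 pieces are no longer worn.
wardrobe = [
    {"name": "gray hoodie", "frequency": "high"},
    {"name": "striped shirt", "frequency": "medium"},
    {"name": "floral skirt", "frequency": "retired"},
    {"name": "down jacket", "frequency": "high"},
    {"name": "sequin top", "frequency": "retired"},
]

print(elimination_rate(wardrobe))
```

Passing a filtered list (for example, only winter clothes) gives the rate for that segment, which is what the drill-downs rely on.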

By season, winter is especially high. Winter clothes are used more frequently, but they also wear out more, so some need to be weeded out.

I want to discuss a question here: there are many dimensions, so how do we pick which ones to drill down on?

With large-scale, high-dimensional data, we can use machine learning with the elimination outcome as the target, and then calculate the contribution of each feature.

But in data analysis, interpretability is very important, and a lot of data is used to test our hypotheses. There is no need to make accurate predictions, or to train models. (Of course, if you use a model, it’s still common to see if the characteristics of the high contribution are consistent with expectations, and if there are any implications.)

Therefore, in data analysis, the preferred dimensions for drilling are those that are most likely to be discriminating, to test some hypothesis, or to have special meaning in the context of the scenario.

For example, many of the drill-downs are done along the season dimension, because it has a special meaning: clothes for different seasons cannot be worn in place of each other. Drilling down on this dimension first makes it easier to find problems.

For the elimination rate, the dimension to drill down on first, the one most likely to be discriminating and also able to test a hypothesis, is purchase time. Is not wanting to wear a piece of clothing directly related to how old it is? If I no longer want to wear it simply because I bought it long ago, then the purchase decision was not at fault.

Elimination rate from high to low: bought as a graduate student or after starting work > bought as an undergraduate > bought within the past year.

Newer clothing does not necessarily have a lower elimination rate. The elimination rate of undergraduate-era clothes is lower than that of clothes bought after starting work. Does this mean my taste was better back then? It is important to note that only 5% of the wardrobe consists of clothes purchased as an undergraduate.

The reason is easy to imagine: clothes bought as an undergraduate ten years ago that have survived until now are probably the ones I liked most. If all my undergraduate clothes had been kept, their elimination rate would certainly be much higher.

Clothes bought within the past year have the lowest elimination rate. My recent taste has led to fewer mistakes so far.

So there’s an unfair aspect to the elimination rate indicator: clothes bought in the last year have a significantly lower elimination rate.

So if some category of clothes has a low elimination rate, it may not be because of my wise decisions and unique taste, but because I bought a lot of it recently, so that clothes less than a year old make up a large share.

So, is the low elimination rate of summer clothes seen earlier simply because more summer clothes were bought within the past year?

Cross season with purchase time.

You can see that, both for clothes purchased within the past year and for those purchased more than a year ago, the elimination rate is lower for summer than for spring and autumn. And it is exceptionally low for summer purchases within the past year. Since most summer clothes are short-sleeved tops, it is hard to go wrong.
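This kind of check, comparing a rate within each purchase-time bucket rather than overall, can be sketched as follows. All counts are made up to illustrate the mechanics; they are not the real wardrobe numbers, and the field names are assumptions.

```python
from collections import defaultdict


def rate_by(items, *keys):
    """Elimination rate per group, where a group is a tuple of label values."""
    totals = defaultdict(int)
    retired = defaultdict(int)
    for item in items:
        group = tuple(item[k] for k in keys)
        totals[group] += 1
        retired[group] += 1 if item["retired"] else 0
    return {group: retired[group] / totals[group] for group in totals}


def make(season, bought, total, n_retired):
    """Build `total` hypothetical items, the first `n_retired` of them retired."""
    return [{"season": season, "bought": bought, "retired": i < n_retired}
            for i in range(total)]


wardrobe = (
    make("summer", "within a year", 4, 0)
    + make("summer", "over a year", 2, 1)
    + make("spring/autumn", "within a year", 2, 1)
    + make("spring/autumn", "over a year", 4, 3)
)

overall = rate_by(wardrobe, "season")
bucketed = rate_by(wardrobe, "season", "bought")
for group, rate in sorted(bucketed.items()):
    print(group, round(rate, 2))
```

If summer stays lower than spring and autumn inside every bucket, the difference is not just an artifact of summer clothes being newer; if the gap vanished inside the buckets, the mix of purchase times would be the explanation.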

It is the winter clothing that deserves attention. For winter, the elimination rate of purchases made within the past year is higher than that of purchases made more than a year ago. Some pieces are worn very frequently, but recently bought ones are more likely not to be worn at all. I need to shop rationally in the near term.

Purchase channel is also an important dimension. The share of online shopping has been increasing recently.

But more worrying is that clothes bought online actually have a higher elimination rate than those from other channels.

Looking at the style dimension

More unconventional clothes are more likely to be eliminated. Understated clothes are relatively safe, which matches common sense. Distinctive spring styles in particular call for caution: their elimination rate is sky-high. More variety in summer is fine.

4. Specific analysis of typical cases

We now have a general idea of which dimensions have a high failure rate. To imprint the bad cases more deeply (a fall into the pit, a gain in wit), I wrote down the reason for each piece of clothing I no longer want to wear, traced it back to its source, and listed solutions.

5. Output Conclusion: Clothing buying strategy

So, here are some strategies for this weekend:

  1. Jeans are badly needed;
  2. Go to the mall and try on winter clothes. The winter clothes I wear regularly are getting old, and I would be in trouble if they wore out;
  3. Summer clothes are plentiful and personal satisfaction is high. Purchases can be postponed; an occasional online buy would be icing on the cake;
  4. Don't buy fancy spring clothes. I bought them and barely wore them;
  5. Decisively return online purchases that don't suit me. Unsuitable online purchases are the top reason for elimination;

6. Continuously observe data as decisions change

Don't do scattered, one-off data pulls; build an analysis system. This is a very important point.

Indicators that have proven able to reveal problems should be retained as standing metrics. They are critical for staying aware of the business and of the effect of strategy changes.

Once the measures from step 6 of the process are implemented, update the original data and observe how the indicators change. Adjusting direction in time is the key to maintaining the wardrobe's "ecological health".

But time is limited, and I am a little worn down by the original data collection and entry. Let's hope I can stick with it.

Finally

To summarize the data analysis methods and key points encountered in this article:

  1. Problems need to be sorted out and defined.
  2. Set key indicators.
  3. Clean underlying data is critical.
  4. Drill-down and comparative analysis of key indicators is simple, yet it yields many conclusions.
  5. You can set up some assumptions to test.
  6. Pay attention to whether an indicator is fair; if it has some natural bias, remember to analyze it by bucket.
  7. Analyzing bad cases is a powerful tool in formulating strategies.
  8. Avoid one-off work; long-term observation is what builds an analysis system.

Thank you for reading this far. I'm off to pack away over a hundred pieces of clothing.

This article is published by NetEase Cloud Music Technology team. Any unauthorized reprinting of this article is prohibited. We recruit all kinds of technical positions all year round, if you are ready to change jobs, and you like cloud music, then join us at [email protected].