Strive to be irreplaceable in 2020!

Long read warning: this is all about Xiao Yi’s own experience on the road into data analysis.


Preface

Recently, some friends asked me about switching careers and getting started in data analysis. Their questions went something like this:

“I want to learn data analysis but don’t know where to start. Xiao Yi, can you show me the way?”

“I have zero background and want to learn data analysis. Any good experience to share?”

While summing up your questions, I can’t help but think of myself two or three years ago.

At that time, I had only recently entered the workforce, and all my professional skills had been geared toward back-end development.

After I joined the department, my daily work focused mostly on data analysis, so I faced the same problems as everyone else: confusion, bewilderment, and a sense of helplessness.

Just talking about it brings a tear to my eye...

Looking back at the problems I ran into then, I wrote the following article.

The article is very long. I didn’t expect to write so much; I just kept writing and writing...

Maybe I’ve stepped into too many pits myself. Once I started, it spiraled out of control.

Although the article is long, I’ve marked the key points and the layout is easy to follow. I hope you get something out of it.

What is data analytics?

Data analysis has existed as a profession for a long time. As data volumes keep growing, its role has become more important and it has started to attract wide attention.

I suggest looking at four related fields together: data analytics, data mining, artificial intelligence, and data science.

Many people can’t tell these positions apart, and in practice, work that crosses between these fields is common.

Data analysis

Let’s start with the simplest one

Take it literally: data analysis = data + analysis

First, you have to be able to get data. When you need data to support your analysis, you have to know where to get it and how.

By “get” I don’t simply mean downloading it from a website or buying it on Xianyu or Taobao. I mean having the ability to obtain data yourself, and a sense of what data your analysis needs.

You might retort: as long as I get the data I need, the faster the better.

If you can be sure that shortcuts will cover all the data you need for future projects, fine. But can most of you?

I once needed to crawl a batch of Weibo data for work. A Taobao seller wanted 300-plus yuan for a one-off job with no after-sales support.

Instead, I found some code online and handled the data requirement myself. When my manager found out, I got a 300-yuan bonus.

Having a good manager is part of it. But in any workplace, if you have this ability and your colleagues don’t, how do you think your manager will treat you when projects come up?

And don’t tell me “the more capable you are, the more work you get.” Managers today aren’t stupid, and not everyone is willing to do this kind of thing.

Now let’s look at the “analysis” that comes after the data.

Is analysis important? I think it’s very important!

Analysis means forming hypotheses by observing data, verifying them with data indicators, and making predictions from patterns in the data.

In other words, analysis observes and uses data to verify existing conclusions and to form reasonable assumptions about future trends. How reasonable those assumptions are depends heavily on the business, which we’ll come back to later.

In summary: the purpose of data analysis is to solve problems by verifying our hypotheses with data and planning predictions around the patterns the data reveals.

One more thing: some job postings require data analysts to be familiar with algorithm XXX and to master model XXX. My only advice: if you really have the abilities they list, scroll down and aim for the next two positions instead. You deserve better.


Data mining

I can’t say much on this topic because I know I’m not qualified enough. If you think I’ve got something wrong, just treat it as my rambling.

Let me just describe how it differs from data analysis. Data analysis can end in predictions based on data patterns, but those predictions are quite limited.

Data mining is different. It is grounded in mathematical theory and validated on large datasets, so its results are more accurate and more convincing.

The biggest difference from data analytics is definitely the purpose.

Data mining aims to predict future changes in the data by uncovering the hidden associations among large numbers of samples.
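Here is a minimal Python sketch that makes “mining the internal associations” a little more concrete. It counts how often pairs of items appear together and turns those counts into support and confidence, the two basic measures behind association rules. The shopping baskets are made up (the beer-and-diapers pairing is the classic textbook illustration). The sketch only scores item pairs. A full algorithm such as Apriori extends the same counting to larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; real ones would come from order data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def pair_rules(baskets, min_support=0.4):
    """Return {(a, b): (support, confidence)} for the rule a -> b.

    support    = share of baskets containing both a and b
    confidence = P(b in basket | a in basket)
    Only pairs whose support reaches min_support are kept.
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules

rules = pair_rules(baskets)
for (a, b), (sup, conf) in sorted(rules.items(), key=lambda kv: -kv[1][1]):
    print(f"{a} -> {b}: support={sup:.2f}, confidence={conf:.2f}")
```

The sort puts the most reliable rules first. In this toy data every basket with beer also has diapers, but not the other way round. The asymmetry is exactly what confidence captures.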

Put simply, data analysis is mostly about analyzing what you already know, while data mining is about discovering what you don’t. If you’re a data analyst now, or plan to become one, I suggest you focus on data mining.

A data miner can move into data analysis, but a data analyst can’t necessarily move into data mining. Of course, no data miner would be silly enough to switch to data analysis.

Artificial intelligence (AI)

This topic is broad.

Given where science and technology are heading, artificial intelligence has great prospects.

Speech recognition, image recognition, robotics, natural language processing, and intelligent search all fall under artificial intelligence.

But remember: AI work requires data mining skills first, with machine learning and deep learning coming next.

That brings up two more directions: machine learning and deep learning. If they’re unfamiliar, just think of them as another kind of data mining.

Data science

Finally, there’s data science, which sounds like a fancy term, and it is one.

I suggest you understand the subject this way:

Python has a package called pandas for data processing.

There is also a package called scikit-learn for data mining.

Then there are crawlers, visualization (seaborn, matplotlib), linear algebra (SciPy), deep learning (Keras), and so on. Data science covers everything these packages do.

In short, data science is a discipline that covers data processing, visualization, data mining, deep learning, and more. That’s all you need to know.





What is the process of data analysis?

Many readers, especially students still in school, will be most interested in this question.

In real work, what does the process of a data analysis project actually look like?

I went back over all the projects I’ve worked on in the past two or three years and talked them through with my colleagues. From that, I boiled the process down to roughly six steps.

Analyze business indicators and define data content

It may surprise you that this comes first.

In a real project, your manager often hands you a big, broad goal, such as:

Manager: Xiao Yi, user complaints rose slightly this month compared with last month. Please analyze why, and predict which indicators we should focus on next month.

For a task like this, you first need to understand the specific business:

Why do users complain? What about the product makes them unhappy? Which specific data reflect that unhappiness? How are those data generated? What do they look like at different granularities?

And don’t forget your boss’s end goal (the “oh, and could you predict XXX while you’re at it...” part).
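Here is a minimal sketch of breaking “complaints rose slightly” into smaller indicators. It counts complaints per month and per reason, then compares the two months reason by reason. The records and reason labels below are hypothetical. In practice they would come from your own complaint system.

```python
from collections import Counter
from datetime import date

# Hypothetical complaint records: (date, reason).
complaints = [
    (date(2019, 11, 3), "price"),
    (date(2019, 11, 15), "quality"),
    (date(2019, 11, 20), "after-sales"),
    (date(2019, 12, 1), "price"),
    (date(2019, 12, 9), "quality"),
    (date(2019, 12, 18), "quality"),
    (date(2019, 12, 27), "after-sales"),
]

def monthly_counts(records):
    """Count complaints per (year-month, reason): a coarser granularity than daily."""
    return Counter((d.strftime("%Y-%m"), reason) for d, reason in records)

def month_over_month(counts, prev, curr):
    """Change in complaint count per reason between two "YYYY-MM" months."""
    reasons = {reason for _, reason in counts}
    # A Counter returns 0 for missing keys, so reasons absent in a month count as 0.
    return {r: counts[(curr, r)] - counts[(prev, r)] for r in sorted(reasons)}

counts = monthly_counts(complaints)
print(month_over_month(counts, "2019-11", "2019-12"))
```

The headline “slight increase” turns into a specific question: why did quality complaints go up? That question is far easier to investigate.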

Propose hypotheses and choose an analysis method

Once you know which data relate to your goal but can’t yet decide which factors are primary and which are secondary, you’ve reached stage two.

If time allows, I suggest running a controlled experiment with a control group.

For example, you might hypothesize that complaints are related to high prices, product quality, or after-sales service.

Alternatively, if you have historical complaint data, you can form hypotheses along the time dimension. For example, you might suppose that weather in certain months drives complaints up. (If you really do blame the weather, I hope your boss doesn’t beat you up.)
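If you do run a controlled experiment, a two-proportion z-test is one common way to check the result. It tells you whether the test group’s complaint rate really differs from the control group’s, or whether the gap could be chance. This sketch assumes you can split users into two groups (say, one receiving improved after-sales service). The counts are invented. The test relies on a normal approximation, which is only reasonable with fairly large groups.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions.

    x1, n1: complaints and users in the control group.
    x2, n2: complaints and users in the test group.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under the null hypothesis both groups share one complaint rate.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical trial: 60 of 1000 control users complained vs 35 of 1000 test users.
z, p = two_proportion_z_test(60, 1000, 35, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (commonly below 0.05) suggests the difference is unlikely to be chance alone. It says nothing about why the groups differ, so the business reasoning from the previous step still matters.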

Use acquisition tools to obtain relevant data

You’ve posed your hypotheses and you know which data they involve. Now you need data, as much as possible, to validate your conclusions and make them more convincing and acceptable to your boss.

In large companies, there is usually someone dedicated to providing data; you just submit a request.

In small companies, you’re usually on your own.

Want data? Get it yourself.

Not enough data? Find it yourself.

Can’t find it? Well, aren’t you something.

You can use data collection tools or write your own crawler scripts.

Here is Xiao Yi’s experience:

If you only need a small amount of data, a few hundred or a few thousand records, you don’t need a crawler. A point-and-click scraping tool such as Octoparse or LocoySpider is very efficient.

If you have a large amount of data and need to collect it regularly, I recommend learning to write crawlers in Python.

“But I don’t know Python, let alone crawlers?”

“Keep reading; what you need is further down!”
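Here is a standard-library-only sketch that gives a taste of what a Python crawler looks like. `fetch` downloads a page, and `extract_titles` pulls out the text of every `<h2 class="title">` element. That page structure, the User-Agent string, and the function names are assumptions for illustration, and real sites differ. Check a site’s robots.txt and terms of use before crawling it. For regular, large-scale collection you would add scheduling, retries, and rate limiting on top.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

class TitleParser(HTMLParser):
    """Collect the text of every <h2 class="title"> element (assumed page layout)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "h2" and ("class", "title") in attrs:
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles[-1] += data.strip()

def fetch(url, timeout=10):
    """Download a page and decode it; the User-Agent is a placeholder."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (data-analysis sketch)"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

def extract_titles(html):
    """Parse an HTML string and return the collected titles."""
    parser = TitleParser()
    parser.feed(html)
    return parser.titles

# Offline demo on a literal snippet; with network access you would call
# extract_titles(fetch("https://example.com/some-page")) instead.
print(extract_titles('<h2 class="title">First post</h2><h2>not a title</h2>'))
```

Keeping download and parsing separate makes the parser easy to test on saved HTML. You don’t have to hit the site every time you tweak it.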


Clean the data programmatically

After a flurry of heroic effort (and nearly getting fired), you finally have the data you want.

Then you take a closer look. The monitoring system saves a log file every hour, so last month alone produced 30 × 24 = 720 files. Each file is only a few MB, but together they add up.

What now?

This is why, at this stage, mastering at least one programming language is a basic requirement. Either R or Python works; I recommend Python, more on that later.
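Here is one way to tackle the 720-file problem in Python. The sketch assumes each hourly file is a CSV with the same header row, all sitting in one folder; the file names below are hypothetical. It yields rows one at a time rather than loading everything at once, because 720 files of a few MB each can add up to more than you want in memory. It also tags each row with its source file, so bad records can be traced back.

```python
import csv
import tempfile
from pathlib import Path

def iter_hourly_rows(folder, pattern="*.csv"):
    """Yield rows from every matching CSV in `folder`, in file-name order.

    File-name order is chronological if the names start with a timestamp.
    """
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                row["source_file"] = path.name  # trace each row to its file
                yield row

# Demo with two tiny hypothetical hourly files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "2019-12-01_00.csv").write_text("user_id,complaints\n1,0\n2,1\n", encoding="utf-8")
    Path(tmp, "2019-12-01_01.csv").write_text("user_id,complaints\n3,2\n", encoding="utf-8")
    # Consume the generator before the temporary folder is deleted.
    merged = list(iter_hourly_rows(tmp))

print(f"{len(merged)} rows")
```

With the real data you would usually not call `list()` on everything. Instead, feed the generator straight into the cleaning step, or write the rows out to one combined file or database.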

In the process of data cleaning, you need to face these problems:

Missing values, outliers, duplicate values, and garbage data saved when the system occasionally crashes.
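The four problems above can be handled roughly as follows. This minimal sketch assumes rows shaped like `csv.DictReader` output, with hypothetical `hour`, `user_id`, and `complaints` columns. Filling blanks with the median and dropping outliers with the 1.5 × IQR rule are common defaults chosen for illustration, not the only reasonable choices. `statistics.quantiles` needs Python 3.8 or later.

```python
import statistics

def to_float(text):
    """Parse a numeric field: None for blanks, ValueError for garbage."""
    text = (text or "").strip()
    return float(text) if text else None

def clean(rows, key=("hour", "user_id"), field="complaints"):
    """Drop garbage and duplicate rows, fill missing values, drop outliers."""
    seen, parsed = set(), []
    for row in rows:
        # Garbage: a value that cannot be parsed at all (e.g. written during a crash).
        try:
            value = to_float(row.get(field))
        except ValueError:
            continue
        # Duplicates: keep only the first record for each key.
        k = tuple(row.get(c) for c in key)
        if k in seen:
            continue
        seen.add(k)
        parsed.append((row, value))

    # Missing values: fill blanks with the median of the observed values.
    observed = [v for _, v in parsed if v is not None]
    median = statistics.median(observed)
    values = [median if v is None else v for _, v in parsed]

    # Outliers: drop values outside the 1.5 * IQR fences (Tukey's rule).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [
        {**row, field: v}
        for (row, _), v in zip(parsed, values)
        if low <= v <= high
    ]

# Hypothetical merged rows (all strings, as csv.DictReader returns them).
raw = [
    {"hour": "00", "user_id": "1", "complaints": "1"},
    {"hour": "00", "user_id": "2", "complaints": "2"},
    {"hour": "00", "user_id": "2", "complaints": "2"},    # duplicate
    {"hour": "01", "user_id": "1", "complaints": ""},     # missing
    {"hour": "01", "user_id": "2", "complaints": "3"},
    {"hour": "01", "user_id": "3", "complaints": "#!@"},  # garbage
    {"hour": "02", "user_id": "1", "complaints": "2"},
    {"hour": "02", "user_id": "2", "complaints": "250"},  # outlier
    {"hour": "02", "user_id": "3", "complaints": "1"},
]
cleaned = clean(raw)
print(len(cleaned), "rows kept")
```

The order matters. The median is computed before outliers are removed because the median is barely affected by extreme values, whereas a mean would be dragged up by the 250.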

Extract useful information for data analysis

Now everything is ready. Finally, you’re in familiar territory.

You use basic statistical methods to analyze the distribution of each indicator and calculate each indicator’s month-on-month change from last month.

You identify the specific indicators behind the complaints and find that they fluctuated a lot this month. You then verify this by comparing against the same period in previous years (year-on-year).
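Month-on-month and year-on-year changes are the same percentage-change calculation against different baselines: last month versus the same month last year. Here is a small sketch with invented monthly complaint counts keyed by `(year, month)`. For January, the previous month wraps to December of the year before.

```python
def pct_change(current, previous):
    """Percentage change from previous to current; None if previous is 0."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100

def mom_and_yoy(monthly, year, month):
    """Month-on-month and year-on-year % change for one (year, month)."""
    prev_month = (year, month - 1) if month > 1 else (year - 1, 12)
    same_month_last_year = (year - 1, month)
    current = monthly[(year, month)]
    return {
        "mom": pct_change(current, monthly[prev_month]),
        "yoy": pct_change(current, monthly[same_month_last_year]),
    }

# Hypothetical monthly complaint counts keyed by (year, month).
monthly = {(2018, 11): 80, (2018, 12): 88, (2019, 11): 100, (2019, 12): 110}
print(mom_and_yoy(monthly, 2019, 12))
```

If both numbers jump, the change is more likely real than a seasonal pattern that shows up every December.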

Finally, you use data analysis software to build a simple prediction model and use it to predict this month’s indicators from previous years’ data. The predictions come out close to the actual values.

Encouraged, you confidently use the model to predict next month’s indicators.
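The article doesn’t say which model to use. As a stand-in, here is about the simplest option: an ordinary least squares straight-line trend, written by hand. The backtest predicts the last known month from the earlier ones, which is the “results show little difference” check. Only then does it forecast the next month. The counts are hypothetical, and a straight line ignores seasonality, so treat this as a sketch of the workflow rather than a recommended model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def forecast_next(series):
    """Fit a trend over time steps 0..n-1 and extrapolate to step n."""
    xs = list(range(len(series)))
    a, b = fit_line(xs, series)
    return a + b * len(series)

# Hypothetical monthly complaint counts, oldest first.
history = [90, 94, 97, 101, 104, 108]

# Backtest: predict the last known month from the months before it.
backtest = forecast_next(history[:-1])
print(f"predicted {backtest:.1f}, actual {history[-1]}")

# Only after the backtest looks reasonable, forecast the month after.
print(f"next month forecast: {forecast_next(history):.1f}")
```

Backtesting before forecasting is the important habit here. It is cheap to do, and it gives you something concrete to show your boss when they ask how much to trust the number.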

Reasonable data presentation, output analysis report

When you have no idea how to write your analysis report, your colleague sends you the “XXXX Data Analysis Report Template”.

Looking at the template, you see what a report needs: charts + supporting data + predicted outcomes.

You could talk about it for ages, but you boil it down to five or six PowerPoint slides.

With your colleague’s template, you just swap in a few charts and paste in a few data tables. Then you explain the reasons behind the original goal, enough to reach a reasonably convincing conclusion.

Finally, attach your predictions and a reasonable, unbiased recommendation, and your report is done.





How to get started with data analysis?

That covers the process. Now for some real, practical content, or you’ll say this article is just filler.

This part covers how to get started and how to learn.

1. Set your direction

Data analysis roles come in two types: business-oriented and technical.

I can’t say much about the business type, since I don’t have a business background. As far as I know, business analysts work with market-facing people, analyze users’ pain points for marketing campaigns, and deliver valuable analysis results. (Did I get that right? I’m not sure.)

Producing weekly, monthly, quarterly, and annual operations indicators perfectly illustrates the saying “we don’t produce data, we’re just data porters.”

Most technical analysts don’t stay in the role for long. Haha, just kidding.

Technical analysts focus on the correlations between indicators. They optimize indicators according to the business situation and forecast the business’s key indicators.

So technical data analysts naturally evolve into data mining engineers.

The business type is easier to get into: learn the indicators, see how they relate to the business, and the rest follows naturally. The technical type requires more learning and self-improvement, especially in algorithms and models, so it’s harder to get into.

2. Improve your skills

That heading sounds grander than it is, so don’t be nervous. There are many ways to improve your data analysis skills.

Here is a ladder of abilities. If you want to get started, climb it one rung at a time.

2.1 Business Capability

There’s not much to say here, since it depends on the person. It can take as little as a week or two, or as long as a month or two.

Business knowledge involves only a little data, and the meanings of the indicators are already defined. Your business capability is good enough if you can do two things: break a big goal down into specific small indicators, and map those indicators to specific data.

2.2 Excel

Many students don’t take it seriously, but that’s because you’re still a student. Once you start working with real data, would you dare say you really know how to use Excel?

Excel has always been a great data processing tool, not only for statistical summaries but also for producing charts.

Many students also use Excel to draw charts after processing their data in Python.

I suggest using Excel directly when your data volume is small (Excel’s limit is 1,048,576 rows per sheet), the processing is simple, and there is only one table. It’s fast, convenient, and easy to output from.

In your boss’s eyes, any analysis that can be done in Excel should be finished in minutes!
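Many people process data in Python and then chart it in Excel, so here is one small tip: write the CSV with the `utf-8-sig` encoding. The byte-order mark it adds helps Excel recognize the file as UTF-8, so non-ASCII text such as Chinese labels doesn’t come out garbled. Here is a minimal sketch with a hypothetical summary table and file name.

```python
import csv
import tempfile
from pathlib import Path

def export_for_excel(header, rows, path):
    """Write a header plus data rows to a CSV that Excel opens cleanly."""
    # "utf-8-sig" writes a byte-order mark first, which helps Excel
    # detect UTF-8 instead of guessing a legacy local encoding.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)  # default "excel" dialect: commas, \r\n line endings
        writer.writerow(header)
        writer.writerows(rows)

# Hypothetical summary produced earlier in Python.
summary = [("price", 1), ("quality", 2), ("after-sales", 1)]
out_path = Path(tempfile.gettempdir()) / "complaint_summary.csv"
export_for_excel(["reason", "count"], summary, out_path)
print("wrote", out_path)
```

Passing `newline=""` to `open` follows the `csv` module’s own recommendation. Without it, you can get blank lines between rows on Windows.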

2.3 Python/R

I explained earlier why you need a programming language. Here is how to get started with one quickly and simply.

As we all know, learning with a purpose gets you twice the result for half the effort.

Our purpose in learning programming here is data cleaning, statistical prediction, and so on, which always follows a process like this: read data → clean data → analyze data → chart analysis → associated prediction → save data.

Read data: this involves file operations, so learn how to work with files.

Clean data: first identify all unreasonable data, then delete or fill it. This involves conditions and loops, so learn branching and looping.

Statistical analysis: summarize indicators with mathematical methods. For this, learn functions and math-related modules.

Chart analysis: learn visualization, creating charts for deeper, multidimensional analysis.

Associated prediction: to make reasonable predictions from the results of steps 3 and 4, learn some simple algorithms and implement them in code (a bonus).

Save data: save the results to a file or database, so learn about databases.
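The steps above map naturally onto a handful of small functions. This sketch uses only the standard library and covers reading, cleaning, statistics, and saving. The chart and prediction steps are left out to keep it dependency-free; in practice, that is where matplotlib or a model would go. The CSV content, column names, and table name are hypothetical.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical raw export: one blank value, one duplicate line, one garbage value.
RAW = """region,complaints
north,12
south,
east,9
east,9
west,abc
north,15
"""

def read_data(text):
    """Step 1, read: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_data(rows):
    """Step 2, clean: drop blank/garbage values and exact duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        value = row["complaints"].strip()
        if not value.isdigit():  # blank or non-numeric -> drop
            continue
        record = (row["region"], int(value))
        if record in seen:       # exact duplicate -> drop
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned

def analyze(records):
    """Step 3, statistics: summarize the indicator."""
    values = [v for _, v in records]
    return {"count": len(values), "mean": statistics.mean(values), "max": max(values)}

def save(records, conn):
    """Step 6, save: write the cleaned records to a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS complaints (region TEXT, n INTEGER)")
    conn.executemany("INSERT INTO complaints VALUES (?, ?)", records)
    conn.commit()

records = clean_data(read_data(RAW))
summary = analyze(records)
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
save(records, conn)
print(summary)
```

Each step is its own function, so you can test, replace, or rerun any one of them independently. For example, you can swap in a different cleaning rule without touching how data is read or saved.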

By the end of this process, you’ll have learned most of the programming syntax you need. Add threading to improve processing efficiency, plus some advanced data processing modules, and your programming skills will basically be fine.

I recommend learning Python directly. I’ve written a complete set of Python tutorials, from beginner to intermediate to advanced. If you’re interested in Python, they can help you get started quickly.

I won’t say much about R. Python is my recommendation.

2.4 SQL

I forgot this skill at first and only added it when I reread the article. Not because it isn’t important, but because I use it every day and overlooked something so familiar!

There’s not much to say about learning SQL. It’s much easier than Python or R.

Remember the four basic operations: add, delete, change, and check (INSERT, DELETE, UPDATE, SELECT).

Basic database work revolves around these four. Advanced database operations are rarely needed in data analysis, so skip them for now.
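Here are the four operations in one place. The example uses Python’s built-in `sqlite3` module and a throwaway in-memory database, so it runs anywhere. The table and its rows are hypothetical. The SQL itself is standard and works much the same in MySQL or PostgreSQL, apart from the parameter placeholder syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE complaints (id INTEGER PRIMARY KEY, reason TEXT, month TEXT)")

# Add (INSERT)
cur.executemany(
    "INSERT INTO complaints (reason, month) VALUES (?, ?)",
    [("price", "2019-11"), ("quality", "2019-12"), ("quality", "2019-12"), ("spam", "2019-12")],
)

# Delete (DELETE): remove a junk record that should never have been logged
cur.execute("DELETE FROM complaints WHERE reason = ?", ("spam",))

# Change (UPDATE): relabel a reason
cur.execute("UPDATE complaints SET reason = ? WHERE reason = ?", ("product quality", "quality"))

# Check (SELECT): count complaints per reason for one month
rows = cur.execute(
    "SELECT reason, COUNT(*) FROM complaints WHERE month = ? GROUP BY reason ORDER BY reason",
    ("2019-12",),
).fetchall()
print(rows)

conn.commit()
```

The `?` placeholders matter even in analysis scripts. They let the driver handle quoting, so a stray apostrophe in your data can’t break the query.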

2.5 Mining Ability

Despite the heading, think of this as a plus for data analysis: even better if you have it.

These days, most data analyst positions require some knowledge of algorithms. If you lack it, others will have an advantage over you and may take the job.

For beginners at this stage, I suggest going straight to the top ten data mining algorithms. Understanding the concepts is enough. Then study as many case studies of these models as you can, so you know how to apply them.

Better yet, get your hands dirty with a few data sets.
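k-nearest neighbors (kNN) is one of the classic “top ten” data mining algorithms, and it is short enough to write yourself as a first hands-on exercise. This sketch classifies a point by majority vote among its k closest training samples. The users, features, and labels are made up. In real work you would scale the features first, and you would probably reach for a library such as scikit-learn. `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """Classify `point` by majority vote of its k nearest training samples.

    train: list of (features, label) pairs; features are tuples of numbers.
    Distance is plain Euclidean distance on the raw (unscaled) features.
    """
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical users: (orders this month, complaints this month) -> outcome.
train = [
    ((12, 0), "stay"), ((10, 1), "stay"), ((11, 0), "stay"),
    ((2, 4), "churn"), ((1, 3), "churn"), ((3, 5), "churn"),
]

print(knn_predict(train, (9, 1)))  # an active user with few complaints
print(knn_predict(train, (2, 3)))  # an inactive user who complains
```

Choosing an odd k such as 3 avoids tied votes between two classes. The design choice that matters most in practice is the distance, because a feature measured in large units will dominate the others unless you scale it.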

As for implementing algorithm XX from scratch by hand, that depends on your ability; don’t force it.

2.6 Output Capability

Templates + practice: I think those are the two main things.

After all, the ability to write slides and reports depends partly on talent and partly on your boss. If what you write suits their taste, anything goes. If not, you may see a flood of files named like “XXXX Data Analysis Report Vxx.xx”.

Read documents written by senior colleagues, get familiar with your company’s document style and conventions, and practice writing. You’ll slowly get better; this isn’t a hard requirement.


Conclusion

All right, that’s all for introductions.

There’s no summary today. I suggest rereading the article a few times when you have time.

If it helped you, give it a thumbs up.




Afterword

I didn’t expect to write this much. After drafting the outline it all felt simple, like I could finish quickly, and then...

Maybe I’ve just been through a lot in the past two years. Everything above is my own experience in data analysis. Some of it may be biased, but I hope it helps you overall.

From time to time, I’ll share resources and tutorials I’ve used along the way, plus a list of books I’ve read.




First published on the WeChat official account [know autumn little dream]

Also published on: Juejin, Jianshu, CSDN


2020: how do you switch careers into data analytics?

