Background

I recently decided to learn some data analysis. The picture below is full of advertisements, and many of the skills it lists are not relevant to me, but everyone says that programmers should have some product sense and be sensitive to data.

I have read the course introductions of a few training institutions. They cover a lot of ground: tools, analytical thinking, hands-on practice, and the final report. As the saying goes, you can't get fat from a single mouthful, so I will learn it step by step.

Data analysis framework

The following is a complete data analysis process, divided into five steps: problem identification, data acquisition, data cleaning, data analysis, and report presentation.

Determine the problem: analyze the problem to be solved and define some numeric indicators, then get the answer by comparing these indicators. The decision is thereby turned into a quantitative comparison: bigger or smaller, higher or lower, more or less.

Acquire data: collect data containing these indicators from various sources, including external public data and the business data of your own company (or department). External public data is often collected with crawlers.

Data cleaning: remove illegal values, null values, duplicate values, and outliers from the acquired data to obtain high-quality data for subsequent analysis.

Data analysis and reporting: analyze the relationship between the indicators and each dimension and the relationships among multiple indicators, build a regression or classification model, and plug in parameters to obtain predicted results.
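To make the cleaning and modeling steps above a little more concrete, here is a minimal sketch in plain Python. The data points, the rule that negative numbers count as illegal values, and the function names clean and fit_line are all assumptions made for illustration. Outlier handling is left out to keep the sketch short.

```python
from statistics import mean

def clean(rows):
    """Drop rows with null or illegal values, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        x, y = row
        if x is None or y is None:   # null value
            continue
        if x < 0 or y < 0:           # illegal value (assumed rule: no negatives)
            continue
        if row in seen:              # repeated value
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned

def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up (indicator, result) pairs, including a duplicate, a null and an illegal value.
raw = [(1, 2.0), (2, 4.0), (2, 4.0), (3, None), (-1, 5.0), (4, 8.0)]
points = clean(raw)
slope, intercept = fit_line(points)

# "Plug in the parameters": predict the result for a new indicator value.
predicted = slope * 5 + intercept
print(points, slope, intercept, predicted)
```

A real project would more likely use a library such as pandas or scikit-learn for these steps; the point here is only to show what each step does.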

Hands-on practice

That covers the theory. Now let's put it into practice: first we use a crawler to get some fund data from a fund website and store it in the cloud database MemfireDB, then we use Tableau to clean and visualize the data and find the most valuable stocks.

Getting the data: this article describes how to obtain open fund data: https://juejin.cn/post/697093… . We obtained some fund data, as shown in the figure below:

We use Tableau to clean our data. Tableau is a company whose software turns data crunching into attractive graphics. Its easy-to-use program lets companies drag and drop large amounts of data onto a digital “canvas” and create charts in the blink of an eye. The idea is that the easier it is to manipulate data in the interface, the better a company can tell whether it is doing things right or wrong in its business area.

Download and install Tableau from https://www.tableau.com/zh-cn… .

Load the data: Tableau needs to connect to the database through ODBC, so we need to configure ODBC first. This article describes one way to configure it: https://juejin.cn/post/697609…

Click Connect, log in, and select the data table.

First, the meaning of a few fields: fundcode is the fund code, name is the fund name, JZRQ is the net value date, DWJZ is the net value per unit, GSZ is the estimated value, and GSZZL is the estimated growth rate.
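If you later process the same records in code, the field list can be written down as a small typed record. This is only a sketch: the lowercase key names, the assumption that every value arrives as a string, the ISO date format, and the sample values are all made up for illustration, and FundQuote and parse_quote are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class FundQuote:
    fundcode: str   # fund code
    name: str       # fund name
    jzrq: date      # net value date
    dwjz: float     # net value per unit
    gsz: float      # estimated value
    gszzl: float    # estimated growth rate

def parse_quote(raw: dict) -> FundQuote:
    """Convert a raw record (assumed: all values are strings) into typed fields."""
    return FundQuote(
        fundcode=raw["fundcode"],
        name=raw["name"],
        jzrq=date.fromisoformat(raw["jzrq"]),
        dwjz=float(raw["dwjz"]),
        gsz=float(raw["gsz"]),
        gszzl=float(raw["gszzl"]),
    )

# Made-up sample record, not real fund data.
sample = {
    "fundcode": "000001",
    "name": "Example Fund",
    "jzrq": "2021-06-18",
    "dwjz": "1.2340",
    "gsz": "1.2400",
    "gszzl": "0.49",
}
quote = parse_quote(sample)
print(quote)
```

Converting to real dates and numbers early means that a malformed value fails loudly at load time instead of quietly distorting a chart later.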

Click the worksheet, drag POSNAME (stock name) to Rows (as a dimension) and the count to Columns (as a measure), then select the bubble chart from the recommended chart types on the right. From this chart we can see that Kweichow Moutai is purchased the most times. Look at the largest and smallest bubbles to check whether the data is abnormal.
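What the bubble chart shows is simply the number of rows per stock name. As a rough sketch of the same count outside Tableau, assuming each row is a dict with a POSNAME key (the rows below are made up, not the real data):

```python
from collections import Counter

# Made-up holding rows; only the POSNAME column matters for this count.
holdings = [
    {"POSNAME": "Kweichow Moutai"},
    {"POSNAME": "Stock A"},
    {"POSNAME": "Kweichow Moutai"},
    {"POSNAME": "Stock B"},
    {"POSNAME": "Kweichow Moutai"},
    {"POSNAME": "Stock A"},
]

# One counter entry per stock name, like one bubble per stock.
counts = Counter(row["POSNAME"] for row in holdings)

# The largest and smallest "bubbles" are the first places to look for bad data.
largest = counts.most_common(1)[0]
smallest = counts.most_common()[-1]
print(largest, smallest)
```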

You can also use a box plot (box-and-whisker plot) to find abnormal data. Moutai looks quite abnormal, but it seems to be a real value!
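A box plot usually marks a point as an outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the quartiles. Here is a small sketch of that rule. The counts are made up, iqr_outliers is a hypothetical helper name, and the quartile method is just one of several common conventions.

```python
from statistics import quantiles

def iqr_outliers(counts, k=1.5):
    """Return the names whose count lies outside the k * IQR fences."""
    q1, _, q3 = quantiles(list(counts.values()), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [name for name, n in counts.items() if n < low or n > high]

# Made-up purchase counts per stock.
purchase_counts = {
    "Stock A": 3, "Stock B": 4, "Stock C": 5, "Stock D": 5,
    "Stock E": 6, "Stock F": 7, "Stock G": 8, "Kweichow Moutai": 40,
}
outliers = iqr_outliers(purchase_counts)
print(outliers)
```

The rule only flags candidates. As with Moutai here, a flagged point may be a genuine value, so it is worth checking before deleting it.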

When outliers are found, they can be removed with filters.

Next, we analyze the data. A scatter plot shows the relationship between the estimated growth rate and the total transaction amount.
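A scatter plot shows the relationship visually; the Pearson correlation coefficient summarizes it as a single number between -1 and 1. The sketch below computes it by hand. The two value lists are made up, and pearson is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values: estimated growth rate and total transaction amount per stock.
growth_rate = [0.5, 1.0, 1.5, 2.0]
total_amount = [10.0, 20.0, 30.0, 40.0]

r = pearson(growth_rate, total_amount)
print(r)
```

A value near 1 or -1 means the points lie close to a straight line, and a value near 0 means there is no linear relationship, although a non-linear one may still be visible in the plot.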

The relationship between the estimated value and the number of transactions

We can also see, for each day, how many times each stock was traded and what percentage of that day's trades it accounts for.
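The same per-day breakdown can be sketched in a few lines of code. The trade records below are made up, and the (date, stock name) layout is an assumption for illustration:

```python
from collections import Counter, defaultdict

# Made-up trade records: (trade date, stock name).
trades = [
    ("2021-06-17", "Kweichow Moutai"),
    ("2021-06-17", "Kweichow Moutai"),
    ("2021-06-17", "Kweichow Moutai"),
    ("2021-06-17", "Stock A"),
    ("2021-06-18", "Kweichow Moutai"),
    ("2021-06-18", "Stock B"),
]

# Number of trades per stock, grouped by day.
per_day = defaultdict(Counter)
for day, stock in trades:
    per_day[day][stock] += 1

# Share of each day's trades that each stock accounts for.
shares = {
    day: {stock: n / sum(counter.values()) for stock, n in counter.items()}
    for day, counter in per_day.items()
}
print(shares)
```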

At this point I am still rather confused: why do so many people buy Moutai? Keep learning!