Parsing a wave of fund data in Python

Guide language:

Hello, hello ~ soon it is the end of the year again, all the people are rushing to make a lot of money, and then prepare to go home for the New Year. Here xiaobian want to ask, your fund manager performance is good 👀

Today, we analyze a wave of fund data using a quantitative investment platform and Python.

Body:

This analysis uses the jukuan (www.joinquant.com/) platform, and new users register and…

After registering an account, we can call the aggregated data. I wrote and ran the following code using Python 3.8+ Jupyter.

If you want to get more complete source code or Python learning materials, please come to the private xiaobian!

1. Import the package to obtain authorization for broadening

from jqdatasdk import * import pandas as pd import numpy as np import matplotlib.pyplot as plt Plt.rcparams ['font. Sans-serif '] = ['Arial Unicode MS'] # MAC matplot shows Chinese PLt.rcparams ['axes. Unicode_minus '] = False Auth (' account ', 'password ') # LicenseCopy the code

Get all funds by calling the get_all_securities function, which is documented below

The code is as follows:

df = get_all_securities(['fund', 'open_fund'], '2021-11-10')
df
Copy the code

There may be duplicate funds in df, according to the fund code (df.index) to see how many funds were acquired

code_arr = list(set([x.split('.')[0] for x in df.index.values]))
len(code_arr)
Copy the code

A total of 13,698 funds were returned, which is much faster than the crawler.

2. Get a share of your stock investment

I only want to focus on stock funds here, so I need to get the percentage of each fund’s stock investment, and if the percentage of each fund’s stock investment is less than 50%, get rid of it.

From the documentation, you can query the stock_rate field of the FUND_PORTFOLIO table to obtain the equity portion of the fund. The query is constructed using the query function, and then finance.run_query is called to complete the query.

Write a function that returns a query statement

def asset_query(arr): q=query(finance.FUND_PORTFOLIO.code, finance.FUND_PORTFOLIO.name, finance.FUND_PORTFOLIO.period_end, finance.FUND_PORTFOLIO.report_type, finance.FUND_PORTFOLIO.stock_rate ).filter(finance.FUND_PORTFOLIO.code.in_(arr), finance.FUND_PORTFOLIO.period_end.in_(['2021-03-31', '2021-06-30', '2021-09-30']), Finance.fund_portfolile.report_type. In_ ([' q1 ', 'q2 ',' Q3 '])) return qCopy the code

The arr argument to the asset_query function is an array that holds the fund code. The query function specifies the fields to return. The filter function specifies the data to be filtered. Code represents the fund code. Period_end is the reporting period (the last day of each quarter). The three dates filled in represent each quarter; Report_type is the report type, which is filled with three quarters. This condition is added to read data of a single quarter, not half a year or a year. Finally return the query statement Q.

The asset_query function generates the object Q using the syntax provided by SQLAlchemy, a well-known ORM framework in Python. In short, it provides a way to query a database in Python code without writing SQL. Interested friends can learn, can help you achieve more complex query logic.

Run finance.run_query to execute the query

I = 0 while I < len(code_arr): print(code_arr + STR (I)) i+1500] q = asset_query(tmp_arr) tmp_df = finance.run_query(q) if i == 0: asset_df = tmp_df else: asset_df = pd.concat([asset_df, tmp_df]) i += 1500Copy the code

Because finance.run_query returns a maximum of 5000 rows at a time, a loop is written to read in batches, only 1500 funds at a time. Because a fund returns three quarters of data, a single read returns a maximum of 4500 data without exceeding the limit.

After reading, take a look at asset_df

Filter funds with a stock_rate greater than 50

stock_fund_df = asset_df[asset_df['stock_rate'] > 50]
Copy the code

After statistical screening, how many funds are left

stock_fund_code_arr = list(set(stock_fund_df['code']))len(stock_fund_code_arr)
Copy the code

Return 5590, half filtered out.

3. Acquire fund holdings

The method of obtaining fund holdings is the same as above, but the table name and field name have been changed.

Write the hold_stock_query function to generate the corresponding query statement

def hold_stock_query(arr):
    q=query(finance.FUND_PORTFOLIO_STOCK
       ).filter(finance.FUND_PORTFOLIO_STOCK.code.in_(arr),
                finance.FUND_PORTFOLIO_STOCK.rank <= 10,
                finance.FUND_PORTFOLIO_STOCK.period_end.in_(['2021-03-31', '2021-06-30', '2021-09-30']),
                finance.FUND_PORTFOLIO_STOCK.report_type.in_(['第一季度', '第二季度', '第三季度']))
    return q
Copy the code

FUND_PORTFOLIO_STOCK. Rank <= 10 represents the top 10 stocks.

The data is also queried in a circular manner

I = 0 while I < len(stock_fund_code_arr): print(I) tmp_arr = stock_fund_code_arr[I: i+150] q = hold_stock_query(tmp_arr) tmp_df=finance.run_query(q) if i == 0: hold_stock_df = tmp_df else: hold_stock_df = pd.concat([hold_stock_df, tmp_df]) i += 150 hold_stock_dfCopy the code

Look at the information for hold_stock_df

Name is the name of the holding, rank is the rank of the holding (1 represents the largest holding of the fund), and PROPORTION is the proportion of the holding.

4. Adjust the warehouse direction

Here I want to look at the industry, such as: from the first quarter to the third quarter, how many funds bought new energy stocks, how many funds bought liquor stocks. So, you need to map stocks to industries.

Strictly speaking, go to an authoritative broker and map each stock to an industry. But I just want to do a qualitative analysis here, so I looked for the top six holdings of some industry index funds to replace a particular industry. The mapping is as follows

Stock_to_industry_dict = {' byd ':' new energy ', 'age of ningde' : 'new energy', 'er jie shares' :' new energy ', 'jiangxi feng LiYe' : 'new energy', 'million weft lithium can' : 'new energy', 'inovance technology' : 'new energy', 'holy state shares' :' semiconductor ', 'he shares' :' semiconductor ', 'maxscend technologies inc micro' : 'semiconductor', 'the north China:' semiconductor ', 'mega easy innovation' : 'semiconductor', 'three Ann photoelectric' : 'semiconductor', 'guizhou moutai' : 'white wine', 'wuliangye' : 'white wine', 'shanxi fenjiu' : 'white wine', luzhou laojiao (a kind of daqu liquor) ':' white wine ', 'yanghe shares' :' white wine ', 'drunkard wine' : 'white wine', 'the sunlight power:' photovoltaic (pv) ', 'tong wei shares' :' photovoltaic (pv) ', 'central stake' : 'photovoltaic', 'longji shares' :' photovoltaic (pv) ', 'change especially electrician' : 'photovoltaic (pv), 'chint electric appliances' :' photovoltaic (pv) ', 'wuxi' : 'medical', 'wisdom flying creatures' :' medical ', 'Watson creatures' :' medical 'and' tiger medicine ':' medical ', 'changchun gaoxin' : 'medical', 'gloria British' : 'medical', 'fosun medicine' : 'medical'}Copy the code

Add a column industry to hold_stock_df to represent the industry in which the stock is held

def stock_to_industry(cols): if cols['name'] in stock_to_industry_dict: Return stock_to_industry_dict[cols['name']] return 'no' hold_stock_df['industry'] = hold_stock_df.apply(lambda x: stock_to_industry(x), axis=1) hold_stock_dfCopy the code

Find a new energy fund to see the effect

hold_stock_df[hold_stock_df['code'] == '005939']
Copy the code

With industry information, pivottables can be used to directly measure the change in the number of funds in each industry over the three quarters

industry_count_df = hold_stock_df.pivot_table(index='industry', columns=['report_type'], values='code', Aggfunc =lambda x:len(x.unique())) cols_name = [' q0 ', 'q0 ', Industry_count_df = industry_count_df[cols_name] industry_count_df.drop([' none '], inplace=True) plt.rcParams['figure.figsize'] = (12, 8) industry_count_df.plot.bar()Copy the code

From the chart can still find some obvious features, holding new energy funds not only a large base, the speed of growth is also very fast. What is more unexpected is that the number of baijiu holding funds actually declined.

5. Explore something more interesting

Since we have access to all the funding, there’s a lot we can explore. For example: data that can produce the following effect

Each line represents a fund, rank1 to RANK10 represents the fund’s top ten heavy positions, the first half of the comma is the holding of stocks, the second half is the holding ratio. The standard red represents the stocks we care about, and the last column is the sum of the standard red holdings.

This way of presentation has many advantages, than Alipay, flush software is better.

First, support according to some stock screening fund (icon red part), I understand the fund software can only according to a stock screening fund

Second, the ratio of total holdings including these stocks (candid_prop column) can be calculated. The higher the total holdings, the closer to our expectation. And existing fund software wants him to add manually

Third, one line shows all the positions of the fund, at a glance, and the fund software to see the positions of different funds need to cut back and forth between different funds page, very inconvenient.

Here I want to use Python to encapsulate a visual fund screening tool, I do friends can leave a message or at the end of the article, if there are many people, next time to share.

Finally, the implementation code of the figure above is introduced. First, filter only the most recent quarter

Hold_stock_p3_df = hold_stock_df[hold_stock_df['report_type'] == 'q3 ']Copy the code

Construct two new columns

hold_stock_p3_df['hold_rank'] = ['rank%d' % i for i in hold_stock_p3_df['rank']]
hold_stock_p3_df['hold_info'] = hold_stock_p3_df['name'] + " : " + hold_stock_p3_df['proportion'].astype(str)
Copy the code

With perspective, columns turn to rows

tmp_df = hold_stock_p3_df.pivot(index='code', columns='hold_rank', values='hold_info')
tmp_idx = tmp_df.index
tmp_df = tmp_df.reset_index()
tmp_df['name'] = tmp_idx

rank_cols = ['rank%d' % i for i in range(1, 11)]
col_names = ['code'] + rank_cols

final_fund_df = tmp_df[col_names]

final_fund_df
Copy the code

Define a function to calculate the total position proportion of the candidate stock set

Def filter_fund(x): prop = 0 for I in range(1, 11): def filter_fund(x): prop = 0 for I in range(1, 11): col = 'rank' + str(i) vals = str(x[col]).split(':') if vals[0].strip() in candid_fund_list: all_prop += float(vals[1].strip()) return all_propCopy the code

Define functions for coloring

def show_color(val):
    color = '#BB0000' if str(val).split(':')[0].strip() in candid_fund_list else ''
    return 'color:%s' % color
Copy the code

Changing the value of condid_fund_list allows stocks to filter related funds, as shown in the diagram at the beginning of this section

final_fund_df['candid_prop'] = final_fund_df.apply(lambda x: filter_fund(x), axis=1)
tmp_fund_df = final_fund_df[final_fund_df['candid_prop'] > 0].sort_values(by='candid_prop', ascending=False)
tmp_fund_df = tmp_fund_df.style.applymap(show_color)
tmp_fund_df
Copy the code

The end:

This is the end of the xiaobian analysis, the broad platform also includes stocks, bonds and other financial data, access to the same as mentioned above. Interested partners can try their own! Do not understand the welcome to private xiaobian oh!!

Parsing a wave of fund data in Python

Guide language:

Body:

Related Posts

MSBuild (Visual C++ 2015) platform tool set There is a way!

LeetCode brush question – palindrome number

Leetcode 3. The oldest string without repeating characters