Abstract: Senior architect ZhaiYongDong gave a talk titled “Building a social friend recommendation system based on MaxCompute”. The talk covered the application of big data in friend recommendation systems, the analysis model behind such a system, how a friend recommendation system is implemented on Alibaba Cloud, and an introduction to MaxCompute technology.

Original text: click.aliyun.com/m/42756/

Live video: yq.aliyun.com/articles/41…

The following are the highlights of the video:

Application of big data in friend recommendation systems

This section uses a friend recommendation system as the example scenario. Everyone is talking about big data these days, but we think putting data to use requires three elements. The first is a massive amount of data: only when the data set is large enough can its latent value be mined. The second is the ability to process data: with strong, fast processing capability, the information in the data can be extracted more quickly. The third is a scenario for commercial realization: collecting big data is not simply a case of the more the better; there must be a specific scenario in which to use it. The recommendation system serves here as an example application of big data.

On the left is Alipay. When you open Alipay, there is a “people you may know” column; generally speaking, the people listed are ones you know but may not have added as friends. On the right is LinkedIn, a professional social networking site used by job seekers. LinkedIn makes similar recommendations: it tells you which users are your potential friends and whether each is a first-, second- or third-degree connection. Users with high potential relevance are shown first and those with lower relevance are shown later. Both are typical friend recommendations.

How does friend recommendation decide whom to recommend to a user? First, determine whether the two people are already friends. Then look at how many potential mutual friends the two share, and push recommendations on that basis. For example, once the number of mutual friends is known, two people who share mutual friends can be treated as potential friends. That is how the recommendation is implemented.

On the right of the figure above is a graph of the social relationships between people: for example, if A and B are friends, a line is drawn between them. For a machine to analyze these relationships, the graph on the right has to be converted into data the machine can recognize, namely the two-dimensional table on the left. For example, one row records that A is friends with B, C and D. The table can then be passed to the machine for analysis. The analysis might find that A and E have one mutual friend, B and D have two mutual friends, and C and E have one mutual friend. We can then recommend B and D to each other as potential friends and rank them first, while A and E, and C and E, rank lower because the probability is slightly lower. Pairs with more mutual friends rank higher and pairs with fewer rank lower. This ranking is the expected result.
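The counting and ranking described above can be sketched in a few lines of plain Python. This is only an illustration, not the speaker's code: the friendship table `FRIENDS` is a hypothetical graph chosen so that it reproduces the counts mentioned in the text (the figure itself is not available), and `recommend` is a made-up helper name.

```python
from itertools import combinations

# Hypothetical two-dimensional friendship table (user -> set of friends).
# Assumption: this graph is invented to match the counts quoted in the text.
FRIENDS = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "E"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
    "E": {"B"},
}


def recommend(friends):
    """Return (user_a, user_b, mutual_count) rows, most mutual friends first."""
    rows = []
    for u, v in combinations(sorted(friends), 2):
        if v in friends[u]:
            continue  # already friends, nothing to recommend
        mutual = friends[u] & friends[v]
        if mutual:
            rows.append((u, v, len(mutual)))
    # Rank by mutual-friend count (descending), then by name for stability.
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows


for row in recommend(FRIENDS):
    print(row)
# ('B', 'D', 2)
# ('A', 'E', 1)
# ('C', 'E', 1)
```

This in-memory version only works for small tables; the MapReduce model is what lets the same idea scale to large data sets.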

Analysis model of friend recommendation system

How do we calculate this, and what do we generally use? MapReduce is a programming model for parallel computation over large-scale data sets. It consists of three stages: Map, Combine and Reduce.

Take the friend recommendation scenario as an example.

First, the machine-readable data on the left is input. The Map stage then splits the data into separate parts and converts each part into key/value form. For example, what does the row A, B, D, E (user A followed by A's friends) become? The key A-B gets the value 0, where 0 means the two are already friends. If B and D are not friends, the key B-D gets the value 1; likewise B-E and D-E below it also get 1. Each row of data is converted into key/value pairs like those shown on the right. This is the work of the Map stage: splitting the data and converting it into key/value form.
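A minimal sketch of the Map step, in plain Python rather than the MaxCompute MR API. It assumes each input row is a user followed by that user's friends; `map_row` is a hypothetical helper. Note that the mapper cannot know from a single row whether two of A's friends are already friends with each other, so it emits 1 for every such pair and leaves the filtering to the Reduce stage.

```python
from itertools import combinations


def map_row(user, friends):
    """Turn one table row into ((u, v), value) pairs.

    value 0: the two users are direct friends.
    value 1: the two users share `user` as a mutual friend.
    Keys are sorted so that (A, B) and (B, A) map to the same key.
    """
    pairs = []
    for friend in friends:
        pairs.append((tuple(sorted((user, friend))), 0))
    for u, v in combinations(sorted(friends), 2):
        pairs.append(((u, v), 1))
    return pairs


emitted = map_row("A", ["B", "D", "E"])
for key, value in emitted:
    print(key, value)
# ('A', 'B') 0
# ('A', 'D') 0
# ('A', 'E') 0
# ('B', 'D') 1
# ('B', 'E') 1
# ('D', 'E') 1
```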

Combine is a local summary of the data. Some records are repeated: for example, the record A-B with value 0 may occur twice, in which case only one copy needs to be kept. Other repeated records are handled similarly, so this stage produces a local summary of the data, as in the two tables shown.
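One possible way to express this local summary, again as a plain-Python sketch under assumptions: `combine` is a hypothetical helper, and the rule "keep a single 0 if any 0 is present, otherwise sum the 1s" is one reasonable reading of the description, chosen because a pair marked 0 will be discarded later anyway.

```python
from collections import defaultdict


def combine(pairs):
    """Summarize the key/value pairs produced by one mapper."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)

    combined = []
    for key in sorted(grouped):
        values = grouped[key]
        if 0 in values:
            combined.append((key, 0))  # already friends: keep one marker
        else:
            combined.append((key, sum(values)))  # partial mutual-friend count
    return combined


# Sample mapper output (illustrative data only).
local_output = [
    (("A", "B"), 0),
    (("A", "B"), 0),
    (("B", "D"), 1),
    (("B", "D"), 1),
    (("C", "E"), 1),
]
print(combine(local_output))
# [(('A', 'B'), 0), (('B', 'D'), 2), (('C', 'E'), 1)]
```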

The third step is the Reduce stage. Reduce merges the data from both sides and computes a single value for each key, which is the final result. If two users are already friends (value 0), there is no need to recommend them to each other, so a pair such as A-B with value 0 should be removed. What we want are the pairs whose value is greater than zero, meaning they have mutual friends, and who are not yet friends themselves. That gives the desired result.
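The Reduce step can be sketched the same way. This is an illustrative plain-Python version, not MaxCompute code: `reduce_pairs` is a hypothetical helper, and the input list stands in for the combined output of two mappers after shuffling.

```python
from collections import defaultdict


def reduce_pairs(pairs):
    """Merge values per key, drop existing friends, rank the rest."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)

    rows = []
    for (u, v), values in grouped.items():
        if 0 in values:
            continue  # a 0 means the pair are already friends
        rows.append((u, v, sum(values)))
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows


# Illustrative data: outputs of two combiners, concatenated.
shuffled = [
    (("A", "B"), 0), (("B", "D"), 1), (("A", "E"), 1), (("B", "C"), 1),
    (("A", "B"), 0), (("B", "D"), 1), (("C", "E"), 1), (("B", "C"), 0),
]
print(reduce_pairs(shuffled))
# [('B', 'D', 2), ('A', 'E', 1), ('C', 'E', 1)]
```

The pair B-C shows why the 0 marker matters: one mapper saw B and C only as friends of a third user, while another saw them as direct friends, so the reducer discards the pair.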

Implementation of the friend recommendation system on Alibaba Cloud

What does the overall architecture of friend recommendation look like on Alibaba Cloud? Suppose there is a social application, which is the business system. The front end is deployed on Alibaba Cloud ECS cloud servers, and the data it stores lives in Alibaba Cloud RDS. This is the existing social application. The business system generates data; to analyze it, the data first has to be extracted from the database into MaxCompute, Alibaba Cloud's big data computing service. This is much like the ETL process used when building a data warehouse. The data is then analyzed and processed on Alibaba Cloud's big data platform.

The big data development platform makes it quick and convenient to develop processes such as data import and data processing. The output is a set of analysis results, which the front-end application then needs in order to display them to users.

Technical features of MaxCompute

MaxCompute has the following technical features:

(1) Distributed: distributed cluster, cross-cluster technology, flexible expansion.

(2) Security: automatic storage error correction, sandbox mechanism, multi-point backup.

(3) Easy to use: standard API, comprehensive support for SQL, upload and download tools.

(4) Permission control: multi-tenant management, user permission policy, data access policy.

Usage scenarios of MaxCompute

As for usage scenarios, you can use MaxCompute to build your own data warehouse. MaxCompute can also serve as a platform for distributed applications: for example, you can use graph computing, or build workflows. Data analysis is usually not a one-off job; it is periodic. If the data needs to be analyzed once a day, you can create a workflow, set a periodic schedule for it, and have it run once a day. MaxCompute is also useful for machine learning: the data in MaxCompute is first analyzed there or with similar services, the results are fed into the machine learning platform, and the platform uses algorithms and models to learn from the data and generate the desired model.

Big data development suite DataIDE

In addition to MaxCompute, there is the big data development suite DataIDE (current name: Data Factory DataWorks), which provides an efficient and secure offline data development environment. Why introduce it? DataIDE is a graphical data development service built for working with MaxCompute, which does the actual computation. You can develop in DataIDE rather than directly in MaxCompute, and compare the development experience in MaxCompute with that in DataIDE.

Data processing, storage and other operations can also be completed in DataIDE. DataIDE can store, analyze, process and cluster data throughout the whole data analysis process.

MaxCompute application development process

The application development process of MaxCompute consists of six steps:

(1) Install and configure the environment

(2) Develop the MR program

(3) Test the script in local mode

(4) Export the JAR package

(5) Upload to MaxCompute project space

(6) Use MR in MaxCompute

Let’s walk through the process in detail using the friend recommendation example. First, install the MaxCompute client. The advantage of the client is that you can use local commands to work with MaxCompute on Alibaba Cloud remotely; you only need to configure the MaxCompute connection information locally. You also need to set up your own development environment: MaxCompute MR programs here are developed in Java, with Eclipse as the development tool. Then create a new project. When creating the project, the area marked by the red box is the client information that needs to be configured locally. After that, you can start writing code.

The next step is a simple test after development, to check that the code works as intended. The input is a set of test data, and the output is a table with three columns: the first is user A, the second is user B, and the third is the number of potential mutual friends the two share. With these three columns in mind, you can run the test. To run the code locally, first choose which project to use. Second, choose the input table and the output table, so that the program knows which table to read from and which table to save its results to. Once this is configured, click Run and the results appear.

After the local test passes, export a JAR package and upload it to Alibaba Cloud. Once the JAR has been exported, add it as a resource: in resource management, upload the JAR package you just exported. At this point the MR JAR package has been uploaded to the MaxCompute cluster.

Once uploaded, the JAR can be used. Create a new task, give it a name, and choose the task type: because ours is an MR program, select the OPENMR module. After the task is generated, go to its edit page. There, first tell it that this is an OPENMR task that uses the uploaded friend recommendation JAR package; then, below, specify which program logic in the JAR to run. Once that is filled in, click Run and the results appear. This is the whole development and deployment process: test locally, upload the resource to the MaxCompute cluster, and then use the locally developed JAR package in the cluster.
