Product overview

MaxCompute (formerly ODPS), the Big Data Computing Service, is a fast, fully managed data warehouse solution for data at the GB, TB, and PB scale. MaxCompute provides a comprehensive data import solution and a variety of classic distributed computing models, so you can process massive datasets quickly, reduce enterprise costs, and keep data secure.

DataWorks and MaxCompute are closely related. DataWorks provides MaxCompute with one-stop data synchronization, task development, data workflow development, data management, and data operations and maintenance. For details, see DataWorks (formerly the Big Data Development Suite).

MaxCompute is mainly used to store and compute batches of structured data, and provides massive-scale data warehouse solutions as well as analysis and modeling services for big data. As methods of collecting data have become richer and more refined, industries have accumulated more and more data. Data volumes have grown to a massive scale (100 GB, TB, and even PB) that traditional software can no longer handle.

When analyzing massive data, a single server does not have enough processing capacity, so data analysts usually turn to distributed computing. However, distributed computing models place higher demands on data analysts and are difficult to maintain: analysts need to understand both the business requirements and the underlying computing model. MaxCompute is designed to give you a convenient way to analyze massive amounts of data without worrying about the details of distributed computing.

MaxCompute is widely used within Alibaba Group, for example for data warehousing and BI analysis at large Internet enterprises, website log analysis, e-commerce transaction analysis, and mining of user characteristics and interests.

Product advantages

Massive computing and storage

MaxCompute suits storage and computing workloads larger than 100 GB and scales up to the EB level.

Multiple computing models

MaxCompute supports SQL, MapReduce, Graph, and MPI iterative algorithms.
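To make the MapReduce model named above more concrete, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It is plain Python and does not use any MaxCompute API; the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework does between the two phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values that share a key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big tables"]))
```

In a distributed service the map and reduce functions run on many machines in parallel and the shuffle moves data between them; the sketch only shows the shape of the logic an analyst would write.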

Strong data security

MaxCompute has reliably supported all of Alibaba's offline analysis services for more than 7 years, and provides multi-layer sandbox protection and monitoring.

Low cost

Compared with enterprise-built private clouds, MaxCompute provides more efficient computing and storage and reduces procurement costs by 20%-30%.

Functions overview

Data channel

Batch and historical data channel

Tunnel is a data transmission service provided by MaxCompute that supports highly concurrent offline data upload and download. It handles daily imports and exports at the TB/PB scale and is especially suitable for batch import of full or historical data. Tunnel provides a Java programming interface. In addition, the MaxCompute client tool provides commands for transferring data between local files and the service.
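As a rough illustration of the client commands mentioned above, the following Python sketch only assembles the text of `tunnel upload` and `tunnel download` commands for the MaxCompute client; it does not contact the service. The helper names, file names, and table name are hypothetical, and the available command options should be checked against the Tunnel command documentation.

```python
def tunnel_upload_command(local_path: str, table: str, partition: str = "") -> str:
    """Build the text of a client 'tunnel upload' command (local file to table)."""
    # An optional partition spec is appended to the table name after a slash.
    target = f"{table}/{partition}" if partition else table
    return f"tunnel upload {local_path} {target};"


def tunnel_download_command(table: str, local_path: str, partition: str = "") -> str:
    """Build the text of a client 'tunnel download' command (table to local file)."""
    source = f"{table}/{partition}" if partition else table
    return f"tunnel download {source} {local_path};"


if __name__ == "__main__":
    # Hypothetical file and table names, for illustration only.
    print(tunnel_upload_command("log.txt", "test_project.test_table"))
    print(tunnel_download_command("test_project.test_table", "log_copy.txt"))
```

The generated lines are meant to be pasted into the MaxCompute client; for programmatic transfers, the Java interface mentioned above is the route the text describes.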

Real-time and incremental data channel

For real-time data upload, MaxCompute provides DataHub, a low-latency, easy-to-use service that is especially suited to incremental data import. DataHub also supports a variety of data transfer plug-ins, such as Logstash, Flume, Fluentd, and Sqoop. In addition, log data in Log Service can be delivered to MaxCompute with one click and then analyzed and mined with DataWorks.

Integration with other Alibaba Cloud services

MaxCompute (formerly ODPS) is a big data computing service that provides a fast, fully managed, PB-level data warehouse solution. It is integrated with a number of Alibaba Cloud products, which makes it possible to implement many business scenarios quickly.

MaxCompute and the Big Data Development Suite

The Big Data Development Suite is a one-stop offline processing and analysis platform for massive data. It is built on MaxCompute computing and storage and provides visual workflow development, scheduling, and operations and maintenance management. On the DTplus platform, the Big Data Development Suite console serves as the MaxCompute console.

With the Big Data Development Suite, you can write and run MaxCompute SQL directly, and you can also visually configure workflows that schedule and run MaxCompute SQL, MapReduce (MR), and other tasks. For more information, see the Big Data Development Suite help documentation.

You can think of the Big Data Development Suite as a web client for MaxCompute.

MaxCompute and Data Integration

Through Data Integration, MaxCompute can load data from different data sources and can also export MaxCompute data to various business databases.

Data Integration is built into the Big Data Development Suite, where it is configured and run as a data synchronization task. You can configure the MaxCompute data source directly in the Big Data Development Suite and then configure tasks that read from or write to MaxCompute tables, all on one platform.

MaxCompute and machine learning

Machine Learning is a machine learning algorithm platform built on MaxCompute. After you create a MaxCompute project on the DTplus platform and activate Machine Learning, you can use the platform's algorithm components to train models on MaxCompute data and perform other operations. See the Machine Learning documentation for details.

MaxCompute and QuickBI

After the data is processed in MaxCompute, add the project as a QuickBI data source. You can then build reports on MaxCompute table data in QuickBI for visual data analysis.

MaxCompute and AnalyticDB

AnalyticDB is a cloud computing service for real-time, highly concurrent online analysis of massive data (real-time OLAP). Combined with MaxCompute, it supports scenarios in which big data drives business systems: offline computing and mining in MaxCompute produces high-quality data, which is then imported into AnalyticDB for business systems to query and analyze.

Import MaxCompute data to AnalyticDB in either of the following ways:

Use the data import and export feature of DMS for AnalyticDB.

Configure a data synchronization task in the Big Data Development Suite that reads from MaxCompute and writes to AnalyticDB.

MaxCompute and recommendation engine

The recommendation engine is a recommendation service framework built in the Alibaba Cloud computing environment. A recommendation service usually consists of three parts: log collection, recommendation computing, and product integration. The offline inputs and outputs of recommendation computing are MaxCompute (formerly ODPS) tables.

On the resource management page of the recommendation engine console, add the MaxCompute project as a computing resource of the recommendation engine by adding cloud computing resources.

MaxCompute and Table Store

Table Store is a distributed NoSQL data storage service built on Alibaba Cloud's Apsara (Feitian) distributed system. MaxCompute 2.0 allows users to access and process table data in Table Store using external tables. For details, see Accessing OTS Unstructured Data.

MaxCompute and OSS

Object Storage Service (OSS) is a massive, secure, low-cost, and highly reliable cloud storage service. MaxCompute 2.0 allows users to access and process data stored in OSS using external tables. For details, see Accessing OSS Unstructured Data.
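To give a feel for what an external table over OSS data looks like, the sketch below assembles a `CREATE EXTERNAL TABLE` statement as a string. It is an assumption-laden illustration, not the documented procedure: the helper name, table name, columns, and OSS location are hypothetical, and the storage handler name `com.aliyun.odps.CsvStorageHandler` and the `oss://` location format are assumptions that should be verified against Accessing OSS Unstructured Data.

```python
from typing import List, Tuple


def oss_external_table_ddl(table: str, columns: List[Tuple[str, str]], location: str) -> str:
    """Assemble an external-table DDL statement that points at CSV files in OSS."""
    column_sql = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_sql}\n"
        ")\n"
        # Assumed built-in CSV handler name; confirm it in the OSS access document.
        "STORED BY 'com.aliyun.odps.CsvStorageHandler'\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    # Hypothetical table, columns, and bucket path, for illustration only.
    print(
        oss_external_table_ddl(
            "demo_logs",
            [("user_id", "BIGINT"), ("event", "STRING")],
            "oss://example-endpoint/example-bucket/logs/",
        )
    )
```

Once such a table exists, it can be queried with ordinary MaxCompute SQL, which is the point of the external table mechanism described above. Access from MaxCompute to OSS also has to be authorized in RAM.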

MaxCompute and OpenSearch

OpenSearch is a large-scale distributed search engine platform developed in-house by Alibaba. After data is processed in MaxCompute, you can connect OpenSearch to the MaxCompute data by adding a data source on the OpenSearch platform.

MaxCompute and Mobile Analytics

Mobile Analytics is a statistical analysis product for mobile app data launched by Alibaba Cloud. It provides one-stop data operation services for developers. When the basic analysis reports provided by Mobile Analytics cannot meet the specific needs of app developers, the data can be synchronized to MaxCompute with one click so that developers can further process and analyze it according to their own business requirements.

MaxCompute and Log Service

Log Service can quickly collect, consume, deliver, query, and analyze log data. Once log data has been collected, it often needs more customized analysis and mining. You can deliver logs from Log Service to MaxCompute for customized, in-depth data analysis and mining.

MaxCompute and RAM

RAM is the user identity management and resource access control service provided by Alibaba Cloud. MaxCompute integrates with RAM in two scenarios:

Scenario 1: Identity management of sub-accounts when MaxCompute is used through the DTplus Big Data Development Suite

After the master account activates the service and creates a project, if MaxCompute is to be used through the DTplus Big Data Development Suite and several accounts need to develop together, the master account must create sub-accounts in RAM and add them as project members. For details, see the Big Data Development Suite documentation on preparing RAM sub-accounts and adding project members and roles.

Note: In this scenario, RAM handles only user identity management; permissions are not managed in RAM. To authorize RAM sub-accounts with MaxCompute commands, see the documentation on adding RAM sub-accounts.

Scenario 2: Authorizing access to unstructured data in RAM when MaxCompute processes unstructured data

MaxCompute supports processing of unstructured data (in OSS and Table Store). Before MaxCompute can access data in OSS or Table Store, you need to grant MaxCompute access to OSS or Table Store in RAM. For details, see Accessing OSS Unstructured Data and Accessing Table Store Unstructured Data.

MaxCompute courses

Apsara Clouder big data skills certification: Build a social friend recommendation system using MaxCompute

(This course belongs to the Apsara Clouder big data skills certification "Build a social friend recommendation system using MaxCompute". Only the free trial lessons are available without purchase; to study all lessons and obtain the certificate, you need to buy the certification package.)

Apsara Clouder big data skills certification: Deploy stock trading strategies using MaxCompute

(This course belongs to the Apsara Clouder big data skills certification "Deploy stock trading strategies using MaxCompute". Only the free trial lessons are available without purchase; to study all lessons and take the exam for the certificate, you need to buy the certification package.)

Certification process

1 Purchase the certification

2 Study the courses and complete the online experiments

3 Take the online exam

4 Obtain the electronic certificate

Official website of Alibaba Cloud University (an innovative talent workshop for the cloud ecosystem)