Big Data Computing Service (MaxCompute, formerly ODPS) is a fast, fully managed data warehouse solution for data at the GB, TB, and PB scale. MaxCompute provides comprehensive data import options and a variety of classic distributed computing models, so you can solve large-scale computing problems quickly, reduce enterprise costs, and keep your data secure.

The Big Data Development Suite is closely tied to MaxCompute. It provides MaxCompute with one-stop data synchronization, task development, data workflow development, data management, and data operation and maintenance. For details, see Big Data Development Suite.

MaxCompute is mainly used to store and compute batches of structured data, and provides large-scale data warehouse solutions as well as analysis and modeling services for big data. As data collection methods have become richer and more mature, every industry has accumulated more and more data. Data volumes have grown to a scale (100 GB, TB, and even PB) that traditional software can no longer handle.

When analyzing massive amounts of data, the processing capacity of a single server is limited, so data analysts usually turn to a distributed computing model. However, a distributed computing model places higher demands on data analysts and is difficult to maintain: analysts need to understand both the business requirements and the underlying computing model. MaxCompute is designed to give you a convenient way to analyze massive amounts of data without worrying about the details of distributed computing.

MaxCompute is widely used within Alibaba Group, for example for data warehousing and BI analysis at large Internet companies, website log analysis, e-commerce transaction analysis, and mining of user characteristics and interests.

Product advantages

Massive computing and storage

MaxCompute is suited to storage and computing workloads larger than 100 GB and scales up to the EB level.

Multiple computing models

MaxCompute supports SQL, MapReduce, Graph, and MPI iterative algorithms.

Strong data security

MaxCompute has reliably supported all of Alibaba's offline analysis services for more than 7 years, and provides multi-layer sandbox protection and monitoring.

Low cost

Compared with enterprise-built private clouds, MaxCompute provides more efficient computing and storage and reduces procurement costs by 20% to 30%.

Functions overview

Data channel

Batch and historical data channel: Tunnel is the data transmission service provided by MaxCompute. It offers highly concurrent offline data upload and download, supports daily imports and exports at the TB/PB level, and is especially suitable for batch import of full or historical data. Tunnel provides a Java programming interface. In addition, the MaxCompute client tool provides commands for transferring data between local files and the service.

Real-time and incremental data channel: For real-time data upload, MaxCompute provides DataHub, a low-latency and easy-to-use service that is especially suited to incremental data import. DataHub also supports a variety of data transmission plug-ins, such as Logstash, Flume, Fluentd, and Sqoop. It also supports one-click delivery of log data from Log Service to MaxCompute, where you can then use the Big Data Development Suite for log analysis and mining.
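
As a rough illustration of what "highly concurrent" batch upload means for the batch channel described above, the following Python sketch splits a set of records into blocks, writes the blocks in parallel, and makes the data visible only after a final commit. Everything here is an assumption made for illustration: `FakeUploadSession`, `write_block`, `commit`, and `upload` are hypothetical in-memory stand-ins, not the Tunnel API, and the block-and-commit flow is only one plausible way such a service can be organized.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeUploadSession:
    """Hypothetical in-memory stand-in for a batch upload session (not the real Tunnel API)."""

    def __init__(self):
        self._blocks = {}    # block_id -> list of records written so far
        self.committed = []  # records that became visible after commit

    def write_block(self, block_id, records):
        # Blocks are independent of each other, so they can be written in parallel.
        self._blocks[block_id] = list(records)

    def commit(self, block_ids):
        # Refuse to publish anything unless every expected block has arrived.
        missing = [b for b in block_ids if b not in self._blocks]
        if missing:
            raise RuntimeError(f"missing blocks: {missing}")
        for block_id in sorted(block_ids):
            self.committed.extend(self._blocks[block_id])


def split_into_blocks(records, block_size):
    """Cut the record list into consecutive chunks of at most block_size records."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def upload(records, session, block_size=1000, workers=4):
    """Write all blocks concurrently, then commit them in block order."""
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(session.write_block, i, block)
                   for i, block in enumerate(blocks)]
        for future in futures:
            future.result()  # re-raise any error from a worker thread
    session.commit(range(len(blocks)))
    return len(blocks)
```

Calling `upload(rows, FakeUploadSession())` returns the number of blocks written. Committing in block order keeps the original record order even though the blocks may finish writing in any order.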

Computing and analysis tasks

MaxCompute supports multiple computing models as follows:

SQL: MaxCompute stores data in tables and provides SQL query capabilities. You can use MaxCompute much like traditional database software, except that it can process terabytes and petabytes of data. Note:

MaxCompute SQL does not support transactions, indexes, or Update/Delete operations. The SQL syntax of MaxCompute differs from that of Oracle and MySQL, so you cannot seamlessly migrate SQL statements from other databases to MaxCompute. MaxCompute SQL can complete a query in minutes or even seconds, but cannot return results in milliseconds. The advantage of MaxCompute SQL is that it is easy to learn and does not require you to understand complex distributed computing concepts. If you have database experience, you can quickly become familiar with MaxCompute SQL.

UDF: User-defined functions. MaxCompute provides many built-in functions for common computing needs, and you can also create custom functions for needs that the built-in functions do not cover.
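
Conceptually, a scalar user-defined function is a piece of custom logic that is applied once to each input row. The sketch below illustrates only that idea in plain Python. The class name `StripAndLower` and the helper `apply_udf` are hypothetical, and the code does not use or represent MaxCompute's actual UDF interface.

```python
class StripAndLower:
    """Hypothetical scalar UDF: normalizes one string value per row."""

    def evaluate(self, value):
        # Pass missing values through unchanged, as most SQL functions do with NULL.
        if value is None:
            return None
        return value.strip().lower()


def apply_udf(udf, column):
    """Illustrative helper: call the UDF once for every value in a column."""
    return [udf.evaluate(value) for value in column]
```

For example, `apply_udf(StripAndLower(), ["  Foo ", None, "BAR"])` yields `["foo", None, "bar"]`.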

MapReduce: MaxCompute MapReduce is the Java MapReduce programming model provided by MaxCompute. Although it differs from common MapReduce implementations, it simplifies the development process and is more efficient. To use MaxCompute MapReduce, you need a basic understanding of distributed computing concepts and some programming experience. MaxCompute MapReduce provides a Java programming interface.

Graph: MaxCompute Graph is an iteration-oriented graph processing framework. Graph jobs are modeled as graphs made up of vertices and edges that carry weights. The graph is edited and evolved through iterations until the final result is obtained. Typical applications include PageRank, single-source shortest path, and K-means clustering.

SDK

The SDK is a toolkit provided by MaxCompute for developers. For details, see SDK Introduction.
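
To make the MapReduce model from the list of computing models above more concrete, here is a minimal single-process word count in plain Python. It is a sketch of the general map, shuffle, and reduce phases only. The function names are illustrative assumptions, and nothing here uses the MaxCompute MapReduce Java interface or runs in a distributed way.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())
```

For example, `word_count(["to be", "or not to be"])` returns `{"to": 2, "be": 2, "or": 1, "not": 1}`. In a real distributed framework the map and reduce functions run on many machines and the framework performs the shuffle, which is why the developer only has to write the two per-record functions.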

Security

MaxCompute provides powerful security services to protect your data. For details, see the Security Reference manual.

Next steps

Now that you are familiar with the advantages and features of MaxCompute, you can move on to the next tutorial, which shows you how to start using MaxCompute quickly. For details, see Quick Start.

Development history

Updated: 2017-09-08 08:19:17

Since Alibaba Cloud was founded in September 2009, its vision has been to become the first platform for computing and data sharing. In April 2010, with the launch of Ali Finance's loan business, ODPS officially went into production. In 2012, a unified data platform was established, and by 2013 it was able to process data at massive scale. From 2014 to 2015, the big data platform began to mature. In 2016, MaxCompute 2.0 was born, and the original vision is gradually being realized.

Key milestones

April 2010: ODPS officially went into production. Ali Finance's loan business ran stably online.

May 2013: ODPS entered public beta.

July 2013: ODPS officially began providing commercial services, with multi-cluster capability and a capacity of 5K servers in a single cluster.

2016: ODPS was officially renamed MaxCompute, and MaxCompute 2.0 was launched, bringing higher performance, new features, and a richer ecosystem.
