Abstract: Forrester has released its Now Tech: Cloud Data Warehouse, Q1 2018 report, a comprehensive assessment of Cloud Data Warehouse (CDW) offerings covering their main functions, regional performance, market segments, and typical customers.


1. Introduction


This article is based on an analysis of Now Tech: Cloud Data Warehouse, Q1 2018 (by Noel Yuhanna, published March 13, 2018). The views expressed in this article are personal.


Forrester released the Now Tech: Cloud Data Warehouse, Q1 2018 report on March 13, 2018. The report evaluates the main functions, regional performance, market segments, and typical customers of Cloud Data Warehouse (CDW) offerings. AWS, Alibaba Cloud, Google, and Microsoft made the global first tier, and Alibaba Cloud’s DataWorks + MaxCompute was the only Chinese product selected.


In the report, Forrester highlights four core capabilities of CDW:
· Flexible deployment. A CDW should support multiple deployment patterns. For smaller customers, it should offer an online multi-tenant model that lets them quickly mobilize computing resources and deploy a data warehouse in minutes. For medium to large customers, it should provide a dedicated or on-premises deployment mode that delivers strong computing performance and security while hiding complex technical details.
· Efficient data migration to the cloud. A CDW should provide a fast, low-cost way to integrate data for customers who have not moved their data warehouses to the cloud or who use a hybrid cloud and on-premises architecture.
· Diverse analysis methods. A CDW should provide a variety of technical means to give users the data processing capabilities they need in different business scenarios.
· Security. CDW security should cover data encryption, auditing, data masking, access control, and related aspects.
DataWorks (data.aliyun.com/product/ide) is the core of Alibaba Cloud’s CDW service capability. Why did it win Forrester’s recognition? This article offers an interpretation.


2. DataWorks product architecture


Before the interpretation proper, let’s first look at the role DataWorks plays in Alibaba Cloud’s CDW service system and at the DataWorks product architecture.


Among Alibaba Cloud’s many products, DataWorks and MaxCompute together form the core of its CDW service capability. As a storage and computing engine, MaxCompute plays a supporting role at the IaaS layer, providing users with massive, reliable storage for big data tables and with SQL execution capabilities. However, MaxCompute alone is not enough. For big data technology to truly empower customers, a series of CDW services such as data development and data integration is also needed, and DataWorks provides a relatively complete solution.


Specifically, it contains 8 main modules:
  • Data integration: heterogeneous data integration, which aggregates massive data from various source systems into the big data platform
  • Data development: data warehouse design and the ETL development process
  • Monitoring and O&M: monitoring and O&M of online ETL jobs
  • Real-time analysis: real-time exploration and analysis of data
  • Data asset management: metadata management, data map, data lineage, data asset map, etc.
  • Data quality: data quality exploration, monitoring, verification, and scoring
  • Data security: data permission management, data grading, data masking, and data audit
  • Data services: data sharing and data exchange, data API services


3. Flexible deployment


Forrester’s report discusses at length the need for multiple deployment modes and compares CDW offerings on this point. DataWorks is one of the few offerings in the first tier that supports multiple deployment modes.


First, as the core of Alibaba Group’s data platform, DataWorks has been supporting Alibaba Group, Ant Financial, Cainiao, and other group-wide businesses since 2009. Whenever you use data services from Taobao, Tmall, Ant Financial, and similar products, you may be indirectly using DataWorks computing services.


Second, DataWorks is available in the public cloud. So far, it has served more than 4,000 public cloud customers, including major customers such as Sina Weibo, Renrenche, and Tianhong Fund.


Finally, DataWorks is also delivered through proprietary cloud deployments. As an important means of providing big data capabilities, DataWorks is included in Alibaba Cloud’s proprietary cloud solutions such as Apsara Enterprise. Since 2015, it has supported major government and enterprise projects, including “City Brain” and “Run at most once”.


With flexible deployment, DataWorks can meet a wide variety of customer needs. Small customers can be served through the public cloud, while medium to large customers can be served through proprietary cloud or hybrid cloud solutions.


4. Efficient data migration to the cloud


Efficient data integration matters a great deal when an enterprise moves its data to the cloud. In the initial migration stage, enterprises need to move their data assets to the cloud quickly and safely. In the ongoing operation stage, they need to load data in various forms into the CDW and deliver the results processed there to each business unit.


DataWorks data integration can read from and write to many types of data sources, including relational databases, NoSQL databases, big data stores, and text-based storage (FTP). It gives users a unified way to take stock of the data resources in these sources and can synchronize data across complex networks and heterogeneous data sources. For scheduling import tasks, DataWorks supports batch, full, and incremental synchronization of offline data, with user-defined synchronization schedules at minute, hour, day, week, and month granularity.
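
To make the difference between full and incremental synchronization concrete, here is a minimal Python sketch. It does not use any DataWorks API; the function names, the `updated_at` column, and the in-memory "tables" are all assumptions made for illustration. The incremental variant keeps a watermark (the latest modification time already synchronized) and copies only rows changed after it.

```python
from datetime import datetime


def full_sync(source_rows):
    """Full synchronization: rebuild the target from every source row."""
    return {row["id"]: row for row in source_rows}


def incremental_sync(source_rows, target, last_sync):
    """Incremental synchronization: upsert only rows modified after last_sync.

    Returns the new watermark and the number of rows copied.
    """
    changed = [r for r in source_rows if r["updated_at"] > last_sync]
    for row in changed:
        target[row["id"]] = row
    new_watermark = max((r["updated_at"] for r in changed), default=last_sync)
    return new_watermark, len(changed)


# Hypothetical source table with a last-modified column.
source = [
    {"id": 1, "name": "a", "updated_at": datetime(2018, 3, 1)},
    {"id": 2, "name": "b", "updated_at": datetime(2018, 3, 10)},
]

# Initial load: copy everything and remember the latest modification time.
target = full_sync(source)
watermark = datetime(2018, 3, 10)

# Later, one row changes and one row is added in the source.
source[0] = {"id": 1, "name": "a2", "updated_at": datetime(2018, 3, 12)}
source.append({"id": 3, "name": "c", "updated_at": datetime(2018, 3, 13)})

# Only the two changed rows are copied on the next run.
watermark, n_changed = incremental_sync(source, target, watermark)
```

A scheduler would call `incremental_sync` at the chosen interval (every few minutes, hourly, daily, and so on) and persist the watermark between runs.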


DataWorks data integration also provides data flow management and control. It can govern a data flow along dimensions such as dirty-data tolerance, transfer rate, and number of concurrent threads, which helps users save costs and manage synchronization in a lean way.
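
The three controls mentioned above can be sketched in a few lines of Python. This is an illustrative loop, not DataWorks code: `run_sync`, its parameters (`max_dirty`, `max_rate`, `concurrency`), and the exception class are hypothetical names. A record counts as dirty when its transformation fails, the rate cap is implemented as simple pacing between submissions, and a thread pool bounds concurrency.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class DirtyDataLimitExceeded(Exception):
    """Raised when more records fail than the job tolerates."""


def run_sync(records, transform, write, max_dirty=0, max_rate=None, concurrency=1):
    """Hypothetical sync loop with a dirty-record cap, a rate cap, and a thread cap.

    max_rate is records per second for the whole job (None means unlimited).
    """
    lock = threading.Lock()
    stats = {"written": 0, "dirty": 0}
    min_interval = 1.0 / max_rate if max_rate else 0.0

    def handle(record):
        try:
            row = transform(record)
        except (ValueError, KeyError, TypeError):
            with lock:
                stats["dirty"] += 1  # count the bad record instead of failing
            return
        write(row)
        with lock:
            stats["written"] += 1

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = []
        for record in records:
            with lock:
                if stats["dirty"] > max_dirty:
                    break  # stop submitting once the tolerance is exceeded
            futures.append(pool.submit(handle, record))
            if min_interval:
                time.sleep(min_interval)  # simple pacing: caps the submission rate
        for future in futures:
            future.result()

    if stats["dirty"] > max_dirty:
        raise DirtyDataLimitExceeded(
            f"{stats['dirty']} dirty records (limit {max_dirty})"
        )
    return stats


# Usage: the string "x" cannot be parsed as an integer, so it is one dirty record.
out = []
stats = run_sync(["1", "2", "x", "4"], int, out.append, max_dirty=1, concurrency=2)
```

With `max_dirty=0` the same input would raise `DirtyDataLimitExceeded`, which is the behavior a job owner would want when any bad record should fail the run.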


5. Diversify your analytics


DataWorks provides a powerful data development IDE that supports everything from SQL code editing and integration task editing to visual editing of business process DAGs. Its multi-user online collaboration and task script version management also fit the practical needs of enterprise data development. In addition to regular offline processing tasks, DataWorks provides a lightweight tool, “Data Analysis Workbench,” that takes full advantage of MaxCompute computing power to meet users’ needs for ad hoc data analysis.
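
As a rough illustration of what a business process DAG encodes, the sketch below derives a valid execution order from task dependencies using Python’s standard `graphlib` module (Python 3.9+). The task names and the `execution_order` helper are hypothetical, and this says nothing about how the DataWorks scheduler is actually implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "sync_orders": set(),
    "sync_users": set(),
    "clean_orders": {"sync_orders"},
    "join_orders_users": {"clean_orders", "sync_users"},
    "daily_report": {"join_orders_users"},
}


def execution_order(dag):
    """Return one order in which every task runs after all of its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
```

If the dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, which is the kind of check a visual editor can surface before a workflow is published.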




DataWorks has also recently updated its drag-and-drop business process editing to further improve the user experience, with the aim of creating what may be the best data development IDE available.


6. Security


Data security is a top priority for DataWorks, and protecting sensitive data requires compliance with industry regulations and data privacy laws. DataWorks includes a data security module that protects data in the following ways:
· Multi-tenant isolation. DataWorks has its own multi-tenant permission model. Tenants can apply for resource quotas as required and manage their own resources independently. Tenants can also manage their own data, permissions, users, and roles independently to ensure data security.


· Data security level setting. Sensitive data is automatically discovered and classified according to defined sensitive data types, and its distribution across the data resource platform is determined. Data is usually classified into top secret, confidential, and normal levels, and the corresponding security rules are applied to each level.


· Data access audit. DataWorks strictly audits privileged user access, recording when access occurs, what actions are performed, and in what order. Reviewing these records verifies that privileged users perform the correct operations at the correct time and reveals irregularities, which helps keep the data system secure.


· Data masking. DataWorks can protect data by focusing on the data content itself: it identifies sensitive information and masks it dynamically when it is unclear which users, which access addresses, or even which fields may be involved in suspicious or harmful access.
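
The classification and masking ideas in the list above can be sketched together in Python. Everything here is an assumption for illustration: the regular expressions, the mapping of data types to the three levels named above, and the `classify`, `mask`, and `read_field` helpers are not DataWorks interfaces. The point is only that masking is decided at read time from the detected sensitivity level and the caller’s clearance.

```python
import re

# Hypothetical detection rules: sensitive data type -> (pattern, security level).
RULES = {
    "email": (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "confidential"),
    "card_number": (re.compile(r"\b\d{16}\b"), "top secret"),
}


def classify(value):
    """Return (type, level) for the first matching rule, else (None, 'normal')."""
    for name, (pattern, level) in RULES.items():
        if pattern.search(value):
            return name, level
    return None, "normal"


def mask(value, keep=4):
    """Replace all but the last `keep` characters with '*'."""
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]


def read_field(value, user_cleared=False):
    """Dynamic masking: decide at read time, based on the caller's clearance."""
    _, level = classify(value)
    if level == "normal" or user_cleared:
        return value
    return mask(value)


# An uncleared reader sees only the last four digits of a 16-digit number.
masked = read_field("6222021234567890")
```

A cleared user calling `read_field(value, user_cleared=True)` gets the original value back, while values that match no rule are returned unchanged for everyone.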


DataWorks has passed Level 3 certification under the Ministry of Public Security’s information security classified protection scheme.


7. Summary


With the deepening of “Internet Plus” reform across all industries, enterprises have growing demands for managing, processing, and using their data assets. With cloud computing technology, Internet companies can quickly put their big data processing capabilities to use. This is why four of the world’s leading cloud service companies were able to surpass established data warehouse vendors such as Oracle and IBM and become tier 1 CDW providers in Forrester’s list.


Thanks to Alibaba’s years of experience in putting data to use, DataWorks fits enterprise needs closely in deployment modes, data integration, analysis methods, data security, and other areas.


DataWorks will continue to deliver more advanced data management capabilities, including real-time data integration and data asset analysis. I think this is why DataWorks is on Forrester’s CDW list: it combines cloud computing with data warehouse management methodology and keeps iterating to create “the best platform for big data warehouse construction.”

