
Virtualization resources are provided to users as virtual machines, created either from a custom configuration or from a template. The platform also manages cluster resources, including dynamic resource scheduling, virtual machine management (create, delete, start, shut down, restart, sleep, and wake), storage resource management (ordinary and shared disks), and virtual machine security management. It supports VM live migration and VM high availability (HA).
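The lifecycle operations listed above can be pictured as a small state machine. The following is a minimal sketch only: the state names, the transition table, and the `VirtualMachine` class are hypothetical and are not taken from the platform's actual API.

```python
from enum import Enum


class VMState(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    SLEEPING = "sleeping"
    DELETED = "deleted"


# Hypothetical transition table: (current state, operation) -> next state.
TRANSITIONS = {
    (VMState.STOPPED, "start"): VMState.RUNNING,
    (VMState.RUNNING, "shutdown"): VMState.STOPPED,
    (VMState.RUNNING, "restart"): VMState.RUNNING,
    (VMState.RUNNING, "sleep"): VMState.SLEEPING,
    (VMState.SLEEPING, "wake"): VMState.RUNNING,
    (VMState.STOPPED, "delete"): VMState.DELETED,
}


class VirtualMachine:
    def __init__(self, name):
        self.name = name
        # Assumption: a newly created VM starts in the stopped state.
        self.state = VMState.STOPPED

    def apply(self, operation):
        """Apply a lifecycle operation, rejecting transitions not in the table."""
        key = (self.state, operation)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {operation} a VM that is {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the allowed transitions in one table makes invalid requests, such as waking a VM that is not sleeping, fail in a single place.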

Offline computing

A distributed batch computing framework: the input data set is cut into blocks, the blocks are processed in parallel, and the results are then sorted and collected. It supports offline processing of PB-scale data.
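The split, parallel-process, sort, and collect flow described above can be sketched on a single machine as a word count. This is an illustration under stated assumptions, not the framework's own code: the function names are hypothetical, and a thread pool stands in for the worker nodes of a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Cut the input data set into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Process one block independently: emit a (word, 1) pair per word."""
    pairs = []
    for line in block:
        for word in line.split():
            pairs.append((word.lower(), 1))
    return pairs


def shuffle_and_sort(mapped_blocks):
    """Group values by key across all blocks, then sort by key."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            groups[key].append(value)
    return sorted(groups.items())


def reduce_groups(sorted_groups):
    """Collect the final result: sum the counts for each key."""
    return {key: sum(values) for key, values in sorted_groups}


def run_batch_job(records, block_size=2, workers=4):
    blocks = split_into_blocks(records, block_size)
    # The thread pool is a stand-in for cluster workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_block, blocks))
    return reduce_groups(shuffle_and_sort(mapped))
```

For example, `run_batch_job(["a b", "b c", "a"])` counts each word across all three lines even though the lines end up in different blocks.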

In-memory computing

In-memory computing is built on a dedicated distributed computing engine developed on top of Apache Spark. It not only improves computing performance but also resolves many of Spark's stability problems, and it performs significantly better in applications such as comparing massive amounts of small data and relationship analysis.

Real-time computing

The real-time streaming computation module is based on Twitter Storm. It can compute over streaming data and carry complex business application logic. Real-time stream data flows through a processing pipeline in the cluster, where information extraction, data analysis, and rule judgment are completed in sequence, giving real-time concurrent processing of high-throughput data.
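The three pipeline stages named above (information extraction, data analysis, rule judgment) can be sketched with chained Python generators. This is a single-process illustration, not Storm code: in Storm the stages would be spouts and bolts distributed across the cluster. The event format, the moving-average analysis, and the threshold rule are all assumptions made for the example.

```python
from collections import deque


def extract(raw_events):
    """Information extraction: parse 'sensor,value' strings into dicts."""
    for raw in raw_events:
        sensor, value = raw.split(",")
        yield {"sensor": sensor.strip(), "value": float(value)}


def analyze(events, window=3):
    """Data analysis: attach a per-sensor moving average over recent values."""
    history = {}
    for event in events:
        recent = history.setdefault(event["sensor"], deque(maxlen=window))
        recent.append(event["value"])
        yield {**event, "avg": sum(recent) / len(recent)}


def judge(events, threshold):
    """Rule judgment: flag events whose moving average exceeds the threshold."""
    for event in events:
        yield {**event, "alert": event["avg"] > threshold}


def build_pipeline(raw_events, threshold=50.0):
    # Each stage pulls from the previous one, so events are handled
    # one at a time as they arrive rather than in a batch.
    return judge(analyze(extract(raw_events)), threshold)
```

Because every stage is a generator, an event passes through all three steps before the next one is read, which mirrors the successive processing the paragraph describes.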

Graph computing

The graph computing module uses graph theory to model the relationships between data elements abstractly. By analyzing and processing nodes, edges, and weights, it establishes correlations between data entities and supports applications such as TB-scale relationship queries and relationship network analysis.
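A relationship query over nodes, edges, and weights can be sketched with an adjacency map and Dijkstra's shortest-path algorithm. This is a small in-memory illustration, not the module's implementation: the `RelationGraph` class and its method names are hypothetical, and the example treats a lower weight as a closer relationship.

```python
import heapq
from collections import defaultdict


class RelationGraph:
    def __init__(self):
        # node -> {neighbor: weight}; relations are stored as undirected.
        self.edges = defaultdict(dict)

    def add_relation(self, a, b, weight):
        self.edges[a][b] = weight
        self.edges[b][a] = weight

    def neighbors(self, node):
        """Direct relationship query: entities linked to `node`."""
        return dict(self.edges.get(node, {}))

    def shortest_path(self, source, target):
        """Dijkstra's algorithm; returns (total weight, path) or None."""
        best = {source: 0.0}
        previous = {}
        queue = [(0.0, source)]
        while queue:
            cost, node = heapq.heappop(queue)
            if node == target:
                break
            if cost > best.get(node, float("inf")):
                continue  # stale queue entry, a cheaper route was found
            for neighbor, weight in self.edges.get(node, {}).items():
                new_cost = cost + weight
                if new_cost < best.get(neighbor, float("inf")):
                    best[neighbor] = new_cost
                    previous[neighbor] = node
                    heapq.heappush(queue, (new_cost, neighbor))
        if target not in best:
            return None
        path = [target]
        while path[-1] != source:
            path.append(previous[path[-1]])
        return best[target], path[::-1]
```

With relations alice-bob (1), bob-carol (2), and alice-carol (5), the query from alice to carol goes through bob, because the indirect route has the lower total weight.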
