VMware Cloud Management Platform Operation and Maintenance Management

On January 30, 2018, Yao Quan, senior technology lecturer of VMware Greater China, shared his speech “VMware Cloud Management Platform Operation and Maintenance Management” in “VMware official Live broadcast”. IT big said as the exclusive video partner, by the organizers and speakers review authorized release.

Video of the talk:
suo.im/4yXcMj

Abstract

Intelligent IT operations management from application to infrastructure, across SDDC and multi-cloud environments. VMware vRealize Operations, integrated with vRealize Log Insight and vRealize Business for Cloud, provides unified monitoring, automated performance management, cloud planning, and capacity optimization, helping you plan, manage, and scale SDDC and multi-cloud environments.

VMware cloud platform

Every cloud management platform is built on a software-defined data center (SDDC). VMware puts particular emphasis on the SDDC, which uses software to virtualize the underlying infrastructure: server virtualization, storage virtualization, and network virtualization, each implemented by a different product.

With this infrastructure in place, the entire data center is software-defined. How, then, do you manage it effectively from the top? VMware offers a cloud management platform called vRealize. The suite contains several core components that work together to automate management of the platform. The three major ones are vRA (vRealize Automation) for automated deployment, vR Ops (vRealize Operations) for intelligent operations, and vRB (vRealize Business) for cost analysis. Together, these three components support the cloud management platform.

In this episode we focus on vR Ops.

The defining characteristic of a cloud platform is its large user base. On a public cloud in particular, tens of thousands or even hundreds of millions of users may request virtual machines. If a back-end administrator had to deploy those resources in batches using traditional VM deployment, fulfilling the requests would be slow and inefficient.

With a cloud platform, users request machines through a self-service component and the platform automates the deployment. This is the role of vRA: it automates machine deployment so that provisioning is more efficient.

When the environment is large and contains many machines, how do you analyze performance, monitor performance, and manage failures? That requires highly intelligent software, which is vR Ops.

vR Ops enables intelligent performance analysis of the machines in a large-scale cloud platform. Tenants, however, each consume different resources and services, so their final costs also differ.

To provide an intelligent cost analysis of the resources that users consume, we created vRB, which lets users know where their money is going.

vRealize Operations – Intelligent cloud operation and maintenance

vRealize Operations implements performance management, capacity management, cost management, configuration management, and compliance management across the platform.

vRealize Operations can connect to different underlying platforms and manage virtual platforms, physical platforms, and even some cloud platforms. In other words, the cloud management platform is cross-platform compatible.

vRealize Operations Manager console

We have introduced a new HTML5-based user interface.

After logging in, select any object, such as a host, to view its performance indicators. Three indicators are especially important: health, risk, and efficiency.

The login page

If you are logging in to Operations Manager for the first time, you will see that the home page is slightly different from the one in the previous version.

There are five tabs at the top of the page, each representing one of five core functions. A simple navigation bar sits on the left, and specific parameter information appears in the middle, including the actions that need to be taken, which are critical to the user. Some simple menus are in the upper right corner. Together, this interface gives the user a whole new experience.

Predefined dashboard

Open the second tab and you will see one of the core functions: the predefined dashboards. Dashboards are very important in the overall environment, because each one shows different content, indicators, and parameters.

On the left side of the page are the dashboards provided by the system, which you can also manage: you can edit them, create new ones, or even set one as the default page.

Alerts

In Operations Manager, alerts are intelligent. A smart alert tells users not only what is wrong, but also why, and even how to fix it. Through smart alerts, users can intuitively understand the problems in the system.

Environment

In the Environment view, users can see all the core metrics for different objects. On the right side of the page, three columns of icons represent health, risk, and efficiency, each with a different shape.

In Operations Manager, icons come in four colors: green, yellow, orange, and red. Green means everything is fine, and red means something is seriously wrong. Based on the color, users can determine which services need timely adjustment.
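The four-color scheme can be illustrated with a small sketch. The 0-100 score and the cut-off values below are assumptions made for illustration only; they are not taken from the talk or from the product.

```python
from enum import Enum
from typing import Dict, List


class BadgeColor(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    ORANGE = "orange"
    RED = "red"


def badge_color(score: float) -> BadgeColor:
    """Map a hypothetical 0-100 score (higher is better) to a color.

    The cut-offs 75/50/25 are illustrative assumptions.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 75:
        return BadgeColor.GREEN
    if score >= 50:
        return BadgeColor.YELLOW
    if score >= 25:
        return BadgeColor.ORANGE
    return BadgeColor.RED


def needs_attention(scores: Dict[str, float]) -> List[str]:
    """Return the names of objects that are not green, lowest score first."""
    flagged = [
        name for name, score in scores.items()
        if badge_color(score) is not BadgeColor.GREEN
    ]
    return sorted(flagged, key=lambda name: scores[name])
```

Calling `needs_attention` with a mapping of object names to scores gives a worst-first list, which is roughly how an administrator would read the colored icons.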

View related objects

Users can view the objects associated with a given object: clicking an object in the interface shows the other objects related to it.

View object details

For example, if you click a host, you can view the VMs, storage, and networks associated with it. This gives users a preliminary understanding of these objects. After you click an object, its parameters are displayed on the right of the page, where you can find the information you need.

View object association

You can also view the relationships between objects as a topology graph, and choose among many chart types according to your needs.

Administration page

For user management, administrators can set specific parameters so that users are managed centrally.

Data flow structure

A vRealize Operations instance contains several components that collect and transfer data.

vRealize Operations database

vRealize Operations contains the following databases: the file system database, the centralized vPostgres database, the Alert/Symptom vPostgres database, the HIS vPostgres database, and Cassandra.

Installation and configuration

The first step is to deploy an OVF, because Operations Manager is itself packaged as an OVF template. You can download it and deploy the virtual machine directly into your environment.

After deploying the OVF, you perform the initial configuration by opening the corresponding interface. In a large environment, this may involve creating additional data nodes, remote collector nodes, and so on. For large-scale or cross-site data collection, a single node is not enough: you need a master node and a standby node, and possibly remote collector nodes as well.

Otherwise, a single Operations Manager node should, in theory, suffice. After that, you can log in to the product's user interface and start exploring it. The whole deployment process is relatively simple.

Cluster size

Cluster sizing comes into play at deployment time. For example, an Operations Manager node is configured differently depending on the number of objects it manages. Users can choose a size according to their own needs.

New features in vRealize Operations 6.6

Easier to use and faster to generate value

Easier to use. The new HTML5 user interface provides a simpler and more consistent experience.

Faster navigation. The new “Getting Started” dashboard lets you quickly find what you need.

Faster diagnosis and recovery. User-friendly dashboards provide answers in one place. Dashboards are grouped into categories such as operations, capacity and utilization, performance troubleshooting, workload balancing, configuration, and compliance.

Faster time to value. Out-of-the-box integrations cover storage (vSAN), logging (vRLI), business (vRBC), and automation (vRA).

Embedded vSAN management

Complete vSAN management. Enables centralized management across stretched clusters and complete storage management, covering performance, capacity, logs, configuration, and health.

Deploy vSAN with confidence: a single console gives full visibility to confirm that vSAN is ready. Operate and maintain the vSAN environment by monitoring performance and capacity.

SDDC Health Overview dashboard

A single console monitors the status of the entire SDDC.

Extended support: an operational view and health classification for each SDDC application product; the health status of SDDC application components correlated with the underlying infrastructure (health consistency between deployed instances and virtual machines); and enhanced out-of-the-box content for health and compliance, alerts, and recommendations.

Heat map

Operations Manager has a very important feature called the heat map. A heat map compares selected indicators across VMs in real time. It usually uses one or two indicators: one defines the size of each block and the other defines its color.
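As a rough sketch of the idea, the following code turns two indicators per VM into heat map cells, using one for relative block size and the other for block color. The data layout, the function names, and the 50/80 color cut-offs are all assumptions for illustration, not the product's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HeatMapCell:
    name: str
    size: float  # share of the total area, between 0 and 1
    color: str


def color_for(value: float, low: float = 50.0, high: float = 80.0) -> str:
    """Pick a block color from the color indicator (assumed cut-offs)."""
    if value < low:
        return "green"
    if value < high:
        return "orange"
    return "red"


def build_heat_map(vms: Dict[str, Tuple[float, float]]) -> List[HeatMapCell]:
    """Build cells from {vm_name: (size_indicator, color_indicator)}.

    Cells are returned largest first, as a treemap-style layout would draw them.
    """
    total = sum(size for size, _ in vms.values())
    if total <= 0:
        raise ValueError("size indicators must sum to a positive number")
    cells = [
        HeatMapCell(name, size / total, color_for(color_value))
        for name, (size, color_value) in vms.items()
    ]
    return sorted(cells, key=lambda cell: cell.size, reverse=True)
```

For example, the size indicator could be configured memory and the color indicator CPU usage, so that a large red block immediately stands out as a big, busy VM.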

Projects

Projects were called “What-If” scenarios in previous versions. A what-if scenario asks: assuming that several objects are added or removed in the future, how long will the resources last?

Capacity model prediction

With the Projects feature, you can quickly predict the impact of future resource changes.
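A what-if question of this kind can be sketched with simple arithmetic. The model below assumes linear daily growth and a fixed demand per VM; both are simplifying assumptions for illustration, and the function names are hypothetical rather than part of any vRealize API.

```python
def days_remaining(capacity: float, used: float, daily_growth: float) -> float:
    """Days until capacity is exhausted, assuming linear growth."""
    free = capacity - used
    if free <= 0:
        return 0.0
    if daily_growth <= 0:
        return float("inf")  # usage is flat or shrinking
    return free / daily_growth


def what_if(
    capacity: float,
    used: float,
    daily_growth: float,
    added_vms: int = 0,
    removed_vms: int = 0,
    per_vm_demand: float = 0.0,
) -> float:
    """Days remaining after adding or removing VMs of a fixed size."""
    delta = (added_vms - removed_vms) * per_vm_demand
    projected_used = max(0.0, used + delta)
    return days_remaining(capacity, projected_used, daily_growth)
```

With 1000 GB of capacity, 600 GB used, and 10 GB of growth per day, the baseline is 40 days; adding ten 20 GB VMs shortens that to 20 days, while removing five extends it to 50.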

From original alarms to intelligent alarms

The original alarms simply monitor the running state of the system and trigger on a parameter called a static threshold. A static threshold is a fixed value: the monitoring system raises an alert automatically whenever the value is exceeded. As a result, users may receive unnecessary alerts.

Dynamic thresholds intelligently analyze historical trends over a period of time and tell the user which peaks are normal at a given time and which are abnormal at other times. This is a sign of intelligence.

Static thresholds trigger excessive alarms, while dynamic thresholds gradually learn to recognize a state of high load that is still healthy. The threshold may fluctuate according to load conditions and time periods.
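The difference can be sketched in a few lines. The code below learns a normal band per hour of day as mean plus or minus k standard deviations; this is a deliberately simple stand-in chosen for illustration and is not the algorithm vRealize Operations actually uses. The function names and the 80% static threshold are likewise assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple

Band = Tuple[float, float]


def learn_bands(history: Iterable[Tuple[int, float]], k: float = 2.0) -> Dict[int, Band]:
    """Learn a (low, high) band per hour of day from (hour, value) samples."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    bands = {}
    for hour, values in by_hour.items():
        center = mean(values)
        spread = pstdev(values)
        bands[hour] = (center - k * spread, center + k * spread)
    return bands


def is_anomalous(bands: Dict[int, Band], hour: int, value: float) -> bool:
    """Flag values outside the learned band, above or below it."""
    low, high = bands[hour]
    return value < low or value > high


def static_alarm(value: float, threshold: float = 80.0) -> bool:
    """Classic fixed upper threshold, for comparison."""
    return value > threshold
```

Trained on a history where CPU runs near 88% at 14:00 and near 11% at 03:00, the band accepts 90% at 14:00 (where the static threshold would alarm) and flags 50% at 03:00 (which the static threshold would miss).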

Reduce time spent investigating and resolving problems

Alerts reduce the average time to investigate problems, and recommendations reduce the average time to solve them.

What dynamic thresholds mean for fault detection

Traditional monitoring can only set static thresholds, which are often misleading. High VM resource utilization is normal during peak business periods, so a static threshold can be overly sensitive and generate unnecessary alarms. During off-peak periods, even 50% VM resource usage can be an anomaly, and a static threshold misses it.

It is also not enough for a static threshold to consider only the upper limit and ignore the lower limit. A sudden drop in CPU or RAM usage below 5% can be a warning sign of a serious incident. For example, a sudden increase in storage latency can sharply slow application response, or the IOPS of the entire storage system can suddenly fall, which may indicate a serious problem in the storage engine. A monitoring tool that checks only an upper threshold raises no alert in these cases.

Smart workload placement

Smart workload placement features can determine the best place to place a workload with the help of DRS. The rebalance feature suggests where the workload should be migrated.
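To make the placement idea concrete, here is a minimal sketch that picks the host with the lowest utilization after the new workload is placed. This greedy rule is an assumption chosen for illustration; it is not how DRS or vRealize Operations actually scores hosts, and the names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def best_host(hosts: Dict[str, Tuple[float, float]], demand: float) -> Optional[str]:
    """Choose a host for a workload from {host: (capacity, used)}.

    Returns the host that would have the lowest utilization after placement,
    or None if the workload fits nowhere.
    """
    best_name = None
    best_utilization = None
    for name, (capacity, used) in hosts.items():
        if used + demand > capacity:
            continue  # not enough headroom on this host
        utilization = (used + demand) / capacity
        if best_utilization is None or utilization < best_utilization:
            best_name = name
            best_utilization = utilization
    return best_name
```

A rebalance suggestion could apply the same rule in reverse: take a workload from the most heavily utilized host and ask `best_host` where it should move.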

DRS management – Complete DRS control

Verify the DRS configuration. View the DRS settings to confirm that they meet service requirements. Settings such as how aggressive DRS is and whether it is fully automated can be changed from vROps.

View the trend of how many vMotion operations are generated in the environment and whether it matches expectations.

Enhanced load balancing

Fully automated workload balancing.

Guaranteed performance through fully automated workload balancing across data centers. Workloads are balanced across clusters and datastores, and the degree of balancing is easy to tune to service needs. Balancing can be triggered in three ways: manually, automatically, or on a schedule. A powerful dashboard lets you monitor and adjust the balancing status and parameters.

Verify the DRS configuration: view and adjust the DRS settings for better balance.

Avoid contention. Predictive DRS takes action before resource contention occurs.

Optimal initial placement. With vRA, operations analytics are used to optimize the initial placement of workloads.

Application cases and common application scenarios

Cloud Platform Features

The load changes dynamically, the operating environment is not fixed, and the state is difficult to track.

Configuration changes quickly, asset life cycle is short, and statistical analysis is difficult.

Resource allocation is dynamic, virtual machines share and compete with each other, and resource boundaries are flexible.

New technologies and new features bring new requirements for security management.

The system is highly integrated and the dependency between components is higher, making impact analysis difficult.

User pain points

1. Because resources are shared and dynamically configured in a cloud environment, resource management becomes more complex and harder to control. Alarming waste of resources and local resource shortages exist at the same time.

2. In terms of security management, there are no management specifications, methods, or tools for the virtualization environment.

3. Asset and configuration information lacks in-depth, timely, and accurate statistical analysis. It is gathered largely by hand and differs greatly from the actual environment.

4. Related analysis reports and panel views are missing, as are global management capabilities for large-scale cloud environments.

5. The hypervisor lacks effective monitoring and is managed passively. Problems cannot be detected or analyzed in a timely manner.

6. Automation tools are lacking, and so is the ability to respond to and control a large-scale, highly dynamic environment.

Capacity optimization

Optimize resource allocation and improve the utilization of existing resources. Discover inefficient and unused capacity, right-size VMs, and reclaim idle resources to optimize the consolidation ratio and VM density without compromising VM performance.
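A reclamation check of this kind might look like the sketch below. The classification rules, the 5% and 30% cut-offs, and the 70% target utilization are assumptions for illustration; the product's own capacity analytics are more elaborate.

```python
import math
from dataclasses import dataclass


@dataclass
class VmUsage:
    name: str
    vcpus: int
    peak_cpu_pct: float  # observed peak CPU usage, 0-100
    powered_on: bool = True


def classify(vm: VmUsage, idle_below: float = 5.0, oversized_below: float = 30.0) -> str:
    """Label a VM as a reclamation candidate using assumed cut-offs."""
    if not vm.powered_on:
        return "powered-off"
    if vm.peak_cpu_pct < idle_below:
        return "idle"
    if vm.peak_cpu_pct < oversized_below and vm.vcpus > 1:
        return "oversized"
    return "ok"


def suggested_vcpus(vm: VmUsage, target_pct: float = 70.0) -> int:
    """vCPUs needed for the observed peak to sit at the target utilization.

    Never suggests more vCPUs than the VM already has, and never fewer than one.
    """
    needed = math.ceil(vm.vcpus * vm.peak_cpu_pct / target_pct)
    return max(1, min(vm.vcpus, needed))
```

For an 8-vCPU VM that peaks at 20% CPU, this sketch labels the VM as oversized and suggests 3 vCPUs, freeing 5 for other workloads.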

Capacity planning

vROps relies on VMware’s deep understanding of vSphere and cloud computing environments to provide intelligent capacity analysis and planning. The system collects statistics on the current usage of CPU, memory, storage, and network capacity in the vSphere hypervisor, helping O&M managers plan hypervisor resources properly, avoid both resource exhaustion and waste, and improve virtualization efficiency.

Configuration management

Provides detailed and continuous configuration data collection, configuration evaluation, and change review, as well as unified configuration data reports, helping users learn about hypervisor asset information in a timely manner.

Operations analysis

Expert report: Provides various types of analysis reports for hypervisors.

Expert panel: 1. Displays the overall operating status of the hypervisor from the perspectives of health, risk, and efficiency. 2. Provides heat maps for comparative analysis of multiple indicators. 3. Displays topology panels that bring together VMs, networks, and storage.

Fault management

Comprehensive indicator coverage: vSphere, NSX, vSAN, and both structured and unstructured data.

Failure analysis: dynamic thresholds, comprehensive analysis, expert knowledge.

Troubleshooting: Automatic troubleshooting.

For Operations Manager, we also offer more than 100 management packs covering various types of components, including networking, storage, database, middleware, and enterprise applications.

Related hands-on resources

vROps HOL (Hands-on Labs) access: http://labs.hol.vmware.com/HOL/catalogs/

Video resources

Service quality management: http://v.youku.com/v_show/id_XMTQ2MDE3OTYzMg==

vSphere compliance: http://v.youku.com/v_show/id_XMTQ2MDAzNTQ0OA==

Capacity planning: http://v.youku.com/v_show/id_XMTQ1OTUyNjM1Ng==

Operations management: http://v.youku.com/v_show/id_XMTQ1OTk3Nzk3Mg==

That’s all for today. Thank you.