Preface

It was the best of times, it was the worst of times — A Tale of Two Cities, Charles Dickens

The end of the year is approaching, and in my spare time I have been looking back on my 2018 with OpenStack.

NOTE: This article represents my personal opinion only and has nothing to do with my company or its partners.

OpenStack growth over the past year

To start with, let’s look back at what the community has been doing this year through the OpenStack release notes. Here are some of the new features that have personally and pleasantly surprised me.

releases.openstack.org/rocky/

releases.openstack.org/queens/

Nova

  • Added support for vGPUs. Experimental feature with some caveats, but admins can now define flavors that request vGPU resources.

GPU (graphics processing unit) acceleration is widely used in high-performance computing fields such as scientific computing, graphics-intensive workloads, machine learning, deep learning, and artificial intelligence. The Nova libvirt driver now supports vGPUs to meet demand in this area, and administrators can use flavors to define the resources and display characteristics of a vGPU.

nova flavor-key <flavor-id> set resources:VGPU=1

I think Nova vGPU support is an iconic new feature that shows how the community is adapting to new technology trends.

OpenStack Placement is still responsible for resource management of vGPUs. Placement is a new repo that was recently split out of the Nova placement API and is planned to become a standalone project in the Stein release, with the expectation that it will eventually replace the nova-scheduler service. As the community further upgrades the definition of OpenStack to “an open source infrastructure integration engine”, the OpenStack resource system will be composed of more external (third-party) resource types, such as external storage resources, external network resources, and all kinds of PCI devices. As resource types and providers become more diverse, a highly abstract, simple, and unified management layer is needed so that users and code can easily consume and manage all of the resources in an OpenStack system. That layer is Placement.
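
For a feel of what Placement exposes, here is a minimal sketch using the osc-placement CLI plugin (the provider UUID is a placeholder, and the commands are as I remember them from the plugin's documentation):

# Install the Placement plugin for python-openstackclient
pip install osc-placement
# List every resource provider Placement knows about (compute nodes, etc.)
openstack resource provider list
# Inspect one provider's inventory (VCPU, MEMORY_MB, DISK_GB, VGPU, ...) and current usage
openstack resource provider inventory list <provider-uuid>
openstack resource provider usage show <provider-uuid>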

  • The performance of listing instances across a multi-cell cells v2 deployment has been improved and the results are now merge sorted.
  • Rescheduling during a server create or resize operation is now supported in a split-MQ multi-cell cells v2 deployment.
  • Operators can now disable a cell to make sure no new instances are scheduled there. This is useful for operators to introduce new cells to the deployment and for maintenance of existing cells.

Support for Nova Cell continues to be strong this year, with major improvements to multi-cell performance and operability that make OpenStack more resilient and scalable, enabling OpenStack to work with larger cluster sizes.

  • Traits-based scheduling is now available for the ironic compute driver. For more details, see the ironic docs for scheduling based on traits.

Ironic now supports traits-based scheduling, built on the resource traits implemented by OpenStack Placement and extending Nova’s existing support for scheduling compute nodes based on traits. Placement resource traits are a flexible, tag-like design: users can build a “tag cloud” and decide which labels to attach to a resource provider, which provides a general way to express qualitative resource requirements.
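
As a hedged sketch of how this is used (the trait name, flavor name, and provider UUID below are made up for illustration), an administrator can create a custom trait, attach it to a resource provider, and then require it in a flavor:

# Custom traits must be prefixed with CUSTOM_ (traits need Placement API >= 1.6)
openstack --os-placement-api-version 1.6 trait create CUSTOM_GOLD_NODE
# Tag a resource provider (e.g. the provider backing an Ironic node) with the trait
openstack resource provider trait set --trait CUSTOM_GOLD_NODE <provider-uuid>
# Require the trait in a flavor so the scheduler only picks matching providers
openstack flavor set --property trait:CUSTOM_GOLD_NODE=required bm.gold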

  • The placement service now supports granular RBAC policy rules configuration.

Placement now supports granular RBAC (role-based access control) policy rules, a necessary step before it becomes an independent project. I personally find OpenStack Placement very interesting and expect it to be woven into every aspect of OpenStack operations in the future.
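
As an illustration only (the rule names and roles below are my assumptions; check the policy sample generated for your release), granular rules can be overridden in Placement's policy file:

# /etc/placement/policy.yaml (illustrative rule names and roles)
"placement:resource_providers:list": "role:reader or role:admin"
"placement:resource_providers:create": "role:admin"
"placement:resource_providers:delete": "role:admin"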

Cinder

  • Support for attaching a single Cinder volume to multiple VM instances.

Cinder multi-attach, which allows the same volume to be attached to multiple different VMs, is a long-awaited feature, judging by the length of the discussion around it. Its most direct value is high-availability redundancy for the data disks of core services: for example, if a volume is attached to two VMs at the same time and the ACTIVE VM goes down, the PASSIVE VM can still access the volume (RO, R/W). Having multiple VMs share a volume is also very convenient in cluster scenarios, and functions like this have always been among the most requested in cloud environments.
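
A rough sketch of the workflow from the CLI (the multiattach volume-type property is the one I recall from the Cinder documentation; the volume and server names are placeholders):

# Create a volume type that allows multi-attach
openstack volume type create multiattach
openstack volume type set --property multiattach="<is> True" multiattach
# Create a shared volume of that type and attach it to both VMs
openstack volume create --type multiattach --size 10 shared-vol
openstack server add volume vm-active shared-vol
openstack server add volume vm-passive shared-vol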

  • Support for creating a volume from a backup.
  • Improved user experience when recovering from storage system failures.
  • Security improvements enabled when creating volumes from signed images.
  • Numerous improvements that give administrators greater control over the placement of volumes.
  • Improved backup functionality and performance.

As for Cinder, I have been paying close attention to its volume backup and DR development, because of my own background in cloud backup and disaster recovery. Cinder now supports creating new volumes from backups and improves backup performance by allowing the cinder-backup service to use multiple processes. Another piece of good news is that failover and failback can now be performed through cinder-manage commands, so users no longer need to manipulate the database directly. As you can see, Cinder is also putting a lot of energy into improving the user experience.
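
A minimal sketch with the cinder CLI (the --backup-id flag is my recollection of the Rocky feature; treat the names and flag as assumptions):

# Back up an existing volume
cinder backup-create --name vol1-bk vol1
# Create a brand-new 10 GB volume directly from that backup
cinder create --backup-id <backup-uuid> 10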

In addition, at this year’s Denver OpenStack PTG, the Cinder team also discussed making Cinder an independent project: one that can be deployed on its own and provide service value without Keystone or Nova (e.g. as a Kubernetes CSI backend). This is interesting because the issue has always been controversial in the community; some developers are resistant to projects that want to “break away” from the OpenStack system, believing that only closer integration with OpenStack can produce 1+1 > 2 benefits. But there is definitely a sense that more and more groups are discussing the topic, as the community promotes a collaborative ecosystem approach built around so-called “composable” projects. I have reservations about that, but we’ll come back to it later.

Neutron

  • OVN DNS support. ovn-controller will respond to DNS queries locally on each compute node.
  • OVN distributed Floating IP support.
  • OVN L3 HA support for gateway routers. Now networking-ovn makes use of the OVN embedded mechanism for L3 high availability. It will be automatically used for any router as soon as more than one gateway node is available.
  • OVN supports IPv6 Router solicitation and IPv6 Periodic router advertisement support.
  • OVN supports binding SR-IOV ports on OVS > 2.8 and kernel >=4.8
  • Support migration from an existing ML2OVS TripleO deployment to ML2OVN TripleO deployment.

The Neutron project has always been very popular, so naturally there are many updates; I mainly pay attention to OVN. OVN is the SDN controller for Open vSwitch. As the control plane of OVS, it has no special requirements for the platform it runs on: as long as a platform can run OVS, it can run OVN. OVN therefore has excellent compatibility with OpenStack, and Neutron only needs to add a plugin and configure OVN to integrate it. Adopting OVN brings some conveniences for Neutron: OVN’s native network functions can replace Neutron’s OVS agent, L3 agent, DHCP agent, and DVR, making Neutron’s deployment architecture lighter and providing higher L3 forwarding performance.
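
For context, a hedged sketch of what the Neutron side of an ML2/OVN deployment looks like (option names as documented by networking-ovn; the database endpoints are placeholders):

# /etc/neutron/plugins/ml2/ml2_conf.ini
[ml2]
mechanism_drivers = ovn
type_drivers = geneve,flat,vlan
tenant_network_types = geneve

[ovn]
ovn_nb_connection = tcp:192.0.2.10:6641
ovn_sb_connection = tcp:192.0.2.10:6642
ovn_l3_scheduler = leastloaded
enable_distributed_floating_ip = True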

  • OVN NorthBound backend database consistency mechanism, multiple workers are now completely safe to access the backend database, and any inconsistency generated by the backend not being available is quickly detected and corrected by a periodic job

Unfortunately, OVSDB is not an ACID transactional database like MySQL; it is a JSON-based file database that records commands and replays them one by one on every update or initialization. As a result, OVN is still only suitable for relatively small-scale scenarios. There was a proposal to replace OVSDB with etcd, but it never came to pass. Still, judging by the share of OVN items in Neutron’s release notes, it is well worth watching.

  • ML2 implements Quality of Service rate limits for floating IPs.

Floating IPs are also starting to support QoS, and Octavia’s VIP QoS support is built on this implementation.
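
A hedged sketch of how this is consumed (the --qos-policy option on floating IPs is my recollection of the client support; policy names and limits are placeholders):

# Create a QoS policy with an egress bandwidth-limit rule
openstack network qos policy create fip-bw-limit
openstack network qos rule create --type bandwidth-limit --max-kbps 10000 --max-burst-kbits 8000 --egress fip-bw-limit
# Apply the policy to an existing floating IP
openstack floating ip set --qos-policy fip-bw-limit <floating-ip>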

  • Per TCP/UDP port forwarding on floating IP is supported. Operators can save the number of global IP addresses for floating IPs.

Port forwarding is a practical function: it uses iptables rule mapping to reuse a single external IP, effectively reducing the waste of floating IP addresses. A very economical feature.
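
A quick sketch (the addresses, ports, and internal port UUID are placeholders): map TCP port 2222 of a floating IP to SSH on an internal address, so one floating IP can front several instances on different external ports.

openstack floating ip port forwarding create \
    --internal-ip-address 10.0.0.8 \
    --port <internal-port-uuid> \
    --internal-protocol-port 22 \
    --external-protocol-port 2222 \
    --protocol tcp \
    <floating-ip>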

Ironic

  • Finished implementation of rescue mode. Users can repair instances, troubleshoot misconfigured nodes, lost SSH keys, etc.

Ironic’s rescue mode resembles Nova’s rescue/unrescue function for VM instances: it boots a node from a specified rescue ramdisk so that a damaged system disk can be repaired, and Ironic uses the same mechanism to rescue bare-metal instances. Anyone who has lost their SSH keys can breathe a sigh of relief.

  • Added ability to manage BIOS settings, with driver support for iRMC and iLO hardware types.

The BIOS is the first program loaded by the physical hardware; it performs hardware initialization and exposes a number of configuration options. Ironic can now manage BIOS settings for the iLO and iRMC hardware types, for example to configure power management or RAID, or to enable SR-IOV or DPDK. This is friendly support for bare-metal cloud and NFV applications that require a high degree of hardware management flexibility.
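
A hedged sketch of the workflow (the setting name is vendor-specific and purely illustrative): BIOS settings are read per node and applied through a manual cleaning step.

# Inspect the BIOS settings Ironic has cached for a node
openstack baremetal node bios setting list <node>
# Apply a setting via the bios clean step (hardware type permitting)
openstack baremetal node clean <node> --clean-steps '[{"interface": "bios", "step": "apply_configuration", "args": {"settings": [{"name": "hyper_threading_enabled", "value": "True"}]}}]'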

  • Added a ramdisk deployment interface enabling high performance computing deployments to have diskless nodes.
  • Added automatic recovery from power faults, removing a long standing headache for operators.

The ramdisk deployment interface boots a ramdisk entirely in memory on bare-metal nodes that have no operating system installed, providing a diskless execution environment in which ironic-python-agent can control the hardware. This is a classic requirement for large-scale high-performance computing deployments. On top of the new ramdisk deployment interface, automatic recovery from power faults is now supported as well, which is a nice enhancement for bare-metal clouds.

  • Added the ability to define groups of conductors and nodes, enabling operators to define specific failure or management domains.

The Ironic conductor is the core worker of Ironic; each conductor can run multiple drivers, through which it operates on hardware devices. Conductor groups classify conductors, and nodes can be assigned to a conductor group so that only the conductors in that group manage those nodes. In a nutshell, this is a way of carving out failure or management domains: grouping conductors that are physically close together can effectively reduce the impact of network partitions and improve safety and performance.
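
A minimal sketch, assuming a group named rack1: the group is declared in the conductors' configuration, and nodes are then pinned to it.

# ironic.conf on the conductors that should serve this group
[conductor]
conductor_group = rack1

# Pin a node to the group so only those conductors manage it
openstack baremetal node set --conductor-group rack1 <node>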

Ironic did a lot this year. There was a lot of buzz around bare-metal cloud at the global summits, in line with market feedback that Ironic deployment rates keep going up every year. I personally feel the market has accepted the pattern of bare metal plus virtual machines plus containers. More and more enterprises are willing to run multiple virtualization layers on their cloud platforms: besides virtual machines, they are starting to deploy containers directly on bare metal, which makes cloud architectures more flexible. Ironic’s bare-metal cloud blueprint is becoming clear with these new features.

Cyborg

Cyborg is a newly released project that originated in the telecom space. It is designed to provide a common lifecycle management framework for dedicated acceleration hardware (e.g. GPUs, FPGAs, SoCs, NVMe SSDs) and acceleration capabilities (e.g. iNICs, IPsec, DPDK/SPDK, eBPF/XDP).

Cyborg enables users to discover and list accelerators by category, attach and detach accelerator instances, and install and uninstall drivers, which has great potential in high-performance computing applications such as NFV and artificial intelligence. Cyborg can be used alone or in conjunction with Nova or Ironic: with Nova, Cyborg complements OpenStack Placement for accelerator resource management; with Ironic, Cyborg complements Ironic’s management of accelerator hardware devices.

In the past, Nova used PciDevTracker to collect data about PCI resources (e.g. SR-IOV NICs) and PciPassthroughFilter to implement PCI resource scheduling; there was no unified PCI resource management API. That is the historical mission of Placement. Placement combined with Cyborg helps solve the problem of accounting for and managing accelerator device resources. When a user needs to create a vGPU VM, Nova initiates the creation, the Cyborg API provides GPU device management, and the Placement API schedules and coordinates the vGPU resources. The three cooperate, and the vGPU is finally attached to the virtual machine through PCI passthrough.

Therefore, I think the release of the Cyborg project is also symbolic. It will accelerate the adoption of OpenStack in high-performance computing scenarios, and it is a direct reflection of the community’s push into edge computing, the Internet of Things, and artificial intelligence.

Octavia

  • The neutron-lbaas and neutron-lbaas-dashboard projects are now deprecated.
  • Neutron-lbaas now includes a proxy plugin that forwards all API requests to the Octavia API.

Octavia is OpenStack’s officially recommended LBaaS solution, with better scalability, higher availability, and a more stable API than neutron-lbaas; it is a carrier-grade load-balancing service. Neutron-lbaas has been marked deprecated, but older neutron-lbaas deployments can still invoke the Octavia API through the proxy plugin.

  • The initial release of the Octavia dashboard for Horizon includes significantly improved load balancer detail pages and workflows compared to the, now deprecated, neutron-lbaas-dashboard.
  • Octavia dashboard details pages now automatically refresh the load balancer status.
  • The Octavia OpenStack client plugin now supports quotas, load balancer QoS policies, load balancer failover, listener statistics, and filtering by load balancer ID.

Octavia now has its own dashboard and CLI. In the early days Octavia and neutron-lbaas shared a UI, so Octavia-specific operations were not visible; the dedicated dashboard and CLI will certainly provide a better user experience.

  • Octavia now supports provider drivers, allowing third party load balancing drivers to be integrated with the Octavia v2 API.

Neutron-lbaas integrated many third-party load balancers whose drivers have not yet been migrated to Octavia due to time and manpower constraints, which is a pity. Previously, users could only use the Octavia Amphora load balancer provider, but the Octavia architecture has now been adjusted to support integration with third-party load balancer drivers. This is good news for users who have been unable to upgrade to Octavia because they rely on a third-party load balancer (e.g. F5).

  • Pools can have backup members, also known as “sorry servers”, that respond when all of the members of a pool are not available.

Octavia pools now support backup members (“sorry servers”), which respond when all regular members of the pool are unreachable, further improving load balancer availability and the service experience.

  • UDP protocol load balancing has been added to Octavia. This is useful for IoT use cases.

Octavia now supports UDP, which is commonly used as the transport layer for audio/video streams and other real-time applications, meeting load-balancing requirements in Internet of Things and edge computing scenarios.
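
A hedged sketch of a UDP load balancer (the names, subnet, addresses, and port 5683 for CoAP-style IoT traffic are placeholders):

openstack loadbalancer create --name lb1 --vip-subnet-id <subnet-id>
openstack loadbalancer listener create --name udp-listener --protocol UDP --protocol-port 5683 lb1
openstack loadbalancer pool create --name udp-pool --listener udp-listener --protocol UDP --lb-algorithm ROUND_ROBIN
openstack loadbalancer member create --address 10.0.0.11 --protocol-port 5683 udp-pool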

Kolla

  • Implement minimal downtime for keystone and cinder service

Kolla now supports minimal-downtime upgrades for Keystone and Cinder, laying the groundwork for seamless upgrades of OpenStack services, which is a great benefit to both users and operations personnel.

  • Implement Ceph Bluestore deployment in kolla and kolla-ansible.
  • Implement cephfs service
  • Upgrade to ceph luminous

Kolla’s support for Ceph continued to be strong this year. Ceph has become the go-to distributed storage solution for OpenStack users, and packaging Ceph and OpenStack deployment together is believed to be a real draw for users.

  • Add almanach, certmonger, ceph-nfs, ptp, rsyslog, sensu and tripleo ui image
  • Add vitrage ansible role
  • Support deployment of prometheus, kafka and zookeeper via kolla-ansible.
  • Added new docker images in kolla for logstash, monasca, prometheus, ravd, neutron-infoblox-ipam-driver and apache storm.

Kolla’s approach of packaging each OpenStack service as an atomic container and driving configuration management with kolla-ansible has been widely recognized in the industry after multiple production-scale deployments. Kolla has become the preferred large-scale deployment solution for OpenStack users in China, and Kyushu Cloud, which has been making major contributions to Kolla from China, deserves real credit.
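
For flavor, a hedged sketch of the kind of switches involved in /etc/kolla/globals.yml (a tiny illustrative subset; option names as I recall them from kolla-ansible at the time):

kolla_base_distro: "centos"
kolla_install_type: "source"
openstack_release: "rocky"
enable_ceph: "yes"
enable_prometheus: "yes"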

Magnum

Developed by the OpenStack Containers Team, the Magnum project aims to make container orchestration engines first-class OpenStack resources, with compatibility across Kubernetes, Mesos, Swarm, and other container management platforms, providing OpenStack users with a seamless container experience. With Magnum, you can create container clusters the way you create virtual machines, with transparent COE (container orchestration engine) deployment and network tuning, ready to use out of the box. Backed by other OpenStack services, Magnum can also provide features such as multi-tenant authentication and multi-tenant network isolation that a COE does not have on its own.

  • Add new label ‘cert_manager_api’ enabling the kubernetes certificate manager api.
  • Add new labels 'ingress_controller' and 'ingress_controller_role' enabling the deployment of a Kubernetes ingress controller backend for clusters. Default for 'ingress_controller' is '' (meaning no controller is deployed), with 'traefik' as a possible value. Default for 'ingress_controller_role' is 'ingress'.
  • Update kubernetes dashboard to v1.8.3, which is compatible via kubectl proxy. Additionally, heapster is deployed as a standalone deployment, and the user can enable a grafana-influx stack with the influx_grafana_dashboard_enabled label.
  • Update k8s_fedora_atomic driver to the latest Fedora Atomic 27 release and run etcd and flanneld in system containers which are removed from the base OS.
  • k8s_fedora_atomic clusters are deployed with RBAC support. Along with RBAC, node authorization is added so the appropriate certificates are generated.
  • Embed certificates in the kubernetes config file when issuing 'cluster config', instead of generating additional files with the certificates. This is now the default behavior. To get the old behavior and still generate cert files, pass --output-certs.
  • Add ‘cloud_provider_enabled’ label for the k8s_fedora_atomic driver. Defaults to true. For specific kubernetes versions if ‘cinder’ is selected as a ‘volume_driver’, it is implied that the cloud provider will be enabled since they are combined.

Kubernetes is now the dominant container orchestration platform, and Magnum’s updates this year centered around it; Magnum has also been certified as an installer by the Kubernetes community. If you want to use Kubernetes to build your container environment, deploying Kubernetes on OpenStack with Magnum is a good choice. Deploying Kubernetes directly on bare metal is not necessarily a safe choice because of isolation and security concerns, and Magnum gives you the flexibility to deploy Kubernetes on either Nova VMs or Ironic bare-metal nodes.
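
A hedged sketch of driving the new labels from the CLI (the image, keypair, network, and flavor names are placeholders):

openstack coe cluster template create k8s-atomic \
    --coe kubernetes \
    --image fedora-atomic-27 \
    --external-network public \
    --keypair mykey \
    --flavor m1.small --master-flavor m1.small \
    --network-driver flannel \
    --docker-volume-size 10 \
    --labels cert_manager_api=true,ingress_controller=traefik
openstack coe cluster create k8s --cluster-template k8s-atomic --node-count 2
# Writes a kubeconfig with the certificates embedded (the new default behavior)
openstack coe cluster config k8s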

  • In the OpenStack deployment with Octavia service enabled, the Octavia service should be used not only for master nodes high availability, but also for k8s LoadBalancer type service implementation as well.

It is worth noting that because Octavia member objects are addressed by IP rather than by Nova instance, the design can be used in many scenarios beyond load balancing OpenStack Nova instances. Octavia can therefore also provide an external load balancing service for Kubernetes Pods.

Zun

The Zun project finally released 1.0 this year, followed later in the year by 2.0. Zun is OpenStack’s Container-as-a-Service project; it aims to provide native container management by collaborating with OpenStack projects such as Neutron, Cinder, Keystone, and Kuryr, so that OpenStack’s networking, storage, and authentication services are seamlessly integrated into the container stack. It lets users start and run containers quickly without managing servers or clusters, while ensuring containers can meet users’ security and compliance requirements.

Zun is effectively the successor to the Nova Docker driver. The weakness of the Nova Docker driver was that Docker containers and virtual machines cannot be squeezed under a single, fully unified abstraction layer: operating containers as if they were virtual machines inevitably loses some of the best container features, such as container linking and port mapping. Zun is therefore a Docker container deployment and scheduling framework that is independent of Nova. Making container creation and lifecycle management a first-class OpenStack resource was arguably Magnum’s original intention, but Magnum has evolved into a deployment and scheduling project for COEs, and that is the essential difference between Zun and Magnum.
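
A minimal sketch of what using Zun looks like (the container name, network, and image are placeholders):

# Run a container directly as an OpenStack resource, no COE involved
openstack appcontainer run --name web --net network=private --cpu 1 --memory 512 nginx
openstack appcontainer list
openstack appcontainer show web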

Personally, I find the Zun project a bit awkward. I have always thought that treating containers as virtual machines is a big misunderstanding; containers without orchestration feel like they lack a soul. However, I know that some large enterprises in China are using it, so I am curious about Zun’s real application scenarios. Wait and see.

Kuryr

  • Introduced port pools feature.
  • Support for running in containers as K8s network addon.
  • Introduced kuryr-daemon service.
  • Introduced liveness and readiness probes for kuryr-controller.
  • Added support for High Availability kuryr-controller in an Active/Passive model, enabling quick and transparent recovery in case kuryr-controller is lost.
  • Added support for namespace isolation, letting users isolate pods and services in different namespaces, implemented through security groups.
  • Added support for multi-vif based on Kubernetes Network Custom Resource Definition De-facto Standard spec defined by the Network Plumbing Working Group, allowing multiple interfaces per pod.

Kuryr’s historical mission is to connect container networks with OpenStack Neutron, enabling intercommunication among VM instances, containers, and external networks. Kuryr has two branches: kuryr-libnetwork (CNM) and kuryr-kubernetes (CNI). Judging from the release notes, this year the Kuryr team focused mainly on kuryr-kubernetes. With kuryr-kubernetes, your OpenStack VMs can run on the same Neutron network as your Kubernetes Pods, and Neutron L3 and security groups can be used to implement routing and to isolate specific ports.

  • Added support for health checks of the CNI daemon, letting users confirm the CNI daemon’s functionality and set limits on resources like memory, improving both stability and performance and for it to be marked as unhealthy if needed.

Kuryr CNI binds the network to specific Pods based on the resources allocated by the Kuryr controller, and adding a CNI daemon greatly improves scalability in larger Kubernetes clusters.

  • Added native route support enabling L7 routing via Octavia Amphorae instead of iptables, providing a more direct routing for load balancers and services.

As Octavia matures, more and more projects are taking advantage of the network connectivity provided by Octavia Amphorae; I personally think the network reachability Amphorae bring is a fantastic approach.

In addition to networking, Kuryr hopes to become a Storage Bridge between containers and OpenStack, enabling containers to access Cinder block Storage and Manila shared Storage services. Kuryr is a key node for OpenStack to combine with containers, and deserves attention.

From OpenStack to OpenInfra

The reason I spent so long going through this year’s two releases, Queens and Rocky, is that they are the most direct evidence of where OpenStack is headed. From a macro point of view, I would summarize it like this:

  • The Pike release focused on lowering operational costs
  • The Queens release focused on making things simpler and easier to use
  • The Rocky release was dominated by convergence with business scenarios

It had been said that the three P/Q/R releases would bring no obvious changes and that stability would remain the main theme. I disagree. From the above, we can see that OpenStack has performed well this year in large-scale deployment, rapid upgrades, high-performance computing, hardware acceleration, bare-metal cloud, embracing containers, and integrated resource management. These efforts are clearly geared toward the business workloads of the 5G era: the Internet of Things, edge computing, telecom NFV, and artificial intelligence. The OpenStack community continues to follow user needs and technology trends. As Mark Collier, COO of the OpenStack Foundation, said: “What we’re seeing now is a market where people want to do more with the cloud, with new workloads like machine learning, artificial intelligence and containers.”

Last year, I mentioned in my article “2017 OpenStack Days China: The Development Status of Cloud Computing in China” that OpenStack private cloud in China has officially entered the mature stage. The sign of maturity is the shift in users’ concerns from “how to build a cloud” to “how to use the cloud”; deployments moving from test to production; and the workloads entrusted to OpenStack migrating from non-core services to core services. If 2017 was the year OpenStack blossomed everywhere, then 2018 was a year of digging into industry value, business integration, and market segmentation. To quickly summarize the changes that have impressed me most about OpenStack this year: locking in stability while going deeper into application scenarios.

With the rapid development of the whole cloud computing industry, users are thinking about cloud value transformation in more complex and profound ways, rather than stopping at simply getting onto the cloud. More users are considering application scenarios such as PaaS, geo-redundant disaster recovery (DR), hybrid cloud, and multi-cloud data flow. As users across industries diversify, OpenStack also needs to support different business workloads (e.g. artificial intelligence, big data, and the Internet of Things), which are in fact hard requirements for deep integration with user business and further improvements in platform autonomy. Users often want a holistic cloud solution, not just IaaS, but not all of those technologies will be developed within the OpenStack community. So, following the “composability” approach to project collaboration, the community this year came up with the more radical idea of OpenInfra: embracing, with a more “open” attitude, all open source projects that can be composed, and turning OpenStack into an integration engine for open source infrastructure.

I think the community is well prepared for the transition from OpenStack to OpenInfra. From last year’s lightweight secure container project Kata Containers (a fusion of virtualization and container technology), to this year’s CI/CD project Zuul, the IaaS/PaaS fusion project Airship, and the edge computing project StarlingX, one by one they have become top-level projects of the OpenStack Foundation, standing alongside OpenStack itself and sketching out the blueprint for the whole Open Infrastructure.

“The OpenStack community is all about solving problems and improving computing, storage, and networking,” Collier said. “But now it’s more than that. We’re building a technology stack, but it’s not a fixed stack; it’s a flexible, programmable infrastructure technology stack. We need to glue different open source projects together, and members of the OpenStack community already have the expertise to do that.” You should know that every community has its own personality; win-win cooperation between communities is as much a humanistic pursuit as an engineering one, and the OpenStack community is working hard to achieve it. It has already been realized among the OpenStack, OPNFV, KVM, and Ceph communities, which I admire.

In fact, I personally think that both “composability” and OpenInfra are the right ideas, but there is some deviation in practice. In my opinion, the unit of composition should be OpenStack plus other open source projects, rather than individual OpenStack sub-projects (e.g. Keystone, Cinder, Neutron) plus other open source projects. Projects within OpenStack should become more tightly connected and interdependent, providing more complete and fluent compute, storage, network, and other infrastructure resource services, and then expose a simple, unified API to the outside world to build an open API economy. As you can see from the above, the Octavia project is a great example in this respect. Unfortunately, some projects want to go it alone, spend effort adapting to platforms like Kubernetes, and end up diluted and torn into neither-fish-nor-fowl projects as a result. There is still room for improvement in OpenStack compute, storage, and networking themselves, such as Nova Cells, Nova Placement, Cinder DR, and Neutron DVR. I wonder whether anyone has calculated the communication, development, and discussion costs of projects that have spent so much manpower and time building around the OpenStack ecosystem. That is the gap between the ideal and the reality, and it is why I think some of these projects are losing momentum right now.

Of course, this is only part of the picture. In such a large open source community, there will always be problems. For now, OpenStack is still bursting with vitality, as evidenced by the continued rapid growth of the domestic market. In any case, over the past eight years OpenStack has not only become an excellent open source private cloud architecture, but has also greatly promoted the development of open source projects such as OpenFlow and Ceph, and has spread open source culture, development practices, and community operation models around the world. That alone makes OpenStack a success.

Someone once asked me what OpenStack means to me. I say: open source. The future of the cloud will not consist of OpenStack or Kubernetes alone, but there will always be mainstream open source technologies, such as KVM, containers, Ceph, OVS, and Cassandra. As a cloud computing engineer, these are what I value.