Commit what you learn to Git

When I first put Apache Cassandra into production, at version 0.8, the expectation was that getting it there would take some work on your part. That’s when I fell in love with the Cassandra community. I knew Cassandra was right for my use case, and I never felt like I was on my own, because so many people in the community helped me through email and chat. Eventually, I found myself sitting in a tiny square office with Patricio Echague, Nate McCall, and Ed Capriolo (all early contributors), getting the last pieces of my code working and troubleshooting. At some point, we would need to commit code to Cassandra to fix the problems we had found. Doing so ensured that the experience was codified for future users. None of us worked at the same company, but we shared a passion for building large-scale infrastructure and cutting-edge engineering. Over the years, as the infrastructure grew, we, as a growing community, translated that hive mind of experience into code.

While we were running large-scale database operations and learning how to be Cassandra DBAs, a small project called Kubernetes began to gain traction and attract the attention of the same kind of engineers, the ones who deployed applications. While DevOps focuses on what needs to be done, Site Reliability Engineering (SRE) asks how to do it with automation and observability. Now SREs are looking at Cassandra with the same questions and a strong desire to use a single control plane. The answer: run Cassandra in Kubernetes.

Put it all together

In the past year, you may have heard that “Cassandra and Kubernetes go together like peanut butter and jelly”, which is a fun way of saying that they are a perfect match. To be honest, when I say I like peanut butter and jelly sandwiches, I don’t mean I want to make one. I prefer it when it’s already made! The same goes for infrastructure. Can I get something pre-assembled, with sensible defaults and the components that I need? This is the spirit and mission of K8ssandra. Despite its impressive name, K8ssandra is more than just a collection of tools. It is a collection of experience from our community of users, packaged and ready to be used freely.

Running Cassandra in Kubernetes, as with most stateful systems, requires a range of knowledge gained through trial and error or by reading documentation. It’s the perfect time to just get the sandwich! Just as in those early days, we learn new things about deploying Cassandra and commit them to the project to help other operators. The K8ssandra project is a place to collect that knowledge for SREs deploying data services on Kubernetes using Apache Cassandra.

Unpacking K8ssandra

I’m sure you’ve seen videos on YouTube of someone opening a new product and describing what’s in the box. That is a useful perspective as we dig into the details of K8ssandra. It is not a single executable, nor is it a tarball that can be expanded. It is deployed using Helm [3], a Kubernetes packaging tool that makes installation easy. The real value of using K8ssandra is that you get everything you need for a typical Cassandra cluster up and running, configured, boxed, and ready to use. So let’s open the box and see what’s inside (Figure 1).

Figure 1: Under the hood of K8ssandra

Apache Cassandra

The star of the show, but in a new, more modern way. Because Cassandra’s basic deployment unit is a node, it nicely complements Kubernetes’ preference for composability. Stateful workloads in Kubernetes can be a challenge, but the database’s Dynamo model [4] gives us the basic primitives for elastic, scalable, self-healing data services. It is worth noting that the Cassandra project community has embraced Kubernetes as the Cassandra control plane. Expect many changes after 4.0 [5] that will make running in Kubernetes much more efficient. In the meantime, you get something that works in a Kubernetes environment with very little maintenance.
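To make the Dynamo idea a little more concrete, here is a minimal Python sketch of ring-based data placement. It is a toy model under stated assumptions, not Cassandra’s implementation: the `Ring` class, the `token` helper, the node names, and the choice of SHA-256 as the hash are all hypothetical, and it ignores virtual nodes and rack or datacenter awareness. It only shows why losing a node does not lose the data: each key lives on several consecutive nodes around the ring.

```python
import hashlib
from bisect import bisect_right
from typing import List


def token(name: str) -> int:
    """Map a string to a position on the ring (hash choice is an assumption)."""
    return int(hashlib.sha256(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy Dynamo-style ring: one token per node, no virtual nodes."""

    def __init__(self, nodes: List[str], replication_factor: int = 3) -> None:
        self.replication_factor = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)

    def remove_node(self, node: str) -> None:
        """Simulate a node leaving the cluster."""
        self._ring = [(t, n) for t, n in self._ring if n != node]

    def replicas(self, key: str) -> List[str]:
        """Return the nodes responsible for a key, walking clockwise."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        tokens = [t for t, _ in self._ring]
        # First node whose token is greater than the key's token, wrapping around.
        start = bisect_right(tokens, token(key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = Ring(["node-1", "node-2", "node-3", "node-4"], replication_factor=3)
owners = ring.replicas("user:42")

# Lose the first owner: the other two owners still hold the key,
# and the next node on the ring takes over the third copy.
ring.remove_node(owners[0])
survivors = ring.replicas("user:42")
```

After the node is removed, the first two entries of `survivors` are the two remaining original owners, which is the self-healing property in miniature.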

cass-operator

The most critical piece in connecting Kubernetes and Cassandra is the operator. Over the past two years, the Cassandra community has focused a lot of attention on operators as the proper starting point. If there is any magic, it happens in the operator. It acts as a translation layer between the Kubernetes control plane and what the Cassandra cluster actually does. More recently, the Apache Cassandra project agreed to rally around a single operator, cass-operator [6]. Some great contributions from CassKop [7] by Orange will be merged with the DataStax operator, and the final version will be incorporated into the Apache project. This is a prime example of real production knowledge going into code. The community members who contribute to cass-operator run a large number of Cassandra clusters in Kubernetes every day.

Cassandra Reaper

Reaper is a tool that helps manage the critical maintenance task of anti-entropy repair in Cassandra clusters. Cassandra Reaper [8] was originally created by Spotify and later adopted and maintained by The Last Pickle; it is now a subproject of Apache Cassandra. If you ask a group of Cassandra DBAs to sit down and talk about what they do, chances are they’ll talk about running repairs. Repair is an important operation for Dynamo-based systems because it keeps the data consistent despite inevitable problems such as node failures and network partitions. That’s all you need to know, because in K8ssandra, Reaper runs these operations for you automatically. And because it is designed for SREs, you can expect a good set of pre-built metrics to verify that everything is going well.
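For readers new to the idea, anti-entropy repair can be sketched in a few lines of Python. This is a deliberately simplified model and not how Cassandra or Reaper implement it: Cassandra compares Merkle trees over token ranges instead of whole replicas, and the `Cell` type, the `repair` function, and the sample data below are hypothetical names chosen for illustration. The sketch only shows the goal: after a repair, every replica holds the newest version of every key, resolved by write timestamp (last write wins).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Cell:
    """A value plus the write timestamp used to resolve conflicts."""
    value: str
    timestamp: int


Replica = Dict[str, Cell]


def repair(replicas: List[Replica]) -> int:
    """Bring all replicas to the newest version of every key.

    Returns the number of cells written to fix missing or stale data.
    """
    # Step 1: find the newest cell for each key across all replicas.
    newest: Dict[str, Cell] = {}
    for replica in replicas:
        for key, cell in replica.items():
            if key not in newest or cell.timestamp > newest[key].timestamp:
                newest[key] = cell

    # Step 2: write the newest cell to any replica that lacks it.
    fixed = 0
    for replica in replicas:
        for key, cell in newest.items():
            if replica.get(key) != cell:
                replica[key] = cell
                fixed += 1
    return fixed


# Three replicas that drifted apart, for example after a node was down.
node_a = {"user:1": Cell("alice", 10), "user:2": Cell("bob", 5)}
node_b = {"user:1": Cell("alice-v2", 20)}  # missed the write for user:2
node_c = {"user:1": Cell("alice", 10), "user:2": Cell("bob", 5)}

fixed_cells = repair([node_a, node_b, node_c])
```

Running the repair a second time on the same replicas finds nothing to fix, which is the kind of signal a repair metric can be checked against.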

Cassandra Medusa

If you don’t have a way to back up and restore your data, you can’t really claim to be in production. Like Reaper, Medusa [9] is now a subproject of Apache Cassandra, by way of Spotify and The Last Pickle. Backing up a distributed system requires a different approach than most DBAs are used to. Medusa not only helps coordinate these tasks, but also manages where the data at rest is placed. The initial implementation allows backup sets to be stored on and retrieved from cloud object storage (S3 buckets), and more options are in the pipeline. Perhaps the most interesting feature is restore: not just to recover a damaged database, but to quickly rehydrate a test cluster in a CI/CD pipeline. I can’t wait to hear the first conference talks from SREs on creative ways to use this tool.

Metrics collection and export to Prometheus

At the top of every SRE’s wish list is observability. In keeping with the slogan “Measure everything”, K8ssandra preconfigures a solid set of metrics that a cluster should expose. Cassandra has a lot of metrics, which can be overwhelming for a new user. Not only do we curate the most important and useful metrics, but we also provide a pre-built Grafana dashboard (Figure 2) to help you understand them. There is even talk of adding Jaeger tracing. Expect a lot of activity around this component, because it is a big area of Kubernetes where operators like to add their favorite tools. If you are interested in contributing as an SRE, this is a great place to start.

Figure 2: An example of a Grafana dashboard with K8ssandra

Let’s paint an awesome future

That’s what’s in the box today, and now it’s time for our community to pave the way again, this time around K8ssandra. K8ssandra has benefited from contributions from Spotify, Apple, Instaclustr, Orange, New Relic, DataStax, and many other engineering organizations. We will learn more as a community as we use Cassandra as the default data plane for Kubernetes. As we do, those lessons will go into the next release of K8ssandra and make it even better.

If you’re ready, or want to learn more about how to get involved, we’ve set up an initial site at k8ssandra.io [1]. There you can find simple instructions on how to get started, connect with the community, and become part of the project. For me, this is an exciting early phase of the project as we move forward together as a community. There’s a lot of Cassandra in use, and there’s a lot of Kubernetes. They don’t always go together, so let’s make this the gathering place and make it a reality!

DataStax holds hands-on workshops at various times, so be sure to check the DataStax Workshop Eventbrite page [10] for upcoming events that might suit you, or feel free to try our free hands-on learning material at datastax.com/dev [11]. In addition, DataStax will roll out new certifications for running Cassandra in Kubernetes. If you are interested, you can register to be notified about these certifications [12].

If you are a DBA who wants to become an SRE, this is a good place to start. Building cloud-native applications [13] is where things are heading, and data on Kubernetes is a key part of that future. I look forward to building that future together.

Links and literature

[1] k8ssandra.io

[2] helm.sh

[3] helm.sh

[4] en.wikipedia.org/wiki/Dynamo…

[5] cassandra.apache.org/blog/2020/0…

[6] github.com/datastax/ca…

[7] github.com/Orange-Open…

[8] cassandra-reaper.io

[9] github.com/thelastpick…

[10] www.eventbrite.co.uk/o/datastax-…

[11] www.datastax.com/dev/kuberne…

[12] www.datastax.com/dev/certifi…

[13] www.datastax.com/cloud-nativ…

The post Cloud-native applications and data with Kubernetes and Apache Cassandra – Part 3 appeared first on DevOps Conference.