Chapter 1, “Introduction to Cloud Native Data Infrastructure: Persistence, Streaming, and Batch Analytics”
In this chapter, we propose a definition for the term cloud native data and define
principles for cloud native data infrastructure that we’ll use to measure technologies
throughout the rest of the book.
Chapter 2, “Managing Data Storage on Kubernetes”
In this chapter, we’ll look at one of the foundational areas for data infrastructure
on Kubernetes: storage. We’ll begin with how storage works in containerized systems,
starting with Docker, then moving to Kubernetes and its PersistentVolume
subsystem. We’ll discuss the various types of storage available, including file,
block, and object storage, and the trade-offs of using local versus remote storage
solutions.
Chapter 3, “Databases on Kubernetes the Hard Way”
This chapter introduces Kubernetes compute resources such as Pods, Deployments,
and StatefulSets and walks you through the step-by-step process of
deploying databases like MySQL and Apache Cassandra using these resources.
You’ll learn some of the strengths and weaknesses of StatefulSets for managing
distributed databases.
Chapter 4, “Automating Database Deployment on Kubernetes with Helm”
Continuing the themes of the previous chapter, we revisit the deployment of
MySQL and Cassandra on Kubernetes, this time in a more automated fashion
using the Helm package manager. You’ll also learn about Kubernetes resources
that help with configuration, including ConfigMaps and Secrets. We discuss the
role of Helm in your overall DevOps process and CI/CD toolset and some of its
shortcomings with respect to managing database operations.
Chapter 5, “Automating Database Management on Kubernetes with Operators”
This chapter concludes our sequence on database deployment by introducing the
operator pattern and demonstrating how operators can help manage “day two”
database operations. We’ll examine how operators extend the Kubernetes control
plane to manage databases, using Vitess (MySQL) and Cass Operator (Apache
Cassandra) as examples. Along the way, you’ll learn how to assess operators’
maturity and even how to build your own operators by using frameworks such as
the Operator SDK.
Chapter 6, “Integrating Data Infrastructure in a Kubernetes Stack”
In this chapter, we begin to expand the focus beyond just deploying and operating
databases to consider how databases and other data infrastructure can be
incorporated in your overall application stack. We’ll look at a project called
K8ssandra that integrates Apache Cassandra along with tools for managing monitoring,
security, and database backups, and an API layer for easier data access.
Chapter 7, “The Kubernetes Native Database”
At this point, we take a step back and summarize what you’ve learned about
cloud native data management in the book’s first half and use that knowledge
to consider the question, “What is a Kubernetes native database?” More than
just a debate about industry buzzwords, this discussion is an important one for
those of you involved in selecting data infrastructure and for those developing that
infrastructure.
Chapter 8, “Streaming Data on Kubernetes”
Moving beyond persistence, we’ll work through the rest of the data infrastructure,
starting with streaming technologies. Moving and processing data in cloud native
applications is just as prevalent as database persistence but requires different
deployment strategies: connecting endpoints securely and building in resilience
and elasticity by default. In this chapter, we use Apache Pulsar and Apache Flink
to demonstrate these important practices.
Chapter 9, “Data Analytics on Kubernetes”
Ironically, the needs of large-scale analytics deployments are part of the origin
story of many of the methodologies we see used in Kubernetes today—namely,
orchestration and resource management. Coming full circle, running analytics in
Kubernetes is now a top priority in many organizations. We highlight changes in
Apache Spark to give you a head start for your use case and look at the leading
edge of analytics in Kubernetes with the Dask and Ray projects.
Chapter 10, “Machine Learning and Other Emerging Use Cases”
AI and machine learning are already at the cutting edge of infrastructure.
Projects started in the past few years could be built for Kubernetes first, which
is an interesting possibility to consider. Other types of projects are also
thinking cloud native first and providing some direction for the future of data.
This chapter is meant as a survey of those projects, offered broadly as ideas
and methodologies to consider as you move forward with cloud native data.
Chapter 11, “Migrating Data Workloads to Kubernetes”
All the knowledge you’ve obtained in reading the book goes to waste if you don’t
put it into practice. In this chapter, we highlight the key teachings of the previous
chapters and propose a framework of people, process, and technology changes
you can make to migrate your stateful workloads to Kubernetes successfully. We
conclude with a vision of what your organization’s data infrastructure could look
like in the near future.