Managing Data Processing on K8s

Chapter 1, “Introduction to Cloud Native Data Infrastructure: Persistence, Streaming, and Batch Analytics”

In this chapter, we propose a definition for the term cloud native data and define principles for cloud native data infrastructure that we’ll use to measure technologies throughout the rest of the book.

Chapter 2, “Managing Data Storage on Kubernetes”

In this chapter, we’ll look at one of the foundational areas for data infrastructure on Kubernetes: storage. We’ll begin with how storage works in containerized systems, starting with Docker, then moving to Kubernetes and its PersistentVolume subsystem. We’ll discuss the various types of storage available, including file, block, and object storage, and the trade-offs of using local versus remote storage solutions.

Chapter 3, “Databases on Kubernetes the Hard Way”

This chapter introduces Kubernetes compute resources such as Pods, Deployments, and StatefulSets and walks you through the step-by-step process of deploying databases like MySQL and Apache Cassandra using these resources. You’ll learn some of the strengths and weaknesses of StatefulSets for managing distributed databases.

Chapter 4, “Automating Database Deployment on Kubernetes with Helm”

Continuing the themes of the previous chapter, we revisit the deployment of MySQL and Cassandra on Kubernetes, this time in a more automated fashion using the Helm package manager. You’ll also learn about Kubernetes resources that help with configuration including ConfigMaps and Secrets. We discuss the role of Helm in your overall DevOps process and CI/CD toolset and some of its shortcomings with respect to managing database operations.

Chapter 5, “Automating Database Management on Kubernetes with Operators”

This chapter concludes our sequence on database deployment by introducing the operator pattern and demonstrating how operators can help manage “day two” database operations. We’ll examine how operators extend the Kubernetes control plane to manage databases, using Vitess (MySQL) and Cass Operator (Apache Cassandra) as examples. Along the way, you’ll learn how to assess operators’ maturity and even how to build your own operators by using frameworks such as the Operator SDK.

Chapter 6, “Integrating Data Infrastructure in a Kubernetes Stack”

In this chapter, we begin to expand the focus beyond just deploying and operating databases to consider how databases and other data infrastructure can be incorporated in your overall application stack. We’ll look at a project called K8ssandra that integrates Apache Cassandra along with tools for managing monitoring, security, and database backups, and an API layer for easier data access.

Chapter 7, “The Kubernetes Native Database”

At this point, we take a step back and summarize what you’ve learned about cloud native data management in the book’s first half and use that knowledge to consider the question, “What is a Kubernetes native database?” More than just a debate about industry buzzwords, this discussion is an important one both for those who select data infrastructure and for those who develop it.

Chapter 8, “Streaming Data on Kubernetes”

Moving beyond persistence, we’ll start working through the rest of the data infrastructure, starting with streaming technologies. Moving and processing data in cloud native applications is just as prevalent as database persistence, but requires different deployment strategies: connecting endpoints securely and building in default resilience and elasticity. In this chapter, we use Apache Pulsar and Apache Flink to demonstrate those practices.

Chapter 9, “Data Analytics on Kubernetes”

Ironically, the needs of large-scale analytics deployments are part of the origin story of many of the methodologies we see used in Kubernetes today, namely orchestration and resource management. Coming full circle, running analytics in Kubernetes is now a top priority in many organizations. We highlight changes in Apache Spark to give you a head start for your use case and look at the leading edge of analytics in Kubernetes with the Dask and Ray projects.

Chapter 10, “Machine Learning and Other Emerging Use Cases”

AI and machine learning already sit at the cutting edge of infrastructure. Projects started in the past few years can be built for Kubernetes first, which is worth considering in its own right. Other types of projects are also thinking cloud native first and giving some direction to the future of data. This chapter surveys those projects and offers them broadly as ideas and methodologies to consider as you move forward with cloud native data.

Chapter 11, “Migrating Data Workloads to Kubernetes”

All the knowledge you’ve obtained in reading the book goes to waste if you don’t put it into practice. In this chapter, we highlight the key teachings of the previous chapters and propose a framework of people, process, and technology changes you can make to migrate your stateful workloads to Kubernetes successfully. We conclude with a vision of what your organization’s data infrastructure could look like in the near future.