I’m an average Reddit user, scrolling much more than reading or interacting. Sometimes, however, a post rings a giant red bell. When I stumbled upon “If you could add one feature to K8s, what would it be?”, I knew the content would be worth it. The most voted answer is:
Namespace scoped CRDs
A short intro to CRDs
Kubernetes comes packed with existing objects, such as Pod, Service, DaemonSet, etc., but you can create your own: you define them through Custom Resource Definitions (CRDs), and each instance of such a definition is a Custom Resource (CR).
Most of the time, CRDs are paired with a custom controller called an operator.
An operator subscribes to the lifecycle events of CRD(s).
When you act upon a CRD by creating, updating, or deleting it, Kubernetes changes its status, and the operator gets notified.
What it does depends on the nature of the CRD.
For example, the Prometheus operator subscribes to the lifecycles of a couple of different CRDs: Prometheus, Alertmanager, ServiceMonitor, etc., to make operating Prometheus easier.
In particular, it will create a Prometheus instance when it detects a new Prometheus CR.
It will configure the instance according to the CR’s manifest.
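As a rough illustration, a Prometheus CR could look like the manifest below. The sketch only writes it to a local file so you can inspect it; the resource name, namespace, service account, and label values are hypothetical, and applying it assumes the Prometheus operator and its CRDs are already installed.

```shell
# Write a minimal Prometheus custom resource to a local file.
# The name, namespace, service account, and labels are made up for illustration.
cat > prometheus-cr.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: demo
  namespace: monitoring
spec:
  # The operator reads these fields to configure the instance it creates.
  replicas: 2
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
EOF

# Applying it requires a cluster where the operator is running:
#   kubectl apply -f prometheus-cr.yaml
```

Changing `replicas` in the file and re-applying it is the kind of update event the operator reacts to.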
The issue with cluster-wide CRDs
CRDs have a cluster-wide scope;
that is, you install a CRD for an entire cluster.
Note that while the definition is cluster-wide, the CR’s scope is either Cluster or Namespaced depending on the CRD.
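To make the distinction concrete, here is a minimal sketch of a CRD manifest. The `example.com` group and the `Widget` kind are hypothetical; the point is the `scope` field, which applies to the CRs, while the definition itself carries no namespace.

```shell
# Write a minimal, hypothetical CRD to a local file.
cat > widget-crd.yaml <<'EOF'
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # No namespace here: the definition always lives at the cluster level.
  name: widgets.example.com
spec:
  group: example.com
  # Applies to the custom resources: either Namespaced or Cluster.
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: integer
EOF

# Show the scope declared in the file.
grep 'scope:' widget-crd.yaml
```

On a live cluster, `kubectl get crd -o custom-columns=NAME:.metadata.name,SCOPE:.spec.scope` lists the scope of every installed CRD.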
I noticed the problem of cluster-wide CRDs when I worked with Apache APISIX, an API gateway.
Routing in Kubernetes has evolved across several steps: NodePort, LoadBalancer, and IngressController, each trying to fix the limitations of its predecessor.
The latest step is the Gateway API.
At the time of this writing, the Gateway API is still an add-on and not part of the Kubernetes distro.
You need to install it explicitly as a CRD.
The Gateway API went through several versions.
If team A were a precursor and installed version v1alpha2, every other team would need to use the same version because the Gateway API is a CRD.
Of course, team B can try to convince team A to upgrade, but if you’ve been in such a situation, you know how painful it can be.
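Before starting that conversation, it helps to know what is actually installed. The following is a minimal sketch, assuming `kubectl` points at the shared cluster and that the CRDs come from an official Gateway API release recent enough to record its version in the `gateway.networking.k8s.io/bundle-version` annotation. The function names are mine, not part of any tool.

```shell
# Print the API versions (for example v1alpha2 or v1beta1) that the
# installed Gateway CRD declares.
gateway_crd_versions() {
  kubectl get crd gateways.gateway.networking.k8s.io \
    -o jsonpath='{.spec.versions[*].name}'
}

# Print the release the CRD was installed from, read from the
# bundle-version annotation (assumed to be present, see above).
gateway_bundle_version() {
  kubectl get crd gateways.gateway.networking.k8s.io \
    -o jsonpath='{.metadata.annotations.gateway\.networking\.k8s\.io/bundle-version}'
}
```

Once the functions are loaded in your shell, run `gateway_crd_versions` or `gateway_bundle_version` against the cluster. Both fail with a "not found" error if the Gateway API CRDs are not installed at all.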
I mentioned above that the magic happened via an operator. The Gateway API doesn’t come with an out-of-the-box operator. Instead, different vendors provide their own. For example, Apache APISIX has one, Traefik has one, etc. Of course, they are more or less advanced. At the time, the APISIX operator only worked with version 0.5.0 of the Gateway API CRD.
So now, it gets worse. Team A installed v0.5.0 to work with APISIX; team B comes later and wants to use Traefik, which fully supports the latest and greatest. Unfortunately, they can’t because it would require the latest CRD.
Don’t get me wrong; I’m all for a lean architectural landscape that limits the number of different technologies. However, it should be a deliberate choice, not a technical limitation. The above also prevents rolling upgrades. Imagine that we decided on Apache APISIX early on. Yet, it hasn’t progressed toward supporting the latest Gateway API versions. We should be able to migrate from APISIX to Traefik (or any other) team by team.
The cluster-wide CRD doesn’t allow it, or at least makes it very hard: we would need to find a Traefik version that handles v0.5.0, if there’s one and it’s still maintained, migrate all APISIX CRs to Traefik at once, and only then proceed with the upgrade. This approach requires expensive coordination, the cost of which grows quickly with the number of teams involved.
The separate clusters approach
The obvious solution is to have one cluster per team. If you have been operating clusters, you know this approach doesn’t scale.
Each cluster requires a primary node and a control plane. These are just "administrative" costs of running a cluster: they don’t bring anything to the table.
On top of that, every cluster needs a complete monitoring solution. It includes at least metrics and logging, possibly distributed tracing. Whatever your architecture, it’s again an additional burden with no business value. You can generalize the above to every supporting feature of a cluster: authentication, authorization, etc.
All in all, lots of clusters mean lots of additional operational costs.
vCluster, a sensible alternative
The ideal situation, as the initial quote of this post states, would be to have namespace-scoped CRDs. Unfortunately, it’s not the path that Kubernetes chose. The next best thing would be to add a virtual cluster on top of the real one to partition it: that’s the promise of vCluster.
What are virtual clusters?
Virtual clusters are a Kubernetes concept that enables isolated clusters to be run within a single physical Kubernetes cluster. Each cluster has its own API server, which makes them better isolated than namespaces and more affordable than separate Kubernetes clusters.
vCluster isolates each virtual cluster. Hence, on a single host cluster, you can deploy a v1.0 CRD in one virtual cluster and a v1.2 CRD in another without trouble.
Imagine two teams working with different Gateway API providers, each requiring a different CRD version.
Let’s create a virtual cluster for each of them with vCluster so each can work independently from the other team.
I’ll assume you already have the vcluster CLI installed; if not, look at the documentation: we provide a couple of different installation options depending on your platform and your tastes.
We can now create our virtual clusters.
vcluster create teamx
The output should be similar to the following:
08:01:02 info Creating namespace vcluster-teamx
08:01:02 info Detected local kubernetes cluster orbstack. Will deploy vcluster with a NodePort & sync real nodes
08:01:02 info Chart not embedded: "open chart/vcluster-0.21.1.tgz: file does not exist", pulling from helm repository.
08:01:02 info Create vcluster teamx...
08:01:02 info execute command: helm upgrade teamx https://charts.loft.sh/charts/vcluster-0.21.1.tgz --create-namespace --kubeconfig /var/folders/kb/g075x6tx36360yvwjrb1x6yr0000gn/T/83460322 --namespace vcluster-teamx --install --repository-config='' --values /var/folders/kb/g075x6tx36360yvwjrb1x6yr0000gn/T/1777816672
08:01:03 done Successfully created virtual cluster teamx in namespace vcluster-teamx
08:01:07 info Waiting for vcluster to come up...
08:01:32 done vCluster is up and running
Because we didn’t specify any namespace, vcluster created one named after the virtual cluster, prefixed with vcluster-.
If you prefer to set a specific namespace, use the -n option, e.g., vcluster create mycluster -n mynamespace.
Note that you can customize each virtual cluster via a values.yaml configuration file.
In the context of this post, we will keep the default options.
We use the vcluster connect command to connect to a virtual cluster.
However, we are already connected because we used the vcluster create command.
At this point, it’s as if we were in a separate Kubernetes cluster. Team X can install the CRDs using the version that they require.
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.0.0/standard-install.yaml
The output is:
customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/httproutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/referencegrants.gateway.networking.k8s.io created
Team Y can do the same with their version. Because we are both teams X and Y, we need to disconnect first from the virtual cluster.
vcluster disconnect
You should see the result of the operation:
08:05:29 info Successfully disconnected and switched back to the original context: orbstack
Let’s impersonate team Y, create the virtual cluster, and install another version of the CRDs:
vcluster create teamy
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.0/standard-install.yaml
The output of the second command is the following:
customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/grpcroutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/httproutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/referencegrants.gateway.networking.k8s.io created
Version 1.2 installs a grpcroutes CRD, for gRPC routing, that the version 1.0 standard install doesn’t include. This way, team X can now install the Gateway API provider that works with v1.0, and team Y the one that works with v1.2.
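If you want to double-check the difference rather than eyeball it, a small sketch with `comm` does the job. The two lists are copied from the outputs above; the file names are arbitrary.

```shell
# CRD names taken from the two kubectl outputs above, already sorted.
cat > crds-v1.0.txt <<'EOF'
gatewayclasses.gateway.networking.k8s.io
gateways.gateway.networking.k8s.io
httproutes.gateway.networking.k8s.io
referencegrants.gateway.networking.k8s.io
EOF

cat > crds-v1.2.txt <<'EOF'
gatewayclasses.gateway.networking.k8s.io
gateways.gateway.networking.k8s.io
grpcroutes.gateway.networking.k8s.io
httproutes.gateway.networking.k8s.io
referencegrants.gateway.networking.k8s.io
EOF

# comm -13 drops lines unique to the first file and lines common to both,
# leaving only what the second file adds.
added=$(LC_ALL=C comm -13 crds-v1.0.txt crds-v1.2.txt)
echo "$added"
# prints: grpcroutes.gateway.networking.k8s.io
```

Note that this compares only CRD names. Two releases can also differ in the versions and schemas of CRDs they share, which a name comparison won’t reveal.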
CRDs are cluster-wide resources, but there’s no conflict since the virtual clusters behave like isolated clusters. Each team can happily use the version they need without forcing others to use it.
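One way to see the isolation for yourself is to list the CRDs of each virtual cluster side by side. This is a sketch under a couple of assumptions: the helper name is mine, and it relies on the `vcluster connect NAME -- COMMAND` form, which runs a single command against the virtual cluster; check the behavior with your CLI version.

```shell
# List the CRDs installed in each virtual cluster passed as an argument.
# Hypothetical helper: the function name is not part of the vcluster CLI.
list_vcluster_crds() {
  for team in "$@"; do
    echo "== $team =="
    # Run kubectl against the given virtual cluster only.
    vcluster connect "$team" -- kubectl get crd -o name
  done
}
```

Calling `list_vcluster_crds teamx teamy` should show the grpcroutes CRD under teamy only, given the installs above.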
Conclusion
In this post, we touched on a problem with some Kubernetes objects: they are cluster-wide and lock all teams working on the same cluster into using the same version. Running a Kubernetes cluster incurs costs; managing lots of them requires mature and organized automation.
vCluster allows an organization to get the best of both worlds: limit the number of clusters while preventing teams from stepping on each other’s toes.