Ready-to-use virtual clusters

vCluster is a solution for creating virtual clusters within a host Kubernetes cluster.

Virtual clusters are a Kubernetes concept that enables isolated clusters to be run within a single physical Kubernetes cluster. Each cluster has its own API server, which makes them better isolated than namespaces and more affordable than separate Kubernetes clusters.

— What are virtual clusters?

When you start your vCluster journey, you probably create a virtual cluster and then apply your manifests to it manually. This approach is far from efficient and is error-prone. In this post, I’d like to show you a nifty feature that creates ready-to-use virtual clusters.

A use case of ready-to-use virtual clusters

Most organizations are structured around teams. For example, the Team Topologies book describes four kinds of teams:

  • Stream-aligned team
  • Complicated subsystem team
  • Enabling team
  • Platform team

With vCluster, the platform team will probably provide the configuration file, while stream-aligned teams will use the virtual cluster. In many cases, however, the virtual cluster runs workloads that are purely operational. For example, it makes sense to scrape the logs of running containers and send them to a centralized logging system, such as Elasticsearch or Loki. For temporary virtual clusters, such as those created in the context of a Pull Request, it’s even a hard requirement, so you can understand what happened in case of failure.

The platform team obviously needs to provide Loki (or any similar tool) on the host cluster. They probably also need to provide the vCluster configuration that synchronizes the logging service endpoint from the host to the virtual cluster. Finally, somebody must configure the scraping. Suppose we use Alloy, the successor of Grafana’s Promtail.

Alloy offers native pipelines for OTel, Prometheus, Pyroscope, Loki, and many other metrics, logs, traces, and profile tools. In addition, you can use Alloy pipelines to do different tasks, such as configure alert rules in Loki and Mimir. Alloy is fully compatible with the OTel Collector, Prometheus Agent, and Promtail. You can use Alloy as an alternative to either of these solutions or combine it into a hybrid system of multiple collectors and agents. You can deploy Alloy anywhere within your IT infrastructure and pair it with your Grafana stack, a telemetry backend from Grafana Cloud, or any other compatible backend from any other vendor. Alloy is flexible, and you can easily configure it to fit your needs in on-prem, cloud-only, or a mix of both.

Here’s a sample Alloy configuration file, which scrapes Kubernetes pods' logs and sends them to a Loki instance:

/etc/alloy/config.alloy
discovery.kubernetes "pods" {
  role = "pod"                                                          (1)
}
loki.source.kubernetes "k8s_logs" {
  targets = discovery.kubernetes.pods.targets
  forward_to = [loki.process.add_metadata.receiver]
}
loki.process "add_metadata" {                                           (2)
  stage.labels {
    values = {
      "namespace" = "kubernetes.namespace_name",
      "pod"       = "kubernetes.pod_name",
      "container" = "kubernetes.container_name",
    }
  }
  forward_to = [loki.write.loki_receiver.receiver]
}
loki.write "loki_receiver" {
  endpoint {
    url = "http://loki.logging.svc.cluster.local:3100/loki/api/v1/push" (3)
    headers = { "X-Scope-OrgID" = "teamA" }                             (4)
  }
}
1 Scrape all pods' logs
2 Enrich the logs with labels
3 Push to a predefined Loki instance, the loki service in the logging namespace
4 Loki is multi-tenant by default and needs a specific tenant header
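
The configuration above scrapes every pod, including system ones. If that proves too noisy, a `discovery.relabel` stage can filter targets before they reach the Loki source. Here’s a sketch that drops pods in the kube-system namespace; the component name user_pods is mine, and the meta label comes from Kubernetes service discovery:

```alloy
discovery.kubernetes "pods" {
  role = "pod"
}
discovery.relabel "user_pods" {
  targets = discovery.kubernetes.pods.targets
  rule {
    // Drop any target whose namespace matches the regex
    source_labels = ["__meta_kubernetes_namespace"]
    regex         = "kube-system"
    action        = "drop"
  }
}
loki.source.kubernetes "k8s_logs" {
  targets    = discovery.relabel.user_pods.output
  forward_to = [loki.process.add_metadata.receiver]
}
```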

The job of the platform team

The platform team needs to install Loki on the host cluster and synchronize its service on the virtual cluster. It would also be great if it installed Alloy on the virtual cluster, so that the stream-aligned team (teamA) can focus on its value-added job.

Installing Loki on the host cluster

We will use Loki’s Helm Chart:

helm upgrade --install loki grafana/loki --namespace logging --create-namespace -f values.yaml (1)
1 Install Loki in the logging namespace

The output is very long:

Release "loki" does not exist. Installing it now.
NAME: loki
LAST DEPLOYED: Wed Feb 19 09:25:25 2025
NAMESPACE: logging
STATUS: deployed
REVISION: 1
NOTES:
***********************************************************************
 Welcome to Grafana Loki
 Chart version: 6.27.0
 Chart Name: loki
 Loki version: 3.4.2
***********************************************************************

** Please be patient while the chart is being deployed **

Tip:

  Watch the deployment status using the command: kubectl get pods -w --namespace logging

...

> When using curl you can pass `X-Scope-OrgId` header using `-H X-Scope-OrgId:foo` option, where foo can be replaced with the tenant of your choice.

Here are my values for this simple example:

deploymentMode: SingleBinary                                            (1)
loki:
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem                                                     (2)
  useTestSchema: true

gateway:                                                                (3)
  service:
    type: NodePort
    nodePort: 31000

backend:                                                                (1)
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
ingester:
  replicas: 0
querier:
  replicas: 0
queryFrontend:
  replicas: 0
queryScheduler:
  replicas: 0
distributor:
  replicas: 0
compactor:
  replicas: 0
indexGateway:
  replicas: 0
bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0
1 Demo setup, don’t use in production!
2 Store the data on the container’s filesystem; only for the sake of the demo
3 Expose the Loki API outside of the cluster

Creating the virtual cluster

Creating the virtual cluster also falls under the platform team’s responsibility. They can use the Helm Chart:

helm upgrade --install vcluster vcluster/vcluster --namespace vcluster --create-namespace  --values vcluster.yaml
Release "vcluster" does not exist. Installing it now.
NAME: vcluster
LAST DEPLOYED: Wed Feb 19 09:26:42 2025
NAMESPACE: vcluster
STATUS: deployed
REVISION: 1
TEST SUITE: None

The configuration file is straightforward; we must synchronize the Loki service from the host cluster to the virtual cluster, so pods in the latter can access it:

networking:
  replicateServices:
    fromHost:
      - from: logging/loki
        to: logging/loki

Installing Alloy on the virtual cluster

At this point, we must install Alloy on the virtual cluster so it’s ready for the using team, as per the post’s title: ready-to-use virtual clusters. The responsibility for installing Alloy again falls on the shoulders of the platform team. It’s a mandatory step, so it should be automated; even better would be a single command that installs both the virtual cluster and Alloy.

The good news is that this is part of vCluster’s feature set, even though it’s still experimental. We can update the virtual cluster configuration file to instruct vCluster to do everything in a single command:

networking:                                                             (1)
  replicateServices:
    fromHost:
      - from: logging/loki
        to: logging/loki
experimental:                                                           (2)
  deploy:
    vcluster:                                                           (3)
      manifests: |                                                      (4)
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: alloy
        data:
          config.alloy: |
            discovery.kubernetes "pods" {
              role = "pod"
            }
            loki.source.kubernetes "k8s_logs" {
              targets = discovery.kubernetes.pods.targets
              forward_to = [loki.process.add_metadata.receiver]
            }
            loki.process "add_metadata" {
              stage.labels {
                values = {
                  "namespace" = "kubernetes.namespace_name",
                  "pod"       = "kubernetes.pod_name",
                  "container" = "kubernetes.container_name",
                }
              }
              forward_to = [loki.write.loki_receiver.receiver]
            }
            loki.write "loki_receiver" {
              endpoint {
                url = "http://loki.logging.svc.cluster.local:3100/loki/api/v1/push"
                headers = { "X-Scope-OrgID" = "teamA" }
              }
            }
      helm:                                                             (5)
        - chart:
            name: alloy
            repo: https://grafana.github.io/helm-charts
            version: 0.11.0
          release:
            name: alloy
          values: |
            alloy:
              configMap:
                create: false
                name: alloy                                             (6)
1 Nothing changes regarding service synchronization
2 Expect breaking changes
3 What to install on the virtual cluster; vCluster can also deploy on the host cluster
4 Regular Kubernetes manifests, passed as a single string
5 We also use a Helm Chart
6 The Helm Chart relies on the ConfigMap we created in the manifests section
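
Beyond static manifests, the deploy feature also accepts templated manifests. The sketch below assumes the `manifestsTemplate` key and a Helm-style templating context; check both against the vCluster documentation for your version, and treat the `.Release.Name` variable and the tenant-info ConfigMap as illustrative assumptions:

```yaml
experimental:
  deploy:
    vcluster:
      manifestsTemplate: |
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: tenant-info
        data:
          # Hypothetical: derive the Loki tenant ID from the release name
          tenant: "{{ .Release.Name }}"
```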

Now, when the platform team creates the virtual cluster, it also creates a ConfigMap that contains Alloy’s configuration and installs the Alloy Helm Chart. The virtual cluster is now ready to use by the stream-aligned team.

Checking the setup

Let’s pretend we are now the stream-aligned team. We should be able to schedule any workload; Alloy should scrape its logs and send them to Loki.

Here’s the manifest for a pod that produces logs:

apiVersion: v1
kind: Pod
metadata:
  name: random-logger
spec:
  containers:
    - name: random-logger
      image: chentex/random-logger:latest
      args: [ "100", "1000", "500" ]                                    (1)
1 Produce a log line every 100 ms to 1 s, and stop after 500 lines

Let’s impersonate the stream-aligned team and schedule this workload in our virtual cluster. First, we connect to the virtual cluster created by the platform team:

vcluster connect vcluster
09:28:20 done vCluster is up and running
09:28:50 warn Error exposing local vcluster, will fallback to port-forwarding: test connection: context deadline exceeded retrieve default namespace: client rate limiter Wait returned an error: context deadline exceeded
09:28:50 info Starting background proxy container...
09:28:51 done Switched active kube context to vcluster_vcluster_vcluster_orbstack
- Use `vcluster disconnect` to return to your previous kube context
- Use `kubectl get namespaces` to access the vcluster

Now, we apply the manifest:

kubectl apply -f manifest.yaml

We can watch its logs:

kubectl logs random-logger
2025-02-19T08:30:54+0000 ERROR An error is usually an exception that has been caught and not handled.
2025-02-19T08:30:55+0000 DEBUG This is a debug log that shows a log that can be ignored.
2025-02-19T08:30:56+0000 INFO This is less important than debug log and is often used to provide context in the current task.
2025-02-19T08:30:56+0000 WARN A warning that should be ignored is usually at this level and should be actionable.
2025-02-19T08:30:57+0000 ERROR An error is usually an exception that has been caught and not handled.
2025-02-19T08:30:58+0000 DEBUG This is a debug log that shows a log that can be ignored.
2025-02-19T08:30:58+0000 INFO This is less important than debug log and is often used to provide context in the current task.
2025-02-19T08:30:59+0000 INFO This is less important than debug log and is often used to provide context in the current task.
2025-02-19T08:31:00+0000 INFO This is less important than debug log and is often used to provide context in the current task.

We expect Alloy to scrape these logs and send them to Loki. Let’s make sure of that by querying Loki:

curl -G "http://localhost:31000/loki/api/v1/query_range" \
     -H "X-Scope-OrgID: teamA" \
     --data-urlencode 'query={job=~".+"}' \
     --data-urlencode 'start='$(date -v-5M +%s) \
     --data-urlencode 'end='$(date +%s) \
     --data-urlencode 'limit=10' | jq
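
A side note on portability: `-v-5M` is BSD/macOS `date` syntax and fails on GNU date. A sketch that computes the same five-minute window on either platform, falling back from the GNU to the BSD spelling:

```shell
# End of the window: now, as a Unix timestamp
end=$(date +%s)
# Start of the window: five minutes ago; try GNU syntax first, then BSD
start=$(date -d '-5 minutes' +%s 2>/dev/null || date -v-5M +%s)
# The window size in seconds
echo $((end - start))
```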

Here’s a sample output:

{
  "status": "success",
  "data": {
    "resultType": "streams",
    "result": [
      {
        "stream": {
          "detected_level": "warn",
          "instance": "default/random-logger:random-logger",
          "job": "loki.source.kubernetes.k8s_logs",
          "service_name": "loki.source.kubernetes.k8s_logs"
        },
        "values": [
          [
            "1739953992976715408",
            "2025-02-19T08:33:12+0000 WARN A warning that should be ignored is usually at this level and should be actionable.\n"
          ],
          [
            "1739953990223682480",
            "2025-02-19T08:33:10+0000 WARN A warning that should be ignored is usually at this level and should be actionable.\n"
          ],
          [
            "1739953988680455003",
            "2025-02-19T08:33:08+0000 WARN A warning that should be ignored is usually at this level and should be actionable.\n"
          ]
        ]
      },
      {
        "stream": {
          "detected_level": "unknown",
          "instance": "default/random-logger:random-logger",
          "job": "loki.source.kubernetes.k8s_logs",
          "service_name": "loki.source.kubernetes.k8s_logs"
        },
        "values": [
          [
            "1739953993211356106",
            "2025-02-19T08:33:13+0000 DEBUG This is a debug log that shows a log that can be ignored.\n"
          ],
          [
            "1739953992020133029",
            "2025-02-19T08:33:12+0000 DEBUG This is a debug log that shows a log that can be ignored.\n"
          ],
          [
            "1739953991824531942",
            "2025-02-19T08:33:11+0000 DEBUG This is a debug log that shows a log that can be ignored.\n"
          ]
        ]
      },
      {
        "stream": {
          "detected_level": "info",
          "instance": "default/random-logger:random-logger",
          "job": "loki.source.kubernetes.k8s_logs",
          "service_name": "loki.source.kubernetes.k8s_logs"
        },
        "values": [
          [
            "1739953991105386360",
            "2025-02-19T08:33:11+0000 INFO This is less important than debug log and is often used to provide context in the current task.\n"
          ]
        ]
      },
      {
        "stream": {
          "detected_level": "error",
          "instance": "default/random-logger:random-logger",
          "job": "loki.source.kubernetes.k8s_logs",
          "service_name": "loki.source.kubernetes.k8s_logs"
        },
        "values": [
          [
            "1739953993425213822",
            "2025-02-19T08:33:13+0000 ERROR An error is usually an exception that has been caught and not handled.\n"
          ],
          [
            "1739953989976791975",
            "2025-02-19T08:33:09+0000 ERROR An error is usually an exception that has been caught and not handled.\n"
          ],
          [
            "1739953989424254865",
            "2025-02-19T08:33:09+0000 ERROR An error is usually an exception that has been caught and not handled.\n"
          ]
        ]
      }
    ],
    "stats": {
      ...
    }
  }
}
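
The `{job=~".+"}` selector above matches every stream. To focus on error lines only, you can add a LogQL line filter, passed through the same `--data-urlencode 'query=…'` option with the same endpoint and tenant header:

```logql
{job="loki.source.kubernetes.k8s_logs"} |= "ERROR"
```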

Conclusion

In this post, I’ve demoed how your platform team can provide ready-to-use virtual clusters to the using teams, using logging as an example. Of course, you can apply the same technique to any cross-cutting concern: traces, authentication, mTLS, etc.

Though the feature is experimental, it works perfectly. It can deploy resources on the virtual cluster, as we did, but also on the host cluster. It handles regular manifests, templated manifests, and Helm Charts.

The complete source code for this post can be found on GitHub.