
GKE Deployment Guide

This guide covers deploying cloudtaser on GKE with the recommended configuration for maximum protection. With Ubuntu nodes and Confidential Computing enabled, you can achieve a protection score of 100/115 today, and 115/115 once cloudtaser-ebpf#175 re-routes the last remaining perf_event_open enforcement to a BPF LSM hook. See the protection-score reference for the full breakdown.


Why GKE Ubuntu + Confidential Nodes (today's recommendation)

GKE offers two node image types: Container-Optimized OS (COS) and Ubuntu. For the highest protection score available today, Ubuntu is the recommended choice because it provides synchronous syscall blocking on the override-allowed subset (15 of 16 syscalls).

| Feature | Ubuntu (linux-gke 6.8+) | COS (5.15 / 6.1 / 6.6) |
| --- | --- | --- |
| CONFIG_BPF_KPROBE_OVERRIDE | Yes | No (upstream kernel default n; locked-down distro) |
| CONFIG_BPF_LSM | Yes | Yes (supported on COS, kernel-team-endorsed) |
| memfd_secret (kernel 5.14+) | Yes | Yes |
| Synchronous syscall blocking (kprobe path) | 15 of 16 syscalls today; kprobe_perf_event_open drops (see #175) | None (bpf_override_return() not available) |
| Synchronous syscall blocking (LSM path) | Available post-#174 | Available post-#174 |
| Wrapper hardening (dumpable=0, +5) | Yes (synchronous baseline independent of kprobe) | Yes (synchronous baseline independent of kprobe) |
| Today's protection-score ceiling | 100/115 | 85/115 |
| Post-#175 ceiling | 115/115 | 85/115 (still gated on #174) |
| Post-#174 ceiling | 115/115 | 100/115 (parity via LSM hook) |

Ubuntu gives you kprobe override on the override-allowed subset today. The CONFIG_BPF_KPROBE_OVERRIDE=y kernel config enables bpf_override_return(), which lets the eBPF agent make a probed syscall return an error before its body runs, so the call is blocked rather than detected after the fact. The remaining gap is kprobe_perf_event_open: do_sys_perf_event_open is not in the upstream kernel's ALLOW_ERROR_INJECTION allow-list, so this single kprobe will never load on stock kernels (any distro). cloudtaser-ebpf#175 tracks the migration of perf_event_open enforcement to bpf_lsm_perf_event_open, which closes the last 15-point gap on Ubuntu.

On GKE COS today, the eBPF kprobe-override path is unavailable. COS ships without CONFIG_BPF_KPROBE_OVERRIDE (upstream kernel default n; the COS team treats error_injection as a debug-only feature and does not enable it on a hardened production distro). Today, the eBPF agent's syscall blocking on COS runs in detect+kill mode (tracepoint detection followed by SIGKILL); the wrapper's dumpable=0 (+5) provides the synchronous baseline that is independent of kprobe override. BPF LSM hooks ARE supported on COS (verified CONFIG_BPF_LSM=y on cos-5.15 / 6.1 / 6.6 lakitu_defconfig), and are the kernel-team-endorsed path for synchronous policy in production. cloudtaser-ebpf#174 tracks the strategic migration to LSM hooks, which will bring COS / Bottlerocket / Talos to parity with Ubuntu's synchronous-block posture.
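
To see which of these two kernel options a particular node has, you can inspect its kernel config. The helper below is a minimal sketch, not part of cloudtaser: the function name check_bpf_kconfig is hypothetical, and it assumes the config text is readable on the node (typically /boot/config-$(uname -r) on Ubuntu).

```shell
# Report whether the two kernel options discussed above are enabled.
# Reads kernel config text on stdin and prints one line per option.
check_bpf_kconfig() {
  config="$(cat)"
  for opt in CONFIG_BPF_KPROBE_OVERRIDE CONFIG_BPF_LSM; do
    # -x matches the whole line, so "# CONFIG_X is not set" never counts.
    if printf '%s\n' "$config" | grep -qx "${opt}=y"; then
      echo "${opt}: enabled"
    else
      echo "${opt}: not enabled"
    fi
  done
}
```

On a node shell this could be run as check_bpf_kconfig < /boot/config-$(uname -r). Where the kernel exposes /proc/config.gz, piping zcat /proc/config.gz into the function works the same way.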

Confidential nodes give you hardware memory encryption. GKE Confidential Nodes use AMD SEV-SNP to encrypt VM memory at the hardware level. The hypervisor and cloud provider cannot read the memory contents. This closes the last remaining attack surface after all software protections are in place.


Step 1: Create the GKE Cluster

Create a cluster with Ubuntu nodes and Confidential Computing:

gcloud container clusters create cloudtaser-prod \
  --region europe-west4 \
  --num-nodes 3 \
  --image-type UBUNTU_CONTAINERD \
  --enable-confidential-nodes \
  --machine-type n2d-standard-2 \
  --workload-pool "$(gcloud config get-value project).svc.id.goog" \
  --release-channel regular

Key flags:

| Flag | Purpose |
| --- | --- |
| --image-type UBUNTU_CONTAINERD | Ubuntu nodes with kprobe override support |
| --enable-confidential-nodes | AMD SEV-SNP memory encryption on all nodes |
| --machine-type n2d-standard-2 | N2D instances required for Confidential Computing (AMD EPYC) |
| --workload-pool | Workload Identity for GCP service account binding |
| --region europe-west4 | EU region for data residency |

N2D machine type required

Confidential Computing on GKE requires N2D (AMD EPYC) instances. Other machine families (N2, E2, C3) do not support AMD SEV-SNP.


Step 2: Connect the Cluster to Your OpenBao

Use the cloudtaser CLI to configure Kubernetes auth on your EU-hosted OpenBao:

# Connect to the cluster
gcloud container clusters get-credentials cloudtaser-prod --region europe-west4

# Connect the cluster to your vault
cloudtaser-cli target connect \
  --secretstore-address https://vault.eu.example.com \
  --secretstore-token hvs.YOUR_ROOT_TOKEN \
  --auth-path kubernetes/gke-prod

This configures OpenBao's Kubernetes auth method to accept ServiceAccount JWTs from the GKE cluster.


Step 3: Install cloudtaser

Install the operator and eBPF daemonset via Helm:

helm repo add cloudtaser https://charts.cloudtaser.io
helm install cloudtaser cloudtaser/cloudtaser \
  --namespace cloudtaser-system \
  --create-namespace \
  --set operator.secretstore.address=https://vault.eu.example.com \
  --set ebpf.enabled=true \
  --set ebpf.enforceMode=true

Or use the CLI:

cloudtaser-cli target install \
  --secretstore-address https://vault.eu.example.com \
  --ebpf \
  --enforce

Verify the installation:

kubectl get pods -n cloudtaser-system

Expected: operator and eBPF daemonset pods in Running state.
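
If you want this check to fail loudly in a script rather than be eyeballed, a small filter over the kubectl output is enough. This is an illustrative sketch: the function name not_running_pods is hypothetical, and it assumes the default column layout of kubectl get pods --no-headers (NAME READY STATUS RESTARTS AGE).

```shell
# Print every pod whose STATUS column is not "Running".
# Exit status is 1 if at least one such pod was found, 0 otherwise.
not_running_pods() {
  awk '$3 != "Running" { print $1 " (" $3 ")"; bad = 1 }
       END { exit bad + 0 }'
}
```

Usage: kubectl get pods -n cloudtaser-system --no-headers | not_running_pods. No output and a zero exit status mean every pod in the namespace reports Running.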


Step 4: Deploy a Protected Workload

Annotate your deployment with cloudtaser annotations:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      annotations:
        cloudtaser.io/inject: "true"
        cloudtaser.io/ebpf: "true"
        cloudtaser.io/secretstore-address: "https://vault.eu.example.com"
        cloudtaser.io/secretstore-role: "cloudtaser"
        cloudtaser.io/secret-paths: "secret/data/myapp/config"
        cloudtaser.io/env-map: "db_password=PGPASSWORD,api_key=API_KEY"
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myorg/myapp:v1.2.3

Annotations

| Annotation | Required | Description |
| --- | --- | --- |
| cloudtaser.io/inject | Yes | Enables cloudtaser injection ("true") |
| cloudtaser.io/ebpf | No | Enables eBPF runtime enforcement ("true") |
| cloudtaser.io/secretstore-address | Yes | URL of the EU-hosted OpenBao |
| cloudtaser.io/secretstore-role | Yes | OpenBao Kubernetes auth role name |
| cloudtaser.io/secret-paths | Yes | Comma-separated OpenBao secret paths |
| cloudtaser.io/env-map | Yes | Maps OpenBao fields to environment variable names |
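
The env-map value is a comma-separated list of field=ENV_VAR pairs. The sketch below expands such a value so you can review the mapping before applying a manifest. It only mirrors the format shown in the example annotation; expand_env_map is a hypothetical helper and not the operator's actual parser.

```shell
# Expand a cloudtaser.io/env-map value into "field -> ENV_VAR" lines.
# Assumes the field=ENV_VAR,field=ENV_VAR format from the example above.
# Malformed entries are reported on stderr and make the function return 1.
expand_env_map() {
  printf '%s\n' "$1" | tr ',' '\n' | awk -F'=' '
    NF != 2 || $1 == "" || $2 == "" {
      print "malformed entry: " $0 > "/dev/stderr"
      bad = 1
      next
    }
    { print $1 " -> " $2 }
    END { exit bad + 0 }'
}
```

For the example deployment, expand_env_map "db_password=PGPASSWORD,api_key=API_KEY" prints one line per mapping, which makes a typo in a field or variable name easier to spot.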

Apply the deployment:

kubectl apply -f deployment.yaml

Step 5: Verify the Protection Score

Check the wrapper logs to confirm the protection score:

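# -i matches the capitalised "Protection score" header; -A 12 prints the per-check lines that follow it
kubectl logs -n production deploy/myapp -c myapp 2>&1 | grep -i -A 12 "protection score"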

With Ubuntu + Confidential Nodes + eBPF enforcement, you should see (live demo on GKE Ubuntu, kernel 6.8.0-1042-gke):

[cloudtaser-wrapper] Protection score: 100/115
[cloudtaser-wrapper]   memfd_secret:          OK  (+15)
[cloudtaser-wrapper]   mlock:                 OK  (+10)
[cloudtaser-wrapper]   core_dump_exclusion:   OK  (+5)
[cloudtaser-wrapper]   dumpable_disabled:     OK  (+5)
[cloudtaser-wrapper]   token_protected:       OK  (+10)
[cloudtaser-wrapper]   environ_scrubbed:      OK  (+5)
[cloudtaser-wrapper]   getenv_interposer:     OK  (+10)
[cloudtaser-wrapper]   ebpf_agent_connected:  OK  (+10)
[cloudtaser-wrapper]   cpu_mitigations:       OK  (+5)
[cloudtaser-wrapper]   ebpf_enforce_mode:     OK  (+15)
[cloudtaser-wrapper]   ebpf_kprobes:          PARTIAL (15/16 attached -- perf_event_open dropped)
[cloudtaser-wrapper]   confidential_vm:       OK  (+10)

The 15-point gap on ebpf_kprobes comes from the upstream-kernel ALLOW_ERROR_INJECTION allow-list constraint described above. The expected score rises to 115/115 on Ubuntu once cloudtaser-ebpf#175 lands (LSM hook re-route to bpf_lsm_perf_event_open), and to 100/115 on COS once cloudtaser-ebpf#174 lands (BPF LSM migration).
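
As a cross-check, the per-check points can be summed and compared with the headline score. The sketch below assumes the log format shown in the sample output above (an OK status followed by "(+N)"); sum_protection_points is a hypothetical helper name.

```shell
# Sum the "(+N)" values on OK lines of the wrapper's protection report.
# Reads wrapper log lines on stdin and prints the total.
sum_protection_points() {
  awk '/ OK / && match($0, /\(\+[0-9]+\)/) {
         # Skip the leading "(+" and the trailing ")" of the matched text.
         total += substr($0, RSTART + 2, RLENGTH - 3)
       }
       END { print total + 0 }'
}
```

Usage: kubectl logs -n production deploy/myapp -c myapp 2>&1 | sum_protection_points. Lines without an OK status, such as the PARTIAL ebpf_kprobes line, contribute nothing to the total.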


No nodeSelector Needed

When all nodes in the cluster have the cloud.google.com/gke-confidential-nodes=true label (which they do when --enable-confidential-nodes is set at cluster creation), the operator auto-detects confidential node support. There is no need to add a nodeSelector to your workloads.

If you have a mixed cluster with both confidential and non-confidential node pools, the operator still detects the capability per-node and reports the confidential_vm check accordingly.
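
To confirm which nodes carry the label, kubectl can print it as an extra column with -L. The filter below is an illustrative sketch (nodes_without_confidential_label is a hypothetical name) that assumes the default kubectl get nodes column layout with the label value appended as the sixth column.

```shell
# List nodes that lack cloud.google.com/gke-confidential-nodes=true.
# Expects on stdin the output of:
#   kubectl get nodes --no-headers -L cloud.google.com/gke-confidential-nodes
# Columns: NAME STATUS ROLES AGE VERSION <label value, empty if unset>.
nodes_without_confidential_label() {
  awk 'NF < 6 || $6 != "true" { print $1 }'
}
```

An empty result means every node has the label. In a mixed cluster, the printed node names are the ones where the confidential_vm check would not be expected to pass.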


Container Image Requirements

The getenv_interposer check (10 points) is an optional glibc-only enhancement. The LD_PRELOAD interposer (libcloudtaser.so) intercepts getenv() and returns pointers to memfd_secret-backed memory, so secrets are never returned from the heap. It does not activate on musl or statically linked binaries; those use the default env-var delivery path, which works for all binaries without code changes.

| Base Image | getenv_interposer | Recommendation |
| --- | --- | --- |
| Debian / Ubuntu | Supported | Recommended |
| Red Hat / Fedora | Supported | Recommended |
| Alpine (musl) | Not supported | Use debian-slim instead |
| Distroless (glibc) | Supported | Works |
| Distroless (static) | Not supported | Use glibc variant |
| Scratch (static binary) | Not supported | Uses default env-var delivery |

Switch from Alpine to Debian slim

If your application uses alpine as the base image, consider switching to debian:bookworm-slim or ubuntu:24.04 for a still-compact image with glibc support. This enables the optional getenv interposer and adds 10 points to your protection score. Alpine and musl-based images still receive secrets through the default env-var delivery path.
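
If you are unsure what an existing image is based on, its /etc/os-release is a quick first hint. The sketch below is a rough heuristic under stated assumptions: it recognises only Alpine as musl, it cannot detect statically linked binaries, and interposer_hint is a hypothetical helper name.

```shell
# Guess from /etc/os-release content (on stdin) whether an image is musl-based.
# Heuristic only: Alpine is the sole musl distribution recognised here.
interposer_hint() {
  if grep -Eq '^ID="?alpine"?$'; then
    echo "musl (getenv_interposer not supported)"
  else
    echo "likely glibc (getenv_interposer may be supported)"
  fi
}
```

For images that ship cat, something like kubectl exec deploy/myapp -n production -- cat /etc/os-release | interposer_hint gives a first answer. The wrapper's own getenv_interposer line in the protection report remains the authoritative result.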


Full Workflow Example

End-to-end deployment from scratch:

# 1. Create the cluster
gcloud container clusters create cloudtaser-prod \
  --region europe-west4 \
  --num-nodes 3 \
  --image-type UBUNTU_CONTAINERD \
  --enable-confidential-nodes \
  --machine-type n2d-standard-2 \
  --workload-pool "$(gcloud config get-value project).svc.id.goog"

# 2. Get credentials
gcloud container clusters get-credentials cloudtaser-prod --region europe-west4

# 3. Connect to vault
cloudtaser-cli target connect \
  --secretstore-address https://vault.eu.example.com \
  --secretstore-token hvs.YOUR_ROOT_TOKEN

# 4. Install cloudtaser
cloudtaser-cli target install \
  --secretstore-address https://vault.eu.example.com \
  --ebpf --enforce

# 5. Discover workloads and generate migration plan
cloudtaser-cli target discover -o plan.yaml

# 6. Apply plan to vault (provision policies and roles)
cloudtaser-cli source apply-plan plan.yaml \
  --openbao-addr https://vault.eu.example.com \
  --token hvs.YOUR_ROOT_TOKEN

# 7. Populate secrets in vault
bao kv put secret/myapp/config db_password=supersecret api_key=sk-live-xxx

# 8. Verify secrets exist
cloudtaser-cli source verify-plan plan.yaml \
  --openbao-addr https://vault.eu.example.com \
  --token hvs.YOUR_ROOT_TOKEN

# 9. Migrate workloads
cloudtaser-cli target protect --plan plan.yaml \
  --secretstore-address https://vault.eu.example.com \
  --interactive

# 10. Verify protection scores
cloudtaser-cli target status --namespace production
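
Because the workflow spans several CLIs, it can help to confirm they are all on PATH before starting. The following is a minimal sketch; require_tools is a hypothetical helper, not a cloudtaser command.

```shell
# Fail fast if any CLI named in the arguments is missing from PATH.
# Prints each missing tool on stderr; returns 1 if any are missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: ${tool}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the steps above, the call would be require_tools gcloud kubectl cloudtaser-cli bao, run before step 1.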

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| confidential_vm: FAIL | Non-N2D machine type | Recreate node pool with --machine-type n2d-standard-2 --enable-confidential-nodes |
| ebpf_kprobes: FAIL (all probes drop) | COS node image (CONFIG_BPF_KPROBE_OVERRIDE=n) | Recreate node pool with --image-type UBUNTU_CONTAINERD. Detect+kill mode remains in effect on COS until cloudtaser-ebpf#174 (BPF LSM migration) ships. |
| ebpf_kprobes: PARTIAL (1 of 16 dropped) | kprobe_perf_event_open dropped: do_sys_perf_event_open is not in the upstream ALLOW_ERROR_INJECTION allow-list | Expected on every stock kernel today. Tracked under cloudtaser-ebpf#175; resolved by migrating to bpf_lsm_perf_event_open. |
| getenv_interposer: FAIL | Alpine or musl-based image | Switch to a debian/ubuntu-based container image |
| ebpf_agent_connected: FAIL | eBPF daemonset not running | Check kubectl get ds -n cloudtaser-system |
| ebpf_enforce_mode: FAIL | Enforce mode not enabled | Set ebpf.enforceMode=true in Helm values |

What we recommend you also run

cloudtaser-ebpf does not occupy the entire BPF LSM stack. BPF LSM-based tools compose cleanly with cloudtaser — they hook different LSM call sites (network egress, file ACLs, capability drops, container lifecycle) and do not conflict with cloudtaser's syscall-blocking programs. Pairing cloudtaser with one of the following layers gives you defense-in-depth without instrumentation overlap:

  • Tetragon — Cilium's runtime security observability and enforcement. Synchronous policy via BPF LSM hooks, fully supported on COS / Bottlerocket / Talos. Hooks process exec, file access, network connect, capability use. Composes cleanly with cloudtaser-ebpf (different hook points; no bpf_override_return collision).
  • KubeArmor — runtime policy via BPF LSM and AppArmor / SELinux fallback. Strong on file-path policy and process whitelisting per container.

A forthcoming comparison page on cloudtaser.io will document the recommended pairings and the threat-model overlap explicitly — see cloudtaser-io-website#277.

Why this matters more on COS / Bottlerocket

Until cloudtaser-ebpf#174 ships and brings COS / Bottlerocket / Talos to synchronous-block parity, pairing cloudtaser with Tetragon or KubeArmor on those distros gives you BPF LSM-based synchronous enforcement now. On Ubuntu, the pairing is still valuable (cloudtaser blocks at the syscall level; Tetragon/KubeArmor add policy at the LSM level) but less load-bearing.


NetworkPolicy and NodeLocal DNSCache

If you use Kubernetes NetworkPolicy to restrict egress from protected namespaces, wrapper pods may fail with context deadline exceeded during the broker secret fetch. The root cause is GKE's NodeLocal DNSCache (node-local-dns), which runs as a hostNetwork: true DaemonSet listening on 169.254.20.10. Because it uses the host network, it does not match a namespaceSelector targeting kube-system — even though the node-local-dns pods live in the kube-system namespace.

The problem

A NetworkPolicy like this looks correct but silently blocks DNS when NodeLocal DNSCache is active:

# BROKEN: does not match hostNetwork pods
egress:
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53

Pods resolve DNS via the node-local cache at 169.254.20.10, not via the kube-dns ClusterIP. The namespaceSelector rule only matches traffic to pod IPs in kube-system, which does not include host-network addresses.

The fix

Add an explicit egress rule for the NodeLocal DNSCache IP in addition to the kube-system selector (the kube-system rule is still needed as a fallback when NodeLocal DNSCache is not present or for TCP DNS to upstream):

egress:
  # NodeLocal DNSCache (hostNetwork, 169.254.20.10)
  - to:
      - ipBlock:
          cidr: 169.254.20.10/32
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53
  # kube-dns fallback (pod network)
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53
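
To audit existing policies for this rule, a plain text match on the manifest is often enough. The sketch below is deliberately simple and is not a YAML parse: it only confirms that the ipBlock CIDR appears somewhere in the input. allows_node_local_dns is a hypothetical helper name.

```shell
# Succeed if a NetworkPolicy manifest on stdin mentions the NodeLocal
# DNSCache ipBlock. -F treats the dots in the address literally.
allows_node_local_dns() {
  grep -Fq 'cidr: 169.254.20.10/32'
}
```

Usage: kubectl get networkpolicy -n <your-namespace> -o yaml | allows_node_local_dns || echo "no NodeLocal DNSCache rule found". A match does not prove the rule sits under egress with port 53, so treat a positive result as a prompt to read the policy, not as a full validation.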

Self-check

Verify DNS works from within a restricted namespace:

kubectl run dns-test --rm -it --restart=Never \
  --namespace <your-namespace> \
  --image=busybox:1.36 -- nslookup kubernetes.default

If the command hangs or returns server can't find kubernetes.default: SERVFAIL, the NetworkPolicy is still blocking DNS.

Applies to any workload behind NetworkPolicy

This is not cloudtaser-specific: any pod behind a restrictive egress NetworkPolicy on GKE with NodeLocal DNSCache enabled will hit the same issue. cloudtaser wrapper pods surface it early because the broker fetch is the first network call after pod start.

Reference: cloudtaser-demo#261

