GKE Deployment Guide

This guide covers deploying cloudtaser on GKE with the recommended configuration for maximum protection. With Ubuntu nodes and Confidential Computing enabled, you can achieve a protection score of 100/115 today, and 115/115 once cloudtaser-ebpf#175 re-routes the remaining perf_event_open enforcement to a BPF LSM hook. See the protection-score reference for the full breakdown.

Why GKE Ubuntu + Confidential Nodes (today's recommendation)

GKE offers two node image types: Container-Optimized OS (COS) and Ubuntu. Ubuntu is the recommended choice for the highest protection score available today, because it provides synchronous syscall blocking on the override-allowed subset (15 of 16 syscalls).

Feature	Ubuntu (linux-gke 6.8+)	COS (5.15 / 6.1 / 6.6)
`CONFIG_BPF_KPROBE_OVERRIDE`	Yes	No (upstream kernel default `n`; locked-down distro)
`CONFIG_BPF_LSM`	Yes	Yes — supported on COS, kernel-team-endorsed
`memfd_secret` (kernel 5.14+)	Yes	Yes
Synchronous syscall blocking — kprobe path	15 of 16 syscalls today; `kprobe_perf_event_open` drops (see #175)	None — `bpf_override_return()` not available
Synchronous syscall blocking — LSM path	Available post-#174	Available post-#174
Wrapper hardening (`dumpable=0`, `+5`)	Yes — synchronous baseline independent of kprobe	Yes — synchronous baseline independent of kprobe
Today's protection-score ceiling	100/115	85/115
Post-#175 ceiling	115/115	85/115 (still gated on #174)
Post-#174 ceiling	115/115	100/115 (parity via LSM hook)

Ubuntu gives you kprobe override on the override-allowed subset today. The CONFIG_BPF_KPROBE_OVERRIDE=y kernel config enables bpf_override_return(), which lets the eBPF agent prevent a syscall from executing before it completes. The remaining gap is kprobe_perf_event_open: do_sys_perf_event_open is not in the upstream kernel's ALLOW_ERROR_INJECTION allow-list, so this single kprobe will never load on stock kernels (any distro). cloudtaser-ebpf#175 tracks the migration of perf_event_open enforcement to bpf_lsm_perf_event_open, which closes the last 15-point gap on Ubuntu.

On GKE COS today, the eBPF kprobe-override path is unavailable. COS ships without CONFIG_BPF_KPROBE_OVERRIDE (upstream kernel default n; the COS team treats error_injection as a debug-only feature and does not enable it on a hardened production distro). Today, the eBPF agent's syscall blocking on COS runs in detect+kill mode (tracepoint detection followed by SIGKILL); the wrapper's dumpable=0 (+5) provides the synchronous baseline that is independent of kprobe override. BPF LSM hooks ARE supported on COS (verified CONFIG_BPF_LSM=y on cos-5.15 / 6.1 / 6.6 lakitu_defconfig), and are the kernel-team-endorsed path for synchronous policy in production. cloudtaser-ebpf#174 tracks the strategic migration to LSM hooks, which will bring COS / Bottlerocket / Talos to parity with Ubuntu's synchronous-block posture.
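
The two kernel options discussed above can be checked directly on a node. The helper below is a minimal sketch, not part of cloudtaser: `kconfig_value` is a hypothetical name, and the config file path in the example is an assumption (some distros expose `/proc/config.gz` instead, which needs `zcat`).

```shell
# Print the value of one kernel config option from config text on stdin.
# Prints the configured value (for example "y" or "m"), or "n" when the
# option is commented out ("# CONFIG_X is not set") or absent.
kconfig_value() {
  # $1: option name, e.g. CONFIG_BPF_LSM
  awk -v opt="$1" -F= '
    $1 == opt { print $2; found = 1; exit }
    END { if (!found) print "n" }
  '
}

# Example on a node shell (path is an assumption):
#   kconfig_value CONFIG_BPF_KPROBE_OVERRIDE < "/boot/config-$(uname -r)"
#   kconfig_value CONFIG_BPF_LSM             < "/boot/config-$(uname -r)"
```

On an Ubuntu node as described above, both lookups should print `y`; on COS the first should print `n`.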

Confidential nodes give you hardware memory encryption. GKE Confidential Nodes use AMD SEV-SNP to encrypt VM memory at the hardware level. The hypervisor and cloud provider cannot read the memory contents. This closes the last remaining attack surface after all software protections are in place.

Step 1: Create the GKE Cluster

Create a cluster with Ubuntu nodes and Confidential Computing:

gcloud container clusters create cloudtaser-prod \
  --region europe-west4 \
  --num-nodes 3 \
  --image-type UBUNTU_CONTAINERD \
  --enable-confidential-nodes \
  --machine-type n2d-standard-2 \
  --workload-pool "$(gcloud config get-value project).svc.id.goog" \
  --release-channel regular

Key flags:

Flag	Purpose
`--image-type UBUNTU_CONTAINERD`	Ubuntu nodes with kprobe override support
`--enable-confidential-nodes`	AMD SEV-SNP memory encryption on all nodes
`--machine-type n2d-standard-2`	N2D instances required for Confidential Computing (AMD EPYC)
`--workload-pool`	Workload Identity for GCP service account binding
`--region europe-west4`	EU region for data residency

N2D machine type required

Confidential Computing on GKE requires N2D (AMD EPYC) instances. Other machine families (N2, E2, C3) do not support AMD SEV-SNP.
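
After the cluster is up, it can be worth confirming that every node really is an Ubuntu node carrying the confidential-nodes label. The sketch below assumes the node's `osImage` string starts with `Ubuntu` on Ubuntu node images; `nonconforming_nodes` is a hypothetical helper name.

```shell
# Read "name<TAB>osImage<TAB>confidentialLabel" rows on stdin and print the
# names of nodes that are not Ubuntu or lack the confidential-nodes label.
# Exit status is 0 when every node passes, 1 otherwise.
nonconforming_nodes() {
  awk -F'\t' '
    $2 !~ /^Ubuntu/ || $3 != "true" { print $1; bad = 1 }
    END { exit bad + 0 }
  '
}

# Example against a live cluster (dots in the label key are escaped for
# kubectl's JSONPath syntax):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.osImage}{"\t"}{.metadata.labels.cloud\.google\.com/gke-confidential-nodes}{"\n"}{end}' \
#     | nonconforming_nodes
```

An empty result with exit status 0 means all nodes match the recommended configuration.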

Step 2: Connect the Cluster to Your OpenBao

Use the cloudtaser CLI to configure Kubernetes auth on your EU-hosted OpenBao:

# Connect to the cluster
gcloud container clusters get-credentials cloudtaser-prod --region europe-west4

# Connect the cluster to your vault
cloudtaser-cli target connect \
  --secretstore-address https://vault.eu.example.com \
  --secretstore-token hvs.YOUR_ROOT_TOKEN \
  --auth-path kubernetes/gke-prod

This configures OpenBao's Kubernetes auth method to accept ServiceAccount JWTs from the GKE cluster.

Step 3: Install cloudtaser

Install the operator and eBPF daemonset via Helm:

helm repo add cloudtaser https://charts.cloudtaser.io
helm install cloudtaser cloudtaser/cloudtaser \
  --namespace cloudtaser-system \
  --create-namespace \
  --set operator.secretstore.address=https://vault.eu.example.com \
  --set ebpf.enabled=true \
  --set ebpf.enforceMode=true

Or use the CLI:

cloudtaser-cli target install \
  --secretstore-address https://vault.eu.example.com \
  --ebpf \
  --enforce

Verify the installation:

kubectl get pods -n cloudtaser-system

Expected: operator and eBPF daemonset pods in Running state.
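
For scripted installs, the same check can be reduced to a filter over the pod listing. This is a minimal sketch; `pods_not_ready` is a hypothetical helper, and it treats any pod outside the Running phase (including Completed jobs) as not ready.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print
# "name status ready" for every pod that is not Running with all of its
# containers ready (READY column "n/m" with n equal to m).
pods_not_ready() {
  awk '
    { split($2, r, "/") }
    $3 != "Running" || r[1] != r[2] { print $1 " " $3 " " $2 }
  '
}

# Example:
#   kubectl get pods -n cloudtaser-system --no-headers | pods_not_ready
```

No output means every pod in the namespace is Running and fully ready.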

Step 4: Deploy a Protected Workload

Annotate your deployment with cloudtaser annotations:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      annotations:
        cloudtaser.io/inject: "true"
        cloudtaser.io/ebpf: "true"
        cloudtaser.io/secretstore-address: "https://vault.eu.example.com"
        cloudtaser.io/secretstore-role: "cloudtaser"
        cloudtaser.io/secret-paths: "secret/data/myapp/config"
        cloudtaser.io/env-map: "db_password=PGPASSWORD,api_key=API_KEY"
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myorg/myapp:v1.2.3

Required Annotations

Annotation	Required	Description
`cloudtaser.io/inject`	Yes	Enables cloudtaser injection (`"true"`)
`cloudtaser.io/ebpf`	No	Enables eBPF runtime enforcement (`"true"`)
`cloudtaser.io/secretstore-address`	Yes	URL of the EU-hosted OpenBao
`cloudtaser.io/secretstore-role`	Yes	OpenBao Kubernetes auth role name
`cloudtaser.io/secret-paths`	Yes	Comma-separated OpenBao secret paths
`cloudtaser.io/env-map`	Yes	Maps OpenBao fields to environment variable names

Apply the deployment:

kubectl apply -f deployment.yaml
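
Before applying, a quick text check can catch a manifest that is missing one of the annotations marked as required in the table above. This is a sketch only: `missing_annotations` is a hypothetical helper, and it does a plain substring match rather than parsing YAML, so it will not notice a key placed at the wrong nesting level.

```shell
# Print each required cloudtaser annotation key that does not appear in the
# manifest read from stdin. A plain text match, not a YAML parser.
missing_annotations() {
  manifest="$(cat)"
  for key in inject secretstore-address secretstore-role secret-paths env-map; do
    case "$manifest" in
      *"cloudtaser.io/$key:"*) ;;                 # key present
      *) printf 'cloudtaser.io/%s\n' "$key" ;;    # key missing
    esac
  done
}

# Example:
#   missing_annotations < deployment.yaml
```

An empty result means all five required keys are present somewhere in the file.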

Step 5: Verify the Protection Score

Check the wrapper logs to confirm the protection score:

kubectl logs -n production deploy/myapp -c myapp 2>&1 | grep "protection"

With Ubuntu + Confidential Nodes + eBPF enforcement, you should see output like the following (captured from a live demo on GKE Ubuntu, kernel 6.8.0-1042-gke):

[cloudtaser-wrapper] Protection score: 100/115
[cloudtaser-wrapper]   memfd_secret:          OK  (+15)
[cloudtaser-wrapper]   mlock:                 OK  (+10)
[cloudtaser-wrapper]   core_dump_exclusion:   OK  (+5)
[cloudtaser-wrapper]   dumpable_disabled:     OK  (+5)
[cloudtaser-wrapper]   token_protected:       OK  (+10)
[cloudtaser-wrapper]   environ_scrubbed:      OK  (+5)
[cloudtaser-wrapper]   getenv_interposer:     OK  (+10)
[cloudtaser-wrapper]   ebpf_agent_connected:  OK  (+10)
[cloudtaser-wrapper]   cpu_mitigations:       OK  (+5)
[cloudtaser-wrapper]   ebpf_enforce_mode:     OK  (+15)
[cloudtaser-wrapper]   ebpf_kprobes:          PARTIAL (15/16 attached -- perf_event_open dropped)
[cloudtaser-wrapper]   confidential_vm:       OK  (+10)

The 15-point gap on ebpf_kprobes is the upstream-kernel ALLOW_ERROR_INJECTION allow-list constraint described above. Ubuntu reaches 115/115 once cloudtaser-ebpf#175 lands (LSM hook re-route to bpf_lsm_perf_event_open); COS reaches 100/115 once cloudtaser-ebpf#174 lands (BPF LSM migration).
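
The headline score can be cross-checked against the per-check lines. The sketch below assumes the log format shown in the sample output above; `sum_ok_points` is a hypothetical helper name.

```shell
# Sum the "(+N)" points on checks reported as OK in wrapper log lines read
# from stdin. Lines without " OK " (such as PARTIAL or FAIL) are ignored.
sum_ok_points() {
  awk '
    / OK / && match($0, /\(\+[0-9]+\)/) {
      # Skip the leading "(+" and drop the trailing ")".
      total += substr($0, RSTART + 2, RLENGTH - 3)
    }
    END { print total + 0 }
  '
}

# Example:
#   kubectl logs -n production deploy/myapp -c myapp 2>&1 | sum_ok_points
```

For the sample output above, the eleven OK lines add up to 100, which matches the reported 100/115.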

No nodeSelector Needed

When all nodes in the cluster have the cloud.google.com/gke-confidential-nodes=true label (which they do when --enable-confidential-nodes is set at cluster creation), the operator auto-detects confidential node support. There is no need to add a nodeSelector to your workloads.

If you have a mixed cluster with both confidential and non-confidential node pools, the operator still detects the capability per-node and reports the confidential_vm check accordingly.

Container Image Requirements

The getenv_interposer check (10 points) is an optional glibc-only enhancement. The LD_PRELOAD interposer (libcloudtaser.so) keeps getenv() from returning heap-resident copies of secrets by returning pointers into memfd_secret-backed memory instead. It does not activate on musl or statically linked binaries -- those use the default env-var delivery path, which works for all binaries without code changes.

Base Image	getenv_interposer	Recommendation
Debian / Ubuntu	Supported	Recommended
Red Hat / Fedora	Supported	Recommended
Alpine (musl)	Not supported	Use debian-slim instead
Distroless (glibc)	Supported	Works
Distroless (static)	Not supported	Use glibc variant
Scratch (static binary)	Not supported	Uses default env-var delivery

Switch from Alpine to Debian slim

If your application uses alpine as the base image, consider switching to debian:bookworm-slim or ubuntu:24.04, which keep the footprint small while providing glibc. This enables the optional getenv interposer and adds 10 points to your protection score. Alpine and musl-based images still receive secrets through the default env-var delivery path.
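
If it is unclear which libc an image ships, a rough heuristic is to look for the dynamic loader in its root filesystem. The sketch below is an assumption-laden check, not a cloudtaser feature: `libc_flavor` is a hypothetical helper, the loader paths are the conventional ones for musl and glibc, and a result of `unknown` usually indicates a static or scratch image.

```shell
# Guess the libc of a root filesystem by looking for well-known loader and
# library paths. $1 is the root directory to inspect (defaults to "/").
libc_flavor() {
  root="${1:-/}"
  # musl installs its loader as /lib/ld-musl-<arch>.so.1
  for f in "$root"/lib/ld-musl-*.so.1; do
    if [ -e "$f" ]; then echo musl; return 0; fi
  done
  # glibc: loader in /lib or /lib64, or libc.so.6 in a multiarch directory
  for f in "$root"/lib64/ld-linux-*.so.* "$root"/lib/ld-linux-*.so.* \
           "$root"/lib/*/libc.so.6; do
    if [ -e "$f" ]; then echo glibc; return 0; fi
  done
  echo unknown
}

# Example, run inside a container that has a shell, or against an
# extracted image root filesystem:
#   libc_flavor /
```

A result of `glibc` suggests the interposer can activate; `musl` or `unknown` suggests the workload will use the default env-var delivery path.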

Full Workflow Example

End-to-end deployment from scratch:

# 1. Create the cluster
gcloud container clusters create cloudtaser-prod \
  --region europe-west4 \
  --num-nodes 3 \
  --image-type UBUNTU_CONTAINERD \
  --enable-confidential-nodes \
  --machine-type n2d-standard-2 \
  --workload-pool "$(gcloud config get-value project).svc.id.goog"

# 2. Get credentials
gcloud container clusters get-credentials cloudtaser-prod --region europe-west4

# 3. Connect to vault
cloudtaser-cli target connect \
  --secretstore-address https://vault.eu.example.com \
  --secretstore-token hvs.YOUR_ROOT_TOKEN

# 4. Install cloudtaser
cloudtaser-cli target install \
  --secretstore-address https://vault.eu.example.com \
  --ebpf --enforce

# 5. Discover workloads and generate migration plan
cloudtaser-cli target discover -o plan.yaml

# 6. Apply plan to vault (provision policies and roles)
cloudtaser-cli source apply-plan plan.yaml \
  --openbao-addr https://vault.eu.example.com \
  --token hvs.YOUR_ROOT_TOKEN

# 7. Populate secrets in vault
bao kv put secret/myapp/config db_password=supersecret api_key=sk-live-xxx

# 8. Verify secrets exist
cloudtaser-cli source verify-plan plan.yaml \
  --openbao-addr https://vault.eu.example.com \
  --token hvs.YOUR_ROOT_TOKEN

# 9. Migrate workloads
cloudtaser-cli target protect --plan plan.yaml \
  --secretstore-address https://vault.eu.example.com \
  --interactive

# 10. Verify protection scores
cloudtaser-cli target status --namespace production

Troubleshooting

Symptom	Cause	Fix
`confidential_vm: FAIL`	Non-N2D machine type	Recreate node pool with `--machine-type n2d-standard-2 --enable-confidential-nodes`
`ebpf_kprobes: FAIL` (all probes drop)	COS node image — `CONFIG_BPF_KPROBE_OVERRIDE=n`	Recreate node pool with `--image-type UBUNTU_CONTAINERD`. Reactive-kill on the kprobe path is in effect on COS until cloudtaser-ebpf#174 (BPF LSM migration) ships.
`ebpf_kprobes: PARTIAL` (1 of 16 dropped)	`kprobe_perf_event_open` dropped — `do_sys_perf_event_open` is not in the upstream `ALLOW_ERROR_INJECTION` allow-list	Expected on every stock kernel today. Tracked under cloudtaser-ebpf#175; resolved by migrating to `bpf_lsm_perf_event_open`.
`getenv_interposer: FAIL`	Alpine or musl-based image	Switch to a debian/ubuntu-based container image
`ebpf_agent_connected: FAIL`	eBPF daemonset not running	Check `kubectl get ds -n cloudtaser-system`
`ebpf_enforce_mode: FAIL`	Enforce mode not enabled	Set `ebpf.enforceMode=true` in Helm values

What we recommend you also run

cloudtaser-ebpf does not occupy the entire BPF LSM stack. BPF LSM-based tools compose cleanly with cloudtaser: they hook different LSM call sites (network egress, file ACLs, capability drops, container lifecycle) and do not conflict with cloudtaser's syscall-blocking programs. Pairing cloudtaser with one of the following layers gives you defense-in-depth without instrumentation overlap:

Tetragon — Cilium's runtime security observability and enforcement. Synchronous policy via BPF LSM hooks, fully supported on COS / Bottlerocket / Talos. Hooks process exec, file access, network connect, capability use. Composes cleanly with cloudtaser-ebpf (different hook points; no bpf_override_return collision).
KubeArmor — runtime policy via BPF LSM and AppArmor / SELinux fallback. Strong on file-path policy and process whitelisting per container.

A forthcoming comparison page on cloudtaser.io will document the recommended pairings and the threat-model overlap explicitly — see cloudtaser-io-website#277.

Why this matters more on COS / Bottlerocket

Until cloudtaser-ebpf#174 ships and brings COS / Bottlerocket / Talos to synchronous-block parity, pairing cloudtaser with Tetragon or KubeArmor on those distros gives you BPF LSM-based synchronous enforcement now. On Ubuntu, the pairing is still valuable (cloudtaser blocks at the syscall level; Tetragon/KubeArmor add policy at the LSM level) but less load-bearing.

NetworkPolicy and NodeLocal DNSCache

If you use Kubernetes NetworkPolicy to restrict egress from protected namespaces, wrapper pods may fail with context deadline exceeded during the broker secret fetch. The root cause is GKE's NodeLocal DNSCache (node-local-dns), which runs as a hostNetwork: true DaemonSet listening on 169.254.20.10. Because it uses the host network, it does not match a namespaceSelector targeting kube-system — even though the node-local-dns pods live in the kube-system namespace.

The problem

A NetworkPolicy like this looks correct but silently blocks DNS when NodeLocal DNSCache is active:

# BROKEN: does not match hostNetwork pods
egress:
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53

Pods resolve DNS via the node-local cache at 169.254.20.10, not via the kube-dns ClusterIP. The namespaceSelector rule only matches traffic to pod IPs in kube-system, which does not include host-network addresses.

The fix

Add an explicit egress rule for the NodeLocal DNSCache IP in addition to the kube-system selector (the kube-system rule is still needed as a fallback when NodeLocal DNSCache is not present or for TCP DNS to upstream):

egress:
  # NodeLocal DNSCache (hostNetwork, 169.254.20.10)
  - to:
      - ipBlock:
          cidr: 169.254.20.10/32
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53
  # kube-dns fallback (pod network)
  - to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
    ports:
      - protocol: UDP
        port: 53
      - protocol: TCP
        port: 53
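
To audit existing policies for the missing rule, a simple text match on the rendered policy is often enough. This is a sketch under stated assumptions: `has_nodelocal_dns_rule` is a hypothetical helper, the policy name in the example is made up, and the match only looks for the exact /32 CIDR, so a wider ipBlock that happens to cover the address is not detected.

```shell
# Succeed when the NetworkPolicy text on stdin contains an ipBlock cidr entry
# for the NodeLocal DNSCache address. A text match, not a YAML parser.
has_nodelocal_dns_rule() {
  grep -Eq 'cidr:[[:space:]]*"?169\.254\.20\.10/32'
}

# Example (policy name "restrict-egress" is hypothetical):
#   kubectl get networkpolicy restrict-egress -n production -o yaml \
#     | has_nodelocal_dns_rule || echo "no egress rule for 169.254.20.10/32"
```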

Self-check

Verify DNS works from within a restricted namespace:

kubectl run dns-test --rm -it --restart=Never \
  --namespace <your-namespace> \
  --image=busybox:1.36 -- nslookup kubernetes.default

If the command hangs or returns server can't find kubernetes.default: SERVFAIL, the NetworkPolicy is still blocking DNS.

Applies to any workload behind NetworkPolicy

This is not cloudtaser-specific: any pod behind a restrictive egress NetworkPolicy on GKE with NodeLocal DNSCache enabled will hit the same issue. cloudtaser wrapper pods surface it early because the broker fetch is the first network call after pod start.

Reference: cloudtaser-demo#261
