
Troubleshooting

Common issues and their solutions when running CloudTaser.


Pods Stuck in Init

Symptom: Pod stays in Init:0/1 state. The init container that copies the wrapper binary has not completed.

Diagnosis:

kubectl describe pod <pod-name>
kubectl logs <pod-name> -c cloudtaser-init

Common causes:

1. Wrapper image not pullable

The init container pulls from ghcr.io/skipopsltd/cloudtaser-wrapper. Verify that image pull secrets are configured:

kubectl get pod <pod-name> -o jsonpath='{.spec.imagePullSecrets}'

If using a private registry, ensure imagePullSecrets is set in the Helm values.

2. EmptyDir volume mount failure

The wrapper is copied to a memory-backed emptyDir at /cloudtaser/. Check that the node has sufficient memory for the volume (the wrapper binary is approximately 10 MB).

3. Operator not running

If the operator is down, the mutating webhook may fail and block pod creation (default failurePolicy: Fail):

kubectl get pods -n cloudtaser-system

failurePolicy: Fail blocks pod creation

If the operator is not running, the webhook call fails and new pods are rejected at admission. Restart the operator, or temporarily set failurePolicy: Ignore in the MutatingWebhookConfiguration. With Ignore, pods start without injection -- use this only as an emergency measure.
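As an emergency measure, the failure policy can be flipped with a JSON patch against the webhook configuration (the webhook index 0 here is an assumption -- inspect the configuration in your cluster first):

```shell
# Emergency only: pods created while this is in effect start WITHOUT injection.
kubectl patch mutatingwebhookconfiguration cloudtaser-webhook \
  --type=json \
  -p='[{"op": "replace", "path": "/webhooks/0/failurePolicy", "value": "Ignore"}]'
```

Revert to Fail (and restart the operator) as soon as possible so unprotected pods cannot slip through.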


Wrapper Cannot Connect to Vault

Symptom: Pod starts but the application does not receive secrets. Container logs show vault connection errors.

Diagnosis:

kubectl logs <pod-name> -c <container-name> | grep -i vault

Common causes:

1. Vault endpoint unreachable

Verify network connectivity from the pod to the vault:

kubectl exec <pod-name> -- wget -q --spider https://vault.eu.example.com/v1/sys/health

Check firewall rules, VPC peering, security groups, and NetworkPolicies.

2. Kubernetes auth not configured

The wrapper authenticates to vault using the pod's ServiceAccount token. Verify the auth method is configured:

cloudtaser validate \
  --vault-address https://vault.eu.example.com \
  --vault-token hvs.YOUR_TOKEN

If auth is not configured, run cloudtaser connect to set it up.

3. Wrong vault role

Verify the cloudtaser.io/vault-role annotation matches a role configured in vault:

vault read auth/kubernetes/role/cloudtaser

4. ServiceAccount not bound

The vault role must allow the pod's ServiceAccount. Check the bound_service_account_names and bound_service_account_namespaces in the vault role.
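Inspecting and rewriting the binding uses the standard vault CLI. In the sketch below the ServiceAccount name, namespace, policy, and TTL are placeholders for your own values; the role name matches the example above:

```shell
# Placeholder names -- adjust to your cluster
vault write auth/kubernetes/role/cloudtaser \
    bound_service_account_names=myapp \
    bound_service_account_namespaces=production \
    policies=cloudtaser-read \
    ttl=1h
```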

5. TLS certificate error

If vault uses a private CA, the wrapper cannot verify the server certificate without the CA bundle. Mount the CA bundle into the pod.
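One common pattern is to ship the CA bundle in a ConfigMap and mount it into the pod. The ConfigMap name and mount path below are illustrative, and VAULT_CACERT is the standard Vault client variable for a CA file -- confirm that the wrapper honours it in your version:

```yaml
# Illustrative only: mount a private CA bundle for vault TLS verification
spec:
  template:
    spec:
      containers:
        - name: myapp
          env:
            - name: VAULT_CACERT
              value: /etc/vault-tls/ca.crt
          volumeMounts:
            - name: vault-ca
              mountPath: /etc/vault-tls
              readOnly: true
      volumes:
        - name: vault-ca
          configMap:
            name: vault-ca-bundle
```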

Quick connectivity test

Run cloudtaser validate --vault-address https://vault.eu.example.com to check vault health, seal status, and Kubernetes auth configuration in one step.


eBPF Agent Not Starting

Symptom: cloudtaser-ebpf pods are in CrashLoopBackOff or not starting.

Diagnosis:

kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf
kubectl describe daemonset -n cloudtaser-system cloudtaser-ebpf

Common causes:

1. Kernel too old

The eBPF agent requires Linux kernel 5.8+ with BTF (BPF Type Format) support. Check the node kernel:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'

Kernel version recommendations

For full protection (including memfd_secret), kernel 5.14+ is recommended. See Kernel Compatibility for the complete support matrix.

2. BTF not available

The agent needs /sys/kernel/btf/vmlinux. GKE COS and Ubuntu nodes have this by default. Some custom AMIs may not:

kubectl exec -n cloudtaser-system <ebpf-pod> -- ls /sys/kernel/btf/vmlinux

3. Insufficient capabilities

The eBPF agent requires privileged mode with SYS_ADMIN, SYS_PTRACE, NET_ADMIN, and SYS_RESOURCE capabilities. Verify that PodSecurityPolicy or PodSecurityStandard is not blocking these.
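For reference, a container securityContext granting that set of capabilities looks roughly like this (normally the Helm chart sets it on the DaemonSet for you):

```yaml
securityContext:
  privileged: true
  capabilities:
    add:
      - SYS_ADMIN
      - SYS_PTRACE
      - NET_ADMIN
      - SYS_RESOURCE
```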

4. GKE Autopilot

Autopilot clusters do not allow privileged pods or hostPID. Use GKE Standard instead.

5. Fargate (EKS)

Fargate does not support DaemonSets or host-level access. Use managed or self-managed node groups.

Unsupported environments

GKE Autopilot and AWS Fargate are not compatible with the eBPF agent. The wrapper still provides secret injection, but runtime enforcement (blocking /proc reads, ptrace, etc.) is not available.


Secrets Not Injected

Symptom: Application starts but environment variables with secrets are missing.

Diagnosis:

# Check if injection annotation is present
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations}' | grep cloudtaser

# Check if wrapper is running as PID 1
kubectl exec <pod-name> -- cat /proc/1/cmdline | tr '\0' ' '

Common causes:

1. Missing inject annotation

The pod template must have cloudtaser.io/inject: "true". The annotation must be on the pod template, not on the Deployment itself:

spec:
  template:
    metadata:
      annotations:
        cloudtaser.io/inject: "true"  # HERE, not on the Deployment

2. Wrong env-map syntax

The cloudtaser.io/env-map annotation maps vault fields to environment variables. Format: vault_field=ENV_VAR,field2=ENV_VAR2:

cloudtaser.io/env-map: "db_password=PGPASSWORD,api_key=API_KEY"

Common mistake: reversed order

The correct format is vault_field=ENV_VAR, not ENV_VAR=vault_field. A reversed mapping will silently fail to inject the expected secrets.
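To make the direction of the mapping concrete, here is a small shell illustration of how such an annotation value splits into (vault field, env var) pairs. This is an illustration of the format only, not the wrapper's actual parser:

```shell
env_map="db_password=PGPASSWORD,api_key=API_KEY"

# Split the value on commas; the vault field sits left of '=',
# the target environment variable name sits right of it.
IFS=',' read -ra pairs <<< "$env_map"
for pair in "${pairs[@]}"; do
  vault_field="${pair%%=*}"
  env_var="${pair#*=}"
  echo "vault field '${vault_field}' -> env var '${env_var}'"
done
# prints:
# vault field 'db_password' -> env var 'PGPASSWORD'
# vault field 'api_key' -> env var 'API_KEY'
```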

3. Wrong vault path

Ensure cloudtaser.io/secret-paths uses the KV v2 data path:

# Correct:
cloudtaser.io/secret-paths: "secret/data/myapp/config"

# Wrong (missing data/ prefix for KV v2):
cloudtaser.io/secret-paths: "secret/myapp/config"
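Putting the pieces together, a pod template that injects correctly carries all three annotations (values reused from the examples above):

```yaml
spec:
  template:
    metadata:
      annotations:
        cloudtaser.io/inject: "true"
        cloudtaser.io/secret-paths: "secret/data/myapp/config"
        cloudtaser.io/env-map: "db_password=PGPASSWORD,api_key=API_KEY"
```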

4. Namespace not in webhook scope

The operator injects pods in all namespaces except the system namespaces (kube-system, kube-public, kube-node-lease). Verify your namespace is not excluded.

5. Webhook not intercepting

Check the MutatingWebhookConfiguration:

kubectl get mutatingwebhookconfiguration cloudtaser-webhook -o yaml


High Latency on Pod Startup

Symptom: Pods take significantly longer to start after CloudTaser injection.

Common causes:

1. Vault fetch time

The wrapper fetches secrets before starting the application. If vault is slow (geographically distant, under load), this adds startup latency.

Reduce vault latency

  • Deploy vault in the same region as the cluster (still within the EU)
  • Reduce the number of secret paths per pod (fewer vault API calls)
  • Ensure vault is not sealed or in standby mode

2. Image entrypoint resolution

The operator resolves the container image entrypoint by querying the container registry. This adds latency on the first injection. The result is cached per image.

3. Init container image pull

The first pod on a node pulls the wrapper image. Subsequent pods use the cached image. Use imagePullPolicy: IfNotPresent (default) to avoid re-pulling.
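If you need to set it explicitly, the Helm values would look something like this (the wrapper.image key layout is an assumption -- confirm against the chart's values.yaml):

```yaml
wrapper:
  image:
    repository: ghcr.io/skipopsltd/cloudtaser-wrapper
    pullPolicy: IfNotPresent  # reuse the node's cached wrapper image
```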


Protection Score Low

Symptom: cloudtaser status or cloudtaser audit reports a low protection score.

The protection score (max 65) reflects which defenses are active:

Check                Points   Fix
memfd_secret         15       Use kernel 5.14+ on nodes
mlock                10       Ensure CAP_IPC_LOCK is available (or ulimit -l unlimited)
MADV_DONTDUMP        5        Automatic (requires wrapper v0.0.14+)
PR_SET_DUMPABLE(0)   5        Automatic (requires wrapper v0.0.14+)
Token protected      10       Automatic when memfd_secret or mlock is available
eBPF connected       10       Ensure the eBPF daemonset is running on the node
Kprobes active       10       Requires kernel CONFIG_BPF_KPROBE_OVERRIDE

To improve the score:

  1. Upgrade node kernel to 5.14+ for memfd_secret support (15 points). This is the single most impactful change.
  2. Ensure eBPF agent is running on every node (10 points):

    kubectl get daemonset -n cloudtaser-system cloudtaser-ebpf
    
  3. Check kernel kprobe override support (10 points). Verify CONFIG_BPF_KPROBE_OVERRIDE=y in the node kernel config. GKE Ubuntu nodes typically have this enabled.
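On a node, or from a debug pod with access to the host filesystem, the kernel build config can usually be inspected in one of the standard locations:

```shell
# Most distro kernels ship the build config under /boot:
grep CONFIG_BPF_KPROBE_OVERRIDE "/boot/config-$(uname -r)"

# Some kernels expose it via procfs instead:
zcat /proc/config.gz | grep CONFIG_BPF_KPROBE_OVERRIDE
```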


eBPF Enforcement Issues

Symptom: eBPF agent is running but enforcement events are not generated, or legitimate operations are being blocked.

Diagnosis:

# Check agent logs
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf

# Check enforce mode
kubectl get daemonset -n cloudtaser-system cloudtaser-ebpf \
  -o jsonpath='{.spec.template.spec.containers[0].env}' | grep ENFORCE

Common causes:

1. Enforce mode disabled

If ENFORCE_MODE=false, the agent only logs events without blocking. Enable via Helm:

ebpf:
  enforceMode: true

2. Reactive kill fallback

On kernels without CONFIG_BPF_KPROBE_OVERRIDE, the agent uses reactive kill (SIGKILL after detection) instead of synchronous blocking. This is less precise. Upgrade the kernel or accept the trade-off.

Reactive kill is still effective

The race window between detection and SIGKILL is microseconds. An attacker reading /proc/pid/environ gets killed before they can exfiltrate the data over the network, because the network send is also monitored and blocked.

3. Application uses io_uring

CloudTaser blocks io_uring_setup() for protected processes because io_uring bypasses buffer-level monitoring. Applications requiring io_uring must use standard syscalls instead.


Common CLI Errors

"failed to build kubeconfig"

The CLI cannot find or parse your kubeconfig. Ensure ~/.kube/config exists or pass --kubeconfig:

cloudtaser status --kubeconfig /path/to/kubeconfig

"vault is sealed"

The vault instance is sealed and cannot serve requests. Unseal it:

vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>

"permission denied" on vault operations

The vault token used with cloudtaser connect requires admin-level permissions. Ensure it has policies for:

  • sys/auth/* (to enable and configure auth methods)
  • auth/kubernetes/* (to configure Kubernetes auth)
  • sys/policies/* (to create policies)
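A minimal vault policy sketch covering those paths (the capability lists are an assumption -- trim them to what cloudtaser connect actually needs in your setup; note that enabling auth methods under sys/auth requires the sudo capability):

```hcl
# Illustrative admin policy for cloudtaser connect
path "sys/auth/*" {
  capabilities = ["create", "read", "update", "delete", "sudo"]
}

path "auth/kubernetes/*" {
  capabilities = ["create", "read", "update", "delete"]
}

path "sys/policies/*" {
  capabilities = ["create", "read", "update", "delete"]
}
```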

Getting Help

  1. Check component logs:

    kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator
    kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf
    kubectl logs <pod-name> -c <container-name>
    
  2. Run validation:

    cloudtaser validate --vault-address https://vault.eu.example.com
    
  3. Run a full audit:

    cloudtaser audit --vault-address https://vault.eu.example.com