
Troubleshooting

Common issues and their solutions when running CloudTaser.


Pods Stuck in Init

Symptom: Pod stays in Init:0/1 state. The init container that copies the wrapper binary has not completed.

Diagnosis:

kubectl describe pod <pod-name>
kubectl logs <pod-name> -c cloudtaser-init

Common causes:

1. Wrapper image not pullable

The init container pulls from ghcr.io/skipopsltd/cloudtaser-wrapper. Verify that image pull secrets are configured:

kubectl get pod <pod-name> -o jsonpath='{.spec.imagePullSecrets}'

If using a private registry, ensure imagePullSecrets is set in the Helm values.

2. EmptyDir volume mount failure

The wrapper is copied to a memory-backed emptyDir at /cloudtaser/. Check that the node has sufficient memory for the volume (the wrapper binary is approximately 10 MB).

3. Operator not running

If the operator is down, the mutating webhook may fail and block pod creation (default failurePolicy: Fail):

kubectl get pods -n cloudtaser-system

failurePolicy: Fail blocks pod creation

If the operator is not running, the webhook call fails and new pods are rejected at admission. Restart the operator, or temporarily set failurePolicy: Ignore in the MutatingWebhookConfiguration. With Ignore, pods start without injection -- use this only as an emergency measure.
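As an emergency measure, the failure policy can be flipped with a JSON patch against the webhook configuration (the webhook index 0 here is an assumption -- inspect the configuration in your cluster first):

```shell
# Emergency only: pods created while this is in effect start WITHOUT injection.
kubectl patch mutatingwebhookconfiguration cloudtaser-webhook \
  --type=json \
  -p='[{"op": "replace", "path": "/webhooks/0/failurePolicy", "value": "Ignore"}]'
```

Revert to Fail (and restart the operator) as soon as possible so unprotected pods cannot slip through.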


Wrapper Cannot Connect to Vault

Symptom: Pod starts but the application does not receive secrets. Container logs show vault connection errors.

Diagnosis:

kubectl logs <pod-name> -c <container-name> | grep -i vault

Common causes:

1. Vault endpoint unreachable

Verify network connectivity from the pod to the vault:

kubectl exec <pod-name> -- wget -q --spider https://vault.eu.example.com/v1/sys/health

Check firewall rules, VPC peering, security groups, and NetworkPolicies.

2. Kubernetes auth not configured

The wrapper authenticates to vault using the pod's ServiceAccount token. Verify the auth method is configured:

cloudtaser validate \
  --vault-address https://vault.eu.example.com \
  --vault-token hvs.YOUR_TOKEN

If auth is not configured, run cloudtaser connect to set it up.

3. Wrong vault role

Verify the cloudtaser.io/vault-role annotation matches a role configured in vault:

vault read auth/kubernetes/role/cloudtaser

4. ServiceAccount not bound

The vault role must allow the pod's ServiceAccount. Check the bound_service_account_names and bound_service_account_namespaces in the vault role.
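Inspecting and rewriting the binding uses the standard vault CLI. In the sketch below the ServiceAccount name, namespace, policy, and TTL are placeholders for your own values; the role name matches the example above:

```shell
# Placeholder names -- adjust to your cluster
vault write auth/kubernetes/role/cloudtaser \
    bound_service_account_names=myapp \
    bound_service_account_namespaces=production \
    policies=cloudtaser-read \
    ttl=1h
```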

5. TLS certificate error

If vault uses a private CA, the wrapper cannot verify the server certificate without the CA bundle. Mount the CA bundle into the pod.
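One common pattern is to ship the CA bundle in a ConfigMap and mount it into the pod. The ConfigMap name and mount path below are illustrative, and VAULT_CACERT is the standard Vault client variable for a CA file -- confirm that the wrapper honours it in your version:

```yaml
# Illustrative only: mount a private CA bundle for vault TLS verification
spec:
  template:
    spec:
      containers:
        - name: myapp
          env:
            - name: VAULT_CACERT
              value: /etc/vault-tls/ca.crt
          volumeMounts:
            - name: vault-ca
              mountPath: /etc/vault-tls
              readOnly: true
      volumes:
        - name: vault-ca
          configMap:
            name: vault-ca-bundle
```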

Quick connectivity test

Run cloudtaser validate --vault-address https://vault.eu.example.com to check vault health, seal status, and Kubernetes auth configuration in one step.


eBPF Agent Not Starting

Symptom: cloudtaser-ebpf pods are in CrashLoopBackOff or not starting.

Diagnosis:

kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf
kubectl describe daemonset -n cloudtaser-system cloudtaser-ebpf

Common causes:

1. Kernel too old

The eBPF agent requires Linux kernel 5.8+ with BTF (BPF Type Format) support. Check the node kernel:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'

Kernel version recommendations

For full protection (including memfd_secret), kernel 5.14+ is recommended. See Kernel Compatibility for the complete support matrix.

2. BTF not available

The agent needs /sys/kernel/btf/vmlinux. GKE COS and Ubuntu nodes have this by default. Some custom AMIs may not:

kubectl exec -n cloudtaser-system <ebpf-pod> -- ls /sys/kernel/btf/vmlinux

3. Insufficient capabilities

The eBPF agent requires privileged mode with SYS_ADMIN, SYS_PTRACE, NET_ADMIN, and SYS_RESOURCE capabilities. Verify that PodSecurityPolicy or PodSecurityStandard is not blocking these.
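For reference, a container securityContext granting that set of capabilities looks roughly like this (normally the Helm chart sets it on the DaemonSet for you):

```yaml
securityContext:
  privileged: true
  capabilities:
    add:
      - SYS_ADMIN
      - SYS_PTRACE
      - NET_ADMIN
      - SYS_RESOURCE
```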

4. GKE Autopilot

Autopilot clusters do not allow privileged pods or hostPID. Use GKE Standard instead.

5. Fargate (EKS)

Fargate does not support DaemonSets or host-level access. Use managed or self-managed node groups.

Unsupported environments

GKE Autopilot and AWS Fargate are not compatible with the eBPF agent. The wrapper still provides secret injection, but runtime enforcement (blocking /proc reads, ptrace, etc.) is not available.


Secrets Not Injected

Symptom: Application starts but environment variables with secrets are missing.

Diagnosis:

# Check if injection annotation is present
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations}' | grep cloudtaser

# Check if wrapper is running as PID 1
kubectl exec <pod-name> -- cat /proc/1/cmdline | tr '\0' ' '

Common causes:

1. Missing inject annotation

The pod template must have cloudtaser.io/inject: "true". The annotation must be on the pod template, not on the Deployment itself:

spec:
  template:
    metadata:
      annotations:
        cloudtaser.io/inject: "true"  # HERE, not on the Deployment

2. Wrong env-map syntax

The cloudtaser.io/env-map annotation maps vault fields to environment variables. Format: vault_field=ENV_VAR,field2=ENV_VAR2:

cloudtaser.io/env-map: "db_password=PGPASSWORD,api_key=API_KEY"

Common mistake: reversed order

The correct format is vault_field=ENV_VAR, not ENV_VAR=vault_field. A reversed mapping will silently fail to inject the expected secrets.
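To make the direction of the mapping concrete, here is a small shell illustration of how such an annotation value splits into (vault field, env var) pairs. This is an illustration of the format only, not the wrapper's actual parser:

```shell
env_map="db_password=PGPASSWORD,api_key=API_KEY"

# Split the value on commas; the vault field sits left of '=',
# the target environment variable name sits right of it.
IFS=',' read -ra pairs <<< "$env_map"
for pair in "${pairs[@]}"; do
  vault_field="${pair%%=*}"
  env_var="${pair#*=}"
  echo "vault field '${vault_field}' -> env var '${env_var}'"
done
# prints:
# vault field 'db_password' -> env var 'PGPASSWORD'
# vault field 'api_key' -> env var 'API_KEY'
```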

3. Wrong vault path

Ensure cloudtaser.io/secret-paths uses the KV v2 data path:

# Correct:
cloudtaser.io/secret-paths: "secret/data/myapp/config"

# Wrong (missing data/ prefix for KV v2):
cloudtaser.io/secret-paths: "secret/myapp/config"
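Putting the pieces together, a pod template that injects correctly carries all three annotations (values reused from the examples above):

```yaml
spec:
  template:
    metadata:
      annotations:
        cloudtaser.io/inject: "true"
        cloudtaser.io/secret-paths: "secret/data/myapp/config"
        cloudtaser.io/env-map: "db_password=PGPASSWORD,api_key=API_KEY"
```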

4. Namespace not in webhook scope

The operator injects pods in all namespaces except the system namespaces (kube-system, kube-public, kube-node-lease). Verify your namespace is not excluded.

5. Webhook not intercepting

Check the MutatingWebhookConfiguration:

kubectl get mutatingwebhookconfiguration cloudtaser-webhook -o yaml


High Latency on Pod Startup

Symptom: Pods take significantly longer to start after CloudTaser injection.

Common causes:

1. Vault fetch time

The wrapper fetches secrets before starting the application. If vault is slow (geographically distant, under load), this adds startup latency.

Reduce vault latency

  • Deploy vault in the same region as the cluster (still within the EU)
  • Reduce the number of secret paths per pod (fewer vault API calls)
  • Ensure vault is not sealed or in standby mode

2. Image entrypoint resolution

The operator resolves the container image entrypoint by querying the container registry. This adds latency on the first injection. The result is cached per image.

3. Init container image pull

The first pod on a node pulls the wrapper image. Subsequent pods use the cached image. Use imagePullPolicy: IfNotPresent (default) to avoid re-pulling.
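If you need to set it explicitly, the Helm values would look something like this (the wrapper.image key layout is an assumption -- confirm against the chart's values.yaml):

```yaml
wrapper:
  image:
    repository: ghcr.io/skipopsltd/cloudtaser-wrapper
    pullPolicy: IfNotPresent  # reuse the node's cached wrapper image
```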


Protection Score Low

Symptom: cloudtaser status or cloudtaser audit reports a low protection score.

The protection score (max 65) reflects which defenses are active:

Check                Points   Fix
memfd_secret         15       Use kernel 5.14+ on nodes
mlock                10       Ensure CAP_IPC_LOCK is available (or ulimit -l unlimited)
MADV_DONTDUMP        5        Automatic (requires wrapper v0.0.14+)
PR_SET_DUMPABLE(0)   5        Automatic (requires wrapper v0.0.14+)
Token protected      10       Automatic when memfd_secret or mlock is available
eBPF connected       10       Ensure the eBPF daemonset is running on the node
Kprobes active       10       Requires kernel CONFIG_BPF_KPROBE_OVERRIDE

To improve the score:

  1. Upgrade node kernel to 5.14+ for memfd_secret support (15 points). This is the single most impactful change.
  2. Ensure eBPF agent is running on every node (10 points):

    kubectl get daemonset -n cloudtaser-system cloudtaser-ebpf
    
  3. Check kernel kprobe override support (10 points). Verify CONFIG_BPF_KPROBE_OVERRIDE=y in the node kernel config. GKE Ubuntu nodes typically have this enabled.
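On a node, or from a debug pod with access to the host filesystem, the kernel build config can usually be inspected in one of the standard locations:

```shell
# Most distro kernels ship the build config under /boot:
grep CONFIG_BPF_KPROBE_OVERRIDE "/boot/config-$(uname -r)"

# Some kernels expose it via procfs instead:
zcat /proc/config.gz | grep CONFIG_BPF_KPROBE_OVERRIDE
```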


eBPF Enforcement Issues

Symptom: eBPF agent is running but enforcement events are not generated, or legitimate operations are being blocked.

Diagnosis:

# Check agent logs
kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf

# Check enforce mode
kubectl get daemonset -n cloudtaser-system cloudtaser-ebpf \
  -o jsonpath='{.spec.template.spec.containers[0].env}' | grep ENFORCE

Common causes:

1. Enforce mode disabled

If ENFORCE_MODE=false, the agent only logs events without blocking. Enable via Helm:

ebpf:
  enforceMode: true

2. Reactive kill fallback

On kernels without CONFIG_BPF_KPROBE_OVERRIDE, the agent uses reactive kill (SIGKILL after detection) instead of synchronous blocking. This is less precise. Upgrade the kernel or accept the trade-off.

Reactive kill is still effective

The race window between detection and SIGKILL is microseconds. An attacker reading /proc/pid/environ gets killed before they can exfiltrate the data over the network, because the network send is also monitored and blocked.

3. Application uses io_uring

CloudTaser blocks io_uring_setup() for protected processes because io_uring bypasses buffer-level monitoring. Applications requiring io_uring must use standard syscalls instead.


Common CLI Errors

"failed to build kubeconfig"

The CLI cannot find or parse your kubeconfig. Ensure ~/.kube/config exists or pass --kubeconfig:

cloudtaser status --kubeconfig /path/to/kubeconfig

"vault is sealed"

The vault instance is sealed and cannot serve requests. Unseal it:

vault operator unseal <key1>
vault operator unseal <key2>
vault operator unseal <key3>

"permission denied" on vault operations

The vault token used with cloudtaser connect requires admin-level permissions. Ensure it has policies for:

  • sys/auth/* (to enable and configure auth methods)
  • auth/kubernetes/* (to configure Kubernetes auth)
  • sys/policies/* (to create policies)
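A minimal vault policy sketch covering those paths (the capability lists are an assumption -- trim them to what cloudtaser connect actually needs in your setup; note that enabling auth methods under sys/auth requires the sudo capability):

```hcl
# Illustrative admin policy for cloudtaser connect
path "sys/auth/*" {
  capabilities = ["create", "read", "update", "delete", "sudo"]
}

path "auth/kubernetes/*" {
  capabilities = ["create", "read", "update", "delete"]
}

path "sys/policies/*" {
  capabilities = ["create", "read", "update", "delete"]
}
```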

Getting Help

  1. Check component logs:

    kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-operator
    kubectl logs -n cloudtaser-system -l app.kubernetes.io/name=cloudtaser-ebpf
    kubectl logs <pod-name> -c <container-name>
    
  2. Run validation:

    cloudtaser validate --vault-address https://vault.eu.example.com
    
  3. Run a full audit:

    cloudtaser audit --vault-address https://vault.eu.example.com