

Ephemeral Kubernetes Environments That Don’t Leak
Hardening EKS/GKE preview clusters against DNS exfiltration, IAM abuse, and orphaned load balancers with deterministic teardown.
Introduction
Preview Kubernetes clusters are supposed to live for hours, but in practice they often leave behind cloud-side artifacts (public load balancers, IPs, IAM bindings, routes) that quietly become long-lived attack surface. DNS egress is the fastest path to covert exfiltration in these environments because it survives most “app-only” hardening. Over-broad IRSA (EKS) and Workload Identity (GKE) bindings turn a disposable namespace into durable cloud API access. The fix is not another checklist; it is standardized execution: a fixed baseline enforced at admission time, and teardown that is verified against both Kubernetes and the cloud control plane.
Quick Take
- Ephemeral clusters fail when teardown is non-deterministic: validate deletion by querying the cloud APIs, not just kubectl.
- Treat DNS as an egress-controlled protocol: default-deny pod egress and only allow DNS to approved resolvers and namespaces.
- Block privileged and network-bypassing workloads at admission: deny hostNetwork, privileged containers, hostPath mounts, and unapproved LoadBalancer Services.
- Scope cloud identity to a namespace + serviceAccount and deny wildcards; prove it with AWS CLI/gcloud verification.
- Continuous verification must include L4/L7 artifacts (ELB/NEG), static IPs, forwarding rules, and IAM bindings; nothing survives the TTL.
Standardized execution model for ephemeral clusters
What “ephemeral” really means (and why it breaks)
Ephemeral isn’t a label; it’s a lifecycle contract:
- The cluster is created from a known baseline.
- Only approved resources can be created inside it.
- The cluster is destroyed at a known time.
- Destruction is verified by checking the control plane of record (AWS/GCP), not human assumptions.
Destruction fails in mundane ways: Service deletion can race with finalizers, an ingress controller may have created a separate forwarding rule, and a role binding can outlive the namespace if it was attached at the cloud IAM layer. If your only check is kubectl get all -A, you’re blind to the most expensive and most exploitable leftovers: public endpoints and IAM bindings.
Baseline objectives
A hardened ephemeral baseline should enforce:
- Egress control (especially DNS).
- Admission control for privilege, networking bypass, and externally reachable services.
- Least-privilege cloud identity with explicit scoping.
- Deterministic teardown plus post-destroy validation against cloud APIs.
In Skynet terms, this is a standardized execution pipeline: policy-as-code gates + deterministic teardown + continuous verification of cloud-side artifacts (L4/L7, IAM bindings, DNS, routes) so the environment TTL is real.
Network egress hardening: deny-by-default and constrain DNS
Default-deny egress, then allow only what you can justify
For ephemeral preview clusters, start with the assumption that pods should not reach the internet except through explicitly approved paths. Implement namespace-level default-deny egress and selectively open:
- DNS to CoreDNS (or node-local DNS).
- Required internal services.
- Egress proxies, if you use them.
Start from a default-deny egress NetworkPolicy and layer allow rules on top.
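A minimal sketch of such a policy; the namespace name preview-apps is illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: preview-apps   # hypothetical preview namespace
spec:
  podSelector: {}           # selects every pod in the namespace
  policyTypes:
    - Egress
  # No egress rules listed: all egress traffic is denied.
```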
Now allow DNS to CoreDNS in kube-system (adjust labels to your distribution).
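A hedged sketch of the DNS allowance; the k8s-app: kube-dns label matches stock CoreDNS deployments but varies by distribution:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-to-coredns
  namespace: preview-apps   # hypothetical preview namespace
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector AND podSelector in one entry: only CoreDNS
        # pods in kube-system are reachable.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # adjust for your distro
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```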
This doesn’t “solve” DNS exfiltration alone, but it forces all DNS through a controlled resolver path. Without a default-deny stance, any workload can talk directly to public resolvers.
CoreDNS guardrails: reduce external resolution surface
Even with egress restrictions, tighten CoreDNS to avoid acting as a general-purpose recursive resolver for anything a pod asks. Two pragmatic guardrails:
- Prefer forwarding to known internal resolvers rather than “whatever the node has.”
- Block/redirect specific zones you never expect preview workloads to query.
Constrained forwarding is expressed in the Corefile; merge the changes into your existing configuration rather than replacing it.
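A conceptual Corefile fragment; the resolver addresses and the blocked zone are illustrative:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # Forward only to approved internal resolvers, not the node's
    # /etc/resolv.conf.
    forward . 10.0.0.10 10.0.0.11
    cache 30
}
# Refuse a zone preview workloads should never resolve (illustrative).
blocked.example:53 {
    template IN ANY {
        rcode NXDOMAIN
    }
}
```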
Admission controls: prevent privilege escalation and unmanaged exposure
Block host-level escape hatches
Ephemeral clusters are often permissive because “it’s just preview.” That’s exactly when hostNetwork, privileged containers, and hostPath mounts become attractive. Enforce deny rules at admission with Gatekeeper or Kyverno.
Kyverno can deny hostNetwork and privileged containers cluster-wide with a single ClusterPolicy.
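A sketch of such a policy; the policy name is illustrative, and hostPath volumes can be denied with an additional rule of the same shape:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: deny-host-and-privileged   # illustrative name
spec:
  validationFailureAction: Enforce  # "enforce" in older Kyverno releases
  background: true
  rules:
    - name: deny-host-network
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "hostNetwork is not allowed in preview clusters."
        pattern:
          spec:
            # =() anchor: if hostNetwork is set, it must be false.
            =(hostNetwork): "false"
    - name: deny-privileged
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged containers are not allowed."
        pattern:
          spec:
            containers:
              - =(securityContext):
                  =(privileged): "false"
```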
Control LoadBalancer services and public ingress
Orphaned external load balancers happen when:
- A Service of type LoadBalancer is created and not removed cleanly.
- An ingress controller allocates additional cloud resources.
- Finalizers or controller failures prevent cleanup.
The guardrail: LoadBalancer Services are denied unless explicitly labeled for external exposure, and ideally only in a specific namespace.
A Gatekeeper ConstraintTemplate (Rego abbreviated) can allow LoadBalancer Services only when the label exposure=approved is present.
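An abbreviated sketch; the template name is illustrative, and a companion Constraint resource targeting Service objects is still required to activate it:

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sapprovedloadbalancer   # illustrative name
spec:
  crd:
    spec:
      names:
        kind: K8sApprovedLoadBalancer
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sapprovedloadbalancer

        # Violation fires for any LoadBalancer Service that is not
        # explicitly approved via the exposure label.
        violation[{"msg": msg}] {
          input.review.object.spec.type == "LoadBalancer"
          not is_approved
          msg := "LoadBalancer Services require label exposure=approved"
        }

        # Undefined labels fail this rule, so unlabeled Services are denied.
        is_approved {
          input.review.object.metadata.labels.exposure == "approved"
        }
```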
IAM guardrails: least-privilege IRSA and Workload Identity that can’t sprawl
EKS: IRSA scoping and verification
For EKS, IRSA ties an IAM Role to a Kubernetes service account via an OIDC trust policy. The failure mode is almost always over-broad trust or over-broad permissions. Guardrails to encode in Terraform modules:
- Trust policy restricted to a specific OIDC provider, namespace, and serviceAccount.
- No wildcard system:serviceaccount:*:* subjects.
- Permission policies bounded to the minimum API set; avoid wildcard actions and resources unless the AWS API forces it.
Verification commands belong in both pre-flight checks and teardown validation.
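A hedged verification sketch; the role name is a placeholder:

```shell
ROLE_NAME="preview-app-irsa-role"   # placeholder

# 1. Inspect the trust policy: the Condition must pin the OIDC "sub"
#    claim to one namespace/serviceAccount, never a wildcard.
aws iam get-role --role-name "$ROLE_NAME" \
  --query 'Role.AssumeRolePolicyDocument' --output json

# 2. List managed and inline permission policies, then review each for
#    wildcard Action/Resource entries.
aws iam list-attached-role-policies --role-name "$ROLE_NAME"
aws iam list-role-policies --role-name "$ROLE_NAME"
```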
GKE: Workload Identity scoping and verification
In GKE, Workload Identity maps a Kubernetes service account (KSA) to a Google service account (GSA). Common preview mistakes:
- Binding roles/iam.workloadIdentityUser broadly to all KSAs in the cluster.
- Reusing a privileged GSA across ephemeral environments.
Verify the binding from the GSA side, not just from Kubernetes.
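A hedged sketch; the project, namespace, and account names are placeholders:

```shell
GSA="preview-app@my-project.iam.gserviceaccount.com"   # placeholder

# List who can impersonate the GSA: the only workloadIdentityUser member
# should be the exact KSA identity string.
gcloud iam service-accounts get-iam-policy "$GSA" \
  --format="table(bindings.role, bindings.members)"

# Expected member shape (namespace and KSA name are placeholders):
# serviceAccount:my-project.svc.id.goog[preview-apps/preview-app-ksa]
```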
Hard rule for previews: bind the GSA only to the exact KSA identity string for the namespace/serviceAccount you expect.
Deterministic teardown with post-destroy validation (AWS + GCP)
Teardown is a workflow, not a command
A single terraform destroy (or cluster delete) is not proof. Deterministic teardown includes:
- Deleting Kubernetes resources that create cloud artifacts (Ingress, LoadBalancer Services) before cluster deletion.
- Waiting for controllers/finalizers to complete.
- Querying cloud APIs to confirm there are no residual resources.
Pre-teardown checks should enumerate every object that owns a cloud artifact, then delete those objects before the cluster itself.
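A sketch of the pre-teardown sweep, assuming the preview cluster holds nothing worth keeping:

```shell
# List every object that owns a cloud artifact, across all namespaces.
kubectl get svc -A --field-selector spec.type=LoadBalancer
kubectl get ingress -A

# Delete them before the cluster itself, and wait for finalizers so the
# cloud controllers can release load balancers, target groups, and IPs.
kubectl delete ingress -A --all --wait=true
kubectl delete svc -A --field-selector spec.type=LoadBalancer --wait=true
```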
AWS validation: prove there are no leftover entry points
Run these after destroy (or in a CI job that fails the pipeline if anything remains).
Common failure modes and remediation:
- LB remains: find the originating Service/Ingress and confirm it was deleted before cluster deletion; check for finalizers on the object.
- EIP remains: identify what allocated it (ingress controller, manual reservation) and release it explicitly.
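The post-destroy AWS checks can be scripted as a CI gate; the environment tag and its key are placeholders, and in a shared account you would scope the load-balancer queries by tag as well:

```shell
set -euo pipefail
ENV_TAG="preview-1234"   # placeholder environment tag

# Any surviving ELBv2 load balancer, target group, or Elastic IP fails
# the pipeline.
LBS=$(aws elbv2 describe-load-balancers \
  --query 'LoadBalancers[].LoadBalancerArn' --output text)
TGS=$(aws elbv2 describe-target-groups \
  --query 'TargetGroups[].TargetGroupArn' --output text)
EIPS=$(aws ec2 describe-addresses \
  --filters "Name=tag:environment,Values=$ENV_TAG" \
  --query 'Addresses[].AllocationId' --output text)

for leftover in "$LBS" "$TGS" "$EIPS"; do
  if [ -n "$leftover" ]; then
    echo "Residual cloud resource(s): $leftover" >&2
    exit 1
  fi
done
echo "AWS teardown verified: no residual LBs, target groups, or EIPs."
```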
GCP validation: confirm no forwarding rules / reserved addresses
For GKE environments, verify L4/L7 artifacts explicitly.
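A hedged sketch of the GCP sweep; the project ID and the preview- GSA naming convention are placeholders:

```shell
PROJECT="my-preview-project"   # placeholder

# L4/L7 entry points and reserved IPs must all come back empty.
gcloud compute forwarding-rules list --project "$PROJECT"
gcloud compute backend-services list --project "$PROJECT"
gcloud compute addresses list --project "$PROJECT" \
  --filter="status=RESERVED"

# Workload Identity: the per-environment GSA should be gone entirely.
gcloud iam service-accounts list --project "$PROJECT" \
  --filter="email~^preview-"
```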
Checklist
- [ ] Enforce namespace-level default-deny egress NetworkPolicy for all preview namespaces.
- [ ] Allow DNS only to CoreDNS (or node-local DNS) and avoid direct public resolver access.
- [ ] Constrain CoreDNS forwarding to approved resolvers; avoid inheriting the node’s /etc/resolv.conf without review.
- [ ] Enforce admission policies (Gatekeeper/Kyverno) to deny hostNetwork, privileged pods, and risky host mounts.
- [ ] Deny Service.type=LoadBalancer unless explicitly labeled/approved (and ideally limited to one namespace).
- [ ] Scope IRSA trust policies to exact namespace + serviceAccount; deny wildcard subjects.
- [ ] Scope Workload Identity bindings to exact KSA member strings; do not reuse privileged GSAs.
- [ ] Add teardown ordering: delete Ingress and LoadBalancer Services before cluster deletion.
- [ ] Post-destroy validate AWS: AWS CLI checks for ELBv2, target groups, and EIPs return empty.
- [ ] Post-destroy validate GCP: gcloud checks for forwarding rules, backend services, and reserved addresses return empty.
FAQ
How do we stop DNS exfiltration without breaking developer workflows?
Start with default-deny egress and allow DNS only to CoreDNS; then constrain CoreDNS forwarding to approved resolvers. If workloads still need controlled internet access, route it through a deliberate egress path (proxy/egress gateway) and keep DNS centralized so you can enforce policy consistently.
What’s the minimum viable guardrail for IRSA/Workload Identity in previews?
Make identity bindings non-reusable and tightly scoped: one role/GSA per preview environment (or per app), bound only to the expected namespace and serviceAccount. Then prove it in pipelines with AWS CLI/gcloud commands that inspect both permission policies and trust/binding policies.
Why isn’t deleting the Kubernetes cluster enough to guarantee teardown?
Kubernetes controllers create and manage cloud resources asynchronously; deletion can race with finalizers, controller failures, or ordering issues. The authoritative record for load balancers, forwarding rules, static IPs, and IAM bindings is the cloud control plane, so teardown is only complete when cloud API queries confirm there are zero residual resources.
Article written by Yassine Hadji
Cybersecurity Expert at Skynet Consulting
Citation
© 2026 Skynet Consulting. Please cite the source if you reuse excerpts.