
Kubernetes Audit-Ready Runtime Forensics in Under 24 Hours

Map pod → node → cloud principal → API action with eBPF and CloudTrail, then export a deterministic, audit-ready evidence bundle in hours rather than days.

#SME #Security #kubernetes #ebpf #cloudtrail #forensics #security-audit #checklist

Introduction

Kubernetes incident forensics breaks down when you can’t prove which workload (pod/service account) triggered specific cloud control-plane API calls. The root cause is identity and network indirection: kube RBAC, IRSA/Workload Identity, node roles, NAT egress, and shared credentials blur attribution. Skynet’s approach is standardized execution: deploy an ephemeral, repeatable forensic stack, pull only a strict time window of telemetry, and deterministically map pod → node → cloud principal → API action. The outcome is an audit-ready timeline bundle with validation checks, produced in hours instead of days.

Quick Take

  • Pod-to-cloud attribution fails without pod-level runtime provenance and control-plane logs in the same time window.
  • eBPF telemetry should be enriched with pod UID, serviceAccount, and image digest to withstand redeploy churn.
  • Cloud control-plane logs (AWS CloudTrail, Azure Activity Log) must be queried for STS/role sessions and API calls, not just the final action.
  • Deterministic correlation hinges on stable join keys: timestamps, node identity/ENI, source IP, and role session (principalId/sessionName).
  • Deliverables should be a signed timeline plus “gaps checks” (missing audit logs, retention limits, clock drift) to avoid false confidence.

Standardized execution: the minimum forensic stack that actually correlates

What you must capture (and why)

To prove pod → cloud API action, you need three planes of evidence captured for the same bounded interval:
  • Runtime provenance (process + network) per pod: “what executed” and “what endpoints were contacted,” with Kubernetes identity attached.
  • Kubernetes control-plane evidence: API server audit events for authn/authz context, object changes, and serviceAccount usage.
  • Cloud control-plane evidence: API activity with the calling principal, session, source IP, and relevant resources.

⚠️
If you only have CloudTrail (or only have eBPF), you can often suspect a workload but you cannot prove it. You need the runtime plane to tie network/process behavior to pod identity, and the cloud plane to tie the same activity to a cloud principal/session.

Deploy an ephemeral eBPF sensor and collectors

Skynet’s runbook-style execution uses a short-lived forensic deployment that can be removed cleanly after evidence export. In practice, this is typically a DaemonSet for eBPF sensors plus minimal log collection.

Example: deploy Cilium Tetragon as an eBPF sensor (cluster-specific manifests omitted intentionally; pin the version you already approve and keep it reproducible):

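A minimal Helm-based sketch of such a deployment (the namespace, release name, and version variable are assumptions; substitute the chart version your organization has approved):

```shell
# Install the Tetragon DaemonSet from the official Cilium chart.
# TETRAGON_VERSION is a placeholder -- pin the release you already approve.
helm repo add cilium https://helm.cilium.io
helm repo update
helm install tetragon cilium/tetragon \
  --namespace kube-system \
  --version "${TETRAGON_VERSION:?pin an approved chart version}"

# Wait for sensors to be ready on every node, then confirm the JSON
# event export stream is flowing before the incident window opens.
kubectl -n kube-system rollout status daemonset/tetragon
kubectl -n kube-system logs ds/tetragon -c export-stdout --tail=5
```

Pinning the version keeps the deployment reproducible, which matters when a reviewer later replays your collection steps.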

If you prefer policy-focused runtime visibility, Falco with eBPF can capture exec/connect signals as well. The key requirement is enrichment: pod UID, namespace, serviceAccount, and container image digest.

💡
Always persist pod UID and image digest (SHA256) into your event stream. Names and labels change during incident response; UID and digest survive redeploys and prevent “wrong pod” attribution.

Time-bounding and evidence minimization

Standardized execution should enforce a tight scope:
  • Start time: earliest suspected malicious activity.
  • End time: containment action completed (or a fixed “+2h” to catch stragglers).
  • Collection: only the fields required for correlation and audit (avoid dumping entire clusters).

This isn’t about being conservative; it’s about producing an evidence bundle that is reviewable and defensible.

Capture pod-level provenance with eBPF (exec + connect) and enrich identity

Required fields for correlation

From the runtime side, you need enough to uniquely identify the workload and the action:
  • Event timestamp (high resolution if available)
  • Pod UID, namespace, name
  • serviceAccount name
  • Node name
  • Container image digest
  • Process path/args (for exec)
  • Destination IP/port and protocol (for connect)

Example: filter eBPF logs down to a suspect namespace and emit JSON lines for later joins:

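One way to do this with Tetragon's stdout exporter and `jq` (the `payments` namespace and the start time are placeholders; the field paths follow Tetragon's JSON export schema, so verify them against your pinned version):

```shell
# Pull the Tetragon export stream for the incident window and keep only
# events attributed to the suspect namespace.
START="2024-05-01T10:00:00Z"   # placeholder: earliest suspected activity
kubectl -n kube-system logs ds/tetragon -c export-stdout --since-time="${START}" \
  | jq -c 'select(
      .process_exec.process.pod.namespace   == "payments" or
      .process_kprobe.process.pod.namespace == "payments")' \
  > runtime.payments.jsonl

# Spot-check that pod identity survived enrichment.
jq -r '.process_exec.process.pod | select(. != null) | .name' \
  runtime.payments.jsonl | sort -u
```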

⚠️
If your cluster nodes are behind NAT or share egress, “source IP” alone won’t map to a pod. You still need node identity (and often ENI/instance identity) to bridge to cloud logs.

Tie runtime events to node and infrastructure identity

For AWS, you’ll often need to map node → instanceId → ENI(s) → private IP(s). Capture these at collection time so correlation does not depend on later reconstruction.

Example: export a node inventory snapshot (instance ID and provider IDs) from Kubernetes:

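A sketch using `kubectl` and `jq` (the `inventory/` layout is an assumed convention; on AWS, the `providerID` field embeds the EC2 instance ID):

```shell
# Snapshot node identity: name, providerID (aws:///<az>/<instance-id> on
# AWS), and the internal IP used later for egress correlation.
mkdir -p inventory
kubectl get nodes -o json | jq -c '.items[] | {
    name:       .metadata.name,
    providerID: .spec.providerID,
    internalIP: (.status.addresses[] | select(.type == "InternalIP") | .address)
  }' > inventory/nodes.jsonl
```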

If you run EKS with IRSA, also snapshot serviceAccount annotations (role ARN) for the suspect namespaces:

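A sketch for freezing the serviceAccount → IAM role bindings (the `payments` namespace is a placeholder; the annotation key is the standard IRSA one):

```shell
# Record which serviceAccounts can assume which IAM roles, before anyone
# edits or deletes them during remediation.
kubectl get serviceaccounts -n payments -o json | jq -c '.items[] | {
    namespace: .metadata.namespace,
    name:      .metadata.name,
    roleArn:   (.metadata.annotations["eks.amazonaws.com/role-arn"] // null)
  }' > inventory/irsa.payments.jsonl
```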

At this point you can answer “what executed and what it connected to” with pod UID + image digest, and you have a frozen mapping of node identity and (if applicable) IRSA role bindings.

Correlate to cloud control-plane activity (CloudTrail) and attribute principals

Query the right CloudTrail events for attribution

For AWS, attribution typically hinges on:
  • STS session establishment (AssumeRole, AssumeRoleWithWebIdentity, GetCallerIdentity)
  • The target API actions of interest (e.g., PutObject, GetSecretValue, CreateAccessKey, ModifySecurityGroupRules)

Example: pull all role assumption events for the window:

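A sketch using the CloudTrail `lookup-events` API, which accepts a single lookup attribute per call, so iterate over the event names of interest (window boundaries are placeholders):

```shell
# Collect session-establishment events for the bounded incident window.
START="2024-05-01T10:00:00Z"; END="2024-05-01T14:00:00Z"   # placeholders
for name in AssumeRole AssumeRoleWithWebIdentity GetCallerIdentity; do
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=EventName,AttributeValue=${name}" \
    --start-time "${START}" --end-time "${END}" \
    --output json > "cloudtrail.sts.${name}.json"
done
```

Note that `lookup-events` only covers management events within CloudTrail's 90-day lookup window; for older activity, query your trail's S3 archive instead and record that as a collection detail.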

Then pull actions for the suspected principal/role session(s). The most reliable join keys vary by organization, but common pivots include username (assumed-role ARN), principalId, sourceIPAddress, and eventTime.

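A sketch pivoting on the role session name (the session name and output filename convention are assumptions; for assumed-role activity, CloudTrail's `Username` lookup attribute generally matches the session name, but verify against the STS events you just collected):

```shell
# Pull actions performed under a specific assumed-role session and unwrap
# the raw CloudTrail records for later joining.
SESSION_NAME="payments-app-7f6d9"   # hypothetical session name
aws cloudtrail lookup-events \
  --lookup-attributes "AttributeKey=Username,AttributeValue=${SESSION_NAME}" \
  --start-time "${START}" --end-time "${END}" --output json \
  | jq '[.Events[] | (.CloudTrailEvent | fromjson)]' \
  > cloudtrail.actions.by-session.json
```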

💡
Don’t skip session establishment events. If you only query for the “final” API calls, you may miss the pivot that proves how the workload obtained credentials (web identity token vs node role vs static keys).

Map CloudTrail source to Kubernetes nodes (and then pods)

You generally bridge cloud → cluster via one (or more) of:
  • sourceIPAddress (when it’s a node IP / egress IP you can map)
  • VPC flow context (if available in your environment)
  • ENI identifiers (from infrastructure inventory)
  • role session naming conventions (if you set them deterministically)

This is why the node inventory snapshot matters. In many incidents, multiple pods share a node and share egress; you need runtime connect events to isolate which pod initiated outbound traffic at the same timestamps.
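If you control workload manifests ahead of an incident, deterministic session naming makes this bridge far easier. AWS SDKs honor the `AWS_ROLE_SESSION_NAME` environment variable when assuming a role via web identity; a pod-spec sketch (the naming scheme and `payments` prefix are assumptions):

```yaml
# Pod spec fragment: encode the pod name into the STS session name so
# CloudTrail's assumed-role ARN points straight back at the workload.
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: AWS_ROLE_SESSION_NAME
    value: "payments-$(POD_NAME)"   # prefix is a placeholder convention
```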

⚠️
Clock drift kills correlation. If node time is off by minutes, timestamp joins become guesswork. Capture NTP/chrony status (or at least node time) as part of the evidence bundle.
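One lightweight way to make drift measurable is to record each node's clock next to the collector's at export time. A sketch using ephemeral node debug pods (assumes `kubectl debug node` is permitted in your cluster and that busybox is an acceptable debug image):

```shell
# Record the collector's clock, then each node's clock in the node's own
# context, so the bundle documents any skew at collection time.
mkdir -p inventory
date -u +"%Y-%m-%dT%H:%M:%SZ" > inventory/clock.collector.txt
for node in $(kubectl get nodes -o name | cut -d/ -f2); do
  kubectl debug "node/${node}" -q -i --image=busybox -- \
    chroot /host date -u +"%Y-%m-%dT%H:%M:%SZ" \
    > "inventory/clock.${node}.txt"
done
```

Where nodes run chrony or ntpd, capturing `chronyc tracking` output the same way gives a quantified offset instead of a point-in-time comparison.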

Build a deterministic timeline bundle (and prove your gaps)

Produce the evidence bundle: timeline + joins + validation

A defensible artifact set includes:
  • timeline.jsonl (normalized events)
  • joins.jsonl (explicit mapping podUID → principal/session → API action)
  • inventory/ (nodes, serviceAccounts/roles, container digests)
  • validation.json (gaps checks and collection metadata)

Below is a minimal join script that:

  • Reads Kubernetes audit log JSON lines (kube-audit.jsonl)
  • Reads CloudTrail lookup output (cloudtrail.actions.by-session.json)
  • Emits a joined, time-ordered timeline

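A minimal sketch of that normalization step (file names follow the conventions above; the `ts`/`source`/`actor`/`action` output shape is an assumed normalization schema, while the input field names match the standard kube-audit and CloudTrail record formats):

```python
#!/usr/bin/env python3
"""Normalize kube-audit and CloudTrail evidence into timeline.jsonl."""
import json
import sys
from datetime import datetime, timezone


def parse_ts(value: str) -> str:
    """Normalize an RFC 3339 timestamp to UTC ISO-8601 for stable sorting."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt.astimezone(timezone.utc).isoformat()


def load_kube_audit(path: str) -> list:
    """Read kube-apiserver audit events from a JSON-lines file."""
    events = []
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue
            ev = json.loads(line)
            events.append({
                "ts": parse_ts(ev["requestReceivedTimestamp"]),
                "source": "kube-audit",
                "actor": ev.get("user", {}).get("username"),
                "action": f"{ev.get('verb')} {ev.get('objectRef', {}).get('resource')}",
                "raw": ev,
            })
    return events


def load_cloudtrail(path: str) -> list:
    """Read CloudTrail records from a JSON array (see the lookup step)."""
    with open(path) as fh:
        records = json.load(fh)
    return [{
        "ts": parse_ts(ev["eventTime"]),
        "source": "cloudtrail",
        "actor": ev.get("userIdentity", {}).get("arn"),
        "action": f"{ev.get('eventSource')}:{ev.get('eventName')}",
        "raw": ev,
    } for ev in records]


def main(audit_path: str, trail_path: str, out_path: str) -> None:
    merged = load_kube_audit(audit_path) + load_cloudtrail(trail_path)
    merged.sort(key=lambda e: e["ts"])  # deterministic, time-ordered output
    with open(out_path, "w") as out:
        for ev in merged:
            out.write(json.dumps(ev, sort_keys=True) + "\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv[1], sys.argv[2], sys.argv[3])
```

Run it as `python3 join.py kube-audit.jsonl cloudtrail.actions.by-session.json timeline.jsonl`.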

This script is intentionally conservative: it normalizes and orders evidence. Your deterministic correlation step should then add explicit join logic based on your environment (node IP ↔ sourceIP, serviceAccount ↔ role ARN, and session ↔ API calls).

Gaps checks that prevent false conclusions

In addition to the timeline, produce explicit validation artifacts:
  • Audit log continuity: are there missing ranges or dropped events?
  • Log retention: is the requested window fully retained (cluster + cloud)?
  • Clock drift: are node and control-plane timestamps aligned?
  • Identity binding completeness: do you have IRSA/Workload Identity bindings for all suspect serviceAccounts?

A reviewer can replay your joins and understand exactly how you mapped pod UID + image digest to a cloud principal/session and then to a specific API action, with known gaps documented.

Operationalizing in Skynet: runbook-driven correlation in hours

The Pod → Principal Correlation Runbook

Skynet’s standardized execution focuses on repeatability and precision:
  • Deploy ephemeral eBPF sensors and bounded collectors
  • Snapshot node and identity inventories
  • Pull CloudTrail events for a fixed window (including STS establishment)
  • Run deterministic correlation jobs and emit a fixed evidence bundle
  • Execute validation checks and sign the output

The key is that every step is scripted, time-bounded, and produces machine-verifiable artifacts.

What “done” looks like

Your final deliverable should answer, without hand-waving:
  • Which pod UID (and image digest) initiated the activity
  • Which serviceAccount and binding path applied (IRSA/Workload Identity/node role/static keys)
  • Which cloud principal/session performed which API actions
  • When it happened, with a contiguous timeline and declared gaps

💡
Store the evidence bundle as immutable objects (e.g., Amazon S3 with object lock where available) and include the exact collection commands, tool versions, and hashes of exported files.
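A sketch of that packaging step (the `evidence/` layout, bucket name, and case identifier are placeholders; Object Lock retention is assumed to be configured at the bucket level):

```shell
# Hash every artifact in the bundle and record tool versions so the
# collection is replayable by a reviewer.
find evidence/ -type f -print0 | xargs -0 sha256sum > evidence.sha256
{ kubectl version --client; aws --version; jq --version; } \
  > evidence.tool-versions.txt 2>&1

# Upload to a bucket with S3 Object Lock enabled so the objects are
# immutable for the review period.
aws s3 cp evidence/ "s3://example-forensics-bucket/case-2024-001/" --recursive
aws s3 cp evidence.sha256 "s3://example-forensics-bucket/case-2024-001/evidence.sha256"
```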

Checklist

  • [ ] Define the incident time window (start/end) and document the rationale
  • [ ] Deploy an ephemeral eBPF sensor DaemonSet (e.g., Cilium Tetragon or Falco) pinned to an approved version
  • [ ] Verify exec/connect events are captured and enriched with pod UID + serviceAccount + image digest
  • [ ] Export a node inventory snapshot (node name, providerID, internal IP)
  • [ ] Export serviceAccount identity bindings (e.g., IRSA role ARN annotations) for suspect namespaces
  • [ ] Pull AWS CloudTrail events for STS establishment (AssumeRole/AssumeRoleWithWebIdentity) for the window
  • [ ] Pull AWS CloudTrail events for target API actions for the same window and relevant principals
  • [ ] Normalize and time-order evidence into a single timeline file
  • [ ] Perform deterministic joins (pod → node → principal/session → API action) and emit a join artifact
  • [ ] Run gaps checks (audit continuity, retention coverage, clock drift) and record results
  • [ ] Package and hash/sign the evidence bundle for handoff and review

FAQ

What if multiple pods share the same node and egress IP?

Use runtime connect events from the eBPF sensor to attribute outbound connections to a specific pod UID at specific timestamps, then bridge to cloud logs via the node identity/egress context. If you rely on egress IP alone, you can narrow to a node but not prove the initiating workload.

Can I do this without Kubernetes audit logs?

You can still build strong attribution using eBPF runtime provenance plus CloudTrail, but you lose key control-plane context (who created/changed objects, token usage patterns, RBAC decisions). Treat missing audit logs as an explicit gap and document the impact in the validation artifact.

How do I handle time skew between nodes and cloud logs?

Capture node time/clock sync status during collection, then apply a measured offset if required. If you can’t quantify skew, avoid tight timestamp joins and instead join on session identifiers (principalId/sessionName) plus broader time windows, while documenting the reduced certainty.


Article written by Yassine Hadji

Cybersecurity Expert at Skynet Consulting

Citation

© 2026 Skynet Consulting. Please cite the source if you reuse excerpts.

Kubernetes Audit-Ready Runtime Forensics in Under 24 Hours — Skynet Consulting
