Inside a crypto-mining attack: how we detected, contained, and remediated it on a production GKE cluster
An Auckland tourism company got an abuse notification from their cloud provider on a Friday afternoon. By dinner time, the attack was contained — without taking the customer-facing site down. Here is how it played out.
The situation
An Auckland tourism company came to us mid-afternoon on a Friday. Their cloud provider had just sent them an abuse notification — outbound traffic from one of their production VMs was hitting a destination on a known crypto-mining pool blocklist.
Their public-facing product is a Next.js application running on GCP, originally built years ago by an offshore development team. They had no internal engineering team, no current relationship with the original builders, and no security partner. The abuse email was their only signal that something was wrong.
They had two real constraints: the site is live and revenue-generating, so taking it offline was not on the table; and they needed someone to get from "we got an alert" to "we know what happened and it is contained" before the weekend.
What we found in the first 30 minutes
The abuse notice gave us a VM ID and a suspicious destination IP. We pulled the matching audit and network logs and got a quick picture:
- A single VM in the production GCP project was making sustained outbound traffic to an IP on a public crypto-mining pool blocklist
- The same VM was hitting an internal API endpoint repeatedly, getting 500 and 404 responses — likely the attacker probing the application surface
- When we went to inspect the VM, it was no longer in the instance list — between the abuse notice and our investigation, the offending node had been removed from the standard view
A "ghost" VM that is no longer listed but still shows up in the logs is almost always a clue that instances are being autoscaled, killed, and respawned. In this case, it pointed to a Kubernetes node managed by GKE.
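One way to test that theory is to ask the audit log who deleted the instance. The sketch below only builds and prints a query for review; it does not run anything. The helper names and the `<node-name>` placeholder are ours, and it assumes the delete was recorded in the Admin Activity audit log with a method name containing `compute.instances.delete`.

```shell
#!/bin/sh
# Build a Cloud Logging filter that matches delete operations for one instance.
# Assumption: the audit entry carries the instance name at the end of
# protoPayload.resourceName (".../instances/<name>").
deletion_filter() {
  instance_name=$1
  printf 'resource.type="gce_instance" AND protoPayload.methodName:"compute.instances.delete" AND protoPayload.resourceName:"instances/%s"' "$instance_name"
}

# Print the full command instead of executing it, so it can be checked first.
print_deletion_query() {
  project=$1
  instance_name=$2
  printf "gcloud logging read '%s' --project=%s --freshness=2d --format='value(timestamp,protoPayload.authenticationInfo.principalEmail)'\n" \
    "$(deletion_filter "$instance_name")" "$project"
}

print_deletion_query "<your-project>" "<node-name>"
```

If the principal that comes back is a Google-managed service account rather than a person, that is consistent with the node being recycled by GKE and not removed by hand.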
Tracing the ghost: serial console + cgroup forensics
Even when an instance is gone from the GCE list, its serial-console output usually lingers in Cloud Logging for a few days, provided serial port logging was enabled on the node. We pulled it directly, and the serial output told us the node was a GKE worker on which the kernel's OOM killer had fired: the Linux kernel had killed processes inside a single cgroup because they exhausted the container's memory limit. Two of the killed processes had short, random-looking names; a third was the legitimate Node.js process from the application. The two random binaries were the cryptominers, and they had just blown through the container's memory budget when the kernel terminated them.
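To make the cgroup step concrete, here is a minimal sketch of pulling the identifiers out of a kernel OOM-kill line. The sample line is made up, the helper names are ours, and the patterns assume the cgroupfs-style path (`/kubepods/<qos>/pod<uid>/<container-id>`); nodes using the systemd cgroup driver lay the path out differently and would need other patterns. The log name in `serial_filter` is also an assumption to verify against an entry from your own project.

```shell
#!/bin/sh
# Filter for serial-console entries from one instance that mention an OOM kill.
# Assumption: serial output is logged under a name containing
# "serialconsole.googleapis.com".
serial_filter() {
  instance_id=$1
  printf 'resource.type="gce_instance" AND resource.labels.instance_id="%s" AND logName:"serialconsole.googleapis.com" AND "oom-kill"' "$instance_id"
}

# Each helper reads kernel log lines on stdin and prints the matching field.
extract_pod_uid() {
  sed -n 's|.*task_memcg=[^,]*/pod\([0-9a-f-]*\)/.*|\1|p'
}

extract_container_id() {
  sed -n 's|.*task_memcg=[^,]*/pod[0-9a-f-]*/\([0-9a-f]*\),.*|\1|p'
}

extract_task() {
  sed -n 's|.*,task=\([^,]*\),pid=.*|\1|p'
}

# Made-up sample line in the format recent kernels use for memcg OOM kills.
sample='[12345.678901] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=0123abcd,mems_allowed=0,oom_memcg=/kubepods/burstable/pod11111111-2222-3333-4444-555555555555,task_memcg=/kubepods/burstable/pod11111111-2222-3333-4444-555555555555/0123abcd,task=node,pid=4242,uid=1000'

printf 'pod uid:      %s\n' "$(printf '%s\n' "$sample" | extract_pod_uid)"
printf 'container id: %s\n' "$(printf '%s\n' "$sample" | extract_container_id)"
printf 'killed task:  %s\n' "$(printf '%s\n' "$sample" | extract_task)"
```

The pod UID and container ID are the two values worth carrying forward, because they also appear in the container runtime and kubelet logs for the same workload.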
The OOM-kill log also gave us the cgroup path and the Pod UID of the affected container. From there it was a straightforward reverse lookup through Cloud Logging:
gcloud logging read '"<container-id>" OR "<pod-uid>"' --project=<your-project> --freshness=2d --format="json"
That returned the namespace and pod name of the compromised workload, plus the cluster it was running in. From abuse notice to identified pod took about 90 minutes of focused work.
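The raw JSON from that lookup can run to thousands of entries. A hedged sketch of boiling it down to a short list of workloads: ask gcloud for just the cluster, namespace, and pod labels, then count the distinct combinations. The helper name and the demo rows are ours, and the label paths assume the matching entries use the standard Kubernetes container resource labels.

```shell
#!/bin/sh
# Collapse tab-separated "cluster, namespace, pod" rows into unique workloads,
# most frequent first. Rows with no Kubernetes labels arrive as whitespace
# only and are dropped by the awk filter.
summarise_workloads() {
  awk 'NF' | sort | uniq -c | sort -rn
}

# Against real logs, the rows would come from something like:
#   gcloud logging read '"<container-id>" OR "<pod-uid>"' \
#     --project=<your-project> --freshness=2d \
#     --format='value(resource.labels.cluster_name,resource.labels.namespace_name,resource.labels.pod_name)' \
#     | summarise_workloads

# Demo with made-up rows so the sketch runs without cloud access.
printf 'prod-cluster\tweb\tfrontend-abc\nprod-cluster\tweb\tfrontend-abc\n\t\t\nprod-cluster\tkube-system\tlogging-xyz\n' \
  | summarise_workloads
```

The workload with the most hits is usually the compromised one; system pods that merely mention the container ID tend to show up further down the list.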
Containment without taking the site down
The cleanest containment was to scale the affected deployment to zero — but that would have taken the customer-facing site offline, which the business could not accept on a Friday afternoon.
We took two interim steps instead:
- Looped the original development contacts in immediately to start verifying which of the running processes were legitimate. We had reason to believe two were injected, but the dev team is the only one who can confirm what their app is meant to be running.
- Blocked the egress destination at the network layer so the mining processes could not phone home, even if they restarted. A single deny-egress firewall rule on the project's network covered TCP and UDP to the offending IP range, with logging enabled so we could see any further attempts.
gcloud compute firewall-rules create deny-egress-mining-<short-id> --project=<your-project> --network=default --direction=EGRESS --action=DENY --rules=tcp,udp --destination-ranges=<offending-ip>/32 --priority=100 --enable-logging
This is an interim measure. A determined attacker can rotate to a different mining pool IP — and they often do — so the rule buys time, not safety. The permanent fix is to patch the root cause: the application vulnerability that let the attacker run code inside the pod in the first place. That work landed early the following week, with the original dev team back in the loop.
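Because pool IPs rotate, it helps to keep the blocked addresses in a small file and regenerate the rule's destination list from it. This is a sketch under assumptions: the helper names are ours, the addresses are documentation-range examples, and the update command is printed for review instead of being run. As far as we know, the list passed to an update replaces the rule's existing ranges, so the file should keep the original IP as well.

```shell
#!/bin/sh
# Turn a list of IPv4 addresses (one per line; blank lines and '#' comments
# are skipped) into a comma-separated list of /32 ranges.
to_cidr_list() {
  awk 'NF && $1 !~ /^#/ { printf "%s%s/32", sep, $1; sep = "," } END { if (sep != "") print "" }'
}

# Print the update command for an existing deny rule without executing it.
print_update_command() {
  rule=$1
  project=$2
  ranges=$3
  printf 'gcloud compute firewall-rules update %s --project=%s --destination-ranges=%s\n' \
    "$rule" "$project" "$ranges"
}

# Example addresses from the reserved documentation ranges, not real pool IPs.
ranges=$(printf '%s\n' '# mining pool IPs seen so far' '203.0.113.10' '' '198.51.100.7' | to_cidr_list)
print_update_command 'deny-egress-mining-<short-id>' '<your-project>' "$ranges"
```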
What was actually wrong
We will not publish the application-side root cause here. Without naming anything specific, the pattern will be familiar to anyone who has run Next.js or similar Node-based front-ends in production for any length of time: an exposed endpoint trusted input it should not have trusted, and an attacker turned that into code execution inside the container. The legitimate workload kept serving traffic; the injected binaries quietly used the leftover CPU and memory to mine cryptocurrency until they tripped the container's memory cap and were OOM-killed. The outbound traffic to the mining pool is what triggered the cloud provider's abuse system, and the OOM kill is what left the trail we followed in the serial console.
Why this is the new normal
A few things to take away from this incident, especially for NZ small and medium businesses running web applications:
- Crypto-mining is now the most common "what" of a server-side compromise. It is the path of least resistance for an attacker who has gained code execution — quiet, immediately profitable, and harder to spot than data exfiltration. If your monthly cloud bill is creeping up for "no reason", it is worth a closer look.
- Cloud-provider abuse alerts are the floor, not the ceiling. By the time GCP, AWS, or Azure notices your workload is misbehaving, the attacker has usually been in for hours or days. You want your own detection sitting in front of the provider's.
- AI has changed the threat side too. Attackers now use AI-assisted code generation to fingerprint, probe, and weaponise vulnerabilities at a speed that did not exist three years ago. Vulnerability discovery is more democratised; exploit code that used to take a focused human a week now takes a script kiddie an afternoon. The economic pressure on crypto-mining attacks specifically, where the attacker gets paid the instant they get code execution, means every internet-exposed app is being scanned, constantly.
- "We have a dev team in another country" is not a security strategy. When the abuse alert arrives at 3pm on a Friday, you need someone who can read GCP audit logs, kubectl into a cluster, write a firewall rule, and explain it back to you in plain English — within the same afternoon, not next sprint.
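As a rough illustration of what "your own detection" can mean at its simplest, the sketch below checks a list of outbound destination IPs against a local copy of a mining-pool blocklist. Everything here is an assumption for illustration: the helper name, the blocklist file, and the idea that destinations have been exported one per line, for example from VPC Flow Logs if they are enabled on the subnet.

```shell
#!/bin/sh
# Print each destination IP that appears in the blocklist, with a hit count.
#   $1    : file of blocklisted IPs, one per line (hypothetical local copy)
#   stdin : observed destination IPs, one per line
match_blocklist() {
  blocklist=$1
  grep -F -x -f "$blocklist" | sort | uniq -c | sort -rn
}

# Demo with documentation-range addresses so the sketch runs anywhere.
blocklist_file=$(mktemp)
printf '203.0.113.10\n203.0.113.99\n' > "$blocklist_file"

printf '198.51.100.7\n203.0.113.10\n203.0.113.10\n192.0.2.44\n' \
  | match_blocklist "$blocklist_file"

rm -f "$blocklist_file"
```

Run on a schedule against fresh flow-log exports, even a check this small would flag mining traffic without waiting for the provider's abuse desk. A production version would alert on any hit and keep the blocklist current.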
How we help
Techfolks works with NZ small and medium businesses as the on-call security partner you do not have internally. For incidents like this one, that means:
- Same-day triage when a cloud-provider abuse alert lands
- Sanitised forensics so you understand exactly what happened, without the technical jargon
- Containment options that respect business reality — interim fixes that keep your site up while the root cause is being patched
- Long-term hardening: detection rules, egress controls, container security policies, and a security baseline that does not depend on any one person being available
If you are running a customer-facing application on GCP, AWS, or Azure and you do not have an internal engineering team to call when something goes wrong, this is exactly the gap we fill. Book a free 30-minute AI assessment — we will spend the first 15 minutes on security posture before we even get to the AI conversation.
