Random-subdomain and NXDOMAIN floods: DDoS aimed at DNS
Most DDoS coverage fixates on bandwidth: hundreds of gigabits of UDP reflection slamming a pipe until the upstream link gives up. DNS floods are quieter and, in some ways, nastier. They don't try to fill your link. They turn your own resolver into the bottleneck — every malicious query is a fully valid DNS request that the server is obligated to process, recurse, and answer. The packets are small, the rate is achievable from a modest botnet, and the damage compounds because the work happens deep inside the resolution path rather than at the edge.
The signature variant is the random-subdomain attack, also known by the evocative name "water torture." Instead of hammering one name, the attacker fabricates an endless stream of unique labels under a real victim zone — a8f3kq.example.com, zz19m4.example.com, q7x0p2.example.com — and fires them at recursive resolvers. Each name is new, so nothing is in cache, so every single query forces a full recursion. The resolver and the victim's authoritative servers both drown in work they can't avoid. This article walks through how the attack works, why caching is structurally useless against it, and which defences actually hold.
Why DNS makes such a good amplifier of work
DNS is built around an asymmetry the attacker exploits ruthlessly. A query is tiny — a few dozen bytes over UDP/53 (RFC 1035). The work behind answering it is not. A cache miss can trigger a chain of upstream lookups: root, then TLD, then the authoritative server for the zone, each round-trip adding latency, each response needing parsing and validation. The querier pays almost nothing; the resolver pays for the whole walk.
Under normal load this is fine, because caching amortises the cost. The first client to ask for www.example.com triggers the recursion; the next thousand clients get a cached answer in microseconds. The entire economics of a recursive resolver rest on the assumption that names repeat. The random-subdomain attack is engineered specifically to break that assumption.
The random-subdomain (water torture) attack, step by step
Picture a botnet of compromised devices — home routers, IoT cameras, infected desktops — each configured (often unknowingly) to use an open or ISP resolver. The attacker instructs them to flood queries like this:
r4n9k2.targetbank.example A ?
8mz0qx.targetbank.example A ?
p1c7vv.targetbank.example A ?
ll39ab.targetbank.example A ?
... thousands per second, every label unique ...
Here is what happens on each query, and why every actor in the chain suffers:
- The recursive resolver receives the query, checks its cache, finds nothing (the label has never been seen), and is forced to recurse. It allocates an outstanding-query slot, an in-flight state structure, a socket, and a timeout timer.
- The authoritative servers for targetbank.example receive the recursion. They look up the random label, find no such record, and return NXDOMAIN (RFC 1035 §4.1.1, RCODE 3: name does not exist). Generating and signing that negative answer is real CPU work, especially with DNSSEC.
- The resolver caches the negative answer (RFC 2308 negative caching), but the next query is a different random label, so that cache entry is never reused. The cache fills with millions of single-use negative records, evicting useful entries.
The victim is usually the authoritative side: the brand whose zone is being abused. But the resolvers caught in the middle — frequently a third party's, sometimes an entire ISP's — take collateral damage. Their outstanding-query tables saturate, legitimate recursions queue behind the flood, and resolution slows or fails for every customer, not just for the targeted zone.
Why the cache cannot save you
This deserves to be stated plainly because it's the whole point. A cache speeds up repeated questions. The attack asks a brand-new question every time. There is no repetition to exploit, so cache hit rate for the attack traffic is structurally zero. Worse, the negative-cache entries the attack generates are pure waste — they consume memory and evict genuinely hot entries, so the cache becomes a liability that degrades performance for legitimate traffic during the flood.
| Traffic type | Cache hit rate | Cost per query | Effect on cache |
|---|---|---|---|
| Normal repeated lookups | High (often 80%+) | Near zero on a hit | Healthy — hot names stay warm |
| Random-subdomain flood | ~0% (every label unique) | Full recursion every time | Pollutes cache with single-use negatives, evicts hot entries |
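The contrast in the table can be reproduced with a toy simulation. The sketch below is a minimal model, not a resolver: it assumes a small LRU cache, a made-up set of 50 "hot" names, and flood labels drawn at random, and it reports the hit rate separately for legitimate and flood queries. All names and parameters (`LRUCache`, `simulate`, `flood_fraction`) are hypothetical.

```python
import random
import string
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache standing in for a resolver's answer cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, name):
        """Return True on a hit; on a miss, store the answer and evict if full."""
        if name in self.entries:
            self.entries.move_to_end(name)
            return True
        self.entries[name] = True  # positive or negative answer, same cost here
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used name
        return False


def random_label(rng, length=6):
    alphabet = string.ascii_lowercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def simulate(flood_fraction, queries=20_000, capacity=100, seed=1):
    """Mix legitimate and flood queries; return the hit rate of each kind."""
    rng = random.Random(seed)
    hot_names = [f"site{i}.example.com" for i in range(50)]  # assumed workload
    cache = LRUCache(capacity)
    stats = {"legit": [0, 0], "flood": [0, 0]}  # kind -> [hits, total]
    for _ in range(queries):
        if rng.random() < flood_fraction:
            kind, name = "flood", random_label(rng) + ".example.com"
        else:
            kind, name = "legit", rng.choice(hot_names)
        stats[kind][0] += cache.lookup(name)
        stats[kind][1] += 1
    return {kind: (h / t if t else 0.0) for kind, (h, t) in stats.items()}
```

Calling `simulate(0.0)` and `simulate(0.9)` side by side shows both effects at once: the flood's own hit rate stays near zero, and the legitimate hit rate falls because single-use entries push the hot names out.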
Resource exhaustion: where it actually breaks
"The server gets slow" is too vague to defend against. The flood exhausts specific, finite resources, and knowing which ones tells you what to watch and what to limit.
- Outstanding-query slots. A recursive resolver tracks every in-flight recursion. This table is bounded. When it fills with random-subdomain recursions waiting on slow or overwhelmed authoritatives, new legitimate queries have nowhere to go and are dropped.
- Ephemeral ports and sockets. Each upstream query consumes a source port. Tens of thousands of concurrent recursions can exhaust the local port range and file-descriptor limits.
- CPU on the authoritative side. Negative answers aren't free. With DNSSEC, every NXDOMAIN must carry an NSEC/NSEC3 proof of non-existence (RFC 7129): NSEC3 means hashing the queried name, and servers that sign online must also generate signatures per response. That is markedly more expensive than returning a record that exists.
- Upstream link and round-trip budget. Every cache miss means a real packet to a real authoritative server. The flood multiplies upstream traffic and ties up the resolver waiting on responses that, by design, take a full RTT.
The cruelty of the design is that none of the individual queries is malformed or abusive in isolation. Each one is a textbook-valid request the server is protocol-bound to handle. The attack is entirely in the aggregate pattern, which is exactly why blunt packet filters miss it.
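To make the first of those resources concrete, here is a minimal sketch of a bounded outstanding-query table. It assumes a fixed slot count and a per-query timeout; the class and method names are hypothetical and not taken from any resolver. Once the slots are held by recursions that will only ever time out, a legitimate query is refused admission even though nothing about it is wrong.

```python
class OutstandingQueryTable:
    """Bounded table of in-flight recursions (illustrative only)."""

    def __init__(self, max_slots, timeout_s):
        self.max_slots = max_slots
        self.timeout_s = timeout_s
        self.in_flight = {}  # query id -> deadline (seconds)

    def expire(self, now):
        """Free the slots of recursions whose deadline has passed."""
        expired = [qid for qid, deadline in self.in_flight.items() if deadline <= now]
        for qid in expired:
            del self.in_flight[qid]

    def try_admit(self, qid, now):
        """Reserve a slot for a new recursion, or return False if the table is full."""
        self.expire(now)
        if len(self.in_flight) >= self.max_slots:
            return False  # this is where legitimate queries get dropped
        self.in_flight[qid] = now + self.timeout_s
        return True

    def complete(self, qid):
        """Release the slot when the upstream answer arrives."""
        self.in_flight.pop(qid, None)
```

The occupancy of a table like this, and the share of entries that leave by timeout rather than by `complete`, are the numbers worth graphing during an incident.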
NXDOMAIN floods and entropy as a detection signal
A pure NXDOMAIN flood is the same idea generalised: drive a high rate of queries that resolve to "does not exist." It might target random labels under a real zone, or random non-existent zones outright. The defining symptom is an abnormal ratio of NXDOMAIN responses, and an abnormal distribution of the names producing them.
That distribution is the detection handle. Legitimate traffic from a single client is repetitive and structured: a handful of real domains, asked again and again, with recognisable, pronounceable labels. Attack traffic is the opposite — a firehose of high-entropy, never-repeating gibberish. Shannon entropy (a measure of randomness in the character distribution of the queried labels) separates the two cleanly. A client suddenly emitting a stream of maximum-entropy labels, all resolving to NXDOMAIN, is not browsing the web.
UnveilDNS scores clients on exactly this combination — query rate, NXDOMAIN ratio, and label entropy — so that a device generating random-subdomain noise lights up well before it exhausts the resolver, while ordinary users asking for ordinary names never trip the threshold.
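As a generic illustration of the entropy signal (not a description of any product's implementation), the sketch below computes the Shannon entropy of the left-most label and a per-client repetition measure. The function names and the choice to look only at the first label are assumptions. Because a six-character label can never exceed about 2.6 bits, per-label entropy alone is a coarse signal; pairing it with how often names repeat makes it more useful.

```python
import math
from collections import Counter


def shannon_entropy(label):
    """Shannon entropy, in bits per character, of one label's characters."""
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def first_label(qname):
    """Left-most label of a query name, lower-cased."""
    return qname.rstrip(".").split(".")[0].lower()


def client_profile(qnames):
    """Summarise one client's recent queries: mean label entropy and repetition."""
    if not qnames:
        return {"mean_entropy": 0.0, "unique_ratio": 0.0}
    labels = [first_label(q) for q in qnames]
    mean_entropy = sum(shannon_entropy(label) for label in labels) / len(labels)
    unique_ratio = len(set(qnames)) / len(qnames)  # 1.0 means nothing ever repeats
    return {"mean_entropy": mean_entropy, "unique_ratio": unique_ratio}
```

A client whose profile shows a unique ratio near 1.0 together with high mean entropy, and whose answers are mostly NXDOMAIN, matches the water-torture pattern described above; any actual thresholds would need tuning against real traffic.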
Response Rate Limiting versus blunt rate limits
The naive defence is a flat cap: no client may exceed N queries per second, drop the rest. It half-works and half-hurts. Set it low and you punish legitimate heavy users — a busy NAT gateway behind a single IP, a recursive forwarder, an office of a hundred people sharing one address. Set it high enough to spare them and you let the flood through.
Response Rate Limiting (RRL) is more surgical. Rather than counting raw queries, it limits responses grouped by characteristics — notably the response type and the client (or client prefix). The key insight is that legitimate traffic and attack traffic differ in their response profile. A normal client gets a varied mix of positive answers. An attacking client gets a monotonous stream of identical-category responses — overwhelmingly NXDOMAIN, or overwhelmingly referrals for one zone. RRL clamps down on that narrow, repetitive category while leaving diverse legitimate traffic untouched.
Slip and truncation instead of silent drops
A subtlety that makes RRL safe to deploy: rather than dropping every rate-limited response, it can periodically "slip" a truncated response (the TC bit, RFC 1035 §4.1.1) that tells the client to retry over TCP. Genuine clients follow the TCP redirect and still get answered; spoofed-source flood traffic, which can't complete a TCP handshake, simply evaporates. This converts a damaging UDP flood into a self-limiting trickle without blackholing real users.
| Mechanism | What it counts | Collateral damage |
|---|---|---|
| Flat per-client rate cap | All queries, indiscriminately | High — penalises legitimate heavy users (NAT, forwarders) |
| Response Rate Limiting | Responses grouped by type + client prefix | Low — targets the repetitive attack signature, spares diverse traffic |
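The RRL behaviour described above, including slip, can be sketched as a token bucket keyed by client prefix and response category. This is a simplified model under stated assumptions: the category is a plain string such as "NXDOMAIN", the prefix sizes are fixed at /24 and /56, and the class and parameter names are hypothetical. Real RRL implementations use richer keys and tunables.

```python
import ipaddress


class ResponseRateLimiter:
    """Token bucket per (client prefix, response category), with slip."""

    def __init__(self, responses_per_second, burst, slip):
        self.rate = responses_per_second
        self.burst = burst
        self.slip = slip  # every slip-th limited response goes out truncated; 0 = always drop
        self.buckets = {}  # key -> (tokens, last_seen, limited_count)

    @staticmethod
    def _prefix(client_ip):
        ip = ipaddress.ip_address(client_ip)
        bits = 24 if ip.version == 4 else 56
        return ipaddress.ip_network(f"{ip}/{bits}", strict=False)

    def decide(self, client_ip, category, now):
        """Return 'answer', 'truncate' (TC bit set, retry over TCP) or 'drop'."""
        key = (self._prefix(client_ip), category)
        tokens, last, limited = self.buckets.get(key, (self.burst, now, 0))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill
        if tokens >= 1:
            self.buckets[key] = (tokens - 1, now, limited)
            return "answer"
        limited += 1
        self.buckets[key] = (tokens, now, limited)
        if self.slip and limited % self.slip == 0:
            return "truncate"
        return "drop"
```

Because the bucket is keyed by category, a client that has exhausted its NXDOMAIN budget still receives positive answers normally, which is the property that separates RRL from a flat per-client cap.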
Per-client and per-prefix limiting
Spoofed source addresses complicate everything. Attackers routinely forge the source IP on UDP queries, so per-IP accounting can be evaded by spreading the spoof across a wide address range. The countermeasure is to account by prefix rather than by individual address: group IPv4 sources into /24 blocks and IPv6 sources into appropriately sized prefixes (a /56 or /64, since a single subscriber may legitimately own enormous IPv6 space).
Prefix-based limiting raises the cost of evasion. To stay under a per-prefix budget, the attacker now needs forged addresses spread across many distinct network blocks, not just many addresses within one. It also matches the reality of how legitimate traffic clusters — a real customer network sits inside a known prefix, so limiting per prefix tracks genuine usage more faithfully than per-host counting.
- IPv4: aggregate to /24 — wide enough to catch spoofing within a network, narrow enough not to lump unrelated networks together.
- IPv6: aggregate to a /56 or /64, reflecting how providers delegate address space to subscribers.
- Windowed counting: evaluate rates over a short sliding window so a brief legitimate burst doesn't look like an attack, but a sustained flood does.
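The three points above combine naturally. The following sketch uses Python's standard `ipaddress` module to map a source address to its /24 or /56 and counts queries per prefix over a sliding window. The window length, the limit, and the names `aggregate` and `SlidingWindowCounter` are illustrative assumptions.

```python
import ipaddress
from collections import defaultdict, deque


def aggregate(address, v4_bits=24, v6_bits=56):
    """Map a source address to the prefix used for accounting."""
    ip = ipaddress.ip_address(address)
    bits = v4_bits if ip.version == 4 else v6_bits
    return ipaddress.ip_network(f"{ip}/{bits}", strict=False)


class SlidingWindowCounter:
    """Allow at most `limit` queries per prefix within the last `window_s` seconds."""

    def __init__(self, window_s, limit):
        self.window_s = window_s
        self.limit = limit
        self.events = defaultdict(deque)  # prefix -> timestamps of admitted queries

    def allow(self, address, now):
        timestamps = self.events[aggregate(address)]
        # Discard timestamps that have slid out of the window.
        while timestamps and timestamps[0] <= now - self.window_s:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

Spoofing many hosts inside 198.51.100.0/24 gains nothing here, since every one of them draws from the same budget, while a source in an unrelated prefix is unaffected.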
Auto-blacklisting with scoring and decay
Rate limiting shapes traffic; it doesn't punish persistent offenders. For sources that keep flooding, a scoring system is the durable answer. Each suspicious behaviour — high query rate, lopsided NXDOMAIN ratio, high label entropy, ANY/AXFR attempts — adds to a per-source score. When the score crosses a threshold, the source is blacklisted for a defined duration and its traffic is dropped outright.
The crucial refinement is decay. A static blacklist is a maintenance burden and a false-positive trap: a device that misbehaved once, perhaps because of a transient bug, shouldn't be exiled forever. Instead, scores decay over time. A source that stops misbehaving sees its score bleed back down minute by minute and is automatically released. A source that keeps offending sees its score climb faster than it decays, and stays blocked. The system is self-correcting in both directions.
# Conceptual scoring loop (illustrative, generic)
on suspicious_event(src_prefix, weight):
    score[src_prefix] += weight
    if score[src_prefix] >= THRESHOLD:
        blacklist(src_prefix, duration=BLOCK_TTL)

every minute:
    for prefix in score:
        score[prefix] = max(0, score[prefix] - DECAY)   # cool down quiet sources, never below zero
        if score[prefix] == 0:
            unblock(prefix)
The result is a blacklist that breathes: aggressive against active floods, forgiving toward one-off anomalies, and requiring no human to curate it during an incident. Example addresses such as 198.51.100.0/24 (a documentation range) stand in for whatever prefix the scoring actually flags.
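For readers who want something executable, here is one possible Python rendering of the conceptual loop. It is a variant, not a transcription: it assumes linear decay applied lazily whenever a prefix is touched (so no per-minute timer is needed), and it releases a block when its TTL expires rather than when the score reaches zero. The class name and parameters are hypothetical.

```python
class DecayingScoreboard:
    """Per-prefix scores that decay linearly, with timed blocks."""

    def __init__(self, threshold, decay_per_minute, block_ttl_s):
        self.threshold = threshold
        self.decay_per_minute = decay_per_minute
        self.block_ttl_s = block_ttl_s
        self.scores = {}  # prefix -> (score, time of last update)
        self.blocked_until = {}  # prefix -> time the block expires

    def _current(self, prefix, now):
        """Score after applying the decay accrued since the last update."""
        score, last = self.scores.get(prefix, (0.0, now))
        decayed = score - self.decay_per_minute * (now - last) / 60.0
        return max(0.0, decayed)

    def record(self, prefix, weight, now):
        """Add a suspicious event; block the prefix if it crosses the threshold."""
        score = self._current(prefix, now) + weight
        self.scores[prefix] = (score, now)
        if score >= self.threshold:
            # A source that keeps offending keeps renewing its own block.
            self.blocked_until[prefix] = now + self.block_ttl_s
        return score

    def is_blocked(self, prefix, now):
        return self.blocked_until.get(prefix, 0.0) > now
```

With `threshold=10` and `decay_per_minute=1`, two events of weight 6 a minute apart block the prefix, while the same two events an hour apart do not, because the first has fully decayed by the time the second arrives.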
Blocking ANY and AXFR
Two query types deserve to be refused outright on a public-facing recursive resolver, because they offer attackers leverage and offer legitimate clients almost nothing.
- ANY queries ask for every record type at a name in a single request, producing a large response from a tiny query: a textbook reflection/amplification primitive. RFC 8482 explicitly sanctions minimal responses to ANY (a single RRset or a synthesised HINFO record), and major resolvers already send them. There is no good reason for a normal client to issue ANY; refusing it, or answering minimally, removes an amplification vector at zero cost to legitimate use.
- AXFR (zone transfers) request an entire zone's contents. On an authoritative server, transfers belong strictly between primaries and authorised secondaries; on a recursive resolver they have no place at all. An open AXFR is both an information-disclosure hole and a heavy-response amplifier. Block it categorically except for explicitly whitelisted secondary servers.
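A filter for these two types only needs the QTYPE of the question, which can be read straight from the wire format. The sketch below assumes a raw DNS message without the two-byte TCP length prefix and uses only the standard library; `question_qtype`, `should_refuse`, and `build_query` are hypothetical helper names. QTYPE 252 (AXFR) and 255 (ANY) are the values assigned in RFC 1035.

```python
import struct

QTYPE_AXFR = 252
QTYPE_ANY = 255


def question_qtype(packet):
    """Return the QTYPE of the first question in a raw DNS query."""
    if len(packet) < 12:
        raise ValueError("truncated header")
    qdcount = struct.unpack("!H", packet[4:6])[0]
    if qdcount < 1:
        raise ValueError("no question section")
    offset = 12
    while True:  # walk the length-prefixed labels of QNAME
        if offset >= len(packet):
            raise ValueError("truncated QNAME")
        length = packet[offset]
        if length == 0:
            offset += 1
            break
        if length & 0xC0:
            raise ValueError("unexpected compression pointer in query name")
        offset += 1 + length
    if offset + 4 > len(packet):
        raise ValueError("truncated question")
    qtype, _qclass = struct.unpack("!HH", packet[offset:offset + 4])
    return qtype


def should_refuse(packet, client_ip, axfr_allowed_from=frozenset()):
    """Refuse ANY always, and AXFR unless the client is an allowed secondary."""
    qtype = question_qtype(packet)
    if qtype == QTYPE_ANY:
        return True
    if qtype == QTYPE_AXFR:
        return client_ip not in axfr_allowed_from
    return False


def build_query(qname, qtype, query_id=0x1234):
    """Build a minimal query packet, for demonstration and testing only."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = qname.rstrip(".").split(".")
    name = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + name + struct.pack("!HH", qtype, 1)
```

A server that prefers the RFC 8482 approach would answer ANY with a minimal response instead of refusing it; the detection step is the same either way.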
Putting the layers together
No single mechanism stops a DNS flood, because the attack targets several resources at once. The defence is layered, and each layer covers a different failure mode: RRL clamps the repetitive response category that floods generate; per-prefix limiting blunts source-spoofing; entropy-and-NXDOMAIN scoring identifies the water-torture signature specifically; decaying auto-blacklists evict persistent offenders without permanent collateral; and refusing ANY/AXFR closes the amplification doors entirely.
The thread tying them together is a refusal to treat all queries as equal. A flat rate limit treats a flood and a busy office identically and gets both wrong. Every effective defence here works by reading the shape of the traffic — its repetitiveness, its entropy, its response profile, its source distribution — and acting on the pattern rather than the packet. That is the difference between a resolver that buckles under a modest botnet and one that keeps answering real questions while the flood burns itself out against a wall it can't see.
The random-subdomain attack is clever precisely because every packet is legal. You can't filter it on validity, only on behaviour. Build your resolver to watch behaviour, and the flood becomes just another pattern to be scored, limited, and let decay back to nothing.
Keep resolving under fire
UnveilDNS includes rate-limiting and auto-blacklisting to ride out DNS floods.