DDoS protection for a resolver: rate-limiting, RRL, auto-blacklist
Most DDoS conversations are about web servers: a wall of HTTP requests, a CDN absorbing the blast, a WAF deciding who gets through. DNS is treated as plumbing — something that either works or it doesn't. But a resolver sits in a uniquely exposed position. It speaks a stateless protocol, mostly over UDP, where forging the source address is trivial and a tiny query can summon a large answer. That asymmetry is exactly what attackers exploit, and it puts your resolver in two very different lines of fire at once.
The first is obvious: the resolver is a target. Flood it with enough queries — or, more cleverly, enough queries it has to fail to answer — and it stops serving the legitimate clients behind it. The second is more insidious: the resolver becomes an unwitting weapon. If it answers queries from spoofed source addresses, it can be aimed at a third party in a reflection-and-amplification attack, multiplying the attacker's bandwidth and burning your reputation in the process. Defending the resolver means defending against both roles, and the mechanisms overlap less than you'd hope.
Two roles, two threat models
It is worth separating these cleanly, because a control that helps one role can be useless — or actively harmful — for the other.
| Role | What the attacker wants | Primary defence |
|---|---|---|
| Resolver as victim | Exhaust CPU, sockets, cache, or the upstream link so legitimate clients get no answer | Rate-limiting, NXDOMAIN/entropy protection, auto-blacklisting |
| Resolver as amplifier | Send spoofed-source queries so large answers are reflected at a victim | Response rate limiting (RRL), access lists, refusing recursion from strangers |
The reflection role deserves a moment of arithmetic. Take a 60-byte query for a record set that returns two or three kilobytes. Think of a large TXT response, or a signed answer carrying DNSSEC material, sent over UDP because the query advertised a large EDNS(0) buffer. That query yields an amplification factor of roughly 30× to 50×. A response of only a few hundred bytes gives a far smaller factor, around 5× to 10×. At 30×, the attacker spends one unit of bandwidth and delivers thirty to the victim, with your IP address on the return envelope. The victim sees you attacking them. This is why an open or loosely-scoped resolver is not just a risk to itself; it is a liability to the whole network.
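The ratio is easy to check. Below is a minimal Python sketch. The function names are hypothetical, and the response sizes are illustrative rather than measured:

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """Bytes reflected at the victim per byte the attacker sends."""
    if query_bytes <= 0:
        raise ValueError("query size must be positive")
    return response_bytes / query_bytes


def reflected_bandwidth(attacker_bps: float, query_bytes: int, response_bytes: int) -> float:
    """Bandwidth landing on the victim for a given attacker send rate."""
    return attacker_bps * amplification_factor(query_bytes, response_bytes)


# Illustrative sizes: a 60-byte query against a response of a few hundred
# bytes versus responses of a few kilobytes.
for size in (400, 1800, 3000):
    print(f"{size:>5} B response -> {amplification_factor(60, size):.1f}x")
```

With these sizes the loop prints factors of 6.7×, 30.0× and 50.0×. Scaled up, `reflected_bandwidth(100e6, 60, 1800)` shows a 100 Mbit/s attacker landing 3 Gbit/s on the victim.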
Response Rate Limiting (RRL): the anti-amplification valve
RRL is the best-known reflection defence. It was developed for authoritative name servers and later adapted for resolvers. It rests on one insight: a reflection attack produces a very specific signature. The attacker hammers one query against your resolver with the victim's spoofed address, so a flood of identical or near-identical responses heads for the same network.
Instead of rate-limiting on the query, RRL rate-limits on the response, grouped by client prefix and by response characteristics. When the rate of similar responses to a given prefix exceeds a threshold within a sliding window, the resolver stops sending full answers and starts dropping or truncating them. Crucially, it does this per-prefix, so a busy legitimate client on one subnet does not poison the limits for everyone else.
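A sliding-window limiter keyed on the response rather than the query can be sketched as follows. This is a simplified illustration, not how any particular server implements RRL. The class name, the key tuple `(client_prefix, qname, qtype, rcode)` and the default rates are all assumptions. Production implementations also use fixed-size tables rather than an unbounded dictionary of timestamp queues.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

# Hypothetical response identity: (client_prefix, qname, qtype, rcode).
ResponseKey = Tuple[str, str, str, str]


class ResponseRateLimiter:
    """Sliding-window limit on *responses*, keyed by client prefix and response identity.

    Sketch only: one deque of timestamps per key, created on first use.
    """

    def __init__(self, responses_per_second: float = 5.0, window: float = 15.0):
        self.limit = int(responses_per_second * window)  # responses allowed per window
        self.window = window
        self._seen: Dict[ResponseKey, Deque[float]] = defaultdict(deque)

    def allow(self, key: ResponseKey, now: Optional[float] = None) -> bool:
        """Return True if a full response may be sent for this key right now."""
        now = time.monotonic() if now is None else now
        stamps = self._seen[key]
        # Forget responses that have slid out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False  # over the limit: the caller drops or slips
        stamps.append(now)
        return True
```

Because the client prefix is part of the key, a flood aimed at one victim network uses up only that network's allowance. A different prefix asking for the same name is unaffected.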
Slip ratio: don't go fully silent
A naive limiter would simply drop everything over the threshold. RRL is smarter. It uses a slip ratio: out of every N responses that exceed the limit, it lets one through — but as a truncated answer rather than full silence. The slip mechanism matters because of how spoofing works.
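- If the source is spoofed (the attack case), the victim never asked for the answer and discards it. Dropped responses reflect nothing. Each slipped response is a tiny truncated packet roughly the size of the query, so the attacker still pays for every query but the amplification is gone.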
- If the source is legitimate but caught in a noisy prefix, the occasional truncated answer is a signal, not a denial — which leads directly to the next mechanism.
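The slip decision itself is a simple counter. The sketch below is modelled on the `slip` parameter of BIND's `rate-limit` option. There, a value of N answers one in every N over-limit responses with a truncated reply, and 0 disables slipping. The Python names are hypothetical. A real limiter would also keep one counter per response key rather than a single shared one.

```python
from enum import Enum


class RRLAction(Enum):
    SEND = "send"  # under the limit: full answer
    SLIP = "slip"  # over the limit: tiny TC=1 answer
    DROP = "drop"  # over the limit: silence


class SlipCounter:
    """Decide what happens to each response that exceeds the rate limit.

    slip=N lets one in every N over-limit responses through as a truncated
    answer; slip=0 never slips and drops everything over the limit.
    """

    def __init__(self, slip: int = 2):
        if slip < 0:
            raise ValueError("slip must be >= 0")
        self.slip = slip
        self._over_limit = 0

    def decide(self, within_limit: bool) -> RRLAction:
        if within_limit:
            return RRLAction.SEND
        self._over_limit += 1
        # Every slip-th over-limit response becomes a truncated answer.
        if self.slip and self._over_limit % self.slip == 0:
            return RRLAction.SLIP
        return RRLAction.DROP
```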
Truncation to force TCP
A truncated response is a DNS answer with the TC (truncation) bit set and no useful payload. Per RFC 1035 and clarified in RFC 7766, a client that receives a truncated answer is expected to retry over TCP. This is the elegant part of RRL: TCP requires a real three-way handshake, which a spoofed-source attacker cannot complete because they never see your SYN-ACK. So truncation acts as a proof-of-address challenge.
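Legitimate clients fall back to TCP and get their answer. A spoofed query gets no such retry, for two reasons. First, the truncated reply lands at the victim, which never asked the question. Second, the attacker cannot complete a TCP handshake from the victim's address, because the SYN-ACK goes to the victim, which at most answers with a reset. RRL turns DNS's own protocol rules into an authentication test.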
The cost is a little extra latency and a TCP socket for the legitimate client that got slipped. On a tuned resolver that is a rounding error compared to participating in a reflection attack.
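What a slipped answer looks like on the wire is easy to show. Below is a minimal standard-library sketch, assuming an uncompressed query with a single question. It copies the ID and question, sets the QR and TC bits in the header, and zeroes every record count. The function name is hypothetical. For brevity it does not echo an EDNS(0) OPT record, which RFC 6891 expects a responder to include when the query carried one.

```python
import struct

QR = 0x8000          # header flag: this is a response
TC = 0x0200          # header flag: message truncated, retry over TCP
RCODE_MASK = 0x000F  # low four bits of the flags word


def truncated_response(query: bytes) -> bytes:
    """Build a TC=1 reply that echoes the question and carries no records.

    Assumes one question with no compression pointers (true of ordinary
    client queries). For brevity, any EDNS(0) OPT record in the query is
    not echoed, although RFC 6891 expects a responder to include one.
    """
    if len(query) < 12:
        raise ValueError("query shorter than a DNS header")
    qid, flags, qdcount = struct.unpack("!HHH", query[:6])
    if qdcount != 1:
        raise ValueError("expected exactly one question")
    # Walk the QNAME labels to find where the question ends.
    pos = 12
    while True:
        if pos >= len(query):
            raise ValueError("truncated QNAME")
        length = query[pos]
        if length & 0xC0:
            raise ValueError("compression pointer in question not supported")
        pos += 1 + length
        if length == 0:
            break
    question_end = pos + 4  # QTYPE + QCLASS
    if question_end > len(query):
        raise ValueError("question section cut short")
    # Keep the query's opcode, RD and other flag bits; set QR and TC; RCODE 0.
    reply_flags = (flags | QR | TC) & ~RCODE_MASK
    header = struct.pack("!HHHHHH", qid, reply_flags, 1, 0, 0, 0)
    return header + query[12:question_end]
```

The reply is about the same size as the query, which is why slipping costs the victim nothing meaningful.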
NXDOMAIN floods and random-subdomain attacks
Not every attack tries to amplify. A class of attack aims squarely at exhausting the resolver by forcing it to do the most expensive thing it can do: a full recursive lookup that fails. These are the random-subdomain (sometimes called "water torture") and NXDOMAIN-flood attacks.
The mechanic is simple and nasty. The attacker picks a real victim domain (say example.com) and fires queries for randomly generated labels: a8f3k2.example.com, zq91x7.example.com, and so on. Each label is unique, so nothing is in cache. The resolver dutifully sends every one to the authoritative servers for example.com, which return NXDOMAIN. The attack hurts in two directions: it saturates your recursion path and outstanding-query tables, and it can knock over the victim's authoritative infrastructure as collateral.
Detecting randomness with entropy
Volume alone is a weak signal — a busy CDN generates lots of distinct names too. The stronger signal is the character distribution of the queried labels. Algorithmically random labels have high Shannon entropy: a near-uniform spread of characters, few real bigrams, a high consonant-to-vowel ratio. Human and machine-generated-but-legitimate names cluster very differently.
An entropy-based detector scores incoming labels and watches for a sustained spike of high-entropy names aimed at a single parent domain. When it crosses a threshold, the resolver can throttle or refuse the random tail without harming legitimate lookups for the same domain. The same statistical machinery that catches algorithmically generated malware domains is useful here — high entropy is the common thread.
```text
label: zq91x7k2m4.example.com
├─ entropy: high (≈ near-uniform character spread)
├─ bigram freq: low (few real letter pairs)
└─ verdict: random-subdomain candidate → throttle parent
```
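The entropy score in the diagram is Shannon entropy over the label's characters, H = -Σ p·log2(p). The sketch below computes it and counts high-entropy labels per parent domain in a sliding window. Several details are assumptions for illustration: the class, its thresholds, and the shortcut of treating the last two labels as the parent. A real detector would consult the Public Suffix List and tune the cutoffs to measured traffic.

```python
import math
import time
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, Optional


def shannon_entropy(label: str) -> float:
    """Bits per character of the label's character distribution."""
    if not label:
        return 0.0
    counts = Counter(label.lower())
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


class RandomSubdomainDetector:
    """Flag a parent domain when high-entropy labels aimed at it spike.

    Hypothetical sketch: "parent" is naively the last two labels, and the
    cutoff, window and count limit are placeholders.
    """

    def __init__(self, entropy_cutoff: float = 3.0, max_random: int = 100,
                 window: float = 10.0, min_length: int = 8):
        self.entropy_cutoff = entropy_cutoff
        self.max_random = max_random
        self.window = window
        self.min_length = min_length
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, qname: str, now: Optional[float] = None) -> bool:
        """Record a query; return True if its parent should be throttled."""
        now = time.monotonic() if now is None else now
        labels = qname.rstrip(".").lower().split(".")
        if len(labels) < 3:
            return False  # no subdomain label to score
        first, parent = labels[0], ".".join(labels[-2:])
        hits = self._hits[parent]
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        # Short labels cannot reach high entropy, so they are not scored.
        if len(first) >= self.min_length and shannon_entropy(first) >= self.entropy_cutoff:
            hits.append(now)
        return len(hits) > self.max_random
```

A ten-character label with no repeated characters, such as zq91x7k2m4, scores log2(10) ≈ 3.32 bits, while an ordinary word like accounts scores 2.75. Short labels are skipped because a label's entropy can never exceed log2 of its length.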
Auto-blacklisting: score, decay, expire
Rate-limiting handles the aggregate. But some sources are simply abusive and should be removed from the conversation entirely, at least for a while. The trick is doing that automatically without permanently banning a client that had one bad minute behind a shared NAT.
A scoring model handles this gracefully. Each source prefix accumulates a score as it triggers limits — a rate-limit hit adds points, a detected random-subdomain burst adds more, a refused zone-transfer attempt adds more still. Three parameters govern the behaviour:
- Score threshold — the point at which a source is blacklisted and its queries dropped outright. Set it high enough that normal bursts never reach it.
- Decay — the score bleeds off over time when the source behaves. A client that misbehaved briefly recovers on its own; a client that keeps offending stays elevated.
- Duration — once blacklisted, the entry expires after a fixed window rather than living forever, so a dynamic IP reassigned to an innocent user isn't punished indefinitely.
The combination of decay and expiry is what makes automatic blacklisting safe to run unattended. Without decay, every transient spike becomes a permanent ban and your blacklist slowly eats your customer base. With it, the system is self-correcting: persistent attackers are held down, and one-off noise washes out.
| Behaviour over time | Score trajectory | Outcome |
|---|---|---|
| Single short burst | Rises, then decays below threshold | Never blacklisted, or briefly then released |
| Sustained abuse | Climbs past threshold faster than it decays | Blacklisted for the full duration, re-armed if it persists |
| Clean traffic | Stays at zero | Never affected |
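Decay and expiry fit in a few dozen lines. In this sketch the score decays exponentially with a configurable half-life. Crossing the threshold starts a ban of fixed duration. A prefix still over the threshold when the ban lapses is re-banned on its next offence. The point values, half-life, threshold and ban length are hypothetical placeholders, not recommendations.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical point values for the offences described above.
POINTS = {"rate_limit_hit": 1.0, "random_subdomain_burst": 5.0, "axfr_attempt": 10.0}


@dataclass
class _Entry:
    score: float = 0.0
    updated: float = 0.0
    banned_until: Optional[float] = None


class AutoBlacklist:
    """Per-prefix score with exponential decay and time-limited bans (sketch)."""

    def __init__(self, threshold: float = 50.0, half_life: float = 60.0,
                 ban_duration: float = 900.0):
        self.threshold = threshold
        self.decay_rate = math.log(2) / half_life  # per second
        self.ban_duration = ban_duration
        self._entries: Dict[str, _Entry] = {}

    def _decayed(self, entry: _Entry, now: float) -> float:
        elapsed = max(0.0, now - entry.updated)
        return entry.score * math.exp(-self.decay_rate * elapsed)

    def record(self, prefix: str, offence: str, now: float) -> None:
        """Add points for an offence; ban the prefix if it crosses the threshold."""
        entry = self._entries.setdefault(prefix, _Entry(updated=now))
        entry.score = self._decayed(entry, now) + POINTS[offence]
        entry.updated = now
        if entry.score >= self.threshold and not self.is_blacklisted(prefix, now):
            entry.banned_until = now + self.ban_duration

    def is_blacklisted(self, prefix: str, now: float) -> bool:
        entry = self._entries.get(prefix)
        if entry is None or entry.banned_until is None:
            return False
        if now >= entry.banned_until:
            entry.banned_until = None  # ban expired; the score keeps decaying
            return False
        return True

    def score(self, prefix: str, now: float) -> float:
        entry = self._entries.get(prefix)
        return 0.0 if entry is None else self._decayed(entry, now)
```

With a threshold of 10 and a 60-second half-life, five rate-limit hits in five seconds never trigger a ban and decay to almost nothing within ten minutes. Repeated random-subdomain bursts, by contrast, cross the threshold on the third offence.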
Query filtering: cut off the dangerous query types
Some queries have no business arriving at a recursive resolver from arbitrary clients, and refusing them outright removes whole attack classes for free.
Block AXFR zone transfers
AXFR is the full-zone transfer mechanism (RFC 5936). It belongs between an authoritative primary and its secondaries, full stop. A recursive resolver should never honour an AXFR from a client — at best it leaks information, at worst it's a reconnaissance step. Blocking AXFR (and its incremental cousin IXFR) at the resolver edge is a clean, zero-false-positive control.
Constrain ANY
The ANY query type asks for every record at a name in one shot, which historically made it a favourite for amplification: one small query, one enormous answer. Modern practice, codified in RFC 8482, is to answer ANY with a minimal response (a single RRset, or a synthesized HINFO record) rather than the full record set; some operators refuse it outright. Treating ANY as something to suppress by default removes one of the biggest amplification levers in the protocol.
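Both filters reduce to a check on the query's QTYPE. Below is a hypothetical policy function using the IANA-assigned type codes (IXFR 251, AXFR 252, ANY 255). How a "minimal" ANY answer is actually built is left to the caller.

```python
from enum import IntEnum


class QType(IntEnum):
    """DNS QTYPE codes relevant to edge filtering (IANA registry values)."""
    A = 1
    TXT = 16
    AAAA = 28
    IXFR = 251
    AXFR = 252
    ANY = 255


def edge_policy(qtype: int) -> str:
    """Decide how a resolver edge treats a query type (hypothetical policy).

    Zone transfers are refused outright; ANY gets an RFC 8482-style minimal
    answer instead of the full record set; everything else is resolved.
    """
    if qtype in (QType.AXFR, QType.IXFR):
        return "REFUSED"
    if qtype == QType.ANY:
        return "MINIMAL"
    return "RESOLVE"
```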
Per-prefix limits: IPv4 and IPv6 are not the same shape
Every limiter above operates on a prefix, not a single address, and getting the prefix size right is one of the few genuinely tricky tuning decisions. The reason is the structure of address allocation.
In IPv4, a single subscriber typically lives behind one address (or shares one via carrier-grade NAT). Grouping by a moderately-sized prefix balances catching distributed sources against not lumping unrelated customers together. IPv6 is the opposite world: a single subscriber is routinely handed a whole /56 or /64. A /64 alone holds 2^64 (about 1.8 × 10^19) addresses. If you rate-limit IPv6 at the individual-address granularity, an attacker simply rotates through their own delegation and never hits a limit. You must aggregate IPv6 at the delegation boundary, the prefix the subscriber actually controls, or the limiter is theatre.
| Family | Typical subscriber footprint | Limiter implication |
|---|---|---|
| IPv4 | One address (often CGNAT-shared) | Aggregate at a modest prefix; mind shared NAT pools |
| IPv6 | An entire /56 or /64 delegation | Aggregate at the delegation prefix, never per-address |
The practical upshot: a resolver needs independent prefix-length settings for IPv4 and IPv6, and the IPv6 one matters more than people expect. A limiter tuned only for IPv4 reasoning will be silently bypassed by any IPv6-capable attacker.
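Independent prefix lengths are straightforward with Python's `ipaddress` module. The /24 and /56 defaults below are assumptions; replace them with the delegation sizes your own network actually uses. The sketch also folds IPv4-mapped IPv6 addresses (as seen on dual-stack sockets) back into IPv4, so they are not accounted under the wrong family. The function name is hypothetical.

```python
import ipaddress

# Hypothetical defaults: a /24 for IPv4 and a common /56 delegation for IPv6.
DEFAULT_V4_PREFIX = 24
DEFAULT_V6_PREFIX = 56


def limiter_key(addr: str, v4_prefix: int = DEFAULT_V4_PREFIX,
                v6_prefix: int = DEFAULT_V6_PREFIX) -> str:
    """Map a client address to the prefix its limits are accounted against."""
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # ::ffff:a.b.c.d from a dual-stack socket is really IPv4
    plen = v4_prefix if ip.version == 4 else v6_prefix
    return str(ipaddress.ip_network(f"{ip}/{plen}", strict=False))
```

Every address in 2001:db8::/56 therefore shares one budget, so rotating through the remaining 72 bits buys an attacker nothing.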
Why you tune to your traffic, not to a template
Every threshold in this article — responses per second per prefix, NXDOMAIN rate, entropy cutoff, blacklist score, decay rate, prefix lengths — is a function of your baseline. A resolver fronting a thousand home routers and one fronting a regional ISP have query distributions that differ by orders of magnitude. A threshold that is paranoid for one is wide open for the other.
The correct procedure is empirical, not aspirational:
- Measure your normal traffic first: peak queries per second, the shape of your busiest legitimate prefixes, your steady-state NXDOMAIN ratio (it is never zero — typos and stale records are constant).
- Set thresholds comfortably above the normal peak, not at it. The goal is to catch the abnormal, not to clip the busy-but-legitimate.
- Run in observation before enforcement where you can, and watch what would have been blocked.
- Revisit after traffic grows. A threshold set for last year's load becomes a false-positive generator as your subscriber base expands.
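The first three steps above can be made concrete. Below is a minimal sketch, assuming you have a list of per-prefix peak query rates from a normal period. It takes a nearest-rank percentile of that baseline and multiplies it by a headroom factor. It then wraps the result in a limiter that only records what it would have blocked until enforcement is switched on. The 99th percentile and the 3× headroom are illustrative assumptions, not recommended values, and all names are hypothetical.

```python
from typing import List, Sequence


def nearest_rank_percentile(samples: Sequence[float], percentile: int) -> float:
    """Nearest-rank percentile (1..100) of the samples."""
    if not samples or not 1 <= percentile <= 100:
        raise ValueError("need samples and a percentile in 1..100")
    ordered = sorted(samples)
    rank = -(-percentile * len(ordered) // 100)  # integer ceiling, no float error
    return ordered[rank - 1]


def suggest_threshold(peak_qps_samples: Sequence[float], percentile: int = 99,
                      headroom: float = 3.0) -> float:
    """Place the limit comfortably above the busy-but-normal peak."""
    return nearest_rank_percentile(peak_qps_samples, percentile) * headroom


class ObservingLimiter:
    """Record what would be blocked; only enforce once enforce=True."""

    def __init__(self, threshold: float, enforce: bool = False):
        self.threshold = threshold
        self.enforce = enforce
        self.would_block: List[str] = []

    def allow(self, prefix: str, qps: float) -> bool:
        if qps <= self.threshold:
            return True
        self.would_block.append(prefix)  # review this list before enforcing
        return not self.enforce
```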
A DDoS control that blocks legitimate clients is just a slower DDoS that you inflicted on yourself. The defences are only as good as the baseline you measured them against — which is why the tuning, not the toggle, is the real work.
The resolver is the most leverageable host on most networks: small to attack, expensive to lose, and dangerous when turned against others. The good news is that the defences compose. Scope recursion so you are not an open amplifier, let RRL turn the protocol's own truncation rules into an address-authentication test, watch label entropy to catch the floods that try to make you fail, and let a decaying score quietly retire the sources that keep coming back. None of these is exotic. Run together, and tuned to traffic you actually measured, they keep the resolver answering through a storm that would otherwise take it — and everything behind it — offline.