TTL, caching and prefetch: the three levers behind fast DNS

Posted 2026-06-09 · 7 min read · performance

Every web request you make starts with a name that has to become a number. Type a hostname, and before a single byte of HTTP flows, your machine asks "what IP is this?" — and waits. That wait is invisible when it's a fraction of a millisecond and infuriating when it isn't. The difference between those two outcomes is almost never the speed of the authoritative servers on the far side of the internet. It's whether the answer was already sitting in a cache near you.

DNS feels instant because of three levers working together: the TTL that tells a resolver how long it may keep an answer, the cache that holds those answers so the next user doesn't pay the network cost, and prefetch, which quietly refreshes popular records before they expire so nobody ever hits a cold lookup. Pull these levers well and 90%+ of queries never leave your building. Pull them badly and you either serve stale answers during an incident or hammer upstreams with traffic that should have been free. This piece is about how the three interact and where each one bites.

What a TTL actually is — and the trade-off it forces

Every DNS record carries a Time To Live: an unsigned integer, in seconds, that a domain owner attaches to each record. It is a permission slip. It says "any resolver may cache this answer for up to this many seconds before it must ask again." A TTL of 300 means five minutes; 86400 means a day. The field is defined in RFC 1035 and clarified in RFC 2181, which pins it as a 32-bit value to be interpreted in the range 0 to 2147483647 — values with the top bit set should be treated as zero.
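The RFC 2181 interpretation rule is small enough to sketch directly. This is a minimal illustration, assuming the TTL has already been parsed from the wire as a plain integer; the function name effective_ttl is hypothetical.

```python
def effective_ttl(raw_ttl: int) -> int:
    """Interpret a 32-bit TTL field the way RFC 2181 section 8 describes."""
    if not 0 <= raw_ttl <= 0xFFFFFFFF:
        raise ValueError("TTL must fit in 32 bits")
    # A value with the most significant bit set is treated as zero,
    # which leaves 0..2147483647 as the usable range.
    return 0 if raw_ttl & 0x80000000 else raw_ttl
```

So 300 stays 300, 2147483647 is the largest value taken at face value, and anything above it collapses to zero (do not cache).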

The TTL is the single most important number in DNS performance, and it is a genuine trade-off with no free side:

TTL choice	Upside	Downside
Long (hours to a day)	Most queries served from cache; tiny upstream load; fast for users	Changes propagate slowly — a record edit can take the full TTL to be seen everywhere; stale answers during a migration or outage
Short (seconds to minutes)	Changes go live almost immediately; great for failover, load balancing, blue/green deploys	Far more cache misses; every miss is a network round trip; upstreams and your own resolver work harder
Zero	Never cached — always authoritative-fresh	Every single query is a full lookup; this is how you melt a resolver under load

The domain owner sets the TTL, not you. As a resolver operator you mostly respect it — but you also get to decide on guard rails. A record advertising a one-second TTL doesn't have to mean you re-query it every second under a 10,000 queries-per-second flood. Resolvers commonly clamp absurdly low TTLs up to a sane floor, and cap absurdly long ones to a ceiling so a typo of 2592000 (30 days) doesn't pin a bad answer in cache for a month. The art is honouring the intent of the TTL while refusing to let it become a denial-of-service vector against yourself.
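The guard rails described above amount to a clamp. The sketch below assumes a floor of 30 seconds and a ceiling of one day; both numbers, and the name clamp_ttl, are illustrative choices rather than values taken from any particular resolver.

```python
MIN_TTL = 30        # hypothetical floor, in seconds
MAX_TTL = 86_400    # hypothetical ceiling: one day


def clamp_ttl(advertised: int, floor: int = MIN_TTL, ceiling: int = MAX_TTL) -> int:
    """Return the TTL the cache will actually use for an advertised TTL."""
    return max(floor, min(advertised, ceiling))
```

With these settings a one-second TTL is cached for 30 seconds, a 300-second TTL is untouched, and the 2592000 typo is held for a day instead of a month. Note that a floor also raises a TTL of zero, which overrides the owner's "never cache" intent; whether that is acceptable is an operator decision.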

Rule of thumb. Use long TTLs for things that rarely move (your apex A record, MX hosts) and short TTLs only for records you actually intend to change quickly — failover targets, CDN steering. A blanket 60-second TTL on everything is a common and expensive mistake.

The cache and the economics of a hit

A recursive resolver's cache is just a map from question to answer, each entry stamped with the moment it expires. When a query arrives, the resolver checks the cache first. A hit returns in microseconds — no packets leave the box. A miss means the resolver has to do the real work: walk down from the root if needed, query the authoritative nameservers, validate the response, store it, and only then answer the client. That round trip is where the milliseconds — sometimes tens or hundreds of them, especially over an encrypted upstream on a high-latency link — get spent.
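A minimal sketch of that map, assuming a single-threaded resolver and answers stored as plain strings. The class name TtlCache and the injectable clock are illustrative; a real resolver also handles eviction, record sets and concurrency, none of which is modelled here.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Question = Tuple[str, str]  # (name, record type), e.g. ("example.com", "A")


class TtlCache:
    """Map from question to answer, each entry stamped with its expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Question, Tuple[str, float]] = {}

    def put(self, question: Question, answer: str, ttl: float) -> None:
        # Store the answer together with the moment it stops being valid.
        self._entries[question] = (answer, self._clock() + ttl)

    def get(self, question: Question) -> Optional[str]:
        entry = self._entries.get(question)
        if entry is None:
            return None                     # miss: never seen
        answer, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[question]     # miss: expired, drop it
            return None
        return answer                       # hit: no packets leave the box
```

A None from get is the signal to do the real work upstream and then call put with the TTL from the response.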

The economics are stark. Consider a network where each cache miss costs, say, 40 ms over the wire and each hit costs 0.2 ms:

Cache hit rate	Avg lookup time (40 ms miss / 0.2 ms hit)
50%	~20.1 ms
80%	~8.2 ms
90%	~4.2 ms
95%	~2.2 ms
99%	~0.6 ms

Those numbers are illustrative, but the shape is real. Average lookup time falls in a straight line as hit rate rises, so what compounds is the relative speed-up: each step toward 100% removes a larger share of the latency that remains. Going from 50% to 80% cuts the average by about 60%; going from 95% to 99% cuts it by more than 70%, and the 99% resolver is roughly 14 times faster on average than the 80% one. This is why a resolver that almost always hits cache feels dramatically faster than one that hits 80% of the time, even though 80% sounds high. The whole game of DNS performance is pushing that hit rate toward 100%, and that is precisely what prefetch and serve-stale are for.
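The table's figures are a weighted average of the hit cost and the miss cost. A short sketch, using the same illustrative 0.2 ms and 40 ms assumptions as the table; the function name is hypothetical.

```python
def average_lookup_ms(hit_rate: float, hit_ms: float = 0.2, miss_ms: float = 40.0) -> float:
    """Expected lookup time when a fraction hit_rate of queries hit the cache."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


for rate in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"{rate:.0%}  {average_lookup_ms(rate):.1f} ms")

# Ratio between the 80% and 99% averages under these assumptions.
speedup = average_lookup_ms(0.80) / average_lookup_ms(0.99)
```

Under these assumptions the ratio between the 80% and 99% rows comes out between 13 and 14, which is the gap a user actually feels.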

Negative caching: remembering what doesn't exist

Caching a successful answer is obvious. Caching a failure is less obvious and just as important. When a name doesn't exist, the authoritative server returns NXDOMAIN. If a resolver didn't remember that, a misconfigured client looping on a non-existent hostname — or a malware sample generating thousands of junk domains — would force a fresh authoritative lookup every time. That's expensive for you and noisy for everyone upstream.

RFC 2308 defines negative caching. The trick is that the TTL governing how long you may cache a negative answer doesn't come from a normal record. It comes from the SOA record returned in the authority section of the negative response: the resolver uses the lesser of that SOA record's own TTL and its MINIMUM field. The resolver caches "this name does not exist" for that duration and answers subsequent identical queries instantly from cache, without bothering the authoritative servers again.
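As a sketch of that calculation: RFC 2308 takes the lesser of the SOA record's own TTL and its MINIMUM field. The SoaRecord type and the local_cap parameter below are assumptions added for illustration; the cap stands in for whatever upper bound an operator chooses to apply to negative entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoaRecord:
    ttl: int      # TTL of the SOA record as returned in the authority section
    minimum: int  # the SOA MINIMUM field


def negative_ttl(soa: SoaRecord, local_cap: int = 3600) -> int:
    """Seconds an NXDOMAIN (or no-data) answer may stay in the negative cache."""
    # local_cap is a hypothetical operator-chosen ceiling, not an RFC value.
    return min(soa.ttl, soa.minimum, local_cap)
```

A zone with a one-hour SOA TTL and a MINIMUM of 300 yields five minutes of negative caching; a zone advertising a full day for both is held to the local cap.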

Negative caching is quietly one of the best defences against accidental query floods. A client stuck retrying a dead hostname costs you one upstream lookup, not ten thousand — because every retry after the first is answered from the negative cache.

It matters for security too. Algorithmically generated domains and tunnelling tools tend to produce huge volumes of names that don't resolve. Solid negative caching keeps that chaff from turning into upstream load, while the resolver's analytics still see the query pattern for what it is.

Prefetch: refreshing before anyone waits

Here is the subtle problem with even a perfect cache. An entry eventually expires. The very next user to ask for that name pays the full cost of the cache miss: they get the slow lookup so that the next thousand users get fast ones. For a popular record queried constantly, one unlucky user hits a cold cache once every TTL period. It's a small, recurring tax on your busiest names.

Prefetch removes the tax. The idea: when a frequently-requested record is getting close to expiry, the resolver proactively re-queries it in the background before it goes stale, and swaps the fresh answer in. Users keep getting instant cache hits and the refresh happens invisibly, off the critical path. Nobody ever waits for the popular names — which, by definition, are the ones most users are asking for.

Prefetch only makes sense for records that are actually popular. Re-fetching something queried once an hour wastes upstream queries for no benefit; the value is concentrated in the small set of hostnames that make up the bulk of traffic. A good resolver tracks which records get hit often enough to be worth refreshing and only spends prefetch effort on those. The result is a hit rate that creeps toward 100% on exactly the queries where it matters most.

The interaction with TTL. Prefetch doesn't override the domain owner's TTL — it re-validates against it. You still ask the authoritative server again; you just do it a beat early and in the background, so the answer stays fresh and the user never feels the round trip.
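The prefetch decision can be sketched as two checks: is the entry close to expiry, and has it been asked for often enough to be worth a background query? The thresholds below (refresh in the last 10% of the TTL, at least 3 hits) and all names are assumptions for illustration, not settings taken from a specific resolver.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    original_ttl: float  # TTL the entry was stored with, in seconds
    expires_at: float    # absolute expiry time
    hits: int = 0        # cache hits since the entry was last refreshed


def should_prefetch(entry: CacheEntry, now: float,
                    min_hits: int = 3, remaining_fraction: float = 0.10) -> bool:
    """Decide whether to refresh an entry in the background."""
    remaining = entry.expires_at - now
    if remaining <= 0:
        return False  # already expired: that is an ordinary miss, not a prefetch
    near_expiry = remaining <= entry.original_ttl * remaining_fraction
    popular = entry.hits >= min_hits
    return near_expiry and popular
```

A resolver would run this check on each cache hit and, when it returns True, queue an upstream query while still answering the client from cache. The refreshed answer carries whatever TTL the authoritative server hands back, so the owner's TTL is respected throughout.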

Optimistic answers and serve-stale (RFC 8767)

Prefetch handles the common case. But what about the moment an authoritative server is unreachable — a DDoS against the domain's nameservers, a transit outage, a DNSSEC misconfiguration on their side? Without help, your cached entry expires, your refresh fails, and you're forced to return SERVFAIL to the client. The name effectively goes dark even though you had a perfectly good answer a minute ago.

RFC 8767, "Serving Stale Data to Improve DNS Resiliency", formalises the escape hatch. When a resolver cannot reach the authoritative servers to refresh an expired record, it is permitted to serve the last known answer past its TTL, with a short TTL on the stale response (the RFC recommends 30 seconds), while it keeps trying to refresh in the background. The standard also recommends a maximum stale period, suggesting a value between one and three days, so you don't serve genuinely ancient data forever.

There are two related behaviours worth separating:

Optimistic / serve-stale on failure — only kicks in when the upstream is unreachable. It trades a small risk of staleness for staying up during an outage. This is RFC 8767's core promise.
Optimistic response — answer the client immediately from the about-to-expire cache entry, then refresh in the background regardless. It shaves latency off the edge case where a record expires mid-query, at the cost of occasionally handing out an answer that was a few seconds past its nominal TTL.

Both are opinions about the same trade-off: is a slightly stale answer better than no answer? For most user-facing traffic, yes — a page that loads against a minute-old IP beats a page that fails to resolve. For security-sensitive records you may want it tighter. The point is that serve-stale converts an authoritative outage from "the internet is broken for our users" into "we kept serving the last good answer until things recovered."
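The serve-stale-on-failure path can be sketched as a fallback around the refresh. This is a simplified illustration: the Entry type, the UpstreamError exception and the refresh callable are hypothetical, cache updates and background retries are left out, and the one-day stale limit is one choice within the range RFC 8767 suggests.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

STALE_ANSWER_TTL = 30   # seconds handed to the client with a stale answer
MAX_STALE = 86_400      # assumed limit: serve stale data for at most one day


class UpstreamError(Exception):
    """Raised by refresh() when no authoritative server could be reached."""


@dataclass
class Entry:
    answer: str
    expires_at: float


def answer_query(entry: Optional[Entry], now: float,
                 refresh: Callable[[], Tuple[str, int]]) -> Tuple[str, int]:
    """Return (answer, ttl_for_client); raise UpstreamError if nothing usable."""
    if entry is not None and now < entry.expires_at:
        return entry.answer, int(entry.expires_at - now)    # ordinary fresh hit
    try:
        return refresh()                                    # expired: ask upstream
    except UpstreamError:
        # Upstream is down. Fall back to the expired entry if it is not too old.
        if entry is not None and now - entry.expires_at <= MAX_STALE:
            return entry.answer, STALE_ANSWER_TTL
        raise                                               # caller answers SERVFAIL
```

The short TTL on the stale answer matters: it makes clients come back soon, so they pick up the fresh record shortly after the authoritative side recovers.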

Cache poisoning resistance: speed you can trust

A cache is only an asset if the answers in it are real. The classic attack — cache poisoning — tries to slip a forged response into the resolver before the legitimate one arrives, so that the poisoned answer gets cached and served to everyone for the whole TTL. The infamous 2008 Kaminsky-class attacks showed how practical this was when resolvers were predictable.

The baseline defences are about making a forged reply astronomically hard to guess. A resolver should:

Randomise the query ID — the 16-bit transaction ID must be unpredictable, not sequential.
Randomise the source port — RFC 5452 made source-port randomisation a baseline requirement, adding ~16 more bits of entropy an attacker has to guess.
Match the question — verify the response actually answers the question that was asked, in the case it was asked, and discard anything that doesn't fit the in-flight query.
Use encrypted transport upstream — when the resolver talks to its upstreams over an encrypted channel, off-path forgery becomes effectively impossible because the attacker can't see or inject into the conversation at all.
Validate DNSSEC where present — cryptographic signatures let the resolver reject forged data outright rather than trusting it on the basis of guessable fields.
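The first three checks in that list can be sketched as a match between an in-flight query and a candidate response. The Query and Response types are hypothetical simplifications; source-port randomisation, encrypted transport and DNSSEC validation happen at other layers and are not modelled here.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    txid: int    # 16-bit transaction ID
    name: str    # queried name, in the exact letter case that was sent
    qtype: str   # record type, e.g. "A"
    server: str  # address the query was sent to


@dataclass(frozen=True)
class Response:
    txid: int
    name: str
    qtype: str
    source: str  # address the response arrived from


def new_query(name: str, qtype: str, server: str) -> Query:
    # secrets draws from the OS CSPRNG, so the ID is unpredictable, not sequential.
    return Query(secrets.randbelow(1 << 16), name, qtype, server)


def accept(query: Query, response: Response) -> bool:
    """Accept a response only if it fits the in-flight query exactly."""
    return (
        response.txid == query.txid
        and response.source == query.server
        and response.qtype == query.qtype
        and response.name == query.name  # deliberately case-sensitive
    )
```

Comparing the name case-sensitively is what "in the case it was asked" buys: a resolver that mixes upper and lower case in the names it sends gets extra bits an off-path attacker has to guess.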

A fast cache that can be poisoned is worse than a slow one — it serves the wrong answer at wire speed to every client behind it. Poisoning resistance isn't separate from performance; it's the precondition that makes caching safe to lean on.

Sizing the cache: how much RAM, and why

None of this works if the cache is too small. A cache that can't hold your working set evicts entries before they're reused, your hit rate collapses, and prefetch fights a losing battle against eviction. The cache lives in RAM, and RAM is the constraint.

The sensible approach is to scale cache size with available memory rather than nailing it to a fixed number. A small appliance with a couple of gigabytes shouldn't try to hold the same working set as an ISP-scale box with tens of gigabytes — and it doesn't need to, because its query population is smaller. A reasonable heuristic is to dedicate a bounded fraction of system RAM to the DNS cache, with a sane floor so even a tiny box caches usefully, and a ceiling so the resolver never starves the rest of the system:

System RAM	Roughly the right cache footprint	Why
2 GB	A couple hundred MB	Small client population; working set fits easily
8 GB	Several hundred MB to ~1 GB	Branch / SMB scale; comfortably holds the hot set with headroom
16 GB+	Capped around 1 GB for the record cache	Past a point, more cache buys little — the hot set is finite, and extra RAM is better spent on connection state and analytics
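One way to express the "bounded fraction with a floor and a ceiling" heuristic is sketched below. The 10% fraction, 128 MiB floor and 1 GiB ceiling are assumptions chosen to land near the rows of the table; they are not presented as UnveilDNS's actual parameters.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def cache_budget_bytes(system_ram_bytes: int,
                       fraction: float = 0.10,
                       floor: int = 128 * MIB,
                       ceiling: int = 1 * GIB) -> int:
    """Bytes of RAM to dedicate to the record cache (illustrative heuristic)."""
    proposed = int(system_ram_bytes * fraction)
    # The floor keeps a tiny box caching usefully; the ceiling stops the
    # cache from starving the rest of the system on a large one.
    return max(floor, min(proposed, ceiling))
```

With these settings a 2 GiB machine gets roughly 205 MiB, an 8 GiB machine roughly 819 MiB, and anything from 16 GiB up is capped at 1 GiB.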

The non-obvious lesson: bigger isn't endlessly better. DNS traffic is heavily skewed — a small number of names account for the overwhelming majority of queries (a Zipf-like distribution). Once the cache comfortably holds that hot set, doubling its size adds entries that are queried once and never again. Beyond the knee of the curve, RAM is better spent elsewhere. UnveilDNS sizes the cache from detected RAM automatically for exactly this reason — the goal is to land on the knee, not past it.
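The diminishing return can be illustrated with an idealised model: assume query popularity follows a Zipf distribution and that the cache always holds the most popular names. Both assumptions are simplifications (TTL expiry and eviction policy are ignored), and the exponent and name counts below are arbitrary.

```python
def zipf_hit_rate(cache_entries: int, distinct_names: int, exponent: float = 1.0) -> float:
    """Fraction of queries served from cache under an idealised Zipf model.

    The name at popularity rank r is assumed to receive a share of traffic
    proportional to 1 / r**exponent, and the cache holds the top-ranked names.
    """
    total = sum(1.0 / rank ** exponent for rank in range(1, distinct_names + 1))
    kept = min(cache_entries, distinct_names)
    cached = sum(1.0 / rank ** exponent for rank in range(1, kept + 1))
    return cached / total


for size in (1_000, 10_000, 100_000):
    rate = zipf_hit_rate(size, distinct_names=100_000)
    print(f"{size:>7} entries  {rate:.1%} of queries")
```

In this model the first thousand entries cover well over half the traffic, while each later block of a thousand entries adds less than the one before it. That falling marginal gain is the knee the sizing heuristic is aiming for.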

Putting the three levers together

TTL, caching and prefetch aren't three separate features; they're one system. The TTL sets the clock. The cache turns that clock into saved round trips. Prefetch and serve-stale make sure the clock never costs a user a slow lookup — prefetch by refreshing hot records early, serve-stale by holding the line when the authoritative side is down. Negative caching keeps the failures cheap, poisoning resistance keeps the whole thing trustworthy, and right-sized RAM is what lets the cache hold enough to make all of it pay off.

Tuned together, the effect is a resolver where the vast majority of queries are answered in microseconds from local memory, the popular names are always warm, the non-existent ones are remembered cheaply, and an upstream outage degrades into "slightly stale" rather than "down." Tuned badly — short TTLs everywhere, an undersized cache, no prefetch — you get a resolver that's both slow for users and loud against everyone upstream of you. The levers are simple. Pulling them in the right combination is what separates a DNS service that feels instant from one that merely works.

Fast DNS isn't about a faster network path to the answer. It's about arranging things so the answer was already here before you asked.

Faster lookups, fewer round trips

UnveilDNS tunes cache size and prefetch to your hardware automatically.

