DNS tunneling: exfiltrating data through lookups
Most exfiltration channels are loud. A workstation suddenly opening an HTTPS connection to an IP in a country it has never talked to, a 200 MB POST to a pastebin, an outbound SSH session at 3 a.m. — these trip egress rules, proxy logs and NetFlow baselines. DNS is the quiet exception. It is the one protocol almost every network lets leave unconditionally, because if name resolution breaks, everything breaks. Attackers know this, and they have built a whole class of covert channels on top of it: encode your payload into the questions you ask, read the answers that come back, and let the network's own resolvers carry your data out for you.
DNS tunneling is not new — the technique predates most of the firewalls running today — but it remains stubbornly effective precisely because the assumptions that make it work are structural. A recursive resolver must forward a query it has never seen before to an authoritative server. If that authoritative server happens to be controlled by an attacker, every lookup becomes a one-way (or two-way) data pipe. This article walks the mechanism end to end: how data gets in and out, what the throughput ceiling really is, and — the part that matters operationally — the signals that give a tunnel away even when no signature ever fires.
Why DNS makes such a good covert channel
Three properties of DNS, none of them bugs, combine into an almost ideal smuggling medium.
- It is rarely blocked outbound. Port 53 (and increasingly DoH on 443) is treated as load-bearing infrastructure. Blocking it breaks the network, so it survives egress policies that would kill any other protocol.
- It leaves the network by design. A query for a name in a zone you have never cached is forwarded — through your recursive resolver, out to the public internet, to whichever authoritative server owns that zone. The data crosses the perimeter as a normal, expected resolution.
- It is recursive and stateless to the client. The endpoint does not need a direct connection to the attacker's server. It just asks its configured resolver, and the recursion machinery does the routing. From an endpoint-firewall perspective, the host only ever talked to the internal resolver.
Put differently: the attacker registers a domain — say c2.example.net — and points its NS records at a server they control. Any query for <anything>.c2.example.net from inside the victim network will, after recursion, land on that server. The subdomain labels are the upload. The answer records are the download.
How data is encoded into a lookup
A fully-qualified domain name is a sequence of labels separated by dots. RFC 1035 caps each label at 63 octets and the whole name at 255 octets. That is the upstream bandwidth budget for a single query, minus the bytes consumed by the attacker's own zone suffix and any per-message framing.
The uplink: subdomain labels
To send data out, the tunnel client takes a chunk of payload, encodes it into a DNS-safe alphabet, and splits it across labels. DNS names are case-insensitive and restricted in character set, so raw bytes cannot go in directly and must be encoded first. Base32 is common (it survives case folding); some tools use hex for simplicity, or a denser custom alphabet to squeeze more in. A query might look like this:
; uplink — payload encoded into labels, queried as TXT
mfrggzdfmztwq2lk.nbswy3dpeb3w64tmmq.s01.c2.example.net. IN TXT
; data labels: mfrggzdfmztwq2lk.nbswy3dpeb3w64tmmq (encoded data chunks)
; sequence:    s01
; zone:        c2.example.net (attacker zone)
The s01 label is a sequence number so the server can reassemble out-of-order or retransmitted chunks. Because the maximum name length is bounded, each query carries at most on the order of a hundred usable bytes after encoding overhead, and often far less once a tool's own framing is included. Exfiltrating a megabyte therefore means many thousands of queries, which, as we will see, is exactly the signal that betrays the tunnel.
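The encoding step can be sketched in a few lines of Python. This is a minimal illustration, not the wire format of any real tool: it assumes unpadded lowercase base32 and the s<NN> sequence-label layout from the example above, and the names encode_chunk and decode_name are hypothetical. The decoder is the part a defender is more likely to want, for recovering payload from logged query names.

```python
import base64

MAX_LABEL = 63   # RFC 1035 per-label limit
MAX_NAME = 253   # 255 octets on the wire is about 253 characters in dotted text form


def encode_chunk(chunk: bytes, seq: int, zone: str = "c2.example.net") -> str:
    """Encode one payload chunk as <data labels>.s<seq>.<zone> (assumed layout)."""
    # Unpadded, lowercased base32 survives DNS case folding.
    text = base64.b32encode(chunk).decode("ascii").rstrip("=").lower()
    labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    name = ".".join(labels + [f"s{seq:02d}", zone])
    if len(name) > MAX_NAME:
        raise ValueError("chunk too large for a single query name")
    return name


def decode_name(name: str, zone: str = "c2.example.net") -> tuple[int, bytes]:
    """Invert encode_chunk: return (sequence number, payload bytes)."""
    suffix = "." + zone
    stripped = name.rstrip(".").lower()
    if not stripped.endswith(suffix):
        raise ValueError("name is not under the expected zone")
    *data_labels, seq_label = stripped[: -len(suffix)].split(".")
    text = "".join(data_labels).upper()
    text += "=" * (-len(text) % 8)  # restore the padding b32decode expects
    return int(seq_label[1:]), base64.b32decode(text)
```

Joining the data labels before decoding means a chunk that spills across several labels is reassembled transparently; the sequence number is what lets a receiver order chunks that arrive late or twice.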
The downlink: answer records
The return path rides in the answer section. The attacker's authoritative server answers each query with data encoded into a resource record. The record type is chosen for capacity and for how little attention it draws:
| Record type | Why it's used for the downlink | Trade-off |
|---|---|---|
| TXT | Designed to carry arbitrary text; large payload per answer. | TXT to a random subdomain is conspicuous; humans rarely query it. |
| NULL | Opaque binary blob, no formatting constraints; historically high capacity. | Defined only as experimental in RFC 1035 and almost never seen legitimately, so it is a screaming anomaly. |
| CNAME | An encoded hostname looks more "normal" than a TXT blob. | Lower capacity; the alias still has to be a valid name. |
| A / AAAA | Blend in with ordinary traffic; data hidden in the address bytes. | Only 4 (or 16) bytes per record, so throughput is very low. |
EDNS0 (RFC 6891) widens the channel further by allowing UDP responses larger than the legacy 512-byte limit, so a single TXT answer can return a few hundred bytes of downlink. The whole exchange is a request/response loop: the client asks, the server answers, both sides advance their sequence counters, and a session protocol is reconstructed on top of stateless lookups.
The public tooling
You do not have to imagine this. Several mature, openly available tools implement DNS tunneling, originally for legitimate-ish purposes like getting connectivity through captive portals that leak DNS before you have paid for Wi-Fi.
- iodine tunnels IPv4 over DNS. It probes for the highest-capacity record type and the largest fragment size the path will tolerate, then runs a point-to-point link over the lookups. It is the canonical "DNS as an IP transport" tool.
- dnscat2 is built for command-and-control rather than raw IP transport. It offers an encrypted, session-oriented channel with multiple concurrent streams, explicitly designed to look like ordinary DNS while carrying a shell.
That second category is the one defenders care about. The same mechanism that bypasses a hotel paywall also exfiltrates a credentials database or maintains a beacon to an external operator — and because the tooling is public and battle-tested, the barrier to using it is effectively zero.
How fast can it actually go?
Slowly, and that is both the attacker's constraint and the defender's tell. Work the arithmetic from the protocol limits:
- A query name maxes out at 255 octets. Subtract the attacker's zone suffix, sequence framing, and base32 overhead (which inflates data by 60%), and you are left with roughly 100–150 bytes of usable uplink per query.
- A downlink TXT answer with EDNS0 might carry a few hundred bytes.
- Every byte is wrapped in a full request/response round trip, gated by recursion latency and any rate limiting along the path.
In practice a careful tunnel moves on the order of a few kilobytes per second; a noisy one can push higher but lights up every counter on the resolver. To move serious volume an attacker must either accept a long, slow drip or crank the query rate — and a host emitting thousands of DNS queries per minute to one zone is not subtle.
The throughput ceiling is the attacker's dilemma in one sentence: go slow enough to hide, and exfiltration takes hours; go fast enough to be useful, and the query rate becomes a flare.
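The arithmetic can be worked through explicitly. The sketch below is a best-case estimate under stated assumptions: base32 encoding, labels packed to the 63-character limit, a four-character framing cost (a label such as s01 plus its dot), and no tool-specific headers. The function names are hypothetical, and real tools will carry less per query than this upper bound.

```python
import math

MAX_NAME_CHARS = 253  # 255 octets on the wire is about 253 characters in dotted form
MAX_LABEL = 63        # RFC 1035 per-label limit


def usable_bytes_per_query(zone: str, framing_chars: int = 4) -> int:
    """Upper bound on raw payload bytes in one base32-encoded query name."""
    # Characters left for data labels (and the dots between them) once the
    # zone suffix, its leading dot, and the framing label are accounted for.
    budget = MAX_NAME_CHARS - (len(zone) + 1) - framing_chars
    # Treat each label as costing its length plus one dot; the final label
    # has no trailing dot, hence the +1 on the budget.
    full, rest = divmod(budget + 1, MAX_LABEL + 1)
    data_chars = full * MAX_LABEL + max(rest - 1, 0)
    # Base32 packs 5 bytes into every 8 characters.
    return data_chars * 5 // 8


def queries_needed(payload_bytes: int, zone: str) -> int:
    """Minimum number of queries to move payload_bytes upstream."""
    return math.ceil(payload_bytes / usable_bytes_per_query(zone))


def transfer_seconds(payload_bytes: int, zone: str, queries_per_second: float) -> float:
    """Best-case transfer time at a fixed query rate, ignoring retransmits."""
    return queries_needed(payload_bytes, zone) / queries_per_second


if __name__ == "__main__":
    zone = "c2.example.net"
    print(usable_bytes_per_query(zone))          # bytes per query, best case
    print(queries_needed(1_000_000, zone))       # queries for one megabyte
    print(transfer_seconds(1_000_000, zone, 10)) # seconds at 10 queries/s
```

Varying queries_per_second makes the dilemma concrete: a rate low enough to blend in stretches the transfer out, while a rate that finishes quickly is exactly the per-zone volume spike a resolver can count.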
The detection signals
No single indicator proves a tunnel. The reliable approach is to score several behavioral signals together, because the encoding and the round-trip structure leave fingerprints that ordinary resolution does not. Watch for these, per client and per destination zone:
| Signal | Why a tunnel produces it | Why benign traffic usually doesn't |
|---|---|---|
| High query volume to one zone | Every payload chunk is a separate query; bulk transfer means thousands of lookups under a single registered domain. | Real apps reuse a handful of names and benefit from caching; they don't generate unique queries by the thousand. |
| Long, high-entropy labels | Encoded payload is near-random and packs labels close to the 63-octet limit. | Human and app hostnames are short, pronounceable, low-entropy, and heavily repeated. |
| Many unique subdomains, no cache reuse | Each chunk is a fresh name, so cache hit rate collapses toward zero for that zone. | Legitimate zones see the same names over and over; cache absorbs them. |
| Unusual record types | Heavy TXT, or any NULL, to a non-mail, non-verification zone. | End-user clients rarely ask for TXT directly and essentially never for NULL. |
| Skewed uplink/downlink ratio | Long query names paired with small answers (or vice versa) reflect a data channel, not a lookup. | Normal lookups have short questions and proportionate answers. |
| Rare or freshly-registered TLDs/zones | The attacker's C2 domain is often young, obscure, and otherwise unvisited by the network. | Business traffic clusters around established, frequently-seen domains. |
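Several of these signals fall out of a simple per-client, per-zone aggregation over resolver logs. The following is a rough sketch, assuming a log already parsed into (client, query name, query type) tuples; ZoneStats, registered_zone and summarize are hypothetical names, and taking the last two labels as the zone is a deliberate simplification that a real deployment would replace with a Public Suffix List lookup.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ZoneStats:
    queries: int = 0
    unique_names: set = field(default_factory=set)
    qtypes: dict = field(default_factory=lambda: defaultdict(int))
    longest_label: int = 0

    @property
    def unique_ratio(self) -> float:
        """Share of queries that asked for a never-before-seen name (1.0 = no reuse)."""
        return len(self.unique_names) / self.queries if self.queries else 0.0


def registered_zone(qname: str, depth: int = 2) -> str:
    """Naive parent zone: the last `depth` labels of the name (an assumption)."""
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[-depth:])


def summarize(log):
    """Aggregate (client, qname, qtype) tuples into per-(client, zone) statistics."""
    stats = defaultdict(ZoneStats)
    for client, qname, qtype in log:
        name = qname.rstrip(".").lower()
        entry = stats[(client, registered_zone(name))]
        entry.queries += 1
        entry.unique_names.add(name)
        entry.qtypes[qtype] += 1
        entry.longest_label = max(
            entry.longest_label, max(len(part) for part in name.split("."))
        )
    return stats
```

A unique_ratio near 1.0 combined with a high query count and long labels is the pattern the table describes; a benign zone typically shows the same few names repeated, which pulls the ratio toward zero.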
Entropy in one number
Label randomness is measurable. Shannon entropy over the characters of a label separates encoded payload from real hostnames cleanly — a base32 chunk lands near the theoretical maximum for its alphabet, while mail or cdn sit far below it.
# conceptual: per-label Shannon entropy
H = -Σ p(c) · log2 p(c) # over characters c in the label
# "mail" → 2.0 bits/char (low: short, only four distinct characters)
# "k7x2m9q4..." → ~4.5 bits/char (near-random, suspicious)
Entropy alone over-flags — long random-looking labels also appear in CDN and cloud hostnames. That is why it is one input among many, weighted alongside query volume, record type, and zone novelty rather than used as a standalone trigger.
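The formula translates directly into a small Python function. This is a minimal sketch of per-label Shannon entropy over character frequencies; the name shannon_entropy is only illustrative.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy of a label in bits per character."""
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    # H = -sum(p * log2(p)) over the observed character frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    for label in ("mail", "cdn", "mfrggzdfmztwq2lk"):
        print(label, round(shannon_entropy(label), 2))
```

One caveat worth keeping in mind when thresholding: a label of n characters can never score above log2(n) bits per character, so short labels are capped low regardless of content. Entropy is therefore most informative when read together with label length.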
Why signature lists alone miss it
The instinct is to block the bad domains. It fails here for a basic reason: in a tunnel, the malicious domain is the parent zone, and the part that varies — the part a blocklist would need to match — is freshly generated for every single query. There is no fixed string to list.
An attacker can register a brand-new zone an hour before the operation. It appears on no feed because nothing has reported it yet; reputation systems have no history to score. The payload labels are unique per query by construction, so even a fuzzy match has nothing stable to lock onto. Curated threat-intelligence feeds are genuinely useful for catching known infrastructure, but a competent tunneling operator simply uses infrastructure that is not yet known. Detection has to come from behavior — what the traffic is doing — not from a list of what is already flagged.
Mitigations that actually move the needle
Defense layers, from the structural to the analytic.
- Force all DNS through a managed resolver. The single highest-value control. Block outbound port 53 and unsanctioned DoH/DoT from every host except your own resolver, so no endpoint can talk directly to an arbitrary authoritative server or smuggle resolution over an encrypted side channel. If every query has to transit one chokepoint, that chokepoint can see — and score — every query.
- Apply per-client, per-zone anomaly detection at that resolver. Baseline normal query rates, label lengths, entropy, record-type mix and cache behavior, then flag the outliers. The signals in the table above combine into a confidence score; a host suddenly emitting thousands of high-entropy TXT queries to one young zone should surface immediately.
- Rate-limit and cap query types per client. Even a perfectly disguised tunnel needs throughput; rate limits turn a slow drip into a slower one and make the volume anomaly sharper. Constraining who can issue bulk TXT/NULL queries removes the high-capacity record types from casual abuse.
- Watch the long tail of zones. Newly-seen, rarely-visited, or obscure-TLD destinations deserve more scrutiny than the established domains your network talks to all day. Novelty is a signal — combine it with volume and entropy rather than acting on it alone.
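Combining the signals into a single score might look something like the sketch below. Everything numeric here is a placeholder assumption rather than a recommended setting: the weights, the saturation points (1000 queries per minute, 30 days of zone history, the 2.5 to 4.5 bits-per-character entropy band) and the ZoneFeatures field names are all hypothetical and would need tuning against a network's own baseline.

```python
from dataclasses import dataclass


@dataclass
class ZoneFeatures:
    queries_per_minute: float    # from one client to one zone
    unique_name_ratio: float     # unique names / total queries, 0..1
    mean_label_entropy: float    # bits per character
    txt_null_fraction: float     # share of TXT or NULL queries, 0..1
    zone_first_seen_days: float  # days since the resolver first saw the zone


# Hypothetical weights; they sum to 1 so the score stays in 0..1.
WEIGHTS = {
    "volume": 0.30,
    "uniqueness": 0.25,
    "entropy": 0.20,
    "qtype": 0.15,
    "novelty": 0.10,
}


def clamp(value: float) -> float:
    """Limit a value to the range 0..1."""
    return max(0.0, min(1.0, value))


def tunnel_score(f: ZoneFeatures) -> float:
    """Weighted combination of normalized signals; higher means more tunnel-like."""
    parts = {
        "volume": clamp(f.queries_per_minute / 1000.0),
        "uniqueness": clamp(f.unique_name_ratio),
        "entropy": clamp((f.mean_label_entropy - 2.5) / 2.0),
        "qtype": clamp(f.txt_null_fraction),
        "novelty": clamp(1.0 - f.zone_first_seen_days / 30.0),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())
```

The point of the weighted form is that no single input can push the score to its maximum by itself: a high-entropy CDN hostname on an old, heavily cached zone scores low, while volume, uniqueness, entropy and novelty arriving together score high.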
None of these is a silver bullet in isolation. Egress control without analytics blinds you to tunneling that uses your own resolver as a relay; analytics without egress control lets hosts bypass the resolver entirely over DoH. Together they close the loop: every query passes through one place, and that place understands what normal looks like well enough to notice when a workstation starts speaking in long, entropic, high-volume riddles to a domain nobody has ever heard of.
DNS tunneling survives not because it is clever but because DNS is trusted. The fix is not to distrust DNS — it is to watch it, at the one point every query has to cross, with enough context to tell a lookup from a leak.