DNS log retention done right: a GDPR-friendly policy

Posted 2026-06-09 · 7 min read · compliance, privacy

A DNS query log looks innocuous. Each line is a timestamp, a client IP, a domain name, a record type, and a verdict. No payloads, no credentials, no message bodies — just the question "where is this name?" asked thousands of times a second. Operators reach for the easy mental model: it's metadata, it's plumbing, keep it forever, it might be useful one day. That model is wrong, and under the GDPR it is the kind of wrong that gets expensive.

The uncomfortable truth is that a DNS log is both personal data and, in aggregate, unusually revealing personal data. A client IP identifies a person or a household. The sequence of domains they resolve is a near-complete map of what they read, who they bank with, which dating app they opened at 2am, and which medical or religious site they visited last Tuesday. Treat that as plumbing and you are processing special-category-adjacent information at scale with no policy. This article lays out a retention design that keeps DNS logs operationally useful — incident response, threat hunting, troubleshooting — while staying defensible against an Article 5 audit.

The short version. Keep raw, per-query logs for a short window measured in days, roll them into anonymous or pseudonymous daily aggregates for long-term trends, then delete the raw rows. Lock down access, encrypt at rest, write the policy down, and enforce the deletion automatically rather than hoping someone remembers.

Why a query log is personal — and sensitive

Start with the IP address. The Court of Justice of the EU settled this in Breyer (C-582/14): a dynamic IP is personal data in the hands of a party that has, or can lawfully obtain, the means to link it to a person. A resolver operator almost always has those means — DHCP leases, RADIUS sessions, a subscriber database, or simply the fact that the IP belongs to a known device on a managed network. So the client IP column alone makes every row personal data and brings the whole log inside the GDPR.

Then look at the domain column. DNS is a behavioural sensor. You do not need the page content to infer the subject; the name is enough. A handful of resolutions to a cancer-support forum, a fertility clinic, an addiction service, a specific religious community's site, a trade union, or an opposition party's domain lets you infer data that Article 9 treats as special category — health, religious belief, political opinion, sexual orientation. You are not deliberately collecting it. You are collecting it as a side effect of resolving names, which is precisely why the storage decision matters so much.

The data you never wrote to disk cannot leak, cannot be subpoenaed in bulk, and cannot be re-identified by an attacker who breaches your store. Minimisation is not a compliance chore; it is the cheapest security control you own.

Lawful basis: legitimate interest, not consent

You need an Article 6 lawful basis for the processing. For a security-driven resolver, the right basis is almost always legitimate interest (Article 6(1)(f)), not consent. Consent is the wrong tool here: it must be freely given and revocable, and a resolver that stops logging the moment a single user objects cannot do its security job. Network defence is a textbook legitimate interest — the GDPR's own Recital 49 explicitly recognises processing for network and information security as a legitimate interest of the controller.

Legitimate interest is not a free pass. It requires a documented three-part balancing test, often called an LIA (Legitimate Interest Assessment):

Test	Question	Typical answer for a security resolver
Purpose	Is the interest real and specific?	Yes — detecting malware C2, phishing, data exfiltration, and abuse on the network you operate.
Necessity	Could you achieve it with less data or a shorter window?	Granular logs are necessary for incident response; long-term trends do not need raw rows.
Balancing	Do the individual's rights override your interest?	Mitigated by a short retention window, aggregation, access control, and transparency — see below.

The retention design in this article is the balancing test made concrete. A short raw window plus aggregation is the single most powerful argument you have that your interest does not steamroll the data subject's rights. Write the LIA down before you are asked for it, and revisit it when your processing changes.

Article 5 principles, applied to a query log

Article 5 lists the principles every controller must satisfy. Map them onto DNS logging and the policy almost writes itself.

Purpose limitation (5(1)(b))

Decide, in writing, what the logs are for: security detection and response, abuse handling, and service troubleshooting. That is the purpose. Using the same logs to profile users for marketing, to rank "productivity", or to build a behavioural dossier is a new purpose and needs its own basis. Keep the security purpose narrow and you keep the justification strong.

Data minimisation (5(1)(c))

Log only the fields the purpose needs. A security verdict needs the domain, the time, the record type, and enough client identity to act on an incident. It does not need the full resolved answer set stored forever, and it rarely needs the client IP retained in clear once the immediate response window has passed. Drop columns you cannot justify.
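One way to make "drop columns you cannot justify" enforceable is an allowlist applied before anything reaches disk. The following is a minimal sketch, assuming a hypothetical resolver that emits each query as a dictionary; the field names are illustrative and not taken from any particular product.

```python
# Hypothetical field names; adjust to whatever your resolver actually emits.
ALLOWED_FIELDS = ("timestamp", "client_ip", "domain", "record_type", "verdict")


def minimise(event: dict) -> dict:
    """Keep only the allowlisted fields; everything else is never stored."""
    return {name: event[name] for name in ALLOWED_FIELDS if name in event}


raw_event = {
    "timestamp": "2026-06-08T02:14:00Z",
    "client_ip": "198.51.100.23",
    "domain": "example-bank.com",
    "record_type": "A",
    "verdict": "allowed",
    "answers": ["203.0.113.10"],    # full answer set: not needed for the verdict
    "device_label": "laptop-7",     # extra identity: not justified by the purpose
}

stored = minimise(raw_event)
```

An allowlist fails closed: a field added upstream later is dropped by default until someone justifies it and adds it to the list.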

Storage limitation (5(1)(e))

This is the heart of the policy: personal data must be kept "no longer than is necessary". "Necessary" is defined by your purpose, not by disk capacity. For incident response, necessary means days, not years. For capacity planning and trend reporting, the need is met by aggregates that no longer identify anyone and so are no longer personal data at all.

Integrity and confidentiality (5(1)(f))

Whatever you do keep must be protected — encrypted at rest, access-controlled, and audited. A retention window is worthless if the store is world-readable.

A concrete retention pattern

Here is a pattern that satisfies storage limitation while keeping the logs genuinely useful. Think of it as three tiers with automatic promotion and deletion between them.

Tier	Contains	Retention	Serves
Raw / granular	Per-query rows: time, client, domain, type, verdict	Short — typically 7–30 days	Incident response, threat hunting, troubleshooting
Daily aggregates	Counts per domain, per category, per verdict, per day — no per-query client linkage	Long — months to a couple of years	Trends, capacity planning, reporting
Anonymous metrics	Totals: queries, block rate, top categories	Indefinite (not personal data)	Dashboards, historical comparison

The mechanism is a scheduled roll-up. Each day, the previous day's raw rows are summarised into aggregate counts, and once that summary is committed the raw rows for the now-expired window are dropped. The aggregate carries the analytical value forward; the personal data does not survive past the window you set.

Day N      raw rows written as queries resolve
Day N+1    roll-up job: raw(Day N) → daily aggregate(Day N)
Day N+W    raw rows for Day N deleted (W = your retention window)
forever    daily aggregate retained (anonymous or pseudonymous)
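The timeline above can be sketched as a single scheduled job. This is an illustration under stated assumptions, not a reference implementation: it uses SQLite for brevity, and the table names, column names, and the 7-day value of W are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta

RETENTION_DAYS = 7  # assumed raw window W; choose and record your own


def roll_up_and_expire(db: sqlite3.Connection, today: date) -> None:
    """Summarise yesterday's raw rows, then delete raw rows past the window."""
    yesterday = (today - timedelta(days=1)).isoformat()
    cutoff = (today - timedelta(days=RETENTION_DAYS)).isoformat()
    with db:  # single transaction: both statements take effect, or neither does
        db.execute(
            "INSERT OR REPLACE INTO daily_aggregate (day, domain, verdict, queries) "
            "SELECT day, domain, verdict, COUNT(*) FROM raw_queries "
            "WHERE day = ? GROUP BY day, domain, verdict",
            (yesterday,),
        )
        # ISO dates sort lexicographically, so a string comparison is safe here.
        db.execute("DELETE FROM raw_queries WHERE day <= ?", (cutoff,))


# Hypothetical schema: the aggregate has no client column by design.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE raw_queries (day TEXT, client_ip TEXT, domain TEXT, verdict TEXT);
    CREATE TABLE daily_aggregate (
        day TEXT, domain TEXT, verdict TEXT, queries INTEGER,
        PRIMARY KEY (day, domain, verdict)
    );
    """
)
db.executemany(
    "INSERT INTO raw_queries VALUES (?, ?, ?, ?)",
    [
        ("2026-06-01", "198.51.100.23", "example-bank.com", "allowed"),
        ("2026-06-08", "198.51.100.23", "example-bank.com", "allowed"),
        ("2026-06-08", "198.51.100.24", "example-bank.com", "allowed"),
    ],
)
roll_up_and_expire(db, today=date(2026, 6, 9))
```

In this demo the 2026-06-01 row is assumed to have been summarised by an earlier run, so the job only has to delete it. Running the summary and the delete in one transaction means a failed roll-up never leaves you with raw rows gone and no aggregate to show for them.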

The key design choice is that the aggregate must not reconstruct the individual. "Domain example-bank.com was resolved 4,210 times on 2026-06-08" is a trend metric. "Client 198.51.100.23 resolved example-bank.com at 02:14" is personal data. The roll-up should produce the former and discard the latter. Where you need a longer client-level history for a specific security reason, pseudonymise — replace the IP with a keyed hash — and document why, but understand that pseudonymous data is still personal data and still subject to the rules.
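A keyed hash can be sketched with HMAC-SHA256 from the Python standard library. The function name and the key shown here are placeholders for illustration; how the key is stored and rotated is an assumption you would settle in your own design.

```python
import hashlib
import hmac


def pseudonymise_ip(client_ip: str, key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of the IP as a hex string."""
    return hmac.new(key, client_ip.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder key for illustration only; in practice load it from a secrets store
# that is kept separate from the log data.
key = b"example-only-key-do-not-use"
token = pseudonymise_ip("198.51.100.23", key)
```

The key is what matters. A plain unkeyed hash of an IPv4 address can be reversed by hashing all 2^32 possible addresses, so it offers little protection. With a keyed hash, anyone holding the key can still re-link a token to a candidate IP, which is exactly why the result stays personal data.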

Tune the window to your risk, not your habit. A high-security network investigating slow, stealthy intrusions may justify 30 days of raw logs. A general-purpose resolver may only need 7. The number is a decision you make and record — what you cannot defend is "we kept everything because deleting it was effort".

Access control and encryption at rest

Retention limits how long data exists; access control limits who can touch it while it does. The two together are what Article 32 ("security of processing") expects. For a DNS log store, the baseline is straightforward and non-negotiable:

Encryption at rest. Full-disk or volume encryption on the log store so a stolen disk or a misplaced backup is not a breach. Encrypt backups with the same rigour as the live store — old backups are where forgotten personal data goes to haunt you.
Least-privilege access. Querying raw logs should be a privileged operation reserved for security and operations staff with a reason, not a default capability of every dashboard user. The people who read trend charts do not need row-level access.
Audit the access. Record who queried the raw logs and when. Access logs are themselves sensitive, but they are the evidence that your controls are real.
Network isolation. The log store should not be reachable from the public internet, and the API in front of it should authenticate every call.
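The least-privilege and audit items in the list above can be combined in one gate in front of the raw store. The sketch below is illustrative only: the role names, the in-memory audit list, and the function signature are assumptions, and a real deployment would use its own identity system and an append-only audit store.

```python
from datetime import datetime, timezone

RAW_LOG_ROLES = {"security", "operations"}  # assumed role names
audit_trail: list[dict] = []  # stand-in for an append-only audit store


def query_raw_logs(user: str, role: str, reason: str,
                   rows: list[dict], domain: str) -> list[dict]:
    """Return raw rows for a domain, only for privileged roles with a stated reason."""
    allowed = role in RAW_LOG_ROLES and bool(reason.strip())
    # Record every attempt, including refused ones, before returning anything.
    audit_trail.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "reason": reason,
        "domain": domain,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{user} may not query raw DNS logs")
    return [row for row in rows if row["domain"] == domain]


rows = [
    {"client_ip": "198.51.100.23", "domain": "c2.example"},
    {"client_ip": "198.51.100.24", "domain": "example.org"},
]
hits = query_raw_logs("alice", "security", "incident triage", rows, "c2.example")

denied = False
try:
    query_raw_logs("bob", "dashboard", "curious", rows, "c2.example")
except PermissionError:
    denied = True
```

Writing the audit record before the permission check means refused attempts are captured too, which is often the more interesting evidence.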

None of this is exotic. It is the same hygiene you would apply to any database of customer data — the point is to recognise that a DNS log is a database of customer data and treat it accordingly, rather than as a debug artefact left lying around with open permissions.

Data-subject rights, and how aggregation makes them survivable

Chapter III of the GDPR gives individuals rights over their data: access (Article 15), erasure (Article 17), objection (Article 21), and others. For a per-query log retained indefinitely, honouring these is a nightmare. A subject access request would force you to extract every resolution tied to one person across years; an erasure request would force you to surgically delete rows from live and backup stores; an objection would put your whole logging operation in question.

Aggregation quietly dissolves most of this. Once raw rows have aged out and only anonymous aggregates remain, there is no longer personal data to access, rectify, or erase — the right simply does not bite on data that no longer identifies anyone. Within the short raw window, the requests are tractable precisely because the dataset is small and time-bounded.

Right	On indefinite raw logs	On short window + aggregates
Access (Art. 15)	Years of rows to extract per person	Days of rows; aggregates fall outside scope
Erasure (Art. 17)	Surgical deletes across live + backups	Window expiry deletes it for you; aggregates not personal
Objection (Art. 21)	Threatens the whole logging programme	Balancing already favours you; window limits exposure

There is a real tension to acknowledge: erasure is not absolute. Article 17(3) preserves data needed to comply with a legal obligation or to establish, exercise, or defend legal claims. Separately, where an erasure request rests on an Article 21 objection, Article 17(1)(c) applies only if there are no overriding legitimate grounds, and the Recital 49 security interest can supply such grounds for a specific record during an active investigation. The clean answer is to scope any such retention narrowly: keep the one record the incident needs, under a documented exception, and let everything else expire on schedule.
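A narrowly scoped exception can be expressed as an explicit hold list that the expiry job consults. This is a minimal sketch with hypothetical row identifiers and a made-up incident reference; the point is only that the exemption names individual records and carries its reason with it.

```python
from datetime import date, timedelta


def expire_raw_rows(rows: list[dict], holds: dict[int, str],
                    today: date, retention_days: int) -> list[dict]:
    """Return the rows that survive expiry: inside the window, or under a documented hold."""
    cutoff = today - timedelta(days=retention_days)
    return [row for row in rows if row["day"] > cutoff or row["id"] in holds]


rows = [
    {"id": 1, "day": date(2026, 6, 1), "domain": "example-bank.com"},
    {"id": 2, "day": date(2026, 6, 1), "domain": "c2.example"},
    {"id": 3, "day": date(2026, 6, 8), "domain": "example.org"},
]
# Each hold names one record and the reason it is exempt (hypothetical reference).
holds = {2: "INC-0001: active investigation, review at incident close"}

kept = expire_raw_rows(rows, holds, today=date(2026, 6, 9), retention_days=7)
```

Row 1 is past the window and expires as normal, row 2 survives only because it is held, and row 3 is still inside the window. Removing the entry from `holds` when the incident closes lets the next run delete the record without any special handling.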

Document it — the policy is the deliverable

A retention design that lives only in a cron job and a senior engineer's head is not a compliance posture; it is an accident waiting for an audit. The GDPR's accountability principle (Article 5(2)) requires you to be able to demonstrate compliance, which means writing things down. At a minimum, keep:

A retention schedule stating the raw window, the aggregate retention, and the deletion mechanism — with the numbers, not "as needed".
The Legitimate Interest Assessment recording your purpose, necessity, and balancing reasoning.
A record of processing activities (Article 30) listing the DNS log as a processing activity, its purpose, categories of data, retention, and recipients.
A privacy notice telling users that DNS queries are logged for security, how long, and how to exercise their rights — transparency is itself part of the balancing test.
A DPIA (Article 35) if the processing is large-scale or systematic, which ISP-scale or whole-network DNS logging usually is.

The reassuring part is that good engineering and good compliance point the same way here. The retention pattern that protects users — short raw window, aggregate the rest, encrypt, lock down, automate the deletion — is also the one that makes the paperwork short and the audit boring. You are not trading utility for compliance; you are getting both by refusing to hoard data you were never going to use.

The best DNS log is one you can fully justify keeping. Retain what your security purpose genuinely needs, summarise the rest into numbers that identify no one, delete on a clock you do not have to think about — and let the policy, not your disk, decide how long a person's browsing history lives on your server.
