Threat intelligence 101: how domain reputation actually works
Open any security dashboard and you'll see a domain flagged "malicious" or an IP marked "high risk," usually with a tidy little number next to it. That number feels authoritative. It is also, almost always, a compressed summary of a dozen noisy, half-correlated signals that disagree with each other — flattened into a single value so a firewall, a mail filter, or a DNS resolver can make a yes/no decision in under a millisecond. The number is real. The certainty it implies is mostly an illusion of presentation.
"Reputation" is one of the most overloaded words in security, and the confusion costs real money: blocked customers, missed phishing, alert fatigue, and the occasional embarrassing incident where a perfectly legitimate site gets nuked because it shared an IP with something nasty. This article unpacks what reputation actually is — what signals feed it, how scores get combined and aged, why false positives are structural rather than accidental, and why a reputation score should be one vote in a decision, never the whole election.
Reputation is a score, not a verdict
The first mental model to discard is the idea that a domain is "good" or "bad." A domain is an identifier; its reputation is an estimate of the probability that traffic to it is harmful, conditioned on everything we've observed about it. That estimate moves. A domain registered an hour ago has almost no behavioral history, so its reputation is dominated by weak priors (its registrar, its TLD, its hosting). A domain with ten years of clean traffic that suddenly starts serving malware has a stale-good reputation that lags reality by hours or days.
Because reputation is a probability, it has two properties people routinely forget. It is continuous — there's a meaningful difference between 0.55 and 0.95 — and it is uncertain — a score of 0.7 derived from one weak signal is not the same as 0.7 derived from five strong, independent ones. Collapsing both into "blocked" throws away exactly the information you need to make a good decision.
A reputation score answers "how suspicious does the evidence make this look?" — not "is this malicious?" Those are different questions, and conflating them is the root of most reputation-related outages.
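A minimal sketch of that distinction, assuming a hypothetical `Reputation` record (the field names and the three-signal cutoff are illustrative, not taken from any particular product). Two domains can share a score of 0.7 and still be very different bets:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reputation:
    score: float        # estimated probability of harm, 0.0 to 1.0
    n_independent: int  # independent signals behind the estimate (hypothetical field)

    def describe(self) -> str:
        # The cutoff of 3 is a placeholder, not a recommended value.
        strength = "well supported" if self.n_independent >= 3 else "thinly supported"
        return f"{self.score:.2f} ({strength}, {self.n_independent} signal(s))"

thin = Reputation(score=0.7, n_independent=1)
solid = Reputation(score=0.7, n_independent=5)
print(thin.describe())
print(solid.describe())
```

Keeping the evidence count next to the score is what lets a later stage treat the two cases differently instead of seeing only "0.7".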
Where the signals come from
No single observation tells you a domain is dangerous. Reputation is an exercise in correlation: stacking many independently-collected, individually-weak signals until a pattern emerges. The sources fall into a handful of broad categories, each with its own strengths and blind spots.
| Signal category | What it observes | Strength | Blind spot |
|---|---|---|---|
| Passive DNS | Historical domain↔IP mappings seen by resolvers over time | Reveals infrastructure reuse, fast-flux, sudden re-pointing | Slow to populate for low-traffic domains; privacy-sensitive |
| Spam & abuse telemetry | Domains/IPs reported across email, web, and network abuse channels | High volume, near-real-time for active campaigns | Reporting bias; shared infrastructure causes collateral hits |
| Honeypots & sinkholes | Hosts that lure scanners, malware callbacks, and crawlers | Direct evidence of malicious behavior, low false-positive rate | Only sees attackers who reach the trap |
| Certificate Transparency logs | Public append-only logs of publicly trusted TLS certificates | Catches lookalike/typosquat domains at issuance, often before they go live | Issuance ≠ malice; floods of benign certs to filter |
| Registration data (WHOIS/RDAP) | Age, registrar, contact privacy, bulk-registration patterns | Newly-registered + privacy-shielded + cheap TLD = strong prior | Redaction and resellers obscure ownership |
| Hosting / ASN reputation | The autonomous system and provider hosting the content | "Bulletproof" hosts and abuse-heavy ASNs raise the baseline | Major clouds and CDNs host everything, good and bad |
The reason analysts cross-reference so many categories is that each one is easy to evade in isolation but hard to evade jointly. An attacker can register a fresh domain (defeating age-based heuristics), but the certificate they request lands in public CT logs at or shortly after issuance, and tools like crt.sh make that searchable. They can rotate IPs to dodge a static IP blocklist, but passive DNS records the rotation pattern. They can host on a reputable cloud to inherit a clean ASN, but their callback traffic still trips a honeypot. Reputation works because attackers have to win on every axis simultaneously, and defenders only have to catch them on one.
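As a rough illustration of the passive-DNS point, the sketch below counts the distinct IPs seen per domain in a batch of observations. The record format, the documentation-range IPs, and the threshold of three are assumptions for the example; real passive-DNS data carries timestamps and far more volume.

```python
from collections import defaultdict

# Hypothetical (domain, ip) observations, using documentation IP ranges.
OBSERVATIONS = [
    ("shop.example", "192.0.2.10"),
    ("shop.example", "192.0.2.10"),
    ("flux.example", "198.51.100.1"),
    ("flux.example", "198.51.100.2"),
    ("flux.example", "203.0.113.7"),
    ("flux.example", "203.0.113.8"),
]

def distinct_ip_counts(observations):
    """Map each domain to the number of distinct IPs it resolved to."""
    ips = defaultdict(set)
    for domain, ip in observations:
        ips[domain].add(ip)
    return {domain: len(seen) for domain, seen in ips.items()}

def high_churn(observations, max_ips=3):
    """Domains seen on more than max_ips addresses (placeholder threshold)."""
    counts = distinct_ip_counts(observations)
    return sorted(d for d, n in counts.items() if n > max_ips)

print(high_churn(OBSERVATIONS))
```

Churn on its own is a weak signal, since CDN-fronted domains also resolve to many addresses. That is why it is one input among several rather than a trigger.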
A note on what these feeds are not
Threat-intelligence feeds are observations, not oracles. A domain appearing on an abuse list means someone reported behavior that looked abusive from where they were standing. That's valuable, but it's testimony, not proof — and testimony has bias, latency, and the occasional outright error baked in. Good reputation systems treat each feed as a witness with a known reliability, not as ground truth.
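One way to make "a witness with a known reliability" concrete is a naive-Bayes style update in log-odds space. This is a sketch under strong assumptions: the feeds are treated as independent, and the hit rates and false-alarm rates below are invented for illustration rather than measured from any real feed.

```python
import math

def posterior(prior, feeds):
    """Update a prior probability of malice with feed testimony.

    feeds: iterable of (true_positive_rate, false_positive_rate, listed),
    where the two rates describe how reliable that feed has been.
    Assumes the feeds are independent, which real feeds often are not.
    """
    log_odds = math.log(prior / (1 - prior))
    for tpr, fpr, listed in feeds:
        if listed:
            log_odds += math.log(tpr / fpr)              # listing raises suspicion
        else:
            log_odds += math.log((1 - tpr) / (1 - fpr))  # silence lowers it a little
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical numbers: 1% prior, then one feed lists the domain, then two.
one_witness = posterior(0.01, [(0.6, 0.02, True)])
two_witnesses = posterior(0.01, [(0.6, 0.02, True), (0.5, 0.05, True)])
print(round(one_witness, 3), round(two_witnesses, 3))
```

A single listing from a fairly reliable feed moves a 1% prior to roughly 23%, which is suspicious but far from proof. A second, genuinely independent listing pushes it to about 75%.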
How scores get combined
Once you have a pile of signals, you have to fuse them into something decidable. The naive approach (block if any feed says "bad") maximizes recall and torches precision: you'll catch everything any feed has ever flagged, along with a painful number of innocent domains. Real systems weight and combine.
A simplified weighted model looks like this: each signal contributes points scaled by how predictive that signal type has historically been, and the total is clamped into a bounded range. The illustrative pseudocode below uses placeholder weights — actual weights are tuned against labeled data and are revisited constantly.
score = 0
score += w_pdns * passive_dns_anomaly # infrastructure churn
score += w_abuse * abuse_report_density # how many, how recent
score += w_honeypot * honeypot_hits # direct callbacks
score += w_ct * lookalike_similarity # CT-log typosquat match
score += w_age * registration_recency # newer = riskier prior
score += w_asn * hosting_asn_badness # provider baseline
score = max(0, min(score, 100)) # clamp into [0, 100]
confidence = independent_signals_agreeing / total_signals_consulted # tracked separately from score
Two things matter more than the exact formula. First, independence: ten feeds that all re-publish the same upstream source are one signal wearing ten coats — they should not multiply confidence. Second, the confidence term is separate from the score. A high score backed by one lonely signal and a high score backed by five mutually-independent signals are very different bets, and the system needs to remember which is which all the way to the blocking decision.
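The pseudocode and the independence rule can be sketched as runnable Python. Everything named here (`Signal`, `fuse`, the `upstream` field, the weights, the 0.5 agreement cutoff) is a placeholder for illustration. Confidence is computed over de-duplicated sources, which is one reasonable reading of "independent signals agreeing" and not the only one.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    upstream: str  # original source; re-published feeds share this
    weight: float  # placeholder; real weights are tuned on labeled data
    value: float   # normalized observation in [0, 1]

def fuse(signals):
    """Return (score, confidence), counting each upstream source once."""
    strongest = {}
    for s in signals:
        best = strongest.get(s.upstream)
        if best is None or s.weight * s.value > best.weight * best.value:
            strongest[s.upstream] = s
    independent = list(strongest.values())
    raw = sum(s.weight * s.value for s in independent)
    score = max(0.0, min(raw, 100.0))  # clamp into [0, 100]
    agreeing = sum(1 for s in independent if s.value >= 0.5)
    confidence = agreeing / len(independent) if independent else 0.0
    return score, confidence

signals = [
    Signal("abuse-list-a", "feedA", 30, 0.9),
    Signal("abuse-list-b", "feedA", 30, 0.9),  # same upstream, re-published
    Signal("abuse-list-c", "feedA", 30, 0.9),  # same upstream, re-published
    Signal("honeypot", "trap", 40, 0.0),
    Signal("passive-dns", "pdns", 20, 0.8),
]
score, confidence = fuse(signals)
print(score, round(confidence, 2))
```

Summing all five rows naively would give 97. Counting the three re-published abuse lists once gives 43, with two of three independent sources agreeing.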
Decay: reputation is perishable
Infrastructure turns over. Compromised sites get cleaned. Phishing pages get taken down. IPs get reassigned to entirely new tenants. A reputation score that never forgets is a score that slowly fills with ghosts — yesterday's malicious IP is today's small business on a recycled address. So scores decay: the weight of a signal diminishes as it ages, on a half-life appropriate to how fast that signal type goes stale.
| Signal type | Typical freshness window | Why |
|---|---|---|
| Active honeypot callback | Hours to a few days | C2 infrastructure is short-lived and disposable |
| Abuse report | Days to weeks | Campaigns burn out; takedowns happen |
| Domain age / registration | Weeks to months | Risk genuinely falls as a domain proves benign over time |
| ASN / hosting reputation | Months | Provider behavior shifts slowly |
Decay is also what lets a freshly-compromised legitimate site recover. Without it, one bad week would condemn a domain forever, and operators would (correctly) stop trusting your feed.
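Half-life decay is a one-line formula: the weight halves every `half_life_hours`. The sketch below assumes exponential decay and uses invented half-lives loosely matched to the freshness windows in the table; neither the function name nor the numbers come from a specific system.

```python
# Hypothetical half-lives in hours, one per signal type.
HALF_LIFE_HOURS = {
    "honeypot_callback": 24,      # about a day
    "abuse_report": 24 * 7,       # about a week
    "registration": 24 * 60,      # about two months
    "asn_reputation": 24 * 180,   # about six months
}

def decayed_weight(initial, age_hours, half_life_hours):
    """Exponential decay: the weight halves once per half-life."""
    return initial * 0.5 ** (age_hours / half_life_hours)

# A honeypot hit worth 40 points when fresh, one and two days later.
print(decayed_weight(40, 24, HALF_LIFE_HOURS["honeypot_callback"]))
print(decayed_weight(40, 48, HALF_LIFE_HOURS["honeypot_callback"]))
```

With these numbers a week-old honeypot hit contributes well under one point, while a week-old ASN observation has barely moved.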
The precision/recall trade-off, and why false positives are structural
Every blocking system sits somewhere on a curve between two failure modes. Crank the threshold low and you catch almost everything malicious (high recall) while blocking a lot of innocents (low precision). Crank it high and your blocks are almost always right (high precision) while a lot of bad traffic sails through (low recall). You cannot maximize both with the same threshold; you choose where to sit based on the cost of each kind of mistake.
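The trade-off is easy to see on a toy labeled set. The ten (score, is_malicious) pairs below are made up for illustration, and `precision_recall` is a hypothetical helper that treats any score at or above the threshold as a block.

```python
# Made-up (score, is_malicious) pairs, for illustration only.
SAMPLE = [
    (95, True), (88, True), (80, False), (72, True), (60, False),
    (55, True), (40, False), (30, False), (20, True), (10, False),
]

def precision_recall(scored, threshold):
    """Precision and recall when everything at or above threshold is blocked."""
    tp = sum(1 for s, bad in scored if s >= threshold and bad)
    fp = sum(1 for s, bad in scored if s >= threshold and not bad)
    fn = sum(1 for s, bad in scored if s < threshold and bad)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

print(precision_recall(SAMPLE, 30))  # low threshold
print(precision_recall(SAMPLE, 85))  # high threshold
```

On this sample, a threshold of 30 blocks four of the five malicious domains but half of all blocks are innocent. A threshold of 85 makes every block correct and misses three of the five.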
What makes false positives structural rather than fixable is that the modern internet shares infrastructure aggressively:
- Shared hosting. Thousands of unrelated sites can live behind one IP. If 5,000 sites share an address and one sends spam, the IP earns a bad reputation and the other 4,999 inherit it. IP-level blocking punishes the neighbors of the guilty.
- CDNs and clouds. A handful of large providers front a huge fraction of the web. The same IP ranges serve your bank, a hospital portal, and a phishing kit someone spun up an hour ago. ASN-level signals are nearly useless here, and IP-level signals are dangerous.
- Freshly-compromised legitimate sites. A WordPress install gets popped and starts serving malware. For a window of hours, a domain with years of pristine history is genuinely malicious, and the moment it's cleaned, it's genuinely fine again. Any static label is wrong for part of that timeline.
- Typosquat collateral. Similarity heuristics that catch paypa1.com will occasionally snag a real business whose name happens to resemble a popular brand.
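The typosquat case can be reproduced with a plain edit-distance check. This is a sketch, not how any particular product matches lookalikes: the brand list is tiny, "ample" stands in for a hypothetical legitimate business, and real systems also handle homoglyphs, added words, and TLD swaps.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete from a
                cur[j - 1] + 1,            # insert into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

def lookalikes(label, brands, max_distance=1):
    """Brands within max_distance edits of label, excluding exact matches."""
    return [b for b in brands
            if b != label and edit_distance(label, b) <= max_distance]

BRANDS = ["paypal", "apple"]          # illustrative brand list
print(lookalikes("paypa1", BRANDS))   # the typosquat
print(lookalikes("ample", BRANDS))    # hypothetical legitimate business
```

Both labels sit one edit away from a protected brand, so the heuristic alone cannot tell the squatter from the bystander. That is the collateral the bullet describes.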
Confidence bands: blocking should respect them
If a score carries a confidence estimate, the worst thing you can do is ignore it and apply one global threshold. The better pattern is banded action: the same score triggers different responses depending on how much evidence stands behind it and how reversible the consequence is.
| Band | Evidence | Reasonable action |
|---|---|---|
| High confidence, high score | Multiple independent strong signals agree | Block outright; log the evidence |
| Medium confidence | Some signals agree, others silent or weak | Block but make it appealable; flag for review; soft-fail for trusted clients |
| Low confidence | One weak signal, or a fresh domain with only priors | Monitor, rate-limit, or warn — do not hard-block |
The asymmetry that should drive band design is the cost of being wrong. Blocking a malware C2 callback that turns out benign costs almost nothing — nobody legitimately needs that lookup. Blocking a payroll provider, a payment gateway, or a hospital's patient portal during business hours can be a genuine emergency. Reversible, low-cost blocks can sit on a hair trigger; high-cost blocks should demand high confidence and a fast path to undo.
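A banded policy along these lines might look like the sketch below. The `Action` names, the score floor of 50, and the confidence cutoffs are all placeholder assumptions; the point is only that the required confidence rises with the cost of a wrong block.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    MONITOR = "monitor"
    BLOCK_APPEALABLE = "block_appealable"
    BLOCK = "block"

def choose_action(score, confidence, high_cost_if_wrong):
    """Pick a response from score (0-100), confidence (0-1), and blast radius."""
    if score < 50:                      # placeholder score floor
        return Action.ALLOW
    # Costly mistakes demand more evidence before a hard block.
    needed = 0.8 if high_cost_if_wrong else 0.5
    if confidence >= needed:
        return Action.BLOCK
    if confidence >= 0.4:
        return Action.BLOCK_APPEALABLE
    return Action.MONITOR

print(choose_action(90, 0.6, high_cost_if_wrong=False))  # cheap to undo
print(choose_action(90, 0.6, high_cost_if_wrong=True))   # payroll-class risk
```

The same score and confidence produce a hard block for a disposable destination and an appealable block for one whose outage would be an emergency.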
Reputation is one input, not the decision
Here's the contrarian part, and the part that matters most operationally: a reputation score should rarely be the sole reason you block something. It's a prior — a starting estimate you update with everything else you know in the moment.
Consider how much context a reputation feed simply doesn't have. It doesn't know that this client is a kiosk that should only ever talk to three domains, or that the lookup volume to a freshly-seen domain just spiked 50× in ninety seconds (a tell for algorithmically-generated domains), or that the query pattern looks like data being smuggled out one label at a time (the signature of tunneling tools like iodine or dnscat2). Reputation is backward-looking and global; your local behavioral signals are real-time and specific. Fused together they're far stronger than either alone.
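One local signal of this kind is cheap to compute: tunneling tools pack encoded payload into the leftmost label, which tends to make it long and high-entropy. The heuristic below is a sketch with assumed thresholds (30 characters, 3.5 bits per character). Benign long labels such as CDN hashes will also trip it, so it is a hint, not a detector.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Bits per character of the empirical character distribution."""
    n = len(text)
    counts = Counter(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_tunneling(qname, min_len=30, min_entropy=3.5):
    """Flag queries whose first label is both long and high-entropy."""
    label = qname.split(".")[0]
    return len(label) >= min_len and shannon_entropy(label) >= min_entropy

# The encoded-looking label is invented for the example.
print(looks_like_tunneling("www.example.com"))
print(looks_like_tunneling("mzxw6ytboi4dkmrqgezdgnbvgy3tqojq.t.example.com"))
```

In practice a signal like this would be aggregated per client over a time window, since one odd query means little and hundreds per minute mean a lot.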
A 60th-percentile reputation score plus an anomalous local behavior is a confident block. The same score on a quiet, well-behaved client is a reason to watch, not to break things.
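Expressed as a rule, with the percentile cutoffs as placeholder assumptions and `decide` as a hypothetical helper, the combination might be sketched as:

```python
def decide(reputation_percentile, local_anomaly):
    """Combine a global reputation prior with a local behavioral flag."""
    if reputation_percentile >= 90:
        return "block"   # strong prior on its own (cutoff is a placeholder)
    if reputation_percentile >= 50:
        # A middling prior only becomes a block when local behavior agrees.
        return "block" if local_anomaly else "watch"
    # Clean reputation: odd local behavior is worth watching, not breaking.
    return "watch" if local_anomaly else "allow"

print(decide(60, local_anomaly=True))
print(decide(60, local_anomaly=False))
```

A production system would more likely add evidence in log-odds space than branch on fixed cutoffs, but the shape of the decision is the same.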
This is also why out-of-band reputation checks belong beside the resolution path, not inside it. The policy decision inside a DNS lookup has a sub-millisecond budget; you cannot stall it on a network round-trip to a scoring service. The pragmatic architecture answers the query immediately using fast local signals and consults heavier reputation asynchronously, feeding what it learns back into the blocklists that govern the next lookup. Reputation shapes policy over time; it doesn't gate individual packets in real time.
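The answer-now, learn-later pattern can be sketched with `asyncio`. The scoring service here is a stand-in coroutine that sleeps to imitate a network round-trip, and every name (`slow_reputation_lookup`, `refresh`, `answer`, the `.bad.example` suffix, the 0.9 cutoff) is hypothetical.

```python
import asyncio

blocklist = set()  # fast local state consulted on the hot path

async def slow_reputation_lookup(domain):
    """Stand-in for a remote scoring service (simulated latency)."""
    await asyncio.sleep(0.05)
    return 0.95 if domain.endswith(".bad.example") else 0.05

async def refresh(domain):
    """Off the hot path: update local policy from the slow lookup."""
    if await slow_reputation_lookup(domain) >= 0.9:
        blocklist.add(domain)

def answer(domain, pending):
    """Decide from local state only, then schedule a background check."""
    verdict = "blocked" if domain in blocklist else "resolved"
    task = asyncio.get_running_loop().create_task(refresh(domain))
    pending.add(task)                       # keep a reference to the task
    task.add_done_callback(pending.discard)
    return verdict

async def demo():
    pending = set()
    first = answer("c2.bad.example", pending)   # nothing known yet
    await asyncio.gather(*pending)              # background check finishes
    second = answer("c2.bad.example", pending)  # policy has caught up
    await asyncio.gather(*pending)
    return first, second

result = asyncio.run(demo())
print(result)
```

The first query is answered without waiting on the scoring service, so it resolves. By the second query the background result has landed in the local blocklist. That one-lookup lag is the price of keeping the slow call off the resolution path.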
Putting it together
If you remember one thing, make it this: reputation is a weather forecast, not a verdict. A 70% chance of rain doesn't mean it rained; it means a sensible person carries an umbrella. A 70% reputation score doesn't mean a domain is malicious; it means a sensible resolver treats it with proportionate suspicion — and keeps watching, because by tomorrow the forecast will have changed.
The systems that get this right share a few habits. They prefer many independent weak signals over one loud feed. They track confidence separately from score. They decay aggressively so the past doesn't haunt the present. They block by band, matching the severity of the action to the strength of the evidence and the cost of being wrong. And they never let reputation stand alone — it's the prior, your own real-time behavioral telemetry is the update, and the decision lives in the combination. Do that, and reputation becomes what it should be: a powerful, honest input. Treat the number as gospel, and sooner or later it will block your own payroll on a Tuesday morning and tell you, with total confidence, that it was right to.