
KCC: How AD’s Knowledge Consistency Checker Builds the Replication Topology

You never have to tell Active Directory “DC1 should replicate with DC2.” AD figures it out itself. The component that does the figuring is the Knowledge Consistency Checker (KCC) — a background process on every DC that runs every 15 minutes, looks at the forest’s sites, site links, and DCs, and computes the replication topology. The KCC creates and maintains the connection objects that everything in Part 5 (frequency) actually flows over.

This is Part 6 of the AD Replication Deep Dive series. Understand the KCC and you understand why AD just works without admin care — and what breaks the day you assume it doesn’t need any.

What the KCC actually does

The KCC’s job is to answer two questions, continuously:

  1. Intra-site: Which DCs inside this site should replicate directly with which others, so every DC is at most 3 hops from every other?
  2. Inter-site: Which DC in this site should be the bridgehead, and which other-site bridgeheads should it talk to?

The output of those answers is a set of connection objects stored under each DC’s NTDS Settings object in the Configuration NC. A connection object says “DC X pulls from DC Y for these naming contexts.”

Connection objects: the actual contract

You can see them in Sites and Services: open a site, expand a Server, open NTDS Settings, and look at the connection objects under it. Each one represents one inbound replication relationship for that DC.

Important: connection objects are directional. A connection on DC2 that says “pull from DC1” doesn’t imply a return connection. DC1 needs its own inbound-from-DC2 object for the reverse flow. The KCC builds both directions for normal partners.
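The directionality rule can be made concrete with a tiny model. This is a hypothetical, simplified sketch — a connection object reduced to a directional "pull" edge stored on the destination DC — not how AD stores them internally:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Connection:
    """Simplified model: one inbound replication relationship."""
    destination: str   # the DC that owns the object and pulls changes
    source: str        # the DC it pulls from

# DC2 pulling from DC1 is one object; the reverse flow needs its own.
inbound_on_dc2 = Connection(destination="DC2", source="DC1")
inbound_on_dc1 = Connection(destination="DC1", source="DC2")

def implies_reverse(conn: Connection, all_conns: set[Connection]) -> bool:
    """A connection never implies its reverse; it must exist explicitly."""
    return Connection(destination=conn.source, source=conn.destination) in all_conns

print(implies_reverse(inbound_on_dc2, {inbound_on_dc2}))                  # False
print(implies_reverse(inbound_on_dc2, {inbound_on_dc2, inbound_on_dc1}))  # True
```

The point of modelling it this way: when you audit a topology, count inbound edges per DC — a missing reverse edge means one direction of replication silently doesn't happen.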

Connection objects come in two flavours:

  • Automatically generated (the default) — the KCC owns them. It will rebuild or remove them if the topology changes. Annotated “<automatically generated>”.
  • Manually created — an admin made them. The KCC will not touch these. They override automatic ones for the same source/target. Useful for force-routing, dangerous to forget about.

Intra-site topology: the ring + chords

Within a site, the KCC builds a bidirectional ring across all DCs. With 5 DCs you get 5 connections forward + 5 reverse, total 10 connection objects. Every DC has exactly two intra-site partners by default.

The 3-hop guarantee: in a 5-DC ring the worst case is 2 hops in either direction. As the site grows, the KCC adds chords — cross-ring connections — so the worst-case stays at 3 hops. A 12-DC site might have 24 ring connections plus 4 chords.

Why the 3-hop limit? Each hop adds the intra-site notification delay (15 sec). 3 hops × 15 sec = 45 sec maximum convergence within a site. That's the SLA AD gives you for free.
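The arithmetic above can be sketched in a few lines. This is a back-of-the-envelope model of the ring, not the real KCC algorithm — it just shows where the worst-case hop count comes from and why chords become necessary as the site grows:

```python
# Default intra-site notification delay between direct partners.
NOTIFICATION_DELAY_SEC = 15

def ring_diameter(n: int) -> int:
    """Worst-case hops between any two DCs in a bidirectional n-DC ring."""
    return n // 2

for n in (5, 7, 8, 12):
    hops = ring_diameter(n)
    # The KCC adds chords once the plain ring would exceed 3 hops,
    # capping convergence at 3 hops regardless of site size.
    capped = min(hops, 3)
    note = " (chords required)" if hops > 3 else ""
    print(f"{n} DCs: ring worst case {hops} hops, "
          f"max convergence ~{capped * NOTIFICATION_DELAY_SEC}s{note}")
```

A 5-DC ring converges in at most 30 seconds; from 8 DCs onward the plain ring would exceed 3 hops, so chords hold the ceiling at 45 seconds.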

Inter-site topology: the ISTG

Inter-site work is handled by one specific DC per site: the Inter-Site Topology Generator (ISTG). It’s a role implicitly held by the DC with the lowest GUID in the site (you don’t configure it).

The ISTG’s job:

  1. Read the site links defined in the Configuration NC.
  2. Run a shortest-path algorithm using site-link costs to figure out which sites talk to which directly, and which transitively via intermediate sites.
  3. For each cross-site partner site, pick one DC in this site to be the bridgehead for each NC.
  4. Create connection objects on that bridgehead pointing at the remote site’s bridgehead.
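Step 2 is the interesting one. Here's a Dijkstra sketch of the idea — the real ISTG runs its own spanning-tree computation over the Configuration NC, and the site names and costs below are illustrative, not from any real forest:

```python
import heapq

# Undirected site links: (site_a, site_b) -> cost. Illustrative values.
site_links = {
    ("HQ", "NYC"): 100,
    ("HQ", "LON"): 200,
    ("NYC", "LON"): 400,
}

def neighbours(site):
    """Yield (neighbour, cost) for every site link touching this site."""
    for (a, b), cost in site_links.items():
        if a == site:
            yield b, cost
        elif b == site:
            yield a, cost

def cheapest_path(src, dst):
    """Dijkstra over site-link costs — the effect of the ISTG's step 2."""
    heap, seen = [(0, src, [src])], set()
    while heap:
        cost, site, path = heapq.heappop(heap)
        if site == dst:
            return cost, path
        if site in seen:
            continue
        seen.add(site)
        for nxt, c in neighbours(site):
            if nxt not in seen:
                heapq.heappush(heap, (cost + c, nxt, path + [nxt]))
    return None  # no path: the site is unreachable (see "Sites with no site link")

print(cheapest_path("NYC", "LON"))  # (300, ['NYC', 'HQ', 'LON'])
```

Note the result: even though NYC and LON have a direct link, its cost (400) loses to the transitive path via HQ (100 + 200 = 300) — which is exactly how site-link costs let you steer inter-site replication through a hub.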

If the ISTG dies, another DC in the site takes over within minutes — it’s an implicit, not assigned, role.

The 15-minute heartbeat

Every DC runs its KCC every 15 minutes. Each run:

  • Re-evaluates the intra-site ring — adds connection objects for new DCs, removes for demoted DCs.
  • Checks the ISTG-generated inter-site objects on this DC — if this DC just became a bridgehead, builds its outbound connections.
  • Detects failed connection partners (no replication for X cycles) and routes around them.

You can force a run with repadmin /kcc DC01. This is rarely useful — the regular 15-minute cycle catches everything — but it's handy when you've deliberately changed site links and want the topology to update immediately.

A worked example: adding a new site

Imagine a forest with three sites — HQ, NYC, LON — connected by a hub-spoke topology centred on HQ. You add a fourth site, TOR, with a site link TOR↔HQ at cost 100. Here’s what the KCC does, in order:

  1. You promote DC-TOR-01 into the TOR site. As soon as Configuration NC replicates, every ISTG sees TOR exists.
  2. HQ’s ISTG runs (next 15-min cycle): “TOR has a DC; I need a bridgehead here pointing at TOR.” Picks a DC in HQ, creates outbound connection objects to DC-TOR-01.
  3. TOR’s ISTG (which is DC-TOR-01, since it’s the only DC in TOR) runs: “HQ has DCs, I’m the bridgehead here, create inbound connection objects.”
  4. For NYC and LON: their ISTGs run. “Is there a direct site link to TOR? No. Is there a transitive path? Yes — through HQ.” They do not build direct connections to TOR. TOR traffic for NYC will flow TOR → HQ → NYC.
  5. The whole topology stabilises within one 15-min KCC cycle.

No admin involvement, no manual connection objects, no errors.
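Step 4 — the "direct link or transitive path?" decision — can be sketched like this. The TOR↔HQ cost of 100 comes from the example above; the other costs are assumptions for illustration:

```python
# Site links after TOR is added. TOR<->HQ at cost 100 per the example;
# the HQ<->NYC and HQ<->LON costs are illustrative assumptions.
site_links = {
    ("HQ", "NYC"): 100,
    ("HQ", "LON"): 200,
    ("HQ", "TOR"): 100,
}

def direct_link(a: str, b: str) -> bool:
    """Does a site link connect these two sites directly?"""
    return (a, b) in site_links or (b, a) in site_links

# NYC's ISTG asks: direct link to TOR?
print(direct_link("NYC", "TOR"))  # False -> no direct connection objects

# Transitive path through HQ?
print(direct_link("NYC", "HQ") and direct_link("HQ", "TOR"))  # True

# So TOR traffic for NYC flows TOR -> HQ -> NYC at cost 100 + 100 = 200.
```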

What happens when a DC fails

Connection objects continue to exist on every other DC’s NTDS Settings even after the partner dies. The KCC detects the failure on its next run (replication errors X times in a row) and:

  • For an intra-site failure: routes around the dead DC by adding a new connection object that skips it.
  • For an inter-site failure: if the dead DC was a bridgehead, the ISTG re-picks a different DC in that site as the new bridgehead.

If you cleanly remove the failed DC — demote it, or run ntdsutil metadata cleanup after a forced removal — the KCC deletes its connection objects within a cycle or two. If you leave stale NTDS Settings objects in the Configuration NC, the KCC keeps trying to route through them — one of the most common "why is replication broken?" root causes.

Manual connection objects: when and why

The KCC’s topology is usually right. Cases where you’d override it:

  • Forcing specific paths over compliance boundaries: If a regulator says “data must flow A → B directly, never through C,” a manual connection guarantees it.
  • Bandwidth-asymmetric WAN links: The KCC doesn’t know one direction of a link is 10x faster than the other. You can pin replication to the fast direction.
  • Slow links inside a site: If two sub-sites are technically “one site” in AD but have a slow inter-building link, manual connections + the “disable automatic generation” flag let you avoid the slow path.

The hidden cost: the KCC won’t auto-heal a manual connection. If the source DC goes away, the manual object becomes a stale pointer that has to be cleaned up by hand.

Visualising the topology

Three tools:

repadmin /showrepl DC01

Shows every inbound connection on DC01 with last-success timestamps and any recent failures.

repadmin /showrepl * /csv | Out-File -Encoding utf8 repl.csv

Dumps the full forest topology into a CSV file. Open in Excel to spot DCs that haven’t replicated for hours.
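If Excel isn't handy, a short script does the same triage. This is a hypothetical helper; the column names ("Source DSA", "Destination DSA", "Last Success Time") and the timestamp format should be verified against the header row of your own repl.csv, as they can vary between Windows versions:

```python
import csv
from datetime import datetime, timedelta

def stale_rows(path: str, max_age_hours: int = 3):
    """Return (source, destination, timestamp) for inbound connections
    whose last successful replication is older than max_age_hours."""
    cutoff = datetime.now() - timedelta(hours=max_age_hours)
    stale = []
    # utf-8-sig: repadmin's CSV output may start with a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            last = row.get("Last Success Time", "")
            try:
                when = datetime.strptime(last, "%Y-%m-%d %H:%M:%S")
            except ValueError:
                continue  # blank or unparsable timestamp — skip the row
            if when < cutoff:
                stale.append((row.get("Source DSA"),
                              row.get("Destination DSA"), last))
    return stale

# for src, dst, ts in stale_rows("repl.csv"):
#     print(f"{dst} last pulled from {src} at {ts}")
```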

Get-ADReplicationConnection -Filter * |
  Select-Object Name, AutoGenerated, ReplicateFromDirectoryServer, ReplicateToDirectoryServer |
  Format-Table -AutoSize

Lists all connection objects forest-wide via PowerShell.

Things that bite people

Sites with no site link

A site that exists in Sites and Services but has no site link connecting it to anything is invisible to the ISTG — the KCC builds no inter-site connections to or from it. New DCs in that site replicate intra-site only and never see the rest of the forest. The fix is to add a site link.

Stale connection objects from demoted DCs

If you reinstall a DC without first cleanly demoting the old one, manual or automatic connection objects pointing at the “ghost” DC linger. The KCC tries to use them and fails. Always do ntdsutil metadata cleanup after a forced removal.

Disabling automatic generation site-wide

An “Options” bit on a site or NTDS Settings object can tell the KCC: “don’t auto-generate connections in this site.” Useful in massive forests where Microsoft Consulting Services has designed the topology by hand. Catastrophic in normal environments — new DCs get no connections, replication never starts.

Confusing the KCC with replication itself

The KCC doesn’t move data. It only computes which DCs talk to which. If replication is failing, repadmin /kcc rebuilds the topology but doesn’t force a sync — use repadmin /syncall for that.

What’s next

The KCC computes the graph. The edges of that graph between sites land on specific DCs — the bridgeheads. Part 7 in the AD Replication Deep Dive pathway covers bridgehead servers in detail: how they’re elected, why their failure is special, and when (rarely) to override the election with a preferred bridgehead.
