
# Network

How I think about segmentation and why the policy looks the way it does. Specific subnets, VLAN IDs, IP plans, and firewall rule listings live in the private repo.

## Why segmentation matters here

A homelab pulls together an unusually wide trust spread on one piece of hardware: cloud-managed IoT devices that phone home constantly, a work laptop that touches an employer network, a guest WiFi that strangers join, internal services holding sensitive data, and admin surfaces that should never be exposed. Putting all of that on one flat network treats every device as if it had the same trust level. They don't.

The model here is **trust-tier VLANs** with explicit policy between them. Every tier has a documented purpose and a defined inbound/outbound posture.

## Trust tiers

Seven VLANs, plus the WireGuard VPN as an eighth tier, organized roughly by how much I trust what's on them:

| Tier | What's on it | Posture |
|---|---|---|
| **Management** | Hypervisor, firewall, backup server, network controllers | Most trusted. Reachable only via VPN. Doesn't initiate outbound unless it has to. |
| **Internal services** | LXCs and VMs running the internal app stack | Trusted. Serves clients in adjacent tiers per policy. |
| **LAN** | Personal devices on home WiFi/Ethernet | Trusted. Consumes internal services. |
| **Work-from-home** | Employer-owned laptop | Untrusted for lateral access. Internet only; blocked from everything else, including internal DNS. |
| **IoT** | Smart devices, cloud-managed appliances | Untrusted. Internet only. Isolated from everything internal. |
| **Guest** | Visitor WiFi | Untrusted. Internet only. |
| **DMZ** | Internet-facing services | Treated as compromised by default. Locked down on outbound; inbound to internal is a tight allowlist. |
| **VPN (WireGuard)** | Authenticated remote clients | Same posture as LAN, plus admin-tier visibility. |

## Policy posture

- **Default deny inter-VLAN.** Every cross-tier flow is an explicit allow rule with a reason written next to it.
- **WFH and IoT are jailed.** They reach the internet and nothing internal, not even DNS for local hostnames. This is the most important rule in the firewall.
- **Management is the smallest possible tier.** Only what *runs* the lab lives there. No user-facing services. No outbound internet from anything that doesn't strictly need it.
- **DMZ is one-way.** Public services live there. They can't initiate connections inward except through a tight, firewall-enforced allowlist by source IP and destination port. The reverse proxy in the DMZ is *configured* to respect that, and the firewall is *also* configured to enforce it. Two layers, on purpose — misconfiguring the proxy is way easier than misconfiguring the firewall.
- **Admin surfaces are VPN-only.** Hypervisor, firewall, backup server, switches, APs — none of them are reachable from the internet. WireGuard first or it doesn't happen.
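
The default-deny posture above can be sketched as a small lookup, which is roughly how I reason about a flow before writing the rule. This is a minimal model, not the real rule set: the tier names come from the table, but the specific ports and rules are placeholders I made up for illustration.

```python
from dataclasses import dataclass

# Tier names follow the trust-tier table; everything else here is illustrative.
TIERS = {"management", "internal", "lan", "wfh", "iot", "guest", "dmz", "vpn"}


@dataclass(frozen=True)
class Allow:
    src: str
    dst: str
    port: int
    reason: str  # every allow rule carries its justification


# Hypothetical rules; the real listing lives in the private repo.
RULES = [
    Allow("lan", "internal", 443, "LAN clients consume internal services"),
    Allow("vpn", "internal", 443, "VPN clients share the LAN posture"),
    Allow("vpn", "management", 443, "admin surfaces are VPN-only"),
    Allow("dmz", "internal", 8080, "reverse proxy to one allowlisted backend"),
]


def is_allowed(src: str, dst: str, port: int) -> bool:
    """Default deny: a cross-tier flow passes only if a rule matches exactly."""
    if src not in TIERS or dst not in TIERS:
        raise ValueError(f"unknown tier: {src!r} or {dst!r}")
    if src == dst:
        # Intra-tier traffic never crosses the firewall, so it is not policed here.
        return True
    return any(r.src == src and r.dst == dst and r.port == port for r in RULES)
```

With these placeholder rules, `is_allowed("wfh", "internal", 443)` is `False` simply because nothing says otherwise, which is the whole point of default deny.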

## DNS

Three layers, each doing one job:

1. **Pi-hole** — first hop for clients on most VLANs. Filters ad/tracker domains and holds the local A records that map internal hostnames to internal IPs. Not used by management hosts (see below) or by the WFH VLAN.
2. **Unbound on the firewall** — Pi-hole's upstream. Recursive resolver, validates DNSSEC.
3. **Cloudflare** — Unbound's eventual upstream when needed.

**Bootstrap exception:** the hypervisor itself (which is the box Pi-hole runs on) is statically pointed at the firewall's resolver, not Pi-hole. Otherwise there's a circular dependency at boot — the hypervisor needs DNS to come up, and Pi-hole is one of the things the hypervisor brings up.

**Known SPOF:** Pi-hole is the only thing resolving internal hostnames. If it dies, internal hostnames stop resolving until it's back. I thought about mirroring the records into Unbound on pfSense and decided not to — I'd rather know if Pi-hole is unhealthy than paper over it. Documented as a known limitation in the private repo.
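
Since the plan is to notice when Pi-hole is unhealthy rather than hide it, a small probe is one way to do the noticing. The sketch below hand-builds a DNS A query with the standard library and sends it straight to a given resolver. The function names, the transaction ID, and the addresses in the usage note are all assumptions for illustration, not what actually runs in the lab.

```python
import socket
import struct

TXID = 0x1234  # arbitrary transaction ID for the probe


def build_a_query(hostname: str, txid: int = TXID) -> bytes:
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (RD=1), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def resolves(server: str, hostname: str, timeout: float = 2.0) -> bool:
    """True if `server` replies NOERROR with at least one answer record."""
    query = build_a_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            return False  # a timeout or unreachable server counts as unhealthy
    if len(reply) < 12:
        return False
    txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    # The low four bits of the flags field are the response code; 0 is NOERROR.
    return txid == TXID and (flags & 0x000F) == 0 and ancount > 0
```

Something like `resolves("192.0.2.53", "nas.lab.example")` (both values hypothetical) run from a cron job or monitoring check would flag the SPOF the moment internal names stop resolving.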

## Internet exposure

Three ports are exposed on the WAN, covered by two entries:

- **HTTP / HTTPS** — to the DMZ reverse proxy. Serves the small public service set.
- **WireGuard** — to the firewall. The only remote admin path.

Everything else is closed. I verify this from outside the network on a regular basis — the only way to actually know what's exposed is to scan from somewhere that isn't the LAN.
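
A rough sketch of what that outside-in check can look like, meant to be run from a host that is not on the LAN. It only covers TCP: WireGuard runs over UDP and stays silent to unauthenticated packets, so a connect scan cannot confirm it either way. The port numbers 80 and 443 are my assumption for the HTTP/HTTPS forwards, and the function names are made up for this example.

```python
import socket

# Assumed TCP ports for the HTTP/HTTPS forwards; the real numbers may differ.
EXPECTED_OPEN_TCP = {80, 443}


def open_tcp_ports(host: str, ports, timeout: float = 1.0) -> set:
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    found = set()
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising on refusal
            if sock.connect_ex((host, port)) == 0:
                found.add(port)
    return found


def unexpected_exposure(host: str, ports, expected=EXPECTED_OPEN_TCP) -> set:
    """Ports that accept connections but are not on the expected list."""
    return open_tcp_ports(host, ports) - set(expected)
```

An empty result from `unexpected_exposure(wan_ip, range(1, 1025))` is the outcome I want. A dedicated scanner will be faster and more thorough; the point here is only that the comparison against an explicit expected set is what turns a scan into a check.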

## IPv6

Disabled at the carrier-provided gateway. The lab is IPv4-only by design — fewer surfaces, simpler firewall reasoning, no AAAA leakage. I'll revisit this if I have a reason to; today I don't.

## Things that are easy to overlook

A couple of things worth being explicit about, because they bit me at some point:

- **Intra-VLAN traffic between LXCs on the same Proxmox bridge doesn't traverse the firewall.** Isolation is enforced *per-VLAN*, not *per-LXC*. Two LXCs sharing a tier can talk to each other directly. Useful to remember when you're reasoning about blast radius — the firewall doesn't see anything that doesn't cross a VLAN boundary.
- **Certificate Transparency.** Caddy uses Cloudflare DNS-01 for cert issuance, which is great because services don't have to be exposed to the internet to get a cert. But every cert that gets issued lands in CT logs forever, and per-hostname certs basically publish the internal hostname inventory to anyone who runs a CT search on the domain. A wildcard cert would limit CT exposure to `*.lerkolabs.com` and the apex; it's on my list as a future change, with the tradeoff being that wildcard compromise is worse than per-host.
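
To make the CT tradeoff concrete, here is a tiny sketch of which names end up publicly logged under each issuance strategy. The hostnames and the `example.com` domain are hypothetical stand-ins, and the function is just a model of the reasoning above, not anything Caddy exposes.

```python
def ct_visible_names(hostnames, domain, wildcard=False):
    """Names that would appear in CT logs under each issuance strategy."""
    if wildcard:
        # One cert covering the wildcard and the apex.
        return {f"*.{domain}", domain}
    # Per-host certs: every hostname is logged as issued.
    return set(hostnames)


# Hypothetical service names, for illustration only.
hosts = ["git.example.com", "vault.example.com", "grafana.example.com"]

per_host = ct_visible_names(hosts, "example.com")
wild = ct_visible_names(hosts, "example.com", wildcard=True)
```

Here `per_host` contains all three service names, while `wild` contains only `*.example.com` and `example.com`, no matter how many services sit behind the cert.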