Limiting Attack Surface: From Architecture to Operations
In my Security by Design post, I listed Minimize Attack Surface as one of the core principles. Minimizing the attack surface is more than just closing open ports. Closing unnecessary ports, and the services behind them, is critical, but it is only one part of the job. Minimizing the attack surface means reducing the total number of ways an attacker can reach or interact with a system. Put simply, it means reducing the attack vectors that exist in the system. That also includes removing or constraining deprecated APIs, management planes, identity paths, dependency hooks, and operational tooling.
The general philosophy is simple to understand. It is about having the fewest possible entry points, privileged paths, forgotten interfaces, and places in general where small mistakes can turn into large incidents.
As software is developed, the number of potential attack vectors increases, often without being properly tracked: an added library, an uncontrolled default setting, a forgotten API.
It is not always possible to control or checkpoint every potential threat vector, but you should still identify all of them. That way, you can do basic risk management and choose how to address each one: some can be managed, others controlled, others mitigated, and some even transferred. Knowing they exist at least lets you prepare for the scenario where everything fails, and ensure that it fails in a constrained space rather than across the whole system.
Network & Ports
The fastest and highest-leverage place to reduce attack surface is reachability. Open ports are not just ports; they are leads. The first step of a vulnerability scan, a penetration test, or worse, a real attack, is checking what open services exist on the target.
Every exposed service is an interface an attacker can:
- scan
- fingerprint
- brute force
- exploit
- target with denial of service
- or simply observe for patterns and configuration mistakes
Most of the time, systems are not breached because attackers are brilliant. They are compromised because the environment offered too many easy ways in. Sometimes it really is as easy as spotting a live legacy service with a known high-severity CVE and launching a ready-made exploit from Metasploit.
1. Management as a separate plane
A recurring situation is exposing admin interfaces temporarily and then forgetting to remove or disable them:
- SSH or RDP on public IPs
- Kubernetes dashboards
- Grafana and Prometheus interfaces
- database admin ports
- hypervisor consoles
- vendor appliance web interfaces
The basic rule to follow here is quite simple: if it is management, it should not live in the same plane as user traffic.
This means:
- production services have public ingress only where required
- management access happens via private routing and strong identity controls
- admin interfaces are reachable only from a restricted network path, such as a VPN or a dedicated admin subnet, and, where feasible, only from an allowlist of authorized IPs
In practice, this is where setups like OPNsense with WireGuard help. Not because WireGuard is magically safe in itself, but because it allows you to easily create a dedicated subnet just for management, making access structurally private.
The best management port is the one that is not reachable from the public internet.
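As a rough illustration of the split between planes, here is a minimal Python sketch of the decision a firewall or proxy makes. It assumes a hypothetical management subnet (10.99.0.0/24, standing in for a WireGuard tunnel network) and a hypothetical set of management ports. This is not OPNsense or WireGuard configuration. It simply expresses the rule as code: public ports are open to everyone, management ports are open only to the management network, and everything else is denied.

```python
import ipaddress

# Hypothetical management network, e.g. the subnet assigned to a WireGuard tunnel.
MANAGEMENT_NETWORKS = [ipaddress.ip_network("10.99.0.0/24")]

# Hypothetical port sets: SSH, RDP, Grafana, Prometheus and PostgreSQL count as management.
MANAGEMENT_PORTS = {22, 3389, 3000, 9090, 5432}
PUBLIC_PORTS = {443}


def is_access_allowed(source_ip: str, port: int) -> bool:
    """Decide whether a connection from source_ip to port should be accepted."""
    addr = ipaddress.ip_address(source_ip)
    if port in PUBLIC_PORTS:
        # User traffic: reachable from anywhere, protected at the ingress layer.
        return True
    if port in MANAGEMENT_PORTS:
        # Management plane: reachable only from the private management network.
        return any(addr in net for net in MANAGEMENT_NETWORKS)
    # Anything not explicitly listed is denied.
    return False


print(is_access_allowed("10.99.0.5", 22))    # True: admin access over the tunnel
print(is_access_allowed("203.0.113.7", 22))  # False: SSH attempt from the internet
```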
2. Default deny inbound and be intentional about every exception
Default deny is easy to state but rarely holds in practice, because exceptions accumulate throughout the development process. To hold, it should be backed by an explicit public exposure inventory:
- what is reachable from the internet
- on which ports
- through which ingress points
- for what reason
- with which authentication and authorization
- with what monitoring
- who owns it
If you cannot list those quickly, you are not in control of your exposure.
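One way to keep that list quick to produce is to store it as data instead of tribal knowledge. The sketch below assumes a hypothetical `Exposure` record with one field per question above, and it flags any entry with an unanswered question. The field names and example values are illustrative and do not come from any specific tool.

```python
from dataclasses import dataclass, fields


@dataclass
class Exposure:
    service: str     # what is reachable from the internet
    port: int        # on which port
    ingress: str     # through which ingress point
    reason: str      # for what reason
    auth: str        # with which authentication and authorization
    monitoring: str  # with what monitoring
    owner: str       # who owns it


def unanswered(inventory):
    """Return (service, [empty fields]) for every entry with an unanswered question."""
    problems = []
    for entry in inventory:
        missing = [f.name for f in fields(entry) if getattr(entry, f.name) in ("", None)]
        if missing:
            problems.append((entry.service, missing))
    return problems


inventory = [
    Exposure("web-frontend", 443, "edge-lb", "public site", "OIDC", "access logs + alerts", "team-web"),
    Exposure("legacy-ftp", 21, "edge-lb", "", "password", "", ""),
]
print(unanswered(inventory))
# [('legacy-ftp', ['reason', 'monitoring', 'owner'])]
```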
3. Collapse ingress points into enforceable chokepoints
A common microservice failure mode is every service having its own ingress. That expands surface area and makes policy inconsistent. Instead:
- terminate external traffic through a small number of controlled ingress points
- enforce authentication and authorization at those boundaries
- standardize rate limits and abuse controls
- centralize TLS policy and certificate rotation
Whether you are using an API gateway, ingress controller, reverse proxy tier, or service mesh ingress matters less than the architectural property: fewer doors with better locks.
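To make the chokepoint idea concrete, here is a minimal Python sketch of a single gateway that every external request passes through. It applies one rate-limit policy and one authentication check before routing. The `Gateway` class, the token-bucket parameters, and the `verify_token` callback are hypothetical stand-ins for whatever your API gateway or ingress controller actually provides.

```python
import time


class TokenBucket:
    """Per-client token bucket: bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, now):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gateway:
    """Single chokepoint: rate limit, then authenticate, then route."""

    def __init__(self, routes, verify_token, capacity=10, refill_per_sec=5.0, clock=time.monotonic):
        self.routes = routes              # path -> handler(principal)
        self.verify_token = verify_token  # token -> principal, or None if invalid
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets = {}

    def handle(self, path, token, client_id):
        now = self.clock()
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = self.buckets[client_id] = TokenBucket(self.capacity, self.refill_per_sec, now)
        if not bucket.allow(now):
            return 429, "too many requests"
        principal = self.verify_token(token)
        if principal is None:
            return 401, "unauthorized"
        handler = self.routes.get(path)
        if handler is None:
            return 404, "not found"
        return 200, handler(principal)
```

Routing deliberately happens after authentication, so unauthenticated callers cannot probe which paths exist. Rate limiting comes first so that credential brute forcing is throttled before it reaches the authentication backend.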
4. Do not trust the internal network
Internal networks are not inherently trustworthy. Flat internal networks turn any foothold into lateral movement: if there are no controls inside the local network, an attacker who gains access can move through it freely. Attack surface is not only internet-facing. It also includes east-west reachability. Operationally, this means:
- segment subnets by function and sensitivity
- restrict service-to-service communication to what is required
- authenticate and authorize internal calls
- avoid Layer 2 bridging across sites unless there is a strong reason
Layer 3 routing, clear subnets, and explicit firewall rules are boring, both to plan and to put in place. But that is the point: they may be boring, but they are also debuggable and containable. And if your network is ever breached through a vulnerability someone else forgot to patch, you will be glad they are in place.
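The same default-deny thinking applies to east-west traffic. As a sketch, assume three hypothetical segments (web, api, db) with example subnets. Internal flows can then be expressed as an explicit allowlist, so anything not listed is rejected, including web talking directly to the database.

```python
import ipaddress

# Hypothetical segments, one subnet per function and sensitivity level.
SEGMENTS = {
    "web": ipaddress.ip_network("10.10.1.0/24"),
    "api": ipaddress.ip_network("10.10.2.0/24"),
    "db": ipaddress.ip_network("10.10.3.0/24"),
}

# Only the flows the architecture actually requires: (source, destination, port).
ALLOWED_FLOWS = {
    ("web", "api", 8443),
    ("api", "db", 5432),
}


def segment_of(ip: str):
    addr = ipaddress.ip_address(ip)
    for name, net in SEGMENTS.items():
        if addr in net:
            return name
    return None  # unknown addresses belong to no segment


def flow_allowed(src_ip: str, dst_ip: str, port: int) -> bool:
    src, dst = segment_of(src_ip), segment_of(dst_ip)
    if src is None or dst is None:
        return False
    return (src, dst, port) in ALLOWED_FLOWS
```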
5. Prefer silent services and remove unnecessary protocol negotiation
Some services are chatty by design and readily hand metadata to attackers and scanners. SSH, for example, can be quite safe with some basic configuration, such as key-only authentication paired with fail2ban. But even SSH can leak information about a system, such as the server version, the supported cryptographic algorithms, and sometimes the operating system it runs on.
WireGuard is an example of the opposite model. It does not present a banner and does not respond to packets that are not authenticated with a valid key. Without valid keys, attackers or probes cannot complete the handshake or even confirm that the service is present. To an unauthenticated scanner, the port looks like any filtered UDP port that never answers, which makes the service very difficult to fingerprint from the outside. This is not a security control on its own, but it significantly reduces the information available to opportunistic scanning and fingerprinting.
Practical takeaways:
- minimize services that expose metadata during negotiation
- avoid leaving default banners and version identifiers visible
- prefer no response over a helpful response for unauthenticated callers
- do not leak environment details through error pages at the edge
Stealth services are like a professional upgrade of security through obscurity. The protection does not rely on the system being hidden. Instead, you cannot find the door, let alone open it, unless you already hold the key.
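At the edge, part of this can be enforced in a response post-processing step. The following Python sketch assumes a hypothetical `harden_edge_response` hook in a reverse proxy or gateway. It strips headers that commonly disclose software and versions (`Server`, `X-Powered-By`, and similar) and replaces 5xx bodies with a generic message. It also answers unauthenticated callers with the same 404 whether a resource is missing or forbidden. That last choice suits endpoints that are not meant to be discovered. Public login flows may still need a plain 401.

```python
# Response headers that commonly disclose server software and versions.
LEAKY_HEADERS = {"server", "x-powered-by", "x-aspnet-version", "x-aspnetmvc-version"}


def harden_edge_response(status, headers, body, authenticated):
    """Post-process a response at the edge before it is sent to the client."""
    clean = {k: v for k, v in headers.items() if k.lower() not in LEAKY_HEADERS}
    if not authenticated and status in (401, 403, 404):
        # Same answer for "missing" and "forbidden": do not confirm what exists.
        return 404, clean, ""
    if status >= 500:
        # No stack traces, hostnames, or versions in error bodies.
        return status, clean, "Internal error"
    return status, clean, body
```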
6. Kill temporary exposure paths
It is common to find insecure temporary exceptions opened during the development process:
- a firewall rule opened for debugging
- a NodePort left reachable
- a port forward left running
- a cloud security group rule that was never removed
- a temporary VPN split-tunnel exception
A strong operational pattern here is automation:
- infrastructure as code for network policy
- policy guardrails that reject wide-open rules by default
- scheduled audits that compare intended exposure with actual exposure
Attack surface reduction is often, above all, a fight against entropy.
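Both guardrails and audits can be small. Here is a minimal Python sketch that assumes ingress rules can be exported as (source CIDR, port) pairs, for example from infrastructure-as-code state or a cloud provider's API. The `IngressRule` type and the approved-port list are hypothetical. The sketch rejects any rule open to the whole internet on a port that was not explicitly approved, and it reports drift between the intended rule set and the observed one.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    source: str  # CIDR, e.g. "0.0.0.0/0"
    port: int


# Hypothetical: the only ports allowed to be open to the entire internet.
APPROVED_PUBLIC_PORTS = {443}


def is_wide_open(rule: IngressRule) -> bool:
    # 0.0.0.0/0 and ::/0 both have a prefix length of zero.
    return ipaddress.ip_network(rule.source, strict=False).prefixlen == 0


def guardrail_violations(rules):
    """Rules that would open a non-approved port to the whole internet."""
    return [r for r in rules if is_wide_open(r) and r.port not in APPROVED_PUBLIC_PORTS]


def exposure_drift(intended: set, actual: set) -> dict:
    """Compare intended exposure with what is actually configured or observed."""
    return {"unexpected": actual - intended, "missing": intended - actual}
```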
7. Reduce ports by reducing protocols
If your environment needs:
- SSH into every host
- database ports broadly accessible
- custom admin services per system
It will be very difficult to defend. Centralized control planes and standardized access paths can remove entire classes of connectivity requirements. Some examples:
- use SSO-backed access proxies instead of exposing management interfaces
- use private runners and tightly scoped CI connectivity rather than broad inbound access
- adopt structured remote access such as VPN plus routing instead of per-host exposure
The smaller the digital space, the easier it is to protect.
Application Layer
Network exposure is usually what everyone thinks of when limiting the attack surface, but it is not the only aspect to take into consideration. Application behavior is just as important. There is no point in securing the network if the application itself is a way into the system. Many production systems contain endpoints and flows that are rarely used, poorly tested, and lightly monitored, sometimes even totally unmonitored, but still reachable.
Attackers look for forgotten functionality, and they often find it.
1. Remove endpoints, do not just deprecate them
Deprecation notices are not security controls. If an endpoint is not used, remove it from the system. If you cannot remove it, put it behind a hard boundary:
- internal network path
- strict authentication and authorization
- allowlisted callers
- aggressive rate limiting
- explicit monitoring
Forgotten endpoints accumulate security debt very quickly. Treat old endpoints as liabilities with a carrying cost.
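If an endpoint truly cannot be removed yet, the hard boundary can be made explicit in code. The sketch below uses a hypothetical `hard_boundary` decorator that admits only allowlisted callers and logs every call, so any remaining usage is visible and can be driven to zero. It assumes the caller identity has already been authenticated upstream, for example through mutual TLS between services. The endpoint and caller names are illustrative.

```python
import functools
import logging

logger = logging.getLogger("legacy-endpoints")


class Forbidden(Exception):
    pass


def hard_boundary(allowed_callers):
    """Wrap a legacy handler so only allowlisted, already-authenticated callers reach it."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(caller, *args, **kwargs):
            if caller not in allowed_callers:
                logger.warning("blocked call to %s from %s", handler.__name__, caller)
                raise Forbidden(handler.__name__)
            # Explicit monitoring: every remaining use is recorded.
            logger.info("legacy endpoint %s called by %s", handler.__name__, caller)
            return handler(caller, *args, **kwargs)
        return wrapper
    return decorator


@hard_boundary(allowed_callers={"billing-batch"})
def export_v1(caller, account_id):
    return {"account": account_id, "format": "v1"}
```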
2. Make admin a separate product
Many compromises come from admin functionality living too close to user functionality:
- admin endpoints in the same API surface
- admin pages reachable from the same origin
- internal tooling deployed in the same cluster and reachable through the same ingress
Admin is more than just a role. It warrants its own threat model, because misconfigured admin access is often the threat itself.
Operationally:
- separate admin interfaces into a different access path
- require stronger authentication by default
- log admin actions with higher fidelity
- design explicit workflows for privileged changes where appropriate
Treating administrative capabilities as a separate product forces you to design explicit boundaries around power. To loosely paraphrase Machiavelli: in security, too, power without boundaries is usually where the worst failures originate.
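As one example of an explicit workflow for privileged changes, here is a minimal sketch of a two-person rule. A change requested by one admin cannot be applied until a different admin approves it, and every step is recorded. The `PrivilegedChange` class and its audit format are hypothetical. In practice this logic usually lives in a ticketing, access-request, or deployment tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PrivilegedChange:
    change_id: str
    requested_by: str
    description: str
    approvals: set = field(default_factory=set)
    audit: list = field(default_factory=list)

    def _record(self, actor: str, event: str) -> None:
        # Higher-fidelity audit trail: who, what, when, for which change.
        self.audit.append({
            "change": self.change_id,
            "actor": actor,
            "event": event,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def approve(self, admin: str) -> None:
        if admin == self.requested_by:
            self._record(admin, "self-approval rejected")
            raise PermissionError("the requester cannot approve their own change")
        self.approvals.add(admin)
        self._record(admin, "approved")

    def apply(self, admin: str, required_approvals: int = 1) -> bool:
        if len(self.approvals) < required_approvals:
            self._record(admin, "apply blocked: not enough approvals")
            return False
        self._record(admin, "applied")
        return True
```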
3. Reduce input complexity at the boundaries
Attack surface includes inputs:
- large and complex payloads
- dynamic query languages
- file uploads
- deserialization formats
- flexible filters that become injection surfaces
Every flexible input format expands what can go wrong.
Some simple wins:
- strict schema validation at the edge
- reject unknown fields
- cap sizes and recursion depth
- prefer constrained query patterns over arbitrary JSON filters
- treat file uploads as hostile and isolate their handling paths
Tight controls over what can be introduced into a system are a simple but very effective security control.
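Here is a minimal sketch of those wins using only the Python standard library. It assumes a hypothetical order payload with three known fields. The code caps the body size before parsing, rejects malformed JSON, limits nesting depth, rejects unknown fields, and checks types strictly. A real service would more likely use a schema library, but the checks are the same.

```python
import json

MAX_BODY_BYTES = 4096
MAX_DEPTH = 5
# Hypothetical schema: field name -> exact expected type.
ORDER_SCHEMA = {"sku": str, "quantity": int, "note": str}
REQUIRED = {"sku", "quantity"}


class ValidationError(ValueError):
    pass


def max_depth(value) -> int:
    """Nesting depth of parsed JSON, computed iteratively to avoid deep recursion."""
    deepest, stack = 0, [(value, 1)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        if isinstance(node, dict):
            stack.extend((child, depth + 1) for child in node.values())
        elif isinstance(node, list):
            stack.extend((child, depth + 1) for child in node)
    return deepest


def parse_order(raw: bytes) -> dict:
    if len(raw) > MAX_BODY_BYTES:  # cap size before doing any parsing work
        raise ValidationError("payload too large")
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError, RecursionError):
        raise ValidationError("malformed JSON") from None
    if max_depth(data) > MAX_DEPTH:
        raise ValidationError("nesting too deep")
    if not isinstance(data, dict):
        raise ValidationError("expected a JSON object")
    unknown = set(data) - set(ORDER_SCHEMA)
    if unknown:  # reject unknown fields instead of silently ignoring them
        raise ValidationError(f"unknown fields: {sorted(unknown)}")
    missing = REQUIRED - set(data)
    if missing:
        raise ValidationError(f"missing fields: {sorted(missing)}")
    for name, value in data.items():
        # Exact type match, so True is not accepted as an int.
        if type(value) is not ORDER_SCHEMA[name]:
            raise ValidationError(f"{name} must be {ORDER_SCHEMA[name].__name__}")
    return data
```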
4. Contract-first design reduces ambiguity
In distributed systems, inconsistency becomes attack surface. Different services can interpret the same request differently.
Versioned contracts reduce:
- unintended behavior changes
- parameter parsing differences
- silent widening of accepted inputs
- accidental bypasses due to different validation rules
Security benefits from stable and explicit contracts.
5. Fail closed and fail clean
When systems fail, they can accidentally expand attack surface:
- verbose error leaks
- partial authentication bypass during exception handling
- fallback modes that are too permissive
- debug paths left enabled in production
Failure paths should be:
- as strict as normal paths
- less informative to an attacker
- more informative to operators through logs and traces rather than user responses
Beyond failing safely, as covered in the Security by Design principles, it is also important to make sure that failure does not give away exploitable information about the system.
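A minimal Python sketch of both properties follows. It assumes a hypothetical policy-check callback. Any exception during authorization results in a denial instead of an accidental allow. Unexpected errors return a generic message with a random reference ID to the caller, while the full traceback goes to the server-side log for operators.

```python
import logging
import uuid

log = logging.getLogger("app")


def authorize(check, principal: str, action: str) -> bool:
    """Fail closed: if the policy check itself fails, the answer is 'deny'."""
    try:
        return bool(check(principal, action))
    except Exception:
        log.exception("authorization check failed for %s on %s", principal, action)
        return False


def handle(request_fn):
    """Run a request handler; callers see a generic error, operators see the traceback."""
    try:
        return 200, request_fn()
    except Exception:
        reference = uuid.uuid4().hex[:12]
        # The reference ID lets operators correlate a user report with this log entry.
        log.exception("unhandled error, reference=%s", reference)
        return 500, f"Internal error (reference {reference})"
```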
Operational Attack Surfaces
Operational attack surfaces are easily forgotten, and they are often where the most painful incidents originate.
1. Identity is attack surface
Every token, role, scope, and service account is a reachable capability.
Reduce surface by:
- removing wildcard permissions
- shortening token lifetimes
- scoping CI/CD credentials tightly
- limiting where privileged identities can be used from
- separating break-glass access from normal admin workflows
A system with minimal network exposure can still be wide open if identity is broad and uncontrolled.
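Identity surface can be linted like code. The sketch below assumes a hypothetical `Credential` record exported from your identity provider or cloud IAM, and it uses illustrative AWS-style action names. It flags the patterns listed above: wildcard permissions, long lifetimes, and credentials usable from anywhere. The one-hour threshold is an example value, not a recommendation for every system.

```python
from dataclasses import dataclass, field
from datetime import timedelta

# Example threshold; pick one that fits your system.
MAX_LIFETIME = timedelta(hours=1)


@dataclass
class Credential:
    name: str
    permissions: list
    lifetime: timedelta
    # Networks the credential may be used from; empty means "anywhere".
    allowed_sources: list = field(default_factory=list)


def findings(cred: Credential) -> list:
    issues = []
    if any("*" in p for p in cred.permissions):
        issues.append("wildcard permission")
    if cred.lifetime > MAX_LIFETIME:
        issues.append("token lifetime too long")
    if not cred.allowed_sources:
        issues.append("usable from anywhere")
    return issues
```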
2. Build and deployment pipeline as a front door
Attackers target pipelines because compromise there scales.
Reduce pipeline surface by:
- minimizing who can modify build definitions
- isolating runners
- pinning dependencies
- protecting signing keys
- enforcing least privilege for CI tokens
- making artifact promotion explicit and auditable
When the pipeline is compromised, an attacker no longer needs to break into production directly: the pipeline will deploy the compromise into production for them.
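Explicit, auditable promotion can start as simply as refusing to promote any artifact whose digest does not match the one recorded at build time. The Python sketch below is a simplified stand-in for proper artifact signing and verification. The `promote` function and the audit entry format are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def promote(artifact: bytes, pinned_digest: str, approved_by: str, audit_log: list) -> bool:
    """Promote only if the artifact matches the digest pinned at build time."""
    actual = sha256_of(artifact)
    promoted = actual == pinned_digest
    # Every attempt is recorded, including rejected ones.
    audit_log.append({
        "expected": pinned_digest,
        "actual": actual,
        "approved_by": approved_by,
        "promoted": promoted,
    })
    return promoted


build_output = b"contents of app-1.4.2.tar.gz"
digest_at_build = sha256_of(build_output)
log = []
print(promote(build_output, digest_at_build, "release-manager", log))                # True
print(promote(build_output + b"tampered", digest_at_build, "release-manager", log))  # False
```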
3. Observability systems can become soft targets
Dashboards and logging platforms frequently end up:
- internet-reachable
- weakly authenticated
- over-privileged, or exposing secrets that ended up in logs
Treat them as sensitive systems:
- private access paths
- strong authentication
- role separation
- careful control over logged data
Monitoring and logging should help defenders understand the system, not give attackers a map of it.
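One practical part of controlling logged data is redacting secrets before records leave the process. Here is a sketch that uses the standard `logging.Filter` mechanism. The regular expressions are illustrative and will not catch every secret format, so treat them as a starting point, not a guarantee.

```python
import logging
import re

# Illustrative patterns: bearer tokens and key=value style credentials.
SECRET_PATTERNS = [
    re.compile(r"(?i)(authorization:\s*bearer\s+)[A-Za-z0-9._~+/=-]+"),
    re.compile(r"(?i)((?:password|passwd|secret|api[_-]?key|token)\s*[=:]\s*)\S+"),
]


class RedactSecrets(logging.Filter):
    """Rewrite each record's message with secrets replaced by [REDACTED]."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg and args first
        for pattern in SECRET_PATTERNS:
            message = pattern.sub(r"\1[REDACTED]", message)
        record.msg, record.args = message, None
        return True  # keep the record, just sanitized


# Attach to a handler so every record passing through it is redacted.
handler = logging.StreamHandler()
handler.addFilter(RedactSecrets())
```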
4. Non-production environments count
Most defensive effort is usually focused on production environments. But development and staging are often softer than production while still containing:
- production data copies
- shared credentials
- public exposure
- weaker monitoring
If an attacker can land in staging and pivot, your production posture may not matter much.
Attack surface reduction includes making non-production environments:
- private by default
- populated only with sanitized data
- isolated from production identity and secrets
So even if they are breached, there is nothing meaningful to gain from them.
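As a sketch of sanitizing production copies before they reach staging, the function below replaces emails with a keyed, deterministic pseudonym so that joins across tables still work. It also clears phone numbers and drops password hashes. The column names are hypothetical. The key is assumed to live with the copy job, not in the non-production environment.

```python
import hashlib
import hmac


def pseudonymize_email(email: str, key: bytes) -> str:
    """Deterministic, keyed pseudonym: same input gives same output, not reversible without the key."""
    digest = hmac.new(key, email.strip().lower().encode(), hashlib.sha256).hexdigest()[:16]
    return f"user-{digest}@example.invalid"


def sanitize_row(row: dict, key: bytes) -> dict:
    clean = dict(row)
    clean["email"] = pseudonymize_email(row["email"], key)
    clean["phone"] = None            # not needed for testing
    clean.pop("password_hash", None)  # never copy credentials
    return clean
```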
5. Dependencies and supply chain exposure
Attack surface is not limited to the systems you write. It also includes the software you depend on. Modern applications often include:
- hundreds of open source dependencies
- transitive libraries nobody consciously selected
- build-time tooling with high privileges
- automated update mechanisms
Every dependency adds code that runs with your application’s privileges. Reducing supply chain attack surface includes:
- removing unused dependencies
- pinning versions explicitly
- auditing transitive dependencies
- limiting who can publish internal packages
- controlling which registries builds are allowed to pull from
Dependencies should be treated as part of the system's exposed surface, not as invisible implementation details. This is probably one of the most laborious parts to track, but it matters: large applications have been exploited through vulnerable libraries, and Log4Shell (CVE-2021-44228) in Apache Log4j is one of the best-known cases.
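For Python projects, part of this can be checked automatically in CI. The sketch below audits a pip requirements file for unpinned packages, missing `--hash` entries, and package indexes outside an allowlist. The internal index URL is hypothetical. The sketch handles only the common requirement forms, not the full pip syntax.

```python
import re

# Hypothetical internal mirror: the only index builds may pull from.
ALLOWED_INDEXES = {"https://pypi.internal.example/simple"}

INDEX_OPTION = re.compile(r"^(?:--index-url|--extra-index-url|-i)(?:=|\s+)(\S+)")
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(?:\[[^\]]*\])?==[^\s;]+")


def audit_requirements(text: str) -> list:
    """Flag unpinned requirements, missing hashes, and unapproved package indexes."""
    problems = []
    logical = text.replace("\\\n", " ")  # join pip-style line continuations
    for raw in logical.splitlines():
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()  # drop comments
        if not line:
            continue
        index = INDEX_OPTION.match(line)
        if index:
            if index.group(1) not in ALLOWED_INDEXES:
                problems.append(f"unapproved index: {index.group(1)}")
            continue
        name = line.split()[0]
        if not PINNED.match(line):
            problems.append(f"not pinned: {name}")
        elif "--hash=" not in line:
            problems.append(f"no hash: {name}")
    return problems
```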
6. Operational tooling and automation
Operational tooling frequently holds the highest privileges in an environment. Examples include:
- configuration management systems
- infrastructure automation
- backup and restore systems
- cluster administration tools
- secret rotation jobs
These systems are powerful by design. They can often:
- execute commands across environments
- access sensitive infrastructure APIs
- read or write production data
- modify system configuration at scale
That power also makes them attractive attack targets. Reducing operational attack surface includes:
- limiting which systems can execute automation
- separating operational roles from application roles
- isolating automation credentials
- logging and auditing privileged automation activity
In many environments, compromise of operational tooling is equivalent to compromising the infrastructure itself.
Final Remarks
Limiting attack surface is one of my favorite parts of cybersecurity in general. In many ways, it reminds me of some basic principles of StarCraft base defense. While we are talking about very different universes and systems, the strategic principles of warfare and battle tactics often apply here. The bigger your base, the bigger your presence, the more careful and planned your defense must be.
It is also one of the few security efforts that tends to pay off immediately:
- fewer scans hitting you
- fewer alerts that are just internet noise
- fewer urgent patch situations
- fewer systems to reason about under stress
- fewer “I had no idea that was exposed” moments
The principle is simple:
- if it does not need to exist, remove it
- if it needs to exist, constrain it
- if it must be reachable, make it observable and enforceable
Reducing attack surface is not a one-time hardening sprint. It is a discipline of subtraction.
In this battle, you do not win by building a taller wall. You win by designing a smaller castle. Reduce the number of doors, then harden the ones you keep.