As offensive tooling becomes increasingly autonomous, the line between detection and prevention keeps moving. My current focus is building systems that learn the intent behind an attack rather than the signature.
Why RL Beats Static Playbooks
Static rule engines break the moment an adversary mutates their tooling. Reinforcement learning policies, by contrast, observe the effect of each action and adapt continuously. In my adaptive mitigation stack, each agent receives a reward shaped around lateral-movement suppression and dwell-time minimisation.
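As a rough illustration of that shaping, here is a minimal sketch, assuming the environment exposes a per-step count of new lateral-movement edges and the intrusion's current dwell time; the signal names and weights are my assumptions, not the stack's actual telemetry fields.

```python
# Minimal reward-shaping sketch. The signal names (new_lateral_edges,
# dwell_time_s) and the weights are illustrative assumptions.
def shaped_reward(new_lateral_edges: int, dwell_time_s: float,
                  action_cost: float = 0.1,
                  w_lateral: float = 1.0, w_dwell: float = 0.01) -> float:
    """Penalise lateral spread and dwell time, with a small cost per countermeasure."""
    return -(w_lateral * new_lateral_edges) - (w_dwell * dwell_time_s) - action_cost
```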
Building the Simulation Loop
- Signal ingestion: telemetry flows in from Zeek, Suricata, and OT edge sensors.
- Environment modelling: a graph environment replays events and surfaces decision points.
- Policy training: agents explore countermeasures, synthesise new ones, and log confidence envelopes (a minimal sketch follows this list).
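To make the loop concrete, below is a minimal, self-contained sketch of the second and third stages, assuming a toy replay of connection edges and a tabular Q-learning agent choosing between isolation countermeasures. The hostnames, action set, and reward weights are illustrative assumptions; the production stack works over Zeek/Suricata-derived graphs and logs confidence envelopes, which this sketch omits.

```python
import random
from collections import defaultdict

# Hypothetical replay distilled from Zeek/Suricata telemetry: (step, src, dst)
# connection edges over a toy host graph. Hostnames are invented for illustration.
REPLAY = [
    (0, "ws-01", "ws-02"),
    (1, "ws-02", "db-01"),
    (2, "ws-02", "ws-03"),
    (3, "ws-03", "dc-01"),
]

ACTIONS = ["wait", "isolate_source", "isolate_dest"]


class GraphReplayEnv:
    """Replays telemetry edges over a host graph, surfacing one decision point per edge."""

    def reset(self):
        self.t = 0
        self.isolated = set()
        self.compromised = {"ws-01"}  # assumed initial foothold
        return self._obs()

    def _obs(self):
        _, src, dst = REPLAY[self.t]
        return (src in self.compromised, src in self.isolated, dst in self.isolated)

    def step(self, action):
        _, src, dst = REPLAY[self.t]
        if action == "isolate_source":
            self.isolated.add(src)
        elif action == "isolate_dest":
            self.isolated.add(dst)
        # Lateral movement succeeds only if the source is compromised and neither end is isolated.
        spread = int(src in self.compromised
                     and src not in self.isolated
                     and dst not in self.isolated)
        if spread:
            self.compromised.add(dst)
        dwell = float(self.t + 1)  # crude dwell proxy: steps since the initial foothold
        cost = 0.1 if action != "wait" else 0.0
        # Same shaping idea as the earlier sketch: penalise spread, dwell, and countermeasure cost.
        reward = -1.0 * spread - 0.01 * dwell - cost
        self.t += 1
        done = self.t >= len(REPLAY)
        return (None if done else self._obs()), reward, done


def train(env, episodes=500, alpha=0.3, gamma=0.9, eps=0.2):
    """Tabular epsilon-greedy Q-learning over the replayed decision points."""
    q = defaultdict(float)
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda act: q[(obs, act)]))
            nxt, r, done = env.step(a)
            best_next = 0.0 if done else max(q[(nxt, b)] for b in ACTIONS)
            q[(obs, a)] += alpha * (r + gamma * best_next - q[(obs, a)])
            obs = nxt
    return q


policy_table = train(GraphReplayEnv())
```

On this toy replay, `train(GraphReplayEnv())` quickly learns to isolate the early pivot hosts. The point is the shape of the loop, not the learner: in practice the Q-table would be replaced by whatever policy optimiser the stack actually uses.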
Human Oversight Without Slowing Response
Every policy promotion requires explicit analyst approval. A review console visualises what changed, the statistical lift in containment time, and the blast radius under worst-case regression tests.
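One way to encode that promotion rule is a simple gate in front of deployment. A minimal sketch follows; the `PromotionReview` fields and the thresholds are assumptions for illustration, not the review console's actual schema.

```python
from dataclasses import dataclass


# Hypothetical summary of a candidate policy's review metrics; field names
# and thresholds are illustrative assumptions.
@dataclass
class PromotionReview:
    containment_lift_pct: float    # reduction in median containment time vs. the incumbent policy
    worst_case_blast_radius: int   # hosts affected under worst-case regression tests
    analyst_approved: bool         # explicit human sign-off


def can_promote(review: PromotionReview,
                min_lift_pct: float = 10.0,
                max_blast_radius: int = 5) -> bool:
    """Promote only with explicit analyst approval, measurable lift, and a bounded blast radius."""
    return (review.analyst_approved
            and review.containment_lift_pct >= min_lift_pct
            and review.worst_case_blast_radius <= max_blast_radius)
```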
Autonomous defence is not "set and forget"—it is continuous intent alignment between humans and machines.