Walk into a security operations center at almost any large enterprise in 2026 and the scene is depressingly familiar: multiple consoles, queues that never empty, and a tier-1 analyst three hours into a shift who has already closed two hundred alerts without finding a single confirmed incident. Ask the security director what they want to fix, and the answer almost always centers on the same word: “automation,” usually paired with “AI.”
After nearly two decades of working on enterprise security at scale, I have come to believe that this answer, on its own, is wrong. The order matters: automation applied to a broken detection pipeline just gives you a faster broken pipeline. In my experience, the problem is almost always the same. The tools are fine; the discipline around their use is missing.
The quiet shift I see happening in the best-run security teams today is at a level deeper than the tool stack. Detection logic, which used to be treated as simple configuration, is now being treated as code.
Why “more automation” fails on its own
Recent industry numbers tell the story that most security leaders already feel in their bones. An early-2026 survey of nearly 500 cybersecurity decision-makers, reported by Abnormal Security, found that 49% of SOC analysts cited “alert overload” as their top operational challenge. Another analysis of over five hundred client environments found that an average tier-1 analyst handles around 174 alerts per day, with only ~22% of those requiring real investigation. False-positive alerts are endemic: they commonly constitute 50-80% of all alerts in enterprise SOCs, and some studies have found false-positive rates approaching 99% in extreme cases.
Behind these numbers sits a more uncomfortable fact. Most large enterprises now run dozens of different security products across numerous vendors. Each tool ships with a set of default detection rules. Each one was tuned, in theory, on day one. But in practice, the rules calcify while the environment drifts. Within eighteen months, you end up with detections firing on network traffic patterns nobody truly owns, alerts arriving without context, and analyst trust eroding by the week. By the time someone proposes adding a SOAR platform or an AI co-pilot on top, the underlying detection portfolio is already a museum of unmaintained rules.
Bolting automation onto that museum does not fix it; it only accelerates the problem.
What detection-as-code actually means
The term detection as code itself is not new. Security expert Anton Chuvakin framed it years ago as the threat-detection equivalent of infrastructure as code: a systematic, testable, version-controlled way of writing the logic that decides what your SOC should investigate. What has changed is the growing maturity of the practice and the mounting evidence that it works.
Splunk’s “State of Security 2025” report found that 63% of security teams want to use detection as code frequently or always in the future – but only 35% do so today, a 28-point gap that, in my view, is one of the most useful diagnostics in a large enterprise. It shows where the field is heading, and how far most teams still have to go.
In practical terms, treating detections as code means four specific things. Large enterprises often get partway through this list and then stop:
- First: Every detection rule lives in a version-controlled repository. Not in a SIEM’s UI, not locked away on a Confluence page or someone’s laptop. In a Git repository, with full history, diffs, and the ability to roll back a bad rule the same way you would roll back a bad software deployment.
- Second: Every rule change goes through peer review, so at least two people see it: one engineer writes or updates the rule, and at least one other person reviews it. The review is not a rubber stamp; the reviewers check for false-positive risk, redundancy with existing detections, proper MITRE ATT&CK technique mapping, and the presence of enrichment fields that will make the alert actionable for an investigator.
- Third: Every rule is tested. Not just “we ran it in dev for a week,” but explicitly tested against known-good telemetry (which should not fire the rule) and known-bad samples that should trigger it. Regression tests run automatically on every change, and a CI/CD pipeline blocks deployment of any rule that fails its tests.
- Fourth: Every rule has a retirement criterion. This is the part most enterprises skip. A detection that fires fifty times a month and never produces a true positive in a year isn’t a meaningful detection at all – it’s just technical debt with a job title. Without a retirement policy, your detection portfolio grows monotonically until the analysts give up.
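To make the first three points concrete, here is a minimal sketch in Python of a rule that lives in a repository alongside its own test samples. It is an illustration under stated assumptions, not any vendor's rule format: the `Detection` class, the `failed_login_burst` rule, the event fields, the threshold of 10, and the `ci_exit_code` helper are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Detection:
    """Hypothetical container for a rule plus the samples that test it."""
    rule_id: str
    attack_technique: str                 # MITRE ATT&CK technique ID
    matches: Callable[[dict], bool]       # the detection logic itself
    known_bad: list[dict] = field(default_factory=list)   # must fire
    known_good: list[dict] = field(default_factory=list)  # must stay silent


def failed_login_burst(event: dict) -> bool:
    """Fire when one account shows many failed logins in a single window.

    The field names and the threshold of 10 are assumptions for illustration.
    """
    return event.get("action") == "login_failed" and event.get("count", 0) >= 10


RULE = Detection(
    rule_id="auth.failed_login_burst",
    attack_technique="T1110",  # Brute Force
    matches=failed_login_burst,
    known_bad=[{"action": "login_failed", "count": 25, "user": "svc-backup"}],
    known_good=[
        {"action": "login_failed", "count": 2, "user": "alice"},
        {"action": "login_ok", "count": 40, "user": "bob"},
    ],
)


def run_tests(detection: Detection) -> list[str]:
    """Return a list of failure messages; an empty list means the rule passes."""
    failures = []
    for event in detection.known_bad:
        if not detection.matches(event):
            failures.append(f"{detection.rule_id}: missed known-bad sample {event}")
    for event in detection.known_good:
        if detection.matches(event):
            failures.append(f"{detection.rule_id}: fired on known-good sample {event}")
    return failures


def ci_exit_code(detections: list[Detection]) -> int:
    """Return 1 if any rule fails its samples, so a pipeline can block the deploy."""
    failures = [msg for d in detections for msg in run_tests(d)]
    for msg in failures:
        print(msg)
    return 1 if failures else 0
```

A CI job could call `ci_exit_code` over every rule in the repository and exit with its return value, so a pull request that breaks a rule cannot merge. The design choice that matters is keeping the samples next to the logic: a reviewer sees the rule change and its expected behavior in the same diff.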
The Intercom data point
A case study I find especially instructive comes from Intercom’s detection team, documented through their work with Panther (a cloud-native SIEM platform). Their team motto, by their own description, was “Every alert must add value.” They moved their detections into Python-based rules under version control, set up automated data-replay testing, and built deduplication into the workflow itself.
The results they’ve published since are striking. They now handle threats roughly twice as fast as before, and have reduced the time spent on each investigation by about 90%. They got there by applying software engineering discipline to detection content: no new XDR product, no generative AI co-pilot, no fancy new “single pane of glass.” The platform underneath was, by industry standards, ordinary.
Two takeaways are worth highlighting. First, this kind of discipline produces compounding returns over time, not just a one-off bump. Second, the analysts who lived through the transition were the same people, in the same seats, armed with substantially better processes for doing their jobs. Nothing replaced them; what changed was the process around the rules.
Metrics that show the discipline is working
Most SOC dashboards I review for clients still put two things front and center: mean time to detect (MTTD) and mean time to respond (MTTR). These are useful operational metrics – they tell you whether yesterday went well – but they do not tell you whether your detection portfolio is improving.
A detection-as-code program adds a second tier of metrics that read more like a maturity scorecard for your detection engineering function. Examples include:
- ATT&CK technique coverage (the percentage of relevant MITRE ATT&CK techniques for which you have active detections), tracked over time – especially mapped against the threat profile for your industry or environment.
- Number of rules retired per quarter, which sounds counterintuitive until you realize that healthy detection portfolios shed their dead weight regularly.
- False-positive rate trends per detection family or category, not just an aggregate false-alert rate. Tracking FP percentages by detection use-case can show you where your content needs attention.
- Detection-to-incident conversion ratio, which measures what fraction of your alerts actually lead to confirmed security incidents. This effectively shows whether your alerts are producing real security outcomes or just noise.
- Test coverage on detection logic (similar to how a software team would measure unit-test coverage for their code base).
Crucially, these numbers should be used as part of a feedback control loop. That is what separates a working program from a vanity dashboard. For example, when a particular detection family shows a rising false-positive rate, that’s a signal to invest in better enrichment or to consider retiring those rules. When your ATT&CK coverage flatlines for two consecutive quarters, it means your detection engineering resources might be getting diverted elsewhere and it’s time to refocus.
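Three of the metrics above reduce to simple ratios, sketched here in Python. This is a minimal illustration, assuming each closed alert is exported as a dictionary with a `family` and a `disposition` field; those field names, the disposition values, and the sample data are hypothetical, not taken from any particular SIEM.

```python
from collections import defaultdict


def fp_rate_by_family(alerts: list[dict]) -> dict[str, float]:
    """False-positive rate per detection family, rather than one aggregate number."""
    totals = defaultdict(int)
    false_positives = defaultdict(int)
    for alert in alerts:
        totals[alert["family"]] += 1
        if alert["disposition"] == "false_positive":
            false_positives[alert["family"]] += 1
    return {family: false_positives[family] / totals[family] for family in totals}


def conversion_ratio(alerts: list[dict]) -> float:
    """Fraction of alerts that ended as confirmed incidents."""
    if not alerts:
        return 0.0
    incidents = sum(1 for a in alerts if a["disposition"] == "incident")
    return incidents / len(alerts)


def attack_coverage(covered: set[str], relevant: set[str]) -> float:
    """Share of the relevant ATT&CK techniques that have an active detection."""
    if not relevant:
        return 0.0
    return len(covered & relevant) / len(relevant)


# Illustrative sample data only.
ALERTS = [
    {"family": "identity", "disposition": "incident"},
    {"family": "identity", "disposition": "false_positive"},
    {"family": "endpoint", "disposition": "false_positive"},
    {"family": "endpoint", "disposition": "false_positive"},
]
RELEVANT = {"T1110", "T1078", "T1059", "T1566"}   # techniques in the threat profile
COVERED = {"T1110", "T1059"}                       # techniques with active rules

print(fp_rate_by_family(ALERTS))           # {'identity': 0.5, 'endpoint': 1.0}
print(conversion_ratio(ALERTS))            # 0.25
print(attack_coverage(COVERED, RELEVANT))  # 0.5
```

In this toy data the aggregate false-positive rate is 75%, which hides the fact that the endpoint family never produced a true positive while identity did. That per-family split is what turns the number into a decision about where to invest or retire.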
How to start without rebuilding the SOC
Detection engineering as a discipline does not require a platform migration or a complete SOC rebuild. The teams I see succeed most often start small and deliberately.
Pick five to ten existing detections – ideally the noisiest, most frequently alerting ones that everyone complains about. Move them into a Git-managed project. Stand up a basic CI/CD pipeline that at least checks rule syntax and runs the sample-data tests on every change. Require every rule update to go through two-person peer review via pull requests. Set a retirement criterion: for instance, any rule that fires more than X times without a true positive in 90 days gets flagged for evaluation and likely deletion. Run this pilot for one quarter, and measure the false-positive rate of those detections before and after.
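The retirement criterion in that pilot can be automated with very little code. The sketch below, in Python, assumes alert history is available as records with `rule_id`, `fired_on`, and `true_positive` fields; those names and the sample rule IDs are hypothetical. The fire-count threshold (the X above) is left as a required parameter because the right value depends on your environment, and the value of 5 in the example is arbitrary.

```python
from collections import Counter
from datetime import date, timedelta


def flag_for_retirement(
    alerts: list[dict],
    today: date,
    min_fires: int,
    window_days: int = 90,
) -> list[str]:
    """Return rule IDs that fired more than `min_fires` times inside the
    window without a single true positive."""
    cutoff = today - timedelta(days=window_days)
    fires = Counter()
    had_true_positive = set()
    for alert in alerts:
        if alert["fired_on"] < cutoff:
            continue  # outside the evaluation window
        fires[alert["rule_id"]] += 1
        if alert["true_positive"]:
            had_true_positive.add(alert["rule_id"])
    return sorted(
        rule_id
        for rule_id, count in fires.items()
        if count > min_fires and rule_id not in had_true_positive
    )


def _sample(rule_id: str, start: date, count: int, tp_index: int = -1) -> list[dict]:
    """Build `count` daily alerts for one rule; mark one as a true positive if asked."""
    return [
        {
            "rule_id": rule_id,
            "fired_on": start + timedelta(days=i),
            "true_positive": i == tp_index,
        }
        for i in range(count)
    ]


# Illustrative history: one noisy rule, one useful rule, one that only fired long ago.
HISTORY = (
    _sample("net.noisy_dns", date(2026, 3, 1), 6)
    + _sample("auth.failed_login_burst", date(2026, 3, 1), 6, tp_index=2)
    + _sample("legacy.old_proxy", date(2025, 10, 1), 6)
)

print(flag_for_retirement(HISTORY, today=date(2026, 3, 31), min_fires=5))
# ['net.noisy_dns']
```

The output is a list for human evaluation, not an automatic delete. A rule that guards against a rare but severe technique may deserve to stay even if it has been silent or noisy, and that call belongs in the pull request that retires it.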
This isn’t glamorous work. There’s no flashy demo, no keynote slide, no single executive presentation that magically unlocks budget. But after a quarter, you’ll have a working pattern. After two quarters, you will have a detection engineering function that other teams start to notice and want to emulate. After four quarters, you will have a detection content portfolio that AI assistants or XDR platforms can genuinely amplify – because the underlying content is now structured, tested, and continuously improving.
The pattern is the same one I’ve seen in every successful enterprise security modernization over the past twenty years. The platforms get the headlines. The discipline does the work.