Two things happened in the last few weeks that, taken together, mark a genuine shift in how security work gets done.
First, Anthropic moved Claude Security into public beta for Enterprise customers. The product — previously called Claude Code Security when it launched in limited preview in February — scans a full codebase or a targeted directory, reasons through data flows and component interactions in the way a human researcher would, then produces findings with severity ratings, confidence scores, and suggested patches, all before anything reaches an analyst. Anthropic's own team used the underlying model (Claude Opus 4.6 at preview time, Opus 4.7 in the current release) to find over 500 vulnerabilities in production open-source codebases — bugs that had survived years of expert review and fuzzing. That isn't a marketing stat: Anthropic is in active responsible disclosure with the relevant maintainers.
Second, the MCP ecosystem has quietly produced a category of security intelligence server that would have taken months to build in-house just two years ago. Projects like the cve-mcp-server — a production-grade server giving an AI assistant 27 integrated security intelligence tools across 21 APIs — put CVE lookup, EPSS scoring, CISA KEV status, MITRE ATT&CK mapping, Shodan host reconnaissance, and VirusTotal hash checking behind a single natural-language interface. You point an AI client at it, and you can ask it to evaluate a dependency manifest, score a suspicious IP, or confirm whether a specific CVE is actively exploited — and it goes and does that work.
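To make that concrete, here is what the server side of such an interface looks like. This is an illustrative sketch rather than the cve-mcp-server's actual code: it assumes the official `mcp` Python SDK and NVD's public CVE API 2.0, and the server name and tool are placeholders.

```python
# Illustrative MCP server exposing a single CVE lookup tool. Not the
# cve-mcp-server's actual code: assumes the official `mcp` Python SDK
# and NVD's public CVE API 2.0.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("cve-intel")  # hypothetical server name

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

@mcp.tool()
def cve_lookup(cve_id: str) -> dict:
    """Fetch the NVD record for a CVE id such as 'CVE-2021-44228'."""
    resp = httpx.get(NVD_API, params={"cveId": cve_id}, timeout=30)
    resp.raise_for_status()
    vulns = resp.json().get("vulnerabilities", [])
    if not vulns:
        return {"error": f"no NVD record found for {cve_id}"}
    cve = vulns[0]["cve"]
    return {
        "id": cve["id"],
        "published": cve.get("published"),
        "description": cve["descriptions"][0]["value"],
    }

if __name__ == "__main__":
    mcp.run()  # stdio transport; point an MCP client at this script
```

Once a client registers this server in its MCP configuration, "what does NVD say about CVE-2021-44228?" becomes a tool call rather than a copy-paste into a browser. The real project composes 27 such tools behind the same interface.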
These two things are related. They tell the same story from different ends of the attack surface.
What has actually changed
The dominant critique of traditional static analysis — SAST tools, linters, dependency scanners — has always been the same: pattern matching finds known patterns. It catches exposed secrets, outdated library versions, and common injection vectors, but it misses what actually gets organisations compromised: broken access control across service boundaries, logic flaws in authentication flows, and chained vulnerabilities that look harmless in isolation but are deadly in combination.
What reasoning-capable models do differently is context. Claude Security traces data as it moves through an application, reads the call graph, and understands what a piece of code is supposed to do before it decides whether it does it safely. Each finding goes through an internal adversarial pass — the model attempts to disprove its own result before surfacing it. That reduces false positives and, more importantly, reduces analyst fatigue on the back end. The findings that land on a developer's screen have already been interrogated once.
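Anthropic hasn't published the mechanics of that pass, but the pattern itself is simple to illustrate. The sketch below is a generic generate-then-refute loop using the Anthropic Python SDK; the model name, prompt, and verdict format are placeholders, not the product's internals.

```python
# Generic sketch of an adversarial verification pass: a second call asks
# the model to refute its own candidate finding before it is surfaced.
# Not Anthropic's implementation; model name and prompts are placeholders.
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-6"        # placeholder model identifier

def verified_finding(code: str, finding: str) -> str | None:
    """Surface the finding only if an explicit attempt to refute it fails."""
    rebuttal = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Try to DISPROVE this candidate security finding: look for "
                "a reason the flagged code is actually safe (unreachable "
                "path, upstream sanitisation, mistaken assumption about "
                "inputs). Answer 'REFUTED: <reason>' or 'STANDS'.\n\n"
                f"Code:\n{code}\n\nFinding:\n{finding}"
            ),
        }],
    )
    verdict = rebuttal.content[0].text.strip()
    return None if verdict.startswith("REFUTED") else finding
```

A production pass would presumably carry the full data-flow trace into the rebuttal prompt; the structural point is simply that a finding has to survive an explicit attempt to kill it before a human ever sees it.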
On the intelligence side, the MCP tooling story is that an analyst with a capable AI client now has a composable interface into the same threat data sources that required bespoke integrations or expensive platform licences. EPSS scoring tells you how likely a given CVE is to be exploited in the next 30 days. CISA's Known Exploited Vulnerabilities catalogue tells you whether it already is. ATT&CK mapping tells you which adversary techniques the vulnerability enables. Put those together in one query chain, and you have triage logic that used to take a senior analyst half an hour per finding, running in seconds.
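Two of those three signals are free to query directly, which makes the chain easy to demonstrate. Here is a minimal sketch, assuming FIRST's public EPSS API and CISA's KEV JSON feed; the priority thresholds are illustrative, not a standard, and the ATT&CK step is omitted.

```python
# Minimal triage chain for one CVE: EPSS exploitation likelihood plus
# CISA KEV status. Uses FIRST's public EPSS API and CISA's KEV feed;
# the thresholds are illustrative, not a standard.
import httpx

EPSS_API = "https://api.first.org/data/v1/epss"
KEV_FEED = ("https://www.cisa.gov/sites/default/files/feeds/"
            "known_exploited_vulnerabilities.json")

def triage(cve_id: str) -> dict:
    rows = httpx.get(EPSS_API, params={"cve": cve_id}, timeout=30).json()["data"]
    epss = float(rows[0]["epss"]) if rows else 0.0  # 30-day exploitation probability

    kev = httpx.get(KEV_FEED, timeout=60).json()["vulnerabilities"]
    exploited = any(v["cveID"] == cve_id for v in kev)

    if exploited:
        priority = "P1: in CISA KEV, known exploited, patch now"
    elif epss >= 0.1:  # illustrative cut-off
        priority = "P2: high exploitation likelihood"
    else:
        priority = "P3: fold into normal patch cycle"
    return {"cve": cve_id, "epss": epss, "kev": exploited, "priority": priority}

print(triage("CVE-2021-44228"))  # Log4Shell: KEV-listed, so P1
```

That is the half-hour-per-finding triage logic reduced to two HTTP calls; the MCP tooling wraps the same calls behind natural language.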
The velocity of this tooling is striking. The CVE MCP server project appeared and matured in a matter of weeks. Anthropic moved from research preview to enterprise beta for Claude Security in under three months. The category is not slowing down.
The catch
None of this means vulnerability management has become easy. It means the detection portion of vulnerability management has become dramatically cheaper and faster. Those are not the same thing.
Consider what happens after a scan. Claude Security identifies a business logic flaw in an authentication service. The recommended patch is technically correct. Now what? The service is owned by a team that has three open sprints and a product release in two weeks. The patch touches a component that two other services depend on. Security needs to communicate the finding, validate it against the production architecture (which may differ from what was scanned), negotiate a remediation timeline, track the fix through review and deployment, and verify it closes the original exposure without opening a new one. That's not a model problem. That's a coordination problem.
Or consider rolling out any of this tooling into an organisation for the first time. Adding Claude Security to a development pipeline requires deciding which repositories get scanned, how findings route to engineering teams, what the escalation path is for critical findings, how the security team absorbs an initial volume of results from years of accumulated technical debt, and how you prevent the tool from being quietly switched off because a team lead found it noisy. Each of those questions is a change management question. None of them has a technical answer.
This is the part of security work that AI is not compressing. Planning an engagement, scoping it against a live organisation's actual risk posture, running discovery across business units that have different maturity levels, sequencing remediation against business risk rather than CVSS scores, getting buy-in from teams who were not consulted during procurement — this is where engagements succeed or fail. The tool finding the bug is the easy bit.
The service provider's role has shifted, not shrunk
Some practitioners have started treating the drop in tooling cost as a reason to worry about value compression. I don't read it that way. What has changed is where the value sits, not whether it exists.
When detection was expensive and time-consuming, a significant share of an engagement was the mechanical work of finding things. Now that finding is fast and cheap, the engagement is almost entirely the other parts: understanding the organisation deeply enough to scope the right attack surface, prioritising findings against actual business impact rather than severity ratings, managing the communication between security findings and engineering roadmaps, providing the legal and regulatory framing that determines which findings carry notification obligations under the Privacy Act 1988 (Cth) or trigger reporting requirements under the Security of Critical Infrastructure Act 2018 (Cth), and staying with the organisation through the remediation cycle rather than dropping a report and leaving.
The AI handles the throughput. The practitioner handles the judgment. A scan that finds 300 vulnerabilities in a codebase is not a finished engagement — it's the start of one. Triaging those 300 findings against an organisation's actual architecture, threat model, regulatory exposure, and change capacity requires someone who understands all of those things and can translate between the security team, the engineering teams, and executive stakeholders. That work is harder than it was before, not easier, because the volume of raw signal has increased and the expectation that it will be turned into action has increased with it.
What this looks like in practice
If you're looking at standing up AI-assisted code scanning for the first time, the practical questions worth resolving before you pick a tool are:
- Who owns the findings backlog, and does that person have the authority to direct engineering remediation? If not, findings will accumulate and the scan will eventually be de-prioritised.
- How does the tool fit your existing development workflow? Claude Security integrates with Claude Code and can push to Slack, Jira, or a CSV export. But the integration has to be designed — it doesn't self-configure to match how your teams actually work.
- What's your false-positive handling process? Even with adversarial self-verification, findings will occasionally be wrong or context-dependent. A dismissal workflow that carries decisions forward across scans matters for analyst efficiency and for audit purposes (a sketch of one such workflow follows this list).
- How do you handle a scan that surfaces a vulnerability in a dependency you can't patch because the vendor doesn't support the newer version? That's a risk-acceptance and contractual question, not a technical one.
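On the dismissal point specifically, the property that matters is that a dismissal, once made and justified, never has to be made again. Below is a minimal sketch of one way to get that, assuming a fingerprint-based suppression file; the finding fields are hypothetical, not Claude Security's schema.

```python
# Sketch of a dismissal store that persists across scans. Findings are
# fingerprinted on rule + file + the flagged snippet (so they survive
# line-number churn), and dismissals are replayed on every run.
# The finding fields are hypothetical, not Claude Security's schema.
import hashlib
import json
import pathlib

SUPPRESSIONS = pathlib.Path("security/dismissed-findings.json")

def fingerprint(finding: dict) -> str:
    raw = f"{finding['rule']}|{finding['file']}|{finding['snippet']}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

def load_dismissals() -> dict:
    return json.loads(SUPPRESSIONS.read_text()) if SUPPRESSIONS.exists() else {}

def dismiss(finding: dict, reason: str, who: str) -> None:
    """Record a dismissal with its rationale: audit trail as well as dedupe."""
    dismissed = load_dismissals()
    dismissed[fingerprint(finding)] = {"reason": reason, "by": who}
    SUPPRESSIONS.parent.mkdir(parents=True, exist_ok=True)
    SUPPRESSIONS.write_text(json.dumps(dismissed, indent=2))

def triage_queue(findings: list[dict]) -> list[dict]:
    """Drop findings already dismissed in a previous scan."""
    dismissed = load_dismissals()
    return [f for f in findings if fingerprint(f) not in dismissed]
```

Fingerprinting on the rule, file, and flagged snippet rather than on line numbers means dismissals survive unrelated edits to the file, and keeping the rationale alongside the fingerprint gives you the audit trail for free.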
None of these questions are hard. But they are the questions that determine whether the tool produces outcomes or produces noise.
Putting this into your own environment
The AI security tooling wave is real and it's moving fast. For organisations that have been waiting for the technology to mature before investing in AppSec uplift, the maturity is here. For organisations already running static analysis pipelines, the question is whether reasoning-based scanning changes what you find — and based on Anthropic's own disclosure programme, the answer is almost certainly yes.
What we do at Artificer Cyber is design and run these engagements: scoping the right attack surface, integrating the tooling into your development and security workflows, interpreting findings against your regulatory context, and running the remediation cycle through to close. If you're thinking about standing up AI-assisted code security or want a second opinion on a scanning programme that isn't producing the outcomes you expected, reach out through the contact page.
The firms that respond fastest are the ones that planned ahead.
When an incident hits, the last thing you want is to be searching for a firm. Retainer clients get priority response, privileged structure, and a team that already knows your environment.
Discuss a retainer →

- ✓ Priority SLA — response within hours, not days
- ✓ Alignment with your legal, executive, and CTO-office protocols from day one
- ✓ Pre-negotiated rates — no emergency premium
- ✓ Red team and blue team engagements to pressure-test your defences
- ✓ Quarterly posture reviews so we already know your environment when it counts