Checkmarx's pitch on its new SAST engine: the classifier in front of the queue is the product

Every SAST tool on the market promises to keep developers safe. The ones developers actually keep switched on are the ones that don't bury them in noise. (Those two sets are not the same size.) This week Checkmarx made its play for the second set: a new SAST engine, reported by The New Stack on June 19, built around the idea that a finding gets a second opinion before it ever lands on a pull request.

What Checkmarx actually shipped

Three components, wired in series. A deterministic rules-based scanner — the familiar piece, the one that fires on patterns it knows. An LLM trained on security data. And a Findings Analysis Engine — FAE — that classifies findings as true or false positives before they reach the developer. The rules and the model can disagree; the FAE adjudicates.

Jonathan Rende, Checkmarx's Chief Product Officer, is presented in the report as putting that classifier forward as the differentiator, alongside the rules engine and the AI-powered coverage. Read the framing carefully: the vendor is not selling the model. It is selling the layer that sits between the model and the developer, sorting what is worth waking a human for.

Why this lands on a CI/CD owner's desk

If the FAE genuinely cuts the false-positive load most SAST suites bury teams under, the news for the rest of us is less about the LLM and more about where in the pipeline a security finding becomes blocking. A noisy scanner gate teaches developers to memorise the bypass syntax and look away. A pre-triaged stream of findings is a different integration story: fewer items hit the merge queue, and the ones that do arrive with a verdict already attached.

Or that's the sales pitch. Whether the gate stays useful depends on how the FAE classifies findings in your codebase, not in theirs.

What the F1 number is, and is not, telling you

Checkmarx's headline metric is an F1 score of 0.499, against what it presents as a category average of 0.20. That is the vendor's own measurement against the vendor's own bar. It deserves the same scepticism we apply to every vendor benchmark on slide three of a pitch deck. F1 also collapses precision and recall into one figure: a classifier that is fantastic at recall and mediocre at precision can land in the same neighbourhood as one that does the opposite — and "mediocre precision" is the entire reason we were complaining about SAST in the first place.
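To make the collapse concrete, here is a minimal sketch of the arithmetic. F1 is the harmonic mean of precision and recall. The two precision/recall pairs below are invented for illustration and are not figures from Checkmarx or anyone else; they only show that opposite profiles can round to the same headline number.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two hypothetical classifiers with opposite profiles (illustrative numbers only).
noisy = f1(precision=0.345, recall=0.90)     # catches most real issues, roughly two in three alerts are false
cautious = f1(precision=0.90, recall=0.345)  # rarely cries wolf, misses most real issues

print(round(noisy, 3), round(cautious, 3))  # prints 0.499 0.499
```

Same score, very different day for the team on the receiving end. Ask for precision and recall separately before comparing anything.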

The other claim is more concrete. Checkmarx says its engine found 327 true positives missed by a leading frontier model in head-to-head testing across four production codebases. A useful data point. Slightly less useful because Checkmarx declined to name which frontier model it tested against. (You can imagine the reasons.) Four codebases is also a thin sample to generalise from.

Wiring a triage layer in without re-noising your pipeline

Take the announcement as permission to think about the pattern, not the product. A pre-developer classifier only earns its place if you treat it carefully in CI:

Gate on the classifier's output, not the raw scan. A blocking step that fires on every raw finding will train the team to bypass it; one that fires only on triaged true positives is a real signal.
Make the classifier's verdict traceable. Every blocked PR should carry the rule that fired, the model's contribution, and the classifier's reason — auditable, not magical.
Keep an override path with an audit trail. Classifiers will be wrong. A reviewer with the right role should be able to mark a finding as a false positive and have that decision flow back as a signal, not as permanent silence on the file.
Don't put the LLM on the merge button. The classifier's verdict is an input to the gate; humans still own the policy that decides what blocks.

Pseudocode-shaped, the pipeline becomes: scanner → classifier → merge gate, with the gate consuming only what the classifier marks true_positive. Everything else lands in a queue for batch review, not in a developer's notifications.
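The wiring pattern above can be sketched in a few lines of Python. This is an illustration of the pattern, not Checkmarx's API: the `Finding` fields, the `merge_gate` and `override` functions, the rule IDs and the audit-log shape are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Finding:
    rule_id: str   # rule that fired in the deterministic scanner
    file: str
    verdict: str   # classifier output: "true_positive" or "false_positive"
    reason: str    # classifier's stated reason, kept so the verdict is traceable
    overridden_by: Optional[str] = None  # reviewer who overrode the verdict, if any


@dataclass
class GateResult:
    blocking: list = field(default_factory=list)      # findings that stop the merge
    review_queue: list = field(default_factory=list)  # everything else, for batch review

    @property
    def passed(self) -> bool:
        return not self.blocking


def merge_gate(findings: list) -> GateResult:
    """Block only on triaged true positives that no reviewer has overridden."""
    result = GateResult()
    for finding in findings:
        if finding.verdict == "true_positive" and finding.overridden_by is None:
            result.blocking.append(finding)
        else:
            result.review_queue.append(finding)
    return result


def override(finding: Finding, reviewer: str, audit_log: list) -> None:
    """Record a human override; the finding stays visible in the review queue."""
    finding.overridden_by = reviewer
    audit_log.append({
        "rule_id": finding.rule_id,
        "file": finding.file,
        "classifier_verdict": finding.verdict,
        "reviewer": reviewer,
        "at": datetime.now(timezone.utc).isoformat(),
    })


findings = [
    Finding("SQLI-001", "app/db.py", "true_positive", "tainted input reaches query"),
    Finding("XSS-002", "web/view.py", "false_positive", "output is escaped"),
]
print(merge_gate(findings).passed)  # False: one triaged true positive blocks the merge
```

An overridden finding moves to the review queue instead of disappearing, so the reviewer's decision can be fed back to whoever tunes the classifier. Role checks on who may call `override` are left out of the sketch and would belong in the real policy.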

How the wider field is framing the same problem

Checkmarx is not the first to try to fix SAST by adding a layer in front of it. Across the category, vendors have been stitching machine-learning ranking, exploitability checks and reachability analysis in front of raw scanner output for a while, with varying honesty about how much of it is heuristic and how much is model. Pure-LLM-as-scanner shops have gone the other way: skip the rules engine, let the model find issues from cold. Both bets have downsides — the first risks giving heuristics a vote on real security findings; the second tends to invent vulnerabilities that look plausible until someone actually reads the code.

The combined shape Checkmarx is pitching — deterministic rules first, model second, classifier last — splits the difference. Whether it survives contact with codebases that look nothing like its training set is the question that matters. Until that evidence comes from someone who is not also selling the engine, treat the F1 number as a press release and the wiring pattern as the takeaway.

The SAST gate has been noise for so long that "the classifier is the product" is a reasonable bet. Just remember the failure mode that gets less attention than the noisy one: a classifier that quietly hides a real vulnerability is worse than a scanner that yells about a fake one.
