LLM Security Capabilities Skepticism

Some commenters doubt LLMs can autonomously find and patch vulnerabilities, comparing them to slot machines requiring expert handlers. Others cite Mozilla's AI security research as evidence.

View on HN · Topics

I’m with you until that last sentence, which I’ve been thinking about as “… until AI code testing, vulnerability scanning, and developer support tools help to limit the number of 0-days and vulnerabilities making it into production”.

So prevention will be more important than ai-assisted rapid containment or patching, though both of those capabilities will be necessary as part of defense in depth.

And some sort of AI-enabled security analysis across the organization’s architecture that is done as part of testing ahead of new software entering production to ID potential vulnerabilities caused by configuration changes or upgrades that modify how systems interact with each other.

I’ve been trying to guess the timeframe for seeing improved secure development, but I’m hoping it’s a bit closer to 6 months - 1 year given the speed of AI adoption and AI progression. May be closer to 3 years as you stated.

In the meantime, is there more to be done than this (not in order)?

- Patch COTS software

- re-evaluate the scoring for previous vulnerabilities

- set up containment capabilities for systems that can’t be patched / high risk vendors

- use frontier model vuln scanning and patching for home grown systems that may have more 0-days than COTS depending on the organization’s capability

- limit the number of vendors / simplifying the tech stack.

I’d be happy to hear how others are thinking about this.

View on HN · Topics

we simply can't absolve ourselves of responsibility in input and expect a hardened output. It's ABSOLUTELY up to the engineers to have test harnesses and scenarios for testing, vulnerability scanning, etc. Just because we can move faster via prompts doesn't mean we neglect the SDLC.

I think there's opportunity to reinvent the pipeline with AI powered tools to assist but the onus is still on the person to ensure they are deploying something that has been tested.

View on HN · Topics

>any vulnerability in any software available for inspection is going to be instant public knowledge. Or at least public among anybody who matters.

Shouldn't this naturally lead to a state where all (new) code is vulnerability-free? If AI vulnerability detection friction becomes low enough it'll become common/forced practice to pre-scan code.

View on HN · Topics

Certainly, and some discoveries have been attributed to AI (I was reading that Mozilla Firefox was praising Mythos recently)

But that's not accounting for all of the discoveries, not at all.

I've also seen the npm people talking about the surge in AI code overwhelming the ability to properly review what's being distributed, and a large number of vulnerabilities being attributed to that

View on HN · Topics

> people were already diffing kernel commits and figuring out which ones were security fixes

> With skill, and usually not consistently and systematically. With AI, anyone can do this to any software.

I would like to see actual evidence of this, not.. vibes

I mean, this reeks of "Anyone is a Principal developer now" when the truth is there is still work to do.

View on HN · Topics

If we assume that there will be an AI that is perfect in terms of ability to find vulnerabilities, cheap to run and widely available to everyone, then anyone can run it on any piece of software before deploying it. All vulnerabilities get found before they can be exploited.

One of the big challenges with cybersecurity is that attackers only need to find one exploit, while defenders need to stop everything. When you have a large surface area and limited resources, it's much easier to be the side that only has to succeed once. AI eliminates the limited resources problem.

View on HN · Topics

> If we assume that there will be an AI that is perfect in terms of ability to find vulnerabilities

...so if we assume a halting oracle?

View on HN · Topics

It's not only Linux though and many projects don't have the funding to perpetually use something like Mythos.

View on HN · Topics

LLMs aren't capable of doing this, and never will be no matter what Anthropic tries to tell you.

View on HN · Topics

That's the same mindset some people had 3 years ago when they said AI wouldn't be capable of software development. Look where we are now.

View on HN · Topics

I have unlimited access to every single frontier model, I've tested all of them, they are not good at writing software.

They are basically slot machines, sometimes you win a little bit and sometimes you win a lot but usually you just burn a ton of time and money sitting and staring at a screen (and frying your brain).

View on HN · Topics

Mozilla seems to think it can.

https://blog.mozilla.org/en/privacy-security/ai-security-zer...

View on HN · Topics

Ahh yes, I'm sure agents did this all autonomously without any human in the loop whatsoever. They are useless without experts to handle them.

View on HN · Topics

So then have the Linux-using organizations employ experts to handle them then.

View on HN · Topics

The old saying of Tony Hoare about no obvious bugs vs obviously no bugs holds in the age of LLMs more than ever

View on HN · Topics

> "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies." -- Tony Hoare

for those (like me) who hadn't seen it before.

View on HN · Topics

The quick test doesn't show much: asking outright whether this is a security patch leads the AI toward agreeing with that assumption. A confusion matrix would be more useful. That said, of course, this is not a detailed AI capability testing blog.
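To make the confusion matrix idea concrete, here is a minimal sketch of how the four counts could be tallied. The labels and predictions are made up for illustration, and `confusion_matrix` is a hypothetical helper, not something from the blog post.

```python
from collections import Counter


def confusion_matrix(labels, predictions):
    """Tally yes/no classifications against known ground truth.

    labels[i] is True if commit i really was a security fix;
    predictions[i] is True if the model said it was.
    """
    counts = Counter(zip(labels, predictions))
    return {
        "tp": counts[(True, True)],    # security fix, flagged
        "fn": counts[(True, False)],   # security fix, missed
        "fp": counts[(False, True)],   # ordinary commit, flagged
        "tn": counts[(False, False)],  # ordinary commit, not flagged
    }


# Made-up example data: six commits with known ground truth.
labels = [True, True, False, False, False, True]
predictions = [True, False, True, False, False, True]

matrix = confusion_matrix(labels, predictions)
print(matrix)
```

A single commit only ever fills one cell of this table, which is why one "yes" answer on a known security patch says little on its own.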

View on HN · Topics

[author]

I agree it is not much additional evidence! If someone wanted to try running the same test on a series of N commits from that list including this one I'd be very curious to see the answer!

View on HN · Topics

Realistically, if you are scanning each kernel commit to check if they might be patching a security issue, you are going to be asking an LLM "is this security related, if so vaguely how" with low effort and taking "maybe" as a yes before feeding it to a more expensive model. You aren't trying to establish a probability of an ultimately unknowable fact, there is ground truth that you can find by producing an exploit, so you are just trying to pre-filter before spending the money to find it.
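A rough sketch of that pre-filter step, under loud assumptions: `toy_cheap_classify` is a keyword heuristic standing in for the cheap LLM call, the commit data is invented, and the answer vocabulary ("yes" / "maybe" / "no") is just one possible convention.

```python
def prefilter(commits, cheap_classify):
    """Keep every commit the cheap pass does not clearly rule out.

    "maybe" is treated as "yes": a false positive here only costs one
    call to the expensive stage, while a false negative is a missed fix.
    """
    return [c for c in commits if cheap_classify(c) in {"yes", "maybe"}]


def toy_cheap_classify(commit):
    """Hypothetical stand-in for a cheap, low-effort model call."""
    message = commit["message"].lower()
    if "overflow" in message or "use-after-free" in message:
        return "yes"
    if "fix" in message:
        return "maybe"
    return "no"


# Invented commits for illustration only.
commits = [
    {"id": "a1", "message": "Fix heap overflow in parser"},
    {"id": "b2", "message": "Fix typo in docs"},
    {"id": "c3", "message": "Add new driver"},
]

# Only these candidates would be handed to the more expensive model.
candidates = prefilter(commits, toy_cheap_classify)
print([c["id"] for c in candidates])
```

The design choice is to bias the first stage toward recall and leave precision to the later, costlier stage.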

View on HN · Topics

Yeah, ideally we would need the phi coefficient (aka MCC, the binary Pearson correlation), which can be calculated from a confusion matrix of yes/no LLM classifications for all kernel diffs. (Number of true positives, true negatives, false positives, false negatives.)
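For reference, a small sketch of the phi coefficient computed from those four counts, using the standard formula (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The counts in the example are invented, and returning 0.0 when a row or column of the matrix is empty is a common convention assumed here, not the only option.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient (phi) for a 2x2 confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Undefined when any marginal total is zero; 0.0 is an assumed convention.
        return 0.0
    return numerator / denominator


# Invented counts: 2 true positives, 2 true negatives, 1 of each error.
score = mcc(tp=2, tn=2, fp=1, fn=1)
print(round(score, 3))
```

Values range from -1 (always wrong) through 0 (no better than chance) to +1 (perfect agreement), which makes the score hard to inflate by answering "yes" to everything.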

View on HN · Topics

Sounds like you're expecting the AI-based tools that are finding bugs to also provide fixes.

I've been dealing with a bunch of AI-generated (or at least -assisted) vulnerability reports lately. In many cases the reports include proposed patches to fix the issues.

It's been..... interesting. In many cases, the analysis provided in the report has been accurate and helpful. In some cases, the proposed patches have also been good, and we've accepted them with minimal or no changes.

In other cases, despite finding a valid issue, and even providing a good analysis of the problem, the AI tool's suggested patch has been, quite simply, wrong.

Careful review from somebody who really _understands_ the code -- and the wider context in which it is operating -- is still absolutely necessary. That's not always going to happen in an hour.

View on HN · Topics

Yes, that's why I specified "patched product ready for QA testing". It speeds up the development cycle by making a first pass and ensuring it basically works before passing it to a developer for manual review and a QA tester to ensure the fix doesn't break anything else. Both dev and QA are still in the feedback loop and can make changes until it's ready for release

View on HN · Topics

Maybe it is about time for Linux to get a real CI/CD and start using AI extensively.

Not just for vulnerabilities: having nice agents|skills|etc.md definitions would encourage new devs to contribute instead of dealing with an overworked maintainer repeating the same thing for the nth time.
