-fbounds-safety: Enforcing bounds safety for C

86 points | 3 days | clang.llvm.org
jcalvinowens33 minutes ago

> As local variables are typically hidden from the ABI, this approach has a marginal impact on it.

I'm skeptical this is workable... it's pretty common in systems code to take the address of a local variable and pass it somewhere. Many event libraries implement waiting for an event that way: push a pointer to a futex on the stack to a global list, and block on it.

They address it explicitly later:

> Although simply modifying types of a local variable doesn’t normally impact the ABI, taking the address of such a modified type could create a pointer type that has an ABI mismatch

That breaks a lot of stuff.

The explicit annotations seem like they could have real value for libraries, especially since they can be ifdef'd away. But the general stack variable thing is going to break too much real world code.
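
To make the ABI concern concrete, here is a minimal sketch in plain C. It assumes, as the design document describes, that a local pointer implicitly becomes a wide pointer carrying its bounds. The struct `wide_int_ptr` and the helpers `make_wide`, `reset_to_null`, and `layouts_differ` are hypothetical, hand-written stand-ins, not what Clang actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a wide pointer: an address plus its bounds. */
struct wide_int_ptr {
    int *ptr;    /* current address */
    int *lower;  /* first valid element */
    int *upper;  /* one past the last valid element */
};

/* A callee built without the extension expects a plain int **. */
static void reset_to_null(int **slot) {
    *slot = NULL;
}

/* Builds the hand-written wide pointer for an array of n ints. */
static struct wide_int_ptr make_wide(int *base, size_t n) {
    assert(base != NULL);
    struct wide_int_ptr w = { base, base, base + n };
    return w;
}

/* Returns 1 when the wide layout cannot stand in for a plain int *,
   which is why the address of a wide local is not a valid int **. */
static int layouts_differ(void) {
    return sizeof(struct wide_int_ptr) != sizeof(int *);
}
```

Passing `&p` for a plain `int *p` to `reset_to_null` works today. If `p` silently had the layout of `wide_int_ptr`, the same call would hand the callee a pointer to a differently shaped object, which is the mismatch the quoted passage warns about.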

tandr14 minutes ago

Niklaus Wirth died in 2024, and yet I hope he is having a major I-told-you-so moment about people dismissing Pascal's bounds checking as unneeded and slow.

clarabennett262 hours ago

The key issue is whether this can be gradually implemented in large codebases without needing a complete rewrite. ASan detects bounds violations during runtime, but its performance cost makes it unsuitable for production use. If -fbounds-safety can enforce checks while maintaining reasonable overhead and working seamlessly with unchecked translation units, it could significantly improve C safety in practice, far more than simply urging everyone to switch to Rust would.

hoyhoy1 hour ago

I looked at trying to implement -fbounds-safety and -Wunsafe-buffer-usage on a reasonably large codebase (4,000 C and C++ files), and it's basically impossible.

You have to instrument every single file. It can be done in stages though. Just turn the flag on one-by-one for each file. The xnu kernel is _mostly_ instrumented with -fbounds-safety.

safercplusplus15 minutes ago

Plug: In theory you could auto-convert to a memory-safe subset of C++ as a build step. Auto-converted code would have some run-time overhead, but you can mark any performance-sensitive parts of the code to be exempt from conversion. And you get lifetime and type safety too. For full coverage, performance-sensitive parts of the code can be manually converted to the safe subset to minimize overhead. (Interfaces in extern C blocks remain unconverted by default to maintain ABI compatibility.)

[1]: https://duneroadrunner.github.io/scpp_articles/PoC_autotrans...

jimmaswell18 minutes ago

This sounds like the kind of low-thought pattern-based repetitive task where you could tell an LLM to do it and almost certainly expect a fully correct result (and for it to find some bugs along the way), especially if there's some test coverage for it to verify itself against. If you're skeptical, you could tell it to do it on some files you've already converted by hand and compare the results. This kind of thing was a slam dunk for an LLM even a year or two ago.

adrianN2 hours ago

There is GWP-ASan, which has lower overhead than ASan but still isn't very popular.

vlovich12314 minutes ago

Because it can only catch a subset of issues, and even then only probabilistically. Issues it “could” catch may still be missed because of the temporal distance between the free and a subsequent use, and it requires a different allocator that supports it. It’s also unclear to me how it knows whether a given free is for a sampled or unsampled region. I suspect it must capture all free/realloc calls to accomplish that, but it does imply all of these are sampled.

It’s nowhere near the same as robust bounds checking.

hoyhoy1 hour ago

ASAN/LSAN is amazing. It absolutely monkey-hammers performance though.

ndiddy3 hours ago

Has any progress been made on this? I remember seeing this proposal 3 or 4 years ago but it looks like it still hasn't been implemented. It's a shame because it seems like a useful feature. It looks like Microsoft has something similar (https://learn.microsoft.com/en-us/cpp/code-quality/understan...) but it would be nice to have something that worked on other platforms.

mrpippy11 minutes ago

Apple is shipping code built with this, and is supporting it for developers to use (see https://developer.apple.com/documentation/xcode/enabling-enh...)

groos1 hour ago

Microsoft's SAL annotations are meant to inform the static analyzer how the parameters are meant to be used so any violations of the contract can be diagnosed at compile time. The LLVM proposal is different in that it is checked at run time and will stop your program before it makes an out of bounds access. Static analyzers can obviously use the information in the type to help diagnose a subset of such problems at compile time.
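
A rough sketch of the run-time side, written by hand in portable C. The macro `COUNTED_BY` is a hypothetical placeholder that expands to nothing so any compiler accepts it; the design document spells the real annotation `__counted_by(N)`. The explicit check in `read_at` only imitates the kind of check the compiler is described as inserting, and `index_in_bounds` is likewise an illustrative helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical placeholder: expands to nothing in this sketch.
   With the extension enabled, the documented spelling is __counted_by(n). */
#define COUNTED_BY(n)

/* Returns 1 when index i falls inside a buffer of count elements. */
static int index_in_bounds(size_t i, size_t count) {
    return i < count;
}

/* Reads buf[i]; the annotation ties buf to count, and the manual check
   stands in for the trap that would stop the program before a bad access. */
static int read_at(const int *COUNTED_BY(count) buf, size_t count, size_t i) {
    if (!index_in_bounds(i, count))
        abort();
    return buf[i];
}
```

With `int data[3] = {10, 20, 30}`, `read_at(data, 3, 2)` returns 30, while `read_at(data, 3, 3)` aborts instead of reading past the end.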

Someone3 hours ago

https://discourse.llvm.org/t/the-preview-of-fbounds-safety-i...:

“-fbounds-safety is a language extension to enforce a strong bounds safety guarantee for C. Here is our original RFC.

We are thrilled to announce that the preview implementation of -fbounds-safety is publicly available at this fork of llvm-project. Please note that we are still actively working on incrementally open-sourcing this feature in the llvm.org/llvm-project . To date, we have landed only a small subset of our implementation, and the feature is not yet available for use there. However, the preview does contain the working feature. Here is a quick instruction on how to adopt it.”

“This fork” is https://github.com/swiftlang/llvm-project/tree/stable/202407..., Apple’s fork of LLVM. That branch is from a year ago.

I don’t know whether there’s a newer publicly available version.

There is a GSoC 2026 opportunity on upstreaming this into mainline LLVM (https://discourse.llvm.org/t/gsoc-2026-participating-in-upst...)

taminka3 hours ago

this is amazing, counter to what most ppl think, majority of memory bugs are from out of bounds access, not stuff like forgetting to free a pointer or some such

Night_Thastus1 hour ago

Personally, as someone in C and C++ for the last few years, memory access is almost never the root bug. It's almost always logic errors. Not accounting for all paths, not handling edge cases, not being able to handle certain combinations of user or file input, etc.

Occasionally an out-of-bounds access pops up, but they're generally so blindingly obvious and easy to fix that it's never been the slow part of bug fixing.

lelanthran1 hour ago

I've been programming for a long time; the ratio of memory errors to logic bugs in production is so low as to be non-existent.

My last memory error in C code in production was in 2018. Prior to that, I had a memory error in C code in production in 2007 or 2008.

In C++, I eventually gave up trying to ship the same level of quality and left the language altogether.

vlovich1238 minutes ago

The wider industry data gathered indicates that for memory unsafe languages 80% of issues are due to memory vulnerabilities, including mature codebases like Linux kernel, curl, V8, Chrome, Mach kernel, qemu etc etc etc. This doesn’t mean that logic bugs are less common, it just means that memory safety issues are the easiest way to get access.

As for why your experience may be different, my hunch is that either your code was super simple OR you didn’t test it thoroughly enough against malicious/unexpected inputs OR you never connected the code to untrusted I/O.

Keep in mind the data for this comes from popular projects that have enough attention to warrant active exploit research by a wide population. This is different from a project you wrote that doesn’t have the same level of attention.

taminka1 hour ago

logic errors aren't memory errors, unless you have some complex piece of logic for deallocating resources, which, yeah, is always tricky and should just generally be avoided

woodruffw1 hour ago

"Majority" could mean a few things; I wouldn't be surprised if the majority of discovered memory bugs are spatial, but I'd expect the majority of widely exploited memory bugs to be temporal (or pseudo-temporal, like type confusions).

Retr0id3 hours ago

I think UAFs are more common in mature software

q3k3 hours ago

Or type confusion bugs, or any other stuff that stems from complex logic having complex bugs.

Boundary checking for array indexing is table stakes.

michh3 hours ago

table stakes, but people still mess up on it constantly. The "yeah, but that's only a problem if you're an idiot" approach to this kind of thing hasn't served us very well so it's good to see something actually being done.

Trains shouldn't collide if the driver is correctly observing the signals, that's table stakes too. But rather than exclusively focussing on improving track to reduce derailments we also install train protection systems that automatically intervene when the driver does miss a signal. Cause that happens a lot more than a derailment. Even though "pay attention, see red signal? stop!" is conceptually super easy.

random_mutex3 hours ago

There is use after free

eecc3 hours ago

Majority. Parent said majority

IshKebab28 minutes ago

Exactly. Use after free is common enough that you can't just assert that out-of-bounds is the majority without evidence.

hoyhoy2 hours ago

Xcode (AppleClang) has had -fbounds-safety for a while now. What is the delay in getting this merged into LLVM?

worldsavior3 hours ago

Very cool. I always wondered why there isn't something like this in GCC/LLVM; it would obviously solve countless security issues.

manbash2 hours ago

Exciting! It doesn't imply that we should now sprinkle the new annotations everywhere. We still should keep working with proper iterators and robust data structures, and those would need to add such annotations.

musicale2 days ago

I want an OS distro where all C code is compiled this way.

OpenBSD maybe? or a fork of CheriBSD?

macOS clang has supported -fbounds-safety for a while, but I'm not sure how extensively it is used.

kgeist3 hours ago

Maybe this:

https://fil-c.org/pizlix

>Pizlix is LFS (Linux From Scratch) 12.2 with some added components, where userland is compiled with Fil-C. This means you get the most memory safe Linux-like OS currently available.

The author, @pizlonator, is active on HN.

hsaliak2 hours ago

https://github.com/hsaliak/filc-bazel-template i created this recently to make it super easy to get started with fil-c projects. If you find it daunting to get started with the setup in the core distribution and want a 3-4 step approach to building a fil-c enabled binary, then try this.

functionmouse3 hours ago

hot dang that's neato. shame about the name, though.

wyldfire5 hours ago

You need to annotate your program with indications of what variable tracks the size of the allocation. So, sure, but first work on the packages in the distro.

Note that corresponding checks for C++ library containers can be enabled without modifying the source. Google measured some very small overhead (< 0.5% IIRC) so they turned it on in production. But I'd expect an OS distro to be mostly C.

[1] https://libcxx.llvm.org/Hardening.html

bombcar5 hours ago

Get gentoo, add this to CFLAGS and start fixing everything that breaks. Become a hero.

pjmlp4 hours ago

It is called Solaris, and has this enabled since 2015 on SPARC.

https://docs.oracle.com/en/operating-systems/solaris/oracle-...

salawat26 minutes ago

Might as well not even talk about anything with the Oracular kiss of death.

1over1375 hours ago

>I want an OS distro where all C code is compiled this way.

You first have to modify "all C code". It's not just a set and forget compiler flag.

prussian4 hours ago

Fedora and its kernels are built with GCC's _FORTIFY_SOURCE and I've seen modules crash for out of bounds reads.

dezgeg4 hours ago

_FORTIFY_SOURCE is way smaller in scope (as in, closes fewer vulnerabilities) than -fbounds-safety.

groundzeros20153 hours ago

What are you hoping it will achieve?

irishcoffee3 hours ago

The internet went down because cloudflare used a bad config... a config parsed by a rust app.

One of these days the witch hunt against C will go away.

hypeatei3 hours ago

The internet didn't go down and you're mischaracterizing it as a parsing issue when the list would've exceeded memory allocation limits. They didn't hardcode a fallback config for that case. What memory safety promise did Rust fail there exactly?

groundzeros20153 hours ago

I think the point is memory bugs are only one (small) subset of bugs.

random_mutex3 hours ago

A panic in Rust is easier to diagnose and fix than some error or garbage data that was caused by an out-of-bounds access in some random place in the call stack.

wat100002 hours ago

A service going down is a million times better than being exploited by an attacker. If this is a witch hunt then C is an actual witch.

pezgrande4 hours ago

Does any distro use clang? I thought all Linux kernels were compiled using gcc.

yjftsjthsd-h25 minutes ago

https://www.kernel.org/doc/html/latest/kbuild/llvm.html

> The Linux kernel has always traditionally been compiled with GNU toolchains such as GCC and binutils. Ongoing work has allowed for Clang and LLVM utilities to be used as viable substitutes. Distributions such as Android, ChromeOS, OpenMandriva, and Chimera Linux use Clang built kernels. Google’s and Meta’s datacenter fleets also run kernels built with Clang.

honktime3 hours ago

Chimera does, it also has a FreeBSD userland AFAIU.

https://chimera-linux.org/

zmodem4 hours ago

Not a Linux distro, but FreeBSD uses Clang.

And Android uses Clang for its Linux kernel.

-fbounds-safety is not yet available in upstream Clang though:

> NOTE: This is a design document and the feature is not available for users yet.

nananana94 hours ago

  template <typename T>
  struct Slice {
      T* data = nullptr;
      size_t size = nullptr;

      T& operator[](size_t index) {
        if (index >= size) crash_the_program();
        return data[index];
      }
  };

If you're considering this extension, just use C++ and 5 lines of standard, portable, no-weird-annotations code instead.

uecker4 hours ago

Or just do it in C.

  #define span(T) struct span_##T { size_t len; T *data; }
  #define span_access(T, x, i) (*({              \
    span(T) *_v = (x);                           \
    auto _i = (i);                               \
    if (((size_t)_i) >= _v->len) abort();        \
    &_v->data[_i];                               \
  }))

https://godbolt.org/z/TvxseshGc

nananana93 hours ago

Still requires a gcc/clang specific extension (although this one I'd be very happy to see standardized)

uecker38 minutes ago

Only statement expressions, but one can also implement this without them.
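
One possible way to drop statement expressions, sketched under the assumption that a per-type accessor function is acceptable. `SPAN_DECLARE`, `span_int_at`, and `span_int_sum` are hypothetical names chosen for this illustration and are not taken from the godbolt link above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper macro: declares a span type for T plus a checked
   accessor, using only standard C99 (no statement expressions). */
#define SPAN_DECLARE(T)                                        \
    typedef struct { size_t len; T *data; } span_##T;          \
    static inline T *span_##T##_at(span_##T s, size_t i) {     \
        if (i >= s.len) abort();  /* out of bounds: stop */    \
        return &s.data[i];                                     \
    }

SPAN_DECLARE(int)

/* Sums every element through the checked accessor. */
static int span_int_sum(span_int s) {
    int total = 0;
    for (size_t i = 0; i < s.len; i++)
        total += *span_int_at(s, i);
    return total;
}
```

The trade-off is one generated function per element type instead of a single generic macro, and the token pasting still means pointer element types need a typedef first.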

fuhsnn2 hours ago

The fact that pointer types can't be used with this pattern without typedef still seems kinda primitive to me.

uecker36 minutes ago

You can use pointer types by using a typedef first, but I agree this is not nice (I hope we will fix this in future C). But then, I think this is a minor inconvenience for having an otherwise working span type in C.

zmodem4 hours ago

The extension is for hardening legacy C code without breaking ABI.

pjmlp4 hours ago

Even better, starting with C++26, and considered to be done with DR for previous versions, hardened runtimes now have a portable way to be configured across compilers, instead of each having their own approach.

However, you still need something like -fbounds-safety in C++, due to the copy-paste compatibility with C, and too many people writing Orthodox C++, C with Classes, Better C, kind of code, that we cannot get rid of.

nananana94 hours ago

I'm sure std::span is great, but I like mine better :)

I find it a bit hard to justify using the STL when a single <unordered_map> include costs 250ms compile time per compile unit.

The fact that I don't have to step through this in the debugger is also a bonus:

  template <size_t _Offset, size_t _Count = dynamic_extent>
  [[nodiscard]] _LIBCPP_HIDE_FROM_ABI constexpr auto subspan() const noexcept
      -> span<element_type, _Count != dynamic_extent ? _Count : _Extent - _Offset> {
    static_assert(_Offset <= _Extent, "span<T, N>::subspan<Offset, Count>(): Offset out of range");
    static_assert(_Count == dynamic_extent || _Count <= _Extent - _Offset,
                  "span<T, N>::subspan<Offset, Count>(): Offset + Count out of range");

    using _ReturnType = span<element_type, _Count != dynamic_extent ? _Count : _Extent - _Offset>;
    return _ReturnType{data() + _Offset, _Count == dynamic_extent ? size() - _Offset : _Count};
  }

pjmlp4 hours ago

Only if not able to do import std, or pre-compiled headers, and not using modern IDEs with "just my code" filters.

As someone who has enjoyed C++ since 1993, alongside other ecosystems, I find that many complaints about C++ pain points are self-inflicted, caused by avoiding modern tools.

Heck, C++ had nice frameworks in the style of .NET and Java, with bounds checking even, before those two systems came to exist, and nowadays all those frameworks are mostly gone, with the exception of the Qt and C++ Builder ones, due to bias.

kitsune13 hours ago

[dead]

osmsucks1 hour ago

  size_t size = nullptr;

wat

wat100004 hours ago

You should tell the LLVM folks, I guess they didn't know about this.

baq4 hours ago

and if you write directly in assembly you don't even need a C++ compiler

nananana94 hours ago

That's an objectively correct statement, but I don't see how it makes sense as a response to my comment, as I'm advocating to use the more advanced feature-rich tool over the compiler-specific-hacks one.

yjftsjthsd-h20 minutes ago

If you're advocating switching languages, then there's no reason to stop at C++. It's more common to propose just converting the universe to Rust, but assembly also enjoys the possibility of being fairly easy to drop in on an existing C project.

zephen3 hours ago

> I don't see how it makes sense as a response to my comment

Your comment started out with "just."

As if there are never any compelling reasons to want to make existing C code better.

But instead of taking that as an opportunity to reflect on when various tools might be appropriate,

> as I'm advocating to use the more advanced feature-rich tool over the compiler-specific-hacks one.

You've simply doubled down.

cranberryturkey3 hours ago

The real question is adoption friction. The annotation requirement means this won't just slot into existing codebases — someone has to go through and mark up every buffer relationship. Google turning on libcxx hardening in production with <0.5% overhead is compelling precisely because it required zero source changes.

The incremental path matters more than the theoretical coverage. I'd love to see benchmarks on a real project — how many annotations per KLOC, and what % of OOB bugs it actually catches in practice vs. what ASAN already finds in CI.

nimbus-hn-test3 hours ago

[dead]