Landlock-ing Linux

adeon • 2 months ago

I've used Landlock to detect and stop unwanted telemetry. I wrote some C that blocks networking except for accepting connections on a single port: no outgoing connections, and no accepting connections on any other port.

`dmesg` shows the connections it blocks (I think this may be the audit feature). I used the example sandboxer.c as a base (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux...), except that I set mine up to leave file restrictions alone and only restrict networking, with that one whitelisted incoming port.

    ./network-sandboxer-tool 8000 some-program arg1 arg2 etc.

I like it because it just works as an unprivileged user-mode program without any setup: a tiny C program. It works inside containers without having to set up any firewalls, and aside from compiling a small C program there is little fuss. I found Landlock while looking for alternatives to bubblewrap, because I couldn't figure out how to do the same thing conveniently in bwrap.

The "unprivileged" in "Landlock: unprivileged access control" for me was the selling point for this use case.

I don't consider this effective against actively adversarial programs though.
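For readers who want the same idea without C, here is a minimal Go sketch of a network-only launcher using just the standard library. It is not adeon's code. Assumptions: Landlock ABI 4 or newer (Linux 6.7+, the first version with TCP rules), an architecture where the Landlock syscalls are numbered 444 to 446 (x86-64, arm64 and most others), and constants copied from include/uapi/linux/landlock.h. Like the original tool it only covers TCP; the helper names `netRules` and `restrictNetwork` are made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"strconv"
	"syscall"
	"unsafe"
)

// Syscall numbers and flags from the Linux UAPI headers (x86-64/arm64 numbering).
const (
	sysLandlockCreateRuleset = 444
	sysLandlockAddRule       = 445
	sysLandlockRestrictSelf  = 446
	createRulesetVersion     = 1 << 0 // LANDLOCK_CREATE_RULESET_VERSION
	ruleNetPort              = 2      // LANDLOCK_RULE_NET_PORT
	prSetNoNewPrivs          = 38     // PR_SET_NO_NEW_PRIVS

	accessNetBindTCP    uint64 = 1 << 0 // LANDLOCK_ACCESS_NET_BIND_TCP
	accessNetConnectTCP uint64 = 1 << 1 // LANDLOCK_ACCESS_NET_CONNECT_TCP
)

// rulesetAttr mirrors struct landlock_ruleset_attr.
type rulesetAttr struct{ handledFS, handledNet, scoped uint64 }

// netPortAttr mirrors struct landlock_net_port_attr.
type netPortAttr struct{ allowed, port uint64 }

// netRules handles (default-denies) TCP bind and connect, then allows binding
// to a single port. accept() on that listening socket is not restricted.
func netRules(port uint16) (rulesetAttr, netPortAttr) {
	return rulesetAttr{handledNet: accessNetBindTCP | accessNetConnectTCP},
		netPortAttr{allowed: accessNetBindTCP, port: uint64(port)}
}

// restrictNetwork applies the rules to the calling thread only.
func restrictNetwork(port uint16) error {
	abi, _, errno := syscall.Syscall(sysLandlockCreateRuleset, 0, 0, createRulesetVersion)
	if errno != 0 {
		return fmt.Errorf("landlock unavailable: %w", errno)
	}
	if abi < 4 {
		return fmt.Errorf("landlock ABI %d has no TCP rules (need 4)", abi)
	}
	attr, rule := netRules(port)
	fd, _, errno := syscall.Syscall(sysLandlockCreateRuleset,
		uintptr(unsafe.Pointer(&attr)), unsafe.Sizeof(attr), 0)
	if errno != 0 {
		return fmt.Errorf("create ruleset: %w", errno)
	}
	defer syscall.Close(int(fd))
	if _, _, errno = syscall.Syscall6(sysLandlockAddRule, fd, ruleNetPort,
		uintptr(unsafe.Pointer(&rule)), 0, 0, 0); errno != 0 {
		return fmt.Errorf("add rule: %w", errno)
	}
	if _, _, errno = syscall.Syscall6(syscall.SYS_PRCTL, prSetNoNewPrivs, 1, 0, 0, 0, 0); errno != 0 {
		return fmt.Errorf("set no_new_privs: %w", errno)
	}
	if _, _, errno = syscall.Syscall(sysLandlockRestrictSelf, fd, 0, 0); errno != 0 {
		return fmt.Errorf("restrict self: %w", errno)
	}
	return nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: network-sandboxer PORT PROGRAM [ARGS...]")
		os.Exit(2)
	}
	port, err := strconv.ParseUint(os.Args[1], 10, 16)
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad port:", err)
		os.Exit(2)
	}
	prog, err := exec.LookPath(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Landlock and no_new_privs are per thread, so stay on this OS thread and
	// execve from it; the new program inherits this thread's restrictions.
	runtime.LockOSThread()
	if err := restrictNetwork(uint16(port)); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	err = syscall.Exec(prog, os.Args[2:], os.Environ())
	fmt.Fprintln(os.Stderr, "exec:", err)
	os.Exit(1)
}
```

Usage would mirror the original, e.g. `./network-sandboxer 8000 some-program arg1 arg2`. If the sketch is right, the child should get a permission error on any outgoing TCP connect() and on bind() to any port other than 8000, while UDP remains untouched.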

ranger_danger • 2 months ago

Would you mind sharing your source code?

adeon • 2 months ago

I dumped it here just now: https://github.com/Noeda/landlock-network-sandboxer-tool

It will hopefully be obvious that nobody should expect quality :) It's like a simplified version of the sandboxer sample from my other comment; e.g. it probably doesn't need to touch the filesystem stuff at all.

I'd also look at some of the sibling comments for more refined tooling than this thing. Maybe it's useful as a sample, though.

ranger_danger • 2 months ago

thanks!

dannyfritz07 • 2 months ago

I've been messing with sandboxing using "bwrap" for random itch.io games I download to play and it isn't trivial to get it working with least privileges. I have so far been unable to get "Microlandia" to run, but other Unity games are running just fine under "bwrap". I am excited to see more Landlock tools emerge that make this task easier.

- https://github.com/containers/bubblewrap

- https://codeberg.org/dannyfritz/dotfiles/src/commit/38343008...

- https://explodi.itch.io/microlandia

webstrand • 2 months ago

I was just playing with bwrap for isolating npm project actions from the rest of my system.

    bwrap --unshare-pid --dev-bind / / --tmpfs /home --bind "$(pwd)" "$(pwd)" bash

It seems to work fairly well? But I only started playing with bwrap this weekend. I do wish bwrap could be told "put the program in this pre-prepared network namespace", because accessing unsecured local dev servers could also be an issue.

tommica • 2 months ago

I had this idea of having toolbox+custom user for each project - that way it would be "simple" to have isolated environments, but it does lead to a lot of bloat. And I do think it is a naive solution.

Bwrap seems like a better option.

jeroenhd • 2 months ago

I think a combination of custom users + a whole bunch of sandboxing is exactly what you'd get out of systemd-nspawn if you're willing to write the config: https://wiki.archlinux.org/title/Systemd-nspawn

bwrap seems a lot easier, but if you want more control (or, for instance, want to run an Ubuntu base because that's what a lot of games are compiled against), systemd-nspawn can be quite powerful.

bflesch • 2 months ago

That's how Android does it: every app is a different user.

tommica • 2 months ago

Oh really? That's a surprise

tux3 • 2 months ago

What's the status of Landlock in container runtimes? A quick search makes it seem like CRIs are trying to define their own custom Landlock interface.

That will inevitably lag behind what the kernel supports, but more importantly I don't foresee many container image packagers, Helm recipe maintainers and other YAML wranglers getting into the business of maintaining a Landlock sandbox policy.

It makes sense for an application to use Landlock directly to sandbox some parser or other sensitive component. But if the CRI just blocks the syscalls by default, no infra person is going to take on the maintenance of their own sandbox policy for every app. The app will just see ENOSYS and not be sandboxed.

I might be missing the whole idea here, but I really don't see why we need some custom layer in the middle instead of having container runtimes let the security syscalls through?

codethief • 2 months ago

> A quick search makes it seem like CRIs are trying to define their own custom Landlock interface.

Are you referring to [0, 1]?

> But if the CRI just blocks the syscalls by default

Does it? Where are you getting this from?

> I might be missing the whole idea here, but I really don't see why we need some custom layer in the middle instead of having container runtimes let the security syscalls through?

Because in the latter case you have to trust the application to actually do the appropriate locking?

[0]: https://github.com/opencontainers/runc/issues/2859

[1]: https://github.com/opencontainers/runtime-spec/issues/1110

ameliaquining • 2 months ago

I don't think this is really intended for container runtimes. You might be able to make it work in a square-peg-round-hole sort of way but the core use case is different.

als0 • 2 months ago

If the application in the container wants to add more restrictive rules then it should be allowed to. But it should not be able to mess with the existing rules imposed by the container manager. This would be the ideal outcome.

arianvanp • 2 months ago

There is nothing to do here. Landlock already guarantees that you can't undo rules that were already applied. Your application can further restrict itself, but it can't unrestrict itself.
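A toy model may help show why a later layer cannot undo an earlier one. This is a simplification, not kernel code: each call to landlock_restrict_self adds a layer, and an access goes through only if every layer that handles that right also allows it. The bit values and the names `layer` and `permits` are invented for illustration.

```go
package main

import "fmt"

// Illustrative access-right bits (not the kernel's values).
const (
	readFile  uint64 = 1 << 0
	writeFile uint64 = 1 << 1
)

// layer models one landlock_restrict_self call for a single resource:
// the rights it handles (denies by default) and those it allows.
type layer struct{ handled, allowed uint64 }

// permits reports whether the requested rights pass every stacked layer.
// A right that a layer does not handle is left alone by that layer.
func permits(layers []layer, requested uint64) bool {
	for _, l := range layers {
		if requested&l.handled&^l.allowed != 0 {
			return false
		}
	}
	return true
}

func main() {
	manager := layer{handled: readFile | writeFile, allowed: readFile}
	// The app later adds its own layer that "allows" writing...
	app := layer{handled: readFile | writeFile, allowed: readFile | writeFile}
	stack := []layer{manager, app}
	// ...but the manager's earlier layer still denies it.
	fmt.Println("write:", permits(stack, writeFile)) // write: false
	fmt.Println("read:", permits(stack, readFile))   // read: true
}
```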

als0 • 2 months ago

Just need the container manager to not block the landlock system call

smartmic • 2 months ago

I am puzzled by this:

> A official c library doesn’t exist yet unfortunately, but there’s several out there you can try.

> Landlock is a Linux Security Module (LSM) available since Linux 5.13

Since when is not a C API the first and foremost interface for developers when it comes to Linux kernel stuff?

muvlon • 2 months ago

The first and foremost interface of the kernel is the syscall interface aka the uapi. libc and other C libraries like liburing or libcap are downstream of that. Many syscalls still don't have wrappers in libc after years of use.

samus • 2 months ago

Yet for many syscalls there is an official library: in most cases a wrapper in libc, and io_uring in particular is known to provide a C library that most applications ought to use instead of the raw syscalls.

ijustlovemath • 2 months ago

Is io_uring not itself a set of syscalls?

projektfu • 2 months ago

https://github.com/axboe/liburing

"This is the io_uring library, liburing. liburing provides helpers to setup and teardown io_uring instances, and also a simplified interface for applications that don't need (or want) to deal with the full kernel side implementation."

samus • 2 months ago

Yes, it is. But it's rather complicated and not all applications need its full power.

https://lwn.net/Articles/810414/

smartmic • 2 months ago

Thanks for the clarification! What I meant was: why isn't there a C API first, rather than Rust, Haskell, and Go libraries before it? That's kind of surprising, or new, to me.

wging • 2 months ago

I read the article as saying that there's no official C library but unofficial ones do exist. Quote below, emphasis mine.

> A official c library doesn’t exist yet unfortunately, but there’s several out there you can try.

Also, it looks like there is more than zero support for C programs calling Landlock APIs. Even without a 3rd-party library you're not just calling syscall() with a magic number:

https://docs.kernel.org/userspace-api/landlock.html

https://github.com/torvalds/linux/blob/6bda50f4/include/uapi...

https://github.com/torvalds/linux/blob/6bda50f4/include/linu...

WJW • 2 months ago

I don't understand what you mean. There are no "official" Rust, Haskell, or Go APIs for this thing either. All the available libraries seem to be just what some third party made available. There are also several C libraries, just none that have been officially endorsed by the Linux kernel team.

nine_k • 2 months ago

Go is famous for not needing libc and talking to the kernel. Rust and Haskell have communities that are very interested in safety and security, so they are earlier adopters.

For C, unofficial support apparently sufficed for now.

moffkalast • 2 months ago

What do you call syscalls in then? Assembly?

wyldfire • 2 months ago

It's pretty subtle but it's referring to The C Library, libc.{a,so,dll,etc}. The library provided by your toolchain that supports the language.

Meaning glibc or musl or your favorite C library probably doesn't have this yet, but since the system calls are well defined you can use A C library (for example, create your own header file with thin wrappers around syscall(2)).

Levitating • 2 months ago

> Since when is not a C API the first and foremost interface for developers when it comes to Linux kernel stuff?

Since the kernel developers don't make userland software?

jeroenhd • 2 months ago

The lack of a C API should not stop any C developers from using it, hopefully. The wrapper libraries are relatively simple (e.g. https://codeberg.org/git-bruh/landbox), and both Rust and Go can expose a C FFI in case developers would rather link against a more "official" library instead.

The Linux kernel has a relatively simple example on how to use the syscall even if you can't find a library to deal with it for you: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

There is no "liblandlock" or whatever, though there totally could be. The only reason Rust, Go, and Haskell have an easy-to-use API for this syscall is because someone bothered to implement a wrapper and publish it to the usual package managers. Whatever procedure distros use to add new libraries could just as easily be used to push a landlock library/header to use it in C.

eikenberry • 2 months ago

Because C is not the primary interface language for kernel syscalls. There is no language specific primary interface as the syscall is the primary interface and it is language agnostic. This is one of Linux's great strengths, a stable syscall API that doesn't rely on a system library.

megous • 2 months ago

There is a C API. man landlock_add_rule for example.

https://github.com/torvalds/linux/blob/master/include/uapi/l...

You can add simple one-line wrappers if you don't like using the syscall() function.
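To make that concrete in the thread's other language, here is roughly what such thin wrappers could look like in Go with only the standard library. The wrapper names are hypothetical; they mirror landlock_create_ruleset(2), landlock_add_rule(2) and landlock_restrict_self(2), and the syscall numbers 444 to 446 assume x86-64, arm64 or another architecture using the common numbering.

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// Common syscall numbers (x86-64, arm64 and most other architectures).
const (
	sysLandlockCreateRuleset     = 444
	sysLandlockAddRule           = 445
	sysLandlockRestrictSelf      = 446
	landlockCreateRulesetVersion = 1 << 0
)

// rulesetAttr mirrors struct landlock_ruleset_attr.
type rulesetAttr struct{ handledAccessFS, handledAccessNet, scoped uint64 }

// landlockCreateRuleset mirrors landlock_create_ruleset(2); attr may be nil.
func landlockCreateRuleset(attr *rulesetAttr, size uintptr, flags uint32) (int, error) {
	fd, _, errno := syscall.Syscall(sysLandlockCreateRuleset,
		uintptr(unsafe.Pointer(attr)), size, uintptr(flags))
	if errno != 0 {
		return -1, errno
	}
	return int(fd), nil
}

// landlockAddRule mirrors landlock_add_rule(2); ruleAttr must point to the
// attribute struct that matches ruleType (path_beneath or net_port).
func landlockAddRule(rulesetFD, ruleType int, ruleAttr unsafe.Pointer, flags uint32) error {
	_, _, errno := syscall.Syscall6(sysLandlockAddRule, uintptr(rulesetFD),
		uintptr(ruleType), uintptr(ruleAttr), uintptr(flags), 0, 0)
	if errno != 0 {
		return errno
	}
	return nil
}

// landlockRestrictSelf mirrors landlock_restrict_self(2); it affects only the
// calling thread and needs no_new_privs (or CAP_SYS_ADMIN) first.
func landlockRestrictSelf(rulesetFD int, flags uint32) error {
	_, _, errno := syscall.Syscall(sysLandlockRestrictSelf, uintptr(rulesetFD), uintptr(flags), 0)
	if errno != 0 {
		return errno
	}
	return nil
}

// abiVersion uses the documented special case: attr = NULL, size = 0 and the
// LANDLOCK_CREATE_RULESET_VERSION flag return the ABI version, not a fd.
func abiVersion() (int, error) {
	return landlockCreateRuleset(nil, 0, landlockCreateRulesetVersion)
}

func main() {
	v, err := abiVersion()
	if err != nil {
		fmt.Println("Landlock not usable here:", err)
		return
	}
	fmt.Println("Landlock ABI version:", v)
}
```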

habbekrats • 2 months ago

https://man7.org/linux/man-pages/man7/landlock.7.html Most of it is really there.

mulle_nat • 2 months ago

I use this and have no problems with it: https://github.com/marty1885/landlock-unveil

fiiin • 2 months ago

Interesting that they added new syscalls instead of handling configuration via /sys like SELinux and AppArmor do. I suppose it must be because of the no-privileges-required principle.

Can sysadmins disable access to Landlock syscalls via seccomp? Not that I can see why they'd want to, just wondering how this is layered.

I suppose the problem might be if the system has been set up at some earlier point in time to whitelist a set of syscalls for a process and, as Landlock is newer, its syscall numbers won't be included. A program that has been updated to use Landlock while a seccomp policy that predates Landlock is applied would presumably be terminated with SIGSYS due to this?

How can a program determine if Landlock is present without just trying the syscalls and seeing if they work?

cyphar • 2 months ago

Other LSMs are slowly switching to syscalls too, and while I in principle like (and have abused) the whole "everything is a file" principle, most security mechanisms really should be done via special-purpose syscalls. Way too many footguns with filesystem-based APIs. Also, you wouldn't be able to use Landlock to restrict filesystem access based on dirfds with a filesystem-based API.

The questions you have about seccomp depend on the rules. Well-written filters would return -ENOSYS in that case, so it would look to the program as though the syscall is unsupported.
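On the detection question, the usual approach is to ask the kernel directly: calling landlock_create_ruleset with a NULL attribute, size 0 and the LANDLOCK_CREATE_RULESET_VERSION flag returns the ABI version without creating anything, and the error code distinguishes the failure cases. A minimal Go sketch follows; the helper names are hypothetical, 444 is the syscall number on x86-64/arm64, and reading /sys/kernel/security/lsm is only a hint because securityfs is not always mounted.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
	"syscall"
)

const (
	sysLandlockCreateRuleset     = 444 // x86-64, arm64 and most other architectures
	landlockCreateRulesetVersion = 1 << 0
)

// landlockStatus describes the outcome of the version query.
func landlockStatus(abi int, err error) string {
	switch {
	case err == nil:
		return fmt.Sprintf("available, ABI %d", abi)
	case errors.Is(err, syscall.ENOSYS):
		return "syscall missing: no Landlock in this kernel, or filtered (e.g. by seccomp)"
	case errors.Is(err, syscall.EOPNOTSUPP):
		return "built into the kernel but disabled at boot"
	default:
		return "unusable: " + err.Error()
	}
}

// lsmActive reports whether name is in the comma-separated list that
// /sys/kernel/security/lsm exposes.
func lsmActive(list, name string) bool {
	for _, lsm := range strings.Split(strings.TrimSpace(list), ",") {
		if lsm == name {
			return true
		}
	}
	return false
}

// queryABI asks for the ABI version: attr = NULL, size = 0, VERSION flag.
func queryABI() (int, error) {
	r, _, errno := syscall.Syscall(sysLandlockCreateRuleset, 0, 0, landlockCreateRulesetVersion)
	if errno != 0 {
		return 0, errno
	}
	return int(r), nil
}

func main() {
	fmt.Println("Landlock:", landlockStatus(queryABI()))
	// securityfs is not always mounted (e.g. in some containers), so this is only a hint.
	if data, err := os.ReadFile("/sys/kernel/security/lsm"); err == nil {
		fmt.Println("listed as an active LSM:", lsmActive(string(data), "landlock"))
	}
}
```

As cyphar notes, a seccomp filter that returns ENOSYS makes Landlock look absent, which is exactly the first error case above; a filter that kills the process instead would need to be handled outside the program.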

razighter777 • 2 months ago

You can restrict the landlock syscalls with seccomp.

I also don't think doing so is extraordinarily useful.

If you allow something in Landlock, it's still subject to traditional DAC and other restrictions because it's a stackable LSM. It can only restrict existing access, not allow new accesses.

ijustlovemath • 2 months ago

Medical device developer here: this is precisely the kind of work we need in highly regulated industries. We use an internal version of something with a similar API to manage our critical threads/processes. Keep it up!

bboozzoo • 2 months ago

It's not all roses unfortunately. See discussions https://github.com/landlock-lsm/linux/issues/28 and https://lore.kernel.org/all/CAG48ez1O0VTwEiRd3KqexoF78WR+cmP...

Even the example code builds a somewhat questionable 'sandbox' that hits a problem discussed in those threads. Say we're OK with an app having read-write access to home except for a couple of places such as ~/.ssh. You could try to add a rule to exclude access to ~/.ssh, but the security object must exist when the policy is being established (the rules refer to directories by fds). So no .ssh directory means no rules denying access. You start a sandboxed app thinking you've set up a tight sandbox, at some point ~/.ssh gets created, and now the untrusted app can read your SSH keys.
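Because Landlock rules only ever grant access, and each rule is bound to a file descriptor opened when the policy is built, "home minus ~/.ssh" cannot be written directly. One hypothetical approximation is to grant each existing entry of $HOME separately and skip the ones to protect. The Go sketch below only computes that list (the name `allowlistExcept` is made up); each path would then be opened with O_PATH and passed to landlock_add_rule. The trade-off: $HOME itself gets no rule, so anything created there later, including a future ~/.ssh, stays denied. That fails closed, but it breaks apps that create new dotfiles.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// allowlistExcept returns one path per directory entry of dir, skipping the
// names in exclude. Each returned path would later get its own Landlock
// path-beneath rule (directories can take directory rights, plain files only
// file rights such as read, write or execute).
func allowlistExcept(dir string, names []string, exclude ...string) []string {
	skip := make(map[string]bool, len(exclude))
	for _, name := range exclude {
		skip[name] = true
	}
	var paths []string
	for _, name := range names {
		if !skip[name] {
			paths = append(paths, filepath.Join(dir, name))
		}
	}
	return paths
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	entries, err := os.ReadDir(home) // snapshot taken when the policy is built
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	for _, p := range allowlistExcept(home, names, ".ssh") {
		fmt.Println("would grant:", p)
	}
}
```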

razighter777 • 2 months ago

Valid concern. Maybe this can be addressed with a patch...

Seems solvable by perhaps storing paths that don't exist yet on the filesystem in landlock's red black tree.

Workaround might be creating .ssh ahead of time

roer • 2 months ago

If you're trying to deny internet access to a program, beware that Landlock only restricts TCP sockets. Programs are free to set up UDP or raw sockets.

parlortricks • 2 months ago

Well that seems like a major oversight there...what is the reasoning for that?

arianvanp • 2 months ago

It's just incomplete and very early days for landlock.

Landlock requires you to commit upfront to what is "deny-default"ed, but so far they have only added controls for TCP bind and connect and nothing else. So you can "default-deny" TCP bind and connect, but all the other socket paths in the kernel are not guarded by Landlock. It tries really hard to make this upfront declaration of features an integral part of the Landlock API, so that an application can run on multiple kernel versions that support different parts of the Landlock spec. But that means that as they develop the API, older versions of Landlock need to be less restrictive than newer versions, otherwise programs don't work across kernel versions.

That way, a program that is very restrictive on, say, kernel 6.30 can also run on kernel 6.1 with fewer restrictions. The program keeps functioning the same way (never break userspace). The only way to do that is to have the developer explicitly declare which parts need to be restricted, and you can't restrict what isn't implemented yet.

They're planning to extend it to all socket types. This is also mentioned in the linked article https://github.com/landlock-lsm/linux/issues/6

I guess if you want to run without networking at all today you can just unshare into a fresh network namespace, or maybe use seccomp strict mode
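The "declare what you handle, degrade on older kernels" pattern described above can be sketched as a pure Go function. The bit values come from include/uapi/linux/landlock.h and the per-version cut-offs follow the best-effort switch in the kernel's Landlock userspace documentation; the names `handled` and `bestEffort` are hypothetical. Note that TCP bind/connect only become enforceable at ABI 4, and nothing here covers UDP.

```go
package main

import "fmt"

// Access-right bits from include/uapi/linux/landlock.h, tagged with the ABI
// version that introduced them.
const (
	fsABI1     uint64 = 1<<13 - 1 // EXECUTE .. MAKE_SYM (bits 0-12), ABI 1
	fsRefer    uint64 = 1 << 13   // ABI 2
	fsTruncate uint64 = 1 << 14   // ABI 3
	fsIoctlDev uint64 = 1 << 15   // ABI 5

	netBindTCP    uint64 = 1 << 0 // ABI 4
	netConnectTCP uint64 = 1 << 1 // ABI 4

	scopeAbstractUnixSocket uint64 = 1 << 0 // ABI 6
	scopeSignal             uint64 = 1 << 1 // ABI 6
)

// handled mirrors the three fields of struct landlock_ruleset_attr.
type handled struct{ fs, net, scoped uint64 }

// bestEffort strips the rights that a kernel with the given ABI version does
// not know about, so the program still runs there, just less restricted.
func bestEffort(want handled, abi int) handled {
	if abi < 2 {
		want.fs &^= fsRefer
	}
	if abi < 3 {
		want.fs &^= fsTruncate
	}
	if abi < 4 {
		want.net &^= netBindTCP | netConnectTCP
	}
	if abi < 5 {
		want.fs &^= fsIoctlDev
	}
	if abi < 6 {
		want.scoped &^= scopeAbstractUnixSocket | scopeSignal
	}
	return want
}

func main() {
	want := handled{
		fs:     fsABI1 | fsRefer | fsTruncate | fsIoctlDev,
		net:    netBindTCP | netConnectTCP,
		scoped: scopeAbstractUnixSocket | scopeSignal,
	}
	for abi := 1; abi <= 6; abi++ {
		got := bestEffort(want, abi)
		fmt.Printf("ABI %d: fs=%#x net=%#x scoped=%#x\n", abi, got.fs, got.net, got.scoped)
	}
}
```

The program would query the ABI version at startup (landlock_create_ruleset with the VERSION flag) and pass the result in, so the same binary restricts as much as each kernel allows.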

razighter777 • 2 months ago

There's always a lot of caution and review that goes into a new syscall feature, because once you add a feature, there's no takebacks. All the libraries downstream from landlock rely on the kernel API being good.

There is an ongoing patch series for udp and another one for general socket control.

You can read about it on the linux-security-module mailing list.

Basically UDP is harder to hook into because it's a connectionless protocol. So bind and connect don't really work the same way.

https://lore.kernel.org/all/20241214184540.3835222-1-matthie...

https://lore.kernel.org/linux-security-module/20251118134639...

rini17 • 2 months ago

They can be blocked with a firewall: iptables can match outgoing traffic by the UID that owns the socket. I know it's not the same thing as Landlock, but it can still come in handy.

And raw sockets require elevated privileges anyway iirc.

thayne • 2 months ago

Well you need root access, or at least the CAP_NET_RAW capability to use raw sockets. UDP seems pretty bad though.

ape4 • 2 months ago

Pretty big loophole!

habbekrats • 2 months ago

Oof! That's terrible... :/ Good to know. What a weird restriction.

ranger_danger • 2 months ago

I think it's only "weird" if you don't understand why it is the case... adding UDP/raw socket support is much more difficult, and waiting to get that implemented would have much larger downsides for the project as a whole to gain any traction in the meantime.

sofixa • 2 months ago

For a cool practical example, check out Nomad's (flexible workload orchestrator) exec2 task driver: https://github.com/hashicorp/nomad-driver-exec2

It allows running non/semi-trusted workloads with isolation. Pretty useful to onboard applications into a proper scheduler with all bells and whistles without having to containerise, but still with decent levels of isolation between them.

jcgl • 2 months ago

I switched away from Nomad when HashiCorp moved from FOSS licenses to the BSL. But man, I do miss its simplicity.

pdimitar • 2 months ago

I'm very slowly taking an interest in Linux security as I'm starting to disentangle from my Mac and preparing to get a Linux workstation and make it my forever home for personal and work computing. So I'm very new to all this.

My questions are:

- How does this help with malware? I want to craft an environment where any program trying to read f.ex. anything inside ~/.ssh is automatically denied. I don't want a malicious build script to exfiltrate all my sensitive data!

- It seems that this software is well-positioned for us to write application launchers with, is that true? If so, well, I like the idea but it seems too manual.

Maybe I'm looking at the wrong thing. I strongly prefer deny-by-default in an invisible manner i.e. my system to refuse most requests to access this or that. Not opting in to it. Bad actors will not graciously limit their own program with Landlock. They'll try to get anything before I can even blink my eyes.

I feel I'm missing crucially important context. Can somebody help?

ameliaquining • 2 months ago

The threat model here is not malware, but code-execution vulnerabilities in legitimate apps. If you're developing an application, you might use this API to deny yourself privileges that you know you won't need, so that if an attacker finds a code-execution vulnerability in your app, they can't use it to take over the user's machine.

It is not a suitable technology for sandboxing a program that wasn't designed to be sandboxed in this way. For that, you need one of the other technologies listed in the article.

tremon • 2 months ago

> I want to craft an environment where any program trying to read f.ex. anything inside ~/.ssh is automatically denied

That requires a MAC security model like apparmor [0] or selinux [1]. Those can deny filesystem access based on process environment data, such as the executable path or its security context. But these require the access rules to be enumerated externally, whereas landlock is about an application voluntarily limiting itself -- or limiting its children: e.g. it would be a very good idea for npm to restrict the scope of package post-install scripts to only the npm cache/build tree.

> It seems that this software is well-positioned for us to write application launchers

Like OpenBSD's pledge[2], this API is primarily meant for application writers, not launchers. But where the OpenBSD base system is maintained as a whole by the same group of people, Linux is a hodgepodge of different distributions using various software to construct a complete system. This means it's going to take a long while before Landlock reaches anywhere close to the coverage that pledge already has in OpenBSD. In the meantime, wrappers/launchers are the best you can do on Linux.

[0] https://en.opensuse.org/SDB:AppArmor_geeks#Anatomy_of_a_prof...

[1] https://manpages.opensuse.org/Tumbleweed/selinux-policy-doc/...

[2] https://unix.stackexchange.com/a/411157
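To illustrate the npm idea from tremon's comment, here is a hypothetical Go launcher that runs a command with write access limited to one build directory and read/execute access to a few system directories. It is a sketch, not how npm works: it assumes Landlock ABI 1 or newer, x86-64/arm64 syscall numbering, O_PATH = 0x200000 (its value on those architectures), and an illustrative directory list; real build scripts would also need things like a writable /dev/null and a temp dir. The names `jailRules` and `restrict` are made up.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
	"unsafe"
)

const (
	sysLandlockCreateRuleset = 444 // x86-64, arm64 and most other architectures
	sysLandlockAddRule       = 445
	sysLandlockRestrictSelf  = 446
	rulePathBeneath          = 1        // LANDLOCK_RULE_PATH_BENEATH
	prSetNoNewPrivs          = 38       // PR_SET_NO_NEW_PRIVS
	oPath                    = 0x200000 // O_PATH on x86-64 and arm64

	accessExecute  uint64 = 1 << 0
	accessReadFile uint64 = 1 << 2
	accessReadDir  uint64 = 1 << 3
	accessAllABI1  uint64 = 1<<13 - 1 // all 13 filesystem rights of ABI 1
)

// pathBeneathAttr mirrors the packed 12-byte struct landlock_path_beneath_attr;
// the kernel reads only those 12 bytes, so Go's trailing padding is harmless.
type pathBeneathAttr struct {
	allowed  uint64
	parentFD int32
}

// jailRules maps each directory to the rights granted beneath it: read and
// execute for system directories, everything for the build directory.
func jailRules(buildDir string, readOnly []string) map[string]uint64 {
	rules := make(map[string]uint64, len(readOnly)+1)
	for _, dir := range readOnly {
		rules[dir] = accessExecute | accessReadFile | accessReadDir
	}
	rules[buildDir] = accessAllABI1
	return rules
}

// restrict applies the rules to the calling thread.
func restrict(rules map[string]uint64) error {
	attr := [3]uint64{accessAllABI1, 0, 0} // handled_access_fs, _net, scoped
	fd, _, errno := syscall.Syscall(sysLandlockCreateRuleset,
		uintptr(unsafe.Pointer(&attr)), unsafe.Sizeof(attr), 0)
	if errno != 0 {
		return fmt.Errorf("create ruleset: %w", errno)
	}
	defer syscall.Close(int(fd))
	for dir, access := range rules {
		dirFD, err := syscall.Open(dir, oPath|syscall.O_CLOEXEC, 0)
		if err != nil {
			return fmt.Errorf("open %s: %w", dir, err)
		}
		rule := pathBeneathAttr{allowed: access, parentFD: int32(dirFD)}
		_, _, errno = syscall.Syscall6(sysLandlockAddRule, fd, rulePathBeneath,
			uintptr(unsafe.Pointer(&rule)), 0, 0, 0)
		syscall.Close(dirFD)
		if errno != 0 {
			return fmt.Errorf("add rule for %s: %w", dir, errno)
		}
	}
	if _, _, errno = syscall.Syscall6(syscall.SYS_PRCTL, prSetNoNewPrivs, 1, 0, 0, 0, 0); errno != 0 {
		return fmt.Errorf("set no_new_privs: %w", errno)
	}
	if _, _, errno = syscall.Syscall(sysLandlockRestrictSelf, fd, 0, 0); errno != 0 {
		return fmt.Errorf("restrict self: %w", errno)
	}
	return nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: buildjail BUILD_DIR PROGRAM [ARGS...]")
		os.Exit(2)
	}
	prog, err := exec.LookPath(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	runtime.LockOSThread() // Landlock applies per thread; execve from this one
	if err := restrict(jailRules(os.Args[1], []string{"/bin", "/etc", "/lib", "/usr"})); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	err = syscall.Exec(prog, os.Args[2:], os.Environ())
	fmt.Fprintln(os.Stderr, "exec:", err)
	os.Exit(1)
}
```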

pdimitar • 2 months ago

Thank you, that was super useful.

When I said launchers, I meant something like my own bespoke mini Golang program that declares its rules via Landlock's API and then launches f.ex. Firefox.

But maybe that's unnecessary because I hear that Flatpak has a big database of such rules already. I'll find out in the future.

Dig1t • 2 months ago

Mac and iOS have something that is almost exactly the same as this called sandboxing. When a daemon or app starts one of the first things it does (usually right inside of “main”) is enable the sandbox and declare which resources to whitelist, everything else is denied.

It is only useful for guarding your own process against someone using malicious inputs to get your process to do something you don’t intend. It is not a guard against programs written by malicious actors (malware), there exist other mechanisms to guard against malware.

fragmede • 2 months ago

Linux has selinux and apparmor already.

staticassertion • 2 months ago

SELinux and Apparmor are typically configured by admins. They require root privileges and are designed with human interfaces. It is certainly atypical for a program to say "hey kernel, apply this apparmor profile to me" and they're not designed for incrementally dropping rights either.

On Windows and MacOS programs are free to sandbox themselves programmatically and without privileges. Linux is the odd one out, basically every way of reducing your privileges programmatically requires already being root or at least having an admin preconfigure the system in a way that would allow it.

baq • 2 months ago

Which both are so hard to get correctly that everyone on the desktop disables them. Ergonomics matter.

preisschild • 2 months ago

That's not true. Fedora has SELinux enabled by default and I don't have issues with it.

staticassertion • 2 months ago

> - How does this help with malware? I want to craft an environment where any program trying to read f.ex. anything inside ~/.ssh is automatically denied. I don't want a malicious build script to exfiltrate all my sensitive data!

Your package manager would specify a policy that only allows specific access by build scripts. Or you'd use a wrapper.

> - It seems that this software is well-positioned for us to write application launchers with, is that true? If so, well, I like the idea but it seems too manual.

It could be. It's for anyone who knows what their program does, basically.

habbekrats • 2 months ago

It's kind of funny to say: "A official c library doesn’t exist yet unfortunately, but there’s several out there you can try." if it's literally in the standard library...

https://man7.org/linux/man-pages/man7/landlock.7.html

But I suppose I'm missing something that people would like...

What would you want a library to do here? Abstract over it to make it easier? (It's a relatively simple API already.)

Legit question, not trying to poke anyone here; I'm just trying to find out what people expect from libraries that wrap these syscalls or stdlib things.

jeroenhd • 2 months ago

Actually providing a method rather than documenting the syscall would be a good start. libc patches over a lot of syscall requirements and side effects, as well as keeping track of the individual syscall numbers for you.

I'm kind of surprised glibc doesn't provide a normal interface yet, but I suppose it has to do with non-Linux compatibility?

cyphar • 2 months ago

glibc has been reticent about adding new syscall wrappers for a few years. The situation did improve for a bit recently (and they added something like 5 years of syscalls from their backlog in the past few years) but I'm not surprised it's taking some time.

Thankfully we have had unified syscall numbers on Linux (for almost all architectures) for the past few years, so tracking them is less painful than it used to be.

kosolam • 2 months ago

So it works also by using some cli utility to run my software for example?

razighter777 • 2 months ago

Yup. There are tools that use landlock to accomplish just that.

https://github.com/Zouuup/landrun

All you gotta do is apply a policy and do a fork() exec(). There is also support in firejail.

seethishat • 2 months ago

Firejail requires SUID; Landlock does not.

Also, it's very easy to write your own Landlock policy in the programming language of your choice and wrap whatever program you like, rather than downloading stuff from GitHub. Here's another example in Go:

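    package main

    import (
        "log"
        "os"
        "os/exec"

        "github.com/landlock-lsm/go-landlock/landlock"
    )

    func main() {
        // Define the Landlock policy. These paths are only illustrative;
        // a real Firefox policy needs many more (profile dir, /dev, /proc, etc.).
        err := landlock.V1.RestrictPaths(
            landlock.RODirs("/usr", "/etc"),
            landlock.RWDirs("/tmp"),
        )
        if err != nil {
            log.Fatal(err)
        }

        // Execute Firefox; the child process inherits the restrictions.
        cmd := exec.Command("/usr/bin/firefox")
        cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
        if err := cmd.Run(); err != nil {
            log.Fatal(err)
        }
    }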

butvacuum • 2 months ago

Isn't this example just "downloading stuff from GitHub,"(the external Go dependency) but with extra steps? (Having to write and compile a golang app)

pdimitar • 2 months ago

So you're basically writing a program launcher? In this case this program is what you'd want to have a desktop shortcut to and not to Firefox itself, is that it?

codethief • 2 months ago

Yeah, see e.g. sydbox: https://gitlab.exherbo.org/sydbox/sydbox

PeterWhittaker • 2 months ago

So like using seccomp with a whitelist (fairly easy to do) with per-object access rights.

I'd love to see a comparison of landlock to restricted containers.

staticassertion • 2 months ago

> I'd love to see a comparison of landlock to restricted containers.

One thing to consider is that containers virtualize. You enter new "namespaces" where you aren't necessarily restricted within that namespace, but the namespace as a whole is sort of your own playground. So a PID namespace only allows you to see other processes within that namespace.

This is very distinct from a resource oriented approach like landlock. Landlock may allow you to say "you can do certain actions to certain processes" but you wouldn't get the same semantics as "I can only see specific processes to begin with". They would layer nicely.

Similarly, containers provide virtualized file systems. A write happens in a container and it's allowed, but the write is isolated from the host. Landlock would instead allow or deny that write.

They go very well together.

razighter777 • 2 months ago

Comparing Landlock to containers isn't really an apples-to-apples comparison. Containers use a bunch of Linux security mechanisms together, like chroot, seccomp, and user namespaces, to accomplish their goals. Landlock is just another building block that devs can use.

Fun fact: because Landlock is unprivileged, you can even use it inside containers, or to build an unprivileged container runtime :)

vaylian • 2 months ago

seccomp is for restricting syscalls to the kernel. But because "everything is a file" on UNIX systems, you can do a lot of good and bad things just with `open`, `openat`, `read` and `write`.

PeterWhittaker • 2 months ago

Of course, but you can also restrict those operations. The seccomp whitelist library I wrote only sealed itself after all FDs were opened for specific operations, and the API didn't expose the calls directly. Once sealed, the app got only those operations now specifically allowed.

yalogin • 2 months ago

As a noob in this space, why is this needed when every job already runs inside a VM or a container? Again, a noob so please bear with me

woodruffw • 2 months ago

I think it's a reasonable question. The answer is that not everything does indeed run in a VM or a container: lots of things (notably on developer machines) run directly in a host user context, where they have access to all kinds of global state that they don't really need (developer credentials, browser state, etc.).

But also: even within a container (which isn't itself a sandbox) or a VM, you still have concentric circles of trust and/or privilege. If you're installing arbitrary dependencies from the Internet, for example, you probably want a basic initial defense of preventing those dependencies from exfiltrating your secrets at build time.

torton • 2 months ago

On your desktop/laptop, most tasks probably don't run inside VMs or containers. Perhaps some applications use Flatpak or snaps or similar, but the default state for many currently popular Linux distributions is "no sandboxing of any kind".

Linux holds a negligible share of the overall desktop OS market, but it is marginally more popular among tech-savvy people, who have plenty of disposable income, so the platform attracts steadily growing interest from malware authors and distributors despite its relatively low usage.

myaccountonhn • 2 months ago

Its a way for legitimate apps to add an extra protection layer to protect the system from bad inputs or compromised dependencies, and it's very easy to use (see https://github.com/landlock-lsm/go-landlock). As an app developer it's so easy to add landlock to your app.

Another benefit is that it makes fine-grained control of resources over the application lifecycle easier. Maybe on initialization the app needs credentials to fetch some data, and later on the app doesn't need them. Landlock allows the app to remove its own access to those credentials.

staticassertion • 2 months ago

One of the most annoying parts of being in a container is that you can't sandbox yourself further within that container. Normal approaches like namespaces, mounts, chroot, etc. are all incompatible with running in a container. Therefore, if you want to go further than what a container provides, Landlock is a powerful solution.

Further, while "whole process" sandboxing like containerizing is very effective under some conditions, having more fine grained access and the ability to reduce permissions over time is incredible.

Consider that I may need to open a file in my program. The file path will be provided by an env var `CONFIG_PATH`. My program now has to have total file system read permissions if it is going to support reading arbitrary configuration file paths, even though it only has to read one file.

I can instead set my program up to read that file one time and then never again, or I can set things up to only ever need to read that single file and no others, etc. I can incrementally reduce permissions, and that's really cool. You can't do that with a container - containers get what they get.
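The `CONFIG_PATH` example maps directly onto Landlock. A hypothetical Go sketch: read the one file first, then drop filesystem access for the whole process. Assumptions: 444/446 syscall numbering (x86-64, arm64); a build without cgo, because `syscall.AllThreadsSyscall` (which applies the restriction to every Go thread rather than just one) returns ENOTSUP otherwise; and only the ABI 1 rights, so newer rights such as truncate stay unhandled. The names `readThenRestrict` and `dropFilesystemAccess` are made up.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

const (
	sysLandlockCreateRuleset        = 444 // x86-64, arm64 and most other architectures
	sysLandlockRestrictSelf         = 446
	prSetNoNewPrivs                 = 38        // PR_SET_NO_NEW_PRIVS
	fsAllABI1                uint64 = 1<<13 - 1 // all 13 filesystem rights of ABI 1
)

// readThenRestrict reads the single file the program needs, then calls
// restrict; a failed read returns before any restriction is attempted.
func readThenRestrict(path string, restrict func() error) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	if err := restrict(); err != nil {
		return nil, fmt.Errorf("dropping file access: %w", err)
	}
	return data, nil
}

// dropFilesystemAccess handles every ABI 1 filesystem right and adds no
// rules, so afterwards no file can be opened for reading or writing, created
// or executed; descriptors that are already open keep working.
func dropFilesystemAccess() error {
	attr := [3]uint64{fsAllABI1, 0, 0} // handled_access_fs, _net, scoped
	fd, _, errno := syscall.Syscall(sysLandlockCreateRuleset,
		uintptr(unsafe.Pointer(&attr)), unsafe.Sizeof(attr), 0)
	if errno != 0 {
		return errno
	}
	defer syscall.Close(int(fd))
	// Apply no_new_privs and the ruleset on every OS thread of the process.
	if _, _, errno = syscall.AllThreadsSyscall6(syscall.SYS_PRCTL, prSetNoNewPrivs, 1, 0, 0, 0, 0); errno != 0 {
		return errno
	}
	if _, _, errno = syscall.AllThreadsSyscall(sysLandlockRestrictSelf, fd, 0, 0); errno != 0 {
		return errno
	}
	return nil
}

func main() {
	cfg, err := readThenRestrict(os.Getenv("CONFIG_PATH"), dropFilesystemAccess)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("read %d bytes of config; further file opens are now denied\n", len(cfg))
	_, err = os.ReadFile("/etc/hostname")
	fmt.Println("later read attempt:", err)
}
```

The design choice here is to fail closed: if Landlock is unavailable the program exits instead of running with full filesystem access. A best-effort variant would log the error and continue.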

chuckadams • 2 months ago

Both cgroups and namespaces are hierarchical, so you certainly can subdivide the sandbox. That is, if you're a decent C programmer and can navigate some dense kernel documentation. You can also run Docker in Docker, but it requires a privileged root container, and even the creator of that feature suggests just bind-mounting the Docker socket instead.

I have a nagging feeling Plan9 probably had a solution for all this 30 years ago.

staticassertion • 2 months ago

> both cgroups and namespaces are hierarichal, so you certainly can subdivide the sandbox.

This is true, you can enter a namespace while in another namespace, but creating a namespace is a privileged operation.

Docker in Docker does use socket bind mounting already afaik, and it's a trivial privesc because docker runs as root, and the ability to talk to the socket means you can run `docker run -it --privileged --user root image_name bash` and get a shell as the host root user.

The solution is to allow unprivileged users to drop privileges, which is how MacOS and Windows work. On Windows you have integrity levels, tokens, etc, all of which you can drop without privileges. On MacOS you have seatbelt.

Linux almost had this with unprivileged user namespaces, but that's not viable because 30 years of a "root -> kernel privesc isn't a security issue" attitude proved to be problematic.

chuckadams • 2 months ago

zie • 2 months ago

Containers are NOT security wrappers. They are a convenience that lets lazy people avoid dependency hell.

VMs can be security wrappers, but if you expose all of $HOME to a VM, then there really isn't much security happening in terms of your data.

This lets developers of applications harden themselves; it doesn't require the end user to do anything (like put it in a VM).

zbentley • 2 months ago

The opposite is true. Containerization systems were built into operating systems as security features. The whole “Linux packaging is a hellscape of self-induced problems, so let’s duct tape a squashfs onto the side of this new security isolation system and call it a deployment primitive” use case we now call “containers” came later and is a fairly inelegant and wasteful way to avoid needing to solve the packaging hellscape problem. It’s valuable to many! But it is definitely a square peg in the round hole (security isolation layer) of setns and chroot and friends.

zie • 2 months ago

You can make containers almost as hardened, security-wise, as a VM (but generally none of that comes by default); the big thing you can't get that a VM gives you is a separate kernel instance. In a VM you have to break 2 kernels to totally own a machine.

In a container, provided the container software doesn't do it for you (which is likely true), you just have to break 1 kernel.

zie • 2 months ago

OH I think I finally understand your comment... I think you are confusing FreeBSD jails/Solaris Zones with containers. They are not the same things. Containers for instance exist on FreeBSD now, as a totally different thing than FreeBSD Jails.

Jails/Zones are definitely security features. That's not the case for containers (popularized via Docker).

ryandrake • 2 months ago

I always thought containers were how lazy developers solved the "I dunno, it works on my machine" problem: By shipping their entire machine.

zbentley • 2 months ago

Not the case; there's a fascinating history here.

The technologies that enabled containerization (namespaces, chroot, and cgroups, and their predecessors on BSD/Solaris) were created specifically for security and resource isolation.

The people who came up with "containers" as we know them today found a clever hack: combining those security-oriented tools with a filesystem-in-a-box and packaging system allowed people to package entire OS userlands and run them pretty deterministically in multiple places. The security isolation properties of namespaces/cgroups/chroot also happened to provide increased determinism.

And I'm not criticizing that; containers are a very clever hack that solved a problem a lot of people have. I use them every day.

That said, the fact that containers became so ubiquitous in the first place speaks to a completely self-induced problem that we didn't need to have in the software engineering community. That problem is, unfortunately, human/incentive-related in nature, so containers are probably the best we're going to get. The problem is, they're not that good.

I complained about the root problems here a while ago; it's easier to link than to rehash that here: https://news.ycombinator.com/item?id=44069483

Drew deVault also explained it much more thoroughly and better than I could: https://drewdevault.com/2021/09/27/Let-distros-do-their-job....

fragmede • 2 months ago

> It provides a simple, developer-friendly way to add defense-in-depth to applications.

Defense in depth. Lock your valuables inside a safe, inside of your locked house. Why lock them in a safe when your house is already locked? Because if someone breaks into your house, you want additional defense "just in case". So just in case I wrote some shitty code and my server got hacked, lock the valuables in a safe anyway so that thief can't steal the expensive silverware (prod credentials).

yalogin • 2 months ago

Aren’t there existing methods to do this using SELinux or AppArmor?

zie • 2 months ago

Yes, but basically nobody uses either of those things. Some vendors, like Red Hat, enable some of it by default, but when people have issues getting software to work, the first thing they are told to try is to turn all that stuff off.

Which means in the real world, the likelihood of that stuff being on and secure is fairly low, but not zero.

With Landlock, pledge/unveil and similar tech, the developers of the software write and configure it; it's on by default and probably can't be turned off (or at least not easily).

staticassertion • 2 months ago

You need to be root to set those up. These are typically admin-driven policies, not dev-driven. Landlock is unprivileged, meaning that a program can set its own policy up without root.

This is massive, since most ways of dropping privileges on Linux require already having significant permissions (i.e., root).

loeg • 2 months ago

Both have shortcomings.

razighter777 • 2 months ago

Landlock isn't really an alternative to containers. You can use it as another layer of security, within or outside a container.

It could even be paired with a chroot to make a container runtime. It's more like a building block for process restrictions.

notatoad • 2 months ago

I have zero experience with Linux system programming, so I'm probably missing something, but what's the point of an application restricting itself at runtime? If the application were compromised in some way, wouldn't it simply un-restrict itself?

zanchey • 2 months ago

LWN's article on unveil() is a good explanation - the restrictions are permanently applied to the process and its children until termination: https://lwn.net/Articles/767137/

razighter777 • 2 months ago

The kernel enforces that once the policy gets added it can't be removed.

So the restrictions are permanent for the life of the program. Even root can't undo them.

cortesoft • 2 months ago

Since it can’t re-enable privileges during runtime, the compromise would have to modify its own code and restart; if you don’t allow the running process to access its own code, it couldn’t make any changes that would persist across a restart of the code.

RonanSoleste • 2 months ago

As the article states, you cannot grant extra permissions; you can only restrict further.

crabmusket • 2 months ago

Reading this as a web developer, it reminds me of Deno's permission system.

Deno is a JS runtime that often runs, at my behest, code that I did not myself write and haven't vetted. At run time, I can invoke Deno with --allow-read=$PWD and know that Deno will prevent all that untrusted JS from reading any files outside the current directory.

If Deno itself is compromised then yeah, that won't work. But that's a smaller attack surface than all my NPM packages.

Just one example of how something like this helps in practice.

swiftcoder • 2 months ago

> if the application were compromised in some way, wouldn't it simply un-restrict itself?

The API doesn't allow un-restriction, only restriction. Since one typically applies restrictions at program start, they will be applied before an attacker gains remote-execution, and the attacker is then limited in what they can do...

zie • 2 months ago

The kernel guarantees that once restricted, that process will stay restricted. The only way for it to un-restrict itself would be to also compromise the Linux kernel. So you have 2 things you have to compromise to own the machine, instead of just 1.

johncolanduoni • 2 months ago

For sandboxes where the underlying software is assumed to be non-hostile (e.g. browser sandboxes), these kind of restrictions can be applied very early in a program's execution. If the program doesn't accept any untrusted input until after the restrictions are applied, it can still provide a strong defense against escalation in the event of a vulnerability.

williamstein • 2 months ago

codex-cli is a neat example of an open source Rust program that uses Landlock to run commands that an LLM comes up with when writing code (see [1]). The model is that a user trusts the agent program (codex-cli), but has much more limited trust of the commands the remote LLM asks codex-cli to run.

[1] https://developers.openai.com/codex/security/

baq • 2 months ago

The point is it can’t.

brainless • 2 months ago

Thanks for sharing.

I did not know of this and I am looking for simple ways to isolate processes for multiple reasons. I am building a coding agent, https://github.com/brainless/nocodo, that runs (headless) on a Linux instance. Generated code is immediately available for demo.

I am new to isolation and not looking for a container-based approach. I want isolation from a security standpoint, but I do not know enough yet. This approach looks like a great start for me.

imcritic • 2 months ago

This approach is stupid.

That's like relying on criminals to cuff themselves when they have committed a crime.

woodruffw • 2 months ago

That’s not how userspace sandboxing works. The assumption is that privilege flows from a trusted parent process to an untrusted child, so the trusted parent is the one responsible for setting the access controls.
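
In code, "the trusted parent sets the access controls" can look like the Go sketch below (stdlib only, not taken from any tool mentioned in this thread): the wrapper builds a ruleset, restricts the OS thread it runs on, then execs the untrusted command, which inherits the restrictions. Assumptions: x86_64/arm64 syscall numbers, ABI v1 rights only, and the hypothetical program name `sandbox-run` with helpers `plan` and `confineCurrentThread`.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
	"unsafe"
)

// Assumed x86_64/arm64 syscall numbers; constants from linux/landlock.h.
const (
	sysLandlockCreateRuleset = 444
	sysLandlockAddRule       = 445
	sysLandlockRestrictSelf  = 446
	landlockRulePathBeneath  = 1
	prSetNoNewPrivs          = 38

	accessExecute  = 1 << 0
	accessReadFile = 1 << 2
	accessReadDir  = 1 << 3
	accessFSV1     = 1<<13 - 1 // every filesystem right known to ABI v1
)

type rule struct {
	path   string
	access uint64
}

// plan keeps the whole tree readable and executable, but allows writes,
// deletions and file creation only beneath workDir.
func plan(workDir string) []rule {
	return []rule{
		{"/", accessExecute | accessReadFile | accessReadDir},
		{workDir, accessFSV1},
	}
}

// pathBeneathAttr mirrors struct landlock_path_beneath_attr. The kernel
// struct is packed (12 bytes); Go pads this one to 16, but the kernel only
// reads the first 12 bytes, whose layout matches.
type pathBeneathAttr struct {
	allowedAccess uint64
	parentFd      int32
}

// confineCurrentThread installs the rules on the calling OS thread only.
func confineCurrentThread(rules []rule) error {
	attr := struct{ handledAccessFS uint64 }{accessFSV1}
	rfd, _, errno := syscall.Syscall(sysLandlockCreateRuleset,
		uintptr(unsafe.Pointer(&attr)), unsafe.Sizeof(attr), 0)
	if errno != 0 {
		return fmt.Errorf("landlock_create_ruleset: %w", errno)
	}
	defer syscall.Close(int(rfd))
	for _, r := range rules {
		fd, err := syscall.Open(r.path, syscall.O_RDONLY|syscall.O_CLOEXEC, 0)
		if err != nil {
			return fmt.Errorf("open %s: %w", r.path, err)
		}
		pb := pathBeneathAttr{allowedAccess: r.access, parentFd: int32(fd)}
		_, _, errno := syscall.Syscall6(sysLandlockAddRule, rfd, landlockRulePathBeneath,
			uintptr(unsafe.Pointer(&pb)), 0, 0, 0)
		syscall.Close(fd)
		if errno != 0 {
			return fmt.Errorf("landlock_add_rule %s: %w", r.path, errno)
		}
	}
	if _, _, errno := syscall.RawSyscall6(syscall.SYS_PRCTL, prSetNoNewPrivs, 1, 0, 0, 0, 0); errno != 0 {
		return fmt.Errorf("prctl(PR_SET_NO_NEW_PRIVS): %w", errno)
	}
	if _, _, errno := syscall.Syscall(sysLandlockRestrictSelf, rfd, 0, 0); errno != 0 {
		return fmt.Errorf("landlock_restrict_self: %w", errno)
	}
	return nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: sandbox-run <work-dir> <command> [args...]")
		os.Exit(2)
	}
	bin, err := exec.LookPath(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Landlock domains are per thread: pin this goroutine to one OS thread,
	// restrict it, and exec from it. execve drops the other (unrestricted)
	// threads, and the child inherits the domain.
	runtime.LockOSThread()
	if err := confineCurrentThread(plan(os.Args[1])); err != nil {
		fmt.Fprintln(os.Stderr, "sandbox:", err)
		os.Exit(1)
	}
	err = syscall.Exec(bin, os.Args[2:], os.Environ())
	fmt.Fprintln(os.Stderr, "exec:", err)
	os.Exit(1)
}
```

For example, `sandbox-run /tmp/work make` would let `make` read and execute anything but write only under /tmp/work. Because `/` only gets read/execute rights, opening `/dev/null` or a terminal by path for writing is denied as well; the already-open stdin/stdout/stderr keep working.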

tremon • 2 months ago

Not really. It's more like wearing seatbelts: the car is not supposed to crash, but in case something unforeseen happens, please don't let the passengers exit through the windshield.

nesarkvechnep • 2 months ago

Wait until the author discovers FreeBSD’s Capsicum. I believe it’s superior to most of the APIs provided by other major OSs.

initramfs • 2 months ago

bookmarked.

razighter777 • 2 months ago

[flagged]

Cthulhu_ • 2 months ago

Is this a statement or a question?

razighter777 • 2 months ago

whoops it got added by the post creation form. I thought it would appear as a subtitle not a comment lol

seethishat • 2 months ago

Landlock is a minor LSM (Linux Security Module) intended for software developers. They incorporate it into their source code to limit where their programs may read and write. Here's a simple Go example:

    package main

    import (
        "flag"
        "fmt"
        "log"
        "os"

        "github.com/landlock-lsm/go-landlock/landlock"
    )

    // Simple program that demonstrates how Landlock works in Go on Linux systems.
    // Requires a 5.13 or newer kernel, and .config should look something like this:
    //   CONFIG_SECURITY_LANDLOCK=y
    //   CONFIG_LSM="landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo"
    func main() {
        var help = flag.Bool("help", false, "landlock-example -f /path/to/file.txt")
        var file = flag.String("f", "", "the file path to read")

        flag.Parse()
        if *help || len(os.Args) == 1 {
            flag.PrintDefaults()
            return
        }

        // Allow the program to read files in /home/user/tmp only.
        err := landlock.V1.RestrictPaths(landlock.RODirs("/home/user/tmp"))
        if err != nil {
            log.Fatal(err)
        }

        // Attempt to read a file; this fails for paths outside /home/user/tmp.
        bytes, err := os.ReadFile(*file)
        if err != nil {
            log.Fatal(err)
        }

        fmt.Println(string(bytes))
    }
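
The example pins `landlock.V1`. As I understand the go-landlock docs, a strict config like that returns an error on kernels lacking the requested ABI, while `landlock.V1.BestEffort()` (or a newer `Vn`) degrades gracefully. Under the hood, "best effort" boils down to asking the kernel for its ABI version and masking out the rights it doesn't know, which is also what the kernel's sandboxer.c sample does. A stdlib-only sketch, assuming the x86_64/arm64 syscall number; `abiVersion` and `supportedAccess` are illustrative names:

```go
package main

import (
	"fmt"
	"syscall"
)

const (
	sysLandlockCreateRuleset     = 444 // assumed x86_64/arm64
	landlockCreateRulesetVersion = 1 << 0

	accessFSV1       = 1<<13 - 1 // ABI 1: execute ... make_sym (13 rights)
	accessFSRefer    = 1 << 13   // ABI 2 (Linux 5.19)
	accessFSTruncate = 1 << 14   // ABI 3 (Linux 6.2)
	accessFSIoctlDev = 1 << 15   // ABI 5 (Linux 6.10)

	accessNetBindTCP    = 1 << 0 // ABI 4 (Linux 6.7)
	accessNetConnectTCP = 1 << 1 // ABI 4
)

// abiVersion asks the kernel which Landlock ABI it implements. It fails
// with ENOSYS if Landlock isn't built in, and EOPNOTSUPP if it is built in
// but disabled at boot.
func abiVersion() (int, error) {
	v, _, errno := syscall.Syscall(sysLandlockCreateRuleset, 0, 0, landlockCreateRulesetVersion)
	if errno != 0 {
		return 0, errno
	}
	return int(v), nil
}

// supportedAccess returns the filesystem and network rights a ruleset may
// handle on a kernel implementing the given ABI version.
func supportedAccess(abi int) (fs, net uint64) {
	if abi < 1 {
		return 0, 0
	}
	fs = accessFSV1
	if abi >= 2 {
		fs |= accessFSRefer
	}
	if abi >= 3 {
		fs |= accessFSTruncate
	}
	if abi >= 4 {
		net = accessNetBindTCP | accessNetConnectTCP
	}
	if abi >= 5 {
		fs |= accessFSIoctlDev
	}
	return fs, net
}

func main() {
	abi, err := abiVersion()
	if err != nil {
		fmt.Println("Landlock unavailable:", err)
		return
	}
	fs, net := supportedAccess(abi)
	// AND the rights you want to restrict with these masks before creating
	// the ruleset, so older kernels don't reject it with EINVAL.
	fmt.Printf("ABI %d: fs mask %#x, net mask %#x\n", abi, fs, net)
}
```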

Cthulhu_ • 2 months ago

I feel like I need to ask: did you write this comment and the code example yourself, or did you ask an AI to generate it? If it's AI, why didn't you disclose it? If it's the former, why the weird formatting, etc., instead of linking to one of the official examples at https://github.com/landlock-lsm/go-landlock/blob/main/exampl... ?

razighter777 • 2 months ago

Yup. In the application code itself is where landlock shines at the moment.

It's becoming increasingly usable as a wrapper for untrusted applications as well.

unsnap_biceps • 2 months ago

I don't understand why someone would wrap an untrusted application with their own code vs. using something like systemd's exec capabilities to do the same without needing a binary wrapper. What benefits do you see over the systemd solution?

razighter777 • 2 months ago

systemd's exec capabilities are great, but they don't allow the application developer to dynamically restrict access rights to resources. So you could, for instance, restrict a text editor to the file it was launched to edit, instead of a hardcoded directory.
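
A rough Go sketch of that text-editor case, where the path comes from the command line at run time instead of being hardcoded in a unit file. Assumptions: stdlib only, x86_64/arm64 syscall numbers, Go 1.16+ built without cgo (needed for `syscall.AllThreadsSyscall`), and hypothetical names `confineToFile` and `fileOnly`. A rule attached to a regular file may only grant file rights (execute, write_file and read_file under ABI v1); the kernel rejects directory-only rights there with EINVAL.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

const (
	sysLandlockCreateRuleset = 444 // assumed x86_64/arm64 syscall numbers
	sysLandlockAddRule       = 445
	sysLandlockRestrictSelf  = 446
	landlockRulePathBeneath  = 1
	prSetNoNewPrivs          = 38

	accessExecute   = 1 << 0
	accessWriteFile = 1 << 1
	accessReadFile  = 1 << 2
	accessFSV1      = 1<<13 - 1
	// Rights that may be granted on a regular file under ABI v1.
	fileRightsV1 = accessExecute | accessWriteFile | accessReadFile
)

// fileOnly drops rights that only make sense on directories.
func fileOnly(access uint64) uint64 { return access & fileRightsV1 }

// confineToFile leaves the whole process able to read and write only path.
func confineToFile(path string) error {
	attr := struct{ handledAccessFS uint64 }{accessFSV1}
	rfd, _, errno := syscall.Syscall(sysLandlockCreateRuleset,
		uintptr(unsafe.Pointer(&attr)), unsafe.Sizeof(attr), 0)
	if errno != 0 {
		return fmt.Errorf("landlock_create_ruleset: %w", errno)
	}
	defer syscall.Close(int(rfd))

	fd, err := syscall.Open(path, syscall.O_RDONLY|syscall.O_CLOEXEC, 0)
	if err != nil {
		return err
	}
	defer syscall.Close(fd)
	rule := struct {
		allowedAccess uint64
		parentFd      int32
	}{fileOnly(accessReadFile | accessWriteFile), int32(fd)}
	if _, _, errno := syscall.Syscall6(sysLandlockAddRule, rfd, landlockRulePathBeneath,
		uintptr(unsafe.Pointer(&rule)), 0, 0, 0); errno != 0 {
		return fmt.Errorf("landlock_add_rule: %w", errno)
	}
	// Apply to every OS thread, since the Go runtime uses several.
	if _, _, errno := syscall.AllThreadsSyscall(syscall.SYS_PRCTL, prSetNoNewPrivs, 1, 0); errno != 0 {
		return fmt.Errorf("prctl(PR_SET_NO_NEW_PRIVS): %w", errno)
	}
	if _, _, errno := syscall.AllThreadsSyscall(sysLandlockRestrictSelf, rfd, 0, 0); errno != 0 {
		return fmt.Errorf("landlock_restrict_self: %w", errno)
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: edit <file>")
		os.Exit(2)
	}
	if err := confineToFile(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "sandbox:", err)
		os.Exit(1)
	}
	// The file chosen on the command line is still editable...
	f, err := os.OpenFile(os.Args[1], os.O_RDWR, 0)
	if err != nil {
		fmt.Fprintln(os.Stderr, "unexpected:", err)
		os.Exit(1)
	}
	f.Close()
	fmt.Println("still allowed to edit", os.Args[1])
	// ...but every other path is now off limits.
	if _, err := os.ReadFile("/etc/passwd"); err != nil {
		fmt.Println("other files denied:", err)
	}
}
```

A real editor that saves by writing a temporary file and renaming it over the original would also need directory rights (such as make_reg and remove_file) on the parent directory, which this sketch doesn't grant.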