Lessandro

A website containing something I’ve been working on

12 Apr 2022

Security Containers: System Call and Permissions

The majority of the time, you’ll see containers running a Linux operation system. Containers run Linux processes that are visible from the host.

A regular and a containerized process use system calls and need permission and privileges, but the difference is how these permissions are assigned at runtime or during the image build process for containers.

System Calls

Application run in what’s called user space.

The application will ask the kernel on its behalf if the application needs to do something like accessing a file or even find the time of the day. System Call or syscall interface is when the user space code uses to make these requests of the kernel.

Here are some examples of System calls:

read

read data from a file

write

write data to a file

open

open a file for subsequent reading or writing

execve

run an executable program

chown

change the owner of a file

clone

create a new process

Uncommonly application developers will need to worry about system calls. The lowest level that they can come across is the Glibc library or the Golang syscall package.

The system call is used the same way either if it’s running in a container or not. The challenge is due to containers are making a system call to the same kernel, they’re on a single host share.

Following the principle, least privilege, describe here we can allow users to limit the set of system calls that different programs can access.

File Permissions

There is a saying that in Linux, everything is a file.

Permissions on files determine which users are allowed to access those files and what actions they can perform on the files where it’s referred to as discretionary access control or DAC.

The below image describes the output command of ls -l. It shows information about the files and their attributes.

Image alt

For the file permission group, there are three actions that users might be able to perform on this file: read (r), write (w) or execute (x).

In the above example the owner, group, and any user can execute the file.

Image alt

Permissions on the file can be affected by the use of setuid, setgid and sticky bits. Setuid and setgid can be used by attackers to obtain additional permissions.

setuid and setgid

When you execute a file, the process that gets started inherits your user ID.

The below example shows a file owned by the user called lessandrougulino. Run this under root by executing sudo sleep 100

Image alt

As you can see, the sleep process is running under the root UID.

Image alt

Now, let’s try turning on the setuid bit.

chmod +s sleep.sh

Image alt

Running the sleep.sh again, we can see the file (sleep.sh) has taken its user ID from the owner of the file

Image alt

This bit is typically used to give a program privileges that it needs but that is not usually extended to regular users.

Let’s try with the command ping. As you can see, I can run the command as non-root.

Image alt

Now, let’s take a copy of the ping command.

cp /usr/bin/ping ./myping

The permissions of myping command.

Image alt

Let’s try to run it using myping command.

Image alt

As you can see above, when you copy an executable, the file ownership attributes are set according to the user ID you’re operating as, and the setuid bit is not carried over.

The original ping executable has the s or setuid bit instead of a regular x.

Image alt

Let’s set the setuid on the executable and try again.

sudo chmod +s ./myping

Image alt

It’s working because myping executable has the setuid bit, which allows it to operate as root.

Image alt

If you open a new terminal, you can see the myping executable isn’t running as root. It’s happening because the modern version of ping starts off running as root, but it explicitly sets just the capabilities that it needs and resets its user ID to be that of the original user, it jumps through some hoops.

Image alt

You can explore this in more detail, using trace.

Security implications of setuid

It’s very easy to write your own program that does setuid on itself and then, having already transitioned to root, calls the shell.

Some container image scanners will report on the presence of files with the setuid bit set.

You can prevent it from being used with the --no-new-privileges flag on a docker run command.

Version 2.2 of the Linux kernel introduced more granular control over these extra privileges through capabilities. I’ll cover it in the next post.