Dakshesh Jain | Tiny Container

tinycontainer is built around three core concepts:

Isolation: Linux namespaces and pivot_root.
Security: seccomp and Linux capabilities.
Resource Control: cgroup v2.

The project uses a bootstrap process created inside new namespaces. Once the bootstrap process starts, it configures the isolated environment by preparing the root filesystem, mounts, cgroups, and security policies. Finally, it replaces itself with the target workload using syscall.Exec().

The project is divided into a few small components with clear ownership:

API / CLI: Application entry points.
Executor: Orchestrates code execution requests.
Runtime: Manages isolated execution environments.
Container: Creates and manages root filesystems.
Workspace: Creates temporary directories for user code.
Process: Handles namespace creation, bootstrap flow, and workload execution.
Security: Applies privilege dropping, capabilities, and seccomp filters.
Cgroup: Enforces resource limits.

Runtime Lifecycle

Each execution receives:

A fresh workspace
A fresh root filesystem
Dedicated namespaces
A dedicated cgroup

The runtime creates an isolated environment, launches a bootstrap process, applies resource limits and security policies, executes the workload, collects the output, and finally cleans up all temporary resources.

Flow:

Receive request
Create workspace
Create container
Launch bootstrap
Apply cgroups
Execute program
Collect output
Clean up
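The "fresh workspace, always cleaned up" part of this lifecycle can be sketched in Go as follows. This is a minimal illustration, not the project's actual code: the function names runInWorkspace and pathExists, the file name main.src, and the temp-directory prefix are all assumptions, and the callback stands in for the container, bootstrap, and cgroup steps.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pathExists reports whether path is present on disk.
func pathExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// runInWorkspace (hypothetical name) creates a fresh temporary workspace,
// writes the source into it, calls run with the workspace path, and removes
// the workspace afterwards, even when run fails.
func runInWorkspace(source []byte, run func(dir string) (string, error)) (out string, err error) {
	dir, err := os.MkdirTemp("", "tinycontainer-ws-")
	if err != nil {
		return "", fmt.Errorf("create workspace: %w", err)
	}
	defer func() {
		// Cleanup runs on every return path; report its error only if
		// nothing else went wrong first.
		if rmErr := os.RemoveAll(dir); rmErr != nil && err == nil {
			err = fmt.Errorf("cleanup workspace: %w", rmErr)
		}
	}()

	// "main.src" is an illustrative file name, not one taken from the project.
	if err := os.WriteFile(filepath.Join(dir, "main.src"), source, 0o600); err != nil {
		return "", fmt.Errorf("write source: %w", err)
	}
	return run(dir)
}

func main() {
	var workspace string
	out, err := runInWorkspace([]byte("print('hi')"), func(dir string) (string, error) {
		// A real runtime would create the root filesystem, launch the
		// bootstrap process, and collect output here. This stub only
		// reads the source back.
		workspace = dir
		data, err := os.ReadFile(filepath.Join(dir, "main.src"))
		return string(data), err
	})
	fmt.Println("output:", out, "error:", err)
	fmt.Println("workspace still exists:", pathExists(workspace))
}
```

Putting the cleanup in a deferred function keeps it tied to workspace creation, so a failure in any later step cannot leak the temporary directory.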

Process Bootstrap Flow

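In a Go program, namespaces are most practical to request at process creation: the Go runtime is multithreaded, and a new PID namespace only ever applies to children of the process that asks for it. Because of this, tinycontainer uses a re-exec pattern.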

The parent process launches a new instance of the current binary inside newly created namespaces. The child process detects bootstrap mode, configures the isolated environment, and finally replaces itself with the target workload using syscall.Exec().

This approach allows the same process to both prepare the sandbox and execute the untrusted code.
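A rough Go sketch of the re-exec pattern is shown below. It is written under assumptions rather than taken from the project: the environment marker TINYCONTAINER_BOOTSTRAP, the helper names, and the exact set of namespace flags are illustrative. Creating these namespaces normally requires root (or an added user namespace), so running the parent path unprivileged is expected to fail.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// bootstrapEnv is a hypothetical marker telling the re-executed binary that
// it is running as the bootstrap process.
const bootstrapEnv = "TINYCONTAINER_BOOTSTRAP"

// sandboxCloneflags is an assumed namespace set; the project may use others.
const sandboxCloneflags = syscall.CLONE_NEWNS | syscall.CLONE_NEWPID |
	syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWNET

// bootstrapCommand builds, but does not start, a command that re-executes
// the current binary inside new namespaces with the workload as arguments.
func bootstrapCommand(workload []string) *exec.Cmd {
	cmd := exec.Command("/proc/self/exe", workload...)
	cmd.Env = append(os.Environ(), bootstrapEnv+"=1")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: sandboxCloneflags}
	return cmd
}

// inBootstrap reports whether this process is the re-executed child.
func inBootstrap() bool { return os.Getenv(bootstrapEnv) == "1" }

func main() {
	if inBootstrap() {
		// Child: the namespaces already exist. A real bootstrap would set
		// up the root filesystem, mounts, and security policy before exec.
		if len(os.Args) < 2 {
			fmt.Fprintln(os.Stderr, "bootstrap: no workload given")
			os.Exit(2)
		}
		path, err := exec.LookPath(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, "bootstrap:", err)
			os.Exit(127)
		}
		// On success Exec never returns: the workload replaces this process.
		err = syscall.Exec(path, os.Args[1:], os.Environ())
		fmt.Fprintln(os.Stderr, "exec:", err)
		os.Exit(1)
	}

	// Parent: re-launch ourselves in new namespaces.
	if len(os.Args) < 2 {
		fmt.Println("usage: prog <command> [args...]")
		return
	}
	if err := bootstrapCommand(os.Args[1:]).Run(); err != nil {
		fmt.Fprintln(os.Stderr, "run:", err)
		os.Exit(1)
	}
}
```

Because syscall.Exec replaces the process image without creating a new process, the workload keeps the PID, namespaces, and cgroup membership that the bootstrap stage set up.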

Cgroup Handling

Resource limits are enforced using cgroup v2.

Currently supported limits include:

Memory limits
PID limits

Each execution receives its own cgroup, and the process is attached to that cgroup before execution begins. Resource enforcement is performed directly by the Linux kernel.
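With cgroup v2, applying these limits comes down to creating a directory and writing a few control files. The sketch below assumes the standard interface files memory.max, pids.max, and cgroup.procs; the Limits type, the function names, and the example values are hypothetical. To stay runnable without root, the demo points at a throwaway directory rather than the real /sys/fs/cgroup mount, where the memory and pids controllers would also need to be enabled in the parent's cgroup.subtree_control.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// Limits holds the two limits the text mentions (hypothetical type).
type Limits struct {
	MemoryBytes int64
	MaxPIDs     int64
}

// setupCgroup creates <root>/<name>, writes the limits, and attaches pid by
// writing it to cgroup.procs. It returns the cgroup directory path.
func setupCgroup(root, name string, lim Limits, pid int) (string, error) {
	dir := filepath.Join(root, name)
	if err := os.Mkdir(dir, 0o755); err != nil {
		return "", fmt.Errorf("create cgroup: %w", err)
	}
	writes := []struct{ file, value string }{
		{"memory.max", strconv.FormatInt(lim.MemoryBytes, 10)},
		{"pids.max", strconv.FormatInt(lim.MaxPIDs, 10)},
		// Attach last, so the limits are already in place.
		{"cgroup.procs", strconv.Itoa(pid)},
	}
	for _, w := range writes {
		if err := os.WriteFile(filepath.Join(dir, w.file), []byte(w.value), 0o644); err != nil {
			return "", fmt.Errorf("write %s: %w", w.file, err)
		}
	}
	return dir, nil
}

// dryRun exercises setupCgroup against a temporary directory and returns
// the values that were written, keyed by file name.
func dryRun() (map[string]string, error) {
	root, err := os.MkdirTemp("", "fake-cgroup-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	dir, err := setupCgroup(root, "exec-1", Limits{MemoryBytes: 64 << 20, MaxPIDs: 32}, 4242)
	if err != nil {
		return nil, err
	}
	written := map[string]string{}
	for _, name := range []string{"memory.max", "pids.max", "cgroup.procs"} {
		data, err := os.ReadFile(filepath.Join(dir, name))
		if err != nil {
			return nil, err
		}
		written[name] = string(data)
	}
	return written, nil
}

func main() {
	written, err := dryRun()
	fmt.Println(written, err)
}
```

On a real system the cleanup step would remove the cgroup directory with rmdir once the workload has exited, since a cgroup with live processes cannot be removed.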

Seccomp Filtering

Seccomp is used to reduce the system call surface available to untrusted workloads.

After namespaces are configured and privileges are dropped, a seccomp filter is applied to the process.

The current implementation follows a deny-list approach, blocking a small set of dangerous system calls while allowing everything else. This keeps the implementation simple but is not ideal for production-grade sandboxing. A strict allow-list approach would provide significantly stronger isolation.

Seccomp acts as an additional security layer on top of namespaces, capabilities, and cgroups.
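To make the deny-list idea concrete, the sketch below builds a classic BPF program of the kind seccomp accepts: check the architecture, load the syscall number, return an error for listed syscalls, and allow the rest. It is an illustration under assumptions, not the project's filter: the denied syscall numbers are example x86_64 values (101 for ptrace, 165 for mount), the type and function names are invented, and the program is only built and simulated in user space, never installed.

```go
package main

import "fmt"

// bpfInsn mirrors the layout of the kernel's struct sock_filter.
type bpfInsn struct {
	Code   uint16
	Jt, Jf uint8
	K      uint32
}

const (
	opLoadAbs = 0x20 // BPF_LD | BPF_W | BPF_ABS
	opJumpEq  = 0x15 // BPF_JMP | BPF_JEQ | BPF_K
	opReturn  = 0x06 // BPF_RET | BPF_K

	retAllow       = 0x7fff0000     // SECCOMP_RET_ALLOW
	retErrnoEPERM  = 0x00050000 | 1 // SECCOMP_RET_ERRNO with errno EPERM
	retKillProcess = 0x80000000     // SECCOMP_RET_KILL_PROCESS

	archX8664 = 0xc000003e // AUDIT_ARCH_X86_64
	offNr     = 0          // offset of nr in struct seccomp_data
	offArch   = 4          // offset of arch in struct seccomp_data
)

// denyListFilter builds a program that fails the listed syscalls with EPERM
// and allows every other syscall on x86_64.
func denyListFilter(denied []uint32) ([]bpfInsn, error) {
	n := len(denied)
	if n > 255 {
		return nil, fmt.Errorf("too many syscalls for 8-bit jump offsets: %d", n)
	}
	prog := []bpfInsn{
		{Code: opLoadAbs, K: offArch},
		{Code: opJumpEq, Jt: 1, K: archX8664}, // right arch: skip the kill
		{Code: opReturn, K: retKillProcess},
		{Code: opLoadAbs, K: offNr},
	}
	for i, nr := range denied {
		// On a match, jump over the remaining checks and the allow return.
		prog = append(prog, bpfInsn{Code: opJumpEq, Jt: uint8(n - i), K: nr})
	}
	prog = append(prog,
		bpfInsn{Code: opReturn, K: retAllow},
		bpfInsn{Code: opReturn, K: retErrnoEPERM},
	)
	return prog, nil
}

// evaluate is a tiny user-space interpreter for the three opcodes used
// above, so the filter's decisions can be checked without installing it.
func evaluate(prog []bpfInsn, arch, nr uint32) uint32 {
	var acc uint32
	for pc := 0; pc < len(prog); pc++ {
		in := prog[pc]
		switch in.Code {
		case opLoadAbs:
			if in.K == offArch {
				acc = arch
			} else {
				acc = nr
			}
		case opJumpEq:
			if acc == in.K {
				pc += int(in.Jt)
			} else {
				pc += int(in.Jf)
			}
		case opReturn:
			return in.K
		}
	}
	return retKillProcess
}

func main() {
	prog, err := denyListFilter([]uint32{101, 165}) // example: ptrace, mount
	if err != nil {
		panic(err)
	}
	fmt.Printf("ptrace: %#x\n", evaluate(prog, archX8664, 101))
	fmt.Printf("write:  %#x\n", evaluate(prog, archX8664, 1))
}
```

Installing such a program would additionally require setting PR_SET_NO_NEW_PRIVS (or holding CAP_SYS_ADMIN) and then loading it through seccomp(2) or prctl(PR_SET_SECCOMP). Turning this into an allow-list would mean swapping the two return actions, so that only listed syscalls are allowed and everything else fails.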

Execution Pipeline

The execution pipeline is intentionally simple:

Receive execution request.
Create workspace.
Write source code.
Create container root filesystem.
Launch bootstrap process.
Create namespaces.
Apply security restrictions.
Apply cgroup limits.
Execute workload.
Collect stdout and stderr.
Clean up resources.
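The "collect stdout and stderr" step could look roughly like the Go sketch below, which captures both streams separately along with the exit code. The Result type and runAndCollect function are hypothetical names, and the demo runs a plain sh command rather than a sandboxed bootstrap process.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
)

// Result is a hypothetical container for what an execution produced.
type Result struct {
	Stdout   string
	Stderr   string
	ExitCode int
}

// runAndCollect runs a command to completion and captures its output.
// A non-zero exit status is reported through Result.ExitCode, not as an
// error; the error return is reserved for failures to run the command.
func runAndCollect(name string, args ...string) (Result, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	err := cmd.Run()
	res := Result{Stdout: stdout.String(), Stderr: stderr.String()}

	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The process started but exited unsuccessfully. ExitCode is -1
		// if it was terminated by a signal.
		res.ExitCode = exitErr.ExitCode()
		return res, nil
	}
	return res, err
}

func main() {
	res, err := runAndCollect("sh", "-c", "echo out; echo err >&2; exit 3")
	if err != nil {
		fmt.Println("failed to run:", err)
		return
	}
	fmt.Printf("stdout=%q stderr=%q exit=%d\n", res.Stdout, res.Stderr, res.ExitCode)
}
```

Keeping the exit code separate from the Go error lets the caller distinguish "the user's program failed" from "the runtime could not start it".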

Check my blog for more details.
