Architecture

teru is a terminal emulator, multiplexer, and (optionally) a Wayland compositor, all from one codebase. The terminal is ~37,000 lines of Zig; the compositor adds ~3,700 lines on top.

Two binaries, one library

libteru (shared)
├── Grid, VtParser, Pane     — terminal emulation
├── LayoutEngine, Workspace  — 8 tiling layouts, 10 workspaces
├── SoftwareRenderer         — SIMD pixel blitting
├── BarWidget, BarRenderer   — configurable status bars
├── Config, Keybinds         — configuration system
└── MCP, ProcessGraph        — AI agent integration

teru (terminal binary)       — uses libteru, runs inside any WM
teruwm (compositor binary)   — uses libteru + wlroots, IS the WM

teru is the standalone terminal. It runs on Linux (X11 + Wayland), macOS, and Windows.

teruwm is the Wayland compositor. It embeds libteru for terminal panes and uses wlroots for display management, input handling, and client window compositing. The stripped binary is under 1MB.

Why CPU rendering beats GPU for terminals

GPU-based terminals (Alacritty, Ghostty, WezTerm, Kitty) upload glyph and cell data to the GPU on every redraw, up to 60 times per second. For a monospace grid that changes a few cells per frame, this is wasteful:

Texture upload latency. CPU→GPU transfer takes ~0.5ms per frame. teru’s framebuffer stays in CPU memory — zero transfer.
GPU context switching. The GPU switches between render mode (draw glyphs) and composite mode (blend windows). Each switch costs ~0.1ms. teru’s CPU renderer runs independently while the GPU composites.
Monospace = perfect SIMD. Every cell is the same size. Every glyph lookup is O(1) into a pre-rasterized atlas. Every alpha blend is the same operation on 4 pixels. Zig’s @Vector(4, u32) compiles to 128-bit SIMD on targets that have it (SSE2, NEON), and the compiler may combine adjacent operations into 256-bit instructions on AVX2.
Works everywhere. No EGL, no OpenGL, no GPU driver required. SSH sessions, VMs, containers, $5 VPS instances — anywhere with a framebuffer.

The result: under 50μs per frame for a 200×50 grid. That’s 0.05ms — faster than a single GPU texture upload.
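The "same operation on 4 pixels" point can be sketched in a few lines of Zig. This is a minimal illustration, not teru's actual SoftwareRenderer: the names Px4, blendChannel, channel, and blend4 are hypothetical, and the pixel layout (packed 0xAARRGGBB, output forced opaque) is an assumption.

```zig
const std = @import("std");

const Px4 = @Vector(4, u32);
const Shift4 = @Vector(4, u5); // shift amounts for a u32 lane must be u5

/// Blend one 8-bit channel for four pixels at once.
/// Every lane holds a value in 0..255; `a` is glyph coverage
/// (0 = pure background, 255 = pure foreground).
fn blendChannel(fg: Px4, bg: Px4, a: Px4) Px4 {
    const max: Px4 = @splat(255);
    const half: Px4 = @splat(127);
    // Rounded integer lerp: (fg*a + bg*(255-a) + 127) / 255
    return (fg * a + bg * (max - a) + half) / max;
}

/// Extract the 8-bit channel starting at bit `shift` from four packed pixels.
fn channel(px: Px4, shift: u5) Px4 {
    const s: Shift4 = @splat(shift);
    const mask: Px4 = @splat(0xFF);
    return (px >> s) & mask;
}

/// Blend four packed 0xAARRGGBB pixels (assumed layout); output is fully opaque.
pub fn blend4(fg: Px4, bg: Px4, coverage: Px4) Px4 {
    const s16: Shift4 = @splat(16);
    const s8: Shift4 = @splat(8);
    const opaque_alpha: Px4 = @splat(0xFF000000);
    const r = blendChannel(channel(fg, 16), channel(bg, 16), coverage);
    const g = blendChannel(channel(fg, 8), channel(bg, 8), coverage);
    const b = blendChannel(channel(fg, 0), channel(bg, 0), coverage);
    return opaque_alpha | (r << s16) | (g << s8) | b;
}
```

Because every lane runs the identical arithmetic with no branches, the compiler is free to map each vector operation onto a single SIMD instruction where the target supports it. A production renderer would likely replace the division by 255 with a cheaper multiply-and-shift.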

Compositor: hybrid CPU+GPU

When running as teruwm, teru uses both CPU and GPU — each for what it does best:

Terminal panes:  Grid → SIMD render → CPU pixel buffer → wlr_scene_buffer
External apps:   Client GPU texture → wlr_scene_surface
Compositing:     wlr_scene blends all surfaces → GPU → display

Terminal content is CPU-rendered (faster for text grids)
Browser/app windows are GPU-rendered by the client (as usual)
Final compositing uses GPU via wlroots scene graph (damage-tracked)
Fullscreen terminal bypasses GPU entirely via direct DRM scanout

This hybrid approach gives teruwm lower frame overhead than pure-GPU compositors:

Metric	Sway	Hyprland	teruwm
Compositor frame overhead	~2ms	~3ms	~0.5ms
Terminal input-to-pixel	~20ms	~25ms	~8ms
Memory (idle)	~80MB	~120MB	~30MB
Binary size	~4MB	~12MB	~1MB

Zero-copy rendering pipeline

Terminal panes in teruwm render directly into wlroots buffer memory:

1. PTY output arrives (epoll wakeup)
2. VtParser processes bytes → Grid cells update
3. SoftwareRenderer blits changed cells into wlr_buffer
   (the framebuffer pointer IS the wlr_buffer data — no copy)
4. wlr_scene_buffer_set_buffer_with_damage notifies wlroots
5. wlr_scene composites on next frame callback
6. GPU presents to display

Step 3 is the key: the SoftwareRenderer’s framebuffer slice points directly at the wlr_buffer’s pixel data. When rendering completes, the pixels are already where wlroots needs them. Zero memcpy.
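The borrowing idea behind step 3 can be sketched as follows. This is a simplified model under stated assumptions, not teru's code: Renderer, attach, and fillCell are hypothetical names, the buffer is assumed to be 32 bits per pixel with a stride equal to its width, and a plain array stands in for the wlr_buffer's pixel data.

```zig
const std = @import("std");

/// Hypothetical stand-in for a software renderer that borrows its pixel
/// memory instead of owning it.
const Renderer = struct {
    fb: []u32,
    width: usize,
    height: usize,

    /// Wrap externally owned pixel memory (for example, the data pointer
    /// of a compositor buffer). No allocation and no copy happens here.
    fn attach(pixels: [*]u32, width: usize, height: usize) Renderer {
        return .{ .fb = pixels[0 .. width * height], .width = width, .height = height };
    }

    /// Fill one cell-sized rectangle. A real renderer would blit glyph
    /// pixels here; a flat color keeps the sketch short.
    fn fillCell(self: Renderer, col: usize, row: usize, cell_w: usize, cell_h: usize, color: u32) void {
        const x0 = col * cell_w;
        const y0 = row * cell_h;
        var y: usize = y0;
        while (y < y0 + cell_h) : (y += 1) {
            const start = y * self.width + x0;
            @memset(self.fb[start .. start + cell_w], color);
        }
    }
};
```

Since fb is only a slice over memory owned elsewhere, every write in fillCell lands directly in the backing buffer. Handing the frame to the compositor then only requires reporting which region changed.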

Smooth resize

During mouse-drag resize of tiled panes, teruwm uses GPU scaling instead of re-rendering:

During drag:  wlr_scene_buffer_set_dest_size(new_w, new_h)
              → GPU scales existing pixels to new rect (instant)
On release:   tp.resize(new_w, new_h)
              → actual grid reflow + PTY resize + full re-render

This makes the drag feel instant: the terminal does no re-rendering while the pointer moves, only a destination-size update per frame. The actual terminal resize (grid reflow, shell redraw) happens once, when you release the mouse.
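The split between cheap per-frame updates and one deferred resize can be modeled as a small state machine. The sketch below is illustrative only: Size, PaneResize, onDrag, and onRelease are hypothetical names, and the wlroots and PTY calls are represented by comments and a counter rather than real bindings.

```zig
const std = @import("std");

const Size = struct { w: u32, h: u32 };

/// Hypothetical model of a tiled terminal pane during an interactive resize.
const PaneResize = struct {
    rendered: Size, // size the pixel buffer was last rendered at
    dest: Size, // size the compositor scales that buffer to on screen
    reflows: u32 = 0, // how many times the expensive path has run

    /// Pointer motion during the drag: only the on-screen destination
    /// changes. In teruwm this is where wlr_scene_buffer_set_dest_size
    /// would be called; the grid and PTY are untouched.
    fn onDrag(self: *PaneResize, new: Size) void {
        self.dest = new;
    }

    /// Button release: run the expensive work once, and only if the
    /// size actually changed.
    fn onRelease(self: *PaneResize) void {
        if (self.dest.w == self.rendered.w and self.dest.h == self.rendered.h) return;
        // A real implementation would reflow the grid, resize the PTY,
        // and re-render the pane at the new size here.
        self.rendered = self.dest;
        self.reflows += 1;
    }
};
```

However many motion events arrive during the drag, the reflow count only increases on release, which is the property the section describes.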

Source structure

src/
├── core/           VtParser, Grid, Pane, Multiplexer, KeyHandler, ViMode
├── tiling/         LayoutEngine (8 layouts), Workspace, types
├── render/         SoftwareRenderer (SIMD), FontAtlas, BarWidget, BarRenderer
├── config/         Config parser, Keybinds (u32 keysym), ConfigWatcher
├── agent/          McpServer (19 tools), PaneBackend, HookHandler, OSC 9999
├── graph/          ProcessGraph (DAG of processes/agents)
├── persist/        Session serialization, Scrollback compression
├── server/         Daemon mode, IPC protocol
├── pty/            PTY abstraction (POSIX, ConPTY, RemotePty)
├── platform/       X11 (XCB), Wayland (xdg-shell), macOS (AppKit), Windows (Win32)
├── compositor/     teruwm: Server, Output, Input, XdgView, XwaylandView,
│                   TerminalPane, Bar, Launcher, Notification, Node, WmConfig
├── main.zig        Terminal entry point
└── lib.zig         libteru public API (37 exports)

Key invariants

Grid cursor always in bounds: 0 ≤ row < rows, 0 ≤ col < cols
Zero allocations in the render hot path
VtParser is pure computation (no I/O, no allocation)
Every function that performs I/O takes an explicit io: std.Io parameter
All colors flow from ColorScheme (base16, configurable)
CSI params capped at 16; overflow stops collecting, never crashes
OSC strings bounded at 256 bytes; overflow truncates, never crashes
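The bounded-parameter invariant can be illustrated with a fixed-capacity collector. This is a sketch of the general technique, not teru's VtParser: CsiParams, feed, and the overflowed flag are hypothetical, and the choice of u16 values with saturating arithmetic is an assumption made for the example.

```zig
const std = @import("std");

const max_params = 16;

/// Hypothetical fixed-capacity CSI parameter collector. It uses no
/// allocator and performs no I/O; all state lives in the struct.
const CsiParams = struct {
    values: [max_params]u16 = [_]u16{0} ** max_params,
    len: usize = 0,
    overflowed: bool = false,

    /// Feed one byte from inside a CSI sequence: digits extend the
    /// current parameter, ';' starts the next one. Other bytes are
    /// ignored in this sketch.
    fn feed(self: *CsiParams, byte: u8) void {
        if (self.overflowed) return; // stop collecting after overflow
        switch (byte) {
            '0'...'9' => {
                if (self.len == 0) self.len = 1;
                const i = self.len - 1;
                const digit = @as(u16, byte - '0');
                // Saturating ops keep oversized numbers from wrapping or trapping.
                self.values[i] = self.values[i] *| 10 +| digit;
            },
            ';' => {
                if (self.len == 0) self.len = 1; // leading ';' means an empty first parameter
                if (self.len == max_params) {
                    self.overflowed = true; // keep the first 16, drop the rest
                } else {
                    self.len += 1;
                }
            },
            else => {},
        }
    }
};
```

Feeding the bytes of "1;2;3" yields three parameters. A sequence with more than 16 parameters sets overflowed and leaves the first 16 intact, so hostile or malformed input cannot grow memory or index out of bounds.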

Dependencies

teru (terminal)

Library	Purpose
libxcb + libxcb-shm	X11 display (optional: `-Dx11=false`)
libxkbcommon	Keyboard translation
libwayland-client	Wayland display (optional: `-Dwayland=false`)
stb_truetype.h (vendored)	Font rasterization

teruwm (compositor, adds)

Library	Purpose
wlroots 0.18	Compositor framework (DRM, libinput, Wayland server)
wayland-server	Server-side Wayland protocol
XWayland (runtime)	X11 application compatibility

Testing

440+ inline tests covering VT parsing, grid operations, tiling layouts, scrollback compression, session serialization, agent protocol, keybind parsing, font rasterization, bar widget parsing, and SIMD renderer correctness.

zig build test              # run all tests
zig build -Dcompositor=true # build teruwm