Architecture
teru is a terminal emulator, multiplexer, and (optionally) a Wayland compositor, all built from one codebase. The terminal is ~37,000 lines of Zig; the compositor adds ~3,700 lines on top.
Two binaries, one library
libteru (shared)
├── Grid, VtParser, Pane — terminal emulation
├── LayoutEngine, Workspace — 8 tiling layouts, 10 workspaces
├── SoftwareRenderer — SIMD pixel blitting
├── BarWidget, BarRenderer — configurable status bars
├── Config, Keybinds — configuration system
└── MCP, ProcessGraph — AI agent integration
teru (terminal binary) — uses libteru, runs inside any WM
teruwm (compositor binary) — uses libteru + wlroots, IS the WM
teru is the standalone terminal. It runs on Linux (X11 + Wayland), macOS, and Windows.
teruwm is the Wayland compositor. It embeds libteru for terminal panes and uses wlroots for display management, input handling, and client window compositing. The stripped binary is under 1MB.
Why CPU rendering beats GPU for terminals
Every GPU-based terminal (Alacritty, Ghostty, WezTerm, Kitty) uploads text textures to the GPU 60 times per second. For a monospace grid that changes a few cells per frame, this is wasteful:
Texture upload latency. CPU→GPU transfer takes ~0.5ms per frame. teru’s framebuffer stays in CPU memory — zero transfer.
GPU context switching. The GPU switches between render mode (draw glyphs) and composite mode (blend windows). Each switch costs ~0.1ms. teru’s CPU renderer runs independently while the GPU composites.
Monospace = perfect SIMD. Every cell is the same size. Every glyph lookup is O(1) into a pre-rasterized atlas. Every alpha blend is the same operation on 4 pixels. Zig’s @Vector(4, u32) processes this at 128-bit width on all CPUs, auto-fused to 256-bit on AVX2.
Works everywhere. No EGL, no OpenGL, no GPU driver required. SSH sessions, VMs, containers, $5 VPS instances: anywhere with a framebuffer.
The result: under 50μs per frame for a 200×50 grid. That’s 0.05ms — faster than a single GPU texture upload.
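To make the SIMD point concrete, here is a minimal sketch of blending one foreground color over four background pixels with @Vector(4, u32). It is not teru's actual renderer: the blend4 name, the 0xAARRGGBB pixel packing, and the per-channel loop are assumptions chosen for illustration.

```zig
const std = @import("std");

/// Four packed 0xAARRGGBB pixels, one per SIMD lane (assumed layout).
const Px4 = @Vector(4, u32);

/// Blend a solid foreground color over four background pixels.
/// `coverage` holds one 8-bit glyph coverage value (0..255) per lane.
/// Hypothetical helper, not part of libteru's public API.
fn blend4(bg: Px4, fg: u32, coverage: Px4) Px4 {
    const mask: Px4 = @splat(0xFF);
    const full: Px4 = @splat(255);
    const half: Px4 = @splat(127); // rounding term for the divide
    const inv = full - coverage;

    var out: Px4 = @splat(0xFF000000); // result is always opaque
    // Handle the B, G and R channels in turn; every lane is one pixel,
    // so each arithmetic op below runs on four pixels at once.
    inline for ([_]u5{ 0, 8, 16 }) |shift| {
        const sh: @Vector(4, u5) = @splat(shift);
        const f: Px4 = @splat((fg >> shift) & 0xFF);
        const b = (bg >> sh) & mask;
        const mixed = (f * coverage + b * inv + half) / full;
        out |= mixed << sh;
    }
    return out;
}
```

Because every cell has the same dimensions, a routine like this can be applied across each glyph row without per-cell branching.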
Compositor: hybrid CPU+GPU
When running as teruwm, teru uses both CPU and GPU — each for what it does best:
Terminal panes: Grid → SIMD render → CPU pixel buffer → wlr_scene_buffer
External apps: Client GPU texture → wlr_scene_surface
Compositing: wlr_scene blends all surfaces → GPU → display
- Terminal content is CPU-rendered (faster for text grids)
- Browser/app windows are GPU-rendered by the client (as usual)
- Final compositing uses GPU via wlroots scene graph (damage-tracked)
- Fullscreen terminal bypasses GPU entirely via direct DRM scanout
This hybrid approach gives teruwm lower frame overhead than pure-GPU compositors:
| Metric | Sway | Hyprland | teruwm |
|---|---|---|---|
| Compositor frame overhead | ~2ms | ~3ms | ~0.5ms |
| Terminal input-to-pixel | ~20ms | ~25ms | ~8ms |
| Memory (idle) | ~80MB | ~120MB | ~30MB |
| Binary size | ~4MB | ~12MB | ~1MB |
Zero-copy rendering pipeline
Terminal panes in teruwm render directly into wlroots buffer memory:
1. PTY output arrives (epoll wakeup)
2. VtParser processes bytes → Grid cells update
3. SoftwareRenderer blits changed cells into wlr_buffer
(the framebuffer pointer IS the wlr_buffer data — no copy)
4. wlr_scene_buffer_set_buffer_with_damage notifies wlroots
5. wlr_scene composites on next frame callback
6. GPU presents to display
Step 3 is the key: the SoftwareRenderer’s framebuffer slice points directly at the wlr_buffer’s pixel data. When rendering completes, the pixels are already where wlroots needs them. Zero memcpy.
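The aliasing in step 3 can be sketched in a few lines of Zig. This is an illustration under stated assumptions, not teruwm's code: SharedBuffer stands in for the wlr_buffer's pixel storage, and Renderer, attach, and fillCell are hypothetical names.

```zig
const std = @import("std");

/// Hypothetical stand-in for compositor-owned pixel memory. In teruwm this
/// role is played by a wlr_buffer; these fields are illustrative only.
const SharedBuffer = struct {
    data: [*]u32,
    width: usize,
    height: usize,
};

/// Hypothetical renderer whose framebuffer is a view, not an allocation.
const Renderer = struct {
    framebuffer: []u32,
    width: usize,

    fn attach(buf: SharedBuffer) Renderer {
        // Slicing a many-item pointer yields a slice over the same memory.
        return .{
            .framebuffer = buf.data[0 .. buf.width * buf.height],
            .width = buf.width,
        };
    }

    /// Fill one cell-sized rectangle. The writes land directly in the
    /// shared buffer, so there is nothing to copy afterwards.
    fn fillCell(self: *Renderer, x: usize, y: usize, w: usize, h: usize, color: u32) void {
        var row = y;
        while (row < y + h) : (row += 1) {
            const start = row * self.width + x;
            @memset(self.framebuffer[start .. start + w], color);
        }
    }
};
```

After a call such as fillCell, the only remaining work is to report the damaged rectangle to the compositor, which is what step 4 does.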
Smooth resize
During mouse-drag resize of tiled panes, teruwm uses GPU scaling instead of re-rendering:
During drag: wlr_scene_buffer_set_dest_size(new_w, new_h)
→ GPU scales existing pixels to new rect (instant)
On release: tp.resize(new_w, new_h)
→ actual grid reflow + PTY resize + full re-render
This makes the drag feel instant: the terminal is not re-rendered on the CPU while the mouse moves. The actual terminal resize (grid reflow, shell redraw) happens once, when you release the mouse.
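The drag/release split can be modeled as a small piece of state. The sketch below is a hypothetical model, not teruwm's implementation: PaneResize and its fields are invented for illustration, and the comments mark where the real calls named above would go.

```zig
const std = @import("std");

const Size = struct { w: u32, h: u32 };

/// Hypothetical per-pane resize state (illustrative names only).
const PaneResize = struct {
    grid_px: Size, // pixel size the grid was last rendered at
    dest: Size, // size the compositor currently scales the buffer to
    reflows: u32 = 0, // how many real (expensive) resizes have run

    /// Pointer motion during the drag: only the destination rect changes.
    /// In teruwm this is where wlr_scene_buffer_set_dest_size is called.
    fn onDragMotion(self: *PaneResize, new: Size) void {
        self.dest = new;
    }

    /// Button release: run the real resize once, and only if the size changed.
    /// In teruwm this is where tp.resize(new_w, new_h) is called.
    fn onDragEnd(self: *PaneResize) void {
        if (self.dest.w == self.grid_px.w and self.dest.h == self.grid_px.h) return;
        self.grid_px = self.dest;
        self.reflows += 1;
    }
};
```

However many motion events arrive during a drag, the expensive path runs at most once per drag in this model.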
Source structure
src/
├── core/ VtParser, Grid, Pane, Multiplexer, KeyHandler, ViMode
├── tiling/ LayoutEngine (8 layouts), Workspace, types
├── render/ SoftwareRenderer (SIMD), FontAtlas, BarWidget, BarRenderer
├── config/ Config parser, Keybinds (u32 keysym), ConfigWatcher
├── agent/ McpServer (19 tools), PaneBackend, HookHandler, OSC 9999
├── graph/ ProcessGraph (DAG of processes/agents)
├── persist/ Session serialization, Scrollback compression
├── server/ Daemon mode, IPC protocol
├── pty/ PTY abstraction (POSIX, ConPTY, RemotePty)
├── platform/ X11 (XCB), Wayland (xdg-shell), macOS (AppKit), Windows (Win32)
├── compositor/ teruwm: Server, Output, Input, XdgView, XwaylandView,
│ TerminalPane, Bar, Launcher, Notification, Node, WmConfig
├── main.zig Terminal entry point
└── lib.zig libteru public API (37 exports)
Key invariants
- Grid cursor always in bounds: 0 ≤ row < rows, 0 ≤ col < cols
- Zero allocations in the render hot path
- VtParser is pure computation (no I/O, no allocation)
- io: std.Io threaded through every I/O function
- All colors flow from ColorScheme (base16, configurable)
- CSI params capped at 16; overflow stops collecting, never crashes
- OSC strings bounded at 256 bytes; overflow truncates, never crashes
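As an illustration of the CSI invariant, here is a minimal sketch of a fixed-capacity parameter collector that needs no allocator and stops collecting on overflow. It is one possible shape, assuming u16 parameters and saturating arithmetic; CsiParams and feed are hypothetical names, not VtParser's actual internals.

```zig
const std = @import("std");

const max_params = 16;

/// Hypothetical fixed-capacity CSI parameter collector.
const CsiParams = struct {
    values: [max_params]u16 = [_]u16{0} ** max_params,
    len: usize = 0,
    overflowed: bool = false,

    /// Feed one byte of a parameter string such as "1;31;48".
    fn feed(self: *CsiParams, byte: u8) void {
        if (self.overflowed) return; // stopped collecting; ignore the rest
        switch (byte) {
            '0'...'9' => {
                if (self.len == 0) self.len = 1;
                const cur = &self.values[self.len - 1];
                // Saturating ops clamp oversized numbers instead of trapping.
                cur.* = (cur.* *| 10) +| @as(u16, byte - '0');
            },
            ';' => {
                if (self.len == 0) self.len = 1; // empty leading parameter
                if (self.len == max_params) {
                    self.overflowed = true; // keep the first 16, drop the rest
                } else {
                    self.len += 1;
                }
            },
            else => {}, // other bytes are outside this sketch's scope
        }
    }
};
```

The same pattern (fixed array, length counter, overflow flag) would apply to the 256-byte OSC bound, with truncation in place of the stop flag.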
Dependencies
teru (terminal)
| Library | Purpose |
|---|---|
| libxcb + libxcb-shm | X11 display (optional: -Dx11=false) |
| libxkbcommon | Keyboard translation |
| libwayland-client | Wayland display (optional: -Dwayland=false) |
| stb_truetype.h (vendored) | Font rasterization |
teruwm (compositor, adds)
| Library | Purpose |
|---|---|
| wlroots 0.18 | Compositor framework (DRM, libinput, Wayland server) |
| wayland-server | Server-side Wayland protocol |
| XWayland (runtime) | X11 application compatibility |
Testing
440+ inline tests covering VT parsing, grid operations, tiling layouts, scrollback compression, session serialization, agent protocol, keybind parsing, font rasterization, bar widget parsing, and SIMD renderer correctness.
zig build test # run all tests
zig build -Dcompositor=true # build teruwm