Pure Go Foreign Function Interface for calling C libraries without CGO. Designed for WebGPU and GPU computing — zero C dependencies, zero per-call allocations, 88–114 ns overhead.
Deep dive: How We Call C Libraries Without a C Compiler — architecture, assembly, callbacks, and ecosystem.
// Load library, prepare once, call many times — no CGO required
handle, _ := ffi.LoadLibrary("wgpu_native.dll")
sym, _ := ffi.GetSymbol(handle, "wgpuCreateInstance")
cif := &types.CallInterface{}
ffi.PrepareCallInterface(cif, types.DefaultCall, returnType, argTypes)
ffi.CallFunction(cif, sym, unsafe.Pointer(&result), args)

| Feature | Summary | Details |
|---|---|---|
| Zero CGO | Pure Go | No C compiler needed. go get and build. |
| Fast | 88–114 ns/op | Pre-computed CIF, zero per-call allocations |
| Cross-platform | 6 targets | Windows, Linux, macOS × AMD64 + ARM64 |
| Callbacks | C→Go safe | crosscall2 integration, works from any C thread |
| Type-safe | Runtime validation | 5 typed error types with errors.As() support |
| Struct passing | Full ABI | ≤8B (RAX), 9–16B (RAX+RDX), >16B (sret) |
| Context | Timeouts | CallFunctionContext(ctx, ...) cancellation |
| Tested | 89% coverage | CI on Linux, Windows, macOS |
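The struct-passing row above lists three size thresholds for returned structs. As a rough illustration only, the size-based decision might be sketched as below. `classifyReturn` and `ReturnClass` are hypothetical names, not goffi's API, and the sketch deliberately ignores float-field classification and per-platform differences that a real ABI classifier has to handle.

```go
package main

import "fmt"

// ReturnClass is a hypothetical label for how a returned struct travels.
type ReturnClass string

const (
	InRAX    ReturnClass = "RAX"
	InRAXRDX ReturnClass = "RAX+RDX"
	ViaSret  ReturnClass = "sret"
)

// classifyReturn applies only the size thresholds from the feature table:
// up to 8 bytes, 9 to 16 bytes, and anything larger.
func classifyReturn(size uintptr) ReturnClass {
	switch {
	case size <= 8:
		return InRAX
	case size <= 16:
		return InRAXRDX
	default:
		return ViaSret // caller allocates the buffer and passes a hidden pointer
	}
}

func main() {
	for _, size := range []uintptr{4, 8, 12, 16, 24} {
		fmt.Printf("%2d bytes -> %s\n", size, classifyReturn(size))
	}
}
```

The point of the sketch is that the decision can be made once, at preparation time, so nothing about the struct needs to be re-examined on each call.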
go get github.com/go-webgpu/goffi

goffi requires CGO_ENABLED=0. This is automatic when no C compiler is installed or when cross-compiling. If you have gcc/clang installed, set it explicitly:

CGO_ENABLED=0 go build ./...

Why? goffi uses Go's `cgo_import_dynamic` for dynamic library loading, which only activates when CGO is disabled.
package main
import (
"fmt"
"runtime"
"unsafe"
"github.com/go-webgpu/goffi/ffi"
"github.com/go-webgpu/goffi/types"
)
func main() {
// Load platform-specific C library
libName := "libc.so.6"
if runtime.GOOS == "windows" {
libName = "msvcrt.dll"
}
handle, err := ffi.LoadLibrary(libName)
if err != nil {
panic(err)
}
defer ffi.FreeLibrary(handle)
strlen, err := ffi.GetSymbol(handle, "strlen")
if err != nil {
panic(err)
}
// Prepare call interface once — reuse for all subsequent calls
cif := &types.CallInterface{}
err = ffi.PrepareCallInterface(
cif,
types.DefaultCall, // auto-detects platform ABI
types.UInt64TypeDescriptor, // return: size_t
[]*types.TypeDescriptor{types.PointerTypeDescriptor}, // arg: const char*
)
if err != nil {
panic(err)
}
// Call strlen — avalue elements are pointers TO argument values
testStr := "Hello, goffi!\x00"
strPtr := uintptr(unsafe.Pointer(unsafe.StringData(testStr)))
var length uint64
err = ffi.CallFunction(cif, strlen, unsafe.Pointer(&length), []unsafe.Pointer{unsafe.Pointer(&strPtr)})
if err != nil {
panic(err)
}
fmt.Printf("strlen(%q) = %d\n", testStr[:len(testStr)-1], length)
// Output: strlen("Hello, goffi!") = 13
}

FFI overhead: 88–114 ns/op (Windows AMD64, Intel i7-1255U)

| Benchmark | Time | Allocations |
|---|---|---|
| Empty function (getpid) | 88 ns | 2 allocs |
| Integer argument (abs) | 114 ns | 3 allocs |
| String processing (strlen) | 98 ns | 3 allocs |
At 60 FPS with ~50 FFI calls per frame, the total overhead is about 5 µs per frame, roughly 0.03% of the 16.7 ms frame budget and too small to show up in profiling.
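The frame-budget figure can be reproduced with a few lines of arithmetic. The sketch below assumes a round 100 ns per call (a value inside the measured 88–114 ns range, chosen here for illustration); `frameOverhead` is a hypothetical helper, not part of goffi.

```go
package main

import (
	"fmt"
	"time"
)

// frameOverhead returns the total FFI overhead per frame and the share of
// the frame budget (in percent) that it consumes.
func frameOverhead(callsPerFrame int, perCall time.Duration, fps int) (time.Duration, float64) {
	total := time.Duration(callsPerFrame) * perCall
	budget := time.Second / time.Duration(fps)
	return total, float64(total) / float64(budget) * 100
}

func main() {
	// Assumed inputs: 50 calls per frame, 100 ns per call, 60 FPS.
	total, pct := frameOverhead(50, 100*time.Nanosecond, 60)
	fmt.Printf("overhead per frame: %v (%.2f%% of the frame budget)\n", total, pct)
}
```

Plugging in your own call count and measured per-call cost gives a quick check on whether FFI overhead matters for a given workload.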
See docs/PERFORMANCE.md for detailed analysis, optimization strategies, and when NOT to use goffi.
goffi transitions from Go's managed runtime to C code through three layers:
Go Code
│ ffi.CallFunction()
▼
runtime.cgocall ← Go runtime: system stack switch, GC coordination
│
▼
Assembly Wrapper ← Hand-written: load GP/SSE registers per ABI
│ CALL target_function
▼
C Function ← External library
Three ABIs, hand-written assembly for each:
| ABI | GP Registers | FP Registers | Notes |
|---|---|---|---|
| System V AMD64 | RDI, RSI, RDX, RCX, R8, R9 | XMM0–XMM7 | Linux, macOS, FreeBSD |
| Win64 | RCX, RDX, R8, R9 | XMM0–XMM3 | 32-byte shadow space mandatory |
| AAPCS64 | X0–X7 | D0–D7 | HFA support for ARM64 |
See docs/ARCHITECTURE.md for the full technical deep dive.
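To make the ABI table more concrete, here is a simplified sketch of how arguments map to registers on the two AMD64 conventions. It is an illustration of the calling conventions themselves, not goffi's assembly: `ArgKind`, `assignSysV`, and `assignWin64` are hypothetical names, and structs, stack layout, and shadow space are left out. The key difference it shows is that System V counts integer and floating-point registers independently, while Win64 picks the register by argument position.

```go
package main

import "fmt"

// ArgKind is a hypothetical tag: integer/pointer or floating-point argument.
type ArgKind int

const (
	Int ArgKind = iota
	Float
)

var (
	sysvGP  = []string{"RDI", "RSI", "RDX", "RCX", "R8", "R9"}
	sysvFP  = []string{"XMM0", "XMM1", "XMM2", "XMM3", "XMM4", "XMM5", "XMM6", "XMM7"}
	win64GP = []string{"RCX", "RDX", "R8", "R9"}
	win64FP = []string{"XMM0", "XMM1", "XMM2", "XMM3"}
)

// assignSysV advances separate counters for GP and FP registers.
func assignSysV(args []ArgKind) []string {
	out := make([]string, len(args))
	gp, fp := 0, 0
	for i, k := range args {
		switch {
		case k == Int && gp < len(sysvGP):
			out[i] = sysvGP[gp]
			gp++
		case k == Float && fp < len(sysvFP):
			out[i] = sysvFP[fp]
			fp++
		default:
			out[i] = "stack"
		}
	}
	return out
}

// assignWin64 selects the register by argument position: four slots in total,
// shared between integer and floating-point arguments.
func assignWin64(args []ArgKind) []string {
	out := make([]string, len(args))
	for i, k := range args {
		switch {
		case i >= len(win64GP):
			out[i] = "stack"
		case k == Float:
			out[i] = win64FP[i]
		default:
			out[i] = win64GP[i]
		}
	}
	return out
}

func main() {
	args := []ArgKind{Int, Float, Int, Float, Int}
	fmt.Println("System V:", assignSysV(args))
	fmt.Println("Win64:   ", assignWin64(args))
}
```

For the same five arguments, the fifth lands in a register on System V but on the stack on Win64, which is why each ABI needs its own wrapper.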
WebGPU fires async callbacks from internal Metal/Vulkan threads. These threads have no goroutine — calling Go directly would crash.
goffi uses crosscall2 for safe C→Go transitions from any thread:
cb := ffi.NewCallback(func(status uint32, adapter uintptr, msg uintptr, ud uintptr) {
// Safe even when called from a C thread
result.handle = adapter
close(done)
})
ffi.CallFunction(cif, wgpuRequestAdapter, nil, args)
<-done // Wait for GPU driver callback

2000 pre-compiled trampoline entries per process. AMD64: 5 bytes/entry. ARM64: 8 bytes/entry.
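A fixed trampoline table implies a hard cap on the number of registered callbacks. The sketch below models that idea with a plain Go registry, assuming slots are handed out in order and never released; `callbackTable`, `register`, and `dispatch` are hypothetical names, and how goffi itself reports an exhausted table is not stated here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// maxCallbacks mirrors the 2000-entry figure quoted above.
const maxCallbacks = 2000

var errTableFull = errors.New("callback table full")

// callbackTable is a hypothetical fixed-capacity registry: each slot index
// would correspond to one pre-compiled trampoline entry.
type callbackTable struct {
	mu    sync.Mutex
	funcs []func(uintptr)
}

// register stores fn in the next free slot, or fails once the table is full.
func (t *callbackTable) register(fn func(uintptr)) (int, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if len(t.funcs) >= maxCallbacks {
		return -1, errTableFull
	}
	t.funcs = append(t.funcs, fn)
	return len(t.funcs) - 1, nil
}

// dispatch looks up the slot and invokes the Go function outside the lock.
// It reports false when the slot was never registered.
func (t *callbackTable) dispatch(slot int, arg uintptr) bool {
	t.mu.Lock()
	if slot < 0 || slot >= len(t.funcs) {
		t.mu.Unlock()
		return false
	}
	fn := t.funcs[slot]
	t.mu.Unlock()
	fn(arg)
	return true
}

func main() {
	var table callbackTable
	slot, err := table.register(func(arg uintptr) { fmt.Println("callback got", arg) })
	if err != nil {
		panic(err)
	}
	table.dispatch(slot, 42)
	fmt.Printf("table footprint: AMD64 %d bytes, ARM64 %d bytes\n", maxCallbacks*5, maxCallbacks*8)
}
```

In practice this means long-running programs should create callbacks once and reuse them rather than registering a new one per request.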
Five typed error types for precise diagnostics:
handle, err := ffi.LoadLibrary("nonexistent.dll")
if err != nil {
var libErr *ffi.LibraryError
if errors.As(err, &libErr) {
fmt.Printf("Failed to %s %q: %v\n", libErr.Operation, libErr.Name, libErr.Err)
}
}

| Error Type | When |
|---|---|
| `InvalidCallInterfaceError` | CIF preparation failures |
| `LibraryError` | Library loading / symbol lookup |
| `CallingConventionError` | Unsupported calling convention |
| `TypeValidationError` | Invalid type descriptor |
| `UnsupportedPlatformError` | Platform not supported |
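Typed errors are most useful when they survive wrapping by intermediate layers. The following self-contained sketch shows that pattern with a local stand-in type: this `LibraryError` only mirrors the three fields used in the snippet above (`Operation`, `Name`, `Err`), and its `Unwrap` method, `loadLibrary`, and `describe` are assumptions made for illustration rather than goffi's definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// LibraryError is a local stand-in, not goffi's type.
type LibraryError struct {
	Operation string
	Name      string
	Err       error
}

func (e *LibraryError) Error() string {
	return fmt.Sprintf("failed to %s %q: %v", e.Operation, e.Name, e.Err)
}

// Unwrap exposes the underlying cause to errors.Is and errors.As (assumed).
func (e *LibraryError) Unwrap() error { return e.Err }

var errNotFound = errors.New("file not found")

// loadLibrary is a hypothetical loader that always fails, wrapping the typed
// error the way an application-level init function might.
func loadLibrary(name string) error {
	return fmt.Errorf("init: %w", &LibraryError{Operation: "load", Name: name, Err: errNotFound})
}

// describe branches on the concrete error type, even through wrapping.
func describe(err error) string {
	var libErr *LibraryError
	if errors.As(err, &libErr) {
		return "library problem with " + libErr.Name
	}
	return "other error"
}

func main() {
	err := loadLibrary("nonexistent.dll")
	fmt.Println(describe(err))
	fmt.Println(err)
}
```

Because `errors.As` walks the wrap chain, callers can add context with `%w` without losing the ability to branch on the error type.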
| Feature | goffi | purego | CGO |
|---|---|---|---|
| C compiler required | No | No | Yes |
| API style | libffi-like (prepare once, call many) | reflect-based (RegisterFunc) | Native |
| Per-call allocations | Zero (CIF reusable) | reflect + sync.Pool per call | Zero |
| Struct pass/return | Full (RAX+RDX, sret) | Partial (no Windows structs) | Full |
| Callback float returns | XMM0 in asm | Not supported (panic) | Full |
| ARM64 HFA detection | Recursive (nested structs) | Partial (bug in nested path) | Full |
| Typed errors | 5 types + errors.As() | Generic | N/A |
| Context support | Timeouts/cancellation | No | No |
| C-thread callbacks | crosscall2 | crosscall2 | Full |
| String/bool/slice args | Raw pointers only | Auto-marshaling | Full |
| Platform breadth | 6 targets | 8 GOARCH / 20+ OS×ARCH | All |
| AMD64 overhead | 88–114 ns | Not published | ~140 ns (Go 1.26 claims ~30% reduction) |
Choose goffi for GPU/real-time workloads: struct passing, zero per-call overhead, callback float returns, typed errors.
Choose purego for general-purpose bindings: string auto-marshaling, broad architecture support, less boilerplate.
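Since goffi takes raw pointers only, string arguments have to be NUL-terminated by the caller. One way to do that, sketched here with a hypothetical `cString` helper (not part of goffi), is to copy the string into a byte slice with one extra zero byte:

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// cString copies s into a new byte slice with a trailing NUL byte.
// Interior NUL bytes are kept as-is, so C code will see the string end early.
func cString(s string) []byte {
	buf := make([]byte, len(s)+1) // make zero-fills, so the last byte is already 0
	copy(buf, s)
	return buf
}

func main() {
	buf := cString("Hello, goffi!")
	ptr := unsafe.Pointer(&buf[0]) // what would be handed to the C side as const char*

	fmt.Printf("%d bytes, terminator = %d, pointer set = %v\n",
		len(buf), buf[len(buf)-1], ptr != nil)

	// Keep buf reachable until the foreign call that uses ptr has returned.
	runtime.KeepAlive(buf)
}
```

The copy costs one allocation per string, so hot paths may prefer to build such buffers once and reuse them.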
See also: JupiterRider/ffi — pure Go binding for libffi via purego. Supports struct pass/return and variadic functions; requires libffi at runtime.
Windows: C++ exceptions may crash the program (#12516)
- Go runtime limitation, not goffi-specific. Go 1.22+ added partial SEH support (#58542), but edge cases remain.
- Workaround: build native libraries with `panic=abort`.
Windows: float return values not captured from XMM0
- `syscall.SyscallN` returns RAX only. Go `syscall` package limitation.
Variadic functions not supported (printf, sprintf)
- Use non-variadic wrappers. Planned for v0.5.0.
Struct packing follows System V ABI only
- Windows `#pragma pack` not honored. Manually specify `Size`/`Alignment` in `TypeDescriptor`.
No bitfields in struct types.
Unix: duplicate symbol conflict with purego (#22)
- When using goffi and purego in the same binary with `CGO_ENABLED=0`, the linker reports `duplicated definition of symbol _cgo_init`. Both libraries include `internal/fakecgo`, which defines identical runtime symbols.
- Workaround: build with `-tags nofakecgo` to disable goffi's fakecgo, relying on purego's copy: `CGO_ENABLED=0 go build -tags nofakecgo ./...`
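For the struct-packing limitation above, the size and alignment of a packed struct have to be worked out by hand. The sketch below shows one way to compute them, under the assumption that the C side uses the usual rules (each field aligned to the smaller of its natural alignment and the pack value, total size rounded up to the struct alignment). `field`, `alignUp`, and `layout` are hypothetical helpers, not goffi API.

```go
package main

import "fmt"

// field describes one struct member by its size and natural alignment.
type field struct {
	size, align uintptr
}

// alignUp rounds n up to the next multiple of a, where a is a power of two.
func alignUp(n, a uintptr) uintptr {
	return (n + a - 1) &^ (a - 1)
}

// layout returns each field's offset plus the struct's total size and
// alignment. pack == 0 means natural alignment; pack > 0 caps every field's
// alignment in the way #pragma pack(pack) would.
func layout(fields []field, pack uintptr) (offsets []uintptr, size, align uintptr) {
	align = 1
	for _, f := range fields {
		a := f.align
		if pack > 0 && a > pack {
			a = pack
		}
		size = alignUp(size, a)
		offsets = append(offsets, size)
		size += f.size
		if a > align {
			align = a
		}
	}
	return offsets, alignUp(size, align), align
}

func main() {
	// struct { uint8_t a; uint32_t b; uint64_t c; }
	fields := []field{{1, 1}, {4, 4}, {8, 8}}

	offsets, size, align := layout(fields, 0)
	fmt.Println("natural:", offsets, "size", size, "align", align)

	offsets, size, align = layout(fields, 1)
	fmt.Println("pack(1):", offsets, "size", size, "align", align)
}
```

The resulting size and alignment are the values one would then enter manually for the struct's type descriptor.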
| Platform | Arch | ABI | Since | CI |
|---|---|---|---|---|
| Windows | amd64 | Win64 | v0.1.0 | Tested |
| Linux | amd64 | System V | v0.1.0 | Tested |
| macOS | amd64 | System V | v0.1.1 | Tested |
| FreeBSD | amd64 | System V | v0.1.0 | Untested |
| Linux | arm64 | AAPCS64 | v0.3.0 | Cross-compile verified |
| macOS | arm64 | AAPCS64 | v0.3.7 | Tested (M3 Pro) |
| Version | Status | Highlights |
|---|---|---|
| v0.2.0 | Released | Callback API, 2000-entry trampoline table |
| v0.3.x | Released | ARM64 (AAPCS64), HFA, Apple Silicon |
| v0.4.0 | Released | crosscall2 for C-thread callbacks |
| v0.4.1 | Released | ABI compliance audit — 10/11 gaps fixed |
| v0.5.0 | Next | Variadic functions, builder API, Windows struct packing |
| v1.0.0 | Planned | API stability (SemVer 2.0), security audit |
See CHANGELOG.md for version history and ROADMAP.md for the full plan.
go test ./... # all tests
go test -cover ./... # with coverage (89%)
go test -bench=. -benchmem ./ffi # benchmarks
go test -v ./ffi # verbose, auto-detects platform

| Document | Description |
|---|---|
| docs/ARCHITECTURE.md | Technical architecture: assembly, ABIs, callbacks |
| docs/PERFORMANCE.md | Benchmarks, optimization strategies, Go 1.26 |
| CHANGELOG.md | Version history, migration guides |
| ROADMAP.md | Development roadmap to v1.0 |
| CONTRIBUTING.md | Contribution guidelines |
| SECURITY.md | Security policy |
| examples/ | Working code examples |
See CONTRIBUTING.md for guidelines.
- Fork → feature branch → tests (80%+ coverage) → lint → PR
- Conventional commits: `feat:`, `fix:`, `docs:`, `test:`
- purego: proved that pure Go FFI is possible. The `crosscall2` callback mechanism, `fakecgo` approach, and assembly trampoline patterns were pioneered by purego. goffi exists because purego cleared the path.
- libffi: reference for FFI architecture patterns and CIF design.
- Go runtime: `runtime.cgocall` for GC-safe stack switching, `crosscall2` for C→Go transitions.
goffi powers an ecosystem of pure Go GPU libraries:
| Project | Description |
|---|---|
| go-webgpu/webgpu | Zero-CGO WebGPU bindings (wgpu-native) |
| born-ml/born | ML framework for Go, GPU-accelerated |
| gogpu | GPU computing platform — dual Rust + Pure Go backends |
| wgpu-native | Native WebGPU implementation (upstream) |
MIT — see LICENSE.
goffi v0.4.1 | GitHub | pkg.go.dev | Dev.to