Skip to content

go-webgpu/goffi

goffi — Zero-CGO FFI for Go

CI codecov Go Report Card GitHub release Go version License Go Reference Dev.to

Pure Go Foreign Function Interface for calling C libraries without CGO. Designed for WebGPU and GPU computing — zero C dependencies, zero per-call allocations, 88–114 ns overhead.

Deep dive: How We Call C Libraries Without a C Compiler — architecture, assembly, callbacks, and ecosystem.

// Load library, prepare once, call many times — no CGO required
handle, _ := ffi.LoadLibrary("wgpu_native.dll")
sym, _ := ffi.GetSymbol(handle, "wgpuCreateInstance")

cif := &types.CallInterface{}
ffi.PrepareCallInterface(cif, types.DefaultCall, returnType, argTypes)
ffi.CallFunction(cif, sym, unsafe.Pointer(&result), args)

Features

Feature Details
Zero CGO Pure Go No C compiler needed. go get and build.
Fast 88–114 ns/op Pre-computed CIF, zero per-call allocations
Cross-platform 6 targets Windows, Linux, macOS × AMD64 + ARM64
Callbacks C→Go safe crosscall2 integration, works from any C thread
Type-safe Runtime validation 5 typed error types with errors.As() support
Struct passing Full ABI ≤8B (RAX), 9–16B (RAX+RDX), >16B (sret)
Context Timeouts CallFunctionContext(ctx, ...) cancellation
Tested 89% coverage CI on Linux, Windows, macOS

Quick Start

Installation

go get github.com/go-webgpu/goffi

Requirements

goffi requires CGO_ENABLED=0. This is automatic when no C compiler is installed or when cross-compiling. If you have gcc/clang:

CGO_ENABLED=0 go build ./...

Why? goffi uses Go's cgo_import_dynamic for dynamic library loading, which only activates when CGO is disabled.

Example: Calling strlen

package main

import (
	"fmt"
	"runtime"
	"unsafe"

	"github.com/go-webgpu/goffi/ffi"
	"github.com/go-webgpu/goffi/types"
)

func main() {
	// Load platform-specific C library
	libName := "libc.so.6"
	if runtime.GOOS == "windows" {
		libName = "msvcrt.dll"
	}

	handle, err := ffi.LoadLibrary(libName)
	if err != nil {
		panic(err)
	}
	defer ffi.FreeLibrary(handle)

	strlen, err := ffi.GetSymbol(handle, "strlen")
	if err != nil {
		panic(err)
	}

	// Prepare call interface once — reuse for all subsequent calls
	cif := &types.CallInterface{}
	err = ffi.PrepareCallInterface(
		cif,
		types.DefaultCall,                                     // auto-detects platform ABI
		types.UInt64TypeDescriptor,                            // return: size_t
		[]*types.TypeDescriptor{types.PointerTypeDescriptor},  // arg: const char*
	)
	if err != nil {
		panic(err)
	}

	// Call strlen — avalue elements are pointers TO argument values
	testStr := "Hello, goffi!\x00"
	strPtr := uintptr(unsafe.Pointer(unsafe.StringData(testStr)))
	var length uint64

	err = ffi.CallFunction(cif, strlen, unsafe.Pointer(&length), []unsafe.Pointer{unsafe.Pointer(&strPtr)})
	if err != nil {
		panic(err)
	}

	fmt.Printf("strlen(%q) = %d\n", testStr[:len(testStr)-1], length)
	// Output: strlen("Hello, goffi!") = 13
}

Performance

FFI overhead: 88–114 ns/op (Windows AMD64, Intel i7-1255U)

Benchmark Time Allocations
Empty function (getpid) 88 ns 2 allocs
Integer argument (abs) 114 ns 3 allocs
String processing (strlen) 98 ns 3 allocs

At 60 FPS with ~50 FFI calls per frame, overhead is 5 µs per frame — 0.03% of the 16.6 ms budget. Unmeasurable in profiling.

See docs/PERFORMANCE.md for detailed analysis, optimization strategies, and when NOT to use goffi.


Architecture

goffi transitions from Go's managed runtime to C code through three layers:

Go Code
  │  ffi.CallFunction()
  ▼
runtime.cgocall               ← Go runtime: system stack switch, GC coordination
  │
  ▼
Assembly Wrapper              ← Hand-written: load GP/SSE registers per ABI
  │  CALL target_function
  ▼
C Function                    ← External library

Three ABIs, hand-written assembly for each:

ABI GP Registers FP Registers Notes
System V AMD64 RDI, RSI, RDX, RCX, R8, R9 XMM0–XMM7 Linux, macOS, FreeBSD
Win64 RCX, RDX, R8, R9 XMM0–XMM3 32-byte shadow space mandatory
AAPCS64 X0–X7 D0–D7 HFA support for ARM64

See docs/ARCHITECTURE.md for the full technical deep dive.


Callbacks (C → Go)

WebGPU fires async callbacks from internal Metal/Vulkan threads. These threads have no goroutine — calling Go directly would crash.

goffi uses crosscall2 for safe C→Go transitions from any thread:

cb := ffi.NewCallback(func(status uint32, adapter uintptr, msg uintptr, ud uintptr) {
    // Safe even when called from a C thread
    result.handle = adapter
    close(done)
})

ffi.CallFunction(cif, wgpuRequestAdapter, nil, args)
<-done // Wait for GPU driver callback

2000 pre-compiled trampoline entries per process. AMD64: 5 bytes/entry. ARM64: 8 bytes/entry.


Error Handling

Five typed error types for precise diagnostics:

handle, err := ffi.LoadLibrary("nonexistent.dll")
if err != nil {
	var libErr *ffi.LibraryError
	if errors.As(err, &libErr) {
		fmt.Printf("Failed to %s %q: %v\n", libErr.Operation, libErr.Name, libErr.Err)
	}
}
Error Type When
InvalidCallInterfaceError CIF preparation failures
LibraryError Library loading / symbol lookup
CallingConventionError Unsupported calling convention
TypeValidationError Invalid type descriptor
UnsupportedPlatformError Platform not supported

Comparison: goffi vs purego vs CGO

Feature goffi purego CGO
C compiler required No No Yes
API style libffi-like (prepare once, call many) reflect-based (RegisterFunc) Native
Per-call allocations Zero (CIF reusable) reflect + sync.Pool per call Zero
Struct pass/return Full (RAX+RDX, sret) Partial (no Windows structs) Full
Callback float returns XMM0 in asm Not supported (panic) Full
ARM64 HFA detection Recursive (nested structs) Partial (bug in nested path) Full
Typed errors 5 types + errors.As() Generic N/A
Context support Timeouts/cancellation No No
C-thread callbacks crosscall2 crosscall2 Full
String/bool/slice args Raw pointers only Auto-marshaling Full
Platform breadth 6 targets 8 GOARCH / 20+ OS×ARCH All
AMD64 overhead 88–114 ns Not published ~140 ns (Go 1.26 claims ~30% reduction)

Choose goffi for GPU/real-time workloads: struct passing, zero per-call overhead, callback float returns, typed errors.

Choose purego for general-purpose bindings: string auto-marshaling, broad architecture support, less boilerplate.

See also: JupiterRider/ffi — pure Go binding for libffi via purego. Supports struct pass/return and variadic functions; requires libffi at runtime.


Known Limitations

Windows: C++ exceptions may crash the program (#12516)

  • Go runtime limitation, not goffi-specific. Go 1.22+ added partial SEH support (#58542), but edge cases remain.
  • Workaround: build native libraries with panic=abort.

Windows: float return values not captured from XMM0

  • syscall.SyscallN returns RAX only. Go syscall package limitation.

Variadic functions not supported (printf, sprintf)

  • Use non-variadic wrappers. Planned for v0.5.0.

Struct packing follows System V ABI only

  • Windows #pragma pack not honored. Manually specify Size/Alignment in TypeDescriptor.

No bitfields in struct types.

Unix: duplicate symbol conflict with purego (#22)

  • When using goffi and purego in the same binary with CGO_ENABLED=0, the linker reports duplicated definition of symbol _cgo_init. Both libraries include internal/fakecgo which defines identical runtime symbols.
  • Workaround: build with -tags nofakecgo to disable goffi's fakecgo, relying on purego's copy:
    CGO_ENABLED=0 go build -tags nofakecgo ./...

Platform Support

Platform Arch ABI Since CI
Windows amd64 Win64 v0.1.0 Tested
Linux amd64 System V v0.1.0 Tested
macOS amd64 System V v0.1.1 Tested
FreeBSD amd64 System V v0.1.0 Untested
Linux arm64 AAPCS64 v0.3.0 Cross-compile verified
macOS arm64 AAPCS64 v0.3.7 Tested (M3 Pro)

Roadmap

Version Status Highlights
v0.2.0 Released Callback API, 2000-entry trampoline table
v0.3.x Released ARM64 (AAPCS64), HFA, Apple Silicon
v0.4.0 Released crosscall2 for C-thread callbacks
v0.4.1 Released ABI compliance audit — 10/11 gaps fixed
v0.5.0 Next Variadic functions, builder API, Windows struct packing
v1.0.0 Planned API stability (SemVer 2.0), security audit

See CHANGELOG.md for version history and ROADMAP.md for the full plan.


Testing

go test ./...                          # all tests
go test -cover ./...                   # with coverage (89%)
go test -bench=. -benchmem ./ffi       # benchmarks
go test -v ./ffi                       # verbose, auto-detects platform

Documentation

Document Description
docs/ARCHITECTURE.md Technical architecture: assembly, ABIs, callbacks
docs/PERFORMANCE.md Benchmarks, optimization strategies, Go 1.26
CHANGELOG.md Version history, migration guides
ROADMAP.md Development roadmap to v1.0
CONTRIBUTING.md Contribution guidelines
SECURITY.md Security policy
examples/ Working code examples

Contributing

See CONTRIBUTING.md for guidelines.

  1. Fork → feature branch → tests (80%+ coverage) → lint → PR
  2. Conventional commits: feat:, fix:, docs:, test:

Acknowledgments

  • purego — proved that pure Go FFI is possible. The crosscall2 callback mechanism, fakecgo approach, and assembly trampoline patterns were pioneered by purego. goffi exists because purego cleared the path.
  • libffi — reference for FFI architecture patterns and CIF design.
  • Go runtimeruntime.cgocall for GC-safe stack switching, crosscall2 for C→Go transitions.

Ecosystem

goffi powers an ecosystem of pure Go GPU libraries:

Project Description
go-webgpu/webgpu Zero-CGO WebGPU bindings (wgpu-native)
born-ml/born ML framework for Go, GPU-accelerated
gogpu GPU computing platform — dual Rust + Pure Go backends
wgpu-native Native WebGPU implementation (upstream)

License

MIT — see LICENSE.


goffi v0.4.1 | GitHub | pkg.go.dev | Dev.to

About

Pure Go FFI for WebGPU - CGO-free GPU access via wgpu-native bindings

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors