Parser architecture rework#4

Draft
xnacly wants to merge 15 commits into master from parser-architecture-rework

Conversation

@xnacly (Owner) commented Feb 18, 2026

This PR reworks the parser to remove intermediate values and avoidable allocations, and to make it faster overall through less recursion and less GC pressure.

Goals:

  • replace recursion in the parser with an explicit stack for intermediate containers, jumping around inside Parser.parse based on token type (this should also take care of feat: recursive json object test #2, since the stack overflow on deep recursion is replaced by an OOM 😹)
  • replace inline allocations with more preallocation to avoid slice and map growth
  • replace io.ReadAll with a syscall.Mmap in a new libjson.FromFile function
  • deal with escapes in strings, which is somehow still missing
  • add t_string_escapes so that only strings containing escapes pay the cost of calling unescapeInPlace (escape handling slowed things down due to multiple extra branches)
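The explicit-stack idea can be sketched as follows. This is a minimal illustration, not the PR's actual Parser.parse: the tok type and the array-only container handling are made up for the example, and the real parser also tracks objects.

```go
package main

import "fmt"

// tok is a hypothetical token: '[' and ']' open/close an array,
// 'v' carries a scalar value.
type tok struct {
	kind byte
	val  any
}

// parse walks the token stream with an explicit stack of open
// containers instead of recursing, so nesting depth is limited by
// available heap memory rather than by the call stack.
func parse(toks []tok) any {
	var stack [][]any
	var root any
	emit := func(v any) {
		if len(stack) == 0 {
			root = v
		} else {
			stack[len(stack)-1] = append(stack[len(stack)-1], v)
		}
	}
	for _, t := range toks {
		switch t.kind {
		case '[':
			stack = append(stack, []any{})
		case ']':
			top := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			emit(top)
		case 'v':
			emit(t.val)
		}
	}
	return root
}

func main() {
	// deep nesting is handled without growing the call stack at all
	depth := 100000
	toks := make([]tok, 0, 2*depth+1)
	for i := 0; i < depth; i++ {
		toks = append(toks, tok{kind: '['})
	}
	toks = append(toks, tok{kind: 'v', val: 1})
	for i := 0; i < depth; i++ {
		toks = append(toks, tok{kind: ']'})
	}
	fmt.Println(parse(toks) != nil)
}
```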

Pre-PR benchmarks:

Test input is generated with test/gen.py

| Input size | library       | time    | faster |
| ---------- | ------------- | ------- | ------ |
| 1MB        | libjson       | 9.6ms   | 1.57x  |
|            | encoding/json | 15.0ms  |        |
| 2MB        | libjson       | 36.2ms  | 1.82x  |
|            | encoding/json | 65.9ms  |        |
| 5MB        | libjson       | 74.1ms  | 1.71x  |
|            | encoding/json | 126.6ms |        |

encoding/json

nogc
$ go run cmd/lj.go -libjson=false -s -nogc -pprof test/10MB.json
$ go tool pprof 10MB.json.pprof
(pprof) top
Showing nodes accounting for 120ms, 100% of 120ms total
Showing top 10 nodes out of 36
      flat  flat%   sum%        cum   cum%
      30ms 25.00% 25.00%       40ms 33.33%  encoding/json.(*Decoder).readValue
      10ms  8.33% 33.33%       10ms  8.33%  encoding/json.(*decodeState).convertNumber
      10ms  8.33% 41.67%       20ms 16.67%  encoding/json.(*decodeState).literalInterface
      10ms  8.33% 50.00%       10ms  8.33%  encoding/json.stateEndValue
      10ms  8.33% 58.33%       10ms  8.33%  internal/chacha8rand.(*State).Next
      10ms  8.33% 66.67%       10ms  8.33%  internal/runtime/maps.(*ctrlGroup).setEmpty
      10ms  8.33% 75.00%       10ms  8.33%  runtime.convTslice
      10ms  8.33% 83.33%       10ms  8.33%  runtime.getMCache
      10ms  8.33% 91.67%       10ms  8.33%  runtime.memclrNoHeapPointers
      10ms  8.33%   100%       10ms  8.33%  runtime.memmove
$ hyperfine "go run cmd/lj.go -libjson=false -s -nogc -pprof test/10MB.json"
Benchmark 1: go run cmd/lj.go -libjson=false -s -nogc -pprof test/10MB.json
  Time (mean ± σ):     163.9 ms ±   3.4 ms    [User: 130.0 ms, System: 88.7 ms]
  Range (min … max):   159.0 ms … 171.9 ms    17 runs
gc
$  go run cmd/lj.go -libjson=false -s -pprof test/10MB.json
$ go tool pprof 10MB.json.pprof
(pprof) top
Showing nodes accounting for 150ms, 83.33% of 180ms total
Showing top 10 nodes out of 55
      flat  flat%   sum%        cum   cum%
      20ms 11.11% 11.11%       40ms 22.22%  encoding/json.(*Decoder).readValue
      20ms 11.11% 22.22%       20ms 11.11%  internal/runtime/gc/scan.scanSpanPackedAVX512
      20ms 11.11% 33.33%       20ms 11.11%  runtime.memclrNoHeapPointers
      20ms 11.11% 44.44%       20ms 11.11%  runtime.memmove
      20ms 11.11% 55.56%       20ms 11.11%  runtime.suspendG
      10ms  5.56% 61.11%       30ms 16.67%  encoding/json.(*decodeState).scanWhile
      10ms  5.56% 66.67%       10ms  5.56%  encoding/json.(*scanner).pushParseState
      10ms  5.56% 72.22%       20ms 11.11%  encoding/json.stateBeginValue
      10ms  5.56% 77.78%       10ms  5.56%  encoding/json.stateEndValue
      10ms  5.56% 83.33%       10ms  5.56%  internal/runtime/maps.(*ctrlGroup).setEmpty
$ hyperfine "go run cmd/lj.go -libjson=false -s -pprof test/10MB.json"
Benchmark 1: go run cmd/lj.go -libjson=false -s -pprof test/10MB.json
  Time (mean ± σ):     160.5 ms ±   3.0 ms    [User: 186.3 ms, System: 79.4 ms]
  Range (min … max):   156.3 ms … 167.9 ms    18 runs

libjson

nogc
$ go run cmd/lj.go -nogc -s -pprof test/10MB.json
$ go tool pprof 10MB.json.pprof
(pprof) top
Showing nodes accounting for 60ms, 100% of 60ms total
Showing top 10 nodes out of 31
      flat  flat%   sum%        cum   cum%
      20ms 33.33% 33.33%       20ms 33.33%  runtime.memclrNoHeapPointers
      10ms 16.67% 50.00%       10ms 16.67%  github.com/xnacly/libjson.(*lexer).next
      10ms 16.67% 66.67%       10ms 16.67%  github.com/xnacly/libjson.pow10 (inline)
      10ms 16.67% 83.33%       10ms 16.67%  internal/runtime/maps.(*ctrlGroup).setEmpty (inline)
      10ms 16.67%   100%       10ms 16.67%  runtime.rand
         0     0%   100%       10ms 16.67%  github.com/xnacly/libjson.(*parser).advance
         0     0%   100%       50ms 83.33%  github.com/xnacly/libjson.(*parser).array
         0     0%   100%       20ms 33.33%  github.com/xnacly/libjson.(*parser).atom
         0     0%   100%       50ms 83.33%  github.com/xnacly/libjson.(*parser).expression
         0     0%   100%       50ms 83.33%  github.com/xnacly/libjson.(*parser).object
$ hyperfine "go run cmd/lj.go -nogc -s -pprof test/10MB.json"
Benchmark 1: go run cmd/lj.go -nogc -s -pprof test/10MB.json
  Time (mean ± σ):     106.6 ms ±   2.1 ms    [User: 84.9 ms, System: 76.6 ms]
  Range (min … max):   103.8 ms … 110.7 ms    28 runs
gc
$ go run cmd/lj.go -s -pprof test/10MB.json
$ go tool pprof 10MB.json.pprof
(pprof) top
Showing nodes accounting for 100ms, 100% of 100ms total
Showing top 10 nodes out of 39
      flat  flat%   sum%        cum   cum%
      10ms 10.00% 10.00%       10ms 10.00%  github.com/xnacly/libjson.pow10
      10ms 10.00% 20.00%       10ms 10.00%  internal/runtime/gc/scan.scanSpanPackedAVX512
      10ms 10.00% 30.00%       10ms 10.00%  internal/runtime/maps.(*ctrlGroup).setEmpty
      10ms 10.00% 40.00%       10ms 10.00%  runtime.acquirem (inline)
      10ms 10.00% 50.00%       10ms 10.00%  runtime.findObject
      10ms 10.00% 60.00%       10ms 10.00%  runtime.heapArenaOf
      10ms 10.00% 70.00%       30ms 30.00%  runtime.mallocgcSmallScanNoHeader
      10ms 10.00% 80.00%       10ms 10.00%  runtime.memclrNoHeapPointers
      10ms 10.00% 90.00%       10ms 10.00%  runtime.nextFreeFast (inline)
      10ms 10.00%   100%       10ms 10.00%  runtime.typePointers.next
$ hyperfine "go run cmd/lj.go -s -pprof test/10MB.json"
Benchmark 1: go run cmd/lj.go -s -pprof test/10MB.json
  Time (mean ± σ):     105.5 ms ±   2.5 ms    [User: 116.6 ms, System: 77.9 ms]
  Range (min … max):   101.5 ms … 110.0 ms    27 runs

Before this change, "\uD834\uDD1E" would result in "�DD1E", but should have resulted in "��", since both surrogates were left unmerged.
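For reference, fully correct handling merges a high/low surrogate pair into a single rune and maps anything unpaired to U+FFFD. A sketch using the standard library follows; the decodeU name and simplified hex handling are made up for illustration, not taken from the PR:

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf16"
)

// decodeU decodes one \uXXXX escape (hex1), optionally followed by a
// second one (hex2), merging UTF-16 surrogate pairs into one rune.
// Unpaired or wrongly paired surrogates become U+FFFD.
func decodeU(hex1, hex2 string) rune {
	r1v, _ := strconv.ParseUint(hex1, 16, 32)
	r1 := rune(r1v)
	if !utf16.IsSurrogate(r1) {
		return r1
	}
	if hex2 != "" {
		r2v, _ := strconv.ParseUint(hex2, 16, 32)
		// DecodeRune returns U+FFFD itself when the pair is invalid
		if r := utf16.DecodeRune(r1, rune(r2v)); r != 0xFFFD {
			return r
		}
	}
	return 0xFFFD // replacement character for a lone surrogate
}

func main() {
	fmt.Printf("%c\n", decodeU("D834", "DD1E")) // the pair merges into U+1D11E
}
```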
Previously, parsing 100MB of JSON input (600ms total) spent 60ms in unnecessary bounds checks (CALL runtime.panicBounds(SB)); this is now reduced to 20ms by moving explicit bounds checks before indexing, reusing indexed slots, and merging manual out-of-loop increments.
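The hoisting referred to here is the usual Go bounds-check-elimination pattern: one explicit check that dominates the later accesses lets the compiler drop the implicit per-index checks. A toy illustration, not the PR's code:

```go
package main

import "fmt"

// readPair reads two bytes starting at i. The single explicit range
// check up front proves b[i] and b[i+1] are in bounds, so the
// compiler can elide the implicit checks (and their
// CALL runtime.panicBounds) on the two indexing expressions below.
func readPair(b []byte, i int) (byte, byte, bool) {
	if i < 0 || i+1 >= len(b) {
		return 0, 0, false
	}
	return b[i], b[i+1], true
}

func main() {
	hi, lo, ok := readPair([]byte{0xDE, 0xAD, 0xBE, 0xEF}, 2)
	fmt.Println(hi, lo, ok)
}
```

Whether a given check is actually elided can be verified with `go build -gcflags=-d=ssa/check_bce` on the real code.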
Reduced the time spent in unescapeInPlace by 30ms (from 5.75% to 3.41%).
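unescapeInPlace presumably follows the classic two-cursor scheme, where a write cursor trails the read cursor over the same backing array so no extra buffer is allocated. A reduced sketch covering only a few escapes (the real function has to handle the full JSON escape set, including \uXXXX):

```go
package main

import "fmt"

// unescapeInPlace rewrites simple escapes (\n, \t, \", \\) into the
// slice's own backing array and returns the shortened slice.
func unescapeInPlace(b []byte) []byte {
	w := 0 // write cursor, always <= read cursor r
	for r := 0; r < len(b); r++ {
		c := b[r]
		if c == '\\' && r+1 < len(b) {
			r++
			switch b[r] {
			case 'n':
				c = '\n'
			case 't':
				c = '\t'
			case '"':
				c = '"'
			case '\\':
				c = '\\'
			default:
				c = b[r] // simplified: pass unknown escapes through
			}
		}
		b[w] = c
		w++
	}
	return b[:w]
}

func main() {
	fmt.Printf("%q\n", unescapeInPlace([]byte(`a\nb\"c`)))
}
```

This also motivates the t_string_escapes token flag mentioned in the goals: strings without a backslash can skip this pass entirely.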
This commit removes the need to hash JSON object keys at parse time by replacing the previously used map[string]any with the new obj struct:

    | Benchmark | LibJson B/op | EncodingJson B/op | LibJson x Less Memory | LibJson Allocs | EncodingJson Allocs | LibJson x Fewer Allocs |
    | --------- | ------------ | ----------------- | --------------------- | -------------- | ------------------- | ---------------------- |
    | Naive     | 29,632,671   | 42,744,497        | 1.44x                 | 450,023        | 1,050,031           | 2.33x                  |
    | Escaped   | 22,471,438   | 37,544,412        | 1.67x                 | 350,023        | 1,100,030           | 3.14x                  |
    | Hard      | 121,444,318  | 173,944,500       | 1.43x                 | 1,400,023      | 3,000,032           | 2.14x                  |
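The commit message doesn't show the struct itself, but the idea can be sketched as parse-ordered parallel slices with no hashing on insert. The field names and the linear-scan Get below are guesses for illustration; the real obj may well build a map lazily on first lookup:

```go
package main

import "fmt"

// obj stores keys and values in parse order; appending never hashes.
type obj struct {
	keys   []string
	values []any
}

// Get looks a key up with a linear scan, paying the comparison cost
// only when (and if) the object is actually queried.
func (o *obj) Get(key string) (any, bool) {
	for i, k := range o.keys {
		if k == key {
			return o.values[i], true
		}
	}
	return nil, false
}

func main() {
	o := &obj{}
	o.keys = append(o.keys, "id")
	o.values = append(o.values, 12345)
	v, ok := o.Get("id")
	fmt.Println(v, ok)
}
```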

These changes result in a ~10-15% speedup and allow libjson to hit the ~2x-faster-than-encoding/json milestone, for instance with 1MB, 5MB, 10MB and 100MB files filled with:

    {
        "id": 12345,
        "name": "very_long_string_with_escapes_and_unicode_abcdefghijklmnopqrstuvwxyz_0123456789",
        "description": "This string contains\nmultiple\nlines\nand \"quotes\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"",
        "nested": {
            "level1": {
                "level2": {
                    "level3": {
                        "level4": {
                            "array": [
                                "short",
                                "string_with_escape\\n",
                                "another\\tvalue",
                                "unicode\u2603",
                                "escaped_quote_\"_and_backslash_\\",
                                11234567890,1234567890,1234567890,1234567890,1234567890,1234567890,1234567890,1234567890,1234567890,1234567890,1234567890,1234567890,234567890,
                                -1.2345e67,
                                3.1415926535897932384626433832795028841971,
                                true,
                                false,
                                null,
                                "\u0041\u0042\u0043\u00A9\u20AC\u0041\u0042\u0043\u00A9\u20AC\u0041\u0042\u0043\u00A9\u20AC\u0041\u0042\u0043\u00A9\u20AC\u0041\u0042\u0043\u00A9\u20AC\u0041\u0042\u0043\u00A9\u20AC\u0041\u0042\u0043\u00A9\u20AC\u0041\u0042\u0043\u00A9\u20AC\u0041\u0042\u0043\u00A9\u20AC\u0041\u0042\u0043\u00A9\u20AC\u0041\u0042\u0043\u00A9\u20AC",
                                "mix\\n\\t\\r\\\\\\\"end"
                            ]
                        }
                    }
                }
            }
        }
    }

libjson now outperforms encoding/json:

    $ cd ./benchmarks
    $ ./bench.sh | rg "faster"
    1.72 ± 0.15 times faster than ./test -s -libjson=false ./1MB.json
    1.89 ± 0.11 times faster than ./test -s -libjson=false ./5MB.json
    1.90 ± 0.06 times faster than ./test -s -libjson=false ./10MB.json
    1.95 ± 0.05 times faster than ./test -s -libjson=false ./100MB.json