Example: Parsing JSON

Here we build a parser for a very small subset of JSON.

The main features of the parser include:

  • handling sequences with separators
  • handling string escapes
  • building native Julia data objects using a dictionary of handlers

To keep the example short, we make the following simplifications:

  • whitespace between tokens is not supported (you can pipe your JSON through jq -c . to remove unnecessary whitespace)
  • support for numbers is very ad-hoc, Float64-only
  • the escape sequences allowed in strings are rather incomplete
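
If jq is not at hand, the whitespace can also be removed in Julia before parsing. The following is only a minimal sketch: strip_json_whitespace is a hypothetical helper (not part of PikaParser), and it assumes the input is otherwise well-formed JSON. It drops the four JSON whitespace characters whenever they occur outside a string literal, and it tracks backslashes so that an escaped quote does not end the string early.

```julia
# Hypothetical helper, not part of PikaParser: remove JSON whitespace
# (space, tab, newline, carriage return) that occurs outside string literals.
function strip_json_whitespace(text::AbstractString)
    out = IOBuffer()
    in_string = false   # currently inside a "..." literal?
    escaped = false     # was the previous character an unescaped backslash?
    for c in text
        if in_string
            print(out, c)           # keep everything inside strings verbatim
            if escaped
                escaped = false     # this character was escaped, ignore its meaning
            elseif c == '\\'
                escaped = true
            elseif c == '"'
                in_string = false   # closing quote
            end
        elseif c == '"'
            in_string = true        # opening quote
            print(out, c)
        elseif !(c in (' ', '\t', '\n', '\r'))
            print(out, c)           # structural character or part of a literal
        end
    end
    return String(take!(out))
end

compact = strip_json_whitespace("{ \"a b\" : [1, 2] }")
```

Whitespace inside the key "a b" is preserved, while the spaces around the braces, colon, and comma are dropped.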

Preparing the grammar

import PikaParser as P

rules = Dict(
    :t => P.tokens("true"),
    :f => P.tokens("false"),
    :null => P.tokens("null"),
    :digit => P.satisfy(isdigit),
    :number => P.seq(
        P.first(P.token('-'), P.epsilon),
        P.some(:digit),
        P.first(P.seq(P.token('.'), P.many(:digit)), P.epsilon),
    ),
    :quote => P.token('"'),
    :esc => P.token('\\'),
    :string => P.seq(:quote, :instrings => P.many(:instring), :quote),
    :instring => P.first(
        :escaped => P.seq(:esc, P.first(:esc, :quote)),
        :notescaped => P.satisfy(x -> x != '"' && x != '\\'),
    ),
    :array => P.seq(P.token('['), P.first(:inarray, P.epsilon), P.token(']')),
    :sep => P.token(','),
    :inarray => P.tie(P.seq(P.seq(:json), P.many(:separray => P.seq(:sep, :json)))),
    :obj => P.seq(P.token('{'), P.first(:inobj, P.epsilon), P.token('}')),
    :pair => P.seq(:string, P.token(':'), :json),
    :inobj => P.tie(P.seq(P.seq(:pair), P.many(:sepobj => P.seq(:sep, :pair)))),
    :json => P.first(:obj, :array, :string, :number, :t, :f, :null),
);
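
The :inarray and :inobj rules use the usual "item, then any number of (separator, item)" pattern for separated sequences. Assuming P.tie behaves as its documentation describes (it merges the submatches of its child's submatches into one list), the effect on the folded values can be pictured with plain vectors. The names head and tail below are illustrative stand-ins, not PikaParser objects.

```julia
# Stand-ins for what the two children of the inner seq fold to:
head = Any[1.0]                 # from P.seq(:json): exactly one item
tail = Any[2.0, 3.0, "haha"]    # from P.many(:separray => ...): zero or more items

# Tying squashes the two levels into a single flat list of items.
tied = vcat(head, tail)

# A one-element sequence has an empty tail and still yields a flat list.
single = vcat(Any[1.0], Any[])
```

This is why the :inarray and :inobj folds can simply return s: after tying, the subvalues already form the flat list of elements.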

Making the "fold" function

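To manage the folding easily, we keep the fold functions in a dictionary keyed by the same rule names as rules. Each function receives the matched input view v and the already-folded values s of the submatches; rules without an entry fall back to default_fold: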

folds = Dict(
    :t => (v, s) -> true,
    :f => (v, s) -> false,
    :null => (v, s) -> nothing,
    :number => (v, s) -> parse(Float64, v),
    :quote => (v, s) -> v[1],
    :esc => (v, s) -> v[1],
    :escaped => (v, s) -> s[2],
    :notescaped => (v, s) -> v[1],
    :string => (v, s) -> String(Char.(s[2])),
    :instrings => (v, s) -> s,
    :array => (v, s) -> s[2],
    :inarray => (v, s) -> s,
    :separray => (v, s) -> s[2],
    :obj => (v, s) -> Dict{String,Any}(isnothing(s[2]) ? [] : s[2]),
    :pair => (v, s) -> (s[1] => s[3]),
    :sepobj => (v, s) -> s[2],
    :inobj => (v, s) -> s,
);

default_fold(v, subvals) = isempty(subvals) ? nothing : subvals[1]

g = P.make_grammar([:json], P.flatten(rules, Char));
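
The lookup-with-fallback pattern can be tried in isolation, without a parser. The sketch below uses hypothetical names (demo_folds, demo_default, apply_fold) and hand-written stand-ins for the matched view and the subvalues; it only illustrates how get picks either a rule-specific handler or the default.

```julia
# Illustrative stand-ins only; these names are not part of PikaParser.
demo_folds = Dict(
    :number => (v, s) -> parse(Float64, v),
    :pair => (v, s) -> (s[1] => s[3]),
)

# Default: pass the first subvalue through, or return nothing for leaf matches.
demo_default(v, subvals) = isempty(subvals) ? nothing : subvals[1]

# Pick the handler registered for the rule, or fall back to the default.
apply_fold(rule, view, subvals) = get(demo_folds, rule, demo_default)(view, subvals)

num = apply_fold(:number, "-2.345", [])                  # rule-specific handler
passed = apply_fold(:json, "12", [12.0])                 # default: first subvalue
leaf = apply_fold(:sep, ",", [])                         # default: nothing
pair = apply_fold(:pair, "\"a\":1", ["a", nothing, 1.0]) # builds a Pair
```

Keeping the handlers in a dictionary means that adding behavior for a rule is a one-line change, and rules that only wrap another rule need no handler at all.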

Parsing JSON

Let's parse a simple JSONish string that demonstrates most of the rules:

input = """{"something":123,"other":false,"refs":[1,-2.345,[],{},true,false,null,[1,2,3,"haha"],{"is\\"Finished\\"":true}]}""";

p = P.parse(g, input);

From the result we can build a Julia JSON-like structure:

result = P.traverse_match(
    p,
    P.find_match_at!(p, :json, 1),
    fold = (m, p, s) -> get(folds, m.rule, default_fold)(m.view, s),
)
Dict{String, Any} with 3 entries:
  "other"     => false
  "something" => 123.0
  "refs"      => Any[1.0, -2.345, nothing, Dict{String, Any}(), true, false, no…

Detail:

result["refs"]
9-element Vector{Any}:
     1.0
    -2.345
      nothing
      Dict{String, Any}()
  true
 false
      nothing
      Any[1.0, 2.0, 3.0, "haha"]
      Dict{String, Any}("is\"Finished\"" => true)
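
Note that the empty array [] came out as nothing, while the empty object {} became an empty Dict. The difference is that the :obj fold checks isnothing(s[2]) and the :array fold does not. If an empty vector is preferred, the same check could be applied to arrays. The following is a hypothetical variant, shown here on hand-written subvalues rather than on a real parse:

```julia
# Hypothetical replacement for the :array fold: map the "nothing" produced by
# the epsilon branch to an empty vector, mirroring what the :obj fold does.
array_fold = (v, s) -> isnothing(s[2]) ? Any[] : s[2]

# Stand-ins for the folded subvalues of '[' , contents , ']'.
empty_case = array_fold("[]", Any[nothing, nothing, nothing])
filled_case = array_fold("[1]", Any[nothing, Any[1.0], nothing])
```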

This page was generated using Literate.jl.