Reference
Data types
PikaParser.Clause — Type
abstract type Clause{G, T}
Abstract type for all clauses that make up a grammar with rule labels of type G, matching sequences of tokens of type T.
Currently implemented clauses: EndOfInput, Epsilon, Fail, First, FollowedBy, Many, NotFollowedBy, Satisfy, Scan, Seq, Some, Tie, Token, and Tokens (each documented below).
Often it is better to use convenience functions for rule construction, such as seq or token; see flatten for details.
PikaParser.EndOfInput — Type
struct EndOfInput{G, T} <: PikaParser.Clause{G, T}
Matches at the end of the input.
PikaParser.Epsilon — Type
struct Epsilon{G, T} <: PikaParser.Clause{G, T}
An always-succeeding epsilon match.
PikaParser.Fail — Type
struct Fail{G, T} <: PikaParser.Clause{G, T}
An always-failing match.
PikaParser.First — Type
struct First{G, T} <: PikaParser.Clause{G, T}
Match the first possibility of several matches. An empty First is equivalent to unconditional failure.
Fields
children::Vector
PikaParser.FollowedBy — Type
struct FollowedBy{G, T} <: PikaParser.Clause{G, T}
Zero-length match that succeeds if follow does match at the same position.
Fields
follow::Any
PikaParser.Grammar — Type
struct Grammar{G, T}
A representation of the grammar prepared for parsing.
Fields
names::Vector: Topologically sorted list of rule labels (non-terminals).
idx::Dict{G, Int64} where G: Mapping of rule labels to their indexes in names.
clauses::Array{PikaParser.Clause{Int64, T}, 1} where T: Clauses of the grammar converted to integer labels (again sorted topologically).
can_match_epsilon::Vector{Bool}: Flags for rules that can unconditionally match the empty string.
seed_clauses::Vector{Vector{Int64}}: Which clauses get seeded upon matching of a clause.
terminals::Vector{Int64}: Sorted indexes of terminal clauses that are checked against each input item.
PikaParser.Many — Type
struct Many{G, T} <: PikaParser.Clause{G, T}
Greedily matches a sequence of matches that can be empty.
Fields
item::Any
PikaParser.Match — Type
struct Match
Internal match representation.
Fields
clause::Int64: Which clause has matched here?
first::Int64: Where the match started?
last::Int64: What is the last item of the match? (In case of an empty match, this MUST be the previous index before first, i.e., prevind(input, first).)
option_idx::Int64: Which possibility (given by the clause) did we match?
submatches::Int64: Index of the first submatch; the range of submatches spans all the way to the first submatch of the next Match.
left::Int64: Left child in the memo tree.
right::Int64: Right child in the memo tree.
parent::Int64: Parent in the memo tree.
PikaParser.MatchResult — Type
primitive type Int64 <: Signed 64
A match index in the ParserState field matches, or nothing if the match failed.
PikaParser.NotFollowedBy — Type
struct NotFollowedBy{G, T} <: PikaParser.Clause{G, T}
Zero-length match that succeeds if reserved does not match at the same position.
Fields
reserved::Any
PikaParser.ParserState — Type
mutable struct ParserState{G, T, I}
Intermediate parsing state. The match tree is built in a vector of matches that grows during the matching; all match indexes point into this vector.
This structure is also a "result" of the parsing, used to reconstruct the match tree.
Fields
grammar::PikaParser.Grammar: Copy of the grammar used to parse the input.
q::PikaParser.PikaQueue: Queue for rules that should match, used only internally.
matches::Vector{PikaParser.Match}: Matches, connected by indexes to form a memo-table search tree.
memo_root::Int64: Root of the memo-table search tree (stored in the matches).
submatches::Vector{Int64}: Children pointers of the matches that form the match tree.
input::Any: Parser input, used to reconstruct match data.
PikaParser.Satisfy — Type
struct Satisfy{G, T} <: PikaParser.Terminal{G, T}
A single terminal. Matches a token from the input stream if the match function returns true on it; fails otherwise.
Fields
match::Function
PikaParser.Scan — Type
struct Scan{G, T} <: PikaParser.Terminal{G, T}
A single terminal, possibly made out of multiple input tokens.
Given the input stream view, the match function scans the input forward and returns the position of the last item of the matched terminal (which is assumed to start at the beginning of the stream view). In case there's no match, it returns zero.
Fields
match::Function
PikaParser.Seq — Type
struct Seq{G, T} <: PikaParser.Clause{G, T}
Sequence of matches. An empty Seq is equivalent to an always-succeeding empty match, as in Epsilon.
Fields
children::Vector
PikaParser.Some — Type
struct Some{G, T} <: PikaParser.Clause{G, T}
Greedily matches a sequence of matches, with at least one match.
Fields
item::Any
PikaParser.Tie — Type
struct Tie{G, T} <: PikaParser.Clause{G, T}
Produces the very same match as the item, but concatenates the user views of the resulting submatches into one big vector (i.e., basically squashing the two levels of child matches into a single one). Useful e.g. for lists with different initial or final elements.
As a result, the item and its immediate children are not going to be present in the parse tree.
Fields
tuple::Any
PikaParser.Token — Type
struct Token{G, T} <: PikaParser.Terminal{G, T}
A single token equal to match.
Fields
token::Any
PikaParser.Tokens — Type
struct Tokens{G, T, I} <: PikaParser.Terminal{G, T}
A series of tokens equal to match.
Fields
tokens::Any
PikaParser.TraverseNode — Type
mutable struct TraverseNode{G, S}
Part of intermediate tree traversing state.
Fields
parent_idx::Int64
parent_sub_idx::Int64
match::PikaParser.UserMatch
open::Bool
subvals::Vector
PikaParser.UserMatch — Type
struct UserMatch{G, S}
User-facing representation of a Match.
Fields
rule::Any: Which rule ID has matched here?
first::Int64: Where the match started?
last::Int64: What is the last item of the match? (In case of an empty match, this is the previous index before first.)
view::Any: View of the matched part of the input, usually a SubArray or SubString.
submatches::Vector{Int64}: Indexes and rule labels of the matched submatches. This forms the edges in the match tree.
Preparing the grammar
Specifying rules
PikaParser.end_of_input — Constant
end_of_input :: Clause
An EndOfInput clause. Translate to strongly typed grammar with flatten.
Example
whole_file = seq(:file_contents, end_of_input)
PikaParser.epsilon — Constant
epsilon :: Clause
An Epsilon clause. Translate to strongly typed grammar with flatten.
Example
maybe_letter_a = first(token('a'), epsilon)
PikaParser.fail — Constant
fail :: Clause
A Fail clause. Translate to strongly typed grammar with flatten.
Useful for avoiding rule specification when matching terminals using the fast_match parameter of parse.
Example
seq(:this, :that, fail) # this rule is effectively disabled
PikaParser.first — Method
first(args...) -> PikaParser.First{Any, Any}
Build a First clause. Translate to strongly typed grammar with flatten.
Example
first(:something, :fallback, :fallback2)
PikaParser.flatten — Method
flatten(rules::Dict{G}, tokentype::DataType) -> Dict
flatten(
rules::Dict{G},
tokentype::DataType,
childlabel::Function
) -> Dict
Convert possibly nested and weakly typed rules into a correctly typed and unnested ruleset, usable in make_grammar. This allows the use of convenience rule-building functions:
satisfy
scan
token
tokens
epsilon (not a function!)
fail (not a function!)
seq
first
not_followed_by
followed_by
some
many
tie
precedence_cascade (not backed by an actual Clause!)
Anonymous nested rules are assigned names constructed by the childlabel function (which receives the original rule label of type G and an integer position). By default, childlabel concatenates the parent rule name, a hyphen, and the position number to form a Symbol; i.e., the default works only when the rules are labeled by Symbols, and you need to provide your own implementation for grammars labeled otherwise, e.g. by integers or strings.
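For illustration, a minimal sketch of flattening a small nested ruleset (the rule names :digit and :list are made up for this example):
import PikaParser as P
rules = Dict(
    :digit => P.satisfy(isdigit),
    # the anonymous token(...) and some(...) sub-rules receive generated names,
    # with the default childlabel e.g. Symbol("list-2") for the second child
    :list => P.seq(P.token('['), P.some(:digit), P.token(']')),
)
flat = P.flatten(rules, Char)
# a custom childlabel may be supplied as the third argument, e.g.:
# flat = P.flatten(rules, Char, (label, i) -> Symbol(label, :_, i))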
PikaParser.followed_by — Method
followed_by(x) -> PikaParser.FollowedBy{_A, Any} where _A
Build a FollowedBy clause. Translate to strongly typed grammar with flatten.
Example
seq(:digits, followed_by(:whitespace))
PikaParser.many — Method
many(x) -> PikaParser.Many{_A, Any} where _A
Build a Many clause. Translate to strongly typed grammar with flatten.
Example
seq(:quote, many(:quote_contents), :quote)
PikaParser.not_followed_by — Method
not_followed_by(
x
) -> PikaParser.NotFollowedBy{_A, Any} where _A
Build a NotFollowedBy clause. Translate to strongly typed grammar with flatten.
Example
seq(not_followed_by(tokens(collect("reservedWord"))), :identifier)
PikaParser.precedence_cascade — Method
precedence_cascade(label::Function, levels...) -> Vector
Convert a list of rules of increasing precedence into a typical precedence-handling "fallthrough" construction. The result must be post-processed by flatten.
Each of the rules is abstracted over a "same-precedence" and a "higher-precedence" rule label (i.e., it is given as a binary function of these two labels), which is used to correctly link the rules within the precedence group. The first rule is of the lowest precedence. All rules except the last automatically fall back to the next rule. The higher-precedence parameter of the last rule is the label of the first rule.
label is a function that generates the label for the given n-th level of the grammar.
Use @precedences for a less verbose construction.
Returns a vector of labeled rules that must usually be interpolated into the ruleset (note the splatting ... in the example below).
Example
Dict(
precedence_cascade(
n -> Symbol(:exprlevel, n),
(same, next) -> :expr => first(
:plus => seq(same, token('+'), next),
:minus => seq(same, token('-'), next),
),
(same, next) -> :times => seq(same, token('*'), next), # left associative
(same, next) -> :power => seq(next, token('^'), same), # right associative
(_, restart) -> first(
:parens => seq(token('('), restart, token(')')),
:digits => some(satisfy(isdigit)),
),
)...,
)
PikaParser.satisfy — Method
satisfy(f::Function) -> PikaParser.Satisfy{Any, Any}
Build a Satisfy clause. Translate to strongly typed grammar with flatten.
Example
satisfy(isdigit)
PikaParser.scan — Method
scan(f::Function) -> PikaParser.Scan{Any, Any}
Build a Scan clause. Translate to strongly typed grammar with flatten.
Example
# a rule to match any pair of equal tokens
scan(m -> (length(m) >= 2 && m[1] == m[2]) ? 2 : 0)
PikaParser.seq — Method
seq(args...) -> PikaParser.Seq{Any, Any}
Build a Seq clause. Translate to strongly typed grammar with flatten.
Example
digit_in_parents = seq(token('('), :digit, token(')'))
PikaParser.some — Method
some(x) -> PikaParser.Some{_A, Any} where _A
Build a Some clause. Translate to strongly typed grammar with flatten.
Example
some(satisfy(isspace))
PikaParser.tie — Method
tie(x) -> PikaParser.Tie{_A, Any} where _A
Build a Tie clause. Translate to strongly typed grammar with flatten.
Example
:alternating_A_and_B => tie(many(seq(:A, :B)))
PikaParser.token — Method
token(x) -> PikaParser.Token{Any}
Build a Token clause. Translate to strongly typed grammar with flatten.
Example
token('a')
PikaParser.tokens — Method
tokens(xs) -> PikaParser.Tokens{Any}
Build a Tokens clause. Translate to strongly typed grammar with flatten.
Example
tokens("keyword")PikaParser.@precedences — Macro@precedences labeller same::Symbol next::Symbol rulesA shortcut macro for precedence_cascade. Automatically adds lambda heads with fixed argument names, and splats itself with ... into the surrounding environment.
Example
Dict(
@precedences (n->Symbol(:exprlevel, n)) same next begin
:expr => seq(same, token('+'), next)
seq(same, token('*'), next)
first(
token('x'),
seq(token('('), next, token(')'))
)
end
)
Converting to a Grammar
PikaParser.make_grammar — Method
make_grammar(
starts::AbstractArray{G, 1},
rules_dict::Dict{G, PikaParser.Clause{G, T}}
) -> PikaParser.Grammar
Produce a Grammar with rules of type G that can be used to parse inputs.
starts should collect top-level rules (these will be put at the top of the topological order of the parsing).
rules_dict is a dictionary of grammar Clauses.
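A minimal sketch of building a grammar (the rule names :digits and :pair are hypothetical, chosen just for this example):
import PikaParser as P
rules = Dict(
    :digits => P.some(P.satisfy(isdigit)),
    :pair => P.seq(:digits, P.token(','), :digits),
)
g = P.make_grammar(
    [:pair],                   # the top-level (start) rules
    P.flatten(rules, Char),    # rules_dict with correctly typed clauses
)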
Parsing
PikaParser.lex — Method
lex(g::PikaParser.Grammar{G, T}, input) -> Vector
Greedily find terminals in the input sequence. For performance and uniqueness purposes, terminals are only looked for at stream indexes that follow the final indexes of terminals found previously. That allows the lexing process to skip many redundant matches that could not ever be found by the grammar.
As a main outcome, this prevents the typical pika-parser behavior when matching sequences using many, where e.g. an identifier like abcd also produces redundant (and often invalid) matches for bcd, cd and d. Collaterally, greedy lexing also creates fewer tokens in the match table, which results in faster parsing.
To produce good terminal matches quickly, use scan.
In a typical use, this function is best called indirectly via parse_lex.
PikaParser.parse — Method
parse(
grammar::PikaParser.Grammar{G, T},
input
) -> PikaParser.ParserState
parse(
grammar::PikaParser.Grammar{G, T},
input,
fast_match
) -> PikaParser.ParserState
Take a Grammar and an indexable input sequence (typically Vector or String), and return a final ParserState that contains all matched grammar productions.
The input must be random-indexable (because PikaParsers require a lot of random indexing) using Int indexes, must support firstindex, lastindex, prevind, nextind, and index arithmetic with + and -, and indexes in views must be the same as in the original container except for a constant offset.
Lexing and fast terminal matching
If fast_match is specified, the function does not match terminals using the associated grammar rules, but instead uses the fast_match function, which reports the matched terminals via a callback. The function is called exactly once for each position in input in reverse order (i.e., the indexes start at lastindex(input) and continue using prevind all the way to firstindex(input)), which can be utilized by the application for optimization. The call parameters consist of the input vector, the position in the input vector, and a "report" function used to send back a clause ID (of the same type as G in typeof(grammar)) and the last item of the terminal match that starts at that position. Calls to the reporting function can be repeated if more terminal types match. Terminals not reported by the calls to fast_match will not be matched.
For complicated grammars, this may be much faster than having the parser try to match all terminal types at each position.
If your grammar does not contain dangerous or highly surprising kinds of terminals (in particular, it can be scanned greedily left-to-right), you may use parse_lex to run a reasonable terminal-only lexing step, which is then automatically used as a basis for fast matching.
Caveats
Take care when indexing Strings. With UTF-8, not every code point necessarily occupies a single String index.
If unsure, you may always collect the strings to vectors of Chars (basically converting to UTF-32), where each character occupies precisely one index.
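For instance, assuming a grammar g over Char tokens (such as the hypothetical one sketched under make_grammar above), a collected string can be parsed as follows:
input = "čísla: 12,34"   # a String; the non-ASCII characters here occupy two byte indexes each
chars = collect(input)   # Vector{Char}: each character occupies exactly one index
st = P.parse(g, chars)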
Results
Use find_match_at! to extract matches from ParserState.
Pika parsing never really fails. Instead, when a grammar rule is not matched in the input, the expected rule match is either not going to be found at the starting position with find_match_at!, or it will not span the whole input.
Example
parse(
g,
"abcde123",
(input, i, match) -> isdigit(input[i]) ? match(:digit, i) : match(:letter, i),
)
PikaParser.parse_lex — Method
parse_lex(
g::PikaParser.Grammar{G, T},
input
) -> PikaParser.ParserState
Use lex to greedily produce lexemes for a given grammar, and run the parsing atop the result.
While this will produce a different (much more sparse) parsing table and the resulting parse tree may be different from the "full" parse, having the lower levels of the parse tree efficiently pre-chewed vastly simplifies the overall parsing, thus saving a lot of time.
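A minimal usage sketch (again assuming the hypothetical grammar g from the make_grammar example above):
st = P.parse_lex(g, "12,34")   # lex the terminals greedily, then parse atop the lexemes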
Traversing and folding the parse tree
PikaParser.default_fold — Method
default_fold(m, p, subvals) -> Expr
The default function used as fold argument in traverse_match.
PikaParser.default_open — Method
default_open(
m,
p
) -> Base.Generator{_A, PikaParser.var"#35#36"} where _A
The default function used as open argument in traverse_match.
PikaParser.find_match_at! — Method
find_match_at!(
st::PikaParser.ParserState{G},
rule,
pos::Int64
) -> Int64
Find the Match index in ParserState that matched rule at position pos, or nothing if there is no such match.
Zero-length matches may not be matched at all positions by default; this function creates the necessary matches in the tables in st in case they are missing. (That is the reason for the ! label.)
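A small sketch of extracting results, assuming the hypothetical grammar g with top-level rule :pair from the make_grammar example above:
st = P.parse(g, "12,34")
mid = P.find_match_at!(st, :pair, 1)   # match of :pair that starts at position 1
P.view_match(st, mid)                  # view of the matched input, here "12,34"
P.traverse_match(st, mid)              # Expr built via default_open and default_fold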
PikaParser.traverse_match — Method
traverse_match(
st::PikaParser.ParserState{G, I},
mid::Int64;
open,
fold
) -> Expr
Given a Match index in ParserState st, recursively depth-first traverse the match tree using functions open (called upon entering a submatch) and fold (called upon leaving the submatch).
open is given a UserMatch structure and a reference to the parser state. It should return a vector of boolean values that tell the traversal which submatches from the UserMatch should be opened. That can be used to skip parsing of large uninteresting parts of the match tree, such as whitespace or comments. By default, it opens all submatches, thus the whole subtree is traversed.
fold is given the UserMatch structure and a reference to the parser state, and additionally a vector of folded values from the submatches. The values returned by fold invocations are collected and transferred to higher-level invocations of fold. In case open disabled the evaluation of a given submatch, nothing is used as the folded value for the submatch. The default open and fold (default_open, default_fold) just collect all submatch values and produce a Julia Expr AST structure where rule expansions are represented as function calls.
For producing the UserMatch structures correctly, traverse_match additionally requires that the input vector (stored here in ParserState) has a compatible overload of the standard Base.view.
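As an illustrative sketch (one possible approach, not the only one), a custom fold over the hypothetical :pair/:digits grammar above could sum the two numbers directly, reusing st and mid from the previous sketch:
total = P.traverse_match(st, mid,
    fold = (m, p, subvals) ->
        m.rule == :digits ? parse(Int, m.view) :
        m.rule == :pair   ? subvals[1] + subvals[3] :
        isempty(subvals)  ? nothing : subvals[1]
)   # total == 46 for the input "12,34"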
PikaParser.view_match — Method
view_match(st::PikaParser.ParserState, mid::Int64) -> Any
Get a view of input that corresponds to the match identified by given match ID.
PikaParser.view_match — Method
view_match(
st::PikaParser.ParserState,
match::Union{PikaParser.Match, PikaParser.UserMatch}
) -> Any
Get a view of input that corresponds to the given match.