Reference
Data types
PikaParser.Clause
— Type
abstract type Clause{G, T}
Abstract type for all clauses of a grammar with rule labels of type G that match sequences of tokens of type T.
The currently implemented clauses are documented below.
Often it is better to use the convenience functions for rule construction, such as seq or token; see flatten for details.
PikaParser.EndOfInput
— Type
struct EndOfInput{G, T} <: PikaParser.Clause{G, T}
Matches at the end of the input.
PikaParser.Epsilon
— Type
struct Epsilon{G, T} <: PikaParser.Clause{G, T}
An always-succeeding epsilon match.
PikaParser.Fail
— Type
struct Fail{G, T} <: PikaParser.Clause{G, T}
An always-failing match.
PikaParser.First
— Type
struct First{G, T} <: PikaParser.Clause{G, T}
Match the first of several possible matches. An empty First is equivalent to an unconditional failure.
Fields
children::Vector
PikaParser.FollowedBy
— Type
struct FollowedBy{G, T} <: PikaParser.Clause{G, T}
Zero-length match that succeeds if follow
does match at the same position.
Fields
follow::Any
PikaParser.Grammar
— Type
struct Grammar{G, T}
A representation of the grammar prepared for parsing.
Fields
names::Vector
: Topologically sorted list of rule labels (non-terminals).
idx::Dict{G, Int64} where G
: Mapping of rule labels to their indexes in names.
clauses::Array{PikaParser.Clause{Int64, T}, 1} where T
: Clauses of the grammar converted to integer labels (and again sorted topologically).
can_match_epsilon::Vector{Bool}
: Flags for the rules being able to match the empty string unconditionally.
seed_clauses::Vector{Vector{Int64}}
: Which clauses get seeded upon matching of a clause.
terminals::Vector{Int64}
: Sorted indexes of terminal clauses that are checked against each input item.
PikaParser.Many
— Type
struct Many{G, T} <: PikaParser.Clause{G, T}
Greedily matches a sequence of matches that can be empty.
Fields
item::Any
PikaParser.Match
— Type
struct Match
Internal match representation.
Fields
clause::Int64
: Which clause has matched here?
first::Int64
: Where the match started?
last::Int64
: What is the last item of the match? (In case of an empty match, this MUST be the previous index before first, i.e., prevind(input, first).)
option_idx::Int64
: Which possibility (given by the clause) did we match?
submatches::Int64
: Index to the first submatch; the range of submatches spans all the way to the first submatch of the next Match.
left::Int64
: Left child in the memo tree.
right::Int64
: Right child in the memo tree.
parent::Int64
: Parent in the memo tree.
PikaParser.MatchResult
— Type
primitive type Int64 <: Signed 64
A match index in the ParserState field matches, or nothing if the match failed.
PikaParser.NotFollowedBy
— Type
struct NotFollowedBy{G, T} <: PikaParser.Clause{G, T}
Zero-length match that succeeds if reserved
does not match at the same position.
Fields
reserved::Any
PikaParser.ParserState
— Type
mutable struct ParserState{G, T, I}
Intermediate parsing state. The match tree is built in a vector of matches that grows during the matching; all match indexes point into this vector.
This structure is also a "result" of the parsing, used to reconstruct the match tree.
Fields
grammar::PikaParser.Grammar
: Copy of the grammar used to parse the input.
q::PikaParser.PikaQueue
: Queue for rules that should match, used only internally.
matches::Vector{PikaParser.Match}
: Matches, connected by indexes to form a memo table search tree.
memo_root::Int64
: Root of the memo table search tree (stored in matches).
submatches::Vector{Int64}
: Children pointers of the matches that form the match tree.
input::Any
: Parser input, used to reconstruct match data.
PikaParser.Satisfy
— Type
struct Satisfy{G, T} <: PikaParser.Terminal{G, T}
A single terminal. Matches a token from the input stream for which the match function returns true.
Fields
match::Function
PikaParser.Scan
— Type
struct Scan{G, T} <: PikaParser.Terminal{G, T}
A single terminal, possibly made out of multiple input tokens.
Given the input stream view, the match
function scans the input forward and returns the position of the last item of the matched terminal (which is assumed to start at the beginning of the stream view). In case there's no match, it returns zero.
Fields
match::Function
PikaParser.Seq
— Type
struct Seq{G, T} <: PikaParser.Clause{G, T}
Sequence of matches. Empty Seq
is equivalent to an always-succeeding empty match, as in Epsilon
.
Fields
children::Vector
PikaParser.Some
— Type
struct Some{G, T} <: PikaParser.Clause{G, T}
Greedily matches a sequence of matches, with at least 1 match.
Fields
item::Any
PikaParser.Tie
— Type
struct Tie{G, T} <: PikaParser.Clause{G, T}
Produces the very same match as the item, but concatenates the user views of the resulting submatches into one big vector (i.e., basically squashing the two levels of child matches into a single one). Useful e.g. for lists with different initial or final elements.
As a result, the item and its immediate children are not going to be present in the parse tree.
Fields
tuple::Any
PikaParser.Token
— Type
struct Token{G, T} <: PikaParser.Terminal{G, T}
A single token equal to token.
Fields
token::Any
PikaParser.Tokens
— Type
struct Tokens{G, T, I} <: PikaParser.Terminal{G, T}
A series of tokens equal to tokens.
Fields
tokens::Any
PikaParser.TraverseNode
— Type
mutable struct TraverseNode{G, S}
Part of intermediate tree traversing state.
Fields
parent_idx::Int64
parent_sub_idx::Int64
match::PikaParser.UserMatch
open::Bool
subvals::Vector
PikaParser.UserMatch
— Type
struct UserMatch{G, S}
User-facing representation of a Match
.
Fields
rule::Any
: Which rule ID has matched here?
first::Int64
: Where the match started?
last::Int64
: What is the last item of the match? (In case of an empty match, this is the previous index before first.)
view::Any
: View of the matched part of the input, usually a SubArray or SubString.
submatches::Vector{Int64}
: Indexes and rule labels of the matched submatches. This forms the edges in the match tree.
Preparing the grammar
Specifying rules
PikaParser.end_of_input
— Constant
end_of_input :: Clause
An EndOfInput
clause. Translate to strongly typed grammar with flatten
.
Example
whole_file = seq(:file_contents, end_of_input)
PikaParser.epsilon
— Constant
epsilon :: Clause
An Epsilon
clause. Translate to strongly typed grammar with flatten
.
Example
maybe_letter_a = first(token('a'), epsilon)
PikaParser.fail
— Constant
fail :: Clause
A Fail
clause. Translate to strongly typed grammar with flatten
.
Useful for avoiding rule specification when matching terminals using the fast_match
parameter of parse
.
Example
seq(:this, :that, fail) # this rule is effectively disabled
PikaParser.first
— Method
first(args...) -> PikaParser.First{Any, Any}
Build a First
clause. Translate to strongly typed grammar with flatten
.
Example
first(:something, :fallback, :fallback2)
PikaParser.flatten
— Method
flatten(rules::Dict{G}, tokentype::DataType) -> Dict
flatten(rules::Dict{G}, tokentype::DataType, childlabel::Function) -> Dict
Convert possibly nested and weakly typed rules into a correctly typed and unnested ruleset, usable in make_grammar. This allows use of the convenience rule building functions:
satisfy
scan
token
tokens
epsilon (not a function!)
fail (not a function!)
seq
first
not_followed_by
followed_by
some
many
tie
precedence_cascade (not backed by an actual Clause!)
Anonymous nested rules are assigned names constructed by the childlabel function (it receives the original rule label of type G and an integer with the position of the nested rule). By default, childlabel concatenates the parent rule name, a hyphen, and the position number to form a Symbol (i.e., the default works only when the rules are labeled by Symbols; for grammars labeled e.g. by integers or strings you need to provide your own implementation).
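For illustration, a minimal sketch (the rule names are ours, and the PikaParser functions are assumed to be in scope as in the other examples); the anonymous nested clause is pulled out into its own automatically named rule:
rules = Dict(
    :digits => some(satisfy(isdigit)),               # contains an anonymous nested clause
    :parens => seq(token('('), :digits, token(')')),
)
flat = flatten(rules, Char)  # the nested satisfy gets a generated name such as Symbol("digits-1")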
PikaParser.followed_by
— Method
followed_by(x) -> PikaParser.FollowedBy{_A, Any} where _A
Build a FollowedBy
clause. Translate to strongly typed grammar with flatten
.
Example
seq(:digits, followed_by(:whitespace))
PikaParser.many
— Method
many(x) -> PikaParser.Many{_A, Any} where _A
Build a Many
clause. Translate to strongly typed grammar with flatten
.
Example
seq(:quote, many(:quote_contents), :quote)
PikaParser.not_followed_by
— Method
not_followed_by(x) -> PikaParser.NotFollowedBy{_A, Any} where _A
Build a NotFollowedBy
clause. Translate to strongly typed grammar with flatten
.
Example
seq(not_followed_by(tokens(collect("reservedWord"))), :identifier)
PikaParser.precedence_cascade
— Method
precedence_cascade(label::Function, levels...) -> Vector
Convert a list of rules of increasing precedence into a typical precedence-handling "fallthrough" construction. The result must be post-processed by flatten.
Each of the rules is abstracted over the "same-precedence" and "higher-precedence" rule labels (i.e., it is a binary function), which are used to correctly link the rules within the precedence group. The first rule is of the lowest precedence. All rules except the last automatically fall back to the next rule. The higher-precedence parameter of the last rule is the label of the first rule.
label is a function that generates the rule label for the given n-th level of the grammar.
Use @precedences for a less verbose construction.
Returns a vector of labeled rules, which usually must be interpolated into the ruleset.
Example
Dict(
    precedence_cascade(
        n -> Symbol(:exprlevel, n),
        (same, next) -> :expr => first(
            :plus => seq(same, token('+'), next),
            :minus => seq(same, token('-'), next),
        ),
        (same, next) -> :times => seq(same, token('*'), next), # left associative
        (same, next) -> :power => seq(next, token('^'), same), # right associative
        (_, restart) -> first(
            :parens => seq(token('('), restart, token(')')),
            :digits => some(satisfy(isdigit)),
        ),
    )...,
)
PikaParser.satisfy
— Method
satisfy(f::Function) -> PikaParser.Satisfy{Any, Any}
Build a Satisfy
clause. Translate to strongly typed grammar with flatten
.
Example
satisfy(isdigit)
PikaParser.scan
— Method
scan(f::Function) -> PikaParser.Scan{Any, Any}
Build a Scan
clause. Translate to strongly typed grammar with flatten
.
Example
# a rule to match any pair of equal tokens
scan(m -> (length(m) >= 2 && m[1] == m[2]) ? 2 : 0)
PikaParser.seq
— Method
seq(args...) -> PikaParser.Seq{Any, Any}
Build a Seq
clause. Translate to strongly typed grammar with flatten
.
Example
digit_in_parens = seq(token('('), :digit, token(')'))
PikaParser.some
— Method
some(x) -> PikaParser.Some{_A, Any} where _A
Build a Some
clause. Translate to strongly typed grammar with flatten
.
Example
some(satisfy(isspace))
PikaParser.tie
— Method
tie(x) -> PikaParser.Tie{_A, Any} where _A
Build a Tie
clause. Translate to strongly typed grammar with flatten
.
Example
:alternating_A_and_B => tie(many(seq(:A, :B)))
PikaParser.token
— Method
token(x) -> PikaParser.Token{Any}
Build a Token
clause. Translate to strongly typed grammar with flatten
.
Example
token('a')
PikaParser.tokens
— Method
tokens(xs) -> PikaParser.Tokens{Any}
Build a Tokens
clause. Translate to strongly typed grammar with flatten
.
Example
tokens("keyword")
PikaParser.@precedences
— Macro
@precedences labeller same::Symbol next::Symbol rules
A shortcut macro for precedence_cascade
. Automatically adds lambda heads with fixed argument names, and splats itself with ...
into the surrounding environment.
Example
Dict(
    @precedences (n -> Symbol(:exprlevel, n)) same next begin
        :expr => seq(same, token('+'), next)
        seq(same, token('*'), next)
        first(
            token('x'),
            seq(token('('), next, token(')'))
        )
    end
)
Converting to a Grammar
PikaParser.make_grammar
— Method
make_grammar(starts::AbstractArray{G, 1}, rules_dict::Dict{G, PikaParser.Clause{G, T}}) -> PikaParser.Grammar
Produce a Grammar
with rules of type G
that can be used to parse
inputs.
starts
should collect top-level rules (these will be put at the top of the topological order of the parsing).
rules_dict
is a dictionary of grammar Clauses.
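A minimal sketch of the typical workflow (the rule names are illustrative, not part of the API): flatten the convenience rules first, then build the grammar with the chosen start rule.
rules = Dict(
    :digit => satisfy(isdigit),
    :digits => some(:digit),
)
g = make_grammar([:digits], flatten(rules, Char))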
Parsing
PikaParser.lex
— Method
lex(g::PikaParser.Grammar{G, T}, input) -> Vector
Greedily find terminals in the input sequence. For performance and uniqueness purposes, terminals are only looked for at stream indexes that follow the final indexes of terminals found previously. That allows the lexing process to skip many redundant matches that could not ever be found by the grammar.
As a main outcome, this prevents the typical pika-parser behavior when matching sequences using many, where e.g. an identifier like abcd also produces redundant (and often invalid) matches for bcd, cd and d. Collaterally, greedy lexing also creates fewer tokens in the match table, which results in faster parsing.
To produce good terminal matches quickly, use scan
.
In a typical use, this function is best called indirectly via parse_lex
.
PikaParser.parse
— Method
parse(grammar::PikaParser.Grammar{G, T}, input) -> PikaParser.ParserState
parse(grammar::PikaParser.Grammar{G, T}, input, fast_match) -> PikaParser.ParserState
Take a Grammar
and an indexable input sequence (typically Vector
or String
), and return a final ParserState
that contains all matched grammar productions.
The input must be random-indexable using Int indexes (because pika parsers require a lot of random indexing), must support firstindex, lastindex, prevind, nextind, and index arithmetic with + and -, and indexes in views must be the same as in the original container except for a constant offset.
Lexing and fast terminal matching
If fast_match is specified, the function does not match terminals using the associated grammar rules, but with a fast_match function that reports the matched terminals via a callback. The function is called exactly once for each position in input in reverse order (i.e., the indexes will start at lastindex(input) and continue using prevind all the way to firstindex(input)), which can be utilized by the application for optimization. The call parameters consist of the input vector, the position in the input vector, and a "report" function used to send back a clause ID (of the same type as G in typeof(grammar)) and the last item of the terminal match that starts at that position. Calls to the reporting function can be repeated if more terminal types match. Terminals not reported by the calls to fast_match will not be matched.
For complicated grammars, this may be much faster than having the parser try to match all terminal types at each position.
If your grammar does not contain dangerous or highly surprising kinds of terminals (in particular, it can be scanned greedily left-to-right), you may use parse_lex
to run a reasonable terminal-only lexing step, which is then automatically used as a basis for fast matching.
Caveats
Take care when indexing String
s. With UTF-8, not all codepoints may necessarily have length 1.
If unsure, you may always collect
the strings to vectors of Char
s (basically converting to UTF-32), where each character occupies precisely one index.
Results
Use find_match_at!
to extract matches from ParserState
.
Pika parsing never really fails. Instead, when a grammar rule is not matched in the input, the expected rule match is either not going to be found at the starting position with find_match_at!, or it will not span the whole input.
Example
parse(
    g,
    "abcde123",
    (input, i, match) -> isdigit(input[i]) ? match(:digit, i) : match(:letter, i),
)
PikaParser.parse_lex
— Method
parse_lex(g::PikaParser.Grammar{G, T}, input) -> PikaParser.ParserState
Use lex
to greedily produce lexemes for a given grammar, and run the parsing atop the result.
While this will produce a different (much more sparse) parsing table and the resulting parse tree may be different from the "full" parse, having the lower levels of the parse tree efficiently pre-chewed vastly simplifies the overall parsing, thus saving a lot of time.
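A minimal sketch of the intended use (the grammar g and the rule label :expr are assumptions):
st = parse_lex(g, "12+34")
mid = find_match_at!(st, :expr, 1)  # index of the :expr match starting at position 1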
Traversing and folding the parse tree
PikaParser.default_fold
— Method
default_fold(m, p, subvals) -> Expr
The default function used as fold
argument in traverse_match
.
PikaParser.default_open
— Method
default_open(m, p) -> Base.Generator{_A, PikaParser.var"#35#36"} where _A
The default function used as open
argument in traverse_match
.
PikaParser.find_match_at!
— Method
find_match_at!(st::PikaParser.ParserState{G}, rule, pos::Int64) -> Int64
Find the Match
index in ParserState
that matched rule
at position pos
, or nothing
if there is no such match.
Zero-length matches may not be matched at all positions by default; this function creates the necessary matches in the tables in st in case they are missing. (That is the reason for the ! in the name.)
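A short sketch, assuming st is a ParserState produced by parse or parse_lex and :expr is a rule of its grammar:
mid = find_match_at!(st, :expr, 1)
tree = traverse_match(st, mid)  # fold the found match into a Julia Expr using the defaults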
PikaParser.traverse_match
— Method
traverse_match(st::PikaParser.ParserState{G, I}, mid::Int64; open, fold) -> Expr
Given a Match index in ParserState st, recursively depth-first traverse the match tree using the functions open (called upon entering a submatch) and fold (called upon leaving the submatch).
open
is given a UserMatch
structure and a reference to the parser state. It should return a vector of boolean values that tell the traversal which submatches from the UserMatch
should be opened. That can be used to skip parsing of large uninteresting parts of the match tree, such as whitespace or comments. By default, it opens all submatches, thus the whole subtree is traversed.
fold
is given the UserMatch
structure and a reference to the parser state, and additionally a vector of folded values from the submatches. The values returned by fold
invocations are collected and transferred to higher-level invocations of fold
. In case open
disabled the evaluation of a given submatch, nothing
is used as the folded value for the submatch. The default open
and fold
(default_open
, default_fold
) just collect all submatch values and produce a Julia Expr
AST structure where rule expansions are represented as function calls.
For producing the UserMatch
structures correctly, traverse_match
additionally requires that the input vector (stored here in ParserState
) has a compatible overload of the standard Base.view
.
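A hedged sketch of a custom fold (st and mid are assumed to come from a previous parse): collect each match into a named tuple instead of the default Expr.
result = traverse_match(st, mid,
    fold = (m, p, subvals) -> (rule = m.rule, view = m.view, children = subvals),
)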
PikaParser.view_match
— Method
view_match(st::PikaParser.ParserState, mid::Int64) -> Any
Get a view of input that corresponds to the match identified by given match ID.
PikaParser.view_match
— Method
view_match(st::PikaParser.ParserState, match::Union{PikaParser.Match, PikaParser.UserMatch}) -> Any
Get a view of input that corresponds to the given match
.