Reference

Data types

PikaParser.FirstType
struct First{G, T} <: PikaParser.Clause{G, T}

Match the first possibility of several matches. Empty First is equivalent to unconditional failure.

Fields

  • children::Vector
source
PikaParser.FollowedByType
struct FollowedBy{G, T} <: PikaParser.Clause{G, T}

Zero-length match that succeeds if follow does match at the same position.

Fields

  • follow::Any
source
PikaParser.GrammarType
struct Grammar{G, T}

A representation of the grammar prepared for parsing.

Fields

  • names::Vector: Topologically sorted list of rule labels (non-terminals)

  • idx::Dict{G, Int64} where G: Mapping of rule labels to their indexes in names

  • clauses::Array{PikaParser.Clause{Int64, T}, 1} where T: Clauses of the grammar converted to integer labels (and again sorted topologically)

  • can_match_epsilon::Vector{Bool}: Flags for the rules being able to match on empty string unconditionally

  • seed_clauses::Vector{Vector{Int64}}: Which clauses get seeded upon matching of a clause

  • terminals::Vector{Int64}: Sorted indexes of terminal clauses that are checked against each input item.

source
PikaParser.ManyType
struct Many{G, T} <: PikaParser.Clause{G, T}

Greedily matches a sequence of matches that can be empty.

Fields

  • item::Any
source
PikaParser.MatchType
struct Match

Internal match representation.

Fields

  • clause::Int64: Which clause has matched here?

  • first::Int64: Where the match started?

  • last::Int64: What is the last item of the match? (In case of empty match, this MUST be the previous index before first, i.e., prevind(input, first).)

  • option_idx::Int64: Which possibility (given by the clause) did we match?

  • submatches::Int64: Index to the first submatch, the range of submatches spans all the way to the first submatch of the next Match.

  • left::Int64: Left child in the memo tree.

  • right::Int64: Right child in the memo tree.

  • parent::Int64: Parent in the memo tree.

source
PikaParser.NotFollowedByType
struct NotFollowedBy{G, T} <: PikaParser.Clause{G, T}

Zero-length match that succeeds if reserved does not match at the same position.

Fields

  • reserved::Any
source
PikaParser.ParserStateType
mutable struct ParserState{G, T, I}

Intermediate parsing state. The match tree is built in a vector of matches that grows during the matching, all match indexes point into this vector.

This structure is also a "result" of the parsing, used to reconstruct the match tree.

Fields

  • grammar::PikaParser.Grammar: Copy of the grammar used to parse the input.

  • q::PikaParser.PikaQueue: Queue for rules that should match, used only internally.

  • matches::Vector{PikaParser.Match}: Matches, connected by indexes to form a memo table search tree.

  • memo_root::Int64: Root of the memotable search tree (stored in the matches).

  • submatches::Vector{Int64}: Children pointers of the matches that form the match tree.

  • input::Any: Parser input, used to reconstruct match data.

source
PikaParser.SatisfyType
struct Satisfy{G, T} <: PikaParser.Terminal{G, T}

A single terminal. Matches a token from the input stream where the match function returns true, otherwise it returns false.

Fields

  • match::Function
source
PikaParser.ScanType
struct Scan{G, T} <: PikaParser.Terminal{G, T}

A single terminal, possibly made out of multiple input tokens.

Given the input stream view, the match function scans the input forward and returns the position of the last item of the matched terminal (which is assumed to start at the beginning of the stream view). In case there's no match, it returns zero.

Fields

  • match::Function
source
PikaParser.SeqType
struct Seq{G, T} <: PikaParser.Clause{G, T}

Sequence of matches. Empty Seq is equivalent to an always-succeeding empty match, as in Epsilon.

Fields

  • children::Vector
source
PikaParser.SomeType
struct Some{G, T} <: PikaParser.Clause{G, T}

Greedily matches a sequence of matches, with at least 1 match.

Fields

  • item::Any
source
PikaParser.TieType
struct Tie{G, T} <: PikaParser.Clause{G, T}

Produces the very same match as the item, but concatenates the user views of the resulting submatches into one big vector (i.e., basically squashing the 2 levels of child matches to a single one.) Useful e.g. for lists with different initial or final elements.

As a result, the item and its immediate children are not going to be present in the parse tree.

Fields

  • tuple::Any
source
PikaParser.TokenType
struct Token{G, T} <: PikaParser.Terminal{G, T}

A single token equal to match.

Fields

  • token::Any
source
PikaParser.TokensType
struct Tokens{G, T, I} <: PikaParser.Terminal{G, T}

A series of tokens equal to match.

Fields

  • tokens::Any
source
PikaParser.TraverseNodeType
mutable struct TraverseNode{G, S}

Part of intermediate tree traversing state.

Fields

  • parent_idx::Int64

  • parent_sub_idx::Int64

  • match::PikaParser.UserMatch

  • open::Bool

  • subvals::Vector

source
PikaParser.UserMatchType
struct UserMatch{G, S}

User-facing representation of a Match.

Fields

  • rule::Any: Which rule ID has matched here?

  • first::Int64: Where the match started?

  • last::Int64: What is the last item of the match? (In case of empty match, this is the previous index before first.)

  • view::Any: View of the matched part of the input, usually a SubArray or SubString.

  • submatches::Vector{Int64}: Indexes and rule labels of the matched submatches. This forms the edges in the match tree.

source

Preparing the grammar

Specifying rules

PikaParser.failConstant
fail :: Clause

A Fail clause. Translate to strongly typed grammar with flatten.

Useful for avoiding rule specification when matching terminals using the fast_match parameter of parse.

Example

seq(:this, :that, fail)  # this rule is effectively disabled
source
PikaParser.firstMethod
first(args...) -> PikaParser.First{Any, Any}

Build a First clause. Translate to strongly typed grammar with flatten.

Example

first(:something, :fallback, :fallback2)
source
PikaParser.flattenMethod
flatten(rules::Dict{G}, tokentype::DataType) -> Dict
flatten(
    rules::Dict{G},
    tokentype::DataType,
    childlabel::Function
) -> Dict

Convert a possibly nested and weakly typed rules into a correctly typed and unnested ruleset, usable in make_grammar. This allows use of convenience rule building functions:

Anonymous nested rules are assigned names that are constructed by childlabel function (gets the original G and and integer with position integer). By default, childlabel concatenates the parent rule name, hyphen, and the position number to form a Symbol (i.e., the default works only in cases when the rules are labeled by Symbols, and you need to provide your own implementation for other grammars labeled e.g. by integers or strings).

source
PikaParser.manyMethod
many(x) -> PikaParser.Many{_A, Any} where _A

Build a Many clause. Translate to strongly typed grammar with flatten.

Example

seq(:quote, many(:quote_contents), :quote)
source
PikaParser.not_followed_byMethod
not_followed_by(
    x
) -> PikaParser.NotFollowedBy{_A, Any} where _A

Build a NotFollowedBy clause. Translate to strongly typed grammar with flatten.

Example

seq(not_followed_by(tokens(collect("reservedWord"))), :identifier)
source
PikaParser.precedence_cascadeMethod
precedence_cascade(label::Function, levels...) -> Vector

Convert a list of rules of increasing associativity to a typical precedence-handling "failthrough" construction. The result must be post-processed by flatten.

Each of the rules is abstracted by "same-associativity" and "higher-associativity" rules (i.e., it is a binary function), which is used to correctly link the rules within the precedence group. The first rule is of the lowest precedence. All rules except the last automatically fallback to the next rule. The higher-precedence parameter of the last rule is the label of the first rule.

label is a function that generates the label for given n-th level of the grammar.

Use @precedences for a less verbose construction.

Returns a vector of labeled rules; that must usually be interpolated into the ruleset.

Example

Dict(
    precedence_cascade(
        n -> Symbol(:exprlevel, n),
        (same, next) -> :expr => first(
            :plus => seq(same, token('+'), next),
            :minus => seq(same, token('-'), next),
        ),
        (same, next) -> :times => seq(same, token('*'), next), # left associative
        (same, next) -> :power => seq(next, token('^'), same), # right associative
        (_, restart) -> first(
            :parens => seq(token('('), restart, token(')')),
            :digits => some(satisfy(isdigit)),
        ),
    )...,
)
source
PikaParser.scanMethod
scan(f::Function) -> PikaParser.Scan{Any, Any}

Build a Scan clause. Translate to strongly typed grammar with flatten.

Example

# a rule to match any pair of equal tokens
scan(m -> (length(m) >= 2 && m[1] == m[2]) ? 2 : 0)
source
PikaParser.seqMethod
seq(args...) -> PikaParser.Seq{Any, Any}

Build a Seq clause. Translate to strongly typed grammar with flatten.

Example

digit_in_parents = seq(token('('), :digit, token(')'))
source
PikaParser.someMethod
some(x) -> PikaParser.Some{_A, Any} where _A

Build a Some clause. Translate to strongly typed grammar with flatten.

Example

some(satisfy(isspace))
source
PikaParser.tieMethod
tie(x) -> PikaParser.Tie{_A, Any} where _A

Build a Tie clause. Translate to strongly typed grammar with flatten.

Example

:alternating_A_and_B => tie(many(seq(:A, :B)))
source
PikaParser.@precedencesMacro
@precedences labeller same::Symbol next::Symbol rules

A shortcut macro for precedence_cascade. Automatically adds lambda heads with fixed argument names, and splats itself with ... into the surrounding environment.

Example

Dict(
    @precedences (n->Symbol(:exprlevel, n)) same next begin
        :expr => seq(same, token('+'), next)
        seq(same, token('*'), next)
        first(
            token('x'),
            seq(token('('), next, token(')'))
        )
    end
)
source

Converting to a Grammar

PikaParser.make_grammarMethod
make_grammar(
    starts::AbstractArray{G, 1},
    rules_dict::Dict{G, PikaParser.Clause{G, T}}
) -> PikaParser.Grammar

Produce a Grammar with rules of type G that can be used to parse inputs.

starts should collect top-level rules (these will be put at the top of the topological order of the parsing).

rules_dict is a dictionary of grammar Clauses.

source

Parsing

PikaParser.lexMethod
lex(g::PikaParser.Grammar{G, T}, input) -> Vector

Greedily find terminals in the input sequence. For performance and uniqueness purposes, terminals are only looked for at stream indexes that follow the final indexes of terminals found previously. That allows the lexing process to skip many redundant matches that could not ever be found by the grammar.

As a main outcome, this prevents the typical pika-parser behavior when matching sequences using many, where e.g. an identifier like abcd also produces redundant (and often invalid) matches for bcd, cd and d. Colaterally, greedy lexing also creates less tokens in the match table, which results in faster parsing.

To produce good terminal matches quickly, use scan.

In a typical use, this function is best called indirectly via parse_lex.

source
PikaParser.parseMethod
parse(
    grammar::PikaParser.Grammar{G, T},
    input
) -> PikaParser.ParserState
parse(
    grammar::PikaParser.Grammar{G, T},
    input,
    fast_match
) -> PikaParser.ParserState

Take a Grammar and an indexable input sequence (typically Vector or String), and return a final ParserState that contains all matched grammar productions.

The input must be random-indexable (because PikaParsers require a lot of random indexing) using Int indexes, must support firstindex, lastindex, prevind, nextind, index arithmetics with + and -, and indexes in views must be the same as in original container except for a constant offset.

Lexing and fast terminal matching

If fast_match is specified, the function does not match terminals using the associated grammar rules, but with a fast_match function that reports the matched terminals via a callback. The function is called exactly once for each position in input in reverse order (i.e., the indexes will start at lastindex(input) and continue using prevind all the way to firstindex(input)), which can be utilized by the application for optimization. The call parameters consist of the input vector, position in the input vector, and a "report" function used to send back a clause ID (of same type as G in typeof(grammar)) and the last item of the terminal match that can starts at that position. Calls to the reporting function can be repeated if more terminal types match. Terminals not reported by the calls to fast_match will not be matched.

For complicated grammars, this may be much faster than having the parser to try matching all terminal types at each position.

If your grammar does not contain dangerous or highly surprising kinds of terminals (in particular, it can be scanned greedily left-to-right), you may use parse_lex to run a reasonable terminal-only lexing step, which is then automatically used as a basis for fast matching.

Caveats

Take care when indexing Strings. With UTF-8, not all codepoints may necessarily have length 1.

If unsure, you may always collect the strings to vectors of Chars (basically converting to UTF-32), where each character occupies precisely one index.

Results

Use find_match_at! to extract matches from ParserState.

Pika parsing never really fails. Instead, in case when the grammar rule is not matched in the input, the expected rule match match is either not going to be found at the starting position with find_match_at!, or it will not span the whole input.

Example

parse(
    g,
    "abcde123",
    (input, i, match) -> isdigit(input[i]) ? match(:digit, i) : match(:letter, i),
)
source
PikaParser.parse_lexMethod
parse_lex(
    g::PikaParser.Grammar{G, T},
    input
) -> PikaParser.ParserState

Use lex to greedily produce lexemes for a given grammar, and run the parsing atop the result.

While this will produce a different (much more sparse) parsing table and the resulting parse tree may be different from the "full" parse, having the lower levels of the parse tree efficiently pre-chewed vastly simplifies the overall parsing, thus saving a lot of time.

source

Traversing and folding the parse tree

PikaParser.find_match_at!Method
find_match_at!(
    st::PikaParser.ParserState{G},
    rule,
    pos::Int64
) -> Int64

Find the Match index in ParserState that matched rule at position pos, or nothing if there is no such match.

Zero-length matches may not be matched at all positions by default; this function creates the necessary matches in the tables in st in case they are missing. (That is the reason for the ! label.)

source
PikaParser.traverse_matchMethod
traverse_match(
    st::PikaParser.ParserState{G, I},
    mid::Int64;
    open,
    fold
) -> Expr

Given a Match index in ParserState st, recusively depth-first traverse the match tree using functions open (called upon entering a submatch) and fold (called upon leaving the submatch).

open is given a UserMatch structure and a reference to the parser state. It should return a vector of boolean values that tell the traversal which submatches from the UserMatch should be opened. That can be used to skip parsing of large uninteresting parts of the match tree, such as whitespace or comments. By default, it opens all submatches, thus the whole subtree is traversed.

fold is given the UserMatch structure and a reference to the parser state, and additionally a vector of folded values from the submatches. The values returned by fold invocations are collected and transferred to higher-level invocations of fold. In case open disabled the evaluation of a given submatch, nothing is used as the folded value for the submatch. The default open and fold (default_open, default_fold) just collect all submatch values and produce a Julia Expr AST structure where rule expansions are represented as function calls.

For producing the UserMatch structures correctly, traverse_match additionally requires that the input vector (stored here in ParserState) has a compatible overload of the standard Base.view.

source
PikaParser.view_matchMethod
view_match(st::PikaParser.ParserState, mid::Int64) -> Any

Get a view of input that corresponds to the match identified by given match ID.

source
PikaParser.view_matchMethod
view_match(
    st::PikaParser.ParserState,
    match::Union{PikaParser.Match, PikaParser.UserMatch}
) -> Any

Get a view of input that corresponds to the given match.

source