Reference
Data types
PikaParser.Clause
— Type
abstract type Clause{G, T}
Abstract type for all clauses of a grammar with rule labels of type G that match sequences of tokens of type T.
The currently implemented clauses are documented below.
Often it is better to use the convenience functions for rule construction, such as seq or token; see flatten for details.
PikaParser.EndOfInput
— Type
struct EndOfInput{G, T} <: PikaParser.Clause{G, T}
Matches at the end of the input.
PikaParser.Epsilon
— Type
struct Epsilon{G, T} <: PikaParser.Clause{G, T}
An always-succeeding epsilon match.
PikaParser.Fail
— Type
struct Fail{G, T} <: PikaParser.Clause{G, T}
An always-failing match.
PikaParser.First
— Type
struct First{G, T} <: PikaParser.Clause{G, T}
Match the first of several possible matches. An empty First is equivalent to an unconditional failure.
Fields
children::Vector
PikaParser.FollowedBy
— Type
struct FollowedBy{G, T} <: PikaParser.Clause{G, T}
Zero-length match that succeeds if follow
does match at the same position.
Fields
follow::Any
PikaParser.Grammar
— Type
struct Grammar{G, T}
A representation of the grammar prepared for parsing.
Fields
names::Vector
: Topologically sorted list of rule labels (non-terminals).
idx::Dict{G, Int64} where G
: Mapping of rule labels to their indexes in names.
clauses::Array{PikaParser.Clause{Int64, T}, 1} where T
: Clauses of the grammar converted to integer labels (and again sorted topologically).
can_match_epsilon::Vector{Bool}
: Flags for the rules being able to match the empty string unconditionally.
seed_clauses::Vector{Vector{Int64}}
: Which clauses get seeded upon matching of a clause.
terminals::Vector{Int64}
: Sorted indexes of terminal clauses that are checked against each input item.
PikaParser.Many
— Type
struct Many{G, T} <: PikaParser.Clause{G, T}
Greedily matches a sequence of matches that can be empty.
Fields
item::Any
PikaParser.Match
— Type
struct Match
Internal match representation.
Fields
clause::Int64
: Which clause has matched here?
first::Int64
: Where the match started?
last::Int64
: What is the last item of the match? (In case of an empty match, this MUST be the previous index before first, i.e., prevind(input, first).)
option_idx::Int64
: Which possibility (given by the clause) did we match?
submatches::Int64
: Index to the first submatch; the range of submatches spans all the way to the first submatch of the next Match.
left::Int64
: Left child in the memo tree.
right::Int64
: Right child in the memo tree.
parent::Int64
: Parent in the memo tree.
PikaParser.MatchResult
— Type
primitive type Int64 <: Signed 64
A match index in the ParserState field matches, or nothing if the match failed.
PikaParser.NotFollowedBy
— Type
struct NotFollowedBy{G, T} <: PikaParser.Clause{G, T}
Zero-length match that succeeds if reserved
does not match at the same position.
Fields
reserved::Any
PikaParser.ParserState
— Type
mutable struct ParserState{G, T, I}
Intermediate parsing state. The match tree is built in a vector of matches that grows during the matching; all match indexes point into this vector.
This structure is also a "result" of the parsing, used to reconstruct the match tree.
Fields
grammar::PikaParser.Grammar
: Copy of the grammar used to parse the input.
q::PikaParser.PikaQueue
: Queue for rules that should match, used only internally.
matches::Vector{PikaParser.Match}
: Matches, connected by indexes to form a memo table search tree.
memo_root::Int64
: Root of the memo table search tree (stored in matches).
submatches::Vector{Int64}
: Children pointers of the matches that form the match tree.
input::Any
: Parser input, used to reconstruct match data.
PikaParser.Satisfy
— Type
struct Satisfy{G, T} <: PikaParser.Terminal{G, T}
A single terminal. Matches a token from the input stream for which the match function returns true.
Fields
match::Function
PikaParser.Scan
— Type
struct Scan{G, T} <: PikaParser.Terminal{G, T}
A single terminal, possibly made out of multiple input tokens.
Given the input stream view, the match
function scans the input forward and returns the position of the last item of the matched terminal (which is assumed to start at the beginning of the stream view). In case there's no match, it returns zero.
Fields
match::Function
PikaParser.Seq
— Type
struct Seq{G, T} <: PikaParser.Clause{G, T}
Sequence of matches. Empty Seq
is equivalent to an always-succeeding empty match, as in Epsilon
.
Fields
children::Vector
PikaParser.Some
— Type
struct Some{G, T} <: PikaParser.Clause{G, T}
Greedily matches a sequence of matches, with at least 1 match.
Fields
item::Any
PikaParser.Tie
— Type
struct Tie{G, T} <: PikaParser.Clause{G, T}
Produces the very same match as the item, but concatenates the user views of the resulting submatches into one big vector (i.e., basically squashing the two levels of child matches into a single one). Useful e.g. for lists with different initial or final elements.
As a result, the item and its immediate children are not going to be present in the parse tree.
Fields
tuple::Any
PikaParser.Token
— Type
struct Token{G, T} <: PikaParser.Terminal{G, T}
A single token equal to token.
Fields
token::Any
PikaParser.Tokens
— Type
struct Tokens{G, T, I} <: PikaParser.Terminal{G, T}
A series of tokens equal to tokens.
Fields
tokens::Any
PikaParser.TraverseNode
— Type
mutable struct TraverseNode{G, S}
Part of intermediate tree traversing state.
Fields
parent_idx::Int64
parent_sub_idx::Int64
match::PikaParser.UserMatch
open::Bool
subvals::Vector
PikaParser.UserMatch
— Type
struct UserMatch{G, S}
User-facing representation of a Match
.
Fields
rule::Any
: Which rule ID has matched here?
first::Int64
: Where the match started?
last::Int64
: What is the last item of the match? (In case of an empty match, this is the previous index before first.)
view::Any
: View of the matched part of the input, usually a SubArray or SubString.
submatches::Vector{Int64}
: Indexes and rule labels of the matched submatches. This forms the edges in the match tree.
Preparing the grammar
Specifying rules
PikaParser.end_of_input
— Constant
end_of_input :: Clause
An EndOfInput
clause. Translate to strongly typed grammar with flatten
.
Example
whole_file = seq(:file_contents, end_of_input)
PikaParser.epsilon
— Constant
epsilon :: Clause
An Epsilon
clause. Translate to strongly typed grammar with flatten
.
Example
maybe_letter_a = first(token('a'), epsilon)
PikaParser.fail
— Constant
fail :: Clause
A Fail
clause. Translate to strongly typed grammar with flatten
.
Useful for avoiding rule specification when matching terminals using the fast_match
parameter of parse
.
Example
seq(:this, :that, fail) # this rule is effectively disabled
PikaParser.first
— Method
first(args...) -> PikaParser.First{Any, Any}
Build a First
clause. Translate to strongly typed grammar with flatten
.
Example
first(:something, :fallback, :fallback2)
PikaParser.flatten
— Method
flatten(rules::Dict{G}, tokentype::DataType) -> Dict
flatten(rules::Dict{G}, tokentype::DataType, childlabel::Function) -> Dict
Convert possibly nested and weakly typed rules into a correctly typed and unnested ruleset, usable in make_grammar. This allows use of the convenience rule building functions:
satisfy
scan
token
tokens
epsilon (not a function!)
fail (not a function!)
seq
first
not_followed_by
followed_by
some
many
tie
precedence_cascade (not backed by an actual Clause!)
Anonymous nested rules are assigned names constructed by the childlabel function (it receives the original rule label of type G and an integer with the position of the nested rule). By default, childlabel concatenates the parent rule name, a hyphen, and the position number to form a Symbol (i.e., the default works only when the rules are labeled by Symbols; for grammars labeled e.g. by integers or strings you need to provide your own implementation).
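For illustration, a minimal sketch (the rule names are ours, and the PikaParser functions are assumed to be in scope as in the other examples); the anonymous nested clause is pulled out into its own automatically named rule:
rules = Dict(
    :digits => some(satisfy(isdigit)),               # contains an anonymous nested clause
    :parens => seq(token('('), :digits, token(')')),
)
flat = flatten(rules, Char)  # the nested satisfy gets a generated name such as Symbol("digits-1")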
PikaParser.followed_by
— Method
followed_by(x) -> PikaParser.FollowedBy{_A, Any} where _A
Build a FollowedBy
clause. Translate to strongly typed grammar with flatten
.
Example
seq(:digits, followed_by(:whitespace))
PikaParser.many
— Method
many(x) -> PikaParser.Many{_A, Any} where _A
Build a Many
clause. Translate to strongly typed grammar with flatten
.
Example
seq(:quote, many(:quote_contents), :quote)
PikaParser.not_followed_by
— Method
not_followed_by(x) -> PikaParser.NotFollowedBy{_A, Any} where _A
Build a NotFollowedBy
clause. Translate to strongly typed grammar with flatten
.
Example
seq(not_followed_by(tokens(collect("reservedWord"))), :identifier)
PikaParser.precedence_cascade
— Method
precedence_cascade(label::Function, levels...) -> Vector
Convert a list of rules of increasing precedence into a typical precedence-handling "fallthrough" construction. The result must be post-processed by flatten.
Each of the rules is abstracted over the "same-precedence" and "higher-precedence" rule labels (i.e., it is a binary function), which are used to correctly link the rules within the precedence group. The first rule is of the lowest precedence. All rules except the last automatically fall back to the next rule. The higher-precedence parameter of the last rule is the label of the first rule.
label is a function that generates the rule label for the given n-th level of the grammar.
Use @precedences for a less verbose construction.
Returns a vector of labeled rules, which usually must be interpolated into the ruleset.
Example
Dict(
    precedence_cascade(
        n -> Symbol(:exprlevel, n),
        (same, next) -> :expr => first(
            :plus => seq(same, token('+'), next),
            :minus => seq(same, token('-'), next),
        ),
        (same, next) -> :times => seq(same, token('*'), next), # left associative
        (same, next) -> :power => seq(next, token('^'), same), # right associative
        (_, restart) -> first(
            :parens => seq(token('('), restart, token(')')),
            :digits => some(satisfy(isdigit)),
        ),
    )...,
)
PikaParser.satisfy
— Method
satisfy(f::Function) -> PikaParser.Satisfy{Any, Any}
Build a Satisfy
clause. Translate to strongly typed grammar with flatten
.
Example
satisfy(isdigit)
PikaParser.scan
— Method
scan(f::Function) -> PikaParser.Scan{Any, Any}
Build a Scan
clause. Translate to strongly typed grammar with flatten
.
Example
# a rule to match any pair of equal tokens
scan(m -> (length(m) >= 2 && m[1] == m[2]) ? 2 : 0)
PikaParser.seq
— Method
seq(args...) -> PikaParser.Seq{Any, Any}
Build a Seq
clause. Translate to strongly typed grammar with flatten
.
Example
digit_in_parens = seq(token('('), :digit, token(')'))
PikaParser.some
— Method
some(x) -> PikaParser.Some{_A, Any} where _A
Build a Some
clause. Translate to strongly typed grammar with flatten
.
Example
some(satisfy(isspace))
PikaParser.tie
— Method
tie(x) -> PikaParser.Tie{_A, Any} where _A
Build a Tie
clause. Translate to strongly typed grammar with flatten
.
Example
:alternating_A_and_B => tie(many(seq(:A, :B)))
PikaParser.token
— Method
token(x) -> PikaParser.Token{Any}
Build a Token
clause. Translate to strongly typed grammar with flatten
.
Example
token('a')
PikaParser.tokens
— Method
tokens(xs) -> PikaParser.Tokens{Any}
Build a Tokens
clause. Translate to strongly typed grammar with flatten
.
Example
tokens("keyword")
PikaParser.@precedences
— Macro
@precedences labeller same::Symbol next::Symbol rules
A shortcut macro for precedence_cascade
. Automatically adds lambda heads with fixed argument names, and splats itself with ...
into the surrounding environment.
Example
Dict(
    @precedences (n -> Symbol(:exprlevel, n)) same next begin
        :expr => seq(same, token('+'), next)
        seq(same, token('*'), next)
        first(
            token('x'),
            seq(token('('), next, token(')'))
        )
    end
)
Converting to a Grammar
PikaParser.make_grammar
— Method
make_grammar(starts::AbstractArray{G, 1}, rules_dict::Dict{G, PikaParser.Clause{G, T}}) -> PikaParser.Grammar
Produce a Grammar
with rules of type G
that can be used to parse
inputs.
starts
should collect top-level rules (these will be put at the top of the topological order of the parsing).
rules_dict
is a dictionary of grammar Clauses.
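A minimal sketch of the typical workflow (the rule names are illustrative, not part of the API): flatten the convenience rules first, then build the grammar with the chosen start rule.
rules = Dict(
    :digit => satisfy(isdigit),
    :digits => some(:digit),
)
g = make_grammar([:digits], flatten(rules, Char))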
Parsing
PikaParser.lex
— Method
lex(g::PikaParser.Grammar{G, T}, input) -> Vector
Greedily find terminals in the input sequence. For performance and uniqueness purposes, terminals are only looked for at stream indexes that follow the final indexes of terminals found previously. That allows the lexing process to skip many redundant matches that could not ever be found by the grammar.
As a main outcome, this prevents the typical pika-parser behavior when matching sequences using many, where e.g. an identifier like abcd also produces redundant (and often invalid) matches for bcd, cd and d. Collaterally, greedy lexing also creates fewer tokens in the match table, which results in faster parsing.
To produce good terminal matches quickly, use scan
.
In a typical use, this function is best called indirectly via parse_lex
.
PikaParser.parse
— Method
parse(grammar::PikaParser.Grammar{G, T}, input) -> PikaParser.ParserState
parse(grammar::PikaParser.Grammar{G, T}, input, fast_match) -> PikaParser.ParserState
Take a Grammar
and an indexable input sequence (typically Vector
or String
), and return a final ParserState
that contains all matched grammar productions.
The input must be random-indexable using Int indexes (because pika parsers require a lot of random indexing), must support firstindex, lastindex, prevind, nextind, and index arithmetic with + and -, and indexes in views must be the same as in the original container except for a constant offset.
Lexing and fast terminal matching
If fast_match is specified, the function does not match terminals using the associated grammar rules, but with a fast_match function that reports the matched terminals via a callback. The function is called exactly once for each position in input in reverse order (i.e., the indexes will start at lastindex(input) and continue using prevind all the way to firstindex(input)), which can be utilized by the application for optimization. The call parameters consist of the input vector, the position in the input vector, and a "report" function used to send back a clause ID (of the same type as G in typeof(grammar)) and the last item of the terminal match that starts at that position. Calls to the reporting function can be repeated if more terminal types match. Terminals not reported by the calls to fast_match will not be matched.
For complicated grammars, this may be much faster than having the parser try to match all terminal types at each position.
If your grammar does not contain dangerous or highly surprising kinds of terminals (in particular, it can be scanned greedily left-to-right), you may use parse_lex
to run a reasonable terminal-only lexing step, which is then automatically used as a basis for fast matching.
Caveats
Take care when indexing String
s. With UTF-8, not all codepoints may necessarily have length 1.
If unsure, you may always collect
the strings to vectors of Char
s (basically converting to UTF-32), where each character occupies precisely one index.
Results
Use find_match_at!
to extract matches from ParserState
.
Pika parsing never really fails. Instead, when a grammar rule is not matched in the input, the expected rule match is either not going to be found at the starting position with find_match_at!, or it will not span the whole input.
Example
parse(
    g,
    "abcde123",
    (input, i, match) -> isdigit(input[i]) ? match(:digit, i) : match(:letter, i),
)
PikaParser.parse_lex
— Method
parse_lex(g::PikaParser.Grammar{G, T}, input) -> PikaParser.ParserState
Use lex
to greedily produce lexemes for a given grammar, and run the parsing atop the result.
While this will produce a different (much more sparse) parsing table and the resulting parse tree may be different from the "full" parse, having the lower levels of the parse tree efficiently pre-chewed vastly simplifies the overall parsing, thus saving a lot of time.
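A minimal sketch of the intended use (the grammar g and the rule label :expr are assumptions):
st = parse_lex(g, "12+34")
mid = find_match_at!(st, :expr, 1)  # index of the :expr match starting at position 1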
Traversing and folding the parse tree
PikaParser.default_fold
— Method
default_fold(m, p, subvals) -> Expr
The default function used as fold
argument in traverse_match
.
PikaParser.default_open
— Method
default_open(m, p) -> Base.Generator{_A, PikaParser.var"#35#36"} where _A
The default function used as open
argument in traverse_match
.
PikaParser.find_match_at!
— Method
find_match_at!(st::PikaParser.ParserState{G}, rule, pos::Int64) -> Int64
Find the Match
index in ParserState
that matched rule
at position pos
, or nothing
if there is no such match.
Zero-length matches may not be matched at all positions by default; this function creates the necessary matches in the tables in st in case they are missing. (That is the reason for the ! in the name.)
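A short sketch, assuming st is a ParserState produced by parse or parse_lex and :expr is a rule of its grammar:
mid = find_match_at!(st, :expr, 1)
tree = traverse_match(st, mid)  # fold the found match into a Julia Expr using the defaults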
PikaParser.traverse_match
— Method
traverse_match(st::PikaParser.ParserState{G, I}, mid::Int64; open, fold) -> Expr
Given a Match index in ParserState st, recursively depth-first traverse the match tree using the functions open (called upon entering a submatch) and fold (called upon leaving the submatch).
open
is given a UserMatch
structure and a reference to the parser state. It should return a vector of boolean values that tell the traversal which submatches from the UserMatch
should be opened. That can be used to skip parsing of large uninteresting parts of the match tree, such as whitespace or comments. By default, it opens all submatches, thus the whole subtree is traversed.
fold
is given the UserMatch
structure and a reference to the parser state, and additionally a vector of folded values from the submatches. The values returned by fold
invocations are collected and transferred to higher-level invocations of fold
. In case open
disabled the evaluation of a given submatch, nothing
is used as the folded value for the submatch. The default open
and fold
(default_open
, default_fold
) just collect all submatch values and produce a Julia Expr
AST structure where rule expansions are represented as function calls.
For producing the UserMatch
structures correctly, traverse_match
additionally requires that the input vector (stored here in ParserState
) has a compatible overload of the standard Base.view
.
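A hedged sketch of a custom fold (st and mid are assumed to come from a previous parse): collect each match into a named tuple instead of the default Expr.
result = traverse_match(st, mid,
    fold = (m, p, subvals) -> (rule = m.rule, view = m.view, children = subvals),
)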
PikaParser.view_match
— Method
view_match(st::PikaParser.ParserState, mid::Int64) -> Any
Get a view of input that corresponds to the match identified by given match ID.
PikaParser.view_match
— Method
view_match(st::PikaParser.ParserState, match::Union{PikaParser.Match, PikaParser.UserMatch}) -> Any
Get a view of input that corresponds to the given match
.