Introducing ...
lrparsing.py
Russell Stuart
Surely this has been done before? pyparsing, for one:
- Has an example grammar for Sqlite,
- and has nice syntax:
from pyparsing import *

integer = Word(nums).setParseAction(lambda t: int(t[0]))
variable = Word(alphas, exact=1)
operand = integer | variable
expop, signop, multop = Literal('^'), oneOf('+ -'), oneOf('* /')
plusop, factop = oneOf('+ -'), Literal('!')
expr = operatorPrecedence(operand, [
    ("!", 1, opAssoc.LEFT),
    ("^", 2, opAssoc.RIGHT),
    (signop, 1, opAssoc.RIGHT),
    (multop, 2, opAssoc.LEFT),
    (plusop, 2, opAssoc.LEFT),
])
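A usage sketch (assuming a pyparsing old enough to provide operatorPrecedence; later releases renamed it infixNotation):

    result = expr.parseString("9 + 2 * 3")
    print result   # a nested ParseResults, roughly [[9, '+', [2, '*', 3]]]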
- The example Sqlite grammar is buggy.
- Poor free documentation - buy the book.
- And it's absurdly slow:
$ ./pyparsing-sqlite.py
select * from xyzzy where z > 100
parse time=0.389035
select * from xyzzy where z > 100 order by zz
parse time=0.646955
select * from xyzzy
parse time=0.001180
select a, b, c from t1 as A, t2 as B where x1 > x2 + 2 * 3 - 1 and x1 < 3 order by 1 asc, 2 desc
parse time=1.323233
For comparison, a hand-written LL parser:
select a, b, c from t1 as A, t2 as B where t1.a > t2.b + 2 * 3 - 1 and t1 < ? order by 1 asc, 2 desc, t1.a
parse time=0.001747
A sane person probably would have stopped at this point.
Next, parsing.py: an LR(1) parser generator with LR(1) and GLR(1) parsers.
select a, b, c from t1 as A, t2 as B where t1.a > t2.b + 2 * 3 - 1 and t1 < ? order by 1 asc, 2 desc, t1.a
parse time=0.004702
Grammar definition:
class NArithExpr(NonTerm):
    @NonTerm.reduce
    def reduceA(self, expr1):
        "%reduce NUnaryExpr"
        self.HIDDEN = True
    @NonTerm.reduce
    def reduceB(self, expr1, mult, expr2):
        "%reduce NArithExpr SymbolMult NArithExpr [PrecMult]"
    @NonTerm.reduce
    def reduceC(self, expr1, div, expr2):
        "%reduce NArithExpr SymbolDiv NArithExpr [PrecMult]"
    @NonTerm.reduce
    def reduceD(self, expr1, mod, expr2):
        "%reduce NArithExpr SymbolMod NArithExpr [PrecMult]"
    @NonTerm.reduce
    def reduceE(self, expr1, plus, expr2):
        "%reduce NArithExpr SymbolPlus NArithExpr [PrecAdd]"
    @NonTerm.reduce
    def reduceF(self, expr1, minus, expr2):
        "%reduce NArithExpr SymbolMinus NArithExpr [PrecAdd]"
    @NonTerm.reduce
    def reduceG(self, atom):
        "%reduce NAtom"
        self.HIDDEN = True
Tokens have to be defined individually:
class Token(parsing.Token):
    __metaclass__ = TokenMeta
#
# ... etc ...
#
class Keyword(Token):
    SUPPRESS = True

class KeywordAnd(Keyword): keyword = "and"
class KeywordAs(Keyword): keyword = "as"
class KeywordAsc(Token): keyword = "asc"
class KeywordBetween(Keyword): keyword = "between"
class KeywordBy(Keyword): keyword = "by"
class KeywordDesc(Token): keyword = "desc"
class KeywordFirst(Keyword): keyword = "first"
class KeywordFrom(Keyword): keyword = "from"

class SymbolPlus(Token): token = '+'
class SymbolMinus(Token): token = '-'
class SymbolParam(Token): token = '?'
class SymbolMult(Token): token = '*'
class SymbolDiv(Token): token = '/'
class SymbolMod(Token): token = '%'
# and on and on ...
Each precedence is a class too:
class Prec(parsing.Precedence): pass
class PrecUnary(Prec): "%right"
class PrecMult(Prec): "%left <PrecUnary"
class PrecAdd(Prec): "%left <PrecMult"
class PrecCompare(Prec): "%left <PrecAdd"
class PrecEquivalence(Prec): "%left <PrecCompare"
class PrecBetween(Prec): "%left <PrecEquivalence"
class PrecAnd(Prec): "%left <PrecBetween"
class PrecOr(Prec): "%left <PrecAnd"
class PrecIdent(Prec): "%left"
class PrecExpr(Prec): "%left <PrecIdent"
Better than a hand-written LL parser?
Then there's ply: an LALR(1) parser with an inbuilt tokeniser.
select a, b, c from t1 as A, t2 as B where x1 > x2 + 2 * 3 - 1 and x1 < 3 order by 1 asc, 2 desc
parse time=0.001505
Had I found ply, I would have used it.
class Lexer(object):
    reserved = {
        "and": "KeywordAnd",
        "as": "KeywordAs",
        "asc": "KeywordAsc",
        "by": "KeywordBy",
        # and so on ...
    }
    tokens = (
        # more lines like below ...
        "SymbolEquals", "SymbolStringCat",
        "TokenString", "TokenInteger",
        "TokenDotIdent", "TokenIdent",
    ) + tuple(reserved.itervalues())
    # more lines like below ...
    t_SymbolEquals = '=|=='
    t_SymbolStringCat = '\|\|'
    t_TokenString = "(?:'(?:[^']*)')+"
    t_TokenFloat = "[0-9]+[.][0-9]*|[.][0-9]+"
    t_TokenInteger = "[0-9]+"
    t_TokenDotIdent = '[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z][a-zA-Z_0-9]*)+|"[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z][a-zA-Z_0-9]*)+"'
    def t_TokenIdent(self, t):
        '[a-zA-Z_][a-zA-Z_0-9]*|"[a-zA-Z_][a-zA-Z_0-9]*"'
        t.type = self.reserved.get(t.value.lower(), 'TokenIdent')
        return t
class Parser(object):
    tokens = Lexer.tokens
    def p_NWhere(self, p):
        """NWhere : KeywordWhere NBoolExpr"""
        p[0] = ("NWhere",) + tuple(p[i] for i in range(1, len(p)))
    def p_NBoolExpr(self, p):
        """NBoolExpr : KeywordNot NBoolExpr %prec KeywordNotU
                     | NBoolExpr KeywordAnd NBoolExpr
                     | NBoolExpr KeywordOr NBoolExpr
                     | NArithExpr SymbolLessThan NArithExpr
                     | NArithExpr SymbolLessEqual NArithExpr
                     | NArithExpr SymbolGreaterEqual NArithExpr
                     | NArithExpr SymbolGreaterThan NArithExpr
                     | NArithExpr SymbolEquals NArithExpr
                     | NArithExpr SymbolNotEquals NArithExpr
                     | SymbolLeftParen NBoolExpr SymbolRightParen"""
        p[0] = ("NBoolExpr",) + tuple(p[i] for i in range(1, len(p)))
Lots of choices ...
Some acceptable.
Probably more ...
So why a talk about lrparsing?
lrparsing: an LR(1) parser with an inbuilt tokeniser.
select a, b, c from t1 as A, t2 as B where x1 > x2 + 2 * 3 - 1 and x1 < 3 order by 1 asc, 2 desc
parse time=0.000807
class Parser(Grammar):
    class T(TokenRegistry):
        float = Token(re="[0-9]+[.][0-9]*|[.][0-9]+")
        integer = Token(re="[0-9]+")
        ident = Token(re='[a-zA-Z_][a-zA-Z_0-9]*|"[a-zA-Z_][a-zA-Z_0-9]*"')
        dot_ident = Token(re='[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z_][a-zA-Z_0-9]*)+|"[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z][a-zA-Z_0-9]*)+"')
        string = Token(re="(?:'(?:[^']*)')+")
    Tk = lambda s, k=None: Tokens(s, k, case=False)
    K = lambda k: Keyword(k, case=False)
    NArithExpr = Ref("NArithExpr")
    NBoolExpr = Ref("NBoolExpr")
    NFuncCall = T.ident + '(' + List(NBoolExpr, ',') + ')'
    NArithExpr = Prio(
        T.ident | T.dot_ident | T.string | T.integer | T.float | '?' | NFuncCall,
        Tk("+ -") + THIS,
        THIS << Tk("* / %") << THIS,
        THIS << Tk("+ -") << THIS)
    NOrderByTerm = Prio(T.ident | T.integer, NArithExpr) + Opt * Tk("", "asc desc")
    NOrderBy = K("order") + K("by") + List(NOrderByTerm, ',', 1)
    NBoolExpr = Prio(
        '(' + THIS + ')',
        K("not") + THIS,
        NArithExpr + Tk("< <= >= > == = != <>") + NArithExpr,
        THIS << K("or") << THIS,
        THIS << K("and") << THIS)
    NFrom = K("from") + List(T.ident + Opt(K("as") + T.ident), ",", 1)
    NSelectColumn = Prio(
        (T.ident | T.dot_ident) + Opt(K("as") + T.ident) | '*',
        NArithExpr + Opt(K("as") + T.ident))
    NSelect = K("select") + List(NSelectColumn, ",", 1)
    NQuery = NSelect + NFrom + Opt(K("where") + NBoolExpr)
    NUnion = List(NQuery, K("union"), 1) + Opt * NOrderBy
    START = NUnion
We start with a string like this:
4 + -3 ** 2 / (1 ++ 2)
The whole string is an expr.
And we write that as:
(expr ....)
Next we break down the ... like this:
number=4 '+' expr
And we write that as:
(expr number=4 '+' (expr ...))
And we repeat:
expr '/' expr
(expr number=4 '+' (expr (expr ...) '/' (expr ...)))
So a parser is something that starts with a string:
4 + -3 ** 2 / (1 ++ 2)
And turns it into a parse tree:
(expr number=4 '+' (expr (expr '-' (expr number=3 '**' number=2)) '/' (expr '(' (expr number=1 '+' (expr '+' number=2)) ')')))
And with that parse tree, we could evaluate the expression according to the rules of arithmetic.
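A sketch of what such an evaluator could look like, over a made-up tuple-shaped tree (the node layout here is invented for illustration; lrparsing's real trees appear later):

    # A hypothetical tuple-shaped parse tree for "4 + 2 * 3".
    tree = ('expr', ('number', 4), '+',
            ('expr', ('number', 2), '*', ('number', 3)))

    def evaluate(node):
        # Leaves carry values; interior nodes look like (expr, lhs, op, rhs).
        if node[0] == 'number':
            return node[1]
        lhs, op, rhs = evaluate(node[1]), node[2], evaluate(node[3])
        return lhs + rhs if op == '+' else lhs * rhs

    print evaluate(tree)   # prints 10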
The tokeniser (aka lexer) names the smallest chunks:
» * is a '*',
» ** is a '**',
» ++ is two tokens: '+' followed by '+' (sketched below).
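A sketch of those decisions with a plain regular expression (not lrparsing's actual tokeniser): '**' is listed before '*' so it wins the longest match, while '++' has no alternative of its own, so it falls out as two '+' tokens.

    import re

    # Alternatives are tried left to right, so '**' must precede '*'.
    TOKEN_RE = re.compile(r"\*\*|[*+/()-]|[0-9]+")
    print re.findall(TOKEN_RE, "4 + -3 ** 2 / (1 ++ 2)")
    # ['4', '+', '-', '3', '**', '2', '/', '(', '1', '+', '+', '2', ')']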
A grammar that says what the language looks like.
A parser generator that compiles the grammar.
» The compiled form is called a parse table.
A parser. A small piece of code that:
» processes the tokenised input stream,
» according to a parse table,
» to produce a parse tree.
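To make that concrete, here is a toy shift/reduce driver for the grammar E = n | E '+' n, with a hand-built parse table. Purely illustrative: lrparsing builds the real tables for you.

    # ACTION maps (state, lookahead) to what the parser does next;
    # GOTO maps (state, nonterminal) to the state entered after a reduce.
    ACTION = {
        (0, 'n'): ('shift', 2),
        (1, '+'): ('shift', 3),
        (1, '$'): ('accept',),
        (2, '+'): ('reduce', 'E', 1), (2, '$'): ('reduce', 'E', 1),
        (3, 'n'): ('shift', 4),
        (4, '+'): ('reduce', 'E', 3), (4, '$'): ('reduce', 'E', 3),
    }
    GOTO = {(0, 'E'): 1}

    def lr_parse(tokens):
        tokens = list(tokens) + ['$']       # '$' marks end of input
        states, trees, i = [0], [], 0
        while True:
            action = ACTION.get((states[-1], tokens[i]))
            if action is None:
                raise SyntaxError("unexpected %r" % tokens[i])
            if action[0] == 'shift':
                states.append(action[1])
                trees.append(tokens[i])
                i += 1
            elif action[0] == 'reduce':
                _, lhs, n = action
                del states[-n:]                             # pop n states
                trees[-n:] = [(lhs,) + tuple(trees[-n:])]   # build a tree node
                states.append(GOTO[(states[-1], lhs)])
            else:
                return trees[0]

    print lr_parse(['n', '+', 'n', '+', 'n'])
    # ('E', ('E', ('E', 'n'), '+', 'n'), '+', 'n')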
The parse tree:
expr '/' expr
Suggests this pattern:
expr = expr '/' expr
This is called a production. A grammar is just the collection of all the productions making up the language:
expr = expr '+' expr
expr = expr '**' expr
expr = '+' expr
expr = '-' expr
expr = '(' expr ')'
expr = number
lrparsing uses Python expressions to represent its grammar.
from lrparsing import Grammar, THIS, Token

class Expr(Grammar):
    expr = (
        THIS + '/' + THIS
        | THIS + '+' + THIS
        | THIS + '**' + THIS
        | '+' + THIS
        | '-' + THIS
        | '(' + THIS + ')'
        | Token(re="[0-9]+"))
    START = expr
Internally tokens are recognised using a single large regular expression:
token1_re|token2_re|...
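The technique looks roughly like this in plain Python re (the group names are invented; lrparsing's internals will differ in detail):

    import re

    # One named alternative per token; lastgroup says which one matched.
    MASTER = re.compile(r"(?P<number>[0-9]+)|(?P<power>\*\*)|(?P<plus>\+)")
    for m in MASTER.finditer("1 ++ 2 ** 3"):
        print m.lastgroup, m.group()
    # number 1 / plus + / plus + / number 2 / power ** / number 3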
Token constructors:
Token(literal="**", [case=True])
, Token("**", [case=True])
, "**"
.Token(re="[a-zA-Z_][a-zA-Z0-9_]*")
Keyword("and", [case=True])
UnrecognisedToken()
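A minimal sketch of why Keyword exists: it is matched in preference to an overlapping re Token that would otherwise swallow the word (grammar invented for illustration):

    from lrparsing import Grammar, Keyword, Token

    class G(Grammar):
        ident = Token(re="[a-z]+")
        # "and" matches the ident re too; the Keyword takes precedence.
        START = ident + Keyword("and") + ident

    print G.repr_parse_tree(G.parse("cats and dogs"))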
from lrparsing import Grammar, THIS, Token

class Expr(Grammar):
    expr = (
        THIS + '/' + THIS
        | THIS + '+' + THIS
        | THIS + '**' + THIS
        | '+' + THIS
        | '-' + THIS
        | '(' + THIS + ')'
        | Token(re="[0-9]+"))
    START = expr

print Expr.repr_parse_tree(Expr.parse("4 + -3 ** 2 / (1 ++ 2)"))
While trying to recognise state 11:
  expr = '+' expr ^
  expr = expr ^ '+' expr
on seeing a '+' I could not decide between:
  replace the sequence ['+' expr] with expr
and
  accept the '+' in the hope it will match:
    expr = expr '+' ^ expr
Please remove this ambiguity from your grammar
While trying to recognise state 11:
  expr = '+' expr ^
  expr = expr ^ '+' expr
on seeing a '+' I could not decide ...
Substitute the Reduce into the Shift and we get:
The choice is between two parse trees:
('+' expr) '+' expr
'+' (expr '+' expr)
Rewrite the grammar to get rid of the ambiguity:
class Expr(Grammar):
    expr = Prio(
        '(' + THIS + ')' | Token(re="[0-9]+"),
        THIS + '**' + THIS,
        '+' + THIS | '-' + THIS,
        THIS + '/' + THIS,
        THIS + '+' + THIS)
    START = expr
lrparsing.GrammarError: Conflict: Shift/Reduce and no associativity
While trying to recognise state 22:
  expr.Prio.Prioritised4 = expr '+' expr ^
  expr.Prio.Prioritised4 = expr ^ '+' expr
on seeing a '+' I could not decide between:
  replace the sequence [expr '+' expr] with expr.Prio.Prioritised4
and
  accept the '+' in the hope it will match:
    expr.Prio.Prioritised4 = expr '+' ^ expr
Please remove this ambiguity from your grammar
Lrparsing has compiled your grammar into productions.
#!/usr/bin/python
from lrparsing import Grammar, Prio, THIS, Token

class Expr(Grammar):
    expr = Prio(
        '(' + THIS + ')' | Token(re="[0-9]+"),
        THIS + '**' + THIS,
        '+' + THIS | '-' + THIS,
        THIS + '/' + THIS,
        THIS + '+' + THIS)
    START = expr

try:
    Expr.compile_grammar()
except:
    print Expr.repr_productions()
    raise
print Expr.repr_parse_tree(Expr.parse("4 + -3 ** 2 / (1 ++ 2)"))
0    := START __end_of_input__
1    : START = expr
2    : expr = expr.Prio
3    : expr.Prio = expr.Prio.Prioritised0
       expr.Prio = expr.Prio.Prioritised1
       expr.Prio = expr.Prio.Prioritised2
       expr.Prio = expr.Prio.Prioritised3
       expr.Prio = expr.Prio.Prioritised4
4.1  : expr.Prio.Prioritised0 = '(' expr ')'
       expr.Prio.Prioritised0 = /[0-9]+/
5.1  : expr.Prio.Prioritised1 = expr '**' expr
6.1  : expr.Prio.Prioritised2 = '+' expr
       expr.Prio.Prioritised2 = '-' expr
7.1  : expr.Prio.Prioritised3 = expr '/' expr
8.1  : expr.Prio.Prioritised4 = expr '+' expr
Traceback (most recent call last):
  File "example-3.py", line 15, in <module>
    Expr.compile_grammar()
........
While trying to recognise state 23:
  expr.Prio.Prioritised3 = expr ^ '/' expr
  expr.Prio.Prioritised3 = expr '/' expr ^
on seeing a '/' I could not decide ...
Substitute the Reduce into the Shift and we get:
The choice is between two parse trees:
(expr '/' expr) '/' expr
expr '/' (expr '/' expr)
Rewrite the grammar to get rid of the ambiguity:
#!/usr/bin/python
from lrparsing import Grammar, Prio, THIS, Token

class Expr(Grammar):
    expr = Prio(
        '(' + THIS + ')' | Token(re="[0-9]+"),
        THIS >> '**' >> THIS,
        '+' + THIS | '-' + THIS,
        THIS << '/' << THIS,
        THIS << '+' << THIS)
    START = expr

print Expr.repr_parse_tree(Expr.parse("4 + -3 ** 2 / (1 ++ 2)"))
(START (expr (expr '4') '+' (expr (expr '-' (expr (expr '3') '**' (expr '2'))) '/' (expr '(' (expr (expr '1') '+' (expr '+' (expr '2'))) ')'))))
List(sym, delim, min=0, max=None, opt=False)
name = Ref("name")
expr = Ref("expr") call = ident + '(' + List(expr, ',') + ')' expr = number | call
Repeat(symbol, min=0, max=None)
list = '[' + List(expr, ',') + Repeat(',', 0, 1) + ']'
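For example, a sketch of a bracketed, comma separated list with at least one element:

    from lrparsing import Grammar, List, Token

    class Args(Grammar):
        arg = Token(re="[0-9]+")
        # List(arg, ',', 1): one or more args, ',' between each pair.
        START = '[' + List(arg, ',', 1) + ']'

    print Args.repr_parse_tree(Args.parse("[1, 2, 3]"))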
» Opt(symbol) or Opt * symbol
» Some(symbol) or Some * symbol
» Many(symbol) or Many * symbol
» symbol * N
» THIS * 0 (idiom for name = )
» symbol * 1 (idiom for name2 = name1)
» Choice(s0, s1, ...) or s0 | s1 | ...
» Left(s0 + s1 + ...) or s0 << s1 << ...
» Right(s0 + s1 + ...) or s0 >> s1 >> ...
» Sequence(s0, s1, ...) or s0 + s1 + ...
» Tokens(literals, keywords, case=True)
Tokens("== !=", "like in")
Is the same as:
Token("==") | Token("!=") | Keyword("like") | Keyword("in")
class Parser(Grammar):
    class T(TokenRegistry):
        float = Token(re="[0-9]+[.][0-9]*|[.][0-9]+")
        integer = Token(re="[0-9]+")
        ident = Token(re='[a-zA-Z_][a-zA-Z_0-9]*|"[a-zA-Z_][a-zA-Z_0-9]*"')
        dot_ident = Token(re='[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z_][a-zA-Z_0-9]*)+|"[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z][a-zA-Z_0-9]*)+"')
        string = Token(re="(?:'(?:[^']*)')+")
    Tk = lambda s, k=None: Tokens(s, k, case=False)
    K = lambda k: Keyword(k, case=False)
    NArithExpr = Ref("NArithExpr")
    NBoolExpr = Ref("NBoolExpr")
    NFuncCall = T.ident + '(' + List(NBoolExpr, ',') + ')'
    NArithExpr = Prio(
        T.ident | T.dot_ident | T.string | T.integer | T.float | '?' | NFuncCall,
        Tk("+ -") + THIS,
        THIS << Tk("* / %") << THIS,
        THIS << Tk("+ -") << THIS)
    NOrderByTerm = Prio(T.ident | T.integer, NArithExpr) + Opt * Tk("", "asc desc")
    NOrderBy = K("order") + K("by") + List(NOrderByTerm, ',', 1)
    NBoolExpr = Prio(
        '(' + THIS + ')',
        K("not") + THIS,
        NArithExpr + Tk("< <= >= > == = != <>") + NArithExpr,
        THIS << K("or") << THIS,
        THIS << K("and") << THIS)
    NFrom = K("from") + List(T.ident + Opt(K("as") + T.ident), ",", 1)
    NSelectColumn = Prio(
        (T.ident | T.dot_ident) + Opt(K("as") + T.ident) | '*',
        NArithExpr + Opt(K("as") + T.ident))
    NSelect = K("select") + List(NSelectColumn, ",", 1)
    NQuery = NSelect + NFrom + Opt(K("where") + NBoolExpr)
    NUnion = List(NQuery, K("union"), 1) + Opt * NOrderBy
    START = NUnion
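Using it is a single call (query invented for illustration):

    print Parser.repr_parse_tree(Parser.parse(
        "select a, b from t1 where a > 1 order by 1 desc"))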
$ python
Python 2.7.3 (default, Jan 2 2013, 13:56:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(object):
...     def x(self): pass
...     y = [x]
...
>>> X.x == X.x
True
>>> X.x == X.y[0]
False
>>> from lrparsing import Grammar, Token
>>> class G(Grammar):
...     START = Token('x')
...     y = [START]
...
>>> G.START == G.START
True
>>> G.START == G.y[0]
False
>>>
>>> from lrparsing import * >>> class Expr(Grammar): ... class T(TokenRegistry): ... a = Token('a') ... A = T.a ... B = A + Token('b') + Token('b') ... START = A | B >>> parse_tree = Expr.parse("a\n b b") >>> print Expr.repr_parse_tree(parse_tree, False) (START (B (A 'a') 'b' 'b')) >>> print repr(parse_tree) (START = A | B, (B = A + 'b' + 'b', (A = T.a='a', (T.a='a', 'a', 0, 1, 1)), \ ('b', 'b', 5, 2, 4), ('b', 'b', 8, 2, 7))) >>> print str(Expr.START) START >>> print repr(Expr.START) START = A | B >>> print repr(Expr.T.a) T.a='a' >>> parse_tree[0] is Expr.START True >>> parse_tree[1][1][1][0] is Expr.T.a True >>> isinstance(parse_tree[1][2][0], TokenSymbol) True >>>
>>> class E(Grammar): ... class T(TokenRegistry): ... u = UserToken() ... x = UnrecognisedToken() ... START = Repeat(T.u) + Repeat(Token('a') | T.x) ... COMMENTS = ( ... Token(re='#[^\n]*\n?') | ... Token(re=r'/\*(?:[^*]|\*[^/])*\*/')) ... WHITESPACE = "." >>> parse_tree = E.parse( ... ((E.T.u, 1), (E.T.u, 2, set()), "a.a.xxxa# some stuff")) >>> print E.repr_parse_tree(parse_tree) (START T.u T.u 'a' 'a' '.xxx' 'a') >>> print repr(parse_tree) (START = T.u * () + ('a' | T.x=UnrecognisedToken()) * (), (T.u, 1), \ (T.u, 2, set([])), ('a', 'a', 0, 1, 1), ('a', 'a', 2, 1, 3), \ (T.x=UnrecognisedToken(), '.xxx', 3, 1, 4), ('a', 'a', 7, 1, 8)) >>>
#!/usr/bin/python
import operator
from lrparsing import *

class E(Grammar):
    e = Ref("e")
    number = Token(re="[0-9]+")
    binary_expr = Prio(e << Token("/") << e, e << Token("+") << e)
    unary_expr = Token("-") + e
    e = Prio(number, unary_expr, binary_expr)
    START = e
    @classmethod
    def eval_node(cls, n):
        return n[1] if not "eval" in n[0] else n[0]["eval"](n)
    binary_ops = {'+': operator.add, '/': operator.floordiv}
    unary_ops = {'-': operator.neg}
    number["eval"] = lambda n: int(n[1], 10)
    binary_expr["eval"] = lambda n: E.binary_ops[n[2]](n[1], n[3])
    unary_expr["eval"] = lambda n: E.unary_ops[n[1]](n[2])

print E.parse("2 + 3 / -1", tree_factory=E.eval_node)
# prints -1: floordiv(3, -1) is -3, and 2 + -3 is -1
#!/usr/bin/python
import operator
from lrparsing import *

class E(Grammar):
    e = Ref("e")
    number = Token(re="[0-9]+")
    binary_expr = Prio(e << Token("/") << e, e << Token("+") << e)
    unary_expr = Token("-") + e
    e = Prio(number, unary_expr, binary_expr)
    START = e

class Node(tuple):
    value = None
    def __new__(cls, n):
        return super(Node, cls).__new__(cls, n)
    def __repr__(self):
        return E.repr_parse_tree(self, False)

class Compiler(object):
    def __call__(self, node):
        node = Node(node)
        name = node[0].name
        if not isinstance(node[0], TokenSymbol):
            node.value = node[1].value
        else:
            name = name.split(".")[-1]
        if name in self.__class__.__dict__:
            self.__class__.__dict__[name](self, node)
        return node
    def number(self, node):
        node.value = int(node[1][1], 10)
    binary_ops = {'+': operator.add, '/': operator.floordiv}
    def binary_expr(self, node):
        node.value = self.binary_ops[node[2][1]](node[1].value, node[3].value)
    unary_ops = {'-': operator.neg}
    def unary_expr(self, node):
        node.value = self.unary_ops[node[1][1]](node[2].value)

print E.parse("2 + 3 / -1", tree_factory=Compiler()).value   # prints -1
#!/usr/bin/python
import sys
from lrparsing import *

class E(Grammar):
    class T(TokenRegistry):
        number = Token(re="[0-9]+")
    e = Ref("e")
    binary_expr = Prio(e << Token("/") << e, e << Token("+") << e)
    unary_expr = Token("-") + e
    e = Prio(T.number, unary_expr, binary_expr)
    START = e

try:
    print E.repr_parse_tree(E.parse("""2 + 3 // -1"""))
except ParseError, e:
    sys.stderr.write("%s\n" % e)
line 1 column 8: Got '/' when expecting '-' or T.number while trying to match binary_expr in state 17
Parser generation can be eliminated by pre-compiling the grammar.
import sys
from lrparsing import *

class E(Grammar):
    e = Ref("e")
    binary_expr = Prio(e << Token("/") << e, e << Token("+") << e)
    unary_expr = Token("-") + e
    e = Prio(Token(re="[0-9]+"), unary_expr, binary_expr)
    START = e
    PRE_COMPILED = ('5709b73c7a329fb147636020987104248ebb2273bfb5d2153e30b452ba9645420e3e7e851f6288a663bb8445987692b5b020c5c1cfd863ef23587416bd0904b3', 0, ({'/[0-9]+/': 7, "'-'": 10}, {2: 1, 4: 2, 5: 3, 6: 4, 7: 5, 8: 6, 10: 8, 11: 9, 13: 11, 14: 12, 15: 13, 16: 14, 17: 15}, ['<E>']), ({'__end_of_input__': 16}, {}, ['<E>']), ({'__end_of_input__': (2, 1, 'START'), "'/'": 17, "'+'": 18}, {}, ['START', 'binary_expr']), ({'__end_of_input__': (4, 1, 'e'), "'/'": (4, 1, 'e'), "'+'": (4, 1, 'e')}, {}, ['e']), ({'__end_of_input__': (5, 1), "'/'": (5, 1), "'+'": (5, 1)}, {}, ['e']), ({'__end_of_input__': (5, 1), "'/'": (5, 1), "'+'": (5, 1)}, {}, ['e']), ({'__end_of_input__': (5, 1), "'/'": (5, 1), "'+'": (5, 1)}, {}, ['e']), ({'__end_of_input__': (6, 1), "'/'": (6, 1), "'+'": (6, 1)}, {}, ['e']), ({'__end_of_input__': (7, 1), "'/'": (7, 1), "'+'": (7, 1)}, {}, ['e']), ({'__end_of_input__': (8, 1), "'/'": (8, 1), "'+'": (8, 1)}, {}, ['e']), ({'/[0-9]+/': 7, "'-'": 10}, {4: 19, 5: 3, 6: 4, 7: 5, 8: 6, 10: 8, 11: 9, 13: 11, 14: 12, 15: 13, 16: 14, 17: 15}, ['unary_expr']), ({'__end_of_input__': (11, 1, 'binary_expr'), "'/'": (11, 1, 'binary_expr'), "'+'": (11, 1, 'binary_expr')}, {}, ['binary_expr']), ({'__end_of_input__': (13, 1), "'/'": (13, 1), "'+'": (13, 1)}, {}, ['binary_expr']), ({'__end_of_input__': (13, 1), "'/'": (13, 1), "'+'": (13, 1)}, {}, ['binary_expr']), ({'__end_of_input__': (14, 1), "'/'": (14, 1), "'+'": (14, 1)}, {}, ['binary_expr']), ({'__end_of_input__': (15, 1), "'/'": (15, 1), "'+'": (15, 1)}, {}, ['binary_expr']), ({'__empty__': (0, 2, '<E>')}, {}, ['<E>']), ({'/[0-9]+/': 7, "'-'": 10}, {4: 20, 5: 3, 6: 4, 7: 5, 8: 6, 10: 8, 11: 9, 13: 11, 14: 12, 15: 13, 16: 14, 17: 15}, ['binary_expr']), ({'/[0-9]+/': 7, "'-'": 10}, {4: 21, 5: 3, 6: 4, 7: 5, 8: 6, 10: 8, 11: 9, 13: 11, 14: 12, 15: 13, 16: 14, 17: 15}, ['binary_expr']), ({'__end_of_input__': (10, 2, 'unary_expr'), "'/'": (10, 2, 'unary_expr'), "'+'": (10, 2, 'unary_expr')}, {}, ['binary_expr', 'unary_expr']), ({'__end_of_input__': (16, 3), "'/'": (16, 3), "'+'": (16, 3)}, {}, ['binary_expr']), ({'__end_of_input__': (17, 3), "'/'": 17, "'+'": (17, 3)}, {}, ['binary_expr']))

print E.pre_compile_grammar(E.PRE_COMPILED)
pre_compile_grammar() returns None if the passed string matches the grammar, and a string containing the new parse table otherwise.
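So a build-time check might look like this (a sketch based on the behaviour described above):

    table = E.pre_compile_grammar(E.PRE_COMPILED)
    if table is not None:
        # The grammar changed: paste the new table into PRE_COMPILED.
        print "PRE_COMPILED is stale; replace it with:"
        print table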
Lrparsing offers a few other utilities for debugging grammars.
» pre_compile_grammar(pre_compiled=None)
» parse(input, tree_factory=None, on_error=None, log=None)
» repr_grammar() - the grammar as entered.
» repr_productions() - compiled productions.
» repr_parse_table(state=None) - the LR(1) parse table (ItemSets).
» repr_parse_tree(parse_tree) - pretty print a parse tree
» unused_rules() - frozenset() of unused rules
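For example, pointed at the Expr grammar from earlier:

    print Expr.repr_grammar()       # the rules as entered
    print Expr.repr_productions()   # the compiled productions
    print Expr.repr_parse_table()   # the LR(1) ItemSets, state by state
    print Expr.unused_rules()       # ideally frozenset([])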
When on_error is given, parse() calls it instead of raising an exception. It is only useful if you want to report on more than one error.
lrparsing.on_error(iterator, input_tuple, stack)
The stack is a partially constructed branch of the parse tree:
([], (__empty__))
([START,binary_expr], ((e '2')))
([binary_expr], ('+'))
([binary_expr], ((e '3')))
([binary_expr], ('/'))
The parse tree this stack eventually builds:
(START (e (binary_expr (e '2') '+' (e (binary_expr (e '3') '/' (e (unary_expr '-' (e '1'))))))))
Home page:
http://lrparsing.sourceforge.net.
PyPI page:
https://pypi.python.org/pypi/lrparsing.
Online manual:
http://lrparsing.sourceforge.net/doc.
And examples:
» Expression evaluator.
» Sqlite3 data manipulation statement grammar.
» Lua 5.2 to Python 2.7 compiler.