Introducing ...
lrparsing.py
Russell Stuart
... Surely?
First stop: pyparsing.
Has an example grammar for Sqlite ...
... and has nice syntax:
from pyparsing import *

integer = Word(nums).setParseAction(lambda t: int(t[0]))
variable = Word(alphas, exact=1)
operand = integer | variable
expop, signop, multop = Literal('^'), oneOf('+ -'), oneOf('* /')
plusop, factop = oneOf('+ -'), Literal('!')
expr = operatorPrecedence(operand,
    [("!", 1, opAssoc.LEFT),
     ("^", 2, opAssoc.RIGHT),
     (signop, 1, opAssoc.RIGHT),
     (multop, 2, opAssoc.LEFT),
     (plusop, 2, opAssoc.LEFT)])
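Running it is a one-liner (a sketch; parseString is pyparsing's standard entry point, and the nesting shown is indicative):

print expr.parseString("9 + 2 * 3")   # groups by precedence, roughly [[9, '+', [2, '*', 3]]]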
- But the example Sqlite grammar is buggy.
- Poor free documentation - buy the book.
- And it's absurdly slow:
$ ./pyparsing-sqlite.py
select * from xyzzy where z > 100
parse time=0.389035
select * from xyzzy where z > 100 order by zz
parse time=0.646955
select * from xyzzy
parse time=0.001180
select a, b, c from t1 as A, t2 as B
where x1 > x2 + 2 * 3 - 1 and x1 < 3
order by 1 asc, 2 desc
parse time=1.323233
select a, b, c from t1 as A, t2 as B
where t1.a > t2.b + 2 * 3 - 1 and t1 < ?
order by 1 asc, 2 desc, t1.a
parse time=0.001747
A sane person probably would have stopped at this point.
Next stop: the parsing module.
An LR(1) parser generator with LR(1) and GLR(1) parsers.
select a, b, c from t1 as A, t2 as B
where t1.a > t2.b + 2 * 3 - 1 and t1 < ?
order by 1 asc, 2 desc, t1.a
parse time=0.004702
Grammar definition:
class NArithExpr(NonTerm):
    @NonTerm.reduce
    def reduceA(self, expr1):
        "%reduce NUnaryExpr"
        self.HIDDEN = True
    @NonTerm.reduce
    def reduceB(self, expr1, mult, expr2):
        "%reduce NArithExpr SymbolMult NArithExpr [PrecMult]"
    @NonTerm.reduce
    def reduceC(self, expr1, div, expr2):
        "%reduce NArithExpr SymbolDiv NArithExpr [PrecMult]"
    @NonTerm.reduce
    def reduceD(self, expr1, mod, expr2):
        "%reduce NArithExpr SymbolMod NArithExpr [PrecMult]"
    @NonTerm.reduce
    def reduceE(self, expr1, plus, expr2):
        "%reduce NArithExpr SymbolPlus NArithExpr [PrecAdd]"
    @NonTerm.reduce
    def reduceF(self, expr1, minus, expr2):
        "%reduce NArithExpr SymbolMinus NArithExpr [PrecAdd]"
    @NonTerm.reduce
    def reduceG(self, atom):
        "%reduce NAtom"
        self.HIDDEN = True
Tokens have to be defined individually:
class Token(parsing.Token):
    __metaclass__ = TokenMeta
    #
    # ... etc ...
    #
class Keyword(Token): SUPPRESS = True
class KeywordAnd(Keyword): keyword = "and"
class KeywordAs(Keyword): keyword = "as"
class KeywordAsc(Token): keyword = "asc"
class KeywordBetween(Keyword): keyword = "between"
class KeywordBy(Keyword): keyword = "by"
class KeywordDesc(Token): keyword = "desc"
class KeywordFirst(Keyword): keyword = "first"
class KeywordFrom(Keyword): keyword = "from"
class SymbolPlus(Token): token = '+'
class SymbolMinus(Token): token = '-'
class SymbolParam(Token): token = '?'
class SymbolMult(Token): token = '*'
class SymbolDiv(Token): token = '/'
class SymbolMod(Token): token = '%'
# and on and on ...
Each precedence is a class too:
class Prec(parsing.Precedence):
    pass
class PrecUnary(Prec):
    "%right"
class PrecMult(Prec):
    "%left <PrecUnary"
class PrecAdd(Prec):
    "%left <PrecMult"
class PrecCompare(Prec):
    "%left <PrecAdd"
class PrecEquivalence(Prec):
    "%left <PrecCompare"
class PrecBetween(Prec):
    "%left <PrecEquivalence"
class PrecAnd(Prec):
    "%left <PrecBetween"
class PrecOr(Prec):
    "%left <PrecAnd"
class PrecIdent(Prec):
    "%left"
class PrecExpr(Prec):
    "%left <PrecIdent"
Better than a hand-written LL parser?
ply: an LALR(1) parser with an inbuilt tokeniser.
select a, b, c from t1 as A, t2 as B
where x1 > x2 + 2 * 3 - 1 and x1 < 3
order by 1 asc, 2 desc
parse time=0.001505
Had I found ply, I would have used it.
class Lexer(object):
    reserved = {
        "and": "KeywordAnd",
        "as": "KeywordAs",
        "asc": "KeywordAsc",
        "by": "KeywordBy",
        # and so on ...
    }
    tokens = (
        # more lines like below ...
        "SymbolEquals",
        "SymbolStringCat",
        "TokenString",
        "TokenInteger",
        "TokenDotIdent",
        "TokenIdent",) + tuple(reserved.itervalues())
    # more lines like below ...
    t_SymbolEquals = '=|=='
    t_SymbolStringCat = r'\|\|'
    t_TokenString = "(?:'(?:[^']*)')+"
    t_TokenFloat = "[0-9]+[.][0-9]*|[.][0-9]+"
    t_TokenInteger = "[0-9]+"
    t_TokenDotIdent = '[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z][a-zA-Z_0-9]*)+|"[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z][a-zA-Z_0-9]*)+"'
    def t_TokenIdent(self, t):
        '[a-zA-Z_][a-zA-Z_0-9]*|"[a-zA-Z_][a-zA-Z_0-9]*"'
        t.type = self.reserved.get(t.value.lower(), 'TokenIdent')
        return t
class Parser(object):
    tokens = Lexer.tokens
    def p_NWhere(self, p):
        """NWhere : KeywordWhere NBoolExpr"""
        p[0] = ("NWhere",) + tuple(p[i] for i in range(1, len(p)))
    def p_NBoolExpr(self, p):
        """NBoolExpr : KeywordNot NBoolExpr %prec KeywordNotU
                     | NBoolExpr KeywordAnd NBoolExpr
                     | NBoolExpr KeywordOr NBoolExpr
                     | NArithExpr SymbolLessThan NArithExpr
                     | NArithExpr SymbolLessEqual NArithExpr
                     | NArithExpr SymbolGreaterEqual NArithExpr
                     | NArithExpr SymbolGreaterThan NArithExpr
                     | NArithExpr SymbolEquals NArithExpr
                     | NArithExpr SymbolNotEquals NArithExpr
                     | SymbolLeftParen NBoolExpr SymbolRightParen"""
        p[0] = ("NBoolExpr",) + tuple(p[i] for i in range(1, len(p)))
Lots of choices ... some acceptable, and probably more besides.
So why a talk about lrparsing?
lrparsing: an LR(1) parser with an inbuilt tokeniser.
select a, b, c from t1 as A, t2 as B
where x1 > x2 + 2 * 3 - 1 and x1 < 3
order by 1 asc, 2 desc
parse time=0.000807
class Parser(Grammar):
    class T(TokenRegistry):
        float = Token(re="[0-9]+[.][0-9]*|[.][0-9]+")
        integer = Token(re="[0-9]+")
        ident = Token(re='[a-zA-Z_][a-zA-Z_0-9]*|"[a-zA-Z_][a-zA-Z_0-9]*"')
        dot_ident = Token(re='[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z_][a-zA-Z_0-9]*)+|"[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z][a-zA-Z_0-9]*)+"')
        string = Token(re="(?:'(?:[^']*)')+")
    Tk = lambda s, k=None: Tokens(s, k, case=False)
    K = lambda k: Keyword(k, case=False)
    NArithExpr = Ref("NArithExpr")
    NBoolExpr = Ref("NBoolExpr")
    NFuncCall = T.ident + '(' + List(NBoolExpr, ',') + ')'
    NArithExpr = Prio(
        T.ident | T.dot_ident | T.string | T.integer | T.float | '?' | NFuncCall,
        Tk("+ -") + THIS,
        THIS << Tk("* / %") << THIS,
        THIS << Tk("+ -") << THIS)
    NOrderByTerm = Prio(T.ident | T.integer, NArithExpr) + Opt * Tk("", "asc desc")
    NOrderBy = K("order") + K("by") + List(NOrderByTerm, ',', 1)
    NBoolExpr = Prio(
        '(' + THIS + ')',
        K("not") + THIS,
        NArithExpr + Tk("< <= >= > == = != <>") + NArithExpr,
        THIS << K("or") << THIS,
        THIS << K("and") << THIS)
    NFrom = K("from") + List(T.ident + Opt(K("as") + T.ident), ",", 1)
    NSelectColumn = Prio(
        (T.ident | T.dot_ident) + Opt(K("as") + T.ident) | '*',
        NArithExpr + Opt(K("as") + T.ident))
    NSelect = K("select") + List(NSelectColumn, ",", 1)
    NQuery = NSelect + NFrom + Opt(K("where") + NBoolExpr)
    NUnion = List(NQuery, K("union"), 1) + Opt * NOrderBy
    START = NUnion
We start with a string like this:
4 + -3 ** 2 / (1 ++ 2)
The entire string is an expr, and we write that as:
(expr ...)
Next we break down the ... like this:
number=4 '+' expr
And we write that as:
(expr number=4 '+' (expr ...))
And we repeat, breaking the remaining expr into expr '/' expr:
(expr number=4 '+' (expr (expr ...) '/' (expr ...)))
So a parser is something that starts with a string:
4 + -3 ** 2 / (1 ++ 2)
And turns it into a parse tree:
(expr
  number=4
  '+'
  (expr
    (expr '-' (expr number=3 '**' number=2))
    '/'
    (expr '(' (expr number=1 '+' (expr '+' number=2)) ')')))
And with that parse tree, we could evaluate the expression according to the rules of arithmetic.
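For instance, a dozen lines of Python can fold such a tree into a number. This is only a sketch of the idea, with the tree spelt as nested tuples rather than lrparsing's real parse tree type:

def evaluate(node):
    # A node is ('expr', child, child, ...); children are ints,
    # operator/paren strings, or nested 'expr' tuples.
    kids = [evaluate(k) if isinstance(k, tuple) else k for k in node[1:]]
    if len(kids) == 1:                      # expr = number
        return kids[0]
    if kids[0] == '(':                      # expr = '(' expr ')'
        return kids[1]
    if len(kids) == 2:                      # expr = '+' expr | '-' expr
        return kids[1] if kids[0] == '+' else -kids[1]
    left, op, right = kids                  # expr = expr op expr
    return {'+': lambda a, b: a + b, '/': lambda a, b: a / b,
            '**': lambda a, b: a ** b}[op](left, right)

tree = ('expr', 4, '+',
        ('expr', ('expr', '-', ('expr', 3, '**', 2)), '/',
         ('expr', '(', ('expr', 1, '+', ('expr', '+', 2)), ')')))
print evaluate(tree)   # 4 + -9 / 3, which prints 1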
The tokeniser (aka lexer) names the smallest chunks.
» * is a '*'
» ** is a '**'
» ++ is two tokens: '+' followed by '+'.
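Longest match wins. The same behaviour drops out of a plain regular expression alternation that lists the longer operators first (a sketch using re, not lrparsing's tokeniser):

import re
# '**' is tried before '*', so '**' comes out as one token; there is no
# '++' alternative, so '++' can only come out as two '+' tokens.
tokens = re.compile(r"\*\*|\*|\+|-|/|\(|\)|[0-9]+")
print tokens.findall("4 + -3 ** 2 / (1 ++ 2)")
# ['4', '+', '-', '3', '**', '2', '/', '(', '1', '+', '+', '2', ')']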
A grammar that says what the language looks like.
A parser generator that compiles the grammar.
» The compiled form is called a parse table.
A parser. A small piece of code that:
» Processes the tokenised input stream,
» according to a parse table,
» to produce a parse tree.
The parse tree:
expr '/' expr
Suggests this pattern:
expr = expr '/' expr
This is called a production. A grammar is just the collection of all the productions making up the language:
expr = expr '+' expr
expr = expr '**' expr
expr = '+' expr
expr = '-' expr
expr = '(' expr ')'
expr = number
lrparsing uses Python expressions to represent its grammar:
from lrparsing import Grammar, THIS, Token

class Expr(Grammar):
    expr = (
        THIS + '/' + THIS |
        THIS + '+' + THIS |
        THIS + '**' + THIS |
        '+' + THIS |
        '-' + THIS |
        '(' + THIS + ')' |
        Token(re="[0-9]+"))
    START = expr
Internally tokens are recognised using a single large regular expression:
token1_re|token2_re|...
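A sketch of the idea (not lrparsing's actual internals): wrap each token's regex in a named group, join them with |, and the group name tells you which token matched:

import re
token_res = [
    ("float", r"[0-9]+[.][0-9]*|[.][0-9]+"),   # listed before integer,
    ("integer", r"[0-9]+"),                    # so "12.5" is not split up
    ("plus", r"\+"),
]
master = re.compile("|".join("(?P<%s>%s)" % t for t in token_res))
for m in master.finditer("12.5 + 42"):
    print m.lastgroup, m.group()
# float 12.5 / plus + / integer 42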
Token constructors:
» Token(literal="**", [case=True]), Token("**", [case=True]), or just "**"
» Token(re="[a-zA-Z_][a-zA-Z0-9_]*")
» Keyword("and", [case=True])
» UnrecognisedToken()
from lrparsing import Grammar, THIS, Token

class Expr(Grammar):
    expr = (
        THIS + '/' + THIS | THIS + '+' + THIS | THIS + '**' + THIS |
        '+' + THIS | '-' + THIS | '(' + THIS + ')' |
        Token(re="[0-9]+"))
    START = expr

print Expr.repr_parse_tree(Expr.parse("4 + -3 ** 2 / (1 ++ 2)"))
While trying to recognise state 11:
    expr = '+' expr ^
    expr = expr ^ '+' expr
on seeing a '+' I could not decide between:
    replace the sequence ['+' expr] with expr
and
    accept the '+' in the hope it will match:
        expr = expr '+' ^ expr
Please remove this ambiguity from your grammar
While trying to recognise state 11:
    expr = '+' expr ^
    expr = expr ^ '+' expr
on seeing a '+' I could not decide ...
Substitute the Reduce into the Shift and we get:
The choice is between two parse trees:
('+' expr) '+' expr
'+' (expr '+' expr)
Rewrite the grammar to get rid of the ambiguity:
class Expr(Grammar):
    expr = Prio(
        '(' + THIS + ')' | Token(re="[0-9]+"),
        THIS + '**' + THIS,
        '+' + THIS | '-' + THIS,
        THIS + '/' + THIS,
        THIS + '+' + THIS)
    START = expr
lrparsing.GrammarError: Conflict: Shift/Reduce and no associativity
While trying to recognise state 22:
    expr.Prio.Prioritised4 = expr '+' expr ^
    expr.Prio.Prioritised4 = expr ^ '+' expr
on seeing a '+' I could not decide between:
    replace the sequence [expr '+' expr] with expr.Prio.Prioritised4
and
    accept the '+' in the hope it will match:
        expr.Prio.Prioritised4 = expr '+' ^ expr
Please remove this ambiguity from your grammar
Lrparsing has compiled your grammar into productions.
#!/usr/bin/python
from lrparsing import Grammar, Prio, THIS, Token

class Expr(Grammar):
    expr = Prio(
        '(' + THIS + ')' | Token(re="[0-9]+"),
        THIS + '**' + THIS,
        '+' + THIS | '-' + THIS,
        THIS + '/' + THIS,
        THIS + '+' + THIS)
    START = expr

try:
    Expr.compile_grammar()
except:
    print Expr.repr_productions()
    raise
print Expr.repr_parse_tree(Expr.parse("4 + -3 ** 2 / (1 ++ 2)"))
0   : <E> = START __end_of_input__
1   : START = expr
2   : expr = expr.Prio
3   : expr.Prio = expr.Prio.Prioritised0
      expr.Prio = expr.Prio.Prioritised1
      expr.Prio = expr.Prio.Prioritised2
      expr.Prio = expr.Prio.Prioritised3
      expr.Prio = expr.Prio.Prioritised4
4.1 : expr.Prio.Prioritised0 = '(' expr ')'
      expr.Prio.Prioritised0 = /[0-9]+/
5.1 : expr.Prio.Prioritised1 = expr '**' expr
6.1 : expr.Prio.Prioritised2 = '+' expr
      expr.Prio.Prioritised2 = '-' expr
7.1 : expr.Prio.Prioritised3 = expr '/' expr
8.1 : expr.Prio.Prioritised4 = expr '+' expr
Traceback (most recent call last):
  File "example-3.py", line 15, in <module>
    Expr.compile_grammar()
........
While trying to recognise state 23:
    expr.Prio.Prioritised3 = expr ^ '/' expr
    expr.Prio.Prioritised3 = expr '/' expr ^
on seeing a '/' I could not decide ...
Substitute the Reduce into the Shift and we get:
The choice is between two parse trees:
(expr '/' expr) '/' expr
expr '/' (expr '/' expr)
Rewrite the grammar to get rid of the ambiguity:
#!/usr/bin/python
from lrparsing import Grammar, Prio, THIS, Token

class Expr(Grammar):
    expr = Prio(
        '(' + THIS + ')' | Token(re="[0-9]+"),
        THIS >> '**' >> THIS,
        '+' + THIS | '-' + THIS,
        THIS << '/' << THIS,
        THIS << '+' << THIS)
    START = expr

print Expr.repr_parse_tree(Expr.parse("4 + -3 ** 2 / (1 ++ 2)"))
(START (expr
    (expr '4') '+'
    (expr
        (expr '-' (expr
            (expr '3') '**'
            (expr '2'))) '/'
        (expr '('
            (expr
                (expr '1') '+'
                (expr '+' (expr '2'))) ')'))))
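The difference << and >> make is easiest to see on a chain of one operator. A small sketch of my own, following the same pattern as above:

from lrparsing import Grammar, Prio, THIS, Token

class LeftG(Grammar):
    expr = Prio(Token(re="[0-9]+"), THIS << '-' << THIS)   # left associative
    START = expr

class RightG(Grammar):
    expr = Prio(Token(re="[0-9]+"), THIS >> '-' >> THIS)   # right associative
    START = expr

print LeftG.repr_parse_tree(LeftG.parse("8 - 4 - 2"))    # groups as (8 - 4) - 2
print RightG.repr_parse_tree(RightG.parse("8 - 4 - 2"))  # groups as 8 - (4 - 2)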
» List(sym, delim, min=0, max=None, opt=False)
» name = Ref("name") - a forward reference, resolved by a later assignment to name:
expr = Ref("expr")
call = ident + '(' + List(expr, ',') + ')'
expr = number | call
» Repeat(symbol, min=0, max=None)
list = '[' + List(expr, ',') + Repeat(',', 0, 1) + ']'
» Opt(symbol) or Opt * symbol
» Some(symbol) or Some * symbol
» Many(symbol) or Many * symbol
» symbol * N
» THIS * 0 (idiom for name = )
» symbol * 1 (idiom for name2 = name1)
» Choice(s0, s1, ...) or s0 | s1 | ...
» Left(s0 + s1 + ...) or s0 << s1 << ...
» Right(s0 + s1 + ...) or s0 >> s1 >> ...
» Sequence(s0, s1, ...) or s0 + s1 + ...
» Tokens(literals, keywords, case=True)
Tokens("== !=", "like in")
Is the same as:
Token("==") | Token("!=") | Keyword("like") | Keyword("in")
class Parser(Grammar):
    class T(TokenRegistry):
        float = Token(re="[0-9]+[.][0-9]*|[.][0-9]+")
        integer = Token(re="[0-9]+")
        ident = Token(re='[a-zA-Z_][a-zA-Z_0-9]*|"[a-zA-Z_][a-zA-Z_0-9]*"')
        dot_ident = Token(re='[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z_][a-zA-Z_0-9]*)+|"[a-zA-Z_][a-zA-Z_0-9]*(?:[.][a-zA-Z][a-zA-Z_0-9]*)+"')
        string = Token(re="(?:'(?:[^']*)')+")
    Tk = lambda s, k=None: Tokens(s, k, case=False)
    K = lambda k: Keyword(k, case=False)
    NArithExpr = Ref("NArithExpr")
    NBoolExpr = Ref("NBoolExpr")
    NFuncCall = T.ident + '(' + List(NBoolExpr, ',') + ')'
    NArithExpr = Prio(
        T.ident | T.dot_ident | T.string | T.integer | T.float | '?' | NFuncCall,
        Tk("+ -") + THIS,
        THIS << Tk("* / %") << THIS,
        THIS << Tk("+ -") << THIS)
    NOrderByTerm = Prio(T.ident | T.integer, NArithExpr) + Opt * Tk("", "asc desc")
    NOrderBy = K("order") + K("by") + List(NOrderByTerm, ',', 1)
    NBoolExpr = Prio(
        '(' + THIS + ')',
        K("not") + THIS,
        NArithExpr + Tk("< <= >= > == = != <>") + NArithExpr,
        THIS << K("or") << THIS,
        THIS << K("and") << THIS)
    NFrom = K("from") + List(T.ident + Opt(K("as") + T.ident), ",", 1)
    NSelectColumn = Prio(
        (T.ident | T.dot_ident) + Opt(K("as") + T.ident) | '*',
        NArithExpr + Opt(K("as") + T.ident))
    NSelect = K("select") + List(NSelectColumn, ",", 1)
    NQuery = NSelect + NFrom + Opt(K("where") + NBoolExpr)
    NUnion = List(NQuery, K("union"), 1) + Opt * NOrderBy
    START = NUnion
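Using it is a one-liner; parse() compiles the grammar on first use (a usage sketch):

print Parser.repr_parse_tree(Parser.parse(
    "select a, b from t1 where a > b + 1 order by 1 desc"))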
$ python
Python 2.7.3 (default, Jan 2 2013, 13:56:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class X(object):
...     def x(self): pass
...     y = [x]
>>> X.x == X.x
True
>>> X.x == X.y[0]
False
>>> from lrparsing import Grammar, Token
>>> class G(Grammar):
...     START = Token('x')
...     y = [START]
>>> G.START == G.START
True
>>> G.START == G.y[0]
False
>>>
>>> from lrparsing import *
>>> class Expr(Grammar):
...     class T(TokenRegistry):
...         a = Token('a')
...     A = T.a
...     B = A + Token('b') + Token('b')
...     START = A | B
>>> parse_tree = Expr.parse("a\n b b")
>>> print Expr.repr_parse_tree(parse_tree, False)
(START (B (A 'a') 'b' 'b'))
>>> print repr(parse_tree)
(START = A | B, (B = A + 'b' + 'b', (A = T.a='a', (T.a='a', 'a', 0, 1, 1)), \
('b', 'b', 5, 2, 4), ('b', 'b', 8, 2, 7)))
>>> print str(Expr.START)
START
>>> print repr(Expr.START)
START = A | B
>>> print repr(Expr.T.a)
T.a='a'
>>> parse_tree[0] is Expr.START
True
>>> parse_tree[1][1][1][0] is Expr.T.a
True
>>> isinstance(parse_tree[1][2][0], TokenSymbol)
True
>>>
>>> class E(Grammar):
...     class T(TokenRegistry):
...         u = UserToken()
...         x = UnrecognisedToken()
...     START = Repeat(T.u) + Repeat(Token('a') | T.x)
...     COMMENTS = (
...         Token(re='#[^\n]*\n?') |
...         Token(re=r'/\*(?:[^*]|\*[^/])*\*/'))
...     WHITESPACE = "."
>>> parse_tree = E.parse(
... ((E.T.u, 1), (E.T.u, 2, set()), "a.a.xxxa# some stuff"))
>>> print E.repr_parse_tree(parse_tree)
(START T.u T.u 'a' 'a' '.xxx' 'a')
>>> print repr(parse_tree)
(START = T.u * () + ('a' | T.x=UnrecognisedToken()) * (), (T.u, 1), \
(T.u, 2, set([])), ('a', 'a', 0, 1, 1), ('a', 'a', 2, 1, 3), \
(T.x=UnrecognisedToken(), '.xxx', 3, 1, 4), ('a', 'a', 7, 1, 8))
>>>
#!/usr/bin/python
import operator
from lrparsing import *

class E(Grammar):
    e = Ref("e")
    number = Token(re="[0-9]+")
    binary_expr = Prio(e << Token("/") << e, e << Token("+") << e)
    unary_expr = Token("-") + e
    e = Prio(number, unary_expr, binary_expr)
    START = e
    @classmethod
    def eval_node(cls, n):
        # Tokens and rules without an "eval" pass their value through;
        # rules with an "eval" compute their value from their children.
        return n[1] if "eval" not in n[0] else n[0]["eval"](n)
    binary_ops = {'+': operator.add, '/': operator.floordiv}
    unary_ops = {'-': operator.neg}
    number["eval"] = lambda n: int(n[1], 10)
    binary_expr["eval"] = lambda n: E.binary_ops[n[2]](n[1], n[3])
    unary_expr["eval"] = lambda n: E.unary_ops[n[1]](n[2])

print E.parse("2 + 3 / -1", tree_factory=E.eval_node)  # prints -1
#!/usr/bin/python
import operator
from lrparsing import *

class E(Grammar):
    e = Ref("e")
    number = Token(re="[0-9]+")
    binary_expr = Prio(e << Token("/") << e, e << Token("+") << e)
    unary_expr = Token("-") + e
    e = Prio(number, unary_expr, binary_expr)
    START = e

class Node(tuple):
    value = None
    def __new__(cls, n):
        return super(Node, cls).__new__(cls, n)
    def __repr__(self):
        return E.repr_parse_tree(self, False)

class Compiler(object):
    def __call__(self, node):
        node = Node(node)
        name = node[0].name
        if not isinstance(node[0], TokenSymbol):
            node.value = node[1].value    # default: inherit the child's value
        else:
            name = name.split(".")[-1]
        if name in self.__class__.__dict__:
            self.__class__.__dict__[name](self, node)
        return node
    def number(self, node):
        node.value = int(node[1][1], 10)
    binary_ops = {'+': operator.add, '/': operator.floordiv}
    def binary_expr(self, node):
        node.value = self.binary_ops[node[2][1]](node[1].value, node[3].value)
    unary_ops = {'-': operator.neg}
    def unary_expr(self, node):
        node.value = self.unary_ops[node[1][1]](node[2].value)

print E.parse("2 + 3 / -1", tree_factory=Compiler()).value
#!/usr/bin/python
import sys
from lrparsing import *

class E(Grammar):
    class T(TokenRegistry):
        number = Token(re="[0-9]+")
    e = Ref("e")
    binary_expr = Prio(e << Token("/") << e, e << Token("+") << e)
    unary_expr = Token("-") + e
    e = Prio(T.number, unary_expr, binary_expr)
    START = e

try:
    print E.repr_parse_tree(E.parse("""2 + 3 // -1"""))
except ParseError, e:
    sys.stderr.write("%s\n" % e)
line 1 column 8: Got '/' when expecting '-' or T.number while trying
to match binary_expr in state 17
The parse table generation step can be eliminated by pre-compiling the grammar:
import sys
from lrparsing import *

class E(Grammar):
    e = Ref("e")
    binary_expr = Prio(e << Token("/") << e, e << Token("+") << e)
    unary_expr = Token("-") + e
    e = Prio(Token(re="[0-9]+"), unary_expr, binary_expr)
    START = e
    PRE_COMPILED = ('5709b73c7a329fb147636020987104248ebb2273bfb5d2153e30b452ba9645420e3e7e851f6288a663bb8445987692b5b020c5c1cfd863ef23587416bd0904b3', 0, ({'/[0-9]+/': 7, "'-'": 10}, {2: 1, 4: 2, 5: 3, 6: 4, 7: 5, 8: 6, 10: 8, 11: 9, 13: 11, 14: 12, 15: 13, 16: 14, 17: 15}, ['<E>']), ({'__end_of_input__': 16}, {}, ['<E>']), ({'__end_of_input__': (2, 1, 'START'), "'/'": 17, "'+'": 18}, {}, ['START', 'binary_expr']), ({'__end_of_input__': (4, 1, 'e'), "'/'": (4, 1, 'e'), "'+'": (4, 1, 'e')}, {}, ['e']), ({'__end_of_input__': (5, 1), "'/'": (5, 1), "'+'": (5, 1)}, {}, ['e']), ({'__end_of_input__': (5, 1), "'/'": (5, 1), "'+'": (5, 1)}, {}, ['e']), ({'__end_of_input__': (5, 1), "'/'": (5, 1), "'+'": (5, 1)}, {}, ['e']), ({'__end_of_input__': (6, 1), "'/'": (6, 1), "'+'": (6, 1)}, {}, ['e']), ({'__end_of_input__': (7, 1), "'/'": (7, 1), "'+'": (7, 1)}, {}, ['e']), ({'__end_of_input__': (8, 1), "'/'": (8, 1), "'+'": (8, 1)}, {}, ['e']), ({'/[0-9]+/': 7, "'-'": 10}, {4: 19, 5: 3, 6: 4, 7: 5, 8: 6, 10: 8, 11: 9, 13: 11, 14: 12, 15: 13, 16: 14, 17: 15}, ['unary_expr']), ({'__end_of_input__': (11, 1, 'binary_expr'), "'/'": (11, 1, 'binary_expr'), "'+'": (11, 1, 'binary_expr')}, {}, ['binary_expr']), ({'__end_of_input__': (13, 1), "'/'": (13, 1), "'+'": (13, 1)}, {}, ['binary_expr']), ({'__end_of_input__': (13, 1), "'/'": (13, 1), "'+'": (13, 1)}, {}, ['binary_expr']), ({'__end_of_input__': (14, 1), "'/'": (14, 1), "'+'": (14, 1)}, {}, ['binary_expr']), ({'__end_of_input__': (15, 1), "'/'": (15, 1), "'+'": (15, 1)}, {}, ['binary_expr']), ({'__empty__': (0, 2, '<E>')}, {}, ['<E>']), ({'/[0-9]+/': 7, "'-'": 10}, {4: 20, 5: 3, 6: 4, 7: 5, 8: 6, 10: 8, 11: 9, 13: 11, 14: 12, 15: 13, 16: 14, 17: 15}, ['binary_expr']), ({'/[0-9]+/': 7, "'-'": 10}, {4: 21, 5: 3, 6: 4, 7: 5, 8: 6, 10: 8, 11: 9, 13: 11, 14: 12, 15: 13, 16: 14, 17: 15}, ['binary_expr']), ({'__end_of_input__': (10, 2, 'unary_expr'), "'/'": (10, 2, 'unary_expr'), "'+'": (10, 2, 'unary_expr')}, {}, ['binary_expr', 'unary_expr']), ({'__end_of_input__': (16, 3), "'/'": (16, 3), "'+'": (16, 3)}, {}, ['binary_expr']), ({'__end_of_input__': (17, 3), "'/'": 17, "'+'": (17, 3)}, {}, ['binary_expr']))

print E.pre_compile_grammar(E.PRE_COMPILED)
pre_compile_grammar() returns None if the pre-compiled table passed to it
still matches the grammar, and a string containing the new parse table otherwise.
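So the workflow, as I understand it: run pre_compile_grammar() once with no argument to get the table, paste it into the source, and pass it back in from then on. A sketch, assuming nothing beyond the signature below:

table = E.pre_compile_grammar()        # nothing to compare against, so the
if table is not None:                  # table comes back as a string
    print "PRE_COMPILED = %s" % table  # paste this into the Grammar class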
Lrparsing offers a few other utilities for debugging grammars:
» pre_compile_grammar(pre_compiled=None)
» parse(input, tree_factory=None, on_error=None, log=None)
» repr_grammar() - the grammar as entered.
» repr_productions() - the compiled productions.
» repr_parse_table(state=None) - the LR(1) parse table, one ItemSet per state.
» repr_parse_tree(parse_tree) - pretty print a parse tree.
» unused_rules() - a frozenset() of unused rules.
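All of them are methods on the Grammar class, so poking at a grammar is just a matter of calling them (using the calls listed above on one of the earlier example grammars):

print E.repr_grammar()        # the rules as entered
print E.repr_productions()    # the productions they compiled into
print E.repr_parse_table()    # every ItemSet in the LR(1) parse table
print E.unused_rules()        # frozenset of rules nothing references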
parse() calls on_error instead of raising an exception:
lrparsing.on_error(iterator, input_tuple, stack)
It is only useful if you want to report on more than one error.
The stack is a partially constructed branch of the parse tree:
([], (__empty__))
([START,binary_expr], ((e '2')))
([binary_expr], ('+'))
([binary_expr], ((e '3')))
([binary_expr], ('/'))
(START (e (binary_expr (e '2') '+' (e (binary_expr (e '3') '/' (e (unary_expr '-' (e '1'))))))))
Home page:
http://lrparsing.sourceforge.net.
Pypi page:
https://pypi.python.org/pypi/lrparsing.
Online manual:
http://lrparsing.sourceforge.net/doc.
And examples:
» Expression evaluator.
» Sqlite3 data manipulation statement grammar.
» Lua 5.2 to Python 2.7 compiler.
