Parser/Lexer

Posts   
 
    
Trig
User
Posts: 96
Joined: 09-Jun-2004
# Posted on: 22-Jul-2004 20:21:23   

Frans, This isn't LLBLGen related necessarily, but I'm hoping you can help. I've been working on creating a web-based visual boolean search designer for our help-desk system which will basically be an abstraction layer above your predicate system. What I've run into though is that in order to serialize down my abstraction-predicate layer to preserve state I'm going to have to write a custom TypeConverter and a parser/lexer (because of the recursiveness of boolean expressions). I have two questions for you: 1. Can you think of a better way of serializing for state-preservation other than writing a custom TypeConverter? 2. If not, would you be open to me using your TDLParser code as inspiration for my TypeConverter?

I've spent the past day researching lexer's, but am really stumbling over it and I know from having looking at the source in the past that I was pretty impressed with your parser but wanted to make sure you were ok with me looking at your code for this purpose. I most definitely would not be doing any sort of copying of code, just of concepts (as I don't know the first thing about parsers.)

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39826
Joined: 17-Aug-2003
# Posted on: 23-Jul-2004 11:06:46   

The TDL parser is not that complex, it's a typical LL(1) parser with a simple lexical analyzer. The lexical analyzer is the first step and tries to convert the input text into tokens. I use regular expressions for this, as you need state machines for proper lexical analyzing and regular expressions are exactly that simple_smile . The lexical analyzer simply produces tokens. The parser then scans this stream of tokens and tests if these tokens are correct, i.e.: if a given start token has for example the correct end token. These stream of tokens is then converted, by handler routines (typical for LL(n) ), into non-terminals.

non-terminals are elements which appear at the left-hand side of a grammar rule: IfStatement -> IfStart Expression TrueClause IfStatement -> IfStart Expression TrueClause Else FalseClause

Here, 'IfStatement' is a non-terminal, as you can represent a stream of tokens which form an ifstatement with that single non-terminal simple_smile . So the parser transforms the tokens into nonterminals and when the parser is done, the stream of ascii characters is converted into a stream of non-terminals. Then the interpreter steps in and starts scanning the non-terminals. Each non-terminal has a separate handler routine which 'handles' that non-terminal.

I'm not sure if you actually need all this for your situation. Could you elaborate a bit of what your input is and what you want to transform to for your interpretation process?

Frans Bouma | Lead developer LLBLGen Pro
Trig
User
Posts: 96
Joined: 09-Jun-2004
# Posted on: 23-Jul-2004 17:49:43   

It may or may not be overkill, but I'm thinking its not because I spent 2 days racking my brain for another solution and couldn't find one. Let me describe in a little more detail how it works:

Like I said, I've created a sort of abstraction layer above LLBL's predicates, and mimic'd the functionality of the predicates. The base class for everything is SearchPredicate which has the following interface:

Public Interface ISearchPredicate
    Property SearchType() As SearchPredicateType
    Property ValidOperators() As PredicateOperators
    Property Negate() As Boolean
    Property Parameters() As ArrayList
    Function ToLLBLPredicate() As SD.LLBLGen.Pro.ORMSupportClasses.Predicate
    ReadOnly Property PredicateControl() As SearchPredicateControl
    Property Parent() As SearchPredicateExpression
End Interface 

Then there's a SearchPredicateExpression:

Public Interface ISearchPredicateExpression
    Inherits ISearchPredicate
    Sub Add(ByVal SearchPredicate As ISearchPredicate)
    Sub AddWithAnd(ByVal SearchPredicate As ISearchPredicate)
    Sub AddWithOr(ByVal SearchPredicate As ISearchPredicate)
    Sub Delete(ByVal SearchPredicate As ISearchPredicate)
    Default ReadOnly Property Item(ByVal Index As Integer) As ISearchPredicateExpressionElement
    Function Count() As Integer
End Interface 

I then have 9 SearchPredicateType's (for lack of a better term) which allow visual construction of the boolean search: TicketType, Location, EnteredBy, EnteredOn, CompletedOn, Description, Comment, TypeCategory, ResolutionCategory. Each of these has a class deriving from SearchPredicate.

Each SearchPredicate has an associated WebControl which is created dynamically at page construction. When the page first loads, an empty SearchPredicateExpression is created (and its associated WebControl as well). The WebControl has a DropDownList which allows you to select one of the 9 PredicateType's or a BooleanGroup (SearchPredicateExpression). When one is selected, the associated Predicate is created and its control is added to the parent SearchPredicateExpression's Predicates collection, and its associated control is added to the controls collection of the SearchPredicateExpressionControl. This allows for visual design of any possible search query for our help-desk system.

The problem I ran into is that serialization without a custom TypeConverter just wasn't working, and to build a TypeConverter I had to be able to recursively convert the predicates to a string for serialization.

I ended up building a tokenizer for parsing, and instead of converting things to non-terminals I just sent the token stream directly to an interpreter which reconstructed the objects. It works amazingly well.

Here's an example of a serialized SearchPredicateExpression:

 e[p[b[0];i[1];i[0];l[i[1];];];o[0];p[b[0];i[1];i[0];l[i[1];];];o[0];e[p[b[0];i[1];i[0];l[i[1];];]; o[0];p[b[0];i[1];i[0];l[i[1];];];];o[0];e[p[b[0];i[1];i[0];l[i[1];];];o[0];p[b[0];i[1];i[0];l[i[1];];];o[0]; e[e[p[b[0];i[1];i[0];l[i[1];];];o[0];p[b[0];i[1];i[0];l[i[1];];];];o[0];p[b[0];i[1];i[0];l[i[1];];];o[0]; p[b[0];i[1];i[0];l[i[1];];];];];]

Is this overkill? sunglasses

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39826
Joined: 17-Aug-2003
# Posted on: 23-Jul-2004 18:36:49   

Reading this, I see a lot of similarities with what I wrote in january. The code for that is posted here on the forum, please check this thread: http://www.llblgen.com/tinyforum/Messages.aspx?ThreadID=332#1464 (copy the whole line to your browser stuck_out_tongue_winking_eye )

I there build a form with search expressions which are translated to predicates when the user is satisfied. I use some datastructures to store the expressions specified. In my code these are not serialized, but you of course do that.

Perhaps it can help you further, I have the feeling it will give you ideas how to solve this. A parser for this is way overkill simple_smile

Frans Bouma | Lead developer LLBLGen Pro
Trig
User
Posts: 96
Joined: 09-Jun-2004
# Posted on: 23-Jul-2004 19:19:45   

The reason that wouldn't work for me though is because I'm allowing for much more complex queries (sub expressions), as well as the need to persist state to the browser and back across posts. Because the controls have to be created dynamically, they have to be created for every postback, which means the state of the predicate expression has to be serialized to allow for control re-creation.

Trig
User
Posts: 96
Joined: 09-Jun-2004
# Posted on: 23-Jul-2004 19:20:53   

Also, like I said, I've already written the parser for it, and it's not nearly as complex as I originally thought it would be. Not only does it work well, it also allows the viewstate to be MUCH smaller than if I'd been able to have the BinaryFormatter serialize it.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39826
Joined: 17-Aug-2003
# Posted on: 24-Jul-2004 12:20:30   

ok simple_smile

Frans Bouma | Lead developer LLBLGen Pro