follow up to previous email. work in progress on "terse guide to XTreams Parsing Syntax" cheat sheet.

Squeak - Dev mailing list
Hi Levente.

Regarding my previous letter, I am almost certain that "consume" means that a match has been made and the matched string is eaten.
"Yield" means "invoke the callback" or "return the section of an AST for that rule".

Anyhoo, I have been going through your old emails and cross-checking with: https://nim-lang.org/docs/pegs.html
and have come up with a preliminary "terse guide" to the syntax.

I would like to expand this into tests/examples for others to use going forward. Eventually this will make its way to a SqueakBook when/if I get some time.

Anyhoo, could you peruse it and see if anything jumps out at you as incorrect/incomplete?

Much appreciated.

t

p.s. I have cc'd squeak-dev in case anybody else finds this interesting.




XTreams


A <- E
Rule:
Bind the expression  E  to the nonterminal symbol  A .
Left recursive rules are not possible and crash the matching engine.
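
To see why left recursion is a problem, here is a minimal Python sketch of a naive recursive-descent matcher. It is not XTreams code; the rules Expr and Num and all function names are made up for illustration. A parser is modeled as a function from (text, position) to the new position, or None on failure.

```python
def num(text, pos):
    """Num <- [0-9]  (a single digit, to keep the sketch short)"""
    return pos + 1 if pos < len(text) and text[pos].isdigit() else None

def expr_left(text, pos):
    """Expr <- Expr '+' Num / Num   (left recursive)"""
    # The first step re-enters Expr at the same position without consuming
    # anything, so the recursion never bottoms out.
    end = expr_left(text, pos)
    if end is not None and text.startswith("+", end):
        after = num(text, end + 1)
        if after is not None:
            return after
    return num(text, pos)

def expr_iter(text, pos):
    """Expr <- Num ('+' Num)*   (same language, no left recursion)"""
    cur = num(text, pos)
    while cur is not None and text.startswith("+", cur):
        nxt = num(text, cur + 1)
        if nxt is None:
            break
        cur = nxt
    return cur

def crashes(parser, text):
    """Report whether the parser exceeds Python's recursion limit."""
    try:
        parser(text, 0)
        return False
    except RecursionError:
        return True

print(crashes(expr_left, "1+2"))  # True
print(expr_iter("1+2+3", 0))      # 5
```

The usual workaround, as in expr_iter, is to rewrite the rule with repetition instead of recursion on the left.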

\ddd 
Character with decimal code ddd

\" , etc
Literal  " , etc.
PEG: Literal <- QUOTE LiteralEntity{QUOTE} / DOUBLE_QUOTE LiteralEntity{DOUBLE_QUOTE}




A ... Z 
Sequence:
Apply expressions  A , ...,  Z , in this order, to consume consecutive portions of the text ahead, as long as they succeed.
Indicate success if all succeeded.
Otherwise do not consume any text and indicate failure.
The sequence's precedence is higher than that of Ordered Choice:  A B / Z  means  (A B) / Z  and not  A (B / Z) .
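
A minimal Python sketch of the sequence semantics, assuming a parser is a function from (text, position) to the new position, or None on failure. The helper names lit and seq are invented for this illustration and are not part of XTreams.

```python
def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Sequence A ... Z: every part must succeed, in order."""
    def parse(text, pos):
        cur = pos
        for p in parsers:
            cur = p(text, cur)
            if cur is None:
                # Failure: nothing is consumed, the caller still holds pos.
                return None
        return cur
    return parse

ab = seq(lit("a"), lit("b"))
print(ab("abc", 0))  # 2
print(ab("ax", 0))   # None
```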

A / ... / Z 
Ordered Choice:
Apply expressions  A , ...,  Z , in this order, to the text ahead, until one of them succeeds and possibly consumes some text. Indicate success if one of the expressions succeeded.
Otherwise do not consume any text and indicate failure.
The Ordered Choice precedence is lower than that of Sequence:  A B / Z  means  (A B) / Z  and not  A (B / Z) .
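
The following Python sketch illustrates ordered choice and the precedence rule. It assumes the same toy model as before (a parser maps (text, position) to a new position or None); lit, seq and choice are hypothetical helper names, not XTreams API.

```python
def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Sequence: every part must succeed, in order."""
    def parse(text, pos):
        cur = pos
        for p in parsers:
            cur = p(text, cur)
            if cur is None:
                return None
        return cur
    return parse

def choice(*parsers):
    """Ordered choice A / ... / Z: the first alternative that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)  # every alternative restarts at the same pos
            if end is not None:
                return end
        return None
    return parse

A, B, Z = lit("a"), lit("b"), lit("z")

# Order matters: 'a' / 'ab' never reaches the second alternative on "ab".
print(choice(lit("a"), lit("ab"))("ab", 0))  # 1

# A B / Z parses as (A B) / Z, which differs from A (B / Z).
print(choice(seq(A, B), Z)("z", 0))   # 1
print(seq(A, choice(B, Z))("z", 0))   # None
print(seq(A, choice(B, Z))("az", 0))  # 2
```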

(E) 
Grouping:
Parentheses can be used to change operator priority.
(A B) / Z  vs.  A (B / Z) .


{E}    
Cardinality: Stop Expression
A <- B{E}
To accept A means to accept any number of B up until E comes.
Consume E too, but don't yield it.
So, such an expression accepts: BE, BBE, BBBE, BBBBE, etc., and yields B, BB, BBB, BBBB, etc.
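
A rough Python sketch of how I read the stop expression. The helper names lit and until are made up, and two points are assumptions rather than confirmed XTreams behaviour: the stop expression is tried before each B, and zero Bs are allowed. The parser returns a pair (new position, yielded text), or None on failure.

```python
def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def until(body, stop):
    """B{E}: repeat B until E matches; consume E but leave it out of the yield."""
    def parse(text, pos):
        cur = pos
        while True:
            after_stop = stop(text, cur)
            if after_stop is not None:
                # E is consumed (position moves past it) but not yielded.
                return after_stop, text[pos:cur]
            nxt = body(text, cur)
            if nxt is None or nxt == cur:
                return None  # neither E nor another B: the whole rule fails
            cur = nxt
    return parse

a_rule = until(lit("B"), lit("E"))
print(a_rule("BBBE", 0))  # (4, 'BBB')
print(a_rule("BBX", 0))   # None
```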

A <- B{1,"\n"}
means that an A consists of one or more Bs.
The parser will read Bs up until "\n" appears on the stream, which is a newline character.

E?
Cardinality:
Matches zero or one E.


E* 
Cardinality:
Matches zero or more E.
Apply expression  E  repeatedly to match the text ahead, as long as it succeeds.
Consume the matched text (if any).
Always indicate success.

E+
Cardinality:
Matches one or more E
Apply expression  E  repeatedly to match the text ahead, as long as it succeeds.
Consume the matched text (if any) and indicate success if there was at least one match.
Otherwise indicate failure.
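
The difference between E* and E+ can be sketched in Python as follows, again treating a parser as a function from (text, position) to a new position or None. The names lit, star and plus are illustrative only. The guard against empty matches is my own addition to keep the loop from spinning forever.

```python
def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def star(p):
    """E*: apply E as long as it succeeds; always indicates success."""
    def parse(text, pos):
        cur = pos
        while True:
            nxt = p(text, cur)
            if nxt is None or nxt == cur:  # stop on failure or empty match
                return cur
            cur = nxt
    return parse

def plus(p):
    """E+: like E*, but fails unless E matched at least once."""
    many = star(p)
    def parse(text, pos):
        first = p(text, pos)
        return None if first is None else many(text, first)
    return parse

print(star(lit("a"))("aaab", 0))  # 3
print(star(lit("a"))("b", 0))     # 0  (success, nothing consumed)
print(plus(lit("a"))("aaab", 0))  # 3
print(plus(lit("a"))("b", 0))     # None
```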

E{m}
Cardinality:
Matches m repetitions of E.
Example: B{3} is shorthand for B B B.
Note that B{E}, with an expression rather than a number inside the braces, is the stop expression described above: accept any number of B up until E comes, consume E too, but don't yield it.


E{m,n}
Cardinality:
Matches from m to n repetitions of E.
B{1,3} means B 1 to 3 times, so it accepts B, BB, and BBB.
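
A small Python sketch of the counted forms E{m} and E{m,n}. The helper names lit and repeat are hypothetical, and the sketch assumes the repetition is greedy (it takes as many as it can, up to n), which the text above does not state.

```python
def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def repeat(p, m, n=None):
    """E{m,n}: between m and n repetitions of E; E{m} when n is omitted."""
    if n is None:
        n = m
    def parse(text, pos):
        cur, count = pos, 0
        while count < n:
            nxt = p(text, cur)
            if nxt is None:
                break
            cur, count = nxt, count + 1
        return cur if count >= m else None
    return parse

B = lit("B")
print(repeat(B, 3)("BBB", 0))      # 3
print(repeat(B, 3)("BB", 0))       # None
print(repeat(B, 1, 3)("BBBB", 0))  # 3  (stops after the third B)
print(repeat(B, 1, 3)("", 0))      # None
```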

[A-Za-z]+
Cardinality:
EXAMPLE: Matches one or more alphabetical characters.



$
Anchor:
Matches at the end of the input.
No character is consumed. Same as  !. .

!. = $
Anchor:
Matches at the end of the input.
No character is consumed. Same as  $ .


^
Anchor:
Matches at the start of the input.
No character is consumed.

&E 
And predicate:
Indicate success if expression  E  matches the text ahead;
otherwise indicate failure.
Do not consume any text.

!E 
Not predicate:
Indicate failure if expression E matches the text ahead;
otherwise indicate success.
Do not consume any text.
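
The two predicates, and the end-of-input anchor built from them, can be sketched in Python like this. As before, a parser maps (text, position) to a new position or None, and the names lit, any_char, and_pred and not_pred are invented for the illustration. A predicate that succeeds returns the position unchanged, which is what "do not consume any text" amounts to in this model.

```python
def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def any_char(text, pos):
    """The any-character expression: consume one character if there is one."""
    return pos + 1 if pos < len(text) else None

def and_pred(p):
    """&E: succeed if E matches ahead, without consuming anything."""
    def parse(text, pos):
        return pos if p(text, pos) is not None else None
    return parse

def not_pred(p):
    """!E: succeed if E does NOT match ahead, without consuming anything."""
    def parse(text, pos):
        return pos if p(text, pos) is None else None
    return parse

end_of_input = not_pred(any_char)  # the "not any character" anchor

print(and_pred(lit("a"))("abc", 0))  # 0
print(not_pred(lit("a"))("abc", 0))  # None
print(end_of_input("abc", 3))        # 3
print(end_of_input("abc", 0))        # None
```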


[s] 
Character class:
If the character ahead appears in the string  s , consume it and indicate success.
Otherwise indicate failure.

[a-b] 
Character range:
If the character ahead is one from the range  a  through  b , consume it and indicate success.
Otherwise indicate failure.

's' 
String:
If the text ahead is the string  s , consume it and indicate success.
Otherwise indicate failure.



.
Any character:
If there is a character ahead, consume it and indicate success.
Otherwise (that is, at the end of input) indicate failure.



"BELOW PROBABLY NOT IN PEG"

_
Any Unicode character:
If there is a UTF-8 character ahead, consume it and indicate success.
Otherwise indicate failure.

@E 
Search:
Shorthand for  (!E .)* E .
(Search loop for the pattern  E .)


{@} E 
Captured Search:
Shorthand for  {(!E .)*} E .
(Search loop for the pattern  E .) Everything until and excluding  E  is captured.

@@ E 
Same as  {@} E .
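
The two search shorthands can be sketched in Python by unrolling the loop they stand for. This models the shorthand as described above, not any XTreams feature; lit, search and captured_search are made-up names. The captured variant returns a pair (new position, captured text).

```python
def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def search(p):
    """@E, i.e. (!E .)* E: skip characters until E matches, then consume E."""
    def parse(text, pos):
        for start in range(pos, len(text) + 1):
            end = p(text, start)
            if end is not None:
                return end
        return None  # E never matched before the input ran out
    return parse

def captured_search(p):
    """{@} E, i.e. {(!E .)*} E: like @E, but also return the skipped text."""
    def parse(text, pos):
        for start in range(pos, len(text) + 1):
            end = p(text, start)
            if end is not None:
                return end, text[pos:start]
        return None
    return parse

print(search(lit("cd"))("abcdef", 0))           # 4
print(captured_search(lit("cd"))("abcdef", 0))  # (4, 'ab')
print(search(lit("zz"))("abc", 0))              # None
```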

\identifier 
Built-in macro for a longer expression.



$i 
Back reference to the  i-th  capture.  i  counts from 1.


i's' 
String match ignoring case.

y's' 
String match ignoring style.

v's' 
Verbatim string match: Use this to override a global  \i  or  \y  modifier.

i$j 
String match ignoring case for back reference.

y$j 
String match ignoring style for back reference.

v$j 
Verbatim string match for back reference.