REBOL 3.0

Comments on: PARSE: the problem with EITHER

Carl Sassenrath, CTO
REBOL Technologies
22-Sep-2009 19:43 GMT

Article #0249
Main page || Index || Prior Article [0248] || Next Article [0250] || 46 Comments || Send feedback

EITHER is problematic as an inline parse command.

The reason is that it requires three rules as arguments, but does not delimit them. Therefore, the rules must be singular. No modifications are allowed, unless you put them in blocks.

However, the concept of EITHER seems quite fundamental to me, almost like OR (the "|"). I'd prefer that we find the more basic pattern that is lurking here and come up with a general solution for it.

For example, consider this expression:

[a + b | c | d]

where we give '+ the meaning: advance past next alternate rule on failure. (This is similar in implementation to the new NOT command; they introduce a pending state forward.) That is the equivalent of EITHER (or you can think of it as a BRANCH notation of sorts.)

The benefit is: all rules are now delimited. You can write:

[a + b | c | d]
[a + 2 b | c | d]
[a + not b | some c | d]

or whatever. This cannot be done with EITHER unless you use separate blocks.

Of course, the '+ symbol isn't essential; it could be a different word... but I liked the elegance of it.

I think a solution like this is more basic and powerful... more fitting of how PARSE rules work.

Also...

In theory, the above solution could potentially support:

[a 2 + b | c | d | e]

Which means: advance past the next 2 alternate rules on failure.

But, let's wait on that until we see if the general concept flies.

46 Comments

Comments:

Brian Hawley
22-Sep-2009 18:25:13
I had been appreciating the name BRANCH but the + looks a lot better, though I initially assumed it was infix due to the a in your examples. The multiple alternate skip feature you mention is something I haven't seen anywhere else.

This would be a good time to mention that the various papers about PEG and TDPL refer to the proposed STAY operation with as a prefix &. This is why Peta initially suggested the operation be called AND.

Carl Sassenrath
22-Sep-2009 18:43:17
On STAY vs AND... it's kind of interesting:

a | b  ; match a or b
a b  ; match a (and) then b

so, it's possible to say:

a & b  ; match a and b

I'm not suggesting the infix (at this time), but it does put Peta's suggestion in context, where one can see that AND is correct.

So, that can be be changed, if there are no objections.

Brian Hawley
22-Sep-2009 18:52:37
Didn't know whether infix was possible in the PARSE dialect. I would definitely prefer infix & for that operation, if possible. Would it be able to handle complex rules (like COPY x THRU "blah") on the left-hand side of the operation?
-pekr-
22-Sep-2009 21:05:19
Hmm, when I first saw EITHER proposal, I imediatelly knew what it is about. Looking at above proposal, I have really difficulcy to understand, what does it do. It way too much implies math operation, not some parse rule. Why not to use ! instead of NOT then?

Use my feedback to judge your intentions here, but hell - I don't understand this article at all. What is it talking about? "advance past the next 2 alternate rules on failure" - my first reactionand was - WTF? :-) This turns parse into unreadable (for me) guru stuff ...

-pekr-
22-Sep-2009 21:11:57
While I don't understand what such rules mean, they consciously remind me of REBOL's 'also function. I have always difficulcy to understand the 'also expression and have to think twice to be sure to understand the outcome. I even think it interrupts normal REBOL flow of thinking, providing us with some obscure way of solving problems ...
Maxim Olivier-Adlhoch
22-Sep-2009 22:17:17
Carl proposes something and heated debates ensues here and there... hehehe

I must say that if given the choice of the equivalently working syntax of:

[ a either [ b | c ] [ d ] [ e ]]

vs.

[a 2 + b | c | d | e]

I admit the later doesn't feel like it fits with PARSE... I am immediately reminded of the RegExp nightmare in a very strong way.

I admit people have to understand that here, 'EITHER consumes and advances, its not just a conditional.

but the former is readable, and when given extremely long rules, having a bunch of "|" in sequence, makes it very complex to follow, when compared to using explicit start and ends of blocks. if you want to use nested rules, you'll have to use blocks in between the "|" characters anyways, so don't they end up being superfluous?

Maxim Olivier-Adlhoch
22-Sep-2009 22:20:49
I do find that the word 'EITHER is misleading, people will expect a conditional statement... so some other word should be used... but using arithmetic operators is going down the wrong way...

PARSE has always been about using words, please lets take the time to choose a proper one, than defaulting to something that isn't any more precise... we aren't adding rules... we are choosing based on the outcome of other rules.

Ken Singleton
23-Sep-2009 5:32:45
I agree with Maxim and Pekr and think + would be very confusing since it confuses concepts and moves us toward Perl like hieroglyphics. So, since you guru programmers seem to have so much trouble with Either (cheeky grin), how about something like the word FORK - same idea as branch but a bit shorter.
Steeve
23-Sep-2009 5:57:06
We could use infix & to implement the OF proposal.

[rule1 | rule2 | rule3] match only one rule.

[rule1 & rule2 & rule3] match all rules in any order.

Doesn't that make sense ?

-pekr-
23-Sep-2009 8:56:04
There was also a proposal, to eventually use:

[a ? b | c | d]

instead of:

[a + b | c | d]

Carl Sassenrath
23-Sep-2009 14:38:55
With the implementation of INSERT (and related) now complete and using the form:

to "b" insert "xyz"

it is more clear to me that EITHER should work like the rest of these words. This is how parse works, and expressing it this way minimizes internal parser state storage (an important benefit.)

The keyword does not have to be + which is too cryptic, but it can be called EITHER, BRANCH, ALT, ALTERNATE, or FORK (as Ken suggested.) The code:

either a | b

simply "skips" b if a is true. I don't mind using the word EITHER here, because the above line can be read as: "either a or b" -- so the mental model is good.

Also:

In my notes above, I did not mean to imply the use of & but rather that the abstraction of the operation is indeed an AND. I will be renaming STAY to AND, but it is a prefix operation.

Pekr:

I understand your note about ALSO (because I felt the same way for a while) and there are other words such as UNLESS (which I will admit I did not like for several months.)

But, what I've found is that once you "get it" then you soon become fluent. This is true in all languages, human or computer, and soon you cannot resist using it.

Earlier this month I updated the ALSO doc, so type:

?/doc also

to review what is posted, and see if you think it is ok.

Brian Hawley
23-Sep-2009 15:05:47
Steeve, due to the semantics of OF, infix won't work. You need prefix and a block argument. Rules aren't compiled in REBOL parse, they are interpreted. This causes some of our syntax choices to be made for us.

For instance, Pekr's mentioning the ? notation doesn't show the actual ? syntax I proposed: a ? b | c. Which wouldn't work, because the c would be unbounded; it needs to be a ? b | c |. The problem is that Carl's examples have one too many participants, and his description is not descriptive enough. Let's break it down, and show the problems.

For the syntax [a + b | c | d], we get the description: "we give '+ the meaning: advance past next alternate rule on failure".

On failure of which rule? Which of those rules is the "next alternate rule"? What do you do next? Is the + prefix or postfix?

One interpretation: [a b | not a c] d. + is postfix, and acts upon the failure of a, really the state of failure at the point before the +. When the + is reached, if we are in a failure state then the rule skips past the first |, matches c, and then ignores the next |. If we are not in a failure state when we reach the + then we match the b and when we reach the first | then we skip past it and the next | and continue with d. This all assumes that the failure state is something that can be acted on by a postfix operator, which may not be true.

Another interpretation, the one I thought initially because I assumed + had to be prefix: a [b c | not b d]. The a is irrelevant, and the + operates on the failure state of the rules between the + and the first | (the b). The + would change the backtracking mode for a short time, so that failure of b would not backtrack, and instead skip to the second | and then go back to normal. If b doesn't fail, then the first | would be ignored, and then the second | and the d would be skipped over. In retrospect, this doesn't make sense because the d part is unbounded.

I would prefer the first interpretation, and use the character ? instead of +. + may look elegant, but people have been hating it because they don't want to take + out of its arithmetic context and use it in a parsing context.

Brian Hawley
23-Sep-2009 15:20:30
OK, while I was writing that comment Carl wrote another that completely changed everything I was writing about. So, Carl, by "simply skips b if a is true" do you mean that only the one rule b is skipped if all of the rules between the either and the | succeed, or is only one rule a allowed between the either and the | with an error being thrown otherwise?
Brian Hawley
23-Sep-2009 15:31:48
For that matter, how would [either a | b] be different from [a | b] ? We seem to be missing a term here, and I'm having trouble seeing why we need this operation without 3 terms.
-pekr-
23-Sep-2009 16:34:18
Carl - yes, sometimes I probably way too much worry about the wording, usage etc., but I always try to think in the context of how some REBOL coder/user might think about the problem, and how long will it take for him/her to understand the code - and that aspect is VERY important, and this is also the reason why you started new VID, do you remember?

?/doc - now how that goodie could escape my radar? :-)

Carl Sassenrath
23-Sep-2009 16:46:13
Sorry, I abbreviated it (because parse truth is implied in reaching either.)

A full example is:

[a either b | c | d]

For this line, all the normal parse operations apply with one exception: If we hit either (because a is true), we always skip c, even if b fails.

The rest of the expression works just like normal. That is, if a fails, we advance to c. And, if b or c fail, we advance to d.

What I'm really after is a "branch operator" but not a "branch function".

Carl Sassenrath
23-Sep-2009 19:27
Actually, writing [either a | b] is not equivalent to [a | b], because either always means "skip next".

It is now working, and here are the test cases (that all pass):

; The only "either" case that is special:
[not parse "ac"  ["a" either "b" | "ac"]]
[not parse "ac"  ["a" either "b" | "ac" | "d"]]
; These work just like normal parsing:
[parse "ab"      ["a" either "b" | "c" | "d"]]
[parse "c"       ["a" either "b" | "c" | "d"]]
[parse "d"       ["a" either "b" | "c" | "d"]]
[parse "abx"     ["a" either "b" "x" | "c" "y" | "d" "z"]]
[parse "cy"      ["a" either "b" "x" | "c" "y" | "d" "z"]]
[parse "dz"      ["a" either "b" "x" | "c" "y" | "d" "z"]]
[not parse "a"   ["a" either "b" | "c" | "d"]]
[not parse "ac"  ["a" either "b" | "c" | "d"]]
[not parse "ab"  ["a" either "b" "x" | "c" "y" | "d" "z"]]
[not parse "abc" ["a" either "b" "x" | "c" "y" | "d" "z"]]
[not parse "aby" ["a" either "b" "x" | "c" "y" | "d" "z"]]
[not parse "abd" ["a" either "b" "x" | "c" "y" | "d" "z"]]
[not parse "abz" ["a" either "b" "x" | "c" "y" | "d" "z"]]
; Truncations:
[parse "b"       [either "b"]]
[parse "b"       [either "b" | "c"]]
[not parse "c"   [either "b" | "c"]]
[parse "ab"      ["a" either "b"]]
[parse "ab"      ["a" either "b" | "c"]]
[parse "c"       ["a" either "b" | "c"]]

BTW, I am not opposed to renaming either if you think there are better words...

; Possible names:
[a + b | c]
[a & b | c]   ; not advised because not AND
[a || b | c]  ; meaning skip two "|"
[a alt b | c]
[a also b | c]
[a fork b | c]
[a jump b | c]
[a next b | c]
[a pass b | c]
[a past b | c]
[a branch b | c]
[a skip-next b | c]
Brian Hawley
23-Sep-2009 19:34:20
Don't forget ? as a possible name.

I've been updating the proposals page with the changes to the proposals, as I figure them out by reading between the lines. Take a look, particularly at CHANGE 2 and INSERT.

-swede-
23-Sep-2009 21:55:06
How about: [a then b else c]
Brian Hawley
23-Sep-2009 22:25:17
Swede, the else has to be | in order for the trick to work - standard backtracking is used for one path of execution. The word then is interesting though.
-pekr-
24-Sep-2009 0:24:44
The word 'either does not make usage cases clear enough to me, because it is "infix". It would have good meaning to me, if being "prefix", but that is not possible.

My other word alternatives are (top the best):

  • then - absolute favorite
  • also
  • with
  • still
  • even
  • next
  • try - too much implies REBOL's 'try?
  • only
  • ?, => ... if we like a bit of punctuation
Henrik
24-Sep-2009 2:54:47
I find EITHER hard to read, because I always read it as prefix rather than infix, hence, I look always at the right side of the word. Besides other words in PARSE are prefixed. This will confuse new users.

My vote would be for a symbol, like + or ||. I think there should be a rule that symbols in PARSE are automatically infix. Isn't that so in the rest of R3 now as well?

Generally, we should also take advantage of using two chars for symbols:

-- ++ +> <- -> etc.

Ladislav
24-Sep-2009 3:09:25
Yes, [a then b else c] looks very natural
Ken Singleton
24-Sep-2009 3:41:57
When Either was originally proposed it was as a prefix keyword and it made perfet sense and flowed well, but as an infix word it makes less sense to me now. So if we are looking for an infix word then I think 'then' flows well (with an implied else), or ? if you want a single character because that character does not bring with it much concept baggage.
DocKimbel
24-Sep-2009 4:25:49
Trying to catch up with this very interesting topic.

As Ladislav, I second -swede-'s proposition. It looks more intuitive and less ambiguous to me than the other options.

[d | a + b | c]         ; cryptic and if d fails, skip to c?
[d | a then b else c]   ; readable and no issue  
Brian Hawley
24-Sep-2009 5:16:21
Ladislav, Doc, the else won't work: You have to use | because it is overloaded. The else portion is chosen by a standard backtrack to an alternative, and the existing word to indicate the start of that alternate is |. The then operation has nothing to do with it, it isn't even seen yet. If the then operation is reached, it just skips past a standard alternate after its rule is matched. You can use then, but not else.

I don't like + for the operation, but ? or => would be great. The => would look the best if we allow the numeric prefix indicating skipping multiple alternates, something you would have to do if the else rules included alternates within them and you didn't feel like putting them in a block (for performance reasons).

My naming preferences for the first operator, in order:

  • => indicating advancement
  • ? reminiscent of C's choice operator
  • then
  • also

The second operator must be | for this operation to work. I prefer => or ? for the first operator since they are symbols, like the second operator.

Carl, there isn't any sensible meaning to [a not => b | c | d]. The not doesn't make sense there, since the opposite of => would be to not use that rule at all. Because of this, I don't think we have to worry about how not looks in front of whatever word we choose.

-swede-
24-Sep-2009 9:39:53
Another try: [c unless a then b]
-swede-
24-Sep-2009 9:59:45
There is two rules: c unless a ; a then b One of them must succeed.
-swede-
24-Sep-2009 10:19:31

After some more thinking:
a is a predicate in c unless a.
a is text in a then b

Brian Hawley
24-Sep-2009 14:17:37
Swede, we already have your unless, except it's called not.
Ladislav
24-Sep-2009 16:05:23
"Ladislav, Doc, the else won't work: You have to use | because it is overloaded." - I do not think so. Moreover, an expression like:

[a then b else c | d then e else f] looks readable to me, while a transcription to the other notation does not.

Ladislav
24-Sep-2009 16:29:27
Just a note, why [a then b else c | d then e else f] looks more readable, than an alternative using "overloaded |":

1) both (THEN and ELSE in the above parse expression) are (infix) keywords of one operation, so they can look "natural" only if they have the same precedence in the expression

2) it is quite natural, if the precedence of the ordered choice is lower, than the precedence of the above THEN keyword, therefore the | operator cannot meaningfully represent the ELSE keyword, since it should have a lower precedence, than ELSE

Brian Hawley
24-Sep-2009 16:43:30
1) PARSE doesn't have infix expressions, not even | which is more of a alternate starting marker. Even the stuff that looks infix is actually prefix. PARSE rules aren't compiled, they are interpreted on the fly. You can determine the behavior of PARSE rules with a strict left-to-right reading.

2) PARSE doesn't have precedence, it is strict left-to-right (with backtracking of course).

Now, we could add precedence and infix, but that would require greatly increasing the complexity of PARSE's internals, or a complete rewrite based on a different algorithm (not a state machine). You up for that?

Brian Hawley
24-Sep-2009 16:57:22
"(with backtracking of course)"

Whoops, sorry, there is no backtracking of PARSE rules, just recursion and repetition. PARSE only backtracks over data, not over its own dialect.

Ladislav
24-Sep-2009 17:00:35
More precedence notes: the precedence of THEN and ELSE would apply only "outside", i.e. to the left of THEN and to the right of ELSE; "inside", i.e. between THEN and ELSE the keywords would work as brackets
Ladislav
24-Sep-2009 17:08:23
"PARSE doesn't have infix expressions" - do I really have to say no to this? The ordered choice operation as well as the sequence operation are both infix in PARSE; they even use precedence rules, that are typical for infix operators (you may know, that precedence rules are totally unnecessary for prefix). The R2 precedence rules are:

1) repetition operators, like SOME (actually prefix)

2) sequence operation (clearly infix, the operator is the empty space between the operands)

3) ordered choice (clearly infix, the operator is between its operands)

Ladislav
24-Sep-2009 17:32:19
"PARSE doesn't have precedence, it is strict left-to-right" - if that were true, the sequence would have the same precedence as the ordered choice, which would require the parse expression

[[a | b] c]

to be equivalent to the expression

[a | b c]

, which is not the case.

Brian Hawley
24-Sep-2009 18:03:29
The | isn't an operation, it is like the [ ] and ( ) - syntax. Same with the sequence operation. It's like the difference between the evaluator and functions in the DO dialect.

"to the left of THEN"

The then operation doesn't see anything to its left - that would require adding support for infix operations to PARSE, which would require lookahead, something PARSE currently has no support for.

Normally, when confronted with an expression like

[a then b else c]
by the time the then is reached, the a has already been processed. Since the then else is supposed to affect the failure behavior of a (making it skip past the else on backtrack instead of the next |), that would have to be something that would be known before a even starts. Since PARSE doesn't even see the then until after the a has already succeeded or failed, there is no way that the behavior of a could be changed without knowing about the then ahead of time, and that would require lookahead.

On the other hand, if the else keyword is replaced with |, then the behavior of a can be unchanged. The else branch of the choice would be handled using normal backtracking behavior on the failure of a, and the then will never be seen at all.

It's like the ALSO function: Half the work is being done by the evaluator, and the function does very little.

Ladislav
24-Sep-2009 19:35:45
"The ordered choice isn't an operation" is a statement that "cannot hold water". Similarly with other statements above: "Since the then else is supposed to affect the failure behavior of a (making it skip past the else on backtrack instead of the next |), that would have to be something that would be known before a even starts." - that is a provably incorrect conclusion, etc., etc.
Ladislav
25-Sep-2009 3:11:36
Max wrote: "I must say that if given the choice of the equivalently working syntax of: ..." - rightly so, the expressions you used as example are not equivalent at all!
Brian Hawley
25-Sep-2009 3:50:37
Ladislav, if you changed it to
[if a then b else c]
then it would work. The if would trigger the alternate behavior for a without requiring any prescience. The then and else would just be delimiters, like | is.

Of course that would require that check get its name back.

DocKimbel
25-Sep-2009 10:07:45
Brian, we don't know the precise implementation details of PARSE (even if we can guess most of it).

Maybe it's trivial to support [a then b else c] if the result of last expression evaluation is already kept temporary, so that this result could be used when 'then is reached and the branching decision could be made (eval 'b and skip [else c], or lookup 'else and eval 'c).

Ladislav
25-Sep-2009 11:19:14
Brian: "if you changed it to ..." - I recall, that I mentioned above, that this was provably unnecessary.
Carl Sassenrath
25-Sep-2009 13:50:17
Hi. Good discussion.

I've been considering using then if the group rejected the +, ?, and || syntaxes.

BrianH and Ladislav: you're both correct, approximately.

The implementation is this: The or-bar "|" is just a symbol used as a marker. It works two ways: either as a symbol encountered during normal parse sequencing (meaning rule complete) or as the target for skipping forward to the next rule (alternate rule).

BrianH is correct when he says that [a then b | c] is preferred over [a then b else c].

While, strictly speaking ELSE is possible, it would be tricky to implement. An example is shown here, where the rule contains a few more leading expressions:

a b c then d | e | f

If a, b, or c fail, then the parser immediately advances to e. This uses the normal mechanism for | (not a special case like 'ELSE.)

Anyway, the most important thing about this branching rule is that it is a very minor modification to the normal parsing mechanism.

The rule is: the THEN keyword does just one simple thing always: It skips the next alternative rule. That's it! It does not even contain conditional expression logic! Quite nice.

So, you can see why I think of it as a fundamental operator. Just like when you design a CPU's instruction set, various instructions are required in achieving the complete mathematical model from which all expressions can be built. PARSE needs a BRANCH operator... simple and primitive.

So, we can call it THEN, or maybe ? if you prefer. Thanks for your thoughts and ideas. You'll get to try it out really soon.

-pekr-
25-Sep-2009 14:56:44
The only alternative to 'then is proposed => , while ? is not expressing the meaning satisfactory, at least for me.
Brian Hawley
25-Sep-2009 23:23:21
Thank you for the explanation, Carl :)

I really hope that you manage to special-case the -> or => symbol as a word so we can use it for this operation. While ? would be close enough in the simple case for people familiar with C-like languages, the meaning isn't exactly the same and 2 ? would be confusing at first. The semantics are great though.

Post a Comment:

You can post a comment here. Keep it on-topic.

Name:

Blog id:

R3-0249


Comment:


 Note: HTML tags allowed for: b i u li ol ul font span div a p br pre tt blockquote
 
 

This is a technical blog related to the above topic. We reserve the right to remove comments that are off-topic, irrelevant links, advertisements, spams, personal attacks, politics, religion, etc.

REBOL 3.0
Updated 16-Apr-2024 - Edit - Copyright REBOL Technologies - REBOL.net