REBOL 3.0

Comments on: PARSE: BREAK vs ACCEPT

Carl Sassenrath, CTO
REBOL Technologies
16-Oct-2009 3:03 GMT

Article #0277
Main page || Index || Prior Article [0276] || Next Article [0278] || 16 Comments || Send feedback

17-Oct-2009: Rewrote this blog to make it more clear. Earlier, I tried to condense it into a few examples, but that didn't make sense.

Within the parse dialect, blocks are used for holding alternative rules (separated with or bars "|".)

Normally, the rules within a block should be written in a simple, clean way where only match words and values are given, along with the paren productions that get evaluated. When you write parse rules, effort should be made to keep the rules simple like that. The normal backtracking of the parser can handle most situations.

However, advanced users may need a bit more control for unique parsing situations. Some of these include:

 failexplicitly fail a single rule, skip to the next alternative (if it has one).
 breakexplicitly exit the entire rule block, skip all alternatives.
 returnexplicitly exit all rules, return from the parse function.

There is one other case: exit back through multiple rule blocks. But, let's discuss that in a separate article.

For break there is an additional issue: does it cause the overall rule block to succeed or fail? For example, given some rule block:

[a b | c break | d]

if the break happens, does that block succeed or fail?

The answer is that you cannot tell just by looking at the rule block. You need to know its repetition limits. For example, it could be any of these:

opt   [a b | c break | d]  ; 0 to 1
any   [a b | c break | d]  ; 0 to infinity
some  [a b | c break | d]  ; 1 to infinity
while [a b | c break | d]  ; 0 to infinity (input independent)
3     [a b | c break | d]  ; 3 only
3 5   [a b | c break | d]  ; 3 to 5
[a b | c break | d]  ; 1 only

If the break occurs before the minimum count, then the rule fails. Otherwise, it succeeds.

Recently we added a new keyword, reject. The purpose of this word is to force the entire rule to fail. This was necessary because there was no other mechanism for explicitly failing the entire rule block.

After defining reject, it theoretically made sense to define accept as the opposite: force the entire rule to succeed.

Unfortunately, both those definitions are inadequate unless we also state their effects on the repetition itself. Does reject fail the entire repetition loop, or does it depend on whether the limits are also satisfied? This same question is also valid for accept.

This was not a problem for break because it is defined to break the loop. However, it does not define the success or failure of the overall rule, because you must consider the rep limits.

You can see here that we are dealing with two variables:

  1. stopping the repetition
  2. success or failure of the overall rule (including any repetition)

So, to define the required actions, the effects on both of those variables must be clearly stated.

For example, we can clearly define:

 breakstop repetition. Succeed only if repetition limits are satisfied.

But, are these the correct definitions for the other two words?

 acceptstop repetition. Always succeed, regardless of the rep limits?
 rejectstop repetition. Always fail, regardless of the rep limits?

Those are questionable, aren't they? So, what's the correct behavior?

Let's get these words defined, or toss them out, so we can finalize parse for R3.

16 Comments

Comments:

Maxim Olivier-Adlhoch
17-Oct-2009 2:17:08
somewhere this whole post seems like someone too close to the tree to see the forest... are we tired Carl? ;-)

(break isn't useful in this example)

>> parse "bbba" [some ["a" | "b" | "c" reject]]
== true
(nor would it help here)
>> parse "bbbab" [some ["a" break | "b" | "c" reject]]
== false
the rule fails cause we aren't at the end. not because THIS rule failed.

in real usage, there would be other rules complementing the use of ACCEPT.

really, REJECT and ACCEPT aren't for such simple rules, they require a little bit more rule context to be of any use.

I'd rather have ACCEPT than BREAK, simply because BREAK has a negative connotation, much closer to REJECT in fact, which doesn't reflect what is really going on.

Or maybe I'm the one who's to close to the tree ... tired cause I've been having parsing nightmares while working on an incredibly complex (to implement) and powerful new version of remark which involves self-modifying parse rules, for the last 40 hours... with a sinus cold on top of it. :-)

Maxim Olivier-Adlhoch
17-Oct-2009 3:10:18
if you want to look at it in a totally different light...

ACCEPT and REJECT allow us to write PREEMPTIVE parse rules.

But for this to actually work, you need something else to preempt to, which isn't the task of REJECT OR ACCEPT to handle.

The only difference is that in one operation the input was considered valid, whereas it was invalid in the other. This just changes where the other parse rules will continue to look for data to process (backtrack or not).

think about it... can you really consider a system to have preemptive multi-tasking with only one task running?

and just to make a link to my previous post ... "does a tree really make noise if you're not there to hear it fall" ;-)

Brian Hawley
17-Oct-2009 7:44:27
Declare that
some [break | "b"]
succeeds and rename break to accept. You might think that the rule is silly, but let's not second-guess the programmer here. We have a lot of situations where R3 makes sense where REBOL didn't before. Let this be another.
-pekr-
17-Oct-2009 7:52:28
With BrianH's explanation, we should accept an ACCEPT :-)
Hostile Fork
17-Oct-2009 9:02:04
+1 on the rule modification to make accept "accept"

We have a lot of situations where R3 makes sense where REBOL didn't before. Let this be another.

+10 on sentiment

Please see my marketing advice on advertising R3 as "Rebol, rebooted."

Those who like Rebol will be happy to see it rebooted and tweak their sources to match new ideas.

Those who haven't cared for Rebol in the past will come back for a second look.

Those who've never heard about it will be intrigued that something unusual was renovated while they slept.

Carl Sassenrath
17-Oct-2009 14:40:57
Maxim, you're quite right... I wrote this blog quickly as I was running out the door, wondering if I was making any errors in it as I wrote it. Sorry about that. I'll update the blog to clean it up.
Carl Sassenrath
17-Oct-2009 15:57:17
This blog has been entirely rewritten. The issue is a bit more complicated, and I've explained it in more detail now.
Hostile Fork
17-Oct-2009 16:42:16
I'm extremely new to parse. But maybe that newness is useful in helping you assess these questions, because you are very deep in this... and maybe a "clueless" person can offer a useful perspective.

I would generally think that break(/accept/reject) is a convenience for code in parentheses. So in your patterns, e.g.:

while [a b | c break | d]

The way to gauge expectation would be to say:

while [a b | c (print "break") | d]

If that would print break then it should break, if not it should not...

That is how I, the parse noob, perceive this. e.g. break isn't a pattern and should not be thought of that way.

Again, I don't think break clearly offers an answer to "well what does parse return if it returns?"

See my parse/position proposal as well, because when trying this out it's very frustrating to not get the point of failure or success... maybe it needs to return a block whose first element is true/false and second is the point where the parser was at when it failed?

Brian Hawley
17-Oct-2009 16:51:53
With the rewritten blog, I say that having all of accept, reject and break would be really useful, if possible. The break operation would be like while: needed by advanced parsers in advanced situations. Give us the flexibility we need and we will use REBOL even more :)
Maxim Olivier-Adlhoch
17-Oct-2009 17:07:49
ahh now that makes sense :-)

with this new information, I feel my PREEMPT statement earlier becomes even more relevant.

although preemption usually takes place from an external controller (ex: kernel controlling processes), in parse the controller is the current parse rules.

with the three explicit extended meanings including repetitions, ACCEPT and REJECT become truly PREEMPTIVE. they exit the block without regard for context, and give control back to the context they were used in an explicit unwavering manner.

ACCEPT: "this part is good", continue with other rules.
REJECT: "this was bad", try again with other rule.

BREAK still depends on the context for this meaning to be definite, so its not fully preemptive, the context is taking the decision in the end.

I know I'm stretching the concept of PREEMPTION a bit, but I find it illustrates well how control is being taken... parse until a situation forces you exit (preempt) the current decision making (process), allowing other rules (other processes) to take over.

Brian Hawley
17-Oct-2009 17:33:07
Fork, BREAK in parentheses is a completely different function from what we are discussing here. The closest PARSE operation to what you are talking about is return, particularly if its parameter is an expression in parentheses.

Assuming that you are new to PARSE but not to parsing in general, the purpose of break, accept and reject is to allow us to recognize language patterns that PARSE would otherwise have a great deal of trouble handling. Every parsing model has a set of language patterns that it is good at recognizing, and other patterns outside of that set that it would have more difficulty recognizing, or that you can prove mathematically that they can't recognize.

For the extended PEG model of PARSE, the set of languages it can handle is pretty extensive. But when you run into the limits of the model (what the CS people call "expressivity"), it's like running into a brick wall. Anything we can practically do to expand those limits can be a good thing. That is what we are proposing here.

The whole purpose of most of the R3 PARSE proposals is to increase expressivity. In particular, that is the reason for and, not, quote, fail, if, then, the into, to and thru changes, and even do and series setting to some extent. All of the deferred proposals (of, limit and reverse) are about increasing expressivity as well.

Don't worry though, we also have added functional operations: change, insert, remove, return, and the aforementioned do and parse series setting :)

Hostile Fork
17-Oct-2009 22:38:26
BrianH--I know break is different from (return). It is for breaking rules, not breaking out of the parse entirely...

But the point I was making is that there has to be consistency. Look for instance at this:

parse "abc" [["ab" (print "hi") | "cd"] "c"]

I just ran that and it printed "hi" and returned true.

Then I ran this:

parse "abc" [["ab" break | "cd"] "c" (print "bye")]

Prints bye, returns true.

My point above was simply that in the parse design, this can be abstracted. Something like:

parse "abc" [["ab" X | "cd"] "c" (print "bye")]

I don't see how or why break is special in terms of evaluating whether it happens or doesn't happen. Whatever X is (code in parens, accept/reject) there should be a consistent answer to whether it has the intended effect.

This is what I mean as a parse noob question. If there's some deep level where you really have to think hard about whether X is code in parens or break/accept/reject, that doesn't seem sensible to a Rebol parse noob like me.

I am wondering why this discussion is about break/accept/reject and not about X.

Brian Hawley
18-Oct-2009 0:46:10
I don't see how or why break is special in terms of evaluating whether it happens or doesn't happen.

The break operation is not any more special than any of the other operations. For almost all PARSE operations (except none) it matters whether they happen or don't happen - that is why the operations exist.

The reason that this discussion is about break, accept and reject is that we need to have this discussion at this point in the development process, so this blog was written about that subject in particular so that we can have this discussion. We've had the PARSE newbie discussions elsewhere - this is an advanced PARSE discussion.

Now ignoring that for a moment, let's talk about your X. The consistent answer to whether any operation X has the intended effect within any rule depends on two things:

  1. What is the effect of the rule X? This is something that will be well documented for all built-in operations, and then can be deduced for composite rules by combining the effects of their operations.
  2. What is the programmer's intention? The answer to that is something that the interpreter can't know. Given that, we assume that the programmer intends to do what they are saying they want to do. This presumes that the programmer has read the documentation. If they have, and they still don't understand, that is likely a failure of the documentation (or perhaps the programmer, but let's not get into that).

There needs to be a little consistency in the syntax of rules and how they are specified, but beyond that there doesn't need to be much consistency in how the rules behave. Their differing behavior is the whole reason why we have different operations in the first place: If they all did the same thing, we could get by with just one operation.

There are going to be PARSE operations that are newbie-friendly, and others that are more advanced. In order to understand the advanced operations, the newbie will have to read the documentation. It is our job, as advanced PARSE users, to make the documentation make sense. We should make tutorials if need be.

However, we will not be restricting the behavior of PARSE to only the newbie-friendly operations. PARSE is a powerful tool, and we will not make it less so just to satisfy people who don't want to read the documentation.

For many of us PARSE is the killer app of REBOL, the thing that we use the most. And to use this app to its full potential, you will need to read the documentation. Once you do that, you will be less of a newbie though :)

Ladislav
18-Oct-2009 7:25:45
My proposal is to axe break, since the definition above, while (maybe) understandable for everyone else, does not make sense to me. I dare to guess, that nobody used break in the above sense yet. Every example of break usage known to me is just to accept (match successfully the input), while stopping the current iteration.
Joe Coder
19-Nov-2009 19:52:17
Consistent design would have BREAK stop REBOL from evaluating remaining input further with PARSE and return TRUE. In short, it's saying "Good enough. We found it. We matched it. No point parsing further. Break off the parse."

What would ACCEPT offer the coder? How is it different from writing rules with SOME or ANY with | ?

Would ACCEPT have REBOL accept the rule at that moment, move the cursor on the input and jump to beginning of the rule again, if the coder has specified repetition.

In short, is it saying "Good enough. We want to start over the same rule on the next bit of input. Keep the parse going."? That functionality seems to exist already.

What purpose exists for REJECT? There does not seem to be any.

Ladislav
3-Dec-2009 6:20:20
At Joe: you asked, how is ACCEPT different from writing rules with SOME or ANY with |?

Here is a nontrivial recursive rule example:

a: [and b | skip a]

is equivalent (if we don't take stack space limits into account) to the following iterative construct:

a: [while [and b accept | skip | reject]]

Post a Comment:

You can post a comment here. Keep it on-topic.

Name:

Blog id:

R3-0277


Comment:


 Note: HTML tags allowed for: b i u li ol ul font span div a p br pre tt blockquote
 
 

This is a technical blog related to the above topic. We reserve the right to remove comments that are off-topic, irrelevant links, advertisements, spams, personal attacks, politics, religion, etc.

REBOL 3.0
Updated 26-Apr-2024 - Edit - Copyright REBOL Technologies - REBOL.net