Comments on: PARSE: do you want BREAK?

Carl Sassenrath, CTO
REBOL Technologies
6-Oct-2009 18:39 GMT

Article #0262
Main page || Index || Prior Article [0261] || Next Article [0263] || 19 Comments || Send feedback

The break word is not yet defined in parse, but it can be added if desired. So, say something if you want it.

Note that we have fail now, which fails a specific rule, and moves to the next alternate:

>> parse "abc" ["abc" fail | "bcd"]
== false
>> parse "abc" ["ab" fail | "abc"]
== true
>> parse "bcd" ["ab" fail | "bcd"]
== true

But, fail does not let you fail the entire rule block, so perhaps that's a choice for break:

>> parse "bcd" ["b" break | "bcd"]
== false

Some developers have suggested a return value for break, but I'm not sure why that is useful.

Also, note that return is now implemented for returning from parse via an internal throw action.

>> parse "bcd" ["abc" | "bcd" return ('ok) | "cab"]
== ok

(Note: there's another form for return, but it's not working in A87.)



6-Oct-2009 15:28:37
If you are thinking about implementation, I am not sure you saw another proposal to consider: n BREAK

... and in the same spirit n FAIL would be possible, although not sure if that one would be so usefull ...

6-Oct-2009 15:36:39
I don't really understand because we already have BREAK which allows to exit from the current block rule (without failing). It must stay, we use it.

Actually, we need both of them (FAIL and BREAK). They are different use cases.

But we need too than they act on several layers (nested rules or rules called recursivly).

It's why we should have these enhancements.

n Fail

n Break

n Fail would allow to FAIL several levels (or blocks) at the same time. n Break, would do the same but without failing.

(by default, n = 1)

Brian Hawley
6-Oct-2009 15:52:29
I have used break in the past to work around limits in the types of languages that PARSE recognizes. The new and and not lookahead operations and if will solve some of the same problems I used to solve with rules containing break, but not all. However, I only used it to break out of any and some loops, and it is really poor for that because of not being able to break out of nested blocks.

I would appreciate it if the operation remained, and if the n break operation was added too. Don't really care about n fail.

Maxim Olivier-Adlhoch
6-Oct-2009 22:55:55
I'll just add that the n break allows one to build a rule more simply when input data supports data block terminators of various strengths.

Some of the data segment terminators wrap all levels, some just the current... with n Break, its very easy to map this logic in parse, without needing to do look ahead or backtracking...

The resulting parse will be faster, and makes data conversion for systems like EDI (MUCH?) easier to implement.

Brian Hawley
7-Oct-2009 2:09:37
Pekr, you forgot the other break proposal: To have break break out of the nearest enclosing loop, not its directly containing block. This would solve most of the likely uses of n break (probably making that proposal unnecessary), be consistent with the behavior of the BREAK function, and justify the continuing existence of the break operation.
7-Oct-2009 6:20:13
As opposed to FAIL, I see BREAK as a a bit "anorganic to Parse". Using FAIL, THEN (+eventually IF) we can easily make rules fail. E.g. instead of writing:

rule: [any [abc break | def | ghi]]

we can write

rule: [any [abc then fail | [def | ghi]]

, so we can axe BREAK without much trouble.

7-Oct-2009 6:27:28
Yet another idiom, that can be used instead of

rule: [any [abc break | def | ghi]]


rule: [not abc [def | ghi]]

, which is even shorter

7-Oct-2009 6:29:57
Sorry, lost the ANY, the correct idiom should have been:

rule: [any [not abc [def | ghi]]

Carl Sassenrath
7-Oct-2009 12:50:37
Yes, I saw the n break proposal, and even before the n can be considered, we need to justify break itself.

Over the last couple weeks many enhancements have been added to PARSE. Most of these provide a bit of extra logic to the PARSE dialect.

For example and compliments the "|" (or) in a logical way. Also, not has justification in logic.

Actions like fail and break are non-linear, radical elements, if you know what I mean. They are shortcuts... which makes me wonder if they will cause bad rule writing.

Of course, it is true that some enhancements are for ease of use in very common cases. Actions like insert, remove, and change are common in production parens, so allowing them in rules makes some sense.

So, the purpose of this blog is to justify the addition of break. Only Ladislav has provided real examples, and his point is that we don't need it.

The n break concept seems even less justified because it makes assumptions about the parse tree that entered the rule block. It might be ok for rules where the blocks are clearly nested within each other, but for named rules it could create some chaos, no?

Perhaps more convincing would be a catch and throw mechanism, where the exit point had a label of some kind.

... catch here ... [... throw here | ...]

But, I'm just tossing around ideas here, and have not thought about the details or implications of such an approach.

So, provide some good examples of break, and we can wrap this up soon.

Brian Hawley
7-Oct-2009 20:38:27
I would prefer that break break a loop instead of a rule, and skip n break altogether. Or Carl's catch/throw, except maybe with a block/rule argument to catch instead of a label (which seems too much like a goto).
9-Oct-2009 4:08:30
I see the BREAK keyword as an exception in the Parse dialect, since it is neither a nonterminal (see the Wikipedia definition) like NONE, SKIP, or FAIL, nor an operator like ANY, |, or sequence.

Being serious about axing BREAK I think it is time to propose a new operator, that would make the Parse rules more readable, while not losing advantages we had when we used the BREAK keyword.

Such an operator could be e.g. an UNTIL operator defined recursively as follows:

a: [until b c]

would be equivalent to the recursive rule:

a: [b | c a]

Iteratively, we can express the operator as follows:

a: [any [b then (d: none) fail | [c | (d: [fail]) fail]] d]

9-Oct-2009 5:05:32
Sorry, it looks, that the iterative rule is not equivalent to the recursive one. To be equivalent, the iterative rule would need to look as follows:

a: [any [[b (d: none e: [fail]) | c (e: none) | (d: [fail] e: [fail])] e] d]

9-Oct-2009 5:11:20
Still wrong, am I actually capable of writing it correctly? Seems, that third time is the charm:

a: [(e: none) any [e [b (d: none e: [fail]) | c (e: none) | (d: [fail] e: [fail])]] d]

Carl Sassenrath
9-Oct-2009 16:40:31
BrianH: although break could be defined that way (as a termination to repeated patterns), there is no distinction between repeated and non-repeated rules in the implementation. That is, ['a] actually means [1 'a]. It is simply the default. The loop still happens, but just once. So, it would be difficult for a break to determine what level to break. Some kind of special flag would need to be embedded in the state of the parser.

Ladislav: Well... don't you think the long line is much easier to understand compared with the abstracted until keyword? ;)

But, all joking aside... the a: [b | c a] pattern is very common. Why not write it just like that? Note that the A87 release will parse a lot deeper now... to the full size of the C stack.

Brian Hawley
9-Oct-2009 18:49:06
Ladislav, I'm not sure I get your non-recursive alternatives to until. Does
until b c
mean this:
any [not b c]
or does it mean this:
any [not b c] b
The recursive definition seems to mean the second, but the non-recursive translate to the first (afaict).
10-Oct-2009 12:54:36
Brian, the a: [b | c a] is the definition, my first two iterative variants above are wrong, the third one is equivalent to the recursive definition, AFAIK.
10-Oct-2009 13:09:16
This is another variant using THEN, that may look more readable?

a: [any [b then e: (d: [:e]) fail | [c | (d: [fail]) fail]] d]

(btw, neither of Brian's alternatives is true equivalent, since the first one always succeeds, while the second matches B one more time, than the recursive one)

10-Oct-2009 13:17:38
Being at it, this uses the NOT operator, and should be equivalent to the recursive definition too:

a: [any [not [b e: (d: [:e])] [c | (d: [fail]) fail]] d]

10-Oct-2009 14:49:28
A note: instead of testing the code in the interpreter, I just "mind-simulated" it. After performing real interpreter tests I posted two CureCode tickets, since the behaviour of Parse was not as expected.

Post a Comment:

You can post a comment here. Keep it on-topic.


Blog id:



 Note: HTML tags allowed for: b i u li ol ul font span div a p br pre tt blockquote

This is a technical blog related to the above topic. We reserve the right to remove comments that are off-topic, irrelevant links, advertisements, spams, personal attacks, politics, religion, etc.

Updated 25-Jun-2024 - Edit - Copyright REBOL Technologies -