REBOL 3.0

Comments on: PARSE: how to terminate an iteration

Carl Sassenrath, CTO
REBOL Technologies
10-Oct-2009 3:02 GMT

Article #0268
Main page || Index || Prior Article [0267] || Next Article [0269] || 8 Comments || Send feedback

In parse code like this:

parse "a" [some ["a" | none (do something)]]

what causes the SOME loop to terminate? There are three possible choices:

  1. the input has ended
  2. the input has not advanced
  3. just keep looping until failure

In prior versions (up through A87) the first choice was used. But, our friend Ladislav submitted CureCode ticket #1268, a claim that iterated and recursive cases were not equivalent. And as usual, he was correct.

The problem is that the iteration will not even be attempted if the input has ended. So the "do something" never happens.

The third choice, which is technically "the most correct" can be pretty harsh. If you don't provide an explicit termination, and one of the rule alternatives is true, it loops forever. So, the line above would loop forever.

To fix it, you would need to change the line to:

parse "a" [some ["a" | none (do something) fail]]

So, that means that many of your existing parse rules would hang, because they do not include that FAIL keyword.

In A88 I'm testing an alternative: choice #2 above.

Technically, there is "an implied requirement" that at least one of the rules in a block match the input, otherwise the rule has failed. Therefore, a test can be made to determine if the input has advanced by the end of the rule block, and if not, terminate the loop.

This is a fairly easy condition to handle and for most coders to remember. It keeps our code easy-to-write and prevents code that would become an infinite loop.

Of course, I am quite certain that Ladislav the logic guru will be able to produce an example that shows why the A88 solution is not precisely correct... So, we may need to return to this issue at some later date and weigh the issue of usability.

8 Comments

Comments:

-pekr-
11-Oct-2009 2:09:19
Why do we need it? Isn't it like programmer putting forever [] in his code, and complaining, that his script hangs somewhere? :-)
Ladislav
11-Oct-2009 10:28
I would like to compare the alternatives using the recursive equivalents. So, if we use code:

a: [any b]

, the recursive equivalents are:

1. Terminate when the input has ended

a: [not end b a |]

We can see, that having implemented 3. we can obtain 1. if we replace the original b rule by the [not end b] rule, i.e. by a rule, which fails at the end of the input, or when the b rule fails.

However, we miss a transformation to use to obtain 3. from 1.

2. Terminate when the input hasn't advanced

a: [and [b d:] c: :d if (positive? offset c d) a |]

As before, it is possible to obtain 2. from 3. if we replace the original b rule by [and [b d:] c: :d if (positive? offset c d)], i.e. by a rule which fails if the b rule fails, or when the b rule does not advance, while we miss a transformation to use to obtain 3. from 2.

3. Just keep looping until failure.

a: [b a |]

This is the most flexible rule, since it can be used to obtain the previous variants when needed, as shown above.

Ladislav
11-Oct-2009 10:35:34
errata: as you may have noticed, instead of offset I should have used offset?
Carl Sassenrath
11-Oct-2009 14:29:44
Pekr: It's not the same as FOREVER. It is more like FOREACH, because it is matching against the input stream. FOREACH terminates at the end of its input.

The definition of iterative matching is: "repeat the rule block while any one of the match rules are successful."

The question is: does the input stream affect its success? For example, if you've reached the end of the input stream, it is no longer possible to match anything (other than END), so does the iteration loop remain valid or does it stop?

If we adopt choice 3 as the correct method, some of the parse bugs fixed earlier will need to be dismissed, because it will be valid for them to be infinite loops. They are not bugs.

In addition, we probably need to give users a simple method of terminating the iteration block with a successful result. I think BREAK is a good candidate for that, and it gives it a solid definition that is easy to remember.

And finally, we should consider if it makes sense for there to be other iterators. Ladislav suggested UNTIL as such a method: [until a b] (until a is true repeat b.)

Ladislav
11-Oct-2009 15:42:07
Regarding the forever and foreach. My understanding is, that the variant 1. is more like foreach, except for the fact, that the individual steps can be of varying size 1-or-more.

For the variant 3., the infinite loops certainly are of the forever kind, although it is easy to define finite loops of the 3. type, that are neither of the 1. nor 2. type.

Brian Hawley
12-Oct-2009 0:02:39
Carl, we allow FOREACH to take a block containing a set-word as its word argument, which could lead to an endless loop. For FOREACH, the increments can be of varying size as well, but 0-or-more instead. And we do this for the same reason: special tricks. And I wouldn't have it any other way, since we often have to do those special tricks.

In the case of PARSE, those special tricks could include modifying rules, incremental parsing, even manual backtracking using position setting, which might leave the position before where it was at the start. How else are we going to make our workarounds while we wait for REVERSE rule to be implemented?

Don't hamper advanced parsing. Make SECURE evals apply to PARSE instead. And that proposal for BREAK sounds good too.

About UNTIL, does "until a is true repeat b" mean up to a, or up to and including a? I prefer the former, as it is more flexible.

Ladislav
12-Oct-2009 7:35:33
I put the until proposal to the Parse enhancements article. It uses the
a: [b | c a]
definition. The other basic variant can be easily written using
a: [until [and b] c]
Carl Sassenrath
12-Oct-2009 17:01:52
Posted new blog with the decision.

Also, SECURE eval in PARSE is already implemented.

Post a Comment:

You can post a comment here. Keep it on-topic.

Name:

Blog id:

R3-0268


Comment:


 Note: HTML tags allowed for: b i u li ol ul font span div a p br pre tt blockquote
 
 

This is a technical blog related to the above topic. We reserve the right to remove comments that are off-topic, irrelevant links, advertisements, spams, personal attacks, politics, religion, etc.

REBOL 3.0
Updated 24-Apr-2024 - Edit - Copyright REBOL Technologies - REBOL.net