PARSE: BREAK vs ACCEPT

17-Oct-2009: Rewrote this blog to make it more clear. Earlier, I tried to condense it into a few examples, but that didn't make sense.

Within the parse dialect, blocks hold alternative rules, separated by "or" bars (|).

Normally, the rules within a block should be written in a simple, clean way where only match words and values are given, along with the paren productions that get evaluated. When you write parse rules, effort should be made to keep the rules simple like that. The normal backtracking of the parser can handle most situations.

However, advanced users may need a bit more control for unique parsing situations. Some of the keywords that provide this control:

	fail	explicitly fail a single rule, skip to the next alternative (if it has one).
	break	explicitly exit the entire rule block, skip all alternatives.
	return	explicitly exit all rules, return from the `parse` function.

There is one other case: exit back through multiple rule blocks. But, let's discuss that in a separate article.

For break there is an additional issue: does it cause the overall rule block to succeed or fail? For example, given some rule block:

[a b | c break | d]

if the break happens, does that block succeed or fail?

The answer is that you cannot tell just by looking at the rule block. You need to know its repetition limits. For example, it could be any of these:

opt   [a b | c break | d]  ; 0 to 1
any   [a b | c break | d]  ; 0 to infinity
some  [a b | c break | d]  ; 1 to infinity
while [a b | c break | d]  ; 0 to infinity (input independent)
3     [a b | c break | d]  ; 3 only
3 5   [a b | c break | d]  ; 3 to 5
      [a b | c break | d]  ; 1 only

If the break occurs before the minimum count, then the rule fails. Otherwise, it succeeds.
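
To make that rule concrete, here is a rough sketch in Python. It is only a model of the rule as stated above, not how parse is implemented. The names LIMITS and break_succeeds are made up for illustration, and it assumes the count that matters is the number of iterations that completed before the break happened.

```python
import math

# Repetition limits from the list above: (minimum, maximum).
# "(none)" stands for a bare rule block with no repetition prefix.
LIMITS = {
    "opt": (0, 1),
    "any": (0, math.inf),
    "some": (1, math.inf),
    "while": (0, math.inf),
    "3": (3, 3),
    "3 5": (3, 5),
    "(none)": (1, 1),
}


def break_succeeds(prefix, completed):
    """Model of the rule: a break succeeds only if the minimum count is met.

    `completed` is the number of iterations that finished before the break
    (an assumption of this sketch, not something parse exposes).
    """
    minimum, _maximum = LIMITS[prefix]
    return completed >= minimum


if __name__ == "__main__":
    # Show the outcome of a break on the very first pass for each prefix.
    for prefix in LIMITS:
        print(prefix, break_succeeds(prefix, 0))
```

In this model, a break on the first pass succeeds under opt, any, and while (minimum of 0) and fails under the others, which is why the rule block alone does not tell you the outcome.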

Recently we added a new keyword, reject. The purpose of this word is to force the entire rule to fail. This was necessary because there was no other mechanism for explicitly failing the entire rule block.

After defining reject, it theoretically made sense to define accept as the opposite: force the entire rule to succeed.

Unfortunately, both those definitions are inadequate unless we also state their effects on the repetition itself. Does reject fail the entire repetition loop, or does it depend on whether the limits are also satisfied? This same question is also valid for accept.

This was not a problem for break because it is defined to break the loop. However, it does not define the success or failure of the overall rule, because you must consider the rep limits.

You can see here that we are dealing with two variables:

stopping the repetition
success or failure of the overall rule (including any repetition)

So, to define the required actions, the effects on both of those variables must be clearly stated.

For example, we can clearly define:

	break	stop repetition. Succeed only if repetition limits are satisfied.

But, are these the correct definitions for the other two words?

	accept	stop repetition. Always succeed, regardless of the rep limits?
	reject	stop repetition. Always fail, regardless of the rep limits?

Those are questionable, aren't they? So, what's the correct behavior?
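
To see where the candidate definitions pull apart, here is a small Python sketch of the two variables. It models the definitions as proposed above, question marks and all, and is not a statement of what parse does. The function name outcome and its arguments are hypothetical, and "completed" again assumes we count the iterations that finished before the keyword was reached.

```python
def outcome(keyword, completed, minimum):
    """Success (True) or failure (False) of the overall repeated rule.

    All three keywords stop the repetition; in this model they differ
    only in how success is decided.
    """
    limits_met = completed >= minimum
    if keyword == "break":
        return limits_met   # succeed only if the rep limits are satisfied
    if keyword == "accept":
        return True         # candidate definition: always succeed
    if keyword == "reject":
        return False        # candidate definition: always fail
    raise ValueError(f"unknown keyword: {keyword}")


if __name__ == "__main__":
    # Compare the keywords on the first pass of `some` (minimum 1)
    # and `any` (minimum 0).
    for name, minimum in (("some", 1), ("any", 0)):
        for keyword in ("break", "accept", "reject"):
            print(name, keyword, outcome(keyword, 0, minimum))
```

Under these candidate definitions, accept on the first pass of a some block succeeds even though the minimum of one was never met, and reject inside an any block fails even though zero matches would normally satisfy it. Those are the cases that need a decision.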

Let's get these words defined, or toss them out, so we can finalize parse for R3.
