Exceptions
From DocBase
Revision as of 20:55, 9 November 2010
Introduction
REBOL exceptions have been discussed at:
- Errors article
- R3 blog (Discussion about error handling)
- AltMe
- Cure Code tickets #539, #771, #851, #884, #1361, #1506, #1509, #1514, #1515, #1518, #1519, #1520, #1521, #1734, #1742, #1743, #1744
The purpose of this article is to find a coherent proposal solving the above problems (and maybe more, with a bit of luck).
Types of exceptions
The R3 Exception/Error Mechanism article mentions that there are two types of exceptions in REBOL: unwinds and throws. Since every exception type can be emulated using the other one, I decided to make a couple of speed tests to find out what the relative speed of the respective exception types is:
; the relative speed of throw is 100%
base-throw: time-block [try [none]] 0,05
base-unwind: round to percent! base-throw / time-block [catch [none]] 0,05
; == 106%
error: make error! ""
handled-throw: time-block [try [do error]] 0,05
handled-unwind: round to percent! handled-throw / time-block [catch [throw none]] 0,05
; == 139%
This simple test suggests that unwinds are faster than throws, so it is preferable to replace throws with unwinds whenever possible. As noted above, that should always be possible; it should be handled as an implementation detail and should not "leak" into user space anyway, which would also correct the corresponding unwind bugs mentioned in CureCode.
RETURN and EXIT
As stated in the Errors article, REBOL currently uses just dynamically scoped return and exit. Problems with dynamic scoping occur when a function processes a code block (or blocks) with return or exit coming from different contexts. In such cases, the behaviour looks "unnatural" or "unexpected" to most users.
Note: exit is equivalent to return #[unset!], therefore it is not necessary to discuss it separately. When mentioning "function using dynamic return", we usually mean "function using dynamic return and exit".
USE
The use function and other control functions, like the collect function below, are not supposed to catch return or exit from the body block. The place they return to should instead be determined by the "context of body origin". This is a problem for the current interpreter version, as mentioned in #539.
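To make the problem concrete, here is a minimal sketch of the kind of code affected. The function name test-return is hypothetical and is not taken from the ticket; the comments restate the behaviour described in this article rather than a recorded interpreter session.

```rebol
; TEST-RETURN is a hypothetical caller used only for illustration.
test-return: func [] [
    use [x] [
        x: 1
        ; The author of this block means "leave TEST-RETURN with x".
        return x
    ]
    ; If USE intercepts the dynamic RETURN, evaluation continues here
    ; and the function yields 2 instead of the intended 1.
    2
]

print test-return
```

The desired result is 1; a use implementation that catches the return would be expected to give 2.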
The use function was proposed by Brian as a good example to illustrate the problem, as well as how it can be solved. The use function is currently implemented as follows:
Closure based USE implementation
use: func [
    "Defines words local to a block."
    vars [block! word!] "Local word(s) to the block"
    body [block!] "Block to evaluate"
] [
    apply make closure! reduce [to block! vars copy/deep body] []
]
This use implementation does not work as required when:
- The inner closure catches the dynamic return, and a dynamic return call is present in the body block. The dynamic return is then caught by the inner closure, causing bug #539.
- The inner closure is made to use the definitional return, and a return word is present in the body block. In this case the make function binds the body block to the closure context, which rebinds (and redefines) any return word present in the body block, effectively causing the same bug.
R2 would solve the bug by using the [throw] function attribute in both the use function and in the inner closure.
One more note on the implementation: since the inner closure is called just once, there seems to be no need to deep copy the body block when it is defined. However, the block passed to use might itself be reused, by other calls to use for instance, and using a block as a closure body binds the block to that closure even if the closure is thrown away. If we don't deep copy the block here, we can't reuse it later, such as in other calls to the code that contains the use statement.
Foreach based USE implementation
Instead of using a closure, we can implement the use function as follows:
use: func [
    "Defines words local to a block."
    vars [block! word!] "Local word(s) to the block"
    body [block!] "Block to evaluate"
] [
    foreach (vars) [#[none]] body
]
Advantages:
- This implementation is simpler than the closure based one.
- This implementation is faster.
- If the code that calls the use function had a definitionally scoped return, or even if use just passed through the dynamic return, bug #539 would be corrected.
Disadvantages:
- This solution goes into an infinite loop if all vars are set-words. I see this property as a foreach design quirk, as noted in #1751, but even if that is fixed we will still have a problem.
- This solution catches break and continue in the body as well.
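The break interference can be sketched as follows. The name simple-use is hypothetical; it repeats the foreach based definition above under a different name so that the built-in use stays untouched. The sketch assumes R3 semantics, where foreach accepts a paren in the word position.

```rebol
; SIMPLE-USE is a hypothetical copy of the FOREACH based USE above.
simple-use: func [
    vars [block! word!]
    body [block!]
] [
    foreach (vars) [#[none]] body
]

foreach item [1 2 3] [
    simple-use [x] [
        x: item
        ; Meant to leave the outer FOREACH ...
        if x = 2 [break]
    ]
    ; ... but the BREAK is consumed by the loop inside SIMPLE-USE,
    ; so this line is expected to run for every item.
    print item
]
```

With a use that does not intercept break, the outer loop would stop at the second item and only 1 would be printed.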
Those last two disadvantages can be solved by using the following modification:
use: func [
    "Defines words local to a block."
    vars [block! word!] "Local word(s) to the block"
    body [block!] "Block to evaluate"
] [
    do foreach (vars) [#[none]] reduce [to paren! reduce [:break 'return] body]
]
Now the foreach loop in the use implementation just binds the body, and does nothing else. This way, when evaluating the body block, the foreach loop is not running, which precludes any interference between the foreach loop and any break/continue call made when the body is evaluated. This can be made even simpler with a helper function that calls break/return.
COLLECT
The use function is not the only one affected by this problem, so we shall discuss another example presented by Brian: the collect function. The collect function is currently implemented as follows:
FUNC based COLLECT
collect: make function! [[
    {Evaluates a block, storing values via KEEP function, and returns block of collected values.}
    body [block!] "Block to evaluate"
    /into {Insert into a buffer instead (returns position after insert)}
    output [series!] "The buffer series (modified)"
][
    unless output [output: make block! 16]
    do func [keep] body func [value [any-type!] /only] [
        output: apply :insert [output :value none none only]
        :value
    ]
    either into [output] [head output]
]]
Notice that the given body is bound when the new anonymous function is created; func is used so that the function bodies are copied. This operation is problematic if the anonymous function has a definitional return, since in that case all occurrences of the 'return word in the body are rebound, which is not desirable.
FOREACH based COLLECT
collect: make function! [[
    {Evaluates a block, storing values via KEEP function, and returns block of collected values.}
    body [block!] "Block to evaluate"
    /into {Insert into a buffer instead (returns position after insert)}
    output [series!] "The buffer series (modified)"
][
    unless output [output: make block! 16]
    do foreach keep reduce [
        func [value [any-type!] /only] [
            output: apply :insert [output :value none none only]
            :value
        ]
    ] reduce [body]
    either into [output] [head output]
]]
Notice that no 'return in the given body can be bound here, as opposed to the word 'keep, whose binding is wanted.
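A caller that would be sensitive to the difference between the two collect variants might look like the sketch below. The function name evens-until-zero is hypothetical, and the comments describe the behaviour expected from the reasoning in this section, assuming that functions get a definitional return.

```rebol
; EVENS-UNTIL-ZERO is a hypothetical caller used only for illustration.
evens-until-zero: func [numbers [block!]] [
    collect [
        foreach n numbers [
            ; Meant to leave EVENS-UNTIL-ZERO and yield NONE.
            if zero? n [return none]
            if even? n [keep n]
        ]
    ]
]

; FUNC based COLLECT with a definitional return in its anonymous
; function: RETURN is rebound, leaves only that anonymous function,
; and COLLECT still yields the values kept so far.
; FOREACH based COLLECT: RETURN keeps the binding it had in the
; caller, while KEEP is still bound as wanted.
probe evens-until-zero [2 4 0 6]
```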
Proposed RETURN alternatives
Dynamic return with optional transparency
This alternative is inspired by R2's [throw] function attribute, though it would likely be specified using a set-word like throw: instead.
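For readers who have not used the R2 attribute, the following sketch shows what transparency means there. It uses REBOL 2 syntax; the names run-block and caller are hypothetical, and the proposed throw: spelling for R3 is not shown because its exact form is still open.

```rebol
; REBOL 2 syntax: [throw] is a function attribute placed first in the spec.
; RUN-BLOCK and CALLER are hypothetical names used only for illustration.
run-block: func [[throw] code [block!]] [
    do code
]

caller: func [] [
    ; The RETURN passes through RUN-BLOCK and leaves CALLER.
    run-block [return 1]
    2
]

print caller
```

In R2 this is expected to print 1; without the [throw] attribute the return would end only run-block, and caller would go on to yield 2.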
Advantages:
- Solves the most serious traditional problems.
- Most functions would not need to specify an option, including code that calls functions like USE, though USE itself would need to specify it once or twice.
- Can call RETURN and EXIT from code blocks not contained in functions. This is common in PARSE code blocks, and manual common subexpression elimination (though that should be in functions instead).
- At least the dynamic return part is implemented already, and uses the same code that the other dynamic escapes use to propagate.
- Doesn't combine the transparency option with an option to specify a typespec for the return value, allowing them to be specified separately.
- Easy to translate code from R2.
Disadvantages:
- Bad locality for the unhandled RETURN or EXIT error (#1506), or overhead to check for that situation.
- Does not address the need to have both a function scoped RETURN as well as pass through RETURN from different contexts at the same time.
- Some people seem to question or have trouble understanding dynamic return as a concept, let alone its benefits.
- The transparency option would need to be specified more often than the comparable option for definitional return.
- We could do better than the name throw: for the transparency option.
Definitional return only
The MAKE function would bind the 'return and 'exit words in function bodies to specific values at function definition time. Those functions would only return to the function to which they are bound. The top-level versions of RETURN and EXIT would just trigger an error.
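The difference from dynamic return can be sketched with two hypothetical functions, run-block and outer. The comments describe what each model would be expected to do according to the descriptions in this article; they are not output from an existing build.

```rebol
; RUN-BLOCK and OUTER are hypothetical names used only for illustration.
run-block: func [code [block!]] [
    do code
    'inner-finished
]

outer: func [] [
    ; This block is written inside the body of OUTER.
    run-block [return 'from-outer]
    'outer-finished
]

; Dynamic return: RETURN is caught by the nearest function on the call
; stack, which is RUN-BLOCK, so OUTER continues and yields 'outer-finished.
; Definitional return: RETURN was bound to OUTER when OUTER was made,
; so OUTER yields 'from-outer.
probe outer
```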
Advantages:
- Solves the USE (and COLLECT, etc...) problems (after the rewrite above).
- Functions would not need to specify an option.
- The easiest to explain (to people unfamiliar with dynamic return).
- Satisfies the need to have both a function scoped RETURN as well as pass through RETURN from different contexts at the same time.
- Can be used to implement code patterns like the definitional CATCH/THROW mezzanine pair below.
- Good locality for the unhandled RETURN or EXIT errors (#1506) is easily achievable, with no extra overhead.
Disadvantages:
- If the code block using RETURN is nested in the function body, RETURN works automatically. If not, the user needs to either bind the code block, or get the correct RETURN function otherwise (GET/SET word, etc.). For example, see the mezzanine definitional CATCH/THROW implementation below. Code blocks outside of functions are common for PARSE rules, for instance.
- No (native) way to construct a function without definitional RETURN (i.e. a function with no RETURN at all).
- Slight added overhead to function creation and memory use.
Dynamic-return-only functions vs. option of definitional-return-only functions
This is one interpretation of the "Possible Return Method" in the Errors article. In this interpretation, there will be two kinds of functions: regular and definitional. Definitional functions would redefine RETURN and EXIT, as stated above, but being definitional would be an option. Definitional functions would not catch dynamic returns. Regular functions would catch dynamic returns, but would not redefine RETURN and EXIT in their code blocks, and thus not catch definitional returns; they are the "regular" functions because the top-level RETURN and EXIT would have to be the dynamic versions. There would be no equivalent to R2's [throw] - in theory it would be unnecessary (in practice, no).
Advantages:
- In theory this would handle more situations than pure definitional return.
- Solves the USE problem using the above FOREACH based approach, when definitional return is specified in the calling code.
- Solves the unhandled return error locality for definitional returns, though not for dynamic returns.
Disadvantages:
- The most difficult to use solution.
- Would still have the unhandled return error locality problem for dynamic return.
- You won't know whether RETURN or EXIT is definitional or dynamic, which will lead to hard-to-debug errors.
- Making definitional return optional means that this option would need to be specified in the code that calls the functions that would benefit from it, rather than the functions themselves. This means that the option would need to be specified a lot.
- There are some times when you don't want a function to catch a return, either definitional or dynamic, that is specified in its code block. The only way to do this is to make a function that does definitional return, and then reference the dynamic or definitional RETURN and EXIT functions you want to call in the code by value, rather than by name. Awkward.
- If we adopt the proposal in the Errors article to use return: as a conflated option to both specify definitional return and optionally a function return type, it will be impossible to specify a return type on a dynamic return function. These options should be separate.
- There will be whole classes of functions that will be impossible to write, and no one will be able to tell you what those classes are or why they are impossible.
- Slight added overhead to function creation and memory use for definitional return functions.
Dynamic return with a definitional return option
By another interpretation of the "Possible Return Method" in the Errors article, the return: option would only cause the redefinition of RETURN and EXIT to be function-local (definitional return), but not cause the function to ignore dynamic returns. All functions would catch dynamic returns, and RETURN and EXIT would be defined at the top level to generate dynamic returns. Definitional return would be an option.
Advantages (compared to the other interpretation):
- Fewer functions would need to specify the definitional return option (though still most that call RETURN or EXIT in code passed to mezzanine functions).
- You would have a better chance of knowing whether RETURN or EXIT would work for you.
- Slightly easier to understand.
Disadvantages (compared to the other interpretation):
- There would be even more functions that would be impossible to write, for even more confusing reasons.
- Though the unhandled return error locality problem would still only affect dynamic returns, it would happen more often.
- The awkward workaround mentioned above to not catch any RETURN or EXIT would just not work with dynamic return.
- Possibly more difficult to use than the other interpretation, though that would be debatable (if we could figure out how to do so).
Definitional return with an option to not define RETURN and EXIT, dynamic return as a fallback
This would make definitional return the default, but we could optionally specify in a function spec that RETURN and EXIT would not be bound (localized). Basically, this option would be the definitional return version of what R2's [throw] function attribute is for dynamic return. We could keep dynamic return as the fallback if RETURN and EXIT aren't rebound by function definition. The option to not redefine RETURN or EXIT (whatever we call it) should also cause the function to be transparent to dynamic returns, like R2's [throw] - to do otherwise would be too confusing.
Advantages:
- Solves the USE problem and all comparable problems (including COLLECT) without the rewrite, no buts.
- Most functions would not need to specify an option, including code that calls functions like USE.
- Addresses the need to have both a function scoped RETURN as well as pass through RETURN from different contexts at the same time.
- We can call RETURN or EXIT from code blocks that aren't in functions without performing additional (BIND, or GET/SET) operations. This can be common in PARSE rules, or manual common subexpression elimination (which should probably be in functions anyways).
- Option not conflated with the option to specify the return type of the function.
- Easy to explain, even when compared to R2's [throw].
- Easy to use.
Disadvantages:
- What do we call this option? We shouldn't call it return: - that wouldn't make sense. Perhaps throw: or something better?
- Bad locality for the error of unhandled dynamic RETURN or EXIT, or we get overhead to check for that condition (see #1506). Doesn't affect definitional return though.
- We would still have the minimal overhead of trying to catch a dynamic return even for definitional return functions.
- Two return types means added semantic overhead for the programmer to consider, even if it's minimal.
- The same slight definition-time and runtime overhead of the other definitional return models.
Definitional return with an option to not define RETURN and EXIT, no dynamic return
Once we have the definitional model with the option to shut it off, do we need dynamic return at all? Dynamic return could be dropped altogether, in theory, and all we would lose is the ability to call RETURN or EXIT from code blocks that aren't contained in any function (without performing additional operations like BIND).
Advantages (compared to the previous model):
- The top-level RETURN and EXIT functions can just trigger an error immediately, with good locality. #1506 is fixed.
- The overhead of dynamic return can be dropped too. (Not sure if that can spare anything, though. -LM-)
- Simpler semantics.
Disadvantages (compared to the previous model):
- We lose the ability to call RETURN or EXIT from code blocks that aren't in functions... (see above for the rest).
- Dynamic return is already implemented, and whatever overhead it has comes almost for free along with the other dynamic escape functions.
QUIT
- The QUIT function is useful, allowing the user to finish the work of the interpreter.
- The CATCH/quit function is useful for applications that "need" to catch the QUIT because they do not want code (e.g. the code being tested) to escape from their control.
- Having the QUIT/now function is an error (see #1743): it merely imitates the state that existed before CATCH/quit was introduced. If we want to have that state, it suffices to undefine CATCH/quit.
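A test harness that keeps control over code calling QUIT might be sketched like this. The value 3 is arbitrary, and the sketch assumes the R3 forms CATCH/quit and QUIT/return that are used elsewhere in this article.

```rebol
; Run code that may call QUIT without losing control of the session.
; The literal 3 is an arbitrary value chosen for the example.
result: catch/quit [
    quit/return 3
    print "not reached"
]

; RESULT is expected to hold the value given to QUIT/return.
probe result
```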
HALT
Except for bugs (counting QUIT/NOW as one too), HALT is the only exception in REBOL able to crash the test environment. It is desirable to reach a situation where all exceptions are catchable (this can easily be turned into its opposite by undefining the respective catch function or functions); therefore, a corresponding catch function was proposed in #1742.
THROW
THROW is currently a dynamic exception. The /name refinement can be used to "individualize" throws. Differences between regular THROW and named THROW:
- Named THROW is longer to write, needing the CATCH/name block name and THROW/name value name pair of expressions.
- Named CATCH does not check the context of the thrown word, which still admits naming conflicts.
- Regular CATCH catches named THROW, which is arguably an error under normal circumstances (see #1518). Most test/debug code relies on this error though.
Exactly like the dynamic RETURN, the dynamic THROW is subject to the locality problems mentioned in #1506.
None of the above-mentioned problems is too serious (except #1518); for example, #1506 can be circumvented by using an additional CATCH. Still, they contribute to the reasons why this construct is rarely used.
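The named form and the problem from #1518 can be sketched as follows. The name 'done is arbitrary and chosen only for this example.

```rebol
; 'done is an arbitrary name chosen for this example.
result: catch/name [
    throw/name 42 'done
    print "not reached"
] 'done
print result

; The behaviour reported in #1518: a plain CATCH also intercepts
; a named THROW instead of letting it pass through.
print catch [throw/name 1 'done]
```

The first call shows the CATCH/name and THROW/name pair that makes the named form longer to write; the second shows why a regular CATCH placed between the two can stop a named THROW before it reaches its intended handler.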
Example mezzanine implementation of definitional CATCH/THROW
If we had a definitional RETURN in REBOL, the definitional CATCH/THROW pair could be defined as a mezzanine, using code like:
catch: func [
    {Catches a throw from a block and returns its value.}
    body [block!]
] [
    ; create a new function to have a new definitional RETURN available
    ; use the new definitional RETURN as THROW in the BODY
    do func [] [do foreach throw reduce [:return] reduce [body]]
]
Notice how the definitional RETURN of the new function defined in CATCH is used as the definitional THROW in the BODY.
Why THROW should be kept dynamic
Definitional THROW would disable the greatest strength of the dynamic CATCH/THROW pair: They can be used to implement all other dynamic escape functions that we haven't thought of yet, or are so user-specific that they won't be added to the main language. On a practical level, this usually requires the CATCH statement and the THROW statement to be in different functions, wrapper functions that implement the exact semantics that are needed. If the code is required to be nested or explicitly bound (side effect of definitional escapes) then you lose the advantages that dynamic escapes give you, which means that you need to use some other method of doing dynamic escapes, leaving us back where we started.
Advantages of having THROW serve as the general-purpose method for implementing custom dynamic escape functions:
- It's an easy method, particularly when you use THROW/name. And you don't have to be as smart as Ladislav to do it.
- There are tricks that you can do with dynamic escapes that you can't do with definitional, and this would be an easy way to do those tricks. This includes replacing QUIT/now, which would allow us to remove that option (#1743). For that matter, all dynamic escape functions could be implemented using THROW/name, which could be useful when making safe function wrappers used by sandboxed code.
- When it's built in, you can make a standard method to catch THROW and THROW/name for debug/test code, and this method would extend to all dynamic escapes that use them. See #1520 for how.
- Having one built-in method for future user-created dynamic escapes, and then controlling that method with a built-in option, stops the escape control arms race dead in its tracks.
Disadvantages (mostly to all uses of THROW/name at the moment):
- THROW/name doesn't currently work: It doesn't get past CATCH without /name. See #1518 for details.
- No dynamic escapes work when used in an expression that accepts error! values. See #1509, #1515, #1519 and more for details.
- Locality problems of the unhandled THROW error, as discussed in #1506. See there for solutions that don't involve the definitional approach.
- To be really useful for implementing custom escape functions, they have to be unspoofable, and THROW/name is spoofable if you know the name - this is a problem for sandboxing functions. One solution for this is to make CATCH/name only catch words that are bound to the same context (EQUIV? instead of EQUAL?), as proposed in #1744. The alternate solution of including a token with the thrown value is spoofable, because someone can intercept the token and reuse it.
- Name conflicts when the flow of execution goes through code that is written by different people who aren't coordinating their efforts well enough. Also solved by #1744.
Here's an unspoofable QUIT/now replacement implemented using dynamic THROW/name with #1518 and #1744 fixed:
use [name] [ ; Local 'name makes it unspoofable
    quit-now: func [retcode] [throw/name retcode 'name]
    catch-quit-now: func [body [block! file! url!]] [quit/return catch/name [return do body] 'name]
]
chat: funct [
    "Open REBOL DevBase forum/BBS."
][
    print "Fetching chat..."
    if error? err: try [catch-quit-now http://www.rebol.com/r3/chat.r none] [
        either err/id = 'protocol [print "Cannot load chat from web."] [do err]
    ]
    exit
]
Then the chat script at the url would call QUIT-NOW instead of QUIT/now/return.
