REBOL 3.0

Comments on: PARSE: TO and THRU multiple

Carl Sassenrath, CTO
REBOL Technologies
29-Sep-2009 7:18 GMT

Article #0256

A84 provides a first draft implementation of the TO and THRU commands that will match multiple targets.

For example, to match CR, LF, or END in a string, you can write:

to [cr | lf | end]
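
As a small sketch of how that rule might be used, the snippet below copies the first line out of a string. The sample input and the word `line` are assumptions made up for illustration; only the `to [cr | lf | end]` rule comes from the text above.

```rebol
; Sample input is hypothetical, chosen only for illustration.
str: "first line^/second line"

; Copy everything up to the first CR or LF,
; or up to the end of the input if neither occurs.
; The trailing TO END just consumes the rest of the input.
parse str [copy line to [cr | lf | end] to end]

probe line   ; the text that precedes the first line break
```

Including END as a target means the rule still succeeds on the last line of a string that has no trailing line break.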

Main rules:

  • For blocks, you can match a single input value. Use QUOTE for special literals.
  • For strings, the match can be made with a string, char, or integer char value. The match is case-insensitive unless the /case refinement is used.
  • For binary, the match can be made with a binary!, an integer byte value (the lowest 8 bits of the integer), or a char value (less than 256).
  • Each target can be followed by a paren for taking action on the match. (This allows you to set a variable if you need to know which target you hit.)
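
To illustrate the last rule, here is a minimal sketch that records which target was hit. The input string and the word `hit` are hypothetical names invented for this example, not part of A84.

```rebol
; Hypothetical input: an "=" appears before the first line break.
str: "key=value^/next"
hit: none

; Each target is followed by a paren that records which one matched.
; Char targets are used here rather than string targets.
parse str [to [#"=" (hit: 'equals) | lf (hit: 'newline)] to end]

probe hit   ; tells you which of the two targets was reached first
```

Since the targets are tried at each input position in turn, the paren that runs belongs to whichever target occurs earliest in the input.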

A few special notes:

  1. Do not forget the or-bar to separate the targets.
  2. Only singular match rules are supported at this time. Do not use complex rules.
  3. Multiple-target matching can be very CPU intensive. Don't use string targets where char targets will do (e.g. use #"a", not "a", when possible). Also be aware that using variables for targets will slow it down, since there is no target caching yet.
  4. You cannot mix string and binary types. Remember that strings are Unicode-oriented and binary is encoded data (such as UTF-8 or anything else.)

Examples:

This code removes all CRs and LFs from the string:

parse str [any [to [cr | lf] remove skip]]

Count the number of CR and LF chars in a string and display them:

cr's: lf's: 0
parse str [any [thru [cr (++ cr's) | lf (++ lf's)]]]
?? [cr's lf's]
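
For a concrete run, the two examples above can be applied to a sample string. The input below is an assumption chosen for illustration (one CR and two LFs); the rules themselves are unchanged.

```rebol
; Hypothetical input containing one CR and two LFs.
str: "one^M^/two^/three"

; Count the line-break chars first, while they are still present.
cr's: lf's: 0
parse str [any [thru [cr (++ cr's) | lf (++ lf's)]]]
?? [cr's lf's]

; Then strip them; PARSE modifies the string in place.
parse str [any [to [cr | lf] remove skip]]
probe str
```

The counting pass goes first because the removal pass changes the string it is given.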

6 Comments

-pekr-
29-Sep-2009 4:16:42
Are we able to limit the look-up? E.g. I know that some phone number format will follow, but that it will not take more than 20 chars, so I would like to state LIMIT 20 before the to/thru multiple, to keep it from scanning too deeply ... just an idea ...

Richard
29-Sep-2009 4:40:29
That would be like the curly brace syntax in regular expressions to supply a lower and upper bound on the number of matches, and could be handy.

Has an analysis been done to compare the facilities in regular expressions to those in Rebol's 'parse'? Since we know that REs have been used successfully for years it would be reassuring to know that there is an equivalent for each facility that REs support.

Ladislav
29-Sep-2009 7:54:09
Generally, Parse expressions are strictly more powerful than REs. Some notes can be found e.g. in my "Parse versus RE" article. OTOH, if you compare e.g. PEGs (compatible with Parse) and REs, you will find that even though they use "the same operators", the meaning is different, so the transcription isn't as simple as it looks at first sight.

Ladislav
29-Sep-2009 8:14:20
The rule

a: [to b]

should generally be equivalent to

a: [and b | skip a]

but it actually isn't, since the recursive rule is limited by the actual parsing stack size. It would be useful to know the planned size of the Parse stack.

Edoc
29-Sep-2009 10:11:25
These parse enhancements should smooth out the learning curve for new users. You've just made parse, already one of the best features of REBOL, a lot more friendly and convenient for everyday use.

Giuseppe Chillemi
29-Sep-2009 11:44:33
Would it be possible to use patterns for the targets?

Updated 26-Apr-2024 - Copyright REBOL Technologies - REBOL.net