REBOL 3.0

Comments on: PARSE: split the function?

Carl Sassenrath, CTO
REBOL Technologies
9-Oct-2009 18:56 GMT

Article #0266

As you know, parse has two modes:

  1. Major mode: parse according to BNF-like rules specified in nested blocks. Returns TRUE or FALSE.
  2. Mini mode: split a string according to simple delimiting rules. Returns a block of results.

The second is a shortcut for very common kinds of string splitting, such as CSV. In fact, the first mode can implement the second.
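To illustrate how the major mode can implement the mini mode, here is a minimal sketch. SPLIT-ON is a hypothetical helper name chosen for this example, not an existing function. The code assumes R3 block-rule behavior, where /all is the default. The mini-mode call itself is only shown in a comment, since its behavior depends on the build.

```rebol
REBOL [Title: "Split mode via rule mode (sketch)"]

; Major mode: a block of rules; PARSE returns TRUE or FALSE.
print parse "abc" ["a" "b" "c"]    ; prints true: the rules match the whole input

; Mini mode, as described in this post, takes a string of delimiters
; and returns a block of pieces, e.g.:  parse "a,b,c" ","

; SPLIT-ON (hypothetical) rebuilds the mini-mode result with block rules,
; collecting each piece between delimiters with COPY.
split-on: func [
    "Split a string at each delimiter (sketch; no CSV quoting)"
    series [string!]
    dlm [string! char!]
    /local result item
][
    result: copy []
    parse series [
        any [copy item to dlm (append result item) dlm]  ; pieces before each delimiter
        copy item to end (append result item)            ; the final piece
    ]
    result
]

probe split-on "a,b,c" #","
```

Like the mini mode itself, this knows nothing about quoted CSV fields. Handling those means writing richer rules, which is exactly what the major mode is for.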

It has been suggested that these two behaviors do not belong in the same function. An important point is that they return different types of results.

This change is not difficult, other than picking a new name. In fact, the new function could be a mezzanine, not a native.

I want to poll the userbase to determine the level of interest in this change.

9 Comments

Comments:

Henrik
9-Oct-2009 15:38:43
I agree. It's a cryptic way to split a string. Besides, we already have a SPLIT function in R3.
Hostile Fork
9-Oct-2009 19:15:17
Because I didn't know about split, I suggested lexify, which seemed terminologically correct.

But split is less esoteric and looks like it does the same thing! Hm, why didn't I think of that word?

I'll call out one thing I learned while making USCII: ASCII has control codes for building hierarchical records. You can find them starting at code 28:

http://github.com/hostilefork/uscii/blob/master/specifications/uscii-5x7-english-c0.r#L759

There are "File Separator, Group Separator, Record Separator, and Unit Separator" (codes 28 through 31). Pictorially, I went with a comma for the unit separator and a line feed for the record separator (because that matched history). I'm open to suggestions for File and Group.

It seems that if you really wanted to do something special and interesting for strings that were delimited, you might want to offer structure. That could impress the pants off of some people.
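A minimal sketch of what such structure might look like follows. It splits text on the ASCII Record Separator (30), then splits each record on the Unit Separator (31), and returns a block of blocks. SPLIT-AT and SPLIT-RECORDS are hypothetical names. Using only these two levels is an assumption; the File and Group separators could add further levels in the same way. The sketch assumes R3 block-rule parse behavior.

```rebol
REBOL [Title: "Hierarchical split on ASCII separators (sketch)"]

rs: #"^(1E)"   ; ASCII 30, Record Separator
us: #"^(1F)"   ; ASCII 31, Unit Separator

; Hypothetical helper: split a string at each occurrence of one character.
split-at: func [series [string!] dlm [char!] /local result item][
    result: copy []
    parse series [
        any [copy item to dlm (append result item) dlm]
        copy item to end (append result item)
    ]
    result
]

; Hypothetical: split into records on RS, then each record into units on US.
split-records: func [text [string!] /local records][
    records: copy []
    foreach rec split-at text rs [
        append/only records split-at rec us   ; keep each record as a nested block
    ]
    records
]

data: rejoin ["Ann" us "42" rs "Bob" us "37"]
probe split-records data
```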

But if such an exercise is beyond the scope of what you're interested in, I say drop the functionality on parse and stick with split. That's a nice abstraction.

Maxim Olivier-Adlhoch
9-Oct-2009 21:55:54
+1: keep split, dump minimal parse.

All R2 scripts have to be rebuilt anyway... no point in trying to keep compatibility.

I've had to build my own CSV converters anyway, because the simple parse... well... mileage may vary.

Hostile Fork
10-Oct-2009 1:35:09
By the way, having a boolean (TRUE/FALSE) result means you have to jump through hoops to determine where the parse stopped if it failed. This makes it hard to give a meaningful error.

An easy fix would be to offer a /position refinement that returns the position where the parser stopped. Then you could say:

pos: parse/position input rules
if pos == tail pos [
    print "Success!"
] [
    print rejoin ["Parse failed at: " mold pos]
]

(You'd have to compare to tail pos and not tail input if you allow series switching.)

Oldes
10-Oct-2009 6:44:59
I would drop the minimal parse mode in favor of better split function.

Also, Hostile's /position refinement looks useful, if it's possible.

BTW: in REBOL we have the tail? action, so we can write:

print either tail? pos [
    "Success!"
][
    ["Parse failed at:" mold pos]
]

Hostile Fork
10-Oct-2009 22:48:18
Thanks for the tip, Oldes. Always seems to be a shorter way to write things.

I do realize that returning a position contradicts the "always returns TRUE or FALSE" argument I made earlier, which was one of the reasons minimal parse seemed iffy. But somehow, naming the refinement after the kind of value it returns makes the behavior more obvious.

Always returning a position could work too, but looks a little awkward in the average case:

print either tail? parse input rules [
    "Success"
] [
    "Failure"
]

So many of these decisions remind me of the "Quality" concept Robert Pirsig described in Zen and the Art of Motorcycle Maintenance.

Hostile Fork
13-Mar-2010 11:05:54
I notice that the "split" functionality is still in PARSE, which is confusing now that /all is the default behavior for block-form rules. For an example of the resulting misunderstandings, see RebolTutorial's StackOverflow question on the topic:

http://stackoverflow.com/questions/2438177/parse-and-charset-why-my-script-doesnt-work

It's been a while since this was brought up, and there haven't been many comments. Still, does anyone see a reason why the split functionality and the /all refinement shouldn't be dropped from parse entirely in Rebol 3?

Brian Hawley
15-Mar-2010 4:22:43
"Still, does anyone see a reason why split functionality and the /all refinement shouldn't be dropped from parse entirely in Rebol 3?"

Because both are used a lot, still. Even in mezzanine code in R3. And if simple parse will do the job for you, it will do it much more quickly than the rule-based variant.

Hostile Fork
16-Mar-2010 12:36:03
"Because both are used a lot, still. Even in mezzanine code in R3."

Above, the four Rebol programmers besides you and Carl have said the legacy isn't important enough to prevent redefining split. The question from RebolTutorial shows the liability of parse/all.

I can't imagine the mezzanine changes would take more than a moment of your time.


Updated 24-Apr-2024 - Copyright REBOL Technologies - REBOL.net