REBOL 3.0

Comments on: PARSE: INSERT and REMOVE can be handy

Carl Sassenrath, CTO
REBOL Technologies
25-Sep-2009 23:17 GMT

Article #0254

Over the years, perhaps one of the most requested enhancements was to allow insert and remove (and also change) directly in the parse dialect.

You will find the first two in A83. Check them out. For example:

parse d: "ac" [to "c" insert "b"]
probe d
"abc"
parse d [to "b" remove 1]
probe d
"ac"

Note that the input is properly adjusted for the action. It advances past the insert:

parse d: "ac" [to "c" insert "b" "c"]
true

and stays at the beginning of the remove:

parse d [to "b" remove 1 "c"]
true

Note that with REMOVE you can:

  • use a positive integer to remove forward
  • use a negative integer to remove backward
  • use an index variable to remove to a specific location

Here's an index variable example:

parse d: "abxyzcd" [thru "b" m: to "c" remove m]
probe d
"abcd"

Also, with insert you can use the optional keyword ONLY to insert blocks-as-a-whole.

Note that the speed of insert and remove will depend on the size of the series (string, binary, or block) being parsed, since each one has to shift the data that follows it. If you expect a large number of inserts or removes, it may be better to build an output string and not make the inserts or removes at all.
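
The output-string approach mentioned above might look something like the following. This is only a sketch: the words out and c, and the choice of "xyz" as the text to drop, are made up for illustration. It uses only long-standing parse features (any, copy, skip, and a paren action), so it does not depend on the new insert and remove keywords.

```rebol
out: make string! 16                    ; hypothetical output buffer
parse "abxyzcd" [
    any [
        "xyz"                           ; matched but never copied, so it is dropped
        | copy c skip (append out c)    ; every other character goes to the output
    ]
]
probe out
```

With this input, out should end up holding "abcd". The input string itself is never modified, so nothing has to be shifted.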

But, for shorter strings...

dt [loop 1000000 [parse copy http://www.rebol/ [thru "rebol" insert ".com"]]]
0:00:01.031

So, about a million parse calls per second -- including entry, recovery overhead, and Unicode internals. Also, that number includes copying the original URL string, not part of the parse itself.

19 Comments

Edoc
25-Sep-2009 22:29:51
Beautiful. I've wanted this for a while, mainly because I use it so often, and this makes the parse rules much cleaner. Thanks.
Brian Hawley
25-Sep-2009 23:08:57
I'm glad to see this finally happen. These modifiers will make life easier for everyone :)

A couple questions though:

  • change? I hope we get this one too, if only for optimization.
  • The original proposal for insert treated its value argument the same way that the quote operation does, including the special treatment of parens. Will that feature be included in a83?
Carl Sassenrath
26-Sep-2009 2:57
BrianH, not yet, but we can define it.
RobertS
27-Sep-2009 14:19:51
If we have AND, ANY, SOME, and NOT, would it not be useful to have ONE or ONCE or SINGLE?
Steeve
27-Sep-2009 15:32:12
Sorry, I don't like the way it goes...

REMOVE should accept any rule as argument and remove the whole area matched by the rule. I don't like to have to calculate an offset.

>> parse d [to "b" remove 1]

should be:

>> parse d [to "b" remove skip]

It's more obvious with something like this:

parse "abcded" [ any [ "b" remove -1 | "cd" remove -2 | skip ] ]

Should be replaced by:

parse "abcded" [ any [ remove "b" | remove "cd" | skip ] ]

You see what I mean? There is no need to calculate an offset or to set an index. Remove should "remove" the following matched rule. Nothing more, nothing less.

Carl Sassenrath
27-Sep-2009 22:23:14
RobertS: ONCE is the default for all rules. Or, are you thinking of some other feature?

Steeve: I've been trying to avoid that format (prefix action with single argument, plus requires internal state storage.) But, it does look cleaner in the lines you've written above.

It would mean that to write a complex match for REMOVE, you would always need to use a block.

What do other users think of this proposal? Let me know soon, and this change can be in the next release.

Also, think about the CHANGE command... which also needs a similar set of index positions. I assume it would work the same way as REMOVE.

Brian Hawley
28-Sep-2009 1:33:52
Losing the current proposed format for change and remove loses the integer offset form, which would be a loss of half the functionality of those proposals. And, for that matter, the only form that currently works for remove (see bug#1239). I understand that the original proposals are more friendly, but the new proposals are more powerful and efficient. I'll enjoy them either way.
-pekr-
28-Sep-2009 6:11:18
Maybe the index-based form does not look as nice, but it also allows the use of named markers, no? E.g., in the following example we want to remove the copied input:

mark: copy stuff to "something" remove mark

I can understand both approaches and I have no problem with either...

Edoc
28-Sep-2009 11:30:58
Although Steeve's proposal appears to be more readable, I'm pretty used to leading with the string/pattern to be matched. So this code fits with the way I've learned to think about parse:

parse "abcded" [any ["b" remove -1 | "cd" remove -2 | skip ]]

Maarten
28-Sep-2009 14:46:07
Carl, continuing about break returning current, and deep blocks. I've always felt we need an at/deep, as well as index?/deep.

Those two would allow returning and re-entering deep blocks, essentially turning them into stacks. Need head/outer as well then.

So index?/deep some-val 1x5x12x7

at/deep head/outer some-val 1x5x12x7

Note that this assumes that head/outer always gives you the main entry point. I am pretty sure you know more elegant designs than this, but you get the point.

Brian Hawley
28-Sep-2009 15:56:47
HEAD/outer wouldn't work without a reference to the outer structure or some structure that refers to it, since you can't know which of many potential structures from which your block is referenced that you are looking for. Blocks aren't really nested, just referenced, and those references aren't exclusive.
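
A minimal sketch of that point, using made-up words (inner, outer-a, outer-b) purely for illustration: the same block can be referenced from two different outer blocks, so starting from the inner block alone there is no single "outer" to return to.

```rebol
inner: [1 2 3]
outer-a: reduce ['first-parent inner]     ; holds a reference to inner, not a copy
outer-b: reduce ['second-parent inner]    ; a second, equally valid "parent"
print same? second outer-a second outer-b ; true: both refer to the one block
append inner 4
probe second outer-b                      ; [1 2 3 4], the change shows up through either parent
```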
RobertS
28-Sep-2009 21:35:52
My bad. The docs for AND say: "and rule: match to the rule, but do not advance the input (allows matching multiple rules to the same input)".

I misread this as "and ruleS", hence my thought of a single match against multiple rules, i.e., "one ruleS". Now I understand this as [rule1 and rule2] rather than [and [rule1] [rule2]]. Regardless, A83 (XP) gives me:

>> parse "ababead" [any [thru "c"] [thru "b"] [thru "d"]]
== true
>> parse "ababead" [and [thru "c"] [thru "b"] [thru "d"]]
== true

which seems odd.

Brian Hawley
28-Sep-2009 21:45:53
RobertS, and currently doesn't work. See bug#1238 for details.
RobertS
28-Sep-2009 21:47:42
>> parse "ababead" [some [thru "c"] [thru "b"] [thru "d"]]
== false
>> parse "ababead" [any [thru "c"] [thru "b"] [thru "d"]]
== true
>> parse "ababead" [and [thru "d"]]
== false
>> parse "ababead" [and [thru "c"] [thru "b"] [thru "d"]]
== true
>> parse "ababead" [and [thru "b"] thru "d"]
== true
>> parse "ababead" [and [thru "d"]]
== false
>> parse "ababead" [and thru "d"]
== false
>> parse "ababead" [and [thru "b"] thru "d"]
== true
>> parse "ababead" [and [thru "b"] [thru "d"]]
== true
>> parse "ababead" [and [thru "b"]]
== false

The AND cases don't make sense to me; the SOME and ANY are fine. Granted, I now realize AND is infix and this is not intended.

Brian Hawley
28-Sep-2009 23:01:58
AND is not infix, it's prefix. And it's broken: see bug#1238.
Brian Hawley
28-Sep-2009 23:34:38
About your examples (minus the dups):
>> parse "ababead" [and [thru "d"]]
== false
Goes thru the last char, then backtracks. No rule matches the string, so it fails.
>> parse "ababead" [and [thru "c"] [thru "b"] [thru "d"]]
== true
The thru "c" fails, but the and doesn't fail because of bug#1238.
>> parse "ababead" [and [thru "b"] thru "d"]
== true
Goes thru the second letter, backtracks to the start, then goes through the last letter.
>> parse "ababead" [and thru "d"]
== false
Same as the first.
>> parse "ababead" [and [thru "b"] [thru "d"]]
== true
Same as the third.
>> parse "ababead" [and [thru "b"]]
== false
Same as the first, except it goes through the second character instead of the last before backtracking to the beginning.

Aside from the error in the second example, it works as proposed.

Anton Rolls
30-Sep-2009 11:44:08
Steeve's proposal seems very useful to me too. Couldn't we have both? Perhaps the strings in Steeve's example would have to be enclosed in blocks to ensure they are interpreted as rules, e.g.:

remove ["b"]

Maxim Olivier-Adlhoch
30-Sep-2009 13:58:43
I much prefer Steeve's functionality, simply because we are within a parse context.

Parse is usually used to match stuff and act accordingly. It's very easy to build rules which match patterns, but it's almost impossible to create indexes when rules are nested.

The way I see it, building an index-based remove is as simple as defining a rule like so:

remove [11 skip]

which has the added advantage that it can fail and roll back if there aren't 11 items in the series, making removes safe for alternative rules, where you match larger patterns at the top of the rules.
Steeve
30-Sep-2009 14:39:13
And you don't even need to enclose it in a block.

remove 11 skip

Updated 24-Apr-2024 - Copyright REBOL Technologies - REBOL.net