Comments on: Replace REPLACE

I need a lot more from REPLACE than what it offers... and for quite some time now.

I know some of you do too, and that some of you have already made your own functions. I'm not suggesting a fancy dialect here (we can use PARSE enhancements for that) nor a nice template substitution like REWORD.

What I want are those few common replacement features you would find elsewhere, but at high speed. I've said it before.

For example, I often want to replace one set of chars with another. E.g. change all "-" to "_"... along with a few other chars at the same time. Such replacement can be very high speed.

This is fairly easy to write and can be done as a native. We just have a few basic decisions to make.
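As a rough illustration of the character-set case mentioned above (changing "-" to "_" along with a few other chars in one pass), here is a minimal mezzanine sketch. TRANSLATE-CHARS, OLD-CHARS and NEW-CHARS are hypothetical names, not existing REBOL functions, and the sketch assumes the two char strings have the same length. A native would be far faster; this only shows the intended behavior.

```rebol
; Hypothetical helper, not part of REBOL.
; Assumes OLD-CHARS and NEW-CHARS have the same length.
translate-chars: func [
    "Replace each char found in OLD-CHARS with the char at the same index in NEW-CHARS."
    str [string!] "String to modify in place"
    old-chars [string!] "Chars to look for"
    new-chars [string!] "Replacement chars, matched by position"
    /local pos found
][
    pos: str
    while [not tail? pos] [
        ; FIND returns OLD-CHARS at the matching char, or NONE
        if found: find/case old-chars first pos [
            change pos pick new-chars index? found
        ]
        pos: next pos
    ]
    str
]

probe translate-chars copy "my-file name.txt" "- " "__"
; "my_file_name.txt"
```

The string is walked once no matter how many chars are being mapped, which is the property that makes this kind of replacement a good candidate for a native.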

14 Comments

Comments:

onetom
14-Aug-2009 16:23:09 like the unix tr(ansliterate) command? i miss it sometimes too...
but i also miss a rejoin/with to assemble comma or whatever separated lists for other tools, languages or for human processing. i do a lot of meta-programming and for that it's a must.
Edoc
14-Aug-2009 17:51:38 I'd love for the function to allow a block or map of chars to replace, so that you don't need to chain 'replace.
[OT] On a related wish, I'd love to see a series iteration function that allows refinements for /forFirst /forMiddle /forLast (as well as /forEven /forOdd and /forMatch).
Sunanda
15-Aug-2009 5:17:41 Thanks for asking.
***
I'd like speed. It's notable with strings, that the FIND part of REPLACE is more-or-less the same speed as FIND (tested by doing a REPLACE on a long string that did not contain the target) but actually making string replacements is orders of magnitude slower.
***
I use REPLACE/ALL a lot, so optimisations for that would be helpful.
It's not just me: a quick trawl of REBOL.org's script library suggests that 80% of all REPLACEs are REPLACE/ALL.
***
I'd like this to work in R3 (it does in R2):
replace/all "a1b2c3b4e5f6" complement charset ["abcd"] ""
== "abcb"
It's a way of dropping unacceptable characters from a string. (A bit like TRIM/WITHOUT if such a thing existed).
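For the specific case of dropping unacceptable characters, a possible workaround that avoids REPLACE altogether is REMOVE-EACH with a bitset test. This is only a sketch: ALLOWED and STR are illustrative names, and it assumes FIND accepts a bitset and a char, as it does in R2.

```rebol
allowed: charset "abcd"
str: copy "a1b2c3b4e5f6"

; Remove every char that is not in the ALLOWED set.
; The string is modified in place, so the result is read back
; from STR rather than from whatever REMOVE-EACH returns.
remove-each ch str [not find allowed ch]

probe str
; "abcb"
```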
Anton Rolls
15-Aug-2009 5:43:03 I have some R2 string search functions: http://anton.wildit.net.au/rebol/library/string-search-functions.r
The efficient versions of two functions in the above library, FIND-EACH-STRING and FIND-EVERY-STRING, parse the input string just once, even while searching for multiple substrings.
These functions could be modified to do replace as well.
Anton Rolls
15-Aug-2009 5:47:04 Modified to do replacing, then:
FIND-EACH-STRING would be like REPLACE
FIND-EVERY-STRING would be like REPLACE/ALL
Steeve
15-Aug-2009 14:50:23 Need something like:
replace/each [1 2 3][4 5 6]
Where 1 is replaced by 4, 2 by 5, etc... Enhanced with PARSE capabilities, we could replace literal values by matching rules, like:
replace/each [word! integer! [2 string!]] ["toto" 5 "test"]
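To make the literal-value case concrete, here is a minimal sketch of what such a function might do, written as a plain wrapper around REPLACE/ALL. REPLACE-EACH is a hypothetical name (no /each refinement exists), the target series is passed as the first argument, and OLD and NEW are assumed to be blocks of equal length. It does not cover the PARSE-style matching rules, and because the pairs are applied one after another, a value produced by an earlier pair can be changed again by a later one.

```rebol
; Hypothetical helper: applies REPLACE/ALL once per old/new pair.
; Assumes OLD and NEW have the same length.
replace-each: func [
    "Replace each value of OLD found in TARGET with the value at the same index in NEW."
    target [series!] "Series to modify in place"
    old [block!] "Values to search for"
    new [block!] "Replacement values, matched by position"
][
    repeat i length? old [
        replace/all target pick old i pick new i
    ]
    target
]

probe replace-each [1 2 3 2 1] [1 2 3] [4 5 6]
; [4 5 6 5 4]
```

This only hides the chaining; it still scans the target once per pair, so it gives the convenience being asked for but none of the speed.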
Maxim Olivier-Adlhoch
16-Aug-2009 22:40:22 I second Steeve's exact post... "replace chaining" occurs VERY often, is very unsexy and hard to debug.
/all should be greatly sped up; it is abysmally slow right now (exponential, as Sunanda points out).
Ratio
24-Aug-2009 16:46:21 At least in R2 'replace works fine!
I'm using it with no problems in a function I use often:
CW: make function! [series old new] [replace/all series old new]

Steeve's example lacks the first parameter, the series to be changed!
Of course, it must be written:
>> cw [1 2 3] [1 2 3] [4 5 6]
== [4 5 6]

Result perfectly ok! - As in all cases I could see.
Where are the problems? If it doesn't work in R3 - why not??
Ratio
Ratio
24-Aug-2009 17:18:49 I never understood why there should be a totally new REBOL 3
R2 was and is excellent!
Serious programmers avoid R3. It lacks even a well designed console to explore its features.
Now I just tried to implement my just mentioned CW function into R3:
See what happened:
>> CW: make function! [series old new] [replace/all series old new]
** Script error: cannot MAKE/TO function! from: [series old new]
** Where: make
** Near: make function! [series old new] [replace/all series old new]

What a mess! :-(
Carl, I beg your pardon. What's going on in REBOL ?????
Ratio
DideC
25-Aug-2009 6:49:37 to Ratio:
'make has been changed to always take the same number of arguments: 2.
So just write
CW: make function! [ [series old new] [replace/all series old new] ]

Ratio
25-Aug-2009 19:28:56 Yes, that works. Your tip was a real time saver. Thank you, DideC!
Ratio
Carl Sassenrath
9-Sep-2009 0:22:58 I've been thinking about these REPLACE changes because they are on the short list.
There are two main goals for REPLACE:
1. speed improvements
2. allowing multiple changes (without chaining)
The first can be done in the REPLACE code as it is right now (a mezzanine.) It does not need to become a native. The method is:
1. MAKE an output series of the same size as the target.
2. Use APPEND (not CHANGE) for doing the replacement. (Avoids the CPU intensive insert bubble.)
3. When done, write the output back to the original. (We can also add a /copy refinement to not do that, and return the new series.)
So, if someone wants to make that change, we can add it right away.
The second goal is more complicated because it requires a special FIND method to search for multiple values at the same time. This method must build a tree, otherwise it ends up being N*M in speed. To avoid a lot of overhead, it probably would need to be a native (although it may be possible to put it in a vector.)
Anyway, it's a lot more work.
Carl Sassenrath
9-Sep-2009 1:31:47 Actually, in testing the theory, it appears that using APPEND does not provide a huge difference.
Here's the test code:
repl-fast: funct [
    target [series!] "Series that is being modified"
    search "Value to be replaced"
    replace "Value to replace with (will be called each time if a function)"
    /case
][
    out: make type? target length? target
    len: length? search
    ; PICK selects the plain FIND when /case is not used, FIND/CASE otherwise
    while pick [
        [pos: find target :search]
        [pos: find/case target :search]
    ] not case [
        append/part out target pos   ; copy the text before the match
        append out replace           ; then the replacement value
        target: skip pos len         ; resume searching after the match
    ]
    append/part out target tail target   ; copy whatever follows the last match
    out
]

data: to-string read http://www.rebol.com/docs.html

; Check that both methods produce the same result
print length? x: replace/all copy data "http" "h123456"
print length? y: repl-fast data "http" "h123456"
print x = y

; Time both methods
print dt [loop 10000 [replace/all copy data "http" "h123456"]]
print dt [loop 10000 [repl-fast data "http" "h123456"]]

The APPEND method was only about 18% faster... and for the case where the replacement string is the same size, the APPEND method will be about 8% slower.
Of course, results will depend a lot on how many matches there are... but, the method seems less worthwhile than I thought. It's always useful to measure it.
Carl Sassenrath
9-Sep-2009 1:49:03 The analysis also has me thinking that the "multi change" case may not be worth a fancy solution either (that is, not worth building a tree.) The reason is the overhead of building the tree and operating the state machine... it may not be a substantial benefit over a simple comparison method.
