I have needed a lot more from REPLACE than it offers... for quite some time now.
I know some of you do too, and that some of you have already written your own functions. I'm not suggesting a fancy dialect here (we can use PARSE enhancements for that), nor a nice template substitution like REWORD.
What I want are those few common replacement features you would find elsewhere, but at high speed. I've said it before.
For example, I often want to replace one set of chars with another, e.g. change all "-" to "_", along with a few other chars at the same time. Such replacement can run at very high speed.
This is fairly easy to write and can be done as a native. We just have a few basic decisions to make.
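As a rough illustration of the char-for-char case, here is a mezzanine sketch. TRANSLIT is a hypothetical name, not an existing REBOL function. It assumes FROM and TO-CHARS have the same length, so the Nth char of FROM maps to the Nth char of TO-CHARS, and it modifies the string in place.

```rebol
; Hypothetical TR-style helper (not part of REBOL).
; Assumes FROM and TO-CHARS are the same length.
translit: func [
    "Replace each char of STR that occurs in FROM, in place"
    str [string!]
    from [string!]
    to-chars [string!]
    /local pos
][
    forall str [
        ; Is the current char one of those to translate?
        if pos: find/case from first str [
            ; Overwrite it with the char at the same index in TO-CHARS
            change str pick to-chars index? pos
        ]
    ]
    head str
]

probe translit "a-b c-d" "- " "__"   ; prints "a_b_c_d"
```

Each char costs one FIND over FROM. A native could presumably use a lookup table indexed by char code instead, which is where the high speed would come from.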
Like the Unix tr (translate) command? I miss it sometimes too...
But I also miss a REJOIN/WITH to assemble comma-separated (or otherwise delimited) lists for other tools, languages or human processing. I do a lot of meta-programming, and for that it's a must.
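Until something like REJOIN/WITH exists, a small mezzanine can approximate it. JOIN-WITH is a hypothetical name. Unlike REJOIN, this sketch does not REDUCE the block. Wrap the argument in REDUCE if you need evaluation.

```rebol
; Hypothetical stand-in for the wished-for REJOIN/WITH
join-with: func [
    "Form each value of BLK and put DELIM between values"
    blk [block!]
    delim [string! char!]
    /local out
][
    out: copy ""
    forall blk [
        append out form first blk
        ; Add the delimiter only between values, not after the last one
        if not tail? next blk [append out delim]
    ]
    out
]

probe join-with [a "b" 3] ", "   ; prints "a, b, 3"
```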
I'd love the function to accept a block or map of chars to replace, so that you don't need to chain calls to 'replace.
[OT] On a related wish, I'd love to see a series iteration function that allows refinements for /forFirst /forMiddle /forLast (as well as /forEven /forOdd and /forMatch).
Thanks for asking.
I'd like speed. With strings, the FIND part of REPLACE runs at more or less the same speed as FIND itself (tested by doing a REPLACE on a long string that did not contain the target), but actually making string replacements is orders of magnitude slower.
I use REPLACE/ALL a lot, so optimisations for that would be helpful.
It's not just me: a quick trawl of REBOL.org's script library suggests that 80% of all REPLACEs are REPLACE/ALL.
I'd like this to work in R3 (it does in R2):
replace/all "a1b2c3b4e5f6" complement charset ["abcd"] ""
It's a way of dropping unacceptable characters from a string (a bit like TRIM/WITHOUT, if such a thing existed).
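Until that works, one possible workaround is REMOVE-EACH with a charset of allowed chars. KEEP-ONLY is a hypothetical helper name, and this sketch assumes FIND on a bitset returns a true value for members, as it does in R2 and R3.

```rebol
; Hypothetical helper: keep only chars in ALLOWED, modifying STR in place
keep-only: func [
    "Remove every char of STR that is not in the ALLOWED charset"
    str [string!]
    allowed [bitset!]
][
    remove-each ch str [not find allowed ch]
    str
]

probe keep-only "a1b2c3b4e5f6" charset "abcd"   ; prints "abcb"
```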
I have some R2 string search functions:
The efficient versions of two functions in the above library, FIND-EACH-STRING and FIND-EVERY-STRING, parse the input string just once, even while searching for multiple substrings.
These functions could be modified to do replacement as well, in which case:
FIND-EACH-STRING would be like REPLACE
FIND-EVERY-STRING would be like REPLACE/ALL
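The single-pass idea can be sketched with PARSE by building one alternative per old/new pair. This is not the FIND-EVERY-STRING code itself; REPLACE-MANY is a hypothetical helper that returns a new string. Because every substitution happens during the same pass, swaps work, which chained REPLACE/ALL calls cannot do: chaining "-" to "_" and then "_" to "-" would turn "a-b_c" into "a-b-c".

```rebol
; Hypothetical helper, not the FIND-EVERY-STRING code itself.
; PAIRS is a flat block: old1 new1 old2 new2 ...
replace-many: func [
    "Return a copy of STR with every OLD replaced by its NEW, in one pass"
    str [string!]
    pairs [block!]
    /local out rule mark
][
    out: make string! length? str
    rule: copy []
    ; One PARSE alternative per pair: match OLD, then emit NEW
    foreach [old new] pairs [
        append rule compose [(old) (to paren! compose [append out (new)]) |]
    ]
    ; Last alternative: copy any other char through unchanged
    append rule [mark: skip (append out first mark)]
    parse/all str [any rule]
    out
]

probe replace-many "a-b_c" ["-" "_" "_" "-"]   ; prints "a_b-c"
```

Alternatives are tried left to right, so overlapping search strings should be listed longest first. String matching in PARSE is case-insensitive unless /case is added.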
We need something like:
replace/each [1 2 3][4 5 6]
where 1 is replaced by 4, 2 by 5, and so on.
Enhanced with PARSE capabilities, we could replace values matched by rules, not just literal values, like:
replace/each [word! integer! [2 string!]] ["toto" 5 "test"]
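A limited version of that idea can be approximated today when each rule is a single datatype. REPLACE-BY-TYPE is a hypothetical helper. It takes the target block as its first argument and a flat block of datatype/value pairs. Full PARSE rules such as [2 string!] are not handled.

```rebol
; Hypothetical helper: SPEC is a flat block of datatype/value pairs.
; Only single datatypes are handled, not full PARSE rules like [2 string!].
replace-by-type: func [
    "Replace, in place, each value of BLK whose datatype appears in SPEC"
    blk [block!]
    spec [block!]
    /local types
][
    types: reduce spec  ; turn words like integer! into datatype values
    forall blk [
        foreach [type new] types [
            if type = type? first blk [
                change/only blk new  ; overwrite the current value
                break                ; first matching datatype wins
            ]
        ]
    ]
    head blk
]

probe replace-by-type [a 1 "x" b 2] [word! "toto" integer! 5]
; prints ["toto" 5 "x" "toto" 5]
```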
I second Steeve's exact post... "replace chaining" occurs VERY often, is very unsexy and hard to debug.
/all should be greatly sped up; it is abysmally slow right now (orders of magnitude slower, as Sunanda points out).
At least in R2, 'replace works fine!
I'm using it without problems in my often-used function:
CW: make function! [series old new] [replace/all series old new]
Steeve's example lacks the first parameter, the series to be changed!
Of course, it must be written:
>> cw [1 2 3] [1 2 3] [4 5 6]
== [4 5 6]
The result is perfectly OK, as in all the cases I have tried.
Where are the problems? If it doesn't work in R3, why not??
I never understood why there should be a totally new REBOL 3.
R2 was and is excellent!
Serious programmers avoid R3. It lacks even a well-designed console to explore its features.
Now I just tried to define my CW function, mentioned above, in R3:
See what happened:
>> CW: make function! [series old new] [replace/all series old new]
** Script error: cannot MAKE/TO function! from: [series old new]
** Where: make
** Near: make function! [series old new] [replace/all series old new]
What a mess! :-(
Carl, I beg your pardon. What's going on in REBOL?
In R3, 'make has been changed to always take the same number of arguments: two. So for a function, pass the spec and the body together in one block.
So just write:
CW: make function! [ [series old new] [replace/all series old new] ]
Yes, that works. Your tip was a real time saver. Thank you, DideC!
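If the goal is just to define a function, FUNC may be the simpler route: it exists in both R2 and R3, takes the spec and body as two separate arguments, and builds the function! itself. So the same CW definition should work unchanged in either version.

```rebol
; FUNC builds the function! from a spec block and a body block,
; so MAKE's calling convention never comes into play.
cw: func [series old new] [replace/all series old new]

probe cw [1 2 3] [1 2 3] [4 5 6]   ; prints [4 5 6]
```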
I've been thinking about these REPLACE changes because they are on the short list.
There are two main goals for REPLACE:
1. speed improvements
2. allowing multiple changes (without chaining)
The first can be done in the REPLACE code as it is right now (a mezzanine); it does not need to become a native. The method is:
1. MAKE an output series of the same size as the target.
2. Use APPEND (not CHANGE) for doing the replacement. (Avoids the CPU intensive insert bubble.)
3. When done, write the output back to the original. (We can also add a /copy refinement to not do that, and return the new series.)
So, if someone wants to make that change, we can add it right away.
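Step 3 (writing the output back to the original) could be a single CHANGE/PART call. WRITE-BACK is a hypothetical helper name. A /copy refinement, as suggested above, would simply return OUT and skip this call.

```rebol
; Hypothetical helper for step 3: overwrite TARGET (from its current
; position to its tail) with the contents of OUT, in place.
write-back: func [
    target [series!]
    out [series!]
][
    head change/part target out tail target
]

s: "abc"
write-back s "xyz123"
probe s   ; prints "xyz123"
```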
The second goal is more complicated because it requires a special FIND method that searches for multiple values at the same time. This method must build a tree; otherwise it ends up being N*M in speed. To avoid a lot of overhead, it would probably need to be a native (although it may be possible to put it in a vector).
Anyway, it's a lot more work.
Actually, in testing the theory, it appears that using APPEND does not provide a huge difference.
Here's the test code:
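repl-fast: funct [
    target [series!] "Series that is being modified"
    search "Value to be replaced"
    replace "Value to replace with (will be called each time if a function)"
    /case "Use case-sensitive matching"
][
    ; Output buffer preallocated to the size of the target
    out: make type? target length? target
    len: length? search
    ; PICK on a logic value selects the plain FIND or the FIND/CASE block
    while pick [
        [pos: find target :search]
        [pos: find/case target :search]
    ] not case [
        append/part out target pos   ; copy everything before the match
        append out replace           ; then the replacement value
        target: skip pos len         ; resume searching after the match
    ]
    append/part out target tail target   ; copy the remainder
]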
data: to-string read http://www.rebol.com/docs.html
print length? x: replace/all copy data "http" "h123456"
print length? y: repl-fast data "http" "h123456"
print x = y
print dt [loop 10000 [replace/all copy data "http" "h123456"]]
print dt [loop 10000 [repl-fast data "http" "h123456"]]
The APPEND version was only about 18% faster... and when the replacement string is the same size as the search string, the APPEND method will be about 8% slower.
Of course, results will depend a lot on how many matches there are... but, the method seems less worthwhile than I thought. It's always useful to measure it.
The analysis also has me thinking that the "multi change" case may not be worth a fancy solution (building a tree) either. The reason is the overhead of building the tree and running the state machine; it may not be a substantial benefit over a simple comparison method.
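For comparison, the find step of such a simple comparison method might look like this: one FIND per search value, keeping the earliest hit. FIND-FIRST-OF is a hypothetical name, and this is only a sketch of the search side, not a full multi-value REPLACE.

```rebol
; Hypothetical helper: one FIND per search value, keep the earliest hit.
; Each call costs roughly N*M (target length times number of search values).
find-first-of: func [
    "Return the position of the earliest match of any value in SEARCHES, or NONE"
    target [series!]
    searches [block!]
    /local best pos
][
    best: none
    foreach s searches [
        if all [
            pos: find target s
            any [none? best (index? pos) < index? best]
        ][best: pos]
    ]
    best
]

probe find-first-of "xxbyyaz" ["a" "b"]   ; prints "byyaz"
```

A multi-value REPLACE built on this would call it repeatedly, restarting just past each match.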