Comments on: Post your favorite new PARSE examples

With so many new parse features in R3, various types of operations should be documented as simple examples.

For example, while working on website revisions I wanted to use R3 to strip out some of the obsolete HTML tags, such as the old FONT markups like:

<FONT SIZE="4" FACE="Arial, Helvetica">About REBOL's Technology</FONT>

And, in fact, all such FONT tags and end tags, regardless of attributes.

Here's the parse code I used. It also shows how to make two passes over the same file because some of the end tags may not be properly paired.

parse page [
    top: while [to "<font" remove thru ">"]
    :top while [to "</font>" remove thru ">"]
]

Notice the use of while rather than some. This is important because remove does not advance the input position, and some stops looping when the position has not advanced, whereas while makes no such check. See the new note in Parse Summary about this.
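
As a quick check, here is a minimal sketch that runs the same rule on a short string. The sample text, including the stray unpaired end tag, is made up for illustration, and it assumes an R3 build where the remove keyword is available in parse:

```rebol
; Made-up sample input: one FONT tag with attributes, one lowercase tag,
; and one extra end tag that has no matching start tag.
page: {<FONT SIZE="4" FACE="Arial, Helvetica">About REBOL's Technology</FONT> and <font color="red">more</font></font>}

parse page [
    ; first pass: remove every start tag, whatever its attributes
    top: while [to "<font" remove thru ">"]
    ; second pass: go back to the saved position and remove the end tags
    :top while [to "</font>" remove thru ">"]
]

; page is modified in place; both kinds of tag should now be gone
print page
```

The second pass is what catches the unpaired end tag, since it does not depend on a start tag having been found before it.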

Now, I invite you to post your own useful little R3 parse examples. Thanks. We'll collect some of them for use as examples in the parse documentation.

23 Comments

Comments:

Steeve
23-Dec-2009 16:42:45
flatten: funco [data [block!] /deep][
    deep: if deep [[:data]]
    parse data [
        while [to block! data: change skip data/1 deep]
    ]
    head data
]
>> flatten [[[1]] [[2]][[3]]]
== [[1] [2] [3]]
>> flatten/deep [[[1]] [[2]][[3]]]
== [1 2 3]
Steeve
23-Dec-2009 18:04:17
mixin: funco [a [series!] b [series!] /local v][
    parse a: copy a [
        some [skip if (v: first+ b) insert v] a:
    ]
    head clear a
]
>> mixin "12345" "abcdefgh"
== "1a2b3c4d5e"
>> mixin [1 2 3 4 5] "abcdefgh"
== [1 #"a" 2 #"b" 3 #"c" 4 #"d" 5 #"e"]
Alan Macleod
24-Dec-2009 1:08:53 Steeve, I just had the need for mixin the other day...converting csv files to name/value pairs
Edoc
24-Dec-2009 10:08:05 yeah, steve -- i love that mixin func!
Steeve
27-Dec-2009 7:55:36
fnum: funco [
    n [number! money!] mask [string!] /local num c drop merge
][
    num: parse/all trim/with form abs n "$%" "."
    mask: parse/all mask "."
    drop: [if (c: first+ num) change skip c]
    merge: [
        and #"0" [drop | skip]
        | and #"9" [drop | change skip #" "]
        | change [#"+" (c: pick "+-" positive? n)] c
        | change [#"-" (c: pick " -" positive? n)] c
        | skip
    ]
    parse mask: reduce [reverse num/1 reverse mask/1 num/2 mask/2][
        set num string! into [any merge]
        set num string! into [insert #"." any merge]
    ]
    append reverse mask/2 any [mask/4 ""]
]

-12345.12 " 999999" -> " 12345"
-12345.12 "-000000" -> "-012345"
-12345.12 "-000 000.000 $" -> "-012 345.120 $"
-12345.12 "-999 999.999 $" -> "- 12 345.12 $"
123456.12 "£ 999,990.000 +" -> "£ 123,456.120 +"
123456.12 "£ 999,990.000 -" -> "£ 123,456.120 "
shadwolf
27-Dec-2009 20:49:46 my favorite New parse function, Is the one i will do for the next area-tc... When the new VID will allow me to use properly fonts at my taste.
(well in fact that will be probably steeve who will write it... i just don't understand parse... sorry..)
=^_^=
RobertS
28-Dec-2009 11:04:49 In the Parse Summary, under "Input position must change":
The parse function is about matching the input stream with given rules. In some cases, a rule may succeed, but the input position did not change. For example:
It seems to me that, for a newbie, the examples given make little sense without knowing the value of the word str used in them (likely "abc" was assumed).
xRatio
31-Jan-2010 1:01:45 (at) Carl
PARSE with its "rules" is the worst function in REBOL. Almost nobody understands it.
PARSE also has a peculiar "internal limit". We had severe problems with that!
We use parse only in its basic version. But even here the option parse/all is extremely misleading. It's ONLY!
Bugs should not be called a "dialect", Carl ;-)
Is EACH udf or higher REBOL function a "dialect" ?? ;-).
Cheers, xRatio
xRatio
31-Jan-2010 1:05:17 (at) Carl
you certainly know the PARSE-header-date function.
It produces LOTS of errors, parse out of limits, and wrong results returning NOW.
We replaced it using just basic REBOL:
digits: charset "0123456789"
x: find str digits ; we just search first digit (day), make date! does the rest
make date! trim/lines x ; trim/lines dels any double spaces - appearing often!
Checked with thousands of very different emails.
Thanks to basic REBOL it works (instead of PARSE-header-date function). Cheers, xRatio
xRatio
31-Jan-2010 1:08:01 Nobody explained, why we need R3.
What's fundamentally new in R3 ??? NOTHING.
R2 works fine.
R2 certainly can be improved, like any other computer program.
R2 is excellent, R3 -sorry- nonsense.
For marketing purposes a good REBOL DOCU is urgently needed, not a so-called R3!
xRatio
Henrik
1-Feb-2010 10:28:54
PARSE also has a peculiar "internal limit".

Could you describe this limit? Thanks.
xRatio
1-Feb-2010 21:27:46 We got these failures e.g. with a few hundred TO: fields in emails.
Check it yourself. - With real applications.
Basic REBOL is excellent. We do not want and need this "PARSE" with its "rules".
I am really angry: we wasted much time to find out that higher functions in REBOL like import-email are so buggy.
We completely abolished the parse oriented "import-Email".
Using basic REBOL is much better.
Carl,
REBOL/Core should never provide higher functions. Experienced developers write them themselves.
But there are hundreds of totally superfluous functions, confusing not only beginners.
Carl,
Users need well documented, reliably working BASIC functionalities.
On the basic level REBOL is fast and excellent, but still has problems with GC.

xRatio
Henrik
2-Feb-2010 3:27:16
Check it yourself. - With real applications.

Thanks for describing the limit.
Oldes
7-Feb-2010 14:46:11 xRatio: you mix so many topics in one. Simply, if you don't like PARSE, don't use it. Go use PERL. I like PARSE and I'm not alone. Parse is one of the main reasons why I use REBOL.
And I don't know what you mean by "higher functions", but if you need just really basic REBOL, we have it - it's named REBOL/Base and you can find it in the SDK. And if you mean that "import-email" is buggy, why don't you fix it? Its source is available to everybody. Or give us your much better version.
xRatio
9-Feb-2010 0:44:03 (at) Oldes
Basic parse is quite ok. I do not "mix" anything.
Certainly we do not need an email "object" constructed via a parse ruled and very buggy "import-email" function.
The email itself is its own "object". ;-)
If we really want an email "object" we create it without any parse rules with a few lines of basic REBOL like this, accepting even files :-)
Ximport-email: make function! [
    file [string! file!]
][
    header: copy []
    data: case [
        file? :file [read/lines file]
        string? :file [PARSE/ALL :file "^/"]
    ]
    n: 0
    foreach line data [
        n: n + 1

        if all [empty? :line not empty? header] [ ;content
            insert insert tail header make set-word! "Content" form replace/all skip data n "" "^/"
            BREAK ; done
        ]
        either xfield: find/tail :line ": " [
            set-word: make set-word! copy/part line -3 + index? xField ; leftstr
            xfield: trim/head xfield ; del any leading blank its a delimiter

            ; case double field entries like Received:
            either spot: find header set-word
                [insert tail spot/2 rejoin ["^/" form set-word ": " xField] ] ; str-append, set-word also copied
                [insert insert tail header set-word xField] ; normal case
        ][
            ; any lines belonging to a previous field, often beginning with tab, we copy with tab
            unless empty? header [insert tail last header line] ; str-append
        ]
    ]
    ; just for debug ;-)
    header: make system/standard/email header
    print dump-obj header
    header
]

That's all. Compare these few lines with the -sorry- mess in the SDK using parse rules.
PARSE with its "rules" aka "regular expressions" produces almost never what you expect. Much too complicated (and buggy).

Cheers, xRatio
Oldes
9-Feb-2010 8:45:29 xRatio: Parse produces always what "I" expect. Maybe the problem is that "you" have a bad expectation.
I would not want to use your version imho. The way you break the string into a block is really bad, in my opinion. It consumes quite a lot of memory, especially if you get a big email with some attachments.
I'm sure we will write some better import-email function for R3. The ideal way is to parse the email in one pass and not multiple passes as you do.
And if you say that the function is "very buggy" - I can see only one unfixed bug related to this function. The function is a compromise between speed and functionality. For me it does too much anyway, as I don't know why I should have all the header fields parsed into an object. Usually you need just date, from, subject and data. The rest you can parse once you need it.
Anyway... we are off topic here again. Why don't you join us on Altme? Or add a wish ticket on CureBase.
xRatio
10-Feb-2010 23:53:01 (at) Oldes
The talk is about PARSE, used e.g. in REBOLs "import-email". So there seems nothing "out of topic".
Thanks for your invitation. As often said: Cannot write in Altme. Tried it several times. Have no time to deal with such ridic problems again and again.
What I posted as code above - after the explicit remark: "If(!!) we really want.."
is in reliability, speed and memory usage an extremely simplified and improved example of standard "import-email".
There are no "multiple passes", and no content/attachments needed. Better you look more carefully at this example before you write such -sorry- nonsense.
As said also: WE do NOT use any "version" of import-email.
As also said: The email itself is an "object".
A small function of just one line is enough to get with REBOL all wanted and available fields if needed:
if x: find email field [copy/part x find x newline ]
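
A minimal sketch of how that one-liner might be wrapped and called. The function name get-field, the sample header text, and the fallback to the end of the string when no newline follows are assumptions added for illustration, not part of the original code:

```rebol
; Hypothetical wrapper around the one-liner above.
get-field: func [
    email [string!]   ; header text only, not the whole message
    field [string!]   ; field name including the colon, e.g. "Subject:"
    /local x
][
    if x: find email field [
        ; copy up to the next newline, or to the end if there is none (assumed fallback)
        copy/part x any [find x newline  tail x]
    ]
]

; made-up header text for illustration
hdr: "From: someone@example.com^/Subject: Hello^/"
print get-field hdr "Subject:"
```

Note that find matches the field text anywhere in the string, not only at the start of a line, and that a folded header (continuation lines starting with a tab or space) would be cut at the first newline.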

And this - thanks to basic REBOL - excellently working function is not called with complete emails, of course, but just with the headers. No contents or attachments need to be read first.
Works extremely fast and fine! ;-) Several thousand emails on any fields checked within seconds.
BUT there is another problem:
Before(!) downloading the often very many and large emails: so far we have found no way to avoid a complete download from the servers before we can check the headers, as we normally do. None of the REBOL series functions working on the ports, including FIND, SKIP, PICK etc., could be "reduced" to read the headers only. It seems they need, and therefore automatically download, the complete mails including all their often ridiculous contents, images and attachments before the users can decide if they really want all that spam and stuff. We have a long list of spammers, but must check the headers to decide if it IS spam. Carl posted SEND email versions. It would be nice to see an example of how to avoid unwanted downloads by first evaluating the headers only. Open/direct - working perfectly on a local basis - does not work on network ports, though it normally should.
If you or anybody else has a solution for that it would be fine to see a suggestion. - Without parse rules, of course. ;-)
xRatio
xRatio
11-Feb-2010 0:04:12 Not only with the just mentioned poor Altme. REBOL "sells" itself almost everywhere extremely badly to the public.
Not even using center-face as default, instead using a grey(!) background as default (brrr..), no Cancel button or ESC key as default, destroying/ignoring CRLF in all request-functions (using inform) - these and many more are such extreme failures that I really cannot believe it!!!
For all newcomers REBOL's great power is hidden behind a curtain of peculiarities and often evident nonsense: in the docs and in the often unusable, often even absolutely wrong and misleading examples.
RebGui is MUCH better in all these aspects but certainly cannot repair all deficiencies in the official presentation of REBOL and its docs.
R3, primarily boasting of "effects" but not even providing a usable console, is no solution. All these "effects" are already available in R2. Who is interested in "effects"? Most serious developers certainly are not.
Often, yes very often, all these defects (not "effects" ;-)) are so extreme that I can't help it and really think that the terrible presentation of REBOL is on purpose. But I cannot really believe that. Production and marketing are different things.
xRatio
Graham
11-Feb-2010 1:50:54 X, if you want to know something, you should ask on the mailing list or BBS.
See http://www.rebol.org/ml-display-thread.r?m=rmlHFGS on how to use a modified pop protocol to just download the mail headers.
Or, you could try out http://compkarori.com/cerebrus/ which does all of this stuff.
DideC
12-Feb-2010 4:32:40 To xRatio.
I have a script to list only message headers in some POP mailboxes that I'm running everyday to flush spam without downloading them. It even check the Received: IPs with RBL by itself.
I agree that the import-email func can hang on a malformed header. It has always failed on spam. So my script contains a modified version of it that just deals with the problems I have encountered in the thousands of email headers I have received over the 7 years that this script has existed.
Feel free to use it, pick some code in it, or improve it if you like.
http://membres.multimania.fr/didec/rebsite/delete-emails/delete-emails.r
Help file is outdated but the most part is documented.
xRatio
13-Feb-2010 16:10:07 Thank you DideC,
will look at your
http://membres.multimania.fr/didec/rebsite/delete-emails/delete-emails.r
as soon as my time allows.
Included in our compiler and linker system we have a well working Email-program, written with great effort years ago with all needed details.
We are trying to improve and shorten our code by using REBOL instead.
Cheers, xRatio
Oldes
16-Feb-2010 5:12:28 xRatio: In one of your previous posts you say ...There are no "multiple passes"...
Of course there are. The first one is when you do read/lines or parse file "^/". And the foreach cycle is second pass. And the part with replace/all skip data n "" "^/" - I expect that "" should hold CR char which has been removed by the blog.r script here - anyway - I can count it as a third pass.
Your code is ugly. And it should be posted in some newbie forum, not here. In such a forum someone could explain to you why you should use forall instead of foreach ... n: n + 1 ... skip data n.
You say that your version is ...is in reliability, speed and memory usage.... So I ask you: do you know the difference between a block with thousands of strings and just one string which you hold in memory?
And finally, in your other post under another topic you complain that you have gigabytes of RAM used by REBOL. I can imagine it's your fault, not REBOL's. Are you sure you clear unused series correctly and know how to use the recycle command?
I'm using REBOL everyday and must say, that I don't have problems with memory used by REBOL. And if occasionally I have it, it's always my fault = bug in my code which I must fix.
Simply you should first learn the language well before shouting that something is terrible and bad.
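
The forall suggestion above can be sketched roughly as follows. The sample block stands in for the lines of an email and is made up for illustration:

```rebol
; Made-up sample: header lines, a blank separator line, then body lines.
lines: ["To: a@example.com" "Subject: Hi" "" "body line 1" "body line 2"]

forall lines [
    ; inside forall, the word LINES refers to the current position in the block
    if empty? first lines [
        ; everything after the blank line, without keeping a separate counter
        probe next lines
        break
    ]
]

; the word may be left mid-series after break, depending on the REBOL version,
; so reset it explicitly before reusing the block
lines: head lines
```

Because forall walks the series position itself, the n: n + 1 counter and the later skip data n are not needed.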
xRatio
17-Feb-2010 22:36:53 Oldes,
evidently you cannot post better code than I did.
Simply you should first learn behavior well before shouting that something is terrible and bad.
xRatio
