REBOL3 - Parse (Discussion of PARSE dialect [web-public])

Most recent messages (300 max) are listed first.

#UserMessageDate
4818BenBranyes it works perfect in R3. Thanks again.6-Jan-10 20:18
4817BenBranlol :-)6-Jan-10 20:17
4816BrianHYou were right, it was something simple :)6-Jan-10 20:16
4815BenBrandoh!6-Jan-10 20:15
4814BrianHThat is R2, not R3.6-Jan-10 20:15
4813BenBran>> help system

SYSTEM is an object of value:
version tuple! 2.7.7.3.1
build date! 1-Jan-2010/12:15:27-8:00
product word! View
core tuple! 2.7.7
components block! length: 60

6-Jan-10 20:14
4812BrianHWhat version of REBOL are you using? system/version ...6-Jan-10 20:13
4811BenBranfor completeness in R3 - I tried the lines above:

>> parse "GET /a.html HTTP/1.1" ["get " return to " "]
** Script Error: Invalid argument: ?native?
** Where: halt-view
** Near: parse "GET /a.html HTTP/1.1" ["get " return to " "]

I must be missing something simple

6-Jan-10 20:11
4810BrianH>> parse "GET /a.html HTTP/1.1" ["get " return to " "]
== "/a.html"

Note that /all is the default in R3, so you need to specify the space after GET.

6-Jan-10 19:43
4809BrianHThat would return the file instead of setting a variable and not return false because of leftover input.6-Jan-10 19:40
4808BrianHPARSE returns true if the rule matches and covers the entire input, or false otherwise. Your rule matched but there was input left over. PARSE's return value doesn't matter in this case, just whether file is set or not. If you are using R3 you can do this too: parse buffer [ "get" [ "http" | "/" | return to " "]]6-Jan-10 19:39
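BrianH's point about the return value can be modelled outside REBOL. The Python sketch below is only an illustration of the behaviour described in this thread, not how PARSE is implemented; `parse_request` and its `to_end` flag are hypothetical names. It copies the text between "get " and the next space, and reports success only when the rule has consumed the whole input, which is why appending `to end` changes the result to true.

```python
def parse_request(buffer, to_end=False):
    """Model of the rule: "get " copy file to " "  (optionally followed by: to end).

    Returns (result, file), where result mimics PARSE's true/false.
    """
    prefix = "get "
    # PARSE matches strings case-insensitively by default.
    if not buffer.lower().startswith(prefix):
        return False, None
    start = len(prefix)
    space = buffer.find(" ", start)
    if space == -1:
        return False, None        # TO found no space, so the rule fails
    file = buffer[start:space]    # COPY keeps what TO stepped over
    pos = len(buffer) if to_end else space
    # The result is true only when the rule has consumed the entire input.
    return pos == len(buffer), file


print(parse_request("GET /a.html HTTP/1.1"))
print(parse_request("GET /a.html HTTP/1.1", to_end=True))
```

In both calls the file is captured; only the first element of the pair differs, because the first rule stops with input left over.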
4807Grahamparse buffer [ "get" [ "http" | "/" | copy file to #" " ( print file) ] to end ] will return true6-Jan-10 19:37
4806BrianHWas going to reply but Graham types faster :)6-Jan-10 19:36
4805BenBranok I see. Thanks.6-Jan-10 19:36
4804Grahamtrue if the rule completes to the end, false otherwise6-Jan-10 19:35
4803Grahamumm.. parse returns either true or false ...6-Jan-10 19:35
4802Grahamif you want the value you have to change the parse rule6-Jan-10 19:34
4801Grahamfalse is the value returned by the parse function6-Jan-10 19:34
4800BenBranI get what's happening now. If I compare buffer and file I see the clipped text:

>> probe file == "index.html"

>> probe buffer
{GET /a.html HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-US
Accept-Encoding: gzip, deflate
Connection: keep-alive
Address: 127.0.0.1}

>>probe parse buffer ["get" ["http" | "/ " | copy file to " "]] == false

>> probe file
== "/a.html"

Should I have been able to see the results instead of == false?

6-Jan-10 19:31
4799BrianHThe break being a parse match fail, and file being set to none for a zero-length match.6-Jan-10 19:06
4798BrianHSort of. The actual code is a little more complex, more like this: either tmp: find data " " [file: if 0 < offset? data tmp [copy/part data tmp]] [break]6-Jan-10 19:04
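The equivalence BrianH gives for `copy file to " "` (together with his follow-up about the failed match and the zero-length case) can be restated as a small Python sketch. This is a model of the behaviour as described in these messages, assuming R2 semantics; `copy_to` is a hypothetical helper name.

```python
def copy_to(data, target):
    """Model of the rule  copy file to target  applied at the head of data.

    Returns (matched, value). On a failed match the REBOL variable would be
    left untouched; None stands in for that here.
    """
    idx = data.find(target)
    if idx == -1:
        return False, None      # target not found: the parse match fails
    if idx == 0:
        return True, None       # zero-length match: file is set to none
    return True, data[:idx]     # copy everything before the target


print(copy_to("/a.html HTTP/1.1", " "))
print(copy_to(" HTTP/1.1", " "))
print(copy_to("no-space-here", " "))
```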
4797BrianHSo, copy file to " " is the equivalent of this regular REBOL code: file: if find data " " [copy/part data find data " "]6-Jan-10 18:59
4796BrianHThe copy and to are parse operations. COPY copies the data covered by the next operation, the TO. TO covers the data from the current parse position until the first instance it can find of its argument.6-Jan-10 18:56
4795BrianHBenBran: Not sure where to put this so asking here:

I downloaded a web script and it has a snippet I don't understand:

buffer: make string! 1024 ;; contains the browser request
file: "index.html"
parse buffer ["get" ["http" | "/ " | copy file to " " ]]

what does:

copy file to " "

mean or do? tia

6-Jan-10 18:53
4794PekrCarl - first "error" in parse rewrite with some/any is the auto protection for non advancing input. It is like writing in BASIC

10 Print "Hello"
20 goto 10

... and not expecting it to run forever, because some magical internal mechanism kicks-in. If I write the code which could cause infinite loop, then be it. For me it causes the opposite reaction - some/any are not safe to use, let us use while instead ....

something like: parse str [some [to "abc"]] is so obvious and self explanatory, that actually not looping forever almost feels like parse error. But - even if I don't like it, maybe most such infinite loop hits are more difficult to notice, so that actually the prevention might be ok, I don't know. As for me though, I would probably prefer some internal capability to detect such case, and some debug option to show last rule/position, where it happens ...

I am not fluent enough with parse theory, but maybe it also relates to your loop vs matching note above ...

1-Jan-10 8:45
4793GreggFor example

- Parsing an input that has nested structures, and how to collect the values you want.
- Showing the user where the parse failed.
- How to avoid infinite parse loops.
- How to safely modify the input stream.

More advanced examples would be great too of course.

31-Dec-09 21:33
4792GreggWe have some cool new parse enhancements; really, really nice some of them. What I think will add the most value to PARSE--and maybe this is just me--are practical examples, idioms, and best practices.31-Dec-09 21:30
4791SteeveI see your point, but what if the ANY block contains production rules ? parse "" [any [and skip copy tmp to end break | insert "1" and insert "2"]]

(i know, stupid example)

31-Dec-09 18:47
4790CarlThere are a few ways to do it, but that is not my point.31-Dec-09 18:40
4789Steeveany [and skip copy tmp to end] any [copy tmp [skip to end]] etc...31-Dec-09 18:39
4788SteeveWe have so many alternatives that I don't see this as a burden31-Dec-09 18:36
4787CarlIt's a small thing, and maybe too late to change. I wanted to point it out.31-Dec-09 18:34
4786CarlIn other words, is ANY smart about the input? If there is no input, why should it even try?

Of course, in the past we've used ANY a bit like WHILE -- as a LOOPing method, not really as a MATCHing method.

31-Dec-09 18:33
4785CarlIn the rewrite of DECODE-CGI, that behavior of ANY forces me to write:

parse "" [any [end break | copy tmp to end]]

This seems wrong to me if we define ANY as a MATCHing function, not as a LOOP function. This topic has been debated a bit between a few of us, but I think it deserves more attention.

31-Dec-09 18:29
4784Steevewhat do you expect in this case ?31-Dec-09 18:29
4783CarlI'm still running into some problems with PARSE... mainly from the expectation of what ANY and SOME should do.

For example:

>> parse "" [any [copy tmp to end]]
>> tmp
== ""

31-Dec-09 18:26
4782CarlRight: synonyms.31-Dec-09 18:23
4781LadislavCarl made a distinction in R3 blog, but they currently work the same, as far as I can tell, so, the only difference I see is, that ACCEPT is more self-explanatory.30-Dec-09 11:52
4780PekrWhat is the difference between BREAK and ACCEPT? Both "break" out of the rule, both with success (IMO).30-Dec-09 7:52
4779Ladislave.g.

parse [a b c] [?? copy value thru 1 skip to end]

should have preferably been

parse [a b c] [?? copy value 1 skip to end]

29-Dec-09 18:09
4778LadislavCOPY should accept any rule, not just the ones you mentioned29-Dec-09 17:57
4777Forkkcollins: I'm using OS/X, I still haven't found a way to reproduce it. Comes and goes.29-Dec-09 17:49
4776ForkLadislav: I didn't realize you could use "while" as the second argument to copy, I thought it only worked with to and thru...29-Dec-09 17:49
4775LadislavI overlooked, that you used the STRING! datatype:

parse [1 2 3] [?? while [integer! string! accept | skip | reject] ?? integer!]

29-Dec-09 13:08
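Ladislav's `while [integer! string! accept | skip | reject]` idiom steps through the block one value at a time until the type sequence matches, and fails if the end is reached first. A Python sketch of that scan follows; it models the idiom only, and `thru_types` is a hypothetical name, with Python types standing in for REBOL datatypes.

```python
def thru_types(block, types, pos=0):
    """Return the position just past the first run of values whose types
    match `types` in order, or None if no such run exists."""
    n = len(types)
    while pos + n <= len(block):
        window = block[pos:pos + n]
        if all(isinstance(value, t) for value, t in zip(window, types)):
            return pos + n      # sequence matched: like ACCEPT
        pos += 1                # no match here: like SKIP
    return None                 # ran out of input: like REJECT


print(thru_types([1, 2, 3], [int, str]))
print(thru_types([1, "a", 3], [int, str]))
```

With the block `[1 2 3]` there is no integer followed by a string, so the scan fails, which matches the `false` result seen in the thread.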
4774LadislavRe the THRU problem: you can use

parse [1 2 3] [?? while [integer! block! accept | skip | reject] ?? integer!]

29-Dec-09 13:05
4773LadislavRegarding the QUOTE keyword: the original proposal was to treat blocks as in quote [1 2] as sequences of elements, not as embedded blocks, wouldn't you prefer that behaviour?29-Dec-09 13:01
4772kcollinsFork, are you seeing these outputs "coo", "thte", etc. on a Linux build of R3? I have seen similar corrupted output with Linux R3 when testing TCP client code, as documented in Curecode #1322.29-Dec-09 6:32
4771ForkWell, I should find a way to reproduce it before doing that. Left a note about how getting a CureCode account didn't work the other day.28-Dec-09 20:02
4770BrianHDefinitely another bug. CureCode it.28-Dec-09 19:57
4769Fork>> parse [a b c] [?? copy value thru 1 skip to end] coo:: [a b c] == true28-Dec-09 19:56
4768BrianHBut no such characters should be output by ??28-Dec-09 19:56
4767ForkIndeterminate, e.g. just ran it again and:28-Dec-09 19:56
4766BrianHSeems like a Unicode to ANSI translation error.28-Dec-09 19:56
4765Fork(That question mark not visible in the terminal, showed up when I pasted here)28-Dec-09 19:54
4764Fork>> parse [a b c] [?? copy value thru 1 skip to end] co? : [a b c] == true28-Dec-09 19:54
4763ForkFYI still seeing some erratic behavior with ?? at head of the parse rule28-Dec-09 19:54
4762BrianHFork, the fact that both of those examples work incorrectly instead of throwing an error is a bug in PARSE. It should be CureCoded.28-Dec-09 19:46
4761Pekrto/thru were reimplemented to allow multiple options. There are cases, where they are not supposed to work, but in above case I would regard it being a bug .... unless some guru finds a theory showing us why it should be regarded being a correct result :-)28-Dec-09 19:45
4760Pekr>> parse [a b c][?? 3 skip ??] 3: [a b c] end!: [] == true28-Dec-09 19:44
4759PekrI would expect that ...28-Dec-09 19:42
4758ForkShould the latter be [a b c] ?28-Dec-09 19:42
4757Pekrbrian - so we can use things like any-string! or other typesets to match?28-Dec-09 19:41
4756Fork>> parse [a b c] [(value: none) copy value to 3 skip to end (probe value)] [a b] == true

>> parse [a b c] [(value: none) copy value thru 3 skip to end (probe value)] [a b] == true

28-Dec-09 19:41
4755BrianHFortunately typesets work for block parsing like bitsets do for string parsing, so first sets are easy.28-Dec-09 19:40
4754BrianHYes. You can express a sequence of characters in a string as a string literal, but not a sequence of types in a block. You are going to need first sets and the other LL tricks for that.28-Dec-09 19:38
4753ForkIs a sequence of things one of the complex rules that you can't use in a thru?28-Dec-09 19:35
4752ForkAnd it stopped doing that. I'll see if I can get it to do it again.28-Dec-09 19:33
4751ForkHm. Version: 2.100.96.2.5 I quit and restarted.28-Dec-09 19:32
4750Pekrwhat do you mean by "match thru a series of things"?28-Dec-09 19:31
4749Pekr>> parse [1 2 3][?? thru [integer! string!] ?? integer!] thru: [1 2 3] integer!: [2 3] == false28-Dec-09 19:30
4748Fork?? not initialized after first match? And secondly, how do I match thru a series of things (e.g. integer! integer!, but just wondering about the thte. ?? problem before the first match?)28-Dec-09 19:28
4747ForkWhat's that "thte" thing?28-Dec-09 19:27
4746Fork>> parse [1 2 3] [?? thru [integer! string!] ?? integer!] thte: [1 2 3] integer!: [2 3] == false28-Dec-09 19:26
4745LadislavMore complicated rules can be easily simulated using the While keyword, the opposite isn't true. Carl's example just proves, why While is useful.25-Dec-09 14:17
4744Ladislavsorry, I meant a: [b a |]25-Dec-09 13:51
4743LadislavThe WHILE keyword is the simplest possible cycle. The rule:

a: [while b]

is equivalent to recursive:

a: [b a]

25-Dec-09 13:50
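The equivalence between the WHILE loop and the recursive rule (in its corrected form `a: [b a |]`, with the empty alternative) can be sketched in Python. This is a toy model, not PARSE itself: a rule is a function taking a text and a position and returning the new position or None, and `digit`, `while_loop` and `while_recursive` are hypothetical names.

```python
def digit(text, pos):
    """Rule b: match one digit, returning the new position or None."""
    if pos < len(text) and text[pos].isdigit():
        return pos + 1
    return None


def while_loop(rule, text, pos=0):
    """a: [while b] -- repeat b until it fails, then succeed where we are."""
    while True:
        nxt = rule(text, pos)
        if nxt is None:
            return pos
        pos = nxt


def while_recursive(rule, text, pos=0):
    """a: [b a |] -- try b then a again; otherwise take the empty alternative."""
    nxt = rule(text, pos)
    if nxt is None:
        return pos              # the empty alternative after |
    return while_recursive(rule, text, nxt)


print(while_loop(digit, "123abc"), while_recursive(digit, "123abc"))
```

Both forms stop at the same position; neither can fail, because zero repetitions is a valid match.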
4742PekrI probably need more examples ..24-Dec-09 10:49
4741PekrRunning above examples, my opinion is, that in fact adding 'while was probably not a good decision. I can understand, that now we have more power - our code will not easily cause infinite loops, but otoh you now have to think, if it can happen or not, and 'some becomes your enemy ...24-Dec-09 10:47
4740PekrI probably don't understand the usefulness of 'while at all. Because now I have to think, if my code would cause infinite loop, or not, and use 'some or 'while accordingly ...24-Dec-09 10:42
4739PekrHenrik - according to docs explanation, 'parse contains some internal protection for the case, when input stream does not advance its position. In R2, following code causes infinite loop, in R3, it returns false:

parse str [some [to "abc"]]

(I am not sure I like that it returns false - normally I expect it to cause infinite loop. This is imo overprotecting programmer, and you have to think, why your code returns false anyway, which for me is the same, as if it would cause an infinite loop)

Further from docs:

To avoid infinite looping, a special internal rule is triggered based on the fact that the rule did not change the input position.

However, this shows a problem with this rule:

parse str [some [to "a" remove thru "b"]]

Here the input did not appear to advance, but something useful happened. In such cases, the some word should not be used, and the while word is better:

parse str [while [to "a" remove thru "b"]]

24-Dec-09 10:40
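The difference the docs describe between SOME and WHILE for `[to "a" remove thru "b"]` can be illustrated with a Python sketch. It assumes the simplest reading of the quoted rule (the loop stops after a pass that leaves the position unchanged); the real R3 check may differ in detail, and `remove_a_thru_b` and `run_loop` are hypothetical names.

```python
def remove_a_thru_b(buf, pos):
    """One pass of [to "a" remove thru "b"] on a list of characters.

    Returns the new position, or None if the rule fails."""
    text = "".join(buf)
    a = text.find("a", pos)
    if a == -1:
        return None
    b = text.find("b", a)
    if b == -1:
        return None
    del buf[a:b + 1]            # remove from "a" through "b"
    return a                    # position stays where the "a" was


def run_loop(rule, buf, stop_if_stuck):
    """stop_if_stuck=True models SOME/ANY; False models WHILE."""
    pos, passes = 0, 0
    while True:
        new_pos = rule(buf, pos)
        if new_pos is None:
            break               # the rule failed: both loops end here
        passes += 1
        if stop_if_stuck and new_pos == pos:
            break               # position did not move: SOME/ANY give up
        pos = new_pos
    return passes


some_buf = list("axbaxb")
while_buf = list("axbaxb")
print(run_loop(remove_a_thru_b, some_buf, True), "".join(some_buf))
print(run_loop(remove_a_thru_b, while_buf, False), "".join(while_buf))
```

On this input the guarded loop stops after one removal even though useful work remains, while the unguarded loop removes both occurrences and then ends because the rule finally fails.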
4738HenrikLooking at the new WHILE keyword and I was quite baffled by Carl's use of it in his latest blog example. Then I read the docs and it didn't get much better:

- WHILE is a variant of ANY
- ANY stops, if input does not change
- WHILE doesn't stop, even if input does not change

What does "input does not change" mean?

Is it about changing the parse series length during parse? Is it actively moving the parse index back or forth using special commands? Is it normal progression of parse index with each cycle of WHILE or ANY? Is it alteration of the parse series content while maintaining length during parse?

24-Dec-09 9:32
4737Maximhehe16-Dec-09 19:02
4736GabrieleMaxim, maybe you thought I was kidding the other day... ;)16-Dec-09 10:22
4735Maximthe funny thing is that the C language reference on the MSDN is actually pretty well done... there are a lot of evil C examples for some of the more obscure parts of the language like pointers, structs and unions.

funny thing is that some of the most complex things to express were the literal constants! integers, with octal, hex notation... not as simple as some [digits] ;-)

16-Dec-09 5:31
4734BrianHWell, good luck! :)16-Dec-09 5:30
4733Maximmy goal is to get the host code and OpenGL headers past the parsing phase. once that is done, I'll start work on adding the production phase.

I still have to write the pre-processor, but that in fact is pretty straightforward. there are few rules and they are much more static and well defined on the MS web site.

16-Dec-09 5:28
4732BrianH"data" in this case being C source.16-Dec-09 5:28
4731BrianHNo, really. The syntax of C is so complex that you would need a lot of data to test all of the common variations.16-Dec-09 5:27
4730Maximyou are being sarcastic right? :-)16-Dec-09 5:26
4729Maximthere is all in all only two or three rules that I'm unsure of the transformation, as some aspects of the C syntax are a bit obscure to represent.16-Dec-09 5:26
4728BrianHAre you sure you have enough test code/data?16-Dec-09 5:26
4727BrianHSounds about right.16-Dec-09 5:25
4726Maximwell, considering that I just finished the basic rule re-organisation... eheheh I think I'll apply the unit testing phase right now to test if all the rules perform as they should using input text. there is probably going to be about 100kb of unit test code for what is now about 12kb of parse rules.16-Dec-09 5:25
4725BrianHYou might be better off translating a C grammar for a PEG or TDPL parser generator into PARSE - less topological shifts needed.16-Dec-09 5:23
4724BrianHUnfortunately, the C grammar was designed with LR parsers in mind.16-Dec-09 5:21
4723BrianHBNF is just a syntax form, with a *lot* of variation. The real difference that matters between Yacc and PARSE is the parsing model. Yacc implements an LR parser (or some variant thereof), and PARSE implements a variant of TDPL parsing (related to PEG), though more powerful and with a completely different syntax. How you structure the parse rules depends on the parsing model, not the syntax.

For instance, LR parsers tend to do recursion rather than iteration, and when they recurse the recursive call tends to be on the left, with the distinguishing clause on the right. For PEG parsers, recursion goes the other way. This is not an error, this is a difference in parsing model.

If you are translating from Yacc to PARSE, it's not just a syntax change. You have to reorganize the rules to match the new model. And watch out: Certain patterns are easier to express in some parsing models than in others. Some patterns aren't supported at all in some models, and so no amount of translation will help you. We chose the TDPL model for PARSE because it is more expressive than the LR model, so in theory you should be able to translate LR rules to PARSE with some topological twists (redoing the structure of the rules). However, there are patterns that you can express in PARSE that can't be translated to LR, even with topological changes.

16-Dec-09 5:21
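The reorganisation BrianH describes can be shown on a tiny grammar. A Yacc-style rule such as `list: list "," item | item` recurses on the left; a top-down parser that called itself first would never consume input, so the rule is usually rewritten as `item ("," item)*`. The Python sketch below implements that rewritten form. The grammar and the name `parse_list` are made up for illustration and do not come from the C grammar discussed in this thread.

```python
def parse_list(tokens):
    """Top-down form of the left-recursive rule  list: list "," item | item.

    Rewritten as  item ("," item)*  so that every step consumes input.
    Returns the list of items, or None if the tokens do not match."""
    pos = 0
    if pos >= len(tokens) or tokens[pos] == ",":
        return None                     # a list must start with an item
    items = [tokens[pos]]
    pos += 1
    # Iteration replaces the left recursion of the LR-style rule.
    while pos < len(tokens) and tokens[pos] == ",":
        if pos + 1 >= len(tokens) or tokens[pos + 1] == ",":
            return None                 # a comma must be followed by an item
        items.append(tokens[pos + 1])
        pos += 2
    return items if pos == len(tokens) else None


print(parse_list(["a", ",", "b", ",", "c"]))
```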
4722MaximI've been rewriting bnf generated parse rules (and often a bit cryptically) into proper parse ordered rules for 3 days now... <sigh> C is sooo complex for what it really does. I've discovered a few quite mind-boggling language capabilities... stuff like:

char *( *(*var)() )[10];

it takes 7 steps to define what that really is and there are other "fun" examples which end up being interpretation nightmares, but look really simple.

one thing is certain at this point... although I will be able to build a C to rebol converter with relative precision under specific goals, some of the crazy stuff just will have to be finished manually by humans.

at least I rarely see such twisted C code in most of what I've been reading so far.

16-Dec-09 3:55
4721Maximsure.14-Dec-09 22:34
4720GreggGenerating PARSE rules wasn't too hard. It is a nice fit. Same issue with existing grammars though, in that you have to fix some things up manually, or we have to make the generator smarter.

I'll zap you what I have. Can't remember where I've posted it elsewhere.

14-Dec-09 22:33
4719Maximthat is nice, is your ABNF parser still accessible somewhere? it could improve the quality and ease of integrating the protocols to R3 IMO.

ABNF also seems much more aligned to parse

14-Dec-09 22:30
4718GreggThere are a lot of differences, unfortunately. It's not terrible, just different. It's not EBNF.

http://en.wikipedia.org/wiki/Augmented_Backus%E2%80%93Naur_Form

14-Dec-09 22:27
4717Maximis ABNF == EBNF ?14-Dec-09 22:27
4716Maximwhat is the difference?14-Dec-09 22:25
4715GreggYup. Different mindset.

I just looked at your BNF compiler earlier. Good stuff. I did an ABNF-to-parse generator some time back. ABNF is used in a lot of IETF RFCs and such.

14-Dec-09 22:25
4714Maximone strange thing I realised is that most people who write bnf, will write them in exactly the opposite of what parse needs to be..

they'll put the smallest pattern first. so that if applied in parse directly, it always short-circuits the other rules following it.

14-Dec-09 19:56
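The short-circuit Maxim mentions comes from ordered choice: alternatives are tried left to right and the first one that matches wins, even when a longer one would also match. A minimal Python sketch, with a made-up pair of operator tokens and a hypothetical function name:

```python
def ordered_choice(alternatives, text, pos=0):
    """Return the first alternative that matches at pos, trying them in order."""
    for alt in alternatives:
        if text.startswith(alt, pos):
            return alt
    return None


# Smallest pattern first: "<" always wins, so "<=" can never be reached.
print(ordered_choice(["<", "<="], "<= 3"))
# Longest pattern first: both operators are reachable.
print(ordered_choice(["<=", "<"], "<= 3"))
```

This is why alternatives copied from a BNF grammar usually have to be reordered, longest first, before they behave as intended in PARSE.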
4713Maximfinished the rewrite of the BNF parser... funny... there is more documentation & comments than code.13-Dec-09 22:55
4712MaximI've used word= for other things before and I liked it.13-Dec-09 20:00
4711MaximI'll try that, its a good variant, even better since then we clearly identify the 3 different parse constructs separately.13-Dec-09 19:59
4710GreggFor a long time I've added = to the end of my parse rules, and = to the beginning of parse variables. I think it matches the production rule grammar well, and also emulates set-word/get-word syntax.13-Dec-09 19:56
4709Maximthe new parse rejection system is VERY cool. ( can simplify the structure of some rules a lot :-)13-Dec-09 5:44
4708Maxim(all in R3, but not using newer parse stuff, cause its not required)13-Dec-09 4:17
4707Maximyay, I've got the BNF grammar done... its ripping through a C language BNF grammar definition... :-)

now I've just got to make a parse rule emitter ... easy enough.

13-Dec-09 4:17
4706PeterWood"any others care to comment?"

I'm afraid it looks very messy to me and reminded me of Perl for some reason.

13-Dec-09 1:31
4705Maximtrue :-)13-Dec-09 0:58
4704Grahamit's not a syntax but a convention ...13-Dec-09 0:56
4703MaximI'm just trying to get a feel for what others think about the idea. and sharing a bit of a discovery at the same time, if it may help others. the goal isn't to be popular or convince others... and sorry, if my last line may have looked harsh, it wasn't. :-)

I was just resuming your reaction plainly and relaunching the question to be sure others realize I want a few opinions.

13-Dec-09 0:39
4702GrahamMax, just do what ever suits you.13-Dec-09 0:35
4701Maximunfortunately what you say isn't feasible, even if you can technically do it. who is going to program a parser to colorise code which is useful for only one application? it's actually going to take more time to write your color parser for each piece of code than write the code itself :-P

so bottom line, Graham doesn't like this syntax. any others care to comment?

13-Dec-09 0:18
4700Grahamexactly ... for coding.13-Dec-09 0:05
4699Maximbut not while I'm coding... this is not for presentation, its for coding... I'm writing rules twice as fast now... just cause I'm not wasting time "searching" for the keywords within all of that text.13-Dec-09 0:02
4698Grahamwithout the need for all those = signs everywhere13-Dec-09 0:01
4697Grahamso you could write a parser that reads your rules and colorises them ...13-Dec-09 0:01
4696Maximstuff is colorized... (*in my editor*)13-Dec-09 0:00
4695Maximsyntax highlighting colorizes words ... stuff is colorized... but user words aren't colorised and they all get mixed up between functions, variables and rules... and having colors which are too strong next to each other and in relative distribution ... cancels out.12-Dec-09 23:59
4694GrahamChuck Moore uses color extensively in his color forth .. to replace other types of syntactic markup.12-Dec-09 23:58
4693GrahamGab uses the == in his literate editor ..12-Dec-09 23:57
4692GrahamUse an editor that colorises the words12-Dec-09 23:57
4691Maximwhat do you mean color?12-Dec-09 23:56
4690Grahamuse color instead :)12-Dec-09 23:55
4689Maximwith syntax highlighting it's quite amazing how bits stands out. ... in my editor at least.12-Dec-09 23:25
4688Maximwhen using rules in other contexts, they also stick out...

=alphabet=: rejoin [=digit= =letter= bits "_"]

here I immediately see that bits isn't a rule, but a function or a word.

12-Dec-09 23:24
4687Maximanother example.... in this dense block of text, I can spot the =eol= (end of line) token instantly in both x and y dimensions of the rule paragraph:

=line-comment=: [
    =comment-symbol= [
        [thru =eol= (print "comment to end of line")]
        | [to end]
    ]
    (print "success")
]

12-Dec-09 23:22
4686MaximI just adopted a new notation standard for parse rules... the goal is to make rules a bit more verbose as to the type of each rule token... I find this reads well in any direction, since we encounter the "=" character when reading from left to right or right to left... and parse rules often have to be read from right to left.

example:

=terminal=: [ =quote= copy terminal to =quote= skip (print ["found terminal: " terminal]) ]

on very large rules, and with the syntax highlighting in my editor making the "=" signs very distinct, I can instantly detect what parts of my rules are other rules or character patterns... it also helps out in the declarations... I see when blocks are intended to be used as rules quite instantly where ever they are in my code.

in my current little parser, I find I can edit my rules almost twice as fast and lose MUCH less time scanning my blocks to find the rule tokens, and switching them around.

wonder what you guys think about it...

12-Dec-09 23:20
4685WuJiannewbie's solution, without PARSE:

>> s2: {1 ''2 '3 4 ' '5 ''6 '7 8 9 '0'}
>> replace/all s2 {''} {'} replace/all s2 {'} {''} print str
1 ''2 ''3 4 '' ''5 ''6 ''7 8 9 ''0''
>> str == s2
== true

12-Dec-09 3:01
4684ReichartJack, Parse is my fav REBOL command. If I ever have time, this is the one function I would like to create hundreds of examples for in a Wiki.12-Dec-09 1:27
4683MaximI'd gladly give back a few $ for their efforts11-Dec-09 18:15
4682MaximI sure would use it... some people have helped save days of work with free code and insight.11-Dec-09 18:14
4681Maximactually, having a paypal account linked with your login and a "donate" button would be really nice :-) right in the chat tool.11-Dec-09 18:14
4680Steevewe should add a DONATE account somewhere, linked with Altme. I'm sure people would be glad to add 1 dollar for such fast assistance. Then, we could finance some interesting projects11-Dec-09 18:13
4679RebolekJust curious, I tested both versions and Steeve's version is about 2times faster than Maxim's :)11-Dec-09 18:12
4678Maxim( I can see that being misleading when read hehehe :-)11-Dec-09 18:08
4677jack-ortAh! when you said "...you match double quotes first then fallback to single quotes, ..." I was thinking double-quote character, not double single-quotes. Need more coffee...

Thanks very much!

11-Dec-09 18:07
4676Steevecorrected version with thru: >> parse/all str [ any [thru {'} [{'} | p: (insert p {'} ) skip ]]]11-Dec-09 18:06
4675Maximprint it out in the rebol console... you will see that my example doesn't have any double quote characters.. they just look like so in altme's font ;-)11-Dec-09 18:05
4674jack-ortThanks! I'm going to have to look @ this for awhile to understand why you even need to worry about the double-quote character. Much to learn....

Thanks Maxim and Steeve for the prompt replies!

11-Dec-09 18:04
4673Steevesame as mine, except i use THRU to speed up the process11-Dec-09 18:04
4672Maximnote all ticks... ( ' ) are single quote chars in the above.11-Dec-09 18:02
4671Maxim>> str: {1 ''2 '3 4 ' '5 ''6 '7 8 9 '0'}
>> parse/all str [some [{''} | [{'} here: (insert here {'}) skip] | skip]]
>> print str
== {1 ''2 ''3 4 '' ''5 ''6 ''7 8 9 ''0''}

11-Dec-09 18:01
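The order of alternatives in Maxim's rule (doubled quote first, then a lone quote, then any other character) is what makes it work. The same scan written as a Python sketch, for readers who want to trace it step by step; `double_single_quotes` is a hypothetical name and this is not a translation of how PARSE executes the rule internally.

```python
def double_single_quotes(text):
    """Replace each lone ' with '' while leaving existing '' pairs alone."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("''", i):
            out.append("''")    # already doubled: copy the pair and move on
            i += 2
        elif text[i] == "'":
            out.append("''")    # lone quote: emit it twice
            i += 1
        else:
            out.append(text[i]) # any other character: copy unchanged
            i += 1
    return "".join(out)


print(double_single_quotes("1 ''2 '3 4 ' '5 ''6 '7 8 9 '0'"))
```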
4670Steevei think i misunderstood something, replace {"} by {'} maybe11-Dec-09 17:59
4669Steeve>> parse/all str [ any [thru {"} [{"} | p: (insert p {"} skip) ]]] something like this (not tested)11-Dec-09 17:57
4668jack-ortyes, View 2.7.6 under Windows XP11-Dec-09 17:54
4667MaximR2?11-Dec-09 17:52
4666Maximeasy, actually. you match double quotes first then fallback to single quotes, adding a new one and skipping one char...

give me a minute I should get something working...

11-Dec-09 17:52
4665jack-ortHelp! Still struggling to understand parse. How could I replace any and all SINGLE occurrences of the single-quote character anywhere in a string (beginning, middle or end) with TWO single-quotes? But if there are already TWO single-quotes together, I want to leave them alone.

TIA for any and all help for a newbie!

11-Dec-09 17:50
4664BrianH|4-Dec-09 6:09
4663GrahamLadislav, what 'choice operator?3-Dec-09 22:52
4662GrahamJanko, charset is short for make bitset! so you can call them bitsets or charsets :)3-Dec-09 22:50
4661LadislavIt looks, that I could have used:

C: [while [and [A | B] accept | skip | reject]]

3-Dec-09 11:39
4660Ladislav"I didn't know you could set the position back with :here" - you can set the position back even without :here, the choice operator is sufficient for you to be able to do that, see the above idioms as an example3-Dec-09 11:08
4659LadislavJust to complete the list of possible equivalents to the

C: [to [A | B]]

rule, here is a way how to do it in Rebol3 parse:

C: [while [and [A | B] break | skip | reject]]

you can find other equivalent idioms at http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse#Parse_idioms

3-Dec-09 11:01
4658Jankobut it is a level less simple and nice to use than simple parse modes that's why the simple ones should be powerful *if possible* too - you can't get a newbie impressed with charset parsing because he won't understand it probably.3-Dec-09 10:57
4657Jankoyes, you are right .. if you can write parser for php then you can make anything with it. I always supposed parse with charsets is like low level step by one char in a loop and call "events" and change states , with which you can parse anything from xml to languages .. well but parse with charsets is still much more elegant3-Dec-09 10:56
4656JankoOldes if that is in R3 >> copy x to [" ." | " !"] << this is exactly as I was proposing above :) , very nice!

I know I have to .. I haven't really needed them yet I guess, I solved some things less elegantly in other ways without them. I intend to take the plunge next time I need them.

3-Dec-09 10:54
4655JankoLadislav, thanks.. I didn't know you could set the position back with :here , that is interesting and probably expands what you can do with parse a lot.3-Dec-09 10:52
4654OldesAnd Janko... if you don't use charsets at all, I think you should give it a try. It's not so difficult. I think that if I can write parser to colorize PHP code, then you can parse everything.2-Dec-09 22:04
4653OldesJust would like to remind you that there is something like R3, where:

>> parse "I like Apple . I like Windows ! I like Linux . I like Amiga ." [any ["I like " copy x to [" ." | " !"] (probe x) to "I like "]]
"Apple"
"Windows"
"Linux"
"Amiga"

2-Dec-09 22:02
4652LadislavJanko: the only problem is, that you cannot use:

C: [to [A | B]]

, where A and B are "general rules", but you can always write:

C: [here: [A | B] :here | skip C]

, which would do what you want

2-Dec-09 20:49
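Ladislav's rule `C: [here: [A | B] :here | skip C]` tests the alternatives at the current position and, if none matches, skips one element and tries again. A Python sketch of that step-and-test scan follows, applied to the "I like ..." input used elsewhere in this thread. It models the idiom with plain string alternatives only; `to_any` and `liked_things` are hypothetical names.

```python
def to_any(text, pos, alternatives):
    """Advance to the first position >= pos where any alternative matches.

    Returns that position (nothing is consumed, as with TO), or None."""
    while pos <= len(text):
        if any(text.startswith(alt, pos) for alt in alternatives):
            return pos          # like  here: [A | B] :here
        pos += 1                # like  | skip C
    return None


def liked_things(text):
    """Collect what follows each "I like " up to the nearest " ." or " !"."""
    found = []
    prefix = "I like "
    pos = 0
    while True:
        start = text.find(prefix, pos)
        if start == -1:
            break
        start += len(prefix)
        end = to_any(text, start, [" .", " !"])
        if end is None:
            break
        found.append(text[start:end])
        pos = end
    return found


print(liked_things("I like Apple . I like Windows ! I like Linux . I like Amiga ."))
```

Because every position is tested against all alternatives before moving on, the scan always stops at the closest terminator, whichever one it is.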
4651Grahamhttp://www.mail-archive.com/rebol-bounce@rebol.com/msg01983.html2-Dec-09 19:59
4650GrahamBTW, Bolek wrote a regex engine in Rebol ...2-Dec-09 19:59
4649Janko(aha bitsets.. I was calling them charsets upthere)2-Dec-09 19:57
4648Grahamyou have to turn off parse's default delimiters and use bitsets2-Dec-09 19:56
4647Jankoand I know everything has limitations ... this functionality OR with taking the first that appears would just in practice solve me many cases2-Dec-09 19:56
4646JankoI know parsing csv can be messy ... at least at this high level I don't know how to do it with escapes and commas in etc2-Dec-09 19:55
4645GreggThat said, if you know the format (e.g. WRT quotes and escapes), it can be done with PARSE. It just may not be a one-liner.2-Dec-09 19:54
4644GreggCSV parsing is an issue, because REBOL handles some inputs well, but fails for what may be a common way things are formatted. "CSV" isn't always as simple as it sounds.2-Dec-09 19:54
4643GreggIt's not necessarily a PARSE limitation, but there are things we'd like PARSE to do that aren't always reasonable. :-)

TO and THRU can work very well, but that doesn't mean they'll work for every situation. You may have to use rules where you check for your target value or just SKIP, marking locations in the input as you go.

2-Dec-09 19:52
4642Janko"janko","some\"thing92!","graham" I am not sure but I think here you have the same problem2-Dec-09 19:51
4641JankoI just started talking about this as a general limitation of parse that I meet a lot of times and I suppose Paul could have met it when trying to parse CSV2-Dec-09 19:49
4640JankoI don't have real example right now :) I had them few times before and I also asked here about them and I solved with your help somehow2-Dec-09 19:49
4639Janko>> parse "I like Apple . I like Windows ! I like Linux . I like Amiga ." [
[ some [ thru "I like" copy IT [to "." ( prin "so so: ") | to "!" (prin "very much: ") ] (print IT) ]]
so so: Apple
so so: Windows ! I like Linux
so so: Amiga

2-Dec-09 19:48
4638GrahamJanko, best thing to do is show us a string you can't parse ... and someone will show you how to do it.2-Dec-09 19:45
4637JankoBUT .. what if I want to have control there .. or if for the sake of example it's a more complex multicharacter difference like "<DOT>" "<EXCLAMATION>"2-Dec-09 19:44
4636Jankook , you again found a solution to my specific problem :))2-Dec-09 19:42
4635Grahamcharset [ #"!" #"." ]2-Dec-09 19:42
4634Jankothis is common to all the problems that I am describing .. if I had > to [ "." | "!" ] and parse would find both and go to the one that is closer, it would be solved.2-Dec-09 19:41
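The "go to whichever ending is closer" behaviour Janko asks for can be sketched outside PARSE. The following is a Python illustration only, not REBOL and not how PARSE works internally; the helper names `find_nearest` and `split_sentences` are hypothetical. It searches for every candidate ending from the current position and takes the earliest hit, which also covers multi-character endings such as "<DOT>" and "<EXCLAMATION>".

```python
def find_nearest(text, delimiters, start=0):
    """Return (index, delimiter) for the delimiter found earliest at or
    after `start`, or None if none of them occurs."""
    best = None
    for delim in delimiters:
        pos = text.find(delim, start)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, delim)
    return best


def split_sentences(text, prefix, delimiters):
    """Collect (body, ending) pairs for each `prefix ... ending` span."""
    results = []
    pos = 0
    while True:
        begin = text.find(prefix, pos)
        if begin == -1:
            break
        begin += len(prefix)
        hit = find_nearest(text, delimiters, begin)
        if hit is None:
            break
        end, delim = hit
        results.append((text[begin:end].strip(), delim))
        pos = end + len(delim)      # continue after the ending just consumed
    return results


text = "I like Apple . I like Windows ! I like Linux . I like Amiga ."
for body, ending in split_sentences(text, "I like", [".", "!"]):
    label = "so so" if ending == "." else "very much"
    print(label + ":", body)
```

Because each ending is reported alongside the captured text, the caller can act differently per ending, which is the control Janko wanted from separate paren actions.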
4633Janko>> parse "This is Apple . This is Windows ! This is Linux . This is Amiga ." [ some [ thru "This is" copy IT [to "." | to "!" ] (print IT) ]]

Apple
Windows ! This is Linux
Amiga

2-Dec-09 19:39
4632JankoThe pattern is known ... the sentence starts with "This is" and can end with . or ! but they can come in any order .. if you try to parse with "." first you will get ---- oops, some errors up there .. just a sec2-Dec-09 19:38
4631Jankoparse "This is Apple . This is Windows ! This is Linux . This is Amiga ." [ some [ "This is" copy IT (print IT) to [ "." | "!" ] ]2-Dec-09 19:36
4630GrahamIf you don't know what pattern the data is .. you can't parse it with anything.2-Dec-09 19:34
4629GrahamI know what you mean .. so you have to order your rules knowing what the data looks like2-Dec-09 19:34
4628Jankowhigh = which2-Dec-09 19:33
4627Jankono wgih is the closest .. look at this example (I hope this will be better)2-Dec-09 19:33
4626Grahamand see which has the best fit ?2-Dec-09 19:32
4625Grahamto go to the closest one .. means it has to try all the rules??2-Dec-09 19:32
4624Graham[ some [ "start" digits [ "end" | "finish" ] ] ] should work2-Dec-09 19:31
4623Jankoyou can use to but it still won't work2-Dec-09 19:31
4622Jankoyes , then you have to do charset parsing (but I don't know that yet :) ) .. I was just trying to say that if there were a way to say something like to any [ "A" | "B" ] and it would go to the closest one, A LOT of problems with parse would be easily solvable2-Dec-09 19:30
4621Grahamyour problem is because you are using 'thru which breaks the other rule2-Dec-09 19:29
4620Grahamparse string [ some [ "start" digits "end" | "start" digits "finish" ]]2-Dec-09 19:28
4619GrahamIn this case I would use block parsing ... then I'm no expert in parsing2-Dec-09 19:27
4618JankoI was trying to show an example where you have two possible endings and you want to process both (and you can handle them differently with parens) but you don't know in what order they will come or anything2-Dec-09 19:25
4617Grahamchange the rule again2-Dec-09 19:24
4616Jankook .. but I meant that you have "start 111 end start 222 finish start 333 end " then it won't work :)2-Dec-09 19:23
4615Graham[ to "end" | to "finish" ]2-Dec-09 19:23
4614Grahamchange it2-Dec-09 19:22
4613Jankoparse "start 111 end start 222 finish" [ some [ thru "start" copy NUMS [ to "finish" | to "end" ] ] ] this won't work2-Dec-09 19:21
4612Grahamthis is a current parse limitation.2-Dec-09 19:20
4611Jankofrom Advocacy --> Graham [ to "A" | to "B" ] won't work as I want .. I will try to find a concrete example2-Dec-09 19:16
4610JankoI know I was stopped by parse on some occasions. I think every time the problem would have been solvable if I had, for example, >> to [ "A" | "B" ] where the parser would check where A is and where B is and go to the closest one.2-Dec-09 19:15
4609PekrDialect is a dialect. The only difference in string vs block parsing, imo is, that with block parsing, you are using REBOL datatypes to identify/match your types, whereas with string you are more "free-form" :-)17-Nov-09 16:53
4608JoshFOK. Thanks again for the timely help! I have to run off to work (which is firewalled up the yang), so you'll be able to avoid more silly questions from me for at least the next ten hours! ; - )17-Nov-09 14:27
4607Ladislavright, what you are doing is a dialect17-Nov-09 14:24
4606JoshFI understood that character stuff wouldn't work in a dialect -- but my understanding is imperfect.17-Nov-09 14:23
4605JoshFThe difference between what I'm doing and what you linked to is that it's working against a string, while I'm doing a dialect, no?17-Nov-09 14:22
4604Pekrit is a bit difficult to understand recursive rules, but :-)17-Nov-09 14:20
4603HenrikDepending on the situation, it can be hard to tell whether you are dealing with a word or a specific value. That's the price for freely interchangeable code/data. :-)

a: [none]

b: copy a

b: reduce b ; me doing this behind your back

a == [none] ; word!

b == [none] ; none!

17-Nov-09 14:20
4602Pekrhttp://www.rebol.com/docs/core23/rebolcore-15.html#section-617-Nov-09 14:20
4601Ladislav...except for the fact, that lit-words are used in the Do dialect (= when Rebol is concerned, as you say), when you want to write an expression, which evaluates to a specific word, so, e.g. the expression:

'a

evaluates to the same value as the expression:

first [a]

, which happens to be the word A

17-Nov-09 14:19
4600Henrika trap that you might fall into:

type? first [none] == word!

type? first reduce [none] == none!

type? first reduce ['none] == word!

17-Nov-09 14:18
4599JoshFOK... Thanks very much. That helps a lot. I was right down the road to writing an expression parser, then that whole slash thing stopped me dead in my tracks. Now I should be able to get into some _real_ trouble!17-Nov-09 14:18
4598Ladislavright17-Nov-09 14:17
4597JoshFOK... So, let me paraphrase... As far as REBOL is concerned, lit-words are used only by the parse dialect to represent a thing to match to, whereas words are evaluated to find the thing to match to. However, because of parsing constraints in REBOL as a whole (the significance of "/" when dealing with indexable variables), there's no way to "escape" the slash into an unevaluated (literal) word without the dodge you showed me.17-Nov-09 14:16
4596HenrikI think you can say, that a word can be an evaluated lit-word. When you are typing a word directly into the console, you evaluate the word into a value that it's bound to. When entering a lit-word, it's evaluated into a word.17-Nov-09 14:15
4595LadislavCompare:

>> parse [a] [a]
** Script Error: a has no value
** Near: parse [a] [a]

>> parse [a] ['a]
== true

17-Nov-09 14:14
4594Ladislavin Parse, lit-words are used for matching, while words are looked up for values, which then are used for matching, so totally different behaviour17-Nov-09 14:13
4593JoshFOr are they just used for the special case of dealing with a / in load? ; - )17-Nov-09 14:13
4592JoshFI thought there was only word!'s and then everything else were more concrete types. I guess what I am asking is what is the purpose of lit-words?17-Nov-09 14:12
4591Ladislavjust a different datatype17-Nov-09 14:12
4590JoshFOK... Mechanically, I see what you're saying, but what's the difference between a lit-word and a word? The spirit eludes me...17-Nov-09 14:11
4589HenrikAnd also hence the expression "a block is or isn't loadable"17-Nov-09 14:11
4588HenrikIf LOAD won't eat a block, PARSE won't either, so you can test your block with LOAD. Some words can't be typed directly in, hence ladislav's solution.17-Nov-09 14:11
4587Ladislavcheck as follows:

type? :lit-div
type? :tdiv

17-Nov-09 14:10
4586LadislavMy example works, since the LIT-DIV variable refers to a lit-word, while your tdiv refers to a word17-Nov-09 14:09
4585JoshFBoth tdiv and lit-div type? to a word!...17-Nov-09 14:09
4584JoshFHa! Black magic! That works a champ Ladislav, thanks very much! I had tried

>> tdiv: to-word "/"
== /
>> parse [3 / 2] [some [integer! (print "number") | ['+ | '- | '* | tdiv ] (print "op ")]]

But had gotten the same error. What makes yours work?

17-Nov-09 14:07
4583LadislavJoshF: Rebol load does not parse the '/, but you can do:

as-lit-word: func ['word [any-word!]] [to lit-word! word]
lit-div: as-lit-word /
parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | lit-div] (print "op")]]

17-Nov-09 14:04
4582JoshFThe second one failed when I tried to extend the dialect with multiply (*) and divide (/). After further experimentation, it seems that you can't escape the "/". Google has not been helpful here... Does anybody have any ideas? I could parse for just a word! instead of the +, -, etc., but I wanted parse to do the work of deciding what was a valid operation or not. Sorry for the multiple messages, I'm still trying to figure this client out... Thanks for any advice!17-Nov-09 14:02
4581JoshF>> parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]]
** Syntax Error: Invalid word-lit -- '
** Near: (line 1) parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]]

17-Nov-09 14:00
4580JoshF>> parse [3 + 2] [some [integer! (print "number") | ['+ | '- ] (print "op")]]
number
op
number
== true

17-Nov-09 13:59
4579JoshFHi! I'm trying to use REBOL's parse to make a simple calculator dialect. However, I'm having trouble with escaping entities (I think)... Here's my first try (that worked):17-Nov-09 13:58
4578RobertIMO that would be really nice.8-Nov-09 12:04
4577RobertI have used www.antlr.org stuff several years ago with C/C++ target. It's a very cool parser generator toolkit. Just took a look again. It has emitters for different languages. Maybe one of the parse gurus here can take a look if we can do a REBOL emitter.8-Nov-09 12:04
4576BrianHAgreed :)26-Oct-09 18:36
4575SteeveBut it should return a proper error message as Pekr noticed it.26-Oct-09 18:35
4574BrianHOtherwise adding them would be difficult.26-Oct-09 18:05
4573BrianHKeywords that are *planned* to be added should definitely be reserved.26-Oct-09 18:03
4572Pekrposted to Chat/R3/Parse group ...26-Oct-09 13:31
4571PekrHmm, you are right .... But we might need better error message, no?

>> test: ["123"] parse "123" [test]
== true

>> limit: ["123"] parse "123" [limit]
** Script error: PARSE - invalid rule or usage of rule: end!
** Where: parse
** Near: parse "123" [limit]

26-Oct-09 13:28
4570Steeveif you just try to use it, your parsing may crash. So, it's doing nothing but it's here.26-Oct-09 13:13
4569PekrI thought it is not implemented yet, hence no reservation?26-Oct-09 13:11
4568Pekr:-)26-Oct-09 13:11
4567Steeve(in R3)26-Oct-09 13:07
4566SteeveSomething funny.

I spent an hour debugging a parsing rule, to finally understand this: never name a rule LIMIT. The LIMIT keyword is apparently reserved for future use in parse.

26-Oct-09 13:07
4565BrianHWill, R2/Forward is already available for download in DevBase (R3 chat). It is a little outdated though, since I had to take a break to rewrite R3's module system. I'll catch up when I get the chance. The percentage of R3 that I can emulate has gone down drastically since the last update, since R3 has made a lot of changes to basic datatype behavior since then. We'll see what we can do.26-Oct-09 5:45
4564BrianHChris, there can be an advantage in R3 to breaking up a bitset into more than one bitset on occasion, mostly memory savings. However, it might not work as well as you might like since offset and/or sparse bitsets aren't supported. Bitsets that involve high codepoints will take a lot of RAM no matter what you do.26-Oct-09 5:40
4563GrahamRebol doesn't have lines :)26-Oct-09 4:49
4562SteeveR3 one liner ;-)

>> map-each [a b] parse "this-is-a-string" "-" [ajoin [a #"-" b]]

26-Oct-09 0:16
4561GeomolAnother:

>> out: parse "this-is-a-string" "-"
>> forall out [change/part out rejoin [out/1 "-" out/2] 2]
>> out
== ["this-is" "a-string"]

25-Oct-09 22:35
4560GeomolSunanda, one way:

>> out: clear []
>> parse "this-is-a-string" [mark1: any [thru "-" [to "-" | to end] mark2: (append out copy/part mark1 mark2) skip mark1:]]
>> out
== ["this-is" "a-string"]

25-Oct-09 22:32
4559Willis R2/Forward available for download? thx25-Oct-09 22:29
4558SunandaI guess parse can do this too? http://stackoverflow.com/questions/1621906/is-there-a-way-to-split-a-string-by-every-nth-seperator-in-python25-Oct-09 21:49
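For comparison with the REBOL answers to this question, the same idea (split on every separator, then rejoin the pieces in groups of n) can be sketched in Python, the language of the linked question. The function name `split_every_nth` is hypothetical, and the sketch assumes `n` is at least 1.

```python
def split_every_nth(text, sep, n):
    """Split `text` at every n-th occurrence of `sep`.

    Splits on every separator first, then glues each run of n pieces
    back together with the separator.
    """
    parts = text.split(sep)
    return [sep.join(parts[i:i + n]) for i in range(0, len(parts), n)]


print(split_every_nth("this-is-a-string", "-", 2))
```

A trailing group with fewer than n pieces is kept as-is rather than dropped.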
4557ChrisAn example: a nested d: [k v] structure where 'k is a word and 'v is 'd or any other type:

data: [k [k "s"]]

R2, you can validate with d: [word! [into d | skip]]

Now you have to specify: d: [word! [and any-block! into d | skip]] otherwise you get an error if 'v is a string!

22-Oct-09 21:58
4556ChrisAllowing 'into to look inside strings can break current usage of 'into, requiring [and any-block! into ...]22-Oct-09 21:40
4555ChrisNot size, efficiency.22-Oct-09 20:03
4554Steeveif the size is a problem you can build a function to test each range. But It will be slow22-Oct-09 19:35
4553SteeveIt seems22-Oct-09 19:31
4552ChrisThat's what I'm asking. Complemented bitsets wouldn't make a difference here though as the excluded range is of similar scope, right?22-Oct-09 19:30
4551SteeveSo W1 + W+ = 128Kb

Is this a problem ?

22-Oct-09 19:26
4550Steeve64 Kb , sorry22-Oct-09 19:24
4549SteeveAnyway, a bitset with a length of 2 ** 16 is not so huge in memory (only 16kb)22-Oct-09 19:23
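As a rough check on the sizes being discussed, here is a small Python sketch of a dense bitset. It assumes one bit per codepoint, packed eight to a byte, with no per-object overhead; that layout is an assumption for illustration, since R3's internal representation is not described in this thread. The class name `DenseBitset` is hypothetical.

```python
class DenseBitset:
    """One bit per codepoint, packed eight to a byte (illustrative layout)."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)   # round up to whole bytes

    def add_range(self, lo, hi):
        """Mark every codepoint from lo to hi inclusive as a member."""
        for cp in range(lo, hi + 1):
            self.bits[cp >> 3] |= 1 << (cp & 7)

    def __contains__(self, cp):
        if not 0 <= cp < self.size:
            return False
        return bool(self.bits[cp >> 3] & (1 << (cp & 7)))

    def nbytes(self):
        return len(self.bits)


full = DenseBitset(2 ** 16)      # covers the whole 16-bit range
full.add_range(ord("A"), ord("Z"))
ascii_only = DenseBitset(128)    # covers ASCII only
ascii_only.add_range(ord("A"), ord("Z"))

print(full.nbytes(), "bytes for 2 ** 16 codepoints")
print(ascii_only.nbytes(), "bytes for 128 codepoints")
```

Under this assumption a full 16-bit bitset is 65536 bits, which matches the corrected "64 Kb" figure if that is read as kilobits, and the gap between the two sizes is the saving Chris is asking about when a separate ASCII-only set is tried first.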
4548SteeveUse R3 (and its optimized complemented bitsets)22-Oct-09 19:21
4547ChrisBoth w1 and w+ appear to be very large values. Would it be smart to perhaps do:

[[aw1 | w1] any [aw+ | w+]]

Where 'aw1 and 'aw+ are limited to ascii values?

22-Oct-09 19:08
4546Chris(sorry if that looks messy)22-Oct-09 19:04
4545ChrisIs there any advantage in breaking up charsets that represent a large varied range of the 16-bit character space? For example, XML names are defined as below (excluding > 2 ** 16), but are most commonly limited to the ascii-friendly subset:

w1: charset [
    #"A" - #"Z" #"_" #"a" - #"z"
    #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(02FF)"
    #"^(0370)" - #"^(037D)" #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)"
    #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)"
    #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
]
w+: charset [
    #"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z" #"^(B7)"
    #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(037D)"
    #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(203F)" - #"^(2040)"
    #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)"
    #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
]
word: [w1 any w+]

22-Oct-09 19:04
4544Pekrah, got reply on Chat from Carl towards complementing:

"Re #5718: Pekr, that's a good question, and I think the answer must be YES. We need to be able to complement bitmaps in a "nice way". Otherwise, Unicode bitmaps, even if simply used on ASCII chars, would take a lot of memory.

This change should be listed on the project sheet, and if not, I'll add it there."

18-Oct-09 7:11
4543Maxim(it only accepts a string... dummy :-)18-Oct-09 1:04
4542Maximdoh... when you're too close to the tree... you can't see the forest...

I was using TO parse command on a rule ... this obviously won't work....

18-Oct-09 1:03
4541Maximmy deadline is to have a site working by this week... unless this darned bug I am trying to kill doesn't kill me first.18-Oct-09 0:42
4540BrianHIt's on my list...18-Oct-09 0:41
4539MaximI promise.18-Oct-09 0:41
4538Maximwell, build it and I will try it ;-)18-Oct-09 0:40
4537BrianHWhich is what a rule compiler does :) Actually, it sounds like you could adapt the tricks of the rule compiler to *your* rule compiler, which would let you use the new operations in your rule source and have the workarounds generated in the output.18-Oct-09 0:39
4536Maximand it's not simple parsing since I use parsing index manipulation, which is also dictated by the source data it encounters. It's like swatting flies using a fly swatter at the end of a rope, while riding a roller coaster which changes layout every time you ride it ;-)18-Oct-09 0:39
4535Maximreally, the problem is not the parsing itself... it's getting the darn rules to generate the proper rules hehehe.18-Oct-09 0:37
4534BrianHOf course the *result* of the compilation would be self-modifying rules :)18-Oct-09 0:36
4533BrianHIf the self-modifying rules are strung-together basic blocks, you can use the rule compiler to generate the blocks. And the R3 changes make self-modifying rules less necessary, so you can have even larger basic blocks.18-Oct-09 0:35
4532Maximsince I use binding to map inner rules which are also constructed on the fly but have to be pushed and popped from the stack as I traverse data... it's a lot of fun :-D18-Oct-09 0:34
4531Maximthe rule I am writing now actually does JIT rule compilation... hairy to debug :-)18-Oct-09 0:32
4530Maximladen with many paren expressions and a stack on top of it.18-Oct-09 0:30
4529Maxima rule compiler doesn't adapt very well to self-modifying rules18-Oct-09 0:29
4528BrianHMaxim, that is what Pekr was talking about. That is planned to be fixed.18-Oct-09 0:29
4527BrianHMaxim, Remark could be adjusted to use the rule compiler. For that matter, Remark could use R2/Forward (which needs some work, but is already better than R2 on its own).18-Oct-09 0:28
4526Maximyou end up with a full codepoint bitset minus one byte whether it is complemented or not18-Oct-09 0:28
4525Maximone situation which complement can't handle very well (RAM-wise):

union charset "a" complement charset "b"

18-Oct-09 0:27
4524Maximbut wouldn't work with remark ;-)18-Oct-09 0:26
4523BrianHPekr, we still need complementing to be enhanced. Even Carl has said so.18-Oct-09 0:26
4522BrianHGabriele, these changes can be backported to R2 in the form of a rule compiler that generates (unreadable) R2 parse rules.18-Oct-09 0:25
4521PekrSo - we don't need complementing to be enhanced? Because we talked about it, but it is not defined in proposal, it is not part of Carl's feature table, and I also got no reaction on R3 Chat ....17-Oct-09 14:41
4520PekrAn=And17-Oct-09 11:50
4519PekrGabriele - wrong perception :-) The correct claim should be - "An now nothing prevents me from fully switching to R3 ..." :-)17-Oct-09 11:50
