REBOL Discuss and Decide

http://www.rebol.net/cgi-bin/r3decide.r
A place for publishing discussions about REBOL 3.0.
Carl Sassenrath 2008
REBOL Messaging Language

What does () mean?
http://www.rebol.net/r3decide/0011.html
Carl Sassenrath <no-spam@rebol.com>
Wed, 5 Mar 2008 23:41:43 -0500

In R2, () was an error. In R3, () returns the value: UNSET!

So:

    >> mold ()
    == "unset!"

It's debatable. We must admit that it does represent a valid expression. This can be seen in functions that allow missing arguments, such as:

    >> cd
    /C/rebol/3.0
    >> cd %..
    /C/rebol/

which are valid anywhere if you use parens:

    >> (cd)
    /C/rebol/3.0
    >> (cd %..)
    /C/rebol/

So, in the (cd) case, the missing argument for path is given the value UNSET!

Mathematically, () is the trivial case. QED.

Comments?

Decided - MOLD char!
http://www.rebol.net/r3decide/0010.html
Carl Sassenrath <no-spam@rebol.com>
Sun, 17 Feb 2008 01:02:48 -0500

In R3 a char can be Unicode. For example, this is valid:

    c: #"^(1234)"

The 1234 is a hex value for the character.

This is also valid:

    c: #"***"

Where the *** is a multi-byte UTF-8 encoded character.

However, when we mold such a character, which of the above formats is best? Do we want the hex escaped value or the actual character?

\note Decision
By default, the character will be in its native encoded format. In other words, a Greek alpha will appear as #"α". There will be an option added to force MOLD to output the ^(1234) escaped format when needed.
/note

Decided - Unicode Console
http://www.rebol.net/r3decide/0009.html
Carl Sassenrath <no-spam@rebol.com>
Thu, 14 Feb 2008 14:20:28 -0500

On Windows, R3 currently uses the default console for text I/O. Eventually, we will add our own console, such as in R2. However, for now, we do not want to spend time on that, but on core functions.

The issue is: what mode do we want to use for the console and how is it best initialized.

IMO, there are two choices:

#Set it up for UTF-8 input/output.
That is, keyboard input is sent to the R3 console device as bytes in UTF-8 encoded format.
#If that is not supported by MS, then UTF-16.
#If that is not supported by MS, then raw Unicode codepoints as wide chars is ok.

\note Decision
We will use the wide-char console mode. This will make the console work properly for Unicode, but will require that we make special changes to support stdio redirection.
/note

Defining whitespace
http://www.rebol.net/r3decide/0008.html
Carl Sassenrath <no-spam@rebol.com>
Wed, 13 Feb 2008 14:46:20 -0500

It's funny the little things that need to be properly defined. For example...

There are a few internal functions that know about whitespace. For example, the PARSE function by default will deal with whitespace.

\note The Question:
What do we mean by whitespace?
/note

Internally in R3, there are 2 definitions:

*SPACE: #" " + #"^-"
*WHITESPACE: SPACE + #"^/" + #"^m"

However, for such functions, do we want to consider other control chars as whitespace? For example, is backspace whitespace?

In R2, many control chars were treated as whitespace, but I am not so sure we want to do that in R3.

Your opinion?

Decided: No direct support for Win98
http://www.rebol.net/r3decide/0007.html
Carl Sassenrath <no-spam@rebol.com>
Wed, 13 Feb 2008 12:53:23 -0500

This is a simpler question to ask, although the answer may not be that simple.

\note Question:
Should 3.0 support Win98?
/note

Up to this point, 3.0 in Win98 works from what people tell me, but as we move forward with Unicode (and other features), Win98 may be less likely to work correctly.

Note that 3.0 does not require that its base OS be Unicode capable. It should run fine on non-Unicode systems.

\note Decision Reached
Decision is to go full speed ahead and not worry about 98. If someone who really needs 98 wants to make the necessary changes to the host code to support 98, it should be possible to do that. (E.g. remove the Unicode API, etc.)
/note

Collation table and sorting
http://www.rebol.net/r3decide/0006.html
Carl Sassenrath <no-spam@rebol.com>
Wed, 13 Feb 2008 12:02:37 -0500

One of the final parts of the R3 Unicode implementation is to define the sort order for strings.

The sort order will be determined by a simple collation table that for 3.0 assumes strings are normalized into single codepoint representations. Later, for 3.* we can add other methods.

The standard R3 distribution will contain the collation table for the "western" charsets, including latin-1, etc. We can do the same for some of the other smaller eastern charsets, and I am open to others, as proposed and defined by the REBOL community (RC).

I will admit I know nothing about the sort order of Chinese or other Asian codepoints. (But, I know that some of you do!)

Once booted, R3 will allow programs to set their own collation table.

\note Question:
Do you think this method is sufficient for 3.0 or am I missing something important?
/note

Loading Strings
http://www.rebol.net/r3decide/0005.html
Carl Sassenrath <no-spam@rebol.com>
Wed, 13 Feb 2008 11:34:47 -0500

R3 needs a way to load strings that are encoded in various formats:

*UTF-8 (default)
*Latin-1
*Any codepage map
*UTF-16 LE
*UTF-16 BE

And, we may also want to allow other UTF formats, but that is a subject of a different decision topic.

\note To decide:
What we need to decide is the specific function spec to use for loading strings.
/note

Earlier, I proposed the use of load because that is its primary purpose: the conversion of external data to an internal representation. That includes not only code, but also images, sounds, and text.

For example, to load a string that is in UTF-16LE format:

    str: load/as data 'utf-16le

However, your feedback indicated that the use of load in this way was perhaps confusing and overloaded the function with too much capability. In addition, because load is now implemented as a mezzanine, it does not really solve the basic problem.
A media loading method is required below its implementation.

Another idea was to use to with a refinement to indicate the format. For example:

    str: to/as string! data 'utf-16le

However, I've reconsidered that method, because the additional refinement arguments span all instances of to over all datatypes. Not something we really want to do unless it is critical.

These same considerations can be applied to make because it also applies to all datatypes.

So, now I'm opening it up to the group for suggestions.

Note also that there is an opposite topic, the output of strings, but let's just focus right now on this one area of loading strings.

Complementing virtual bitsets
http://www.rebol.net/r3decide/0004.html
Carl Sassenrath <no-spam@rebol.com>
Tue, 12 Feb 2008 17:27:55 -0500

In R3, bitsets are just as long as they need to be and no longer. For example:

    >> b: make bitset! " "
    == make bitset! #{0000000080}

If you test a value outside the bitset range, it returns as not found. For example:

    >> find b #"a"
    == none

This is quite useful because we might use Unicode values that are far beyond the range of the set. This line works fine:

    >> find b #"^(03d6)" ; Greek symbol for PI
    == none

But, there is a problem: We use complement to invert a bitset, and this is quite useful to do. However, if you do that now, you get:

    >> c: complement make bitset! " "
    == make bitset! #{FFFFFFFF7F}

and:

    >> find c #"a"
    == none

should return true but does not.

\note Issue to resolve:
How do we want to invert bitsets?
/note

It seems to me, if we want the bitset itself to represent its inverted state, then it is necessary to internally flag the bitset to indicate that it is complemented. The case above would then be:

    >> find c #"a"
    == true

The problem with this solution is that a molded bitset can no longer be represented in source code as a set of bits. We need to extend it to include a complement indication. For example:

    >> mold/all c
    == #[bitset! not #{0000000080}]

Of course, this is not a problem for evaluated creation of bitsets, which can simply remain:

    >> c: complement make bitset! " "

\note Secondary issue:
But, there is a secondary issue. What if you perform a data set function such as intersect on the inverted dataset?
/note

For example, what if you write:

    >> a: intersect a complement b

Ignoring the fact that this is really a difference operation, what we will need to do internally is perform the logical intersect operation, and return the most appropriate result, which could itself be a complemented dataset.

Although this makes the internal implementation slightly more complicated, it should not affect performance because it can all be done with a single pass using the proper sequence of bitwise operators.

===Just a note...

Note that the bitset complement problem can also be addressed by providing additional refinements to find and a new word in parse. For example, this would be possible:

    >> find/not b #"a"
    == true

and also:

    >> parse data [to not b here: "about"]

Note, however, that these two changes are not a requirement for the above virtual complement to work. They would be additional, so we'll make those a subject for another note here in the future.

Decided -- WORD datatypes within set functions
http://www.rebol.net/r3decide/0003.html
Carl Sassenrath <no-spam@rebol.com>
Tue, 12 Feb 2008 17:12:35 -0500

Some background information:

The data set (set as in a collection of values, not as in set-word) native functions are:

*difference
*exclude
*intersect
*union
*unique

These functions have been entirely rewritten due to Unicode and due to the removal of the HASH! datatype, by which they were optimized. They now use the MAP! datatype hashing method.

Each of these functions will operate on three argument types:

*block!
*string! (includes Unicode)
*bitset! (includes Unicode)

\note Here is the issue:
Should these functions differentiate between word types?
/note

Currently, this is the case:

    >> union [a b] ['a 'b]
    == [a b]
    >> union ['a 'b] [a b]
    == ['a 'b]

However, does that make sense?

I think we should probably differentiate on word datatypes. The above example would become:

    >> union [a b] ['a 'b]
    == [a b 'a 'b]

What do you think?

Suggested topics?
http://www.rebol.net/r3decide/0002.html
Carl Sassenrath <no-spam@rebol.com>
Tue, 12 Feb 2008 16:57:06 -0500

This area is for you to make suggestions about areas of discussion or issues.

If in the past, you've made a suggestion or pointed something out, and it was missed or not concluded, you can raise it again here.

But, I reserve the right to pick the topics that get discussed. The focus right now is on R3, so if your topic is for something unrelated or more distant, it may not be covered right away. We may come back to it later, and at least your suggestion will be archived, so we won't lose track of it again.

So, post your suggestions in the comment area here. Thanks.

The purpose of this blog.
http://www.rebol.net/r3decide/0001.html
Carl Sassenrath <no-spam@rebol.com>
Tue, 12 Feb 2008 16:33:11 -0500

There are many topics related to REBOL that require greater discussion within the community, with the goal of making a decision and moving on to the next thing.

That's the purpose of this blog: to talk about those topics and issues and reach a conclusion for each one.

This approach of using my blogger.r script may not be perfect in every way, but it should work, and it is easy to get it running, and easy to modify it as we need.

So, each topic or issue will be posted as an article, and then we will discuss it in the comment area. Once we conclude on a topic, I will restate the conclusion and modify the topic line to indicate that it has been concluded.

---There are a few important rules to keep in mind:

#Be concise in your comments. Make your point clear, and you will be doing all of us, and yourself, a favor.
#Proofread your comments.
You cannot modify them after they are submitted.
#Be fair and be kind. We're trying to make progress, we don't want heated debates. We are open to all ideas here.
#It is ok to reference these blogs via URL links, but don't broadcast them in a major way. We want to be as open as possible without it haunting us in the future.
#If you have a topic you want discussed, you can suggest it by making a comment to article 2 below.
#For additional notices, I will alert you via the comment section of this first article.

---In addition...

#I would like to modify (or if someone else wants to send it to me) this blog script to set up an R data feed as well. That way if one of you feels like writing a small client, then those of us who do not like using the web can use that client instead. It can include a "direct post" method too, so people can reply via the client.
#Also, if someone wants to add user-id cookies to blogger.r, that would be great, so repliers don't need to type their name each time, etc.

Here is the current source code to this blog program. The program (and configuration) is about 28 KB.

Ok, so that's it. Let's get started.