Comments on: Rethinking the ISSUE datatype

REBOL introduced a datatype called an issue. From the manual:

An issue! is a series of characters used to sequence symbols
or identifiers for things like telephone numbers, model numbers,
serial numbers, and credit card numbers.

It was a good attempt, but rarely, if ever, has it been used. Normally, we use strings for those types of values.

Instead, the issue datatype has gained usage as a meta keyword. This originally came from Prerebol, the REBOL preprocessor, but the usage pattern has gained popularity in other utilities too (such as Ladislav Mecir's include package).

For example, you will see it as:

#include %lib-file.r

#if [a = b] [do something]

These expressions are using issues for meta keywords which are being preprocessed before (or as) the source code is being loaded.
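
As a minimal sketch of how a tool can pick such directives out of loaded code (the `source` block below is a hypothetical stand-in for a loaded script, not Prerebol's actual implementation):

```rebol
REBOL [Title: "Issue directives (sketch)"]

; Hypothetical source block; LOAD on a script yields the same kind of block.
source: [
    #include %lib-file.r
    print "hello"
    #if [a = b] [do something]
]

; Report each issue! value found at the top level of the block.
foreach value source [
    if issue? :value [print ["directive:" mold :value]]
]
```

A real preprocessor would then act on the values that follow each directive; this sketch only locates them.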

This is indeed useful, but not the intention of issue as a datatype. In addition, this method is not efficient, because each issue instance is a unique string. So, each issue takes memory and equality checks require full string comparison.

In 3.0, due to the Unicode rework, we have to re-examine all character-based datatypes and consider their encoding. The more I thought about the issue datatype and its usage, the more I began to think:

An ISSUE! should be a WORD! subtype.

Right now, we have:

word
word:
:word
'word

as word-based semantic variants (natural, set, get, lit).

What if we added:

#word

as meta?

If we want, we can still call it an ISSUE! datatype, but the internal representation would be a word, and functions like any-word? would return true for it.

The advantage is that an issue would no longer require string memory (other than its symbol table entry), and equality comparison would be extremely fast (including searches with FIND).

In the example above, each #include or #if found in code would require no extra storage and could be compared with greater speed.
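
To illustrate the kind of lookups in question, here is a small sketch using a hypothetical `directives` block. The code reads the same whether issues are stored as strings or as words; only the cost of the comparisons inside FIND and EQUAL? would change under the proposal.

```rebol
REBOL [Title: "Issue lookup (sketch)"]

; Hypothetical list of directives a preprocessor might recognize.
directives: [#include #if #do]

; FIND compares the search value against each element of the block.
if find directives #include [print "known directive"]

; Direct equality check between two issue literals.
if equal? #include #include [print "same directive"]

; Under the proposal, ANY-WORD? would return true for an issue.
print any-word? #include
```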

I should also point out that this # usage is somewhat consistent with the mold/all meta-construct syntax:

#[datatype! spec....]

What is the downside? If somewhere out there in userland the issue datatype is actually being used for its intended purpose (serial numbers, phone numbers, etc.), then there could be some incompatibilities. But then, R3 makes no general guarantee of compatibility.

Internally, this is an easy change. So, that's not a factor.

Tell me what you think.

11 Comments

Comments:

Kaj
10-Jan-2008 22:12:06 Would that mean that issues can't start with a digit any more? That would be a pity. I'm working on a web dialect and was planning to use issues for referring to bug numbers very succinctly. It's natural for people to write
#123
within their text and have it expand to a link into a bug database.
one vote to retain issue
10-Jan-2008 22:54:16 Can't we have preprocessor-type symbols without losing one of Rebol's more interesting data-types? I always use issue! in my comparisons of Rebol and Curl ... and Smalltalk's Symbol ... and it will make the IDG 'Rebol For Dummies' book less useful to newbies ... but if it must go, surely we won't lose bitset! I hope!
Edoc
11-Jan-2008 9:31:06 I admit to being slightly confused about the meaning of a "meta keyword". (If metadata is data which describes a set of data, then a meta keyword would be a higher-level keyword which describes one or more keywords?)
The # symbol is used to denote a char! and also as a character range notation for charsets. If preprocessing is the main goal here, maybe a new word form could be made consistent with build-markup.
Pier Johnson
12-Jan-2008 2:16:38 Can we add phone! and id! datatypes? After all, the wizards who programmed REBOL internals were smart enough to write code to parse and recognise email addys -- smart(at)wizards.com
Having many datatypes makes REBOL rocking! Add these!
Why would there be a need for the pound/hash sign to make these?
examples
909-222-2222 ;USA tel. phone!
1-800-GOOG-411 ;USA tel. phone!
555-60-5555 ;US welfare id id!

Christian Ensel
12-Jan-2008 12:10:16 I always liked the issue! datatype in conjunction with product numbers and the like, as in #978-0-7645-0745-8. It's just too handy, especially in the console.
For meta-words, maybe other characters can be used. With § immediately becoming my favourite one:
§include %lib-file.r
§if [a = b] [do something]
Gregg Irwin
12-Jan-2008 13:30:22 § is a great symbol (at least I like it), but I think of it more in terms of markers for doc organization/tagging. It takes more effort to type, at least on US keyboards, so it would be good for something you don't need to type much.
Brian Hawley
15-Jan-2008 12:49:35 Christian, there is no part of the standard REBOL syntax that requires an alt keycode or character map when typing on a US keyboard. Most of the syntax can be typed without the shift keys, for that matter. It was designed that way to make REBOL faster to type (at least for US typers with PC keyboards). Let's not use the Unicode support as an excuse to become Perl 6.
Is ~ being used for anything?
Christian Ensel
15-Jan-2008 19:29:33 My bad, § isn't a great symbol at all, I hadn't checked against the US keyboard in the first place.
Norm
23-Jan-2008 10:10:45 \ is already an identifier of some issues, as an escape sequence. Could it become a dialect for issues and identifiers, over and above its established 'non-interpreted' syntax? I suppose there are reasons why the following suggestion would not fit, but '\' is used in some languages to mark a user word to be processed by the user's defined functions (also written with \): \ could be both an identifier of recognised words and of functions on text containing those, by recursive substitution. So one could make a dialect with it. TeX and LaTeX are good examples of this, with a large user base, a base also targeted by Lua, with open sources (luatex.org). And '\' is used in C for escape sequences and the literal use of characters. So, having \ both for issues and for format functions would enable us to leverage Rebol with established LaTeX, for printed documentation purposes. It would be like the ultimate printer driver, suitable for literate programming and print on demand, if shell access is allowed to LaTeX and friends. With it, userland could provide for identifiers of all sorts, targeting a domain of data names and symbols aimed at documentation. To enable this, we have to accept that (within such a dialect) some reserved words allocate a series of escaped identifiers: the list of Rebol and C escape sequences, like \\ \" \', etc. This would make Rebol amenable to descriptive (documentation) processing, which is a large use of issues as identifiers of users' names. Then a user may name his issues: \phone{sometext}, \section{atitle} \documenttype{adocspec}. A huge nomenclature already exists, which a large user base would like to parse with a dialect-oriented language. And exquisite printing is one of the last weaknesses of Rebol. Is it possible to marry Rebol to LaTeX by leveraging \ with {}, to produce a dialect targeting proven, established drivers like LaTeX and pdfTeX? There is a great opportunity to seize without reinventing the wheel.
Pardon my dithyrambic length.
maxim olivier-adlhoch
3-Feb-2008 13:45:46 I use the issue datatype more and more.
It is extremely useful for parsing and dialecting as a character-holding type (concurrent to strings, tags and words).
Semantically, an issue is pretty close to a label (word) that identifies a unique item of a set, so it's logical.
If representing it internally as a word doesn't change its use, I'm all for it.
I'm just a bit wary of binding subtleties which might complicate its use (expecting values when evaluated, etc).
The note about not allowing a digit as first char is also a MAJOR concern, because as a SID, the issue! would become useless.
IMHO, one of the reasons it (still) isn't used a lot is that it was not publicized and only rarely used (if ever) in all documentation and examples.
I didn't even become truly aware of it until a few years into using Rebol. I always thought it was an internal type.
Adrian O.
5-Dec-2008 15:24:06 Heavy REBOL user - and consistent issue datatype user.
Current example: ad-hoc polymorphic "database" of blocks, with issue datatype IDs, allowing for non-sequential retrieval within dates. (Each date may have different field-set. Each date includes data id'fied as #1, #2, &c.) Structure something like
database: [
    2008-11-01 [
        [ idx f2 f3 ] ; fields
        [
            #01 "a" 123
            #02 "b" 234
        ] ; data
        ...
    ]
    2008-11-02 [
        [ idx f2 f3 f4 ]
        [
            #01 "y" 876 [1 2 3 4]
            #02 "b" 900 [5 6 7 ""]
        ]
        ...
    ]
]
Distinct issue-type data helps (in the above case and throughout my work) in self-documentation and, with datatype enforcement within functions, error prevention/reduction.
It'd be a shame to see #issues go.
(Of course, thanks to parse it would be possible to programmatically alter scripts to convert issues to, say, strings...)
