Comments on: Redefining BINARY!

Carl Sassenrath, CTO
REBOL Technologies
5-Mar-2008 19:24 GMT

Article #0119

I've read your feedback on my prior note about the use of INSERT for the BINARY datatype. Thanks for the quick feedback.

In order to make progress and resolve these issues as quickly as possible, we have to break down the problem into smaller chunks.

For example, these are fundamentally different binary actions (bin is a binary series):

insert bin "test"
insert bin 123

The first is a question of auto-encoding (converting) a string to a binary value: should it be UTF-8 encoded, and should LFs be converted to CRLFs?

The second is a question of semantics: are we asking for the byte value 123 or the integer string representation "123" to be inserted? (And you could then add the above encoding question on top of that.)
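To make the two questions concrete, here is a minimal sketch in Python (used only as neutral illustration, not REBOL). The helper names `encode_text`, `insert_as_byte`, and `insert_as_text` are hypothetical and do not correspond to any REBOL function; they only show how the candidate interpretations produce different bytes.

```python
# Illustrative Python, not REBOL. All helper names here are hypothetical.

def encode_text(text: str, crlf: bool = False) -> bytes:
    """Encode text as UTF-8, optionally converting LF line endings to CRLF."""
    if crlf:
        # Normalise first so existing CRLFs are not doubled.
        text = text.replace("\r\n", "\n").replace("\n", "\r\n")
    return text.encode("utf-8")

def insert_as_byte(buf: bytearray, value: int) -> None:
    """Interpretation A: the integer is one byte value (must be 0..255)."""
    buf[0:0] = bytes([value])

def insert_as_text(buf: bytearray, value: int) -> None:
    """Interpretation B (the R2 behaviour described in the article):
    form the integer as text, then insert its digits."""
    buf[0:0] = encode_text(str(value))

a = bytearray(b"\x00")
insert_as_byte(a, 123)      # one byte, 0x7B

b = bytearray(b"\x00")
insert_as_text(b, 123)      # three bytes, the digits "1", "2", "3"

plain = encode_text("a\nb")             # LF kept as-is
massaged = encode_text("a\nb", True)    # LF converted to CRLF, one byte longer

print(a.hex(), b.hex(), plain.hex(), massaged.hex())
```

The same source values give results of different lengths depending on the interpretation, which is why each choice has to be defined explicitly rather than left implicit.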

In R2, we converted 123 to "123" then inserted that. Why? It came from the desire to have:

port: open/string %file
insert port 123  ; here port is of string type

be very close to:

port: open/binary %file
insert port 123  ; here port is of binary type

We wanted them to be consistent. You can see why from the example.

However, in real life, there is no such thing as perfect consistency. There are always exceptions, even in the mathematics of computer languages and denotational semantics.

After all, what does it mean to do this?

insert bin image

Such a statement implies that a flat binary serialization of an image has meaning. In reality, it is of little use to us as written, because an IMAGE is a composite value: it includes more than just the image data. We need to know at least the width of a line in pixels and the size of a pixel, or we lose information in the above insert.
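The information loss can be sketched as follows, again in Python rather than REBOL. The `rows` helper and the width/bytes-per-pixel geometry are assumptions made for illustration; they say nothing about how REBOL stores an IMAGE internally.

```python
# Illustrative Python, not REBOL. The geometry model here is an assumption.

pixels = bytes(range(12))  # 12 bytes of flat "pixel data" with no metadata

def rows(data: bytes, width: int, bytes_per_pixel: int) -> list[bytes]:
    """Split flat pixel data into scan lines of `width` pixels each."""
    stride = width * bytes_per_pixel
    if stride <= 0 or len(data) % stride != 0:
        raise ValueError("data length does not fit the given geometry")
    return [data[i:i + stride] for i in range(0, len(data), stride)]

# The same 12 bytes are consistent with several different images:
as_2x2_rgb = rows(pixels, width=2, bytes_per_pixel=3)    # 2 lines of 6 bytes
as_4x1_rgb = rows(pixels, width=4, bytes_per_pixel=3)    # 1 line of 12 bytes
as_3x1_rgba = rows(pixels, width=3, bytes_per_pixel=4)   # 1 line of 12 bytes

print(len(as_2x2_rgb), len(as_4x1_rgb), len(as_3x1_rgba))
```

Without the line width and pixel size stored alongside the bytes, a reader cannot tell which of these images was meant.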

This is true of many other datatypes as well. What does it mean to have any specific datatype converted to or from binary? Examples? There are many:

to-binary "text"
to-binary 123
to-binary 12.3
to-binary #123
to-binary 1:23
to-binary 1-2-2003
to-binary image
to-binary object
to-binary charset "abc" ; a bitset

And, of course, you have the reverse conversions. I will not list them; they are pretty obvious.
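As a rough illustration of why each conversion needs a precise definition, the Python sketch below (not REBOL) lists a few plausible byte layouts for just the integer 123 and the decimal 12.3. None of these is claimed to be what REBOL does; the widths and byte orders are assumptions chosen to show the range of possibilities.

```python
# Illustrative Python, not REBOL. Widths and byte orders are assumptions.
import struct

n = 123
candidates = {
    "ascii digits": str(n).encode("ascii"),        # b"123"
    "single byte": bytes([n]),                     # one byte, 0x7B
    "32-bit big-endian": n.to_bytes(4, "big"),
    "64-bit little-endian": n.to_bytes(8, "little"),
}

x = 12.3
double_be = struct.pack(">d", x)   # IEEE 754 double, big-endian
double_le = struct.pack("<d", x)   # same value, little-endian

# The reverse conversion only works when the reader uses the same convention.
same_convention = int.from_bytes(candidates["32-bit big-endian"], "big")
wrong_convention = int.from_bytes(candidates["32-bit big-endian"], "little")

for name, data in candidates.items():
    print(name, data.hex())
print(same_convention, wrong_convention)
```

Every layout is defensible on its own, but a value written with one convention and read with another silently yields a different number.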

In the past, we've had "the luxury" of ignoring a precise definition of BINARY.

We can no longer afford to do that. It must be precisely defined.

Yet, time is short. R3 must move along.

Possible conclusion?

My current inclination (working conclusion) is that perhaps we need to remove all to-binary conversions entirely, then bring them back one at a time as we can properly define them. And, if we cannot define them, they will not exist.

In this way we will avoid creating a set of new incompatibilities with our future R3 implementations.



6-Mar-2008 8:00:01
This is going in the right direction. I think binaries should echo the REBOL datatype's internal format, except where endianness is involved. Choose a standard endianness and use that for every platform. So maybe TO-BINARY makes a cross-platform (one endianness) binary, but AS-BINARY just returns what the REBOL datatype has stored internally (could be useful for moving directly into a vector, for speed).
maxim oliver-adlhoch
8-Mar-2008 3:05:15
Sorry if this is a long post, but this issue cuts at the heart of one year of commercial work I did using REBOL.

Last year I was working on enterprise-level TCP client/servers which connected to EDI data servers (highly compact data protocol, binary based, with its own encoding) from European ticketing agencies, and had a few issues using binaries. Forming of values made everything more complicated. Converting to and from scalars was not obvious at first. CRLF massaging also was a hindrance at some point.

We must realize that using a binary directly is inherently a low-level/expert operation. Only a handful of scripts or situations need to go there, most of them really being advanced uses anyway.

I'd rather there be absolutely no modification of data on insert/append, and allow only scalars or non-encodable data to be used. The endianness becomes an issue anyway. In such a case, I'd go with whatever is standard in internet comms by default, in keeping with REBOL's messaging roots.

If we had specifically separate scalar and series conversions it would already ease the integration.

And when using binaries, we want the least going on behind the scenes, because chances are we need extremely precise control over the 8-bit bytes. Just adding CRLF handling breaks all TCP servers, because the length is changed, and practically all protocols use a byte length somewhere.

In my case the server used a persistent connection, with end-to-end messages.... A one-byte discrepancy means the loss of a whole stream of data, not just one msg.

I can even see places where the same binary actually stores SEVERAL encodings... RSS feed aggregated data for example.

Let's not make complex things even more complicated by trying to make them simpler. Like Carl says, no decision on what massaging to do will ever be perfect, and if there are 10 cases, chances are it will only help in a few, but be a hindrance in all others.

Rudolf W. Meijer
18-Mar-2008 5:33:35
We need to worry also about the type properties of binary and its elements. To me, binary! is series! but not any-string!. Then what is the type of pick #{313233} 1? It should not be char! as in R2.99, but either integer! (as it was in R2) or a new datatype, byte!, unsigned integers in the range 0..255. Of course, byte! would be number!. This would enable type- and range-checking when doing insert bin 123 (OK) or insert bin 1234 -- here we have a choice of forbidding (because > 255) or converting to two bytes. I have posted some comments in the mailing list on the application of extract to binary series. The current anomaly of forming an integer before inserting it into a binary series (which I understand will go away) made extracting non-intuitive. I would plead for the expected behaviour
extract #{312032203320} 2 = #{313233}


Updated 24-Mar-2017 - Copyright REBOL Technologies