Comments on: In search of better lower-level byte manipulation

Carl Sassenrath, CTO
REBOL Technologies
16-Dec-2006 22:19 GMT

Article #0056

I would like to ask the group here what we can do to better support lower level byte manipulation.

For example, if you are writing or parsing memory structures (directly), what can we provide to make that easier? Is the struct! datatype (with perhaps some small improvements) sufficient? Or, do we need a few special functions as well?

The path to the best solution is to construct a number of example cases, then test our mapping methods.



Brian Tiffin
23-Dec-2006 15:39:31
I may be stuck in the 60's... but growing up with Forth, there is nothing quite as straightforward as fetch and store with a smart move. "Smart" meaning no need to worry about source/destination overlap.

Glossing over the byte, word, long word, quad word issues.

peek and poke could work on an address! type?

Initial unthoughtout thoughts.

Merry holly days.

Jaime Vargas
24-Dec-2006 0:25:13
How about borrowing some of Erlang's Bit Syntax notation? This paper is a good review, and it shows the advantages compared to the approaches used by other languages such as Haskell, OCaml and C.

Jaime Vargas
24-Dec-2006 0:28
I like Erlang's approach because it goes beyond manipulation of C data structures; it also helps with bitstreams, which are present when implementing network protocols. It is also very straightforward when extracting bit parts directly into variables.
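
For comparison, here is a minimal Python sketch of the idea of pulling bit fields of arbitrary width straight into variables. It is not Erlang's bit syntax, only an illustration; the `BitReader` class and the choice of the first two IPv4 header bytes as sample data are assumptions made for this example.

```python
class BitReader:
    """Reads unsigned fields of arbitrary bit width from bytes, most significant bit first.

    Hypothetical helper for illustration only.
    """

    def __init__(self, data: bytes):
        # Treat the whole buffer as one big-endian integer.
        self.value = int.from_bytes(data, "big")
        self.remaining = len(data) * 8

    def read(self, width: int) -> int:
        """Return the next `width` bits as an unsigned integer."""
        if width > self.remaining:
            raise ValueError("not enough bits left")
        self.remaining -= width
        return (self.value >> self.remaining) & ((1 << width) - 1)


# First two bytes of an IPv4 header: version (4 bits), header length (4 bits),
# DSCP (6 bits), ECN (2 bits).
reader = BitReader(bytes([0x45, 0x00]))
version, ihl, dscp, ecn = (reader.read(n) for n in (4, 4, 6, 2))
print(version, ihl, dscp, ecn)
```

The field widths are listed once and the values land directly in named variables, which is the property the comment above values in Erlang.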
25-Dec-2006 3:24:21
To me this is just another series. So why do we need a new concept? I just want to get a block of memory, bytes or even bits, and then use the current functions on those.

bit-series: make bit! []
bytes: make byte! []

bytes: copy 0x1234 0x4321
bit-series: to-bit bytes

select bit-series 23345 will return bit #23345

And so on.

25-Dec-2006 8:35:49
I would like to have some better support for conversions. At the moment I'm using these functions, which don't look too nice:
ui32-struct: make struct! [value [integer!]] none
ui16-struct: make struct! [value [short]] none
int-to-ui32: func[i][
  ui32-struct/value: to integer! i copy third ui32-struct
]
int-to-ui16: func[i][
  ui16-struct/value: to integer! i copy third ui16-struct
]
int-to-ui8: func[i][
  ui16-struct/value: to integer! i copy/part third ui16-struct 1
]
int-to-bits: func[i [number!] bits][
  skip enbase/base head reverse int-to-ui32 i 2 32 - bits
]
I often need to do byte aligning, for which I'm using this function:
byte-align: func[bits [string!] /local p][
  p: (length? bits) // 8
  if p > 0 [insert/dup tail bits #"0" 8 - p]
]
And I need to count the minimum number of bits needed to hold an integer:
bits-needed: func[i [integer!] /local b][
  b: find enbase/base head reverse int-to-ui32 abs i 2 "1"
  either none? b [0][length? b]
]

If some of these functions could be replaced by natives, I would appreciate it. Basically, I don't like having to work with bits as strings (converting binaries to strings using enbase and back using debase). But maybe there is already another way to do such binary manipulations. Maybe I just missed something.

25-Dec-2006 8:41:10
And maybe it would be good to have some native functions for these functions of Ladislav's, which look pretty ugly as well:
probe-mem: func[
	address [binary!]
	length	[integer!]
	/local m
][
	m: head insert/dup copy [] [. [char!]] 16
	m: make struct! compose/deep [bin [struct! (reduce [m])]] none
	change third m address
	probe third m/bin
	free m
]
address?: function [
    {get the address of a string}
    s [any-string!]
] [address] [
    s: make struct! [s [string!]] reduce [s]
    address: make struct! [i [integer!]] none
    change third address third s
    address/i
]
get-mem?: function [
    {get the byte from a memory address}
    address [integer!]
    /nts {a null-terminated string}
    /part {a binary with a specified length}
    length [integer!]
] [m] [
    address: make struct! [i [integer!]] reduce [address]
    if nts [
        m: make struct! [s [string!]] none
        change third m third address
        return m/s
    ]
    if part [
        m: head insert/dup copy [] [. [char!]] length
        m: make struct! compose/deep [bin [struct! (reduce [m])]] none
        change third m third address
        return to string! third m/bin
    ]
    m: make struct! [c [struct! [chr [char!]]]] none
    change third m third address
    m/c/chr
]
set-mem: function [
    {set a byte at a specific memory address}
    address [integer!]
    value [char!]
] [m] [
    address: make struct! [i [integer!]] reduce [address]
    m: make struct! [c [struct! [chr [char!]]]] none
    change third m third address
    m/c/chr: value
]
26-Dec-2006 3:50:35
Maybe I should think more before I post something...
I wrote a better version of the bits counter:
bits-needed: func[i][1 + to-integer log-2 abs i]
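
The formula behind this one-liner is 1 + floor(log2 |i|). As a cross-check, here is a small Python sketch of the same arithmetic; the function name `bits_needed` simply mirrors the one above, and the special case for zero is an assumption added here because the logarithm of zero is undefined (the earlier string-based version returned 0 for that input).

```python
import math


def bits_needed(i: int) -> int:
    """Minimum number of bits needed to hold the magnitude of i (0 for i == 0)."""
    if i == 0:
        return 0  # log2(0) is undefined, so zero is handled separately
    return 1 + math.floor(math.log2(abs(i)))


# Compare against Python's built-in int.bit_length for a few sample values.
for i in (0, 1, 2, 255, 256, -1000):
    print(i, bits_needed(i), abs(i).bit_length())
```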
John Niclasen
26-Dec-2006 18:29:33
I made a small library of bit manipulation functions at one time:
You're free to use them.
Developers had a discussion in the AltME world "REBOL3" in the group "Binary tools" about this.
26-Dec-2006 18:37:54
Maxim Olivier-Adlhoch
27-Dec-2006 0:32:24
It is tedious to go to and from binary in many cases. It often takes several type conversions, or ugly struct intermediates, and even then, one has to know about endianness... which has to be verified per platform, etc.

Also, some current binary conversions are rather asymmetric, which often makes them very annoying:


to-binary 13
== #{3133} ; int is not a series, SHOULD return #{0D}
to-integer #{3133}
== 12595    ; although this is preferred, it's not symmetric!

I'd like a uniform means to convert simple types to-from binary VALUES (as opposed to binary SERIES).

These should obviously be native to improve speed (most binary uses are in more advanced code, which is more often speed-sensitive).

Maybe adding a global switch could allow us to perform all binary VALUE conversions in platform or explicit endianness.

example :

endianess: 'LSB
== #{0A000000}
endianess: 'MSB
== #{0000000A}
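
For comparison, this is roughly how a symmetric, explicitly ordered value conversion looks in Python. It is a minimal sketch, not a proposal for REBOL syntax; the helper names `int_to_binary` and `binary_to_int` and the default 4-byte size are assumptions for this example. The byte order is passed per call instead of through a global switch.

```python
def int_to_binary(value: int, byteorder: str = "big", size: int = 4) -> bytes:
    """Convert an unsigned integer VALUE to `size` bytes in the given byte order."""
    return value.to_bytes(size, byteorder)


def binary_to_int(data: bytes, byteorder: str = "big") -> int:
    """Inverse of int_to_binary: interpret the bytes as one unsigned integer."""
    return int.from_bytes(data, byteorder)


lsb = int_to_binary(10, "little")  # bytes 0A 00 00 00
msb = int_to_binary(10, "big")     # bytes 00 00 00 0A
print(lsb.hex(), msb.hex())

# The round trip is symmetric in both byte orders.
print(binary_to_int(lsb, "little"), binary_to_int(msb, "big"))
```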

27-Dec-2006 4:04:05
One problem I see is the problem with nested structs, where REBOL behaviour differs from C (insert pointer versus insert data)
27-Dec-2006 4:57:14
One question is its purpose. If it is for binary files, then I am with Jaime and Erlang. If it is for OS access, I need a good C interface, because the includes and structs are in C, and it's easier to use them there.

Other thoughts: struct! should have a way to specify offsets, rather than just declaring the fields in sequence: [at 0 i: integer! at 4 d: decimal! at 12 array: label at 12 + 10 * 4 end]
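
To make the offset idea concrete, here is a minimal Python sketch that reads fields at explicit byte offsets, using the same layout as the bracketed example (an integer at 0, a decimal at 4, ten integers starting at 12). The `FIELDS` table, the `read_fields` helper and the little-endian byte order are assumptions for this illustration.

```python
import struct

# name -> (byte offset, struct format); "<" means little-endian with no padding.
FIELDS = {
    "i": (0, "<i"),        # 32-bit integer at offset 0
    "d": (4, "<d"),        # 64-bit decimal at offset 4
    "array": (12, "<10i"), # ten 32-bit integers starting at offset 12
}
END = 12 + 10 * 4  # first byte after the array


def read_fields(buf: bytes) -> dict:
    """Read every declared field from its own offset, independent of the others."""
    out = {}
    for name, (offset, fmt) in FIELDS.items():
        values = struct.unpack_from(fmt, buf, offset)
        out[name] = values[0] if len(values) == 1 else list(values)
    return out


# Build a sample buffer and read it back.
buf = bytearray(END)
struct.pack_into("<i", buf, 0, 42)
struct.pack_into("<d", buf, 4, 1.5)
struct.pack_into("<10i", buf, 12, *range(10))
record = read_fields(bytes(buf))
print(record)
```

Because each field carries its own offset, gaps, overlaps (unions) and reordering all become a matter of editing the table.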

Java uses a file-like interface with readInt(), seek() and such, but internally optimized. It seems to compare to arrays speed-wise, but it's more free-form (ints, floats etc. can be mixed). It is a good way to share buffers with C, or to stream to OpenGL and such. Or so I heard.

Gregg Irwin
27-Dec-2006 13:55
Struct!s have a few issues that make them painful to use in certain cases; char arrays and nested structs being two prime examples. Unions may not be common, but I had to use them once, and it wasn't fun to figure out (or know if what I was doing was safe). An example of how to do it would be fine in that case though.

Structs may not be too far off (at least add char array support, and improve how nested structs are specified), but maybe a more dialected approach would be a good supplement, where you can say AT or SKIP, specify a number of bits or bytes, and denote the target datatype.

Gregg Irwin
27-Dec-2006 13:58:33
I'll second Volker's thought as well: that the purpose will drive the design. If we have two purposes, maybe there are two models to work against. As an example, it's not a small task to write a dBase file loader, which is easy in most other languages.
Gregg Irwin
28-Dec-2006 15:19:43
For library interfaces, a BSTR type would be very handy for dealing with certain APIs and DLLs written in other languages that use them.
Jeff M
31-Dec-2006 12:59:25
At the end of the day, there are only a handful of operations that are really needed:

set byte, get byte, bitwise AND, OR, XOR, LSL, LSR, ASR

Once we have these, anything else is easily doable. I know, for myself, that a couple REBOL apps I put together were nearly impossible without these.

As for me, I don't really care about how. They can be functions, or they could be operators. If REBOL3 were to actually support LSR vs. ASR, and they were operators, I'd prefer ASR to be >>> and LSR to be >>, but that's just a personal preference.
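
The difference between the two right shifts only shows up for values with the top bit set. Here is a minimal Python sketch for 32-bit values; the helper names `lsr32` and `asr32` are hypothetical and chosen only for this illustration.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def lsr32(x: int, n: int) -> int:
    """Logical shift right: vacated high bits are filled with zeros."""
    return (x & MASK32) >> n


def asr32(x: int, n: int) -> int:
    """Arithmetic shift right: the sign bit is replicated into vacated bits."""
    return to_signed32(x) >> n  # Python's >> on a negative int is arithmetic


print(hex(lsr32(-16, 2)))  # 0x3ffffffc
print(asr32(-16, 2))       # -4
```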

For getting and writing bytes, this needs to be possible in many ways: from large numbers all the way through byte arrays. It would be nice if there was setting, clearing, and toggling of bits in large bit arrays (like Lisp) as well.
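
As an illustration of setting, clearing and toggling bits in a large bit array, here is a minimal Python sketch built on a byte buffer. The `BitArray` class and its most-significant-bit-first numbering are assumptions for this example, not an existing REBOL or Lisp API.

```python
class BitArray:
    """Fixed-size bit array stored in a bytearray; bit 0 is the MSB of byte 0."""

    def __init__(self, nbits: int):
        self.nbits = nbits
        self.data = bytearray((nbits + 7) // 8)

    def _locate(self, index: int):
        """Return (byte position, single-bit mask) for a bit index."""
        if not 0 <= index < self.nbits:
            raise IndexError(index)
        return index >> 3, 0x80 >> (index & 7)

    def set(self, index: int) -> None:
        pos, mask = self._locate(index)
        self.data[pos] |= mask

    def clear(self, index: int) -> None:
        pos, mask = self._locate(index)
        self.data[pos] &= ~mask & 0xFF

    def toggle(self, index: int) -> None:
        pos, mask = self._locate(index)
        self.data[pos] ^= mask

    def test(self, index: int) -> bool:
        pos, mask = self._locate(index)
        return bool(self.data[pos] & mask)


bits = BitArray(100000)
bits.set(23345)
bits.toggle(7)
print(bits.test(23345), bits.test(7), bits.test(8))
```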

As for a sample application, my first attempt at something "large" with REBOL was an assembler. REBOL's parse functionality being built in made this very easy. But the complete lack of being able to compose opcodes and then write them (elegantly) to a stream of bytes pretty much brought this project to a halt (in REBOL). Try duplicating that effort in REBOL3 to know whether or not you've got something useable and elegant (IMO).

Just my 2 cents.

Jeff M
31-Dec-2006 13:00:50
Hmm, something in my last post that I didn't mean to imply, but could actually be quite cool, would be to actually COMPOSE values from bytes using a function like COMPOSE. Not sure how it would look, or if it would even be useful, but an interesting idea to explore.
8-Jan-2007 5:16:06
I would like to compose not just from bytes, but from bits as well. For example, imagine an MP3 frame header, which is actually an integer with the following structure (bit width, field, notes):
11   sync              0xFFF
2    version           1=mpeg1.0, 0=mpeg2.0
2    lay               4-lay = layerI, II or III
1    error protection  0=yes, 1=no
4    bitrate_index     
2    sampling_freq     
1    padding
1    extension         
2    mode              
2    mode_ext          used with "joint stereo" mode
1    copyright         0=no 1=yes
1    original          0=no 1=yes
2    emphasis
I know that I can use logical operators to set and get the bits, but then the code would look full of magic. The more I think about binaries in REBOL, the more I miss native shifting operators. I still cannot understand why they are not here.
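
One way to keep the shifts out of sight is to drive them from the field table itself. The following is a minimal Python sketch under stated assumptions: the helper names `pack_fields` and `unpack_fields` are hypothetical, the sample field values are purely illustrative, and the sync value is taken as eleven set bits (0x7FF), since 0xFFF would not fit in an 11-bit field.

```python
# (field name, bit width), in the order given by the table above; widths sum to 32.
MP3_FIELDS = [
    ("sync", 11), ("version", 2), ("lay", 2), ("error_protection", 1),
    ("bitrate_index", 4), ("sampling_freq", 2), ("padding", 1),
    ("extension", 1), ("mode", 2), ("mode_ext", 2), ("copyright", 1),
    ("original", 1), ("emphasis", 2),
]


def pack_fields(fields, values: dict) -> int:
    """Compose one integer from named bit fields; missing fields default to 0."""
    word = 0
    for name, width in fields:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_fields(fields, word: int) -> dict:
    """Split an integer back into its named bit fields."""
    shift = sum(width for _, width in fields)
    out = {}
    for name, width in fields:
        shift -= width
        out[name] = (word >> shift) & ((1 << width) - 1)
    return out


header = pack_fields(MP3_FIELDS, {
    "sync": 0x7FF, "version": 3, "lay": 1,
    "error_protection": 1, "bitrate_index": 9,
})
raw = header.to_bytes(4, "big")
print(raw.hex())  # fffb9000
```

The calling code names fields and never mentions a shift count or mask, which is the "no magic" property asked for above.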
Gregg Irwin
8-Jan-2007 14:52:44
If they could include basic RebCode ops, even without the advanced stuff, the mezz wrappers for shift ops are already done. There's a lot of value in those few simple ops. I just want to make sure that if shift/rotate funcs are added, that they can work on ints, bitsets, or series. The series part is also easy to do as a mezz.
Brian Hawley
15-Jan-2007 18:03:27
It may be heading in the opposite direction to the above suggestions, but I would like binary operations that operated on larger binary values, such as bitset! and binary!, and then transparently decomposed these operations to vector operations like SSE or AltiVec. Or to parallel operations on hardware where that is appropriate.

Much as these low-level operations have their place, REBOL is a high-level language, and heavier-duty operations can help minimize interpreter overhead by doing more work between instruction calls. Leave the low-level stuff to rebcode.

A destructuring operation like the one Oldes is suggesting could be very cool though. Perhaps a dialect that generates optimized rebcode would help with that.

19-Jan-2007 6:51:39
And it would be good to improve parse so that it works better with binary formats.
For example, now I have to write:
bytes: complement charset ""
WORD:  [2 bytes]
DWORD: [4 bytes]
to-int: func[str][to integer! head reverse to binary! str]
parse/all bin [
   copy id DWORD (id: to-int id)
   copy nu WORD  (nu: to-int nu)
   copy name to #{00} 1 skip
]
Wouldn't it be better to have native conversions, so that I would not need the to-int function? For example, just:
parse/binary bin [
   copy id UI32
   copy nu UI16
   copy name to #{00} 1 skip
]
where the name could be binary if the bin is binary?
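
For comparison, here is a minimal Python sketch of the same record layout: a little-endian UI32, a little-endian UI16, then a name terminated by a zero byte. The function name `parse_record` and the sample data are assumptions for this illustration; little-endian order is inferred from the byte reversal in to-int above.

```python
import struct


def parse_record(data: bytes):
    """Parse UI32 id, UI16 nu and a NUL-terminated name; return them plus the rest."""
    ident, nu = struct.unpack_from("<IH", data, 0)
    start = struct.calcsize("<IH")    # 6 bytes consumed so far
    end = data.index(b"\x00", start)  # raises ValueError if no terminator is found
    name = data[start:end]            # stays binary, like the input
    return ident, nu, name, data[end + 1:]


sample = struct.pack("<IH", 0x12345678, 0x0102) + b"name\x00rest"
print(parse_record(sample))
```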
19-Jan-2007 7:13:26
And... I just found that I should use:
  copy name to "^(at)" 1 skip
since parse does not handle the binary datatype in the current REBOL version, which is not good either. (Please note that the (at) should be the char for #{00}; it was converted by the blog script.)
26-Jan-2007 4:19
binary <-> decimal and binary <-> integer conversions (both little- as well as big-endian) will be useful. But the example above can be written more elegantly even now:

	UI32: [
		copy lastUI32 4 skip
		(lastUI32: to integer! reverse as-binary lastUI32)
	]
	UI16: [
		copy lastUI16 2 skip
		(lastUI16: to integer! reverse as-binary lastUI16)
	]
	parse/all bin [
		UI32 (id: lastUI32)
		UI16 (nu: lastUI16)
		copy name to "^(00)"
	]

Carl Sassenrath
7-Feb-2007 16:33:43
Thanks for all the opinions, ideas, comments, code, etc. I will read through it carefully.
27-Feb-2007 16:39:29
And here is one more issue... I think there should be some way to convert the issue! datatype into binary!, because at the moment as-binary with an issue as its argument gives:
>> as-binary #ff0000
== #{666630303030}
but should be:
>> as-binary #ff0000
== #{FF0000}
If I needed the current behavior (I really don't know why), I could do:
>> as-binary next mold #ff0000
== #{666630303030}
27-Feb-2007 16:53:06
Ach... I take my last comment back. I forgot that the issue! datatype can hold not just hex-like values, but may be much more complex :) and I can write, for example:
>> x: to-issue "ahoj lidi"
== #ahoj
>> x
== #ahoj
>> as-string x
== "ahoj lidi"
1-Apr-2008 12:51:49
What about inheriting some compact coding from scripting languages, like:

unpack bin_data [a: b: c:]  ["iuA"]

That is, unpack bin_data using the pattern "iuA", where i = integer, u = unsigned, A = string,

or replacing the pattern with [integer! integer! string!].
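
A minimal Python sketch of such a pattern-driven unpack follows. The `unpack` function here is hypothetical and only mirrors the letters of the "iuA" pattern; the 32-bit field size, the little-endian byte order and the rule that A consumes the remaining bytes are all assumptions made for this illustration.

```python
import struct


def unpack(pattern: str, data: bytes) -> list:
    """Decode data field by field: i = signed 32-bit, u = unsigned 32-bit,
    A = all remaining bytes as a string (assumed meaning)."""
    values, offset = [], 0
    for code in pattern:
        if code == "i":
            (value,) = struct.unpack_from("<i", data, offset)
            offset += 4
        elif code == "u":
            (value,) = struct.unpack_from("<I", data, offset)
            offset += 4
        elif code == "A":
            value = data[offset:].decode("latin-1")
            offset = len(data)
        else:
            raise ValueError(f"unknown pattern code: {code!r}")
        values.append(value)
    return values


bin_data = struct.pack("<iI", -2, 4000000000) + b"hello"
a, b, c = unpack("iuA", bin_data)
print(a, b, c)
```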

Updated 29-Mar-2017 - Copyright REBOL Technologies