REBOL 3.0

Comments on: Modules in binary compressed format

Carl Sassenrath, CTO
REBOL Technologies
15-Oct-2009 7:44 GMT

Article #0274

For many years, REBOL apps like REBOL/IOS, AltME, and others have used compressed binary data for network transfers (which were often encrypted as well).

As you know from the CureCode todo list, we need to make R3 Chat an R3 module, isolating its namespace from the global context. However, because it downloads fresh each time, it is compressed to speed up transfer.

To make it a compressed module, I could "hand-craft" a little wrapper. Not a problem, but I think it would be better if we simply supported such modules directly, using the header to indicate the compression, with some kind of agreement about how to attach the binary data (in both raw and base-encoded modes).

Of course, this concept is nothing new, nor does it need to be fancy in any way, but formalizing it into a standard for R3 modules does require a few lines of code, and a couple of examples posted to the Docs would be nice too.

The format would be something like HTTP:

Binary Module Format

REBOL header in UTF-8

CR-LF CR-LF

compressed binary data

The header would indicate the encoding of the data.
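
As a minimal sketch of that layout (not a settled design), the following builds such a file in memory and splits it again. The header field name Compressed and the words hdr, sep, packed, and payload are assumptions made only for illustration, and the code assumes the R2-style compress and decompress functions.

```rebol
REBOL [Title: "Binary module format sketch"]

; Hypothetical header flag; the article does not fix its name.
hdr:  {REBOL [Title: "Demo" Compressed: true]}
body: {print "test"}
sep:  to-binary "^M^/^M^/"        ; CR-LF CR-LF

; Writer: clear-text header, separator, then the compressed body.
packed: rejoin [to-binary hdr sep compress to-binary body]

; Reader: split at the first CR-LF CR-LF, then decompress the rest.
payload:   find/tail packed sep
head-text: to-string copy/part packed skip payload -4

print head-text                   ; the header, still readable as text
print to-string decompress payload
```

Only the body is compressed here, so the header stays readable in any text editor or with a plain read of the file.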

And, like all REBOL code and data, we would want to easily view a module's contents in text format with a short line like:

probe load/all %bin-module.r

Anyone have something already cooking on this?

17 Comments

Comments:

-pekr-
15-Oct-2009 4:40:43
Aren't we partly visiting Rebin and RIF land? IIRC, Rebin was supposed to be a binary data representation, no? Let's make it flexible, not just for modules, or we will create another .rip, which will R.I.P., as .rip itself :-)

Maxim Olivier-Adlhoch
15-Oct-2009 4:44:48
it would also be nice to have save and write equivalents... which create these directly.

it would also be nice for all supporting funcs to accept an encryption key and mode.

so if the binary is encrypted in whatever way, as per its header, this would open it in one line.

probe load/all/key %my-module.r "j30dncvegh293m"

import/key %my-module.r "j30dncvegh293m"

Hostile Fork
15-Oct-2009 14:51:01
The idea of a script saying its contents are compressed--with a textual header and binary contents, does not sound aesthetically appealing at all.

If I opened a directory and found a bunch of those files, and discovered this was a Rebol "feature", I would blame Rebol. It's the wrong place to be slicing this.

The right place to handle it is the network protocol layer. It's too bad that Rebol doesn't support zip because then you could just put in the HTTP header "I will accept a zipped response" and the server would take care of it on the fly, doing the proper caching and all.

http://en.wikipedia.org/wiki/HTTP_compression

Would be nice if Rebol supported zip and could let all HTTP requests leverage this work. If you don't want to do that, you could always make an Apache module (mod_rebzip) which did this on the server side for your own compression format.

But I strongly advise against making another custom compression-header code obfuscation scheme...

Oldes
15-Oct-2009 15:08:39
I agree with Fork that REBOL needs better support for compression instead. I don't think we need ZIP, we need GZIP.

And it would not require much; the ZLIB code has been in REBOL for ages. It's used internally in the compress function and also in the PNG code. REBOL's compress just needs to be able to produce a correct gzip header.

The HTTP scheme should be enhanced by us afterwards, but please enhance the compress function first.

With official gzip support we could also do:

probe load/all %bin-module.r.gz
do %very-large-script.r.gz

Hostile Fork
15-Oct-2009 15:10:36
Quick clarification: HTTP compression typically uses gzip, which acts on a single file (like compress), not ZIP.

A mezzanine level gunzip/deflate would probably not be difficult to write.

Oldes
15-Oct-2009 15:14:06
And by reusing the code that is used in the GIF loader, we could have at least an LZW decompress refinement :)

Hostile Fork
15-Oct-2009 16:02:20
Making do sniff files for the gzip header is a neat hack, but I still don't like the idea of manually maintaining directories of zipped files. I much prefer letting the server do the work. That way I can open it in an editor, make a change, and it's all ready...no mess.

The one issue with having this handled at the http layer automatically is that Rebol has to get involved in its defaults. For instance, the average-case request header should say:

Accept-Encoding: gzip

And then when you type:

read http://example.com/myscript.r

If the server returns a response with a gzip encoding, it should be decoded by read (and not do). I think this is the best default for http read across the board and is what people expect.

These sorts of "magic" defaults are a good thing as long as there's documentation and a way to disable it. I posted about my experiences with http headers in the system object recently and it seems there already is magic like this—but it wasn't explained anywhere. :(

Vincent Ecuyer
16-Oct-2009 4:52:50
For http with deflate/gzip, it does work in R2, if you change the header and do the png decoding trick on the received data.

With a slightly enhanced 'decompress, it could be easily integrated in the scheme at the mezzanine level.

(both zip/unzip and gzip/gunzip can be implemented as mezzanines as long as zlib deflate/inflate is exposed)

Carl Sassenrath
16-Oct-2009 15:42:04
I see that we have multiple topics above!

I will open a new blog for the compression, zip/rip topics. So save your related comments for that blog.

Returning to the compressed module topic, it is important to know that the REBOL header is clear text, not part of the compressed data body. Otherwise, I would not even suggest this concept!

The file is REBOL code/data as compressed binary. That is all.

It remains a single file, very simple in its encoding. Rather than being UTF-8 (the required binary format for R3), it is compressed UTF-8. (This change would only require a few lines to be modified in the mezz-load.r code.)

That is why I stress the line:

probe load/all %file

This code shows that the file is nothing more than a REBOL text file simply with a different binary encoding.

This proposal is not related to REBIN, ZIP, RIP, GZIP, or TAR.GZ, nor is it intended to be. That is a different discussion, and I will get that blog open so you can comment on that separately.

Hostile Fork
16-Oct-2009 17:26:55
Maybe I didn't understand, but... this seems like you're saying you have a file that you want to transfer quickly over the network, employing compression.

(I'm assuming a module is a .r file)

You didn't say you were worried about how much disk space the file took up on either the server or the client. (I'd suggest letting those who really care use compression-enabled filesystems. Then they can apply it to more than just .r files.)

How is the idea of hooking into the standard HTTP compression not suitable? It seems much simpler than introducing a hybrid. Also it would give Rebol something it needs--which is to default to using compression for http transfers in general. Right?

Correct me if I'm wrong, but the only advantage of what you are proposing is that it takes up less space on disk once you've gotten it. That seems like a poor tradeoff. Sure, you can still read the header and know what the module is... but if you want to read the code you have to decompress it.

Maxim Olivier-Adlhoch
17-Oct-2009 1:52:09
fork,

It's an option, don't use it if you don't like it.

I find it useful.

for large text files it can be impressive how much human-readable text can be compressed: 80+%.

it's also useful for obfuscating code delivered to clients where you don't want them to snoop around in the code and break things.

it's also useful so that text searches by a client do not return your code.

obviously it won't be used for situations where the source is intended to be readable... ex: rebol.org.

Hostile Fork
17-Oct-2009 8:30:11
Maxim--If you're saying this is a code obfuscation tool, then that's different from Carl's stated purpose. The criteria he laid out are served adequately by http compression.

I don't think obfuscation is a priority feature. Rebol's main problem isn't that people are going in and editing the source. It's that no one knows (or cares) what the source means.

What I'm worried about is that files like this will increase indifference, make Rebol look crazier than they think it already is, etc.

If someone has serious encryption needs, that should be addressed in a more intelligent way. Or just run the code on a server that people do not have physical access to.

Brian Hawley
17-Oct-2009 16:12:29
Fork, accept that there will be circumstances that do not involve HTTP where this will come in handy, and that in order to implement this properly we need to build it in. You can comment in the other blog about the needs of HTTP compression - I've already started the comments along those lines.

Now, as to the topic at hand...

To do this properly in the R3 script/module system, you have to realize that there is very little difference between modules and regular scripts - just some binding differences later on. However, in order to do this decompression we need to do it in the LOAD function itself, before any binding is done. If we do this outside of LOAD or in the module itself, it will break the binding sequence and be really tricky to do. In LOAD, it will be really easy to do, just a matter of putting a few lines of code in the right point of the sequence.

What we really need right now is some consensus about how the header flag will be specified. If we are only compressing with REBOL compression (zlib deflate + a length integer), then we can just say Compress: true in the header, and let SAVE and later LOAD do the work. That would be the simplest method. If we want to support other compression algorithms and/or encryption, that would be a little trickier, but still no difficulty.

I will be happy to do the actual coding myself, once there is some agreement about what, exactly, we want to do. Decide! :)
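
For reference, the "zlib deflate + a length integer" layout can be inspected directly. This is a minimal sketch, assuming R2-compatible compression, and it uses the sample data that Oldes posts further down this thread; the word data is only illustrative.

```rebol
REBOL [Title: "Inspect REBOL-compressed data"]

data: 64#{eJwrKMrMK1FQKkktLlECAB2xBFIMAAAA}

print length? data                ; 24 bytes in total
probe copy skip tail data -4      ; #{0C000000}, the trailing length field
print to-string decompress data   ; print "test"
```

The last four bytes hold the uncompressed length (12, low byte first), which is the part a non-REBOL tool would have to append after an ordinary zlib stream to produce compatible data.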

Brian Hawley
17-Oct-2009 16:39:15
Here are some caveats that you can expect:
  • Script-in-a-block won't be directly compatible with compression. If you need to embed compressed/encrypted scripts in other text/binary files, this will be tricky and likely involve encoding the compressed/encrypted data in a REBOL source binary! syntax, like 64#{}. We should see if this ability is needed before we try to do this trick.
  • If whatever compression/encryption algorithm we use can't handle extra data on the end, then our data format won't be able to handle it either.
  • We likely won't be able to handle incomplete files - they probably won't be able to decompress/decrypt. LOAD/next won't happen until after the decryption or decompression is finished. This could be seen as an advantage in some cases.
  • Correctly compressed/encrypted data which had bad REBOL syntax before will still be bad syntax after it is decompressed/decrypted. GIGO.
  • If we use R2-compatible compression, we can generate these scripts in R2 if need be using similar REBOL code. However, that compression format will need to be properly documented, in case we need to do this compression somewhere where REBOL doesn't run at all (yet).
  • This won't provide a speed advantage when doing network compression as well, but it has other advantages outside of that. Such as author verification using public key encryption, for instance.
  • Compressed or encrypted files are more sensitive to data corruption than raw text is. Beware!

Does anyone else have concerns to add?

Oldes
18-Oct-2009 7:31:57
I can see a problem if the "compressed data" in Carl's example above is pure binary data. Such scripts can easily be corrupted during transfer because, for example, a browser or FTP client may not know that the content is in a binary format.

When I was using compressed files I used something like:

REBOL []
do decompress 64#{eJwrKMrMK1FQKkktLlECAB2xBFIMAAAA}

so do I understand that we are just talking about getting rid of the do decompress part? So we could use:

REBOL [compressed: true base: 64]
eJwrKMrMK1FQKkktLlECAB2xBFIMAAAA

or

REBOL [compressed: true]
64{eJwrKMrMK1FQKkktLlECAB2xBFIMAAAA}

For transferring data over HTTP, as in R3 Chat, I still prefer the correct way, using HTTP compression, instead of something like:

REBOL []
xœ+(ÊÌ+QP*I-.Q^B^@^]±^DR^L^@^@^@

Brian Hawley
18-Oct-2009 16:33:01
HTTP compression should be discussed in blog 275 - we're talking about script compression here.

Binary compression will be fine, as long as it is over a binary-friendly connection or storage medium. However, Oldes, you have made it clear that a textual encoding will be needed in some circumstances.

It will be easier and better to do the textual encoding in REBOL syntax, like this:

REBOL [compressed: true]
64#{eJwrKMrMK1FQKkktLlECAB2xBFIMAAAA}

This will make the file easier to generate and decode using REBOL code, and still make it compact - only 5 characters of additional overhead in the encoded data, plus the header.

It will be easy enough to make SAVE generate base64 encoded data, and LOAD will be able to decode it automatically with no extra code. And since the extra overhead characters are the same every time, it will be no difficulty to generate them without REBOL if need be.
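
Until SAVE and LOAD do this themselves, the same file can be produced and read back by hand. The sketch below is only an illustration under the assumption of R2-compatible compress and decompress; the words src, encoded, file-text, and body are made up for the example.

```rebol
REBOL [Title: "Base-64 script body sketch"]

src: {print "test"}

; Writer: compress, then emit the body in REBOL binary! syntax.
encoded:   enbase/base compress to-binary src 64
file-text: rejoin ["REBOL [compressed: true]" newline "64#{" encoded "}"]
print file-text

; Reader: skip the header line, load the binary! literal, decompress.
body: load find/tail file-text newline
print to-string decompress body
```

The five characters of overhead mentioned above are the 64#{ prefix and the closing brace that the writer adds around the encoded data.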

onetom
21-Oct-2009 6:05:13
or are we talking about such a compression which makes the Rebol EXEcutable so small? :) like dumping out the internal binary implementation of the blocks and the other datatypes.

Updated 28-Mar-2024 - Copyright REBOL Technologies - REBOL.net