Comments on: The 64 bit question

This question has been in the back of my mind for a few months now. So far, REBOL 3.0 still uses 32 bit integers in order to get it up and running as quickly as possible.

However, the time is coming over the next couple of weeks to make the big jump to 64 bits. The advantage, of course, is that integers can hold much larger values, such as the sizes of very large files. The disadvantage is that 64 bit values require more CPU time for memory access and math operations. They also take twice the memory space, although REBOL already allocates that space, so most programs won't see that increase.

The 64 bit change could also affect more than integer values. For example, do we want to allow series (e.g. strings, binaries) to be larger than 2GB (signed) or 4GB (unsigned)? I'm not talking here about files (see below); I'm talking about actual in-memory series. If we want that, not only does the management structure (e.g. tail position) grow, but so does the series index pointer (to be able to address extra-large series). Although this is not such a big deal for the structure definitions, it is a big deal for the run-time code; REBOL is packed full of references to things like tail and index (for bounds checking, etc.).

And finally, there is the idea that perhaps we could use a compile-time switch to build both 32 bit and 64 bit versions. A 32 bit version makes more sense for things like cell phones and PDAs. But it raises the issue that these versions of REBOL would be closely, but not perfectly, compatible. Perhaps that could be improved by using 64 bit integers but 32 bit series.
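A minimal C sketch of that hybrid, assuming a C implementation: integers are always 64 bit, while the width of the series tail and index fields is chosen by a compile-time switch. The macro `SERIES_64BIT` and the names `rebol_int`, `series_len_t`, `series` and `series_pick` are hypothetical, not taken from REBOL's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Integer values are 64 bits in every build (hypothetical name). */
typedef int64_t rebol_int;

/* Series width is picked at compile time by a hypothetical switch. */
#ifdef SERIES_64BIT
typedef uint64_t series_len_t;
#else
typedef uint32_t series_len_t;
#endif

/* Hypothetical series header: every access checks the index against tail. */
typedef struct {
    unsigned char *data;
    series_len_t tail;   /* number of used elements */
} series;

/* Stores the byte at zero-based index and returns 1, or returns 0 if out of bounds. */
int series_pick(const series *s, series_len_t index, unsigned char *out)
{
    assert(s != NULL && out != NULL);
    if (index >= s->tail)
        return 0;
    *out = s->data[index];
    return 1;
}
```

Building with `-DSERIES_64BIT` widens only the series fields. Bounds checks written against the typedef, like the one in `series_pick`, pick up the new width without being edited one by one, while integer math stays 64 bit either way.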

As with any design process, there is a range of tradeoffs to consider. Certainly, the needs of REBOL users should help us determine the best mix of choices. Personally, I like the idea of 64 bit integer math, but the 32 bit series restriction is fine with me. I would be using seek to address such large files anyway.

As for files, I should note that I am planning for the new port! datatype to allow 64 bit index access. The new ports will no longer store their index positions within their references, but within the port object itself. The port system will work more like traditional file access than it has in the past.
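A minimal POSIX C sketch of a port object that owns a 64 bit position, in the spirit of the paragraph above. `file_port`, `port_seek` and `port_read` are hypothetical names; `fseeko` with `_FILE_OFFSET_BITS=64` is the usual POSIX/glibc way to get 64 bit file offsets, an assumption about the platform rather than something the post specifies.

```c
#define _FILE_OFFSET_BITS 64   /* make off_t 64 bits on 32-bit POSIX systems */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical port object: the position lives here, not in each reference. */
typedef struct {
    FILE *fp;
    int64_t position;
} file_port;

/* Move the port to an absolute 64-bit byte offset; returns 0 on success. */
int port_seek(file_port *port, int64_t offset)
{
    assert(port != NULL && port->fp != NULL);
    if (fseeko(port->fp, (off_t)offset, SEEK_SET) != 0)
        return -1;
    port->position = offset;
    return 0;
}

/* Read up to n bytes at the current position and advance the stored index. */
size_t port_read(file_port *port, void *buf, size_t n)
{
    assert(port != NULL && buf != NULL);
    size_t got = fread(buf, 1, n, port->fp);
    port->position += (int64_t)got;
    return got;
}
```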

36 Comments

Comments:

Volker Nitsch
10-Apr-2006 16:42 :) From a compatibility POV, where is the problem with series? Series need sufficient memory. If you seek to the tail of a >4GB series, you need that memory first. (The exception would be if we could seek to any value and get the index back, and someone found a way to use that for a hack. But we cannot seek past the tail of a series, which needs enough memory first, and that cannot happen with 32bit CPUs anyway.)
From a performance standpoint, how big is the impact? Converting 32->64bit could actually be slower than using 64bit all the time. And memory is fetched in bursts over wide buses, so there may be only a small difference. (That's my theory. Curious for benchmarks :)
About 64bit: does that mean time and money could be counted in integers? I would like that.
64bit series II: could be useful if you enable memory-mapping. Then a laarge db is "read" into memory by a little OS call, and physical reads happen only for the "memory" that is really touched. Could be nice for scripts which start fast, do a little bit and terminate, but need to look into that laarge db, or get a frame from that unpacked video one is editing.
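A minimal POSIX C sketch of the memory-mapping idea, mapping a whole file read-only with `mmap`. The helper name `map_file` is hypothetical. On a 32 bit process the address space still limits how large a mapping can be, so this only reaches files above 4GB in 64 bit builds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only; pages are typically read from disk only when touched. */
const unsigned char *map_file(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;          /* mapping a zero-length range is an error */
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with `munmap(ptr, len)` when done.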
Ryan Cole
10-Apr-2006 16:42 :) 64 bit is cool, but the need for it would be infrequent and not worth any performance hit for everyday use IMO.
I have never yet needed to work with such huge series in memory. At around 100,000 items I am typically looking at databases. I have tested a few series up to 5 million, far off from 2^31.
Accessing huge files is bound to come up for me soon, so I am glad to see that.
Marty Heyman
10-Apr-2006 16:42 :) Tough one, Carl. There's a nasty cost to this on 32-bit machines. For the present, since 64-bit is primarily on server systems, it seems wise to keep 32 and 64 bit versions so the user experience is of a fast system. But it's a pain building and testing both. Obviously, code that compiles for both would be a win, and doing the builds to ensure that would be appropriate (if you keep 32-bit around for real people to use).
Brian Hawley
10-Apr-2006 17:05 I vote for 64bit on 64bit platforms, 32bit on 32bit platforms, ReBin platform neutral (probably 64bit, maybe variable), Services marshalling if necessary, and some set of constants somewhere in the system object that tell us what our limits are, endianness, etc. I would like to use the capabilities of 64bit platforms when available but still be able to use 32bit when I need to. Ports and files as 64bit is fine, but remember that eventually these may need to be upgraded to 128bit (on Solaris for instance). I personally use both 64bit and 32bit platforms.
Brian Hawley
10-Apr-2006 17:10 On a (possibly) related note, should list! be sparse? Why store none values in a list when you can skip them? Another memory-saving optimization...
Paul Tretter
10-Apr-2006 17:11 :) I like the option of compile-time switches. The way I see it, it has been a long time since a major rework has been done to REBOL; technology may dictate this need sooner rather than later, and now is when we have the opportunity.
Paul
Edoc
10-Apr-2006 17:11 :) Here's a marketing thought that may not be popular with the developers here:
64-bit is a feature/benefit. It has value, but probably not to the average user. Make 64-bit an encap option for the SDK. In other words, if you want the extra-strength 64-bit version, you get it (and other features) when you purchase the SDK.
Sorry. Someone had to suggest it.
Brian Hawley
10-Apr-2006 17:23 For that matter, 64bit integer math on (non-embedded) 32bit platforms could be a selling point for REBOL over other languages. As long as the integer math made use of more advanced instructions when available (like SSE2, AMD64 and such) this could be a good thing.
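A minimal C sketch of checked 64 bit integer math that compiles the same way on 32 and 64 bit targets. It assumes a GCC or Clang toolchain, since `__builtin_mul_overflow` is a compiler builtin rather than standard C; whether SSE2 or similar instructions get used is up to the compiler, not this code. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Checked 64-bit multiply: returns 1 and stores the product, or 0 on overflow.
   On a 32-bit target the compiler typically lowers this to several 32-bit instructions. */
int mul64_checked(int64_t a, int64_t b, int64_t *out)
{
    assert(out != NULL);
    return !__builtin_mul_overflow(a, b, out);
}
```

For example, 100000 * 100000 = 10,000,000,000 overflows a 32 bit integer but is accepted here.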
Brian Hawley
10-Apr-2006 17:32 Perhaps 64bit series could be implemented as ports to memory-mapped files, and keep regular series as 32bit. The port index would be 64bit, integers 64bit and series 32bit. That would solve the performance problem.
Still, I like the idea of taking the opportunity now to make REBOL adjustable at compile time to different memory sizes. This would improve portability in the long run, both to systems that are more capable and less so.
Gregg Irwin
10-Apr-2006 17:49 R3 has a short timeframe, right? So, while we know that we'll have to address 64-bit at some point, is R4 so far off that we need to do it now and make the more far-reaching changes it sounds like it will require internally?
Does the 64-bit question affect how/when/if money! will change?
Volker Nitsch
10-Apr-2006 17:58 :) I think compiler switches are obvious. After all, isn't there size_t in C? In an interpreter, series accesses should far outnumber real integer math. These would be in the CPU's preferred size. Their value range would be limited by memory, so users would not even know what the internal size is.
About 64bit ints, playing crystal ball: we are close to the 2GB signed limit (we have no unsigned ints to index with). Files are already larger, and we want to seek in them. Memory is close. When Vista comes out, the 2GB signed limit will fall. Vista will come out when hardware makers can meet the basic spec (close too). It was always Windows' job to be slow enough to force hardware upgrades. Now looking back at the introduction of Win95: IIRC memory prices were stagnant for a long time, and typical machines had 4MB. Some months after the introduction, supermarket machines had 8-16MB and memory prices were down enough. With a sudden strong demand, such masses could be sold safely enough that producers had low risk when increasing production. The Vista spec looks a lot like a repeat of that. Non-professional scripters will expect to deal with it easily. 32bit ints will look weird. (Disclaimer: it was a cheap crystal ball, don't count on this when investing :)
Gabriele
10-Apr-2006 18:38 :) My vote is: 64 bit integer, 32 bit series. The reason for 64 bit integers is that we need SIZE? to work for any file (I can easily create >4GB files on my system; just a couple of days ago I was playing with a 7GB file, both on NTFS and EXT3).
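Gabriele's SIZE? point in C terms: a 7GB file (7,000,000,000 bytes) does not fit a signed 32 bit integer at all, and in an unsigned 32 bit field it silently wraps to 2,705,032,704. A minimal sketch, assuming a POSIX `stat` with 64 bit `off_t`; `file_size` and `truncate_to_u32` are hypothetical helper names.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t even on 32-bit POSIX builds */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/* Size of a file as a 64-bit integer, or -1 on error. */
int64_t file_size(const char *path)
{
    assert(path != NULL);
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (int64_t)st.st_size;
}

/* What a 32-bit unsigned field would hold instead (wraps modulo 2^32). */
uint32_t truncate_to_u32(int64_t size)
{
    return (uint32_t)size;
}
```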
It may be desirable to have both 32 bit and 64 bit integers for performance, however I'm not completely sure it's worth the added complexity.
tom
10-Apr-2006 18:53 :) Oh please! Oh please! I would use it almost every day. My context is that I write small programs that process huge (genomic) data files on 64bit machines with dozens of gigs of RAM. Programming in the small with data in the large works very well.
Typically I do not open a huge file, seek to some spot, snatch a bit and close the file (those jobs are done in a database); I process the whole thing from beginning to end. Having scripts break because the data grew over a 2 gig limit, which in our case exists solely in Rebol, interferes with Rebol being seen as the serious tool I know it is.
I know my situation is not currently the typical case, but soon it will be. 2 gig limits will go the way of single-sided floppies. Now you have the chance to establish a good foundation for the next set of basic expectations. Take it.
Jussi Hagman
10-Apr-2006 19:03 :) I haven't been following Rebol development that much lately, and am not aware of what people are using Rebol for. In any case, given today's processors, I am quite sceptical that integer performance would be a big factor in any perceivable slowness, whether the integers were 32 or 64 bits.
Some years back, when I did more Rebol, I already had some issues with 32-bit integers not being enough (calculating some folder sizes bytewise), so at least 64 bits is really needed.
I think that from a beginner's point of view the best thing would be to use arbitrary precision numbers by default. The programmer could then optimise the performance using fixed length integers *if and only if* needed.
Obviously I don't know how feasible the implementation would be or how big the speed penalty would be.
Allen Kamp
10-Apr-2006 21:10 :) Perhaps implement 64 bit series as a special case for list! and hash!
Tamás Herman
10-Apr-2006 21:11 :) After learning REBOL for a month, my first real-life application would have been a raw disk editor (to help me recover a broken 80G disk). I felt it a shame that I finally had to write it in gforth :/
Andreas Bolka
10-Apr-2006 22:42 My vote is for pure 64b (series, ints, everything). And yes, I actually require 64b support for some work I do (unfortunately not w/ REBOL, at the moment :)
yeksoon
10-Apr-2006 22:48 :) I vote 64.
I see it from the potential marketspace that RT is going to address...
Andreas Bolka
10-Apr-2006 22:52 And ah, I forgot: memory-mapped file I/O would be a lovely thing to have as well :)
Jaime
11-Apr-2006 0:52 I am in the 64 bit integers and 32 bit series camp. Maybe we could have an extra type for 64 bit series.
Brian Hawley
11-Apr-2006 1:56 Can you use mmap to allocate memory not associated with a file, or have it allocate a temporary file? I might be unclear on the concept.
If we do 32bit series and 64bit ports, perhaps we could have 64bit big-block!, big-string! and big-binary! types, perhaps implemented as ports, perhaps as another kind of series. These 64bit types could even be implemented using a component that is only available on 64bit builds. Add big-integer! and big-decimal! infinite precision numeric types and we'll really have a party! :)
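On Linux, the BSDs and macOS, the answer to Brian's first question is yes: `mmap` with `MAP_ANONYMOUS` returns zero-filled memory that is not backed by any file (historically a widely supported extension rather than core POSIX). A minimal sketch; `alloc_anonymous` is a hypothetical helper name. For the temporary-file variant, one could instead map the descriptor of a file created with `tmpfile`.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve zero-filled, private memory not associated with any file.
   Returns NULL on failure; release it later with munmap(p, len). */
void *alloc_anonymous(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```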
Cyphre
11-Apr-2006 3:28 I'm for 64 bit integer, 32 bit series.
Oldes
11-Apr-2006 3:28 And what about the 'bignum' component, which is in current Rebol versions? I'm sure that we need bigger integers (or at least unsigned 32b integers), but I'm not sure we need them everywhere. Most of the integers I use are small ones. I vote for more integer datatypes, not just big-integer! but also small-integer! (16b). If that's not possible, then 64b, as we will need it more and more.
rebolek
11-Apr-2006 4:00 I'm 64bit positive.
Artem Hlushko
11-Apr-2006 8:19 I do not need 2 different 32b and 64b versions for the same platform. I need platform-specific versions. If it is not possible to allocate >2G on some platform, why use 64b pointers, counters and bounds? For portability of scripts, it will be enough to clearly state platform restrictions at run time.
Karol Gozlinski
11-Apr-2006 8:31 If I am not mistaken, REBOL currently stores values in 16-byte slots together with datatype information, so there is not enough room for 64bit series: we need 8 bytes for the pointer, 8 bytes for the index, and where is the room for the datatype flag? Does that mean that REBOL 3 will have 32-byte slots, or preserve 16-byte slots at the cost of 32bit series only?
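To make Karol's arithmetic concrete, here are two hypothetical slot layouts in C, assuming a typical 64 bit target where pointers are 8 bytes and 8-byte fields are 8-byte aligned. These structs are illustrations only, not REBOL's actual value layout.

```c
#include <assert.h>
#include <stdint.h>

/* The layouts below assume 8-byte pointers. */
static_assert(sizeof(void *) == 8, "sketch assumes a 64-bit target");

/* 32-bit index: 4 + 4 + 8 bytes pack exactly into 16 bytes. */
typedef struct {
    uint32_t header;   /* datatype and flags */
    uint32_t index;    /* 32-bit series position */
    void *series;      /* 64-bit pointer to the series data */
} slot32;

/* 64-bit index: 4 + 8 + 8 = 20 bytes, padded to 24 for 8-byte alignment. */
typedef struct {
    uint32_t header;
    void *series;
    uint64_t index;
} slot64;
```

With a 64 bit index the slot grows to 24 bytes, 50% more memory per value, before any further rounding an allocator might add.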
Christian Langreiter
11-Apr-2006 8:31 :) 64 bit everywhere, please! and I second andreas' plea for mmapped files :)
Gabriele
11-Apr-2006 12:43 :) Karol, I think that 64 bit series would mean increasing the slot size.
romano
11-Apr-2006 18:56 4'294'967'296 (unsigned 32 bit) slots in a block need 68 GB of memory with 16 byte slot and 85 GB of memory with a 20 byte slot.
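romano's figures use decimal gigabytes: 2^32 * 16 = 68,719,476,736 bytes (about 68.7 GB, exactly 64 GiB) and 2^32 * 20 = 85,899,345,920 bytes (about 85.9 GB, exactly 80 GiB). A trivial C helper for checking such numbers, with an overflow guard; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a block holding `count` slots of `slot_size` bytes each. */
uint64_t block_bytes(uint64_t count, uint64_t slot_size)
{
    assert(slot_size == 0 || count <= UINT64_MAX / slot_size);  /* no overflow */
    return count * slot_size;
}
```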
Anonymous
11-Apr-2006 23:30 64bit Carl. It will be one more reason for my programmers to make the switch.
Karol Gozlinski
12-Apr-2006 3:15 Rebol should be able to consume more than 4GB of system memory (even current PCs are very close to that size of memory), but holding a binary or string over 4GB is a real but special situation. My vote is:
- 64bit integers,
- series addressed by a 64bit pointer but indexed by 32bit,
- a non-indexed datatype for holding strings and binary data over 4GB.
There is no reason to increase the slot size, which raises memory needs, and performance can also suffer.
Karol Gozlinski
12-Apr-2006 5:12 More about holding data over 4GB. I don't think that it should be treated just like a normal series, I mean auto-extending when appended data reaches the allocated space. Gigabytes of memory should not fly around memory, be garbage collected, etc. Just allocate memory (make big-binary! 10GB) and deal with it. Only the subset of current series management functions that make sense should work with this monster.
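A minimal C sketch of Karol's fixed-size idea: the buffer is allocated once with a 64 bit capacity and appends fail rather than reallocate. `big_binary` and its functions are hypothetical (as is `big-binary!` itself, a proposal rather than an existing REBOL datatype), and keeping such buffers out of the garbage collector is not modeled here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-capacity binary: allocated once, never auto-extended. */
typedef struct {
    unsigned char *data;
    uint64_t capacity;   /* 64-bit, so it can exceed 4GB on 64-bit hosts */
    uint64_t length;
} big_binary;

/* Allocate the full capacity up front, like the proposed `make big-binary! 10GB`. */
int big_binary_make(big_binary *b, uint64_t capacity)
{
    assert(b != NULL);
    if (capacity > SIZE_MAX)
        return -1;                 /* this process cannot address that much */
    b->data = malloc((size_t)capacity);
    if (b->data == NULL)
        return -1;
    b->capacity = capacity;
    b->length = 0;
    return 0;
}

/* Append n bytes; fails instead of reallocating when capacity would be exceeded. */
int big_binary_append(big_binary *b, const void *src, uint64_t n)
{
    assert(b != NULL && src != NULL);
    if (n > b->capacity - b->length)
        return -1;
    memcpy(b->data + b->length, src, (size_t)n);
    b->length += n;
    return 0;
}

void big_binary_free(big_binary *b)
{
    free(b->data);
    b->data = NULL;
    b->capacity = b->length = 0;
}
```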
Anonymous
12-Apr-2006 18:46 No reason I can see why the largest integer and the largest series index should be the same.
Even with 32-bit integers it would have been possible to have series represented by a pair of integers (or a similar scheme), so the largest series could have been 2**64 elements.
Best to keep the two separate, even if they do coincidentally have the same upper limit.
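A minimal C sketch of the pair-of-integers scheme: a 64 bit series position stored as two 32 bit halves and recombined on use. `index_pair` and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: a series position stored as two 32-bit halves. */
typedef struct {
    uint32_t hi;   /* upper 32 bits */
    uint32_t lo;   /* lower 32 bits */
} index_pair;

/* Recombine the halves into one 64-bit index. */
uint64_t pair_to_index(index_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Split a 64-bit index into its two halves. */
index_pair index_to_pair(uint64_t index)
{
    index_pair p = { (uint32_t)(index >> 32), (uint32_t)index };
    return p;
}
```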
Gabriele
13-Apr-2006 13:20 :) Why can't a series index be bigger than an integer? Because INDEX? returns an integer.
Scot
13-Apr-2006 13:20 :) I want to keep series fast and small for devices, but I would really like to have 64 bit integers. I like either the two-integer scheme for series or 64 bit integers and 32 bit series.
maximo
26-Apr-2006 10:03 Well, since I am in a situation where I will NEED larger than 4GB access...
I vote for having two versions. 32bits is enough for 99% of work.
Why sacrifice speed, RAM (which rebol already consumes too much of), or try finding a sweet spot which does not work for everyone?
Even if you do create a hybrid mode, please add new (64bit) datatypes to preserve as much efficiency as possible, and please DO compile a fully 64 bit version ALSO.
This version will not be downloaded often, but will allow REBOL projects to scale beyond small scale coding, if they grow.
