Comments on: Vector Datatype

While I have your attention (from the prior note about hash! datatype), I want to insert the vector! datatype into the discussion pipeline.

As you know, vector! has been planned for some time. I became even more motivated to add vector! after the Milan DevCon, when several Swiss developers told me it was essential to their projects.

What is a vector?

In short, it is an array of a specific scalar datatype, such as integer, char, unicode, decimal, logic, and perhaps others.
In long, a vector is a first class REBOL datatype that could produce useful results from the range of native methods defined for all datatypes.

For me, #1 has a lot of appeal. It provides a way to efficiently store and access large arrays.

However, I must admit that the operations provided by #2 could be quite interesting. (The computer I used next after building Amiga was the Cray XMP48, a vector-based processor - handy for scientific applications.)

Suggested Plan

Since under-the-hood, REBOL is already based on a vector engine design, it makes sense to provide a vector datatype. But, I don't want to spend a lot of time mulling about the wide range of vector function possibilities. There are a lot of other things that need to be done in 3.0.

The plan is to provide a basic no-frills vector! datatype. In its basic form, it does not add much to the size of REBOL. It may even be possible to back-port it to REBOL 2.7 with little effort.

Then, we need to hear from you, the developers, as to what additional features you may require... then set some priorities. For example, do you want vector comparison, find, addition, multiplication, etc.

So there's something for you to think about... as you sip your Cabernet over the holidays.

19 Comments

Comments:

John Niclasen
29-Nov-2006 14:31:51 Brain-storming: (so I haven't thought it out completely)
A vector is a special case of a matrix. Matrices can be nxm in size, and a vector is then a nx1 matrix. I think, the next level is a tensor, so all vectors can be specified as nx1 matrices or nx1x1 tensors, and all matrices can be specified as nxmx1 tensors. In general, you can continue to whatever dimension, you need. Will it be meaningful to implement tensors (or multi-dimensional vectors), now we're at it? Maybe all three datatypes should be implemented? It should be triviel to implement matrix and vector datatypes, if tensors are implemented.
And yes, we need to be able to calculate with those, adding, subtracting, multiply with both scalars and other (multidimensional) vectors, get the determinant, maybe the reverse (A^-1), etc.
Carl Sassenrath
29-Nov-2006 14:39:55 Thanks. It's a good point John. We should consider that now, rather than later.
I agree that it should be trivial, as long as we can keep it based on 1-to-1 real memory mapping (no sparse matrix optimizations for large empty sections). That is, just keep it simple.
-pekr-
29-Nov-2006 15:00:31 as my understanding of this topic is really weak, I would like to ask (as noone reacted to in previous blog) - isn't John's proposition similar to what Norm proposed in hash blog comments - n-tupple datatype?
John Niclasen
29-Nov-2006 16:45:52 Carl, there are some libraries with linear algebra (or vector processing), that you maybe know already:
BLAS: http://www.netlib.org/blas/index.html
LAPACK: http://netlib.org/lapack/index.html
I found the links here: http://developer.apple.com/hardwaredrivers/ve/vector_libraries.html
John Niclasen
29-Nov-2006 16:52:23 pekr, as I understand Norm's post, he talks about a one-dimensional vector-kind-of-thing. Matrices are 2-dimensional, and you can go to whatever dimension in the math.
Ralph Paul
29-Nov-2006 18:59:14 Carl, I think you should not try to implement anything other than an efficent vector type. I would not worry about BLAS/LAPACK, ... because these libraries should be used from Rebol with a foreign function interface, not redone in Rebol (with all of the mistakes that amateurs do).
Otherwise sooner or later developers will ask to you redo Matlab or F95 in Rebol. If you really want to implement multi dimensional arrays, look at Fortran95 or later. Syntax-wise they are quite advanced. Overall, I would stick with a native vector type and an additional 'iterator/indexing' type which provides efficient access to elements. With such an indexing type one can easily emulate any kind of array: C matrices (row major), Fortran matrices (column major), create efficent memory layout (see http://www.usenix.org/publications/login/2006-10/openpdfs/johnson.pdf ) tensors, ... An interesting place to look into could be equi4.com and the work done on 'Vlerq' and Metakit. Cheers
Karol Gozlinski
30-Nov-2006 3:44:20 Does AGG have matrix algebra? Could it be exposed to rebol?
Marco (Coccinelle)
30-Nov-2006 4:02:03 Carl, please, concentrates on the goals of Rebol 3. Vectors are probably a good idea, but the remainder is more important in my opinion.

Continued Lightweightness

Programming-in-the-large

Hybrid Open Model, Open Interfaces

REBOL-in-the-Browser

REBOL/Services Built-In

Optimized Graphics

Greater Locality Support

Resident Storage Method

Mario Cassani
30-Nov-2006 8:09:11 Carl says it should be trivial but how much time does it takes?
Can't, maybe later, something to "map" and handle a block! to n-dimensions be done? A datatype! built over an existing datatype!
If this requests too much effort it should be postoponed as Marco and others suggest, especially for multidimensional and more generic vectors.
I must admit that as I am using REBOL with the students to integrate mathematics topics with programming I faced the matrices and vectors "problem" in the same terms described by John.
Anyway I created a quick and dirty function to remove a given column and row and I'll probably write a determinant function with block!s.
These have their limits as to make them easily understood I had to use a block! of block!s and I don't want to think about multidimensional vectors implementations at this level in the future...
They will also be useful when porting mathematical programs written in other languages to REBOL.
I gave up a lot of such programs porting due to matrices lack.

In a few words: I think that they are needed but do them now only if it's really quick and easy.
Brian Hawley
30-Nov-2006 9:14:56 While vector/matrix math would be useful, the part we really need is the arrays of a consistent type part. We could really use these when calling foreign functions that require them, and it could be used to speed up REBOL code (definitely rebcode) and reduce memory usage.
If it turns out that they are compatible with BLAS and LAPACK, all the better.
Andreas Bolka
30-Nov-2006 9:15:52 i think it is austrian developers to blame, as i remember chris, viktor and me bugging you about the beauty and usefulness of a native vector back in milan ;) i'm still very interested in an efficient, native vector datatype.
in regards to what john said, vectors alone should do just fine. a matrix then simply is a vector of vectors, etc. continue this to any dimension you want/need. the overhead of higher-dimensional constructs in this way instead of having associated native higher-dimensional types is typically neglectable.
regarding the implementation, simply providing the datatype along with LENGTH?, iterator loops (e.g. FOREACH, FORSKIP) and direct access functions (e.g. PICK, AT) adapted to work properly on vectors should be fine for a start. developers interested in vector stuff could then simply write their own libraries building on those basic facilities. and i, for one, would be happy to do so :)
but i'm not really aware of what will be open-sourced or easily extensible in R3. if we will be able to add custom types (as C extensions) and modify the aforementioned functions to work with such a custom type, developers should be able to create a "vector extension" all on their own.
Scot
30-Nov-2006 12:15:11 YES to vectors! Beginning to smell like the roots of an associative database!
Andreas Bolka
30-Nov-2006 13:11:21 sure thing, scot. take some vectors, put them together in a nice way and you got a column-oriented database ;) for proper enjoyment, add a few relational operations and top it off with some syntactic sugar. best served cold.
Karol Gozlinski
1-Dec-2006 4:07:37 If future rebol will support matrix datatype as a vector of vectors, vector datatype should be array of any type not only those metioned by Carl.

Rebol stores its values in memory consuming way and vector is compacted binary format for storing values of one datatype.

Rebol3 will also support new binary storage format for rebol values (Resident Storage Method ?). Maybe it could be the same format. And vector datatype will be subtype of binary storage datatype, constrained to store only one datatype.

This one gives option to use vector for storing any datatype (even vector).
Norm
1-Dec-2006 11:53:57 Just to state the question so as even newcommers can grasp where this question taps in a language, the structure aimed at here really is like the argument list of a function, the list alone [firts_arg second_arg ...]. Sometimes it is interesting to know from the start that the list is intended to have a given length - a fixed or variable number of arguments expected, as this is the meaning of the word variable. To do so, in a function, we simply state the expected arguments and optionnaly their datatypes. And logically there is an expected size limitation or a way to state that the size is variable or optional - over and above a stated expected list.
A tuple is a term list. : a directed list of n elements. A matrix is a tuple of tuple, etc. A function is a mapping on the terms of this list. So a tuple is the use of a naked function. Something like that :
A: doesnothing [name1 [string!] name2 [string!] name3 [string!]]
namesList: A ["John" "Peter" "Paul"]
AxA: [A A]
thisArray: AxA ["John" "Peter" "Paul" "Mary" "Eve" "Lina"]

An arguments list could be gathered simply for the commodity of it, stated by a model of the expected data. It could be keeped in a notation that is lighter than a function, as simple as a block is - if possible. Could it be a block of intended order? Would it be visually more interesting to know it from the notation mode of it? That is an answer that the gurus should answer. Actually, the language does not implement the datastructure of a tuple. So the basic of it: A vector or tuple would be a way to state and store a list of terms in order and of respective datatype.
Norm
1-Dec-2006 11:55:15 In Rebol, the distinction of name and value is made by the functions applied to the block. And up to now, the distinction of a block (in essence of discretionary length) and of a tuple is made thru the /skip function refinement applied on blocks []. The distinction with a block is that the operations that applies to the vector/tuple knows the length of the expected data and knows how to interpret missing terms or optional _additionnal_ length. So basically, to me, a tuple is an empty function on whom we apply given functions according to the logic of the expected datatype of the terms. That expectation is very strong in matrix calculations on real numbers.
As for the context of those words (tuple/vector), practically a vector (and so a matrix) is more understood as a list of numbers (usually reals) to do calculations for graphics. The idea of a tuple is more general as it is expected that its elements be of any possible datatype. And it can also be interpreted as a name. A number is not, usually, so interpreted. So the language semantics can make the difference between the expected discrete versus numerical contexts: a vector is a specialised (numerical) tuple.
This way, it both allows work in math (MathLab lookalikes), but also in other contexts like graph theory, predicates (in logic and linguistics), games (in sociology and economics), over and above the usual cartesian products and powers (in graphics, math, meteorology, physics). So the possible contexts are multiples and it is usefull to recognise it, so that Rebol keep its scope (as opposed to scale).
The way Rebol is designed, it is still near of linguistics and discreete math. The roots of the ver 3.0 should keep that in eye, too. Even a casual user of Rebol would acknowledge that tuples and vertors would be a great language enhancement : the notion of expected order of data items could be distinguished from the actual sequence of a block, adding to the language expressiveness at a quite elementary level.
Gregg Irwin
1-Dec-2006 16:31:03 I'll add my vote to Marco's. Vectors may be cool, but we have a number of other unfinished pieces that are more practical, and needed by a much wider audience than I think vectors will benefit at this point.
Cool, yes; someday, yes; a pressing need, no.
Carl Sassenrath
1-Dec-2006 18:03:27 Hi Gregg... trust me, this is what I call a pipelined question.
For many design issues such as this, I multiplex by requesting comments several weeks or months before expected implementation.
I like to work this way. Otherwise, I discover that I don't have sufficient user comments to make important design decisions. (The hash! datatype was just one such example - in the time it took to get enough feedback, the prototype implementation was code-complete.)
See this article for more about my design pipeline: Understanding the design flow of REBOL 3.0
Edoc
5-Dec-2006 14:15:14 [OT] Say... I'd love to see this piplining in REBOL.
E.g., the MapReduce programming model, managed by a REBOL 'mux function.
http://labs.google.com/papers/mapreduce.html

Comments on: Vector Datatype

What is a vector?

Suggested Plan

Comments:

Post a Comment: