REBOL 3.0

Comments on: Complemented bitsets

Carl Sassenrath, CTO
REBOL Technologies
19-Oct-2009 20:34 GMT

Article #0278
Main page || Index || Prior Article [0277] || Next Article [0279] || 6 Comments || Send feedback

The A91 release will include complemented bitsets.

Note that I've written new Bitset Documentation that explains them in much greater detail.

This change makes character classes viable again in parse and find. For example, you can write:

no-space: complement charset " "
parse string [.... some no-space ...]

and also:

find string no-space

Such a line will work for all characters because the bitset is now logically complemented (not physically complemented.) That is, a flag indicates that it is a complement. Its actual bits are not modified.

This method solves the problems of virtual charsets (you can check bits that are even outside of the range of the physical bitset) but it creates a few new challenges...

For example, if I type the line above:

>> complement charset " "
== make bitset! [not bits #{0000000080}]

As you can see, the bitset! constructor is now a block. The not indicates the inverted state, and the bits indicates that the binary! is the actual bitmap, not an array of bytes. Also, note that the bitmap itself is not inverted.

Of course, you can also write this directly:

make bitset! [not " "]
make bitset! [not "abc" "123"]

The not applies to the entire bitset.

There are a few other issues to be resolved. For example, if you append to a complemented bitset, you are adding the inverted bit(s). For example:

>> b: complement charset " "
== make bitset! [not bits #{0000000080}]

>> append b "."
== make bitset! [not bits #{000000008002}]

I think we also need some method of determining if a bitset is complemented. The obvious method would be something like:

negative? bitset

although, it is stretching the definition a bit.

And finally, we must consider what happens when we and, or, and xor such bitsets.

6 Comments

Comments:

-pekr-
20-Oct-2009 0:01:48
On R3 Chat, you also mentioned potential problem with MOLDing of bitsets? Does it still count? If so, maybe blog article could mention it too for others to take it into account, while thinking about replies ...
endo
20-Oct-2009 11:22:04
complemented? is better than negative? imo.
Brian Hawley
20-Oct-2009 17:38:04
Pekr, the problem with molding has been resolved with this notation:
>> complement charset " "
== make bitset! [not bits #{0000000080}]

As for AND, OR and XOR, is there some simple algorithm for determining which storage method (regular or complemented) would be smaller? Something like comparing lengths. If so, we could generate the smallest possible result between the two choices.

If not, the operations are commutative so we could just say that the complemented state of the left bitset would be adopted in the result, and let the programmer figure out which expression they want to be on the left.

Maxim Olivier-Adlhoch
22-Oct-2009 23:19:33
would an extended definition of bitsets be too much of a performance hit?

make bitset! [bits #{00020} not bits #{0000000080} ]

it could make for tremendous space saving...

all the AND XOR OR functions would need to know is if they need to add the bits to the first or second bit pattern and ANY pattern can be thus expressed AFAIK.

the down side is speed loss within the natives which compare the bitsets however they would be merged (two checks or dynamic not xor within a persistent bitset inside the interpreter).

there could also be reduce added to this so that both are baked into one bitset returning the smallest possible bitset. people can thus care about speed or ram, however they need it.

meijeru
10-Nov-2009 11:25:37
I posted a ticket on Curecode (#1328) concerning to-binary, which appears to ignore the fact of complementing. I can see the advantages, in terms of efficiency, of keeping the complemented status separately stored, but as the APPEND and TO-BINARY examples show, the user may get greatly confused.
meijeru
16-Nov-2009 14:27:17
I have a follow-up question: how can I programmatically find out that a bitset has been specified with NOT? If I can't, I will have difficulty working with the data. Try the following code:

>> equal? charset [" "] charset [not " "] == true ;; counter-intuitive!

Post a Comment:

You can post a comment here. Keep it on-topic.

Name:

Blog id:

R3-0278


Comment:


 Note: HTML tags allowed for: b i u li ol ul font span div a p br pre tt blockquote
 
 

This is a technical blog related to the above topic. We reserve the right to remove comments that are off-topic, irrelevant links, advertisements, spams, personal attacks, politics, religion, etc.

REBOL 3.0
Updated 19-Aug-2017 - Edit - Copyright REBOL Technologies - REBOL.net