Comments on: Hash! or no Hash!

Ok, so I've heard complaints from people about the hash datatype pretty much since it was added.

I think the problem is that hash is widely misunderstood. It's more than just a associative mapping method because it also provides a sequential block access method consistent with the rest of REBOL. I've always thought of it as more powerful than dictionaries or associative arrays found in other languages like Python and Perl. But, maybe I'm wrong.

Now is the time for your input. If you don't want hash, let's get rid of it. Instead we can provide some kind of simple one to one associations and find some easy way to serialize them for input and output.

Your comments are appreciated. Speak up soon.

23 Comments

Comments:

Henrik
2-May-2006 13:13 I think some useful examples (cookbook maybe?) on how to use hash properly would make more people use it. Inspiration is often key.
Anton Rolls
2-May-2006 13:30 I've never used hash! (or list!), but I'm not sure I want to throw them away. The day may come when I need them for what they're good at.
John Niclasen
2-May-2006 13:39 I've not used hash much, only some experimentation. It would be good to have an example perfectly suited for hash, and where the expected performance boost (over block) is shown.
Given such an example will maybe make it easier to judge, whether hash should stay or not.
Ryan Cole
2-May-2006 13:39 :) IMO there is nothing wrong with REBOL's hash! type. Its a great performance tool when you need it. I think the problem really is REBOL's less than optimal assocative array abilities.
In PERL you can update/append and retrieve associative arrays values very easily:
$assarray("apples")=$assarray("apples") + 1; print $assarray("apples");
This used to be ugly in REBOL, basically meaning you better have written your own associative array function to do it. Definately not REBOL's elegant norm. Good news is a recent change lets you come closer there without a special function:
assarray/("apples"): 1 + either fnd: find/skip assarray "apples" 2 [ fnd/2 ] [ repend assarray ["apples" 0] 0 ] print assarray("apples")
PERL whoops REBOL's assarray still.
One problem is REBOL will throw an error when it tries to access a non existing path element, where PERL simply returns ""/0. If REBOL assumed the programmers always right, and returned none, we could then:
assarray/("apples"): 1 + any [assarray/("apples") 0] print assarray("apples")
Adding a defualt return value would make it even more ergo. Essentially we would be par with PERL:
set-default assarray 0 ; set default value assarray/("apples"): 1 + assarray/("apples") print assarray("apples")
Though the above adds an extra line, I think it more than makes up for it in versatility.
--Ryan

Not bad using keyval, but I think what those PERLy's are looking for is this:
assarray: make associative! [initialize 0] assarray/""
Ryan Cole
2-May-2006 13:39 :) Ignore those last few lines, meandering thoughts that were supposed to be deleted.
Brian Hawley
2-May-2006 15:46 If the changes to the ordinals (first, ...) to have them return none on out-of-range were extended to path access, Ryan's problem would be solved.
Brian Hawley
2-May-2006 15:54 I like hashes, particularly since the mold/all syntax for them to be specified as literals was added. I use them for performance and occasionally for type differentiation in data structures (like for XML parsing).
However, it would be nice if they were better documented. At some point they went from hashing only string values to hashing more values, but I have no idea when this happened, for what types hashes are more efficient today, what the insertion overhead is, how they compare in speed to other searching techniques (even on a big O level), how much more memory they take than blocks, etc.
It would be nice to know more about hashes.
James Hague
2-May-2006 16:17 :) In REBOL you have what already *looks* like a dictionary/hash (a block), but you have to specifically turn it into a hash. In Python and PErl, what looks like a dictionary/hash really is one.
How about the first time you look up something in a block by string, a hash table is computed behind the scenes (obviously you have to deal with modification of the block, too).
Ingo
2-May-2006 16:47 I think the major shortcomings of hash! are: 1) missing/hidden documentation 2) people coming from other languages expect to have direct mapping of key / value pairs.
About 2) Would it help to add a possibility to define a skip-value on the hash! ? like: h: make hash! [ a b b c c d] set-skip h 2 h/a ;== b h/b ;== c set-skip h 3 h/c ; -> [c d]
I don't know what to do about duplicate keys, though.
Something else, that might help: auto append, on set access of a missing value. set-skip h 2 h/x: 'y h ; == make hash! [ a b b c c d x y]
Gregg Irwin
2-May-2006 16:50 I use them occasionally, when their performance helps (and boy does it help). I like that they're basically drop-in replacements for blocks. There were/are some inconsistencies with how blocks work, IIRC.
I'm with Ryan, and would guess that's the main complaint, there's no real shortcut syntax for how AAs/Dicts work in other langs.
I'm not sure it's best to emulate how other langs do it. Somewhere here I think I have a dialected DICT function; that's the way I think I would go anyway.
Brian Hawley
2-May-2006 17:10 When I am emulating the feature in other languages I just use hash! as-is and limit myself to the subset of its behavior that other languages support. The hash! type serves as a type difirentiator in those circumstances as well, good for translating from their data structures.
The real trick is that many of these languages use their hashes to implement their object types, making them expandable. REBOL 2 has a simpler, non-expandable object type for that, and REBOL 3 is apparently going to get a more complex object, more powerful object type, but neither is a direct match. I often use hash! for an expandable (and contractable) replacement for object when needed.
Oldes
2-May-2006 17:10 Please, do not remove hash! and list! datatypes. I'm using them as they are much more faster in special cases than pure blocks! Who cares that someone is not using them.
Oldes
2-May-2006 17:10 If someone needs an example, when it's better to use hash!, I can give one: just imagine that you have a library with many books and want to count word frequency, which is stored as ["someword" 3234 ...] The hash! is much more faster then block! in this case. Just try it.
Ladislav Mecir
2-May-2006 23:10 Hashes do more work than is necessary for an associative array: they maintain a relation between index and key, while associative arrays just need to maintain a relation between a key and a value. Therefore I suggested, that associative arrays can be faster than hashes. Re Oldes: your example is exactly the case when an associative array would be both faster and more comfortable. Re Brian: associative array can play a role of expandable and contractable object replacement more comfortably and faster than hashes do. Summary: hashes are better than associative arrays only in situations when you need multiple keys associated with a value, but I doubt anybody uses hashes that way.
Ladislav Mecir
2-May-2006 23:18 Re the hash properties. I think, that INDEX? is O(1) on hashes, while FIND is "almost O(1)". CHANGE/ONLY is O(1) too as well as SKIP. As opposed to these, both INSERT and REMOVE are "slow" (slower than for blocks). Corrections welcome.
Ladislav Mecir
2-May-2006 23:35 sorry, CHANGE/ONLY cannot be O(1), my error.
Brian Hawley
2-May-2006 23:51 Hey, I'm willing to go with an associative array instead of hash! as it is now if it more efficient, particularly since contexts seem to be upgrading to more than a fixed-length associative array. Sure, you'd lose a little functionality, but if you weren't using that functionality anyways and you get more efficiency, alright.
The only current way that I use hashes that would require some changes would when it is used as an index. Maybe there's a better way using an assoc - we'll see.
Thanks for the specs.
Volker
3-May-2006 0:27 From a semantic POV, why hashes, we have blocks. They do the same, and the order herlps while debugging.
From a performanc POV, i develop with blocks and when find gets slow i declare them as hashes easily later.
So keep hashes. Add auto-append. If there is a limited but quick hash too, good. Butg dont drop the old please.
GreG
3-May-2006 11:59 I like Hash, it's simple.
Sunanda
3-May-2006 19:56 Like others say, I use them for performance.
Though that's mainly for high speed associative mapping. I've never thought there was any other reason to use hash.
If hash has, as you hint, other capabilities, then I'd love to hear about them.
But please don't take away the performanec boost they give for large block lookups.
Carl Sassenrath
8-May-2006 20:56 A lot of great comments. Thanks.
PS: I thought I did post a really good detailed document about using HASH! efficiently. Hmm... Let me see if I can find it myself.
Scot Sutherland
25-Sep-2006 18:45:57 I also vote to keep hash!, I love to simply replace a block with a hash and watch the performance jump.
Deepali
11-Jan-2008 4:00:19 I want all the information of hashes mostly about how to remove duplicate keys in hash. I found few pages on internet but i want detailed explanation with examples.

Comments on: Hash! or no Hash!

Comments:

Post a Comment: