Comments on: Numeric almost-equal, equal, strict-equal, and... ?

We need to have a little discussion about numeric equality. This could get really long, so... let me boil it down.

We have these comparison functions for numbers:

almost equal (semantically equal)
exactly equal (bits are the same)
strictly equal (same datatype too)

And, we have these operators (ignoring the inverted forms):

same?
equal?
strict-equal?

Now, we have to match these up, if possible. (This is a hint that perhaps we don't have all necessary operators, but let's see...)

What we need is a table that we can look at, analyze, discuss, and then agree on as the R3 standard.

On the surface, this may seem simple... but I think not. There are three dimensions. (Remember this applies across several numeric datatypes.)

To help get you thinking about it, we could begin with:

>> 0.0 = -0.0
== true
>> 0.0 == -0.0
== true
>> same? 0.0 -0.0
== false
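Why can SAME? disagree with = and == here? In IEEE 754, 0.0 and -0.0 compare equal as numbers but differ in the sign bit. The following small sketch is in Python rather than REBOL, and only shows the underlying 64-bit patterns. `bits` is a hypothetical helper name:

```python
import struct

def bits(x: float) -> str:
    """Return the IEEE 754 double-precision bit pattern of x as 16 hex digits."""
    return struct.pack(">d", x).hex()

print(0.0 == -0.0)              # True: numeric comparison ignores the sign of zero
print(bits(0.0))                # 0000000000000000
print(bits(-0.0))               # 8000000000000000: only the sign bit differs
print(bits(0.0) == bits(-0.0))  # False: a bitwise comparison tells them apart
```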

And, then we have:

>> 0 = 0.0
== true
>> 0 == 0.0
== false
>> same? 0 0.0
== true
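One possible reading of `same? 0 0.0` returning true is a guess, not something the post confirms. The 64-bit integer 0 and the double 0.0 are both all zero bits, so a comparison of raw bits that ignores the datatype cannot tell them apart. The Python sketch below assumes that integer! is a 64-bit two's-complement value and decimal! is an IEEE 754 double. All function names are hypothetical:

```python
import struct

def int_bits(n: int) -> str:
    """Bit pattern of n as a signed 64-bit big-endian integer (assumed integer! layout)."""
    return n.to_bytes(8, "big", signed=True).hex()

def float_bits(x: float) -> str:
    """Bit pattern of x as an IEEE 754 double (assumed decimal! layout)."""
    return struct.pack(">d", x).hex()

def same_payload(a, b) -> bool:
    """Compare raw bits only, ignoring the datatype."""
    def raw(v):
        return int_bits(v) if isinstance(v, int) else float_bits(v)
    return raw(a) == raw(b)

def same_typed(a, b) -> bool:
    """Compare the datatype and the raw bits together."""
    return type(a) is type(b) and same_payload(a, b)

print(same_payload(0, 0.0))  # True: both are 64 zero bits
print(same_typed(0, 0.0))    # False: int vs float
print(same_payload(1, 1.0))  # False: 1.0 is 3ff0000000000000
```

Including the type in the comparison is what separates 0 from 0.0. Without it, the bit-level answer depends on the accident that zero happens to be encoded the same way in both types.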

So... it begins.

Can you imagine the variations? We need a volunteer!

27 Comments

Comments:

Ladislav
22-Jun-2009 18:33:53
>> same? 0.0 0
== false
So, no symmetry. Similarly for EQUAL?
Brian Hawley
22-Jun-2009 18:50:54 I would have both same? 0.0 0 and same? 0 0.0 be false. SAME? should be strictly more precise than STRICT-EQUAL? across the board.
Ladislav
22-Jun-2009 18:54:34 Comparison hierarchy: currently the "bottom of the hierarchy", the coarsest comparison function, seems to be EQUAL?, which we can keep as-is. For words, that function compares only spelling.
The problem is what the next function above it should be. It should not be STRICT-EQUAL?, since a function comparing spelling + binding should be weaker than that, allowing values of different types to be equivalent in this respect.
STRICT-EQUAL? should be the next one after that. The relation between STRICT-EQUAL? and SAME? is not hierarchical either, since SAME? does not compare datatypes, while STRICT-EQUAL? does not compare the binding of words.
Ladislav
22-Jun-2009 18:58:53 Another question: what is the public opinion on "comparing anything to anything" - i.e. allowing any-type! for both arguments?
Brian Hawley
22-Jun-2009 19:26:22 Perhaps SAME? should also compare datatypes - the type flags are part of the bits...
As for any-type! comparisons, it is interesting that the op! versions of these functions can compare unset! and error! values on the left-hand side of the operator, despite what the specs say. There is code in R3 that depends on being able to compare unset! in this way.
Ladislav
22-Jun-2009 19:58:29 In the case of numbers, I see "more space" between EQUAL? and STRICT-EQUAL? than above STRICT-EQUAL?. I think we don't have a function that is finer than EQUAL? and transitive, yet ignores datatype differences. Such a function would classify 0 and 0.0 as equivalent, while classifying 0.1 + 0.1 + 0.1 and 0.3 as not equivalent.
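The level described here (exact, transitive, type-blind) can be sketched in Python by converting both numbers to exact rationals before comparing. This is a model for illustration, not REBOL code. `exact_equal` is a hypothetical name, and it assumes finite values:

```python
from fractions import Fraction

def exact_equal(a, b) -> bool:
    """Exact numeric equality that ignores datatype.

    Fraction(x) converts an int or a finite float to its exact rational value,
    so no tolerance is involved and the relation is transitive.
    """
    return Fraction(a) == Fraction(b)

print(exact_equal(0, 0.0))                       # True: same value, different types
print(exact_equal(0.1 + 0.1 + 0.1, 0.3))         # False: the doubles differ in the last bit
print(exact_equal(2**53 + 1, float(2**53 + 1)))  # False: the float rounded to 2**53
```

Because every value maps to exactly one rational number, a = b and b = c always give a = c.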
Brian Hawley
22-Jun-2009 20:32:19 Keep in mind that a non-symmetric, non-transitive EQUAL? should probably be considered an error. EQUAL? 0 0.0 and EQUAL? 0.0 0 should both be true.
Ladislav, at what point should 0.1 + 0.1 + 0.1 and 0.3 be EQUAL?, but not equivalent using your proposed second-level numeric equivalency? Please don't say system/options/decimal-digits - the existence of that option is still an error that we need to fix, replacing it with an at-call MOLD option.
Ladislav
23-Jun-2009 5:17:15 Brian: yes to symmetry, no to transitivity. No approximate equality can be transitive, so if 0.1 + 0.1 + 0.1 is approximately equal to 0.3 (IEEE754), then this approximate equality is necessarily non-transitive.
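Here is a concrete illustration of why no tolerance-based equality can be transitive. The Python sketch uses `math.isclose` with a relative tolerance of 1e-9, which is Python's default and an arbitrary choice here, not REBOL's. Each step of a chain can stay within the tolerance while the endpoints do not:

```python
import math

def approx_equal(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Tolerance-based ("similar") equality using a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0)

a, b, c = 1.0, 1.0 + 0.8e-9, 1.0 + 1.6e-9

print(approx_equal(a, b))  # True
print(approx_equal(b, c))  # True
print(approx_equal(a, c))  # False: a ~ b and b ~ c, yet not a ~ c
print(approx_equal(0.1 + 0.1 + 0.1, 0.3), 0.1 + 0.1 + 0.1 == 0.3)  # True False
```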
Ladislav
23-Jun-2009 5:27:54 Answering the question comparing EQUAL? and STRICT-EQUAL?:
EQUAL? tests for approximate equality, meaning that it is not, and cannot be, transitive, and it does not compare datatypes.
STRICT-EQUAL? is non-transitive too:
>> strict-equal? 23/jun/2009/10:00 23/jun/2009
== true
>> strict-equal? 23/jun/2009 23/jun/2009/11:00
== true
>> strict-equal? 23/jun/2009/10:00 23/jun/2009/11:00
== false
This does not look right to me (it is tolerable for EQUAL?, though, since EQUAL? is supposed to be non-transitive, as I mentioned).
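One way the date! results above could arise is if a date without a time matches any time on the same day. The hypothetical Python model below shows that such a rule is inherently non-transitive. `loose_date_equal` is invented for illustration and is a guess at the behavior, not R3's actual code:

```python
from datetime import date, time
from typing import Optional, Tuple

# A REBOL-like date value: a calendar date plus an optional time of day.
DateValue = Tuple[date, Optional[time]]

def loose_date_equal(x: DateValue, y: DateValue) -> bool:
    """Dates must match; a missing time matches any time."""
    (dx, tx), (dy, ty) = x, y
    if dx != dy:
        return False
    return tx is None or ty is None or tx == ty

d10 = (date(2009, 6, 23), time(10, 0))
d = (date(2009, 6, 23), None)
d11 = (date(2009, 6, 23), time(11, 0))

print(loose_date_equal(d10, d))    # True
print(loose_date_equal(d, d11))    # True
print(loose_date_equal(d10, d11))  # False: not transitive
```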
Otherwise, STRICT-EQUAL? is currently transitive, but, unlike EQUAL?, it tests for datatype equality. Therefore, I see room for a transitive comparison that ignores datatype differences.
Ladislav
23-Jun-2009 5:40:50 Summary. If we want a linear hierarchy of comparison functions, then we can have a four-level linear hierarchy as follows:
*the bottom level: symmetric, non-transitive (approximate equality), ignores datatype, binding (words), alias distinctions and character case differences
*the second level: symmetric, transitive, ignores datatype differences, alias distinctions, character case differences, but takes into account binding
*the third level: like the second one, but also takes into account the datatype and the case of characters
*the top level: bit by bit equivalence, sameness (needed especially for mutable datatypes)
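For numbers only, the four levels could be modeled roughly as follows. This is a Python sketch, not REBOL. The function names borrow the proposed REBOL names but are hypothetical. The 1e-9 tolerance and the 64-bit layouts are assumptions, and only finite values are considered:

```python
import math
import struct
from fractions import Fraction

def _bits(v) -> bytes:
    """Raw 64-bit payload: two's complement for int, IEEE 754 for float (assumed layouts)."""
    if isinstance(v, float):
        return struct.pack(">d", v)
    return v.to_bytes(8, "big", signed=True)

def similar(a, b) -> bool:
    """Level 1: approximate, symmetric, not transitive, ignores datatype."""
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=0.0)

def equal(a, b) -> bool:
    """Level 2: exact value, transitive, ignores datatype."""
    return Fraction(a) == Fraction(b)

def strict_equal(a, b) -> bool:
    """Level 3: exact value and the same datatype."""
    return type(a) is type(b) and equal(a, b)

def same(a, b) -> bool:
    """Level 4: the same datatype and identical bits."""
    return type(a) is type(b) and _bits(a) == _bits(b)

LEVELS = [similar, equal, strict_equal, same]

def row(a, b):
    """Results of all four levels, coarsest first."""
    return [f(a, b) for f in LEVELS]

print(row(0.1 + 0.1 + 0.1, 0.3))  # [True, False, False, False]
print(row(0, 0.0))                # [True, True, False, False]
print(row(0.0, -0.0))             # [True, True, True, False]
print(row(1.5, 1.5))              # [True, True, True, True]
```

Each row reads as a linear hierarchy: once a level says false, every finer level also says false.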
Brian Hawley
23-Jun-2009 15:09:25 That summary sounds good to me, particularly if the last one includes datatype sameness too.
Note that the transitive comparisons (the last 3) should not consider date/times without time zones to be equivalent to ones with time zones.
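Python's standard `datetime` type already makes this choice: for equality comparisons, a naive datetime (no time zone) is never equal to an aware one. It is shown here only as a possible precedent, not as a statement about REBOL:

```python
from datetime import datetime, timezone

naive = datetime(2009, 6, 23, 10, 0)
aware = datetime(2009, 6, 23, 10, 0, tzinfo=timezone.utc)

print(naive == aware)                       # False: naive vs aware never compare equal
print(naive == aware.replace(tzinfo=None))  # True once the zone is dropped
```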
Ladislav
23-Jun-2009 17:11:42 Right, the top level should be the finest, comparing everything, including datatypes. If we decide to go this way, one of the biggest problems is: what should we name the fourth function?
Brian Hawley
23-Jun-2009 18:17:59 Possible names for your 4 levels, in the order specified: SIMILAR?, EQUAL?, STRICT-EQUAL?, SAME?. Operators for the same levels: ~=, =, ==, =?.
Maxim Olivier-Adlhoch
23-Jun-2009 20:50:05 Is it just me, or is there another level missing: knowing whether two words point to the same reference?
This is fast and useful.
Or is there already a word for that?
Maxim Olivier-Adlhoch
23-Jun-2009 20:57:49 Other equality words with different degrees of meaning (in no particular order):
coequal? duplicate? equivalent? identical? alike? symmetrical? correlates? reciprocal? synonymous?
Maxim Olivier-Adlhoch
23-Jun-2009 21:04:38 BTW, regarding my comment above about "two words point to the same reference":
The non-obvious implied meaning of my comment is
that even if they have different offsets, they still refer to the same mutable data, so the comparison would return true even when their currently evaluated values differ.
I hope what I wrote makes sense ':- /
Brian Hawley
24-Jun-2009 2:01:36 Maxim:
is it just me or is there another level... missing... knowing if two words point to the same reference.

That is the second level of Ladislav's proposed 4.
Your other suggestion sounds like
same? head a head b

Do you think we need a function for that?
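Maxim's idea is that two references can share the same underlying data at different offsets. Together with Brian's `same? head a head b`, it can be modeled in Python with a hypothetical `Series` class that holds a shared buffer and an index. REBOL series behave roughly this way, but the class and method names here are invented:

```python
class Series:
    """A hypothetical REBOL-like series reference: a shared buffer plus an offset."""

    def __init__(self, buffer: list, index: int = 0):
        self.buffer = buffer
        self.index = index

    def head(self) -> "Series":
        """A reference to the same buffer at offset 0."""
        return Series(self.buffer, 0)

    def skip(self, n: int) -> "Series":
        """A reference to the same buffer, n positions further on."""
        return Series(self.buffer, self.index + n)

    def value(self) -> list:
        """The visible part of the series from the current offset."""
        return self.buffer[self.index:]

def same(a: Series, b: Series) -> bool:
    """Same buffer and same offset."""
    return a.buffer is b.buffer and a.index == b.index

data = [1, 2, 3]
a = Series(data)
b = a.skip(2)

print(same(a, b))                # False: different offsets
print(a.value() == b.value())    # False: [1, 2, 3] vs [3]
print(same(a.head(), b.head()))  # True: both refer to the same buffer
```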
EricB
25-Jun-2009 9:51:56 @Brian Hawley (23-Jun-2009 18:17:59):
Could there perhaps also be some value in switching to completely non-hyphenated terms? For example:
SIMILAR, EQUIV, EQUAL, SAME
Of course, this would require renaming all STRICT-EQUAL? references; but if this is worthwhile, now is the time to do it.
Brian Hawley
25-Jun-2009 14:23:25 EricB, I see your point and agree, but since we would have to keep the old name for backwards compatibility, there's not much justification for the renaming. Most people use the operators anyway, and we would still need the hyphenated NOT- for the opposites, because English doesn't form antonyms consistently.
I'm a little worried about changing these functions, especially subtly. The effects of subtle changes can be tricky to track - blatant changes are much easier. The potential problem comes from changing the name of the first function, or changing the behavior of the first into the second - however you prefer to look at it.
The problem is that R2's EQUAL? wasn't very "equal", it was only similar. But if we want to add a function that is more equal, perhaps even what EQUAL? should have been in the first place (Ladislav's second level), then what do we call it, and what operator do we give it? STRICT-EQUAL? is taken, MORE-EQUAL? is silly.
Going through Maxim's suggestions, let's first eliminate "symmetrical", "correlates" and "synonymous" - they are all likely to be misspelled. The word "coequal" seems redundant (and is so in English); "reciprocal" is a bit obscure; "duplicate" has no obvious place in the hierarchy so it would be hard to remember where it falls; "identical" is too strong - to the level of SAME? or greater - and we don't need a new word there. Now, "alike" is good for something that would go before EQUAL?, as is "similar"; "equivalent" would fit in just after "equal" - the words mean exactly the same thing, but people tend to think that "equivalent" is stronger because it is a longer word - but we shouldn't abbreviate it to "equiv" if we want the psychological effect.
That gives us two options so far to name Ladislav's four levels:

ALIKE? or SIMILAR?, EQUAL?, STRICT-EQUAL?, SAME?. Opposites: NOT-ALIKE? or NOT-SIMILAR?, NOT-EQUAL?, STRICT-NOT-EQUAL?, NOT SAME?

EQUAL?, EQUIVALENT?, STRICT-EQUAL?, SAME?. Opposites: NOT-EQUAL?, NOT-EQUIVALENT?, STRICT-NOT-EQUAL?, NOT SAME?

Both have their advantages and disadvantages. For the first, the main advantage is that we can keep the operators for the current function names and still have an obvious operator for the first level: ~=. It also makes more sense semantically. The disadvantages of the first option are that it changes the meaning of EQUAL? - for the better, but still a change to a commonly used function in the way that it is commonly used - and that the first-level operator would include a shifted character that isn't in the same place on different keyboards, which could slow down programming in REBOL.
For the second option, the advantage is that it would be minimally disruptive semantically. The disadvantage is that there are no obvious operators for EQUIVALENT? or NOT-EQUIVALENT?. ~= is weaker, mathematically, than =, so we shouldn't use it. We can't use == or =? either: the operators are used more often than the action names, so reassigning them to new actions would have a greater impact than renaming the actions.
What operator, using word! characters, fits between = and == ? If we can't figure this out, we may have to go without an operator for EQUIVALENT?.
-pekr-
25-Jun-2009 16:50:23 'similar is a much better word than 'alike IMO, at least as far as my understanding of English goes :-)
I like group of names 1) better, but would not mind going with 2) either. Is 'equal? really used so much that we should worry about the change?
Brian Hawley
25-Jun-2009 17:36:31 Pekr, EQUAL? is used all of the time, though usually in the = operator form.
Current behavior:
>> 0.3 = (0.1 + 0.1 + 0.1)
== true
>> 'a = use [a] ['a]
== true

Effects of the first naming option:
>> 0.3 ~= (0.1 + 0.1 + 0.1)
== true
>> 0.3 = (0.1 + 0.1 + 0.1)
== false
>> 'a ~= use [a] ['a]
== true
>> 'a = use [a] ['a]
== false

I would prefer the first naming option as well if we were doing this from scratch. But I am not a newbie, so these distinctions matter to me. To newbies, the behavior of = might be confusing. Sometimes I have to remember that I am not necessarily the target market.
The main problem I have with my second naming option is coming up with operators for EQUIVALENT? and NOT-EQUIVALENT?. Anybody have suggestions? We can go without operators if need be...
-pekr-
26-Jun-2009 0:45:32 Hmm, I think that
0.3 = (0.1 + 0.1 + 0.1)
being false will cause lots of "huh?" among people. I don't like it being false - I would switch the meaning of the operators then...
Brian Hawley
26-Jun-2009 3:08:01 Pekr, the problem is that in memory 0.3 is actually not equal to 0.1 + 0.1 + 0.1, because IEEE 754 floating-point numbers can't represent 0.1 or 0.3 exactly, so there are rounding errors in the encoding after the 15th significant digit. So EQUAL? in R2 fakes it: instead of considering the whole number in memory (17 significant digits), it compares only 15 digits. EQUAL? in R2 is really SIMILAR?, or approximately equal (~=).
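The 15-versus-17-digit point can be checked directly. The following Python sketch shows the two doubles and their decimal forms. `equal_15_digits` is a hypothetical helper and only a rough model, which assumes that R2's EQUAL? behaves like comparing both values rounded to 15 significant digits. The real implementation may work differently:

```python
x = 0.1 + 0.1 + 0.1
y = 0.3

print(x.hex(), y.hex())      # 0x1.3333333333334p-2 0x1.3333333333333p-2
print(f"{x:.17g} {y:.17g}")  # 0.30000000000000004 0.29999999999999999
print(f"{x:.15g} {y:.15g}")  # 0.3 0.3

def equal_15_digits(a: float, b: float) -> bool:
    """Rough model of R2-style EQUAL?: compare values rounded to 15 significant digits."""
    return f"{a:.15g}" == f"{b:.15g}"

print(equal_15_digits(x, y), x == y)  # True False
```

The two doubles differ only in the last bit of the mantissa, which shows up at the 17th significant digit and disappears when rounding to 15.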
Yes, that causes a lot of huh's, and it does so throughout the industry. It's not a REBOL problem though, it's a floating-point problem. REBOL's solution of saying that something is EQUAL? when it is really SIMILAR? is a common solution - though not good enough for precise math.
This is why I favor my second naming option: EQUAL? is for newbies, and for those who don't know about the lack of precision, or more importantly don't need to know or care.
Still waiting for suggestions for an operator between = and ==.
Anton Rolls
26-Jun-2009 8:50:27 Nice discussion. I like BrianH's #1 option for the four levels: ALIKE? or SIMILAR?, EQUAL?, STRICT-EQUAL?, SAME?.
Ladislav
30-Jun-2009 7:42:46 I think that Brian's #2 is:
1) more understandable for newcomers
2) backwards compatible with R2 at the bottom level
Carl Sassenrath
30-Jun-2009 13:52 Thanks to everyone here for their inputs on this topic. I'm going to review all the comments carefully and make a decision.
I've started work on a "defining test" -- similar to what was done with finalizing the percent! datatype. I've sent it to Ladislav for his additions, and once we have a near final draft, I'll post it to a new blog for your review.
Carl Sassenrath
8-Jul-2009 14:52:03 Posting a new blog on this topic.
