[Scheme-reports] eq? considered harmful. Ray Dillinger 13 Aug 2012 19:01 UTC

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 08/12/2012 09:03 AM, Mark H Weaver wrote:
> On 08/11/2012 11:43 PM, Ray Dillinger wrote:
>> Conversely, I expect, if two variables are eqv? but not eq?,
>> that mutation of one will not have a side effect of mutating the
>> other, thus the two variables will not remain eqv?
>
> I believe this is incorrect.  If two mutable values are 'eqv?',
> then they must denote the same locations.  Can you provide an
> example of a Scheme standard or major implementation that allows an
> exception this rule?

On review of the standards, you are correct.  I incorrectly
wrote eqv? when I was thinking of the definition which in
reality attaches to equal?  Apologies, this was my mistake.

This mistake was born of an expectation that the language
in the standard should make sense, when in fact it does not.

At the very least, the "more discriminating" and "finer
distinctions" misstatements in the standard should be
corrected, or moved to eqv? where they belong. eq?s
behavior is strictly less specified than eqv?s behavior.
That means it is LESS discriminating, not more
discriminating.

I suggest that the description of eq? should be modified
to say the following:

    eq? is an equivalence predicate guaranteed to have
    the same behavior as eqv? on symbols, booleans, the
    empty list, pairs, procedures, and non-empty strings
    and vectors.  It is guaranteed never to return #t
    unless eqv? would also return #t. On numbers,
    characters, empty vectors, and empty strings, it
    may return #t or #f when eqv? would return #f, but
    no meaningful distinction is indicated by any #f
    response where eqv? would return #t.  If there
    were any meaningful distinction, then the values
    would not be "operationally equivalent" and
    therefore eqv? would not return #t.

The fact that eq?s underspecification allows it to return
false negatives (as compared to eqv?) rather than false
positives should not be misrepresented as allowing it to
make "finer" distinctions or being "more discriminating"
than eqv? -- no such supposed "distinctions" are specified,
nor, given the notion of "operational equivalence" defined
by eqv?, are they useful for any algorithmic purpose.

In the Shannon sense, such "distinctions" are noise, not
information.  They should not be presented with wording such
as "more discriminating" and "finer distinctions" which
mislead people (which at least did mislead me) into
believing that they are information.  In believing that
eq? was the most discriminating equivalence predicate
I therefore had the wrong idea of what eqv? meant.

eq?, in fact, detects *LESS* useful information than
eqv?, and eqv?, not eq?, is the most discriminating
equivalence predicate and capable of making the finest
distinctions in the language.

In fact, every instance of reliance on such a false
"distinction" (one made by eq? but not eqv?) is necessarily
either a bug, or implementation-dependent code. This reduces
eq? to the status of an implementation-dependent performance
hack rather than a properly specified language feature.

Eq? is nothing but a performance hack.  It is a test for
operational equivalence, exactly the same as eqv? except
that its behavior is specified in fewer cases.

This is justified by the fact that it can be implemented
to run faster than eqv? -- but in all the cases where the
behavior of eq? is specified, eqv? needs to do no more work
than eq? as the return value is fully determined by the
comparison of two immediate bitpatterns.  This reduces the
speed distinction between them, in the cases where eq? might
have been correctly used, to the time required by a typetag
comparison and nonbranching execution of a conditional branch
instruction. This advantage is negligible.

Further, in implementations where even the most rudimentary
type analysis is done, even that speed distinction is lost
in most cases because the typetag comparison and conditional
branch instruction can be optimized away from most uses of
eqv? where eq? would have sufficed.

It is my opinion that in the long run eq? should be
deprecated from the standard.  It isn't worth the space
in the standard required to explain it, nor is it usefully
distinct from eqv? in writing correct and portable code.
As a performance hack, the benefit it provides is negligible
ranging toward none, and its availability promotes confusion
about which equivalence predicate to use, causing a class of
bugs in which eqv? or eq? are used incorrectly.

I invite people to make the following test.  The following raises
an exception whenever eq? is used in a context where its return
value is unspecified.  Type in

(define (eq? arg1 arg2)

   (cond ((not (eqv? arg1 arg2)) #f)
         ((or (symbol? arg1) (pair? arg1)
              (boolean? arg1) (procedure? arg1)) #t)
         ((and (vector? arg1) (> (vector-length arg1) 0)) #t)
         ((and (string? arg1) (> (string-length arg1) 0)) #t)
         (else (raise "eq? used incorrectly")))

Then run your code and see how many bugs you find.

				Bear

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJQKU8YAAoJEAOzWkqOibfNJroH/3/D9yVS64tDSgZRTEph7HT4
SxIcKIbtrMdgmkxs+t3DMtW08X4f0VqgIcMjQiEgeYGbkB9daTh4h+dY/l3fRpLU
wB+rES54efLUo+MC0mb5wj8n6JzXqQB/+hWQPxqlIf6cf8Zz8A+QJ+RJxNXnepV/
Fda0TPxBOF6nt37WtCS+nCs//23zj6P3Nk/QJdKKL6gevSMQ0dLAlxlo7dxb8LRs
DDt96lR2jpSu7feeCyL6M+5XYtE9rgZAf7Xq5polSSSixDtHDvzHjnkph7XV4mLZ
riFuHXlekg2LKHfWZ+RGVxu2vCwOtiWArwKFj5mKH8oODXvXOSDuvIkAQuhOi6U=
=REfA
-----END PGP SIGNATURE-----

_______________________________________________
Scheme-reports mailing list
Scheme-reports@scheme-reports.org
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports