Re: [Scheme-reports] Procedural equivalence: the last debate will@ccs.neu.edu 05 Jun 2013 18:00 UTC

The behavior of eq? and eqv? on procedure arguments affects
quite a few optimizations, but three of those optimizations
have been especially important in the history of Lisp, Scheme,
and related languages.  Understanding these three optimizations
would go a long way toward improving our discussion of eq? and
eqv?.  The three optimizations I'm going to discuss are:

    A. implementing eq? by a simple pointer comparison
    B. implementing eqv? by examining the representations
        of procedures
    C. implementing delayed boxing for procedures

Optimization A is explicitly sanctioned by the IEEE standard,
the R4RS, and the R5RS.  Many implementations of Scheme,
including almost all compiled implementations, perform this
optimization.

Optimization B is very nearly explicit in the rationale given
by the IEEE standard, the R4RS, and the R5RS for the behavior
of eqv? on procedures and literals.

Optimization C is based on the eta rule of lambda calculus, and
is often implemented by compilers for languages based on the
lambda calculus.  It's similar to delayed boxing for floating
point numbers, which is probably the most important technique
available for improving the performance of floating point
calculations in Scheme.  Delayed boxing for numbers is the
main reason eq? and eqv? are allowed to behave differently
on numerical arguments.  (I should mention that aggressively
naive delayed boxing is not safe for space efficiency.  This
is more likely to be a problem with delayed boxing for numbers
than for procedures, and those who write optimizing compilers
will probably know of the problem and a possible solution, so
I won't mention this further.)

Although some Scheme compilers have performed optimization C,
most current compilers do not.  One reason this optimization
is implemented so rarely by Scheme compilers is that, beginning
with the R3RS, the behavior of eq? and eqv? on procedures has
been constrained in ways that make optimization C hard to
reconcile with optimization A.  One key constraint is that,
according to the R3RS:

    ...it is guaranteed that eqv? can recognize a procedure
    created at a given time by a given lambda expression as
    "being itself." This is useful for applications in which
    procedures are being used to implement objects with local
    state.

That constraint can be satisfied by combining optimization B
with optimization C.  If eqv? compares two procedures, each
represented as closures, by looking to see whether their code
pointers coincide *and* their environment pointers coincide,
and the delayed boxing of optimization C is done carefully
(see below), then eqv? will always recognize the equality of
every procedure with itself, even if delayed boxing causes
that procedure to be represented by two or more different
pointers.

For that combination of optimizations B and C to satisfy the
constraints of the R3RS and subsequent reports, the delayed
boxing must be done carefully.  If a closure consists of a
code pointer and an environment pointer, then delayed boxing
can arrange for the same environment pointer to be used for
each delayed boxing.  A simpler alternative is to use flat
closures, although that requires eqv? to look at every slot
of the environment part of the closure.

Combining optimizations B and C has been possible under every
Scheme report ever written.  That combination is seldom seen,
however, because the R3RS and subsequent reports required eq?
as well as eqv? to recognize every procedure as "itself".
That was the purpose of the

    (let ((p (lambda (x) x)))
      (eq? p p))                       ==>  #t

example that was introduced by the R3RS and retained by the
R4RS, IEEE, and R5RS standards.

                                * * *

That example was *not* intended to say eq? and eqv? must behave
the same on procedures.  How do I know?  Because Jonathan Rees
and I worked together on this.  Although Jonathan was primary
editor of the R3RS, I was the one who suggested restating the
semantics of eqv? in terms of operational equivalence, and I
wrote most of the prose for that specification.  I was careful
*not* to list procedures as a type on which eq? and eqv? would
be guaranteed to behave the same.  Furthermore, Jonathan and I
added the notes that were relabelled as rationales in the R4RS,
IEEE, and R5RS.  That's how I know those rationales should *not*
be misinterpreted to say eq? and eqv? are required to behave the
same on procedures.

If I recall correctly, the example above was a late addition
to mollify those who regarded eq? (rather than eqv?) as the
primary arbiter of object identity in Scheme.  Those people
didn't much care whether eqv? treated (lambda (x) x) and
(lambda (y) y) as the same procedure, because they didn't
regard eqv? as the primary arbiter of identity; they regarded
eqv? as a half-measure along the way to equal?.

For my part, I regard eq? as a fast but slightly broken version
of eqv? that's guaranteed to behave the same as eqv? on some
kinds of arguments (which were listed explicitly in the
R3RS/R4RS/IEEE/R5RS/R6RS) but not others (only some of which
were listed explicitly).  I don't remember whether the failure
to list procedures among the types on which eq? and eqv? might
not behave the same was deliberate or an oversight, but I
suspect it was deliberate.  The rule of consensus was already
beginning to be interpreted as a requirement for unanimity
among the 17 listed authors, so editors of the R*RS series were
afraid to add clarifications that might spur someone to raise
an objection.

In the R4RS/IEEE/R5RS, that semantics of procedure identity was
formalized by the (conceptual) location tags.  Although Richard
Kelsey was primary editor of the R4RS, I came up with the idea
of explaining the semantics of procedure identity in terms of
these (conceptual) location tags, and I drafted the prose for
the new specifications of eq? and eqv? as well as the paragraph
that added location tags to the specification of lambda
expressions.

That history is why I can say with some authority that the R3RS,
R4RS, IEEE, and R5RS standards do not require eq? and eqv? to
behave the same on procedure arguments.  They are constrained
more tightly than I would like, but they are not constrained to
behave the same.

                                * * *

This has been a long digression, but I believe the digression
was needed to counter a common misconception.  Having disposed
of that misconception, I can now summarize the combinations of
optimizations A, B, and C that are practical under constraints
imposed by the R5RS, the R6RS, and by my proposed semantics for
eq? and eqv?.

        R5RS    R6RS    my proposal

ABC      no      no          yes
AB       yes     no          yes
AC       no      yes         no
BC       yes     yes         yes
A        yes     yes         yes
B        yes     yes         yes
C        no      yes         no
none     yes     yes         yes

The table above represents a judgement call.  A sufficiently
weird implementation of Scheme might implement a combination
I regard as impractical.  Indeed, some combinations I count
as practical are pretty weird.  The R6RS, for example, allows
both eq? and eqv? to return false whenever either argument is
a procedure, and that weird implementation justifies several
"yes" items in the R6RS column.  Implementing eq? by making it
the same as eqv? is a lot less weird, and I assumed eq? is the
same as eqv? for all rows that don't include optimization A.
For rows "C" and "none", I assumed eq? is the same as eqv?,
and that eqv? uses a simple pointer comparison when comparing
procedures.

                                * * *

This message is long enough, and I have meetings to attend.  In
a subsequent message, I will explain why code that uses eq? to
compare procedures usually works, even though it's probably
incorrect in theory.  I will also explain what implementors
have been doing and can do in the future to allow that kind
of incorrect legacy code to work in spite of itself.

Will

_______________________________________________
Scheme-reports mailing list
Scheme-reports@scheme-reports.org
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports