[Scheme-reports] Procedural equivalence: the last debate John Cowan (02 Jun 2013 04:04 UTC)
|
Re: [Scheme-reports] Procedural equivalence: the last debate
Peter Bex
(02 Jun 2013 10:49 UTC)
|
Re: [Scheme-reports] Procedural equivalence: the last debate
Alex Shinn
(02 Jun 2013 12:32 UTC)
|
Re: [Scheme-reports] Procedural equivalence: the last debate
John Cowan
(02 Jun 2013 13:56 UTC)
|
Re: [Scheme-reports] Procedural equivalence: the last debate
Alex Shinn
(03 Jun 2013 03:22 UTC)
|
Re: [Scheme-reports] Procedural equivalence: the last debate
Ray Dillinger
(03 Jun 2013 17:04 UTC)
|
== Introduction == This is the kickoff message for a discussion of a proposal put forward during the voting period on the ninth draft of R7RS-small. The proposal is to reverse one of the changes to R5RS, namely the rules for `eqv?` and `eq?` when applied to procedures. In the ninth draft, as in R6RS, `eqv?` is required to return `#f` if its arguments "are procedures that would behave differently (return different values or have different side effects) for some arguments" (p. 30). However, in all other cases where both arguments are procedures, `eqv?` may return either `#t` or `#f`. In R5RS, to which the proposal would return, the rule for returning `#f` is the same as the one above. However, there is an additional rule: if both arguments to `eqv?` are procedures with the same location tag, then `eqv?` must return `#t`. A location tag is defined in section 4.1.4 of R5RS thus: "Each procedure created as the result of evaluating a lambda expression is (conceptually) tagged with a storage location." In particular, `(let ((p (lambda (x) x))) (eqv? p p))` must evaluate to `#t` by R5RS rules, but can evaluate to either `#t` or `#f` by R6RS rules. In the ninth draft, `eq?` is allowed to differ from `eqv?` when both arguments are procedures. The proposal is to withdraw this permission. == Previous reports == Here is an account of the definition of `eqv?` and `eq?` in all previous Scheme reports and standards: R0RS and R1RS are silent, as all first-order procedures come from MacLisp. In R2RS (p. 24), `eqv?` is defined on booleans, symbols, pairs, and vectors (presumably strings are meant too, but they are not mentioned) as well as exact numbers. Under `eq?` it says that characters can be compared with `eqv?`. Nothing is said about procedures. In R3RS (p. 13), `eqv?` is defined in terms of an approximation to a notion called "operational equivalence", which is in turn defined on booleans, symbols, numbers, characters, pairs, vectors, strings, the empty list, and procedures. For all non-procedure types, `eqv?` implements o.e. However, `eqv?` is allowed to return #f even on o.e. procedures that were created at different times (sc. by different invocations of `lambda`). In IEEE (p. 23, or p. 31 of the PDF), in R4RS (p. 13), and in R5RS (p. 17), `eqv?` is defined on the same types, both positively and negatively for the first time, discarding the explicit notion of o.e. These versions also talk about locations rather than mutation for pairs, etc., and this language is extended to procedures: if they have the same (virtual) locations, they are `eqv?`. On the other hand, if they behave differently, they are not `eqv?`. In R6RS (p. 37), `eqv?` defers to `boolean=?` and to `symbol=?`, and the rules for pairs etc. are extended to bytevectors, hashtables, records, and ports. (Record-type descriptors are also specified, deferring to library section 6.3.) Procedures that behave differently must return #f, but in no case is procedure equivalent required to return #t. In all reports, `eq?` is the same as `eqv?` except on numbers and characters. Although an example showing that `(eq? p p)` returns #t is present from R4RS onwards, nothing is said about procedures in the prose. == Advantages of the R6RS semantics == In essence, in R6RS procedures do not have identity: certain pairs of procedures can be distinguished, but it is never the case that two references to a procedure can be reliably be said to be the same procedure. This means that an implementation is free to coalesce two evaluations of lambda if they are in effect the same, as well as to break a single evaluation into two actions. For example, it is possible for an implementation to rewrite calls on `car`, which is stateless, into calls on `%internal-car`, which the compiler knows how to inline. All references to `car` in operand position are then rewritten as `(lambda (pair) (%internal-car pair))`. Consequently, `(eqv? car car)` will naturally return `#f`, since two lambdas are now involved where in the source there was only one. On the assumption that calls are far more frequent than operand references, this is a win. How much of a win it is, is a question. == Advantages of the R5RS semantics == In Scheme, procedures have always had identity, and a large body of code stretching from the original Lambda: the Ultimate papers up to the present day depends on it. Unfortunately, breaking the first-class status of procedures as objects causes a number of techniques previously possible to become unreliable. (The following list was prepared by Alex Shinn.) - Using procedures as lightweight objects or actors - SRFI-17 (generalized set! needs to lookup procedure setter) - TinyCLOS generics (needs to understand method identity) - Elisp `add-hook' equivalents (needs to avoid duplicates) - Fast paths such as SRFI-1 lset-* optimized for eq?, or SRFI-95 sort optimized for <. See https://wiki.call-cc.org/eggref/4/special-case for the general case of special cases. - Caching with procedures as (part of) the key. Given a large enough cost and high enough cache hit ratio, caching is an arbitrarily large speedup. - Memoization. The (chibi parse) parser combinator library memoizes combinators to achieve packrat parser-like performance. The test case at https://code.google.com/p/chibi-scheme/source/browse/tests/parse-tests.scm#121 runs in exponential time with memoization disabled, and linear time otherwise. In short, Scheme without procedure identity is a different language altogether. == The uncertainty of the R6RS semantics == While optimizations of the kind that R6RS permits are important to encourage programmers to use higher-order abstractions and avoid manual inlining. However, it is not clear how often these optimizations are applicable, and how much speedup they actually provide, in typical Scheme code. Furthermore, the programmer will have little intuition about when the optimization is actually a pessimization due to code duplication and overflowing the instruction cache or for other reasons. In addition, some of the optimization can be provided using a cache of lambdas; escape analysis is also useful in improving, if not fully optimizing, such invocations. == The detailed proposal == The proposed language is *exactly* that of IEEE, R4RS, and R5RS, with the following exceptions already in place in the ninth draft: 1) Symbol equivalence defers to the new (R6RS) `symbol=?` predicate; no operational difference. 2) To equivalence of pairs, strings, and vectors is added bytevectors and records on exactly the same terms. 3) The R6RS-based rumble-bumble about inexact numbers that ensures that `eqv?` is undefined when both arguments are NaN, and that `(eqv? 0.0 -0.0)` evaluates to `#f` in systems that distinguish them otherwise. Comments are solicited. -- Long-short-short, long-short-short / Dactyls in dimeter, John Cowan Verse form with choriambs / (Masculine rhyme): cowan@ccil.org One sentence (two stanzas) / Hexasyllabically Challenges poets who / Don't have the time. --robison who's at texas dot net _______________________________________________ Scheme-reports mailing list Scheme-reports@scheme-reports.org http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports