Re: [Scheme-reports] Mutable Pairs Eli Barzilay (14 Jun 2010 05:57 UTC)
Re: [Scheme-reports] Mutable Pairs Brian Harvey (14 Jun 2010 06:32 UTC)
Re: [Scheme-reports] Mutable Pairs Eli Barzilay (14 Jun 2010 08:16 UTC)

Re: [Scheme-reports] Mutable Pairs Eli Barzilay 14 Jun 2010 05:56 UTC

[Replying on `scheme-reports' since that's what I was told is the
preferred way.  I am not affiliated with R7RS in any way.]

On Jun 13, Brian Harvey wrote:
> > From: DekuDekuplex@Yahoo.com (Benjamin L. Russell)
> > Wow, you're really in favor of giving PLT the proverbial kick in
> > the rear-end.
>
> That is not how I would characterize the situation.  PLT have
> chosen, as they say explicitly, to design a language that is based
> on Scheme but is not Scheme.

Yes, Racket is based on Scheme, but it is not (R5RS) Scheme, because
it provides (much) more bindings, and because it accepts square
brackets, and because it's case-sensitive, and because it has the
usual list/pair bindings use immutable pairs, and mutable ones have
different names.  Yet, Racket is close enough -- and rich enough --
that it provides an R5RS language, strictly R5RS to the point that
very little (if any) other implementations provide.

> > My preference is to opt for the middle ground, and not require
> > immutable pairs, but not require mutable pairs, either. ... The
> > problem with doing this in Scheme is that it breaks
> > backwards-compatibility with existing implementations.  However,
> > immutable pairs does not seem to be a bad goal for the future.
> >
> > It seems that immutable pairs should be a gradually introduced
> > goal for the future...  Scheme programmers should be encouraged to
> > program in a functional style, as in such languages as Haskell and
> > Clojure.  Over time, as such languages that encourage programmers
> > to program in a functional style as Haskell and Clojure catch on,
> > more programmers should adopt the functional style, and more
> > Scheme textbooks should encourage the use of immutable pairs.
> > Eventually, immutable pairs could become the default, with mutable
> > pairs relegated to a separate library.
>
> NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO!
> NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO!

[Avoiding an obvious childish reply.]

> I can't believe I'm reading this.
>
> Yes, functional programming is great.  I teach functional
> programming.  I urge my students to program functionally whenever
> they possibly can.

*sigh*

Just *how* many more decades should students be "urged"?  How many
more buffer-overrun-like diseases should the world of programming
suffer for this kind of thinking to finally go the way of the GOTO?
Mutable pairs as a default fundamental data structure is actually
worse than GOTOs: those make your code a mess, which (in theory)
affects only people who maintain that code.  But having mutable values
is something that is damaging also to people who use your code at the
interface level.

The problem is demonstrated by Matthew's post, the one that Benjamin
mentioned, but in a later post he (Benjamin) made the mistake of
saying:

| [...] allowing proofs of correctness for Scheme programs to be
| written much more easily [...]

and following this line, you reply:

> Over fifty years ago, John McCarthy brilliantly invented a
> programming language loosely based on lambda calculus, but designed
> to be used, not just to be reasoned about [...]

and all of this is as if immutable pairs are something that fills a
kind of a theoretical/masochistic need for self punishment in the form
of correctness proofs.  Something that *real* programmers never have
to deal with, only "theoreticians of bondage and discipline
languages".  That couldn't be more *wrong*.  On the side of proofs of
correctness, there is nothing interesting, really.  The bottom line is
that this:

  (define (map f l)
    (define (loop l)
      (if (null? l) '() (cons (f (car l)) (loop (cdr l)))))
    (if (list? l)
      (loop l)
      (error "poof!")))

can be proved as a correct implementation of map *assuming* no changes
to the list ever occur in the call to `f', or more generally during
the dynamic extent of the call to `map'.  If you're one of these
theoretical people, there's not much else to see here.

The problem is on the side of the *working* programmer -- the one who
lives in the *real* world.  These people will often make the same
assumption (and as Matthew points out, they almost always do), so they
write the above definition and then they're done with it, or rather
they *think* that they're done.  But the problem starts with the fact
that Scheme is a language with first-class values -- and `map' happens
to be a function that uses that: it is a higher order function.  This
means that you're passing a mutable value to random code -- and this
random code can now break invariants that your own code (`map', in
this case) depends on unless you're extra careful.  As Matthew writes,
given only mutable pairs, the only way to guarantee that your own
pair-related invariant (such as `list?'ness) is not broken is to copy
it:

  (define (map f l)
    (define (loop f l)
      (if (null? l) '() (cons (f (car l)) (loop f (cdr l)))))
    (if (list? l)
      (loop f (loop values l))
      (error "poof!")))

In almost two decades of reading and writing Scheme code, I have not
seen this done even once.  I can think of several reasons for this:
first, as Matthew noted in that blog post, copying means that you're
changing the semantics of code, so you can't just always do it and
expect no problems; but that's probably too obscure for most people
who are not implementors to care about.  There's also the reason of
speed: this code is roughly twice slower than the above version, since
it generates a junk list -- and real programmers just know that this
is insane.  (At this point there are some people that will almost
instinctively chant "given a sufficiently smart compiler" -- but
that's obviously wrong for practical compilers, since optimizing the
copy away requires whole-program analysis, and most of us real-world
programmers write (and compile) code in little bits that are linked
together later.)

No, I don't think that these arguments prevent people from writing
such code.  What does that job is the assumption I mentioned above:
that the structure of `l' won't change during the `map' call.  After
all, only nice people use Scheme, right?  It's the dreaded
"unspecified result" factor: it's fine if this means some
#<unspecified> value, but it's very wrong if it means a segfault, an
OS crash, or an erroneous missile launch.  I hope that it's obvious
that this is the same kind of thinking that made buffer overruns the
nightmare they are.  To put this differently: Schemes (and Lispers)
have been dreaming for a really long time about Scheme/Lisp operating
systems and other large scale systems.  Say that I write software for
*your* bank, and I do that in Scheme -- it's a wonderful system that
is immediately successful, and I open it up for 3rd-party plugins.
Would you be happy to know that my API passes on your bank account as
a structure made of mutable pairs to these plugins?  I certainly
wouldn't want that, especially not as the implementor of the bank's
code who will later be liable for the damage it will lead to.

The bottom line is that mutation is harder than it first looks.  Bugs
involving state will almost always be found in some code that is
syntactically unrelated to the buggy code.  I've spent too many nights
tracking such bugs to just let myself go with the "assuming no bad
mutation" flow and hope for the best.  I've seen enough such problems
to not let my students get away with "you're urged to use functional
code *when* it makes sense".  For most programmers, and certainly for
students, it "makes sense" far too often than it should.

> [...] /not/ to dictate to the programmer about how to program. [...]

But all of that is not to say that mutation should be abolished -- I
do consider myself a practical hacker -- I completely agree with the
need to let the user (which is me) use mutation when it makes sense to
do so.  But *why* lump mutable data together with something as
fundamental as pairs?  Where's the "jewel-like"ness of that?  If
anything, this lumping *is* -- IMO -- an "attack on the core nature of
Scheme" even if it happened right at the start; a bad legacy inherited
from Lisp.  After all, Scheme did separate pairs from functions,
unlike its parent language -- and the result is a better language.
IMO, features that can be separated into N sub-features should always
be separated -- so if location-based mutation doesn't have to be
bundled with pairs, then it shouldn't.  My original encounter with
boxes in PLT Scheme was "why?".  After all, you can do exactly the
same with pairs...  It took me a while to realize that the important
feature of boxes is that they make the mutable state explicit, and
therefore more manageable.  Sure, you can still have those kinds of
nightmarish bugs where corrupt data flows around -- but tracking it
becomes so much easier when you know of a particular kind of data that
is intended to hold such state.  Furthermore, when the default is
immutable data and mutable data requires an explicit container like
boxes (or an explicit flag in your type definitions), then --
obviously -- programmers will use these explicit devices only when
they *need* to have mutable state.

This is going to be an anecdote, but I think it's an important one: I
was originally skeptical about going with immutable-by-default pairs
in PLT; and from my perspective, the important aspect of the
experiment was to see how I manage with this on my usual daily hacking
-- the "efforts" of porting old code are much less important IMO (and
turns out that they were all very small).  In retrospect, my personal
conclusions after using this for two years are obvious: I don't miss
mutable pairs at all.  When I need mutation, I use boxes, vectors,
mutable hash-table, a mutable struct, ... or `set!'.  Even more than
that, I need less than a hand to count the number of times I used
mutable pairs in these two years -- it just *feels* wrong now: why
allow more mutation when I don't need it (and requiring pairs with
both their `car's and `cdr's mutable is something that just almost
never happens).  So why do I think that it's an important anecdote?
It's because of this lack of surprise I have in my conclusion: it's
obvious that if you need to be explicit with mutation you won't just
use it everywhere, and it's obvious that overall you need very little
of it.

So the whole dilemma (if I had one, that is) boils down to the
question of preserving legacy code.  At this point things become
subjective.  Given the minimal effort that was required, and given the
benefits, I certainly think that the answer is obvious.  After all,
someone wise recently said:

  Programming languages are like sharks. When they stop moving
  forward, they die.

Keeping legacy code as a holy grail means that (among other issues)
mutable pairs are going to stick around with Scheme until it dies.

> Since then, /inspired by Lisp/, many people have invented purely
> functional languages.  How many are there by now?  Haskell, ML,
> ... let's even throw in the declarative languages like Prolog.
> These are elegant languages.  Add them all together and they don't
> have a tenth the usership of Lisp.

(ML and Prolog are not pure.  The combination of the ML, the Prolog,
and the Haskell communities is likely bigger than the Lisp community.
I can't guess what the ratio is, but there are most definitely much
more than "a tenth of the usership of Lisp".)

> Why?  I believe it's because programmers, good programmers, will
> always choose good toolkits over straitjackets.  I want Scheme to be
> a Lisp, not an ultra-leftist sectarian dual of Pascal.

One straitjacket that I don't want to be forced to wear is the huge
amount of work I have to do to get robust code written when pairs are
mutable.  Another straitjacket I don't want to wear is the illusion
that "assuming the list doesn't change" is good enough for real code.
I want Scheme to allow me to be free of these things, not some chaotic
"here's a cpu and some wires, you can do whatever you want with it
now" thing.

--
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!

_______________________________________________
Scheme-reports mailing list
Scheme-reports@scheme-reports.org
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports