Re: [Scheme-reports] Mutable Pairs Eli Barzilay (14 Jun 2010 05:57 UTC)
|
Re: [Scheme-reports] Mutable Pairs
Brian Harvey
(14 Jun 2010 06:32 UTC)
|
Re: [Scheme-reports] Mutable Pairs
Eli Barzilay
(14 Jun 2010 08:16 UTC)
|
[Replying on `scheme-reports' since that's what I was told is the preferred way. I am not affiliated with R7RS in any way.] On Jun 13, Brian Harvey wrote: > > From: DekuDekuplex@Yahoo.com (Benjamin L. Russell) > > Wow, you're really in favor of giving PLT the proverbial kick in > > the rear-end. > > That is not how I would characterize the situation. PLT have > chosen, as they say explicitly, to design a language that is based > on Scheme but is not Scheme. Yes, Racket is based on Scheme, but it is not (R5RS) Scheme, because it provides (much) more bindings, and because it accepts square brackets, and because it's case-sensitive, and because it has the usual list/pair bindings use immutable pairs, and mutable ones have different names. Yet, Racket is close enough -- and rich enough -- that it provides an R5RS language, strictly R5RS to the point that very little (if any) other implementations provide. > > My preference is to opt for the middle ground, and not require > > immutable pairs, but not require mutable pairs, either. ... The > > problem with doing this in Scheme is that it breaks > > backwards-compatibility with existing implementations. However, > > immutable pairs does not seem to be a bad goal for the future. > > > > It seems that immutable pairs should be a gradually introduced > > goal for the future... Scheme programmers should be encouraged to > > program in a functional style, as in such languages as Haskell and > > Clojure. Over time, as such languages that encourage programmers > > to program in a functional style as Haskell and Clojure catch on, > > more programmers should adopt the functional style, and more > > Scheme textbooks should encourage the use of immutable pairs. > > Eventually, immutable pairs could become the default, with mutable > > pairs relegated to a separate library. > > NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! > NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! NO! [Avoiding an obvious childish reply.] > I can't believe I'm reading this. > > Yes, functional programming is great. I teach functional > programming. I urge my students to program functionally whenever > they possibly can. *sigh* Just *how* many more decades should students be "urged"? How many more buffer-overrun-like diseases should the world of programming suffer for this kind of thinking to finally go the way of the GOTO? Mutable pairs as a default fundamental data structure is actually worse than GOTOs: those make your code a mess, which (in theory) affects only people who maintain that code. But having mutable values is something that is damaging also to people who use your code at the interface level. The problem is demonstrated by Matthew's post, the one that Benjamin mentioned, but in a later post he (Benjamin) made the mistake of saying: | [...] allowing proofs of correctness for Scheme programs to be | written much more easily [...] and following this line, you reply: > Over fifty years ago, John McCarthy brilliantly invented a > programming language loosely based on lambda calculus, but designed > to be used, not just to be reasoned about [...] and all of this is as if immutable pairs are something that fills a kind of a theoretical/masochistic need for self punishment in the form of correctness proofs. Something that *real* programmers never have to deal with, only "theoreticians of bondage and discipline languages". That couldn't be more *wrong*. On the side of proofs of correctness, there is nothing interesting, really. The bottom line is that this: (define (map f l) (define (loop l) (if (null? l) '() (cons (f (car l)) (loop (cdr l))))) (if (list? l) (loop l) (error "poof!"))) can be proved as a correct implementation of map *assuming* no changes to the list ever occur in the call to `f', or more generally during the dynamic extent of the call to `map'. If you're one of these theoretical people, there's not much else to see here. The problem is on the side of the *working* programmer -- the one who lives in the *real* world. These people will often make the same assumption (and as Matthew points out, they almost always do), so they write the above definition and then they're done with it, or rather they *think* that they're done. But the problem starts with the fact that Scheme is a language with first-class values -- and `map' happens to be a function that uses that: it is a higher order function. This means that you're passing a mutable value to random code -- and this random code can now break invariants that your own code (`map', in this case) depends on unless you're extra careful. As Matthew writes, given only mutable pairs, the only way to guarantee that your own pair-related invariant (such as `list?'ness) is not broken is to copy it: (define (map f l) (define (loop f l) (if (null? l) '() (cons (f (car l)) (loop f (cdr l))))) (if (list? l) (loop f (loop values l)) (error "poof!"))) In almost two decades of reading and writing Scheme code, I have not seen this done even once. I can think of several reasons for this: first, as Matthew noted in that blog post, copying means that you're changing the semantics of code, so you can't just always do it and expect no problems; but that's probably too obscure for most people who are not implementors to care about. There's also the reason of speed: this code is roughly twice slower than the above version, since it generates a junk list -- and real programmers just know that this is insane. (At this point there are some people that will almost instinctively chant "given a sufficiently smart compiler" -- but that's obviously wrong for practical compilers, since optimizing the copy away requires whole-program analysis, and most of us real-world programmers write (and compile) code in little bits that are linked together later.) No, I don't think that these arguments prevent people from writing such code. What does that job is the assumption I mentioned above: that the structure of `l' won't change during the `map' call. After all, only nice people use Scheme, right? It's the dreaded "unspecified result" factor: it's fine if this means some #<unspecified> value, but it's very wrong if it means a segfault, an OS crash, or an erroneous missile launch. I hope that it's obvious that this is the same kind of thinking that made buffer overruns the nightmare they are. To put this differently: Schemes (and Lispers) have been dreaming for a really long time about Scheme/Lisp operating systems and other large scale systems. Say that I write software for *your* bank, and I do that in Scheme -- it's a wonderful system that is immediately successful, and I open it up for 3rd-party plugins. Would you be happy to know that my API passes on your bank account as a structure made of mutable pairs to these plugins? I certainly wouldn't want that, especially not as the implementor of the bank's code who will later be liable for the damage it will lead to. The bottom line is that mutation is harder than it first looks. Bugs involving state will almost always be found in some code that is syntactically unrelated to the buggy code. I've spent too many nights tracking such bugs to just let myself go with the "assuming no bad mutation" flow and hope for the best. I've seen enough such problems to not let my students get away with "you're urged to use functional code *when* it makes sense". For most programmers, and certainly for students, it "makes sense" far too often than it should. > [...] /not/ to dictate to the programmer about how to program. [...] But all of that is not to say that mutation should be abolished -- I do consider myself a practical hacker -- I completely agree with the need to let the user (which is me) use mutation when it makes sense to do so. But *why* lump mutable data together with something as fundamental as pairs? Where's the "jewel-like"ness of that? If anything, this lumping *is* -- IMO -- an "attack on the core nature of Scheme" even if it happened right at the start; a bad legacy inherited from Lisp. After all, Scheme did separate pairs from functions, unlike its parent language -- and the result is a better language. IMO, features that can be separated into N sub-features should always be separated -- so if location-based mutation doesn't have to be bundled with pairs, then it shouldn't. My original encounter with boxes in PLT Scheme was "why?". After all, you can do exactly the same with pairs... It took me a while to realize that the important feature of boxes is that they make the mutable state explicit, and therefore more manageable. Sure, you can still have those kinds of nightmarish bugs where corrupt data flows around -- but tracking it becomes so much easier when you know of a particular kind of data that is intended to hold such state. Furthermore, when the default is immutable data and mutable data requires an explicit container like boxes (or an explicit flag in your type definitions), then -- obviously -- programmers will use these explicit devices only when they *need* to have mutable state. This is going to be an anecdote, but I think it's an important one: I was originally skeptical about going with immutable-by-default pairs in PLT; and from my perspective, the important aspect of the experiment was to see how I manage with this on my usual daily hacking -- the "efforts" of porting old code are much less important IMO (and turns out that they were all very small). In retrospect, my personal conclusions after using this for two years are obvious: I don't miss mutable pairs at all. When I need mutation, I use boxes, vectors, mutable hash-table, a mutable struct, ... or `set!'. Even more than that, I need less than a hand to count the number of times I used mutable pairs in these two years -- it just *feels* wrong now: why allow more mutation when I don't need it (and requiring pairs with both their `car's and `cdr's mutable is something that just almost never happens). So why do I think that it's an important anecdote? It's because of this lack of surprise I have in my conclusion: it's obvious that if you need to be explicit with mutation you won't just use it everywhere, and it's obvious that overall you need very little of it. So the whole dilemma (if I had one, that is) boils down to the question of preserving legacy code. At this point things become subjective. Given the minimal effort that was required, and given the benefits, I certainly think that the answer is obvious. After all, someone wise recently said: Programming languages are like sharks. When they stop moving forward, they die. Keeping legacy code as a holy grail means that (among other issues) mutable pairs are going to stick around with Scheme until it dies. > Since then, /inspired by Lisp/, many people have invented purely > functional languages. How many are there by now? Haskell, ML, > ... let's even throw in the declarative languages like Prolog. > These are elegant languages. Add them all together and they don't > have a tenth the usership of Lisp. (ML and Prolog are not pure. The combination of the ML, the Prolog, and the Haskell communities is likely bigger than the Lisp community. I can't guess what the ratio is, but there are most definitely much more than "a tenth of the usership of Lisp".) > Why? I believe it's because programmers, good programmers, will > always choose good toolkits over straitjackets. I want Scheme to be > a Lisp, not an ultra-leftist sectarian dual of Pascal. One straitjacket that I don't want to be forced to wear is the huge amount of work I have to do to get robust code written when pairs are mutable. Another straitjacket I don't want to wear is the illusion that "assuming the list doesn't change" is good enough for real code. I want Scheme to allow me to be free of these things, not some chaotic "here's a cpu and some wires, you can do whatever you want with it now" thing. -- ((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay: http://barzilay.org/ Maze is Life! _______________________________________________ Scheme-reports mailing list Scheme-reports@scheme-reports.org http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports