Re: [Scheme-reports] Cycle detection problems: #442, #338, "equal?" David A. Wheeler (30 Aug 2012 03:01 UTC)

Re: [Scheme-reports] Cycle detection problems: #442, #338, "equal?" David A. Wheeler 30 Aug 2012 03:00 UTC

First, thanks very much for replying, I appreciate it.  I really *do* like a lot about R7RS, which is why I comment on it (like everyone, I want it to be *good*).  Anyway...

Alex Shinn:
> Your comment is overly long so I'll summarize.  You
> believe that allowing write to first output many megabytes
> or gigabytes of garbage before detecting a cycle and
> raising an exception could be done more efficiently than
> detecting the cycle when it first occurs.  I'm not convinced
> of this - the output itself has an arbitrarily large overhead
> either on the runtime (for string/in-memory ports) or the
> OS (for files).  If you set the recursion threshold too
> low it's no different from checking for cycles initially,
> if too high it's an easy way to explode a lot of memory
> or disk.

Is there a reason the spec has to *forbid* this alternative?  Much depends on how often cycles occur.  Historically write didn't support this anyway, so presumably this is a really really rare case.  Lots of systems optimize for the "normal" case.   I'm not asking that such approaches be *required*, just not *forbidden*.

> Furthermore detecting cycles (in advance or on the fly)
> can be done with a eq-hash-table with O(n) time and
> O(d) space, where d <= n is the length of the longest
> cycle.  Thus asymptotically it could be argued this is
> faster than your proposal.

Big-O hides the constants, which concern me.  And I see no eq-hash-table in the spec to do that anyway; did I miss it?  I didn't think R7RS had hash tables.  Without a portable hash table, many smaller implementations (especially if they want portability) are going to create sloooow list implementations.

> I'm also uncomfortable with `write' raising exceptions
> where it never did before.

Okay, that makes perfect sense.  In that case, just have it terminate.

> > In addition, there seems to be a major gap in the definition of "equal?" involving cycles.  It currently says, "Even if its arguments are circular data structures, equal? must always terminate."  That additional sentence adds a MAJOR new performance and implementation cost on "equal?" compared to R5RS.
>
> No it does not, this can be done essentially for free,
> see http://www.r6rs.org/r6rs-editors/2006-February/000969.html

Ah, thanks so much for the reference, I appreciate it.

I read the email, and I'm even more confused.  That email doesn't seem to show "free" at all, indeed, it seems to soundly disprove it.  The equiv? operator (a terminating equal?, as proposed) appears to take 300% (3 times) as much time or much more in most cases. A 5-10% overhead is probably fine, but a 300% to 1350% increase in execution time is pretty damning. Perhaps that's just a difference between an optimized built-in vs. a portable implementation?  Or am I misunderstanding the email (easy, since I'm reading it out of context)? If I understand this email correctly, that is concerning.  Interestingly, the email *also* proposes, as I do, that a *different* procedure name be used for a procedure with this different behavior ("equiv?" in this case).  I don't know the discussions that happened since then, though - was that cleared up later?

In any case, the spec should say *what happens* when a cycle is detected if "equal?" or whatever it's called does cycle detection. I suggest that it return #f if it detects a cycle.

Thank you very much for your time.

--- David A. Wheeler

_______________________________________________
Scheme-reports mailing list
Scheme-reports@scheme-reports.org
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports