Re: [Scheme-reports] Padding/placeholders (hashes) in numerical syntax Peter Bex 04 Sep 2011 15:28 UTC
On Wed, Aug 10, 2011 at 03:39:22PM -0400, John Cowan wrote:
> Peter Bex scripsit:
>
> > While we're on the topic of numerical stuff in the standard, I'd like
> > to ask why the "padding"/placeholder digits for numbers (# characters
> > instead of digits inside a number) are kept around.
>
> I agree that it's bogus.  The ballot question asked about the #s from R5RS and
> the mantissa-width specifier (|nnn) from R6RS: the first, the second, neither,
> or both.  The vote was inconclusive, so the R5RS status quo was kept.

Later I was told that this might be reconsidered if it was shown to be
really difficult to implement.

I've now written a numerical syntax torture test, initially to check
the correctness of string->number in Chicken's "numbers" egg.  I also
ran the test on several other major Schemes and found that
implementations often accept invalid syntax related to padding and to
decimal syntax (numbers containing exponents or decimal points).

The latest version of this test can be found here:
http://bugs.call-cc.org/browser/release/4/numbers/trunk/tests/string-conversion.scm
(you can download it through one of the links at the bottom of the page)

To get it to run you'll need to make a few modifications in the prelude,
and for Scheme48 and MIT Scheme you will need to comment out a few
tests to prevent them from raising an error.  Please ignore the ugly
macro and focus on the tests themselves :)

The only Schemes that pass the tests with flying colors are Gambit (and
of course Chicken with the numbers egg now, since the test was written
to debug it) and Guile, if you ignore that it accepts syntax like
"+nan.1234".  Hopefully Chicken core will soon include this test too,
and get fixed accordingly.

Attached are a few files containing the output of the following
Scheme systems (all tested on 64-bit NetBSD unless noted otherwise):
- chicken-with-numbers-egg.txt.gz: Chicken 4.7.0 with numbers egg 2.7
- * chicken-with-old-numbers-egg.txt.gz: Chicken 4.7.0 with numbers egg 2.6.1
- chicken-without-numbers-egg.txt.gz: Chicken 4.7.4 core (without loading "numbers")
- gambit.txt.gz: Gambit 4.6.0
- gauche.txt.gz: Gauche 0.9.1
- guile.txt.gz: Guile 1.8.8
- * mit-scheme.txt.gz: MIT Scheme 7.7.90.+  (tested on 32-bit Linux)
- racket.txt.gz: Racket 5.0.1
- * scheme48.txt.gz: Scheme48 1.8

For the entries marked with an asterisk above, I had to comment out some
tests because they raised errors that stopped the run, and there's no
common way to catch errors that all these Schemes support.  MIT Scheme,
for example, incorrectly accepts input like "1/#" and then gives a
division by zero error, so its "Everything OK" output is misleading!

Scheme48 doesn't allow complex numbers with NaN or Infinity, so I had to
comment out all tests related to that, and a few others IIRC.
Chicken's old numbers egg had different errors triggered by the way
it "parsed" numbers with a regex and then tried to use #f as a number.

Guile can run the tests, but you'll need to invoke it as
"guile string-conversion.scm".  When I used "guile -l string-conversion.scm"
or ran (load "string-conversion.scm") from the REPL, I got a stack overflow
error on my x86_64 box running NetBSD.  Your mileage may vary, depending
on architecture & OS.

To run the tests with Gambit, you will need to supply -:s on the
command line to enable syntax-rules support.

Outputs of other Schemes would be interesting to see as well, and
suggestions for new testcases are welcome too!

As you can see from the outputs, the "errors" in these Schemes are mostly
related to padding syntax, especially gems like "#x1#+1#i" or
"#e1#/2".  Surprisingly, there are also lots of errors related to
accepting decimal syntax in bases other than 10 (especially in Racket).
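To make the padding rules concrete, here is a minimal Python sketch (not Scheme, and not taken from the test suite) of a reader for the unsigned-real part of the R5RS number grammar, <ureal R>.  It assumes that each padding '#' reads as the digit 0, and it leaves out signs, complex numbers and the #e/#x prefixes, which are passed in as arguments instead; `parse_ureal` and `_ureal_pattern` are hypothetical names.  Under those assumptions, "1#/2" read with exactness "e" (the body of "#e1#/2") comes out as exact 5, "1/#" and "1#1" are rejected, and decimal syntax such as "1.5" is rejected in base 16.

```python
import re
from fractions import Fraction

# Digit classes per radix (matched case-insensitively).
_DIGIT = {2: "[01]", 8: "[0-7]", 10: "[0-9]", 16: "[0-9a-f]"}


def _ureal_pattern(radix):
    """Regex for R5RS <ureal R>, including '#' padding."""
    d = _DIGIT[radix]
    uinteger = f"{d}+#*"  # one or more digits, then optional '#' padding
    alts = [uinteger, f"{uinteger}/{uinteger}"]
    if radix == 10:  # <decimal R> only exists for R = 10
        suffix = "(?:[esfdl][+-]?[0-9]+)?"  # optional exponent
        alts += [
            uinteger + suffix,
            r"\.[0-9]+#*" + suffix,
            r"[0-9]+\.[0-9]*#*" + suffix,
            r"[0-9]+#+\.#*" + suffix,
        ]
    return re.compile("|".join(f"(?:{a})" for a in alts), re.IGNORECASE)


def parse_ureal(text, radix=10, exactness=None):
    """Value of an unsigned real in the given radix, or None if invalid.

    exactness is None (no prefix), "e" (#e) or "i" (#i).
    Assumes each padding '#' reads as the digit 0.
    """
    if not _ureal_pattern(radix).fullmatch(text):
        return None  # e.g. "1/#", "#1", "1#1", or "1.5" in base 16
    digits = text.replace("#", "0")
    if "/" in digits:
        num, den = (int(part, radix) for part in digits.split("/"))
        value = Fraction(num, den)  # raises ZeroDivisionError for "1/0"
    elif radix == 10:
        # Fraction understands decimal points and 'e' exponents; map the
        # other R5RS exponent markers (s, f, d, l) onto 'e' first.
        value = Fraction(re.sub("[sfdl]", "e", digits, flags=re.IGNORECASE))
    else:
        value = Fraction(int(digits, radix))
    # Without a prefix, '#', a decimal point or an exponent means inexact.
    marked_inexact = "#" in text or (
        radix == 10 and re.search("[.esfdl]", text, re.IGNORECASE) is not None
    )
    if exactness == "e" or (exactness is None and not marked_inexact):
        return value
    return float(value)


print(parse_ureal("1#/2", exactness="e"))  # body of "#e1#/2"; prints 5
print(parse_ureal("1/#"))                  # prints None
print(parse_ureal("1.5", radix=16))        # prints None
```

Even this cut-down version needs separate rules for where '#' may appear, a special case for base 10, and exactness tracking for padded digits, which lines up with the categories of errors seen in the outputs.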

I hope that these results are convincing enough to start a new vote to
get rid of this needless complication of Scheme syntax.  Personally, I'd
also vote against the R6RS "123|45" notation, as there's really no point
in specifying the "precision" of a number if there's no way to print
out a number with reduced precision.
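For comparison, R6RS describes x|p as the best binary floating-point approximation of x using a p-bit significand, so "1.1|53" should denote the IEEE double nearest to 1.1.  A rough Python sketch of that rounding step, assuming round-half-to-even and ignoring exponent limits (`round_to_bits` is a hypothetical helper, not any implementation's API):

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round x to the nearest value with a p-bit significand (ties to even).

    Hypothetical helper: ignores exponent range limits (no overflow,
    underflow or subnormals), unlike a real flonum reader.
    """
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Initial guess for e so that x / 2**e has about p bits before the point.
    e = x.numerator.bit_length() - x.denominator.bit_length() - p
    scaled = x / Fraction(2) ** e
    # Normalise so that 2**(p-1) <= scaled < 2**p.
    while scaled >= 2 ** p:
        scaled /= 2
        e += 1
    while scaled < 2 ** (p - 1):
        scaled *= 2
        e -= 1
    # round() on a Fraction rounds half to even and returns an int.
    return sign * Fraction(round(scaled)) * Fraction(2) ** e


# "1.1|53" versus "1.1|10" under this model:
print(round_to_bits(Fraction(11, 10), 53) == Fraction(1.1))  # prints True
print(round_to_bits(Fraction(11, 10), 10))                    # prints 563/512
```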

One final question: Why was the +nan.0 syntax copied from R6RS but not
the -nan.0 syntax?  I think it makes sense to allow -nan.0 as well.
(I haven't added it to the tests yet for that reason; IMO it's a
gratuitous incompatibility with R6RS.)

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
							-- Donald Knuth
_______________________________________________
Scheme-reports mailing list
Scheme-reports@scheme-reports.org
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports