[Scheme-reports] Useless ambiguities in number specification, section 7.1.1 David A. Wheeler 18 Jan 2013 17:25 UTC

The syntactic specification for numbers in the Scheme R7RS draft 8
section 7.1.1 (lexical structure) has unnecessary useless ambiguities that I think should be fixed.

For one thing, there is a useless ambiguity in the prefix_10 production
(for base 10 numbers) when both "exactness" and "radix_10" are empty.
Currently the exactness and radix_10 productions accept "empty",
but the "prefix_10" production is defined as:
  prefix_10 ->  radix_10 exactness | exactness radix_10
That means a number with empty radix_10 and empty exactness has a
useless ambiguity. For example, given "7", is its prefix
no-radix no-exactness, or no-exactness no-radix?

The "ureal_10" definition also has two useless ambiguities.  It's defined as:
  ureal_10 -> uinteger_10 | uinteger_10 '/' uinteger_10 | decimal_10
Problems:
1.  The digit "1" can be a uinteger_10 or a decimal_10, again, a
    useless ambiguity.
2.  The suffix is optional, but a suffix can be empty anyway,
    creating another useless ambiguity (should I skip the suffix,
    or take the "empty" suffix?).

These ambiguities are unnecessary, and there are many ways to clean them up.
To me, the obvious ways involve changing "exactness" and "radix_10"
to NOT include the empty cases, and then modify the other productions
to use them as needed (making them optional where they are optional).

A simple tweak for the exactness/radix case is to modify "number" to be:
number -> exactness (num_2 | num_8 | num_10 | num_16)
          | num_2 | num_8 | num_10 | num_16
          | unprefix_num_10
and change the rest of the references to "exactness" and "radix_10"
so that their invocation is optional, like this:
  prefix_2 -> radix_2 exactness?
This would, however, allow two exactness statments, e.g., #i#b#e1.

The exactness/radix case can be completely resolved, while still
forbidding #i#b#e1, by changing "number" more substantially so that
it unambiguously states where exactness and radix_ are allowed.
Here's an example, again assuming that "exactness" and "radix_..."
no longer permit an empty clause:
number -> exactness
            (radix_2 complex_2
             | radix_8 complex_8
             | radix_10 complex_10
             | radix_16 complex_16
             | complex_10)
          | radix_2 exactness? complex_2
          | radix_8 exactness? complex_8
          | radix_10 exactness? complex_10
          | radix_16 exactness? complex_16
          | complex_10
If you do this, you no longer need the num_R and prefix_R productions.

The *productions* would be simpler if exactness (if specified) were
required first, instead of allowing it to be either before and after
the radix. However, I presume that at this point in the spec development
you don't want to remove user capabilities.

For ureal_10, it might be better to rewrite ureal_10 specially
so that it embeds decimal_10 and just directly uses the
non-terminal suffix. One try:
ureal_10 ->
  uinteger_10 ('/' uinteger_10 | suffix | period digit_10* suffix)
  | period digit_10+ suffix

This group can probably come up with even better ways to resolve these.
I'm just hoping to justify that (1) there's a problem and
(2) there are ways to easily solve it.

In case it helps, below is an early attempt at using ANTLR version 3
to faithfully describe Scheme R7RS draft 8.

--- David A. Wheeler

=== An attempt to define Scheme numbers in ANTLR version 3 as LL(1) ===

// The following is structured so that exactness (if stated)
// can precede *OR* follow the radix (base) declaration, while
// (1) forbidding two exactness statements (only 1/number),
// (2) allowing exactness and radix declarations to both be optional without
//     creating a useless ambiguity.
// Instead of calling "NUM_2", we explicitly make statements about
// exactness.

NUMBER : (EXACTNESS                // Exactness-first case
            (RADIX_2 COMPLEX_2
             | RADIX_8 COMPLEX_8
             | RADIX_10 COMPLEX_10
             | RADIX_16 COMPLEX_16
             | COMPLEX_10))
          | RADIX_2 EXACTNESS? COMPLEX_2
          | RADIX_8 EXACTNESS? COMPLEX_8
          | RADIX_10 EXACTNESS? COMPLEX_10
          | RADIX_16 EXACTNESS? COMPLEX_16
          | COMPLEX_10 ;

fragment I : 'i' | 'I' ;

// fragment NUM_2 : PREFIX_2 COMPLEX_2 ;
fragment COMPLEX_2 : REAL_2
  ('@' REAL_2
   | ('+' | '-') UREAL_2? I
   | INFNAN I )?
  | ('+' | '-') UREAL_2 I
  | INFNAN I
  | ('+' | '-') I ;
fragment REAL_2 : SIGN UREAL_2 | INFNAN ;
fragment UREAL_2 : UINTEGER_2 ('/' UINTEGER_2)? ;
fragment UINTEGER_2 : DIGIT_2+ ;
// fragment PREFIX_2 : RADIX_2 EXACTNESS? ;

// fragment NUM_8 : PREFIX_8 COMPLEX_8 ;
fragment COMPLEX_8 : REAL_8
  ('@' REAL_8
   | ('+' | '-') UREAL_8? I
   | INFNAN I )?
  | ('+' | '-') UREAL_8 I
  | INFNAN I
  | ('+' | '-') I ;
fragment REAL_8 : SIGN UREAL_8 | INFNAN ;
fragment UREAL_8 : UINTEGER_8 ('/' UINTEGER_8)? ;
fragment UINTEGER_8 : DIGIT_8+ ;
// fragment PREFIX_8 : RADIX_8 EXACTNESS? ;

// fragment NUM_10 : PREFIX_10 COMPLEX_10 ;
fragment COMPLEX_10 : REAL_10
  ('@' REAL_10
   | ('+' | '-') UREAL_10? I
   | INFNAN I )?
  | ('+' | '-') UREAL_10 I
  | INFNAN I
  | ('+' | '-') I ;
fragment REAL_10 : SIGN UREAL_10 | INFNAN ;
// UREAL_10 has to be handled carefully.  The spec has this:
//    fragment UREAL_10 : UINTEGER_10 ('/' UINTEGER_10)? | DECIMAL_10;
//    fragment DECIMAL_10 : UINTEGER_10 SUFFIX
//      | PERIOD DIGIT_10+ SUFFIX
//      | DIGIT_10+ PERIOD DIGIT_10* SUFFIX ;
// which is ambiguous; for UREAL_10, "1" can match branch 1 or 2.

fragment UREAL_10 :
  /* SUFFIX can be empty, so making it optional creates unnecessary ambiguity */
  UINTEGER_10 ('/' UINTEGER_10 | SUFFIX | PERIOD DIGIT_10* SUFFIX)
  | PERIOD DIGIT_10+ SUFFIX ;

fragment UINTEGER_10 : DIGIT_10+ ;
// fragment PREFIX_10 : RADIX_10 EXACTNESS? ;

// fragment NUM_16 : PREFIX_16 COMPLEX_16 ;
fragment COMPLEX_16 : REAL_16
  ('@' REAL_16
   | ('+' | '-') UREAL_16? I
   | INFNAN I )?
  | ('+' | '-') UREAL_16 I
  | INFNAN I
  | ('+' | '-') I ;
fragment REAL_16 : SIGN UREAL_16 | INFNAN ;
fragment UREAL_16 : UINTEGER_16 ('/' UINTEGER_16)? ;
fragment UINTEGER_16 : DIGIT_16+ ;
// fragment PREFIX_16 : RADIX_16 EXACTNESS? ;

fragment INFNAN : ('+' | '-') ('inf.0' | 'nan.0') ;

fragment SUFFIX : /*empty*/ | EXPONENT_MARKER SIGN DIGIT_10+ ;
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l' |
                           'E' | 'S' | 'F' | 'D' | 'L' ;
fragment SIGN : /*empty*/ | '+' | '-' ;
// NOTE: This *requires* non-empty.
fragment EXACTNESS : '#i' | '#I' | '#e' | '#E' ;

fragment RADIX_2 : '#b' | '#B' ;
fragment RADIX_8 : '#o' | '#O' ;
// NOTE: This *requires* non-empty.
fragment RADIX_10 : '#d' | '#D';
fragment RADIX_16 : '#x' | '#X';
fragment DIGIT_2 : '0' | '1' ;
fragment DIGIT_8 : '0'..'7' ;
fragment DIGIT_10 : DIGIT ;
fragment DIGIT_16 : DIGIT_10 | 'a'..'f' | 'A'..'F';

_______________________________________________
Scheme-reports mailing list
Scheme-reports@scheme-reports.org
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports