[Scheme-reports] Issues with R7RS draft 8 section 7.1.1 <symbol element> David A. Wheeler 12 Jan 2013 19:29 UTC

I just found some issues in R7RS draft 8 section 7.1.1
("Lexical structure") involving <symbol element> and <string element>.
Currently these productions are defined as follows:

<symbol element> -->
  <any character other than <vertical line> or \>
  | <string element> | " | \|

...
<string element> --> <any character other than " or \>
  | \a | \b | \t | \n | \r | \" | \\
  | \<intraline whitespace>* <line ending>
    <intraline whitespace>*
  | <inline hex escape>*

But these productions have two issues:
1. Having <symbol element> list double-quote is pointless, since that
   is already covered by "<any character other than <vertical line> or \>".
2. More importantly, the reference to <string element> creates a useless
   ambiguity, because <string element>'s own first alternative,
   <any character other than " or \>, ALSO matches almost all of the
   same characters, so most characters can be derived in two ways.

This causes problems if you try to directly implement these productions
in a typical tokenizer.  You can work around it, but it'd be better
if that weren't necessary.
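
To make the ambiguity concrete, here is a minimal Python sketch that counts
how many alternatives of the draft's <symbol element> accept a given single
character.  It models only the single-character alternatives (the backslash
escapes are left out), and the function names are my own, not anything from
the draft:

```python
# Simplified model of the draft-8 productions quoted above.
# Only single-character alternatives are modeled; escapes are omitted.

def string_element_alternatives(ch):
    """Labels of <string element> alternatives matching the character ch."""
    matches = []
    if ch not in '"\\':
        matches.append('<string element>: any character other than " or \\')
    return matches


def symbol_element_alternatives(ch):
    """Labels of draft <symbol element> alternatives matching the character ch."""
    matches = []
    if ch not in '|\\':
        matches.append('any character other than <vertical line> or \\')
    # The draft also lets <symbol element> derive any <string element>.
    matches.extend(string_element_alternatives(ch))
    # ... and lists the double-quote as its own alternative.
    if ch == '"':
        matches.append('"')
    return matches


def derivation_count(ch):
    """Number of distinct alternatives that accept ch."""
    return len(symbol_element_alternatives(ch))
```

With this model, derivation_count('a') is 2 (issue 2) and
derivation_count('"') is also 2 (issue 1), so a tokenizer that follows the
productions literally has to pick one alternative arbitrarily.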

I think these would be better written as follows, which removes the
extraneous double-quote and adds a disambiguating <special string element>:

<symbol element> -->
  <any character other than <vertical line> or \>
  | <special string element> | \|

...
<string element> --> <any character other than " or \>
  | <special string element>

<special string element> -->
  \a | \b | \t | \n | \r | \" | \\
  | \<intraline whitespace>* <line ending>
    <intraline whitespace>*
  | <inline hex escape>*
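
As a rough illustration of why this form is easier to implement, here is a
Python sketch of a reader for the revised productions.  It is only a sketch
under some assumptions: the function names are made up, <inline hex escape>
is taken to be \x<hex digits>; (one escape per call), and <intraline
whitespace> is taken to be space or tab.  The point is that one character of
lookahead picks the alternative, and both readers share the same
<special string element> code:

```python
import re

# Mnemonic escapes listed in <special string element>.
MNEMONIC = {'a': '\a', 'b': '\b', 't': '\t', 'n': '\n', 'r': '\r',
            '"': '"', '\\': '\\'}
# Assumed shape of <inline hex escape>: \x<hex digits>;
HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]+);')
# \<intraline whitespace>* <line ending> <intraline whitespace>*
LINE_CONTINUATION = re.compile(r'\\[ \t]*(?:\r\n|\n|\r)[ \t]*')


def read_special_string_element(text, pos):
    """Match <special string element> at pos; return (value, new_pos) or None."""
    if pos + 1 < len(text) and text[pos] == '\\' and text[pos + 1] in MNEMONIC:
        return MNEMONIC[text[pos + 1]], pos + 2
    m = HEX_ESCAPE.match(text, pos)
    if m:
        return chr(int(m.group(1), 16)), m.end()
    m = LINE_CONTINUATION.match(text, pos)
    if m:
        return '', m.end()  # a line continuation contributes no characters
    return None


def read_symbol_element(text, pos):
    """Match the revised <symbol element> at pos; None if nothing matches."""
    ch = text[pos]
    if ch == '|':
        return None  # unescaped vertical line ends the symbol
    if ch != '\\':
        return ch, pos + 1  # any character other than <vertical line> or \
    if text.startswith('\\|', pos):
        return '|', pos + 2
    return read_special_string_element(text, pos)


def read_string_element(text, pos):
    """Match the revised <string element> at pos; None if nothing matches."""
    ch = text[pos]
    if ch == '"':
        return None  # unescaped double-quote ends the string
    if ch != '\\':
        return ch, pos + 1  # any character other than " or \
    return read_special_string_element(text, pos)
```

Each reader checks whether the next character is a backslash: if it is not,
only the "any character" alternative can apply, and if it is, only an escape
can apply, so no input character has two derivations.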

Thanks!

 --- David A. Wheeler
