
[scheme-reports-wg2] Re: [Scheme-reports] DISCUSSION/VOTE: The character tower John Cowan 06 May 2014 06:22 UTC

Bear scripsit:

> Yes, with the exception of code points which are not actually mapped to
> any character by the Unicode standard.

For clarification, which of these do you mean?

(a) Code points which will never correspond to any character, namely
the surrogates?  (These are already excluded by -small.)

(b) Code points for reserved noncharacters (there are 66 of these;
they are not to be used in interchange, but may be useful internally to
a program)?

(c) Code points that will (or at least may) be assigned to characters in
future versions of Unicode?
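
To make the three cases concrete, here is a minimal sketch in Python (chosen only because its standard `unicodedata` module is widely available; nothing in this thread prescribes it). The helper names `is_surrogate`, `is_noncharacter`, and `classify` are hypothetical. The "unassigned" answer depends on the Unicode version bundled with the interpreter, which is exactly why case (c) differs from (a) and (b).

```python
import unicodedata

def is_surrogate(cp):
    # Case (a): U+D800..U+DFFF never correspond to any character.
    return 0xD800 <= cp <= 0xDFFF

def is_noncharacter(cp):
    # Case (b): U+FDD0..U+FDEF, plus the last two code points of every
    # plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def classify(cp):
    if is_surrogate(cp):
        return "surrogate"
    if is_noncharacter(cp):
        return "noncharacter"
    # Case (c): general category Cn means "not assigned" in the Unicode
    # version this Python build ships with; a later version may assign it.
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned"
    return "assigned"

# Total number of code points matched by case (b).
noncharacter_count = sum(is_noncharacter(cp) for cp in range(0x110000))
```

Only (a) and (b) are fixed sets that a standard could name once and for all; the result for (c) changes with each Unicode release.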

> > 7) Should R7RS-large implementations be required to
> > provide the characters from #\x10000 to #\x10FFFF?
>
> No.

I'm curious why you reject these, seemingly out of hand.  They are
required by a lot of scripts, though mostly archaic and minority-use ones.
You similarly reject #11 without explanation.

> > 8) Should R7RS-large implementations be required to allow #\x0 in strings?
>
> Abstention.  If an implementation is serious enough about Unicode
> support to keep its strings in a Unicode normalized form, which ought
> not be forbidden, then NUL can never appear in any string.

I don't understand this remark at all.  The normalized form of the U+0000
character under any normalization form is quite simply itself.  The
internal encoding of the characters with or without 0 bytes is not
relevant here.
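
A quick way to check this, sketched in Python with the standard `unicodedata` module (the variable names are illustrative only): normalizing U+0000, alone or embedded in a string, returns the input unchanged under all four normalization forms.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

nul = "\u0000"
# For each form, record whether normalizing U+0000 gives U+0000 back.
nul_unchanged = {form: unicodedata.normalize(form, nul) == nul for form in FORMS}

# The same check with NUL embedded between ordinary characters.
embedded = "a\u0000b"
embedded_unchanged = all(
    unicodedata.normalize(form, embedded) == embedded for form in FORMS
)
```

This says nothing about how an implementation stores strings internally; it only shows that normalization itself does not remove or alter NUL.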

> Yes, with the exception of code points which are not actually mapped to
> any character by the unicode standard and code points which have a
> canonical decomposition (ie, the standard ought to allow an
> implementation to implement strings as unicode normalized strings).

That is, in normalization form D, I assume you mean.  (Normalization form
C is more commonly used, and actually encourages the use of characters
with a canonical decomposition.)
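
The difference between the two forms can be seen on a single character. The following Python sketch uses the standard `unicodedata` module; the example character U+00E9 is my choice, not something taken from the ballot.

```python
import unicodedata

precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# NFD replaces the precomposed character by its canonical decomposition.
nfd = unicodedata.normalize("NFD", precomposed)

# NFC goes the other way and produces the precomposed character.
nfc = unicodedata.normalize("NFC", decomposed)

# The canonical decomposition mapping, as space-separated hex code points.
mapping = unicodedata.decomposition(precomposed)
```

So a string kept in NFD can never contain U+00E9, while a string kept in NFC will contain it wherever the sequence e + U+0301 would otherwise appear.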

> Identifiers which are distinct when in NFKC/NFKD normalized form
> must be considered distinct by all implementations.  Identifiers which
> are not distinct when normalized as NFD/NFK must _not_ be considered
> distinct by any implementation.  The standard should give a definite
> rule about identifiers which are distinct in NFD/NFK normalizations, but
> identical in NFKC/NFKD normalizations; are they to be considered
> distinct, considered identical, or is that implementation-defined?

This is an interesting point which I will probably ballot later.
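
For illustration, here is a pair of identifiers that falls into the undecided middle case. This is a Python sketch under two assumptions: that the quoted "NFD/NFK" refers to the canonical forms NFC/NFD, and that comparison means "equal after normalizing both sides". The helper `same_identifier` and the sample identifiers are hypothetical.

```python
import unicodedata

def same_identifier(a, b, form):
    # Two identifiers are "the same" under a form if they normalize
    # to the same code point sequence.
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

ligature = "\uFB01le"  # begins with U+FB01 LATIN SMALL LIGATURE FI
plain = "file"

# U+FB01 has only a compatibility decomposition, so the canonical forms
# keep the two spellings apart while the compatibility forms merge them.
distinct_canonical = not same_identifier(ligature, plain, "NFC")
same_compat = same_identifier(ligature, plain, "NFKC")

# By contrast, canonically equivalent spellings agree under every form.
composed = "caf\u00E9"
combining = "cafe\u0301"
same_everywhere = all(
    same_identifier(composed, combining, form)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
)
```

The first pair is the kind of case the quoted text asks the standard to settle one way or the other.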

--
John Cowan          http://www.ccil.org/~cowan        cowan@ccil.org
Female celebrity stalker, on a hot morning in Cairo:
"Imagine, Colonel Lawrence, ninety-two already!"
El Auruns's reply:  "Many happy returns of the day!"
