[Scheme-reports] Inconsistency of sequence copying procedures Marc Feeley 01 Jul 2012 00:24 UTC

Formal Comment

Submitter's name: Marc Feeley
Submitter's email: feeley at iro.umontreal.ca
Relevant draft: r7rs draft 6

Type: defect
Priority: major
Relevant section of draft: 6.7. Strings, 6.8. Vectors, 6.9. Bytevectors

Summary: Inconsistency of sequence copying procedures

R7RS has three vector-like data types: strings, vectors and
bytevectors.  The inconsistencies in their properties and in their
sequence copying procedures (names and API) make these procedures
harder for the programmer to remember than they need to be.

1) self-evaluation inconsistencies

Vectors and bytevectors have a similar external representation, yet
bytevectors are self-evaluating (page 46) and vectors are not.  I do
not care very much whether they are self-evaluating or not, but the
rule should be the same for vectors and bytevectors.

2) sequence copying procedures inconsistencies

Subsequences of strings can be extracted using the procedure substring
which takes 3 required parameters, i.e.

  (substring string start end)

There is also a string-copy procedure which takes a single required
parameter and returns a copy of the string.  These procedures are
related like so:

  (string-copy string) = (substring string 0 (string-length string))

Subsequences of vectors can be extracted using the procedure
vector-copy only, which takes one required parameter and 3 optional
parameters, i.e.

  (vector-copy vector [start [end [fill]]])

With a single parameter a copy of the whole vector is returned,
otherwise a subsequence is returned.
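
As a small illustration of the optional start and end parameters, the
calls below are a sketch that assumes an implementation whose
vector-copy accepts them.  The draft's optional fill argument is left
out, since it cannot be assumed to be available everywhere.

```scheme
(import (scheme base))

(define v (vector 'a 'b 'c 'd 'e))

;; One argument: a fresh copy of the whole vector.
(define whole (vector-copy v))

;; With start only: the elements from index 1 to the end.
(define tail (vector-copy v 1))

;; With start and end: the elements at indices 1 and 2 (end is exclusive).
(define middle (vector-copy v 1 3))
```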

Subsequences of bytevectors can be extracted using the procedure
bytevector-copy-partial, which takes 3 required parameters and behaves
exactly like substring except for the fact that bytevectors are being
processed and returned, i.e.

  (bytevector-copy-partial bv start end)

There is also a bytevector-copy procedure which takes a single
required parameter and returns a copy of the bytevector.  These
procedures are related like so:

  (bytevector-copy bv) = (bytevector-copy-partial bv 0 (bytevector-length bv))
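
To make this relation concrete, here is a minimal sketch of
bytevector-copy-partial written with basic bytevector operations.
The definition is a hypothetical stand-in and not the draft's text;
it only assumes that make-bytevector, bytevector-u8-ref and
bytevector-u8-set! are available.

```scheme
(import (scheme base))

;; Hypothetical stand-in for the draft's bytevector-copy-partial:
;; returns a fresh bytevector holding bv[start] .. bv[end - 1].
(define (bytevector-copy-partial bv start end)
  (let ((result (make-bytevector (- end start) 0)))
    (do ((i start (+ i 1)))
        ((>= i end) result)
      (bytevector-u8-set! result (- i start) (bytevector-u8-ref bv i)))))

(define bv (bytevector 10 20 30 40))

;; A partial copy over the full range has the same contents as a plain copy.
(define full  (bytevector-copy-partial bv 0 (bytevector-length bv)))
(define inner (bytevector-copy-partial bv 1 3))
```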

There are also 2 procedures to copy the content of a bytevector to
another bytevector imperatively: bytevector-copy! and
bytevector-copy-partial!.

I do not see a good reason for having different APIs (mix of required
and optional parameters) and naming conventions for similar
operations.

The naming convention could be based on the one which has been in
place for strings for a long time, i.e. substring, subvector, and
subbytevector for extracting subsequences.  The same API should
be used consistently for all the procedures, in other words:

  (substring     string     [start [end [fill]]])
  (subvector     vector     [start [end [fill]]])
  (subbytevector bytevector [start [end [fill]]])

Note that it reads even better if bytevector operations are named using
the SRFI-4 naming convention:

  (substring   string   [start [end [fill]]])
  (subvector   vector   [start [end [fill]]])
  (subu8vector u8vector [start [end [fill]]])

The functional copy procedures would remain for consistency:

  (string-copy   string)   = (substring   string)
  (vector-copy   vector)   = (subvector   vector)
  (u8vector-copy u8vector) = (subu8vector u8vector)

The imperative partial copy procedure defined for bytevectors

  (bytevector-copy-partial! from start end to at)

should exist for other sequences too.  Better consistency would be
achieved by exchanging the order of the destination and source, in
order to benefit from the same pattern of optional parameters as the
other procedures:

  (substring-move!   to at from [start [end [fill]]])
  (subvector-move!   to at from [start [end [fill]]])
  (subu8vector-move! to at from [start [end [fill]]])
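
A rough sketch of how the vector variant could be written is shown
here.  The name subvector-move! and its argument order come from the
proposal above; the body is only an assumption about reasonable
behaviour, and the optional fill argument is left out of the sketch.

```scheme
(import (scheme base))

;; Hypothetical subvector-move!: copies from[start] .. from[end - 1]
;; into to, beginning at index at.  start defaults to 0 and end to the
;; length of from.  The proposal's fill argument is not handled here.
(define (subvector-move! to at from . opt)
  (let* ((start (if (pair? opt) (car opt) 0))
         (end   (if (and (pair? opt) (pair? (cdr opt)))
                    (cadr opt)
                    (vector-length from))))
    (if (and (eq? to from) (< start at))
        ;; Same vector, moving towards higher indices: copy backwards
        ;; so that elements are read before they are overwritten.
        (do ((i (- end 1) (- i 1)))
            ((< i start))
          (vector-set! to (+ at (- i start)) (vector-ref from i)))
        ;; Otherwise a forward copy is safe.
        (do ((i start (+ i 1)))
            ((>= i end))
          (vector-set! to (+ at (- i start)) (vector-ref from i))))))

(define dst (make-vector 5 0))
(subvector-move! dst 1 (vector 'a 'b 'c 'd) 1 3)   ; b and c land at indices 1 and 2
```

Because the source and its range come last, the trailing arguments
follow the same optional pattern as the extraction procedures.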

I don't think the imperative copy operation performed by
bytevector-copy! is sufficiently common to be included in R7RS (and
applied to the other sequence types).  In any case the same operation
could be obtained by using a ...-move! procedure with an additional
constant 0 used for the "at" parameter.

Finally, I think the handling of the fill parameter is questionable.
It is a bad idea for the fill parameter to have a default.  When fill
is absent, it should be an error when start and end are not within the
bounds of the sequence.  Otherwise, some index calculation errors
(off-by-one on "end") may go unnoticed.  Moreover, when it is
supplied, the fill should also be used when start is less than 0, for
consistency with the case where end is greater than the length of the
sequence.
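
The fill rules suggested in this paragraph can be sketched for
vectors as follows.  The procedure name subvector is the one proposed
earlier; the body, the error messages and the check that start does
not exceed end are assumptions made for the illustration.

```scheme
(import (scheme base))

;; Hypothetical subvector following the rules suggested above:
;; - without fill, start and end must lie within the vector;
;; - with fill, positions below 0 or past the end receive the fill.
(define (subvector v . opt)
  (let* ((len   (vector-length v))
         (start (if (pair? opt) (car opt) 0))
         (end   (if (and (pair? opt) (pair? (cdr opt))) (cadr opt) len))
         (fill? (and (pair? opt) (pair? (cdr opt)) (pair? (cddr opt))))
         (fill  (if fill? (car (cddr opt)) #f)))
    (if (> start end)
        (error "subvector: start is greater than end" start end))
    (if (and (not fill?) (or (< start 0) (> end len)))
        (error "subvector: range out of bounds and no fill given" start end))
    ;; Every slot starts as fill; in-bounds slots are then overwritten.
    (let ((result (make-vector (- end start) fill)))
      (do ((i (max start 0) (+ i 1)))
          ((>= i (min end len)) result)
        (vector-set! result (- i start) (vector-ref v i))))))

(define padded (subvector (vector 1 2 3) -1 4 0))   ; fill used on both sides
(define copied (subvector (vector 1 2 3)))          ; plain copy
```

With these rules a call such as (subvector (vector 1 2 3) 1 4)
signals an error instead of silently padding the result, which is
the off-by-one case the paragraph is concerned with.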
