Re: [Scheme-reports] Bytevectors should be called u8vectors Marc Feeley 04 Jul 2012 14:05 UTC

On 2012-07-04, at 3:19 AM, John Cowan wrote:

> Marc Feeley scripsit:
>
>> I would expect such an important question [as what implementations
>> support bytevector-u8-ref vs. u8vector-ref] to merit a thorough analysis
>> of current practice,
>
> In fact I do not believe that it does.  In a library-based world,
> it is not so essential to be careful to either use or avoid the names
> provided by implementations.  It is perfectly reasonable for one name
> to be available in one library and another name for the same thing in
> another, all within the same implementation.

I don't think we agree here.  It is confusing to have two names for the same thing.  If it is a coincidence that two names exist for the same function (say an identity function in an assertion module, and an identity function in a combinator module, or a particular function in a module implemented with and without tracing) then that is reasonable.  But here we are talking about two names for functions with the same intent on the same data type.  It is just bloat, not only in terms of implementation (i.e. more code), but also in the complexity of the language in the programmer's mind (i.e. which API should the programmer choose?).

> For example, WG1 decided to adopt the R6RS names `exact` and `inexact`
> on the grounds that the R5RS versions are actively misleading:
> `inexact->exact` implies that the argument must be inexact, but this
> has never been true.  Nonetheless, the R5RS names are still available
> through the R5RS compatibility library, which unlike its R6RS analogue
> exports all the R5RS names (except `transcript-{on,off}`).
>
>> When this issue was discussed and voted on by WG1, was there an analysis
>> (possibly informal) of current practice?  Perhaps just an analysis
>> of the major implementations?  If so, which implementations does WG1
>> know of which support u8vectors and which implementations support
>> bytevector-u8-ref ?
>
> Both the WG and I have avoided trying to specify which implementations are
> "major" and which are not:  I have instead presented facts about the
> implementations that I know about, and leave it up to the readers to decide
> which ones matter to them and which do not.  Anyway, here's what I know about what
> implementations *claim* to provide:
>
> SRFI 4: Racket, Gauche, Gambit, Chicken, Bigloo, Guile, Kawa, Scheme48,
> STklos, RScheme.  This information is probably out of date.
>
> R6RS: Guile, Chez, Vicare, Larceny, Ypsilon, Mosh.

As I suspected, common practice would favor the SRFI 4 API.

As a double-check I went to googlebattle.com and pitted u8vector-ref against bytevector-u8-ref: u8vector-ref is 4 times more popular.  A more thorough investigation that examines existing code bases would be interesting.

>> In a low-performance Scheme system it may be straightforward, but
>> bytevectors exist for performance reasons (otherwise plain vectors would
>> be adequate).  It is less straightforward in a high-performance Scheme
>> system where primitives are inlined, constant-folded, optimized, etc.
>> There is also the problem of precise error reporting.  It would be
>> suboptimal when evaluating the user code (u8vector-ref x -5) to give the
>> error message "*** ERROR in bytevector-u8-ref -- index out of bounds".
>> Precise error reporting implies that the difference between the user
>> code (u8vector-ref x y) and (bytevector-u8-ref x y) must be preserved
>> all the way until run time.  This causes bloat in the system if both
>> APIs are supported.
>
> These considerations go far beyond the remit of any Scheme standard.

Common practice, performance, bloat, and debugging are all issues which must be taken into account when designing a standard.
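
To make the error-reporting point concrete, here is a minimal sketch in portable R7RS Scheme.  The names u8vector-ref/alias and u8vector-ref/checked are hypothetical, chosen only for illustration, and the exact wording of error messages depends on the implementation.

```scheme
(import (scheme base))

;; Option 1: a plain alias.  Both names denote the same procedure
;; object, so an out-of-bounds access is reported by whatever name
;; the underlying primitive knows itself by.
(define u8vector-ref/alias bytevector-u8-ref)

;; Option 2: a separate entry point that does its own checking so it
;; can report the name the user actually wrote.  This is the extra
;; code that has to exist when both APIs are supported.
(define (u8vector-ref/checked bv k)
  (if (and (exact-integer? k)
           (<= 0 k)
           (< k (bytevector-length bv)))
      (bytevector-u8-ref bv k)
      (error "u8vector-ref: index out of bounds" k)))
```

With the alias, (u8vector-ref/alias x -5) can only fail as a call to bytevector-u8-ref.  With the wrapper the message names the right procedure, but the bounds check is written twice unless the compiler can merge the two paths.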

>> I have a feeling that the use of "bytevector" in the names of
>> procedures in R7RS small is due to WG2 concerns of extending the set of
>> operations on bytevectors to 16, 32, etc bit width access operations.
>> Alaric Snell-Pym and others have pointed out that "blob" is a better
>> name for such a data type.
>
> I agree that it's better, but the WG voted otherwise.

But it still remains to be seen what will happen when community input is factored in.  My first impression was to dislike the term "blob", but I'm warming up to the idea due to the abstraction-breaking nature of blobs.  The data is not so much a vector of separate byte elements as it is an untyped structure laid out as a sequence of bytes.
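
A small sketch of what "untyped structure" means in practice, using only (scheme base).  The helper blob-u16le-ref is hypothetical and assumes little-endian byte order; it is not part of any draft.

```scheme
(import (scheme base))

;; Four bytes with no inherent element type.
(define blob (bytevector #x34 #x12 #xff #x00))

;; Hypothetical helper: read an unsigned 16-bit little-endian value
;; starting at byte offset k.  The result spans two "elements", so
;; the per-byte vector view no longer describes what is being read.
(define (blob-u16le-ref b k)
  (+ (bytevector-u8-ref b k)
     (* 256 (bytevector-u8-ref b (+ k 1)))))

(blob-u16le-ref blob 0)  ; => 4660, i.e. #x1234
(bytevector-u8-ref blob 0)  ; => 52, i.e. #x34
```

The same four bytes answer both the 8-bit and the 16-bit access, which is why a name tied to one element width (u8) fits the second operation poorly.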

Marc
