Re: [Scheme-reports] Bytevectors should be called u8vectors

Re: [Scheme-reports] Bytevectors should be called u8vectors Marc Feeley 03 Jul 2012 16:04 UTC

On 2012-07-03, at 11:17 AM, Jussi Piitulainen wrote:

> Marc Feeley writes:
>
>> I have a feeling that the use of "bytevector" in the names of
>> procedures in R7RS small is due to WG2 concerns of extending the set
>> of operations on bytevectors to 16, 32, etc bit width access
>> operations.  Alaric Snell-Pym and others have pointed out that
>> "blob" is a better name for such a data type.  I am not saying I
>> prefer it, but perhaps that's the name WG2 and the community will
>> prefer for that data type.  So committing to the name "bytevector"
>> in R7RS small is premature.  On the other hand, you say that in your
>> WG2 bytevector proposal, you were proposing to support u8vectors and
>> the other SRFI-4 names.  So I don't see your position of preferring
>> to standardize in R7RS small the "bytevector" names instead of the
>> "u8vector" names.
>
> Apologies if I mistake, but I think John distinguishes meaningfully
> between the bytevector-* interface and the [fus]{8,16,32,64}vector-*
> interface (somewhere in this thread) and this distinction should be
> appreciated more. These are not alternative names for the same thing:
> bytevector-* offsets are in bytes, the other kind offsets in the units
> indicated by the name. The bytevector interface can interpret binary
> formats byte by byte in varying units. The other interface fixes an
> interpretation as a homogeneous vector, and the interfaces overlap in
> the 8-bit cases.
>
> Let v be #u8(a, b, c, d, e, f, g, h) for suitable integers a, b, ...
> Let w be the same memory as an u16vector, disjoint type or not.
>
> (bytevector-u8-ref v 3)  => d as unsigned-int8
> (bytevector-s8-ref v 3)  => d as signed-int8
>
> (u8vector-ref v 3)       => d as unsigned-int8
>
> (bytevector-u16-ref w 3) => d, e as unsigned-int16
> (bytevector-u16-ref w 4) => e, f as unsigned-int16
>
> ;; no access to d, e with u16vector-ref, er, (u16vector-ref w 3/2)?
> (u16vector-ref w 2)      => e, f as unsigned-int16
> (u16vector-ref w 3)      => g, h as unsigned-int16
>
> Hm. I'm not sure if it makes much practical sense for u8vector and
> bytevector to be disjoint types. For the other homogeneous types a
> distinct written representation would be nice, at least in a REPL.
>
> Both interfaces seem important to me.

Sorry for not being precise, but yes, that's what I understood.  I agree that an "integer layout" API offers more operations than SRFI-4, but it is a complex API that involves additional concepts such as numerical encoding, endianness, and possibly alignment.  The complex API adds run-time overhead, which goes against the purpose of these homogeneous vectors (in other words, if a program does not require mixed-type access to the binary data, which I expect to be the more common case, it is preferable to use the SRFI-4 interface for performance reasons).

A compromise that would eliminate the bloat of two interfaces is for R7RS small to adopt the u8vector names, and for R7RS large to add the rest of the SRFI-4 procedures along with the "integer layout" API using a u8vector prefix, i.e.

   (u8vector-u16-ref u8vect byte-offset endianness)
   etc.

I would find this more consistent, given that the external representation for the vectors operated on by these procedures is #u8(...).
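
To make the distinction concrete, here is a rough sketch of what such a procedure might look like if written portably on top of the R7RS small draft's bytevector-u8-ref.  The definition of u8vector-u16-ref, the helper u16-element-ref, and the 'big / 'little endianness symbols are all assumptions for illustration; none of them is taken from SRFI-4, R6RS, or any WG2 proposal, and a real implementation would presumably be a primitive rather than two octet reads.

```scheme
(import (scheme base))

;; Hypothetical: read an unsigned 16-bit integer whose first octet is
;; at BYTE-OFFSET.  ENDIANNESS is assumed to be the symbol big or little.
(define (u8vector-u16-ref u8vect byte-offset endianness)
  (let ((b0 (bytevector-u8-ref u8vect byte-offset))
        (b1 (bytevector-u8-ref u8vect (+ byte-offset 1))))
    (case endianness
      ((big)    (+ (* b0 256) b1))
      ((little) (+ (* b1 256) b0))
      (else (error "unknown endianness" endianness)))))

;; Hypothetical SRFI-4-style view of the same storage: INDEX counts
;; 16-bit elements, so only even byte offsets are reachable.  The byte
;; order is fixed to big-endian here purely for the example.
(define (u16-element-ref u8vect index)
  (u8vector-u16-ref u8vect (* 2 index) 'big))

(define v (bytevector 1 2 3 4 5 6 7 8))

(u8vector-u16-ref v 3 'big)     ; octets 4 and 5 => 1029
(u8vector-u16-ref v 3 'little)  ; same octets, swapped => 1284
(u16-element-ref v 2)           ; octets 5 and 6 => 1286
```

The byte-offset form can reach the unaligned pair at offset 3, at the cost of an endianness argument and a dispatch on every access; the element-index form cannot reach it, but has nothing to decide at run time.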

Marc

_______________________________________________
Scheme-reports mailing list
Scheme-reports@scheme-reports.org
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports