Re: [Scheme-reports] Sequence to sequence conversion

Re: [Scheme-reports] Sequence to sequence conversion Marc Feeley 02 Jul 2012 12:33 UTC

On 2012-07-01, at 4:39 PM, Alex Shinn wrote:

> On Sun, Jul 1, 2012 at 10:19 PM, Marc Feeley <feeley@iro.umontreal.ca> wrote:
>> The R5RS has the following sequence to sequence conversion procedures:
>>
>>    list->string, and string->list
>>    list->vector, and vector->list
>>
>> The R7RS is adding bytevector sequences, but it does not add the conversion procedures:
>>
>>    list->bytevector, and bytevector->list
>>
>> What is the rationale for this inconsistency?
>>
>> Moreover, the R7RS is adding only the first set of these conversion procedures:
>>
>>    vector->string, and string->vector
>>    bytevector->string, and string->bytevector  (not in R7RS)
>>    vector->bytevector, and bytevector->vector  (not in R7RS)
>
> Actually, we have the second, it's just named
> utf8->string and string->utf8 to emphasize the
> encoding used to convert to and from a bytevector.

Not really.  I expected bytevector->string to be equal to

       (lambda (bv) (list->string (map integer->char (bytevector->list bv))))

which would correspond, I guess, to a latin1->string procedure under your naming scheme.
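For illustration, here is a minimal sketch of the missing conversions in portable R7RS. It uses only the standard bytevector primitives (bytevector-length, bytevector-u8-ref, make-bytevector, bytevector-u8-set!). The names bytevector->list, list->bytevector and latin1->string are the hypothetical ones discussed in this thread; R7RS does not provide them. latin1->string is just the lambda above. It works because Latin-1 byte values coincide with the first 256 Unicode code points.

```scheme
(import (scheme base))

;; Hypothetical: R7RS does not define bytevector->list.
;; Walk from the last index down so the list comes out in order.
(define (bytevector->list bv)
  (let loop ((i (- (bytevector-length bv) 1))
             (acc '()))
    (if (< i 0)
        acc
        (loop (- i 1) (cons (bytevector-u8-ref bv i) acc)))))

;; Hypothetical: R7RS does not define list->bytevector.
(define (list->bytevector bytes)
  (let ((bv (make-bytevector (length bytes))))
    (let loop ((i 0) (rest bytes))
      (if (null? rest)
          bv
          (begin
            (bytevector-u8-set! bv i (car rest))
            (loop (+ i 1) (cdr rest)))))))

;; Hypothetical latin1->string: each byte becomes the character whose
;; code point has the same value (the ISO-8859-1 mapping).
(define (latin1->string bv)
  (list->string (map integer->char (bytevector->list bv))))
```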

Concerning utf8->string and string->utf8, I dislike these procedures for many reasons:

1) Very minor point: the official name for this encoding is UTF-8, so it should be UTF-8->string and string->UTF-8.

2) The procedures specify in their names the character encoding to use.  But there are oodles of character encodings, so for easy extensibility it would be better to pass the encoding as a parameter, as in (decode-string bytevector 'UTF-8) and (encode-string string 'UTF-8), instead of defining oodles of different procedures.
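As a rough sketch of that parameterized style, assuming decode-string and encode-string are the hypothetical names suggested above and that encodings are named by symbols, adding an encoding only means adding a case clause. Only UTF-8 (delegating to the R7RS procedures) and ISO-8859-1 are handled here.

```scheme
(import (scheme base))

;; Hypothetical API: the encoding is an argument, not part of the name.
(define (decode-string bv encoding)
  (case encoding
    ((UTF-8) (utf8->string bv))
    ((ISO-8859-1)
     ;; Latin-1: every byte maps to the code point of the same value.
     (let* ((n (bytevector-length bv))
            (s (make-string n)))
       (do ((i 0 (+ i 1)))
           ((= i n) s)
         (string-set! s i (integer->char (bytevector-u8-ref bv i))))))
    (else (error "decode-string: unsupported encoding" encoding))))

(define (encode-string str encoding)
  (case encoding
    ((UTF-8) (string->utf8 str))
    ((ISO-8859-1)
     (let* ((n (string-length str))
            (bv (make-bytevector n)))
       (do ((i 0 (+ i 1)))
           ((= i n) bv)
         (let ((cp (char->integer (string-ref str i))))
           ;; Code points above 255 have no Latin-1 byte.
           (if (> cp 255)
               (error "encode-string: character not in ISO-8859-1"
                      (string-ref str i)))
           (bytevector-u8-set! bv i cp)))))
    (else (error "encode-string: unsupported encoding" encoding))))
```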

3) The main reason for character encodings is to perform I/O on byte-oriented streams.  Yet the only procedures having to do with character encodings in R7RS are utf8->string and string->utf8.  This seems wrong.  If textual output could be performed on binary ports and the character encoding could be specified when the port is opened (as was proposed in SRFI-91, http://srfi.schemers.org/srfi-91/srfi-91.html, and implemented in Gambit), then the procedures utf8->string and string->utf8 would be superfluous since they could be defined easily like this:

    (define (string->utf8 s)
      ;; assumes the encoding can be given when the port is opened (SRFI 91 style)
      (let ((port (open-output-bytevector 'UTF-8)))
        (display s port)
        (get-output-bytevector port)))
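For comparison, R7RS already provides byte-level output to a bytevector port (open-output-bytevector with no arguments, write-u8, get-output-bytevector). It just has no way to specify an encoding. The sketch below uses only those standard procedures. It approximates what an encoding-aware port would do internally by encoding each character's code point as UTF-8 by hand. The names write-utf8-char and string->utf8/port are hypothetical, and this is not how Gambit or SRFI 91 implement it. The result should match the built-in string->utf8.

```scheme
(import (scheme base))

;; Hypothetical helper: write one character as 1 to 4 UTF-8 bytes.
(define (write-utf8-char ch port)
  (let ((cp (char->integer ch)))
    (cond ((< cp #x80)
           (write-u8 cp port))
          ((< cp #x800)
           (write-u8 (+ #xC0 (quotient cp #x40)) port)
           (write-u8 (+ #x80 (remainder cp #x40)) port))
          ((< cp #x10000)
           (write-u8 (+ #xE0 (quotient cp #x1000)) port)
           (write-u8 (+ #x80 (remainder (quotient cp #x40) #x40)) port)
           (write-u8 (+ #x80 (remainder cp #x40)) port))
          (else
           (write-u8 (+ #xF0 (quotient cp #x40000)) port)
           (write-u8 (+ #x80 (remainder (quotient cp #x1000) #x40)) port)
           (write-u8 (+ #x80 (remainder (quotient cp #x40) #x40)) port)
           (write-u8 (+ #x80 (remainder cp #x40)) port)))))

;; Hypothetical: same result as string->utf8, built on a binary port.
(define (string->utf8/port s)
  (let ((port (open-output-bytevector)))
    (string-for-each (lambda (ch) (write-utf8-char ch port)) s)
    (get-output-bytevector port)))
```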

Marc

_______________________________________________
Scheme-reports mailing list
Scheme-reports@scheme-reports.org
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports