Re: [Scheme-reports] mutable unicode strings
Per Bothner 02 Jul 2014 15:04 UTC
On 07/02/2014 12:34 AM, Sam Tobin-Hochstadt wrote:
> Racket stores strings as arrays of 4-byte characters.
On 07/02/2014 07:06 AM, Michael Montague wrote:
> Foment uses 4 bytes to store each character in a string.
The follow-up question is which layout you use for the character data.
When you have a Scheme reference to a string, presumably that is a pointer
to a string object header containing at least a size and maybe some kind
of typecode. The actual characters can be stored right next to the header
(inlined), or the header can hold a pointer to a separate array of
characters (indirected). The latter makes it easy to resize the array by
re-allocating it; the inlined layout makes resizing difficult, because
growing the string may move the whole object and invalidate existing
references to it.
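To make the two layouts concrete, here is a rough sketch in C. It is not taken from Racket, Foment, or any other implementation; the struct names, the typecode field, and the helper functions are all hypothetical, and 4-byte characters are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Inlined layout: header and characters share one allocation. */
typedef struct {
    uint32_t typecode;  /* hypothetical type tag */
    size_t   length;    /* number of characters */
    uint32_t chars[];   /* 4-byte characters follow the header */
} InlineString;

/* Indirected layout: header points at a separately allocated array. */
typedef struct {
    uint32_t  typecode; /* hypothetical type tag */
    size_t    length;   /* number of characters */
    uint32_t *chars;    /* separately allocated character array */
} IndirectString;

InlineString *inline_make(size_t n, uint32_t fill) {
    InlineString *s = malloc(sizeof *s + n * sizeof s->chars[0]);
    if (s == NULL) return NULL;
    s->typecode = 1;
    s->length = n;
    for (size_t i = 0; i < n; i++) s->chars[i] = fill;
    return s;
}

IndirectString *indirect_make(size_t n, uint32_t fill) {
    IndirectString *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    /* Allocate at least one element so the array pointer is never NULL. */
    s->chars = malloc((n ? n : 1) * sizeof s->chars[0]);
    if (s->chars == NULL) { free(s); return NULL; }
    s->typecode = 1;
    s->length = n;
    for (size_t i = 0; i < n; i++) s->chars[i] = fill;
    return s;
}

/* Resize an indirected string. Only the character array is re-allocated;
   the header, which is what references point at, stays where it is. */
int indirect_resize(IndirectString *s, size_t n, uint32_t fill) {
    assert(s != NULL);
    uint32_t *p = realloc(s->chars, (n ? n : 1) * sizeof *p);
    if (p == NULL) return -1;
    for (size_t i = s->length; i < n; i++) p[i] = fill;
    s->chars = p;
    s->length = n;
    return 0;
}
```

There is deliberately no inline_resize here: growing an InlineString means re-allocating the whole object, which may move it, and then every existing pointer to the old header would have to be found and updated.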
I would argue that the ability to resize a mutable string is so important
that it justifies the slight overhead of indirection. (Indirection may
also be easier to implement, depending on how your memory allocation works.)
In other words: Supporting string-replace! adds no overhead beyond
requiring an "indirect" representation. The latter is forced anyway if
you support mutability and full Unicode, unless you use 3- or 4-byte
characters: with a variable-width encoding such as UTF-8 or UTF-16, even
string-set! can change the encoded length of the string.
Even if you do use 3- or 4-byte characters, indirection is worth it, because
mutable fixed-size strings are an essentially useless feature.
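To illustrate why a variable-width encoding forces resizing under mutation, here is a minimal sketch in C of a string-set!-like operation on an indirected UTF-8 string. The Utf8String type and the function names are hypothetical, and the code assumes the stored bytes are well-formed UTF-8 and that cp is a valid Unicode scalar value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical indirected string header holding UTF-8 bytes. */
typedef struct {
    size_t   nbytes;  /* bytes in use */
    uint8_t *bytes;   /* separately allocated UTF-8 data */
} Utf8String;

/* Encode cp into out; return the number of bytes written (1 to 4). */
static size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    if (cp < 0x80) { out[0] = (uint8_t)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Width in bytes of the sequence that starts with lead byte b. */
static size_t utf8_width(uint8_t b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Build a string from a NUL-terminated, well-formed UTF-8 C string. */
Utf8String *utf8_make(const char *s) {
    Utf8String *u = malloc(sizeof *u);
    if (u == NULL) return NULL;
    u->nbytes = strlen(s);
    u->bytes = malloc(u->nbytes ? u->nbytes : 1);
    if (u->bytes == NULL) { free(u); return NULL; }
    memcpy(u->bytes, s, u->nbytes);
    return u;
}

/* Replace the character at position index with cp.
   Returns 0 on success, -1 on a bad index or allocation failure. */
int utf8_string_set(Utf8String *u, size_t index, uint32_t cp) {
    assert(u != NULL);
    size_t off = 0;
    /* Walk to the byte offset of the requested character. */
    for (size_t i = 0; i < index && off < u->nbytes; i++)
        off += utf8_width(u->bytes[off]);
    if (off >= u->nbytes) return -1;

    size_t old_w = utf8_width(u->bytes[off]);
    uint8_t enc[4];
    size_t new_w = utf8_encode(cp, enc);
    size_t tail = u->nbytes - off - old_w;  /* bytes after the old character */
    size_t new_n = u->nbytes - old_w + new_w;

    if (new_w > old_w) {
        /* The new character is wider, so the byte array must grow. */
        uint8_t *p = realloc(u->bytes, new_n);
        if (p == NULL) return -1;
        u->bytes = p;
    }
    /* Shift the tail to its new position, then write the new character. */
    memmove(u->bytes + off + new_w, u->bytes + off + old_w, tail);
    memcpy(u->bytes + off, enc, new_w);
    u->nbytes = new_n;
    return 0;
}
```

Replacing the "b" in "abc" with U+20AC grows the data from 3 to 5 bytes, so a single-character store already needs the same re-allocation machinery that string-replace! would use. Note that finding the character is also linear in this sketch, which is a separate cost of variable-width encodings.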
--
--Per Bothner
per@bothner.com http://per.bothner.com/