[scheme-reports-wg2] Re: [Scheme-reports] Seeking review of sets and hash tables proposals John Cowan (28 May 2013 01:51 UTC)

[scheme-reports-wg2] Re: [Scheme-reports] Seeking review of sets and hash tables proposals John Cowan 28 May 2013 01:50 UTC

Alexey Radul scripsit:

> Why not just make it a data structure?  Call them ontologies, or
> signatures (after the concept from universal algebra), and say that a
> constructor for a hash table (or a set) accepts an ontology over the
> objects that it is to contain.

I'm coming around to this way of thinking, though both these terms are
so overloaded I don't care for either of them.  I'll call it a _whatsit_
tentatively, though I hope we come up with something better (consider the
Bell-LaPadula "*-rule", a name which started out as a mere placeholder
but was never replaced by anything and is now standard).

Have I mentioned how the functional use of "persistent" drives me crazy?
To me a persistent data structure is one that outlives its process by
being backed up somewhere.  (Or is there some way in which these meanings
can be unified?)

> What's an ontology?  Well, for a hash table or a set, an ontology must
> have a member called 'equivalence-predicate', which must implement an
> equivalence predicate over all objects that might be used as keys in
> the hash table (or elements of the set, respectively).  For a hash
> table, the ontology must also have a member called 'hash-function',
> which must implement a hash function compatible with the equivalence
> predicate of that ontology.  The ontology could also contain other
> information, such as: a member called 'total-order', which implements
> comparison with some particular interface, that must be compatible
> with the equivalence predicate;

Probably the same interface as <.

> a member called 'monoid-product',
> which implements an associative binary operation (also compatible with
> the equivalence); a member called 'monoid-unit' which is an identity
> element for the monoid product; etc.

What's the benefit of these?  This smells of polymorphism that Scheme
doesn't really need, AFAICT.  Of course, there is no problem making the
whatsit extensible so you can add new keys and values to it.

What will certainly be needed is a type predicate, since Scheme is
dynamically typed.

> A clever implementation of sets could then examine the ontology and
> use a backing hash table if the ontology contains a hash function,
> or a backing binary tree if the ontology contains a total order,
> or fall back to lists if the ontology contains neither of those things.

That works for sets, but I can't see falling back to alists for
hash tables.  Or can I?  I think it's better to get an error than to
have performance go all to hell when you port your application from an
implementation that supports some particular whatsit to one that doesn't.
A drop from O(1) to O(log n) is tolerable, but from O(1) to O(n)?
(This is not an argument against whatsits as such, just on how they
are used.)

> An implementation of hash tables or sets could provide several
> predefined ontologies, appropriate for the equivalence predicates eq?,
> eqv?, equal?, string=?, and string-ci=?.  The predefined ontologies
> need not be transparent, so the hash function corresponding to eq?
> can remain hidden.

This is kind of ugly.  If it's opaque, how can you use it for novel
data structures?  And yet it isn't safe to do so.  This needs more thought.

> The proposal need not require an implementation to provide mechanisms
> for the construction of user-defined ontologies, though it could
> perhaps specify what those mechanisms should look like if present (I'm
> thinking alist->ontology).  If disjointess of types and uniformity
> with user-defined ontologies (if any) are not required, the standard
> ontologies could even be represented by their equivalence predicates,
> reducing to the minimal implementation of the current proposal but
> with clearer nomenclature.
> I assume you mean _weak_ but non-ephemeral hash tables.  Strong hash
> tables are certainly useful.

Yes, certainly.

> > So I've removed `hash-table-map` in favor of `hash-table-update-each!`
> > which is like `hash-table-for-each` except that the value returned is
> > used to update the association.  For the same reason I have also removed
> > `hash-table-remove!` from the draft; what it wants to do should be done
> > with `hash-table-delete-keys!`, which is allowed to be efficient.
>
> OK, that makes sense.

Arrgh.  I meant to say that I had removed `map!` rather than `map`.
`Map` is still with us.

> Neither do I, but I can just see myself defining the helper function
>
> (define (hash-table-map-value eq hash proc merger hash-table)
>   (hash-table-map
>    eq hash
>    (lambda (key value)
>      (values key (proc key value)))
>    (lambda (key1 val1 key2 val2)
>      (values key1 (merger key1 val1 key2 val2)))
>    hash-table))
>
> every time I want to use the hash tables from this proposal.

Hmm.  I see your point.  I'll think on it.

> This response clarifies things a little, but it doesn't actually
> address my main question.  What purpose is served by allowing the
> merger in hash-table-map to return a new, potentially inequivalent
> key?  What if that new key has already been returned by a previous
> call to proc (or a previous call to merger)?  It seems like this will
> require chaining calls to merger to resolve additional collisions.

Quite so.  But without a key merger, it becomes impossible to write
mapping functions that reduce the (effective) domain of the key.
Suppose, for example, that the keys are words, and that what you want
is to create a new hash table that maps the first three letters of each
word into an a-list of keys and values corresponding to that.

You can't just transform the key with substring, or all the words begining
"wat" will overwrite each other in some random order.  With a merger
function, you get notified that "wat" is already in use in the new
hash table, and can recover by acons-ing the new key/value pair onto
the previous value.

For that use case, you only need to return a new value, it's true.  But
in the general case you may want to return a new key altogether; if that
key is in use, the merger function can tail-call itself to cope.

> However, a uniform interface for similar functions is very valuable in
> a higher-order language.  I may not always know at a call site whether
> I have ref, replace, or one of the others, and it would be very
> convenient for them to accept exactly the same parameters with the
> same semantics.

If you don't know if you have an accessor or a mutator, your case is
desperate indeed.

--
So they play that [tune] on                     John Cowan
their fascist banjos, eh?                       cowan@ccil.org
        --Great-Souled Sam                      http://www.ccil.org/~cowan

--
You received this message because you are subscribed to the Google Groups "scheme-reports-wg2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scheme-reports-wg2+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.