[scheme-reports-wg2] Re: [Scheme-reports] Seeking review of sets and hash tables proposals
John Cowan
(26 May 2013 16:04 UTC)
|
Re: [Scheme-reports] Seeking review of sets and hash tables proposals Alexey Radul (27 May 2013 01:08 UTC)
|
[scheme-reports-wg2] Re: [Scheme-reports] Seeking review of sets and hash tables proposals
John Cowan
(28 May 2013 01:51 UTC)
|
On Sun, May 26, 2013 at 12:04 PM, John Cowan <cowan@mercury.ccil.org> wrote: > Alexey Radul scripsit: >> I am fond of the following universal lookup procedure [1]: >> (hash-table-search table key >> (lambda (value update remove) >> ...) >> (lambda (insert) >> ...)) > > Nice one. Added. :) >> If a set is to be implemented as a hash table, how is an appropriate >> hash function to be derived from the equivalence predicate? > > This is precisely why we need a way to bind them together. Currently, > however, going beyond the Five Equivalences for sets/bags is an option > implementations need not support. Why not just make it a data structure? Call them ontologies, or signatures (after the concept from universal algebra), and say that a constructor for a hash table (or a set) accepts an ontology over the objects that it is to contain. What's an ontology? Well, for a hash table or a set, an ontology must have a member called 'equivalence-predicate', which must implement an equivalence predicate over all objects that might be used as keys in the hash table (or elements of the set, respectively). For a hash table, the ontology must also have a member called 'hash-function', which must implement a hash function compatible with the equivalence predicate of that ontology. The ontology could also contain other information, such as: a member called 'total-order', which implements comparison with some particular interface, that must be compatible with the equivalence predicate; a member called 'monoid-product', which implements an associative binary operation (also compatible with the equivalence); a member called 'monoid-unit' which is an identity element for the monoid product; etc. A clever implementation of sets could then examine the ontology and use a backing hash table if the ontology contains a hash function, or a backing binary tree if the ontology contains a total order, or fall back to lists if the ontology contains neither of those things. An implementation of hash tables or sets could provide several predefined ontologies, appropriate for the equivalence predicates eq?, eqv?, equal?, string=?, and string-ci=?. The predefined ontologies need not be transparent, so the hash function corresponding to eq? can remain hidden. The proposal need not require an implementation to provide mechanisms for the construction of user-defined ontologies, though it could perhaps specify what those mechanisms should look like if present (I'm thinking alist->ontology). If disjointess of types and uniformity with user-defined ontologies (if any) are not required, the standard ontologies could even be represented by their equivalence predicates, reducing to the minimal implementation of the current proposal but with clearer nomenclature. Credit where credit is due: I acquired this idea from Taylor Campbell. >> If the programmer's intent is that the hash table's values can be >> accessed by traversing the hash table, then the entry should be kept. >> If the programmer's intent is that the only way to use the hash table >> to get one of the values is by looking up the key, then the entry >> should be removed and the key and value garbage collected. > > I don't understand the first kind of intention: can you give a concrete > example? The way it looks to me, non-ephemeral hash tables are just > (possibly tolerably) buggy versions of ephemeral ones. I assume you mean _weak_ but non-ephemeral hash tables. Strong hash tables are certainly useful. I don't actually have a concrete example in mind of a situation where the right data structure was a hash table whose keys would be held by weak pointers but whose entries would not be removed if the key were accessible only through the value. Bruno Haible published a nice article about this: http://www.haible.de/bruno/papers/cs/weak/WeakDatastructures-writeup.html I have two takeaways from that: 1) the word "weak" by itself is not specific enough, because people can interpret it as either "ephemeral" or "weak non-emphemeal"/ 2) "weak non-ephemeral" may not ever be the right semantics in practice, but if the programmer knows that the values will never contain the keys, it may be acceptable; and it is easier and (at least somewhat) more efficient to implement. Whether that constitutes an argument for standardizing it is a separate question. Consequently I suggest clarifying what collection of weakness semantics the proposal specifies mechanisms to ask for. >> It seems as though the in-place mutation of hash-table-map! would be >> considerably more useful (for efficiency) if the proc were not allowed >> to change the keys to inequivalent ones. > > The original idea, which is not well explained, is that the association > being processed by map! is conceptually removed and the new association > inserted in its place. But you're right that the user really shouldn't > be allowed to mutate the keys a hash table contains while scanning it. > > So I've removed `hash-table-map` in favor of `hash-table-update-each!` > which is like `hash-table-for-each` except that the value returned is > used to update the association. For the same reason I have also removed > `hash-table-remove!` from the draft; what it wants to do should be done > with `hash-table-delete-keys!`, which is allowed to be efficient. OK, that makes sense. >> It also seems that leaving the key alone would be a common use case >> for hash-table-map; is it worth specifying a variant that accepts >> this? Or specifying that the procedure given to hash-table-map may >> return either one value or two, with the second being assumed to be >> the key if absent? > > I don't see that knowing this allows you to optimize anything. Neither do I, but I can just see myself defining the helper function (define (hash-table-map-value eq hash proc merger hash-table) (hash-table-map eq hash (lambda (key value) (values key (proc key value))) (lambda (key1 val1 key2 val2) (values key1 (merger key1 val1 key2 val2))) hash-table)) every time I want to use the hash tables from this proposal. >> I am surprised by the proposal to allow the merger argument to the >> various hash table manipulation procedures to return an arbitrary >> new key as a result of a collision. What purpose does this serve? >> Its cost strikes me as high: what if the merger returns a key that is >> not equivalent to either of its input keys? What if that key conflicts >> with a different key already present in the result being built? This is >> also inconsistent with the mergers specified for hash-table-union! >> and hash-table-intersection!. > > Those are pure value mergers, but they now take the keys as arguments > as well, since they may provide useful context. This response clarifies things a little, but it doesn't actually address my main question. What purpose is served by allowing the merger in hash-table-map to return a new, potentially inequivalent key? What if that new key has already been returned by a previous call to proc (or a previous call to merger)? It seems like this will require chaining calls to merger to resolve additional collisions. What new liberty does that hair give? >> Why is the failure argument to hash-table-replace! mandatory? If not >> a typo, that seems like a useless irregularity. > > Because the default failure continuation is to signal an error, so > `(hash-table-replace hash-table key)` would be the same as > `(hash-table-ref hash-table key)`, which doesn't mutate anything. However, a uniform interface for similar functions is very valuable in a higher-order language. I may not always know at a call site whether I have ref, replace, or one of the others, and it would be very convenient for them to accept exactly the same parameters with the same semantics. ~Alexey _______________________________________________ Scheme-reports mailing list Scheme-reports@scheme-reports.org http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports