[scheme-reports-wg2] Seeking review of sets and hash tables proposals
John Cowan
(20 May 2013 05:29 UTC)
|
Re: [Scheme-reports] Seeking review of sets and hash tables proposals
Alaric Snell-Pym
(24 May 2013 14:00 UTC)
|
Re: [Scheme-reports] Seeking review of sets and hash tables proposals
Per Bothner
(25 May 2013 00:59 UTC)
|
[scheme-reports-wg2] Re: [Scheme-reports] Seeking review of sets and hash tables proposals
Kevin Wortman
(25 May 2013 20:27 UTC)
|
Re: [Scheme-reports] Seeking review of sets and hash tables proposals
Evan Hanson
(25 May 2013 06:50 UTC)
|
Re: [Scheme-reports] Seeking review of sets and hash tables proposals
Per Bothner
(25 May 2013 18:48 UTC)
|
Re: [Scheme-reports] Seeking review of sets and hash tables proposals
Per Bothner
(25 May 2013 19:38 UTC)
|
Re: [Scheme-reports] Seeking review of sets and hash tables proposals Alexey Radul (25 May 2013 22:50 UTC)
|
Having read the proposals and the discussion thread, I have the following comments: 0) +1 for portable hash tables 1) Functional hash tables? I am not sure of the value of specifying names for functional update procedures that are permitted to take O(n) time. I fully support eventually spelling out some kind of story for some sort of functional association structure that offers both reasonably fast lookups and reasonably fast updates; but I do not know of any way to get said "reasonably fast lookups" to be O(1) in this case. If hash tables are advertised in the preamble as being inherently mutable structures for the purposes of this proposal, why bother with a specification for very slow functional update functions? 2) Universal lookup procedure I am fond of the following universal lookup procedure [1]: (hash-table-search table key (lambda (value update remove) ...) (lambda (insert) ...)) Here the first continuation argument is called iff the element is present and the second iff not. The "results" are: - `value' is the value the element maps to; - `update' is a unary procedure that updates the value of the element to a new value; - `remove' is a nullary procedure that removes the element; and - `insert' is a unary procedure that inserts the element with the given value. The whole `search' call returns whatever the continuation it called returns. The advantage of thinking about point access to collections in terms of `search' is that all other procedures that operate on one key can be specified as simple patterns of invocation of `search'; and, if one's compiler is polite enough to inline literal procedures, can even be efficiently implemented that way. Furthermore, if the procedural arguments to `search' do not mutate the hash table too badly, the common pattern (if (hash-table-contains? table key) (let ((value (hash-table-ref table key))) (do-something value)) (do-something-else)) can be implemented without having to repeat the actual lookup, because the `update', `remove', and `insert' procedures can close over the location to operate on. To the extent this item contains a suggestion, it is to consider specifying a `search' procedure for hash tables, sets, and bags, [2] [3] and to consider whether using it would simplify the implementation. Specifying a `search' that is suitable for external use is a little tricky, because, in order for `search' to be able to offer a performance advantage, the procedural arguments to `search' have to be carefully restricted. [1] Taylor Campbell suggested this to me years ago. [2] Both mutative and functional versions of `search' are possible, depending on the behavior of `update', `remove', and `insert'. In the functional case, the programmer would need to arrange to return the new hash table from the continuation procedures if they wanted to use it later. [3] set-search could be simpler, because `value' is always #t and `update' makes no sense, so those could go away. For bag-search, `update' still makes no sense, but `remove' and `insert' could accept an exact non-negative integer to mean "how many times". 3) Hash functions for sets? If a set is to be implemented as a hash table, how is an appropriate hash function to be derived from the equivalence predicate? There doesn't seem to be any room for the client programmer to supply a hash function that would be appropriate for the desired equivalence predicate. 4) Weak vs Ephemeral hash tables The hash table proposal leaves a hook for requesting "hash tables with weak keys" and "hash tables with weak values". Herein lurks a subtlety that will need to be addressed in a final proposal (even a final "intermediate" proposal, because it would be appropriate to specify what, exactly, the hooks are being left for). The subtlety is this: What should happen if the only strong-reference path to a key in a key-weak hash table goes through that key's value in that hash table, and the value is not accessible any other way [4]? If the programmer's intent is that the hash table's values can be accessed by traversing the hash table, then the entry should be kept. If the programmer's intent is that the only way to use the hash table to get one of the values is by looking up the key, then the entry should be removed and the key and value garbage collected. The latter semantics is considerably more difficult to implement (see "ephemerons"), and is probably inevitably less efficient. There are seven possible distinct weakness behaviors for associations, [5] and I am under the impression that they are all useful, albeit some in more abstruse situations than others (my apologies for the nomenclature): a strong (entries hold keys and values strongly) b key-weak (entries hold keys weakly) c value-weak (entries hold values weakly) d key-weak-and-value-weak (entries hold keys and values weakly) e key-and-value-weak (an entry is dropped if both the key and the value are inaccessible through strong pointers, but not otherwise) f key-ephemeral (an entry is dropped if the key cannot be accessed except through the entry) g value-ephemeral (an entry is dropped if the value cannot be accessed except through the entry) I suggest adjusting the proposal to make clear which of these semantics a user is permitted to request, and how; or to make clear that this is not going to be made clear. The reason I suggest this is that the phrase "key weak" could mean either (b) or (f) to different users and implementers in the absence of a more thorough discussion. For what it's worth, MIT Scheme contains an undocumented implementation of hash tables with all of these weakness semantics. [4] This actually happened to me when I tried to use MIT Scheme's weak hash tables to memoize a function whose return value was a data structure that contained its argument. The same problem arises if the path goes through the value of some other key that can only be accessed through the value of that key, or if there are longer cycles. [5] Assuming that entries whose keys or values have been reclaimed are themselves removed from the hash table, but I cannot image why anyone would want any other behavior. 5) In-place key change? It seems as though the in-place mutation of hash-table-map! would be considerably more useful (for efficiency) if the proc were not allowed to change the keys to inequivalent ones. In fact, since an implementation of hash-table-map! would need to prevent keys returned by proc from being fed back into proc, it couldn't do an in-place loop at all, and would effectively need to build a new hash table and then mutate its argument to become it. Is this behavior useful for anything? Even if it is, should a variant be specified that requires proc not to change the key (and that can therefore actually do an in-place traversal)? It also seems that leaving the key alone would be a common use case for hash-table-map; is it worth specifying a variant that accepts this? Or specifying that the procedure given to hash-table-map may return either one value or two, with the second being assumed to be the key if absent? 6) Mergers changing keys I am surprised by the proposal to allow the merger argument to the various hash table manipulation procedures to return an arbitrary new key as a result of a collision. What purpose does this serve? Its cost strikes me as high: what if the merger returns a key that is not equivalent to either of its input keys? What if that key conflicts with a different key already present in the result being built? This is also inconsistent with the mergers specified for hash-table-union! and hash-table-intersection!. 7) Small nits Why is the failure argument to hash-table-replace! mandatory? If not a typo, that seems like a useless irregularity. hash-table-push! should probably say "may be implemented more efficiently than" to be consistent Presumably all the hash table traversal operations should forbid any procedural arguments from mutating the hash table underfoot (with the possible exception of changing the binding of the key being inspected). Clarification in "hash tables as functions": presumably hash-table-mutator should accept a key and a value, rather than just a key; and presumably hash-table-replacer should return the result of the underlying hash-table-replace! rather than nothing. MIT Scheme supports hash tables with arbitrary equivalence predicates: see strong-hash-table/constructor and weak-hash-table/constructor. Best, ~Alexey On Mon, May 20, 2013 at 1:29 AM, John Cowan <cowan@mercury.ccil.org> wrote: > I'm writing this as an individual contributor to WG2 and the Scheme > community, not as the chair of WG2. > > I would like to ask people to review and send criticisms of two > proto-SRFIs that I intend to propose for R7RS-large. These will both > become SRFIs (the first is submitted already, the second will be soon) > and then will be voted on by the WG after SRFI finalization. The more > early commentary from the community the better. > > The first is on sets, bags, integer sets, and enumeration > sets. The current editor's draft is in SRFI format at > <http://ccil.org/~cowan/temp/srfi-sets.html>. Send feedback to either > list or directly to me. > > The second is on intermediate hash tables -- "intermediate" > because it does not yet give a serious account of weak hash > tables, a WG2 goal. The current editor's draft is on the wiki at > <http://trac.sacrideo.us/wg/wiki/HashTablesCowan>. > > Send feedback to either list or directly to me. > > -- > John Cowan > cowan@ccil.org > I am a member of a civilization. --David Brin > > _______________________________________________ > Scheme-reports mailing list > Scheme-reports@scheme-reports.org > http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports _______________________________________________ Scheme-reports mailing list Scheme-reports@scheme-reports.org http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports