Re: [Scheme-reports] Seeking review of sets and hash tables proposals Alexey Radul (25 May 2013 23:05 UTC)

Re: [Scheme-reports] Seeking review of sets and hash tables proposals Alexey Radul 25 May 2013 23:05 UTC

Oops, forgot one:

8) hash-mod

The proposal does not actually specify the interface of a
(user-supplied) hash function.  Some kinds of hash functions are more
efficient to implement if they know the index range in advance, so
they can use modular arithmetic internally.  Unfortunately, supplying
this information to all hash functions needlessly complicates the
interfaces of ones that do not take advantage of it.  It may also be
the case that specifying that all hash tables have fewer than
max-fixnum buckets may be sufficient (espcially if the underlying
implementation offers an efficient *-mod operation that can be used to
avoid consing intermediate bignums), though that sort of restriction
has its own down sides.

I don't have a concrete suggestion, except to lament the asymmetry
between function definitions (which are allowed to specify that some
parameters are optional) and function call sites (which cannot specify
that some of the arguments can be safely dropped if the callee doesn't
need them).

~Alexey

On Sat, May 25, 2013 at 6:49 PM, Alexey Radul <axch@mit.edu> wrote:
> Having read the proposals and the discussion thread, I have the
> following comments:
>
> 0) +1 for portable hash tables
>
> 1) Functional hash tables?
>
> I am not sure of the value of specifying names for functional update
> procedures that are permitted to take O(n) time.  I fully support
> eventually spelling out some kind of story for some sort of functional
> association structure that offers both reasonably fast lookups and
> reasonably fast updates; but I do not know of any way to get said
> "reasonably fast lookups" to be O(1) in this case.  If hash tables are
> advertised in the preamble as being inherently mutable structures for
> the purposes of this proposal, why bother with a specification for
> very slow functional update functions?
>
> 2) Universal lookup procedure
>
> I am fond of the following universal lookup procedure [1]:
>   (hash-table-search table key
>     (lambda (value update remove)
>       ...)
>     (lambda (insert)
>       ...))
> Here the first continuation argument is called iff the element is
> present and the second iff not.  The "results" are:
> - `value' is the value the element maps to;
> - `update' is a unary procedure that updates the value of the element
>   to a new value;
> - `remove' is a nullary procedure that removes the element; and
> - `insert' is a unary procedure that inserts the element with the
>   given value.
> The whole `search' call returns whatever the continuation it called
> returns.
>
> The advantage of thinking about point access to collections in terms
> of `search' is that all other procedures that operate on one key can
> be specified as simple patterns of invocation of `search'; and, if
> one's compiler is polite enough to inline literal procedures, can even
> be efficiently implemented that way.
>
> Furthermore, if the procedural arguments to `search' do not mutate
> the hash table too badly, the common pattern
>   (if (hash-table-contains? table key)
>       (let ((value (hash-table-ref table key)))
>         (do-something value))
>       (do-something-else))
> can be implemented without having to repeat the actual lookup, because
> the `update', `remove', and `insert' procedures can close over the
> location to operate on.
>
> To the extent this item contains a suggestion, it is to consider
> specifying a `search' procedure for hash tables, sets, and bags, [2]
> [3] and to consider whether using it would simplify the
> implementation.  Specifying a `search' that is suitable for external
> use is a little tricky, because, in order for `search' to be able to
> offer a performance advantage, the procedural arguments to `search'
> have to be carefully restricted.
>
> [1] Taylor Campbell suggested this to me years ago.
>
> [2] Both mutative and functional versions of `search' are possible,
> depending on the behavior of `update', `remove', and `insert'.  In the
> functional case, the programmer would need to arrange to return the
> new hash table from the continuation procedures if they wanted to use
> it later.
>
> [3] set-search could be simpler, because `value' is always #t and
> `update' makes no sense, so those could go away.  For bag-search,
> `update' still makes no sense, but `remove' and `insert' could accept
> an exact non-negative integer to mean "how many times".
>
> 3) Hash functions for sets?
>
> If a set is to be implemented as a hash table, how is an appropriate
> hash function to be derived from the equivalence predicate?  There
> doesn't seem to be any room for the client programmer to supply a hash
> function that would be appropriate for the desired equivalence
> predicate.
>
> 4) Weak vs Ephemeral hash tables
>
> The hash table proposal leaves a hook for requesting "hash tables with
> weak keys" and "hash tables with weak values".  Herein lurks a
> subtlety that will need to be addressed in a final proposal (even a
> final "intermediate" proposal, because it would be appropriate to
> specify what, exactly, the hooks are being left for).
>
> The subtlety is this: What should happen if the only strong-reference
> path to a key in a key-weak hash table goes through that key's value
> in that hash table, and the value is not accessible any other way [4]?
> If the programmer's intent is that the hash table's values can be
> accessed by traversing the hash table, then the entry should be kept.
> If the programmer's intent is that the only way to use the hash table
> to get one of the values is by looking up the key, then the entry
> should be removed and the key and value garbage collected.  The latter
> semantics is considerably more difficult to implement (see
> "ephemerons"), and is probably inevitably less efficient.
>
> There are seven possible distinct weakness behaviors for associations,
> [5] and I am under the impression that they are all useful, albeit
> some in more abstruse situations than others (my apologies for the
> nomenclature):
> a strong (entries hold keys and values strongly)
> b key-weak (entries hold keys weakly)
> c value-weak (entries hold values weakly)
> d key-weak-and-value-weak (entries hold keys and values weakly)
> e key-and-value-weak (an entry is dropped if both the key and the
>   value are inaccessible through strong pointers, but not otherwise)
> f key-ephemeral (an entry is dropped if the key cannot be
>   accessed except through the entry)
> g value-ephemeral (an entry is dropped if the value cannot be
>   accessed except through the entry)
>
> I suggest adjusting the proposal to make clear which of these
> semantics a user is permitted to request, and how; or to make clear
> that this is not going to be made clear.  The reason I suggest this is
> that the phrase "key weak" could mean either (b) or (f) to different
> users and implementers in the absence of a more thorough discussion.
>
> For what it's worth, MIT Scheme contains an undocumented
> implementation of hash tables with all of these weakness semantics.
>
> [4] This actually happened to me when I tried to use MIT Scheme's weak
> hash tables to memoize a function whose return value was a data
> structure that contained its argument.  The same problem arises if the
> path goes through the value of some other key that can only be
> accessed through the value of that key, or if there are longer cycles.
>
> [5] Assuming that entries whose keys or values have been reclaimed are
> themselves removed from the hash table, but I cannot image why anyone
> would want any other behavior.
>
> 5) In-place key change?
>
> It seems as though the in-place mutation of hash-table-map! would be
> considerably more useful (for efficiency) if the proc were not allowed
> to change the keys to inequivalent ones.  In fact, since an
> implementation of hash-table-map! would need to prevent keys returned
> by proc from being fed back into proc, it couldn't do an in-place loop
> at all, and would effectively need to build a new hash table and then
> mutate its argument to become it.  Is this behavior useful for
> anything?  Even if it is, should a variant be specified that requires
> proc not to change the key (and that can therefore actually do an
> in-place traversal)?
>
> It also seems that leaving the key alone would be a common use case
> for hash-table-map; is it worth specifying a variant that accepts
> this?  Or specifying that the procedure given to hash-table-map may
> return either one value or two, with the second being assumed to be
> the key if absent?
>
> 6) Mergers changing keys
>
> I am surprised by the proposal to allow the merger argument to the
> various hash table manipulation procedures to return an arbitrary new
> key as a result of a collision.  What purpose does this serve?  Its
> cost strikes me as high: what if the merger returns a key that is not
> equivalent to either of its input keys?  What if that key conflicts
> with a different key already present in the result being built?  This
> is also inconsistent with the mergers specified for hash-table-union!
> and hash-table-intersection!.
>
> 7) Small nits
>
> Why is the failure argument to hash-table-replace! mandatory?  If not
> a typo, that seems like a useless irregularity.
>
> hash-table-push! should probably say "may be implemented more
> efficiently than" to be consistent
>
> Presumably all the hash table traversal operations should forbid any
> procedural arguments from mutating the hash table underfoot (with the
> possible exception of changing the binding of the key being
> inspected).
>
> Clarification in "hash tables as functions": presumably
> hash-table-mutator should accept a key and a value, rather than just a
> key; and presumably hash-table-replacer should return the result of
> the underlying hash-table-replace! rather than nothing.
>
> MIT Scheme supports hash tables with arbitrary equivalence predicates:
> see strong-hash-table/constructor and weak-hash-table/constructor.
>
> Best,
> ~Alexey
>
> On Mon, May 20, 2013 at 1:29 AM, John Cowan <cowan@mercury.ccil.org> wrote:
>> I'm writing this as an individual contributor to WG2 and the Scheme
>> community, not as the chair of WG2.
>>
>> I would like to ask people to review and send criticisms of two
>> proto-SRFIs that I intend to propose for R7RS-large.  These will both
>> become SRFIs (the first is submitted already, the second will be soon)
>> and then will be voted on by the WG after SRFI finalization.  The more
>> early commentary from the community the better.
>>
>> The first is on sets, bags, integer sets, and enumeration
>> sets.  The current editor's draft is in SRFI format at
>> <http://ccil.org/~cowan/temp/srfi-sets.html>.  Send feedback to either
>> list or directly to me.
>>
>> The second is on intermediate hash tables -- "intermediate"
>> because it does not yet give a serious account of weak hash
>> tables, a WG2 goal.  The current editor's draft is on the wiki at
>> <http://trac.sacrideo.us/wg/wiki/HashTablesCowan>.
>>
>> Send feedback to either list or directly to me.
>>
>> --
>> John Cowan
>>         cowan@ccil.org
>>                 I am a member of a civilization. --David Brin
>>
>> _______________________________________________
>> Scheme-reports mailing list
>> Scheme-reports@scheme-reports.org
>> http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports

_______________________________________________
Scheme-reports mailing list
Scheme-reports@scheme-reports.org
http://lists.scheme-reports.org/cgi-bin/mailman/listinfo/scheme-reports