RejectedSoftware Forums

Sign up

HashMap in utils

I notice a HashMap class in utils. It looks like it is only used for timers in win32.d. A few questions:

  • What was the reason for rolling your own?
  • Were there problems with the associative array? Performance concerns?
  • Is this intended as a replacement or alternative to built in?

The reason I am asking is I want to work with vibe - but to do that I think I should move to the latest 2.062 of D and get the latest vibe. This is not a problem - but in moving from 2.060 to 2.062 some of my code is breaking due to issues with built in associative arrays. The issue is related to http://forum.dlang.org/thread/ilgtrfwftsgbxtbichhq@forum.dlang.org for which I can not find a workaround. Would you think switching to HashMap from associative array is reasonable or crazy?

Thanks
Dan

Re: HashMap in utils

On Wed, 20 Mar 2013 13:09:48 GMT, Daniel Davidson wrote:

I notice a HashMap class in utils. It looks like it is only used for timers in win32.d. A few questions:

  • What was the reason for rolling your own?
  • Were there problems with the associative array? Performance concerns?
  • Is this intended as a replacement or alternative to built in?

The reason I am asking is I want to work with vibe - but to do that I think I should move to the latest 2.062 of D and get the latest vibe. This is not a problem - but in moving from 2.060 to 2.062 some of my code is breaking due to issues with built in associative arrays. The issue is related to http://forum.dlang.org/thread/ilgtrfwftsgbxtbichhq@forum.dlang.org for which I can not find a workaround. Would you think switching to HashMap from associative array is reasonable or crazy?

Thanks
Dan

I consider the vibe.utils package to be more internal stuff (I actually thought about moving it to vibe.internal.utils) that contains candidates for replacement by later Phobos additions. But considering that the interface is compatible to normal AAs and it will probably not go away any time soon, there is probably no practical reason to avoid using it. In this particular case I'd say HashMap is an alternative to the built-in AAs for cases when memory performance is a concern.

The reason I made it was that inserting elements into that timer map showed up in the profiler quite prominently because of GC allocations. Built-in AAs use a linked list to handle collisions and thus allocate for every element that is inserted. HashMap on the other hand uses a single closed array to store all elements and uses a special key value (e.g. null) to mark an empty slot. This makes it a lot more memory and cache friendly, although deletions and handling many big-objects can have a higher overhead.

But apart from that performance problem, my recent experiences with AA bugs such as yours have also been very bad. I really hope that this whole implementation gets cleaned up properly soon.

Re: HashMap in utils

On 03/20/2013 02:44 PM, Sönke Ludwig wrote:

Built-in AAs use a linked list to handle collisions and thus allocate for every element that is inserted. HashMap on the other hand uses a single closed array to store all elements and uses a special key value (e.g. null) to mark an empty slot. This makes it a lot more memory and cache friendly, although deletions and handling many big-objects can have a higher overhead.

We should collect more statistical data or benchmarks to optimize the
standard implementation. I do agree, that arrays make a saner default
choice, but only for a reasonable load factor.

Re: HashMap in utils

Am 28.03.2013 16:28, schrieb Martin Nowak:

On 03/20/2013 02:44 PM, Sönke Ludwig wrote:

Built-in AAs use a linked list to handle collisions and thus allocate
for every element that is inserted. HashMap on the other hand uses a
single closed array to store all elements and uses a special key value
(e.g. null) to mark an empty slot. This makes it a lot more memory
and cache friendly, although deletions and handling many big-objects
can have a higher overhead.

We should collect more statistical data or benchmarks to optimize the
standard implementation. I do agree, that arrays make a saner default
choice, but only for a reasonable load factor.

Definitely. The main reason here is just to avoid any GC/heap
allocations as long as the number of timer object stays constant. In
other places I use it to keep down the per-item overhead when large
numbers of small objects are stored and then looked up many times -
performing many insertions/deletions during use will usually have a bad
impact on performance due to the way conflicts are resolved and because
of the array reallocations. The fact that it needs a special key value
to mean "empty" also really makes it unsuitable as a standard AA
implementation.