On 22.07.2015 at 18:46, Etienne Cimon wrote:

On Wed, 22 Jul 2015 17:30:19 +0200, Sönke Ludwig wrote:

single multi-threaded one because of the garbage collector. The GC is
currently very inefficient in a multi-threaded scenario, so it's very
important to minimize its use for high-performance multi-threaded
applications (use -version=VibeManualMemoryManagement for vibe.d in that

The GC will lock (for all threads) during allocation and during collection. This also happens with manual memory management (though the collection is spared). Even in the single-threaded scenario, this synchronization will flush the CPU L1 cache on every allocation, probably costing more than 100 cycles, plus contention :/

The only real solution is to use an experimental TLS GC that I added to a custom druntime/phobos here:

https://github.com/etcimon/druntime/tree/2.068-custom
https://github.com/etcimon/phobos/tree/2.068-custom

I wonder what the benchmark numbers would be in comparison! I can use it because I never construct an object in one thread and let it be collected in another, so I have never had any problems with it in my vibe.d fork.
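For contrast with an allocator that takes a global lock, here is a minimal C++ sketch of a thread-local free list, the kind of allocation path a thread-local heap makes possible. It is not the druntime code linked above; `TlsPool`, `tls_alloc`, `tls_free` and the fixed 64-byte block size are hypothetical. Because each thread owns its own pool through `thread_local`, neither allocation nor deallocation needs a lock or an atomic instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical fixed block size; a real allocator would use size classes.
constexpr std::size_t kBlockSize = 64;

// A free block stores the link to the next free block in its own memory.
struct FreeNode {
    FreeNode* next;
};

// One pool per thread. No other thread ever touches it, so the
// allocation path needs neither a lock nor an atomic instruction.
struct TlsPool {
    FreeNode* head = nullptr;

    void* allocate() {
        if (head != nullptr) {
            FreeNode* node = head;  // fast path: pop the first free block
            head = node->next;
            return node;
        }
        void* p = std::malloc(kBlockSize);  // slow path: ask the system
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }

    // Pushes the block onto the calling thread's pool.
    void deallocate(void* p) { head = new (p) FreeNode{head}; }

    ~TlsPool() {
        while (head != nullptr) {  // release cached blocks at thread exit
            FreeNode* next = head->next;
            std::free(head);
            head = next;
        }
    }
};

thread_local TlsPool tls_pool;  // each thread gets its own instance

void* tls_alloc() { return tls_pool.allocate(); }
void tls_free(void* p) { tls_pool.deallocate(p); }
```

The flip side is that each pool only knows about its own thread, which loosely mirrors the restriction described above: a thread-local collector is only safe if objects are not handed to other threads without it knowing.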

Yes, the difference is just that manual allocation is nothing more
than popping a pointer from (or pushing it onto) a free list, and the
collection simply doesn't happen, so in contrast to the GC situation
there will realistically never be contention. But why do you think that
the synchronization in LockAllocator generally flushes the cache? If
there is no contention, it should basically boil down to a single CAS if
the implementation isn't totally dumb. That's of course still more
expensive than it should be (compared to thread-local allocation), but I
simply don't think we are there yet in terms of the type system when it
comes to a thread-local GC.
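To make the "single CAS" point concrete, here is a minimal C++ sketch of a free list shared between threads and guarded by a spinlock. It is not vibe.d's LockAllocator; `SpinLock`, `SharedPool` and the fixed 64-byte block size are hypothetical. When no other thread holds the lock, `lock()` completes with one successful `compare_exchange_strong`, and the free-list work itself is just a pointer pop or push. The remaining cost compared to a thread-local list is that atomic read-modify-write on a cache line other threads may also touch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <thread>

constexpr std::size_t kBlockSize = 64;  // hypothetical fixed block size

struct FreeNode {
    FreeNode* next;
};

// Minimal spinlock. Without contention, lock() is one successful CAS
// and unlock() is one release store.
class SpinLock {
public:
    void lock() {
        bool expected = false;
        while (!locked_.compare_exchange_strong(expected, true,
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed)) {
            expected = false;           // CAS failed: lock is held, retry
            std::this_thread::yield();  // back off while contended
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// A free list shared by all threads, guarded by the spinlock.
class SharedPool {
public:
    void* allocate() {
        {
            std::lock_guard<SpinLock> guard(lock_);
            if (head_ != nullptr) {
                FreeNode* node = head_;  // pop under the lock
                head_ = node->next;
                return node;
            }
        }
        void* p = std::malloc(kBlockSize);  // list empty: ask the system
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }

    void deallocate(void* p) {
        std::lock_guard<SpinLock> guard(lock_);
        head_ = new (p) FreeNode{head_};  // push under the lock
    }

    ~SharedPool() {
        while (head_ != nullptr) {  // release cached blocks
            FreeNode* next = head_->next;
            std::free(head_);
            head_ = next;
        }
    }

private:
    SpinLock lock_;
    FreeNode* head_ = nullptr;
};
```

Since `SpinLock` provides `lock()` and `unlock()`, it works with `std::lock_guard`. A production allocator would more likely use `std::mutex` or a different design; the spinlock is chosen here only because it makes the single uncontended CAS visible.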

For that to work reliably, using shared and immutable correctly at
the point of allocation is vital, but there are a lot of places where
those attributes are (and currently have to be) added or removed after
allocation using a cast. It also requires the assistance of the
Isolated!T type to tell the GC to move memory between thread heaps when
the isolated reference is passed between threads. Alternatively,
isolated values could live on a heap of their own or on the shared heap
until they are made thread-local mutable or immutable.