In the SortedTable classes (see Section 3.2.2), adding a single cache that holds the last key and its index reduced CPU usage to about one-third of its previous level.
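The idea of a last-key cache can be illustrated with a minimal sketch. It is not the SortedTable code from Section 3.2.2: the class name CachedSortedTable, its methods, and the searches counter are hypothetical, and Python is used only for brevity. The sketch assumes that lookups often repeat the same key, so remembering the last key and its index lets a repeated lookup skip the binary search.

```python
import bisect


class CachedSortedTable:
    """Hypothetical sorted table with a one-entry (last key, index) cache."""

    def __init__(self):
        self._keys = []        # keys kept in ascending order
        self._values = []      # values, parallel to _keys
        self._last_key = None  # key found or inserted most recently
        self._last_index = -1  # its position in _keys (-1: cache empty)
        self.searches = 0      # number of binary searches, for illustration

    def _find(self, key):
        """Return the index of key, or -1 - insertion_point if absent."""
        if self._last_index >= 0 and self._last_key == key:
            return self._last_index          # cache hit: no search needed
        self.searches += 1
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._last_key, self._last_index = key, i
            return i
        return -1 - i

    def put(self, key, value):
        i = self._find(key)
        if i >= 0:
            self._values[i] = value
            return
        pos = -1 - i
        self._keys.insert(pos, key)
        self._values.insert(pos, value)
        # Insertion shifts later indices, so the cache is overwritten here.
        self._last_key, self._last_index = key, pos

    def get(self, key, default=None):
        i = self._find(key)
        return self._values[i] if i >= 0 else default
```

In this sketch a put followed by a get on the same key costs one binary search instead of two. A real implementation would also have to invalidate the cache on deletion, which the sketch omits.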
Many algorithms can still be improved. Some of them are described below.
The findChunk operation in the Arena class is significantly inefficient because it searches a single, long doubly-linked freeList. If an inventory of the in-memory free chunks were kept in a separate structure, the findChunk operation would become more efficient.
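One possible form of such an inventory is an index of free chunks grouped by size class, sketched below. This is an assumption about how the inventory could look, not a description of the Arena class: the name FreeChunkIndex, the (offset, size) representation of a chunk, and the power-of-two size classes are all hypothetical. A chunk of size s is filed under class floor(log2(s)), so every chunk in class k is at least 2**k units long and a request can skip all classes that are certainly too small.

```python
class FreeChunkIndex:
    """Hypothetical inventory of free chunks, grouped by power-of-two size class."""

    def __init__(self):
        self._bins = {}  # size class k -> list of (offset, size), size in [2**k, 2**(k+1))

    def add(self, offset, size):
        """Register a free chunk; size must be at least 1."""
        self._bins.setdefault(size.bit_length() - 1, []).append((offset, size))

    def find_chunk(self, size):
        """Remove and return a free chunk of at least `size` units, or None."""
        first = (size - 1).bit_length()  # smallest k with 2**k >= size
        # Any chunk in a class >= first is guaranteed to be large enough.
        for k in sorted(b for b in self._bins if b >= first):
            if self._bins[k]:
                return self._bins[k].pop()
        # Chunks one class below may still fit, but have to be checked one by one.
        candidates = self._bins.get(first - 1, [])
        for i, (_, chunk_size) in enumerate(candidates):
            if chunk_size >= size:
                return candidates.pop(i)
        return None
```

Compared with walking one long list, a lookup here touches at most a handful of size classes and scans only one of them. Splitting an oversized chunk and coalescing neighbours on release are left out of the sketch.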
The determination of Window sizes should take the object size into account, but this is not yet implemented.
A full implementation of the dirty flag in the Window would double the performance. The clean-up should also be more intelligent and more sensitive to context.
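The effect of a dirty flag can be sketched as follows. The sketch assumes that a Window buffers a region of some backing store and writes it back during clean-up; the constructor arguments, the method names, and the use of a bytearray as a stand-in for the backing store are hypothetical and are not taken from the actual Window class.

```python
class Window:
    """Hypothetical buffered view of a region of a backing store."""

    def __init__(self, backing, offset, size):
        self._backing = backing                        # bytearray standing in for a file
        self._offset = offset
        self._buf = bytearray(backing[offset:offset + size])
        self._dirty = False                            # True once the buffer differs from backing
        self.flushes = 0                               # number of real write-backs, for illustration

    def read(self, pos):
        return self._buf[pos]

    def write(self, pos, value):
        # Mark the window dirty only when the content actually changes.
        if self._buf[pos] != value:
            self._buf[pos] = value
            self._dirty = True

    def flush(self):
        """Write the buffer back only if it is dirty; return True if data was written."""
        if not self._dirty:
            return False
        end = self._offset + len(self._buf)
        self._backing[self._offset:end] = self._buf
        self._dirty = False
        self.flushes += 1
        return True
```

With this flag, windows that were only read are dropped during clean-up without any write-back, which is where the saving would come from.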