Yeah, it doesn't thrash; it just may compile and cache the same key on several threads at once. That's basically how I understood it from your description: accept the cost of a few duplicate compilations for the same key at the start instead of locking. Since that stops quite quickly, it's no problem.
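For anyone reading along later, here is a rough sketch of the trade-off as I understand it, not the actual code from this change. The names `RacyCompileCache`, `compile_fn` and `demo` are made up for illustration, and it leans on `dict.setdefault` being effectively atomic in CPython for built-in key types, which is an assumption about the runtime and not something from this PR.

```python
import threading


class RacyCompileCache:
    """Hypothetical cache that tolerates duplicate compiles instead of locking per key."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn  # hypothetical stand-in for the expensive compile step
        self._entries = {}
        self.compile_calls = 0  # bookkeeping for the demo only
        self._counter_lock = threading.Lock()  # guards the counter, not the cache

    def get(self, key):
        cached = self._entries.get(key)
        if cached is not None:
            return cached
        # Cache miss: compile without holding a lock, so several threads
        # may do the same work for the same key during warm-up.
        compiled = self._compile_fn(key)
        with self._counter_lock:
            self.compile_calls += 1
        # First writer wins; a duplicate result from a slower thread is dropped.
        return self._entries.setdefault(key, compiled)


def demo(num_threads=4):
    cache = RacyCompileCache(lambda key: ("compiled", key))
    results = []

    def worker():
        results.append(cache.get("k"))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache, results
```

Every caller ends up with the same cached object, and the number of compilations is bounded by the number of threads that missed at the same moment. Once the entry is stored, later lookups never compile again.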
Phew, that was a nice investigation and refactoring. We tackled something like three issues; IMHO that's my personal best so far.
Thanks again!