So what is it?
According to your statements, the latency-sensitive part of the code is the problem, not the hardware. But to solve those issues in tiled architectures, or more precisely multi-chiplet architectures, you need advanced packaging. Which, again according to your statements, is not used.
Well, no...