If you start naively, without a library that works around it, memory access is the bottleneck. Look at how much effort goes into avoiding it, for example with blocking (tiling) algorithms.
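To illustrate what "blocking" means, here is a minimal sketch of a naive versus a blocked matrix multiply. It assumes square n x n matrices stored as flat row-major lists. The function names and the tile size `bs` are hypothetical choices for illustration. In pure Python, interpreter overhead hides any cache benefit, so this only shows the loop structure. The payoff comes in compiled code (C, Fortran), where each tile stays in cache while it is reused instead of being re-fetched from main memory.

```python
def matmul_naive(a, b, n):
    """Naive triple loop: C = A @ B for n x n row-major flat lists."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                # Strided walk down a column of B: poor locality for large n.
                s += a[i * n + k] * b[k * n + j]
            c[i * n + j] = s
    return c


def matmul_blocked(a, b, n, bs=32):
    """Blocked (tiled) multiply: process bs x bs tiles so each tile of
    A, B and C can stay in cache while it is reused.
    bs is a hypothetical tuning parameter; real libraries pick it per CPU."""
    c = [0.0] * (n * n)
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Multiply tile A[ii:ii+bs, kk:kk+bs] by tile B[kk:kk+bs, jj:jj+bs]
                # and accumulate into tile C[ii:ii+bs, jj:jj+bs].
                for i in range(ii, min(ii + bs, n)):
                    row_c = i * n
                    for k in range(kk, min(kk + bs, n)):
                        aik = a[row_c + k]
                        row_b = k * n
                        for j in range(jj, min(jj + bs, n)):
                            # Contiguous walk along rows of B and C.
                            c[row_c + j] += aik * b[row_b + j]
    return c


if __name__ == "__main__":
    n = 50
    a = [float((i * 3) % 7) for i in range(n * n)]
    b = [float((i * 5) % 11) for i in range(n * n)]
    # Same result, different memory access order.
    assert matmul_naive(a, b, n) == matmul_blocked(a, b, n, bs=16)
```

Both functions compute the same product and even accumulate each element in the same k order. Only the order in which memory is touched differs, which is exactly the effort that optimized libraries (BLAS and the like) invest on top of this, along with vectorization and multiple levels of tiling.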
AI training time is at the point on an exponential curve where more throughput isn't going to advance functionality much at all. The underlying problem, solving problems by training, is computationally ...