Cerebras gives waferscale chips inferencing twist, claims 1,800 token per sec generation rates

The Register | 28-08-2024 05:06am

Faster than you can read? More like blink and you'll miss the hallucination

Hot Chips Inference performance in many modern generative AI workloads is usually a function of memory bandwidth rather than compute: the faster you can shuttle bits in and out of high-bandwidth memory (HBM), the faster the model can generate a response....
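To see why bandwidth, not compute, sets the ceiling: during single-stream decode, every model weight typically has to be read from memory once per generated token, so peak memory bandwidth divided by model size gives a rough upper bound on tokens per second. The sketch below illustrates that back-of-envelope arithmetic; the model size, precision, and bandwidth figures are illustrative assumptions, not numbers from the article.

    # Rough upper bound on bandwidth-bound decode speed.
    # Assumption: each generated token streams all weights from memory once.

    def tokens_per_second(params_billions: float,
                          bytes_per_param: float,
                          mem_bandwidth_tbps: float) -> float:
        """Bandwidth-bound ceiling on single-stream token generation rate."""
        model_bytes = params_billions * 1e9 * bytes_per_param
        bandwidth_bytes = mem_bandwidth_tbps * 1e12
        return bandwidth_bytes / model_bytes

    # Illustrative example: a 70B-parameter model in 16-bit weights on
    # roughly 3.35 TB/s of HBM tops out near 24 tokens/s per stream.
    print(f"{tokens_per_second(70, 2, 3.35):.1f} tokens/s")

By this arithmetic, pushing generation rates into the hundreds or thousands of tokens per second means finding dramatically more memory bandwidth, which is the lever Cerebras is pulling here.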
