Benchmarks show even an old Nvidia RTX 3090 is enough to serve LLMs to thousands

The Register | 24-08-2024 10:10am

For 100 concurrent users, the card delivered 12.88 tokens per second, just slightly faster than the average human reading speed. If you want to scale a large language model (LLM) to a few thousand users, you might think a beefy enterprise GPU is a hard requirement. However, at least according to Backprop, all you actually need is a four-year-old graphics card....
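To put the per-user figure in reading-speed terms, a quick back-of-envelope conversion helps. The words-per-token factor and the reading-speed figure below are common rules of thumb, not numbers from the article or from Backprop's benchmark:

```python
# Back-of-envelope conversion of per-user token throughput to words per minute.
# Assumptions (not from the article): ~0.75 English words per LLM token, and
# an average silent-reading speed of roughly 200-300 words per minute.
TOKENS_PER_SEC = 12.88       # per-user rate reported in the article
WORDS_PER_TOKEN = 0.75       # assumed tokenizer rule of thumb

words_per_minute = TOKENS_PER_SEC * WORDS_PER_TOKEN * 60
print(f"~{words_per_minute:.0f} words/minute of generated text per user")
```

How that number compares to "reading speed" depends heavily on which reading-speed estimate you use; the point is simply that 12.88 tokens per second is in the same ballpark as how fast a person consumes text, so each user sees responsive output.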
