anywhichway
3mo ago

inference speed on Groq using Llama 3 70B through Buildship

The inference speed on Groq using Llama 3 70B through Buildship seems to vary dramatically. At times I get 9- to 10-second delays that I do not get when interacting with Groq directly. It has the feel of a cold start somewhere, but I am on the Pro plan, and the pricing page says there are no cold starts. This may be true of other inference engines as well; I do not know.
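To help isolate where the delay is coming from, here is a rough timing probe I can run against Groq directly and then compare against the same prompt going through Buildship. It is only a sketch: it assumes Groq's OpenAI-compatible chat completions endpoint and the `llama3-70b-8192` model id, so adjust those if your account exposes something different.

```python
# Minimal latency probe against the Groq API directly (no Buildship in the path).
# Assumptions: OpenAI-compatible endpoint and the "llama3-70b-8192" model id.
import os
import time
import requests

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"
API_KEY = os.environ["GROQ_API_KEY"]

def timed_request(prompt: str) -> float:
    """Send one chat completion and return the end-to-end latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(
        GROQ_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "llama3-70b-8192",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 64,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

if __name__ == "__main__":
    # A handful of sequential calls; consistently fast direct latencies next to
    # multi-second Buildship runs would point at the workflow layer, not the model.
    for i in range(5):
        print(f"call {i + 1}: {timed_request('Say hello in one sentence.'):.2f}s")
```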
0 Replies