Post

Figuring out what to do with $300 of GCP credits [devlog 1]

I have worked with open source LLMs locally in the past and I must say, without a beefy PC an application like this would have a horrible user experience. For context, I have an integrated Intel Iris for my graphics card and I tried infering a 7B parameter 4bit quantized model, and that would take a couple minutes to generate a response. And that time just shot up with a larger input. I have not changed my PC yet, since it’s serving me well even now. I plan on implementing this locally someday, but I’d like to start by finding out exactly how much compute I’d need to run something like this comfortably as a background task. But even before all that I’ll first be getting the base code running with gemini, not an open source model that I can eventually run locally on a beefy PC. Should be good for prototyping.

I’m picking gemini 1.5 flash for most agents. It seems to be free if I have less than 15 requests/minute, 1 million tokens/minute and 1500 requests/day, all of which would be within my daily use. Initially I thought I’d need to create a VM in GCP and run my LLMs in there and serve those inferences. Not only would that be more expensive (since I’d need to keep that VM running the entire day as that’s the kind of app this is), it wouldn’t even give me a high end model like gemini. So for now I’ll be working with gemini API. The VM idea will be done later to get a model that performs satisfactorily compared to gemini so that I can buy a PC with those specs.

That covers everything I’ve done today, doesn’t feel like a lot but in retrospect it’s pretty alright. As a recovering perfectionist, I’m learning to be more appreciative of the good work I do. I don’t feel like an alien in GCP’s menus anymore and that’s progress ^^

This post is licensed under CC BY 4.0 by the author.

Trending Tags