This is currently negative expected value over the lifetime of any hardware you can buy today at a reasonable price, which basically means a monster Mac (or several) until Apple folds and raises prices due to RAM shortages.
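To see why the expected value goes negative, here is a back-of-the-envelope breakeven sketch. Every number in it is an assumption for illustration (hardware price, API price, monthly usage), not a quote; plug in your own figures.

```python
# Breakeven: buying a local rig vs. paying an API per token.
# All constants below are assumptions, not real quotes.

HARDWARE_COST_USD = 5_000        # assumed price of a capable local rig
API_COST_PER_MTOK_USD = 0.50     # assumed blended open-weights API price per 1M tokens
TOKENS_PER_MONTH = 50_000_000    # assumed heavy personal usage: 50M tokens/month

api_cost_per_month = TOKENS_PER_MONTH / 1_000_000 * API_COST_PER_MTOK_USD
breakeven_months = HARDWARE_COST_USD / api_cost_per_month

print(f"API spend: ${api_cost_per_month:.2f}/month")
print(f"Hardware pays for itself after {breakeven_months:.0f} months")
```

Under these assumptions the API costs $25/month and the rig takes about 200 months (~17 years) to pay off, far past the useful lifetime of the hardware.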
The next best thing is to use the leading open-source/open-weights models for free or for pennies on OpenRouter [1] or Hugging Face [2]. Simon Willison wrote an article about the best open-weights models, including Qwen and Kimi K2 [3].

[1]: https://openrouter.ai/models
[2]: https://huggingface.co
[3]: https://simonwillison.net/2025/Jul/30/
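Getting started with OpenRouter is just a POST to its OpenAI-compatible chat endpoint. A minimal sketch, stdlib only; the model ID and the `OPENROUTER_API_KEY` env var name are assumptions, check https://openrouter.ai/models for current IDs:

```python
import json
import os
import urllib.request

payload = {
    "model": "qwen/qwen3-coder",  # hypothetical model ID, pick one from the model list
    "messages": [{"role": "user", "content": "Write a haiku about RAM prices."}],
}
req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# Uncomment to actually send the request (needs a funded API key):
# resp = urllib.request.urlopen(req)
# print(json.loads(resp.read())["choices"][0]["message"]["content"])
```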
This requires hardware in the tens of thousands of dollars (if we want the tokens spit out at a reasonable pace). Maybe in 3-5 years this will work on consumer hardware at speed, but not in the immediate term.
$2000 will get you 30-50 tokens/s at perfectly usable quantization levels (Q4-Q5) on any of the top 5 open-weights MoE models. That's not half bad, and it will only get better!
Only if you are running lightweight models like DeepSeek 32B; anything bigger and throughput drops. Also, RAM and AI-adjacent hardware costs have risen a lot in the last month. It's definitely not $2k for a rig that does 50 tokens a second.
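Both of the estimates above can be sanity-checked with the usual rule of thumb: single-stream decoding is mostly memory-bandwidth bound, so tokens/s is roughly bandwidth divided by bytes read per token (active parameters times bytes per parameter). The figures below are assumptions for illustration, not benchmarks of any specific machine or model:

```python
# Rough decode-speed estimate for a quantized MoE model.
# All constants are assumptions chosen for illustration.

MEM_BANDWIDTH_GBS = 450    # assumed unified-memory bandwidth, GB/s
ACTIVE_PARAMS_B = 22       # assumed MoE active parameters per token, billions
BYTES_PER_PARAM_Q4 = 0.55  # assumed effective bytes/param at ~Q4 quantization

bytes_per_token = ACTIVE_PARAMS_B * 1e9 * BYTES_PER_PARAM_Q4
tokens_per_s = MEM_BANDWIDTH_GBS * 1e9 / bytes_per_token
print(f"~{tokens_per_s:.0f} tokens/s")
```

With these numbers you land in the high 30s of tokens/s, inside the claimed 30-50 range; swap in a model with more active parameters or less bandwidth and the estimate drops quickly, which is the parent's point.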
Related question which might fit here, so I'm going to try: what is the absolute cheapest way to get started on AI-coding a simple website?

I have a couple of ideas I want to test out and get out of my head and onto the web, but have resisted for years because my webdev knowledge is stuck in 2004 and I've had no desire to change that. These are not complicated things (all static, I think), but... I hate webdev. I am not really willing to pay for any initial explorations, but if I like where things are going then, sure, I'll pay up.

I have a decently powerful machine that can run things locally, but it is Windows (because I'm an EE, sadly), which does matter.