I picked up a pair of old Tesla P40s. Right now I’m running a Q4 quant of Qwen 2.5 72B that fits in their combined 48GB of VRAM with 12k context. They aren’t as fast as newer consumer cards, but they generate text about as fast as I can read, and they cost less than a used 3080.
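For anyone wondering whether a 72B model really fits in 48GB, here's a rough back-of-the-envelope sketch. The model-shape numbers are my assumptions, not something from the post: about 72.7B parameters, an average of ~4.5 bits per weight for a Q4-class quant, 80 layers, 8 KV heads (grouped-query attention), head dim 128, and an fp16 KV cache. It ignores compute buffers and other runtime overhead, so treat it as a lower bound.

```python
# Rough VRAM estimate for a quantized LLM plus its KV cache.
# All model-shape numbers in __main__ are assumptions for illustration.

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: int = 2) -> float:
    """Size of the K and V tensors for every layer and token (2 bytes = fp16)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context / 2**30

if __name__ == "__main__":
    # Assumed: ~72.7B params at ~4.5 bits/weight, 80 layers,
    # 8 KV heads, head dim 128, 12k (12 * 1024) token context.
    w = weights_gib(72.7e9, 4.5)
    kv = kv_cache_gib(n_layers=80, n_kv_heads=8, head_dim=128, context=12 * 1024)
    print(f"weights ~{w:.1f} GiB, KV cache ~{kv:.2f} GiB, total ~{w + kv:.1f} GiB")
```

Under those assumptions it comes out to roughly 38 GiB of weights plus 3.75 GiB of cache, which leaves some headroom on two 24GB cards. A bigger quant or a longer context eats into that quickly.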
I have a Dell PowerEdge R730, which cost about $200. Its CPU shrouds line up perfectly with the GPU intakes, so the server fans push air straight through both. I’ve also seen a few 3D-printable fan mounts for jury-rigging the cards into a regular tower.