

I like to write, but have never done so professionally. I disagree that it hurts writers. I think people reacted poorly to AI because of the direct and indirect information campaign Altman funded to try to make himself a monopoly. AI is just a tool. It is fun to play with in unique areas, but these often require very large models and/or advanced frameworks. In my science fiction universe I must go to extreme lengths to get the model to play along with several aspects, like a restructuring of politics, economics, and social hierarchy. I use several predictions I imagine about the distant future that plausibly make the present world seem primitive in several ways, and for good reasons. This restructuring of society violates some of our present cultural norms and sits deep within areas of politics that are blocked by alignment. I tell a story where humans are the potentially volatile monsters to be feared. That is not the plot, but convincing a present model to collaborate on such a story ends up in the gutter a lot. My grammar and stream of thought are not great, and that is the main thing I use a model to clean up, but it is still collaborative to some extent.
I feel like there is an enormous range of stories to tell and that AI only makes them more accessible. I have gone off on tangents many times exploring parts of my universe because of directions the LLM took. For instance, I limit the model to generating a sentence at a time, and I'm writing half or more of every sentence for the first 10k tokens. Then it picks up on my style so thoroughly that I can start a sentence with one word, or change one word in a sentence, and let it continue to great effect. It is most entertaining to me because it is almost as fast as telling a story as quickly as I can make it up. I don't see anything remotely bad about that. No one makes a career in the real world by copying someone else's writing. There are tons of fan works, but those do not make anyone real money, and they only increase the reach of the original author.
No, I think all the writers-and-artists hype was about Altman's plan for a monopoly, which got derailed when Yann LeCun covertly leaked the Llama weights after Altman went against the founding principles of OpenAI and made GPT-3 proprietary.
People got all upset about digital tools too, back when they first came on the scene, about how they would destroy artists. Sure, it ended the era of hand-painted cel animation, but it created stuff like Pixar.
All of AI is a tool. The only thing to hate is this culture of reductionism, where people are handed free money in the form of great efficiency gains and choose to do the same things with fewer people and cash out, instead of using the opportunity to offer more, expand, and do something new. A few people could put a great tool chain together and create a franchise greater, better planned, and richer than anything corporations have ever done to date. The only things to hate are these little regressive people without vision, without motivation, and far too conservatively timid to take risks and create the future. We live in an age of cowards worthy of loathing. That is the only problem I see.
Anything under 16 GB of VRAM is a no-go. The number of CPU cores matters too. Use Oobabooga's text-generation-webui for an advanced llama.cpp setup that splits between the CPU and GPU. You'll need at least 64 GB of RAM, or be willing to offload layers to NVMe with DeepSpeed. I can run up to a 72B model with 4-bit quantization in GGUF on a 12700 laptop with a mobile 3080 Ti, which has 16 GB of VRAM (the mobile variant is like that).
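To pick how many layers to put on the GPU for that kind of split, a back-of-the-envelope estimate is enough. This is a rough sketch with illustrative numbers (the model size, layer count, and overhead figure are assumptions, not measurements):

```python
# Rough sketch: estimate how many transformer layers fit in VRAM.
# All numbers below are illustrative assumptions, not measured values.
def layers_that_fit(vram_gb, model_size_gb, n_layers, overhead_gb=2.0):
    """How many of n_layers can live on the GPU, keeping overhead_gb
    free for the KV cache, CUDA buffers, and the desktop itself."""
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(0.0, vram_gb - overhead_gb)
    return min(n_layers, int(usable_gb / per_layer_gb))

# e.g. a ~40 GB 4-bit 72B GGUF with 80 layers on a 16 GB mobile 3080 Ti:
print(layers_that_fit(16, 40, 80))  # -> 28; the other layers stay in RAM
```

In llama.cpp (and the text-generation-webui frontend) this number maps to the `--n-gpu-layers` setting; everything not offloaded runs on the CPU.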
I prefer to run an 8×7B mixture-of-experts model, because only 2 of the 8 experts are ever active at the same time. I run it as a 4-bit quantized GGUF, and it takes 56 GB total to load. Once loaded, it is about like a 13B model for speed but has ~90% of the capabilities of a 70B. The streaming speed is faster than my fastest reading pace.
A 70B model streams at my slowest tenable reading pace.
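To put "reading pace" into numbers, here is a quick conversion between token streaming speed and reading speed. The 0.75 words-per-token figure is a common rule of thumb for English text, assumed here rather than measured for any particular model:

```python
# Convert a model's streaming speed (tokens/second) to words/minute,
# using an assumed average of 0.75 English words per token.
WORDS_PER_TOKEN = 0.75

def tokens_per_sec_to_wpm(tps):
    return tps * WORDS_PER_TOKEN * 60

# A model streaming 6 tokens/s reads out at roughly:
print(tokens_per_sec_to_wpm(6))  # -> 270.0 words/minute
```

For comparison, typical silent reading is often quoted around 200–300 words per minute, so a few tokens per second is right at the edge of comfortable reading speed.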
Both of these options are far more capable than any of the smaller model sizes, even if you mess around with training. Unfortunately, this streaming speed is still pretty slow for most advanced agentic stuff. Maybe if I had 24 to 48 GB of VRAM it would be different; I cannot say. If I were building now, I would be looking at which hardware options have the largest L1 cache and the most cores supporting the most advanced AVX instructions. Generally, anything with efficiency cores drops the advanced AVX support, and because the CPU schedulers in kernels are usually unable to handle this asymmetry, consumer junk has poor AVX support. It is quite likely that the problems Intel has had in recent years have been due to how they tried to block consumer parts from accessing the advanced P-core instructions, which were only disabled in microcode. Using them requires disabling the E-cores, or setting up CPU-set isolation on Linux or the BSDs.
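A minimal sketch of that CPU-set isolation on Linux, using Python's `os.sched_setaffinity`. It assumes, hypothetically, that the lowest-numbered logical CPUs are the P-cores; check your actual topology with `lscpu --extended` before relying on that:

```python
import os

# Hypothetical assumption: the first half of the logical CPUs are P-cores.
# Verify against your real topology (lscpu --extended) before using this.
available = sorted(os.sched_getaffinity(0))
p_cores = set(available[: max(1, len(available) // 2)])

# Pin this process, and anything it spawns (e.g. a llama.cpp server),
# to that CPU set. Linux-only API.
os.sched_setaffinity(0, p_cores)
print(os.sched_getaffinity(0) == p_cores)  # True if the pin took effect
```

The same isolation can be done from the shell with `taskset -c 0-7 <command>`, or more permanently with cgroup cpusets or the `isolcpus` kernel parameter.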
You need good Linux support even if you run Windows. Most good and advanced stuff with AI will be done in WSL if you haven't ditched Windows for whatever reason. Use https://linux-hardware.org/ to check device support.
The reason I said to avoid consumer E-cores is that articles have been popping up lately about upcoming all-P-core hardware.
The main constraint for the CPU is the L2-to-L1 cache bus width. Researching this deeply may be beneficial.
Splitting the load between multiple GPUs may be an option too. As of a year ago, the cheapest way to get a 16 GB GPU in a machine was a secondhand 12th-gen Intel laptop with a 3080 Ti, by a considerable margin once everything is added up. It is noisy, gets hot, and I hate it at times, wishing I had gotten a server-style setup for AI instead, but I have something, and that is what matters.
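If you do go multi-GPU, llama.cpp can divide the model tensors proportionally across cards. A small sketch of computing that split by VRAM size (the two GPU sizes here are hypothetical examples):

```python
# Compute per-GPU proportions for splitting model tensors by VRAM size.
# The GPU sizes used in the example are hypothetical.
def tensor_split(vram_gb):
    """Return each GPU's share of the model, proportional to its VRAM."""
    total = sum(vram_gb)
    return [round(v / total, 2) for v in vram_gb]

# e.g. a 16 GB card paired with a 12 GB card:
print(tensor_split([16, 12]))  # -> [0.57, 0.43]
```

Those proportions correspond to llama.cpp's `--tensor-split` option (e.g. `--tensor-split 0.57,0.43`); in practice you may want to bias the split toward the card that also drives the display, since it loses some VRAM to the desktop.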