Why open-source LLMs are the future
Large Language Models like Mistral, Mixtral and Llama are only the beginning
As of now, OpenAI's models remain the leading Large Language Models (LLMs) on the market. However, their proprietary nature means they can get pricey, especially at scale: if you run a chatbot app, expect to pay a premium on top of the raw API costs. Key drawbacks of proprietary models include:
Lack of Privacy
Censorship
I've been exploring open-source models for a while. They're not without flaws, but their progress is incredibly rapid: it seems like every couple of weeks a new model pushes the boundaries even further. And by using open models you can likely save a lot of money while preserving your privacy.
Emad Mostaque (of Stability AI) agrees that fine-tuning open-source models is the way to go:
Open-source models offer enhanced privacy, greater control, and no censorship.
As of this writing, the standout open-source models include the Llama and Mistral series. Stability AI is also making strides in this area.
Take Mistral 7B, for instance. With only 7 billion parameters it is accessible to a wide audience, and it impressively outperforms the Llama models in the 7B and 13B range on most benchmarks.
What's exciting is that these models can be fine-tuned on custom datasets (more on this another time). This not only brings their performance closer to that of proprietary models but also removes, or at least reduces, built-in censorship. Plus, self-hosting these models gives you complete control; they're even easy to jailbreak if you want to.
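Self-hosting really is that accessible. Here is a minimal sketch of loading an open model with Hugging Face transformers and generating a reply; the model repo, precision, and generation settings are assumptions you can swap out:

```python
# Minimal self-hosting sketch with Hugging Face transformers.
# Assumes the mistralai/Mistral-7B-Instruct-v0.1 repo and a GPU with enough VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # assumed model repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on a single consumer GPU
    device_map="auto",          # requires the accelerate package
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Once this runs locally, nothing leaves your machine, which is exactly the privacy and control argument from above.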
Mixtral MoE
The recently released Mixtral model beats OpenAI's proprietary GPT-3.5, at least according to these benchmarks:
Don’t trust benchmarks
As in all things that matter, only direct experience counts. This also holds for LLM benchmarks, which models tend to be optimized for.
When a measure becomes a target, it ceases to be a good measure (Goodhart's law).
As Karpathy noted, most benchmarks don’t have much practical relevance, and I have to agree. Benchmarks need to be taken with a grain of salt, and for some use cases, such as multi-turn chat applications, I have found open-source models to be quite lacking (at least so far). They tend to repeat themselves, get logic or context wrong, and many of the favorites on LocalLLaMA don’t hold up to the hype. This might sound negative, but I see it only as a start. Everyone is waiting for new Mistral and Llama versions to be released, and the tech will ride the exponential into the future.
Speaking of the Chatbot Arena, people there seem to like Mixtral a lot, especially compared to proprietary models that are often heavily censored. This makes Mixtral one of the best, if not the best, open-source model at the time of writing.
Mixtral 8×7B is like a mini GPT-4
Why is it named "Mixtral"? The name stems from its architecture: it is a "mixture of experts" (MoE) model. Unlike a single dense LLM such as Mistral 7B, Mixtral incorporates multiple expert sub-networks.
It has a trainable gating network that determines which expert, or combination of experts, to use for each token. The gating network acts like a router that picks the most suitable expert; you wouldn’t go to the car mechanic to fix your teeth.
For example, in the Mixtral 8×7B configuration, the gating network selects 2 out of 8 experts per token. On paper this implies a parameter count of 8 × 7B = 56 billion, but the effective count is 46.7 billion because parameters are shared: only the FeedForward blocks of the Transformer are replicated eight times, while all other components remain unchanged.
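To make the routing idea concrete, here is a toy PyTorch sketch of a top-2-of-8 MoE feed-forward block. The dimensions and structure are illustrative assumptions, not Mixtral's actual implementation:

```python
# Toy mixture-of-experts feed-forward layer with top-2 routing,
# in the spirit of Mixtral 8x7B. Sizes are made up for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, dim=512, hidden=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # the "router"
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, dim)
        scores = self.gate(x)                    # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e            # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

# Only this feed-forward block is replicated per expert; attention stays shared.
layer = MoEFeedForward()
tokens = torch.randn(10, 512)
print(layer(tokens).shape)  # torch.Size([10, 512])
```

Each token only passes through 2 of the 8 expert feed-forwards, which is why inference cost is much closer to a 13B dense model than to a 56B one.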
GPT-4 itself is rumored to be a MoE.
Mixture of Experts (MoE). The gating network chooses the best expert
Hugging Face has released a nice blog post on the topic of mixture of experts.
Fine-tuned Mixtrals
You will find Mixtral, many other base models, and their fine-tuned versions on Hugging Face. TheBloke provides a great service by quantizing models and uploading them to Hugging Face. Quantized variants such as GPTQ or AWQ require much less VRAM, making the models far easier to run.
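Loading such a quantized checkpoint looks roughly like this. The repo name is an assumption (pick whichever of TheBloke's uploads you want), and GPTQ loading assumes the optimum and auto-gptq packages are installed:

```python
# Minimal sketch: loading a GPTQ-quantized upload with transformers.
# Assumed repo name; requires `pip install optimum auto-gptq` alongside transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"  # assumed repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # spreads layers across available GPUs / CPU
)

prompt = "[INST] Give me three reasons to self-host an LLM. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The quantized weights trade a little quality for a large drop in VRAM, which is what makes models of this size feasible on consumer hardware.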
Cool recent findings
Apple has released an open-source multimodal LLM (but note the license: research purposes only). This is notable not only because the model seems to be one of the best open multimodal models, built on top of LLaVA and Vicuna (a Llama fine-tune), but also because it's great to see Apple itself entering the open-source realm.
TinyLlama is a very small LM with 1.1B parameters trained on 3 trillion tokens, which is a lot: the much larger Llama 2 models were trained on “only” 2 trillion tokens.
Here is an example notebook on fine-tuning TinyLlama on Google Colab:
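Independent of that notebook, a minimal LoRA fine-tuning loop with peft and trl looks roughly like the sketch below. The model repo, dataset, and hyperparameters are assumptions, and the exact arguments depend on your trl/peft versions:

```python
# Illustrative LoRA fine-tuning sketch (not the linked notebook).
# Assumes the TinyLlama/TinyLlama-1.1B-Chat-v1.0 repo and a small instruction dataset.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"                  # assumed model repo
dataset = load_dataset("timdettmers/openassistant-guanaco",      # assumed dataset
                       split="train")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Low-rank adapters: only a few million extra weights are actually trained.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=lora_config,
    dataset_text_field="text",        # column holding the raw training text
    max_seq_length=512,
    args=TrainingArguments(
        output_dir="tinyllama-lora",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
)
trainer.train()
trainer.save_model("tinyllama-lora")  # saves just the adapter weights
```

Because only the adapters are trained, this kind of run fits on a single free-tier Colab GPU, which is exactly why TinyLlama is a nice playground model.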
This repo was trending and seems to be quite exhaustive. It covers everything from the basics of LLMs to more advanced topics such as quantization and fine-tuning.
A question for self-reflection
What are you optimizing for in life?
Was this newsletter helpful, too long, or too short? I’m still figuring out a good format and might go for shorter newsletters with higher value density.
Socials