
Un Ministral, Des Ministraux

216 points | 1 year ago | mistral.ai
ed1 year ago

3b is API-only so you won’t be able to run it on-device, which is the killer app for these smaller edge models.

I’m not opposed to licensing but “email us for a license” is a bad sign for indie developers, in my experience.

8b weights are here https://huggingface.co/mistralai/Ministral-8B-Instruct-2410
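
For reference, a minimal sketch of loading these weights locally with Hugging Face transformers - assuming you've accepted the research license on the model page, are logged in via huggingface-cli, and have roughly 16 GB of VRAM for bf16:

    # Sketch only: gated repo, so the license must be accepted on HF first.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Ministral-8B-Instruct-2410"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    messages = [{"role": "user", "content": "What is an edge model?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))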

Commercial entities aren’t permitted to use or distribute the 8b weights. From the agreement (which permits research purposes only):

"Research Purposes": means any use of a Mistral Model, Derivative, or Output that is solely for (a) personal, scientific or academic research, and (b) for non-profit and non-commercial purposes, and not directly or indirectly connected to any commercial activities or business operations. For illustration purposes, Research Purposes does not include (1) any usage of the Mistral Model, Derivative or Output by individuals or contractors employed in or engaged by companies in the context of (a) their daily tasks, or (b) any activity (including but not limited to any testing or proof-of-concept) that is intended to generate revenue, nor (2) any Distribution by a commercial entity of the Mistral Model, Derivative or Output whether in return for payment or free of charge, in any medium or form, including but not limited to through a hosted or managed service (e.g. SaaS, cloud instances, etc.), or behind a software layer.

diggan1 year ago

> I’m not opposed to licensing but “email us for a license” is a bad sign for indie developers, in my experience.

At least they're not claiming it's Open Source / Open Weights, kind of happy about that, as other companies didn't get the memo that lying/misleading about stuff like that is bad.

talldayo1 year ago

Yeah, a real silver lining on the API-only access for a model that is intentionally designed for edge devices. As a user I honestly only care about the weights being open - I'm not going to reimplement their training code, and I don't need or want redistributed training data; both already exist elsewhere. There is no benefit, for my uses, to having an "open source" model when I could have weights and finetunes instead.

There's nothing to be happy about when businesses try to wall-off a feature to make you salivate over it more. You're within your right to nitpick licensing differences, but unless everyone gets government-subsidized H100s in their garage I don't think the code will be of use to anyone except moneyed competitors that want to undermine foundational work.

tarruda1 year ago

Isn't 3b the kind of size you'd expect to be able to run on the edge? What is the point of using 3b via API when you can use larger and more capable models?

littlestymaar1 year ago

GP misunderstood: 3b will be available for running on edge devices, but you must sign a deal with Mistral to get access to the weights.

I don't think that can work without a significant lobbying push towards models running on the edge, but who knows (especially since they have a former French Minister on the founding team).

ed1 year ago

> GP misunderstood

I don’t think it’s fair to claim the weights are available if you need to hammer out a custom agreement with Mistral’s sales team first.

If they had a self-serve process, or some sort of shrink-wrapped deal up to, say, 500k users, that would be great. But bespoke contracts are rarely cheap or easy to get. This comes from my experience building a bunch of custom infra for Flux1-dev, only to find I wasn’t big enough for a custom agreement, because, duh, the service doesn’t exist yet. Mistral is not BFL, but sales teams don’t like speculating on usage numbers for a product that hasn’t been released yet. Which is a bummer, considering most innovation happens at a small scale initially.

littlestymaar1 year ago

I'm not defending Mistral here; I don't think it's a good idea. I just wanted to point out that there is no paradox: the 3b model isn't API-only.

mark_l_watson1 year ago

You are correct, convenience for trying many new models is important. For me, this means being able to run with Ollama.

DreamGen1 year ago

From what I have heard, getting a license from them is also far from guaranteed. They are selective about who they want to do business with -- understandable, but something to keep in mind.

wg01 year ago

Genuine question - if I release a model's weights with restrictions on commercial usage, and then someone deploys that model and operates it commercially, what are the ways to identify that it's my model doing the online per-token slavery over an HTTP endpoint?

dest1 year ago

There are ways to watermark the output, by slightly altering the choice of tokens in a recognizable pattern.
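
For illustration, a toy sketch in the spirit of the "greenlist" logit-bias scheme from the watermarking literature (Kirchenbauer et al.) - not something Mistral has said they actually do:

    # Seed a PRNG with the previous token, bias a pseudorandom half of the
    # vocabulary at sampling time, then detect by counting "green" tokens.
    import torch

    VOCAB, GAMMA, DELTA = 131072, 0.5, 2.0

    def green_mask(prev_token: int) -> torch.Tensor:
        gen = torch.Generator().manual_seed(prev_token)
        perm = torch.randperm(VOCAB, generator=gen)
        mask = torch.zeros(VOCAB, dtype=torch.bool)
        mask[perm[: int(GAMMA * VOCAB)]] = True
        return mask

    def sample_watermarked(logits: torch.Tensor, prev_token: int) -> int:
        biased = logits + DELTA * green_mask(prev_token).float()
        return int(torch.multinomial(torch.softmax(biased, -1), 1))

    def green_fraction(tokens: list[int]) -> float:
        # Well above GAMMA over a long text implies the watermark is present.
        hits = sum(bool(green_mask(p)[t]) for p, t in zip(tokens, tokens[1:]))
        return hits / max(len(tokens) - 1, 1)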

wg01 year ago

Within the model? Like some fine-tuning applied after training?

csomar1 year ago

Thanks, I was confused for a bit. The 3b comparison with Llama 3.2 is useless. If I can't run it on my laptop, it's no longer comparable to open models.

moralestapia1 year ago

Lol, the whole point of Edge models is to be able to run them locally.

cjtrowbridge1 year ago

They released it on huggingface.

aabhay1 year ago

This press release is a big change in branding and ethos for Mistral. What was originally a vibey, insurgent contender that put out magnet links is now a PR-crafting team that has to fight to pitch their utility to the public.

littlestymaar1 year ago

I was going to say the same. Incredible to see how quickly Mistral went from “magnet links casually dropped on twitter by their CTO” to “PR blog post without the model weights” in just a year.

Not a good sign at all as it means their investors are already getting nervous.

swyx1 year ago

Just want to point out that this isn't entirely true. Pixtral was magnet-link dropped recently. Mistral simply has two model rollout channels depending on the level of openness they choose. Don't extrapolate too much due to VC hate.

Whiteshadow121 year ago

Nice voice of reason, swyx. People who are not hooked on X will have selective memory: "Mistral has changed, I miss the old Mistral".

Last year Mistral watched as every provider hosted their models with little to no value capture.

Nemo is Apache 2.0 license, they could have easily made that a Mistral Research License model.

It's hard to pitch VCs for more money to build more models when you don't capture anything by making it Apache 2.0.

Not everyone can be Meta.

Magnet links are cute but honestly, most people would rather use HF to get their models.

wg01 year ago

That's usually evidence of VCs getting involved. Somber corporate tone: "proud of accomplishments users will find useful", "we continue to improve", "looking to the future", and such.

csomar1 year ago

This might suggest that they are plateauing. If you think your next model won't improve a lot, then you'll try to start earning from the current model. Luckily, we still have Meta. Llama 3.2 is really good, and it runs on my laptop with a regular Intel CPU.

xnx1 year ago

Has anyone put together a good and regularly updated decision tree for what model to use in different circumstances (VRAM limitations, relative strengths, licensing, etc.)? Given the enormous zoo of models in circulation, there must be certain models that are totally obsolete.
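
Even something of this shape would already help - to be clear, the entries and VRAM figures below are made-up placeholders, not a maintained ranking:

    # Hypothetical picker: filter by hard constraints (VRAM, license) first;
    # ranking what survives is the hard, fast-moving part.
    from dataclasses import dataclass

    @dataclass
    class Model:
        name: str
        vram_gb: float       # rough VRAM needed at 4-bit quantization
        commercial_ok: bool  # license allows commercial use

    CANDIDATES = [
        Model("Llama-3.2-3B-Instruct", 2.5, True),
        Model("Qwen2.5-7B-Instruct", 5.5, True),           # Apache 2.0
        Model("Ministral-8B-Instruct-2410", 6.0, False),   # research license
        Model("Mistral-Nemo-12B", 8.5, True),              # Apache 2.0
    ]

    def pick(vram_gb: float, commercial: bool) -> list[Model]:
        return [m for m in CANDIDATES
                if m.vram_gb <= vram_gb and (m.commercial_ok or not commercial)]

    print(pick(vram_gb=8.0, commercial=True))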

leetharris1 year ago

People keep making these, but they become outdated so fast and nobody keeps up with it. If your definition of "great" changes in 6 months because a new model shatters your perception of "great," it's hard to rescore legacy models.

I'd say keeping up with the reddit LocalLLama community is the "easiest" way and it's by no means easy.

kergonath1 year ago

> I'd say keeping up with the reddit LocalLLama community is the "easiest" way and it's by no means easy.

The subreddit is… not great. It’s a decent way of keeping up, but don’t read the posts too much (and even then, there is a heavy social aspect, and the models that are discussed there are a very specific subset of what’s available). There is a lot of groupthink, the discussions are never rigorous. Most of the posts are along the lines of “I tested a benchmark and it is 0.5 points ahead of Llama-whatever on that one benchmark I made up, therefore it’s the dog’s and everything else is shite”. The Zuckerberg worshiping is also disconcerting. Returns diminish quickly as you spend more time on that subreddit.

potatoman221 year ago

Someone should use an LLM to continuously maintain this decision tree. The tree itself will decide which LLM is used for maintenance.

mark_l_watson1 year ago

I tend to choose a recent model available for Ollama, and usually stick with a general purpose local model for a month or so, then re-evaluate. Exceptions to sticking to one local model at a time might be needing a larger context size.
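
A minimal sketch of that re-evaluation loop with the ollama Python client - assumes the Ollama daemon is running and the models have already been pulled (model names here are just examples):

    # Run one prompt against a few pulled models and eyeball the answers.
    # Requires `pip install ollama` and e.g. `ollama pull llama3.2`.
    import ollama

    PROMPT = "In two sentences, explain what an edge model is."

    for model in ["llama3.2", "mistral-nemo"]:
        reply = ollama.chat(model=model,
                            messages=[{"role": "user", "content": PROMPT}])
        print(f"--- {model} ---\n{reply['message']['content']}\n")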

iamjackg1 year ago

This is definitely a problem. I mostly take a look at the various leaderboards, but there is a proliferation of fine-tuned models that makes it incredibly daunting to explore the model space. Add to that that often they're not immediately available on turn-key tools like ollama, and the friction increases even more. All this without even considering things like licenses, what kind of data has been used for fine tuning, quantization, merges, multimodal capabilities.

I would love a curated list.

tarruda1 year ago

They didn't add a comparison to Qwen 2.5 3b, which seems to surpass Ministral 3b on MMLU, HumanEval, and GSM8K: https://qwen2.org/qwen2-5/#qwen25-05b15b3b-performance

These benchmarks don't really matter that much, but it is funny how this blog post conveniently forgot to compare with a model that already exists and performs better.

DreamGen1 year ago

Also, the 3B model, which is API-only (so the only things that matter are price, quality, and speed), should be compared to something like Gemini Flash 1.5 8B, which is cheaper than this 3B API and also has higher benchmark performance, super long context support, etc.

butterfly420691 year ago

At this point the benchmarks barely matter at all. It's entirely possible to train for a high benchmark score and reduce the overall quality of the model in the process.

Imo use the model that makes the most sense when you ask it stuff, and personally I'd go for the one with the least censorship (which imo isn't AliBaba Qwen anything)

cmehdy1 year ago

For anybody wondering about the title, that's a sort-of pun in French about how words get pluralized following French rules.

The quintessential example is "cheval" (horse) which becomes "chevaux" (horses), which is the rule they're following (or being cute about). Un mistral, des mistraux. Un ministral, des ministraux.

(Ironically, the plural of the Mistral wind in the Larousse dictionary would technically be Mistrals[1][2], however weird that sounds to my French ears, and perhaps to the people who wrote that article!)

[1] https://www.larousse.fr/dictionnaires/francais/mistral_mistr... [2] https://fr.wiktionary.org/wiki/mistral

BafS1 year ago

It's complex because French is full of exceptions.

The classical way to pluralize "-al" words:

  un animal → des animaux [en: animal(s)]
  un journal → des journaux [en: journal(s)]
with some exceptions:

  un carnaval → des carnavals [en: carnival(s)]
  un festival → des festivals [en: festival(s)]
  un idéal → des idéals (OR des idéaux) [en: ideal(s)]
  un val → des vals (OR des vaux) [en: valley(s)]
There is no logic there (as with many things in French); it's up to Mistral to choose what the plural should be.
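
A toy sketch of that rule-plus-exceptions shape (the exception list here is just the examples above, nowhere near exhaustive):

    # Toy "-al" pluralizer; real French needs a much longer exception list,
    # and some words (idéal, val) accept both forms.
    EXCEPTIONS = {"carnaval", "festival", "idéal", "val"}

    def pluralize(noun: str) -> str:
        if noun in EXCEPTIONS:
            return noun + "s"         # des festivals
        if noun.endswith("al"):
            return noun[:-2] + "aux"  # cheval -> chevaux
        return noun + "s"

    for w in ("ministral", "journal", "festival"):
        print(w, "->", pluralize(w))  # ministraux, journaux, festivals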

EDIT: Format + better examples

maw1 year ago

But are these truly exceptions? Or are they the result of subtler rules French learners are rarely taught explicitly?

I don't know what the precise rules or patterns actually might be. But one fact that jumped out at me is that -mal and -nal start with nasal consonants and three of the "exceptions" end in -val.

cwizou1 year ago

No, like the parent says, as with many things in French, grammar and what we call "orthographe" are based on usage. And what's accepted tends to change over time. What's taught in school varies over the years too, with a strong tendency toward simplification. A good example is the French word for "key", which used to be written "clef" but over time moved to "clé" (closer to how it sounds phonetically). About every 20/30 years we get some "réformes" on the topic, which are more or less followed; there's some good information here (the 1990 one is interesting on its own): https://en.wikipedia.org/wiki/Reforms_of_French_orthography

Back to this precise one: there's no precise rule or pattern underneath, no rhyme or reason. It's just exceptions based on usage, and even those can have their own exceptions. Like "idéals/idéaux": I (French) have personally never even heard that "idéals" was a thing. Yet it is, somehow: https://www.larousse.fr/dictionnaires/francais/idéal/41391

epolanski1 year ago

If it is like Italian, my native language, it's just exceptions you learn by usage.

makapuf1 year ago

I've never heard of such a rule (I'm a native speaker), and your reasoning is fine, but there are many common counterexamples: cheval (horse), rival, estival (adjective, "of summer"), travail (work, same rule for -ail words)...

Muromec1 year ago

Declension patterns are kinda random in general.

realo1 year ago

Indeed, not always rational...

"cuissots de veau", "cuisseaux de chevreuil"

kergonath1 year ago

I think you got it backwards ;)

In any case, this is (officially) obsolete now.

https://fr.m.wikipedia.org/wiki/Cuisseau

rich_sasha1 year ago

It's news to me that the French for "valley" is the masculine "val"; isn't it the feminine "vallée"? Like, say, "Vallée Blanche" near Chamonix? And I suppose the English ripoff, "valley", sounds more like "vallée" than "val" (backwards argument, I know).

idoubtit1 year ago

The "Vallée blanche" you mentioned is not very far from "Val d'Arly" or "Val Thorens" in the Alps. Both words "val" and "vallée", and also "vallon", come from the Latin "vallis". See the Littré dictionary https://www.littre.org/definition/val for examples over the last millennium.

By the way "Le dormeur du val" (The sleeper of the small valley) is one of Rimbaud's most famous poems, often learned at school.

bambax1 year ago

Un val is a small vallée. Une vallée is typically several kilometers wide; un val is a couple of hundred meters wide, tops.

The "Trésor de la langue française informatisé" (which hasn't been updated since 1994) says val is deprecated, but it's common in classic literary novels, together with un vallon, a near synonym.

dustypotato1 year ago

"The term vallée, used as a toponym, must be distinguished from the term val, which is often employed to designate and name a limited region in various European countries and in their languages."

-- https://fr.wikipedia.org/wiki/Vall%C3%A9e I agree, it's weird. I'm sure there are other similar examples.

mytailorisrich1 year ago

Yes, la vallée (feminine) and le val (masculine). Valley is usually la vallée. Val is mostly only used in the names of places.

Apparently val gave vale in English.

clauderoux1 year ago

The exceptions are usually due to words that were borrowed from other languages and hence do not follow French rules. Many of the words that were mentioned here are borrowed from the Occitan language.

kergonath1 year ago

> Ironically the plural of the Mistral wind in the Larousse dictionnary would technically be Mistrals

This is getting off-topic, but anyway…

The Larousse definition is wrong, that’s for sure. The Tramontane comes from the West, between the Pyrenees and the Massif Central, it is not at all the same current as the Mistral.

I am not sure how prevalent “les Mistrals” is in the literature. I don’t doubt that some people wrote this, possibly for some poetic effect, but it sounds very wrong as well. Mistral is a proper noun, and it is not collective like “Alizés”. It means specifically the wind that blows along the Rhône valley; there cannot be more than one.

[edit] as others pointed out, there is the Mistral gagnant sweet, which can indeed be plural.

mytailorisrich1 year ago

Mistral is essentially never in plural form because it is the name of a specific wind.

The only plural form people will probably know is from the song Mistral Gagnant where the lyrics include les mistrals gagnants but that refers to sweets!

Not sure why anyone would think "les mistraux"... ;)

ucarion1 year ago

I'm not sure if being from the north of France changes things, but I think the Renaud song is much more familiar to folks I know than the wind.

kergonath1 year ago

This is probably heavily population-dependent. I don’t think they named the Mistral-class ships after the song or the sweet.

https://en.m.wikipedia.org/wiki/Mistral-class_landing_helico...

Spone1 year ago

The song actually refers to a kind of candy named "Mistral gagnant"

https://fr.m.wikipedia.org/wiki/Mistral_gagnant_(confiserie)

mytailorisrich1 year ago

Well yes, it is a Mediterranean wind!

Rygian1 year ago

On the subject of French plurals, you also get some counterintuitive ones:

- Egg: un œuf (pronounced /œf/), des œufs (pronounced /ø/!)

- Bone: un os (pronounced /ɔs/), des os (pronounced /o/!)

lairv1 year ago

Hard to see how Mistral can compete with Meta: they have orders of magnitude less compute, and their models are only slightly better (at least on the benchmarks), with less permissive licenses.

leetharris1 year ago

In general I feel like all model providers eventually become infrastructure providers. If the difference between models is very small, it will be about who can serve it reliably, with the most features, with the most security, at the lowest price.

I'm the head of R&D at Rev.ai and this is exactly what we've seen in ASR. We started at $1.20/hr, and our new models are $0.10/hr in < 2 years. We have done human transcription for ~15 years and the revenue from ASR is 3 orders of magnitude less ($90/hr vs $0.10/hr) and it will likely go lower. However, our volumes are many orders of magnitude higher now for serving ASR, so it's about even or growth in most cases still.

I think for Mistral to compete with Meta they need a better API. The on-prem/self-hosted people will always choose the best models for themselves and you won't be able to monetize them in a FOSS world anyways, so you just need the better platform. Right now, Meta isn't providing a top-tier platform, but that may eventually change.

cosmosgenius1 year ago

Their 12b Nemo model is very good in a homelab compared to Llama models. This is for story creation.

simonw1 year ago

Yeah, the license thing is definitely a problem. It's hard to get excited about an academic research license for a 3B or 8B model when the Llama 3.1 and 3.2 models are SO good, and are licensed for commercial usage.

sigmar1 year ago

To be clear: these Ministral models are also licensed for commercial use, just not freely licensed for commercial use. And Meta also has restrictions on commercial use (you have to display “Built with Meta Llama 3” and need to pay Meta if you exceed 700 million monthly users).

sthatipamala1 year ago

You need to pay Meta if you had 700 million users as of the Llama 3 release date, not at any time going forward.

simonw1 year ago

... or presumably if you build a successful company and then try to sell that company to Apple, Microsoft, Google or a few other huge companies.

tarruda1 year ago

> need to pay meta if you exceed 700 million monthly users

Seems like a good problem to have

harisec1 year ago

Qwen 2.5 models are better than Llama and Mistral.

speedgoose1 year ago

I disagree. I tried the small ones but they too frequently output Chinese when the prompt is English.

harisec1 year ago

I never had this problem, but I guess it depends on the prompt.

dotnet001 year ago

For one, Mistral's models seem less censored and less rambly than the Llama models.

blihp1 year ago

They can't since Meta can spend billions on models that they give away and never need to get a direct ROI on it. But don't expect Meta's largess to persist much beyond wiping out the competition. Then their models will probably start to look about as open as Android does today. (either through licensing restrictions or more 'advanced' capabilities being paywalled and/or API-only)

sangnoir1 year ago

> But don't expect Meta's largess to persist much beyond wiping out the competition

I don't quite follow your argument - what exactly is Meta competing for? It doesn't sell access to hosted models and shows no interest in being involved in the cloud business. My guess is Meta is driven by enabling wider adoption of AI, and their bet is that more (AI-generated) content is good for its existing content-hosting-and-ad-selling business, and good for its aspirational Metaverse business too, should it pan out.

blihp1 year ago

I'm arguing that Meta isn't in this for altruistic reasons. In the short term, they're doing this so Apple/Google can't do to them with AI tech what they've done to them with mobile/browsers (i.e. Meta doesn't want them owning the stack, and therefore controlling and dictating who can do what with it). In the longer term: Meta doesn't sell access... yet. Meta shows no interest... yet. You could have said the same thing about Apple and Google 15+ years ago about a great many things. This has all happened before, and this will all happen again.

thrance1 year ago

In Europe, they are basically the only LLM API provider that is GDPR compliant. This is a big factor here, when selecting a provider.

TheFragenTaken1 year ago

With the advent of OSS LLMs, it's "just" a matter of renting compute.

isoprophlex1 year ago

Azure OpenAI is definitely compliant...

vineyardmike1 year ago

Are all the big clouds not GDPR compliant?

Hard to imagine anyone competing with AWS/GCP/Azure for slices of GPUs/TPUs. AFAIK, most major models are available a la carte via API on these providers (with a few exclusives). I can’t imagine how anyone can compete with the big clouds on serving an API, and I can’t imagine them staying “non-compliant” for long.

thrance1 year ago

Maybe, but when selling SaaS here, big clients will always ask what cloud provider you use. Using a European one is always a plus, if it isn't simply required.

espadrine1 year ago

> Hard to see how can Mistral compete with Meta

One significant edge: Meta does not dare even distribute their latest models (the 3.2 series) to EU citizens. Mistral does.

kergonath1 year ago

I am not sure why you are downvoted, because AFAICT this is still true. It was definitely true when they were released.

gunalx1 year ago

Not having open-ish weights is a total dealbreaker for me. The only really compelling reason for sub-6B models is that they are easy to run even on consumer hardware or on the edge.

sharkjacobs1 year ago

I know that Mistral is a French company, but I think it's really clever marketing the way they're using French language as branding.

smcleod1 year ago

It's pretty hard to claim it's the world's best and then not compare it to Qwen 2.5...

daghamm1 year ago

Yeah, Qwen does great in benchmarks, but is it really that good in real use?

akvadrako1 year ago

In my experience using it for storytelling, no. Even their largest model likes to produce garbage that isn't even words when you turn the temperature up a bit, and without that it's really bland.

smcleod1 year ago

Yes it's fantastic, really great for both general use and coding.

barbegal1 year ago

Does anyone know why Mistral uses a 17-bit (131k) vocabulary? I'm sure it's more efficient at encoding text, but each token doesn't fit into a 16-bit register, which must make it computationally less efficient?

cpldcpu1 year ago

The tokens are immediately transformed into embeddings (very large vectors), so the 17-bit values are not used for any computation.
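
A minimal illustration (dimensions made up): the token id is just a row index into the embedding table, so a 131072-entry (2^17) vocabulary only makes the table taller; nothing downstream does 17-bit arithmetic:

    # A token id only indexes an embedding table; vocab size sets table height.
    import torch

    vocab_size, d_model = 131072, 4096  # illustrative dimensions
    embed = torch.nn.Embedding(vocab_size, d_model)

    token_ids = torch.tensor([[17, 42, 131071]])  # any ids < vocab_size
    print(embed(token_ids).shape)                 # torch.Size([1, 3, 4096])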

druskacik1 year ago

How many Mistral puns are there?

The benchmarks look promising, great job, Mistral.

daghamm1 year ago

I don't get it.

These are impressive numbers. But while their use case is local execution to preserve privacy, the only way to use these models right now is to use their API?

mergisi1 year ago

Just started experimenting with Ministral 8B! It even passed the "strawberry test"! https://x.com/mustafaergisi/status/1846861559059902777

fhdsgbbcaA1 year ago

That just means it was trained recently enough to have it in training data.

amelius1 year ago

I have no idea how to compare the abilities of these models, so I have no idea how much of a deal this is.

zurfer1 year ago

Poor title: Mistral released new open-weight models that win across benchmarks in their weight class, Ministral 3B and Ministral 8B.

scjody1 year ago

Are they really open weights? Ministral 3B is "Mistral Commercial License".

leetharris1 year ago

Yeah, the 3B is NOT open. The 8B is, as it can be used under a commercial license.

diggan1 year ago

"commercial license != open", by most standards

zurfer1 year ago

too late to edit now. I was completely wrong about open-weights.

The meme at the bottom made me jump to that conclusion. Well, not that exciting of a release then. :(

DreamGen1 year ago

That would be misleading. They aren't open weight (the 3B is not available). They aren't compared to Qwen 2.5, which beats them in many of the benchmarks presented while having a more permissive license. And the closed 3B is not competitive with other API-only models, like Gemini Flash 8B, which costs less and has better performance.

WiSaGaN1 year ago

"For self-deployed use, please reach out to us for commercial licenses. We will also assist you in lossless quantization of the models for your specific use-cases to derive maximum performance.

The model weights for Ministral 8B Instruct are available for research use. Both models will be available from our cloud partners shortly."
