
SLMs vs LLMs: Sizing Models in a Fast-Moving Landscape

New models ship weekly and MoE (Mixture of Experts) makes parameter counts misleading. A visual guide to navigating model sizes — and knowing when to go small, go big, or route between both.

Feb 21, 2026

Size Taxonomy
The Model Landscape
Where models actually sit on the parameter scale — and why MoE makes it complicated.
[Chart: models plotted on a log parameter scale from 1B to 1T, with an SLM band (1–15B), a grey zone, and an LLM band (30B+). Axis: parameters, log scale.]
These models store hundreds of billions of parameters but only activate a small fraction per token. The bright segment shows what's actually used during inference.
MoE — Total ≠ Active
17B active / 397B total (4.3% utilization)
Llama 4 Maverick: 17B active / 400B total (4.3% utilization)
DeepSeek V3.2: 37B active / 685B total (5.4% utilization)
Kimi K2: 32B active / 1T total (3.2% utilization)
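The utilization figures above are just active parameters divided by total parameters. A quick sanity check on the numbers from the chart (model figures as listed above):

```python
# Per-token utilization of an MoE model: active params / total params.
models = {
    "Llama 4 Maverick": (17e9, 400e9),
    "DeepSeek V3.2": (37e9, 685e9),
    "Kimi K2": (32e9, 1e12),
}

def utilization(active: float, total: float) -> float:
    """Fraction of the stored weights that actually fire per token."""
    return active / total

for name, (active, total) in models.items():
    print(f"{name}: {utilization(active, total):.2%} of weights active per token")
```

Even the busiest model on the chart touches barely one parameter in twenty per token.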
SLM range: 1B–15B · Grey zone: 15B–30B · LLM range: 30B+
When does bigger actually matter?
Go Big
When Bigger Is Better
Five capability domains where large models still dominate.
Complex Reasoning
Multi-step logic and chain-of-thought require massive parameter counts to store compositional knowledge.
Zero-shot Generalization
No fine-tuning data? Large models handle novel tasks through broad pre-training coverage.
Long Context (128K+)
Processing entire codebases or legal documents needs the capacity of large architectures.
Agentic Workflows
Complex tool use, multi-turn planning, and self-correction demand the reasoning depth only LLMs provide.
Open-ended Generation
Creative writing, brainstorming, and unconstrained generation benefit from diverse training at scale.
× $$$ · × Slower · ✓ No fine-tuning needed
But what if you have training data and care about cost?
Go Small
The Small Model Advantage
Fine-tuned SLMs routinely match or beat zero-shot LLMs on well-defined tasks.
Sentiment Analysis (SST-2, F1): BERT 110M fine-tuned 94.4% vs. GPT-4o zero-shot 87.0%
A 110M-param model from 2018 — with SFT data, even old architectures win.
Headline Classification (financial, weighted F1): Phi-3-mini 3.8B fine-tuned 95.6% vs. GPT-4 zero-shot 83.4%
Financial Sentiment (FiQA-SA, weighted F1): Llama 3 8B fine-tuned 86.6% vs. GPT-4 zero-shot 75.7%
Named Entity Recognition (CoNLL): Mistral 7B LoRA 98.9% vs. GPT-4 zero-shot 74.2%
Sources: arXiv:2602.06370 (SST-2) · arXiv:2411.02476 (Headline, FiQA-SA) · arXiv:2405.00732 (CoNLL)
✓ Significantly cheaper at scale · ✓ Own your weights & data · ✓ Runs on-device / air-gapped · × Requires SFT data & training pipeline · × More ops complexity vs. 3P API
What about MoE — is that big or small?
Mixture of Experts
The MoE Complication
MoE models have up to 1T total parameters but only activate 17–37B per token. So are they big or small?
In a dense model, every parameter fires on every token. MoE splits the network into dozens of specialist sub-networks (“experts”) and uses a lightweight router to pick a small handful for each token. The result: frontier-level quality at a fraction of the compute, because most of the model stays asleep.
The pipeline: Token In (input embedding) → Attention (shared layers) → Router (selects experts) → Output (shared projection) → Token Out (next token)
Kimi K2: 1T total · 32B active per token · 3.2% utilization
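The router step is the heart of the trick. A toy sketch, assuming a softmax top-k gate over a handful of linear "experts" (real MoE gating functions vary by model, and these dimensions are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS, D_MODEL, TOP_K = 8, 16, 2

# Each "expert" is a small weight matrix; the router is a single
# linear layer producing one gate score per expert.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.1 for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((N_EXPERTS, D_MODEL)) * 0.1

def moe_layer(token: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Run one token through a top-k MoE layer; only k experts execute."""
    logits = router_w @ token                 # one gate score per expert
    chosen = np.argsort(logits)[-TOP_K:]      # indices of the top-k experts
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                      # softmax over chosen experts only
    # The other N_EXPERTS - TOP_K experts never run: that is the low utilization.
    out = sum(g * (experts[i] @ token) for g, i in zip(gates, chosen))
    return out, chosen

token = rng.standard_normal(D_MODEL)
out, chosen = moe_layer(token)
print(f"activated experts {sorted(chosen.tolist())}, "
      f"utilization {TOP_K / N_EXPERTS:.0%}")
```

With 2 of 8 experts firing, this toy layer runs at 25% utilization; production models like Kimi K2 push the same idea down to ~3%.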
So how do you actually decide?
Decision Framework
Match the Signal to the Size
Nine questions that tell you whether to go big or small.
Is the task well-defined?
Go Small
Narrow, well-scoped tasks (classification, extraction, routing) are SLM territory. Fine-tuning on your specific labels consistently beats zero-shot LLMs.
Do you have labeled training data?
Go Small
Even 500–1K labeled examples can push a 1–3B model past a 400B generalist. No data means you need the LLM's zero-shot ability.
Is query volume high?
Go Small
At 10K+ queries/day, the cost gap becomes existential. A fine-tuned SLM at $0.01/1K tokens vs $1.00/1K tokens compounds fast.
Do you need real-time latency?
Go Small
SLMs generate 150–300 tok/s vs 50–100 tok/s for LLMs. For user-facing applications, that latency gap defines the entire UX.
Is the deployment target edge or mobile?
Go Small
Models under 4B parameters can run on-device with quantization. No network round-trip, no cloud dependency, works fully offline.
Is budget constrained?
Go Small
Fine-tuning a 3B model costs ~$50–200. Running it costs 100x less than an LLM API. The ROI is immediate for well-defined tasks.
Does data need to stay on-premise?
Go Small
Self-hosted SLMs mean no data leaves your infrastructure. For healthcare, finance, and legal this is often a hard requirement, not a preference.
Does the system handle diverse, unpredictable tasks?
Go Big
If you can't enumerate the task space — customer service bots, coding assistants, research tools — you need the LLM's broad generalization ability.
Does the task require multi-step reasoning?
Go Big
Chain-of-thought, planning, and complex inference still favor large models. SLMs struggle with problems requiring 3+ reasoning steps or compositional logic.
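The nine signals above reduce to a rough tally. A minimal sketch (the signal names and example answers are paraphrased from the list above; treat it as a heuristic, not a decision engine):

```python
# Each signal votes "small" or "big" when answered yes.
SIGNALS = {
    "task well-defined": "small",
    "labeled training data available": "small",
    "high query volume": "small",
    "real-time latency needed": "small",
    "edge or mobile deployment": "small",
    "budget constrained": "small",
    "data must stay on-premise": "small",
    "diverse, unpredictable tasks": "big",
    "multi-step reasoning required": "big",
}

def tally(answers: dict[str, bool]) -> dict[str, int]:
    """Count go-small vs. go-big votes from yes/no answers."""
    votes = {"small": 0, "big": 0}
    for signal, yes in answers.items():
        if yes:
            votes[SIGNALS[signal]] += 1
    return votes

# Example: a high-volume classification task with labeled data.
votes = tally({
    "task well-defined": True,
    "labeled training data available": True,
    "high query volume": True,
    "multi-step reasoning required": False,
})
print(votes)  # {'small': 3, 'big': 0}
```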
In practice, you don't pick one — you route between both.
Routing Playbook
Route, Don't Choose
A small router model classifies each request, sending simple queries to the SLM and hard ones to the LLM.
Complex query ratio: ~20%
SLM path: fast and cheap, handles ~80% of traffic at $0.01 / 1K tokens
LLM path: powerful, handles the ~20% of complex queries at $1.00 / 1K tokens
At this split the blended cost is about $0.21 per 1K tokens (0.8 × $0.01 + 0.2 × $1.00), roughly 5x cheaper than the all-LLM baseline, on top of the 100x per-token cost advantage the SLM holds on its own.
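The cost arithmetic behind the playbook is simple enough to sketch, using the per-token rates quoted above (traffic volume here is an arbitrary example):

```python
SLM_RATE = 0.01   # $ per 1K tokens, from the figures above
LLM_RATE = 1.00   # $ per 1K tokens

def routed_cost(total_k_tokens: float, complex_ratio: float = 0.2) -> float:
    """Blended cost when a router sends only complex queries to the LLM."""
    return total_k_tokens * ((1 - complex_ratio) * SLM_RATE
                             + complex_ratio * LLM_RATE)

def all_llm_cost(total_k_tokens: float) -> float:
    """Baseline: every query goes to the LLM."""
    return total_k_tokens * LLM_RATE

traffic = 10_000  # 10M tokens, i.e. 10K units of 1K tokens
print(f"routed:  ${routed_cost(traffic):,.2f}")
print(f"all-LLM: ${all_llm_cost(traffic):,.2f}")
print(f"savings: {1 - routed_cost(traffic) / all_llm_cost(traffic):.0%}")
```

At a 20% complex-query ratio this prints roughly $2,080 routed against $10,000 all-LLM, a ~79% saving; the saving grows as the complex ratio shrinks.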
The bottom line
Key Takeaways
What to Remember
A fine-tuned 3B model will beat a zero-shot 400B model on a well-scoped task. The benchmarks are clear on this.
MoE makes “how big is this model?” a harder question. Always check active parameters, not just total.
LLMs still win on open-ended reasoning, long context, and tasks you can’t define upfront. Don’t force-fit an SLM where the task is ambiguous.
Routing is a practical middle ground: classify requests by complexity, send the easy ones to a small model, and save the expensive model for what actually needs it.
Prototype with an LLM API first. Once the task stabilizes and you have labeled data, distill down to an SLM you own.