Fine-tuning modifies weights; prompting doesn't. That's the fundamental difference. Full fine-tuning is expensive and rare (requires access to the full model + lots of compute). In practice, "fine-tuning" usually means LoRA -- training a small adapter that layers over the base model.
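The LoRA idea above can be sketched in a few lines. This is a minimal NumPy illustration, not a training loop: the dimensions, rank, and scaling are illustrative, and only `A` and `B` would receive gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2  # model dim and LoRA rank (r << d); values are illustrative

W = rng.normal(size=(d, d))         # frozen base weight
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, zero-init

def lora_forward(x, alpha=16):
    # Base path plus low-rank update: x W^T + x A^T B^T * (alpha / r).
    # The base model's W is never modified; only A and B are trained.
    return x @ W.T + (x @ A.T) @ B.T * (alpha / r)

x = rng.normal(size=(1, d))
# With B zero-initialized, the adapter starts as a no-op:
assert np.allclose(lora_forward(x), x @ W.T)
```

Because `B` starts at zero, the adapted model initially behaves exactly like the base model, which is why LoRA training is stable from step one.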
2026 intuition: try prompt engineering + retrieval first. If performance is still short of target and you have > 5k high-quality task examples, fine-tune. Diminishing returns vs. prompting kick in fast for frontier models on popular tasks; fine-tuning earns its cost most on narrow, consistent workloads (classification, extraction, style-matching).
Example

```jsonl
# OpenAI fine-tuning prep (JSONL of message traces)
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [...]}
{"messages": [...]}
```

```python
from openai import OpenAI

client = OpenAI()

# Upload + train
f = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
client.fine_tuning.jobs.create(
    training_file=f.id,  # file ID returned above, e.g. "file-abc"
    model="gpt-4o-mini-2024-07-18",
)
```

When to use it
- Narrow, high-volume tasks where prompt engineering has plateaued
- Distinctive style or terminology the model doesn't know
- Cost pressure on a high-volume inference path (smaller FT model often beats larger base)
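A back-of-envelope calculation shows how the cost bullet plays out. The per-million-token prices below are placeholders for illustration only, not current rates; check the provider's pricing page.

```python
# Hypothetical per-million-token prices (USD) -- placeholders, not real rates
base_large = {"in": 2.50, "out": 10.00}  # larger base model
ft_small   = {"in": 0.30, "out": 1.20}   # fine-tuned smaller model

def monthly_cost(price, in_tok_m, out_tok_m):
    """Dollar cost for input/output token volumes given in millions."""
    return price["in"] * in_tok_m + price["out"] * out_tok_m

# A high-volume path: 500M input / 100M output tokens per month
large = monthly_cost(base_large, 500, 100)  # 2.5*500 + 10*100 = 2250.0
small = monthly_cost(ft_small, 500, 100)    # 0.3*500 + 1.2*100 = 270.0
print(f"base: ${large:,.0f}/mo, fine-tuned: ${small:,.0f}/mo")
```

At that volume the one-time training cost amortizes quickly, which is the whole argument for fine-tuning on high-traffic inference paths.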
When NOT to use it
- Prompt engineering and retrieval haven't been exhausted yet
- You have < 1000 examples -- not enough signal
- Task requires up-to-date facts -- fine-tuning doesn't update knowledge the way you think it does (use RAG)
