When to Use
"Fine-tuning", "Fine-tune", "Train custom model", "Domain-specific AI".
Instructions
1. What is Fine-tuning
Continue training a base model on your own examples → custom model with updated weights. Different from RAG (retrieves your data at query time) and prompting (changes only the input).
2. When to Fine-tune (vs Alternatives)
Try First (Cheaper)
- Better prompting (system prompts, few-shot).
- RAG (your data + base model).
Then Consider Fine-tuning When
- Specific tone/style needed at scale.
- Reduce token usage (smaller prompts).
- Domain-specific tasks (medical, legal).
- Faster inference (a fine-tuned smaller model can outperform a larger base model on a narrow task).
- 100+ training examples available.
Don't Fine-tune If
- Your data changes often (RAG better).
- < 50 training examples.
- General task (base model fine).
3. Fine-tuning Options 2026
a. OpenAI
- Models: gpt-4o-mini, gpt-3.5-turbo.
- Cost: $25-100 per training run + inference.
- Tools: platform.openai.com.
b. Anthropic
- Limited fine-tuning (mostly via Amazon Bedrock).
- Focus on prompt engineering instead.
c. Open Source
- Llama 3 fine-tuning.
- Self-host or via Together AI, Anyscale.
- Most control + cost-effective.
d. Specialty
- Cohere — fine-tune Command models.
- Mistral — fine-tune their models.
4. Dataset Preparation
Format (OpenAI)
{"messages": [
{"role": "system", "content": "You are a..."},
{"role": "user", "content": "Question"},
{"role": "assistant", "content": "Answer"}
]}
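A minimal sketch of building and checking examples in the format above before upload. In the JSONL file each example sits on its own single line. The helper names (`make_example`, `validate_example`, `to_jsonl`) and the specific checks are illustrative assumptions, not an official validator.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}

def make_example(system, user, assistant):
    """Build one training example in the chat format shown above."""
    return {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]}

def validate_example(example):
    """Return a list of problems found in one example (empty list = OK)."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str) or not msg["content"].strip():
            problems.append(f"message {i}: empty or non-string content")
    # Assumption: every example should end with the answer to be learned.
    if messages[-1].get("role") != "assistant":
        problems.append("last message should be the assistant's answer")
    return problems

def to_jsonl(examples):
    """Serialize examples as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)
```

`ensure_ascii=False` keeps non-ASCII text (for example Hebrew) readable in the file instead of escaping it.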
Quality Tips
- 100-1,000 examples typically enough.
- Diversity — cover edge cases.
- Quality > Quantity — bad data = bad model.
- Same format every example.
- Train/Val split (80/20).
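The 80/20 split from the tips above could look like the following sketch. The function name `train_val_split` and the fixed seed are assumptions made for reproducibility; any shuffling scheme that keeps the two sets disjoint would do.

```python
import random

def train_val_split(examples, val_fraction=0.2, seed=42):
    """Shuffle a copy of the examples and split it into (train, val)."""
    shuffled = list(examples)
    # A seeded generator makes the split repeatable across runs.
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

Keeping the validation examples out of training is what makes the later baseline comparison meaningful.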
5. Process
1. Prepare dataset (JSONL).
2. Upload to provider.
3. Start training job (15 min - few hours).
4. Evaluate on val set.
5. If not good — iterate dataset.
6. Deploy fine-tuned model.
7. Monitor in production.
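Steps 2 and 3 might be sketched as follows, assuming the v1-style `openai` Python SDK, where `client.files.create` and `client.fine_tuning.jobs.create` are available. The helper name `start_fine_tune`, the file paths, and the model snapshot in the usage comment are illustrative assumptions; check the provider's current model list before running.

```python
def start_fine_tune(client, train_path, val_path, model):
    """Upload train/val JSONL files and start a fine-tuning job.

    `client` is expected to be an OpenAI SDK client (assumption: v1-style API).
    Returns the job ID, which can be used to poll the job status later.
    """
    with open(train_path, "rb") as f:
        train_file = client.files.create(file=f, purpose="fine-tune")
    with open(val_path, "rb") as f:
        val_file = client.files.create(file=f, purpose="fine-tune")
    job = client.fine_tuning.jobs.create(
        training_file=train_file.id,
        validation_file=val_file.id,
        model=model,
    )
    return job.id

# Usage (needs the `openai` package and an API key in OPENAI_API_KEY):
#   from openai import OpenAI
#   job_id = start_fine_tune(OpenAI(), "train.jsonl", "val.jsonl",
#                            "gpt-4o-mini-2024-07-18")  # snapshot name is an assumption
```

Passing the client in as a parameter keeps the helper easy to test with a stub before spending money on a real run.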
6. Evaluation
Metrics
- Loss — auto-tracked.
- Custom evals — domain-specific.
- Human eval — ultimate test.
Compare to Baseline
- Fine-tuned vs base model on val set.
- If no improvement — abandon.
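One way to make the baseline comparison concrete, assuming each validation example has already been scored between 0 and 1 for both models by a custom or human eval. The function name `compare_to_baseline` and the `min_gain` threshold are hypothetical; pick a threshold that matches your own quality bar.

```python
from statistics import mean

def compare_to_baseline(base_scores, tuned_scores, min_gain=0.05):
    """Compare per-example scores of the base and fine-tuned model.

    Both lists must cover the same validation examples in the same order.
    """
    if not base_scores or len(base_scores) != len(tuned_scores):
        raise ValueError("score lists must be non-empty and the same length")
    base_mean = mean(base_scores)
    tuned_mean = mean(tuned_scores)
    # Share of examples where the fine-tuned model scored strictly higher.
    wins = sum(t > b for b, t in zip(base_scores, tuned_scores))
    gain = tuned_mean - base_mean
    return {
        "base_mean": base_mean,
        "tuned_mean": tuned_mean,
        "gain": gain,
        "win_rate": wins / len(base_scores),
        "keep_fine_tune": gain >= min_gain,
    }
```

The win rate is reported alongside the mean because a few large wins can hide many small regressions.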
7. Cost (OpenAI gpt-4o-mini fine-tuning)
Training
- $3 per 1M training tokens.
- 100 examples × 500 tokens × 3 epochs = 150K tokens = $0.45.
- 1K examples = $4.50.
Inference (vs base model)
- Input: $0.30 / 1M (vs $0.15 base) — 2x.
- Output: $1.20 / 1M (vs $0.60 base) — 2x.
Net: fine-tuning pays off only if the quality gain or prompt-size reduction is substantial.
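The arithmetic above can be reproduced with a small calculator. The prices are the ones quoted in this section and may change; the function names are illustrative.

```python
# Prices per 1M tokens, as quoted above (assumption: still current).
TRAIN_PER_M = 3.00
FT_INPUT_PER_M, FT_OUTPUT_PER_M = 0.30, 1.20
BASE_INPUT_PER_M, BASE_OUTPUT_PER_M = 0.15, 0.60

def training_cost(n_examples, tokens_per_example, epochs=3):
    """Training cost in USD: every token is billed once per epoch."""
    tokens = n_examples * tokens_per_example * epochs
    return tokens / 1_000_000 * TRAIN_PER_M

def inference_cost(input_tokens, output_tokens, fine_tuned):
    """Inference cost in USD for the fine-tuned or the base model."""
    if fine_tuned:
        in_price, out_price = FT_INPUT_PER_M, FT_OUTPUT_PER_M
    else:
        in_price, out_price = BASE_INPUT_PER_M, BASE_OUTPUT_PER_M
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

if __name__ == "__main__":
    print(round(training_cost(100, 500), 2))    # 100 examples, 500 tokens, 3 epochs
    print(round(training_cost(1_000, 500), 2))  # 1K examples
```

Because the fine-tuned input price is 2x, the fine-tuned prompt has to shrink to less than half the base prompt's tokens before input cost alone drops below the base model's.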
8. RAG vs Fine-tuning vs Prompting
| | Prompting | RAG | Fine-tuning |
|---|---|---|---|
| Setup time | Hours | Days | Days-weeks |
| Update content | Edit prompt | Re-index | Re-train |
| Cost | Low | Medium | High upfront |
| Best for | General | Knowledge | Tone/style |
| Data scale | Small | Large | Medium |
Decision Tree
Need domain knowledge?
├── Static, < 10K tokens → Prompt with context
├── Dynamic, large → RAG
└── Specific style/tone → Fine-tuning
Need consistent output format?
├── Few examples needed → Few-shot prompting
└── Many examples + complex → Fine-tuning
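The decision tree can also be written as a small function, which is handy for documenting the choice in a project. This is a sketch: the function name, the parameter names, and the three `need` labels are assumptions, and static knowledge of 10K tokens or more is routed to RAG here even though the tree does not cover that case.

```python
def recommend_approach(need, data_is_dynamic=False, context_tokens=0,
                       many_complex_examples=False):
    """Map the decision tree above to a recommendation string.

    need: "knowledge", "style", or "format" (labels are illustrative).
    """
    if need == "style":
        return "fine-tuning"
    if need == "knowledge":
        if not data_is_dynamic and context_tokens < 10_000:
            return "prompt with context"
        # Dynamic or large knowledge; static >= 10K tokens is an assumed case.
        return "RAG"
    if need == "format":
        if many_complex_examples:
            return "fine-tuning"
        return "few-shot prompting"
    raise ValueError(f"unknown need: {need!r}")
```

For example, `recommend_approach("knowledge", data_is_dynamic=True)` returns `"RAG"`.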
9. Common Pitfalls
- ❌ Fine-tune for facts: RAG better.
- ❌ Small bad dataset: model overfits.
- ❌ No baseline: don't know if better.
- ❌ No human eval: trusting metrics alone.
- ❌ Fine-tuning when prompting works: wasted effort.
10. Alternatives Worth Trying First
- Better prompts (90% of cases).
- Few-shot prompting (1-10 examples).
- RAG (large knowledge).
- Combination (RAG + prompt).
- Then fine-tune if still not good.
11. Israel Specifics
- Hebrew fine-tuning possible via Llama 3.
- Local datasets for domain (legal, medical).
- Privacy: self-host fine-tuned models if the data contains PII.
12. Conclude with a recommendation.
Example Prompts
Fine-tune for Hebrew customer support tone. ROI?
500 training examples, gpt-4o-mini. Cost?
RAG vs fine-tune for legal Q&A?
© 2026 AI Expert Pro | Version 1.0.0