📖 What this Skill includes

When to use

"Fine-tuning", "Fine-tune", "Train custom model", "Domain-specific AI".

Instructions

1. What is Fine-tuning

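Fine-tuning continues training a base model on your own examples to produce a custom model. It differs from RAG, which retrieves your data at query time, and from prompting, which only changes the input: fine-tuning changes the model's weights.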

2. When to Fine-tune (vs Alternatives)

Try First (Cheaper)

Better prompting (system prompts, few-shot).
RAG (your data + base model).

Then Consider Fine-tuning When

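A specific tone or style is needed at scale.
You want to reduce token usage (shorter prompts, since instructions are learned).
The task is domain-specific (medical, legal).
You need faster inference (a fine-tuned smaller model can outperform a larger base model on a narrow task).
You have 100+ training examples available.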

Don't Fine-tune If

Your data changes often (RAG is better).
You have fewer than 50 training examples.
The task is general (the base model is good enough).

3. Fine-tuning Options 2026

A. OpenAI

Models: gpt-4o-mini, gpt-3.5-turbo.
Cost: billed per training token plus inference; small gpt-4o-mini runs cost a few dollars (see section 7), larger runs roughly $25-100.
Tools: platform.openai.com.

B. Anthropic

Limited fine-tuning availability (mostly via Amazon Bedrock).
Focus on prompt engineering instead.

C. Open Source

Llama 3 fine-tuning.
Self-host or via Together AI, Anyscale.
Most control, and cost-effective.

D. Specialty

Cohere — fine-tune Command models.
Mistral — fine-tune their models.

4. Dataset Preparation

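Format (OpenAI)

In the JSONL file each example is one JSON object on a single line; the example is wrapped here for readability.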

{"messages": [
  {"role": "system", "content": "You are a..."},
  {"role": "user", "content": "Question"},
  {"role": "assistant", "content": "Answer"}
]}
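A minimal sketch of checking examples against this format and writing them out as JSONL. The helper names `validate_example` and `write_jsonl` are hypothetical, and the check assumes only the three roles shown above are used.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_example(example: dict) -> None:
    """Raise ValueError if the example does not match the chat format above."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("example must contain a non-empty 'messages' list")
    for msg in messages:
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            raise ValueError("every message needs non-empty string content")
    if messages[-1]["role"] != "assistant":
        raise ValueError("the last message should be the assistant's answer")


def write_jsonl(examples: list, path: str) -> int:
    """Validate each example, write one JSON object per line, return the count."""
    with open(path, "w", encoding="utf-8") as fh:
        for example in examples:
            validate_example(example)
            # ensure_ascii=False keeps Hebrew and other non-ASCII text readable.
            fh.write(json.dumps(example, ensure_ascii=False) + "\n")
    return len(examples)
```

Validating before upload catches format drift early, which is cheaper than discovering it after a failed training job.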

Quality Tips

100-1,000 examples are typically enough.
Diversity: cover edge cases.
Quality > quantity: bad data produces a bad model.
Use the same format in every example.
Split into train and validation sets (80/20).
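The 80/20 split can be sketched as a seeded shuffle followed by a slice. The function name `train_val_split` and the default seed are illustrative choices, not part of any provider tooling.

```python
import random


def train_val_split(examples, val_fraction=0.2, seed=42):
    """Shuffle a copy of the examples and return (train, val) lists."""
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be between 0 and 1")
    if len(examples) < 2:
        raise ValueError("need at least two examples to split")
    shuffled = list(examples)
    # A fixed seed makes the split reproducible across dataset iterations.
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

Keeping the seed fixed means the validation set stays the same while you iterate on the training data, so scores remain comparable between runs.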

5. Process

1. Prepare dataset (JSONL).
2. Upload to provider.
3. Start the training job (15 minutes to a few hours).
4. Evaluate on the validation set.
5. If results are not good enough, iterate on the dataset and retrain.
6. Deploy fine-tuned model.
7. Monitor in production.
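Steps 2 and 3 above might look like the following sketch for OpenAI. It assumes `client` is an `openai.OpenAI()` instance from the v1-style Python SDK, with its `files.create`, `fine_tuning.jobs.create`, and `fine_tuning.jobs.retrieve` methods; `run_fine_tune` itself is a hypothetical helper, and the details should be checked against the current SDK documentation.

```python
import time


def run_fine_tune(client, train_path, val_path, base_model, poll_seconds=30):
    """Upload data, start a job, wait for it, and return the fine-tuned model name.

    Assumption: `client` behaves like `openai.OpenAI()` (v1-style SDK).
    """
    # Step 2: upload the JSONL files.
    with open(train_path, "rb") as fh:
        train_file = client.files.create(file=fh, purpose="fine-tune")
    with open(val_path, "rb") as fh:
        val_file = client.files.create(file=fh, purpose="fine-tune")

    # Step 3: start the training job.
    job = client.fine_tuning.jobs.create(
        training_file=train_file.id,
        validation_file=val_file.id,
        model=base_model,
    )

    # Poll until the job reaches a terminal state.
    while job.status not in ("succeeded", "failed", "cancelled"):
        time.sleep(poll_seconds)
        job = client.fine_tuning.jobs.retrieve(job.id)

    if job.status != "succeeded":
        raise RuntimeError(f"fine-tuning job ended with status {job.status!r}")
    return job.fine_tuned_model
```

A call could look like `run_fine_tune(OpenAI(), "train.jsonl", "val.jsonl", "gpt-4o-mini-2024-07-18")`, where the snapshot name is an assumption to verify against the provider's list of fine-tunable models.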

6. Evaluation

Metrics

Loss: tracked automatically during training.
Custom evals: domain-specific checks you define.
Human eval: the ultimate test.

Compare to Baseline

Run the fine-tuned model and the base model on the same validation set.
If there is no improvement over the base model, abandon the fine-tune.
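The baseline comparison can be sketched as scoring both models on the same validation examples. Everything here is illustrative: `base_generate` and `tuned_generate` are hypothetical callables that take the prompt messages and return text, and `exact_match` is a deliberately crude stand-in for a domain-specific eval.

```python
def exact_match(output: str, reference: str) -> float:
    """Crude placeholder metric: 1.0 if the texts match after stripping."""
    return 1.0 if output.strip() == reference.strip() else 0.0


def mean_score(generate, val_examples, score) -> float:
    """Average score of one model over the validation examples."""
    if not val_examples:
        raise ValueError("validation set is empty")
    total = 0.0
    for example in val_examples:
        prompt = example["messages"][:-1]               # everything before the answer
        reference = example["messages"][-1]["content"]  # the expected answer
        total += score(generate(prompt), reference)
    return total / len(val_examples)


def compare_to_baseline(base_generate, tuned_generate, val_examples, score, min_gain=0.0):
    """Score both models and report whether the fine-tune beats the base model."""
    base = mean_score(base_generate, val_examples, score)
    tuned = mean_score(tuned_generate, val_examples, score)
    gain = tuned - base
    return {"base": base, "tuned": tuned, "gain": gain, "keep_fine_tune": gain > min_gain}
```

Setting `min_gain` above zero is one way to encode that the improvement must be large enough to justify the higher inference price.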

7. Cost (OpenAI gpt-4o-mini fine-tuning)

Training

$3 per 1M training tokens.
100 examples × 500 tokens × 3 epochs = 150K tokens = $0.45.
1K examples at the same size and epochs = 1.5M tokens = $4.50.

Inference (vs base model)

Input: $0.30 / 1M (vs $0.15 base) — 2x.
Output: $1.20 / 1M (vs $0.60 base) — 2x.

Net: fine-tuning pays off only if the improvement is substantial, because every inference call costs 2x.
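The arithmetic above can be reproduced with a small calculator. The prices are the ones quoted in this section and are hard-coded as assumptions; the function names are illustrative.

```python
# Prices in USD per 1M tokens, as quoted in this section (assumed, may change).
TRAIN_PRICE = 3.00
BASE_PRICES = {"input": 0.15, "output": 0.60}
TUNED_PRICES = {"input": 0.30, "output": 1.20}


def training_cost(n_examples, tokens_per_example, epochs, price_per_m=TRAIN_PRICE):
    """Training cost = examples x tokens per example x epochs x price per token."""
    return n_examples * tokens_per_example * epochs / 1_000_000 * price_per_m


def inference_cost(input_tokens, output_tokens, prices):
    """Cost of one request, or of a batch, given token counts and a price table."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


if __name__ == "__main__":
    print(f"100 examples: ${training_cost(100, 500, 3):.2f}")
    print(f"1K examples:  ${training_cost(1000, 500, 3):.2f}")
```

Because fine-tuned input tokens cost twice as much, a shorter prompt lowers the input bill only if it shrinks to less than half its original length; `inference_cost` can be called with both price tables to check a specific case.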

8. RAG vs Fine-tuning vs Prompting

	Prompting	RAG	Fine-tuning
Setup time	Hours	Days	Days-weeks
Update content	Edit prompt	Re-index	Re-train
Cost	Low	Medium	High upfront
Best for	General	Knowledge	Tone/style
Data scale	Small	Large	Medium

Decision Tree

Need domain knowledge?
├── Static, < 10K tokens → Prompt with context
├── Dynamic, large → RAG
└── Specific style/tone → Fine-tuning

Need consistent output format?
├── Few examples needed → Few-shot prompting
└── Many examples + complex → Fine-tuning
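The two trees above can be folded into one function, as a rough sketch. The `need` labels and the threshold of 10 examples for few-shot prompting (borrowed from the 1-10 range in section 10) are assumptions made for illustration.

```python
def recommend(need, *, static=False, context_tokens=0, examples_needed=0):
    """Map a need to an approach, following the decision trees above.

    need: "knowledge", "style", or "format" (illustrative labels).
    """
    if need == "knowledge":
        # Static and small enough to paste into the prompt.
        if static and context_tokens < 10_000:
            return "prompt with context"
        # Dynamic or large knowledge.
        return "RAG"
    if need == "style":
        return "fine-tuning"
    if need == "format":
        # Assumed cutoff: up to 10 examples still fits few-shot prompting.
        if examples_needed <= 10:
            return "few-shot prompting"
        return "fine-tuning"
    raise ValueError(f"unknown need: {need!r}")
```

For instance, `recommend("knowledge", static=True, context_tokens=4000)` follows the first branch and suggests prompting with context rather than RAG or fine-tuning.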

9. Common Pitfalls

❌ Fine-tuning for facts: RAG is better.
❌ Small, bad dataset: the model overfits.
❌ No baseline: you don't know if it is better.
❌ No human eval: trusting metrics alone.
❌ Fine-tuning when prompting works: wasted effort.

10. Alternatives Worth Trying First

Better prompts (enough in roughly 90% of cases).
Few-shot prompting (1-10 examples).
RAG (large knowledge base).
Combination (RAG + prompt).
Then fine-tune if results are still not good enough.

11. Israel Specifics

Hebrew fine-tuning is possible via Llama 3.
Use local datasets for the domain (legal, medical).
Privacy: self-host fine-tuned models if the data contains PII.

12. End with a recommendation.

Example prompts

Fine-tune for Hebrew customer support tone. ROI?

500 training examples, gpt-4o-mini. Cost?

RAG vs fine-tune for legal Q&A?

📥 Install in half a minute

1. Download and open the ZIP file. You will get a folder named fine-tuning.

2. In Claude Code: move the folder to ~/.claude/skills/.
In the app (Claude / Cowork): Settings → Capabilities → Skills → Upload.

3. Ask Claude for what you need in Hebrew. It will invoke the skill on its own when relevant.

Language model fine-tuning Skill for Claude