ZIP · ללא הרשמה · רישיון שימוש כלול בקובץ

📖 מה ה-Skill הזה כולל

מתי להשתמש

"RAG", "Vector DB", "Embeddings", "Semantic search", "AI Q&A on docs", "Knowledge base AI".

הוראות עבודה

1. RAG בקצרה

במקום לכלול את כל הידע בprompt (יקר/לא scalable), שמור ב-Vector DB → חפש בrelevant chunks → תן לClaude/GPT לענות.

2. Architecture

1. Ingestion: Docs → Chunks → Embeddings → Vector DB
2. Query: Question → Embed → Search Vector DB → Top K chunks
3. Generation: Question + Retrieved chunks → LLM → Answer

3. Building Blocks

Vector DBs

Pinecone — managed, popular, $70-400/m.
Weaviate — open source.
Qdrant — open source.
Chroma — lightweight.
pgvector — Postgres extension.

Embeddings

OpenAI text-embedding-3-small/large.
Cohere embed-multilingual-v3 (Hebrew!).
Voyage AI.

Frameworks

LangChain — most popular, complex.
LlamaIndex — RAG-focused.
n8n LangChain nodes — visual.

4. Common Use Cases

A. Customer Support AI

Ingest: Help docs, past tickets, KB articles.
User asks question → RAG → AI answer with citations.

B. Sales Enablement

Ingest: Battle cards, case studies, objection handling.
Sales asks → RAG → AI suggests responses.

C. Internal Q&A

Ingest: Policies, processes, wiki.
Employees ask → RAG → AI answer.

D. Legal Doc Q&A

Ingest: Contracts, agreements.
Lawyer asks → RAG → AI find clauses.

5. Chunking Strategy

Why

Docs too long for context.
Search needs smaller units.

Approaches

Fixed size (512-1024 tokens).
Semantic (split at headings, paragraphs).
Recursive (split if too big).

Best Practice

512-token chunks with 50-token overlap.
Add metadata (source URL, section).

6. Sample Workflow — Build RAG (n8n)

Ingestion (one-time or periodic):
1. Trigger: New file in Google Drive
2. Extract text (Tika / OCR)
3. Chunk (Code node — split by 512 tokens)
4. Embed (OpenAI Embeddings module)
5. Upsert to Pinecone (with metadata)

Query (real-time):
1. Trigger: User question (Slack / Webhook)
2. Embed question (OpenAI Embeddings)
3. Pinecone search (top 5 chunks)
4. Build prompt:
   "Context: [chunks]
    Question: [user input]
    Answer based on context only. Cite sources."
5. Claude (Sonnet) generate answer
6. Reply to Slack with answer + sources

7. Hebrew RAG

Considerations

Embedding model: Cohere multilingual works for Hebrew.
Chunking: Hebrew shorter than English (use char count not just tokens).
OCR for Hebrew PDFs: Tesseract Hebrew, Google Vision API.

Israeli Use Case Example

Israeli law firm: Embed all client contracts → Lawyer asks "What does Acme contract say about IP?"

8. Quality Tips

Better Retrieval

Hybrid search — semantic + keyword.
Reranking — second-stage filter.
Metadata filters — date, source, language.

Better Generation

Cite sources in prompt.
Allow "I don't know" — prevent hallucinations.
Limit answer length — concise.

9. Monitoring

Retrieval quality — were right chunks returned?
Answer quality — accurate, complete?
User feedback — thumbs up/down per response.
Cost — embeddings + LLM.

10. Costs

Small RAG (10K docs, 100 queries/day)

Embeddings (one-time): $5-20.
Pinecone: $70/m.
LLM (Haiku): $20-50/m.
Total: ~$100/m.

Large RAG (1M docs, 10K queries/day)

Embeddings: $500-2K.
Pinecone: $400-2K/m.
LLM (Sonnet): $500-3K/m.
Total: $1.5K-5K/m.

11. Common Pitfalls

❌ Bad chunking — splitting mid-sentence loses context. ❌ No metadata — can't filter / track sources. ❌ Embeddings of bad source data — garbage in, garbage out. ❌ Re-embed everything on update — wasteful.

12. אסיים בהמלצה.

קלט נדרש

פריט	תיאור
Source docs	type + volume
Use case	Q&A / Search / Other
Language	EN / HE / Multi
Volume	queries/day
Tool	n8n / LangChain / Custom

פלט צפוי

רכיב	תיאור
Architecture	high-level
Vector DB recommendation	Pinecone/etc
Embedding model	OpenAI/Cohere
LLM choice	Haiku/Sonnet
Cost estimate	$/m
המלצה	פעולה אחת

דגלים אדומים

🚨 No source citations — hallucinations look real.
🚨 Stale embeddings (didn't re-index) — outdated answers.
🚨 Privacy — sensitive docs to OpenAI embeddings.

הערות חשובות

Start simple — basic RAG before fancy.
Test retrieval quality before generation.
Self-hosted Vector DB for sensitive data (Qdrant).

פרומפט לדוגמה

Build RAG for Israeli law firm contracts. Hebrew + English.

Customer support RAG ב-n8n. Plan it.

RAG cost optimization — איך?

📥 התקנה בחצי דקה

1. הורד ופתח את קובץ ה-ZIP — תקבל תיקייה בשם ai-rag-automations.
2. ב-Claude Code: העבר את התיקייה אל ~/.claude/skills/.
באפליקציה (Claude / Cowork): הגדרות ← Capabilities ← Skills ← העלאה.
3. בקש מ-Claude את מה שצריך בעברית — הוא יפעיל את ה-skill לבד כשזה רלוונטי.