When to use
"RAG", "Vector DB", "Embeddings", "Semantic search", "AI Q&A on docs", "Knowledge base AI".
Working instructions
1. RAG בקצרה
Instead of packing all the knowledge into the prompt (expensive and not scalable), store it in a Vector DB, search for the relevant chunks, and let Claude/GPT answer from them.
2. Architecture
1. Ingestion: Docs → Chunks → Embeddings → Vector DB
2. Query: Question → Embed → Search Vector DB → Top K chunks
3. Generation: Question + Retrieved chunks → LLM → Answer
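The three stages above can be sketched end to end in a few lines. This is a toy illustration only: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, `InMemoryStore` stands in for a Vector DB, and the LLM call is left out, so the sketch stops at the prompt string.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class InMemoryStore:
    """Hypothetical stand-in for a Vector DB."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, Counter]] = []

    def upsert(self, chunk: str) -> None:
        self.rows.append((chunk, embed(chunk)))

    def search(self, question: str, top_k: int = 2) -> list[str]:
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]

# 1. Ingestion: chunks -> embeddings -> store
store = InMemoryStore()
for chunk in [
    "Refunds are issued within 14 days of purchase.",
    "Support is available Sunday to Thursday.",
    "Passwords can be reset from the login page.",
]:
    store.upsert(chunk)

# 2. Query: embed the question and fetch the top K chunks
question = "How do I reset my password?"
top_chunks = store.search(question, top_k=1)

# 3. Generation: question + retrieved chunks go to the LLM (call omitted here)
prompt = f"Context: {top_chunks}\nQuestion: {question}"
```

Swapping `embed` for a real embedding model and `InMemoryStore` for a real Vector DB client should leave the three-stage shape unchanged.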
3. Building Blocks
Vector DBs
- Pinecone — managed, popular, $70-400/m.
- Weaviate — open source.
- Qdrant — open source.
- Chroma — lightweight.
- pgvector — Postgres extension.
Embeddings
- OpenAI text-embedding-3-small/large.
- Cohere embed-multilingual-v3 (Hebrew!).
- Voyage AI.
Frameworks
- LangChain — most popular, complex.
- LlamaIndex — RAG-focused.
- n8n LangChain nodes — visual.
4. Common Use Cases
A. Customer Support AI
- Ingest: Help docs, past tickets, KB articles.
- User asks question → RAG → AI answer with citations.
B. Sales Enablement
- Ingest: Battle cards, case studies, objection handling.
- Sales asks → RAG → AI suggests responses.
C. Internal Q&A
- Ingest: Policies, processes, wiki.
- Employees ask → RAG → AI answer.
D. Legal Doc Q&A
- Ingest: Contracts, agreements.
- Lawyer asks → RAG → AI finds the relevant clauses.
5. Chunking Strategy
Why
- Docs too long for context.
- Search needs smaller units.
Approaches
- Fixed size (512-1024 tokens).
- Semantic (split at headings, paragraphs).
- Recursive (split on coarse boundaries first, then re-split any piece that is still too big).
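The recursive approach can be sketched as follows. This is a minimal sketch under assumptions: size is measured in whitespace-separated words as a rough proxy for tokens, and the separator order (paragraph, line, sentence) is illustrative rather than prescribed.

```python
def recursive_split(text: str, max_words: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ")) -> list[str]:
    """Split on the coarsest separator first; recurse only into oversized pieces."""
    text = text.strip()
    if not text:
        return []
    if len(text.split()) <= max_words:
        return [text]
    if not separators:
        # No separators left: fall back to a hard cut on word boundaries.
        words = text.split()
        return [" ".join(words[i:i + max_words])
                for i in range(0, len(words), max_words)]
    head, rest = separators[0], separators[1:]
    chunks: list[str] = []
    for piece in text.split(head):
        chunks.extend(recursive_split(piece, max_words, rest))
    return chunks
```

Note that this sketch drops the separator it splits on and does not merge small neighbouring pieces back together; a production splitter would usually do both.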
Best Practice
- Start with 512-token chunks and a 50-token overlap, then tune against retrieval quality.
- Add metadata (source URL, section).
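A fixed-size chunker with overlap and per-chunk metadata might look like this. It is a sketch, assuming whitespace-separated words as a stand-in for real tokens; the field names `source_url`, `section`, and `chunk_index` are illustrative.

```python
def chunk_with_overlap(tokens: list[str], size: int = 512,
                       overlap: int = 50) -> list[list[str]]:
    """Slide a window of `size` tokens, stepping by `size - overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks

def chunk_document(text: str, source_url: str, section: str,
                   size: int = 512, overlap: int = 50) -> list[dict]:
    """Attach source metadata to every chunk so answers can cite it later."""
    tokens = text.split()  # assumption: words approximate tokens
    return [
        {
            "text": " ".join(piece),
            "metadata": {"source_url": source_url, "section": section, "chunk_index": i},
        }
        for i, piece in enumerate(chunk_with_overlap(tokens, size, overlap))
    ]
```

With the defaults, a 1,000-token document yields three chunks, and each chunk starts with the last 50 tokens of the previous one.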
6. Sample Workflow — Build RAG (n8n)
Ingestion (one-time or periodic):
1. Trigger: New file in Google Drive
2. Extract text (Tika / OCR)
3. Chunk (Code node — split by 512 tokens)
4. Embed (OpenAI Embeddings node)
5. Upsert to Pinecone (with metadata)
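Step 5 needs one record per chunk with a stable ID, a vector, and metadata. The sketch below only builds such records as plain dicts and makes no API call. The `id` / `values` / `metadata` field names are an assumption to verify against the Vector DB client in use, and `fake_embed` is a hypothetical placeholder for the real embedding step.

```python
import hashlib

def fake_embed(text: str, dims: int = 8) -> list[float]:
    """Hypothetical placeholder: a deterministic vector derived from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dims]]

def build_records(file_name: str, chunks: list[str]) -> list[dict]:
    """One upsert record per chunk, keyed by file name and chunk position."""
    records = []
    for index, chunk in enumerate(chunks):
        records.append({
            # Stable ID: re-ingesting the same file overwrites instead of duplicating.
            "id": f"{file_name}#{index}",
            "values": fake_embed(chunk),
            "metadata": {"source": file_name, "chunk_index": index, "text": chunk},
        })
    return records
```

Keeping the chunk text in the metadata lets the query flow build its prompt directly from the search results.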
Query (real-time):
1. Trigger: User question (Slack / Webhook)
2. Embed question (OpenAI Embeddings)
3. Pinecone search (top 5 chunks)
4. Build prompt:
"Context: [chunks]
Question: [user input]
Answer based on context only. Cite sources."
5. Claude (Sonnet) generates the answer
6. Reply to Slack with answer + sources
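Step 4 of the query flow can be sketched as a small prompt builder. The chunk shape (`source` and `text` keys) and the numbered-citation format are assumptions made for illustration.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble context, question, and answering rules into one prompt string."""
    context = "\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer based on context only. Cite sources by their [number]. "
        "If the context does not contain the answer, say you don't know."
    )
```

Numbering the chunks gives the model a short handle to cite, and the reply step can map each number back to its source.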
7. Hebrew RAG
Considerations
- Embedding model: Cohere multilingual works for Hebrew.
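- Chunking: Hebrew text is usually shorter in characters than the equivalent English but often costs more tokens per word with English-centric tokenizers, so size chunks by character count as well as token count.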
- OCR for Hebrew PDFs: Tesseract Hebrew, Google Vision API.
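Two small helpers illustrate the chunking consideration above. This is a sketch, not a tuned implementation: `hebrew_ratio` tags a chunk's language from the Hebrew Unicode block (U+0590 to U+05FF), and `chunk_by_chars` packs whole sentences up to a character budget. Both function names are hypothetical.

```python
import re

def hebrew_ratio(text: str) -> float:
    """Share of letters that fall in the Hebrew Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hebrew = sum(1 for ch in letters if "\u0590" <= ch <= "\u05FF")
    return hebrew / len(letters)

def chunk_by_chars(text: str, max_chars: int) -> list[str]:
    """Pack whole sentences into chunks, aiming for at most max_chars each.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The ratio can be stored as a `language` metadata field (for example "he" above some threshold) so that queries can filter by language.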
Israeli Use Case Example
- Israeli law firm: Embed all client contracts → Lawyer asks "What does Acme contract say about IP?"
8. Quality Tips
Better Retrieval
- Hybrid search — semantic + keyword.
- Reranking: a second-stage model re-scores the retrieved chunks and keeps only the best.
- Metadata filters — date, source, language.
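One common way to combine semantic and keyword results is reciprocal rank fusion (RRF), sketched below together with a simple metadata filter. RRF is one fusion option among several, not something this guide prescribes. The constant `k = 60` is the conventional default, and the chunk dict shape is an assumption.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists; each appearance adds 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

def filter_by_metadata(chunks: list[dict], **required) -> list[dict]:
    """Keep only chunks whose metadata matches every required key/value."""
    return [
        chunk for chunk in chunks
        if all(chunk.get("metadata", {}).get(key) == value
               for key, value in required.items())
    ]

semantic_hits = ["a", "b", "c"]  # IDs ranked by vector similarity
keyword_hits = ["b", "d", "a"]   # IDs ranked by keyword match
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)  # ['b', 'a', 'd', 'c']
```

IDs that rank well in both lists rise to the top, which is why `b` beats `a` here even though `a` wins the semantic ranking.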
Better Generation
- Instruct the model in the prompt to cite its sources.
- Allow "I don't know" — prevent hallucinations.
- Limit answer length — concise.
9. Monitoring
- Retrieval quality: were the right chunks returned?
- Answer quality — accurate, complete?
- User feedback — thumbs up/down per response.
- Cost — embeddings + LLM.
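Retrieval quality and user feedback can be tracked with two simple metrics. The sketch assumes a small hand-labelled set of questions with known relevant chunk IDs; the function names are illustrative.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

def thumbs_up_rate(feedback: list[bool]) -> float:
    """Share of responses that received a thumbs up."""
    return sum(feedback) / len(feedback) if feedback else 0.0

print(recall_at_k(["c1", "c7", "c3"], {"c3", "c9"}, k=3))  # 0.5
print(thumbs_up_rate([True, True, False, True]))           # 0.75
```

Tracking recall separately from answer ratings helps tell a retrieval problem apart from a generation problem.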
10. Costs
Small RAG (10K docs, 100 queries/day)
- Embeddings (one-time): $5-20.
- Pinecone: $70/m.
- LLM (Haiku): $20-50/m.
- Total: ~$100/m.
Large RAG (1M docs, 10K queries/day)
- Embeddings: $500-2K.
- Pinecone: $400-2K/m.
- LLM (Sonnet): $500-3K/m.
- Total: $1.5K-5K/m.
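The recurring part of an estimate is just a sum of low and high bounds per line item. The sketch below applies that to the small-RAG figures listed above. The one-time embedding cost is left out of the monthly sum, and the 30-day month is an assumption.

```python
def monthly_range(line_items: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low and high monthly bounds across all recurring line items."""
    low = sum(lo for lo, _ in line_items.values())
    high = sum(hi for _, hi in line_items.values())
    return low, high

def cost_per_query(monthly_cost: float, queries_per_day: int,
                   days_per_month: int = 30) -> float:
    """Spread a monthly cost over the expected number of queries."""
    return monthly_cost / (queries_per_day * days_per_month)

small_rag = {"Pinecone": (70, 70), "LLM (Haiku)": (20, 50)}
low, high = monthly_range(small_rag)
print(low, high)  # 90 120
```

At 100 queries/day that works out to roughly $0.03 to $0.04 per query, consistent with the ~$100/m total.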
11. Common Pitfalls
- ❌ Bad chunking: splitting mid-sentence loses context.
- ❌ No metadata: can't filter / track sources.
- ❌ Embeddings of bad source data: garbage in, garbage out.
- ❌ Re-embed everything on update: wasteful.
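One way to avoid re-embedding everything on update is to store a content hash per chunk and embed only what changed. This is a minimal sketch: the chunk-ID scheme and the `stored_hashes` mapping are assumptions about how the index is tracked.

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reindex(stored_hashes: dict[str, str],
                 current_chunks: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (IDs to embed, IDs to delete) by comparing hashes.

    stored_hashes maps chunk ID -> hash saved at the last ingestion;
    current_chunks maps chunk ID -> current chunk text.
    """
    to_embed = [
        chunk_id for chunk_id, text in current_chunks.items()
        if stored_hashes.get(chunk_id) != content_hash(text)
    ]
    to_delete = [chunk_id for chunk_id in stored_hashes
                 if chunk_id not in current_chunks]
    return to_embed, to_delete
```

Unchanged chunks are skipped entirely, while deleted chunks are removed from the index so they cannot surface as stale answers.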
12. End with a recommendation.
Required input
| Item | Description |
|---|---|
| Source docs | type + volume |
| Use case | Q&A / Search / Other |
| Language | EN / HE / Multi |
| Volume | queries/day |
| Tool | n8n / LangChain / Custom |
Expected output
| Component | Description |
|---|---|
| Architecture | high-level |
| Vector DB recommendation | Pinecone/etc |
| Embedding model | OpenAI/Cohere |
| LLM choice | Haiku/Sonnet |
| Cost estimate | $/m |
| Recommendation | One action |
Red flags
- 🚨 No source citations — hallucinations look real.
- 🚨 Stale embeddings (didn't re-index) — outdated answers.
- 🚨 Privacy: sensitive docs sent to a third-party embeddings API (e.g. OpenAI).
Important notes
- Start simple — basic RAG before fancy.
- Test retrieval quality before generation.
- Self-hosted Vector DB for sensitive data (Qdrant).
Sample prompts
Build RAG for Israeli law firm contracts. Hebrew + English.
Customer support RAG in n8n. Plan it.
RAG cost optimization: how?
© 2026 Automation Expert Pro | Version 1.0.0