When to use
When someone says: "A/B test", "Split test", "Statistical significance", "Test winner", "Conversion experiment", "Multivariate".
Source / professional background
- ConversionXL Institute
- Optimizely, VWO documentation
- Evan Miller — Sample size calculator
Work instructions
1. A/B Testing — Why
- Data > Opinions: removes guesswork.
- Cumulative wins: lifts compound. 20 winning tests at 5% each multiply to 1.05^20 ≈ 2.65×, about a 165% overall lift.
- Risk reduction: you don't change everything at once.
- Learning: what we don't yet understand about customers.
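To make the compounding arithmetic explicit, here is a minimal Python sketch. The function name `compounded_lift` is illustrative, and it assumes every counted test is a genuine win whose lift multiplies onto the previous ones.

```python
def compounded_lift(lift_per_win: float, wins: int) -> float:
    """Overall relative lift after `wins` successive wins of `lift_per_win` each."""
    return (1 + lift_per_win) ** wins - 1


# 20 wins of 5% each compound multiplicatively.
print(f"{compounded_lift(0.05, 20):.0%}")  # 165%
# If only 5 of 20 tests turn out to be winners, the total is smaller.
print(f"{compounded_lift(0.05, 5):.0%}")   # 28%
```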
2. Test Types
a. A/B Test (the default choice)
- 2 variations: Control (A) vs Variant (B).
- 50/50 traffic split.
- One variable changed.
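One common way to get a stable 50/50 split is to hash a user identifier, so a returning visitor always sees the same variant. The sketch below is an assumption about how this might be done, not how any specific tool does it; `assign_variant`, the experiment name, and the user IDs are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a user to one of the variants with equal odds."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    # Hashing experiment + user keeps buckets independent across experiments.
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# The same user always lands in the same variant.
print(assign_variant("user-123", "headline-test") == assign_variant("user-123", "headline-test"))  # True
```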
b. A/B/n Test
- 3+ variations.
- Smaller sample per variant.
- Need MORE traffic for significance.
c. Multivariate (MVT)
- Test multiple variables simultaneously.
- E.g., 2 headlines × 2 CTAs = 4 combinations.
- Requires significant traffic.
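The combination count grows multiplicatively, which is why MVT needs so much traffic. A small sketch, with placeholder copy and an assumed traffic figure of 1,000 visitors/day:

```python
from itertools import product

headlines = ["Headline 1", "Headline 2"]  # placeholder copy
ctas = ["CTA 1", "CTA 2"]                 # placeholder copy

# Every headline is paired with every CTA.
combinations = list(product(headlines, ctas))
daily_visitors = 1_000                    # assumed traffic figure
visitors_per_combination = daily_visitors / len(combinations)

print(len(combinations))         # 4
print(visitors_per_combination)  # 250.0
```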
d. Split URL Test
- Different URLs for each variant.
- Test major redesigns.
- Use redirects.
3. Hypothesis Formation
Bad
- "Let's try a green button instead of blue."
Good
- Because [insight from data/research],
- We expect [change] will result in [metric improvement],
- We'll know it's true when [statistical significance + practical lift].
Example
Because session recordings show 60% of users hesitate at the 3-field form, we expect reducing to 2 fields will increase form CR by 20%+. We'll know after 1,000 conversions per variant with 95% confidence.
4. What to Test (Prioritization)
Highest Impact (test first)
- Headlines / Value Prop: the two most prominent elements.
- CTA copy + color — small change, big impact.
- Hero image / Video.
- Form fields.
- Pricing presentation.
Medium Impact
- Social proof placement.
- Page length.
- Navigation.
Low Impact
- Button color on its own (overrated).
- Footer changes.
5. Prioritization Frameworks
a. PIE Framework (WiderFunnel / Conversion Rate Experts)
- Potential: how big is the upside?
- Importance: how valuable is the traffic to this page?
- Ease: how difficult is the implementation? (higher score = easier)
- Score each 1-10, then sum.
b. ICE Framework
- Impact (potential lift).
- Confidence (likelihood to win).
- Ease (resources needed).
- Score 1-10 each.
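Either framework reduces to three 1-10 scores per idea and a ranking. A minimal sketch, assuming the scores are simply summed; the `Idea` class and the backlog entries are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    scores: tuple[int, int, int]  # PIE or ICE, each criterion scored 1-10

    def total(self) -> int:
        if not all(1 <= s <= 10 for s in self.scores):
            raise ValueError("each score must be between 1 and 10")
        return sum(self.scores)


backlog = [
    Idea("Shorter form", (8, 7, 6)),
    Idea("New headline", (9, 8, 9)),
    Idea("Footer links", (2, 3, 9)),
]
# Highest total first: this is the order in which to run the tests.
ranked = sorted(backlog, key=Idea.total, reverse=True)
print([idea.name for idea in ranked])
# ['New headline', 'Shorter form', 'Footer links']
```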
6. Sample Size Calculation
Why
- Small sample = false signals.
- Statistical significance requires a sufficient number of conversions.
Calculator inputs
- Baseline CR: e.g., 3%.
- Minimum Detectable Effect (MDE): 10% (relative lift).
- Statistical Power: 80% (standard).
- Significance level: α = 0.05, i.e. 95% confidence (standard).
Example
- Baseline 3% → detect a 10% relative lift (3.0% → 3.3%) → ~53,000 visitors per variant, or roughly 1,600 conversions per variant.
- At 1,000 visitors/day (3% CR = 30 conv/day) split across two variants → ~106 days per test.
Tool: Evan Miller's calculator (free).
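For a rough cross-check without a web calculator, the standard normal-approximation formula for comparing two proportions can be sketched in Python. This is an approximation under stated assumptions (two-sided test, equal split) and may differ by a few percent from what Evan Miller's calculator reports; the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_cr, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a relative lift (two-sided test)."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


n = sample_size_per_variant(baseline_cr=0.03, relative_mde=0.10)
print(n)                # about 53,000 visitors per variant
print(round(n * 0.03))  # about 1,600 baseline conversions per variant
```

Note how sensitive the result is to the MDE: halving the detectable lift roughly quadruples the required sample.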
7. Statistical Significance
- 95% confidence: the standard.
- 99% — strict.
- 80% — exploratory.
Check at minimum
- 1,000 visitors per variant.
- 100+ conversions per variant.
- 14+ days (account for weekly variations).
❌ Don't do
- Stop test early when "winner" appears.
- Run 2 days of test.
- Cherry-pick segments after the fact.
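Most tools report significance for you, but the underlying check for a conversion-rate test is often a two-proportion z-test. A minimal sketch under that assumption follows; the visitor and conversion counts are made up.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Control: 300/10,000 (3.0%). Variant: 360/10,000 (3.6%).
z, p = two_proportion_z_test(300, 10_000, 360, 10_000)
print(p < 0.05)  # True: significant at 95% confidence
```

The p-value is only meaningful if the sample size and duration were fixed in advance, which is why the "don't do" items above matter.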
8. Test Duration
Minimum
- 2 weeks (full business cycles).
- Account for weekly seasonality (in Israel, B2B traffic peaks Sunday-Thursday).
Maximum
- 4-6 weeks — beyond, traffic patterns change.
- If the test has not reached significance by then, there is likely no difference large enough to detect.
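The duration rules can be turned into a quick estimate: divide the required sample by daily traffic, enforce the 2-week minimum, and round up to whole weeks so each weekday is covered equally. The function name and the rounding choice are assumptions of this sketch.

```python
from math import ceil


def estimate_duration_days(sample_per_variant, variants, daily_visitors, min_days=14):
    """Days needed to collect the sample, rounded up to full weeks."""
    raw_days = ceil(sample_per_variant * variants / daily_visitors)
    days = max(raw_days, min_days)
    return ceil(days / 7) * 7


print(estimate_duration_days(53_000, 2, 1_000))  # 112
print(estimate_duration_days(5_000, 2, 2_000))   # 14
```

A result above 42 days (6 weeks) is a signal to test a bolder change or a higher-traffic page rather than to run the test anyway.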
9. Multiple Testing Problem
Issue
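- Test 20 things at 95% confidence and, on average, 1 will look like a winner by random chance alone (the chance of at least one false positive is about 64%).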
Solutions
- Bonferroni correction.
- Holdout period — re-test winners.
- One test at a time per page when possible.
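The size of the problem, and the Bonferroni fix, are both one-line formulas. A small sketch, assuming the comparisons are independent; the function names are illustrative.

```python
def family_wise_error_rate(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-comparison significance level that keeps the overall rate near alpha."""
    return alpha / comparisons


print(f"{family_wise_error_rate(0.05, 20):.0%}")  # 64%
print(bonferroni_alpha(0.05, 20))                 # 0.0025
```

Bonferroni is conservative: the stricter per-test threshold means each comparison needs more traffic to reach significance.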
10. Common Pitfalls
❌ Peeking: looking at results before the test ends.
❌ Stopping early: false winners.
❌ Testing too small a sample.
❌ Variants too similar: no learning.
❌ Not segmenting: winner overall, loser for mobile.
❌ Ignoring secondary metrics: won CR, lost LTV.
❌ No documentation: can't replicate / learn.
❌ Concurrent tests on same page: confounding.
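Why peeking and early stopping produce false winners can be shown with a simulated A/A test, where both arms share the same true conversion rate and every "winner" is a false positive. This is an illustrative sketch: the parameters are arbitrary, and it assumes `visitors` is divisible by `looks` so the last look coincides with the planned end of the test.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-proportion z-test; True if the difference is significant."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    if pooled in (0, 1):
        return False
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def simulate_aa_tests(runs=300, visitors=2000, looks=10, cr=0.05, seed=1):
    """Return (false-positive rate when peeking, rate when checking once at the end)."""
    rng = random.Random(seed)
    step = visitors // looks
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged_at_any_look = significant_now = False
        for n in range(1, visitors + 1):
            conv_a += rng.random() < cr
            conv_b += rng.random() < cr
            if n % step == 0:  # an interim "peek" at the dashboard
                significant_now = is_significant(conv_a, n, conv_b, n)
                flagged_at_any_look = flagged_at_any_look or significant_now
        peeking_hits += flagged_at_any_look
        final_hits += significant_now  # result of the last look only
    return peeking_hits / runs, final_hits / runs


peeking_rate, final_rate = simulate_aa_tests()
print(peeking_rate >= final_rate)  # True
```

Stopping at the first significant look can never produce fewer false winners than a single check at the end, and in runs like this the peeking rate typically lands well above the nominal 5%.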
11. Documentation
Per test
- Hypothesis.
- Variant screenshots.
- Sample size + Duration.
- Results (winner/no-winner).
- Lift % + confidence.
- Secondary metrics.
- Lessons learned.
Template (Notion/Sheet)
Test ID | Date | Page | Hypothesis | Variant | Winner | Lift % | Confidence | Notes
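If the log lives in a sheet, the template columns map directly onto a small record type that exports to CSV. This is one possible layout; the `ExperimentRecord` class and the sample row are made up for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ExperimentRecord:
    test_id: str
    date: str
    page: str
    hypothesis: str
    variant: str
    winner: str
    lift_pct: float
    confidence: float
    notes: str


def to_csv(records):
    """Serialize records to CSV text, with the template columns as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ExperimentRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


example = ExperimentRecord(
    test_id="T-001", date="2026-05-01", page="/pricing",
    hypothesis="Fewer form fields raise form CR",
    variant="2-field form", winner="B", lift_pct=12.0, confidence=0.95,
    notes="Made-up values for illustration",
)
print(to_csv([example]))
```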
12. Tools
| Tool | Best for |
|---|---|
| Optimizely | Enterprise, robust |
| VWO | Mid-market, full-featured |
| AB Tasty | Mid-market |
| Convert | Affordable |
| Google Optimize | Discontinued (sunset September 2023) |
| GrowthBook | Open source |
| PostHog | Product analytics + experiments |
13. Beyond A/B — Personalization
- Segmented experiences — different content for different audiences.
- Returning vs first-time visitors.
- Mobile vs Desktop.
- Geographic.
- Source (Google vs Email vs Social).
14. Israeli Context
- Smaller traffic = harder to reach significance.
- Solutions:
  - More dramatic changes (vs subtle).
  - Test highest-traffic pages only.
  - Longer test windows.
  - Use practical significance (define your own threshold).
15. End with a recommendation.
Required input
| Item | Description |
|---|---|
| Page being tested | URL |
| Current CR | Percentage |
| Traffic | Weekly |
| Hypothesis | What you want to test |
| Goal metric | CR / Click / Revenue |
Expected output
| Component | Description |
|---|---|
| Hypothesis statement | Structured |
| Test variants | A vs B |
| Sample size needed | Calculation |
| Duration estimate | Days |
| Primary + secondary metrics | List |
| Risks / Considerations | List |
| Recommendation | One action |
Working rules
- Output in Hebrew. Professional terms in English.
- Hypothesis-driven: don't test "just because".
- Statistical rigor: don't stop early.
- Document everything: knowledge compounds.
Red flags
- 🚨 Stopping a test after 3 days: not valid.
- 🚨 Sample < 100 conversions/variant: too small.
- 🚨 Changing 5+ things at once: not an A/B test.
- 🚨 Concurrent tests overlapping: confounding.
- ⚠️ Be suspicious of a "winner" with a 5% lift: it may not be significant.
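These flags are mechanical enough to check automatically before a result is reported. A minimal sketch: the function and parameter names are hypothetical, and the 14-day threshold is taken from the minimum duration in section 7.

```python
def red_flags(days_run, conversions_per_variant, variables_changed, overlapping_tests):
    """Return the list of red flags raised by a test's setup and results."""
    flags = []
    if days_run < 14:
        flags.append("ran for less than 14 days")
    if conversions_per_variant < 100:
        flags.append("fewer than 100 conversions per variant")
    if variables_changed >= 5:
        flags.append("5+ changes at once, not an A/B test")
    if overlapping_tests:
        flags.append("concurrent tests overlap")
    return flags


print(red_flags(3, 40, 5, True))     # all four flags
print(red_flags(21, 250, 1, False))  # []
```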
Important notes
- Test infrastructure = a one-time investment.
- Test culture = habit.
- Most tests fail: 70-80% don't show a significant winner. That's fine.
- Failed test = learning.
Example prompts
SaaS B2B LP, CR 2.5%, 1,500 visitors/week. I want to test a new headline. Plan it.
eCommerce, I want to test single-step vs multi-step checkout. How should I run it?
I tried testing 5 things at once. Why is that wrong?
© 2026 All rights reserved | CMO Online Israel Pro | Version: 1.0.0 | Last updated: May 2026