When to use

When someone says: "A/B test", "Split test", "Statistical significance", "Test winner", "Conversion experiment", "Multivariate".

Sources / professional background

ConversionXL Institute
Optimizely, VWO documentation
Evan Miller — Sample size calculator

Working instructions

1. A/B Testing — Why

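Data > Opinions: removes guesswork.
Cumulative wins: lifts compound, so a program of 20 tests with wins of about 5% each can reach 50%+ overall (nine 5% wins give 1.05^9 ≈ 1.55).
Risk reduction: you don't change everything at once.
Learning: reveals what we don't yet understand about customers.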
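
The arithmetic behind cumulative wins can be sketched as below. This is a minimal illustration, assuming each win is a relative lift applied in sequence to the same metric; the function name `compounded_lift` is hypothetical.

```python
def compounded_lift(lift_per_win: float, wins: int) -> float:
    """Total relative lift after `wins` successive improvements of `lift_per_win` each."""
    return (1 + lift_per_win) ** wins - 1


# If all 20 tests won at 5% each (optimistic, since most tests do not win):
print(f"{compounded_lift(0.05, 20):.0%}")  # 165%

# A more cautious scenario: only 5 of the 20 tests produce a 5% win.
print(f"{compounded_lift(0.05, 5):.0%}")   # 28%
```

The point of the sketch is that the overall gain depends on how many tests actually win, not on how many are run.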

2. Test Types

a. A/B Test (the most common)

2 variations: Control (A) vs Variant (B).
50/50 traffic split.
One variable changed.
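
One common way to get a stable 50/50 split is to hash a user identifier, so the same visitor always sees the same variant. The sketch below assumes a string user ID is available; `assign_variant` and the experiment name are illustrative and do not come from any specific testing tool.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a user to a variant with an approximately even split."""
    # Including the experiment name keeps assignments independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % len(variants)
    return variants[bucket]


# Check the split on 10,000 made-up user IDs.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "headline-test")] += 1
print(counts)
```

Hosted tools such as those listed in the Tools section handle assignment themselves, so this is only relevant for a home-grown setup.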

b. A/B/n Test

3+ variations.
Smaller sample per variant.
Need MORE traffic for significance.

c. Multivariate (MVT)

Test multiple variables simultaneously.
E.g., 2 headlines × 2 CTAs = 4 combinations.
Requires significant traffic.
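
To see why MVT is traffic-hungry, the combinations can be enumerated directly. This is a rough sketch assuming a full-factorial design where every combination needs its own sample; the copy strings and the 10,000-visitor figure are placeholders.

```python
from itertools import product

headlines = ["Headline 1", "Headline 2"]  # placeholder copy
ctas = ["CTA 1", "CTA 2"]                 # placeholder copy

combinations = list(product(headlines, ctas))
for number, (headline, cta) in enumerate(combinations, start=1):
    print(f"Combination {number}: {headline} + {cta}")


def total_visitors_needed(visitors_per_combination: int, *levels_per_variable: int) -> int:
    """Full-factorial MVT: each combination is its own cell and needs its own sample."""
    cells = 1
    for levels in levels_per_variable:
        cells *= levels
    return cells * visitors_per_combination


print(total_visitors_needed(10_000, 2, 2))  # 40000
```

Adding a third variable with two options doubles the number of cells again, which is why MVT is usually reserved for high-traffic pages.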

d. Split URL Test

Different URLs for each variant.
Test major redesigns.
Use redirects.

3. Hypothesis Formation

Bad

"Let's try a green button instead of blue."

Good

Because [insight from data/research],
We expect [change] will result in [metric improvement],
We'll know it's true when [statistical significance + practical lift].

Example

Because session recordings show 60% of users hesitate at the 3-field form, we expect reducing to 2 fields will increase form CR by 20%+. We'll know after 1,000 conversions per variant with 95% confidence.

4. What to Test (Prioritization)

Highest Impact (test first)

Headlines / Value Prop: among the most prominent elements on the page.
CTA copy + color — small change, big impact.
Hero image / Video.
Form fields.
Pricing presentation.

Medium Impact

Social proof placement.
Page length.
Navigation.

Low Impact

Button color alone (overrated).
Footer changes.

5. Prioritization Frameworks

a. PIE Framework (WiderFunnel)

Potential — How big the upside?
Importance — Traffic value of this page?
Ease — Implementation difficulty?
Score each 1-10, then sum (or average) the three scores to rank ideas.

b. ICE Framework

Impact (potential lift).
Confidence (likelihood to win).
Ease (resources needed).
Score 1-10 each.
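
Either framework comes down to scoring each idea on three 1-10 criteria and sorting. A minimal sketch follows, assuming the three scores are averaged (summing gives the same ranking); the `Idea` class, the backlog items, and their scores are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    scores: tuple  # three 1-10 scores: (Potential, Importance, Ease) or (Impact, Confidence, Ease)

    def __post_init__(self):
        if len(self.scores) != 3 or not all(1 <= s <= 10 for s in self.scores):
            raise ValueError("expected three scores between 1 and 10")


def average_score(idea: Idea) -> float:
    return sum(idea.scores) / len(idea.scores)


backlog = [
    Idea("Change footer links", (2, 3, 9)),
    Idea("Rewrite hero headline", (8, 9, 7)),
    Idea("Shorten signup form", (7, 8, 5)),
]

ranked = sorted(backlog, key=average_score, reverse=True)
for idea in ranked:
    print(f"{average_score(idea):.1f}  {idea.name}")
```

The scores are subjective, so the ranking is best treated as a starting point for discussion rather than a verdict.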

6. Sample Size Calculation

Why

Small sample = false signals.
Statistical significance requires a sufficient number of conversions.

Calculator inputs

Baseline CR: e.g., 3%.
Minimum Detectable Effect (MDE): 10% (relative lift).
Statistical Power: 80% (standard).
Confidence level: 95% (significance level α = 5%, standard).

Example

Baseline 3% → detect a 10% relative lift → ~53,000 visitors per variant (about 1,600 conversions each).
At 1,000 visitors/day with 3% CR (30 conversions/day), two variants need ~106 days per test.

Tool: Evan Miller's calculator (free).
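
For a quick estimate without a web calculator, the same inputs can be plugged into the standard normal-approximation formula for comparing two proportions. This is a sketch, assuming a two-sided test and equal traffic per variant; online calculators may use slightly different formulas, so expect small differences. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_cr: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant to detect a relative lift (two-sided test)."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


n = sample_size_per_variant(baseline_cr=0.03, relative_mde=0.10)
print(f"Visitors per variant: {n:,}")
print(f"Expected control conversions: {round(n * 0.03):,}")
```

With a 3% baseline and a 10% relative lift this lands at roughly 53,000 visitors per variant. Halving the MDE roughly quadruples the required sample, which is why small lifts are expensive to detect.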

7. Statistical Significance

95% confidence: the standard.
99% — strict.
80% — exploratory.

Check at minimum

1,000 visitors per variant.
100+ conversions per variant.
14+ days (account for weekly variations).

❌ Don't do

Stop test early when "winner" appears.
Run 2 days of test.
Cherry-pick segments after the fact.
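
Once the planned sample has been collected, significance can be checked with a two-proportion z-test. The sketch below uses a pooled, two-sided z-test, which is one common choice; testing tools may use other methods (for example Bayesian or sequential statistics). The visitor and conversion counts are made up.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rate, B minus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative counts: control converts 300 of 10,000, variant 360 of 10,000.
z, p_value = two_proportion_z_test(300, 10_000, 360, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("Significant at 95% confidence" if p_value < 0.05 else "Not significant")
```

The result is only trustworthy if the sample size and duration were fixed before the test started.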

8. Test Duration

Minimum

2 weeks (full business cycles).
Account for weekly seasonality (in Israel, B2B traffic peaks Sunday to Thursday).

Maximum

4-6 weeks. Beyond that, traffic patterns change.
If the test has not reached significance by then, there is likely no real difference, or the effect is smaller than the test was sized to detect.
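
Duration follows from the required sample and the available traffic. A small sketch, assuming all traffic to the page enters the test and that the result is rounded up to whole weeks so every weekday is covered equally; `estimate_duration_days` is a hypothetical helper.

```python
from math import ceil


def estimate_duration_days(sample_per_variant: int, variants: int, daily_visitors: int) -> int:
    """Days needed to fill every variant, rounded up to full weeks."""
    days = ceil(sample_per_variant * variants / daily_visitors)
    return ceil(days / 7) * 7


print(estimate_duration_days(53_000, 2, 1_000))  # 112
print(estimate_duration_days(7_000, 2, 1_000))   # 14
```

If the estimate lands far beyond the 4-6 week maximum, the usual options are a larger MDE, a higher-traffic page, or a bolder variant.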

9. Multiple Testing Problem

Issue

Test 20 things at 95% confidence and, on average, 1 will look like a "winner" by random chance alone.

Solutions

Bonferroni correction.
Holdout period — re-test winners.
One test at a time per page when possible.
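
The size of the problem, and the Bonferroni fix, can be worked out in a few lines. This sketch assumes the comparisons are independent and that none of the variants has a real effect; the function names are illustrative.

```python
def familywise_error_rate(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-comparison significance threshold under the Bonferroni correction."""
    return alpha / comparisons


print(f"{familywise_error_rate(0.05, 20):.0%}")  # 64%
print(f"{bonferroni_alpha(0.05, 20):.4f}")       # 0.0025
```

Bonferroni is conservative: each comparison must now clear a much stricter threshold, which in turn raises the sample size needed.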

10. Common Pitfalls

❌ Peeking: looking at results before the test ends.
❌ Stopping early: false winners.
❌ Testing too small a sample.
❌ Variants too similar: no learning.
❌ Not segmenting: winner overall, loser on mobile.
❌ Ignoring secondary metrics: won CR, lost LTV.
❌ No documentation: can't replicate or learn.
❌ Concurrent tests on the same page: confounding.
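
Why peeking and early stopping produce false winners can be shown with a small simulation of A/A tests, where both variants are identical and any "winner" is a false positive. This is a rough sketch with made-up parameters (300 runs, 2,000 visitors per variant, a look every 200 visitors); exact rates depend on the seed.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(conv_a: int, conv_b: int, n: int, alpha: float = 0.05) -> bool:
    """Two-sided pooled z-test for two variants with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0, 1):
        return False
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    z = (conv_b / n - conv_a / n) / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def simulate_aa_tests(runs=300, visitors=2000, cr=0.05, peek_every=200, seed=7):
    """Return (false-positive rate when stopping at any significant peek,
    false-positive rate when checking only once at the end)."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged_on_any_peek = False
        for n in range(1, visitors + 1):
            conv_a += rng.random() < cr
            conv_b += rng.random() < cr
            if n % peek_every == 0 and is_significant(conv_a, conv_b, n):
                flagged_on_any_peek = True
        peeking_hits += flagged_on_any_peek
        fixed_hits += is_significant(conv_a, conv_b, visitors)
    return peeking_hits / runs, fixed_hits / runs


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"False winners with peeking: {peeking_rate:.0%}")
print(f"False winners with one final check: {fixed_rate:.0%}")
```

Because the final look is one of the peeks, the peeking rate can never be lower than the single-check rate, and in practice it tends to be several times higher.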

11. Documentation

Per test

Hypothesis.
Variant screenshots.
Sample size + Duration.
Results (winner/no-winner).
Lift % + confidence.
Secondary metrics.
Lessons learned.

Template (Notion/Sheet)

Test ID | Date | Page | Hypothesis | Variant | Winner | Lift % | Confidence | Notes
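
If the log lives in a spreadsheet, the same columns can be written as CSV. A minimal sketch using the column names from the template above; the record values are invented, and an in-memory buffer stands in for a real file.

```python
import csv
import io

FIELDS = ["Test ID", "Date", "Page", "Hypothesis", "Variant",
          "Winner", "Lift %", "Confidence", "Notes"]


def append_test_record(stream, record: dict, write_header: bool = False) -> None:
    """Write one test record as a CSV row; missing columns are left empty."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerow(record)


buffer = io.StringIO()  # swap for open("tests.csv", "a", newline="") to persist
append_test_record(buffer, {
    "Test ID": "T-001",            # made-up example values
    "Date": "2024-01-15",
    "Page": "/pricing",
    "Hypothesis": "Shorter form increases signups",
    "Variant": "2-field form",
    "Winner": "No winner",
    "Lift %": "1.2",
    "Confidence": "71%",
}, write_header=True)
print(buffer.getvalue())
```

Recording tests with no winner matters as much as recording wins, since those entries prevent re-running the same idea later.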

12. Tools

Tool	Best for
Optimizely	Enterprise, robust
VWO	Mid-market, full-featured
AB Tasty	Mid-market
Convert	Affordable
Google Optimize	Discontinued (sunset September 2023)
GrowthBook	Open source
PostHog	Product analytics + experiments

13. Beyond A/B — Personalization

Segmented experiences — different content for different audiences.
Returning vs first-time visitors.
Mobile vs Desktop.
Geographic.
Source (Google vs Email vs Social).

14. Israeli Context

Smaller traffic = harder to reach significance.
Solutions:
- More dramatic changes (vs subtle).
- Test highest-traffic pages only.
- Longer test windows.
- Use practical significance: agree in advance on your own decision threshold.
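
With low traffic it helps to turn the question around: given the visitors available, how large a lift can the test detect at all? The sketch below uses a simplified normal approximation that assumes both variants have roughly the baseline variance, so it slightly understates the lift needed; the traffic figures and the function name are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approx_detectable_lift(baseline_cr: float, visitors_per_variant: int,
                           alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate smallest relative lift detectable with the given sample."""
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    absolute = z_total * sqrt(2 * baseline_cr * (1 - baseline_cr) / visitors_per_variant)
    return absolute / baseline_cr


# Example: 1,500 visitors/week for 6 weeks, split across 2 variants, 2.5% baseline CR.
visitors_per_variant = 1_500 * 6 // 2
lift = approx_detectable_lift(0.025, visitors_per_variant)
print(f"Smallest detectable relative lift: about {lift:.0%}")
```

In this example only a lift of roughly 37% or more is detectable, which is the quantitative case for testing dramatic changes rather than subtle ones on low-traffic pages.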

15. Close with a recommendation.

Required input

Item	Description
Page being tested	URL
Current CR	Percentage
Traffic	Weekly
Hypothesis	What you want to test
Goal metric	CR / Click / Revenue

Expected output

Component	Description
Hypothesis statement	Structured
Test variants	A vs B
Sample size needed	Calculation
Duration estimate	Days
Primary + secondary metrics	List
Risks / Considerations	List
Recommendation	One action

Working rules

Output in Hebrew; professional terms stay in English.
Hypothesis-driven: don't test "just because".
Statistical rigor: don't stop early.
Document everything: knowledge compounds.

Red flags

🚨 Stopping a test after 3 days: not valid.
🚨 Sample < 100 conversions/variant: too small.
🚨 Changing 5+ things at once: not an A/B test.
🚨 Overlapping concurrent tests: confounding.
⚠️ Be suspicious of a "winner" with a 5% lift: it may not be significant.

Important notes

Test infrastructure = a one-time investment.
Test culture = an ongoing habit.
Most tests fail: 70-80% don't show a significant winner. That's fine.
Failed test = learning.

Sample prompts

B2B SaaS landing page, CR 2.5%, 1,500 visitors/week. I want to test a new headline. Plan the test.

eCommerce: I want to test single-step vs multi-step checkout. How should I run it?

I tried testing 5 things at once. Why is that wrong?
