Red or green? “Buy now” or “Add to cart”? Send that promo email Tuesday morning or Thursday afternoon?

If you’ve ever argued with a coworker about which headline will get more clicks, you’ve already been thinking about A/B testing — you just didn’t have the data to settle the argument. And that’s the whole point: A/B testing replaces opinions with evidence.

Booking.com runs over 1,000 experiments at any given moment. Amazon grew from 546 experiments in its first year to over 12,000 annually. Google had nearly 1,200 active search experiments running as of mid-2025. These aren’t companies that guess. They test, measure, and let user behavior make the call. The gap between companies that experiment and companies that don’t keeps widening.

What is A/B testing?

A/B testing (also called split testing) is a method of comparing two versions of a webpage, email, ad, or other marketing asset to determine which one performs better. You show version A to one group of users and version B to another, then measure which version produces more of the outcome you care about — clicks, signups, purchases, or whatever your goal is.

The concept is borrowed from randomized controlled trials in science. Half your audience gets the control (the original), half gets the variant (the change), and statistical analysis tells you whether the difference in performance is real or just random noise. Nothing changes except the one element you’re testing, so you can draw a direct line from the change to the result.
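To make the mechanics concrete, here is a minimal Python sketch of that split. All numbers are hypothetical, and in practice a testing tool does this for you rather than hand-rolled code:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical numbers for illustration: 10,000 visitors, and we pretend
# the control truly converts at 3.0% while the variant truly converts at 3.5%.
n_visitors = 10_000
true_rates = {"A": 0.030, "B": 0.035}

# Randomly assign each visitor to A or B (the 50/50 split).
assignment = rng.choice(["A", "B"], size=n_visitors)

for version, rate in true_rates.items():
    group_size = int((assignment == version).sum())
    conversions = rng.binomial(group_size, rate)  # simulate the measured outcome
    print(f"Version {version}: {conversions}/{group_size} "
          f"converted ({conversions / group_size:.2%})")
```

The statistical analysis described above then answers one question: is the gap between those two conversion rates bigger than random noise would produce on its own?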

Why A/B testing matters

The A/B testing tools market hit $969 million in 2025 and is growing at 14% annually through 2031, according to industry research. That kind of spending doesn’t happen because testing is trendy — it happens because it works.

Here’s what the numbers actually look like. Companies that run structured A/B testing programs grow revenue 1.5 to 2 times faster than those that don’t, according to Convert’s 2025 CRO benchmarks. Statistically significant A/B tests boost conversion rates by an average of 49%. And yet, only 0.2% of the 1.1 billion active websites globally are running structured experiments, according to the same report. That means almost everyone is leaving money on the table.

Jeff Bezos put it plainly: “Our success is a function of how many experiments we do per year, per month, per week, per day.” Amazon runs thousands of tests daily. That’s not a philosophy — it’s an operational system that produces measurable revenue gains.

The flip side matters too. Only about 12 to 15% of A/B tests produce a statistically significant winner, according to Invesp’s experiment velocity research. That sounds discouraging until you realize it means each winning test is a genuine discovery, not a fluke. The companies that test the most find the most winners — simple math.

How A/B testing works

1. Identify the problem with data

Every good test starts with a real problem, not a hunch. Look at your analytics for pages with high traffic but low conversion, emails with strong open rates but weak click-throughs, or checkout flows where users drop off. Google Analytics, heatmaps, and session recordings will show you where users are struggling. If you can’t point to a specific metric that’s underperforming, you’re not ready to test — you’re just tinkering.

2. Form a hypothesis

A hypothesis isn’t “let’s try a different button color.” It’s a statement you can prove or disprove: “Changing the CTA from ‘Learn more’ to ‘Start free trial’ will increase signups by at least 10% because the current copy doesn’t communicate that the product is free to try.” The hypothesis forces you to connect your change to a specific outcome with a reason why it should work. If you skip this step, you won’t know what you learned even when the test is over.

3. Decide what to measure

Pick one primary metric before the test starts. Not two, not five — one. Your primary metric is the thing that directly reflects the behavior you’re trying to change. You can track secondary metrics (time on page, bounce rate, average order value), but the test’s success or failure hinges on that single primary number. Changing your primary metric mid-test is how you end up with false positives and bad decisions.

4. Build the variants

Create version B (the challenger) with exactly one change from version A (the control). If you change the headline, the button color, and the image all at once, you won’t know which change caused the result. That said, there are situations where multivariate testing makes sense — but you need significantly more traffic to get reliable results, and most companies don’t have that volume for most pages.

5. Calculate your sample size

This is where most marketers cut corners, and it’s exactly where you can’t afford to. Use a sample size calculator (most testing tools include one) to determine how many visitors you need before the test can produce a reliable result. The inputs are your current conversion rate, the minimum improvement you’d consider meaningful, and your desired confidence level (95% is standard). Running a test without calculating sample size is like flipping a coin 10 times and declaring it biased because it came up heads six times.
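If you want to run the math yourself rather than rely on your tool's built-in calculator, here is a sketch using the statsmodels Python library. The baseline rate and minimum meaningful lift are hypothetical inputs you would replace with your own:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical inputs: 3% baseline conversion rate, and we only care
# about detecting a lift to at least 3.6% (a 20% relative improvement).
baseline = 0.03
target = 0.036

effect_size = proportion_effectsize(target, baseline)

# alpha=0.05 corresponds to the standard 95% confidence level;
# power=0.8 means an 80% chance of detecting the lift if it really exists.
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"Visitors needed per variant: {int(round(n_per_variant)):,}")
```

Notice how quickly the required sample grows as the minimum detectable improvement shrinks: halving the lift you want to detect roughly quadruples the traffic you need.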

6. Randomize and launch

Your testing tool handles the randomization — it splits incoming traffic so that each visitor sees only one version for the entire duration of the test. Make sure you’re randomizing at the user level, not the session level, so the same person doesn’t see version A on Monday and version B on Tuesday. Also, run your test across full weekly cycles to account for day-of-week effects. A test that only captures Tuesday-through-Thursday traffic will give you skewed data.
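Testing tools handle this for you, but the underlying idea is simple: hash a stable user ID so the same person always lands in the same bucket. A rough Python sketch, with a made-up experiment name and user ID:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "cta_copy_test") -> str:
    """Deterministically bucket a user: the same ID always gets the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same user lands in the same bucket on every visit.
print(assign_variant("user_84210"))  # e.g. "A"
print(assign_variant("user_84210"))  # same answer tomorrow, too
```

Keying the hash on the experiment name as well as the user ID means one person can land in different buckets across different experiments without ever flip-flopping within a single test.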

7. Wait for statistical significance

This is the hard part: patience. Don’t peek at the results after two days and call a winner. According to Convert’s 2025 data, 70% of experiments that reach the recommended sample size meet the 95% confidence threshold, and 49% reach 99%+ confidence. But those numbers only hold if you let the test run to completion. Stopping early because one variant looks promising is called “peeking bias,” and it inflates your false positive rate dramatically.
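When the planned sample size is reached, a two-proportion z-test is one standard way to check significance. A sketch with statsmodels, using hypothetical final counts:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical final results after the test reached its planned sample size.
conversions = [310, 370]     # version A, version B
visitors = [10_000, 10_000]  # visitors per variant

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("Difference is significant at the 95% confidence level.")
else:
    print("Not significant -- keep the control.")
```

The key discipline is running this check once, at the predetermined sample size, rather than re-running it every morning and stopping the moment it dips below 0.05.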

8. Analyze and act on results

When the test concludes, look at more than just the top-line number. Segment your results by device, traffic source, new vs. returning visitors, and any other dimension that matters to your business. A variant might win overall but lose badly on mobile — and if 60% of your traffic is mobile, that “winner” will actually hurt you. Document every test result, win or lose, in a shared log. The losses teach you as much as the wins.
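If your tool can export per-visitor results, segmenting is a one-liner in pandas. A sketch with made-up data and column names:

```python
import pandas as pd

# Hypothetical per-visitor results exported from your testing tool.
df = pd.DataFrame({
    "variant":   ["A", "A", "B", "B", "A", "B"],
    "device":    ["mobile", "desktop", "mobile", "desktop", "mobile", "mobile"],
    "converted": [0, 1, 0, 1, 1, 0],
})

# Conversion rate and sample size per variant within each segment.
segmented = df.groupby(["device", "variant"])["converted"].agg(["mean", "count"])
print(segmented)
```

A real export would have thousands of rows and more dimensions (traffic source, new vs. returning), but the grouping logic is the same.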

Real-world A/B testing examples

Booking.com: experimentation as a company culture

Booking.com doesn’t just test — testing is how they build products. With over 1,000 concurrent experiments running at any moment and 80% of their product and technology teams actively launching tests, experimentation isn’t a marketing function. It’s baked into engineering, design, and product management. This approach helped move Booking.com from a small Dutch startup to a company valued at over $150 billion. Every employee can propose and run an experiment, and the data — not the highest-paid person’s opinion — decides what ships.

The Australian Red Cross: 84% conversion jump

The Australian Red Cross used A/B testing and digital experimentation to overhaul their online donation experience. By systematically testing page layouts, form fields, and donation prompts, they achieved an 84% increase in conversion rate and a 2,800% boost in online revenue within six months, according to a case study documented by Unbounce. That’s not a rounding error — it’s the difference between funding programs and cutting them.

Walmart Canada: mobile-first testing

Walmart Canada A/B tested a complete responsive redesign against their existing site. The result: a 20% conversion increase across all devices and a 98% increase in mobile orders. The test gave them the confidence to roll out the redesign knowing it would perform, rather than launching it and hoping for the best.

Kareo: fewer form fields, $1.56 million more

Kareo, a healthcare technology company, tested reducing the number of fields in their signup form. Fewer fields meant 30% more physicians signing up, a 40% improvement in marketing ROI, and $1.56 million in additional yearly revenue. The original form asked for information Kareo didn’t actually need during signup — the test proved that every unnecessary field was costing them real money.

Common A/B testing mistakes

Stopping tests too early

This is the single most common mistake. You see one variant ahead by 15% after three days and declare victory. The problem is that early results are unstable — small sample sizes produce wild swings. A test might show version B winning by 20% on day three and losing by 5% by day ten once the data stabilizes. Always wait for your predetermined sample size, no exceptions.

Testing without enough traffic

If your page gets 200 visitors a month, you can’t A/B test a 5% improvement — you’d need to run the test for over a year to reach statistical significance. Before starting any test, run the math. If the required sample size means running for more than 4-6 weeks, either test a bigger change (that might produce a larger effect) or focus your testing on higher-traffic pages.

Changing multiple elements at once

If you change the headline, the hero image, and the CTA button in one test, and the variant wins, which change caused it? You have no idea. Test one variable at a time. If you need to test a complete page redesign, treat it as a holistic test — just know that you’re measuring the total effect without understanding which individual element drove the result.

Ignoring mobile vs. desktop segments

A test result that looks positive in aggregate can hide a disaster on one device type. If your variant increases desktop conversions by 30% but drops mobile conversions by 15%, and mobile makes up 65% of your traffic, you might actually lose money by implementing the “winning” variant. Always segment results by device before making a decision.

Not documenting results

Companies that don’t maintain a test log end up re-testing ideas they already tried two years ago, or worse, implementing changes that a previous test showed were harmful. Keep a simple spreadsheet: date, hypothesis, what changed, sample size, result, confidence level, and the decision made. In two years, that log becomes the most valuable marketing asset in your company.
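A plain CSV file is enough to start. Here is a hypothetical Python sketch that appends one row per completed test, mirroring the columns suggested above (the file name and example values are made up):

```python
import csv
from datetime import date

# One row per experiment; field names mirror the columns suggested above.
fields = ["date", "hypothesis", "change", "sample_size",
          "result", "confidence", "decision"]

with open("ab_test_log.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    if f.tell() == 0:  # write the header only for a brand-new file
        writer.writeheader()
    writer.writerow({
        "date": date.today().isoformat(),
        "hypothesis": "'Start free trial' CTA lifts signups 10%+",
        "change": "CTA copy on pricing page",
        "sample_size": 20_000,
        "result": "+12% signups",
        "confidence": "95%",
        "decision": "ship variant B",
    })
```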

A/B testing tools

Google Optimize’s successors (2025-2026): After Google retired Optimize in 2023, the market shifted. AB Tasty, VWO, and Convert picked up most of the migration traffic. Each offers visual editors for non-technical teams plus server-side testing for developers.

Optimizely: The most established enterprise platform. Strong statistical engine, feature flagging, and server-side experimentation. Pricing is opaque and starts high — this is for companies running hundreds of tests per month.

VWO (Visual Website Optimizer): Good middle ground between price and features. Includes heatmaps, session recordings, and surveys alongside A/B testing, so you can diagnose problems and test solutions in one tool.

Convert: Privacy-focused platform that’s GDPR and CCPA compliant by default. Their 2025 benchmarking data (cited earlier in this article) comes from their own platform’s experiment results.

Statsig: Built for product teams rather than marketers. Handles feature flags, A/B tests, and analytics in one platform. Free tier available for smaller teams.

LaunchDarkly: Primarily a feature flag platform, but its experimentation features are strong. Best for engineering teams that want to tie A/B tests directly into their deployment pipeline.

FAQ

How long should an A/B test run?

Until you hit your calculated sample size, and not a day sooner. For most websites, that means one to four weeks. The minimum is usually one full business cycle (seven days) to capture day-of-week variation. If your calculator says you need 10,000 visitors per variant and your page gets 1,000 visitors a day (500 per variant after the 50/50 split), the test needs to run for at least 20 days.

What’s the difference between A/B testing and multivariate testing?

A/B testing compares two complete versions — A vs. B — where one element is different. Multivariate testing changes multiple elements simultaneously and tests every possible combination. If you’re testing two headlines and two images, multivariate testing would run four variants (headline 1 + image 1, headline 1 + image 2, headline 2 + image 1, headline 2 + image 2). You need much more traffic for multivariate tests to reach significance.
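The combinatorial blow-up is easy to see in code; this short Python snippet just enumerates the combinations from the example above:

```python
from itertools import product

headlines = ["Headline 1", "Headline 2"]
images = ["Image 1", "Image 2"]

# Multivariate testing tests every combination -- the count multiplies fast.
# Adding a third two-option element would double this from 4 to 8 variants.
variants = list(product(headlines, images))
print(len(variants))  # 4
for combo in variants:
    print(combo)
```

Every extra variant splits your traffic thinner, which is exactly why multivariate tests demand so much more volume to reach significance.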

What’s a good conversion rate improvement from A/B testing?

It depends entirely on what you’re testing and where you started. A 2-5% relative improvement on a high-traffic page can translate to millions in revenue. The average successful A/B test (one that reaches significance) produces about a 49% improvement in the tested metric, according to Convert’s 2025 data — but remember, only 12-15% of tests produce a significant winner. The real value comes from compounding many small wins over time.

Can I A/B test with low traffic?

You can, but your options narrow. With low traffic, test bold changes that might produce large effects (a completely different page layout rather than a button color tweak). You can also test at higher levels of the funnel where you have more volume — email subject lines, ad copy, or landing page headlines. If your page gets fewer than 1,000 visitors per month, consider qualitative research methods (user interviews, surveys) instead of quantitative testing.

What’s the 95% confidence level everyone talks about?

It means that if there were truly no difference between your variants, there would be only a 5% chance of seeing a gap this large from random noise alone. In practical terms, if you ran 20 tests of changes that actually do nothing, roughly one would still cross the threshold and look like a winner. The 95% threshold is an industry convention, not a law of physics: some companies use 90% for low-risk decisions and 99% for high-stakes changes like checkout page modifications.
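You can watch that 5% false positive rate appear in a quick simulation. Both “variants” below share the same true conversion rate, so every significant result is pure noise (all numbers are arbitrary):

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(seed=7)
n, rate, trials = 10_000, 0.03, 2_000
false_positives = 0

# Both "variants" have the SAME true rate, so every significant
# result below is a false positive produced by random noise.
for _ in range(trials):
    a = rng.binomial(n, rate)
    b = rng.binomial(n, rate)
    _, p = proportions_ztest([a, b], [n, n])
    if p < 0.05:
        false_positives += 1

print(f"False positive rate: {false_positives / trials:.1%}")  # roughly 5%
```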

Should I A/B test everything?

No. Testing has a cost — your team’s time, the opportunity cost of traffic allocated to losing variants, and the duration of the test itself. Prioritize tests where you have good reason to believe a change will work (from data, not gut feeling), where the potential impact is high (high-traffic pages, high-value actions), and where the change is reversible. Don’t test your 404 page. Do test your checkout flow.

Related terms

  • Conversion rate optimization (CRO) — the broader discipline that includes A/B testing as one of its primary methods for improving website performance.
  • Multivariate testing — a more complex form of testing that evaluates multiple variables simultaneously rather than a single change.
  • Statistical significance — the mathematical threshold that determines whether your test result reflects a real difference or random chance.
  • Landing page optimization — the practice of improving specific pages where visitors arrive, often using A/B testing to validate changes.
  • Call to action (CTA) — the button, link, or prompt that asks users to take a specific action, and one of the most frequently A/B tested page elements.