April 29, 2009

When A/B Testing is a) Good and b) Bad.


I’ve heard and read a lot more references to A/B testing lately. For those of you who don’t know, A/B testing is a process where you pit two designs against each other to see which works better. For example, you have two home page designs and you want to know which one converts more users, so you implement both, add a random switcher that shows Version A to half of your visitors and Version B to the other half, and then compare the results.
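The random switcher can be as small as a few lines. Here is a minimal sketch in Python, assuming each visitor has some stable identifier (a cookie value or account ID). The function name `assign_variant` and the experiment label are hypothetical, chosen only for illustration. Hashing the ID rather than flipping a coin on every page load keeps a returning visitor on the same version.

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "home-page") -> str:
    """Deterministically place a visitor in bucket "A" or "B".

    Assumption: user_id is stable across visits (e.g. a cookie value).
    The experiment name is mixed in so that different tests split
    the same visitors independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # The first byte of a SHA-256 digest is close to uniformly distributed,
    # so its parity gives a roughly even 50/50 split.
    return "A" if digest[0] % 2 == 0 else "B"


if __name__ == "__main__":
    for visitor in ["alice", "bob", "carol"]:
        print(visitor, "sees Version", assign_variant(visitor))
```

The page handler would then render whichever design the function returns and log the variant alongside the conversion event, so the two groups can be compared later.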

Like most things in life, whether it’s “good” or “bad” depends on context. So…

Here’s some context.

I’ve heard of people deciding to A/B test little tweaks to their designs: tweaks that someone has a hunch will make a positive difference, but where there’s been some internal debate, or where the champion hasn’t been able to articulate the hunch and therefore hasn’t swayed their colleagues. In these cases, someone often exclaims, “Let’s A/B test it!” Well, OK. The good thing about A/B testing is that it’s democratic. May the most popular design win (as judged by end users successfully completing their tasks with the design). Hard to argue with that, right?
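One caveat on “may the most popular design win”: with a small number of visitors, the gap between A and B can be pure noise. A common way to check is a two-proportion z-test on the conversion counts. The sketch below is one standard approach, not the only one. The function name and the counts in the usage lines are made up for illustration and do not come from any real test.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion rates of A and B; return (z, two-sided p-value).

    conv_a / conv_b: visitors who completed the task in each group.
    n_a / n_b: total visitors shown each version.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 100 of 1000 converted on A, 150 of 1000 on B.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone. It says nothing about whether the winning design is good, only that it beat the other one.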

In other situations, I’ve heard teams decide to A/B test whole modules or whole features of applications. This tends to come from more agile-oriented shops, which are often more reactive about design than proactive. I say this not as a jab, but as an observation. Fewer decisions up front, more “figure it out as we go.” This style is complementary to A/B testing, in a Darwinian sort of way. The trouble with using A/B testing on larger chunks of a design is that there’s a point where it starts costing way too much. This is a similar optimization problem to “How much unit testing is too much unit testing?” To A/B test something implies you have both versions to test, PLUS some mechanism to randomly shuffle the two across users. So, you eat the cost of developing both versions and the switching mechanism in order to do your empirical observation.

The fundamental problem with relying too much on A/B testing, as opposed to the alternative (a dictator-like designer calling the shots based on their own skills and experience of the space), is that a good designer should already have accumulated enough flight time to have internalized the knowledge that A/B testing often delivers, at least at a higher level. So, if used at a macro level (larger chunks of the design), it can be a very costly way to learn what that designer could have told you. Also, the larger the chunk you’re testing, the more moving parts there are, and the more likely it is that other influencing factors will creep into the test and blur the findings.

That said, here’s my take. A/B testing is best used to optimize a design that was put together by a good designer. It’s about taking an existing design and either uncovering design errors, or optimizing it to squeeze every last drop of effect out of it (e.g. what happens if we move the purchase button closer to the screenshot tour?). It’s for things like testing wording, layouts, aesthetics, proximity, etc. It’s by no means a substitute for a strong designer making an executive decision.

The other major use I see for it is to de-risk a cut-over from an existing design to a re-think. If you’re worried about the switching drama (“Gah! Where’d my favorite button go?!!!”), it’s an alternative to having a separate beta program where a subset of your users gets to react to the new design first, before you force it down everyone’s throat.

For the alternative version of this post, click here. Just kidding ;)

One Response to “When A/B Testing is a) Good and b) Bad.”

  1. Justin@fire450.com says:

    I agree with the points you make in this post. Determining the effectiveness of design tweaks that affect usability, or the acceptance by users of larger features, is better served by more formal usability testing methods, where feedback can be more effectively used to improve the design. I think A/B testing has more relevance in areas that have more to do with “marketing” objectives, such as trying out the effectiveness of a sign-up process or the click-through rate on an advertisement.
