It’s not hard to see the importance of performing tests on web systems. We regularly hear how companies such as Amazon, Google, or Facebook employ A/B testing (also known as split testing) and the closely related multivariate testing to quickly test variations of their websites. Deploying a test to a percentage of a site’s requests can quickly measure the impact of a site change or a variation of an algorithm. The impact of testing on a company can be greater than that of an isolated experiment measuring which color maximizes the click-through rate. An infrastructure for and culture of testing can positively impact the entire organization. An organization that doesn’t perform frequent tests is doomed to adapt slowly or without information.
In order for a culture of experimentation to exist, it must be easy to perform these experiments and measure the results. There are some great frameworks for offline experimentation. MapReduce solutions are great for experimentation and examination of large data sets. I’ve seen the importance of a good software architecture for experimentation in my own experience with the Lemur Toolkit. However, I haven’t seen much discussion of frameworks for performing online experiments in complex web systems.
As an example, this post considers split testing for a related content widget. Such a widget could be deployed on a news publisher’s article pages, where it would show related articles in a sidebar or below the content of the article. We may wish to test variations of the related content widget to see if they increase some measure of success, such as the click-through rate.
Conceptually, a related content widget is quite simple. There are two major components: one that returns a list of related items and one that renders the items on the web page. A typical implementation, however, may be much more complex. The renderer in the client may make use of a mix of XML, CSS, and HTML. Parts of the rendering algorithm may also live server-side. For example, if the user visited the web page from a search engine result page, the search keywords could be used to create context-sensitive snippets for the related content items. A snippet generation algorithm would more likely be run on the server than on the client.
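To make the two-component view concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `RelatedItem`, `get_related_items`, and `render_widget` are hypothetical, and the "service" is just a lookup in an in-memory dictionary rather than a real ranking system.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class RelatedItem:
    # Hypothetical record for one related article.
    title: str
    url: str
    snippet: str = ""


def get_related_items(article_id, corpus, limit=5):
    """Component 1 (assumed server-side): return up to `limit` related items.

    `corpus` is a stand-in for a real related content service: a dict that
    maps an article id to a pre-ranked list of RelatedItem objects.
    """
    return corpus.get(article_id, [])[:limit]


def render_widget(items):
    """Component 2: render the items as an HTML fragment."""
    rows = "".join(
        f'<li><a href="{escape(item.url, quote=True)}">{escape(item.title)}</a></li>'
        for item in items
    )
    return f'<ul class="related">{rows}</ul>'
```

Even in this toy version, a test could touch either side: the `limit` passed to the service, or the markup and styling produced by the renderer.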
The experiments we may wish to perform on the related content widget could depend on any of the components used in its creation. For example, here is just a small sample of things we may wish to test for their impact on the click-through rate:
- the color of the recommended item titles (CSS changes),
- the number of items presented (parameters passed to the related content service), and
- keyword highlighting, use of keywords in ranking and content selection, and custom snippet generation when the user arrives from a search engine result page (the ranking algorithm used by the related content service, the snippet generation algorithm, and possibly CSS changes).
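One way to picture how these tests cut across components is to describe each variant as a set of overrides on a shared base configuration. The sketch below is only an illustration under that assumption; the component names (`css`, `service`, `snippets`), the setting names, and the values are all hypothetical.

```python
# Hypothetical default settings, grouped by the component that consumes them.
BASE_CONFIG = {
    "css": {"title_color": "#0645ad"},
    "service": {"num_items": 5, "ranker": "default"},
    "snippets": {"highlight_keywords": False},
}

# Each variant overrides settings in one or more components.
EXPERIMENTS = {
    "title-color": {"css": {"title_color": "#cc0000"}},
    "more-items": {"service": {"num_items": 8}},
    "search-aware": {
        "service": {"ranker": "keyword_boosted"},
        "snippets": {"highlight_keywords": True},
        "css": {"highlight_class": "kw"},
    },
}


def config_for(variant=None):
    """Return the base configuration with the variant's overrides applied.

    The base configuration is copied first, so it is never modified.
    An unknown variant (or None) yields the unmodified defaults.
    """
    config = {name: dict(settings) for name, settings in BASE_CONFIG.items()}
    for name, overrides in EXPERIMENTS.get(variant, {}).items():
        config.setdefault(name, {}).update(overrides)
    return config
```

Note how the `search-aware` variant touches three components at once, while `title-color` touches only one.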
To me, this means that a test framework will interface with many of a web system’s components. Here are some additional thoughts about the attributes I’d like a test management framework to have. It should
- handle data flow between components,
- track metrics such as click-through-rates and response time of components,
- handle multiple parallel tests along with component dependencies,
- be able to handle various user types (such as split by session, long-term cookie, or logged-in user),
- be easy to register new tests,
- handle deployment of code and files to servers,
- verify to some degree the integrity of new code before returning results of the code to user requests,
- be cloud-aware, and
- be able to do all of this without site downtime.
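Two of these attributes, splitting by user type and tracking click-through rates across parallel tests, can be sketched in a few lines of Python. This is a rough illustration rather than a proposed design: `assign` and `Metrics` are hypothetical names, and hashing the test name together with a unit id is simply one common way to get stable, independent assignments.

```python
import hashlib
from collections import Counter


def assign(test_name, unit_id, variants, traffic_percent=100):
    """Deterministically assign a unit to a variant of one test.

    `unit_id` can be a session id, a long-term cookie value, or a logged-in
    user id, depending on how the test should be split. Including the test
    name in the hash keeps assignments in parallel tests independent.
    Returns None when the unit falls outside the tested share of traffic.
    """
    digest = hashlib.sha256(f"{test_name}:{unit_id}".encode("utf-8")).digest()
    # First four bytes decide whether the unit takes part in the test at all.
    if int.from_bytes(digest[:4], "big") % 100 >= traffic_percent:
        return None
    # Next four bytes pick the variant (modulo bias is negligible here).
    return variants[int.from_bytes(digest[4:8], "big") % len(variants)]


class Metrics:
    """In-memory impression and click counters keyed by (test, variant)."""

    def __init__(self):
        self.impressions = Counter()
        self.clicks = Counter()

    def record_impression(self, test_name, variant):
        self.impressions[(test_name, variant)] += 1

    def record_click(self, test_name, variant):
        self.clicks[(test_name, variant)] += 1

    def click_through_rate(self, test_name, variant):
        shown = self.impressions[(test_name, variant)]
        return self.clicks[(test_name, variant)] / shown if shown else 0.0
```

Because the assignment depends only on the test name and the unit id, no per-user state has to be stored, and the same visitor sees the same variant on every request. The harder attributes in the list, such as deployment, integrity checks, and component dependencies, are not addressed by this sketch.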
I should acknowledge that there are some tools out there for managing split tests. However, I believe (possibly in error) that they only scratch the surface of the requirements I’ve listed above. Also, it may be unfair for me to talk of this as a test framework, because it’s really more of a web framework that has adequate support for testing.
I know I’m asking for quite a lot, but I believe these attributes are important for the creation of a culture of experimentation. Over the next few weeks I hope to go into more detail about why these attributes are important and share some initial thoughts on how a framework may support these goals. Since many of these ideas are still formative, I’d love to hear your own thoughts as well.