Introducing Data-aware Design to Paperless Post
If you prefer, you can read this on Medium, where it got some attention
In brief
At Paperless Post I had the chance to work with a team dedicated to improving our site through research and experimentation. In the process, I became more fluent in the language and processes of quantitative analysis, particularly A/B testing. This knowledge has significantly improved my ability to work with organizations that care about measurably improving their services. I’ve written this in the hope of contributing to the conversation about how designers can better integrate data and experimentation into their process.
A Brief History
Like many designers, my primary methods of information collection up to this point were qualitative: heuristic and competitive analyses, interviews, and usability tests. Thanks to my team at Paperless, I have since expanded that repertoire and become more fluent in using data as a key part of my design process. I also owe credit to Designing with Data by Rochelle King, Elizabeth Churchill, and Caitlin Tan for giving me a foundation in this topic's practice and terminology. I'll be borrowing from and referencing the book throughout this article.
At first I was skeptical of the process that our data and marketing departments outlined: endless A/B tests on relatively small tweaks seemed like the most boring and ineffective way to improve our users' experience. Thanks to King et al., I was eventually able to understand and articulate my concern with data-driven design, and contrast it with data-informed and data-aware design.
Being data-driven means staying on one narrow path, pursuing pure optimizations and efficiencies along the way. Performance improvements and testing shades of blue might fall into this category.
Being data-informed means you are not necessarily on one narrow path, and that you are working with input beyond quantitative data, such as experience or instinct. A/B testing different experiences or structured usability tests may fall in this category.
Being data-aware means recognizing that many kinds of data and problem-solving methods exist, qualitative and quantitative alike, and deliberately choosing the ones that fit the problem at hand.
Why this matters
While many businesses promote themselves as being data driven, most designers are driven by instinct, collaboration, and qualitative research methods. So where do we fit in?
I still think a designer’s greatest asset can be their intuition and willingness to move from theory to practice. Data does not tell us what the problem is, nor how to solve it. However, it can help us expose problems, provide more information about them, and evaluate the effectiveness of solutions. Designers can use this additional information to evaluate problems from more angles, and then use the results to sharpen those instincts on how to solve them.
I particularly loved this quote from Designing with Data. It speaks to the value of data and the importance of integrating it with human instinct and decision-making:
One of the things we discovered is that increasing the number of things that you measure or improving the fidelity of your measurements often actually doesn’t result in certainty. It doesn’t actually result in something that is crisp as this one is better than that. It just reveals a deeper complexity that there are actually more things involved. Then it really becomes a balance. We still have to have an intuition. We still have to make a judgment about what’s important and what isn’t.
Jon Wiley, Director of Immersive Design at Google
Bringing it back to the 3-circle diagram above, I think it is critical for designers to champion a data-aware approach to problem solving. The ability to be a productive member of a data-informed team will also help designers become respected business partners within an organization. While perhaps we don't all need to be (nor should we be) completely data-driven, the ability to speak fluently across a range of data-gathering methods will help us communicate in a way that everyone can hear and respond to.
A Framework
King et al. outline a framework for running experiments with a data-aware mindset, which I have found helpful in organizing my work. Remember that data here can mean any information point, qualitative or quantitative. It’s only the result that should truly be objective.
1 — The goal
Establishing goals is outside the scope of this article, but a goal need not be hyper-specific. For example, one of my team's goals was to increase revenue without relying on heavy new feature sets. Broad, but still a goal with real metrics and straightforward business value. Of course, any goal like this should be tempered with a baseline of ethics and honest design practices, but that too is a conversation for another day.
2 — The problem/opportunity area
There are likely many areas where you could try to achieve your goal. But we need to start somewhere. What information do we have to narrow down our options? Do you have data on where users drop out of the funnel? An idea of how “power users” use the site? Tagged customer support tickets? Surveys? User Research?
Ideally, some combination of all of these, and more. There are countless resources out there dedicated to finding the right area to home in on, from more traditional sources like HBR and those more targeted at design professionals like this workshop outline from IDEO or these activities from the now ubiquitous Google Design Sprint Toolkit.
At Paperless, my team had both qualitative data (customer support tickets, anecdotal stories, personal instincts) and quantitative data (high-volume surveys, conversion drop-off points) indicating that our pricing structure was a blocker for customers checking out. So we defined a problem/opportunity area around improving the user experience of our pricing mechanisms, with the goal of increasing revenue for the company.
3 — The hypothesis
With some broad agreement on the problem to be solved, it’s time to create a testable and verifiable solution — a hypothesis.
One of the most fundamental and challenging aspects of having a data-aware mindset is crafting a good hypothesis. We can't say that "this interaction will be a better user experience" or "users will feel more confident with this layout". These are not good hypotheses: they have no clear metric to evaluate, and no clear path to a measurable result.
A hypothesis should clearly state and include:
- The segment of users to be evaluated [user group]
- The change we are making [change]
- What we believe the result will be [effect]
- Why we think that result will take place [rationale]
- And finally, the measurable result we expect to observe [measure]
King et al. suggest a format along these lines: for [user group], if we make [change], we expect [effect] because [rationale], and we will evaluate it by [measure].
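As a lightweight illustration (my own sketch, not a template from the book or from Paperless Post), a hypothesis written in that format can be captured as a simple record so it can be documented, reviewed, and revisited after the experiment. The field names and the filled-in example below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    user_group: str  # the segment of users to be evaluated
    change: str      # the change we are making
    effect: str      # what we believe the result will be
    rationale: str   # why we think that result will take place
    measure: str     # the measurable result we expect to observe

# A hypothetical example, loosely in the spirit of the coin-package test
# described later in this article (the specifics are invented):
example = Hypothesis(
    user_group="customers who reach the coin purchase step",
    change="show a wider range of coin packages",
    effect="more customers choose a package that fits their order",
    rationale="seeing options helps customers feel they are getting the best deal",
    measure="Average Order Value at checkout",
)
print(example)
```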
But even a well-structured hypothesis shouldn’t be made in isolation. It should be a part of a larger strategy that takes into account:
- What we learn if we fail and how to apply it to future thinking
- What we learn if we succeed and how that impacts next steps
- How much work it is worth putting into this test
The last point is an important and complicated one. How big of a problem are we trying to solve? The best possible checkout experience imaginable? A better version of the one we have now? Understanding the difference between those two approaches is important.
For example, my team knew that the complexity of our infrastructure meant that, in the short term, we could not fundamentally alter our pricing models. Essentially, we were looking to alter and improve the existing system, not change the system itself. The diagram below visualizes what the difference between these two options implies:
- Attempting to improve an existing system is to design against local maxima
- Looking for a better system is to design against global maxima
We should be careful not to exclusively pursue local maxima, but most importantly, the team must acknowledge where we are right now and what the best use of our time is.
Choosing a hypothesis
So what information do we use to choose a hypothesis? In my experience the best ideas come from combining information from multiple sources. Perhaps it's a marketing survey, an internal workshop, a high-volume survey, customer support requests, results from previous A/B tests, or all of the above. King et al. refer to this as data triangulation: the structured use of different information sources to help form new hypotheses and approaches.
I can’t help but be reminded of the Simpsons food chain pyramid. Simply replace the person in the middle with “ideas” and the animal kingdom with “various data points”.
4 — The experiment
Experiments, or tests, are opportunities both to validate ideas and to gain more information for future design decisions. Sometimes you may even decide the point of the test is exclusively to gain information. These more exploratory tests aren't necessarily built to be launched and rolled out to all users, but rather to gain more insight into a particular type of behavior. An evaluative test, by contrast, is about validating a specific solution, likely with the intent of rolling out a version to all users.
Either way, it’s important we create an experiment where:
- We’ve controlled for variables as much as possible, making the difference between variations clear. The muddier your variations are, the muddier the results will be.
- The options clearly represent the hypothesis. This is easier said than done, but worth noting: a good idea with poor execution may yield results similar to a bad idea. Internal and external usability tests have helped me with this problem.
- We don’t sacrifice organizational requirements. This could mean different things for different organizations, but often includes brand integrity and accessibility.
- The test is meaningful and ethical. Testing a giant “click me” button is probably not meaningful. Testing a hidden surcharge is not ethical.
I’d recommend picking up Designing with Data for a basic summary of statistical best practices, but there are plenty of great resources out there. This article about StackOverflow’s A/B testing process is a helpful general summary, and you can read this in-depth review of statistical significance from Analytics-Toolkit to dig even deeper. A lot of the information out there is fairly technical, but even the least math-minded designer should understand the trade-off: a stricter statistical significance requirement gives us more confidence in the results, but demands more rigorous data collection. We can be more certain of the results, but we will likely have to record more interactions over a longer period of time.
Of course we want to be sure of our results, or else what’s the point? But if we don’t move fast enough to meaningfully improve the product, then what are we working towards? It’s all about working with your team and organization to understand how much risk and rigor to put into your testing. It’s up to each team to figure out what their baseline is.
Here’s an article that goes through an example of calculating sample sizes, and below is a screenshot from an online tool that can help you determine your own sample size and experiment with win-loss scenarios.
At Paperless, we started with a significance threshold (alpha) of 0.05, meaning we accepted a 5% chance of a false positive when there was no real difference. With that baseline we could then calculate how many user interactions we needed to record to get statistically significant results. If it looked like a test would need to run for more than 2–4 weeks to get the required numbers, we might reconsider that hypothesis in favor of one where we could get results faster.
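To make that calculation concrete, here is a minimal sketch of the standard two-proportion sample size formula. The baseline conversion rate, detectable lift, and weekly traffic figures below are invented for illustration; they are not Paperless Post numbers.

```python
from statistics import NormalDist

def sample_size_per_variant(p_baseline, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect a difference between
    two conversion rates with a two-sided z-test at the given alpha and power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for 80% power
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    effect = p_variant - p_baseline
    return (z_alpha + z_beta) ** 2 * variance / effect ** 2

# Hypothetical: 20% baseline conversion, hoping to detect a lift to 22%,
# with 5,000 visitors per variant per week.
n = sample_size_per_variant(0.20, 0.22)
weekly_visitors_per_variant = 5000
print(f"~{n:,.0f} users per variant, "
      f"~{n / weekly_visitors_per_variant:.1f} weeks of traffic each")
```

If a back-of-the-envelope calculation like this implied a run longer than that 2–4 week window, that was our cue to reconsider the hypothesis.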
Some examples
I’ll run through some examples of how we planned, tested, and analyzed results at Paperless. To recap, we had already established the following:
- Our goal — Increase revenue from the site using our existing feature set
- Our problem/opportunity area — Our pricing model and how to better communicate it to our customers
Which leads us to the next step…
Hypothesis
Coins are the virtual currency Paperless uses. For example, an invitation may cost 1 coin per recipient, so for 25 recipients you pay 25 coins. We then ask customers to purchase a coin package, such as 20/40/80/150 coins, with better per-coin value at higher volumes. Confusing, I know.
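To illustrate the friction, here is a tiny sketch of the arithmetic a customer effectively has to do. The package sizes come from the example above; the selection logic is my own simplification, not Paperless Post's actual purchase flow.

```python
# Hypothetical sketch of the coin model described above.
PACKAGES = [20, 40, 80, 150]  # coins per package, smallest to largest

def smallest_sufficient_package(recipients, coins_per_recipient=1):
    """Return the smallest package covering an order, plus leftover coins."""
    coins_needed = recipients * coins_per_recipient
    for size in PACKAGES:
        if size >= coins_needed:
            return size, size - coins_needed
    return None, None  # the order exceeds the largest single package

size, leftover = smallest_sufficient_package(25)
print(f"25 recipients -> buy the {size}-coin package, {leftover} coins left over")
```

A 25-coin order forces a 40-coin purchase, the kind of mental math that can make pricing feel confusing at checkout.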
A few teammates and I spent some time researching on our own and then held a few team-wide meetings to collect and collate results. We had a long list of ideas, but there were a few we felt had strong potential:
- Reduce the number of pricing tiers
- Provide more and better options to buy coins
- Replace the concept of “coins” with real dollars
- Explain earlier in the funnel how coins work
Remember, we’re chasing local maxima (improving the current system) here, not global ones (finding a better system). That ruled out some of the options above, but we still had plenty to work with. One idea we pursued involved the second point above: providing better options to purchase coins. We came up with a hypothesis along these lines: if we showed customers a wider range of coin packages, more of them would choose larger packages, which we would measure through Average Order Value.
Here’s what it looked like:
Left: before / Right: after
Results
Well, the test was a “win”, in that the expected result (a higher Average Order Value in the test group) held true. While statistically significant, the relative increase was not dramatic: only a small number of people chose the more expensive packages. Surprisingly, though, conversion from this step to final purchase increased at a much higher rate. So what does that mean? Our interpretation was that simply exposing users to options both informed them and convinced them they were getting the best deal. These were meaningful results from a test that took only about two days to code and launch.
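For readers curious what “statistically significant” means in practice here, the sketch below runs a standard two-proportion z-test against the 0.05 threshold mentioned earlier. The conversion counts are entirely made up; the real figures aren't published in this article.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: control vs. variant at the coin purchase step.
z, p = two_proportion_z_test(1300, 6500, 1420, 6500)
print(f"z = {z:.2f}, p = {p:.4f}")  # p below 0.05 clears the bar we set above
```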
This test is a great example of how we used an exploratory mindset to create a simple experiment, quantitative analysis to validate both expected and unexpected results, and then qualitative analysis to interpret them. It also shows the value of an experiment beyond measuring a “win” or “loss.” A few potential follow-up tests we thought of after this one included:
- Improve the value increase between packages
- Promote or default to certain packages
- Explain more directly what you get with your coin purchase
Forming a strategy
I mentioned that before working on this team, my experience with A/B testing was somewhat limited. Well, I wasn’t the only person at Paperless who was new to rigorous test analysis and data collection. From the outside, some people saw a number of small, seemingly unrelated tests rolling out from our team. Some were put into full production, some not. While our team believed we were making progress, others saw a lack of vision and an overly dogmatic approach to statistics, not an improvement to the user experience.
I stated earlier that the best hypotheses come with a strategy covering both successes and failures, noting what we learn along the way. I realized that while internally we had some agreement on this, we had no formal way to share it with others. Spreadsheets and statistical analyses did not effectively communicate our strategy to a wider audience. My PM and I eventually drew out this diagram to help communicate that strategy, which finally helped bring the whole organization on board.
The labels on the left are broad themes, and the first column of cards represents problem/opportunity areas. The other cards represent hypotheses within those areas. Ideas from the different areas eventually overlap, which you can see on the right side of the diagram. The details are purposefully only two or so levels deep, since we are looking for information and ideas with each test that will inform the next. The final card on the right represents a preconceived concept of what we believed the best approach to be, but instead of jumping straight there, we made small steps, learning and adjusting along the way.
The reason for the two-pronged approach in the diagram is that we wanted to more clearly communicate how our systems worked while showcasing the value of our product. We believed our beautifully designed products were worth it, and wanted customers to feel the same way. One area where we did that was the Card Detail view (often referred to as the PDP, or Product Detail Page, within many e-commerce organizations).
The big idea
So what was that big concept, the one where the branches in the diagram coalesced?
Previous card detail view
New card detail view
As a team and an organization, we thought showing add-ons, allowing customization, and exposing pricing could all be valuable to customers. But instead of throwing three-plus concepts into one big project, we decided to split them up and test them individually. This way we could learn what worked, what didn’t, and ship solutions that actually served our organization and our customers.
One small idea
The existing view consisted of a thumbnail list of items, but we felt there could be a better way to engage our customers and show the value of all the design elements within a package. Our hypothesis was that presenting those elements together, as a fully designed package, would better convey their value, which we would measure through Average Order Value.
And here’s what we came up with:
As per our hypothesis, Average Order Value did go up, though not spectacularly (a few relative percentage points). But, just like the first test, conversion increased significantly: users who saw this new view were far more likely to enter our customization flow and eventually check out. We tentatively concluded that presenting the full package was a great hook to get people to explore our product, even if other factors informed customers’ decisions about what they would pay for.
That big, shiny new dream design we all agreed on never went live as originally conceived, but many ideas from it did. Its mere existence and presence in decks and documents allowed our team to move forward, confident in both our vision and our statistics.
In a nutshell
While every project and team will end up working in the way that’s right for them, there are a few steps that I believe would be helpful to anyone:
- Use data triangulation to find the right problem/opportunity areas and to narrow down hypotheses
- Write and document well-structured hypotheses that can be measured and validated
- Have a testing strategy that can be communicated broadly. It should go a few layers deep, but not too many, and it should be periodically updated.
- Have big ideas and maybe even some big designs that represent your team’s thinking. The point is not to split up and build that design, but to have a visual counterpoint to your testing strategy that helps you communicate with your peers and leadership. I often find that the act of designing itself helps generate new ideas.
- Test small, at least at first. The big idea probably has a lot of hypotheses rolled into it, so testing it all at once could lead to hard-to-read results. Running smaller, more focused experiments can help you gain information and confidence in your ideas, which can then give your team the momentum and buy-in to try bigger tests.
Now what?
While obviously your mileage may vary, most designers probably won’t need to go too deep into the math of data analysis. But I feel confident saying that every designer would benefit from a basic understanding of the concepts and framework I’ve just reviewed. Doing so will help us sharpen our instincts and deliver work that truly benefits customers and the organizations trying to serve them. It will also help us speak the language of modern organizational leadership and communicate in ways that will be taken seriously.
I am grateful for the exposure I’ve had to this process, and to the PMs, data scientists, and my math-major design intern for helping me wrap my head around some statistical concepts. I look forward to leveling up my knowledge and writing about it in the future.
Read the book, ask me questions, or let me know if I can improve any of the information above!