Wednesday, April 22, 2009

The Proper Care and Feeding of VolState.edu

Most literature on user centered design (UCD) focuses on redesigning existing products or creating wholly new products. No literature I’ve found focuses on applying these ideas to established, mature products in the maintenance stage of their lifecycles. However, with an eye towards abstraction, I think we should be able to take the high-level concepts and apply them to any stage of a product’s lifecycle. My goal with this document is to establish some guidelines for making iterative improvements to VolState.edu using my current understanding of UCD practices, without having to shift into a full redesign.

There is a split among experts in the field of usability. Some strive for true science. Tests use many participants, drawn from the target audience with a tight statistical fit. Conditions are controlled and kept close to real-world environments. Participants use high-fidelity prototypes that are functionally complete, or very nearly so. The data such tests produce is significant and reliable. But these tests can also cost tens or even hundreds of thousands of dollars. Attempts have been made to codify formal systems for web site development, but these systems do little to address the practical concerns of time and money. (De Troyer & Leune, 1998)

In 1989, Jakob Nielsen introduced the concept of discount usability in a paper titled “Usability engineering at a discount”. The gist of the argument, to borrow a phrase from Steve Krug, is “Testing with 1 user is 100% better than testing none.” Many people, Nielsen included, have tried to show statistically where the sweet spot for return on investment lies in conducting usability studies. Many others, such as Jared Spool, have argued against such reasoning. Each side tends to rely on math supplemented with anecdotal evidence. This is all well and good as a distraction for academics. Personally, I’m much more interested in producing a better product for our users than in the math and science underlying the methodology.

I’m following Krug’s ideas most closely. He goes even beyond Nielsen’s discount usability into what he calls “lost our lease, going out of business sale usability testing”. Such testing can have value without the need for strict scientific validity. He’s even gone so far as to call himself a usability testing evangelist, actively encouraging people to integrate it into their workflow. In his book, Don’t Make Me Think, he addresses the top 5 excuses for not performing usability testing.

  1. No time – Keep testing as small a deal as possible and you can fit it into one morning a month. The data produced will actually save time by pointing out obvious flaws in a way that sidesteps most possible internal politics and by catching these problems as early as possible, when they are most easily fixed.
  2. No money – Testing can be performed for as little as $50-$100 per session. While that’s not free, it’s far from the $50,000+ price tag of big ticket professional testing. In terms of ROI, it’s one of the wisest investments you can make.
  3. No expertise – Experts will be better at it, but virtually any human being of reasonable intelligence and patience can conduct basic testing that will produce valuable results.
  4. No usability lab – You don’t need one. Since Krug first wrote his book, things have actually gotten even easier with software such as Silverback.
  5. Can’t interpret results – You don’t have to use SPSS to mine for valuable data. The biggest problems will be the most obvious. Even if you only identify and fix the most obvious 3 problems, that’s 3 fewer problems in the final product.

Obviously, if my goal were to get published, even as a case study, I would worry a bit more about scientific validity than Krug stresses. Even in the academic literature, though, there are successful case studies where recruiting went no further than very broad user categories. (Bordac & Rainwater, 2008) But my goal in this document is to create a framework for usability testing that is simple and affordable enough to be actionable at Vol State. Ideally, it should be straightforward enough to persist beyond my tenure in this position. For those goals, Krug seems a perfect fit.

Involving Users Through the Development Process

Step 1: Identify a Problem or Unmet Need

Occasionally, users may volunteer information related to areas where improvements are possible or identify desired features not currently available. Anything we can do to encourage and facilitate this can only help.

Direct observation may also provide vital insight. Students may be observed unobtrusively in computer labs or in the library. Employees could be observed, with their permission, within their everyday work environment. Surveys and focus groups may also assist in identifying unmet needs at a high level of abstraction.

Problems exist within the context of users and goals. A proper definition of a problem should identify these contexts. Some possible examples:

  • Students wish they could perform a task online that currently requires a visit to an office on campus
  • Parents want easier access to a certain type of data
  • Faculty need a way to share educational resources
  • Sports fans need a better way of viewing or sorting our events listings

Understanding these contexts is vital to the success of a possible solution. If parents express a desire for easier access to student records, that may very well violate FERPA, in which case the problem is not actionable. Faculty looking for ways to share educational resources may be best served by training on Delicious. Even though we won’t stress a strict statistical fit with our test participants, understanding user groups in broad categories such as these allows for quick and easy targeting, which should add value to the data collected via future testing.

Step 2: Establish Benchmark (Where Applicable)

The type of benchmarking will depend greatly on the type of problem. Persistent, longitudinal data collected via survey could apply at this step, and Google Analytics is another powerful tool at our disposal. For example, a problem that requires adding new pages to the site has no existing Google Analytics data against which to establish a benchmark. But if those pages address an unmet need revealed via surveys, and that need fails to turn up in a future survey, that could serve as a benchmark. (Comeaux, 2008) While I hope to incorporate this step as often as possible, I don’t want the lack of a proper benchmarking method to serve as justification for abandoning a project.
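
Where relevant page-level data does exist, the comparison itself can stay simple. Here is a rough sketch, assuming we export weekly pageview counts for the affected pages from Google Analytics as CSV before and after a change goes live; the file names and the “Pageviews” column header are hypothetical placeholders for whatever the actual export contains.

    import csv

    def weekly_average(path, column="Pageviews"):
        """Average a numeric column from a hypothetical Google Analytics CSV export."""
        with open(path) as f:
            rows = list(csv.DictReader(f))
        values = [float(row[column].replace(",", "")) for row in rows]
        return sum(values) / len(values)

    # Hypothetical exports: the same report pulled before and after the change.
    before = weekly_average("registrar_pages_before.csv")
    after = weekly_average("registrar_pages_after.csv")

    change = (after - before) / before * 100
    print("Weekly pageviews: %.0f before, %.0f after (%+.1f%% change)" % (before, after, change))

Of course, raw pageviews alone won’t tell us whether a change is actually an improvement, which is why this kind of data only matters alongside the context gathered in the other steps.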

Step 3: Wireframe Possible Solutions

Brainstorm possible solutions to the problem. Produce low-fi wireframes such as semi-functional HTML prototypes or paper prototypes. Research seems to indicate that testing multiple prototypes produces more useful data. (Tohidi et al, 2006) Test with 3 or 4 users loosely recruited from the broad user category identified in step 1. Aim for 3 testable prototypes, but again, don’t let this guideline derail an otherwise healthy project.

Step 4: Develop Solution Based on Wireframe Testing

Perhaps a clear winner will emerge from step 3, or perhaps the best possible solution will incorporate elements from multiple prototypes. Entering into the development phase with even a small dose of user involvement cuts down on costs and ensures that we “get the right design.” (Tohidi et al, 2006)

Step 5: Think-Aloud Usability Testing with Prototype

Construct a task list informed by the context and goal(s) identified in step 1. Recruit loosely from the broad user group, as in step 3. Spend a morning walking 3 or 4 users through the task list. Record sessions using Silverback. Take quick notes between sessions, recording the biggest issues while they are fresh in memory. Produce a single-page list of issues, not a full report. Writing formal reports consumes a lot of time better invested in fixing problems or further testing. Review session recordings with stakeholders as appropriate.

Step 6: Tweak Based on Testing

Fix the biggest 3 issues revealed in step 5. Avoid the temptation to go for the low hanging fruit until the truly catastrophic issues have been resolved. If something can be fixed with 15 minutes of work, by all means do so. But don’t count that as one of the big 3 unless it is also a major problem. After the big 3 and the easy fixes are addressed, act on any remaining issues as time permits.

Step 7: Launch & Evaluate

Put the solution into production. If benchmarks were established, collect data for a length of time that allows a reasonable comparison, up to a maximum of 1 year. If the data does not indicate improvement (and the criteria for judging this will be highly subjective and depend a great deal on context), consider returning to step 1 with an eye towards testing any assumptions in the original problem description.

Don’t Sweat the Details

Use the tools at our disposal to the best of our ability. Maybe a focus group would be the perfect tool for a given job. But perhaps we can’t pull that off because of finances, or because it’s summer and campus is like a ghost town. Move on to a less-than-perfect solution, or plan to readdress that particular issue when the timing is right. But if we postpone one project, we should find something to keep us busy in the meantime. With a site as large and complex as VolState.edu, we should always be able to find something to improve.

The Tools of the Trade

In this section I will list some of the tools available to us. While I’ve made an honest effort to exhaust my knowledge on the subject, I make no claims to knowing every possible method. We should actively look for new tools and add them to this list as we become aware of them.

Card Sort Studies

These studies are often used to test organization schemes and naming conventions. There are 2 varieties. (a) Closed card sort studies ask participants to sort pre-determined items into pre-determined categories. (b) Open card sort studies ask participants to sort pre-determined items into groups of their own making, which they can then label any way they choose. Such studies are cheap and easy to conduct and provide valuable information within the limited context of language choices and organization schemes. Problems of this kind are commonly reported in the literature on university library case studies. (Bordac & Rainwater, 2008; Proffitt, 2007; Shafikova, 2005)
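
Analyzing the results doesn’t require anything fancy, either. As a rough illustration, the sketch below tallies how often participants in an open card sort placed two items in the same pile; the item names and groupings are hypothetical. Pairs that cluster together across most participants are strong candidates for sharing a navigation heading or label.

    from collections import defaultdict
    from itertools import combinations

    # Hypothetical open card sort results: one list of piles per participant.
    results = [
        [["Apply Online", "Admissions FAQ"], ["Course Schedule", "Academic Calendar"]],
        [["Apply Online", "Admissions FAQ", "Academic Calendar"], ["Course Schedule"]],
        [["Apply Online", "Course Schedule"], ["Admissions FAQ", "Academic Calendar"]],
    ]

    # Count how often each pair of items landed in the same pile.
    pair_counts = defaultdict(int)
    for participant in results:
        for pile in participant:
            for pair in combinations(sorted(pile), 2):
                pair_counts[pair] += 1

    # Report the most frequently co-grouped pairs first.
    for (a, b), count in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
        print("%s + %s: grouped together by %d of %d participants" % (a, b, count, len(results)))

A spreadsheet works just as well for a handful of participants; the point is simply that the analysis step is small enough not to scare anyone off.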

Surveys

In the past we’ve had great luck with SurveyMonkey.com. Care should be taken to word questions in a way that avoids biasing the answers. Surveys, particularly those delivered online, allow for a great number of participants from a wide range of our target audience. Smaller, more targeted surveys can also be conducted in person or on paper, depending on the need. We also have access to data collected through marketing surveys conducted in 2003, 2008, and hopefully 2009.

Observation

Don’t underestimate the power of simply watching people use a system under normal conditions. Observation alone will often only produce hints or assumptions about possible problem areas, but that can be a great starting point from which to branch out into interviews or surveys. In the case of formal observations, such as with employees, a quick follow-up interview can probe for more details immediately. (Pekkola et al, 2006)

Focus Groups

While there are dangers associated with relying too heavily on preference- and opinion-based data such as that produced by focus groups, they can provide data useful for our purposes. For example, a focus group could be a great way to test assumptions arrived at through observing students using the website in the library. Say we’ve observed students doing X, which we assume is a way of working around problem Y. We could conduct a focus group to discover whether Y really is a problem at all, or whether students are engaging in behavior X for reasons we didn’t think of. Focus groups are most effective when drawn from within a single user category. For example, we probably would not want to mix students and faculty in the same focus group, nor faculty and administrators for that matter. (Kensing & Blomberg, 1998)

Interviews

Interviews could be paired with observations, helping to clarify and test assumptions much like a focus group. Or they could be used to help us construct personas. They would be most useful early on to help explain the context within which a given problem may be occurring.

Prototypes

Paper mock-ups of a possible user interface can be produced quickly and cheaply. In a team environment, building such prototypes could even become a collaborative, iterative process. Some teams have reported success even with complex interactive paper prototypes. (Tohidi et al, 2006; see also i ♥ wireframes) Testing with such low-fi prototypes can lead to a greater willingness among participants to offer criticism, since the design is obviously a work in progress.

HTML prototypes can range from semi-functional to fully functional. I’m personally much better at HTML wireframing than illustrating, so in the absence of outside assistance I will probably rely on HTML prototypes for both low-fi and high-fi testing.

Personas

Using demographic data, marketing research, and perhaps focus groups/interviews to construct an idealized but fictional user can help make user needs more personal and real (ironically enough) to a development team. Knowing that we have students with family and work obligations is important. But building a persona around that particular cluster of user attributes can make it easier to relate to and also give the development team a shorthand for addressing those issues.

For example, we could create a persona named Barbara who is a single mother of 2, working the graveyard shift full time as a manager at a 24-hour restaurant and taking 3-6 credit hours per semester at Vol State toward a degree in Hospitality Management. She tries as often as possible to take classes on Tuesday/Thursday to minimize the burden of daytime child care. She’s not technologically proficient, but she’s willing to learn. Given the choice between driving 45 minutes to campus or performing a task online, she prefers the online option in spite of her technological limitations.

That’s a very basic persona, but it still gives us enough detail to say “But what about Barbara?” rather than “But what about students who may have family or work obligations that prevent them from easily making an on campus visit on Mondays and Wednesdays?” Obviously personas could have applications outside of the realm of website development. Barbara could also provide insight into event planning, for example.

Think-Aloud Usability Study

This is the crown jewel in my usability toolbox. In a nutshell, it’s as simple as sitting a real human being in front of your product and watching them try to make sense of and use it. Ideally, you should have some specific tasks to ask them to perform and keep them talking about what they are looking at, what they think they should do next, what they expect to happen when they do it, their reaction to what actually happens when they do it, etc.

Full descriptions of the methodology can be found in lots of different places. I personally recommend Krug’s book. He’s even made the final 3 chapters of the first edition available on his website as a free download. Those chapters contain the bulk of the info on this sort of testing. There are other books, academic articles, and even blogs available, so pick your poison. Jakob Nielsen and Jared Spool have both written prolifically across many distribution channels.

Heuristic Study/Expert Review

My list would not be complete were I to leave this off, but personally I find it the most suspect. The general idea is that you have an expert go over the interface and critique it based on a list of guidelines. This idea also originated with Nielsen, and his list of guidelines can be found on his website. (Nielsen, 2005)

The highest value should be placed on methods that involve observing real people interacting with some form of the interface in question. Preference- and opinion-based methods, such as surveys, focus groups, and interviews, can be quite effective for collecting marketing-type data. But usability depends too heavily on context for these methods to work alone. (Krug, 2008) However, any method requiring direct human contact in the context of a college campus presents a particular problem.

Sweat Some Details, but Document Them

One detail we do have to sweat is the Institutional Review Board. All of the highly valued methods that involve human interaction will require IRB approval. Even with approval, there’s no guarantee that a particular methodology will work. For example, I recently put together a proposal for a card sort study that failed to recruit any participants. The good news is that I’ve learned one method that does not work on this campus, so in the future we can avoid recruitment schemes that rely solely on face-to-face contact with no offer of compensation. Eventually we will discover methods that both appease the IRB and meet with success out in the field. When that happens, we should save the IRB proposal, the information letter, the release forms, and all other deliverables. The next time the need for such a test comes up, we can build on the previous success. Many researchers, Krug among them, provide examples of their own materials that we can also integrate into our workflow.

Recommendations

  • Build an easy-to-use feedback mechanism into VolState.edu, such as a simple “Report a problem with this page” form or an Amazon-style “Was this page helpful to you?” tool. Possibly both. (A rough sketch of such a tool follows this list.)
  • Develop a standardized survey to be delivered via SurveyMonkey.com on an annual basis. Collecting standardized, longitudinal data will allow for some forms of benchmarking. Each iteration of the survey can include a handful of questions of a more timely nature in addition to the standard set. For example, a question about a specific change made in the past year.
  • Continue to develop our use of Google Analytics for statistical and possible benchmarking data. However, do so with the understanding that Jared Spool has called analytics “voodoo data” because they lack context. (Spool, 2009)
  • Integrate user testing, as laid out in the 7-step model above, to provide the context currently missing from our toolset.
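
To make the first recommendation a bit more concrete, here is a minimal sketch of what the server side of a “Was this page helpful to you?” tool might look like, assuming a plain CGI setup using Python’s standard library. The script name, form field names, and log location are hypothetical placeholders rather than a commitment to any particular implementation.

    #!/usr/bin/env python
    # feedback.py - hypothetical CGI handler for a "Was this page helpful to you?" widget.
    # Assumes the page embeds a small form posting "page", "helpful", and an optional
    # "comment" field; all of these names are placeholders.
    import cgi
    import time

    LOG_PATH = "/var/log/volstate-feedback.log"  # hypothetical location

    def main():
        form = cgi.FieldStorage()
        page = form.getfirst("page", "unknown")
        helpful = form.getfirst("helpful", "unanswered")
        comment = form.getfirst("comment", "").strip().replace("\n", " ")

        # Append one tab-separated line per submission; these can be tallied later
        # in a spreadsheet or a short script.
        with open(LOG_PATH, "a") as log:
            log.write("%s\t%s\t%s\t%s\n" % (time.strftime("%Y-%m-%d %H:%M:%S"),
                                            page, helpful, comment))

        # Minimal response; in practice this might redirect back to the page.
        print("Content-Type: text/plain")
        print("")
        print("Thanks for your feedback.")

    if __name__ == "__main__":
        main()

Logging to a flat file keeps the moving parts to a minimum; if submissions pile up, the same data could be moved into a database or summarized alongside the annual survey results.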

Bibliography

  • Bordac, S., & Rainwater, J. (2008). User-Centered Design in Practice: The Brown University Experience. Journal of Web Librarianship.
  • Comeaux, D.J. (2008). Usability Studies and User-Centered Design in Digital Libraries. Journal of Web Librarianship.
  • De Troyer, O., & Leune, C. (1998). WSDM: a user centered design method for Web sites. Computer Networks and ISDN Systems.
  • Kensing, F., & Blomberg, J. (1998). Participatory Design: Issues and Concerns. Computer Supported Cooperative Work.
  • Krug, S. (2005). Don't Make Me Think: A Common Sense Approach to Web Usability, 2nd Edition. New Riders.
  • Krug, S. (2008). Steve Krug on the least you can do about usability. Retrieved April 22, 2009, from blip.tv Web site: http://blip.tv/file/1557737/
  • Nielsen, J. (2005). Heuristics for User Interface Design. Retrieved April 22, 2009, from useit.com: usable information technology Web site: http://www.useit.com/papers/heuristic/heuristic_list.html
  • Pekkola, S., Kaarilahti, N., & Pohjola, P. (2006). Towards formalised end-user participation in information systems development process: bridging the gap between participatory design and ISD methodologies. Proceedings of the ninth conference on Participatory design: Expanding boundaries in design, 1, 21-30.
  • Proffitt, M. (2007). How and Why of User Studies: RLG's RedLightGreen as a Case Study. Journal of Archival Organization, 4.
  • Spool, J. (2009). Journey to the Center of Design. Retrieved April 22, 2009, from YouTube Web site: http://www.youtube.com/watch?v=WCLGnMdBeW8
  • Tohidi, M., Buxton, W., Baecker, R., & Sellen, A. (2006). Getting the right design and the design right. Proceedings of the SIGCHI conference on Human Factors in computing systems.
