Rethink Longitudinal Qualitative Testing

 

[Image: qualitative research methods]

Recently I learned to code games from scratch using the Unity engine. Whether it was idea proofing, design testing, or product testing, qualitative studies were present throughout my journey of game making. The interview results were so productive that they helped me discover existing errors and revealed possibilities for future updates. I am not alone in realizing that longitudinal qualitative testing can be highly effective even in simple setups. Many solo and small game development studios use this research method to validate their games. For example, the maker of Stardew Valley relied largely on input from participants drawn from an online community to test and improve the game (ThatGuyGlen, 2020). This led me to wonder: why does indie game development rely so heavily on repeated qualitative usability testing, and can other industries replicate it?

The more obvious obstacles to longitudinal qualitative testing in the commercial world are cost, time, and complexity of test setups. Longitudinal qualitative testing is often associated with research that aims to build grounding or in-depth knowledge, and these studies usually take a similar format: a mix of large-scale interviews and surveys. Companies like P&G (my former employer) use such grounding research sparingly because its often rigid design leads to data saturation, though proper planning can make the design more flexible. Data saturation occurs when no new themes emerge due to the repetitive nature of the research (Moser & Korstjens, 2018). For most qualitative research, the data saturation point is generally the stopping point, if resources and time allow (Price et al., 2014). As a result, longitudinal qualitative research is generally not considered unless one of the following is expected: a suspected behavior change, a drastic content change, a new target population, a new research focus, or adjustments needed to prior theories (errors or new information).
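The saturation criterion above can be made concrete with a small sketch. Assuming each session's findings have already been coded into a set of theme labels (the helper name and the example themes below are hypothetical, purely for illustration), the saturation point is the last session that still produced a new theme:

```python
def saturation_point(sessions):
    """Given per-session theme sets in chronological order, return the
    1-based index of the last session that surfaced a new theme.
    Sessions after this index added nothing new -- data saturation."""
    seen = set()
    last_new = 0
    for i, themes in enumerate(sessions, start=1):
        if set(themes) - seen:  # any theme not seen in earlier sessions?
            last_new = i
        seen |= set(themes)
    return last_new

# Hypothetical coded sessions: no new theme appears after session 2
sessions = [
    {"pricing", "onboarding"},
    {"onboarding", "navigation"},
    {"pricing", "navigation"},
    {"navigation"},
]
print(saturation_point(sessions))  # → 2
```

In practice the judgment is softer than a set comparison, but tracking new-theme counts per session this way gives a defensible stopping rule.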

So why do game user research and quality assurance repeat qualitative testing so frequently? One reason is that the gaming industry often runs player tests on new features, with many structural changes between each research phase. The industry is very prone to damaging content leaks, as seen in frequent legal actions against suspected leak sources (Yang, 2020). Testing more frequently, no matter what kind, makes a content leak all but inevitable, so many changes to the game's content are packed between testing phases to reduce the need for additional studies. Another reason indie game studios use ongoing qualitative research is that each testing phase generally has a different research focus tied to the product's development cycle. For example, Center Code, the testing partner of Ubisoft and Nexon, emphasizes testing the technical aspects of the game in the alpha stage, while the beta stage is used to study customer satisfaction (Freiler, 2019). I take issue with Center Code's approach: as May (2012) said from his experience launching MiniDate, one should test ideas, models, UX, UI, and QA as often as possible. Still, Center Code's recommendations illustrate that focusing on different aspects, or even different measurements, will result in multiple rounds of ongoing qualitative testing.

Can other industries apply this method effectively? How can researchers design studies that allow flexibility in the research process or goals? What are some alternative methods within longitudinal qualitative research? Below I list a few examples where longitudinal qualitative studies did not incur high costs, required little test setup, and allowed flexibility in the study. Note that even though these are stories where the insights gained traction or success with commercial companies, by no means should one apply these methods directly. Instead, carefully surveying the situations where these longitudinal studies were applied gives clues as to whether they would be a good fit for your organization.

 
[Image: game QA testing]

Diary studies tracked over time are one way to implement qualitative research longitudinally. By having users write down their life events, thoughts, and actions, researchers can learn the changes or paths in users’ actions, motivations, and attitudes in natural settings. Lichtner et al. (2009) used online diary studies conducted over four weeks to improve the design of a “work-integrated learning system” for colleagues in four different locations across Europe. In this example, the diary study was the primary qualitative research, providing the bulk (47%) of the qualitative feedback while building a knowledge base for later research (Lichtner et al., 2009). The later research consisted of interviews, observations, focus groups, and exit questionnaires. This supplemental research can also provide more context on users’ thoughts, actions, and environment, as seen in Lichtner et al.’s study. As a result, Lichtner et al. (2009) were able to track current behaviors, usability, and perception of information over time. As noted above, researchers need to supplement diary research with other formats because diaries tend to lack reflection on the impact of specific problems in a broader context, such as how a particular usability problem changes the holistic system user experience over time (Lichtner et al., 2009). One solution to this shortcoming is to extend the research period, as Lichtner et al. (2009) indicated, allowing participants more time to think about the bigger picture and giving some room for introspection. In this example, qualitative research of a similar setup, diary studies, repeated over time increases the number of data points researchers can collect. Repeating these studies over a longer period can also help track behavioral changes as usability improvements take place. When current and past behaviors are tracked, the researcher can find clues to help predict users’ future behaviors.

Ethnographic longitudinal qualitative research is highly effective at yielding contextual data that often concerns an entire industry. Commercial companies do not usually use ethnography in its original form due to the complexity of the test setup, the high time investment, and the prevalent belief that the passage of time invalidates research results. One modern form of ethnographic study, netnography, may offer a solution to companies that otherwise wouldn’t consider ethnographic research. Netnography gives commercial organizations new possibilities to conduct contextual research and understand consumer trends on much improved timelines. Unlike traditional ethnographic research, netnography requires less physical and time investment. Because internet technologies can backlog, record, and share, netnography can cover more recorded time in the same amount of research time: the researcher can access thousands of pieces of content from the past and present with a few clicks. Kozinets (2015) remarked of his netnography research that he could cover the same amount of recorded time with less research time, thus shortening the total research time needed. For example, Kozinets (2002), in earlier research, revealed that culturalization and socialization over the internet heavily contributed to the consumption of “upscale coffee,” drawing on recorded quotes and participant-provided information reaching as far as two years back. The purposive sampling and content analysis done during Kozinets’ netnography allowed the validity of the results to be on par with any other ethnographic research. The paper highlighted that coffee, to enthusiasts, is not a product of the commodification of labor (Firat and Dholakia, 1998); rather, coffee drinking is a metaphor for life and religious experience (Kozinets, 2002).
This understanding is crucial for the entire coffee industry, which is not necessarily targeting niche consumers. As von Hippel (1988) has said, the future behavior of consumers can be deduced from the thoughts and beliefs of extreme consumers. From the results gathered among enthusiasts, we can infer that coffee companies should look to shift their branding and product offerings to provide consumers with cultural meaning and religious experiences in the near future. Kozinets’ netnography research is very important for companies that want to lead future markets or seek to future-proof themselves. Due to the forward-looking scope of the research design and results, time will not easily jeopardize the outcomes of Kozinets’ study; choosing extreme consumers helped extend the “expiration date” of his research. This goes to show that well-designed netnography studies can be relatively time-proof.

Both pieces of research above produce the bulk of their insights after the studies have ended. Diary research produces some in-progress data points but is limited to usability feedback such as problem-execution annotations (Lichtner et al., 2009). Is there any longitudinal qualitative research that produces results while the process is ongoing? In a commercial research environment, researchers are often asked to respect the business timeline and construct insights as needed. They are also encouraged to produce insights regularly to help the business progress, so what kind of study setup helps with such requests? To address these problems, the Interaction Design Foundation (2018) recommended scheduling regular qualitative user testing, which allows periodic feedback. Researchers can vary the objectives between periods by altering the participants’ tasks. This way, if researchers think the current setup will cause data saturation, they can change the goal for a specific period while maintaining the same setup at a set interval. For example, researchers can set the core tasks to repeat every quarter of the year; for the remaining eight tests of that year, everyone is free to set up other types of tasks. Researchers can thus get results similar to a typical longitudinal qualitative test, produce different insights over time, and avoid data saturation. One of the finest examples is Meetup using their “minimally viable process” to conduct qualitative testing every single week (Gothelf, 2013). The testing scripts are spontaneous: depending on how participants initially interact with the system, the questions are adjusted accordingly (Gothelf, 2013). Meetup eventually built an online community, which supplied most of the qualitative research participants (another instance of purposive and availability sampling), with designated personnel managing it (Gothelf, 2013).
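The quarterly-core scheduling idea above can be sketched in a few lines. This is a minimal illustration, not anything Meetup or the Interaction Design Foundation prescribes; the monthly cadence, the function name, and the objective labels are all assumptions for the example:

```python
from dataclasses import dataclass

@dataclass
class TestSession:
    month: int
    objective: str

def build_yearly_schedule(core_objective, rotating_objectives):
    """Twelve monthly sessions: core tasks repeat each quarter
    (months 1, 4, 7, 10); the remaining eight slots cycle
    through the other research objectives."""
    core_months = {1, 4, 7, 10}
    schedule, i = [], 0
    for month in range(1, 13):
        if month in core_months:
            schedule.append(TestSession(month, core_objective))
        else:
            schedule.append(TestSession(month, rotating_objectives[i % len(rotating_objectives)]))
            i += 1
    return schedule

# Hypothetical objectives for illustration
schedule = build_yearly_schedule(
    "core usability tasks",
    ["onboarding flow", "search", "notifications", "settings"],
)
for s in schedule:
    print(f"Month {s.month:2d}: {s.objective}")
```

The point of writing the schedule down explicitly is that the repeated core slots preserve the longitudinal trend, while the rotating slots absorb new research questions without touching that trend.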
This research design was highly cost-efficient with a high insight production speed: Meetup conducted over 600 sessions in a year with only 30,000 US dollars spent (Gothelf, 2013), roughly 50 dollars per session and a certain win on both speed and budget. If I were to improve Meetup’s already successful model, I would complement the qualitative research with periodic quantitative tests. Given the small number of participants in each of Meetup’s sessions, it is hard to confirm the hypotheses formed during the interviews, so researchers may struggle to see what the logical next steps are. A quantitative survey, with its larger participant count, can confirm or refute the hypotheses formed during the initial rounds of qualitative research. Confirming or refuting a hypothesis not only guides the direction of the study but can also reveal whether a potential solution works. All of these insights can be produced while the research is ongoing, making this kind of longitudinal test suitable for companies like Meetup that rely on user feedback to improve.

 
[Image: interviews]

Another advantage of longitudinal tests with systematic data collection, such as the one Meetup ran, is flexibility and adaptability. As mentioned above, researchers learn about their users and understand their problems better as more data is collected. As Burt (2019) noted in his article, “you don’t start with a hypothesis, you start with an open mind.” Researchers conducting longitudinal qualitative research may not have the “right ideas” initially: during periodically scheduled tests, they may find that the initial test setup, script, questions, or goals do not allow participants to convey the “root cause” of the problem adequately. Longitudinal tests with systematic data collection can adapt based on the knowledge collected so far. However, researchers should be aware that changes to test setups cause a break in data trends, meaning the data collected with the new setup may not be directly comparable to results from the old one. One way to reduce this problem is to run an A/B test that includes both the old and the new designs and see how the two groups of participants respond. From the answers gathered, researchers can judge whether the new design is genuinely worth breaking the data trends, and they should run this A/B test before moving on to the different setup.
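One way to read such an A/B comparison, assuming the two setups yield a binary measure such as task success, is a simple two-proportion z-test. The counts below are hypothetical, and qualitative sample sizes are often too small for this to be conclusive, so treat it as a rough signal rather than a verdict:

```python
import math

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference in task-success rates between
    participants run under the old (A) and new (B) test setups."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)  # pooled success rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: 18/30 succeeded under the old setup, 24/30 under the new
z, p = two_proportion_ztest(18, 30, 24, 30)
print(f"z = {z:.2f}, p = {p:.3f}")
```

If the difference is large and consistent, the richer qualitative answers from the new setup are probably worth the break in the data trend; if not, keeping the old setup preserves comparability at little cost.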

One last point I will briefly touch on here is the set of factors organizations should consider when attempting longitudinal qualitative research. Longitudinal qualitative research can cover different degrees of depth. Comparing the studies conducted by Meetup in Gothelf’s (2013) article and Kozinets’ netnography, we can see that the conclusions of Kozinets’ netnography are a cumulative result of the study, whereas Meetup’s testing allows both the accumulation of results and the drawing of conclusions from a single session. From a debrief standpoint, the difference between the two studies is the separation or accumulation of results between data collection sessions. If validity still holds, accumulating results can help researchers understand the problem in a fuller picture, while separating the results of sessions allows more timely feedback. That separation, however, fragments the data, which can lead to less in-depth insights, data omission, or missed priorities. Depending on the organization’s needs and the study’s goals, researchers should think through whether and how the systematically collected responses accumulate onto each other. All insights can accumulate to form a “body of evidence,” although researchers should exercise keen judgment about whether the insights are still valid when compiling that evidence.

The second point to consider when planning a longitudinal qualitative test is staffing and capacity. Tim Ballew (2013) of 8-bit Bear emphasized that if a UX consultant is limited in capacity, uses agile methods, or faces short turnaround times, the consultant is likely to manage only small portions of the qualitative test or to conduct less in-depth qualitative tests. This reduced capacity also means the studies might “remove the users from the testings by using heuristics” (Ballew, 2013). In this case, research may be limited to producing only usability results because of organizational capacity and staffing constraints. Chris Sader (2013) of PRO further elaborated on this point: his company hired additional UX personnel, which allowed depth and specialization, and this increase in staffing and capacity enabled more strategic research (Sader, 2013).

The third point is expectations and team alignment. Much has been published on this, so I will only briefly touch on a few points. Setting realistic expectations about the goal, process, and results with both stakeholders and participants is equally important. Sova and Nielsen (2003, 52), in Nielsen Norman Group’s recruiting guide, specified that participants should know what to expect when being interviewed: the location, how many people will be there, whether they will be videotaped, and other necessary alignment information (Sova & Nielsen, 2003). They recommended that all of the above details be included in the recruiting document rather than provided after recruiting. On the stakeholder front, Nielsen Norman Group’s instructions on conducting usability tests list getting team alignment from the beginning as an essential step to reduce resistance to adopting insights (Farrel, 2017). One way to go about this is to align stakeholders with your research plan or proposal at different stages of the project, with varying objectives of alignment in mind. Margie Matero Villanueva (2018) of Indeed and, formerly, IBM said she roughly divides her research into three stages: foundational research, discovery research, and evaluative research, with each stage bringing a realignment with stakeholders to allow timely feedback and flexibility. As shown in Lichtner et al.’s diary studies, Kozinets’ netnographies, and Meetup’s research, longitudinal qualitative research can occur at any stage of Villanueva’s model. Thus, it is even more critical that UX teams and stakeholders align periodically to ensure the productivity of the chosen longitudinal qualitative tests.

So, as you can see, it’s not just indie gaming that can use longitudinal qualitative research. Lichtner et al.’s tests show that forms of longitudinal qualitative testing let researchers learn the changes or paths in users’ actions, motivations, and attitudes in natural settings. Kozinets’ studies show that longitudinal qualitative research doesn’t necessarily require a complicated setup or lab environment; a computer will do. Meetup’s example shows that longitudinal qualitative research doesn’t have to be costly or slow: systematic data collection and individual goals for each periodic study enable timely insight production. The key is to set up the research in ways that allow both flexibility and adaptability. With adequate expectations and goals, longitudinal tests can become your secret insight-generation machine. Next time you need to dig into your consumers’ minds and hearts, give a well-thought-out longitudinal qualitative study a try.

 

 

References:

  • Moser, A., & Korstjens, I. (2018). Series: Practical guidance to qualitative research. Part 3: Sampling, data collection and analysis. European Journal of General Practice, 24(1), 9–18. https://doi.org/10.1080/13814788.2017.1375091

  • Burt, AJ. (2019). The importance of ethnographic research in product design. UX Collective. https://uxdesign.cc/the-importance-of-ethnographic-research-in-product-design-cb08319051c0.

  • Price, P. C., Jhangiani, R. S., & Chiang, I.-C. A. (2014). Chapter 7: Nonexperimental Research. Research Methods in Psychology. BCcampus.

  • Farrel, S. (2017). From Research Goals to Usability-Testing Scenarios: A 7-Step Method. Nielsen Norman Group. https://www.nngroup.com/articles/ux-research-goals-to-scenarios/

  • Firat, A. F., & Dholakia, N. (1998). Consuming People: From Political Economy to Theaters of Consumption. London: Routledge.

  • Freiler, L. (2019). Alpha vs. Beta Testing. Center Code. https://www.centercode.com/blog/2011/01/alpha-vs-beta-testing

  • Gothelf, J. (2013). Lean UX: Applying Lean Principles to Improve User Experience. O'Reilly Media, Inc. 73–80.

  • Interaction Design Foundation (2018). 5 Ideas to Help Bring Lean #UX into Your Research. https://www.interaction-design.org/literature/article/5-ideas-to-help-bring-lean-ux-into-your-research

  • Kozinets, R. V. (2002). The Field behind the Screen: Using Netnography for Marketing Research in Online Communities. Journal of Marketing Research, 39(1), 61–72. https://doi.org/10.1509/jmkr.39.1.61.18935

  • Kozinets, R. V. (2015). Netnography. The international encyclopedia of digital communication and society, 1-8.

  • Lichtner, V., Kounkou, A. P., Dotan, A., Kooken, J. P., & Maiden, N. A. (2009). An online forum as a user diary for remote workplace evaluation of a work-integrated learning system. In CHI'09 Extended Abstracts on Human Factors in Computing Systems (pp. 2955-2970).

  • May, B. (2012) Applying Lean Startup: An Experience Report -- Lean & Lean UX by a UX Veteran: Lessons Learned in Creating & Launching a Complex Consumer App. 2012 Agile Conference, Dallas, TX, 141-147, doi: 10.1109/Agile.2012.18.

  • Meingast, M., Ballew, T., Edwards, R., Nordquist, E., Sader, C., & Smith, D. (2013). Agile and UX: The Road to Integration The Challenges of the UX Practitioner in an Agile Environment. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 57(1), 1002–1006. https://doi.org/10.1177/1541931213571224

  • Sova, D. H., & Nielsen, J. (2003). 234 Tips and Tricks for Recruiting Users as Participants in Usability Studies. Nielsen Norman Group. https://media.nngroup.com/media/reports/free/How_To_Recruit_Participants_for_Usability_Studies.pdf. 52.

  • ThatGuyGlen. (2020). How Stardew Valley Was Made by Only One Person. YouTube. https://www.youtube.com/watch?v=4-k6j9g5Hzk&t=553s

  • Villanueva, M.M. (2018). How to write a research plan that facilitates team alignment. https://uxdesign.cc/how-to-write-a-research-plan-that-facilitates-team-alignment-2f75396884ca

  • von Hippel, E. (1988). The Sources of Innovation. New York: Oxford University Press.

  • Yang, G. (2020). What to do when your video game leaks. Gamesindustry.biz. https://www.gamesindustry.biz/articles/2020-02-25-what-to-do-when-your-video-game-leaks