Episode 109 | March 18, 2024

How UX Research for ML is different from UX Research for design

There are many examples of how AI and machine learning can be used to inform UX research and design, but how will UX research inform how AI and ML models are created and deployed to the public?

In this Insights Unlocked episode, UserTesting’s Lawrence Williams talks with Dawn Procopio, founder and principal UX researcher at AI-Ethicist.com. They explore the critical role of UX researchers in ensuring that machine learning models are human-centered and ethically sound. 

Dawn is a veteran researcher and has worked with Siemens, Meta, and Amazon Web Services, among other companies.

Ecological validity in machine learning

As a specialist at the intersection of UX research and machine learning, Dawn emphasizes the importance of ecological validity: ensuring that a model’s conclusions actually apply to the real-world problem it is intended to solve.

“And they’re not just really fanciful displays of human intellect with regard to the performance of the model,” she said. 

She said the other way UX researchers can improve a model’s performance is by collaborating with machine learning professionals throughout the process. 

“And, to me, these are almost the same issue because they do converge at some point,” Dawn said. “But there are five main ways I think UX researchers need to collaborate with machine learning professionals.”

The PRIDE touchpoints for ensuring human-centered AI 

Dawn uses the acronym PRIDE as a framework for those five areas of collaboration:

  • Problem
  • Representation
  • Interpretability
  • Data leakage
  • Evaluation metrics

“Those five touchpoints in a conversation, or over many conversations, between a UX researcher and a machine learning professional can increase the ecological validity of the model and increase the performance of the model,” she said. “I just think that not a lot of researchers believe they can help these really talented engineers do their job better, but also that it's really almost unethical if they don't.”

The Problem touchpoint for UX researchers and ML engineers

With regard to the problem, “UX researchers are always looking to make sure that the problem is centered on the user,” Dawn said. “And this is a little bit different [with machine learning models]. When a machine learning scientist talks about a problem type, they are actually talking about the solution.”

Researchers, she said, should flip that around. When a machine learning professional says they have a problem type, they are probably talking about one of three things: classification, regression, or clustering. “That’s how the data should appear to the humans, not necessarily how the human thinks about the problem,” she said.

An example would be a movie recommendation algorithm. An ML engineer may design the model to deliver recommendations based on how one movie relates to another (clustering), but the user really wants recommendations based on their thumbs-up or star ratings (regression). Fixing that mismatch would require changing the entire model and could become very expensive, Dawn said.

Better understanding what the user wants, needs, or expects from a model helps inform how the model is built (sound familiar?).
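To make that concrete, here is a minimal sketch, not from the episode, of how the same data can be framed as two different problem types. The movie features, ratings, and scikit-learn models are all illustrative assumptions.

```python
# Illustrative sketch: the same movie data framed as two "problem types."
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Hypothetical per-movie features (e.g., two genre scores) and one user's star ratings.
movie_features = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
user_ratings = np.array([5.0, 4.5, 2.0, 1.5])

# Framing 1 (clustering): recommend movies related to other movies.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(movie_features)

# Framing 2 (regression): predict this user's rating and recommend the top scorers.
ratings_model = LinearRegression().fit(movie_features, user_ratings)
predicted = ratings_model.predict(movie_features)

# Same data, different problem type; switching later means rebuilding the model.
print(clusters, predicted.round(1))
```

If research shows users reason in star ratings rather than “movies like this one,” that finding points to the regression framing before the model is built, not after.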

The Representation touchpoint for creating better ML models

Dawn said the representation touchpoint, for a machine learning professional, is about feature selection, feature modeling, and feature engineering.

“And so what the UX researcher should hear is variable creation and what variables matter and what inputs matter,” Dawn said.

Challenges include avoiding the curse of dimensionality, where too many features (variables) make the model overly complex. So, Dawn said, it is important for UX researchers to give ML engineers input on which features matter.

“UX researchers often have these libraries where they can tag things,” Dawn said. “I love UserTesting for this, where they can make highlight reels, and those highlight reels have the hashtags that sync between the highlight reels. And then you can just look up a hashtag and see all the videos for that.”

“Over time, researchers can offer this data and say, ‘this hashtag/variable is a huge feature that I think has to be in your model, or you need to engineer this variable,’” she said. “And the way they’ll do that is they’ll take a lot of variables and mathematically make that variable for you with a larger sample size.”
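As a sketch of what that engineering step can look like, here is a minimal hypothetical example; the session data, tag column names, and the composite “navigation_friction” variable are all assumptions for illustration, not from the episode.

```python
# Illustrative sketch: rolling many research-tagged variables into one engineered feature.
import pandas as pd

# Hypothetical data: each row is a test session; columns count researcher hashtags.
sessions = pd.DataFrame({
    "tag_confused_by_nav": [3, 0, 1, 4],
    "tag_backtracked":     [2, 0, 0, 3],
    "tag_asked_for_help":  [1, 0, 0, 2],
})

# Combine the correlated tags into a single "navigation_friction" variable,
# rather than handing the model every tag and inviting the curse of dimensionality.
sessions["navigation_friction"] = sessions.filter(like="tag_").sum(axis=1)

print(sessions)
```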

Dawn said there are also some sticky dilemmas that you get into with representation.

For instance, with our movie recommendation model, a feature could be age: if the model thinks you’re five years old, you will get a large number of Disney recommendations, but you may not see those if the model doesn’t think you’re five.

“And that's where UX researchers really need to be at the table and say, ‘you know, my persona is really sensitive about this issue,’” Dawn said. 

The Interpretability touchpoint for creating better ML models

The trade-off for a machine learning scientist is interpretability versus model complexity.

“They can make a really accurate model that's super great and magical, but then it becomes completely uninterpretable,” she said. And in certain scenarios, she said, that can cause user errors when a user can't understand what's being fed back to them.

The example she used is being at a dinner party and having someone you’ve never met come up to you and start sharing information about you and making pushy recommendations. It would be an awkward experience.

“And I don't think we want to start giving that up, where we just start blindly following whatever chatbots tell us,” Dawn said. “We need to have some rationale. They should be able to cite reliable sources. It is unbelievable that we have such a low bar for the machine learning process to back up what they think is an excellent performance.”
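A minimal sketch of that trade-off, on a synthetic dataset with assumed scikit-learn models, might look like this; the numbers and model choices are illustrative, not from the episode.

```python
# Illustrative sketch: an interpretable model vs. a more complex one.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Interpretable: a depth-2 tree whose rules can be shown to a user verbatim.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
print(export_text(tree))                  # a human-readable rationale
print("tree accuracy:", tree.score(X_test, y_test))

# Often more accurate, far less interpretable: a 300-tree boosted ensemble.
boost = GradientBoostingClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
print("ensemble accuracy:", boost.score(X_test, y_test))  # no single rule to cite
```

The ensemble may score higher, but only the tree can hand the user the kind of rationale Dawn is asking for.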

The Data leakage touchpoint for creating better ML models

Data leakage is not about privacy, Dawn said.

“I know that sounds like privacy when people hear about it colloquially, but actually it's about when there are variables that shouldn't be in the training data. They're actually more about future information and they can sneak into the training data unbeknownst to the ML engineer because they don't know very much about the features.”

An example of this, Dawn said, is using patient IDs in an ML model. A UX researcher needs to know how that patient ID was created. If it was created when the patient saw a specialist, say an oncologist, and that ID looks different from the others, then performance on the training data will look overly optimistic, because this silent information, a latent variable, is built into the patient ID.

“Avoiding that is really about knowing how the variables got created in the first place, knowing how every human interacted with that,” Dawn said. 
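Here is a minimal synthetic sketch of the patient-ID leak Dawn describes; the ID ranges, the tree model, and all the data are invented for illustration.

```python
# Illustrative sketch: a patient ID that silently encodes the outcome.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)                                # 1 = the outcome to predict

symptoms = rng.normal(size=(n, 3)) + 0.4 * y[:, None]    # legitimate, weak signal
# The leak: IDs issued during specialist visits come from a different number range.
patient_id = np.where(y == 1,
                      rng.integers(90000, 99999, n),
                      rng.integers(10000, 19999, n))

X_leaky = np.column_stack([symptoms, patient_id])
model = DecisionTreeClassifier(max_depth=3, random_state=0)

print("with patient ID:", cross_val_score(model, X_leaky, y).mean())   # near-perfect
print("symptoms only:  ", cross_val_score(model, symptoms, y).mean())  # honest estimate
```

The near-perfect score with the ID included is exactly the “overly optimistic” training picture she warns about.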

The Evaluation Metrics touchpoint for creating better ML models

Evaluation metrics are really important for UX researchers because what you want to know is whether you need to avoid false positives or avoid false negatives, Dawn said. 

For example, if you are detecting cancer, a false positive, being told you have cancer when you don’t, is not as bad as not being told you have cancer when you do. In that scenario, you want to avoid false negatives and err toward false positives.

But that’s different if you’re testing an interface for police officers to detect crime in an area. In that scenario, you want to avoid false positives and err toward false negatives.

There are obvious cases, such as these two scenarios, but what happens in the middle, where you’re not sure whether false positives or false negatives matter more?

“That's called a balanced data problem,” Dawn said. “And there are evaluation metrics for that. That's why it's important for you to collaborate. Your ML professional should know all this because they get trained on it. But they don't know whether the users want you to avoid false negatives or false positives, except in extreme cases such as medical diagnoses and crime detection.”
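One common way to encode that choice in a metric is an F-beta score, where beta above 1 weights recall (avoiding false negatives) and beta below 1 weights precision (avoiding false positives). The labels below are made up for illustration.

```python
# Illustrative sketch: choosing a metric that matches whose error matters more.
from sklearn.metrics import confusion_matrix, fbeta_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]   # one false negative, one false positive

print(confusion_matrix(y_true, y_pred))
# beta > 1: recall-heavy, the cancer-screening case (avoid false negatives).
print("F2:  ", fbeta_score(y_true, y_pred, beta=2.0))
# beta < 1: precision-heavy, the crime-detection case (avoid false positives).
print("F0.5:", fbeta_score(y_true, y_pred, beta=0.5))
```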

Episode links:

Stream On
