Human Insight in the AI-Driven Product Development Era

Executive Summary

“What is AI doing to the product development life cycle?”

One of the most intense discussions in the tech industry today is about the way AI changes product development: the opportunities it creates, the challenges that need to be overcome, and what that means for people in product-related jobs. Most of the discussion focuses on the mechanics of product development, but neglects what it means for users and customer experience. This report attempts to fill that gap, and explains how human insight can improve and accelerate the AI-driven product development process. We’ll summarize the issues, separate facts from opinions, and give practical advice on what to do next.

Here’s what you most need to know:

AI is making software development far faster and cheaper. Engineering time is less and less the severe bottleneck it used to be. (You already know this, but it’s important to emphasize because it drives everything else.)
Because we’re moving so fast and empowering more people to build, it’s very important to make human insight available on demand to everyone. Needs discovery must be continuous, and fast feedback needs to be built into tools and processes.
As product development accelerates, the rest of the product process also needs to speed up, including discovery, launch, marketing, and sales. This isn’t getting as much attention as the changes to engineering, but it’s just as important.
AI doesn’t just change your development process, it also changes the way your customers interact with technology. We’re used to designing software for usability; now we need to also design for emotional reactions and the human-AI relationship. This is a profound change that most companies haven’t fully absorbed yet.
The way we test AI software is focused on the correctness of answers, but that doesn’t evaluate the overall customer experience. We need to add human insight to the standard AI testing process.
Many companies are focusing on how product-related job roles will change in the new model, but it’s actually more important to identify and nurture the key skills that lead to successful products in an AI world. Getting the org chart right won’t matter if you don’t have the right skills in house.

One other point: there isn’t one discussion about the future of product development; there are actually three

Those discussions often get mixed together, which can be very confusing because it creates apparent disagreement when people are actually talking about different issues. The three hot topics are:

How does AI change the product development process for traditional software? This discussion focuses on AI as a productivity-enhancer and process-changer in the development of today’s software products. This issue is especially important to legacy companies that have established products and want to make their development more efficient.
What additional changes need to be made to the development process to support AI features and products? How do you train software instead of coding it? How do you plan and test a conversational interface? And how do you build agents that are trustworthy and predictable? This changes the ways we test software and the things we test for.
How do the roles of people in the product process change as a result of those two issues? In particular, what happens to product managers, engineers, designers, and user researchers? This discussion is critical for managers in the product world who are trying to evolve their teams.

We’ll cover each of these discussions separately in the next three sections.

Part 1: How AI is changing the product development process

As engineering and design accelerate, the whole PDLC needs to adjust. Discovery needs to become continuous and proactive. Evaluative tests need to be embedded in development tools and processes. And functions like support, marketing, and sales need to operate at the new speed of development. Victory won’t go to the fastest engineering process; it will go to the companies that can move everything product-related at the speed of AI, while maintaining a great customer experience.

The traditional product development process: Conserve engineering bandwidth

The traditional pre-AI product development life cycle (PDLC) is shown in figure 1.

Figure 1: The traditional product development lifecycle

The most important assumptions driving the traditional PDLC are that software engineering is the longest part of the process, and software engineers are the scarcest resource. The PDLC is optimized to allocate engineering bandwidth efficiently:

There are far more feature ideas than we can possibly build, so we put a lot of effort into vetting and triaging features before engineers work on them.
We design user interfaces thoroughly before they’re handed to the engineers, so they won’t have to spend time on rework later.
To further protect against rework, specialized employees with deep training drive each phase of the process.

The details vary from company to company, and the handoffs are never as clean as theory says, but the basic principle is always that engineering time is precious and we have to conserve it.

The new Builder model: Engineering is cheap and fast

AI-assisted software development blows up that assumption. Here’s what it changes:

Engineering time is no longer the main bottleneck in the system. AI does most of the development work, under human direction. This makes software development dramatically faster and less expensive.
The lines between interface design and coding are erased. Because AI can also do much of the execution work on interface design, it happens at the same time as development.
Small changes are easy. Development can be done in small increments. Individual features can easily be added or tweaked.
Coding is democratized. Anyone in the product organization can use AI to produce working software; they do not have to be engineers (note: code from non-engineers that will touch customers will usually be reviewed before it goes live, so most observers do not advocate a complete free-for-all).

Most of the current discussion about the new PDLC focuses on what that means for the design and build phases of the process (figure 2). We call it the Builder model because it says that anyone can build products.

The challenge: How do you structure and manage the Builder model?

There’s general agreement about those four principles of the Builder model, but intense discussion about the details underneath: how the model should be implemented, what its weak spots are, and how it affects the rest of the company.

Three of the most important discussions are:

How does the new model affect needs discovery?
How do evaluative tests on things like prototypes work in the new build phase?
What does all this do to the later stages of the PDLC?

Below we’ll give a summary of those issues, and some thoughts on what you can do about them.

1. The Builder model makes discovery more important

The issue. The discussion of the Builder model has been so focused on the new details of designing and creating software that the discovery phase got relatively little attention. Among the people who are discussing it, a few say discovery is now less important. For example, startup incubator Y Combinator has argued that discovery is less relevant for startups than in the past, because AI development is so fast and the opportunities are so numerous that founders' technical intuition and rapid iteration can often unearth good business opportunities without discovery work.

But there are people who disagree passionately, and the idea also applies mostly to startups. For existing companies and products, the emerging consensus is that the Builder model makes good discovery more important than ever, so it can inform better product decisions. Even though AI coding might make it possible to build every requested feature in the backlog, actually doing so would lead to bloated and confusing products -- and besides, many customers have latent needs that they don’t know to ask for. Steve Jobs’s famous quote is often cited: “People don't know what they want until you show it to them.”

What people are saying:

The value of discovery research: "We may be awash in automated or third-party conversations but still miss those aha! moments that drive great product thinking. Accelerating the software development cycle will make those insights even more valuable – so that we're using our 15x throughput improvements for what matters. Increasingly, we'll earn our salaries through real insights and anticipating the future, rather than overseeing R&D pipelines." - Product management consultant Rich Mironov.
When you can build anything, the most important question is what to build. “Remember, the question has never been whether we can build a product. The question is determining the right thing to build, that will solve the problem and achieve the necessary outcome.” - Marty Cagan, founder of Silicon Valley Product Group; and design executive Bob Baxley.

What to do: Make discovery a lifestyle. We think there’s a very strong argument that companies using the Builder model need to double down on discovery, so that individual builders have a better understanding of customer needs, and more insightful ideas on what to build. Even though development is faster, there’s still a cost associated with releasing a feature, educating customers about it, and adding complexity to the product. The more features we add, the more those costs will rise.

What changes for discovery is that it’s no longer necessarily tied to a project. For example, when Teresa Torres talks about continuous discovery, she says the core team for the project needs to continuously interview customers. But in the Builder model, the team is much smaller, or may be a single person. They may be working on one small feature rather than the next version of something big, and they don’t necessarily have time to do interviews. So the type of discovery we do needs to evolve. Rather than waiting for requests from product teams, researchers need to proactively do discovery research continuously, and create general understanding of customer needs throughout the product organization, so every builder has the context to make better decisions. Discovery is no longer a phase in a project, it’s a lifestyle for the whole company.

Broader, more effective communication is also critical. Because insights will now be delivered much more broadly, researchers need to pay more attention to making sure the information lands with decision-makers. This was a dominant topic at the recent Research Week conference, an annual gathering of user research leaders. Here are some of the approaches companies are using:

Create multiple insight feeds. The traditional 50-slide PowerPoint deck with research findings is ineffective in the Builder model. Don’t communicate research; communicate insights, keep them short, and share them proactively in multiple ways. Some people watch TikTok-style short videos, some people are readers, some people won’t engage with an idea until they see a prototype. Communicate insights in the formats they prefer. (Brian Elliott, CEO of WorkForward)
Use LLMs to consolidate customer insight and make it available on demand.
- Customer-specific insights. Build an LLM that draws on all your information feeds about a particular customer, from Gong calls to Salesforce data. Teach the bot to identify trends and surface insights (Caitlin McCurrie, director of research at Intercom).
- LLMs for each persona. UserTesting has created “guru” bots to educate employees about different customer personas. The bots are trained with customer interviews, research studies, and positioning information, and are taught to act as coaches and advisors on different customer personas.
Communicate business implications, not just insights. Researchers often assume that decision-makers will understand the “obvious” implications of a piece of research. But what’s obvious to a researcher is not necessarily obvious to anyone else. To be effective, insights have to be explicitly translated into business implications and recommended actions. (Rachel Ousley, B2B research and insights lead at Canva).
- Find the nugget that will drive action. User researchers are better positioned than any other role to drive impact in the Builder world, because they are the people with a deep understanding of customers. Double down on impact, not quantity of research. (Nizar Saqqar, head of UR at Snowflake)
- “Do not bore the CEO with the details of methodology.” Executives are short on time, so communicate with them in a way that they can get their heads around. (Lucas Puente, VP of Research at Slack)
- Instead of making reports, share interactive playgrounds, a living dashboard summarizing quant and qual insights. The output could be a vibe coded prototype to show what a solution might look like. That lets you be very concrete about your implications. (Caitlin McCurrie, Director of Research at Intercom).

2. How do we verify that we’re making the right decisions in product development?

The issue. Great discovery can’t answer every question about a product or feature ahead of time; you also need evaluative tests to ensure that you’re building the right thing. In traditional development, prototype tests with real users ensured that the design was right before coding started. In the Builder model, we can build many more prototypes faster than we could in the past, and they are not limited to design. The consensus in the AI community is that this increases the demand for evaluative testing, and it also changes the goals of testing and the role it plays in the workflow.

“Generative AI allows product teams to create offering prototypes in hours, not days, so that iterative testing can be done more quickly. That means more opportunities for customer feedback, faster time to market, and improved product-market fit.”
Sam Somashekar, Forrester Research, in Generative AI: What It Means For Product Management

Here are some key attributes of prototype testing in the Builder model:

The uses of prototype tests expand beyond design validation. Because it’s now embedded with development, the purpose of a prototype test expands to include spec validation, stakeholder alignment, and settling product-direction debates.
Continuous rather than sporadic. Testing can happen weekly or daily, not only at handoff milestones, because prototypes are cheap to regenerate.
Higher-fidelity. AI-generated prototypes can use real data, real components, and increasingly real production code, blurring the line between prototype and product.
Embedded in workflows. Because everything is moving fast, evaluation tests need to be extremely convenient. Ideally, the ability to test should be embedded in the tools used for design and development; and because non-researchers may be driving development, there need to be templates and AI-driven assistance to create and interpret tests.
Supervised, rather than controlled, by researchers. The increased volume of testing is more than any research team could handle on its own (especially since the researchers need to focus on doing expanded discovery work). For evaluative testing, the research team needs to focus on enabling others to test on their own, and creating guardrails and review steps to protect the quality of scaled tests.

3. What happens to the back end of the lifecycle?

Although the Builder model focuses specifically on the process for building products, it also has a big impact on the rest of the product lifecycle, including launch, marketing, sales, and post-launch iteration. The accelerated speed of production means that everything after it needs to speed up as well (figure 3).

Figure 3: Every stage of the PDLC is affected by AI

Anthropic is already feeling this:

“The timelines for a lot of our product features have gone down from six months to one month and sometimes to one week or even one day. And with that, we actually need to make sure that products ship quite quickly….We have a really tight process between engineering, marketing, and docs. So (we) can turn around the marketing announcement for it the very next day.”
Cat Wu, head of product for Claude Code at Anthropic

Companies deploying the Builder model need to accelerate the follow-on functions as well, or the effect will be like installing a jet engine in a biplane.

Here are some of the areas that need to move at the speed of AI development:

Messaging for new features needs to be created and communicated to the rest of the company, including FAQs.
Documentation needs to be updated.
Support needs to be informed.
Sales enablement needs to be briefed on the changes and empowered to communicate to the sales team very rapidly.
Competitive analysis: As launch speed increases across the industry, competitors will change more rapidly and new competitors will emerge more quickly.
Pricing. If the company wants to give away the new feature, there’s no impact. But that has to be thought through in advance; if a feature is released for free and the company wants to charge for it later, that is at minimum a very awkward conversation with customers. There are workarounds, such as labeling all new features as free trial versions. But even that can lead to frustrated customers who ignored the caveat.

As was the case with product development, increasing the speed of these activities will increase the need for real-time customer feedback. Things like messaging and pricing need to be vetted to ensure that they’ll resonate with customers and won’t cause a backlash. It’s also very helpful to test support content. So the scaling of human insight needs to extend beyond the product team, and also support self-serve testing by the message creators in places like marketing, sales, and support.

Can customers keep up with all this change?

The final issue companies need to consider is whether their customers can absorb all of this accelerated change. This is especially true in B2B markets. Product executive consultant Rich Mironov summarized the situation:

"We will probably run into a pace-of-change issue with our production (paying) customers. The ability to build hundreds of new features and improvements each month will naturally tempt us to ship hundreds of new features and improvements each month. But our end users need some stability in the software they use – swapping out menus and shifting data inputs and reconfiguring agents can confuse or frustrate them. There's a limit to how fast we can evolve our fundamental Jobs To Be Done….So product (and design and engineering) folks will need to set some reasonable tempo for enhancements. How fast can our installed base absorb change, even if we can move much faster internally?"

This creates a potential dilemma for companies. If they slow the release of new features, a competitor may pull ahead. If they move as fast as they can, they may drive customers away. The winning answer may be to find ways to help customers absorb change quickly. That means making the user experience intuitive, and the benefits of a new feature self-explanatory. If done right, this could become a major competitive advantage.

This dynamic will put more pressure on the company to test with customers during development – for example, to vet interface changes to ensure that they are intuitive, and test documentation to be sure it’s clearly understood.

Part 2. AI products must be tested differently

Traditional software is a tool that we optimize for usability. Generative AI software is an advisor and butler that we optimize for relationship. That requires a different set of user tests that focus on factors like emotional affinity and trust. These relationship tests need to be built into the evaluations that development teams use to tune their AI models.

AI changes our relationship with technology

AI isn’t only changing the way we develop software, it’s also fundamentally changing the way we relate to technology: How we interact with it, and what we expect it to do.

Traditional software is built around two assumptions: the computer or smartphone is a tool, and you control it through icons and menus. Both of those assumptions are being undercut by AI, and that has a profound effect on how AI software needs to be crafted and tested.

Conversational AI changes the way we communicate with computers

Instead of clicking and swiping we now can tell the computer what to do, and it responds. Conversation won’t completely replace today’s graphical interface – for example, if you want to crop a picture, tapping and dragging will always be more convenient than saying something like “crop out the left 20% and put the face about a third of the way up.” But a conversational interface makes it much easier to do a whole class of tasks that were time-consuming to do manually in the past, such as “remove the car in the background and change my shirt from red to green.”

Conversation also lets us focus more on expressing intent rather than manipulating tools. If you want to buy a new dishwasher, you can ask a bot to help you explore the options, and it’ll line up all the information for you. Previously you would have needed to start a web search, read through dozens of reviews and vendor websites, and build information step by step in your head or on paper.

Agentic AI transforms the computer from a tool to a servant

As AI gains the ability to autonomously do tasks for you online, you’ll be able to delegate many things that you previously did for yourself. This transforms the computer from a tool into a servant – like a high-end butler who understands you, anticipates your needs, and makes things happen for you effortlessly.

When you add together the shift from clicks to conversation, and from tool to butler, they drive a massive change in the way we design and evaluate software. In the old world we designed for usability; in the AI software world we also design for relationship.

How to test the human-AI relationship

Five factors drive user reactions to an AI relationship:

Understanding. Do users feel they can “read” the AI’s thinking, intent, and boundaries, just like they would with a knowledgeable coworker or partner?
Trust. Do users rely on the AI when it deserves it, and step in when it doesn’t?
Control. Do users perceive the AI as a collaborative partner that respects human authority, never “taking over” or making the user feel helpless? (This one is especially important for agents.)
Outcome. Does the partnership deliver better outcomes than the human could achieve alone? Do those outcomes feel relevant and aligned with the user’s intent?
Affinity. Do people like the AI? Humans judge an AI conversation the same way they judge a human conversation: they pick up on small nuances in wording and tone, and form sweeping emotional judgments. If people don’t like the AI’s personality, they won’t be back, the same as they would avoid an annoying human being.

As an industry, we’re just now learning how to design for these factors. It’s also clear that we’ll need to be able to test and measure them. This is a very active area of development for UserTesting.

Bringing a human voice to the AI evaluation process

AI developers have created an extensive set of best practices for “evals,” tests that are run on AI products during development to verify that they’re performing properly (if you’re not familiar with evals, think of them as in-process quality tests). AI software requires different tests than traditional software:

Because AI development is very fast, eval tests need to be equally rapid and scalable. A conventional eval (as documented extensively by Hamel Husain) uses LLM technology to test other LLMs. Here’s a very simplified outline of the process:
- The company’s most skilled “domain expert” reviews sample interactions with the bot that’s in development, and grades them pass/fail.
- That grading is used to train a separate judge LLM whose purpose is to stand in for the domain expert in future testing.
- Once the judge LLM can reasonably reproduce the domain expert’s responses, the judge LLM can be used to very rapidly test future versions of the bot.
- Because everything is now automated, the AI eval process can run as fast as AI development itself.
- The domain expert training process is periodically repeated.
- Husain quotes Bryan Bischof, Head of AI Engineering at Hex, on this process: “If this sounds a bit like the Large Language Snake is eating its tail, I was just as surprised as you! All I can say is: it works, ship it.”
Testing AI is also different because generative AI is not deterministic. If you give identical prompts to an AI several times, it won’t necessarily produce the same results each time. So you’re not looking for a bug that you can definitely fix, you’re identifying tendencies and percentages of correctness. AI interactions need to be graded on their quality, and improved until they reach a threshold where the company is willing to release the product. That grading needs to be repeated as the underlying model changes.
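The two ideas above (checking that a judge LLM reproduces the domain expert's grades, and grading non-deterministic output as a rate rather than a single pass/fail) can be illustrated with a minimal Python sketch. The function names, the sample verdicts, and the 0.95 release threshold are all hypothetical, chosen only for illustration; they are not taken from Husain's write-up or from any particular eval tool. The judge LLM itself is stubbed out as a plain list of verdicts.

```python
from typing import Sequence


def agreement_rate(expert: Sequence[bool], judge: Sequence[bool]) -> float:
    """Share of sample interactions where the judge LLM's pass/fail verdict
    matches the domain expert's verdict on the same interaction."""
    if not expert or len(expert) != len(judge):
        raise ValueError("need two non-empty verdict lists of equal length")
    return sum(e == j for e, j in zip(expert, judge)) / len(expert)


def pass_rate(verdicts: Sequence[bool]) -> float:
    """Share of runs graded as passing. Because output is not deterministic,
    the same prompt is run several times and each run is graded."""
    if not verdicts:
        raise ValueError("need at least one graded run")
    return sum(verdicts) / len(verdicts)


def meets_release_bar(verdicts: Sequence[bool], threshold: float = 0.95) -> bool:
    """True when the pass rate reaches the (hypothetical) release threshold."""
    return pass_rate(verdicts) >= threshold


# Hypothetical verdicts on ten sample interactions (True = pass).
expert_verdicts = [True, True, False, True, False, True, True, False, True, True]
judge_verdicts = [True, True, False, True, True, True, True, False, True, True]
rate = agreement_rate(expert_verdicts, judge_verdicts)
print(f"judge agrees with expert on {rate:.0%} of samples")

# Hypothetical: one prompt run 20 times, 18 runs graded as passing.
repeated_runs = [True] * 18 + [False] * 2
print(f"pass rate {pass_rate(repeated_runs):.0%}, "
      f"release-ready: {meets_release_bar(repeated_runs)}")
```

In practice the agreement check would be rerun whenever the domain expert regrades a fresh sample, and the pass rate would be recomputed each time the underlying model changes.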

Two approaches for adding human insight to evals

The eval process has been mostly focused on grading the accuracy and appropriateness of AI answers, and doesn’t fully reflect the emotional and trust factors that drive a user-AI relationship. To incorporate those factors, we need to find ways to fit genuine user interactions into the eval process. There are two promising approaches in development:

User Evals: user tests in the eval process

The first approach is User Evals, which use conventional user tests as a separate layer in the eval process. This approach is being explored in several places, most prominently by AI teams at Microsoft and Meta. Their approach, which they sometimes label UXR Evals, was presented at the Research Week conference, and in a series of posts by Pooja Dhaka, a UX researcher at Microsoft.

User Evals evaluate AI through user tests in first-person, multi-turn interactions, conducted with enough scale to produce reliable comparisons between model versions. Microsoft and Meta use AI-moderated interviews for this testing, but any user test, including unmoderated self-interviews, could work.
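To show why scale matters for comparing model versions, here is a small sketch using a standard two-proportion z-test. It assumes each participant gives a simple yes/no rating (for example, "I would use this assistant again"). That rating scheme, the function name, and the numbers are hypothetical; the source does not describe how Microsoft or Meta analyze their results.

```python
import math
from typing import Tuple


def compare_versions(liked_a: int, n_a: int, liked_b: int, n_b: int) -> Tuple[float, float]:
    """Two-proportion z-test on yes/no ratings for model versions A and B.

    Returns (z, two-sided p-value). A positive z means version B was rated
    favorably by a larger share of participants than version A.
    """
    p_a = liked_a / n_a
    p_b = liked_b / n_b
    pooled = (liked_a + liked_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("ratings are all identical; the test is undefined")
    z = (p_b - p_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical studies with the same favorable shares (45% vs 60%)
# but different numbers of participants per version.
z_small, p_small = compare_versions(9, 20, 12, 20)
z_large, p_large = compare_versions(45, 100, 60, 100)
print(f"20 participants per version:  z={z_small:.2f}, p={p_small:.3f}")
print(f"100 participants per version: z={z_large:.2f}, p={p_large:.3f}")
```

With identical percentages, only the larger study yields a p-value under the conventional 0.05 level, which is one way to read "enough scale to produce reliable comparisons."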

User Evals are particularly well-suited to the relationship factors described above – especially Outcome, Trust, and Affinity, which are difficult to assess without the user's voice. As Giuliano Morse of Meta’s Superintelligence Lab observed at Research Week, LLM judges can reliably measure accuracy and latency in an answer, but struggle to identify things like humor. Human SMEs can evaluate the correctness of an answer, but can’t represent the reactions of a typical user. “They struggle with the highly subjective components of quality.”

Chuck Kwong, a principal UX researcher for Microsoft AI, gave an example. He said that in a shopping-assistant study, an LLM recommended stiletto shoes for a beach wedding. The response looked correct and complete, but real users quickly surfaced the problem: "I can't be wearing stilettos at a beach wedding. I will literally sink in the sand." The user's context and understanding were missing from every other form of eval.

User Evals have some limits. Running user tests against every model version is more expensive and slower than running automated tests. User Evals work best as a periodic grounding, used to calibrate the rest of the eval system, not to monitor every iteration.

Human-in-the-Loop Evals: adding human insight at scale

The second approach to adding human insight involves modifying an existing process called Human-in-the-Loop Evals (HITL). The HITL approach adds a human supplement to the automated tests of a conventional eval. Human subject matter experts (SMEs) review AI interactions that receive low ratings from the AI judge, and also review a random sample of other interactions to make sure things aren’t going wrong. Because the humans are reviewing only a small sample of the total AI interactions, the process is relatively affordable.
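The selection step described above can be sketched in a few lines of Python: every interaction the judge rates poorly goes to a human, plus a small random sample of the rest. The `Interaction` record, the 0.0 to 1.0 score scale, the 0.5 cutoff, and the 5% spot-check rate are assumptions made for illustration, not settings from any specific HITL tool.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Interaction:
    interaction_id: int
    judge_score: float  # hypothetical 0.0-1.0 rating from the automated judge


def select_for_human_review(
    interactions: List[Interaction],
    low_score_cutoff: float = 0.5,
    spot_check_rate: float = 0.05,
    seed: Optional[int] = None,
) -> Tuple[List[Interaction], List[Interaction]]:
    """Return (flagged, spot_checks): all low-rated interactions, plus a
    random sample of the interactions the judge rated acceptably."""
    rng = random.Random(seed)  # seeded so a review batch can be reproduced
    flagged = [i for i in interactions if i.judge_score < low_score_cutoff]
    passed = [i for i in interactions if i.judge_score >= low_score_cutoff]
    sample_size = min(len(passed), math.ceil(len(passed) * spot_check_rate))
    spot_checks = rng.sample(passed, sample_size)
    return flagged, spot_checks


# Hypothetical batch of 200 interactions with scores cycling from 0.0 to 0.9.
batch = [Interaction(n, (n % 10) / 10) for n in range(200)]
flagged, spot_checks = select_for_human_review(batch, seed=7)
print(len(flagged), "low-rated interactions and",
      len(spot_checks), "random spot checks go to human reviewers")
```

Keeping the spot-check rate small is what keeps the human workload, and therefore the cost, manageable.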

HITL evals are now standard practice in the AI engineering community and have a developed ecosystem of tools. But they have their own limit: in the way HITL is typically practiced, the humans in the loop are domain experts, not users. A lawyer reviews legal output; a doctor reviews medical output; a developer reviews code output. This produces high-quality judgments about technical correctness, but it doesn't necessarily reflect how regular people would react. A medically correct response can still feel condescending. A legally accurate answer can still be unintelligible to the client who needed it. The user's experience of the relationship doesn't show up in the typical HITL pipeline.

To inject a user perspective into the HITL process, we recommend adding two new types of subject matter experts. One is users themselves, screened to be sure they are representative of your target customer. They’ll need to be paid for their time, of course, but they likely won’t cost as much as a lawyer or doctor who is acting as an SME. The other promising source for SMEs is your company’s own user researchers. They usually have strong empathetic skills and have spent a lot of time studying target customers. The combination of users and researchers can bring much more depth and insight to the HITL process.

The three-layer eval stack

Combining all three approaches gives a robust and efficient way to optimize an AI bot’s relationship with users:

Conventional evals, run constantly, help to ensure accuracy and clarity.
HITL evals, run on a sample of interactions, flag AI drift and other problems before they become serious. Adding users and UX researchers to the experts improves this process substantially.
User evals, run periodically, ensure that all the “soft” attributes of the relationship are being optimized, and that the user’s full reaction to the AI interaction is captured. User evals also ensure that as user needs and behaviors evolve, their impact on the AI experience is identified immediately, rather than waiting for it to filter through the subject matter experts. As AI drives faster market evolution, this sort of testing will become more and more important.

Adding human insight to the eval process has a significant impact on product organizations. User research becomes a continuous evaluation input, something the product team automatically uses in production. The relationship between research and engineering changes from sequential to integrated. And the role of the researcher changes from "person who studies users" to "person who keeps the entire eval system anchored in user reality."

Part 3. How do job roles change in the new PDLC?

There are intense debates about what AI does to job roles in product development. Do they disappear? Merge? Evolve? The discussion is important, but it masks a more important issue: To succeed in the Builder model, a product organization needs five key skills: empathy, taste, evaluative ability, business thinking, and technical judgment. It’s more important to nurture those skills than to craft the perfect org chart. Meanwhile, UX researchers have an opportunity to become far more strategic and central to the company’s success, but only if they’re willing to grow beyond their service group legacy.

One of the hottest discussions about the Builder model is what happens to the jobs of the people who do the work. If you read enough of the public debate, you'll often find the same role described as both dying and thriving. Product managers in particular are frequently declared both obsolete and ascendant. The discussions tend to fall into two camps:

The dissolution camp. In this view the traditional separate roles of product management, design, and engineering should dissolve into a single integrated role.

Tomer Cohen at LinkedIn has been the most visible advocate of this, popularizing the term "Full Stack Builder" to describe the new role.
Cat Wu at Anthropic describes product teams with less rigid boundaries between PM, engineering, and adjacent functions.
And VC Marc Andreessen has framed the change as a "Mexican standoff" in which each member of the core team believes they can do the jobs of the other two, and they are all at least partly correct.

The transformation camp. In this view, separate job roles survive, but their tasks evolve substantially:

Marty Cagan describes PMs becoming product leaders, not cheerleaders or backlog administrators, focused on outcomes and strategy rather than process and coordination.
Jenny Wen describes designer time shifting from mockups (formerly 60–70% of the job, now 30–40%) toward vision work, pairing with engineers, and direct implementation.
Addy Osmani argues the highest-level engineers in the future will focus on managing “a small fleet of parallel AI coding agents”.

In every case the job title and the functional identity survive, but the day-to-day content of the role changes substantially.

Note that the disagreement between the two camps isn't whether roles change; everyone agrees they do. The real dispute is whether the changes are so fundamental that we need to blow up the roles and start over.

We think the "right" answer about role change is probably situational. Startups and loosely organized tech leaders will adopt more of the dissolution model. Large companies will focus more on the transformation model. Both will be different from what they were in 2023. This isn't a satisfying answer for executives who want a single forecast, but it's more realistic, and it suggests that the most productive question isn't "which role prediction is correct?" but "what capabilities does my organization actually need?"

The skills are more important than the titles

Both camps agree that certain human capabilities are essential to building good products, even in an AI world. The people who possess key skills will be valuable regardless of how their employer chooses to organize.

The five essential skills

A healthy product team in the Builder model needs people with five distinct human capabilities. They don't have to map to five separate job titles, but they all have to be present.

Customer empathy. This is the ability to stand in a user’s shoes, predict how they will feel about an experience, and notice the gaps that the rest of the team will miss. This skill is most often held by user researchers, but not exclusively. Strong PMs, designers, and customer-facing engineers can develop it too.
Design taste. This applies to both visual and interaction design. Some people have a strong intuitive ability to tell when something is right versus merely working. Many observers of the AI scene say design taste is one of the things AI is least able to substitute for.
Evaluative ability. This is the ability to assess AI output for quality, fit, and trustworthiness, and to build systems that evaluate at scale. It is the skill that most distinguishes the Builder model from its predecessor. Hamel Husain has built much of his eval methodology around the role of a "Principal Domain Expert" who defines what a good answer looks like.
Business and strategic thinking. This involves market understanding, prioritization, focus on business outcomes, and the ability to make trade-offs at the level of strategy and features. This is the skill that most people invoke when they say PMs need to become more strategic. As AI handles more of the routine work, the strategic part of the PM role becomes more important. Most product organizations are short on people who can do that work well.
AI fluency and technical judgment. This is the combination of understanding what AI can and can't do, directing AI tools effectively to produce working systems, and verifying the output well enough to catch failures. This is the most technical skill of the five, but it’s focused on supervising the creation of code by agents rather than directly coding.

The point of the five-skills framework isn't that every team needs five separate specialists. It's that every team needs all five capabilities, present at sufficient depth, regardless of how they're titled. An engineer with strong design taste can carry both technical judgment and design taste. A PM with deep research knowledge can carry both strategic thinking and customer empathy. The mapping from skills to people is flexible, but what’s not flexible is the requirement that all five capabilities be present somewhere on the team.

User researchers are a special case. They hold one of the five skills, customer empathy, as their core professional identity. That makes researchers potentially very valuable in the Builder model, since empathy isn’t something AI does well. But there are also two well-defined tasks in the Builder model that researchers are well suited to perform:

Running the continuous discovery process. As we discussed in Part 1, the speed of AI-assisted development means that discovery research needs to turn into a lifestyle rather than a project. Researchers are the logical people to run this, but they must learn to communicate their findings more aggressively and in business terms.
Configuring and maintaining the evaluative testing infrastructure. Research professionals are needed to set up and supervise the human insight testing tools that builders will use to evaluate their products. Researchers also need to supervise quality control on those tests. This is an ops-heavy role that could also extend to managing all of the eval tools used in AI development.

We think successful user researchers in the future will likely be champions of customer empathy, and keepers of the customer-understanding infrastructure that the rest of the product organization depends on continuously. That's a more proactive and central role, compared to the old research service model. It's also a job that, if done well, is much harder for an organization to deprioritize.

What this means in practice

For individuals working in the PDLC, identify which of the five skills you genuinely hold, and sharpen them. In a time of change when roles and job titles are unstable, skills are the thing employers will value. Treat the flexibility of the new PDLC as an opportunity to clarify your value and increase your influence, rather than a threat to your current job description.

For managers of product teams, worry less about getting the org chart right and more about whether your team collectively holds the five skills at sufficient depth. If any skill is missing or thin, that's a business risk regardless of how clean the titles look. Pay particular attention to skills that are easy to lose during reorgs. Customer empathy is the most fragile, because it requires ongoing investment in research practice and is the easiest to under-resource when budgets tighten (we all think we’re empathetic). Evaluative ability is the newest and the least likely to be present at any meaningful depth on most existing teams; this is the skill most worth investing in proactively. AI fluency is the easiest to improve, because you can drive it through training and incentives.

The likely outcome: Different structures for different company cultures

We think it’s likely that large companies will settle into a pattern where the traditional PDLC job titles still exist, because they remain the easiest way to track, develop, and retain talent at scale. But the boundaries between roles will be looser than they were, and everyone on a product team will have at least baseline competence in AI-assisted building and evaluation.

Startups and AI-native firms may look different. They may genuinely melt down roles into a jack-of-all-trades builder, or create hybrid arrangements where one person has several skills. That's not in conflict with the large-company prediction. Startups already operate differently today; this is just more of the same.

The bottom line for any product organization is this: the Builder model doesn't require you to pick the right role names; it requires you to retain the right skills. Companies that focus on the right skills will have a better chance of thriving in the Builder world. The ones that focus on titles will spend the next few years rearranging chairs while their competitors get on with the work.

Conclusion: This is the way forward

For many of us, the rise of AI feels like something that’s happening to us rather than something we want. The insecurities and rapid pace of change often feel overwhelming. We’re not trying to wish those issues away, but it’s also good to keep in mind that the Builder model is an opportunity to fix many longstanding problems in product development. No one loved the feature backlog, and the idea of communicating concepts through working code, rather than sketches and hand-waving, is genuinely exciting. For research teams, there has long been a desire to become more strategic and central to product thinking; this is the opportunity to finally make it happen.

For companies, the new PDLC is about more than just solving problems. For most of history, companies have had to choose between serving customers thoughtfully and moving fast (in other words, do you want fast food or a gourmet meal?). The new AI PDLC gives us the opportunity to both move quickly and serve customers better – but it’ll work only if we can bring in human insight as fast as we create new features. The companies that treat insight as optional will likely struggle to connect with customers even though they ship a higher volume of products.

The path ahead is challenging, but it can lead to a better place, so it’s appropriate to view it as an opportunity, and to enthusiastically invest the time and money to do it well.

Learn more