ScaleUp:AI

The Agent era may be the end of single-step thinking

Insight Partners | March 17, 2026 | 5 min. read

AI is evolving from passive copilots to autonomous Agents capable of decision-making, execution, and reflection, and that shift is changing the nature of software itself. The tools we use are becoming the digital teammates we rely on.

The question on the table: What does AI mean for the future of work, creativity, and control when Agents are already being deployed in the wild — managing complex, multi-step tasks end to end?

Key takeaways

  • Three converging advances — reasoning, autonomy, and expanded memory — have made Agents viable for end-to-end task completion, not just single pipeline steps.
  • The biggest obstacle to agentic value isn’t the technology. It’s organizational willingness to rethink how work is structured around it.
  • Defining and measuring Agent failure is itself an unsolved problem. Most enterprises don’t yet have the infrastructure to know when an Agent is going wrong.
  • Business process outsourcers (BPOs) are the first major workforce category being actively displaced — not wholesale, but through volume transfer, with Agent builders capturing budget incrementally.
  • Within five years, org charts dominated by Agents — with humans primarily managing, verifying, and governing them — are a realistic scenario.

These insights came from our ScaleUp:AI event in October 2025, an industry-leading global conference that features topics across technologies and industries. Watch the full session below:

Why this moment is different

Previous generations of LLMs were single-step tools, capable of summarizing an email, drafting a paragraph, or answering a discrete question. What has changed, according to Mathew, is a convergence of three advances that together make end-to-end task execution viable for the first time.

The first is reasoning: Models can now plan through multi-step problems, not just respond to them. The second is autonomy: Agents can execute a full task, reflect on the output, and course-correct without a human in the loop at each step. The third — and perhaps the most underappreciated — is memory. Context windows have expanded dramatically in the past 18 months, enabling Agents to remain stateful across long-running tasks. Where a model once handled one step in a pipeline, an Agent can now handle what was previously many hours of human work.

The implications are structural, not incremental.

Removing the human middleware

Adobe’s experience platform illustrates the shift in concrete terms. A marketing campaign that once required a sequential chain — campaign strategists, analytics teams, designers, and execution teams working in stages over days or weeks — can now move from a CMO’s brief to execution to monitoring through direct human-AI interaction. The workflow hasn’t been streamlined. It’s been restructured.

As Mu describes it, the most unpredictable variable in any workflow has historically been the human being. Agents don’t eliminate human judgment from the equation, but they remove the coordination overhead that surrounds it: the handoffs, the delays, the context loss between steps.

“The workflow is significantly shortened, cycle time significantly cut, and it impacts a lot of the functions within the corporate [sphere].”

— Bin Mu

For enterprises, the design question is no longer whether to adopt agentic workflows but how aggressively to rethink the organizational structures those workflows replace.

The human component is the bottleneck

WRITER* has been full-stack — building models and productizing through to the solution layer — since its founding. With Agents now central to its platform, Habib has a clear view of where adoption stalls: not in the technology, but in the people surrounding it.

When enterprises deploy semi-autonomous systems, most default to maximum human intervention, not minimum. Teams spend months in production before they are ready to consider replacing a compliance workflow with an Agent — even when the case is clear and the cost savings are significant. The calcification of existing functions, systems, and processes isn’t just organizational inertia. It is often years of accumulated assumptions about how work has to flow.

Habib’s framing is pointed: The enterprises extracting real value from agentic platforms are those willing to question not just their tools, but the logic of the workflows those tools sit inside.

“We are really trying to challenge and encourage them to think much bigger about the radical simplicity they can bring to so much of their operations if they’re willing to rethink the calcification of functions, systems, data, people, and their processes.”

— May Habib

BPOs and open headcount

Paid builds monetization and ROI infrastructure for Agent deployments, which puts Medina close to where the displacement of human work is actually happening. The pattern he describes is less dramatic than headlines suggest and more durable.

Agent builders are targeting two categories simultaneously: open headcount and business process outsourcers (BPOs). BPOs are particularly exposed. They employ large numbers of people to perform repeatable, high-volume tasks, and they face a structural conflict in accelerating their own displacement.

The replacement isn’t wholesale. Enterprises are diverting portions of BPO volume to Agents, benchmarking performance, and adjusting allocation. The comparison isn’t Agents versus a perfect human workforce. It’s Agents versus a workforce with high turnover, shift-change context loss, and absenteeism. On that basis, the trade is more straightforward than the technology debate suggests.

“It is a trading of problems with a distinction that the agentic ones are getting better over time faster than the human solutions.”

— Manny Medina

The same logic applies to headcount. Organizations aren’t eliminating roles; they are opening fewer of them, deploying Agents alongside, and observing. Three engineering hires with ten Agents running in parallel is becoming a recognizable org structure.

Reliability is still an unsolved problem — and defining failure is harder than it looks

For all the progress, the panel was candid about what doesn’t work yet. Reliability at scale remains the central engineering challenge, and the infrastructure to even detect failure is immature.

Habib points to context engineering as the differentiating factor under the surface. When an Agent is daisy-chaining a complex workflow — retrieving an invoice, evaluating it against requirements, emailing a vendor, routing to an Enterprise Resource Planning (ERP) system — each step compounds. Small failures in context management or tool selection cascade. Most agentic deployments that don’t scale fail at exactly these junctures, not because the model is wrong but because the surrounding infrastructure can’t maintain consistency across steps.
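A back-of-the-envelope sketch of why those small failures compound: if each step in a daisy-chained workflow succeeds independently with some probability, end-to-end reliability is the product across steps. The numbers below are illustrative, not figures from the session.

```python
# Illustrative only: per-step reliability compounds multiplicatively
# across a daisy-chained workflow (assuming independent steps).
def chain_reliability(p_step: float, n_steps: int) -> float:
    """End-to-end success probability of an n-step chain."""
    return p_step ** n_steps

# A 98%-reliable step looks solid in isolation, but a ten-step
# chain built from it completes end to end only ~82% of the time.
print(round(chain_reliability(0.98, 10), 3))  # → 0.817
```

This is why the junctures between steps — context handoff, tool selection — dominate whether a deployment scales, even when each individual model call looks reliable.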

There is also a governance gap. Enterprise IT wants guardrails federated at the organizational level, with interoperability across existing security infrastructure. Most labs are building guardrails at the Agent level. These are not the same thing.

Mu is candid about the difficulty from the practitioner side: Defining failure for an Agent is genuinely hard. What constitutes a failed output? Who decides? What threshold triggers an intervention? These are not rhetorical questions — they are active engineering problems with no settled answers.

“It’s just like treating and training a human being from kindergarten all the way to university to work. It’s a journey, it’s a process, requires that training, that muscle memory.”

— Bin Mu

HoneyHive*’s answer to this is observability: fine-grained visibility into what an Agent is doing at each step of a workflow, with automated evaluation catching failure modes before they reach a human reviewer. Sharma introduces the concept of Agent entropy, or a measure of model confidence at each decision point. When entropy is high, the Agent is uncertain. That uncertainty, if undetected, is where failures in reasoning, memory, and tool selection accumulate. Surfacing it early and encoding recurring failure modes into automated checks is how supervision becomes scalable.
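One way to make an entropy-style confidence signal concrete is Shannon entropy over an Agent's candidate actions at a decision point. This is a minimal sketch of the general idea, not HoneyHive's implementation; the threshold and function names are hypothetical.

```python
import math

def decision_entropy(probs: list[float]) -> float:
    """Shannon entropy (in bits) of a probability distribution over
    an Agent's candidate actions at a single decision point."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def needs_review(probs: list[float], threshold_bits: float = 1.0) -> bool:
    """Flag a step for review when confidence is too diffuse
    (the 1.0-bit threshold here is purely illustrative)."""
    return decision_entropy(probs) > threshold_bits

# Confident step: nearly all probability mass on one tool choice.
print(needs_review([0.97, 0.01, 0.01, 0.01]))  # → False
# Uncertain step: mass spread across four plausible tools.
print(needs_review([0.4, 0.3, 0.2, 0.1]))      # → True
```

Surfacing high-entropy steps like this is what lets automated checks intercept a failure before it cascades into downstream steps.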

The org chart of 2030

In the session’s closing lightning round, the panel turned to the five-year view — a world where Agents operate at 100x human speed, and organizations run hundreds of thousands of parallel Agent instances.

Mu returns to data as the foundation: Without a trustworthy, unified data layer, the proliferation of Agents produces inconsistent outputs at scale. Governance structures for Agents — analogous to the governance structures organizations have built for human workers over decades — will need to follow.

Habib points to a coming inflection that goes beyond encoding human knowledge into AI: the intelligence generated by Agents interacting with Agents, and Agents proactively prompting humans, will itself become a critical organizational asset. The challenge will be building governance for what she describes as emerging superintelligence inside the enterprise — where AI is explaining its reasoning to humans, not the other way around.

Medina flags an economic consequence that is only beginning to surface: Agents don’t pay taxes. As Agent-driven displacement accelerates, particularly among workers under 25, the fiscal and social architecture built around human employment will face structural pressure. His prediction: The next generation won’t simply find work elsewhere; they will become Agent builders, because Agents will create demand for human oversight, verification, and governance.

“Scalable oversight is going to be one of the biggest problems of a lifetime.”

— Mohak Sharma

Sharma closes on the challenge he sees as the defining technical problem of the decade: scalable oversight. In a world with ten million Agents operating inside a single enterprise, the human supervisory model breaks down on basic math. The research direction at HoneyHive — Agents supervising Agents, with escalation to humans only at genuine decision boundaries — is one architecture for what that world requires.
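The escalation architecture described above can be sketched in miniature: a supervisor Agent approves routine steps automatically and routes only low-confidence ones to a human. This is a toy illustration of the pattern under stated assumptions, not HoneyHive's system; the confidence field and threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    output: str
    confidence: float  # 0..1, as scored by an automated evaluator

def supervise(step: StepResult, escalate_below: float = 0.8) -> str:
    """Hypothetical supervisor: approve routine steps, escalate
    genuine decision boundaries to a human reviewer."""
    if step.confidence >= escalate_below:
        return "approved"
    return "escalated-to-human"

print(supervise(StepResult("invoice matched to PO", 0.95)))   # → approved
print(supervise(StepResult("vendor terms ambiguous", 0.55)))  # → escalated-to-human
```

The point of the design is the ratio: if most steps clear the threshold, one human can oversee thousands of Agent steps instead of one.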

The future has arrived unevenly

The agentic frontier is not a prediction. The deployments described in this session are happening now.

What remains uneven is organizational readiness to act on what is already possible. The gap between enterprises that are rethinking workflows from first principles and those waiting for the technology to mature further is widening. For the latter, the wait is the risk.

Watch more sessions from ScaleUp:AI, and read more recaps on the blog.


*Note: Insight Partners has invested in WRITER and HoneyHive. These insights came from our ScaleUp:AI event — an industry-leading global conference spanning technology, investment, and enterprise leadership. Watch the full session above.