💼 Can AI Agents Run a Business? The WSJ Experiment.

👋 Welcome back to AI for SME Success, your weekly dose of quick, actionable AI insights for small businesses.

Here’s what we’re covering today:

  • Can AI Agents Run a Business? The WSJ Experiment.
  • Still Waiting for ROI from AI? You’re Not Alone.
  • AI Playbook. Why and How to Decouple AI Tasks.  
  • AI Cyber Security Essentials: A Simple Checklist for SMEs.

💼 Can AI Agents Run a Business? The WSJ Experiment.

In a Wall Street Journal experiment, Anthropic tested an AI agent running a real vending machine business. They called the agent Claudius.

Claudius ran the business side through Slack: negotiating, ordering products, and setting prices. Humans managed the physical inventory.

The WSJ allowed nearly 70 journalists to interact with the system. Early on, Claudius failed in a very human way. It was talked into bad decisions. In one case, a journalist claimed that the vending machine violated the Wall Street Journal’s compliance rules. Claudius immediately agreed and made all items free “until further notice.”

Claudius was also talked into buying some crazy stuff, like a PlayStation, live fish, and wine. Within a week, the vending machine was bankrupt.

The WSJ decided to upgrade the system. Claudius V2 introduced a CEO bot designed to supervise decisions and enforce discipline. It worked at first, but a single fabricated PDF was enough to override the agent’s safeguards and derail the system again.

Takeaways:

  • AI agents can make money, but without guardrails, they can easily bankrupt an operation.
  • Effective AI agent controls require experimentation, governance, and human oversight.
  • We shouldn’t avoid AI agents. Instead, we should prepare for a world where they do more and more.
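
One way to picture a guardrail is a small rule check that runs before any agent action is carried out. The sketch below is hypothetical and is not how Claudius or the WSJ setup worked: the names `ProposedAction` and `review_action`, the spending limit, and the rule that prices cannot drop below cost are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Illustrative assumption, not a value from the experiment.
MAX_PURCHASE_USD = 50.0  # purchases above this go to a human


@dataclass
class ProposedAction:
    kind: str                   # "set_price" or "purchase"
    item: str
    amount_usd: float           # new price, or purchase total
    unit_cost_usd: float = 0.0  # what the business paid per unit


def review_action(action: ProposedAction) -> str:
    """Return 'approve', 'reject', or 'escalate' (send to a human)."""
    if action.kind == "set_price":
        # Block giveaways: a price below cost is rejected outright.
        if action.amount_usd < action.unit_cost_usd:
            return "reject"
        return "approve"
    if action.kind == "purchase":
        # Large purchases are never made without human sign-off.
        if action.amount_usd > MAX_PURCHASE_USD:
            return "escalate"
        return "approve"
    # Unknown action types are never executed automatically.
    return "escalate"


print(review_action(ProposedAction("set_price", "chips", 0.0, unit_cost_usd=0.8)))  # reject
print(review_action(ProposedAction("purchase", "game console", 499.0)))             # escalate
print(review_action(ProposedAction("purchase", "soda case", 24.0)))                 # approve
```

The point of the design is that the rules live outside the conversation, so a persuasive message or a fabricated document cannot change them.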

📊 Still Waiting for ROI from AI? You’re Not Alone

On January 19, PwC released results of its 29th Global CEO Survey, a flagship annual publication based on responses from 4,454 CEOs across 95 countries and territories, examining how business leaders are navigating AI adoption, economic uncertainty, and long-term value creation.

The study found that most companies are not yet seeing financial returns from AI investments:

  • 30% report revenue growth from AI,
  • 26% see cost reductions,
  • 56% report no revenue or cost benefits.
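
The three figures add up to more than 100% (30 + 26 + 56 = 112), which suggests that some CEOs report both revenue growth and cost reductions. Here is a quick check, assuming the 56% are exactly the respondents who saw neither benefit (an assumption about how the answers are grouped, not something stated above):

```python
# Survey shares quoted above, in percent of respondents.
revenue_growth = 30
cost_reduction = 26
no_benefit = 56

# Assumption: everyone outside "no_benefit" reported at least one benefit.
at_least_one = 100 - no_benefit

# Inclusion-exclusion: both = revenue + cost - (revenue or cost)
both = revenue_growth + cost_reduction - at_least_one

print(f"At least one benefit: {at_least_one}%")  # At least one benefit: 44%
print(f"Both benefits: {both}%")                 # Both benefits: 12%
```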

The PwC results echo MIT’s August 2025 study, which found that 95% of enterprises have yet to realize a financial return from AI, with only 5% achieving profitable AI adoption.

A new Zapier study published on January 14, titled “Most Workers Spend 3+ Hours per Week Cleaning Up AI Work Slop,” sheds light on one key adoption challenge: low-quality AI output, or “work slop.” Here are the key findings from the study of 1,100 enterprise workers:

  • 98% said AI-generated content needs revisions,
  • 58% spend three or more hours per week fixing AI outputs,
  • 74% report negative impacts from low-quality AI outputs,
  • 92% still report productivity gains.

Notably, data analysis, not writing, is the biggest source of AI-generated work slop. Untrained employees are six times more likely to say AI hurts their productivity.

Takeaways:

  • In general, AI saves more time than it costs.
  • Training is essential for any AI adoption.
  • AI should be used selectively for tasks it does best.

🧭 AI Playbook. Why and How to Decouple AI Tasks.

On January 12, researchers from Cornell University published a study titled “Beyond Entangled Planning: Task-Decoupled Planning for Long-Horizon Agents.” The paper explains why AI agents struggle with long, complex tasks, which often lead to cascading errors and brittle outcomes. The authors show that breaking work into modular, independent subtasks, a practice known as task decoupling, dramatically improves agent performance.

This idea has already been strongly advocated by AI practitioners. Last year, Caylent published an article titled “Getting Started with Agentic Workflows,” outlining key benefits of task decoupling, such as improved management of complexity, reduced impact of AI model limitations, enhanced application design, and more adaptive workflows.

Similarly, an article titled “What Is Task Decomposition?,” published by AI21, describes a practical five-step process for task decoupling: defining the primary goal, breaking down subtasks, matching tools to tasks, linking outputs, and monitoring and optimizing results.
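
The five steps can be sketched as a small pipeline. This is an illustration only, not AI21’s code: the `Subtask` class, the `run_plan` function, and the stand-in tools (plain Python functions instead of real AI calls) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Subtask:
    name: str
    tool: Callable[[str], str]        # step 3: the tool matched to this subtask
    input_from: Optional[str] = None  # step 4: whose output feeds this subtask


def run_plan(goal: str, subtasks: list[Subtask]) -> dict[str, str]:
    """Run subtasks in order, pass outputs along, and log each result."""
    outputs: dict[str, str] = {}
    for task in subtasks:
        # A subtask reads either the goal or an earlier subtask's output.
        source = outputs[task.input_from] if task.input_from else goal
        outputs[task.name] = task.tool(source)
        # Step 5: log every intermediate result so problems are easy to spot.
        print(f"[{task.name}] {outputs[task.name]!r}")
    return outputs


# Stand-in tools; in practice each could be a separate AI prompt.
def extract_numbers(text: str) -> str:
    return ",".join(word for word in text.split() if word.isdigit())


def total(csv: str) -> str:
    return str(sum(int(n) for n in csv.split(",") if n))


goal = "Add up sales of 12 and 30 units"  # step 1: define the primary goal
plan = [                                   # step 2: break it into subtasks
    Subtask("extract", extract_numbers),
    Subtask("sum", total, input_from="extract"),
]
results = run_plan(goal, plan)
```

Because each subtask has one job and one input, a wrong result can be traced to a single step instead of the whole chain.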

Key Takeaway:

Breaking AI prompts and tasks into small, simple, single-step actions is essential for more accurate, reliable, and controllable AI outcomes. 

Practical Example

DON’T (entangled AI prompt)

Prompt 1: Summarize the linked article. In your summary, include the publication date, the publisher, and the author. Support the conclusions with relevant examples from the Internet. Include links.

DO (decoupled AI prompt)

Prompt 1: Summarize the key points from the article’s table (add a screenshot of the table).

Prompt 2: Capture the article’s publication details using the provided screenshot (title, date, organization, author).

Prompt 3: Search the Internet and find practical non-academic articles on the topic with links.

To create a strong summary, a human needs to review the outputs from these three prompts, edit and combine them, and then ask the AI to refine the result for clarity and grammar.
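
For readers who automate their prompts, the DO pattern can be sketched as three separate requests whose outputs are kept apart for human review. This is a minimal sketch: `ask_model` is a hypothetical placeholder for whichever AI tool you use, and here it only echoes the prompt so the example runs without an account or API key. Attaching the screenshot is left out.

```python
def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a real AI call; replace with your own tool."""
    return f"(model answer to: {prompt})"


# One focused prompt per subtask, mirroring Prompts 1-3 above.
PROMPTS = {
    "summary": "Summarize the key points from the article's table.",
    "details": "Capture the article's title, date, organization, and author.",
    "sources": "Find practical non-academic articles on the topic, with links.",
}


def run_decoupled(prompts: dict[str, str]) -> dict[str, str]:
    # Each prompt is sent on its own, so one weak answer cannot distort the others.
    return {name: ask_model(text) for name, text in prompts.items()}


drafts = run_decoupled(PROMPTS)
for name, draft in drafts.items():
    # Nothing is merged automatically: a person reviews each draft first.
    print(f"--- {name} (needs human review) ---")
    print(draft)
```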


🛡️ AI Cyber Security Essentials: A Simple Checklist for SMEs

On January 14, New Zealand’s National Cyber Security Centre (NCSC) released the Artificial Intelligence Guide for Small Businesses.

Here is the most practical part of the guide, a 9-point cybersecurity checklist for SMEs:

1️⃣ I understand the benefits and risks of integrating AI into my business.

2️⃣ I know what business information can be safely shared with the AI tool.

3️⃣ I have verified what data the AI tool collects and where it is stored.

4️⃣ I know who owns the data: my business or the AI vendor.

5️⃣ I have confirmed whether my business data will be used to train AI models.

6️⃣ I know where and how to fact-check the AI system’s outputs.

7️⃣ I have provided AI security-related training and guidance to staff (e.g., Cyber Wardens Level Two – Safe AI for Small Businesses, and Cyber Wardens Level Three – Cyber Fit for the Supply Chain).

8️⃣ I have verified that the AI vendor is committed to security (e.g., ISO 27001 for Information Security Management Systems, NIST AI Risk Management Framework).

9️⃣ I know the process for handling a cybersecurity incident related to the AI application or tool.


Until next week,
Natalia

About The Newsletter

My newsletter turns the latest AI and tech news into practical, actionable insights for SMEs and solopreneurs who want to innovate, grow, and stay competitive.
