Stop Hand-Rolling the Agent Loop: Moving into the Framework and Engineering Phase

2026-03-15 09:40:00 14 min

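By this point, many engineers find themselves in a familiar state: the principles make sense and the minimal demos all run, but the code increasingly looks like improvised control flow. You maintain tool schemas by hand, parse function calls by hand, drive the multi-turn round trips by hand, and stitch logs together by hand. As tools multiply, the logic scatters; as tasks grow more complex, the loop gets messy.

This is not a sign that you cannot write it. The system has simply entered a different phase: the question is no longer "can it be implemented" but "how do I turn it into something that can be maintained over the long term".

This is exactly where a framework starts to earn its keep. It does not understand the principles for you; it folds the repetitive, fragile, error-prone orchestration into a uniform abstraction so that you can put your energy back where it matters: tool boundaries, task modeling, and system governance.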

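Why the earlier steps had to be hand-rolled once

Before turning to a framework, the previous articles deliberately laid the whole chain out in minimal code:

You have seen what stateless means
You have understood history replay
You have taken a Tool Calling round trip apart by hand
You also know what RAG is actually filling in

These principles have to be understood first. Otherwise, if you start with a framework, it is easy to treat the Agent as a black box and be left guessing blindly when something breaks: "Is the model acting up?" "Is the SDK doing some magic?"

But once the principles are solid, continuing to hand-roll the same control flow yields rapidly diminishing returns while the maintenance cost keeps climbing. What matters more from here is:

Defining tools in a uniform way
Constraining parameters in a uniform way
Handing the multi-step tool loop over to the framework
Giving the system a baseline level of execution visibility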

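Why this approach matters

Because the framework phase is not about "making the model stronger"; it is about "keeping the system from getting messier the more you write".

Concretely, it helps you contain complexity in at least four areas:

Tool definition: the schema and the execution logic live together, so you no longer maintain two copies
Multi-step loop: the framework takes over the boilerplate flow of "model decides → call tool → reason again"
Boundary control: explicit parameters such as a multi-step stop condition constrain how far automatic iteration can go
Observability: each step of the call process is exposed, so you can at least debug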
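
To make the second and third items concrete, here is a minimal sketch of the kind of loop a framework takes over, written in plain JavaScript with no SDK. Everything in it is an assumption for illustration: `fakeModel` is a scripted stand-in for a real model call, and `runLoop` and `maxSteps` are hypothetical names rather than any library's API.

```javascript
// Hypothetical stand-in for a model: asks for the weather tool once per
// city, then answers in text once a tool result exists for every city.
function fakeModel(messages) {
  const cities = ['上海', '北京'];
  const done = messages.filter((m) => m.role === 'tool').map((m) => m.location);
  const next = cities.find((c) => !done.includes(c));
  if (next) return { toolCall: { toolName: 'weather', input: { location: next } } };
  return { text: `Answered using ${done.length} tool results.` };
}

// Mock tool table, mirroring the article's weather example.
const tools = {
  weather: ({ location }) => ({
    location,
    weather: location === '上海' ? '晴天，25°C' : '多云，18°C',
  }),
};

// The boilerplate itself: model decides -> call tool -> feed result back -> repeat.
function runLoop(prompt, maxSteps) {
  const messages = [{ role: 'user', content: prompt }];
  const steps = [];
  for (let i = 0; i < maxSteps; i++) {
    const reply = fakeModel(messages);
    steps.push(reply);
    if (reply.text !== undefined) return { text: reply.text, steps };
    const { toolName, input } = reply.toolCall;
    const result = tools[toolName](input);
    messages.push({ role: 'tool', toolName, ...result });
  }
  return { text: null, steps }; // step budget exhausted before a final answer
}
```

Calling `runLoop('compare', 5)` walks through two tool calls and then a text answer, while `runLoop('compare', 2)` runs out of budget and returns `text: null`. Dispatch, message bookkeeping, and the step cap are all manual here, which is the part a framework absorbs.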

A real run first

Running node 05-agent-framework.js with the current version produces output similar to the following:

🤖 启动 Agent Framework Demo (Model: gemini-3-flash-preview)...
[Tool] Fetching weather for 上海...
[Tool] Fetching weather for 北京...

User: 上海和北京现在的天气分别怎么样？请对比一下。
AI: 上海和北京现在的天气情况如下：

*   上海：目前是晴天，气温为 25°C。
*   北京：目前是多云，气温为 18°C。

[Debug] Execution Steps:
  - Called tool: weather with input: {"location":"上海"}
  - Called tool: weather with input: {"location":"北京"}

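This output captures what lesson five is really teaching: the framework does not "invent capabilities" for you; it takes over the multi-step round trips, the parameter passing, and the debugging visibility.

Full code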

// 05-agent-framework.js
// Phase 4: moving the Agent onto a framework
// Goal: use a modern AI SDK style interface to simplify Tool Calling and the ReAct loop.

import { createGoogleGenerativeAI } from '@ai-sdk/google';
import { generateText, tool, stepCountIs, zodSchema } from 'ai';
import { z } from 'zod';
import dotenv from 'dotenv';
dotenv.config();

const google = createGoogleGenerativeAI({
  apiKey: process.env.GEMINI_API_KEY
});

const model = google('gemini-3-flash-preview');

async function main() {
  console.log("🤖 启动 Agent Framework Demo (Model: gemini-3-flash-preview)...");

  const weatherTool = tool({
    description: 'Get the weather in a location',
    inputSchema: zodSchema(z.object({
      location: z.string().describe('The location to get the weather for'),
    })),
    execute: async ({ location }) => {
      const loc = location || 'Unknown';
      console.log(`[Tool] Fetching weather for ${loc}...`);

      const mockDB = {
        '上海': '晴天，25°C',
        '北京': '多云，18°C',
        'Shanghai': 'Sunny, 25°C',
        'Beijing': 'Cloudy, 18°C',
        'London': 'Rainy, 12°C'
      };

      return {
        location: loc,
        weather: mockDB[loc] || 'Unknown weather.',
      };
    },
  });

  const { text, steps } = await generateText({
    model,
    tools: { weather: weatherTool },
    system: 'You are a helpful assistant. You have access to weather data via the `weather` tool. Use it whenever asked about weather, then answer in Chinese.',
    stopWhen: stepCountIs(5),
    prompt: '上海和北京现在的天气分别怎么样？请对比一下。',
  });

  console.log(`\nUser: 上海和北京现在的天气分别怎么样？请对比一下。`);
  console.log(`AI: ${text}`);

  console.log("\n[Debug] Execution Steps:");
  if (steps) {
    for (const step of steps) {
      if (step.toolCalls && step.toolCalls.length > 0) {
        step.toolCalls.forEach(call => {
          console.log(`  - Called tool: ${call.toolName} with input: ${JSON.stringify(call.input)}`);
        });
      }
    }
  }
}

main();

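What matters about this code is not that it is shorter, but that responsibilities start to converge

Compared with the earlier hand-written version, the most fundamental change here is not the API style; it is that you now describe the following through the framework:

Which tools exist
What parameters those tools need
How many steps the model may take automatically, at most
What the current task is
What visibility to retain during execution

In other words, you are writing the system's intent and boundaries rather than hand-maintaining the fine-grained state of every turn.

Walking through the code in implementation order

1. Provider abstraction: standardize how the model is connected first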

import { createGoogleGenerativeAI } from '@ai-sdk/google';
import { generateText, tool, stepCountIs, zodSchema } from 'ai';
import { z } from 'zod';
import dotenv from 'dotenv';
dotenv.config();

const google = createGoogleGenerativeAI({
  apiKey: process.env.GEMINI_API_KEY
});

const model = google('gemini-3-flash-preview');

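From here on, business logic is no longer written around the call details of one vendor's low-level SDK; instead, the provider abstraction hands you a model handle.

The engineering payoff is significant: the business layer does not need to bind deeply to the details of a particular client SDK, so swapping models, switching providers, or adding a unified wrapper later all become much easier.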
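
One way to picture "not binding deeply" is to keep the provider choice behind a single lookup. The sketch below is plain JavaScript and makes several assumptions: `createRegistry` and the `provider:model` string format are hypothetical, and the factories are stubs that return plain objects instead of real model handles. In the article's code, the `google` entry would be the function returned by `createGoogleGenerativeAI`.

```javascript
// Hypothetical registry: maps a provider name to a factory that turns a
// model name into a model handle. The factories below are stubs.
function createRegistry(providers) {
  return {
    resolve(id) {
      const sep = id.indexOf(':');
      if (sep === -1) throw new Error(`Expected "provider:model", got "${id}"`);
      const providerName = id.slice(0, sep);
      const modelName = id.slice(sep + 1);
      const factory = providers[providerName];
      if (!factory) throw new Error(`Unknown provider: ${providerName}`);
      return factory(modelName);
    },
  };
}

const registry = createRegistry({
  google: (modelName) => ({ provider: 'google', modelName }),
  mock: (modelName) => ({ provider: 'mock', modelName }),
});

// Business code reads one configuration value and never names a vendor SDK.
const modelId = 'google:gemini-3-flash-preview';
const model = registry.resolve(modelId);
```

With this shape, switching providers means changing `modelId` (for example from configuration) rather than touching the code that calls the model.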

2. `tool()`: folding the schema, parameter constraints, and execution logic into one capability unit

const weatherTool = tool({
  description: 'Get the weather in a location',
  inputSchema: zodSchema(z.object({
    location: z.string().describe('The location to get the weather for'),
  })),
  execute: async ({ location }) => {
    const loc = location || 'Unknown';
    console.log(`[Tool] Fetching weather for ${loc}...`);

    const mockDB = {
      '上海': '晴天，25°C',
      '北京': '多云，18°C',
      'Shanghai': 'Sunny, 25°C',
      'Beijing': 'Cloudy, 18°C',
      'London': 'Rainy, 12°C'
    };
    return {
      location: loc,
      weather: mockDB[loc] || 'Unknown weather.',
    };
  },
});

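This is one of the most valuable things to learn from this article.

In the earlier hand-written version, you had to maintain separately:

The schema shown to the model
The tool function that actually runs

Now both pieces are folded into a single capability declaration:

description states the purpose
inputSchema defines the input contract
execute performs the actual work

Once the number of tools grows, this structured definition noticeably reduces cognitive load.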
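
The benefit of a single declaration can be sketched without any SDK. In the example below, `defineTool`, `toModelSchema`, and `callTool` are hypothetical helpers, and a plain JSON Schema object stands in for the zod schema used in the article's code. It illustrates the idea only and is not how `tool()` is implemented.

```javascript
// Hypothetical helper: one object carries everything about a tool.
function defineTool({ description, inputSchema, execute }) {
  return { description, inputSchema, execute };
}

const tools = {
  weather: defineTool({
    description: 'Get the weather in a location',
    // A plain JSON Schema object stands in for a zod schema here.
    inputSchema: {
      type: 'object',
      properties: { location: { type: 'string' } },
      required: ['location'],
    },
    execute: ({ location }) => ({ location, weather: 'Unknown weather.' }),
  }),
};

// The model-facing view is derived from the same objects, so the schema and
// the implementation cannot drift apart.
function toModelSchema(toolMap) {
  return Object.entries(toolMap).map(([name, t]) => ({
    name,
    description: t.description,
    parameters: t.inputSchema,
  }));
}

// Dispatch by name, as a hand-written loop would after parsing a tool call.
function callTool(toolMap, name, input) {
  const t = toolMap[name];
  if (!t) throw new Error(`Unknown tool: ${name}`);
  return t.execute(input);
}
```

Adding a second tool means adding one more entry to `tools`; both the schema list and the dispatcher pick it up without further edits.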

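3. `zod` is not decoration here; it is the tool's input contract

inputSchema: zodSchema(z.object({
  location: z.string().describe('The location to get the weather for'),
}))

Many people think of zod only as a type utility common in the TypeScript world, but in an Agent system it works more like a declarative expression of the tool boundary.

It serves two purposes at once:

Giving the model clearer parameter hints
Giving the runtime a more uniform way to constrain input

For an engineering team, this is easier to maintain and reuse than scattered JSON Schema fragments and hand-written validation.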
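
To show what an "input contract" means at runtime, here is a dependency-free sketch. It is a hand-written checker standing in for a schema library, not zod's API: `checkInput`, `guardedExecute`, and the `weatherInput` shape format are all hypothetical, and only string fields are supported.

```javascript
// Hand-written stand-in for a schema library: each field maps to a type
// name and a description. Only 'string' is exercised in this sketch.
const weatherInput = {
  location: { type: 'string', description: 'The location to get the weather for' },
};

// Returns { ok: true, value } or { ok: false, errors } instead of throwing,
// so the caller decides whether to report the problem back to the model.
function checkInput(shape, input) {
  const errors = [];
  const value = {};
  const source = input !== null && typeof input === 'object' ? input : {};
  for (const [key, spec] of Object.entries(shape)) {
    if (typeof source[key] !== spec.type) {
      errors.push(`${key}: expected ${spec.type}, got ${typeof source[key]}`);
    } else {
      value[key] = source[key];
    }
  }
  return errors.length === 0 ? { ok: true, value } : { ok: false, errors };
}

// The tool body only runs on input that satisfied the contract.
function guardedExecute(shape, execute, input) {
  const checked = checkInput(shape, input);
  if (!checked.ok) return { error: checked.errors.join('; ') };
  return execute(checked.value);
}
```

The same `weatherInput` object could also be rendered into the parameter hints sent to the model, which is the double duty described above. A real schema library adds nested types, coercion, and better error messages.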

4. `generateText()`: switching from "hand-rolling the loop" to "declaring the task"

const { text, steps } = await generateText({
  model,
  tools: {
    weather: weatherTool,
  },
  system: 'You are a helpful assistant. You have access to weather data via the `weather` tool. Use it whenever asked about weather, then answer in Chinese.',
  stopWhen: stepCountIs(5),
  prompt: '上海和北京现在的天气分别怎么样？请对比一下。',
});

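What really matters in this code is not the few lines saved; it is that control shifts from turn-by-turn manual orchestration to the declaration of one high-level task. You only need to state:

Which model to use
Which tools are available
What the rules are
How many steps are allowed at most
What the user's question is

Whether to call a tool first, how many times to call it, and when to stop are all handled by the framework, within the boundary of the explicit multi-step stop condition.

5. The multi-step stop condition: the governance switch for automatic iteration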

stopWhen: stepCountIs(5)

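This is not an "advanced but optional parameter"; it is a very practical system boundary.

The questions it actually answers are:

How many times you let the model probe automatically within one task
How much latency and token cost you can accept
How much room you give it for self-correction

In this weather example, the model might:

Look up the weather in Shanghai
Look up the weather in Beijing
Compare the two and generate a summary

Without a constraint along the lines of stopWhen, a multi-step loop can easily turn into an opaque, uncontrolled cost sink.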
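
A stop condition can be thought of as a predicate over the steps taken so far. The sketch below simulates the worst case, a model that asks for a tool on every step and never answers, and shows how a step cap and a token cap bound it. All names (`maxStepsReached`, `tokenBudgetSpent`, `shouldStop`, `runUntilStopped`) and the per-step `tokens` field are assumptions for illustration, not the SDK's implementation.

```javascript
// A stop condition is a predicate over the steps taken so far.
const maxStepsReached = (limit) => (steps) => steps.length >= limit;

// Hypothetical per-step `tokens` field; real usage numbers would come from
// the model response.
const tokenBudgetSpent = (limit) => (steps) =>
  steps.reduce((sum, s) => sum + s.tokens, 0) >= limit;

// Stop as soon as any condition holds.
const shouldStop = (conditions, steps) => conditions.some((c) => c(steps));

// Simulated worst case: a model that requests a tool on every step and
// never produces a final answer. With an empty `conditions` array this
// would loop forever, which is exactly the failure mode being guarded.
function runUntilStopped(conditions, tokensPerStep) {
  const steps = [];
  while (!shouldStop(conditions, steps)) {
    steps.push({ toolName: 'weather', tokens: tokensPerStep });
  }
  return steps;
}
```

With only `maxStepsReached(5)`, the simulated run ends after five steps. Adding `tokenBudgetSpent(250)` at 100 tokens per step ends it after three, because whichever boundary is hit first wins.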

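6. System prompt: the framework handles orchestration, but the behavior policy is still yours to define

system: 'You are a helpful assistant. You have access to weather data via the `weather` tool. Use it whenever asked about weather, then answer in Chinese.'

Once you enter the framework phase, prompt design does not matter less; it becomes more practical. What the framework takes over is orchestration, not behavioral constraints. You still have to tell the model through the system prompt:

What role it plays
When it should use tools
What principles its output should follow

In a mature system, the prompt usually reads like a policy manual rather than a pile of adjectives.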
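
One way to keep a prompt reading like a policy manual is to hold the policy as data and render the text from it. This is only a sketch: the `policy` object and `buildSystemPrompt` are hypothetical, and the rendered layout is an arbitrary choice rather than a format any model requires.

```javascript
// Hypothetical policy object: role, tool rules, and output rules kept as data.
const policy = {
  role: 'You are a helpful assistant.',
  toolRules: ['Use the `weather` tool whenever asked about weather.'],
  outputRules: ['Answer in Chinese.'],
};

// Renders the policy into one system prompt string, one rule per line.
function buildSystemPrompt({ role, toolRules, outputRules }) {
  return [
    role,
    'Tool usage:',
    ...toolRules.map((r) => `- ${r}`),
    'Output:',
    ...outputRules.map((r) => `- ${r}`),
  ].join('\n');
}

const system = buildSystemPrompt(policy);
```

Because each rule is a separate entry, adding or removing one shows up as a one-line diff in review, and the three questions above (role, tool usage, output principles) each have an obvious home.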

7. `steps`: the most basic layer of observability

console.log("\n[Debug] Execution Steps:");
if (steps) {
  for (const step of steps) {
    if (step.toolCalls && step.toolCalls.length > 0) {
      step.toolCalls.forEach(call => {
        console.log(`  - Called tool: ${call.toolName} with input: ${JSON.stringify(call.input)}`);
      });
    }
  }
}

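This part is critical, because the worst situation with an Agent is "it runs, but you do not know how it runs".

With steps, you can at least see:

Which tools it called
What the arguments were
Whether it took an unnecessary detour
At which step it started to drift

This is the starting point for the tracing, logging, and visual observability that come later.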
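
As a small step from printing toward tracing, the steps array can be reduced to a summary. The sketch below assumes only the shape already used in the article's debug loop (`step.toolCalls`, `call.toolName`, `call.input`); `summarizeSteps` and the `repeated` flag are hypothetical additions.

```javascript
// Flattens the steps array into one record per tool call and flags calls
// that repeat an earlier tool name and input exactly.
function summarizeSteps(steps) {
  const seen = new Set();
  const calls = [];
  steps.forEach((step, stepIndex) => {
    for (const call of step.toolCalls || []) {
      const key = `${call.toolName}:${JSON.stringify(call.input)}`;
      calls.push({
        stepIndex,
        toolName: call.toolName,
        input: call.input,
        repeated: seen.has(key),
      });
      seen.add(key);
    }
  });
  return {
    totalSteps: steps.length,
    totalCalls: calls.length,
    repeatedCalls: calls.filter((c) => c.repeated).length,
    calls,
  };
}
```

A non-zero `repeatedCalls` is a cheap signal for the "unnecessary detour" case, and `stepIndex` helps locate where a run started to drift. The summary object can be logged as JSON or attached to whatever trace record the system already keeps.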

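Responsibility boundary: what the framework does for you, and what it does not

What the framework does for you

A unified entry point for model calls
A unified way to define tools
Automatic orchestration of the basic tool loop
Exposed execution steps for debugging

What the framework does not do for you

Access control
Rate limiting
Timeouts and retries
Idempotency design
Cost governance
Error recovery
Business-level semantic correctness

In other words, the framework simplifies boilerplate and orchestration; it does not automatically solve application architecture problems.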
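
As one example of what stays on your side of the boundary, a tool's execute function can be wrapped with a timeout and a bounded retry before it is handed to the framework. This is a minimal sketch in plain JavaScript: `withTimeout` and `withRetry` are hypothetical helpers, the retry has no backoff, and the timeout only stops waiting; it does not cancel the underlying work.

```javascript
// Rejects if `fn` does not settle within `ms` milliseconds.
// Note: the underlying call keeps running; only the wait is abandoned.
function withTimeout(fn, ms) {
  return (...args) =>
    new Promise((resolve, reject) => {
      const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
      Promise.resolve()
        .then(() => fn(...args))
        .then(resolve, reject)
        .finally(() => clearTimeout(timer));
    });
}

// Retries a failing call up to `attempts` times in total, with no backoff.
function withRetry(fn, attempts) {
  return async (...args) => {
    let lastError;
    for (let i = 0; i < attempts; i++) {
      try {
        return await fn(...args);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  };
}
```

A wrapped function such as `withRetry(withTimeout(fetchWeather, 2000), 3)` (with `fetchWeather` standing for any async tool body) can then be used as a tool's execute. Retrying is only safe when the tool is idempotent, which is why idempotency appears as its own item in the list above.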

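Engineering trade-offs: when to adopt a framework, and when it is unnecessary

When a framework fits

The number of tools is starting to grow
Multi-step tasks are becoming more common
You need a unified way to define and debug tools
You want it to be easier to swap models or providers later

When hand-writing is still fine

You are only validating a tiny single-step prototype
The tool chain is very short
The team has not yet worked out the system boundaries and is only exploring capabilities

What is truly not worth it: having already understood the principles, yet still maintaining a pile of repetitive hand-written orchestration over the long term.

Wrapping up

Once you understand the core principles of an Agent, continuing to hand-roll every loop and schema delivers less and less value at higher and higher maintenance cost. What a framework really provides is not magic; it pulls the repetitive orchestration out of business code so that your attention goes back to what matters: tool design, boundary control, task modeling, and observability.

Going further from here, the system cannot stay forever in a mock world of weather lookups and addition. To verify whether an Agent has real productivity value, the next step has to connect it to a real environment. Even granting it only read access at first is enough to make many engineering problems suddenly concrete.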

AI Agent Agent Framework Tool Calling ReAct Zod JavaScript