From Demo to the Real Thing: Dissecting a Minimal Local Dev Assistant Prototype

2026-03-15 09:50:00 17 min

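Many "AI can write code" demos look lively: the model explains algorithms, generates functions, and even critiques architecture with a straight face. But the moment you put it into a real development workflow, you hit a very practical question: has it actually touched your project environment at all?

If the tools are still mock weather lookups and adders, the system is a long way from real development assistance. For a developer assistant, the real dividing line is not whether it can chat, but whether it can look at the project, read files, understand the code structure, and keep its permission boundary tight while doing so.

This article dissects exactly that dividing line: connecting an Agent to the local file system to build a minimal Dev Assistant prototype that is starting to be genuinely productive.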

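Why a dev assistant usually needs read access before write access

When people hear "coding agent", many immediately think of:

Automatically editing code
Automatically running commands
Automatically opening PRs
Automatically fixing bugs

But if you work backward from an actual development workflow, the first high-value capabilities are often more restrained:

Show me the structure of this project
Find the entry file for me
Read this module and tell me what it does
Analyze the problem based on the real source code

Listing directories and reading files is already enough to cover a large share of these high-frequency needs. That is why this example deliberately exposes only two read-only tools: ls and read.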

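Why this approach matters

Because it moves the Agent from "a capability demo inside a closed sandbox" to "a system prototype that is starting to touch a real environment". Once it is connected to the file system, many engineering questions that were still conceptual become concrete right away:

How to restrict the path boundary
How to truncate or paginate large files
Whether the model reads the file before answering
How to expose the multi-step exploration process
Why read-only access often matters more than opening write access up front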

Start with a real run

Run the current version:

node 06-dev-assistant.js

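Then enter:

what's in the readme

You will see output similar to the following:

🤖 Dev Assistant Online (Type 'exit' to quit)
I can list files and read code in this directory.

You: what's in the readme
🤖 Thinking...

AI: The README.md file mainly covers the Learning AI Agent Development learning path and what each example script does...
   [Tool Call] ls({"dirPath":"."})
   [Tool Call] read({"filePath":"README.md"})

This output shows that lesson six is no longer a single-tool demo. It is a minimal dev assistant that first explores the directory, then reads the file, and finally summarizes based on the real content. The tool calls are printed after the answer only because the script logs `text` before iterating over `steps`; the calls themselves happen before the answer is generated.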

Full code

// 06-dev-assistant.js
// Phase 5: Hands-on - building a "developer assistant" (Dev Assistant)

import { createGoogleGenerativeAI } from '@ai-sdk/google';
import { generateText, tool, stepCountIs, zodSchema } from 'ai';
import { z } from 'zod';
import dotenv from 'dotenv';
import fs from 'fs/promises';
import path from 'path';
import readline from 'readline';

dotenv.config();

const google = createGoogleGenerativeAI({
  apiKey: process.env.GEMINI_API_KEY
});

const model = google('gemini-3-flash-preview');

const fsTools = {
  ls: tool({
    description: 'List files in a directory',
    inputSchema: zodSchema(z.object({
      dirPath: z.string().describe('The directory path to list (relative to current working directory). Use "." for the current directory.'),
    })),
    execute: async ({ dirPath }) => {
      try {
        const targetDir = dirPath || '.';
        const safePath = path.resolve(process.cwd(), targetDir);
        const files = await fs.readdir(safePath);
        return files.join('\n');
      } catch (error) {
        return `Error listing directory: ${error.message}`;
      }
    },
  }),

  read: tool({
    description: 'Read the contents of a file',
    inputSchema: zodSchema(z.object({
      filePath: z.string().describe('The path to the file to read'),
    })),
    execute: async ({ filePath }) => {
      try {
        const safePath = path.resolve(process.cwd(), filePath);
        const content = await fs.readFile(safePath, 'utf-8');
        if (content.length > 5000) {
          return content.slice(0, 5000) + "\n...[Truncated]";
        }
        return content;
      } catch (error) {
        return `Error reading file: ${error.message}`;
      }
    },
  }),
};

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

console.log("🤖 Dev Assistant Online (Type 'exit' to quit)");
console.log("I can list files and read code in this directory.");

async function chat() {
  rl.question('\nYou: ', async (input) => {
    if (input.toLowerCase() === 'exit') {
      rl.close();
      return;
    }

    try {
      console.log("🤖 Thinking...");

      const { text, steps } = await generateText({
        model,
        tools: fsTools,
        stopWhen: stepCountIs(10),
        system: `You are a helpful developer assistant running in a Node.js environment.
You have access to the file system via 'ls' and 'read' tools.
Your working directory is: ${process.cwd()}
When asked to analyze code, always read the file content first.
If you need to inspect the current directory, call ls with dirPath=".".
If the user mentions README/readme, look for README files and read the relevant one.
Start by listing files if you are unsure where things are.`,
        prompt: input,
      });

      console.log(`\nAI: ${text}`);

      if (steps) {
        steps.forEach(step => {
          if (step.toolCalls && step.toolCalls.length > 0) {
            step.toolCalls.forEach(call => {
              console.log(`   [Tool Call] ${call.toolName}(${JSON.stringify(call.input)})`);
            });
          }
        });
      }

    } catch (error) {
      console.error("❌ Error:", error.message);
    }

    if (!rl.closed) {
      chat();
    }
  });
}

chat();

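What this code makes the system face for the first time

The earlier tools, whether weather, addition, or retrieval, all lived in a closed, teaching-friendly world. Here things are different:

Tools start accessing the real file system
The model has to explore the project structure first
Answer quality starts to depend on whether it has actually read the code
Security boundaries, context budget, and observability all become real problems immediately

In other words, this is not "one more demo"; it is starting to approach a system prototype that can genuinely assist development work.

Walking through the code in implementation order

1. The model and framework layers carry over from the previous article, but the working target has changed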

import { createGoogleGenerativeAI } from '@ai-sdk/google';
import { generateText, tool, stepCountIs, zodSchema } from 'ai';
import { z } from 'zod';
import dotenv from 'dotenv';
import fs from 'fs/promises';
import path from 'path';
import readline from 'readline';

dotenv.config();

const google = createGoogleGenerativeAI({
  apiKey: process.env.GEMINI_API_KEY
});

const model = google('gemini-3-flash-preview');

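This keeps the engineering-style invocation from the previous article, but the newly introduced dependencies already show where the focus has shifted:

fs/promises: read directories and files
path: resolve paths, the starting point for controlling access scope
readline: keep an interactive CLI shell running

The system is no longer just explaining principles; it is starting to operate on a real development environment.

2. `fsTools`: the first set of tools with real productive value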

const fsTools = {
  ls: tool({ ... }),
  read: tool({ ... }),
};

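There are only two tools, and the design is deliberately restrained. Together they cover the most basic and most frequent exploration chain of a dev assistant:

Look at the directory first
Then find the file
Then read the source
Only then give an analysis

Much of what gets called "code understanding" is essentially built on these four steps.

3. `ls`: let the model build a map of the project instead of imagining the directory structure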

ls: tool({
  description: 'List files in a directory',
  inputSchema: zodSchema(z.object({
    dirPath: z.string().describe('The directory path to list (relative to current working directory). Use "." for the current directory.'),
  })),
  execute: async ({ dirPath }) => {
    try {
      const targetDir = dirPath || '.';
      const safePath = path.resolve(process.cwd(), targetDir);
      const files = await fs.readdir(safePath);
      return files.join('\n');
    } catch (error) {
      return `Error listing directory: ${error.message}`;
    }
  },
})

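A directory-listing tool looks ordinary, but it is critical for an Agent. In an unfamiliar project, the first things it needs to answer are:

What is in the current directory
Where the entry point might be
Whether to look at the README, the config files, or the source first
Which file names look most relevant to the user's question

Without ls, the model is like someone dropped into a room with the lights off: all it can do is guess.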
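
One limitation of the listing above is that files.join('\n') hands the model bare names, so it cannot tell a directory from a file without trying to read it. A minimal sketch of a variant that marks directories with a trailing slash, assuming Node's fs.readdir with the withFileTypes option; formatEntries and listDir are hypothetical helper names, not part of the original script:

```javascript
import fs from 'node:fs/promises';
import path from 'node:path';

// Hypothetical helper: turn Dirent-like entries into a sorted listing,
// appending "/" to directories so the model can tell them apart from files.
function formatEntries(entries) {
  return entries
    .map((entry) => (entry.isDirectory() ? `${entry.name}/` : entry.name))
    .sort()
    .join('\n');
}

// Hypothetical replacement for the body of the ls tool's execute function.
async function listDir(dirPath = '.') {
  const target = path.resolve(process.cwd(), dirPath);
  // withFileTypes: true makes readdir return fs.Dirent objects instead of strings.
  const entries = await fs.readdir(target, { withFileTypes: true });
  return formatEntries(entries);
}
```

With this in place, a listing such as `src/` next to `README.md` tells the model which names it can descend into with another ls call and which it should read.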

4. `read`: determines whether answers are grounded in facts rather than pattern-matched guesses

read: tool({
  description: 'Read the contents of a file',
  inputSchema: zodSchema(z.object({
    filePath: z.string().describe('The path to the file to read'),
  })),
  execute: async ({ filePath }) => {
    try {
      const safePath = path.resolve(process.cwd(), filePath);
      const content = await fs.readFile(safePath, 'utf-8');
      if (content.length > 5000) {
        return content.slice(0, 5000) + "\n...[Truncated]";
      }
      return content;
    } catch (error) {
      return `Error reading file: ${error.message}`;
    }
  },
})

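This block pretty much defines the floor for whether a dev assistant has any value.

When the user asks "explain this module for me", "where is this error most likely coming from", or "what does the entry file do", an answer given without first reading the real files is basically a guess driven by training data. Only when the source itself is pulled into context does the analysis start to have a factual basis.

5. `path.resolve(process.cwd(), ...)`: a first gesture toward permission boundaries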

const safePath = path.resolve(process.cwd(), dirPath);

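Both ls and read resolve paths in a similar way. Note that path.resolve only normalizes the path against the working directory. It does not reject absolute paths or `..` segments, so a call like read({"filePath":"../secret.txt"}) still resolves to a file outside the project, and the variable name safePath promises more than the code enforces. It is not a sandbox, but it does express an important engineering principle:

The capability scope of an Agent's tools should be tightened around the task boundary, not opened indiscriminately from the start.

For a dev assistant, "restrict to the current working directory first" is a very natural starting point. To make this strict, the following still need to be added:

Root directory allowlist
.. path traversal checks
Ignoring sensitive directories
File type filtering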
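
The first three items in that list can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's implementation: resolveInsideRoot and BLOCKED_SEGMENTS are hypothetical names, the deny-list contents are only examples, and symlinks are not handled (that would need an extra fs.realpath step).

```javascript
import path from 'node:path';

// Hypothetical deny-list of path segments; adjust to the project at hand.
const BLOCKED_SEGMENTS = new Set(['.git', '.env', 'node_modules']);

// Resolve userPath against root and throw if the result escapes root
// or passes through a blocked segment.
function resolveInsideRoot(userPath, root = process.cwd()) {
  const base = path.resolve(root);
  const resolved = path.resolve(base, userPath);
  const rel = path.relative(base, resolved);

  // rel is ".." or starts with "../" when resolved lies outside base;
  // on Windows, a path on a different drive yields an absolute rel.
  const escapes =
    rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Path escapes project root: ${userPath}`);
  }

  if (rel.split(path.sep).some((segment) => BLOCKED_SEGMENTS.has(segment))) {
    throw new Error(`Path is blocked: ${userPath}`);
  }
  return resolved;
}
```

Because both tools already wrap their work in try/catch and return the error message as a string, calling a helper like this in place of the bare path.resolve would surface a rejected path to the model as an ordinary tool error.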

6. File truncation: the context budget shows up as explicit code for the first time

if (content.length > 5000) {
  return content.slice(0, 5000) + "\n...[Truncated]";
}

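At this point, the "context governance" discussed earlier is no longer abstract; it turns into real implementation decisions:

What to do when a file is too long
Whether to read in pages
Whether to slice by function or by chunk
Whether lockfiles, build artifacts, and log files should simply be skipped

In other words, once an Agent starts reading a real codebase, "reading a file" is not just an IO problem but a context budget problem. Note that the 5000 here counts characters (UTF-16 code units of the decoded string), not bytes or tokens.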
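
Two of those decisions, paging and skipping low-value files, can be sketched as pure functions. This is an assumption-laden illustration: readPage, isLowValueFile, and the pattern list are hypothetical, the read tool would need an extra offset field in its input schema to use paging, and the patterns assume forward-slash relative paths.

```javascript
const PAGE_SIZE = 5000; // characters, the same budget the original read tool uses

// Hypothetical patterns for files that rarely deserve context budget.
const LOW_VALUE_PATTERNS = [
  /(^|\/)package-lock\.json$/,
  /\.lock$/,
  /\.log$/,
  /(^|\/)(dist|build)\//,
];

// True when the path matches one of the skip patterns above.
function isLowValueFile(filePath) {
  return LOW_VALUE_PATTERNS.some((pattern) => pattern.test(filePath));
}

// Return one page of content starting at offset. When more text remains,
// append a marker telling the model which offset to request next.
function readPage(content, offset = 0, limit = PAGE_SIZE) {
  const start = Math.min(Math.max(0, offset), content.length);
  const end = Math.min(content.length, start + limit);
  const remaining = content.length - end;
  const page = content.slice(start, end);
  return remaining > 0
    ? `${page}\n...[Truncated: ${remaining} more characters, continue with offset=${end}]`
    : page;
}
```

Compared with the bare "...[Truncated]" marker, telling the model how much is left and where to resume gives it a way to keep reading deliberately instead of answering from the first 5000 characters alone.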

7. CLI loop: give it a working mode for continuous project exploration

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

async function chat() {
  rl.question('\nYou: ', async (input) => {
    if (input.toLowerCase() === 'exit') {
      rl.close();
      return;
    }

    try {
      console.log("🤖 Thinking...");
      const { text, steps } = await generateText({ ... });
      console.log(`\nAI: ${text}`);
    } catch (error) {
      console.error("❌ Error:", error.message);
    }

    chat();
  });
}

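This part turns the whole system into a minimal agent shell that can keep working:

Receive a question
Explore directories or files on its own
Give an analysis based on real content
Wait for the next task

Its biggest difference from a real coding agent is not the workflow but the permission scope and the number of tools.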
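
One thing the loop does not do is remember earlier turns: each generateText call receives only prompt: input, so a follow-up such as "and what does that file import?" has no referent. A minimal sketch of carrying history across turns, assuming the generateText result exposes response.messages and that a messages array can be passed in place of prompt; runTurn is a hypothetical helper, and generate is injected so the sketch runs without a model:

```javascript
// Hypothetical helper: run one turn of a stateful chat.
// history is a mutable array of messages shared across turns.
// generate stands in for a wrapper such as
//   (opts) => generateText({ model, tools: fsTools, stopWhen: stepCountIs(10), ...opts })
async function runTurn(history, userInput, generate) {
  history.push({ role: 'user', content: userInput });
  const { text, response } = await generate({ messages: history });
  // Assumption: response.messages holds the assistant and tool messages
  // produced during this call, ready to be appended to the history.
  history.push(...response.messages);
  return text;
}
```

Keeping history in the application layer also means the application decides when to trim it, which ties back to the context budget question from the file truncation section.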

8. system prompt: here it reads more like an operations manual than a persona

system: `You are a helpful developer assistant running in a Node.js environment.
You have access to the file system via 'ls' and 'read' tools.
Your working directory is: ${process.cwd()}
When asked to analyze code, always read the file content first.
Start by listing files if you are unsure where things are.`,

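This prompt is well written because it defines how to work instead of vaguely insisting that "you are very smart". It specifies:

Who you are: developer assistant
What you have: ls, read
Where you are: the current working directory
How to analyze code: read the file first
How to explore when unsure: list the directory first

The excerpt above omits two lines from the full listing (the hint to call ls with dirPath="." and the README lookup rule), but the structure is the same.

Prompts for mature Agents tend to read more like operating instructions than character writing.

9. Multi-step stop condition and tool logs: autonomous exploration must come with traceability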

stopWhen: stepCountIs(10),

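This setting lets the model act in multiple steps within one task, up to 10 steps, for example:

ls to look at the project root first
Then read the README or the entry file
If that is not enough, keep reading other files
Only then output a conclusion

And the logging below: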

if (steps) {
  steps.forEach(step => {
    if (step.toolCalls && step.toolCalls.length > 0) {
      step.toolCalls.forEach(call => {
        console.log(`   [Tool Call] ${call.toolName}(${JSON.stringify(call.input)})`);
      });
    }
  });
}

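lets you verify whether it actually explored, rather than guessing as soon as it saw a file name. The logging in the current version is also aligned with the newer result structure: it reads call.input and prints arguments such as ls({"dirPath":"."}) and read({"filePath":"README.md"}), instead of the misleading undefined that the earlier version printed.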
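
The same steps data can also be turned into a structured trace, which makes a check like "did it read anything before answering?" easy to automate. A minimal sketch, assuming each step carries a toolCalls array whose items have toolName and input, as the logging code relies on; collectToolCalls and usedTool are hypothetical helpers:

```javascript
// Hypothetical helper: flatten steps into one ordered list of tool calls,
// tagging each call with the 1-based step it happened in.
function collectToolCalls(steps = []) {
  return steps.flatMap((step, index) =>
    (step.toolCalls ?? []).map((call) => ({
      step: index + 1,
      tool: call.toolName,
      input: call.input,
    })),
  );
}

// Hypothetical check: true when the trace contains at least one call to toolName.
function usedTool(trace, toolName) {
  return trace.some((entry) => entry.tool === toolName);
}
```

A trace in this shape can be logged as JSON, or used to warn when the model answers a code question without a single read call.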

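Responsibility boundaries: who is responsible for what in the system at this point

The model is responsible for

Understanding the development question
Deciding whether to ls or read first
Organizing the answer based on the file content it read

The framework is responsible for

Hosting the multi-step tool loop
Unifying the tool definition and execution interface
Exposing steps for you to debug

The tools are responsible for

Accessing the real file system
Returning directory listings or file text
Providing factual material within the boundary

The application is responsible for

Defining the accessible scope
Deciding the file length limit
Deciding whether to open write or execute permissions
Recording the call chain and controlling risk

Once these four layers are clearly separated, you will find that a so-called "coding agent" is not some mysterious new species but a system that connects environment capabilities step by step.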

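Why this step can be seen as the dividing line "from demo to the real thing"

Because from here on, the Agent's value no longer depends on whether you built it a nice simulated world, but on whether it can work reliably in a real project environment.

That is also why this version, although read-only, already looks a lot like a simplified coding agent:

It explores
It reads
It analyzes based on real files
It exposes its own tool-call trace
It works within a limited, read-only permission scope

These traits are closer to the starting point of a reliable system than "can it edit code automatically".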

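What is still missing

Of course, this is still some distance from a coding agent that is usable at scale. For example:

A stricter path sandbox
Search and indexing
File writes and diff-based editing
Command execution and test feedback
git, lint, build, and test integration
Approval flows and rollback mechanisms

But these are all extensions on top of a foundation that already holds, not a fresh start.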

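Wrapping up

By this article, the whole route is starting to converge: you have not just seen how a local Dev Assistant runs, but how an Agent evolves step by step from "a model that can chat" into "a system that can enter a working environment".

The real change is not that the model suddenly got stronger, but that context management, tool capabilities, engineering orchestration, and environment access are finally coming together. If you keep going and add search, editing, execution, approval, and version control, this line naturally leads to a coding agent that can really land in a team.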
