My Agent Skill for Test-Driven Development

(saturnci.com)

63 points | by laxmena 1 day ago

6 comments

zuzululu 1 minute ago
TDD sounds great on paper for agentic development, but you quickly realize it balloons the token cost. Often I write some feature and then it's repurposed or removed, and code is refactored and moved around as time goes on. With TDD I would be taxed heavily and velocity would slow to a crawl.
After trying out TDD, I find the waterfall approach better, especially when you have a multi-agent setup. Also, I found that in some cases the tests were just superficial hallucinations that never actually tested the components written, or there was some context corruption that ultimately triggered a false positive and kicked off a completely unintentional refactoring.
servercobra 2 minutes ago
This overall is pretty close to how I've set up my implementation skill. One thing I'm curious about is how well the analogies like "We don't make dinner in a dirty kitchen." work vs something a lot more straightforward. Any input OP?
simonw 50 minutes ago
This article would benefit from a date. It looks like it's recent (Internet Archive first grabbed it on May 29th) but it's the kind of information that can quickly become stale as models and agents improve.
(I've been getting solid results recently from simply telling Claude Code and Codex "Test with uv run pytest, use red/green TDD".)
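For readers unfamiliar with the red/green loop mentioned here, a minimal sketch in Python with pytest-style tests. The `slugify` function and its expected behaviour are hypothetical, chosen only for illustration: the tests are written first and fail (red) while the function does not exist, then the smallest implementation that makes them pass is added (green).

```python
import re


def slugify(title: str) -> str:
    """Green step: smallest implementation that satisfies the tests below.

    Hypothetical example function, not taken from the article.
    """
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


# Red step: these tests are written before slugify exists, so the first
# run fails with a NameError. They pass once the function above is added.
def test_slugify_replaces_spaces_and_punctuation():
    assert slugify("Hello, World!") == "hello-world"


def test_slugify_trims_leading_and_trailing_separators():
    assert slugify("  TDD with agents  ") == "tdd-with-agents"
```

Saved as a file such as `test_slugify.py`, and assuming pytest is available in the project, this can be run with the `uv run pytest` command quoted above.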
- porphyra 14 minutes ago
  A lot of prompt engineering goes out of date quickly. Nobody nowadays goes "you are an expert software engineer. make no mistakes" lol.
  As a personal anecdote, I find that a lot of big prompts and skills use up context window budget and in many cases agents will eagerly try to use a skill even if it isn't super relevant or necessary for the current task. So when I have too many skills I have to spend a bunch of time toggling the checkboxes to figure out which ones are needed for the task at hand before starting...
- disgruntledphd2 45 minutes ago
  Me too, although I dislike the fact that it over-focuses on mocks (which I accept are over-represented in the training data).
  - galsapir 17 minutes ago
    sometimes I also feel it tries to optimise for "per line coverage" over more "real, complex use cases" type tests
dluxem 29 minutes ago
I believe using a skill here is the wrong approach. LLMs already know what TDD is and how to do it, just like object oriented programming.
If this is encoded in a skill, that skill essentially has to be loaded for everything your LLM is doing. This is probably one of the few areas where direct instructions via AGENTS.md are best, and I don't believe it requires much direction here to force the issue.
But I think the OP is just trying to have their agent work in a very specific way -- that is fine too.
> 5. Show me the test and ask for approval before continuing
behnamoh 1 hour ago
Snake oil. Just ask the model, all these custom agents/skills haven't proven that useful in practice.
- jw1224 1 hour ago
  Skills already are "just asking the model". Unless you'd prefer to type out the same instructions every single time?
  Skills are literally just Markdown documents that get loaded into context when the /skill-name is invoked.
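  The "loaded into context" idea can be sketched in a few lines of Python. This is an illustration under assumptions, not how any particular agent implements skills: the `SKILLS` registry, the `build_context` function, and the skill text are all hypothetical.

```python
# Hypothetical in-memory skill registry: skill name -> Markdown body.
SKILLS = {
    "tdd": "# TDD\n1. Write a failing test.\n2. Make it pass.\n3. Refactor.",
}


def build_context(user_message: str, skills: dict[str, str]) -> str:
    """Prepend a skill's Markdown when the message starts with /skill-name."""
    first, _, rest = user_message.partition(" ")
    if first.startswith("/") and first[1:] in skills:
        # The skill is plain text placed ahead of the user's request.
        return f"{skills[first[1:]]}\n\n{rest}"
    # No matching skill: the message goes through unchanged.
    return user_message
```

  Calling `build_context("/tdd add a login form", SKILLS)` returns the Markdown followed by the request, which is the same text someone would otherwise type out by hand each time.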
  - Zetaphor 45 minutes ago
    I think they're maybe confusing Skills and MCP servers
  - dominotw 45 minutes ago
    i believe gp means llms produce what they see in training data/rl; there isn't too much customization you can do with skills.
    they are being sold as more powerful than they are, as if llms were intelligent blank slates that can be customized with mere markdown files.
    - calebkaiser 28 minutes ago
      I don't understand this line of criticism exactly. By putting new information in the context window, you are materially changing the activations at your point of sampling, which is literally "customizing with mere markdown files."
      Taken to the extreme, the attitude that there is some special incantation that will unlock all capabilities is silly, and a lot of the "prompt engineering" discourse is similarly kind of dumb, but in-context learning is clearly a real thing.
      - dominotw 22 minutes ago
        even if that works one time, you can never be sure that your customization is still in place or has fallen out of the context's important zone, at which point you've reverted to base llm behavior.
- internet101010 12 minutes ago
  Lol wut. One of the first things people do at a company when they get enterprise LLM tools is share a skill with company-specific color palettes or standards for creating visualizations (I prefer Tufte's principles).
- coffeeaddict1 1 hour ago
  I disagree. Not all skills are useless. For example, I sometimes use Qt for GUI projects and I have found their skills [0] very useful for improving the quality and performance of my projects. In their absence, I would each time have to direct the agents to find the docs or specific tools, wasting tokens and thus decreasing the quality of the output.
  [0] https://github.com/TheQtCompanyRnD/agent-skills
- pramodbiligiri 59 minutes ago
  I don't think the idea of skills is quite snake oil. It seems you can change what an LLM outputs next through what's called few-shot prompting or in-context learning: https://www.promptingguide.ai/techniques/fewshot
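  A minimal sketch of what few-shot prompting looks like in practice, assuming a plain-text prompt format. The `few_shot_prompt` helper, the example sentences, and the labels are made up for illustration; the resulting string would be sent to whatever model is in use.

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Format worked examples ahead of the real query."""
    lines = []
    for text, label in examples:
        lines.append(f"Text: {text}\nSentiment: {label}\n")
    # Leave the final label blank for the model to complete.
    lines.append(f"Text: {query}\nSentiment:")
    return "\n".join(lines)


# Illustrative examples only; any labelled pairs would do.
EXAMPLES = [
    ("This is awesome!", "positive"),
    ("This is bad!", "negative"),
]

prompt = few_shot_prompt(EXAMPLES, "What a horrible show!")
```

  The examples in the prompt are the only "customization" here: no weights change, yet the model is steered toward answering in the demonstrated format.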
- john_strinlai 1 hour ago
  not that i know much about the effectiveness of these skill files, but i find it odd to call something given away for free "snake oil", which i thought referred to the sale of fraudulent products (to the benefit of the snake oil salesperson), typically around healthcare-related stuff.
  - dominotw 28 minutes ago
    i think gp is calling skills snake oil in general
- theptip 29 minutes ago
  Nah. Skills are great. But you should write your own.
- beezlewax 1 hour ago
  I've found them useful for in-house stuff where you are using a specific design system or architecture. But custom everything works best. Agree that Claude works well on its own though at this point.
- wyre 45 minutes ago
  Ya, if I'm constantly asking a model to do TDD, you know what would make it a lot easier? A skill.
steno132 25 minutes ago
Test-driven development is one of the worst ideas nowadays in the LLM age. We have models that can consistently write expert-level, usually bug-free code for you and rapidly fix even complex bugs in your codebase.
The token cost and tech debt introduced by tests are just not worth it. There are usually no bugs, and if there are, you can fix them quickly if and when it's needed.