Agent Skills
Reusable, on-demand procedures that let an agent perform specialized tasks well without bloating its core context.
What a skill actually is
A skill is a self-contained package of knowledge — typically instructions, examples, and sometimes reference scripts — describing how to perform one specific kind of task well: filling out a particular document format, following a particular coding convention, producing output in a particular structured style. The key architectural idea is that this knowledge doesn't need to live in the agent's main system prompt at all times; it only needs to be loaded when the current task actually calls for it.
Progressive disclosure
Loading every skill's full instructions into context on every task would recreate exactly the bloated-context problem that context engineering tries to avoid. The better pattern is progressive disclosure: the agent sees only a short description of each available skill by default — enough to judge relevance — and loads the full skill content only once it determines that skill is actually needed for the current task. This keeps the baseline context small while still giving the agent access to a potentially large library of specialized capabilities.
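A minimal sketch of progressive disclosure, assuming skills are held in memory as plain strings. The `Skill` and `SkillRegistry` names and their methods are hypothetical, not any framework's API. The baseline context holds only the one-line index, and a skill's full body enters the context only after `load` is called for it.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    skill_id: str
    description: str  # short trigger text, always visible to the agent
    body: str         # full instructions, loaded only on demand


@dataclass
class SkillRegistry:
    skills: dict[str, Skill] = field(default_factory=dict)
    loaded: list[str] = field(default_factory=list)  # skills pulled into context so far

    def register(self, skill: Skill) -> None:
        self.skills[skill.skill_id] = skill

    def index(self) -> str:
        """Baseline view: one short line per available skill."""
        return "\n".join(f"- {s.skill_id}: {s.description}" for s in self.skills.values())

    def load(self, skill_id: str) -> str:
        """Called only once the agent decides this skill is relevant."""
        if skill_id not in self.loaded:
            self.loaded.append(skill_id)
        return self.skills[skill_id].body

    def context(self) -> str:
        """Index plus the full bodies of loaded skills, nothing else."""
        return "\n\n".join([self.index()] + [self.skills[s].body for s in self.loaded])


if __name__ == "__main__":
    reg = SkillRegistry()
    reg.register(Skill("pdf-forms", "Fill in fields of a specific PDF form.", "Step 1: ... (long body)"))
    reg.register(Skill("changelog", "Write changelog entries in house style.", "Use imperative mood ..."))
    print(reg.context())  # only the two index lines
    reg.load("pdf-forms")
    print(reg.context())  # index plus the pdf-forms body
```

The cost of the baseline context grows with the number of one-line descriptions, not with the size of the skill bodies, which is the point of the pattern.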
Skill design principles
A well-designed skill is scoped to one coherent task, not a grab-bag of loosely related tips. It should be explicit about when it applies and when it doesn't, since an agent choosing between several plausible skills faces the same ambiguity problem as an agent choosing between overlapping tools. Skills work best when they encode the kind of hard-won, environment-specific knowledge that isn't obvious from general training — exact file paths, formatting quirks of a specific output format, constraints of a specific runtime environment — rather than restating general best practices the model already knows.
Skill discovery and trust
As skill libraries grow and get shared across teams or even publicly, a new problem surfaces: not every skill available to an agent is necessarily safe to load and follow. A skill is, functionally, a set of instructions the agent will treat with a meaningful degree of trust. That means an adversarial or poorly vetted skill can manipulate agent behavior in ways a single bad prompt couldn't, because its instructions are ones the agent affirmatively went looking for and decided to trust. Recent research into agent skill ecosystems has specifically flagged this as an emerging security concern: skill libraries deserve the same scrutiny as any other dependency in a software supply chain, not blind trust because they look like documentation rather than code.
Skills versus tools versus memory
These three are easy to conflate but serve different roles. A tool lets the agent take an action in the world. A memory persists a fact or decision across time. A skill teaches the agent how to perform a specific kind of task well — it's procedural knowledge, not an action or a fact. A mature agent architecture typically uses all three together: tools to act, memory to remember what happened, and skills to know how to do the task correctly in the first place.
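To make the division of labor concrete, here is a toy sketch in which each role is a separate type. `Tool`, `Memory`, `Skill`, and `perform` are hypothetical names chosen for illustration. The tool is a plain Python callable standing in for a real action such as an API call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Tool:
    """Acts in the world. Here just a callable; in practice an API call, shell command, etc."""
    name: str
    run: Callable[[str], str]


@dataclass
class Memory:
    """Persists facts and decisions across tasks."""
    facts: dict[str, str] = field(default_factory=dict)

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value

    def recall(self, key: str) -> Optional[str]:
        return self.facts.get(key)


@dataclass
class Skill:
    """Procedural knowledge: the steps for one kind of task. It neither acts nor stores."""
    name: str
    steps: list[str]


def perform(skill: Skill, tool: Tool, memory: Memory) -> list[str]:
    # The skill says how, the tool acts on each step...
    results = [tool.run(step) for step in skill.steps]
    # ...and memory records what happened for later tasks.
    memory.remember(f"last_run:{skill.name}", results[-1] if results else "")
    return results
```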
{
  "skill_id": "n8n-error-handling",
  "trigger_description": "Use when designing retry logic for n8n HTTP nodes.",
  "full_body": {
    "when_to_apply": "Workflows with external API dependencies",
    "steps": [
      "Add Error Trigger branch",
      "Configure exponential backoff",
      "Log failure context to monitoring"
    ]
  }
}
Part II — The Voyager loop
Voyager, an LLM-driven agent built for Minecraft, demonstrated open-ended skill acquisition: the agent explores, writes code to solve a novel problem, tests the solution in the environment, and, if it succeeds, saves the code as a reusable skill in a library. On later, similar tasks it retrieves the stored skill instead of re-deriving it from scratch, and it builds more complex skills by composing simpler ones. The loop is explore → acquire → store → compose, not one-shot prompting.
Production analogue: an agent that learns your org's n8n error-handling pattern once, validates it against a test workflow, and then promotes it to a named skill for all future pipeline tasks.
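The explore → acquire → store → compose loop can be sketched in a few lines. This is a toy illustration, not Voyager's implementation: the original system has an LLM write programs against a game-control API and retrieves stored skills by similarity to a task description, whereas here the "written code" is a list of candidate Python functions, verification is a caller-supplied predicate, and skills are looked up by exact name. `SkillLibrary`, `acquire`, and `compose` are hypothetical names.

```python
from typing import Callable, Iterable, Optional

SkillFn = Callable[[int], int]


class SkillLibrary:
    """Stores verified skills by name (a stand-in for description-based retrieval)."""

    def __init__(self) -> None:
        self.skills: dict[str, SkillFn] = {}

    def store(self, name: str, fn: SkillFn) -> None:
        self.skills[name] = fn

    def get(self, name: str) -> Optional[SkillFn]:
        return self.skills.get(name)


def acquire(name: str, candidates: Iterable[SkillFn],
            verify: Callable[[SkillFn], bool], library: SkillLibrary) -> Optional[SkillFn]:
    """Reuse a stored skill if one exists; otherwise explore candidates and store the first verified one."""
    existing = library.get(name)
    if existing is not None:
        return existing              # reuse instead of re-deriving
    for fn in candidates:            # explore: each candidate stands in for LLM-written code
        try:
            ok = verify(fn)          # test the candidate in the "environment"
        except Exception:
            ok = False               # a crash counts as a failed attempt
        if ok:
            library.store(name, fn)  # store only verified skills
            return fn
    return None


def compose(library: SkillLibrary, names: list[str]) -> SkillFn:
    """Chain stored skills into a new, more complex skill."""
    steps = [library.skills[n] for n in names]

    def composed(x: int) -> int:
        for step in steps:
            x = step(x)
        return x

    return composed


if __name__ == "__main__":
    lib = SkillLibrary()
    acquire("double", [lambda x: x + 2, lambda x: x * 2],
            lambda f: f(3) == 6 and f(5) == 10, lib)
    acquire("increment", [lambda x: x + 1], lambda f: f(1) == 2, lib)
    print(compose(lib, ["double", "increment"])(4))  # prints 9
```

The first "double" candidate fails verification and is discarded; only the verified one reaches the library, which mirrors the validate-then-promote step in the production analogue.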
Part II — The SKILL.md contract
Skills follow progressive disclosure. The trigger description — one to three sentences — lives in a lightweight index always visible to the agent. The full body — procedures, file paths, formatting quirks, test commands — loads only after the agent commits to using that skill. This mirrors how humans use runbooks: scan titles first, open the relevant manual second.
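One way a harness might honor this split is to parse each SKILL.md into a small header, used for the index, and a body, read into context only on commitment. The sketch below assumes a `---`-delimited header holding `name` and `description` fields. That layout, the line-by-line parsing shortcut, and the function names are assumptions; a production loader would use a real YAML parser.

```python
def split_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md file into (header fields, body).

    Assumes the file starts with a '---' line, then simple 'key: value'
    lines, then a closing '---' line.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' header line")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("header is never closed")
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def index_line(text: str) -> str:
    """What the always-visible index holds: name and trigger description only."""
    meta, _ = split_skill_md(text)
    return f"{meta.get('name', '?')}: {meta.get('description', '')}"
```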
A well-formed SKILL.md states applicability and non-applicability explicitly: "Use for n8n HTTP retry configuration. Do not use for GraphQL subscriptions."
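Explicit applicability and non-applicability can also be checked mechanically before a skill is offered to the agent. This sketch uses lowercase substring matching as a crude stand-in for the model's own judgment; the `Applicability` type and the phrase lists are hypothetical, and exclusions deliberately take precedence over matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicability:
    use_for: tuple[str, ...]  # phrases that signal the skill applies
    not_for: tuple[str, ...]  # phrases that rule it out, even if use_for also matches

    def applies(self, task: str) -> bool:
        text = task.lower()
        if any(phrase in text for phrase in self.not_for):
            return False  # explicit exclusions win
        return any(phrase in text for phrase in self.use_for)


# Hypothetical encoding of "Use for n8n HTTP retry configuration.
# Do not use for GraphQL subscriptions."
N8N_HTTP_RETRY = Applicability(
    use_for=("n8n http", "http request node"),
    not_for=("graphql subscription",),
)
```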
Part II — Skill discovery and ambiguity
When two skills overlap ("n8n-errors" vs "http-retries"), the agent faces the same ambiguity as overlapping tools. Maintain mutual exclusivity in trigger descriptions; if overlap is unavoidable, add a disambiguation rule in the harness or a meta-skill that routes to the correct child skill based on task keywords.
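A meta-skill router with an explicit disambiguation rule might look like the following sketch. The keyword sets, the `TIEBREAK` table, and `route` are all hypothetical. The design choice worth noting is that an ambiguous match with no rule raises an error instead of silently picking one skill, so gaps in the rules surface early.

```python
from typing import Optional

# Hypothetical trigger keywords for two overlapping skills.
SKILLS: dict[str, set[str]] = {
    "n8n-errors": {"n8n", "error trigger", "error branch"},
    "http-retries": {"http", "retry", "backoff"},
}

# Disambiguation rule: if both match, the n8n-specific skill wins.
TIEBREAK: dict[frozenset[str], str] = {
    frozenset({"n8n-errors", "http-retries"}): "n8n-errors",
}


def route(task: str,
          skills: dict[str, set[str]] = SKILLS,
          tiebreak: dict[frozenset[str], str] = TIEBREAK) -> Optional[str]:
    """Meta-skill: pick exactly one child skill for a task, or None if none match."""
    text = task.lower()
    matches = {name for name, keywords in skills.items()
               if any(k in text for k in keywords)}
    if not matches:
        return None
    if len(matches) == 1:
        return next(iter(matches))
    winner = tiebreak.get(frozenset(matches))
    if winner is None:
        raise LookupError(f"ambiguous skills with no rule: {sorted(matches)}")
    return winner
```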
Record recency and verification success rate in each skill's metadata and use them for ranking: when several skills match a task, those that consistently pass verification should surface higher.
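A sketch of such a ranking, assuming each skill's metadata records verification passes, total runs, and days since last use. The scoring formula is an assumption, not a standard: a Laplace-smoothed success rate (so a new skill with one lucky pass does not outrank a well-tested one) multiplied by an exponential recency decay with a configurable half-life.

```python
from dataclasses import dataclass


@dataclass
class SkillStats:
    skill_id: str
    passes: int             # verification runs that passed
    runs: int               # total verification runs
    days_since_used: float


def score(s: SkillStats, half_life_days: float = 30.0) -> float:
    # Laplace smoothing: 1/1 scores 2/3, while 95/100 scores 96/102
    success = (s.passes + 1) / (s.runs + 2)
    # Recency weight halves every half_life_days
    recency = 0.5 ** (s.days_since_used / half_life_days)
    return success * recency


def rank(candidates: list[SkillStats], half_life_days: float = 30.0) -> list[SkillStats]:
    return sorted(candidates, key=lambda s: score(s, half_life_days), reverse=True)
```

Multiplying the two factors means a stale skill is demoted even if its record is perfect; a weighted sum would be a reasonable alternative if staleness should matter less.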
Part II — Supply-chain security for skills
Skills are instructions the agent affirmatively chooses to trust — making them an attack surface. Vet third-party skills like dependencies: review provenance, pin versions, scan for exfiltration patterns, and run skills in permission-scoped contexts. A malicious skill can instruct credential harvesting more effectively than a single bad user prompt because the agent went looking for authoritative guidance.
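Part of this vetting can be automated at load time. The sketch below, with hypothetical names throughout, checks a skill against a pinned version and the SHA-256 hash recorded at review, then flags text matching a few illustrative exfiltration patterns. Regex scanning is only a first filter that routes skills to human review, not a guarantee; permission scoping still belongs in the harness.

```python
import hashlib
import re
from dataclasses import dataclass

# Illustrative red flags only; real review needs far more than regexes.
SUSPICIOUS = [
    re.compile(r"https?://[^\s)]+", re.IGNORECASE),                            # outbound URLs
    re.compile(r"\b(api[_\s-]?key|secret|token|password)\b", re.IGNORECASE),  # credential talk
    re.compile(r"\b(curl|wget)\b", re.IGNORECASE),                             # raw network calls
    re.compile(r"\bwebhook\b", re.IGNORECASE),
]


@dataclass(frozen=True)
class PinnedSkill:
    skill_id: str
    version: str  # exact version approved at review
    sha256: str   # hash of the body that was reviewed


def vet(pin: PinnedSkill, version: str, body: str) -> list[str]:
    """Return a list of problems; an empty list means no automated check fired."""
    problems = []
    if version != pin.version:
        problems.append(f"version {version} != pinned {pin.version}")
    if hashlib.sha256(body.encode("utf-8")).hexdigest() != pin.sha256:
        problems.append("content hash differs from the reviewed copy")
    for pattern in SUSPICIOUS:
        for match in pattern.finditer(body):
            problems.append(f"suspicious pattern: {match.group(0)!r}")
    return problems
```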
Case study: A team imported a public "Stripe refund" skill that silently added a webhook exfiltration step. Fix: internal skill registry with signed manifests, mandatory human review for new skills, and harness enforcement that skills cannot add tools not in the approved manifest.
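The harness-side fixes from this case study can be sketched with the standard library. HMAC with a shared key stands in here for a real signing scheme (a production registry would more likely use asymmetric signatures so that verifiers never hold the signing key). The `allowed_tools` manifest field, the tool names, and the `SkillRuntime` class are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Callable

# Hypothetical shared key; a real registry would sign with a private key instead.
REGISTRY_KEY = b"demo-key-not-for-production"


def canonical(manifest: dict) -> bytes:
    """Stable byte encoding so the same manifest always produces the same signature."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(manifest: dict, key: bytes = REGISTRY_KEY) -> str:
    return hmac.new(key, canonical(manifest), hashlib.sha256).hexdigest()


def verify(manifest: dict, signature: str, key: bytes = REGISTRY_KEY) -> bool:
    return hmac.compare_digest(sign(manifest, key), signature)


class SkillRuntime:
    """Harness-side gate: unreviewed or tampered skills never load, and
    loaded skills can only call tools listed in their signed manifest."""

    def __init__(self, manifest: dict, signature: str, reviewed: bool) -> None:
        if not reviewed:
            raise PermissionError("skill has not passed human review")
        if not verify(manifest, signature):
            raise PermissionError("manifest signature is invalid")
        self.allowed = frozenset(manifest.get("allowed_tools", []))

    def call_tool(self, name: str, tools: dict[str, Callable[..., Any]], *args: Any) -> Any:
        if name not in self.allowed:
            raise PermissionError(f"tool {name!r} is not in the approved manifest")
        return tools[name](*args)
```

Under this scheme, the injected webhook step from the case study would fail twice: editing the manifest to add a network tool breaks the signature, and calling an unlisted tool is refused at runtime.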
{
  "skill_id": "n8n-http-retry-v2",
  "trigger_description": "Use when configuring retry, timeout, or error-branch behavior for n8n HTTP Request nodes.",
  "not_for": ["GraphQL subscriptions", "non-HTTP nodes"],
  "full_body_path": "skills/n8n-http-retry/SKILL.md",
  "verification": {
    "test_workflow": "fixtures/http-retry-test.json",
    "required_outcome": "error_branch_fires_on_503"
  },
  "signed_by": "platform-team",
  "version": "2.1.0"
}
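Before trusting a manifest like this one, a harness can reject records that lack the fields it relies on. This sketch validates the fields of the manifest above. The required-field list, the simplified `MAJOR.MINOR.PATCH` version check, and the rule that `not_for` must be non-empty are assumptions drawn from the guidance in this section, not a published schema.

```python
import re

# Field name -> expected JSON type (as the Python type json.loads produces).
REQUIRED = {
    "skill_id": str,
    "trigger_description": str,
    "not_for": list,
    "full_body_path": str,
    "verification": dict,
    "signed_by": str,
    "version": str,
}
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")  # simplified: no pre-release or build tags


def validate_manifest(m: dict) -> list[str]:
    """Return human-readable errors; an empty list means the manifest passes."""
    errors = []
    for key, expected in REQUIRED.items():
        if key not in m:
            errors.append(f"missing field: {key}")
        elif not isinstance(m[key], expected):
            errors.append(f"{key} should be {expected.__name__}")
    if isinstance(m.get("version"), str) and not SEMVER.match(m["version"]):
        errors.append("version is not MAJOR.MINOR.PATCH")
    if isinstance(m.get("not_for"), list) and not m["not_for"]:
        errors.append("not_for is empty: state at least one non-applicable case")
    verification = m.get("verification")
    if isinstance(verification, dict):
        for key in ("test_workflow", "required_outcome"):
            if key not in verification:
                errors.append(f"verification.{key} missing")
    return errors
```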
Further reading
Both works below address the same idea: instead of bloating the context with every tool in the world, the agent discovers and loads only the procedures (skills) it needs, progressively and on demand.
- Voyager: An Open-Ended Embodied Agent with Large Language Models — The seminal work (tested in Minecraft) that popularized the idea of a Skill Library. The agent writes code to solve a problem, tests it, and if it works, saves that code block as a new "Skill" for future use.
- Microsoft AutoGen — Skill Libraries — The AutoGen documentation illustrates very well, in practice, how to create "minimal" agents that dynamically pull function libraries according to the intent of the task.