The Problem with One-Shot AI Code Generation

One of the most frustrating experiences when using LLMs for code generation is hallucinated code: output that looks correct but crashes the moment it runs. In 2025, the solution is not just a better prompt. It is a better architecture.

The Reflexion pattern gives your agent a retry loop with feedback:

  • Generate — the agent writes the code
  • Execute — the code is tested against a real runtime
  • Evaluate — success or failure is detected
  • Reflect — on failure, the error is fed back to the agent
  • Retry — the agent produces a corrected version

Evaluations back this up: the original Reflexion paper (Shinn et al., 2023) reports GPT-4's pass@1 on the HumanEval benchmark rising from roughly 80% to 91% once this self-correction loop is applied. In n8n, the same loop turns fragile automations into resilient, self-healing systems.
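
Condensed to pseudocode, the whole pattern is a bounded retry loop. In the sketch below, generate, execute, and reflect are hypothetical helpers standing in for the n8n nodes described in the rest of this article:

async function solveWithReflexion(problem) {
  let code = await generate(problem);           // Generate
  for (let attempt = 1; attempt <= 3; attempt++) {
    const result = await execute(code);         // Execute against a real runtime
    if (result.stderr === '') return code;      // Evaluate: empty stderr means success
    code = await reflect(code, result.stderr);  // Reflect: feed the exact error back,
  }                                             // then Retry on the next pass
  throw new Error('Failed after 3 reflection attempts.');
}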

The Architecture: A Conditional Loop

Unlike a linear flow, this workflow uses a conditional loop. The four actors are:

• Generator Agent — writes the initial code
• Executor (Code Node / HTTP Request) — runs the code
• Evaluator (If Node) — routes to the success or failure path
• Reflector Agent — reads the error and generates a corrected version

Step 1: The Generator Agent

Start with an AI Agent Node (or Basic LLM Chain).

Set the System Prompt:

You are a Python expert.
Generate only the code, no markdown explanations.
The code must solve this problem: {{$json.problem}}

This node outputs raw Python code as a string.
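
In practice, models often wrap their answer in markdown fences despite the instruction. A small Code node right after the agent can strip them defensively. This is a minimal sketch; it assumes the agent's text arrives in a field named output and stores the result as output_code for the next step:

// n8n Code node (sketch): normalize the agent's reply into a bare code string
const raw = $json.output ?? '';
const output_code = raw
  .replace(/^```[a-zA-Z]*\s*\n?/, '')   // drop a leading ```python / ``` fence
  .replace(/\n?```\s*$/, '')            // drop a trailing ``` fence
  .trim();
return { ...$json, output_code };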

Step 2: The Execution Test

Connect the agent's output to an HTTP Request Node calling an external code execution API. We use the Piston API for safety:

{
  "url": "https://emkc.org/api/v2/piston/execute",
  "method": "POST",
  "body": {
    "language": "python",
    "version": "3.10.0",
    "files": [{ "content": {{ JSON.stringify($json.output_code) }} }]
  }
}

Note the JSON.stringify: inserting the raw string as "{{$json.output_code}}" breaks the request body as soon as the generated code contains quotes or newlines.

The response includes both run.stdout and run.stderr; a failed execution shows up as a non-empty run.stderr.
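
To see the response shape outside n8n, here is a minimal sketch of the same call using the built-in fetch in Node 18+, with a deliberately failing snippet:

// Standalone check of the Piston execute endpoint (Node 18+, run as an ES module)
const res = await fetch('https://emkc.org/api/v2/piston/execute', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    language: 'python',
    version: '3.10.0',
    files: [{ content: 'print(1 / 0)' }],   // guaranteed ZeroDivisionError
  }),
});
const data = await res.json();
console.log(data.run.stderr);               // the traceback the Reflector will read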

Step 3: The Evaluator (If Node)

Add an If Node with the condition:

• True path (success): run.stderr is empty → deliver the working code
• False path (failure): run.stderr is not empty → trigger reflection
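
In expression form, the same check is a single comparison on the incoming item (assuming the Piston response is the If Node's direct input):

{{ $json.run.stderr === "" }}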

Step 4: The Reflector Agent

On the False path, add a second AI Agent Node with these dynamic inputs:

• {{$json.output_code}} — the code that failed
• {{$json.run.stderr}} — the exact error message

System Prompt:

You are a debugging agent.
The following code failed: {{$json.output_code}}
The error was: {{$json.run.stderr}}
Analyze why it failed.
Output ONLY the corrected code — no explanations.
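
One wrinkle: the HTTP Request node's output replaces the item, so the failed code is no longer on $json by the time execution fails. A small Code node on the False path can stitch both pieces back together before the Reflector; the node name 'Generator Agent' below is an assumption, so match it to your own workflow:

// n8n Code node (sketch): combine the failed code with its runtime error
const output_code = $('Generator Agent').item.json.output_code;  // assumed node name
const stderr = $json.run.stderr;                                 // from the Piston response
return { output_code, run: { stderr } };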

Step 5: Closing the Loop

Connect the Reflector Agent's output back to the Executor's input. Critical: add a loop counter to prevent infinite retries.

// At the start of the loop: count attempts and bail out after three
const attempts = $json.attempts || 0;
if (attempts >= 3) {
  throw new Error('Failed after 3 reflection attempts.');
}
return { ...$json, attempts: attempts + 1 };

This caps the loop at three attempts. After the third failure, the workflow surfaces the error for human review rather than spinning endlessly.

Why This Works

The key insight is that the feedback signal — the actual runtime error — contains far more information than any static prompt improvement. The model does not guess what went wrong. It reads the exact traceback and fixes the precise issue.

This pattern is especially powerful for:

• Data transformation scripts — where schema mismatches cause subtle errors
• API integration code — where authentication or endpoint formats change
• JSON generation — where structural validation fails on the first attempt

Implementing Reflexion in n8n turns your automation from a one-shot gamble into a resilient, self-healing system that improves with every iteration.