Stop Guessing. Start Optimizing. How to build Self-Improving Agents.
Why "vibes-based" prompt engineering is dead. A dive into Microsoft’s agent-lightning.
If you are still tweaking prompts like “Please be helpful” or “Think step-by-step”, you are doing it wrong. You are treating AI like a chatbot, not a system.
The era of static agents is long over.
Microsoft just quietly dropped a framework called Agent Lightning. Its main goal is to give you a structured way to train your agents: just as you train a machine learning model on data, you can train an agent on a task dataset. That training can involve Reinforcement Learning (RL) to teach it new behaviors.
Step 1: Define the Agent 🤖
Instead of a standard function, we use the @agl.rollout decorator. This tells the framework: “Track everything happening inside here.”
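The rollout calls two helpers that the snippet leaves undefined: an LLM client (llm) and a grading function (grade_booking). Here is a minimal sketch of what they might look like, using the OpenAI SDK purely as an example; the model name and the string-matching judge are assumptions, not part of Agent Lightning.

from openai import OpenAI

client = OpenAI()

class llm:
    # Thin wrapper around the chat API so the rollout stays readable.
    @staticmethod
    def chat(system: str, user: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model; any chat model works here
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
        )
        return resp.choices[0].message.content

def grade_booking(response: str, expected_room: str) -> float:
    # Simplest possible judge: 1.0 if the expected room shows up in the answer.
    # A real reward function could also check dates, capacity, and format.
    return 1.0 if expected_room.lower() in response.lower() else 0.0

With those in place, the decorated agent itself stays short: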
import agentlightning as agl

# The framework will inject optimized prompts here automatically
@agl.rollout
def room_booking_agent(task, prompt_template: agl.PromptTemplate):
    # 1. The Agent uses the dynamic prompt to think.
    #    It decides if it needs to check availability or just answer.
    response = llm.chat(
        system=prompt_template.format(**task),
        user=task["input"],
    )

    # 2. Reward function (the judge):
    #    1.0 = perfect booking, 0.0 = failed
    reward = grade_booking(response, task["expected_room"])
    return reward

Step 2: The Training Loop: How the magic happens
Training in Agent Lightning revolves around a clear, managed loop orchestrated by the Trainer. The diagram below illustrates this core interaction:
The Loop Explained:
Algorithm → Agent (through the Trainer):
The Algorithm is the "brain." It creates a better prompt template and decides which tasks the Agent should work on. The Trainer sends both the prompt and the tasks to the Agent.

Agent → Algorithm (through the Trainer):
For each task, the Agent follows the prompt template and tries to solve it. While doing so, it records everything it does (these records are called spans) and computes a reward based on how well it did. It then sends the spans and rewards back to the Algorithm through the Trainer.

Algorithm Learning:
The Algorithm looks at the spans and rewards and learns how the Agent can do better. For example, it might produce an improved prompt. The next batch of tasks uses that better prompt, and the loop repeats.
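Stripped of the framework, the cycle the Trainer manages looks roughly like this. It is a framework-free sketch, not Agent Lightning's actual internals: the real system also records spans for every step and fans tasks out across parallel runners.

from typing import Callable

def optimization_loop(
    agent: Callable[[dict, str], float],   # returns a reward for (task, prompt)
    improve: Callable[[str, list], str],   # proposes a better prompt from results
    tasks: list[dict],
    prompt: str,
    rounds: int = 2,
) -> str:
    # Each round: run the agent on every task with the current prompt,
    # collect the rewards (plus spans, in the real framework), then ask
    # the algorithm for an improved prompt.
    for _ in range(rounds):
        results = [(task, agent(task, prompt)) for task in tasks]
        prompt = improve(prompt, results)
    return prompt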
The Algorithm:
The Algorithm is the smart part that helps the system get better. We use Automatic Prompt Optimization (APO), which works like this:
Evaluate: It first tests the current prompt by running some tasks to see how well it works.
Critique: It looks at the detailed steps from those tasks and uses a strong LLM (like GPT-5-mini) to write a textual critique of the prompt. For example: “The prompt is unclear about what to do if two rooms are equally good.”
Rewrite: Then it gives this critique and the original prompt to another LLM (like GPT-4.1-mini) to create a better prompt.
This process repeats, improving the prompt bit by bit. To use it, you simply instantiate the APO class with your settings.
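Conceptually, the critique-and-rewrite cycle boils down to two LLM calls. The sketch below is illustrative, with made-up prompt wording; the real APO class handles beams, batching, and trace formatting for you. The chat argument is any function taking (system, user) and returning a string, such as the llm.chat wrapper sketched in Step 1.

# Sketch of APO's critique -> rewrite idea (illustrative, not APO's source code)
def critique_prompt(chat, current_prompt: str, traces: str) -> str:
    # Ask a strong model where the current prompt falls short.
    return chat(
        "You are a prompt critic. Point out where this prompt caused failures.",
        f"Prompt:\n{current_prompt}\n\nExecution traces and rewards:\n{traces}",
    )

def rewrite_prompt(chat, current_prompt: str, critique: str) -> str:
    # Ask a second model to apply the critique and return the improved prompt.
    return chat(
        "You improve prompts. Return only the rewritten prompt.",
        f"Prompt:\n{current_prompt}\n\nCritique:\n{critique}",
    )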
This is where we swap “Manual Engineering” for “Machine Learning.” We use the Trainer to run the APO algorithm.
from agentlightning import Trainer, APO

# Initialize the optimizer
algo = APO(model="gpt-4o")

trainer = Trainer(
    algorithm=algo,
    n_runners=8,  # Parallel processing
    initial_resources={
        "prompt_template": "You are a booking assistant."  # Start dumb!
    },
)

# Start the loop
# This runs the agent against your dataset and evolves the prompt
trainer.fit(agent=room_booking_agent, train_dataset=my_tasks, val_dataset=dataset_val)

Training Results
The APO algorithm successfully improved the agent’s performance. We ran the example with the following hyper-parameters:
val_batch_size = 10
gradient_batch_size = 4
beam_width = 2
branch_factor = 2
beam_rounds = 2
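Assuming these map one-to-one onto APO constructor arguments (an assumption worth verifying against the library's documentation, as are the readings in the comments), the configuration would look roughly like this:

algo = APO(
    model="gpt-4o",
    val_batch_size=10,       # validation samples scored per round
    gradient_batch_size=4,   # traces fed into the critique step
    beam_width=2,            # candidate prompts kept per round
    branch_factor=2,         # rewrites generated per candidate
    beam_rounds=2,           # optimization rounds
)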
The validation accuracy on the 29-sample validation set steadily increases from 0.57 (baseline) to 0.721 (after round 2). Tuning takes around 10 minutes with 8 runners. We ran the experiment twice, and the results are shown in the chart below.
It learned edge cases, like handling room capacity constraints, that a human might forget to write into the prompt.
The Alpha
Don’t fall in love with your prompts. Let the data dictate them.
Traces > Vibes: You can’t optimize what you don’t measure. Agent Lightning handles the tracing (Spans) for you.
Scale: This works for RAG, Coding Agents, or any tool-use scenario.
