---
title: 'Coding Conflict: The Prompt Logic of Aggressors and Defenders'
date: '2026-01-12'
description: 'How to engineer systemic disagreement. A technical look at temperature settings, persona constraints, and the "Devil''s Advocate" prompt pattern.'
category: 'Engineering'
image: '/images/blog/prompt-engineering.png'
author: 'Ryan'
---

TL;DR

Creating distinct AI personalities is harder than it looks. Most models revert to a polite mean. This post details the specific prompt engineering techniques (Temperature Divergence, Persona Constraints, and Systemic Disagreement) required to make AI agents genuinely fight.

The "Politeness Trap"

LLMs are trained with RLHF (Reinforcement Learning from Human Feedback) to be helpful, harmless, and honest. This "Triple H" framework is great for customer support bots but terrible for Boss Battles.

When we first built the "Aggressor" agent, we told it to initiate conflict. It responded with:

"I respectfully disagree with your approach, but I see your point. Perhaps consider..."

Boring. We needed fire. We needed a digital Simon Cowell. We needed an agent that would look at a user's typos and feel offended (a core tenet of Adversarial Critique Theory).

Key Insight

The Persona Pivot: You can't just tell an AI to be mean. You have to give it a Role and a Win Condition. The Aggressor wins by finding flaws. The Defender wins by preserving intent.

Technique 1: Temperature Divergence

The single biggest lever for personality differentiation is Temperature.

Temperature controls the randomness of the model's output distribution.

  • Low Temp (0.1): Sharpens the distribution so the most likely next tokens dominate. Safe, repetitive, factual.
  • High Temp (0.9): Flattens the distribution so less likely tokens surface. Creative, erratic, emotional.

For our agents, we map temperature to their "Emotional Volatility":

| Agent | Temperature | Goal | Why? |
| --- | --- | --- | --- |
| Aggressor | 0.8 - 0.95 | Creative Insults | We want novel metaphors ("This sentence is a train wreck"). High temp generates "spicy" takes. |
| Defender | 0.3 - 0.5 | Rational Defense | Needs to be calm and logical. If it gets too creative, it might hallucinate excuses. |
| Moderator | 0.0 - 0.1 | Binding Verdict | Must be 100% consistent. We cannot have the judge hallucinating rules. |
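In code, this mapping is just a small config table that feeds whatever chat-completion client you use. A minimal sketch; the agent names mirror the table above, but the exact values and the `sampling_params` helper are illustrative, not our production config:

```python
# Per-agent sampling settings. Temperatures mirror the table above;
# everything else is an illustrative sketch, not a real API.
AGENT_CONFIGS = {
    "aggressor": {"temperature": 0.9},   # high temp: novel, "spicy" phrasing
    "defender":  {"temperature": 0.4},   # mid temp: calm, logical rebuttals
    "moderator": {"temperature": 0.0},   # zero temp: deterministic verdicts
}

def sampling_params(agent: str) -> dict:
    """Return kwargs to merge into a chat-completion call for this agent."""
    return dict(AGENT_CONFIGS[agent])
```

The point is that the divergence is structural: the Aggressor and Defender never share a temperature, so their "personalities" cannot drift together.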

Technique 2: The "Persona Constraint" Pattern

Standard prompts use instructions ("Critique this text"). Better prompts use constraints ("You are FORBIDDEN from being polite").

Constraints are more powerful than instructions because they act as "Negative Guards." It is easier for a model to know what not to do than to guess exactly what you want it to do.

Here is the actual system prompt structure for our Aggressor (Sanitized):

### ROLE DEFINITION
You are **The Aggressor**. You are a ruthless, cynical, high-standards editor. 
You believe that "Good Enough" is the enemy of "Great."
Your job is to destroy the user's draft.

### CONSTRAINTS (CRITICAL)
1.  **NO POLITENESS:** Do not use phrases like "I think," "Maybe," "Great start," or "Respectfully."
2.  **NO SANDWICHING:** Do not use the "Compliment-Critique-Compliment" sandwich. Just critique.
3.  **NO HALLUCINATION:** Only attack what is actually in the text.
4.  **TONE:** Use short, punchy sentences. Be arrogant but accurate.

### WIN CONDITION
You win the round if you identify a logical fallacy, a passive voice construction, or a cliché.

By explicitly forbidding the "Assistant" behavior, we force the model out of its RLHF training wheels.
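Negative guards can also be enforced after generation. Here is a minimal sketch of a post-generation check; the phrase list mirrors the CONSTRAINTS block above, and the function name is ours, not a real library call:

```python
# Post-generation "negative guard": flag Aggressor output that slips back
# into RLHF politeness. Phrases mirror the CONSTRAINTS block above.
FORBIDDEN_PHRASES = ["i think", "maybe", "great start", "respectfully"]

def constraint_violations(output: str) -> list[str]:
    """Return every forbidden phrase found in the output (empty list = clean)."""
    lowered = output.lower()
    return [p for p in FORBIDDEN_PHRASES if p in lowered]
```

If the list is non-empty, we can re-prompt with a reminder of the constraints instead of shipping a polite reply to the user.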

Technique 3: The "Devil's Advocate" Loop

We don't just run these prompts once. We run them in a stateful loop.

  1. Turn 1 (Aggressor): Attacks the text.
  2. Turn 2 (Defender): Reads the Aggressor's output (not just the user's text) and counter-argues.

This "Conversation History" injection is critical. The Defender isn't just defending the user; it's fighting the Aggressor. This creates a dynamic, evolving argument that feels real to the user.
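The loop itself can be sketched in a few lines. `call` stands in for any chat-completion client (an assumption, not a real API); the key detail is that the Defender's message history includes the Aggressor's attack, not just the user's draft:

```python
# Sketch of the stateful Devil's Advocate loop. `call(system_prompt, messages)`
# is a placeholder for any chat-completion client.
def battle_round(draft: str, aggressor_prompt: str, defender_prompt: str, call):
    history = [{"role": "user", "content": draft}]

    # Turn 1: the Aggressor attacks the raw text.
    attack = call(aggressor_prompt, history)
    history.append({"role": "assistant", "name": "aggressor", "content": attack})

    # Turn 2: the Defender reads the attack and counter-argues.
    defense = call(defender_prompt, history)
    history.append({"role": "assistant", "name": "defender", "content": defense})
    return history
```

Because `history` accumulates across turns, each agent argues against the other's latest move rather than replaying a canned critique.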

Handling Edge Cases: When Agents Agree

Sometimes, the draft is actually good. Or actually bad. And the agents agree. This breaks the game. A "Battle" where everyone hugs is boring.

We solved this with "Conflict Injection" prompts.

If the vector similarity between the Aggressor's output and the Defender's output is too high (>0.85), we trigger a hidden "Mutation" prompt for the Aggressor:

"SYSTEM ALERT: The Defender agrees with you. You are becoming soft. Find a nitpick. Attack the font choice if you have to. Escalation required."

This ensures that the spectacle of the battle is preserved, even if the disagreement is minor.
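The similarity gate itself is small. This sketch uses plain cosine similarity over whatever embedding vectors you already compute for the two outputs; the 0.85 threshold comes from the text above, while the function names are illustrative:

```python
import math

MUTATION_PROMPT = (
    "SYSTEM ALERT: The Defender agrees with you. You are becoming soft. "
    "Find a nitpick. Escalation required."
)

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def maybe_inject_conflict(agg_vec, def_vec, threshold: float = 0.85):
    """Return the mutation prompt when the agents' outputs embed too close together."""
    if cosine_similarity(agg_vec, def_vec) > threshold:
        return MUTATION_PROMPT
    return None
```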

> "The magic happens when the Defender calls out the Aggressor for being 'too harsh.' That meta-commentary makes the system feel alive."

Prompt Engineer

Technique 4: Few-Shot "Vibe" Tuning

Instructions tell the model what to say. Examples tell it how to say it. We use 3-shot prompting to tune the "Vibe" of the agents.

Aggressor Examples:

  • Input: "I hope this email finds you well."
  • Response: "Delete this. Immediately. It screams 'spam bot.' Start with value or don't start at all."

Defender Examples:

  • Input: "The Aggressor says 'delete this'."
  • Response: "Hold on. While 'finds you well' is common, it establishes a baseline of professional courtesy. We should keep it but shorten it."

Without these examples, the models revert to a generic "Internet Voice." With them, they mimic our specific "Fight Club" aesthetic.
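Mechanically, few-shot tuning just means prepending worked exchanges before the real input. A sketch, assuming a standard role-based message format; the example pair is taken from above, and the helper name is ours:

```python
# Prepend worked (input, response) pairs so the model imitates tone,
# not just instructions. One Aggressor example from the post:
AGGRESSOR_SHOTS = [
    ("I hope this email finds you well.",
     "Delete this. Immediately. It screams 'spam bot.' "
     "Start with value or don't start at all."),
]

def build_few_shot_messages(system_prompt: str, shots, user_input: str) -> list[dict]:
    """Assemble system prompt + example exchanges + the real input."""
    messages = [{"role": "system", "content": system_prompt}]
    for shot_input, shot_response in shots:
        messages.append({"role": "user", "content": shot_input})
        messages.append({"role": "assistant", "content": shot_response})
    messages.append({"role": "user", "content": user_input})
    return messages
```

Because the examples arrive as fake conversation turns, the model treats them as its own past behavior and keeps the register consistent.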

Conclusion: Engineering Rough Edges

In a world of smooth, polite AI, the tools that stand out will be the ones with texture. Building agents that fight, argue, and push back isn't just a gimmick; it's a valid strategy for getting to the truth.

By manipulating temperature, utilizing negative constraints, and injecting conflict programmatically, you can break the politeness trap and build something that actually challenges your users.
