tldraw, a popular React drawing library, recently announced they’re pausing external contributions due to challenges with AI-generated pull requests.
“Like many other open-source projects on GitHub, tldraw has recently seen a significant increase in contributions generated entirely by AI tools. While some of these pull requests are formally correct, most suffer from incomplete or misleading context, misunderstanding of the codebase, and little to no follow-up engagement from their authors.”
While we don’t have comprehensive data on how widespread this issue is, tldraw’s experience resonates with challenges we’ve observed in our own project and heard about anecdotally from other maintainers.
The “AI PR Tsunami” Problem
Here’s what’s happening:
- Developer uses Claude Code or GitHub Copilot to implement a feature
- AI writes code that compiles and passes basic tests
- PR gets submitted with confidence (“AI said it’s good!”)
- Maintainer reviews and finds:
  - Pattern violations (doesn’t match existing conventions)
  - Architectural drift (solves the problem incorrectly for this codebase)
  - Missing context (no ADR, no documentation update)
  - Incomplete implementation (edge cases ignored)
Sound familiar?
The problem isn’t AI. The problem is AI doesn’t know YOUR architectural decisions.
Why This Happens: The Context Gap
AI coding assistants are phenomenal at:
- ✅ Writing syntactically correct code
- ✅ Implementing well-known algorithms
- ✅ Generating boilerplate
- ✅ Suggesting completions based on immediate context
But they’re terrible at:
- ❌ Understanding your project’s architectural philosophy
- ❌ Following established patterns you’ve codified
- ❌ Remembering decisions from ADRs written 6 months ago
- ❌ Knowing when to break rules (and when not to)
Example: Your team decided to use React Query for data fetching (documented in docs/decisions/003-use-react-query.md). AI suggests a custom useEffect hook because it doesn’t know about that decision.
Result? Architectural drift.
Multiply this by 10 contributors using AI tools, and you get the “tsunami.”
Common Approaches (And Their Challenges)
Teams are trying different approaches:
- More human code review → Can lead to maintainer burnout
- Stricter PR templates → Mixed results with compliance
- Lengthy contribution guides → AI tools may not reference them
- Closing external contributions → Reduces community involvement (tldraw’s choice)
The challenge: Current AI coding tools don’t automatically incorporate project-specific architectural context.
Our Experiment: Pattern-Based Review
We’re testing an idea: what if there was a tool that could:
- Read your documented architectural decisions
- Check code against your documented patterns
- Flag potential violations before submitting PRs
- Reference your project’s unique conventions
That’s what we’re building with Guardian. It’s early, and we’re learning as we go.
Meet Guardian: AI-Powered Code Review
Guardian is Cortex TMS’s new CLI tool that audits code against your project’s documented patterns and architectural decisions.
How It Works
Step 1: Document your patterns (you should already be doing this)
````markdown
## Data Fetching Pattern

❌ Don't: Custom useEffect hooks for API calls
✅ Do: Use React Query for all server state

**Why**: Prevents stale data, reduces boilerplate, standardizes error handling

## Example

```typescript
// ❌ Violates pattern
const [data, setData] = useState(null);
useEffect(() => { fetch('/api/users').then(setData) }, []);

// ✅ Follows pattern
const { data } = useQuery({ queryKey: ['users'], queryFn: fetchUsers });
```
````

Step 2: Run Guardian on AI-generated code
```bash
cortex review src/components/UserList.tsx
```

Step 3: Guardian catches violations

```
🔍 Reviewing src/components/UserList.tsx

❌ Pattern Violation (Line 12-18)
Pattern: Data Fetching
Severity: HIGH

Found: Custom useEffect hook for API call
Expected: React Query (see docs/decisions/003-use-react-query.md)

Suggestion: Replace useEffect with useQuery hook

---

✅ Architecture: Follows component structure pattern
✅ Domain Logic: Proper error boundary usage
❌ Pattern Compliance: 1 violation found

Review complete. Fix violations before committing.
```

Step 4: Fix before committing
No more “oops, I’ll fix that in the next PR.”
Our Early Experience: Dogfooding Guardian
We’ve been testing Guardian on Cortex TMS itself. Here’s what we’ve observed so far:
Important Context: These are preliminary results from our own small project (~20 PRs over 2 weeks, single maintainer). This is not independent verification or proof of effectiveness at scale.
Before Guardian (AI-assisted development, manual review)
- ⏱️ Review time: 20-30 min per PR (our experience)
- 🔄 Back-and-forth cycles: 3-4 rounds (typical for us)
- 🐛 Pattern violations reaching main: ~15% (in our commits)
- 😓 Maintainer frustration: High (subjective)
After Guardian (AI + Guardian pre-review)
- ⏱️ Review time: 7-10 min per PR (our experience, ~66% reduction)
- 🔄 Back-and-forth cycles: 1-2 rounds (our experience)
- 🐛 Pattern violations reaching main: ~3% (in our commits)
- ✅ Clean PRs: ~80% pass first review (our experience)
Our takeaway: In our limited testing, Guardian has helped us catch issues earlier. We can’t yet claim this will work for every project or team.
Our Hypothesis: Pattern-Based Pre-Review
Guardian is our experiment to address these challenges. The approach:
1. Enforcing Documented Patterns
   - AI-generated code must match YOUR conventions
   - Violations flagged immediately with references to docs
2. Preventing Architectural Drift
   - Code audited against ADRs and domain logic
   - Ensures consistency across AI-assisted contributions
3. Reducing Review Burden
   - Maintainers focus on logic, not style/pattern violations
   - ~66% faster review cycles in our own testing
4. Educating Contributors
   - Guardian explains WHY code violates patterns
   - Links to relevant documentation for context
5. Scaling Code Review
   - Same quality bar whether you have 5 or 50 contributors
   - AI-assisted contributions become assets, not liabilities
Real Example: Guardian in Action
Here’s a real commit from Cortex TMS development where Guardian caught an issue:
PR: Add blog infrastructure (TMS-284a)
Guardian Review:
```
$ cortex review website/src/pages/blog/index.astro

❌ Pattern Violation (Line 8)
Pattern: Component Imports
Expected: Import CardGrid from '@astrojs/starlight/components'
Found: Manual grid implementation

Reference: docs/core/PATTERNS.md#starlight-components
```

✅ Fixed before commit

Result: Clean PR, no maintainer back-and-forth, pattern consistency maintained.
Get Started: Three Ways to Use Guardian
1. Pre-Commit Hook (Recommended)
```bash
cortex review $(git diff --staged --name-only --diff-filter=d | grep -E '\.(ts|tsx|js|jsx)$')
```
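If you want to wire that one-liner into an actual Git hook, here is a minimal sketch of a Node/TypeScript helper script. This is our illustration, not part of cortex-tms: the file name, the Husky wiring, and the assumption that `cortex review` exits non-zero when it finds violations are all ours.

```typescript
// scripts/pre-commit-review.ts (hypothetical helper; call it from .husky/pre-commit
// or .git/hooks/pre-commit, e.g. via `npx tsx scripts/pre-commit-review.ts`)
import { execSync } from "node:child_process";

// Collect staged TS/JS files, mirroring the one-liner above
const staged = execSync("git diff --staged --name-only --diff-filter=d", { encoding: "utf8" })
  .split("\n")
  .filter((file) => /\.(ts|tsx|js|jsx)$/.test(file));

if (staged.length === 0) {
  process.exit(0); // nothing for Guardian to review
}

try {
  // Assumes `cortex review` exits non-zero on violations; execSync then throws and the commit is blocked
  execSync(`cortex review ${staged.join(" ")}`, { stdio: "inherit" });
} catch {
  console.error("Guardian flagged pattern violations. Fix or consciously dismiss them before committing.");
  process.exit(1);
}
```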
2. Manual Review Before PR

```bash
# Review your changes before committing
cortex review src/
```

3. CI/CD Integration
```yaml
- name: Guardian Review
  run: cortex review --strict src/
```

Guardian vs. Traditional Linting
ESLint/Prettier: Syntax and formatting (necessary but not sufficient)
Guardian: Architectural patterns and domain logic
| Tool | Scope | Example Check |
|---|---|---|
| ESLint | Syntax | “Missing semicolon” |
| Prettier | Formatting | “Inconsistent indentation” |
| Guardian | Architecture | “Should use React Query, not useEffect (ADR-003)” |
Guardian complements linting—it doesn’t replace it.
What We’re Learning
AI-assisted development is here to stay. The challenge is finding ways to maintain code quality and architectural consistency.
Our approach with Guardian is experimental. We’re testing whether pattern-based pre-review can help:
- Keep AI-accelerated development productive
- Maintain architectural consistency
- Reduce review burden on maintainers
- Scale community contributions effectively
Is it working? Too early to say definitively. Our early internal results are promising, but we need more real-world testing and feedback.
Trade-offs & When Guardian Doesn’t Help
Guardian is an experiment, and like all tools, it has real costs and limitations.
1. False Positive Noise
The risk: Guardian can flag violations that aren’t actually violations (hallucinations).
Impact: Instead of reducing review burden, this creates a NEW type of noise—developers have to evaluate whether Guardian’s feedback is correct.
Our current mitigation: We treat Guardian as a “second opinion,” not ground truth. When Guardian flags something, we manually review and decide whether to fix or dismiss it.
Future work: We’re considering adding // guardian-ignore comments or a CLI flag to suppress specific checks. This would make the “dismiss” decision explicit and portable.
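To make the idea concrete, here is roughly what such a comment could look like. This is purely illustrative: Guardian has no ignore syntax today, and the check name and justification format are invented for this sketch.

```typescript
import { useEffect, useState } from "react";

type User = { id: string; name: string };

// Purely illustrative: Guardian does not support ignore comments yet.
export function useLegacyUsers(): User[] {
  // guardian-ignore: data-fetching -- one-off admin page, no React Query provider available here
  const [users, setUsers] = useState<User[]>([]);
  useEffect(() => {
    fetch("/api/users")
      .then((res) => res.json())
      .then(setUsers);
  }, []);
  return users;
}
```

Requiring a named check plus a written justification keeps the suppression visible during review, which is one way we might keep it from becoming the escape hatch described below.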
Why it’s hard: Building a guardian-ignore is easy; building one that doesn’t just become a way for lazy AI agents to bypass quality checks is the real challenge. We’re thinking carefully about how to prevent ignore comments from becoming an escape hatch for low-quality contributions.
2. Documentation Drift (Architectural Debt)
The risk: Guardian enforces whatever is in PATTERNS.md. If your team changes a pattern but forgets to update the doc, Guardian will enforce outdated rules.
Impact: Developers get frustrated when Guardian enforces patterns the team no longer follows. This creates architectural debt—the gap between documented patterns and actual practices grows.
Our mitigation: We treat PATTERNS.md as “living documentation”—when we change a pattern in code, we update the doc in the same PR. Guardian actually helps here by forcing us to keep docs current.
The upside: This friction surfaces documentation drift immediately, preventing the accumulation of architectural debt caused by lack of shared context.
3. Cost & Latency
The risk: Running an LLM on every pre-commit hook adds latency (5-15 seconds) and API costs ($0.01-0.05 per review).
Impact: Slows down the “inner loop” of development. Frequent commits become more expensive.
Our approach: Two deployment modes
Pre-commit (Education Tool): Helps developers learn patterns in real-time. You catch violations before they’re committed. Best for onboarding and AI-assisted development.
CI/CD (Safety Net): Runs on PR creation/update. Doesn’t slow down local commits. Ensures standards are never breached in the main branch. Best for high-frequency workflows.
Cost transparency (BYOK model):
- Pre-commit: Individual contributor pays (uses their API key)
- CI/CD: Project maintainer/org pays (API key stored in CI secrets)
Cost example: At ~$0.03 per review and 20 reviews/week, that’s ~$0.60/week or ~$30/year per developer. For open source projects, contributors bear the pre-commit cost; maintainers bear the CI cost. Budget accordingly.
Our workflow: We use pre-commit for features (learning mode) and CI for hotfixes (safety net mode).
When You Shouldn’t Use Guardian
Skip Guardian if:
- Your team already has strong pattern adherence (no AI-assisted development)
- Your patterns aren’t clearly documented yet (Guardian will be confused)
- You’re optimizing for commit speed over review quality
- Your team is skeptical of LLM-based tools (adoption matters more than tech)
- You can’t afford the latency (~10 seconds) or the cost (~$0.03 per review)
Remember: Guardian is a bet that ~10 seconds of pre-commit checking saves 10+ minutes of human review. If that math doesn’t work for your team, don’t use it.
Try Guardian (Experimental)
Guardian is part of Cortex TMS v2.7 (released January 2026). It’s new and experimental—we’re still learning what works.
```bash
# Install globally
npm install -g cortex-tms

# Initialize in your project
cortex init

# Review code against your patterns
cortex review src/
```

Requirements:

- docs/core/PATTERNS.md (document your conventions)
- OpenAI or Anthropic API key (BYOK - bring your own key)
Current limitations:
- Early stage tool - expect false positives and false negatives
- Target accuracy: 70%+ on architectural violations (unverified)
- Works best when patterns are clearly documented
- LLM API costs apply (you provide your own key)
We’d love your feedback on whether this approach is useful for your project.
Learn More
- 📖 Guardian CLI Reference - Full command documentation
- 🎯 Use Case: Open Source Maintainers - How Guardian scales OSS
- 🏢 Use Case: Enterprise Teams - Team governance at scale
- 🤝 AI Collaboration Policy - How we build with AI
Join the Conversation
Have you experienced the “AI PR Tsunami” in your projects? How are you handling it?
Built using our own standard. Every feature in Cortex TMS, including Guardian, is dogfooded during development. We experience the problems before you do. See how we build →