Let's All Write Good Software - Will Wilson

Transcribed from Bug Bash Conference Talk


Introduction & Conference Purpose

Speaker: Will Wilson, first speaker at the inaugural Bug Bash conference

Central Question: Why does software have bugs?

Wilson begins by challenging the accepted norm that software failures are inevitable, contrasting this with our expectations of other complex human-made systems like buildings and airplanes that we expect to work reliably.


Why Software Has Bugs: Common Explanations Examined

1. "Software Engineering Isn't Real Engineering"

Wilson's Assessment: Dismisses this one as tautological; saying software engineering "isn't real engineering" merely restates that software has bugs without explaining why.

2. "Nobody Cares"

Wilson's Assessment: Terrible argument

3. "Software Is Hard"

Wilson's Assessment: Neutral - partly true, partly false

What makes software uniquely difficult:

Quote from Fred Brooks (The Mythical Man Month):

"The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination."

Two fundamental challenges:

  1. Infinite Malleability: Software components can modify the "laws of physics" that other components run under

    • Hardware engineers worry about interference between components
    • Software engineers must worry about components reaching down and modifying the foundational systems
  2. Near-Zero Marginal Cost: No natural pressure to simplify

    • Hardware has mass, volume, and power budgets that force constraint
    • Software has no such external pressure - all pressure is self-imposed
    • Results in "teetering monstrosities" like Electron apps

4. "Software Is Early"

Wilson's Assessment: Favorite explanation - doesn't get enough attention

Timeline Analysis:

Airplane Safety Analogy:

The Safety vs. Popularity Graph:

Software's Position: We're in the middle of this graph. Software is everywhere and controls everything; it is more reliable than it was in the 1990s, but the absolute number of incidents is rising because usage has grown exponentially.


The Fundamental Challenge: No Silver Bullet

Turing's Proof: No mechanical process can determine with perfect accuracy whether a software program will do the right thing.

Why this is actually good news: The proof shows that computer programs are very powerful and "can do anything" - power and unpredictability are inextricably linked.
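Turing's result is usually shown by the classic diagonalization argument: assume a perfect mechanical checker exists, then build a program that defeats it. A minimal sketch of that argument (the `is_correct` checker is hypothetical by construction; no such function can exist):

```python
def is_correct(program, data):
    """Hypothetical perfect checker: would return True iff `program`
    halts and does the right thing on `data`. Turing proved no such
    mechanical procedure can exist."""
    raise NotImplementedError("no such checker can exist")

def adversary(program):
    """Misbehaves exactly when the checker predicts good behavior.
    Running adversary on its own source contradicts whatever
    is_correct claims about it."""
    if is_correct(program, program):
        while True:  # deliberately loop forever: the "wrong thing"
            pass
    return "correct behavior"

# adversary(adversary) is the contradiction: any answer is_correct
# gives about this call is wrong, so perfect checking is impossible.
```

The same diagonal trick is why "will this program do the right thing?" is undecidable in general, which is exactly the power/unpredictability link Wilson points to.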

Solution: "We need to do all the things"


Testing vs. Observability: False Opposition

Wilson's Position: Testing and observability are wrongly viewed as antagonistic when they share the same goal - software that works.

Wilson's Background:

The Observability Worldview

Central Insight: You control your software right up until deployment; after that, you have no control.

The Reality of Production:

Observability Response:

Epistemic Humility: No pretense of complete knowledge - acknowledges uncertainty.

The Testing Critique

How Observability Views Testing:

The Problem: Test environment doesn't match production reality.


Autonomous Testing: Bridging the Gap

Definition: Term coined by Wilson's team 5 years ago to unify various approaches that kept being "reinvented" by different communities.

Previous Names for the Same Concept:

Core Difference from Conventional Testing:

Test Generators:

Key Advantage: Replaces testing "what you thought to test" with testing via "evil model of evil user" that's probabilistic and unpredictable.
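The "evil model of an evil user" idea can be sketched with a tiny randomized test generator. Everything below (the `Cart` system under test, its deliberately planted bug, the `evil_user` driver) is an illustrative toy, not Antithesis's implementation:

```python
import random

class Cart:
    """Toy system under test, with the invariant that the
    total quantity never goes negative."""
    def __init__(self):
        self.items = {}

    def add(self, item, qty):
        self.items[item] = self.items.get(item, 0) + qty

    def remove(self, item, qty):
        # Planted bug: no clamp at zero, so over-removal goes negative.
        self.items[item] = self.items.get(item, 0) - qty

    def total(self):
        return sum(self.items.values())

def evil_user(seed, steps=100):
    """Probabilistic adversary: a random sequence of operations,
    checking the invariant after every step."""
    rng = random.Random(seed)
    cart = Cart()
    for i in range(steps):
        item, qty = rng.choice(["a", "b", "c"]), rng.randint(1, 5)
        if rng.random() < 0.5:
            cart.add(item, qty)
        else:
            cart.remove(item, qty)
        if cart.total() < 0:
            return f"invariant violated at step {i} (seed={seed})"
    return None

# Generate many tests; any failure replays exactly from its seed.
failures = [f for f in (evil_user(s) for s in range(50)) if f]
```

Instead of enumerating cases the author thought of, the generator explores sequences no one thought to write down, and a failing seed makes the discovered bug deterministic to reproduce.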

Adding Fault Injection

Beyond Bad Users: Simulate bad world conditions
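A bad-world simulation can be sketched the same way: wrap real operations in an injector that misbehaves with some probability. The class and helper names here are hypothetical, purely illustrative:

```python
import random

class FlakyNetwork:
    """Illustrative fault injector: makes the 'world' misbehave with a
    configurable probability. Dropped packets, disk errors, and clock
    skew would be modeled the same way."""
    def __init__(self, seed, failure_rate=0.2):
        self.rng = random.Random(seed)
        self.failure_rate = failure_rate

    def call(self, fn, *args):
        if self.rng.random() < self.failure_rate:
            raise TimeoutError("injected network fault")
        return fn(*args)

def fetch_balance(account):
    # Stand-in for a real remote call.
    return {"alice": 100}.get(account, 0)

def fetch_with_retry(net, account, attempts=5):
    """Client code must now survive injected faults (here, by
    retrying), which surfaces error-handling bugs before production."""
    for _ in range(attempts):
        try:
            return net.call(fetch_balance, account)
        except TimeoutError:
            continue
    raise TimeoutError("all retries failed")
```

Because the injector is seeded, any fault schedule that exposes a bug can be replayed exactly.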

Benefits of Test Environment:

  1. Massive Parallelism: Scale fake users infinitely (unlike real users)
  2. Complete Control: Better debugging tools, no production outage risk
  3. Timing Advantage: Find problems before users do, debug under less pressure

Pre-Observability: The Synthesis

Current Testing Limitation: Tests only provide pass/fail results (like limited alerts)

Missing Capability: Proactive exploration without being "paged"

The Vision: Apply observability techniques to test environments

Concrete Example: The 3 AM Page

Scenario: Transient issue that happens 2-3 times every 6 months

Pre-Observability Approach:

  1. Check Test Environment: Has this issue been happening in tests?
  2. Higher Frequency: Fault injection makes rare bugs more common
  3. Statistical Power: Can now bisect, correlate with changes
  4. Log Analysis: Enough examples to find patterns and correlations
  5. Time Travel: Rewind simulation to see system state before bug
  6. Safe Experimentation: Try fixes without risking production
  7. Close the Loop: Convert findings into test properties/alerts
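The statistical-power step above can be sketched as a query over accumulated test-run data: once fault injection makes the rare failure frequent, per-commit failure rates localize the change that introduced it. The run records and field names below are hypothetical, for illustration only:

```python
from collections import Counter

# Hypothetical archive of test runs: each record notes the commit
# under test and whether the rare failure signature appeared.
runs = [
    {"commit": "c1", "signature_seen": False},
    {"commit": "c1", "signature_seen": False},
    {"commit": "c2", "signature_seen": True},
    {"commit": "c2", "signature_seen": False},
    {"commit": "c2", "signature_seen": True},
]

def failure_rate_by_commit(runs):
    """With enough amplified examples, a jump in the per-commit
    failure rate bisects the bug to a specific change."""
    totals, failures = Counter(), Counter()
    for r in runs:
        totals[r["commit"]] += 1
        failures[r["commit"]] += r["signature_seen"]
    return {c: failures[c] / totals[c] for c in totals}

rates = failure_rate_by_commit(runs)
suspect = max(rates, key=rates.get)  # commit where the rate jumps
```

The same aggregation works for correlating the signature with injected-fault types or log patterns rather than commits.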

Example Investigation:


Conclusion: The Future of Software Quality

What We've Built: A workflow that looks like both testing and observability

Key Innovation: "Pre-observability" - observability for an alternate universe with worst-case users and conditions

The Process:

  1. Take real production alerts
  2. Ask test system: "Have you seen this before?"
  3. Use autonomous testing + fault injection for amplification
  4. Leverage controlled environment for deep debugging
  5. Create test properties/alerts for future prevention

Wilson's Assessment: This approach is "tremendously powerful" and being rolled out to customers.

Conference Goal: Bring together diverse perspectives on making software good and safe, emphasizing that this is about software correctness, not just testing.


Meta-Commentary

Wilson positions Bug Bash as a "software correctness conference" hosted by testing company Antithesis, but emphasizing it's not just about testing - it brings together communities from languages, formal methods, testing, observability, and culture to work toward the shared goal of software that works.

The talk argues for synthesis rather than division between different approaches to software quality, using autonomous testing enhanced with observability techniques as a concrete example of how seemingly opposed methodologies can be combined for better results.
