Let's All Write Good Software - Will Wilson
Transcribed from Bug Bash Conference Talk
Introduction & Conference Purpose
Speaker: Will Wilson, first speaker at the inaugural Bug Bash conference
Central Question: Why does software have bugs?
Wilson begins by challenging the accepted norm that software failures are inevitable, contrasting this with our expectations of other complex human-made systems like buildings and airplanes that we expect to work reliably.
Why Software Has Bugs: Common Explanations Examined
1. "Software Engineering Isn't Real Engineering"
Wilson's Assessment: Dismissive - the explanation is tautological
- Traditional engineering requires certification, quality standards, and auditing
- Software engineering is "the wild west" - anyone can call themselves a software engineer
- Even if true, doesn't explain why we haven't fixed this situation
2. "Nobody Cares"
Wilson's Assessment: Terrible argument
- References Hammurabi's Code (nearly 4,000 years ago): "If a builder builds a house and it collapses, the builder shall be put to death"
- Argues that even if this was once true, it no longer applies:
- "Software is eating the world" (Marc Andreessen)
- Software runs critical infrastructure (hospitals, airplanes)
- CrowdStrike outage example - likely caused deaths
- Boeing 737 MAX disasters (346 deaths) included buggy software in the failure cascade
- AI will "drastically increase the stakes for software correctness"
3. "Software Is Hard"
Wilson's Assessment: Neutral - partly true, partly false
What makes software uniquely difficult:
Quote from Fred Brooks (The Mythical Man Month):
"The programmer, like the poet, works only slightly removed from pure thought stuff. He builds his castles in the air from air, creating by exertion of the imagination."
Two fundamental challenges:
1. Infinite Malleability: Software components can modify the "laws of physics" under other components
- Hardware engineers worry about interference between components
- Software engineers must worry about components reaching down and modifying the foundational systems
2. Near-Zero Marginal Cost: No natural pressure to simplify
- Hardware has mass, volume, and power budgets that force constraint
- Software has no such external pressure - all pressure is self-imposed
- Results in "teetering monstrosities" like Electron apps
4. "Software Is Early"
Wilson's Assessment: Favorite explanation - doesn't get enough attention
Timeline Analysis:
- Programming computers: ~1950s
- Software "eating the world": ~1990s (being generous)
- Only ~30 years of experience with software at scale
- 30 years is "ridiculously short" in human timescale
Airplane Safety Analogy:
- 30 years after commercial aviation became popular, airplanes were incredibly dangerous
- Airplanes have been getting "twice as safe every 10 years, reliably" - like Moore's Law
- Shows exponential improvement over decades
The Safety vs. Popularity Graph:
- Early: Airplanes dangerous but few people flying (low absolute deaths)
- Middle: Airplanes getting safer but much more popular (deaths increase)
- Later: Safety improvements outpace popularity growth (deaths decrease despite exponential growth)
Software's Position: We're in the middle of this graph - software is everywhere and controlling everything, and it's more reliable than it was in the 1990s, but absolute incident counts keep rising because usage is growing exponentially.
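A toy model makes the shape of this graph concrete (the numbers and curves below are illustrative assumptions, not figures from the talk): if adoption saturates logistically while per-unit risk halves every decade, absolute deaths rise with popularity and then fall as safety improvements win out.

```python
# Toy model of the safety-vs-popularity hump (assumed numbers).
import math

def usage(t, cap=1e9, midpoint=40, rate=0.15):
    """Flights per year at year t: logistic adoption curve."""
    return cap / (1 + math.exp(-rate * (t - midpoint)))

def risk(t, initial=1e-4, halving_years=10):
    """Per-flight fatality risk, halving every decade."""
    return initial * 0.5 ** (t / halving_years)

for t in range(0, 101, 10):
    # absolute deaths = popularity x danger; peaks mid-curve
    print(f"year {t:3d}: ~{usage(t) * risk(t):,.0f} fatalities/year")
```

Under these assumptions the output rises for roughly the first four decades, then falls even as usage keeps growing - the same trajectory Wilson argues software is in the middle of.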
The Fundamental Challenge: No Silver Bullet
Turing's Proof (the halting problem): No mechanical process can determine with perfect accuracy whether an arbitrary program will do the right thing.
Why this is actually good news: The proof shows that computer programs are very powerful and "can do anything" - power and unpredictability are inextricably linked.
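A compact way to see the diagonal argument behind this (an illustrative sketch only; `halts` is the hypothetical oracle that provably cannot exist):

```python
def halts(program, argument) -> bool:
    """Hypothetical perfect oracle: does program(argument) halt?
    Turing's diagonal argument shows this cannot be implemented."""
    raise NotImplementedError

def paradox(program):
    # Do the opposite of whatever the oracle predicts about
    # running `program` on its own source.
    if halts(program, program):
        while True:   # predicted to halt -> loop forever
            pass
    # predicted to loop -> halt immediately

# halts(paradox, paradox) can be neither True nor False without
# contradicting paradox's actual behavior, so no such oracle exists.
```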
Solution: "We need to do all the things"
- Better languages that make bugs harder to write
- Formal methods for proving program properties
- Testing to verify properties in the real world
- Observability to notice and react when tests fail
- People and culture to make it all happen
Testing vs. Observability: False Opposition
Wilson's Position: Testing and observability are wrongly viewed as antagonistic when they share the same goal - software that works.
Wilson's Background:
- Primarily a "testing guy" interested in software testing for quality
- Also an "observability guy" responsible for large production systems
- Goal: Create synthesis better than either approach alone
The Observability Worldview
Central Insight: You control your software until deployment, then you have no control.
The Reality of Production:
- Users (good and bad, unpredictable)
- Hardware, networks, system administrators
- All can help or harm your software unpredictably
Observability Response:
- Don't try to predict everything
- Be excellent at monitoring and reacting quickly
- Collect production data (metrics, logs, spans, traces)
- Set automated alerts (a minimal sketch follows this list)
- Enable proactive analysis and response
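To make "set automated alerts" concrete, here is a minimal sketch assuming a hypothetical per-window request/error counter (not any particular monitoring product's API):

```python
# Minimal alerting sketch: page when the windowed error rate
# crosses a threshold. WindowStats is an assumed metric shape.
from dataclasses import dataclass

@dataclass
class WindowStats:
    requests: int
    errors: int

def should_page(stats: WindowStats, threshold: float = 0.01) -> bool:
    """Page the on-call if the windowed error rate exceeds threshold."""
    if stats.requests == 0:
        return False
    return stats.errors / stats.requests > threshold

# e.g. 830 failures out of 52,000 requests (~1.6%) triggers a page
if should_page(WindowStats(requests=52_000, errors=830)):
    print("PAGE: error rate above 1% in the last window")
```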
Epistemic Humility: No pretense of complete knowledge - acknowledges uncertainty.
The Testing Critique
How Observability Views Testing:
- System you built and control
- Test you built and control
- Run test until green
- Ship with confidence
- "Obviously it's not [going to work]"
The Problem: Test environment doesn't match production reality.
Autonomous Testing: Bridging the Gap
Definition: Term coined by Wilson's team 5 years ago to unify various approaches that kept being "reinvented" by different communities.
Previous Names for the Same Concept:
- Fuzzing
- Property-based testing
- Generative testing
- Deterministic simulation testing
- Rare event simulation (physics)
Core Difference from Conventional Testing:
- Conventional: Write individual tests
- Autonomous: Write test generators
Test Generators:
- If run infinitely, would output all possible tests
- In practice, creates probability distribution over all possible tests
- Distribution is "leaky" - includes cases you never thought to test
Key Advantage: Replaces testing "what you thought to test" with testing via "evil model of evil user" that's probabilistic and unpredictable.
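A minimal hand-rolled sketch of the idea, assuming a toy `KVStore` as the system under test: rather than writing individual tests, we define a distribution over operation sequences and check an invariant against a trusted reference model.

```python
# Hand-rolled test generator in the property-based style.
import random

class KVStore:
    """Toy system under test: an in-memory key-value store."""
    def __init__(self):
        self.data = {}
    def put(self, k, v): self.data[k] = v
    def delete(self, k): self.data.pop(k, None)
    def get(self, k): return self.data.get(k)

def generate_test(rng, length=50):
    """One draw from the distribution over all possible tests."""
    return [(rng.choice(["put", "delete", "get"]),
             rng.randint(0, 9), rng.randint(0, 99))
            for _ in range(length)]

def run_test(ops):
    store, model = KVStore(), {}   # model = trusted reference
    for op, k, v in ops:
        if op == "put":
            store.put(k, v); model[k] = v
        elif op == "delete":
            store.delete(k); model.pop(k, None)
        # invariant: the store always agrees with the model
        assert store.get(k) == model.get(k), (op, k, v)

rng = random.Random(0)
for _ in range(1000):   # run as many generated tests as we like
    run_test(generate_test(rng))
```

Each run samples a test you probably never would have written by hand, which is exactly the "leaky distribution" advantage described above.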
Adding Fault Injection
Beyond Bad Users: Simulate bad world conditions
- Traditional test fixtures cause specific failures at specific points
- Autonomous approach: Create probability distribution over all possible failures
- Result: Test generator outputs both user behaviors AND world failures
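Extending that sketch with fault injection, the same generator can interleave simulated world failures with user operations (the fault names and probabilities here are illustrative assumptions, not Wilson's actual implementation):

```python
# Generator that mixes injected faults into the user-op stream.
import random

FAULTS = ["network_partition", "disk_full", "clock_skew", "process_crash"]

def generate_run(rng, length=100, fault_prob=0.05):
    """Yield a mixed stream of user ops and injected faults."""
    for _ in range(length):
        if rng.random() < fault_prob:
            yield ("fault", rng.choice(FAULTS))
        else:
            yield ("user_op", rng.choice(["read", "write", "reconnect"]))

rng = random.Random(42)
for kind, detail in generate_run(rng):
    # in a real harness, "fault" events would be applied to the
    # simulated environment and "user_op" events to the system under test
    print(kind, detail)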
Benefits of Test Environment:
- Massive Parallelism: Scale fake users infinitely (unlike real users)
- Complete Control: Better debugging tools, no production outage risk
- Timing Advantage: Find problems before users do, debug under less pressure
Pre-Observability: The Synthesis
Current Testing Limitation: Tests only provide pass/fail results - the equivalent of an alerting system with a single, crude alert
Missing Capability: Proactive exploration without being "paged"
- Good SREs read production logs proactively
- Learn about system behavior even when not getting alerts
- Similar to people management - "walking around looking for trouble"
The Vision: Apply observability techniques to test environments
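One simple way to "walk around looking for trouble" in test logs, assuming plain-text log lines: flag message templates that have never been seen before. The digit-stripping templating heuristic below is a deliberate simplification.

```python
# Flag never-before-seen log templates in fresh test runs.
import re

def template(line: str) -> str:
    """Collapse numbers/ids so similar messages share one template."""
    return re.sub(r"\d+", "<N>", line.strip())

def novel_templates(baseline_logs, new_logs):
    known = {template(l) for l in baseline_logs}
    return sorted({template(l) for l in new_logs} - known)

baseline = ["request 41 ok", "request 99 ok"]
fresh = ["request 7 ok", "compaction revision 86 dropped"]
print(novel_templates(baseline, fresh))
# -> ['compaction revision <N> dropped']
```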
Concrete Example: The 3 AM Page
Scenario: Transient issue that happens 2-3 times every 6 months
- Alert at 3 AM with cryptic message
- Reboot server, problem goes away temporarily
- No idea what causes it
- Best you can do: Add logging and hope
Pre-Observability Approach:
- Check Test Environment: Has this issue been happening in tests?
- Higher Frequency: Fault injection makes rare bugs more common
- Statistical Power: Can now bisect and correlate with changes (see the sketch after this list)
- Log Analysis: Enough examples to find patterns and correlations
- Time Travel: Rewind simulation to see system state before bug
- Safe Experimentation: Try fixes without risking production
- Close the Loop: Convert findings into test properties/alerts
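A sketch of the "Statistical Power" step referenced above, assuming a hypothetical store of recorded test runs keyed by commit: once a failure signature appears in many runs, finding the first bad commit is a simple scan.

```python
# Correlate a failure signature with the commit that introduced it.
# The run-record shape is an assumption for illustration.
from dataclasses import dataclass

@dataclass
class TestRun:
    commit: str
    logs: list[str]

def first_bad_commit(runs, signature, commit_order):
    """Earliest commit (in history order) whose runs show the signature."""
    bad = {r.commit for r in runs
           if any(signature in line for line in r.logs)}
    for commit in commit_order:
        if commit in bad:
            return commit
    return None

runs = [
    TestRun("abc1", ["all ok"]),
    TestRun("abc2", ["warn: value compacted away"]),
    TestRun("abc2", ["all ok"]),           # transient: not every run fails
    TestRun("abc3", ["warn: value compacted away"]),
]
print(first_bad_commit(runs, "compacted away", ["abc1", "abc2", "abc3"]))
# -> abc2
```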
Example Investigation:
- Find associated error about "something being compacted away"
- Determine errors always occur together
- Query etcd endpoints: two replicas have value 86, one has no value (comparison sketched after this list)
- Rewind time: all replicas previously had same value
- Try writes to unstick database
- Identify as etcd bug, not application bug
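Mechanically, the divergence check above amounts to reading the same key from every replica and comparing; in this sketch `read_key` is a hypothetical helper, not a real etcd client API.

```python
# Compare one key across all replicas and report divergence.
ENDPOINTS = ["replica-a:2379", "replica-b:2379", "replica-c:2379"]

def read_key(endpoint: str, key: str):
    """Hypothetical: read `key` from one replica; None if absent."""
    raise NotImplementedError

def compare_replicas(key: str) -> dict:
    values = {ep: read_key(ep, key) for ep in ENDPOINTS}
    if len(set(values.values())) > 1:
        print(f"DIVERGENCE on {key!r}: {values}")
    return values

# In the investigation above, two replicas returned 86 and one
# returned nothing -- evidence the bug lived in etcd, not the app.
```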
Conclusion: The Future of Software Quality
What We've Built: A workflow that looks like both testing and observability
Key Innovation: "Pre-observability" - observability for an alternate universe with worst-case users and conditions
The Process:
- Take real production alerts
- Ask test system: "Have you seen this before?"
- Use autonomous testing + fault injection for amplification
- Leverage controlled environment for deep debugging
- Create test properties/alerts for future prevention
Wilson's Assessment: This approach is "tremendously powerful" and being rolled out to customers.
Conference Goal: Bring together diverse perspectives on making software good and safe, emphasizing that this is about software correctness, not just testing.
Meta-Commentary
Wilson positions Bug Bash as a "software correctness conference" hosted by the testing company Antithesis, while emphasizing that it's not just about testing: it brings together communities from languages, formal methods, testing, observability, and culture to work toward the shared goal of software that works.
The talk argues for synthesis rather than division between different approaches to software quality, using autonomous testing enhanced with observability techniques as a concrete example of how seemingly opposed methodologies can be combined for better results.