Let's All Write Good Software - Will Wilson
Transcribed from Bug Bash Conference Talk
Introduction & Conference Purpose
Speaker: Will Wilson, first speaker at the inaugural Bug Bash conference
Central Question: Why does software have bugs?
Wilson begins by challenging the accepted norm that software failures are inevitable, contrasting this with our expectations of other complex human-made systems like buildings and airplanes that we expect to work reliably.
Why Software Has Bugs: Common Explanations Examined
1. "Software Engineering Isn't Real Engineering"
Wilson's Assessment: Dismissive - the explanation is tautological
- Traditional engineering requires certification, quality standards, and auditing
- Software engineering is "the wild west" - anyone can call themselves a software engineer
- Even if true, doesn't explain why we haven't fixed this situation
2. "Nobody Cares"
Wilson's Assessment: Terrible argument
- References Hammurabi's Code (nearly 4,000 years ago): "If a builder builds a house and it collapses, the builder shall be put to death"
- Argues that even if this was once true, it no longer applies:
- "Software is eating the world" (Marc Andreessen)
- Software runs critical infrastructure (hospitals, airplanes)
- CrowdStrike outage example - likely caused deaths
- Boeing 737 MAX disasters (346 deaths) included buggy software in the failure cascade
- AI will "drastically increase the stakes for software correctness"
3. "Software Is Hard"
Wilson's Assessment: Neutral - partly true, partly false
What makes software uniquely difficult:
Quote from Fred Brooks (The Mythical Man Month):
"The programmer, like the poet, works only slightly removed from pure thought stuff. He builds his castles in the air from air, creating by exertion of the imagination."
Two fundamental challenges:
1. Infinite Malleability: Software components can modify the "laws of physics" under other components
- Hardware engineers worry about interference between components
- Software engineers must worry about components reaching down and modifying the foundational systems
2. Near-Zero Marginal Cost: No natural pressure to simplify
- Hardware has mass, volume, and power budgets that force constraint
- Software has no such external pressure - all pressure is self-imposed
- Results in "teetering monstrosities" like Electron apps
4. "Software Is Early"
Wilson's Assessment: Favorite explanation - doesn't get enough attention
Timeline Analysis:
- Programming computers: ~1950s
- Software "eating the world": ~1990s (being generous)
- Only ~30 years of experience with software at scale
- 30 years is "ridiculously short" in human timescale
Airplane Safety Analogy:
- 30 years after commercial aviation became popular, airplanes were incredibly dangerous
- Airplanes have been getting "twice as safe every 10 years, reliably" - like Moore's Law
- Shows exponential improvement over decades
The Safety vs. Popularity Graph:
- Early: Airplanes dangerous but few people flying (low absolute deaths)
- Middle: Airplanes getting safer but much more popular (deaths increase)
- Later: Safety improvements outpace popularity growth (deaths decrease despite exponential growth)
Software's Position: We're in the middle of this graph - software is everywhere and controlling everything, and it's more reliable than it was in the 1990s, but absolute incident counts keep rising because usage is growing exponentially.
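A toy model makes the shape of this graph concrete (the numbers and curves below are illustrative assumptions, not figures from the talk): if adoption saturates logistically while per-unit risk halves every decade, absolute deaths rise with popularity and then fall as safety improvements win out.

```python
# Toy model of the safety-vs-popularity hump (assumed numbers).
import math

def usage(t, cap=1e9, midpoint=40, rate=0.15):
    """Flights per year at year t: logistic adoption curve."""
    return cap / (1 + math.exp(-rate * (t - midpoint)))

def risk(t, initial=1e-4, halving_years=10):
    """Per-flight fatality risk, halving every decade."""
    return initial * 0.5 ** (t / halving_years)

for t in range(0, 101, 10):
    # absolute deaths = popularity x danger; peaks mid-curve
    print(f"year {t:3d}: ~{usage(t) * risk(t):,.0f} fatalities/year")
```

Under these assumptions the output rises for roughly the first four decades, then falls even as usage keeps growing - the same trajectory Wilson argues software is in the middle of.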
The Fundamental Challenge: No Silver Bullet
Turing's Proof (the halting problem): No mechanical process can determine with perfect accuracy whether an arbitrary program will do the right thing.
Why this is actually good news: The proof shows that computer programs are very powerful and "can do anything" - power and unpredictability are inextricably linked.
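A compact way to see the diagonal argument behind this (an illustrative sketch only; `halts` is the hypothetical oracle that provably cannot exist):

```python
def halts(program, argument) -> bool:
    """Hypothetical perfect oracle: does program(argument) halt?
    Turing's diagonal argument shows this cannot be implemented."""
    raise NotImplementedError

def paradox(program):
    # Do the opposite of whatever the oracle predicts about
    # running `program` on its own source.
    if halts(program, program):
        while True:   # predicted to halt -> loop forever
            pass
    # predicted to loop -> halt immediately

# halts(paradox, paradox) can be neither True nor False without
# contradicting paradox's actual behavior, so no such oracle exists.
```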
Solution: "We need to do all the things"
- Better languages that make bugs harder to write
- Formal methods for proving program properties
- Testing to verify properties in the real world
- Observability to notice and react when tests fail
- People and culture to make it all happen
Testing vs. Observability: False Opposition
Wilson's Position: Testing and observability are wrongly viewed as antagonistic when they share the same goal - software that works.
Wilson's Background:
- Primarily a "testing guy" interested in software testing for quality
- Also an "observability guy" responsible for large production systems
- Goal: Create synthesis better than either approach alone
The Observability Worldview
Central Insight: You control your software until deployment, then you have no control.
The Reality of Production:
- Users (good and bad, unpredictable)
- Hardware, networks, system administrators
- All can help or harm your software unpredictably
Observability Response:
- Don't try to predict everything
- Be excellent at monitoring and reacting quickly
- Collect production data (metrics, logs, spans, traces)
- Set automated alerts (a minimal sketch follows this list)
- Enable proactive analysis and response
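To make "set automated alerts" concrete, here is a minimal sketch assuming a hypothetical per-window request/error counter (not any particular monitoring product's API):

```python
# Minimal alerting sketch: page when the windowed error rate
# crosses a threshold. WindowStats is an assumed metric shape.
from dataclasses import dataclass

@dataclass
class WindowStats:
    requests: int
    errors: int

def should_page(stats: WindowStats, threshold: float = 0.01) -> bool:
    """Page the on-call if the windowed error rate exceeds threshold."""
    if stats.requests == 0:
        return False
    return stats.errors / stats.requests > threshold

# e.g. 830 failures out of 52,000 requests (~1.6%) triggers a page
if should_page(WindowStats(requests=52_000, errors=830)):
    print("PAGE: error rate above 1% in the last window")
```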
Epistemic Humility: No pretense of complete knowledge - acknowledges uncertainty.
The Testing Critique
How Observability Views Testing:
- System you built and control
- Test you built and control
- Run test until green
- Ship with confidence
- "Obviously it's not [going to work]"
The Problem: Test environment doesn't match production reality.
Autonomous Testing: Bridging the Gap
Definition: Term coined by Wilson's team 5 years ago to unify various approaches that kept being "reinvented" by different communities.
Previous Names for the Same Concept:
- Fuzzing
- Property-based testing
- Generative testing
- Deterministic simulation testing
- Rare event simulation (physics)
Core Difference from Conventional Testing:
- Conventional: Write individual tests
- Autonomous: Write test generators
Test Generators:
- If run infinitely, would output all possible tests
- In practice, creates probability distribution over all possible tests
- Distribution is "leaky" - includes cases you never thought to test
Key Advantage: Replaces testing "what you thought to test" with testing via "evil model of evil user" that's probabilistic and unpredictable.
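A minimal hand-rolled sketch of the idea, assuming a toy `KVStore` as the system under test: rather than writing individual tests, we define a distribution over operation sequences and check an invariant against a trusted reference model.

```python
# Hand-rolled test generator in the property-based style.
import random

class KVStore:
    """Toy system under test: an in-memory key-value store."""
    def __init__(self):
        self.data = {}
    def put(self, k, v): self.data[k] = v
    def delete(self, k): self.data.pop(k, None)
    def get(self, k): return self.data.get(k)

def generate_test(rng, length=50):
    """One draw from the distribution over all possible tests."""
    return [(rng.choice(["put", "delete", "get"]),
             rng.randint(0, 9), rng.randint(0, 99))
            for _ in range(length)]

def run_test(ops):
    store, model = KVStore(), {}   # model = trusted reference
    for op, k, v in ops:
        if op == "put":
            store.put(k, v); model[k] = v
        elif op == "delete":
            store.delete(k); model.pop(k, None)
        # invariant: the store always agrees with the model
        assert store.get(k) == model.get(k), (op, k, v)

rng = random.Random(0)
for _ in range(1000):   # run as many generated tests as we like
    run_test(generate_test(rng))
```

Each run samples a test you probably never would have written by hand, which is exactly the "leaky distribution" advantage described above.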
Adding Fault Injection
Beyond Bad Users: Simulate bad world conditions
- Traditional test fixtures cause specific failures at specific points
- Autonomous approach: Create probability distribution over all possible failures
- Result: Test generator outputs both user behaviors AND world failures
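Extending that sketch with fault injection, the same generator can interleave simulated world failures with user operations (the fault names and probabilities here are illustrative assumptions, not Wilson's actual implementation):

```python
# Generator that mixes injected faults into the user-op stream.
import random

FAULTS = ["network_partition", "disk_full", "clock_skew", "process_crash"]

def generate_run(rng, length=100, fault_prob=0.05):
    """Yield a mixed stream of user ops and injected faults."""
    for _ in range(length):
        if rng.random() < fault_prob:
            yield ("fault", rng.choice(FAULTS))
        else:
            yield ("user_op", rng.choice(["read", "write", "reconnect"]))

rng = random.Random(42)
for kind, detail in generate_run(rng):
    # in a real harness, "fault" events would be applied to the
    # simulated environment and "user_op" events to the system under test
    print(kind, detail)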
Benefits of Test Environment:
- Massive Parallelism: Scale fake users infinitely (unlike real users)
- Complete Control: Better debugging tools, no production outage risk
- Timing Advantage: Find problems before users do, debug under less pressure
Pre-Observability: The Synthesis
Current Testing Limitation: Tests only provide pass/fail results - the equivalent of an alerting system with a single, crude alert
Missing Capability: Proactive exploration without being "paged"
- Good SREs read production logs proactively
- Learn about system behavior even when not getting alerts
- Similar to people management - "walking around looking for trouble"
The Vision: Apply observability techniques to test environments
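One simple way to "walk around looking for trouble" in test logs, assuming plain-text log lines: flag message templates that have never been seen before. The digit-stripping templating heuristic below is a deliberate simplification.

```python
# Flag never-before-seen log templates in fresh test runs.
import re

def template(line: str) -> str:
    """Collapse numbers/ids so similar messages share one template."""
    return re.sub(r"\d+", "<N>", line.strip())

def novel_templates(baseline_logs, new_logs):
    known = {template(l) for l in baseline_logs}
    return sorted({template(l) for l in new_logs} - known)

baseline = ["request 41 ok", "request 99 ok"]
fresh = ["request 7 ok", "compaction revision 86 dropped"]
print(novel_templates(baseline, fresh))
# -> ['compaction revision <N> dropped']
```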
Concrete Example: The 3 AM Page
Scenario: Transient issue that happens 2-3 times every 6 months
- Alert at 3 AM with cryptic message
- Reboot server, problem goes away temporarily
- No idea what causes it
- Best you can do: Add logging and hope
Pre-Observability Approach:
- Check Test Environment: Has this issue been happening in tests?
- Higher Frequency: Fault injection makes rare bugs more common
- Statistical Power: Can now bisect and correlate with changes (see the sketch after this list)
- Log Analysis: Enough examples to find patterns and correlations
- Time Travel: Rewind simulation to see system state before bug
- Safe Experimentation: Try fixes without risking production
- Close the Loop: Convert findings into test properties/alerts
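A sketch of the "Statistical Power" step referenced above, assuming a hypothetical store of recorded test runs keyed by commit: once a failure signature appears in many runs, finding the first bad commit is a simple scan.

```python
# Correlate a failure signature with the commit that introduced it.
# The run-record shape is an assumption for illustration.
from dataclasses import dataclass

@dataclass
class TestRun:
    commit: str
    logs: list[str]

def first_bad_commit(runs, signature, commit_order):
    """Earliest commit (in history order) whose runs show the signature."""
    bad = {r.commit for r in runs
           if any(signature in line for line in r.logs)}
    for commit in commit_order:
        if commit in bad:
            return commit
    return None

runs = [
    TestRun("abc1", ["all ok"]),
    TestRun("abc2", ["warn: value compacted away"]),
    TestRun("abc2", ["all ok"]),           # transient: not every run fails
    TestRun("abc3", ["warn: value compacted away"]),
]
print(first_bad_commit(runs, "compacted away", ["abc1", "abc2", "abc3"]))
# -> abc2
```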
Example Investigation:
- Find associated error about "something being compacted away"
- Determine errors always occur together
- Query etcd endpoints: two replicas have value 86, one has no value (comparison sketched after this list)
- Rewind time: all replicas previously had same value
- Try writes to unstick database
- Identify as etcd bug, not application bug
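Mechanically, the divergence check above amounts to reading the same key from every replica and comparing; in this sketch `read_key` is a hypothetical helper, not a real etcd client API.

```python
# Compare one key across all replicas and report divergence.
ENDPOINTS = ["replica-a:2379", "replica-b:2379", "replica-c:2379"]

def read_key(endpoint: str, key: str):
    """Hypothetical: read `key` from one replica; None if absent."""
    raise NotImplementedError

def compare_replicas(key: str) -> dict:
    values = {ep: read_key(ep, key) for ep in ENDPOINTS}
    if len(set(values.values())) > 1:
        print(f"DIVERGENCE on {key!r}: {values}")
    return values

# In the investigation above, two replicas returned 86 and one
# returned nothing -- evidence the bug lived in etcd, not the app.
```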
Conclusion: The Future of Software Quality
What We've Built: A workflow that looks like both testing and observability
Key Innovation: "Pre-observability" - observability for an alternate universe with worst-case users and conditions
The Process:
- Take real production alerts
- Ask test system: "Have you seen this before?"
- Use autonomous testing + fault injection for amplification
- Leverage controlled environment for deep debugging
- Create test properties/alerts for future prevention
Wilson's Assessment: This approach is "tremendously powerful" and being rolled out to customers.
Conference Goal: Bring together diverse perspectives on making software good and safe, emphasizing that this is about software correctness, not just testing.
Meta-Commentary
Wilson positions Bug Bash as a "software correctness conference" hosted by the testing company Antithesis, while emphasizing that it's not just about testing: it brings together communities from languages, formal methods, testing, observability, and culture to work toward the shared goal of software that works.
The talk argues for synthesis rather than division between different approaches to software quality, using autonomous testing enhanced with observability techniques as a concrete example of how seemingly opposed methodologies can be combined for better results.