Observability has become central to how teams monitor and debug their systems. At QCon London 2026, Colin Douch, a site reliability engineer at DuckDuckGo, examined the complexities and realities of self-hosted observability, offering a candid view of what it takes to run that infrastructure in-house.
The Complexity Demon
Douch began his session by describing the 'complexity demon' that haunts developers and engineers. This demon, he argued, is a constant companion in complex systems, and observability is the tool we use to keep it at bay. The catch is that the very tools meant to simplify debugging can themselves become complex systems to manage.
The SaaS vs. Self-Hosted Dilemma
Many organizations, Douch pointed out, opt for SaaS (Software as a Service) solutions to outsource their observability needs. His session, however, focused on the challenges and considerations of running observability infrastructure in-house. Douch's warning was clear: 'Should I run my own Observability stack? No, at least not until you have exhausted each and every other option.' That statement set the tone for the rest of the talk, which weighed the pros and cons of self-hosted observability.
The Cost of Self-Hosted Observability
Douch highlighted the significant resources required for self-hosted observability: at least 2-3 full-time engineers and a substantial financial investment. This is a major commitment, and one that organizations should not take lightly.
Choosing the Right Tools
For metrics in a self-hosted observability stack, Douch recommended Prometheus or VictoriaMetrics. For logs, he emphasized structuring them and storing them in a columnar database. He also cautioned against sprinkling in logs ad hoc, as this can leave a mess of unusable data.
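As a rough illustration of what "structured" logging can look like in practice, the sketch below emits one JSON object per log line using only Python's standard library. It is not taken from the talk: the `JsonFormatter` class, the `build_logger` helper, the `fields` convention for passing extra key/value pairs, and the example field names are all assumptions made for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass structured fields via
        # extra={"fields": {...}}, which logging attaches to the record.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str = "checkout") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info(
        "payment accepted",
        extra={"fields": {"order_id": "o-123", "amount_cents": 4200}},
    )
```

Because every line is a flat object with consistent keys, each key maps naturally onto a column, which is the property that makes a columnar store a reasonable home for this kind of data.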
The Modular Ecosystem
In practice, self-hosted observability often involves a loosely coupled system of projects and tools, including metrics collectors, distributed tracing frameworks, and log aggregation tools. While this modular approach offers flexibility, it also introduces operational overhead.
Open-Source Tooling
Douch reviewed the open-source ecosystem, endorsing OpenTelemetry for traces and recommending the Prometheus text exposition format for metrics and JSON for logs. He argued that OpenTelemetry's complexity is justified for traces, but advised against using it for metrics or logs.
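For readers who have not seen it, the Prometheus text exposition format is a plain-text, line-oriented format: optional `# HELP` and `# TYPE` comment lines, followed by one sample per line. The sketch below renders a counter by hand to show the shape of the output. The function names and the example metric are hypothetical, and a real service would normally use an existing client library rather than hand-rolling this.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples) -> str:
    """Render one counter family in the Prometheus text exposition format.

    `samples` is an iterable of (labels_dict, value) pairs. This sketch
    assumes `help_text` contains no backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            rendered = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{rendered}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_counter(
            "http_requests_total",
            "Total HTTP requests.",
            [({"method": "get", "code": "200"}, 1027)],
        ),
        end="",
    )
```

The appeal of the format is that it can be produced and inspected with nothing more than string handling and an HTTP endpoint, which keeps the metrics path simple.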
Connecting the Signals
A key takeaway from Douch's talk was the importance of treating logs, metrics, and traces as interconnected signals rather than separate silos. He emphasized that the value of observability lies in the connections between these data sources, as logs are a subset of traces and metrics are aggregations of the same underlying data.
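One way to picture that relationship is to treat spans as the underlying records and derive the other two signals from them. The sketch below is a toy model, not Douch's design or any library's data model: the `Span` and `SpanEvent` types and both helper functions are hypothetical. Log lines are the events attached to spans, and metrics are counts and duration sums aggregated over those same spans.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SpanEvent:
    """A point-in-time message recorded inside a span (a 'log line')."""
    message: str
    level: str = "info"


@dataclass
class Span:
    """A timed unit of work belonging to a trace."""
    trace_id: str
    span_id: str
    name: str
    duration_ms: float
    events: list[SpanEvent] = field(default_factory=list)


def logs_from_spans(spans):
    """Flatten span events into log records that keep their trace context."""
    return [
        {
            "trace_id": span.trace_id,
            "span_id": span.span_id,
            "level": event.level,
            "message": event.message,
        }
        for span in spans
        for event in span.events
    ]


def metrics_from_spans(spans):
    """Aggregate spans into per-operation request counts and duration sums."""
    totals = defaultdict(lambda: {"count": 0, "duration_ms_sum": 0.0})
    for span in spans:
        totals[span.name]["count"] += 1
        totals[span.name]["duration_ms_sum"] += span.duration_ms
    return dict(totals)


if __name__ == "__main__":
    spans = [
        Span("t1", "s1", "GET /cart", 12.5, [SpanEvent("cache miss", "warn")]),
        Span("t2", "s2", "GET /cart", 7.5),
    ]
    print(logs_from_spans(spans))
    print(metrics_from_spans(spans))
```

Because each derived log record carries its `trace_id`, a reader can move from an aggregate metric to the spans behind it and on to the individual messages, which is the kind of connection the talk emphasized.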
Designing a Coherent Telemetry Pipeline
Ultimately, building an observability platform is not just about selecting tools, but designing a coherent telemetry pipeline. It's about understanding the interplay between different data sources and ensuring they work together seamlessly.
Final Thoughts
Douch's session at QCon London 2026 offered valuable insight into self-hosted observability. It highlighted the challenges and considerations that organizations must navigate when deciding whether to outsource their observability infrastructure or manage it in-house. As Douch emphasized, it is a decision that requires careful thought and a deep understanding of the complexities involved.