Leveling up your observability practice — Part 2
Lessons from the front lines: Challenges in your observability maturity journey
In our previous blog post, we explored the observability maturity spectrum and found that while only 7% of organizations consider themselves experts, 43% are actively working to improve their practices. We saw how mature organizations achieve better outcomes, from faster root cause analysis to fewer user-reported incidents.
Now, let's tackle the practical side of advancing your observability maturity. We'll explore the common challenges teams face at different stages of their journey, from early-stage hurdles like cross-team collaboration to the scaling challenges that even experts grapple with. You'll discover concrete steps to level up your maturity, including insights on postmortems, service level objectives (SLOs), and emerging technologies like OpenTelemetry (OTel).
Finally, we'll examine the crucial role leadership plays in driving observability success for an organization and how to effectively advocate for resources and support.
Common challenges at different maturity levels
As organizations progress in their observability journey, the challenges they face evolve, as seen in the 2024 State of Observability survey.
Early-stage challenges:
Lack of collaboration between teams
Insufficient skills and expertise
High levels of toil
Mature/expert challenges:
Tool scale and performance issues
Managing different requirements across teams
As an SRE, understanding the typical progression of challenges in observability implementations can help you better prepare for and navigate your own journey. While every organization's path is unique, certain patterns emerge as teams move from initial implementation to mastery of their observability practices.
Early-stage observability maturity challenges
In the early stages of observability adoption, teams often face challenges that are more organizational than technical in nature. Consider the common scenario where development and operations teams — despite having access to the same observability tools — effectively speak different languages when discussing system health. For example, developers might focus primarily on application-level metrics while operations teams concentrate on infrastructure metrics, creating a disconnect that can significantly impact incident response times and system improvement initiatives.
This collaboration gap represents just one aspect of early-stage challenges. Another significant hurdle is building and maintaining the right expertise across the team. Without adequate knowledge sharing and training, organizations often find themselves dependent on a few key individuals who become bottlenecks for progress. This becomes particularly evident when junior team members struggle with complex tasks like querying distributed traces or correlating metrics across systems.
The prevalence of toil — those repetitive, manual tasks that consume valuable time and resources — presents another significant early-stage challenge. Think about teams spending hours each week manually updating dashboards and alert thresholds across different environments. This not only drains team resources but also introduces the risk of human error into monitoring setups.
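One way to cut this kind of toil is to define alert thresholds once and derive the per-environment values from that single definition, rather than editing each environment by hand. The following is a minimal Python sketch of that idea. The metric names, environment names, and multipliers are hypothetical and illustrative only; they do not come from the survey or from any particular tool.

```python
from dataclasses import dataclass

# Hypothetical base thresholds, defined in one place (illustrative values).
BASE_THRESHOLDS = {
    "p95_latency_ms": 500.0,
    "error_rate_pct": 1.0,
}

# Hypothetical multipliers: looser thresholds in dev, strictest in production.
ENV_MULTIPLIERS = {
    "dev": 2.0,
    "staging": 1.5,
    "production": 1.0,
}


@dataclass(frozen=True)
class AlertRule:
    environment: str
    metric: str
    threshold: float


def build_alert_rules(base, multipliers):
    """Derive one alert rule per (environment, metric) pair from a single source."""
    rules = []
    for env, factor in multipliers.items():
        for metric, value in base.items():
            rules.append(AlertRule(env, metric, round(value * factor, 2)))
    return rules


if __name__ == "__main__":
    for rule in build_alert_rules(BASE_THRESHOLDS, ENV_MULTIPLIERS):
        print(f"{rule.environment}: alert when {rule.metric} > {rule.threshold}")
```

Changing a threshold then means editing one dictionary entry and regenerating the rules, which also removes the copy-and-paste step where human error tends to creep in.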
Mature/expert observability maturity challenges
As observability practices mature, however, the nature of challenges evolves. Teams that have successfully built a strong observability culture often find themselves grappling with scale and performance issues. This might manifest as exponential growth in logging volume that leads to storage concerns and performance bottlenecks, requiring sophisticated sampling strategies and retention policies.
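As an illustration of what a sampling strategy can look like, here is a minimal Python sketch that keeps every warning and error but only a deterministic fraction of lower-severity log records. The record fields (`level`, `trace_id`) and the default 10% rate are assumptions made for the example, not recommendations from the survey or the behavior of any specific product.

```python
import hashlib


def should_keep(record: dict, sample_rate: float = 0.1) -> bool:
    """Keep all WARN/ERROR records; keep roughly sample_rate of everything else."""
    if record.get("level") in {"ERROR", "WARN"}:
        return True
    # Hash the (assumed) trace_id field so every record from the same trace
    # gets the same keep/drop decision, which keeps sampled traces complete.
    key = str(record.get("trace_id", "")).encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    # Compare in integer space: bucket is uniform over [0, 2**64).
    return bucket < int(sample_rate * 2**64)


if __name__ == "__main__":
    records = [
        {"level": "INFO", "trace_id": f"trace-{i}", "message": "request served"}
        for i in range(1000)
    ]
    records.append({"level": "ERROR", "trace_id": "trace-x", "message": "timeout"})
    kept = [r for r in records if should_keep(r)]
    print(f"kept {len(kept)} of {len(records)} records")
```

Hashing an identifier instead of calling a random number generator makes the decision repeatable across services, so a trace is either kept in full or dropped in full. Retention policies are a separate concern and are usually configured in the storage backend rather than in application code.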
At the expert level, a common challenge emerges around managing different requirements across various teams within the organization. Imagine trying to create a unified observability framework that can accommodate diverse monitoring needs and compliance requirements while maintaining consistency and efficiency — no small feat, even for experienced teams.
Understanding this progression helps teams better prepare for what lies ahead and avoid the trap of trying to solve tomorrow's problems before addressing today's fundamentals. For those just starting, the focus should be on building strong collaborative practices and investing in team education. More mature teams need to concentrate on technology optimization and standardization while maintaining flexibility to support diverse needs.
Practical steps for advancing observability maturity
So, how do organizations move from novice to expert? Here are some concrete steps:
Embrace postmortems: Only 8% of early-stage companies regularly run postmortems compared to 45% of mature/expert companies. Make postmortems a standard practice after incidents to drive continuous improvement and minimize toil.
Implement SLOs: 89% of mature/expert companies use SLOs, and 48% base them on the golden signals (latency, traffic, errors, and saturation). Start by defining SLOs for your most critical services and base them on industry standards such as these.
Invest in skills development: Focus on key areas like monitoring and observability, automation and scripting, and performance tuning. These skills were seen as the most critical for SREs in our recent observability practitioner survey.
Adopt AI and machine learning (ML): 72% of teams are already using AI/ML for observability. Look for opportunities to implement these technologies, particularly for use cases like log correlation and anomaly detection.
Standardize on OpenTelemetry: While adoption is still in the early stages, 87% of decision-makers see it becoming a standard within five years. Start experimenting with OTel now to future-proof your observability stack.
Unify your observability platform: Consider platforms like Elastic Observability that integrate logs, metrics, and APM. This can help address the tool scale and performance issues that mature teams often face.
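To make the SLO step above more concrete, here is a minimal Python sketch of the arithmetic behind an availability SLO and its error budget. The function name, the 99.9% target, and the request counts are hypothetical examples; your own targets should come from what your users actually need.

```python
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize availability and remaining error budget for one SLO window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, for example 0.999")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # The error budget is the number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    availability = 1.0 - failed_requests / total_requests
    # Fraction of the budget still unspent; a negative value means the SLO is breached.
    budget_remaining = 1.0 - failed_requests / allowed_failures

    return {
        "availability": availability,
        "allowed_failures": allowed_failures,
        "budget_remaining": budget_remaining,
        "slo_met": availability >= slo_target,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 400 failures, 99.9% target.
    report = error_budget_report(0.999, 1_000_000, 400)
    print(f"availability: {report['availability']:.4%}")
    print(f"error budget remaining: {report['budget_remaining']:.1%}")
    print(f"SLO met: {report['slo_met']}")
```

Tracking the remaining budget, rather than only whether the SLO was met, gives teams an early signal: a budget that is draining quickly is a reason to slow down risky changes before users notice a breach.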
If you’d like more details and observability insights, download the 2024 State of Observability: A practitioner perspective report.
The role of leadership for observability maturity
One interesting finding from the survey was the disconnect between practitioners and leadership when it comes to understanding the value of new technologies like OpenTelemetry. As SREs, we have an opportunity (and, I'd argue, a responsibility) to bridge this gap. Here are a few ways to do that:
Quantify the impact: Use data to show how improved observability translates to better business outcomes.
Speak the language of business: Frame observability improvements in terms of customer satisfaction, revenue protection, and operational efficiency.
Advocate for resources: Use the data from this survey to make the case for investing in observability maturity.
The observability journey never ends
Remember, observability maturity isn't a destination; it's a journey. Even the 7% who classify themselves as experts are continually learning and adapting. The key is to keep pushing forward, learning from each incident, and continuously refining your practices. As you do, you'll reduce toil for your team and free everyone to focus on higher-value activities.
As you progress on this journey, you'll likely find that your role as an SRE becomes more rewarding. You'll spend less time firefighting and more time on proactive improvements. You'll collaborate more effectively with other teams. And most importantly, you'll deliver more reliable, performant services to your users.
So, whether you're just starting out or well on your way to expert status, keep leveling up your observability maturity. Your future self (and your users) will thank you. Take the Elastic Observability Maturity Assessment to find out where you stand with observability today!
The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.