Thursday, October 3 • 11:00 - 11:45
Are We All on the Same Page? Let's Fix That

Industry good practice is to have as few alerts as possible, alerting on symptoms associated with end-user pain rather than trying to catch every possible way that pain could be caused.

Organizations with complex distributed systems that span dozens of teams can have a hard time following this practice without burning out the teams that own the client-facing services. A typical workaround is to alert on every layer of the distributed system. This approach almost always leads to an excessive number of alerts and results in alert fatigue.

Adaptive Paging is an alert handler that leverages the causality captured by tracing and OpenTracing's semantic conventions to page the team closest to the problem. From a single alerting rule, a set of heuristics can be applied to identify the most probable cause, paging the respective team instead of the alert owner.
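
The idea can be illustrated with a minimal sketch. It is not the implementation presented in the talk: the `Span` class, the `owners` mapping, the team names, and the single "follow the errored child spans" heuristic are all assumptions made for illustration. The only element taken from OpenTracing's semantic conventions is the standard boolean `error` tag.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    """Simplified, hypothetical span model; `tags` may carry OpenTracing's `error` tag."""
    operation: str
    service: str
    tags: Dict[str, object] = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)


def has_error(span: Span) -> bool:
    # OpenTracing's semantic conventions define `error` as a boolean tag.
    return span.tags.get("error") is True


def probable_cause(root: Span) -> Span:
    """Assumed heuristic: follow errored child spans down to the deepest one."""
    current = root
    while True:
        failing = [child for child in current.children if has_error(child)]
        if not failing:
            return current
        current = failing[0]  # simplification: take the first errored child


def team_to_page(root: Span, owners: Dict[str, str], alert_owner: str) -> str:
    """Return the team owning the probable cause, falling back to the alert owner."""
    if not has_error(root):
        return alert_owner
    return owners.get(probable_cause(root).service, alert_owner)


# Hypothetical trace: the checkout request fails because of a database call
# made by the payment service.
trace = Span("GET /checkout", "checkout", {"error": True}, [
    Span("get_cart", "cart"),
    Span("authorize", "payment", {"error": True}, [
        Span("query", "payment-db", {"error": True}),
    ]),
])
owners = {
    "checkout": "team-checkout",
    "payment": "team-payment",
    "payment-db": "team-storage",
}
print(team_to_page(trace, owners, alert_owner="team-checkout"))
```

In this sketch the alert fires for the checkout service, but the page goes to the hypothetical `team-storage`, because the deepest errored span belongs to `payment-db`. When no owner is known for the failing service, the page falls back to the alert owner.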

Speakers
Luis Mineiro

Zalando SE
Luis's broad background in software engineering includes experience in DevOps, networks, mobile development, and more. Luis has been with Zalando since 2013, shaving yaks and creating the most beautiful bike sheds in the Shop team, later joining Platform Infrastructure to support...


Thursday October 3, 2019 11:00 - 11:45 BST
Track 2: The Liffey A