Lead Engineer, Observability at Auth0
United States of America
At Auth0, we are growing our Infrastructure team and we are looking for an experienced engineering lead to help us take our Observability practice to the next level.

What does the Infrastructure team do?
We value ownership and innovation, and we build our teams with that in mind. We want each team to be responsible and accountable for what it ships. We also don't want to reinvent the wheel every time, so we try to align on practices and technologies. Our approach is to rely on excellent tooling and automation rather than on policies and processes. We aim to provide internal tools and services that other teams want to use because they make shipping features easier.

Today the Infrastructure Services team provides the following services to other engineering teams:
- Storage Services: MongoDB, Elasticsearch, Postgres, Dynamo, backups and restores, etc.
- Networking Services: VPCs, Load Balancers, Service Discovery, etc.
- Observability Services: EKK (an ELK-style stack in which AWS Kinesis replaces Logstash), Datadog, our logging/metrics SDK, etc.
- Release Management: CI, CD, feature flags

As we continue to grow, we are creating new teams that own more specific parts of Auth0. To encourage and simplify operational ownership, we will be splitting our existing services into smaller, more decoupled ones that individual teams will own. At the same time, we want newly formed teams to be able to go from development to production quickly and reliably while following recommended practices.

What are we doing next?
From an observability perspective, having multiple teams and multiple services requires two things:
- Educating engineers about what to log, measure and alert on.
- Providing easy ways to understand the state of the system at a given point in time, including the ability to trace requests across multiple services.

In our case, this means:
- Providing request tracing capabilities across different components (think Zipkin, AWS X-Ray).
- Maintaining our current EKK stack (or any other tool and infrastructure) that allows teams to search their logs when troubleshooting issues.
- Helping teams learn which metrics and events are important and why, through guidance, documentation and internal talks.
- Providing a single platform to create dashboards and alerts. Today, in our public cloud, we do this with our logging/metrics SDK, Datadog and PagerDuty.
- Providing tools and a platform for measuring and reporting availability for individual services.

What will you be doing?
- Leading this team
- Designing and implementing features and bug fixes for the services the team owns
- Helping people on the team grow and further their careers
- Hiring new people to join the team
- Providing context and direction so team members can do their best work
- Being part of the Infrastructure team's on-call rotation

You'd be a good fit if
- You have strong knowledge of a variety of infrastructure and general development topics, technologies, and trends
- You have worked in an environment that runs multiple services owned by different teams, with multiple deployments a day to services that handle a large number of transactions per second
- You understand that people problems, not technical ones, are the hardest to solve
- You know when to let the team figure things out on their own, and you have the context and skills to help when they need hands-on support
- You are a great communicator
- You can add value to a conversation even when you are not familiar with the topic
- You enjoy thinking about how to make life simpler for other engineers
- You love and advocate for customers