Today’s organizations face a higher volume of change in a more complex technology environment, which raises the risk of outages and incidents. IT teams must improve service reliability and system resiliency. With automation and observability becoming key factors in faster, more efficient deployments, Site Reliability Engineering (SRE) has become one of the fastest-growing enterprise roles and sets of operational practices for managing services at scale.
To maintain the highest-quality learning for our community, DevOps Institute Certifications expire two years from the date of completion. Members can maintain their certification by joining the Continuing Education Program and earning Continuing Education Units through learning opportunities.
At the end of the course, the following learning objectives are expected to be
- A practical view of how to successfully implement a flourishing SRE culture in your
- The underlying principles of SRE, an understanding of what SRE is not (its
anti-patterns), and how to recognize and avoid them.
- The organizational impact of introducing SRE.
- Mastering SLIs and SLOs in a distributed ecosystem and extending the use of
Error Budgets beyond the norm to innovate and avoid risks.
- Building security and resilience by design in a distributed, zero-trust environment.
- How to implement full-stack observability and distributed tracing, and bring about
an observability-driven development culture.
- Curating data using AI to move from reactive to proactive and predictive incident
management, and using DataOps to build clean data lineage.
- Why is Platform Engineering so important in building consistency and predictability of
- Implementing practical Chaos Engineering.
- Major incident response responsibilities for an SRE based on an incident command
framework, and examples of the anatomy of unmanaged incidents.
- The perspective of why SRE can be considered the purest implementation of DevOps.
- SRE Execution model
- Understanding the SRE role and why reliability is everyone’s problem.
- SRE success story learnings
- Implementing SRE and DevOps in the right way leads to higher Business Value
- Enhanced stability and reliability of services
- Major product improvements across the development, deployment, and operations life cycle
- A better balance between technical investment in reliability and customer experience
- Homogenous culture and greater synchronization between product, development, and operational teams
- Improvements in staff morale and retention
- Higher understanding of the practical implementation of SRE culture
- Designing services for higher security and reliability
- Building fault-tolerant distributed ecosystems that can be tested for risks of disaster
- Building observability and intelligence in operations
- Broader skills-based capabilities that leverage the latest in automation
- Higher understanding of other roles and contributing towards creating a better workplace culture
- Twenty-four (24) hours of instructor-led training and exercise facilitation
- Learner Manual (excellent post-class reference)
- Participation in unique exercises designed to apply concepts
- Sample documents, templates, tools, and techniques
- Access to additional value-added resources and communities
It is highly recommended that learners attend the SRE Foundation course with an accredited DevOps Institute Education Partner and earn the SRE Foundation certification before attending the SRE Practitioner course and exam. Knowledge of common SRE terminology, concepts, and principles, along with related work experience, is recommended.
Passing the 90-minute examination, which consists of 40 multiple-choice questions, with a score of at least 65% leads to the SRE Practitioner certificate. The certification is governed and maintained by DevOps Institute.