Senior Site Reliability Engineer (Direct End Client)

Job Details

  • ID: #43429032
  • Address: Charlotte, North Carolina 28201, USA
  • Job type: Contract

  • Salary: USD, depends on experience
  • Hiring company: Projas Technologies, LLC

  • Shown: 21st June 2022
  • Date: 20th June 2022
  • Deadline: 19th August 2022
  • Category: Systems/networking


Vacancy expired!

Incident Management:
  • Delivering Incident Command for high-severity incidents
  • Running blameless postmortem reviews for high-severity incidents
  • Assisting in developing automated incident detection and response improvements

Operational Excellence:
  • Delivering data analysis (Incident Management, Change Management, Service Availability, etc.)
  • Creating regular reporting and insights, and advancing the automation of that reporting to reduce manual toil
  • Conducting Production Readiness Reviews for new services
  • Reviewing upcoming production change requests

  • Incident Management: Incident Command for high-severity incidents
  • Incident Management: communications and updates for high-severity incidents
  • Operational Excellence: reporting and analytics (Incident Management, Change Management, Service Availability, etc.)

  • 7+ years of experience in a web-centric Linux production environment, in a NOC or DevOps role within a continuous release environment
  • Experience running critical incidents from a technical leadership position
  • Experience in computer engineering with a focus on infrastructure, platform, and application (cloud, containerization, container orchestration, networking, application reliability, database architecture), and an understanding of the full stack and the SDLC (Software Development Life Cycle)
  • Experience running and monitoring applications at scale using metrics and tracing tools such as Prometheus, InfluxDB, Grafana, New Relic, Datadog, Stackdriver, and Zipkin
  • Professional experience with Python, Go, or similar programming languages
  • Experience developing production-quality tooling
  • Familiarity with SRE methodologies, and a passion for solving operational challenges through automation and software
  • Ability to communicate effectively both vertically and horizontally within the organization, with demonstrated written and verbal communication skills
  • Scala, TypeScript, JavaScript, Java, C

The team also develops automation and AI capabilities to minimize toil across the engineering organization.

  • Lead critical incidents in our environment, with a focus on troubleshooting and fast restoration of our critical services
  • Provide insights on trends in issues affecting reliability, and partner on cross-functional projects to provide scalable solutions
  • Review high-risk platform changes to minimize impact on the site
  • Work within a large distributed system based on Kubernetes and Google Cloud services
  • Maintain an automation-centric vision and apply SRE methodologies to increase reliability and decrease toil
  • Participate in technical design and architecture decisions, and contribute to technical troubleshooting across various parts of the system
