Senior Site Reliability Engineer

Job Details

  • ID: 49987200
  • Address: Englewood, Colorado 80110, USA
  • Job type: Contract
  • Salary: USD $73 - $88
  • Hiring Company: NomiSo
  • Showed: 24th May 2023
  • Date: 17th May 2023
  • Deadline: 16th July 2023
  • Category: Et cetera

Vacancy expired!

About the Role: The Site Reliability Engineer (SRE) will be responsible for both uplifting and maintaining our evolving technology platforms, infrastructure and technology controls. The role includes both oversight of production operations of our systems and development/engineering of solutions to maximize system reliability and automation. The role will address three dimensions:
  • Tools Coverage – Assess the tools coverage and ensure sufficient monitoring is in place to enable mature observability and data driven decision making
  • Defining and educating Engineering teams – Processes, Procedures, Guard/Guide Rails and best practices
  • Culture – Instill a culture of high-performing teams and adopt ways of working shaped by SRE practices
The role will work with a global team responsible for a mission-critical business function, and will partner with Infrastructure, DevOps and Core practices (like Security, Identity, ProdOps, Cloud platform and Tools) teams to identify and implement automation opportunities to drive down toil, reduce technical debt and improve system reliability.

Day to Day Responsibilities:
  • Work with DevOps teams to build, release, monitor and run services in a way that improves service reliability.
  • Write software to automate API-driven tasks at scale and contribute to the product codebase in Java, JS, React, Node, Go and Python
  • Write automation to reduce toil and eliminate manual tasks that are repeatable.
  • Work with Ansible, Puppet, Chef, Terraform or another config management / orchestration suite, know where it is broken, work towards fixing it and explore new alternatives
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system reliability
  • Handle cross-team performance issues end to end: identify the cause, determine the areas of improvement and drive those actions to closure
  • Performance and maturity baselining of DevOps process, tools maturity & coverage, metrics, technology and engineering practices
  • Define, measure and improve Reliability Metrics (SLO/SLI), Observability (Monitoring, Logging and Tracing solutions) and Ops processes (Incident, Problem Mgmt.), and streamline and automate release management. Build dashboards to provide visibility into the performance of the applications.
  • Understand the current processes and system setup, and propose the process and technology improvements needed so that the application exceeds the desired Service Level Objective.
  • Be a strong believer in automation as the route to sustained continuous improvement: automate toil and runbooks, and improve the ability of applications to auto-heal, leading to improved reliability

Must Have:
  • 7+ years of Development and Operations experience in building and running applications in production that have uptime over 99%, or related experience and/or training, or an equivalent combination of education and experience
  • 7+ years of experience as an SRE in handling applications that are web scale
  • Strong hands-on coding experience in one or more programming languages such as Python, Golang, Java, Bash, etc.
  • Good understanding of Observability (monitoring, logging, tracing, metrics), Chaos engineering concepts.
  • Proficiency in using the Application Performance Monitoring (APM) tool New Relic for monitoring, logging, tracing.
  • Expert-level hands-on knowledge of the public cloud platforms AWS and/or Google Cloud Platform. A professional-level certification on one of the public clouds is highly desirable.
  • Must have hands-on experience in using configuration management systems such as Ansible or SaltStack and infrastructure automation tools like Terraform or CloudFormation.
  • Should have used alerting systems such as PagerDuty.
  • Should have implemented solutions around Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for services. Measurement should have covered both a single system and multiple systems in a distributed environment
  • Should have supported Production Incidents (PIs) on critical applications of a company. Troubleshoot, debug, and diagnose operational issues and drive them to closure.
  • Understanding of software delivery life cycles, particularly Agile/Lean & DevOps
  • Proven experience in handling large scale and growing infrastructure across Data Centers and heterogeneous Cloud platforms
  • Experience as a service owner in managing large, geographically diverse stakeholder groups
  • Ability to work with a creative, fast-growing engineering team and motivate them to deliver their best work
  • History of driving innovation
  • Bachelor’s/Master’s Degrees

Nice to Have:
  • Familiarity with handling:
  • Containerization – Kubernetes, Docker, Rancher, etc.
  • Kafka, YARN, Elasticsearch, etc.
  • Source code management and Implementation of Security best practices.
  • Tech Stack – Python, Falcon, Elasticsearch, MongoDB, AWS (SQS, S3), MapReduce
  • Networking knowledge
  • Contribution to open source community

About NomiSo: NomiSo is a Product Engineering company currently focussed on Video Stream Engineering, backed by AI and ML. We are a team of Software Engineers, Architects and Cloud Experts with more than 100 years of combined expertise in Technology and Delivery Management. At NomiSo we encourage entrepreneurial spirit – to learn, grow and improve. A great workplace thrives on ideas and opportunities. That is a part of our DNA. We’re in pursuit of colleagues who share similar passions, are nimble and thrive when challenged. We offer a positive, stimulating and fun environment – with opportunities to grow, a fast-paced approach to innovation, and a place where your views are valued and encouraged. We invite you to push your boundaries and join us in fulfilling your career aspirations! Our mission is to Empower and Enhance the lives of our customers, through simple solutions for their complex business problems.
