Site Reliability Engineer
Position Summary.
We are seeking Site Reliability Engineers for our team who thrive on pushing the limits of technology to produce state-of-the-art solutions. Our SRE team is challenged with creating scalable solutions for monitoring live trading infrastructures, building command frameworks, and generating actionable alerts. SRE team members are tasked with keeping the TT trading platform running on a day-to-day basis and with ensuring a stable platform by working on strategic initiatives that ease operational work. The SRE team continually builds tools to monitor the state of the TT platform and take corrective action as needed.
Job Responsibilities.
- Code, script and automate using Python
- Create and enhance tools to make operational workflows more automated and less error-prone
- Define metrics needed to measure service performance and health, and implement and maintain metrics collection tools
- Provide troubleshooting and support of trading system issues across the software, hardware, and network stacks to ensure that services are restored immediately
- Participate in design discussions, review sessions and prototyping
- Ensure the scalability and availability of the platform
- Work one-on-one with other application teams to ensure proper monitoring and tools are in place before the application moves into a live environment
- Act as part of a global team that facilitates operational coverage based on business need
Job Requirements.
- Proficiency in Python, with a minimum of 3 years of experience required
- Proven experience with Icinga2, Prometheus, or ELK
- Experience with AWS is a plus
- Knowledge of Chef is a plus
- Experience providing troubleshooting and support for trading systems is a plus
- Solid understanding of functional programming, object-oriented programming and computer science foundations
- Good understanding of backend and server-side components
- Ability to work outside of standard business hours on an as-needed basis
- Proven and strong communication skills
- Must be self-directed, flexible and have the ability to prioritize and handle multiple projects simultaneously
- Experience working in an Agile environment is a plus
Benefits.
- Competitive benefits, including: medical, dental, vision, FSA, 401(k) and pre-tax transit/parking
- Flexible work schedules, with some remote work
- 22 PTO (paid time off) days per year with the ability to roll over days into the following year, a robust paid holiday schedule with early dismissal, generous parental leave (for all genders and staff, including adoptive parents), and backup child care as well as tutoring services
- Tech resources, including a “rent-to-own” program where employees are eligible for a company-provided Mac/PC laptop and/or mobile phone of their choice, and a tech accessories budget for monitors, headphones, keyboards, office equipment, etc.
- Stipend and subsidy contributions toward personally-owned cell phones and laptops, gym memberships and health/wellness initiatives (including discounted healthcare premiums, healthy meal delivery programs or smoking cessation)
- Casual dress code and inspiring, motivating office environment
- Forward-thinking, culture-based organization with collaborative teams that promote diversity and inclusion through efforts such as TT Women in Tech and a committee dedicated to making TT a great place to work for everyone
- Office is conveniently located above Union Station and close to various public transportation options