Senior Site Reliability Engineer
Your Opportunity
As a Site Reliability Engineer for Schwab's Portfolio Management Technology, you will be responsible for a sustainable approach to reliability using SRE principles. Our team is essential in supporting the operational reliability of real-time Trading and Portfolio performance applications for the firm. You will partner with multiple support teams to provide guidance and drive adoption of key reliability engineering practices in support of large-scale, mission-critical services. We are looking for a skilled engineer whose discipline incorporates aspects of software systems engineering and operations, and we combine these skills to find better ways of managing and operating applications. The role requires a high level of responsibility and accountability, yet has the support structure necessary for development and growth.
What you are good at
- Support the production environment and keep our shared environments available for customers.
- Partner with the Support organizations to build and roll out plans for enhanced telemetry and to reduce defects in software delivery to multiple lower environments
- Triaging alerts, diagnosing and resolving critical issues, and handling implementation of changes
- Real-time troubleshooting of critical application workflows and incorporating feedback into product development
- Hands-on enterprise systems administration, monitoring, and deployment activities
- Finding opportunities to build innovative tools and solving operations problems by building automation for large enterprise and critical applications
- Building scripts to automate operational tasks & incorporating the solutions into infrastructure
- Coordinate and collaborate with release management on infrastructure and other critical changes.
- Develop and support automation and processes to enable teams to deploy, manage, configure, test, and monitor their applications
- Create and review documentation and processes regarding recurring issues, new standard operating procedures, knowledge transfer material, etc.
- Collaborate with Engineering, Scrum and Ops resources to provide technical expertise and support on key initiatives for system availability and reliability.
- Ability to understand multiple technologies and how they inter-relate and integrate
- Bring a passion to stay on top of tech trends, experiment with and learn new technologies, participate in internal technology communities, and mentor other members of the team
- Review programming and environment changes and raise awareness of potential impacts
- Coordinate capacity planning
- Participate in on-call and after-hours support as needed.
Skills
- 10+ years of experience with enterprise-level administration and support
- 10+ years of experience in troubleshooting and providing support to .NET/.NET Core production applications
- Experience with Atlassian tools (Jira, Confluence, Bamboo, Bitbucket)
- Experience and knowledge of NoSQL database systems; MongoDB is a plus
- Working knowledge of the PowerShell scripting language and Windows administration
- Working knowledge of Windows Server and IIS web server administration
- Working knowledge of cloud application configuration, deployment, support, and migration; Google Cloud Platform/PCF is a plus
- Familiarity with logging/application monitoring tools (AppDynamics, Splunk, Zabbix, Nagios, etc.)
- Familiarity with large scale distributed systems and high-availability architectures
- Knowledge of one or more message brokers, such as Kafka or RabbitMQ
- SaltStack or similar experience (Ansible, for example) is a plus
- Flexibility to operate in an environment with changing demands and priorities
- Ability to effectively engage subject matter experts and understand technical topics
- Financial services industry experience
- Agile methodologies
- Strong customer orientation with an affinity to proactively own, communicate, and follow through on projects and issues
- Extreme sense of ownership to resolve problems in a distributed environment
- Gritty resolve to dig deeper into technical issues in a complex trading ecosystem
- A self-starter with the ability and confidence to independently resolve issues and bring results back to the team
- Outstanding verbal and written communication skills
- Ability to work collaboratively with internal and external stakeholders including onshore and offshore teams
Purple Drive Technologies LLC - Site Reliability Engineer (FULL-TIME)