Software Engineer (Site Reliability), Retail Engineering
This role demands extensive hands-on experience as a site reliability engineer (SRE) for large-scale, customer-facing cloud applications. The candidate should have a good understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts. The candidate should also have excellent troubleshooting and problem-solving skills.
The candidate will be expected to represent the SRE organization in design reviews and operational readiness exercises for new and existing services. They will also be required to collaborate with technical and non-technical teams and to analyze statistics to build a clear picture of the current state of our system. A good working knowledge of Oracle and Cassandra databases will be beneficial in this regard.
The candidate should have a passion for automating manual operations and improving them through repeated iteration. They should have a good understanding of networking and load-balancing concepts, and should be able to lead a small team and come up with innovative solutions. They should be self-motivated, capable of making business-critical decisions, and comfortable working in a dynamic, ever-changing environment. The candidate should be proactive in dealing with critical production issues and should take them to closure while working with the required partners.
Participate in an on-call rotation, providing hands-on technical expertise during service-impacting events.