Snowplow is building out a dedicated Technical Operations team in 2017, and is looking for experienced systems administrators to join it. Initially this role is open only to candidates based in London or the South-East of England; we regret that applicants based elsewhere will not get a response at this time.
The team and role
Technical Operations will be Snowplow’s first two-track team, consisting of sysadmins alongside site reliability engineers (SREs).
The sysadmins within our Technical Operations team will have four key responsibilities:
- Handling deployments, upgrades and other maintenance of Snowplow-related infrastructure (load balancers, Redshift clusters, ASGs etc.) for our Managed Service customers, across more than 100 AWS accounts
- Responding to customer issues and questions concerning Snowplow-related infrastructure, as escalated to you by our L1 Support team
- Working with Snowplow SREs to design, deploy and operate Snowplow’s internal infrastructure, which runs the Snowplow Managed Service, the Snowplow website and other services
- Taking part in the on-call rotation to triage and resolve operational incidents relating to internal or client infrastructure
To be clear: we have already automated all of this infrastructure to a high degree using a mix of Python, CloudFormation and Ansible, so we are not looking here for another DevOps or automation engineer. Instead, we are looking for experienced sysadmins who can use the provided tooling to deliver robust infrastructure support both to ourselves and to our customers.
The integrity of our customers’ systems and data underpins everything we do at Snowplow. As part of their probation, candidates will be put through a full background security check.
An important part of this role relates to out-of-hours work, particularly around:
- Performing planned upgrades and modifications to customer infrastructure outside of their working hours
- Investigating customer issues and incidents, whether escalated from L1 Support or automatically via our monitoring systems
To meet the above requirements, the candidate will be required to work two weekend or public holiday days per month, taking two weekdays off in lieu.
The candidate will also be expected to be on call, for priority incidents only and during work hours, for a further two weekend or public holiday days per month. Priority incidents are defined as those where Snowplow is at risk of breaching a customer SLA.
These days will be agreed with the candidate’s line manager before the start of each month.
This role will be a great fit for somebody who:
- Has deep knowledge of Linux, networking, containers etc
- Enjoys sustained customer-facing ticket-based sysadmin work
- Has some familiarity with Amazon Web Services, but wants to learn much more
- Is comfortable scripting in one or more of bash, Python, Ruby or Perl
- Can troubleshoot complex problems on individual servers and distributed systems