Leveraged by millions of users every day, EMS Software manages some of the highest-profile spaces in the world (including the NASDAQ bell). We are consistently delivering new features to our suite of products. We want to tackle bigger challenges and accomplish some truly amazing things. Our team is always improving our codebase and operations footprint, and we have amassed a sizeable backlog of interesting challenges and product initiatives. Our team needs to grow to enable even greater success in our industry, and that is where you come in.
Your First Three Months
In your first month, as your familiarity with the product grows, your responsibilities and influence will grow as well. You, along with your team, will be responsible for supporting the product team’s operational needs in the lower environments.
You will collaborate with other members of the operations team in established patterns and continue to hone your skills as you push the design, architecture and implementation of our CI pipelines (lower and upper environments) to their next phase.
Within two months, you and your team will fill in the gaps to have a well-tested, low-latency and highly available environment for all our product lines. Working with the development team, you will help figure out the gaps in creating and supporting a truly scalable product offering. Your team will be responsible for supporting production environments.
Within three months, you will help drive changes to the operational and development roadmap as we inch closer to onboarding 20% of our customer base into hosted production environments by the end of 2017.
What You’ll Do
- Design, provision, configure and maintain the platform operations to handle the scale of running several application stacks in the cloud that will be consumed worldwide
- Automate the deployment and maintenance of cloud platform technologies
- Oversee production operations, log management, data warehouse, and database operations, including management of Splunk services
- Ensure all monitoring systems (IT, development, service management, Apdex) are in place
- Enforce consistency of monitoring, reporting, and alarming systems
- Help drive process improvements for service management, including: outage/incident management, rollbacks and reporting
- Research emerging virtualization techniques and advise management
- Perform capacity management, load and scalability planning
- Ensure compliance with deployment and operations documentation
- Assist management in development and optimization of operational cost models
- Design cloud infrastructure for high reliability and availability
- Build strategic and tactical plans for continued improvement of cloud architecture and operations
- Assist in the establishment of 24x7 performance monitoring and response protocols
- Provide on-call support outside of normal work hours/days
What You Have
- You’re driven, humble, and autonomous
- You’re a quick study, a strong communicator, and you’re able to adapt to a fast-paced environment
- You have a working knowledge of Agile development practices (e.g., Scrum, TDD)
- You are a developer, or have the mindset of one, but are intrigued by the operational aspects of hosting developed solutions
- You are devoted to automation
- You’re an expert in Windows (IIS, SQL Server) and Linux
- You have at least 1 year of hands-on production experience with Amazon Web Services (AWS), Google Cloud or Microsoft Azure. This includes:
- Configuration of VPCs, with VPN to corporate network
- Experience setting up, maintaining and monitoring global production environments, QA and staging environments, with a strong understanding of the differing needs of such environments
- At least 6 months of experience in a professional production environment
- At least 6 months of experience managing networking infrastructure and monitoring at the application level
- Performance optimization experience, including: troubleshooting and resolving network and server latency issues; performing hardware evaluation/selection tasks; performance vs cost vs time analysis
- At least 1 year of experience with automation or scripting tools (e.g., Go, Python, Shell, PowerShell)
- At least 6 months of experience with Ansible and Jenkins
- You’re detail-oriented, with excellent documentation skills, and you’re someone who can successfully manage multiple priorities
- Troubleshooting skills that range from diagnosing hardware/software issues to large-scale failures within a complex infrastructure
Other Things We Hope You Have
- Bachelor’s degree in Computer Science or equivalent work experience
- Experience with Mongo, MS SQL Server, Splunk, Grafana, Terraform and Prometheus
- Experience working with Docker, Kubernetes and Go
- Hands-on experience with performance, load and security penetration testing
- Hands-on experience with building out and maintaining a continuous integration and delivery pipeline
You will be part of a 5-person team of 4 Operational Engineers and a Technical Product Owner. You will report directly to the VP of Development, Assad Jarrahian, but will collaborate with your technical lead, Casey Entzi, on a day-to-day basis.
The larger team consists of 13 Developers, 10 Quality Engineers, 4 Product Owners, and 3 UX Designers. We have an open and collaborative environment where everyone works together to deliver what is needed, from product features to operations needs (e.g., health checks).
We value open and direct communication, taking calculated risks that will push us forward, and investing in our people.
- We have current Production and Continuous Integration footprints in Google Cloud (primary), AWS, and Azure
- Our front-end applications leverage React and React Native, Redux, Node, C#, and Knockout
- Our APIs are built with Go, .NET and .NET Core
- Our backend runs on MS SQL Server
- We have a well-built-out CI pipeline that allows us to deploy and stand up customers on demand
- We leverage Ansible heavily, Splunk (JSON logs) is our lifeblood, and we enjoy operational efficiency and accessibility through Hubot and StackStorm