SOFTWARE RELIABILITY/SCALABILITY EXPERT | 100% REMOTE | RAISING THE FLOOR - US

This is an 8-month contract position. Full-Time.

WHO WE ARE

Anytime, Anywhere, Any Computer Access. We're an international coalition of individuals and organizations dedicated to ensuring that the Internet, and everything available through it, is accessible to people who face barriers due to disability, literacy, digital literacy, or aging, regardless of their economic resources.

WHAT YOU WILL DO

- Work with the Global Public Inclusive Infrastructure (GPII) architects and subject-matter experts (SMEs) to define the reliability and performance/scalability metrics that need to be implemented and monitored.
- Plan large-scale stress testing.
- Design and document a reliability plan and a performance/scalability plan.
- Implement the instrumentation required to collect data for analysis.
- Recommend and document best practices.
- Perform data analysis to detect performance bottlenecks and reliability issues.
- Integrate the reliability and performance/scalability test cases into release processes, automate them in the GPII's Continuous Integration environment, store results using technologies such as Elasticsearch, and provide dashboards to team members.
- Work with Infrastructure developers to plan application deployments on Kubernetes clusters for reliability testing.
- Debug and resolve issues relating to the automated test scripts.

WHAT WE ARE LOOKING FOR

- 10+ years of hands-on experience designing and writing reliability and performance test plans.
- Experience with modern, containerized cloud infrastructure and load balancing techniques (in particular, Docker and Kubernetes), and the reliability techniques best suited to this style of architecture.
- An Agile mindset and a team-player attitude, with experience contributing to open source communities using collaborative environments such as GitHub.
- Development background with the ability to review code and write automation scripts and instrumentation for data gathering.
- In-depth experience with profiling and debugging tools for Node.js, and experience using these tools to identify the source of failures.
- In-depth knowledge of profiling the performance of services deployed on Unix-like operating systems using technologies such as dtrace, perf, systemtap, tcpdump, etc.
- Ability to understand deployment topologies, identify problem areas, simulate failures, and recommend improvements.
- Experience with load testing tools such as Gatling, JMeter, Tsung, etc., and the ability to simulate dynamic user traffic.
- Experience with networking protocols and one or more programming languages (JavaScript, Go, Python, Ruby).
- Experience working in a distributed environment.

To apply, send your resume or CV to jobs@raisingthefloor.org

