Cloud Operations Engineer - AWS
- Toronto, Ontario, Canada
- Full Time
- Platform Operations
- Mid Level
Privately owned and successfully operating for over 16 years, iTMethods has grown to become a market leader in enabling Enterprise DevOps. Our Managed DevOps SaaS Platform enables global enterprises to securely integrate, migrate, and modernize their complex, multi-cloud, multi-vendor DevOps environments using our catalog of industry-leading tools, including standards such as Jira, Confluence, Jenkins, and GitHub. As a result, our clients around the world and across all business sectors can build better software, faster and more securely, to enhance their pace of innovation.
The unrelenting pace of Digital Transformation means that what we do is critical to helping organizations capitalize on their growth opportunities, using DevOps to accelerate development. As a small, nimble operation with our head office in Toronto and a growing staff of remote workers around the globe, we can offer both the excitement and agility of a start-up and the safe haven of a profitable, well-established company with enterprise-grade clients.
Every day, everyone here strives to be an active part of the best DevOps platform and delivery teams our clients have ever worked with. This is a chance to work with the best, at the best job you've ever had. We ask for your ideas, expertise, and commitment. In return, we give you access to smart people, an opportunity to further your knowledge, and the chance to build innovation for the real world.
We’re very excited about our anticipated growth and are expanding many of our teams of innovators, consultants, and collaborators to drive that success.
Who we need
We are looking for a Cloud Operations Engineer, reporting to the Senior Operations Manager, to join the team. This is an opportunity to work closely with customers, engineers, technology consultants, and various delivery teams to ensure maximum uptime for customers on our platform. You will work with big-name customers, using a variety of tools as projects continuously enter and exit the pipeline. It’s a chance to create repeatable processes and procedures, and to anticipate opportunities for automating operations to minimize human error and move towards a self-healing environment with automatic recovery. Key to success in this role will be excellent internal and external communication and a constant desire to find long-term solutions to problems and implement them right away.
What you will do
- Ensure a sustained focus on engineering, with the goal of exposing faults and addressing their root causes so they do not recur.
- Forecast demand and plan capacity: create and maintain good visibility into usage and demand, then plan and execute capacity provisioning changes through rigorous change management protocols, ensuring efficient use of resources.
- Eliminate toil: manual, repetitive, tactical work with no enduring value that can be either removed or automated for a more sustainable and scalable solution.
- Develop automation solutions to improve and optimize operational processes and services.
- Act as a subject matter expert and implement DevOps best practices for our customers.
- Assist in the configuration and support of customer environments, including code deployments, optimization, and various tools.
- Architect and implement monitoring and logging solutions, identify issues proactively, and mitigate them to improve the customer experience.
- Test/Audit/Review solutions to ensure we deliver a resilient, monitored, highly secure, and complete solution.
- Troubleshoot and resolve escalated software and infrastructure related issues and challenges.
- Contribute to the continuous improvement of all operations collateral and services to efficiently manage and maintain deployments.
- Explore and evaluate new and emerging software tools and technologies.
Who you are
You are a cloud-focused Technology Operations professional with hands-on knowledge of AWS, Terraform, Ansible, Jenkins, Git, and Artifactory. You describe yourself as an innovator, an automator, a multitasker, an excellent communicator, and someone obsessed with continued education. You are addicted to problem-solving, you own your work, and you execute flawlessly and independently. You like to share knowledge, mentor others, and thrive in high-growth, fast-paced environments. You are curious and enjoy experimenting with a range of technologies to constantly improve efficiency in how we respond to incidents and run our operations.
What you bring
- The mindset. You have a site resiliency engineering mindset, constantly looking to avoid and eliminate faults.
- The drive. You thrive on developing solutions to open-ended business problems. You can work within a team and independently on multiple concurrent initiatives.
- The certifications. You are certified in AWS and Jenkins, or other DevOps tools.
- The expertise. You have at least 2 years of hands-on knowledge of and experience with:
  - AWS and Jenkins
  - Configuring or customizing Jenkins and other DevOps tools
  - Linux or Windows administration
  - Continuous integration best practices
  - Managing and supporting AWS or other cloud environments
  - Configuration management, using Ansible, Terraform, or a similar tool
  - Using performance monitoring and alerting tools to assess service against service levels
  - Developing in-depth application-level monitoring using a tool such as Datadog, and identifying the meaningful metrics to monitor so that user-impacting problems can be caught before they occur
  - Common DevOps tools, with functional knowledge and expertise
  - Using code to automate so that mistakes can be avoided
- The flexibility. You are available to work rotating daytime shifts and participate in an after-hours on-call schedule.
What’s in it for you
- Impact. You are ready to play a critical role in ensuring the overall stability and resiliency of our platform. You want to work on multiple initiatives using a range of tools instead of just focusing on operational issues and support requests. You want to engage in and improve the whole lifecycle of services from inception and design, through deployment, operation, and refinement.
- Exposure. You want to deliver high-quality experiences for users on our platform. You want to use your expertise to advise and provide functional support to our customers on their DevOps tools (Jenkins, GitHub, Artifactory and similar products) and cloud workloads.
- Growth. You want to apply and expand your technical expertise, including professional certification in AWS and Jenkins. You are eager to keep up to date with cutting-edge technologies that affect the solutions we operate, as well as best practices in software delivery.
Our people empower our clients; we empower them.
We are as passionate about our clients as we are about the people who make up our teams. We look for and welcome smart, talented people. Narrow role descriptions do not constrain our teams. They are empowered to pragmatically solve problems, with the autonomy to make decisions that will drive success for our clients. Employee success is driven by our customer obsession.
We work hard to stay connected and relay our appreciation for our people. With a wellness allowance, lunch-and-learn demo opportunities, and (remote) social gatherings, this is a place that prioritizes the need for balancing career development, physical well-being, and mental health.
Interested but worried you don't have everything listed here? If you have even 70% of what we're looking for, we still want to hear from you and encourage you to apply. While we can't guarantee an interview, we'll consider your application.
iTMethods is committed to fostering an inclusive and accessible environment where employees feel valued and respected, and every employee has the opportunity to realize their potential. We are committed to providing reasonable accommodations, if required, and will work with you to meet your needs.