
Senior Systems Engineer
- Ottawa, ON
- $77,000-123,200 per year
- Permanent
- Full-time
- System Administration:
- Manage, configure, and maintain Linux servers in data centers and the AWS public cloud.
- Perform system upgrades, patch management, and performance tuning.
- Monitor system health, performance, and capacity to ensure high availability.
- Troubleshoot and resolve hardware, software, and networking issues.
- Implement security best practices, including access control, vulnerability management, and patching (e.g., Foreman, Nessus, and CrowdStrike).
- Regularly monitor system performance, utilization, and capacity using monitoring tools (e.g., Checkmk, ELK, Wazuh, New Relic and SolarWinds).
- Monitor and troubleshoot network, hardware, and software issues to minimize downtime and maintain high availability.
- Virtualization (VMware):
- Install, configure, and manage VMware ESXi hosts, vCenter servers, and other VMware products.
- Manage virtual machines (VMs), including provisioning, maintenance, and performance tuning.
- Monitor and optimize the performance and capacity of the VMware environment.
- Perform ESXi host upgrades and patching, and remediate vulnerabilities in VMware components.
- Automation and Scripting:
- Develop automation scripts using tools like PowerShell, Python, Ansible, and Bash.
- Implement and maintain Infrastructure as Code using tools like Terraform or CloudFormation for automated provisioning of infrastructure.
- Automate configuration management to ensure that all systems maintain consistent settings across multiple environments.
- Manage and maintain configuration state using Puppet, Ansible, Chef, or SaltStack.
- Develop automated processes for code deployment, systems provisioning, and service rollout to different environments.
- Storage and Backup:
- Monitor and maintain storage systems for optimal performance, capacity, and reliability.
- Configure storage arrays, volumes, and file systems, ensuring high availability and fault tolerance.
- Ensure proper data redundancy, RAID configurations, and storage tiering for performance optimization.
- Design, implement, and maintain disaster recovery plans to ensure the organization's data can be recovered in case of disaster events.
- Configure and manage replication technologies for offsite data replication and failover.
- 7+ years of experience with Linux (RHEL, CentOS, Rocky Linux), VMware vSphere, ESXi, vCenter, and other virtualization technologies.
- Able to produce and maintain automation scripts using languages and tools such as PowerShell, Bash, Python, Ansible, and Puppet.
- Experience configuring and troubleshooting TCP/IP, DNS, DHCP, and other core networking protocols in production environments.
- Experience in planning and implementing hardware upgrades, including firmware updates, and troubleshooting hardware failures.
- Experience performing root cause analysis, troubleshooting system issues, and implementing post-incident corrective actions.
- Ability to manage and resolve incidents quickly while maintaining system stability and minimizing downtime.
- AWS experience configuring and managing cloud services, infrastructure provisioning, and cloud networking.
- Proficiency in automation using Terraform, CloudFormation, or cloud-native automation tools.
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience.