
Mainframe Performance and Capacity Management Engineer (REMOTE)
- Toronto, ON
- Permanent
- Full-time
- Monitor real-time z/OS system health and performance across CPU, memory, DASD, and WLM-managed workloads, using tools including RMF, SmartIS, IzPCA, MICS, and other internal tools. Analyze performance data to identify trends, bottlenecks, and potential issues.
- Detect, troubleshoot, and resolve resource anomalies, workload misbehaviors, and degradation risks in production systems. Partner with incident response teams to resolve performance issues quickly and accurately.
- Develop and implement performance tuning strategies by recommending changes to service definitions, dispatching priorities, and workload placement.
- Contribute to capacity planning by forecasting and modeling workload resource demand and capacity requirements.
- Support cost modeling, vendor reporting (SCRT), infrastructure sizing and resource optimization efforts.
- Collect and analyze system performance data to generate reports and dashboards.
- Identify key performance indicators (KPIs) and develop metrics to track system performance.
- Visualize, summarize, and present data findings, recommendations, and methodology to senior leadership, department leadership, and enterprise stakeholders, both technical and non-technical.
- Work closely with cross-functional teams, including operations, development, and infrastructure teams.
- Provide technical support and guidance to team members and stakeholders.
- Participate in on-call rotations and provide timely responses to performance and observability issues.
- Participate in migration of performance/capacity tooling to Git change management and DevOps deployment pipelines.
- Bachelor’s degree in Information Systems, Mathematics, Finance, or another quantitative or related subject
- 10+ years of mainframe systems experience with proficiency in performance management for large, multi-processor, multi-LPAR, Parallel Sysplex environments utilizing z/OS
- 10+ years of proven experience in mainframe performance monitoring, observability, capacity management, and data analysis.
- 10+ years of proven experience resolving systems performance problems in real time via adjustments to WLM and batch initiators.
- 10+ years of experience with, and a strong understanding of, PR/SM.
- 10+ years of proficiency in REXX/Python, Job Control Language (JCL), and DB2
- Strong understanding of batch processing and job scheduling
- Advanced user of MS Excel (charts, PivotTables, VLOOKUPs, PowerPivot) and PowerPoint for data visualization.
- 10+ years of experience with mainframe monitoring tools and performance tuning techniques.
- 10+ years of experience working with large, highly transactional datasets to draw insights and create organizational value.
- Experience working with DevOps
- Experience working with ADABAS