
Data Management Engineer
- Ottawa, ON
- Permanent
- Full-time
- Stakeholder Coordination: Liaise with internal teams to define data collection requirements and priorities, maintaining clear documentation on collection needs.
- Data Collection: Perform on-site data recording using our systems and directly obtain datasets from key stakeholders or customers.
- Data Storage & Archiving: Manage and organize large datasets across various storage solutions, including Google Cloud (GCloud) and local Network Attached Storage (NAS), ensuring data is secure, accessible, and up to date.
- Lidar Pre-processing: Conduct initial alignment and pre-validation of raw Lidar point cloud data to ensure its quality and usability.
- Data Validation & Quality Assurance: Design, develop, and perform rigorous validation processes on labeled data received from vendors to confirm it meets our standards before it is integrated into training or validation sets.
- Dataset Management: Handle the strategic splitting of datasets for training and validation purposes. Identify and document errors or inconsistencies in current datasets and coordinate with labeling teams for corrections.
- Validation & Regression Framework: Integrate new datasets into our validation framework, including creating precise Areas of Interest (AOIs) and generating new baseline performance metrics.
- Labeling Coordination: Manage the end-to-end labeling process by sending data to our labeling partners, tracking their progress, and serving as the primary technical point of contact.
- Documentation: Own and update all labeling documentation, including defining new classes and clarifying labeling instructions to ensure data consistency.
- Metrics & Reporting: Continuously improve the metrics and reports used for our validation, performance and regression testing, adding new parameters as needed to enhance our evaluation framework.
- Automation & Efficiency: Run, maintain, and improve our pre-labeling pipeline to increase the efficiency of our data operations.
- Tool & Industry Research: Conduct industry research to identify and evaluate new tools and technologies that can make our labeling and data management processes more efficient.
- Bachelor’s degree in Computer Science, Engineering, or a related technical field.
- Proven experience in managing large-scale datasets and complex validation pipelines for machine learning and computer vision applications.
- Proficiency in scripting languages such as Python for automation and data manipulation.
- Familiarity with data labeling processes and managing multiple labeling vendors.
- Strong organizational skills with the ability to manage multiple projects and priorities simultaneously.
- Excellent communication skills, with experience coordinating between technical teams and external vendors.
- Meticulous attention to detail, especially in data validation and quality control.
- Ability to identify gaps in processes and define processes that close them.
- Strong knowledge of C++ / Rust.
- DevOps or MLOps experience.
- Hands-on experience with Lidar data and point cloud processing.
We are sorry, but this recruiter does not accept applications from abroad.