Data Engineer (Localization & Language Data - Team Lead)
Alexa Translations
- Montreal, QC
- Permanent
- Full-time
- Innovation
- Dedication
- Fanatical commitment to quality and service
- Resourcefulness
- Collaboration
- Architecture: Define the roadmap for our data warehouse, ensuring high availability and performance for massive multilingual datasets.
- Data Cataloging & Governance: Implement robust cataloging solutions to ensure data lineage and "discoverability" across the organization.
- Interface Development: Lead the creation of a user-centric interface that allows stakeholders to interact with, query, and extract data from the platform.
- Linguistic Asset Management: Manage the lifecycle of Translation Memories (TMs) and Terminology Databases.
- Systems Expertise: Optimize integrations between our data platform, CAT tools, and translation management systems (TMS) such as Phrase, Trados, and memoQ.
- Domain Integration: Ensure data pipelines respect the nuances of translation metadata, XLIFF structures, and regional variants.
- RAG & Indexation: Oversee the creation and maintenance of Vector Databases and semantic search indexes to support Retrieval-Augmented Generation for automated translation and content creation.
- Data Preparation for LLMs: Architect pipelines that clean, chunk, and format localization data for fine-tuning or prompting Large Language Models (LLMs).
- Quality & Evaluation: Support the implementation of automated quality estimation (QE) and LLM-based evaluation metrics for translated content.
- Team Management: Lead a cross-functional team of Linguists, Software Developers, DevOps Engineers, and Localization Engineers, providing technical guidance and mentorship.
- Cross-functional Collaboration: Act as the liaison between Data, Localization, and AI/ML Research teams.
- Experience: 5+ years in Data Engineering
- Technical Stack: Proficiency in SQL, Python, ETL Pipelines, and cloud data platforms (e.g., AWS S3 Data Lakes, AWS Athena, AWS Redshift, AWS Glue).
- AI/ML Fundamentals: Solid understanding of the GenAI lifecycle, specifically how data is indexed for RAG in vector databases (e.g., Pinecone, Milvus, or Qdrant).
- Domain Knowledge: Understanding of the localization industry, including experience with TMX, TBX, and CAT tool workflows.
- Product Mindset: Experience building and deploying production-ready internal tools or interfaces (e.g., Streamlit, React) to democratize data access.
- Familiarity with embedding models and semantic similarity scoring.
- Knowledge of data privacy and security requirements (ISO 27001, GDPR), specifically regarding PII in linguistic datasets.
- Comprehensive Health Insurance: Including vision, dental, complementary therapies, and support for your overall well-being.
- Your Birthday Off: We celebrate your special day!
- 6 Personal/Sick Days: Take the time you need for your health or life’s unexpected moments.
- Work-Ready Equipment: Get the tools you need to succeed, provided upon request.
- Hybrid Work Model: Enjoy the best of both worlds with a mix of in-office collaboration and remote flexibility.
- Learning & Growth Opportunities: Training and resources tailored to your role and department.
- Supportive & Collaborative Team Culture: Work alongside team members who genuinely have your back.
- Team Recognition & Action Awards: Celebrate wins and contributions in meaningful ways.
- Employee Referral Program: Earn rewards for bringing amazing talent to our team.
We are sorry, but this recruiter does not accept applications from abroad.