
AI Researcher
- Toronto, ON
- $72,000-138,000 per year
- Permanent
- Full-time
Work Model: Hybrid
Reference code: 130069
Primary Location: Toronto, ON
All Available Locations: Toronto, ON

Our Purpose

At Deloitte, our Purpose is to make an impact that matters. We exist to inspire and help our people, organizations, communities, and countries to thrive by building a better future. Our work underpins a prosperous society where people can find meaning and opportunity. It builds consumer and business confidence, empowers organizations to find imaginative ways of deploying capital, enables fair, trusted, and functioning social and economic institutions, and allows our friends, families, and communities to enjoy the quality of life that comes with a sustainable future. And as the largest 100% Canadian-owned and operated professional services firm in our country, we are proud to work alongside our clients to make a positive impact for all Canadians.

By living our Purpose, we will make an impact that matters.
- Have many careers in one Firm.
- Enjoy flexible, proactive, and practical benefits that foster a culture of well-being and connectedness.
- Learn from deep subject matter experts through mentoring and on-the-job coaching.
- Collaborate with product managers, engineers, and stakeholders to design AI-driven solutions that meet technical and business requirements.
- Research, prototype, and develop generative AI applications by combining non-deterministic LLMs with deterministic software engineering techniques.
- Build evaluation frameworks and benchmarks to measure model quality, reliability, and business impact.
- Generate regular reports on model accuracy, drift, and performance.
- Debug, optimize, and enhance GenAI applications using prompt engineering, reinforcement learning, fine-tuning, and software engineering best practices.
- Train and fine-tune large language models using Hugging Face Transformers.
- Apply reinforcement learning fine-tuning techniques using Hugging Face TRL (Transformers Reinforcement Learning).
- Manage training workflows with experiment tracking tools and distributed training accelerators (DeepSpeed, Accelerate, FSDP).
- Run and optimize multi-GPU training and inference, leveraging vLLM for high-throughput, low-latency serving.
- Contribute to the design of scalable MLOps/DevOps pipelines for model deployment, monitoring, and continuous training.
- Ensure compliance with data privacy, security, and responsible AI guidelines when handling training or test datasets.
- Stay current with emerging research in LLMs, RLHF/RLAIF, multimodal AI, and generative models; apply findings to improve our systems.
- Author technical documentation and contribute to publications, patents, or open-source projects where applicable.
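The evaluation and reporting responsibilities above (benchmarking model quality, tracking accuracy and drift) can be sketched in miniature. This is an illustrative, stdlib-only sketch; the function names, metrics, and the 0.2 drift threshold are common conventions, not anything specified in this posting.

```python
import math
from collections import Counter

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference answer."""
    assert len(predictions) == len(references)
    hits = sum(p.strip().lower() == r.strip().lower()
               for p, r in zip(predictions, references))
    return hits / len(references)

def population_stability_index(expected, actual, bins=None):
    """PSI between two categorical distributions; a value above ~0.2
    is a common rule-of-thumb flag for meaningful drift."""
    bins = bins or sorted(set(expected) | set(actual))
    e_counts, a_counts = Counter(expected), Counter(actual)
    psi = 0.0
    for b in bins:
        # Smooth zero counts so the log term stays defined.
        e = max(e_counts[b] / len(expected), 1e-6)
        a = max(a_counts[b] / len(actual), 1e-6)
        psi += (a - e) * math.log(a / e)
    return psi

# Toy benchmark run for a drift/accuracy report.
preds = ["Paris", "berlin", "Madrid"]
refs = ["Paris", "Berlin", "Rome"]
print(f"accuracy: {exact_match_accuracy(preds, refs):.2f}")  # 0.67
print(f"psi: {population_stability_index(['a','a','b'], ['a','b','b']):.3f}")
```

In a production pipeline these numbers would feed the regular accuracy/drift reports mentioned above rather than being printed.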
Candidates with experience in the following tools and frameworks will be strongly preferred:
- Transformers (Hugging Face) for model training, fine-tuning, and inference
- TRL (Transformers Reinforcement Learning) for RL-based fine-tuning (PPO, DPO, GRPO, RLAIF)
- DeepSpeed, Accelerate, or FSDP for multi-GPU and distributed training
- vLLM for optimized inference and serving of large models
- Weights & Biases (W&B) or MLflow for experiment tracking and reproducibility
- LangChain, AutoGen (A2A), or MCP for GenAI application development
- PyTorch as the primary deep learning framework
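As a conceptual sketch of what TRL's DPO-style fine-tuning optimizes: the function below computes the standard DPO objective from per-sequence log-probabilities. The numeric inputs are toy values, and real training would use TRL's trainer classes on actual model log-probs rather than this hand-rolled function.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair:
    -log(sigmoid(beta * (policy margin - reference margin)))."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy still equals the reference model, both margins cancel,
# so training starts at -log(0.5) = ln 2.
start = dpo_loss(-5.0, -7.0, -5.0, -7.0)
print(f"{start:.4f}")  # 0.6931

# As the policy widens the gap in favor of the chosen response, the loss drops.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(f"{improved:.4f}")
```

The beta parameter plays the same role as in TRL: it scales how strongly the policy is pulled away from the reference model.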
Required qualifications:
- 3+ years of experience in machine learning engineering, data engineering, or applied research (industry or academic).
- Strong programming skills in Python and experience with frameworks such as PyTorch, TensorFlow, or JAX.
- Hands-on experience with Hugging Face Transformers for pretraining, fine-tuning, or inference.
- Experience with Hugging Face TRL for reinforcement learning fine-tuning (e.g., PPO, DPO, GRPO, RLAIF).
- Practical experience managing multi-GPU training and distributed training at scale using DeepSpeed, Accelerate, or FSDP.
- Experience running inference on large models using vLLM or similar optimized serving frameworks.
- Familiarity with experiment tracking and reproducibility tools (e.g., W&B, MLflow).
- Knowledge of MLOps practices including continuous training, continuous monitoring, and model lifecycle management.
- Experience with GenAI frameworks such as LangChain, AutoGen (A2A), or MCP.
- Demonstrated ability to write clean, maintainable, production-ready code.
- Experience building or supporting cloud-based AI systems (GCP, AWS, or Azure; certifications preferred).
- Strong grasp of reinforcement learning, NLP, and/or generative modeling (transformers, diffusion, RAG, etc.).
- Track record of research contributions (papers, patents, open-source projects) is a plus.
Additional assets:
- Experience with reinforcement learning from human/AI feedback (RLHF/RLAIF).
- Contributions to open-source AI frameworks.
- Familiarity with scaling laws, evaluation metrics, and benchmarking large models.
- Interest in pushing the boundaries of trustworthy, explainable, and safe AI.
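On the experiment-tracking and reproducibility requirement above: a common pattern, shown here with the standard library only, is to fingerprint the full hyperparameter config so a run can be identified and rerun exactly. In practice tools named in the posting (W&B or MLflow) would record these params; the config keys and model name below are purely illustrative.

```python
import hashlib
import json
import random

def run_fingerprint(config: dict) -> str:
    """Deterministic short ID for an experiment config: the same
    hyperparameters always map to the same fingerprint."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

def start_run(config: dict) -> str:
    """Seed the RNG from the config so reruns are repeatable,
    then return the run's fingerprint."""
    random.seed(config["seed"])
    return run_fingerprint(config)

config = {"model": "my-base-llm", "lr": 2e-5, "seed": 42, "epochs": 3}
run_id = start_run(config)
print("run:", run_id)

# Key order must not change the fingerprint (sort_keys guarantees this).
reordered = {"seed": 42, "epochs": 3, "model": "my-base-llm", "lr": 2e-5}
assert run_fingerprint(reordered) == run_id
```

Logging this fingerprint alongside metrics makes it trivial to match a deployed model back to the exact training configuration that produced it.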