We’re looking for an ML Engineer who can ship — from classical pipelines to LLM-powered features — on AWS. You’ll design, deploy, and maintain ML systems in production. This is an engineering role first; research experience alone won’t be enough.
Responsibilities
- Build end-to-end ML pipelines: data ingestion, training, evaluation, deployment, and monitoring.
- Design and implement RAG pipelines, prompt engineering systems, and LLM-based features with proper evaluation — not vibe-based iteration.
- Fine-tune open-weight models (LoRA/QLoRA) when API calls aren’t the right answer.
- Deploy and serve models on AWS — SageMaker, Bedrock, Lambda, or ECS depending on requirements.
- Write infrastructure as code (CDK or Terraform); no manual console configuration in production.
- Monitor deployed models for drift, quality degradation, and cost; own issues through to resolution.
- Translate ambiguous business problems into concrete ML problem framings.
Must-Have
| Area | Requirement |
|---|---|
| Python | Engineering-level — testable, reviewable code, not just scripts |
| Classical ML | Supervised/unsupervised methods; knows when not to use a neural network |
| LLM Fundamentals | Genuine understanding of transformers, tokenization, context windows, inference behaviour |
| RAG | Has built and evaluated at least one production or near-production RAG system |
| AWS Core | S3, IAM, Lambda, EC2, VPC — comfortable without a handbook |
| AWS ML | SageMaker (Training Jobs + Endpoints) and/or Bedrock |
| Docker | Containerising ML workloads for deployment |
| SQL | Comfortable writing queries for data extraction and validation |
Good to Have
- Fine-tuning with LoRA/QLoRA (Hugging Face PEFT/TRL)
- LLM evaluation frameworks — RAGAS, DeepEval, LLM-as-judge, or custom
- Vector databases — pgvector, Pinecone, OpenSearch (production, not demos)
- Agent frameworks — LangGraph, LlamaIndex, or custom tool-use implementations
- Workflow orchestration — Step Functions, SageMaker Pipelines, Airflow
- Infrastructure as Code — AWS CDK or Terraform
- Experiment tracking — MLflow or Weights & Biases
Technology Stack
| Category | Technologies |
|---|---|
| Language | Python |
| ML | scikit-learn, XGBoost, PyTorch |
| LLM / Models | Amazon Bedrock, OpenAI API, Llama / Mistral / Qwen |
| Fine-Tuning | Hugging Face Transformers, PEFT, TRL |
| RAG / Agents | LangChain, LlamaIndex, LangGraph |
| Vector Stores | pgvector, Pinecone, OpenSearch |
| AWS | SageMaker, Bedrock, S3, Lambda, ECS, Step Functions, CDK |
| MLOps | MLflow, W&B, Docker, GitHub Actions |
| Data | Pandas, NumPy, PySpark, PostgreSQL, Athena |