Master's degree in Software Engineering, Data Engineering, Computer Science, or a related field
5 years of relevant work experience
Strong Scala and Python background
Experience with Apache Spark and/or Ray
Knowledge of AWS, GCP, Azure, or another cloud platform
Knowledge of current principles and frameworks for ML Ops
Experience with ML Ops technologies such as MLflow, DVC, Grafana, DataHub, Databricks
Experience with machine learning technologies such as PyTorch, TensorFlow, Amazon SageMaker
Experience with CI/CD pipelines, including Jenkins or GitHub Actions
Experience with Docker containerization or Kubernetes orchestration
Experience improving data security and privacy, and managing and reducing cloud costs
Knowledge of API development and machine learning deployment
Responsibilities:
Develop and implement a strategy for continuous improvement of our ML Ops practices, including versioning, testing, automation, reproducibility, deployment, monitoring, and data privacy
Develop and report on ML Ops metrics such as deployment frequency, lead time for changes, mean time to restore, and change failure rate
Collaborate with data scientists, data engineers, API engineers, and the DevOps team
Build scalable data ingestion and machine learning inference pipelines
Scale up production systems to handle increased demand from new products, features, and users
Provide visibility into the health of our data platform (a comprehensive view of data flow, resource usage, data lineage, etc.) and optimize cloud costs
Automate and manage the lifecycle of the systems and platforms that process our data