Join our forward-thinking data team to create advanced data tools for GenAI applications. We use data to help our customers and support our own new products and research. As a Data Engineer, you will play a crucial role in building and optimizing data infrastructure, ensuring our solutions are robust, scalable, and ready for the challenges of tomorrow.
Your responsibilities:
Design and implement scalable processes for storing, versioning, and documenting large-scale data (terabytes of text).
Architect, develop, and maintain web services that allow for efficient consumption of harvested data.
Collaborate with researchers and software engineers to enhance data collection methodologies.
Prepare datasets for various Machine Learning use cases, including Generative AI.
Build and automate preprocessing pipelines tailored to different applications.
Provide reliable data services that enable other teams to build new products on top of our data infrastructure.
Your profile:
You have 3+ years of experience working as a Data Engineer.
You are fluent in Python and at least one other programming language.
You possess a strong understanding of distributed systems and how to leverage them for efficient data pipelines.
You have a solid software engineering background with a focus on writing clean and pragmatic code.
You excel in data wrangling, including extracting, transforming, cleaning, and standardizing data from multiple sources.
You are knowledgeable about Generative AI use cases and the critical role of data in developing new solutions.
Nice to have:
Experience working in multicloud environments (e.g., GCP, Azure, AWS) and on-premises setups.
Experience with Machine Learning and/or Data Science.
Experience with Golang.
What you can expect from us:
Become part of an AI revolution
30 days of paid vacation
Flexible working hours
Join a dynamic start-up and a rapidly growing team
Work with international industry and science experts
Take on responsibility and shape our company and technology