Seeking a skilled Data Engineer with a robust background in PySpark and extensive experience with AWS services, including Athena and EMR. The ideal candidate will be responsible for designing, developing, and optimizing large-scale data processing systems, ensuring efficient and reliable data flow and transformation.

Key Responsibilities:

Data Pipeline Development: Design, develop, and maintain scalable data pipelines using PySpark to process and transform large datasets.
AWS Integration: Utilize AWS services, including Athena and EMR, to manage and optimize data workflows and storage solutions.
Data Management: Implement data quality, data governance, and data security best practices to ensure the integrity and confidentiality of data.
Performance Optimization: Optimize and troubleshoot data processing workflows for performance, reliability, and scalability.
Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver solutions that meet business needs.
Documentation: Create and maintain comprehensive documentation of data pipelines, ETL processes, and data architecture.

Required Skills and Qualifications:

Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Experience: 5+ years of experience as a Data Engineer or in a similar role, with a strong emphasis on PySpark.
Technical Expertise:
    Proficient in PySpark for data processing and transformation.
    Extensive experience with AWS services, specifically Athena and EMR.
    Strong knowledge of SQL and database technologies.
    Experience with Apache Airflow is a plus.
    Familiarity with other AWS services such as S3, Lambda, and Redshift.
Programming: Proficiency in Python; experience with other programming languages is a plus.
Problem-Solving: Excellent analytical and problem-solving skills with attention to detail.
Communication: Strong verbal and written communication skills to effectively collaborate with team members and stakeholders.
Agility: Ability to work in a fast-paced, dynamic environment and adapt to changing priorities.

Preferred Qualifications:

Experience with data warehousing solutions and BI tools.
Knowledge of other big data technologies such as Hadoop, Hive, and Kafka.
Understanding of data modeling, ETL processes, and data warehousing concepts.
Experience with DevOps practices and tools for CI/CD.