
Data Engineer (4+ Years of Experience)

Role Overview:

As a Data Engineer, you will be responsible for designing, implementing, and maintaining scalable and efficient data pipelines to collect, transform, and store large datasets. You will work closely with data scientists, analysts, and other teams to support the company's data-driven initiatives. This position requires a combination of technical expertise and problem-solving skills, with a focus on optimizing data flow and infrastructure.

Key Responsibilities:

  • Data Pipeline Development: Build and maintain ETL (Extract, Transform, Load) pipelines to process and integrate large datasets from multiple sources (e.g., databases, APIs, external data services); a brief illustrative sketch follows this list.
  • Data Architecture: Design and implement scalable, reliable, and high-performance data storage solutions (e.g., data lakes, data warehouses).
  • Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand business requirements and translate them into effective data infrastructure solutions.
  • Performance Optimization: Identify bottlenecks in data workflows and optimize the data pipeline for performance, scalability, and cost efficiency.
  • Data Quality and Governance: Ensure data consistency, accuracy, and compliance with company standards. Implement monitoring and alerting systems to track data quality issues.
  • Automation: Develop automation scripts and workflows to streamline data operations and reduce manual intervention.
  • Documentation: Maintain clear documentation of data pipeline architectures, processes, and systems to ensure knowledge sharing and easy troubleshooting.
  • Cloud Services: Utilize cloud-based platforms like AWS, GCP, or Azure to deploy and manage data infrastructure and pipelines.
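
For context on the pipeline work described above, the sketch below shows what a minimal Extract, Transform, Load pipeline might look like as an Apache Airflow DAG (one of the orchestration tools listed under Skills & Qualifications). It is illustrative only and assumes a recent Airflow 2.x release; the DAG name, schedule, and record shapes are hypothetical placeholders, not a description of this role's actual stack.

```python
# Illustrative sketch only: a minimal extract -> transform -> load DAG.
# Assumes Airflow 2.x; names, schedule, and data are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # In practice this would pull raw records from an API or upstream database;
    # hard-coded here so the example is self-contained.
    return [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "3.25"}]


def transform(ti):
    # Clean and reshape the raw records (here: cast amounts to floats).
    rows = ti.xcom_pull(task_ids="extract")
    return [{**row, "amount": float(row["amount"])} for row in rows]


def load(ti):
    # In practice this would write to a warehouse table or data lake path.
    rows = ti.xcom_pull(task_ids="transform")
    print(f"Would load {len(rows)} rows")


with DAG(
    dag_id="example_etl_pipeline",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the three stages in order.
    extract_task >> transform_task >> load_task
```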

Skills & Qualifications:

  • Experience: 4+ years of experience in data engineering or a related field, with a strong understanding of data systems and data pipeline architecture.
  • Programming: Proficiency in programming languages such as Python, Java, or Scala for building data pipelines and processing data.
  • SQL: Advanced SQL skills for querying and managing data in relational databases (e.g., PostgreSQL, MySQL, MS SQL Server).
  • ETL Tools: Hands-on experience with ETL tools and frameworks such as Apache Airflow, Talend, or Informatica.
  • Data Warehousing: Strong knowledge of data warehousing concepts and tools like Amazon Redshift, Google BigQuery, or Snowflake.
  • Cloud Platforms: Familiarity with cloud computing platforms like AWS, Google Cloud, or Azure, and data engineering services and platforms that run on them (e.g., Amazon S3, AWS Lambda, Databricks).
  • Big Data: Experience with big data tools and frameworks such as Hadoop, Spark, or Kafka is a plus.
  • Version Control: Experience using version control systems like Git for collaborative code development.
  • Problem-Solving: Strong analytical and troubleshooting skills to identify and resolve data pipeline issues.
  • Communication: Excellent written and verbal communication skills, with the ability to explain technical concepts to non-technical stakeholders.

Preferred Qualifications:

  • Degree: Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field.
  • Certifications: AWS Certified Data Analytics – Specialty, Google Professional Data Engineer, or other relevant certifications.
  • Machine Learning: Exposure to data science tools and frameworks (e.g., TensorFlow, PyTorch) for integrating machine learning models into data pipelines.