Data Engineer Intern

Data Engineer Intern (Web Scraping & Data Pipelines) 
Remote (US) | Bay Area Preferred | Part-time / Intern

About Jobnova

Jobnova.ai is building the AI-powered job and people discovery infrastructure of the future.
Our platform connects job seekers and companies through intelligent matching, AI agents, and large-scale data aggregation across job boards, social platforms, and talent networks.

We’re looking for a Data Engineer Intern who is excited about working with real-world data at scale: scraping the web, cleaning and structuring what comes back, and building robust data pipelines that power AI models and recommendations.

This role is perfect for someone who loves building systems, automating data collection, and turning messy data into reliable signals.

💼 Responsibilities

Build and maintain web scrapers to extract job postings, candidate profiles, company data, and social signals from multiple platforms

Design and implement ETL / ELT pipelines to clean, normalize, and transform unstructured data into usable formats

Work with vector databases and structured stores to support retrieval, ranking, and AI matching

Develop automated workflows for recurring data collection and enrichment

Collaborate with AI engineers to support model training, evaluation, and production inference

Monitor data quality, reliability, and scraper performance, and implement optimizations

Explore new data sources and propose scalable approaches for long-term data infrastructure

🛠 Requirements

Strong programming skills in Python

Experience with web scraping tools / frameworks (Playwright, Selenium, Scrapy, BeautifulSoup, Apify, etc.)

Understanding of data cleaning, parsing, and normalization techniques

Knowledge of databases (SQL, NoSQL) and familiarity with data pipeline tools

Ability to design and automate ETL workflows

Curiosity, the ability to learn quickly, and strong problem-solving skills

Comfortable with ambiguity in a fast-paced startup environment