Data Engineer Intern (Web Scraping & Data Pipelines)
Remote (US) | Bay Area Preferred | Part-time / Intern
About Jobnova
Jobnova.ai is building the AI-powered job and people discovery infrastructure of the future.
Our platform connects job seekers and companies through intelligent matching, AI agents, and large-scale data aggregation across job boards, social platforms, and talent networks.
We’re looking for a Data Engineer Intern who is excited about working with real-world data at scale: scraping the web, cleaning and structuring what comes back, and building robust data pipelines that power AI models and recommendations.
This role is perfect for someone who loves building systems, automating data collection, and turning messy data into reliable signals.
💼 Responsibilities
Build and maintain web scrapers to extract job postings, candidate profiles, company data, and social signals from multiple platforms
Design and implement ETL / ELT pipelines to clean, normalize, and transform unstructured data into usable formats
Work with vector databases and structured stores to support retrieval, ranking, and AI matching
Develop automated workflows for recurring data collection and enrichment
Collaborate with AI engineers to support model training, evaluation, and production inference
Monitor data quality, reliability, and scraper performance, and implement optimizations
Explore new data sources and propose scalable approaches for long-term data infrastructure
🛠 Requirements
Strong programming skills in Python
Experience with web scraping tools / frameworks (Playwright, Selenium, Scrapy, BeautifulSoup, Apify, etc.)
Understanding of data cleaning, parsing, and normalization techniques
Knowledge of databases (SQL, NoSQL) and familiarity with data pipeline tools
Ability to design and automate ETL workflows
Curiosity, the ability to learn quickly, and strong problem-solving skills
Comfortable working with ambiguity in a fast-paced startup environment