Automating Job Alerts with Node.js and Puppeteer: A Complete Guide
Job board email alerts are slow, sporadic, and incomplete. They miss roles posted directly on company career pages. They batch notifications into daily digests. They surface sponsored listings over organic matches. If you want real-time visibility into the job market, you build your own alert system.
The stack
A personal job alert pipeline needs four components:
- A scraper that extracts listings from target sources
- A deduplication store that knows which listings you have already seen
- A matching engine that filters listings by your criteria
- A notification layer that pushes new matches to you immediately
We will use Node.js with Puppeteer for scraping, SQLite for dedup, and Nodemailer plus a Slack webhook for notifications. All of it runs on a single machine—your laptop, a Raspberry Pi, or a $5 VPS.
Setting up the Puppeteer scraper
Start with a simple scraper that targets a single company's Lever careers page:
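const puppeteer = require('puppeteer');
async function scrapeLever(company) {
  const browser = await puppeteer.launch({ headless: 'new' });
  try {
    const page = await browser.newPage();
    await page.goto(`https://jobs.lever.co/${company}`, {
      waitUntil: 'networkidle0',
      timeout: 30000,
    });
    // Runs inside the page: build one object per posting element.
    const jobs = await page.evaluate(() => {
      return [...document.querySelectorAll('.posting')].map((el) => ({
        title: el.querySelector('h5')?.innerText?.trim(),
        location: el.querySelector('.location')?.innerText?.trim(),
        // Prefer a link inside the posting; fall back to the element's own href.
        url: el.querySelector('a')?.href ?? el.href,
        company: document.querySelector('.company-name')?.innerText?.trim(),
      }));
    });
    // Drop entries without a title; use the URL slug if no company name was found.
    return jobs
      .filter((j) => j.title)
      .map((j) => ({ ...j, company: j.company || company }));
  } finally {
    // Always close the browser, even if navigation or extraction throws.
    await browser.close();
  }
}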
Extend this pattern to Greenhouse, Workday, and any other ATS with a consistent URL structure. Run each scraper sequentially with a 3-second delay between requests to stay under rate limits.
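The sequential run with a pause between sources can be sketched as below. This is a minimal illustration, not a fixed part of the pipeline: `sleep` and `runSequentially` are hypothetical helper names, and each entry in `scrapers` is assumed to be a zero-argument function that returns a promise of a job array.

```javascript
// Pause helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs scraper functions one at a time, waiting `delayMs` between them.
// A failing scraper is logged and skipped so one bad source does not end the run.
async function runSequentially(scrapers, delayMs = 3000) {
  const allJobs = [];
  for (let i = 0; i < scrapers.length; i++) {
    if (i > 0) await sleep(delayMs);
    try {
      const jobs = await scrapers[i]();
      allJobs.push(...jobs);
    } catch (err) {
      console.error(`Scraper ${i} failed: ${err.message}`);
    }
  }
  return allJobs;
}

// Example call (the company slugs are placeholders):
// runSequentially([() => scrapeLever('acme'), () => scrapeLever('globex')]);
```

Wrapping each scraper in a function delays its start until the loop reaches it, which is what keeps the requests spaced out.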
The SQLite dedup layer
Every scrape run returns listings you have seen before. Without deduplication, you get duplicate notifications and alert fatigue. A simple SQLite table solves this:
const Database = require('better-sqlite3');
const crypto = require('crypto');
const db = new Database('jobs.db');
db.exec(`CREATE TABLE IF NOT EXISTS seen_jobs (
  hash TEXT PRIMARY KEY,
  title TEXT,
  company TEXT,
  url TEXT,
  first_seen TEXT
)`);
// Prepare the statements once and reuse them on every call.
const selectSeen = db.prepare('SELECT 1 FROM seen_jobs WHERE hash = ?');
const insertSeen = db.prepare(
  'INSERT INTO seen_jobs (hash, title, company, url, first_seen) VALUES (?, ?, ?, ?, ?)'
);
// Fingerprint a listing; MD5 is used only as a fast dedup key, not for security.
function hashJob(job) {
  const str = `${job.title}|${job.company}|${job.url}`.toLowerCase();
  return crypto.createHash('md5').update(str).digest('hex');
}
// Returns true the first time a listing is seen, false on every later run.
function insertIfNew(job) {
  const hash = hashJob(job);
  const exists = selectSeen.get(hash);
  if (!exists) {
    insertSeen.run(hash, job.title, job.company, job.url, new Date().toISOString());
    return true;
  }
  return false;
}
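Each job is hashed on title, company, and URL, so a posting already seen on an earlier run is skipped. Because the URL is part of the hash, the same role listed on two boards under different URLs still counts as two listings; hash on title and company alone if you want to deduplicate across sources.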
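To deduplicate across boards, where the same role usually has a different URL on each one, one option is a key built only from the normalized title and company. The following is a minimal sketch under that assumption; `normalize` and `crossBoardHash` are hypothetical names, and SHA-256 is an arbitrary choice here (any stable hash would do).

```javascript
const crypto = require('crypto');

// Collapse case, punctuation, and repeated whitespace so that
// "Senior Engineer, Platform" and "senior engineer  platform" compare equal.
function normalize(text) {
  return String(text ?? '')
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, ' ')
    .trim();
}

// Hypothetical alternative to hashJob: the URL is left out of the key,
// so the same title at the same company maps to one hash on every board.
function crossBoardHash(job) {
  const key = `${normalize(job.title)}|${normalize(job.company)}`;
  return crypto.createHash('sha256').update(key).digest('hex');
}
```

The trade-off is that two genuinely separate openings with the same title at the same company collapse into one entry. Adding the normalized location to the key is one way to soften that.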
The matching engine
Scraping every listing is one thing. Surfacing only the relevant ones is another. Apply a keyword filter after scraping:
const KEYWORDS = ['react', 'typescript', 'node', 'full stack', 'frontend'];
function matchesKeywords(job) {
  const text = `${job.title} ${job.location}`.toLowerCase();
  return KEYWORDS.some((kw) => text.includes(kw));
}
// insertIfNew runs first so every unseen listing is recorded, matching or not.
const matches = jobs.filter((j) => insertIfNew(j) && matchesKeywords(j));
For more advanced matching, swap the keyword array for a scoring function that weights title matches higher than description matches, and applies a location penalty for non-remote roles outside your preferred cities.
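A scoring function along those lines might look like the sketch below. The weights, the city list, and the `description` field are all assumptions made for illustration: the scraper shown earlier does not collect descriptions, so that field would need to be added before description matches can contribute.

```javascript
// Illustrative values only; tune them to your own search.
const TITLE_WEIGHT = 3;
const DESCRIPTION_WEIGHT = 1;
const LOCATION_PENALTY = 2;
const PREFERRED_CITIES = ['berlin', 'london'];
const SCORE_KEYWORDS = ['react', 'typescript', 'node'];

function scoreJob(job) {
  const title = (job.title ?? '').toLowerCase();
  const description = (job.description ?? '').toLowerCase();
  const location = (job.location ?? '').toLowerCase();

  let score = 0;
  for (const kw of SCORE_KEYWORDS) {
    // Each keyword counts once: a title hit outranks a description hit.
    if (title.includes(kw)) score += TITLE_WEIGHT;
    else if (description.includes(kw)) score += DESCRIPTION_WEIGHT;
  }

  // Penalize roles that are neither remote nor in a preferred city.
  const isRemote = location.includes('remote');
  const inPreferredCity = PREFERRED_CITIES.some((city) => location.includes(city));
  if (!isRemote && !inPreferredCity) score -= LOCATION_PENALTY;

  return score;
}
```

You would then notify only on listings above some threshold, for example `scoreJob(job) >= 3`, instead of on any keyword hit.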
Notifications: email and Slack
When a new match is found, fire both an email and a Slack message:
const nodemailer = require('nodemailer');
const transporter = nodemailer.createTransport({
  service: 'gmail',
  auth: { user: process.env.EMAIL_USER, pass: process.env.EMAIL_PASS },
});
async function notify(job) {
  const html = `<b>${job.title}</b><br>${job.company} &mdash; ${job.location}<br><a href="${job.url}">Apply</a>`;
  await transporter.sendMail({
    from: process.env.EMAIL_USER,
    to: process.env.NOTIFY_EMAIL,
    subject: `New Job: ${job.title} at ${job.company}`,
    html,
  });
  // Global fetch is available in Node.js 18 and later.
  await fetch(process.env.SLACK_WEBHOOK, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: `${job.title} at ${job.company}\n${job.url}` }),
  });
}
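In `notify`, the Slack request only runs if the email send succeeds, so an SMTP failure silences both channels. If you would rather have each channel fail independently, one possible approach is to run them through `Promise.allSettled`. The sketch below assumes each channel is wrapped in its own function; `notifyAll` and the channel labels are hypothetical names.

```javascript
// Runs every channel for one job and returns the labels of those that failed.
// `channels` maps a label to a function that takes the job and may return a promise.
async function notifyAll(job, channels) {
  const names = Object.keys(channels);
  const results = await Promise.allSettled(
    // Starting from a resolved promise turns synchronous throws into rejections.
    names.map((name) => Promise.resolve().then(() => channels[name](job)))
  );
  const failed = [];
  results.forEach((result, i) => {
    if (result.status === 'rejected') {
      failed.push(names[i]);
      const reason = result.reason?.message ?? result.reason;
      console.error(`${names[i]} notification failed: ${reason}`);
    }
  });
  return failed;
}

// Example call, with sendEmail and sendSlack as placeholder functions:
// notifyAll(job, { email: sendEmail, slack: sendSlack });
```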
Scheduling with cron
Wire everything together with a cron job that runs every 15 minutes:
*/15 * * * * node /home/user/job-alerts/index.js >> /var/log/job-alerts.log 2>&1
Fifteen minutes is frequent enough to catch listings within the early-apply window without hammering target servers. Adjust based on the number of sources and their observed rate-limit thresholds.
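The `index.js` that cron invokes is not shown above, so here is one way it could be wired. This is a sketch under assumptions: `runOnce` is a hypothetical name, and the four steps are passed in as functions so the example stays independent of the concrete scraper, store, and notifier.

```javascript
// Hypothetical entry point: scrape, record unseen listings, filter, notify.
async function runOnce({ scrape, insertIfNew, matchesKeywords, notify }) {
  const jobs = await scrape();
  // Record every unseen listing first, so non-matching ones are also marked as seen.
  const fresh = jobs.filter((job) => insertIfNew(job));
  const matches = fresh.filter((job) => matchesKeywords(job));
  for (const job of matches) {
    await notify(job);
  }
  // Summary counts, handy as a one-line entry in the cron log.
  return { scraped: jobs.length, fresh: fresh.length, notified: matches.length };
}

// Example call (the company slug is a placeholder):
// runOnce({ scrape: () => scrapeLever('acme'), insertIfNew, matchesKeywords, notify })
//   .then((stats) => console.log(new Date().toISOString(), stats))
//   .catch((err) => { console.error(err); process.exitCode = 1; });
```

Setting a non-zero exit code on failure makes broken runs easy to spot in the log file that the cron line appends to.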
Production hardening
Before running this 24/7, add three safeguards: a retry wrapper with exponential backoff around each scraper call, a timeout that kills scrapers stuck on slow pages after 30 seconds, and a daily cleanup job that prunes seen_jobs entries older than 30 days to keep the SQLite file from growing indefinitely.
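The three safeguards can be sketched as small helpers. All names here (`withTimeout`, `withRetry`, `pruneSeenJobs`) are hypothetical, the defaults simply mirror the numbers in the paragraph above, and `db` is assumed to be a better-sqlite3 database with the `seen_jobs` table from earlier.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Rejects if `promise` has not settled within `ms` milliseconds.
// This only stops waiting; it does not cancel the underlying work, so a real
// scraper would still need to close its browser when the timeout fires.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Calls `fn`, retrying up to `retries` more times on failure or timeout.
// The wait doubles after each failed attempt: base, 2x base, 4x base, ...
async function withRetry(fn, { retries = 3, baseDelayMs = 1000, timeoutMs = 30000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await withTimeout(fn(), timeoutMs);
    } catch (err) {
      if (attempt >= retries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Deletes rows first seen more than `days` days before `now`; returns the row count.
// ISO 8601 strings from toISOString() sort chronologically, so a plain
// string comparison in SQL is enough.
function pruneSeenJobs(db, days = 30, now = new Date()) {
  const cutoff = new Date(now.getTime() - days * 24 * 60 * 60 * 1000).toISOString();
  return db.prepare('DELETE FROM seen_jobs WHERE first_seen < ?').run(cutoff).changes;
}

// Example call (the company slug is a placeholder):
// const jobs = await withRetry(() => scrapeLever('acme'));
```

One side effect of pruning is worth knowing: a listing that is still open after 30 days will be treated as new again once its row is deleted.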
Polling every 15 minutes, this system can surface roles hours before a daily digest from LinkedIn or Indeed lands in your inbox. For a weekend project, that is a strong ROI.