Fractional Executive Job Board Scraper — Continuation Prompt
Paste this entire file into a new Claude Code session to build the job board scraper pipeline. All context from prior research and planning is embedded below. No prior plans or memory files are needed.
What to Build
Build an automated system that scrapes fractional executive job boards daily, scores listings against my profile, and generates a daily markdown report of which jobs to apply to.
My profile (Dmitri Zasage, CEO of Solanasis LLC):
- Fractional CIO, CISO, and COO for SMBs and nonprofits
- Core services: Security Assessments, Disaster Recovery Verification, Data Migrations, CRM Setup, Systems Integration, Responsible AI Implementation
- Based in Colorado (remote-friendly)
- One-person operation with 1099 contractors
Output location: solanasis-docs/daily-outreach/YYYY-MM-DD-jobs.md
Script location: solanasis-scripts/job-board-scraper/
Architecture
scrape_and_score.py (main entry point)
|
+--> scrapers.py --> Fetch listings from 8 boards
| (trafilatura / Jina / Crawl4AI / JobSpy)
|
+--> parsers.py --> Normalize into unified schema
| Deduplicate via job_cache/
|
+--> score_jobs.py --> Score against my profile
| Assign tiers: A (40+), B (25-39), C (10-24)
|
v
generate_daily_jobs.py --> Daily markdown report + CSV
Target Job Boards (8 sources)
| Board | URL | Scraper Type | Notes |
|---|---|---|---|
| Fractional Jobs | fractionaljobs.io | Static (trafilatura) | Largest fractional talent network |
| GoFractional | gofractional.com | Crawl4AI (JS) | Permissive robots.txt |
| GigX | gigx.com | Crawl4AI (JS) | Drupal-based |
| Find Fractional Jobs | findfractionaljobs.com | Static | Smaller board |
| All Fractional Jobs | allfractionaljobs.com | Static | Aggregator |
| Indeed | via JobSpy | JobSpy library | DataDome anti-bot handled by JobSpy |
| LinkedIn | via JobSpy | JobSpy library | Rate-limits at page 10 |
| ZipRecruiter | via JobSpy | JobSpy library | Moderate anti-bot |
Phase 2 boards (add later): Toptal, Catalant, Business Talent Group, A-Team, Graphite, Paro
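Assuming config.py keys each board by a short id, the table above might map into board definitions roughly like this (the dict shape and helper are illustrative, not the final config):

```python
# Hypothetical shape for the board definitions in config.py.
BOARDS = {
    "fractionaljobs": {
        "url": "https://fractionaljobs.io",
        "scraper": "static",    # trafilatura with Jina fallback
    },
    "gofractional": {
        "url": "https://gofractional.com",
        "scraper": "crawl4ai",  # JS-rendered, needs a headless browser
    },
    "indeed": {
        "scraper": "jobspy",    # delegated to the python-jobspy library
        "search_term": "fractional CIO",
        "location": "Colorado",
    },
}

def boards_by_scraper(kind: str) -> list[str]:
    """Group board ids by scraper type so the orchestrator can dispatch."""
    return [name for name, cfg in BOARDS.items() if cfg["scraper"] == kind]
```

The orchestrator in scrapers.py can then dispatch on the `scraper` field instead of hard-coding board names.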
Scraping Tools (all free)
| Tool | Role | Already Installed? |
|---|---|---|
| trafilatura | Primary text extraction (static HTML) | Yes (pip show trafilatura) |
| Jina Reader | Fallback for JS-heavy sites (API: r.jina.ai/{url}) | Yes (free tier: 100 RPM, no key needed) |
| Crawl4AI | JS-rendered pages (GoFractional, GigX) | Install: pip install crawl4ai then crawl4ai-setup |
| JobSpy | Indeed/LinkedIn/ZipRecruiter aggregator | Install: pip install python-jobspy pandas |
| httpx | Async HTTP client | Yes |
If Crawl4AI install fails on Windows: Fall back to Jina Reader for JS sites. The existing codebase already uses Jina as a fallback; it works.
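The trafilatura-primary / Jina-fallback chain can be sketched as below. Function names mirror the fCTO pipeline's, but these bodies are a simplified illustration, not the battle-tested versions in enrich_websites.py:

```python
from typing import Callable, Optional

JINA_READER_BASE = "https://r.jina.ai/"

def jina_url(url: str) -> str:
    # Jina Reader is just a URL prefix: GET https://r.jina.ai/<original-url>
    return JINA_READER_BASE + url

def fetch_with_fallback(
    url: str,
    primary: Callable[[str], Optional[str]],
    fallback: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Try the primary scraper; fall back only if it returns nothing."""
    return primary(url) or fallback(url)

def scrape_with_trafilatura(url: str) -> Optional[str]:
    import trafilatura  # lazy import; pip install trafilatura
    downloaded = trafilatura.fetch_url(url)
    return trafilatura.extract(downloaded) if downloaded else None

def scrape_with_jina(url: str, timeout: float = 20.0) -> Optional[str]:
    import httpx  # lazy import; pip install httpx
    resp = httpx.get(jina_url(url), timeout=timeout, follow_redirects=True)
    return resp.text if resp.status_code == 200 else None
```

Passing the scrapers in as callables keeps the fallback chain testable without network access.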
Key GitHub repo: JobSpy (~3K stars). Simple API:
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter"],
    search_term="fractional CIO",
    location="Colorado",
)
# Returns a pandas DataFrame with title, company, location, description, url, etc.
Files to Create (7 files)
All in solanasis-scripts/job-board-scraper/:
| # | File | ~Lines | Purpose |
|---|---|---|---|
| 1 | config.py | 180 | Paths, board definitions, scoring model, keyword regex lists |
| 2 | scrapers.py | 350 | Per-board scraper classes (Static, Crawl4AI, JobSpy), async orchestrator |
| 3 | parsers.py | 200 | Normalize raw scraped data into unified listing schema |
| 4 | score_jobs.py | 180 | Score each job against my profile, assign tiers |
| 5 | scrape_and_score.py | 120 | Main pipeline entry point: scrape → parse → dedup → score |
| 6 | generate_daily_jobs.py | 280 | Daily markdown report, job tracker (applied/skipped), CLI flags |
| 7 | requirements.txt | 10 | Dependencies |
Directory Structure
solanasis-scripts/job-board-scraper/
config.py
scrapers.py
parsers.py
score_jobs.py
scrape_and_score.py
generate_daily_jobs.py
requirements.txt
data/
raw/ # Raw scraped HTML/JSON per board per date
intermediate/
job_cache/ # Per-job-ID JSON cache (dedup across runs)
listings_parsed.csv
listings_scored.csv
output/
jobs_report.md # Latest daily report
jobs_qualified.csv # A/B tier jobs only
Scoring Model
Positive Signals
| Signal | Points | Trigger |
|---|---|---|
| Title: exact match (fractional CIO/CISO/COO) | +20 | Title regex |
| Title: adjacent match (fractional CTO/VP Tech) | +10 | Title regex |
| Description: role duties match | +5 | Description regex |
| Cybersecurity/security assessment mentioned | +8 | Keywords |
| Disaster recovery/business continuity | +6 | Keywords |
| Data migration/systems integration | +5 | Keywords |
| CRM setup/implementation | +4 | Keywords |
| AI implementation/responsible AI | +4 | Keywords |
| Compliance/risk assessment | +5 | Keywords |
| SMB/nonprofit/foundation target org | +10 | Keywords |
| Colorado location | +8 | Location field |
| Remote-friendly | +5 | Keywords |
| Compensation disclosed | +3 | Regex parse |
| Competitive rate ($15K+/month) | +7 | Parsed rate |
Negative Signals (Disqualifiers)
| Signal | Points | Trigger |
|---|---|---|
| Full-time only | -30 | Keywords |
| Security clearance required | -25 | Keywords |
| Wrong seniority (junior/intern) | -20 | Keywords |
| Enterprise-only (Fortune 500) | -15 | Keywords |
| On-site only, not Colorado | -10 | Location check |
Tiers
- A (40+): Strong fit, apply immediately
- B (25-39): Good fit, worth reviewing
- C (10-24): Weak fit, review if time permits
- D (<10): Skip
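A minimal sketch of how the scoring signals and tier thresholds above could fit together. The keyword dicts here are a tiny illustrative subset; the real lists belong in config.py:

```python
import re

# Illustrative subset of the scoring model above.
TITLE_EXACT = [r"\bfractional\s+c(?:io|so|iso)\b", r"\bfractional\s+coo\b"]
SERVICE_KEYWORDS = {
    r"\bsecurity\s+assessments?\b": 8,
    r"\bdisaster\s+recovery\b": 6,
}
TIERS = {"A": 40, "B": 25, "C": 10}  # D is everything below C

def score_job(title: str, description: str):
    """Signal-based scoring with a human-readable breakdown string."""
    score, breakdown = 0, []
    text = f"{title} {description}".lower()
    if any(re.search(p, title.lower()) for p in TITLE_EXACT):
        score += 20
        breakdown.append("title_match(+20)")
    for pattern, points in SERVICE_KEYWORDS.items():
        if re.search(pattern, text):
            score += points
            breakdown.append(f"{pattern}(+{points})")
    return score, "; ".join(breakdown)

def assign_tier(score: int) -> str:
    for tier, threshold in TIERS.items():  # dicts preserve insertion order
        if score >= threshold:
            return tier
    return "D"
```

The breakdown string is what feeds the "Breakdown:" line in the daily report.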
Daily Report Format
Output: solanasis-docs/daily-outreach/YYYY-MM-DD-jobs.md
# Fractional Executive Job Board Report -- March 22, 2026
> Generated 07:30 | 12 new listings | 5 A-tier | 3 B-tier | 47 total tracked
## A-Tier: Apply Today (5)
### 1. Fractional CISO -- Nonprofit Healthcare Network (Remote)
- **Board:** GoFractional
- **Company:** Health Forward Foundation
- **Compensation:** $175-225/hr
- **Score:** 62 (A-tier)
- **Breakdown:** title_match(+20); security(+8); nonprofit(+10); remote(+5); rate(+7)
- **Apply:** [Link](https://gofractional.com/jobs/12345)
- **Preview:** Seeking a fractional CISO to lead security assessments...
## B-Tier: Worth Reviewing (3)
...
## Summary Stats
| Board | Scraped | New | A-tier | B-tier |
|-------|---------|-----|--------|--------|
| GoFractional | 23 | 4 | 2 | 1 |
| ... | ... | ... | ... | ... |
## Quick Actions
python scrape_and_score.py # Full scrape + score
python scrape_and_score.py --board gofractional # Single board
python generate_daily_jobs.py # Regenerate report
python generate_daily_jobs.py --mark-applied job123 # Mark as applied
python generate_daily_jobs.py --mark-skipped job456 # Mark as skipped
python generate_daily_jobs.py --status # Pipeline stats
Tracker File (data/job_tracker.json)
{
  "applied": {"job_id": {"date_applied": "2026-03-20", "company": "...", "title": "...", "notes": "..."}},
  "skipped": ["job_id_1", "job_id_2"],
  "reviewed": ["job_id_3"],
  "stats": {"total_scraped": 156, "total_applied": 12, "total_responses": 3}
}
Fork These Patterns From fCTO Pipeline
The fCTO pipeline (solanasis-scripts/fcto-pipeline/) has battle-tested code to fork. Read these files before writing anything.
1. Scraping Pattern (enrich_websites.py)
Fork the following from solanasis-scripts/fcto-pipeline/enrich_websites.py:
- Cache helpers (lines 97-125): cache_key(), load_cache(), save_cache() for JSON caching per domain/item
- URL helpers (lines 130-152): normalize_url(), extract_domain()
- Scraping functions (lines 159-188): scrape_with_trafilatura(), scrape_with_jina() (trafilatura primary, Jina fallback)
- Keyword checking (lines 195-206): check_keywords() for regex pattern matching with human-readable output
- Async orchestrator (lines 249-309): scrape_domain() with semaphore-based concurrency, per-page scraping, and a fallback chain
- Windows event loop (line 537): asyncio.WindowsSelectorEventLoopPolicy() for Windows compatibility
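A hedged sketch of what the forked cache helpers might look like once adapted for the job_cache/ dedup layer. The proven originals live in enrich_websites.py, so treat this as orientation, not the final code:

```python
import hashlib
import json
from pathlib import Path
from typing import Optional

CACHE_DIR = Path("data/job_cache")  # per-listing JSON cache, reused across runs

def cache_key(canonical_url: str) -> str:
    # Stable 12-hex-char key derived from the canonical listing URL
    return hashlib.md5(canonical_url.encode("utf-8")).hexdigest()[:12]

def load_cache(key: str) -> Optional[dict]:
    """Return the cached listing record, or None if we've never seen it."""
    path = CACHE_DIR / f"{key}.json"
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else None

def save_cache(key: str, record: dict) -> None:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    (CACHE_DIR / f"{key}.json").write_text(
        json.dumps(record, indent=2), encoding="utf-8"
    )
```

A listing is "new" exactly when load_cache() returns None, which is the whole dedup check.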
Key constants to replicate:
SCRAPE_CONCURRENCY = 3 # Lower than fCTO's 5; fewer but heavier pages
SCRAPE_DELAY = 1.0 # Polite; 1 second between requests per domain
SCRAPE_TIMEOUT = 20 # Slightly higher for JS pages
JINA_READER_BASE = "https://r.jina.ai/"
JINA_RPM_LIMIT = 20
2. Scoring Pattern (score_prospects.py)
Fork from solanasis-scripts/fcto-pipeline/score_prospects.py:
- Score function (lines 36-122): score_prospect() for signal-based scoring with breakdown strings
- Tier assignment (lines 125-134): assign_tier() for threshold-based tiering
- Summary stats (lines 188-226): tier distribution, top prospects, validation warnings
Adapt: Change signals from “partnership fit” to “job fit.” The structure (dict-based scoring, breakdown strings, tier assignment) stays identical.
3. Daily Report Pattern (generate_daily_outreach.py)
Fork from solanasis-scripts/fcto-pipeline/generate_daily_outreach.py:
- Tracker (lines 38-50): load_tracker(), save_tracker() for JSON state management
- Mark operations (lines 53-93): mark_sent(), mark_replied() for status tracking with follow-up scheduling
- CLI interface (lines 660-693): argparse with --mark-sent, --mark-replied, and --status flags
Adapt: Change from “sent/replied” to “applied/skipped/reviewed.” Change email generation to job listing presentation.
4. Config Pattern (config.py)
Fork from solanasis-scripts/fcto-pipeline/config.py:
- Path setup (lines 9-16): PIPELINE_DIR, DATA_DIR, RAW_DIR, INTERMEDIATE_DIR, OUTPUT_DIR
- Keyword regex lists (lines 19-84): pattern structure (but replace with job-fit keywords)
- Scoring dict (lines 87-109): SCORING = {"signal": points, ...}
- Tier thresholds (lines 111-117): TIERS = {"A": 40, "B": 25, "C": 10}
Implementation Sequence
Phase 1: Foundation
- Create the solanasis-scripts/job-board-scraper/ directory structure (all subdirs under data/)
- Write requirements.txt
- Install deps: pip install crawl4ai python-jobspy pandas
- Run crawl4ai-setup (installs the Playwright browser for Crawl4AI)
- Write config.py with all board definitions, keyword lists, and the scoring model
Phase 2: Scraping Layer
- Read fcto-pipeline/enrich_websites.py thoroughly
- Write scrapers.py:
  - Fork the BaseScraper class with cache helpers and URL normalization
  - StaticScraper (trafilatura + Jina fallback) for fractionaljobs.io, findfractionaljobs.com, allfractionaljobs.com
  - Crawl4AIScraper for gofractional.com, gigx.com
  - JobSpyScraper for Indeed, LinkedIn, ZipRecruiter (wraps python-jobspy)
  - Async orchestrator with semaphore concurrency
- Test each scraper manually against its target board
Phase 3: Parsing and Scoring
- Write parsers.py:
  - Unified listing schema: job_id, board, title, company, description, location, compensation, url, date_posted, date_first_seen, date_last_seen, is_remote, is_fractional, raw_text
  - Per-board parsing logic (different HTML structures)
  - Job ID generation: {board_id}_{md5(canonical_url)[:12]}
- Write score_jobs.py:
  - Fork the structure from fcto-pipeline/score_prospects.py
  - Replace partnership signals with job-fit signals
  - Regex-based keyword matching (free, fast, deterministic)
- Validate scoring with a few real job listings
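The Phase 3 schema and job-ID rule can be sketched as follows. The field names come from the schema above; the dataclass shape itself is an assumption about how parsers.py might represent a listing:

```python
import hashlib
from dataclasses import dataclass

def make_job_id(board_id: str, canonical_url: str) -> str:
    """Implements the {board_id}_{md5(canonical_url)[:12]} rule above."""
    digest = hashlib.md5(canonical_url.encode("utf-8")).hexdigest()[:12]
    return f"{board_id}_{digest}"

@dataclass
class Listing:
    # Unified schema that every per-board parser normalizes into
    job_id: str
    board: str
    title: str
    company: str = ""
    description: str = ""
    location: str = ""
    compensation: str = ""
    url: str = ""
    date_posted: str = ""
    date_first_seen: str = ""
    date_last_seen: str = ""
    is_remote: bool = False
    is_fractional: bool = False
    raw_text: str = ""
```

Hashing the canonical URL (not the raw one) keeps the ID stable across tracking-parameter noise.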
Phase 4: Pipeline and Reporting
- Write scrape_and_score.py (main orchestrator):
  - Orchestrates: scrape all boards → parse → deduplicate (check cache) → score → output CSV
  - CLI flags: --board <name> (single board), --force (ignore cache), --limit N
- Write generate_daily_jobs.py:
  - Fork from fcto-pipeline/generate_daily_outreach.py
  - Read the scored CSV and pick new A/B listings
  - Generate the daily markdown report
  - Tracker: applied/skipped/reviewed status per job
  - CLI: --mark-applied job_id, --mark-skipped job_id, --status
- Test the end-to-end pipeline
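The tracker and CLI pieces of generate_daily_jobs.py might look roughly like this once the fCTO pattern is renamed from sent/replied to applied/skipped. The names follow the spec above, but the bodies are an illustrative sketch:

```python
import argparse
import json
from datetime import date
from pathlib import Path

TRACKER_PATH = Path("data/job_tracker.json")

def load_tracker() -> dict:
    if TRACKER_PATH.exists():
        return json.loads(TRACKER_PATH.read_text(encoding="utf-8"))
    return {"applied": {}, "skipped": [], "reviewed": [], "stats": {}}

def save_tracker(tracker: dict) -> None:
    TRACKER_PATH.parent.mkdir(parents=True, exist_ok=True)
    TRACKER_PATH.write_text(json.dumps(tracker, indent=2), encoding="utf-8")

def mark_applied(job_id: str, company: str = "", title: str = "") -> dict:
    """Record an application in the tracker, mirroring the JSON shape above."""
    tracker = load_tracker()
    tracker["applied"][job_id] = {
        "date_applied": date.today().isoformat(),
        "company": company,
        "title": title,
        "notes": "",
    }
    save_tracker(tracker)
    return tracker

def build_cli() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Daily job report tool")
    parser.add_argument("--mark-applied", metavar="JOB_ID")
    parser.add_argument("--mark-skipped", metavar="JOB_ID")
    parser.add_argument("--status", action="store_true")
    return parser
```
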
Phase 5: Scheduling
- Test manually: python scrape_and_score.py && python generate_daily_jobs.py
- Set up a Claude Cowork scheduled task (instruction below)
- Optionally set up Windows Task Scheduler as a fallback
Scheduling Setup
Claude Cowork Scheduled Task
Create a Cowork scheduled task with this instruction:
Every morning, run the fractional job board scraper:
1. cd solanasis-scripts/job-board-scraper
2. python scrape_and_score.py
3. python generate_daily_jobs.py
4. Read the generated report and summarize the top 5 opportunities
Limitation: Only runs when machine is awake and Claude Desktop is open.
Windows Task Scheduler (Fallback)
Program: python
Arguments: C:\Users\zasya\Documents\_solanasis\solanasis-scripts\job-board-scraper\scrape_and_score.py
Start in: C:\Users\zasya\Documents\_solanasis\solanasis-scripts\job-board-scraper
Schedule: Daily at 7:00 AM
Keyword Regex Lists (Pre-Built)
Role Title Keywords (Exact Match = +20 points)
ROLE_TITLE_KEYWORDS = [
    r"\bfractional\s+c(?:io|so|iso)\b",
    r"\bfractional\s+coo\b",
    r"\bfractional\s+chief\s+(?:information|security|operating)\s+officer\b",
    r"\bvirtual\s+c(?:io|so|iso)\b",
    r"\binterim\s+c(?:io|so|iso)\b",
    r"\bpart[\s-]time\s+c(?:io|so|iso)\b",
    r"\bfractional\s+(?:technology|it)\s+(?:leader|executive)\b",
]
Adjacent Role Keywords (Adjacent Match = +10 points)
ADJACENT_ROLE_KEYWORDS = [
    r"\bfractional\s+cto\b",
    r"\bfractional\s+vp\s+(?:tech|it|information)\b",
    r"\bfractional\s+(?:it|technology)\s+director\b",
    r"\binterim\s+cto\b",
    r"\bvirtual\s+cto\b",
]
Service Match Keywords (various point values)
SERVICE_MATCH_KEYWORDS = [
    r"\bsecurity\s+assessments?\b",
    r"\bcybersecurity\b",
    r"\bdisaster\s+recovery\b",
    r"\bdata\s+migrations?\b",
    r"\bcrm\s+(?:setup|implementation|integration)\b",
    r"\bsystems?\s+integration\b",
    r"\bresponsible\s+ai\b",
    r"\bcompliance\b",
    r"\brisk\s+(?:assessments?|management)\b",
    r"\bincident\s+response\b",
    r"\bbusiness\s+continuity\b",
]
Target Organization Keywords (+10 points)
TARGET_ORG_KEYWORDS = [
    r"\bnonprofit\b",
    r"\bfoundation\b",
    r"\b(?:smb|small\s+business(?:es)?)\b",
    r"\bsmall\s+(?:and\s+)?(?:mid(?:size|dle)?|medium)\b",
    r"\bstartup\b",
    r"\bgrowth[\s-]stage\b",
]
Disqualification Keywords (negative points)
DISQUALIFY_KEYWORDS = [
    r"\bfull[\s-]time\s+only\b",                  # -30
    r"\bw[\s-]?2\s+only\b",                       # -30
    r"\bon[\s-]?site\s+(?:only|required)\b",      # -10
    r"\bfortune\s+500\b",                         # -15
    r"\b(?:series\s+[c-z]|ipo)\b",                # -15
    r"\benterprise[\s-]only\b",                   # -15
    r"\brequires?\s+(?:clearance|ts[\s/]sci)\b",  # -25
]
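The disqualifier patterns can be smoke-tested with a small harness like this (a subset of the list above, with the point values attached for illustration):

```python
import re

# Subset of DISQUALIFY_KEYWORDS, paired with their penalties.
DISQUALIFIERS = [
    (r"\bfull[\s-]time\s+only\b", -30),
    (r"\bon[\s-]?site\s+(?:only|required)\b", -10),
    (r"\brequires?\s+(?:clearance|ts[\s/]sci)\b", -25),
]

def apply_disqualifiers(text: str):
    """Return (total penalty, list of patterns that fired) for one listing."""
    text = text.lower()
    penalty, fired = 0, []
    for pattern, points in DISQUALIFIERS:
        if re.search(pattern, text):
            penalty += points
            fired.append(pattern)
    return penalty, fired
```

Returning the fired patterns (not just the total) keeps scoring decisions auditable in the report breakdown.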
OVERQUALIFIED_KEYWORDS = [
    r"\bjunior\b",           # -20
    r"\bentry[\s-]level\b",  # -20
    r"\bintern(?:ship)?\b",  # -20
    r"\bassociate\b",        # -20
]
Remote/Location Keywords (+5/+8 points)
REMOTE_KEYWORDS = [
    r"\bremote\b",
    r"\bhybrid\b",
    r"\bflexible\s+location\b",
    r"\bwork\s+from\s+(?:home|anywhere)\b",
]

COLORADO_KEYWORDS = [
    r"\bcolorado\b",
    r"\bdenver\b",
    r"\bboulder\b",
    r"\bcolorado\s+springs\b",
    r"\bfort\s+collins\b",
]
Risks and Mitigations
| Risk | Mitigation |
|---|---|
| Crawl4AI browser install fails on Windows | Fall back to Jina Reader for JS sites (already proven) |
| JobSpy anti-bot detection (Indeed DataDome) | Disable failing boards; niche boards are primary value |
| Job boards change HTML structure | Each scraper isolated; breakage in one doesn’t affect others |
| Too few listings from niche boards | JobSpy covers major boards as supplement |
| pandas dependency bloat | Only used by JobSpy; contained to that scraper class |
Verification Checklist
- Single-board test: python scrape_and_score.py --board fractionaljobs
- Manually compare scraped listings to actual website content
- Review scoring: do A-tier jobs actually look like good fits for my profile?
- Run the full pipeline: python scrape_and_score.py && python generate_daily_jobs.py
- Open the generated solanasis-docs/daily-outreach/YYYY-MM-DD-jobs.md and verify the format
- Test the tracker: python generate_daily_jobs.py --mark-applied <some_job_id>
- Test status: python generate_daily_jobs.py --status
Important Notes
- Prefer hacky and working over polished and slow. Get scraping working for 2-3 boards first, then expand.
- Windows machine. Use asyncio.WindowsSelectorEventLoopPolicy() for async code.
- No API keys are needed for any of the scraping tools (trafilatura, Jina Reader free tier, Crawl4AI, and JobSpy are all free).
- The existing .env file is at solanasis-scripts/.env with Baserow and other keys (not needed for scraping, but available if you add Baserow migration later).
- The daily outreach dir already exists at solanasis-docs/daily-outreach/ with existing foundation and fCTO reports.
- Respect robots.txt and rate-limit all scraping: 1 request per 1-2 seconds minimum.
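The Windows event-loop note and the concurrency constants come together in a pattern like this; scrape_board is a stand-in for a real fetch, and the function names are illustrative:

```python
import asyncio
import sys

SCRAPE_CONCURRENCY = 3  # matches the constant defined earlier
SCRAPE_DELAY = 1.0      # polite gap per request

async def scrape_board(name: str, semaphore: asyncio.Semaphore) -> str:
    async with semaphore:                  # at most 3 boards in flight
        await asyncio.sleep(SCRAPE_DELAY)  # stand-in for the real fetch
        return f"{name}: ok"

async def run_all(boards: list[str]) -> list[str]:
    semaphore = asyncio.Semaphore(SCRAPE_CONCURRENCY)
    return await asyncio.gather(*(scrape_board(b, semaphore) for b in boards))

def main(boards: list[str]) -> list[str]:
    # Required on Windows so asyncio's default Proactor loop quirks don't bite
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    return asyncio.run(run_all(boards))
```

The semaphore caps concurrency across boards while asyncio.gather preserves result order.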