Fractional Executive Job Board Scraper — Continuation Prompt

Paste this entire file into a new Claude Code session to build the job board scraper pipeline. All context from prior research and planning is embedded below. No prior plans or memory files are needed.


What to Build

Build an automated system that scrapes fractional executive job boards daily, scores listings against my profile, and generates a daily markdown report of which jobs to apply to.

My profile (Dmitri Zasage, CEO of Solanasis LLC):

  • Fractional CIO, CISO, COO for SMBs and nonprofits
  • Core services: Security Assessments, Disaster Recovery Verification, Data Migrations, CRM Setup, Systems Integration, Responsible AI Implementation
  • Based in Colorado (remote-friendly)
  • One-person operation with 1099 contractors

Output location: solanasis-docs/daily-outreach/YYYY-MM-DD-jobs.md
Script location: solanasis-scripts/job-board-scraper/


Architecture

scrape_and_score.py  (main entry point)
    |
    +--> scrapers.py    --> Fetch listings from 8 boards
    |                       (trafilatura / Jina / Crawl4AI / JobSpy)
    |
    +--> parsers.py     --> Normalize into unified schema
    |                       Deduplicate via job_cache/
    |
    +--> score_jobs.py  --> Score against my profile
    |                       Assign tiers: A (40+), B (25-39), C (10-24)
    |
    v
generate_daily_jobs.py  --> Daily markdown report + CSV

Target Job Boards (8 sources)

| Board | URL | Scraper Type | Notes |
|-------|-----|--------------|-------|
| Fractional Jobs | fractionaljobs.io | Static (trafilatura) | Largest fractional talent network |
| GoFractional | gofractional.com | Crawl4AI (JS) | Permissive robots.txt |
| GigX | gigx.com | Crawl4AI (JS) | Drupal-based |
| Find Fractional Jobs | findfractionaljobs.com | Static | Smaller board |
| All Fractional Jobs | allfractionaljobs.com | Static | Aggregator |
| Indeed | via JobSpy | JobSpy library | DataDome anti-bot handled by JobSpy |
| LinkedIn | via JobSpy | JobSpy library | Rate-limits at page 10 |
| ZipRecruiter | via JobSpy | JobSpy library | Moderate anti-bot |

Phase 2 boards (add later): Toptal, Catalant, Business Talent Group, A-Team, Graphite, Paro
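
The board list above can live in config.py as a single registry keyed by board ID. This is an illustrative sketch, not the final schema — field names (`scraper`, `enabled`) and the `boards_for()` helper are assumptions:

```python
# Hypothetical board registry for config.py. Field names are illustrative;
# the JobSpy boards carry no URL because python-jobspy handles fetching.
BOARDS = {
    "fractionaljobs": {"name": "Fractional Jobs",
                       "url": "https://fractionaljobs.io",
                       "scraper": "static", "enabled": True},
    "gofractional": {"name": "GoFractional",
                     "url": "https://gofractional.com",
                     "scraper": "crawl4ai", "enabled": True},
    "gigx": {"name": "GigX", "url": "https://gigx.com",
             "scraper": "crawl4ai", "enabled": True},
    "findfractionaljobs": {"name": "Find Fractional Jobs",
                           "url": "https://findfractionaljobs.com",
                           "scraper": "static", "enabled": True},
    "allfractionaljobs": {"name": "All Fractional Jobs",
                          "url": "https://allfractionaljobs.com",
                          "scraper": "static", "enabled": True},
    "indeed": {"name": "Indeed", "scraper": "jobspy", "enabled": True},
    "linkedin": {"name": "LinkedIn", "scraper": "jobspy", "enabled": True},
    "zip_recruiter": {"name": "ZipRecruiter", "scraper": "jobspy",
                      "enabled": True},
}

def boards_for(scraper_type: str) -> list[str]:
    """Return the enabled board IDs handled by a given scraper class."""
    return [bid for bid, b in BOARDS.items()
            if b["scraper"] == scraper_type and b["enabled"]]
```

A registry like this lets the `--board <name>` CLI flag and the per-scraper dispatch both key off the same source of truth.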


Scraping Tools (all free)

| Tool | Role | Already Installed? |
|------|------|--------------------|
| trafilatura | Primary text extraction (static HTML) | Yes (pip show trafilatura) |
| Jina Reader | Fallback for JS-heavy sites (API: r.jina.ai/{url}) | Yes (free tier: 100 RPM, no key needed) |
| Crawl4AI | JS-rendered pages (GoFractional, GigX) | Install: pip install crawl4ai, then crawl4ai-setup |
| JobSpy | Indeed/LinkedIn/ZipRecruiter aggregator | Install: pip install python-jobspy pandas |
| httpx | Async HTTP client | Yes |

If Crawl4AI install fails on Windows: Fall back to Jina Reader for JS sites. The existing codebase already uses Jina as a fallback; it works.
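
The trafilatura-primary / Jina-fallback chain can be sketched with injected fetcher callables so the control flow is testable offline; the real versions would wrap httpx and trafilatura as in enrich_websites.py (`fetch_text` and its parameter names are mine):

```python
# Sketch of the static-scraper fallback chain: try trafilatura on the raw
# page first, and only hit the Jina Reader proxy if extraction comes back
# empty. Fetchers are injected so the chain runs without a network.
from typing import Callable, Optional

JINA_READER_BASE = "https://r.jina.ai/"

def fetch_text(url: str,
               primary: Callable[[str], Optional[str]],
               jina: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return extracted page text, or None if every method failed."""
    text = primary(url)                  # trafilatura path
    if text and text.strip():
        return text
    return jina(JINA_READER_BASE + url)  # Jina Reader fallback
```

Keeping the fallback decision in one function means the Crawl4AI scraper can reuse it too if the browser install fails.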

Key GitHub repo: JobSpy (~3K stars). Simple API:

from jobspy import scrape_jobs
jobs = scrape_jobs(site_name=["indeed","linkedin","zip_recruiter"], search_term="fractional CIO", location="Colorado")
# Returns pandas DataFrame with title, company, location, description, url, etc.

Files to Create (7 files)

All in solanasis-scripts/job-board-scraper/:

| # | File | ~Lines | Purpose |
|---|------|--------|---------|
| 1 | config.py | 180 | Paths, board definitions, scoring model, keyword regex lists |
| 2 | scrapers.py | 350 | Per-board scraper classes (Static, Crawl4AI, JobSpy), async orchestrator |
| 3 | parsers.py | 200 | Normalize raw scraped data into unified listing schema |
| 4 | score_jobs.py | 180 | Score each job against my profile, assign tiers |
| 5 | scrape_and_score.py | 120 | Main pipeline entry point: scrape → parse → dedup → score |
| 6 | generate_daily_jobs.py | 280 | Daily markdown report, job tracker (applied/skipped), CLI flags |
| 7 | requirements.txt | 10 | Dependencies |

Directory Structure

solanasis-scripts/job-board-scraper/
  config.py
  scrapers.py
  parsers.py
  score_jobs.py
  scrape_and_score.py
  generate_daily_jobs.py
  requirements.txt
  data/
    raw/                    # Raw scraped HTML/JSON per board per date
    intermediate/
      job_cache/            # Per-job-ID JSON cache (dedup across runs)
      listings_parsed.csv
      listings_scored.csv
    output/
      jobs_report.md        # Latest daily report
      jobs_qualified.csv    # A/B tier jobs only

Scoring Model

Positive Signals

| Signal | Points | Trigger |
|--------|--------|---------|
| Title: exact match (fractional CIO/CISO/COO) | +20 | Title regex |
| Title: adjacent match (fractional CTO/VP Tech) | +10 | Title regex |
| Description: role duties match | +5 | Description regex |
| Cybersecurity/security assessment mentioned | +8 | Keywords |
| Disaster recovery/business continuity | +6 | Keywords |
| Data migration/systems integration | +5 | Keywords |
| CRM setup/implementation | +4 | Keywords |
| AI implementation/responsible AI | +4 | Keywords |
| Compliance/risk assessment | +5 | Keywords |
| SMB/nonprofit/foundation target org | +10 | Keywords |
| Colorado location | +8 | Location field |
| Remote-friendly | +5 | Keywords |
| Compensation disclosed | +3 | Regex parse |
| Competitive rate ($15K+/month) | +7 | Parsed rate |

Negative Signals (Disqualifiers)

| Signal | Points | Trigger |
|--------|--------|---------|
| Full-time only | -30 | Keywords |
| Security clearance required | -25 | Keywords |
| Wrong seniority (junior/intern) | -20 | Keywords |
| Enterprise-only (Fortune 500) | -15 | Keywords |
| On-site only, not Colorado | -10 | Location check |

Tiers

  • A (40+): Strong fit, apply immediately
  • B (25-39): Good fit, worth reviewing
  • C (10-24): Weak fit, review if time permits
  • D (<10): Skip
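
The whole model reduces to a small pure function: match each signal's regex, sum the points, and keep a breakdown string for the report. This sketch uses an illustrative subset of the signals above (names and patterns are assumptions, mirroring the score_prospects.py structure):

```python
# Signal-based scorer sketch: points per matched signal, plus a
# human-readable breakdown like "title_exact(+20); security(+8)".
import re

SCORING = {
    "title_exact": 20, "security": 8, "nonprofit": 10,
    "remote": 5, "fulltime_only": -30,
}
SIGNAL_PATTERNS = {
    "title_exact": r"\bfractional\s+(?:cio|ciso|coo)\b",
    "security": r"\bsecurity\s+assessment\b|\bcybersecurity\b",
    "nonprofit": r"\bnonprofit\b|\bfoundation\b",
    "remote": r"\bremote\b",
    "fulltime_only": r"\bfull[\s-]time\s+only\b",
}
TIERS = [("A", 40), ("B", 25), ("C", 10)]  # floor per tier, else "D"

def score_job(text: str) -> tuple[int, str, str]:
    """Return (score, tier, breakdown) for one listing's combined text."""
    score, parts = 0, []
    for signal, pattern in SIGNAL_PATTERNS.items():
        if re.search(pattern, text, re.IGNORECASE):
            pts = SCORING[signal]
            score += pts
            parts.append(f"{signal}({pts:+d})")
    tier = next((t for t, floor in TIERS if score >= floor), "D")
    return score, tier, "; ".join(parts)
```

Because matching is pure regex, scoring is free, fast, and deterministic — rerunning on cached listings always yields the same tiers.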

Daily Report Format

Output: solanasis-docs/daily-outreach/YYYY-MM-DD-jobs.md

# Fractional Executive Job Board Report -- March 22, 2026
 
> Generated 07:30 | 12 new listings | 5 A-tier | 3 B-tier | 47 total tracked
 
## A-Tier: Apply Today (5)
 
### 1. Fractional CISO -- Nonprofit Healthcare Network (Remote)
- **Board:** GoFractional
- **Company:** Health Forward Foundation
- **Compensation:** $175-225/hr
- **Score:** 62 (A-tier)
- **Breakdown:** title_match(+20); security(+8); nonprofit(+10); remote(+5); rate(+7)
- **Apply:** [Link](https://gofractional.com/jobs/12345)
- **Preview:** Seeking a fractional CISO to lead security assessments...
 
## B-Tier: Worth Reviewing (3)
...
 
## Summary Stats
| Board | Scraped | New | A-tier | B-tier |
|-------|---------|-----|--------|--------|
| GoFractional | 23 | 4 | 2 | 1 |
| ... | ... | ... | ... | ... |
 
## Quick Actions
python scrape_and_score.py                          # Full scrape + score
python scrape_and_score.py --board gofractional     # Single board
python generate_daily_jobs.py                       # Regenerate report
python generate_daily_jobs.py --mark-applied job123 # Mark as applied
python generate_daily_jobs.py --mark-skipped job456 # Mark as skipped
python generate_daily_jobs.py --status              # Pipeline stats

Tracker File (data/job_tracker.json)

{
  "applied": {"job_id": {"date_applied": "2026-03-20", "company": "...", "title": "...", "notes": "..."}},
  "skipped": ["job_id_1", "job_id_2"],
  "reviewed": ["job_id_3"],
  "stats": {"total_scraped": 156, "total_applied": 12, "total_responses": 3}
}

Fork These Patterns From fCTO Pipeline

The fCTO pipeline (solanasis-scripts/fcto-pipeline/) has battle-tested code to fork. Read these files before writing anything.

1. Scraping Pattern (enrich_websites.py)

Fork the following from solanasis-scripts/fcto-pipeline/enrich_websites.py:

  • Cache helpers (lines 97-125): cache_key(), load_cache(), save_cache() — JSON caching per domain/item
  • URL helpers (lines 130-152): normalize_url(), extract_domain()
  • Scraping functions (lines 159-188): scrape_with_trafilatura(), scrape_with_jina() — trafilatura primary, Jina fallback
  • Keyword checking (lines 195-206): check_keywords() — regex pattern matching with human-readable output
  • Async orchestrator (lines 249-309): scrape_domain() — semaphore-based concurrency, per-page scraping, fallback chain
  • Windows event loop (line 537): asyncio.WindowsSelectorEventLoopPolicy() for Windows compatibility

Key constants to replicate:

SCRAPE_CONCURRENCY = 3  # Lower than fCTO's 5; fewer but heavier pages
SCRAPE_DELAY = 1.0      # Polite; 1 second between requests per domain
SCRAPE_TIMEOUT = 20     # Slightly higher for JS pages
JINA_READER_BASE = "https://r.jina.ai/"
JINA_RPM_LIMIT = 20
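
The semaphore-bounded orchestrator pattern being forked can be sketched like this; `scrape_all` is a stand-in name, and `scrape_board` is an injected coroutine so the shape is testable without HTTP:

```python
# Semaphore-based concurrency, following the scrape_domain() pattern:
# at most SCRAPE_CONCURRENCY boards in flight, with a polite delay
# before each slot is released.
import asyncio

SCRAPE_CONCURRENCY = 3
SCRAPE_DELAY = 1.0

async def scrape_all(board_ids, scrape_board, delay=SCRAPE_DELAY):
    """Scrape every board concurrently; return {board_id: result}."""
    sem = asyncio.Semaphore(SCRAPE_CONCURRENCY)

    async def bounded(board_id):
        async with sem:
            result = await scrape_board(board_id)
            await asyncio.sleep(delay)  # rate-limit between requests
            return board_id, result

    return dict(await asyncio.gather(*(bounded(b) for b in board_ids)))
```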

2. Scoring Pattern (score_prospects.py)

Fork from solanasis-scripts/fcto-pipeline/score_prospects.py:

  • Score function (lines 36-122): score_prospect() — signal-based scoring with breakdown strings
  • Tier assignment (lines 125-134): assign_tier() — threshold-based tiering
  • Summary stats (lines 188-226): Tier distribution, top prospects, validation warnings

Adapt: Change signals from “partnership fit” to “job fit.” The structure (dict-based scoring, breakdown strings, tier assignment) stays identical.

3. Daily Report Pattern (generate_daily_outreach.py)

Fork from solanasis-scripts/fcto-pipeline/generate_daily_outreach.py:

  • Tracker (lines 38-50): load_tracker(), save_tracker() — JSON state management
  • Mark operations (lines 53-93): mark_sent(), mark_replied() — status tracking with follow-up scheduling
  • CLI interface (lines 660-693): argparse with --mark-sent, --mark-replied, --status flags

Adapt: Change from “sent/replied” to “applied/skipped/reviewed.” Change email generation to job listing presentation.

4. Config Pattern (config.py)

Fork from solanasis-scripts/fcto-pipeline/config.py:

  • Path setup (lines 9-16): PIPELINE_DIR, DATA_DIR, RAW_DIR, INTERMEDIATE_DIR, OUTPUT_DIR
  • Keyword regex lists (lines 19-84): Pattern structure (but replace with job-fit keywords)
  • Scoring dict (lines 87-109): SCORING = {"signal": points, ...}
  • Tier thresholds (lines 111-117): TIERS = {"A": 40, "B": 25, "C": 10}

Implementation Sequence

Phase 1: Foundation

  1. Create solanasis-scripts/job-board-scraper/ directory structure (all subdirs under data/)
  2. Write requirements.txt
  3. Install deps: pip install crawl4ai python-jobspy pandas
  4. Run crawl4ai-setup (installs Playwright browser for Crawl4AI)
  5. Write config.py with all board definitions, keyword lists, scoring model

Phase 2: Scraping Layer

  1. Read fcto-pipeline/enrich_websites.py thoroughly
  2. Write scrapers.py:
    • Fork BaseScraper class with cache helpers and URL normalization
    • StaticScraper (trafilatura + Jina fallback) for fractionaljobs.io, findfractionaljobs.com, allfractionaljobs.com
    • Crawl4AIScraper for gofractional.com, gigx.com
    • JobSpyScraper for Indeed, LinkedIn, ZipRecruiter (wraps python-jobspy)
    • Async orchestrator with semaphore concurrency
  3. Test each scraper manually against its target board

Phase 3: Parsing and Scoring

  1. Write parsers.py:
    • Unified listing schema: job_id, board, title, company, description, location, compensation, url, date_posted, date_first_seen, date_last_seen, is_remote, is_fractional, raw_text
    • Per-board parsing logic (different HTML structures)
    • Job ID generation: {board_id}_{md5(canonical_url)[:12]}
  2. Write score_jobs.py:
    • Fork structure from fcto-pipeline/score_prospects.py
    • Replace partnership signals with job-fit signals
    • Regex-based keyword matching (free, fast, deterministic)
  3. Validate scoring with a few real job listings
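
The `{board_id}_{md5(canonical_url)[:12]}` scheme depends on canonicalizing the URL first, so the same listing fetched with tracking parameters or a trailing slash dedups to one ID. A sketch (the canonicalization rules here are my assumption):

```python
# Job-ID generation: strip query string, fragment, and trailing slash,
# lowercase the host, then hash the canonical URL.
import hashlib
from urllib.parse import urlsplit

def canonical_url(url: str) -> str:
    parts = urlsplit(url.strip())
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/')}"

def make_job_id(board_id: str, url: str) -> str:
    digest = hashlib.md5(canonical_url(url).encode("utf-8")).hexdigest()
    return f"{board_id}_{digest[:12]}"
```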

Phase 4: Pipeline and Reporting

  1. Write scrape_and_score.py (main orchestrator):
    • Orchestrates: scrape all boards → parse → deduplicate (check cache) → score → output CSV
    • CLI flags: --board <name> (single board), --force (ignore cache), --limit N
  2. Write generate_daily_jobs.py:
    • Fork from fcto-pipeline/generate_daily_outreach.py
    • Read scored CSV, pick new A/B listings
    • Generate daily markdown report
    • Tracker: applied/skipped/reviewed status per job
    • CLI: --mark-applied job_id, --mark-skipped job_id, --status
  3. Test end-to-end pipeline

Phase 5: Scheduling

  1. Test manually: python scrape_and_score.py && python generate_daily_jobs.py
  2. Set up as Claude Cowork scheduled task (instruction below)
  3. Optionally set up Windows Task Scheduler as fallback

Scheduling Setup

Claude Cowork Scheduled Task

Create a Cowork scheduled task with this instruction:

Every morning, run the fractional job board scraper:
1. cd solanasis-scripts/job-board-scraper
2. python scrape_and_score.py
3. python generate_daily_jobs.py
4. Read the generated report and summarize the top 5 opportunities

Limitation: Only runs when machine is awake and Claude Desktop is open.

Windows Task Scheduler (Fallback)

Program: python
Arguments: C:\Users\zasya\Documents\_solanasis\solanasis-scripts\job-board-scraper\scrape_and_score.py
Start in: C:\Users\zasya\Documents\_solanasis\solanasis-scripts\job-board-scraper
Schedule: Daily at 7:00 AM

Keyword Regex Lists (Pre-Built)

Role Title Keywords (Exact Match = +20 points)

ROLE_TITLE_KEYWORDS = [
    r"\bfractional\s+c(?:io|so|iso)\b",   # CIO, CSO, CISO
    r"\bfractional\s+coo\b",
    r"\bfractional\s+chief\s+(?:operating|(?:information\s+)?security|information)\s+officer\b",
    r"\bvirtual\s+c(?:io|so|iso)\b",
    r"\binterim\s+c(?:io|so|iso)\b",
    r"\bpart[\s-]time\s+c(?:io|so|iso)\b",
    r"\bfractional\s+(?:technology|it)\s+(?:leader|executive)\b",
]

Adjacent Role Keywords (Adjacent Match = +10 points)

ADJACENT_ROLE_KEYWORDS = [
    r"\bfractional\s+cto\b",
    r"\bfractional\s+vp\s+(?:tech|it|information)\b",
    r"\bfractional\s+(?:it|technology)\s+director\b",
    r"\binterim\s+cto\b",
    r"\bvirtual\s+cto\b",
]

Service Match Keywords (various point values)

SERVICE_MATCH_KEYWORDS = [
    r"\bsecurity\s+assessment\b",
    r"\bcybersecurity\b",
    r"\bdisaster\s+recovery\b",
    r"\bdata\s+migration\b",
    r"\bcrm\s+(?:setup|implementation|integration)\b",
    r"\bsystems?\s+integration\b",
    r"\bresponsible\s+ai\b",
    r"\bcompliance\b",
    r"\brisk\s+(?:assessment|management)\b",
    r"\bincident\s+response\b",
    r"\bbusiness\s+continuity\b",
]

Target Organization Keywords (+10 points)

TARGET_ORG_KEYWORDS = [
    r"\bnonprofit\b",
    r"\bfoundation\b",
    r"\bsm(?:all\s+)?b(?:usiness)?\b",
    r"\bsmall\s+(?:and\s+)?(?:mid(?:size|dle)?|medium)\b",
    r"\bstartup\b",
    r"\bgrowth[\s-]stage\b",
]

Disqualification Keywords (negative points)

DISQUALIFY_KEYWORDS = [
    r"\bfull[\s-]time\s+only\b",         # -30
    r"\bw[\s-]?2\s+only\b",              # -30
    r"\bon[\s-]?site\s+(?:only|required)\b",  # -10
    r"\bfortune\s+500\b",               # -15
    r"\b(?:series\s+[c-z]|ipo)\b",      # -15
    r"\benterprise[\s-]only\b",          # -15
    r"\brequires?\s+(?:clearance|ts[\s/]sci)\b",  # -25
]
 
OVERQUALIFIED_KEYWORDS = [
    r"\bjunior\b",                       # -20
    r"\bentry[\s-]level\b",             # -20
    r"\bintern(?:ship)?\b",             # -20
    r"\bassociate\b",                    # -20
]

Remote/Location Keywords (+5/+8 points)

REMOTE_KEYWORDS = [
    r"\bremote\b",
    r"\bhybrid\b",
    r"\bflexible\s+location\b",
    r"\bwork\s+from\s+(?:home|anywhere)\b",
]
 
COLORADO_KEYWORDS = [
    r"\bcolorado\b",
    r"\bdenver\b",
    r"\bboulder\b",
    r"\bcolorado\s+springs\b",
    r"\bfort\s+collins\b",
]
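
All of these lists feed the same check_keywords() helper being forked from enrich_websites.py. A sketch of how it would consume them (the exact signature is an assumption):

```python
# Return which patterns in a keyword list match the listing text, so the
# score breakdown can report what fired, not just a count.
import re

REMOTE_KEYWORDS = [
    r"\bremote\b",
    r"\bhybrid\b",
    r"\bflexible\s+location\b",
    r"\bwork\s+from\s+(?:home|anywhere)\b",
]

def check_keywords(text: str, patterns: list[str]) -> list[str]:
    """Patterns that match text, case-insensitively, in list order."""
    return [p for p in patterns if re.search(p, text, re.IGNORECASE)]
```

Usage in the scorer would look like `points = 5 if check_keywords(description, REMOTE_KEYWORDS) else 0`.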

Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| Crawl4AI browser install fails on Windows | Fall back to Jina Reader for JS sites (already proven) |
| JobSpy anti-bot detection (Indeed DataDome) | Disable failing boards; niche boards are primary value |
| Job boards change HTML structure | Each scraper isolated; breakage in one doesn’t affect others |
| Too few listings from niche boards | JobSpy covers major boards as supplement |
| pandas dependency bloat | Only used by JobSpy; contained to that scraper class |

Verification Checklist

  1. python scrape_and_score.py --board fractionaljobs (single board test)
  2. Manually compare scraped listings to actual website content
  3. Review scoring: do A-tier jobs actually look like good fits for my profile?
  4. Run full pipeline: python scrape_and_score.py && python generate_daily_jobs.py
  5. Open the generated solanasis-docs/daily-outreach/YYYY-MM-DD-jobs.md and verify format
  6. Test tracker: python generate_daily_jobs.py --mark-applied <some_job_id>
  7. Test status: python generate_daily_jobs.py --status

Important Notes

  • Prefer hacky and working over polished and slow. Get scraping working for 2-3 boards first, then expand.
  • Windows machine. Use asyncio.WindowsSelectorEventLoopPolicy() for async code.
  • No API keys needed for any of the scraping tools (trafilatura, Jina Reader free tier, Crawl4AI, JobSpy are all free).
  • Existing .env file is at solanasis-scripts/.env with Baserow and other keys (not needed for scraping, but available if you add Baserow migration later).
  • Daily outreach dir already exists at solanasis-docs/daily-outreach/ with existing foundation and fCTO reports.
  • Respect robots.txt and rate-limit all scraping. 1 request per 1-2 seconds minimum.
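
The Windows event-loop requirement above can be wrapped in one small helper (the function name is mine) that each script calls before asyncio.run(), mirroring enrich_websites.py line 537:

```python
# Windows compatibility shim: the default Proactor loop breaks some
# async libraries, so switch to the selector policy on win32 only.
import asyncio
import sys

def configure_event_loop() -> None:
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
```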