Fractional Executive Job Board Scraper — Continuation Prompt

Paste this entire file into a new Claude Code session to build the job board scraper pipeline. All context from prior research and planning is embedded below. No prior plans or memory files are needed.


What to Build

Build an automated system that scrapes fractional executive job boards daily, scores listings against my profile, and generates a daily markdown report of which jobs to apply to.

My profile (Dmitri Zasage, CEO of Solanasis LLC):

  • Fractional CIO, CISO, COO for SMBs and nonprofits
  • Core services: Security Assessments, Disaster Recovery Verification, Data Migrations, CRM Setup, Systems Integration, Responsible AI Implementation
  • Based in Colorado (remote-friendly)
  • One-person operation with 1099 contractors

Output location: solanasis-docs/daily-outreach/YYYY-MM-DD-jobs.md
Script location: solanasis-scripts/job-board-scraper/


Architecture

scrape_and_score.py  (main entry point)
    |
    +--> scrapers.py    --> Fetch listings from 8 boards
    |                       (trafilatura / Jina / Crawl4AI / JobSpy)
    |
    +--> parsers.py     --> Normalize into unified schema
    |                       Deduplicate via job_cache/
    |
    +--> score_jobs.py  --> Score against my profile
    |                       Assign tiers: A (40+), B (25-39), C (10-24)
    |
    v
generate_daily_jobs.py  --> Daily markdown report + CSV

Target Job Boards (8 sources)

| Board | URL | Scraper Type | Notes |
|-------|-----|--------------|-------|
| Fractional Jobs | fractionaljobs.io | Static (trafilatura) | Largest fractional talent network |
| GoFractional | gofractional.com | Crawl4AI (JS) | Permissive robots.txt |
| GigX | gigx.com | Crawl4AI (JS) | Drupal-based |
| Find Fractional Jobs | findfractionaljobs.com | Static | Smaller board |
| All Fractional Jobs | allfractionaljobs.com | Static | Aggregator |
| Indeed | via JobSpy | JobSpy library | DataDome anti-bot handled by JobSpy |
| LinkedIn | via JobSpy | JobSpy library | Rate-limits at page 10 |
| ZipRecruiter | via JobSpy | JobSpy library | Moderate anti-bot |

Phase 2 boards (add later): Toptal, Catalant, Business Talent Group, A-Team, Graphite, Paro
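
The board list above can live in config.py as a single registry keyed by board ID. This is an illustrative sketch, not the final schema — field names (`scraper`, `enabled`) and the `boards_for()` helper are assumptions:

```python
# Hypothetical board registry for config.py. Field names are illustrative;
# the JobSpy boards carry no URL because python-jobspy handles fetching.
BOARDS = {
    "fractionaljobs": {"name": "Fractional Jobs",
                       "url": "https://fractionaljobs.io",
                       "scraper": "static", "enabled": True},
    "gofractional": {"name": "GoFractional",
                     "url": "https://gofractional.com",
                     "scraper": "crawl4ai", "enabled": True},
    "gigx": {"name": "GigX", "url": "https://gigx.com",
             "scraper": "crawl4ai", "enabled": True},
    "findfractionaljobs": {"name": "Find Fractional Jobs",
                           "url": "https://findfractionaljobs.com",
                           "scraper": "static", "enabled": True},
    "allfractionaljobs": {"name": "All Fractional Jobs",
                          "url": "https://allfractionaljobs.com",
                          "scraper": "static", "enabled": True},
    "indeed": {"name": "Indeed", "scraper": "jobspy", "enabled": True},
    "linkedin": {"name": "LinkedIn", "scraper": "jobspy", "enabled": True},
    "zip_recruiter": {"name": "ZipRecruiter", "scraper": "jobspy",
                      "enabled": True},
}

def boards_for(scraper_type: str) -> list[str]:
    """Return the enabled board IDs handled by a given scraper class."""
    return [bid for bid, b in BOARDS.items()
            if b["scraper"] == scraper_type and b["enabled"]]
```

A registry like this lets the `--board <name>` CLI flag and the per-scraper dispatch both key off the same source of truth.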


Scraping Tools (all free)

| Tool | Role | Already Installed? |
|------|------|--------------------|
| trafilatura | Primary text extraction (static HTML) | Yes (pip show trafilatura) |
| Jina Reader | Fallback for JS-heavy sites (API: r.jina.ai/{url}) | Yes (free tier: 100 RPM, no key needed) |
| Crawl4AI | JS-rendered pages (GoFractional, GigX) | Install: pip install crawl4ai, then crawl4ai-setup |
| JobSpy | Indeed/LinkedIn/ZipRecruiter aggregator | Install: pip install python-jobspy pandas |
| httpx | Async HTTP client | Yes |

If Crawl4AI install fails on Windows: Fall back to Jina Reader for JS sites. The existing codebase already uses Jina as a fallback; it works.
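
The trafilatura-primary / Jina-fallback chain can be sketched with injected fetcher callables so the control flow is testable offline; the real versions would wrap httpx and trafilatura as in enrich_websites.py (`fetch_text` and its parameter names are mine):

```python
# Sketch of the static-scraper fallback chain: try trafilatura on the raw
# page first, and only hit the Jina Reader proxy if extraction comes back
# empty. Fetchers are injected so the chain runs without a network.
from typing import Callable, Optional

JINA_READER_BASE = "https://r.jina.ai/"

def fetch_text(url: str,
               primary: Callable[[str], Optional[str]],
               jina: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return extracted page text, or None if every method failed."""
    text = primary(url)                  # trafilatura path
    if text and text.strip():
        return text
    return jina(JINA_READER_BASE + url)  # Jina Reader fallback
```

Keeping the fallback decision in one function means the Crawl4AI scraper can reuse it too if the browser install fails.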

Key GitHub repo: JobSpy (~3K stars). Simple API:

from jobspy import scrape_jobs
jobs = scrape_jobs(site_name=["indeed","linkedin","zip_recruiter"], search_term="fractional CIO", location="Colorado")
# Returns pandas DataFrame with title, company, location, description, url, etc.

Files to Create (7 files)

All in solanasis-scripts/job-board-scraper/:

| # | File | ~Lines | Purpose |
|---|------|--------|---------|
| 1 | config.py | 180 | Paths, board definitions, scoring model, keyword regex lists |
| 2 | scrapers.py | 350 | Per-board scraper classes (Static, Crawl4AI, JobSpy), async orchestrator |
| 3 | parsers.py | 200 | Normalize raw scraped data into unified listing schema |
| 4 | score_jobs.py | 180 | Score each job against my profile, assign tiers |
| 5 | scrape_and_score.py | 120 | Main pipeline entry point: scrape → parse → dedup → score |
| 6 | generate_daily_jobs.py | 280 | Daily markdown report, job tracker (applied/skipped), CLI flags |
| 7 | requirements.txt | 10 | Dependencies |

Directory Structure

solanasis-scripts/job-board-scraper/
  config.py
  scrapers.py
  parsers.py
  score_jobs.py
  scrape_and_score.py
  generate_daily_jobs.py
  requirements.txt
  data/
    raw/                    # Raw scraped HTML/JSON per board per date
    intermediate/
      job_cache/            # Per-job-ID JSON cache (dedup across runs)
      listings_parsed.csv
      listings_scored.csv
    output/
      jobs_report.md        # Latest daily report
      jobs_qualified.csv    # A/B tier jobs only

Scoring Model

Positive Signals

| Signal | Points | Trigger |
|--------|--------|---------|
| Title: exact match (fractional CIO/CISO/COO) | +20 | Title regex |
| Title: adjacent match (fractional CTO/VP Tech) | +10 | Title regex |
| Description: role duties match | +5 | Description regex |
| Cybersecurity/security assessment mentioned | +8 | Keywords |
| Disaster recovery/business continuity | +6 | Keywords |
| Data migration/systems integration | +5 | Keywords |
| CRM setup/implementation | +4 | Keywords |
| AI implementation/responsible AI | +4 | Keywords |
| Compliance/risk assessment | +5 | Keywords |
| SMB/nonprofit/foundation target org | +10 | Keywords |
| Colorado location | +8 | Location field |
| Remote-friendly | +5 | Keywords |
| Compensation disclosed | +3 | Regex parse |
| Competitive rate ($15K+/month) | +7 | Parsed rate |

Negative Signals (Disqualifiers)

| Signal | Points | Trigger |
|--------|--------|---------|
| Full-time only | -30 | Keywords |
| Security clearance required | -25 | Keywords |
| Wrong seniority (junior/intern) | -20 | Keywords |
| Enterprise-only (Fortune 500) | -15 | Keywords |
| On-site only, not Colorado | -10 | Location check |

Tiers

  • A (40+): Strong fit, apply immediately
  • B (25-39): Good fit, worth reviewing
  • C (10-24): Weak fit, review if time permits
  • D (<10): Skip
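
The whole model reduces to a small pure function: match each signal's regex, sum the points, and keep a breakdown string for the report. This sketch uses an illustrative subset of the signals above (names and patterns are assumptions, mirroring the score_prospects.py structure):

```python
# Signal-based scorer sketch: points per matched signal, plus a
# human-readable breakdown like "title_exact(+20); security(+8)".
import re

SCORING = {
    "title_exact": 20, "security": 8, "nonprofit": 10,
    "remote": 5, "fulltime_only": -30,
}
SIGNAL_PATTERNS = {
    "title_exact": r"\bfractional\s+(?:cio|ciso|coo)\b",
    "security": r"\bsecurity\s+assessment\b|\bcybersecurity\b",
    "nonprofit": r"\bnonprofit\b|\bfoundation\b",
    "remote": r"\bremote\b",
    "fulltime_only": r"\bfull[\s-]time\s+only\b",
}
TIERS = [("A", 40), ("B", 25), ("C", 10)]  # floor per tier, else "D"

def score_job(text: str) -> tuple[int, str, str]:
    """Return (score, tier, breakdown) for one listing's combined text."""
    score, parts = 0, []
    for signal, pattern in SIGNAL_PATTERNS.items():
        if re.search(pattern, text, re.IGNORECASE):
            pts = SCORING[signal]
            score += pts
            parts.append(f"{signal}({pts:+d})")
    tier = next((t for t, floor in TIERS if score >= floor), "D")
    return score, tier, "; ".join(parts)
```

Because matching is pure regex, scoring is free, fast, and deterministic — rerunning on cached listings always yields the same tiers.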

Daily Report Format

Output: solanasis-docs/daily-outreach/YYYY-MM-DD-jobs.md

# Fractional Executive Job Board Report -- March 22, 2026
 
> Generated 07:30 | 12 new listings | 5 A-tier | 3 B-tier | 47 total tracked
 
## A-Tier: Apply Today (5)
 
### 1. Fractional CISO -- Nonprofit Healthcare Network (Remote)
- **Board:** GoFractional
- **Company:** Health Forward Foundation
- **Compensation:** $175-225/hr
- **Score:** 62 (A-tier)
- **Breakdown:** title_match(+20); security(+8); nonprofit(+10); remote(+5); rate(+7)
- **Apply:** [Link](https://gofractional.com/jobs/12345)
- **Preview:** Seeking a fractional CISO to lead security assessments...
 
## B-Tier: Worth Reviewing (3)
...
 
## Summary Stats
| Board | Scraped | New | A-tier | B-tier |
|-------|---------|-----|--------|--------|
| GoFractional | 23 | 4 | 2 | 1 |
| ... | ... | ... | ... | ... |
 
## Quick Actions
python scrape_and_score.py                          # Full scrape + score
python scrape_and_score.py --board gofractional     # Single board
python generate_daily_jobs.py                       # Regenerate report
python generate_daily_jobs.py --mark-applied job123 # Mark as applied
python generate_daily_jobs.py --mark-skipped job456 # Mark as skipped
python generate_daily_jobs.py --status              # Pipeline stats

Tracker File (data/job_tracker.json)

{
  "applied": {"job_id": {"date_applied": "2026-03-20", "company": "...", "title": "...", "notes": "..."}},
  "skipped": ["job_id_1", "job_id_2"],
  "reviewed": ["job_id_3"],
  "stats": {"total_scraped": 156, "total_applied": 12, "total_responses": 3}
}

Fork These Patterns From fCTO Pipeline

The fCTO pipeline (solanasis-scripts/fcto-pipeline/) has battle-tested code to fork. Read these files before writing anything.

1. Scraping Pattern (enrich_websites.py)

Fork the following from solanasis-scripts/fcto-pipeline/enrich_websites.py:

  • Cache helpers (lines 97-125): cache_key(), load_cache(), save_cache() — JSON caching per domain/item
  • URL helpers (lines 130-152): normalize_url(), extract_domain()
  • Scraping functions (lines 159-188): scrape_with_trafilatura(), scrape_with_jina() — trafilatura primary, Jina fallback
  • Keyword checking (lines 195-206): check_keywords() — regex pattern matching with human-readable output
  • Async orchestrator (lines 249-309): scrape_domain() — semaphore-based concurrency, per-page scraping, fallback chain
  • Windows event loop (line 537): asyncio.WindowsSelectorEventLoopPolicy() for Windows compatibility

Key constants to replicate:

SCRAPE_CONCURRENCY = 3  # Lower than fCTO's 5; fewer but heavier pages
SCRAPE_DELAY = 1.0      # Polite; 1 second between requests per domain
SCRAPE_TIMEOUT = 20     # Slightly higher for JS pages
JINA_READER_BASE = "https://r.jina.ai/"
JINA_RPM_LIMIT = 20
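
The semaphore-bounded orchestrator pattern being forked can be sketched like this; `scrape_all` is a stand-in name, and `scrape_board` is an injected coroutine so the shape is testable without HTTP:

```python
# Semaphore-based concurrency, following the scrape_domain() pattern:
# at most SCRAPE_CONCURRENCY boards in flight, with a polite delay
# before each slot is released.
import asyncio

SCRAPE_CONCURRENCY = 3
SCRAPE_DELAY = 1.0

async def scrape_all(board_ids, scrape_board, delay=SCRAPE_DELAY):
    """Scrape every board concurrently; return {board_id: result}."""
    sem = asyncio.Semaphore(SCRAPE_CONCURRENCY)

    async def bounded(board_id):
        async with sem:
            result = await scrape_board(board_id)
            await asyncio.sleep(delay)  # rate-limit between requests
            return board_id, result

    return dict(await asyncio.gather(*(bounded(b) for b in board_ids)))
```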

2. Scoring Pattern (score_prospects.py)

Fork from solanasis-scripts/fcto-pipeline/score_prospects.py:

  • Score function (lines 36-122): score_prospect() — signal-based scoring with breakdown strings
  • Tier assignment (lines 125-134): assign_tier() — threshold-based tiering
  • Summary stats (lines 188-226): Tier distribution, top prospects, validation warnings

Adapt: Change signals from “partnership fit” to “job fit.” The structure (dict-based scoring, breakdown strings, tier assignment) stays identical.

3. Daily Report Pattern (generate_daily_outreach.py)

Fork from solanasis-scripts/fcto-pipeline/generate_daily_outreach.py:

  • Tracker (lines 38-50): load_tracker(), save_tracker() — JSON state management
  • Mark operations (lines 53-93): mark_sent(), mark_replied() — status tracking with follow-up scheduling
  • CLI interface (lines 660-693): argparse with --mark-sent, --mark-replied, --status flags

Adapt: Change from “sent/replied” to “applied/skipped/reviewed.” Change email generation to job listing presentation.

4. Config Pattern (config.py)

Fork from solanasis-scripts/fcto-pipeline/config.py:

  • Path setup (lines 9-16): PIPELINE_DIR, DATA_DIR, RAW_DIR, INTERMEDIATE_DIR, OUTPUT_DIR
  • Keyword regex lists (lines 19-84): Pattern structure (but replace with job-fit keywords)
  • Scoring dict (lines 87-109): SCORING = {"signal": points, ...}
  • Tier thresholds (lines 111-117): TIERS = {"A": 40, "B": 25, "C": 10}

Implementation Sequence

Phase 1: Foundation

  1. Create solanasis-scripts/job-board-scraper/ directory structure (all subdirs under data/)
  2. Write requirements.txt
  3. Install deps: pip install crawl4ai python-jobspy pandas
  4. Run crawl4ai-setup (installs Playwright browser for Crawl4AI)
  5. Write config.py with all board definitions, keyword lists, scoring model

Phase 2: Scraping Layer

  1. Read fcto-pipeline/enrich_websites.py thoroughly
  2. Write scrapers.py:
    • Fork BaseScraper class with cache helpers and URL normalization
    • StaticScraper (trafilatura + Jina fallback) for fractionaljobs.io, findfractionaljobs.com, allfractionaljobs.com
    • Crawl4AIScraper for gofractional.com, gigx.com
    • JobSpyScraper for Indeed, LinkedIn, ZipRecruiter (wraps python-jobspy)
    • Async orchestrator with semaphore concurrency
  3. Test each scraper manually against its target board

Phase 3: Parsing and Scoring

  1. Write parsers.py:
    • Unified listing schema: job_id, board, title, company, description, location, compensation, url, date_posted, date_first_seen, date_last_seen, is_remote, is_fractional, raw_text
    • Per-board parsing logic (different HTML structures)
    • Job ID generation: {board_id}_{md5(canonical_url)[:12]}
  2. Write score_jobs.py:
    • Fork structure from fcto-pipeline/score_prospects.py
    • Replace partnership signals with job-fit signals
    • Regex-based keyword matching (free, fast, deterministic)
  3. Validate scoring with a few real job listings
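
The `{board_id}_{md5(canonical_url)[:12]}` scheme depends on canonicalizing the URL first, so the same listing fetched with tracking parameters or a trailing slash dedups to one ID. A sketch (the canonicalization rules here are my assumption):

```python
# Job-ID generation: strip query string, fragment, and trailing slash,
# lowercase the host, then hash the canonical URL.
import hashlib
from urllib.parse import urlsplit

def canonical_url(url: str) -> str:
    parts = urlsplit(url.strip())
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/')}"

def make_job_id(board_id: str, url: str) -> str:
    digest = hashlib.md5(canonical_url(url).encode("utf-8")).hexdigest()
    return f"{board_id}_{digest[:12]}"
```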

Phase 4: Pipeline and Reporting

  1. Write scrape_and_score.py (main orchestrator):
    • Orchestrates: scrape all boards → parse → deduplicate (check cache) → score → output CSV
    • CLI flags: --board <name> (single board), --force (ignore cache), --limit N
  2. Write generate_daily_jobs.py:
    • Fork from fcto-pipeline/generate_daily_outreach.py
    • Read scored CSV, pick new A/B listings
    • Generate daily markdown report
    • Tracker: applied/skipped/reviewed status per job
    • CLI: --mark-applied job_id, --mark-skipped job_id, --status
  3. Test end-to-end pipeline

Phase 5: Scheduling

  1. Test manually: python scrape_and_score.py && python generate_daily_jobs.py
  2. Set up as Claude Cowork scheduled task (instruction below)
  3. Optionally set up Windows Task Scheduler as fallback

Scheduling Setup

Claude Cowork Scheduled Task

Create a Cowork scheduled task with this instruction:

Every morning, run the fractional job board scraper:
1. cd solanasis-scripts/job-board-scraper
2. python scrape_and_score.py
3. python generate_daily_jobs.py
4. Read the generated report and summarize the top 5 opportunities

Limitation: Only runs when machine is awake and Claude Desktop is open.

Windows Task Scheduler (Fallback)

Program: python
Arguments: C:\Users\zasya\Documents\_solanasis\solanasis-scripts\job-board-scraper\scrape_and_score.py
Start in: C:\Users\zasya\Documents\_solanasis\solanasis-scripts\job-board-scraper
Schedule: Daily at 7:00 AM

Keyword Regex Lists (Pre-Built)

Role Title Keywords (Exact Match = +20 points)

ROLE_TITLE_KEYWORDS = [
    r"\bfractional\s+c(?:io|so|iso)\b",   # CIO, CSO, CISO
    r"\bfractional\s+coo\b",
    r"\bfractional\s+chief\s+(?:operating|(?:information\s+)?security|information)\s+officer\b",
    r"\bvirtual\s+c(?:io|so|iso)\b",
    r"\binterim\s+c(?:io|so|iso)\b",
    r"\bpart[\s-]time\s+c(?:io|so|iso)\b",
    r"\bfractional\s+(?:technology|it)\s+(?:leader|executive)\b",
]

Adjacent Role Keywords (Adjacent Match = +10 points)

ADJACENT_ROLE_KEYWORDS = [
    r"\bfractional\s+cto\b",
    r"\bfractional\s+vp\s+(?:tech|it|information)\b",
    r"\bfractional\s+(?:it|technology)\s+director\b",
    r"\binterim\s+cto\b",
    r"\bvirtual\s+cto\b",
]

Service Match Keywords (various point values)

SERVICE_MATCH_KEYWORDS = [
    r"\bsecurity\s+assessment\b",
    r"\bcybersecurity\b",
    r"\bdisaster\s+recovery\b",
    r"\bdata\s+migration\b",
    r"\bcrm\s+(?:setup|implementation|integration)\b",
    r"\bsystems?\s+integration\b",
    r"\bresponsible\s+ai\b",
    r"\bcompliance\b",
    r"\brisk\s+(?:assessment|management)\b",
    r"\bincident\s+response\b",
    r"\bbusiness\s+continuity\b",
]

Target Organization Keywords (+10 points)

TARGET_ORG_KEYWORDS = [
    r"\bnonprofit\b",
    r"\bfoundation\b",
    r"\bsm(?:all\s+)?b(?:usiness)?\b",
    r"\bsmall\s+(?:and\s+)?(?:mid(?:size|dle)?|medium)\b",
    r"\bstartup\b",
    r"\bgrowth[\s-]stage\b",
]

Disqualification Keywords (negative points)

DISQUALIFY_KEYWORDS = [
    r"\bfull[\s-]time\s+only\b",         # -30
    r"\bw[\s-]?2\s+only\b",              # -30
    r"\bon[\s-]?site\s+(?:only|required)\b",  # -10
    r"\bfortune\s+500\b",               # -15
    r"\b(?:series\s+[c-z]|ipo)\b",      # -15
    r"\benterprise[\s-]only\b",          # -15
    r"\brequires?\s+(?:clearance|ts[\s/]sci)\b",  # -25
]
 
OVERQUALIFIED_KEYWORDS = [
    r"\bjunior\b",                       # -20
    r"\bentry[\s-]level\b",             # -20
    r"\bintern(?:ship)?\b",             # -20
    r"\bassociate\b",                    # -20
]

Remote/Location Keywords (+5/+8 points)

REMOTE_KEYWORDS = [
    r"\bremote\b",
    r"\bhybrid\b",
    r"\bflexible\s+location\b",
    r"\bwork\s+from\s+(?:home|anywhere)\b",
]
 
COLORADO_KEYWORDS = [
    r"\bcolorado\b",
    r"\bdenver\b",
    r"\bboulder\b",
    r"\bcolorado\s+springs\b",
    r"\bfort\s+collins\b",
]
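
All of these lists feed the same check_keywords() helper being forked from enrich_websites.py. A sketch of how it would consume them (the exact signature is an assumption):

```python
# Return which patterns in a keyword list match the listing text, so the
# score breakdown can report what fired, not just a count.
import re

REMOTE_KEYWORDS = [
    r"\bremote\b",
    r"\bhybrid\b",
    r"\bflexible\s+location\b",
    r"\bwork\s+from\s+(?:home|anywhere)\b",
]

def check_keywords(text: str, patterns: list[str]) -> list[str]:
    """Patterns that match text, case-insensitively, in list order."""
    return [p for p in patterns if re.search(p, text, re.IGNORECASE)]
```

Usage in the scorer would look like `points = 5 if check_keywords(description, REMOTE_KEYWORDS) else 0`.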

Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| Crawl4AI browser install fails on Windows | Fall back to Jina Reader for JS sites (already proven) |
| JobSpy anti-bot detection (Indeed DataDome) | Disable failing boards; niche boards are primary value |
| Job boards change HTML structure | Each scraper isolated; breakage in one doesn’t affect others |
| Too few listings from niche boards | JobSpy covers major boards as supplement |
| pandas dependency bloat | Only used by JobSpy; contained to that scraper class |

Verification Checklist

  1. python scrape_and_score.py --board fractionaljobs (single board test)
  2. Manually compare scraped listings to actual website content
  3. Review scoring: do A-tier jobs actually look like good fits for my profile?
  4. Run full pipeline: python scrape_and_score.py && python generate_daily_jobs.py
  5. Open the generated solanasis-docs/daily-outreach/YYYY-MM-DD-jobs.md and verify format
  6. Test tracker: python generate_daily_jobs.py --mark-applied <some_job_id>
  7. Test status: python generate_daily_jobs.py --status

Important Notes

  • Prefer hacky and working over polished and slow. Get scraping working for 2-3 boards first, then expand.
  • Windows machine. Use asyncio.WindowsSelectorEventLoopPolicy() for async code.
  • No API keys needed for any of the scraping tools (trafilatura, Jina Reader free tier, Crawl4AI, JobSpy are all free).
  • Existing .env file is at solanasis-scripts/.env with Baserow and other keys (not needed for scraping, but available if you add Baserow migration later).
  • Daily outreach dir already exists at solanasis-docs/daily-outreach/ with existing foundation and fCTO reports.
  • Respect robots.txt and rate-limit all scraping. 1 request per 1-2 seconds minimum.
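
The Windows event-loop requirement above can be wrapped in one small helper (the function name is mine) that each script calls before asyncio.run(), mirroring enrich_websites.py line 537:

```python
# Windows compatibility shim: the default Proactor loop breaks some
# async libraries, so switch to the selector policy on win32 only.
import asyncio
import sys

def configure_event_loop() -> None:
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
```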