Event Database Scraping Playbook

Owner: Solanasis Operations
Created: 2026-04-02
Plan: ~/.claude-plans/deep-plan-event-super-database-master-2026-04-01.md

Overview

The event pipeline scrapes US and Canadian professional events from multiple free sources, deduplicates them, scores each event for Solanasis relevance and Matchkeyz fit, and loads the results into Supabase. It serves two businesses:

  1. Solanasis — Identify events for client acquisition and networking (wealth management, cybersecurity, nonprofit conferences)
  2. Matchkeyz — Identify conference organizers as Regenerosity Rounds partnership prospects (200-5000 attendee conferences)

Quick Start

cd /home/zasage/_my/_solanasis/solanasis-scripts/event-pipeline
 
# Preflight check
secret run supabase -- python3 run_pipeline.py --check-only
 
# Dry-run (plan only, no scraping)
secret run supabase -- python3 run_pipeline.py --dry-run
 
# Full pipeline run
secret run supabase -- python3 run_pipeline.py --run
 
# Single source scrape
secret run supabase -- python3 run_pipeline.py --run --source 10times
 
# Single stage
secret run supabase -- python3 run_pipeline.py --run --stage dedup
secret run supabase -- python3 run_pipeline.py --run --stage score
secret run supabase -- python3 run_pipeline.py --run --stage migrate

Pipeline Stages

Stage      | Script                      | Input             | Output
---------- | --------------------------- | ----------------- | ------------------------------------------
1. Scrape  | scrapers/scrape_*.py        | Web sources       | data/raw/{source}_{date}.csv
2. Dedup   | dedup_events.py             | data/raw/*.csv    | data/intermediate/deduped_{date}.csv
3. Score   | score_events.py             | Deduped CSV       | data/output/scored_{date}.csv + top-50 reports
4. Migrate | migrate_to_supabase.py      | Scored CSV        | Supabase events table
5. ROI     | analyze_roi.py              | Scored CSV        | data/output/roi_analysis_{date}.csv
6. Link    | link_organizers.py          | Supabase events   | Links organizers to crm_contacts
7. Export  | export_for_apollo.py        | crm_contacts      | Apollo-compatible CSV for enrichment
8. Enrich  | import_apollo_enrichment.py | Apollo CSV        | Imports emails back to crm_contacts
9. Push    | push_to_brevo.py            | Enriched contacts | Pushes to Matchkeyz/Solanasis Brevo
10. Sync   | sync_brevo_status.py        | Brevo API         | Pulls engagement status to Supabase

Data Sources

Active (Free — Phases 2-4)

Source              | Script                     | Type   | Est. Events | Rate Limit
------------------- | -------------------------- | ------ | ----------- | ----------
10times.com         | scrape_10times.py          | scrape | 50-100K     | 1.5s delay
EventsEye.com       | scrape_eventseye.py        | scrape | 3-5K        | 1.0s delay
ConferenceIndex.org | scrape_conference_index.py | scrape | 5-10K       | 1.0s delay
AllEvents.in        | scrape_allevents.py        | scrape | 20-50K      | 2.0s delay
Industry orgs       | scrape_industry_sites.py   | scrape | 100-500     | 2.0s delay
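
The per-source delays above could be enforced with a small throttle. The following is a minimal sketch, not the pipeline's actual implementation: the `Throttle` class, the `RATE_LIMIT_SECONDS` dict, and its keys are hypothetical names, and the real SOURCE_CONFIGS layout in config.py may differ.

```python
import time
from typing import Callable, Optional

# Delays in seconds, copied from the Active sources table.
# Dict name and keys are hypothetical, not the real SOURCE_CONFIGS layout.
RATE_LIMIT_SECONDS = {
    "10times": 1.5,
    "eventseye": 1.0,
    "conference_index": 1.0,
    "allevents": 2.0,
    "industry_sites": 2.0,
}


class Throttle:
    """Enforce a minimum gap between consecutive requests to one source."""

    def __init__(
        self,
        delay: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Sleep just long enough to honor the delay; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

A scraper would create one `Throttle(RATE_LIMIT_SECONDS["10times"])` and call `wait()` before each HTTP request. The clock and sleep functions are injectable so the behavior can be tested without real waiting.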

Deferred (Paid — Phase 5)

SourceMonthly CostWhen to Add
SerpApi Google Events$50If free sources have >20% coverage gaps
Meetup Pro$30If tech/meetup events underrepresented
AllEvents.in APITBDIf HTML scraping proves unreliable
PredictHQ$500-2000Only if enterprise intelligence needed

Scoring Models

Solanasis Relevance Score

Measures how valuable an event is for finding Solanasis clients.

High-value signals (+10-15 pts):

  • Wealth management / RIA keywords (+15)
  • Colorado-based events (+10)
  • Cybersecurity keywords (+12)
  • Estate planning keywords (+10)

Moderate signals (+3-10 pts):

  • Healthcare IT keywords (+10)
  • Nonprofit keywords (+8)
  • Western states location (+5)
  • 500+ attendees (+5)
  • Free or <$200 entry (+3)

Negative signals:

  • Consumer event (music festival, etc.) (-10)
  • Academic-only event (-5)
  • Expensive >$1000 entry (-2)

Tiers: A (35+), B (22-34), C (10-21), D (<10)
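
As an illustration of how these signals and tiers combine, here is a minimal sketch. It assumes the signals are simply additive and that keyword or location detection has already produced a set of signal labels. The label names and function names are hypothetical; the real model lives in config.py and score_events.py and may stack or cap signals differently.

```python
# Point values copied from the signal lists above.
# Signal labels are hypothetical; the real detection logic is not shown here.
SIGNAL_POINTS = {
    "wealth_management": 15,
    "cybersecurity": 12,
    "colorado": 10,
    "estate_planning": 10,
    "healthcare_it": 10,
    "nonprofit": 8,
    "western_state": 5,
    "attendees_500_plus": 5,
    "low_cost_entry": 3,      # free or under $200
    "consumer_event": -10,
    "academic_only": -5,
    "expensive_entry": -2,    # over $1000
}


def relevance_score(signals: set[str]) -> int:
    """Sum the points for every signal detected on an event (assumed additive)."""
    return sum(SIGNAL_POINTS[name] for name in signals)


def relevance_tier(score: int) -> str:
    """Map a score to the documented tiers: A (35+), B (22-34), C (10-21), D (<10)."""
    if score >= 35:
        return "A"
    if score >= 22:
        return "B"
    if score >= 10:
        return "C"
    return "D"


# A Colorado wealth-management conference with 500+ attendees:
score = relevance_score({"wealth_management", "colorado", "attendees_500_plus"})
print(score, relevance_tier(score))  # 30 B
```

Under this additive assumption, the same event would reach Tier A if it also matched the cybersecurity keywords (30 + 12 = 42).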

Matchkeyz Fit Score

Measures conference suitability for Regenerosity Rounds partnership.

Sweet spot: 500-2000 attendees, annual recurrence, a conscious-business or impact focus, and a ticket price of $200 or more.

Tiers: A (30+), B (20-29), C (10-19), D (<10)

ROI / Bang-for-Buck

ROI = (Solanasis Score * Attendees) / (Ticket Cost + Travel Estimate)

Travel estimates (USD): Colorado = $500, other US = $1,500
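
A worked example may make the formula concrete. The sketch below assumes the travel estimates are in US dollars and covers only US events, since no estimate is given for Canada; the function names are hypothetical and analyze_roi.py may handle edge cases differently.

```python
def travel_estimate(in_colorado: bool) -> int:
    """Travel estimate from the playbook: 500 for Colorado, 1500 for other US."""
    return 500 if in_colorado else 1500


def roi(solanasis_score: float, attendees: int,
        ticket_cost: float, in_colorado: bool) -> float:
    """ROI = (Solanasis Score * Attendees) / (Ticket Cost + Travel Estimate)."""
    total_cost = ticket_cost + travel_estimate(in_colorado)
    return solanasis_score * attendees / total_cost


# Score 30, 800 attendees, $300 ticket.
print(roi(30, 800, 300, in_colorado=True))             # 30.0
print(round(roi(30, 800, 300, in_colorado=False), 2))  # 13.33
```

Because the travel estimate is always at least 500, the denominator stays positive even for free events. The example also shows why local events rank highest: the same event scores more than twice the ROI when it is in Colorado.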

Adding a New Scraping Source

  1. Check robots.txt and ToS — Verify the source allows scraping
  2. Add source to config.py — Add entry to SOURCE_CONFIGS dict
  3. Create scraper — Add scrapers/scrape_{source}.py following the standard interface:
    def scrape(config: dict, dry_run: bool = False) -> list[dict]:
        """Scrape events. Returns list of normalized event dicts."""
        ...
     
    def normalize(raw: dict) -> dict:
        """Normalize to standard EVENT_COLUMNS schema."""
        ...
  4. Register in run_pipeline.py — Add to SCRAPER_MAP dict
  5. Add to event_sources table — Insert row in Supabase
  6. Test — Run python3 scrapers/scrape_{source}.py --dry-run
  7. Add test — Add normalize test in tests/test_scrapers.py
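
The steps above can be sketched as a skeleton scraper module. This is a hedged illustration only: `EVENT_COLUMNS` here is an assumed subset based on the key columns listed under Database Schema, and `SOURCE_NAME`, `fetch_raw_events`, the `base_url` config key, and the raw field names (`id`, `title`, `city`, and so on) are hypothetical placeholders for whatever the new source actually returns.

```python
# Assumed subset of the standard schema; the real EVENT_COLUMNS is in config.py.
EVENT_COLUMNS = [
    "event_name", "event_type", "start_date", "end_date",
    "location_city", "location_state", "location_country",
    "organizer_name", "source", "source_id",
]

SOURCE_NAME = "examplesource"  # hypothetical source key


def normalize(raw: dict) -> dict:
    """Normalize one raw record to the (assumed) EVENT_COLUMNS schema."""
    event = {col: "" for col in EVENT_COLUMNS}
    event["event_name"] = (raw.get("title") or "").strip()
    event["start_date"] = raw.get("start") or ""
    event["end_date"] = raw.get("end") or event["start_date"]
    event["location_city"] = raw.get("city") or ""
    event["location_state"] = raw.get("state") or ""
    event["location_country"] = raw.get("country") or ""
    event["organizer_name"] = raw.get("organizer") or ""
    event["source"] = SOURCE_NAME
    # source + source_id form the dedup key, so always store the ID as text.
    event["source_id"] = str(raw.get("id", ""))
    return event


def fetch_raw_events(config: dict) -> list[dict]:
    """Placeholder: the real HTTP requests and HTML parsing would go here."""
    return []


def scrape(config: dict, dry_run: bool = False) -> list[dict]:
    """Scrape events. Returns a list of normalized event dicts."""
    if dry_run:
        print(f"[dry-run] would fetch {config.get('base_url')}")
        return []
    return [normalize(raw) for raw in fetch_raw_events(config)]
```

Keeping `normalize` a pure function of one raw record makes the test in step 7 straightforward: feed it a hand-written dict and assert on the resulting columns.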

Automation

Weekly Cron

The pipeline runs weekly via WSL crontab:

# Weekly event pipeline scrape — Sunday 4:00 AM Denver time
0 4 * * 0 cron-wrapper.py supabase -- python3 run_pipeline.py --run

Logs: event-pipeline/logs/cron-weekly.log

Manual Trigger

cd event-pipeline
secret run supabase -- python3 run_pipeline.py --run

Directus Views

The events collection in Directus (db.solanasis.com) has these bookmarks:

  • Upcoming Events — Future events, active only
  • Solanasis Tier A — Highest-value events for client acquisition
  • Matchkeyz Tier A — Best conference partnership prospects
  • Colorado Events — Local events (highest ROI)
  • Conferences (500+ attendees) — Large conferences
  • Not Contacted — Events where organizer hasn’t been reached out to

Outreach Workflow

After the pipeline scores and links organizers, the outreach stages handle enrichment and campaign delivery.

Apollo Enrichment (Stages 7-8)

cd event-pipeline
 
# Export organizers needing emails to Apollo CSV
secret run supabase -- python3 export_for_apollo.py --plan        # Preview
secret run supabase -- python3 export_for_apollo.py --run          # Generate CSV
secret run supabase -- python3 export_for_apollo.py --run --track matchkeyz  # MK only
 
# [Manual] Upload CSV to Apollo.io, download enriched results to data/raw/apollo_export_organizers.csv
 
# Import Apollo enrichment back to crm_contacts
secret run supabase -- python3 import_apollo_enrichment.py --plan  # Preview matches
secret run supabase -- python3 import_apollo_enrichment.py --run   # Apply updates

Brevo Campaign Push (Stage 9)

# Push enriched contacts to Brevo (separate accounts per track)
secret run supabase -- python3 push_to_brevo.py --plan                         # Preview
secret run supabase -- python3 push_to_brevo.py --run --track matchkeyz --batch 10  # Test batch
secret run supabase -- python3 push_to_brevo.py --run --track both             # Full push

Status Sync (Stage 10)

# Sync engagement data from Brevo back to Supabase
secret run supabase -- python3 sync_brevo_status.py --plan                     # Preview
secret run supabase -- python3 sync_brevo_status.py --run --track both         # Sync all

Campaign Tracks

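Track     | Brevo Account   | Sender Domain   | Score Field               | Min Tier
--------- | --------------- | --------------- | ------------------------- | --------
Matchkeyz | Matchkeyz Brevo | matchkeyz.io    | matchkeyz_fit_score       | C (10+)
Solanasis | Solanasis Brevo | solanasishq.com | solanasis_relevance_score | C (10+)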
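
The track table can be read as a simple eligibility rule: a record qualifies for a track when its score field meets that track's minimum tier. A minimal sketch follows, assuming Tier C means a score of 10 or more as shown in the table; the `TRACKS` dict and `eligible_tracks` function are hypothetical and push_to_brevo.py may apply further filters.

```python
# Score fields and minimum scores copied from the Campaign Tracks table.
# The dict layout itself is a hypothetical illustration.
TRACKS = {
    "matchkeyz": {"score_field": "matchkeyz_fit_score", "min_score": 10},
    "solanasis": {"score_field": "solanasis_relevance_score", "min_score": 10},
}


def eligible_tracks(record: dict) -> list[str]:
    """Return the tracks whose minimum score this record meets."""
    return [
        name
        for name, cfg in TRACKS.items()
        # Treat a missing or null score as 0 so the record is excluded.
        if (record.get(cfg["score_field"]) or 0) >= cfg["min_score"]
    ]


record = {"matchkeyz_fit_score": 25, "solanasis_relevance_score": 4}
print(eligible_tracks(record))  # ['matchkeyz']
```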

Design spec: solanasis-docs/specs/2026-04-05-event-organizer-outreach-design.md

Troubleshooting

Pipeline preflight fails

secret run supabase -- python3 run_pipeline.py --check-only

Fix any FAIL items before running.

Scraper returns 0 events

  • Check if the source website changed its HTML structure
  • Run with --dry-run to see which URLs would be fetched
  • Check rate limiting — some sources block rapid requests

Dedup too aggressive (merging different events)

  • Increase threshold: python3 dedup_events.py --run --threshold 0.8
  • Reduce date tolerance: python3 dedup_events.py --run --date-tolerance 1
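
To see why these two flags reduce false merges, consider the sketch below. It uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; the matcher that dedup_events.py actually uses and its default values are not documented here, so treat the function names, defaults, and sample event names as assumptions.

```python
from datetime import date
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two event names (case-insensitive)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_duplicate(a: dict, b: dict,
                 threshold: float = 0.8, date_tolerance: int = 1) -> bool:
    """Treat two events as duplicates only if their start dates are within
    date_tolerance days AND their names are at least threshold similar."""
    days_apart = abs((a["start_date"] - b["start_date"]).days)
    if days_apart > date_tolerance:
        return False
    return name_similarity(a["event_name"], b["event_name"]) >= threshold


# Illustrative records, not real scraped data.
a = {"event_name": "FPA Annual Conference", "start_date": date(2026, 9, 14)}
b = {"event_name": "FPA Annual Conference", "start_date": date(2026, 9, 15)}
print(is_duplicate(a, b, date_tolerance=1))  # True
print(is_duplicate(a, b, date_tolerance=0))  # False
```

Raising the threshold demands closer name matches, and lowering the date tolerance demands closer start dates, so either change merges fewer pairs.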

Migration errors

  • Check DB connection: secret run supabase -- python3 -c "from supabase.supabase_client import SupabaseClient; SupabaseClient()"
  • Check schema matches: docker exec supabase-db psql -U supabase_admin -d postgres -c '\d events'

Database Schema

events table

Key columns: event_name, event_type, start_date, end_date, location_city, location_state, location_country, organizer_name, solanasis_relevance_score, solanasis_relevance_tier, matchkeyz_fit_score, matchkeyz_fit_tier, roi_score, source, source_id

Dedup key: UNIQUE(source, source_id)
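
The unique key is what makes re-running the migrate stage safe: a row with the same (source, source_id) is updated rather than duplicated. The sketch below demonstrates the idea with an in-memory SQLite database so it runs anywhere; the table is a reduced, hypothetical version of the events table, and the SQL that migrate_to_supabase.py actually issues against Postgres is not shown in this playbook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced illustration of the events table with the same dedup key.
conn.execute(
    """
    CREATE TABLE events (
        source     TEXT NOT NULL,
        source_id  TEXT NOT NULL,
        event_name TEXT,
        roi_score  REAL,
        UNIQUE (source, source_id)
    )
    """
)

# Insert a new row, or update the existing one when the key already exists.
UPSERT_SQL = """
INSERT INTO events (source, source_id, event_name, roi_score)
VALUES (:source, :source_id, :event_name, :roi_score)
ON CONFLICT (source, source_id) DO UPDATE SET
    event_name = excluded.event_name,
    roi_score  = excluded.roi_score
"""


def upsert_event(connection: sqlite3.Connection, event: dict) -> None:
    """Write one event, keyed on (source, source_id)."""
    connection.execute(UPSERT_SQL, event)


upsert_event(conn, {"source": "10times", "source_id": "abc123",
                    "event_name": "Sample Summit", "roi_score": 12.5})
# Second run for the same key: the row is updated, not duplicated.
upsert_event(conn, {"source": "10times", "source_id": "abc123",
                    "event_name": "Sample Summit 2026", "roi_score": 30.0})

print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 1
```

The same event scraped from a different source gets a different key, which is why cross-source duplicates are handled earlier by the dedup stage rather than by this constraint.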

event_sources table

Tracks source health: source_name, source_type, last_scrape_at, last_scrape_count, last_scrape_status, is_active

File Layout

event-pipeline/
  config.py                     # Paths, scoring models, source configs
  requirements.txt              # Python dependencies
  run_pipeline.py               # Orchestrator (entry point)
  dedup_events.py               # Cross-source deduplication
  score_events.py               # Dual Solanasis + Matchkeyz scoring
  analyze_roi.py                # Bang-for-buck ROI analysis
  migrate_to_supabase.py        # Upsert to Supabase
  link_organizers.py            # Match organizers to crm_contacts
  export_for_apollo.py          # Export orgs to Apollo CSV
  import_apollo_enrichment.py   # Import Apollo emails to crm_contacts
  push_to_brevo.py              # Push contacts to Brevo accounts
  sync_brevo_status.py          # Sync Brevo engagement to Supabase
  templates.py                  # Template assignment logic
  scrapers/
    scrape_10times.py           # 10times.com scraper
    scrape_eventseye.py         # EventsEye.com scraper
    scrape_conference_index.py  # ConferenceIndex.org scraper
    scrape_allevents.py         # AllEvents.in HTML scraper
    scrape_industry_sites.py    # FPA, NAPFA, NAEPC, etc.
  data/
    raw/                        # Per-source raw scrape CSVs
    intermediate/               # Deduped intermediate files
    output/                     # Scored + ROI analysis outputs
  tests/
    test_scoring.py             # Scoring model tests
    test_dedup.py               # Dedup logic tests
    test_scrapers.py            # Scraper normalize tests
    test_apollo_export.py       # Apollo export tests
    test_apollo_import.py       # Apollo import tests
    test_brevo_push.py          # Brevo push tests
    test_brevo_sync.py          # Brevo sync tests
    test_templates.py           # Template assignment tests
  logs/                         # Cron job logs