Event Database Scraping Playbook
Owner: Solanasis Operations
Created: 2026-04-02
Plan: ~/.claude-plans/deep-plan-event-super-database-master-2026-04-01.md
Overview
The event pipeline scrapes US/Canada professional events from multiple free sources, deduplicates them, scores each event for Solanasis relevance and Matchkeyz fit, and loads the results into Supabase. It serves two businesses:
- Solanasis — Identify events for client acquisition and networking (wealth management, cybersecurity, nonprofit conferences)
- Matchkeyz — Identify conference organizers as Regenerosity Rounds partnership prospects (200-5000 attendee conferences)
Quick Start
```bash
cd /home/zasage/_my/_solanasis/solanasis-scripts/event-pipeline
# Preflight check
secret run supabase -- python3 run_pipeline.py --check-only
# Dry-run (plan only, no scraping)
secret run supabase -- python3 run_pipeline.py --dry-run
# Full pipeline run
secret run supabase -- python3 run_pipeline.py --run
# Single source scrape
secret run supabase -- python3 run_pipeline.py --run --source 10times
# Single stage
secret run supabase -- python3 run_pipeline.py --run --stage dedup
secret run supabase -- python3 run_pipeline.py --run --stage score
secret run supabase -- python3 run_pipeline.py --run --stage migrate
```
Pipeline Stages
| Stage | Script | Input | Output |
|---|---|---|---|
| 1. Scrape | scrapers/scrape_*.py | Web sources | data/raw/{source}_{date}.csv |
| 2. Dedup | dedup_events.py | data/raw/*.csv | data/intermediate/deduped_{date}.csv |
| 3. Score | score_events.py | Deduped CSV | data/output/scored_{date}.csv + top-50 reports |
| 4. Migrate | migrate_to_supabase.py | Scored CSV | Supabase events table |
| 5. ROI | analyze_roi.py | Scored CSV | data/output/roi_analysis_{date}.csv |
| 6. Link | link_organizers.py | Supabase events | Links organizers to crm_contacts |
| 7. Export | export_for_apollo.py | crm_contacts | Apollo-compatible CSV for enrichment |
| 8. Enrich | import_apollo_enrichment.py | Apollo CSV | Imports emails back to crm_contacts |
| 9. Push | push_to_brevo.py | Enriched contacts | Pushes to Matchkeyz/Solanasis Brevo |
| 10. Sync | sync_brevo_status.py | Brevo API | Pulls engagement status to Supabase |
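The Input column implies a dependency chain, but it is not strictly linear. ROI reads the scored CSV directly, so it does not need Supabase, while linking needs the migrated events. The minimal sketch below reads that chain off the table. This makes it easy to see what must already exist before running `--stage X`. Only `dedup`, `score`, and `migrate` are confirmed `--stage` values; the other stage keys are hypothetical labels, not run_pipeline.py's actual code.

```python
from typing import Optional

# Upstream dependency of each stage, read off the Input column of the table above.
# Keys other than "dedup", "score", and "migrate" are hypothetical labels.
DEPENDS_ON: dict[str, Optional[str]] = {
    "scrape": None,       # web sources
    "dedup": "scrape",    # data/raw/*.csv
    "score": "dedup",     # deduped CSV
    "migrate": "score",   # scored CSV
    "roi": "score",       # scored CSV (no Supabase needed)
    "link": "migrate",    # Supabase events
    "export": "link",     # crm_contacts with linked organizers
    "enrich": "export",   # Apollo CSV (after the manual Apollo.io step)
    "push": "enrich",     # enriched contacts
    "sync": "push",       # Brevo API engagement data
}


def prerequisites(stage: str) -> list[str]:
    """Return the stages whose output `stage` relies on, oldest first."""
    if stage not in DEPENDS_ON:
        raise ValueError(f"unknown stage {stage!r}")
    chain: list[str] = []
    upstream = DEPENDS_ON[stage]
    while upstream is not None:
        chain.append(upstream)
        upstream = DEPENDS_ON[upstream]
    return list(reversed(chain))
```

For example, `prerequisites("roi")` returns `["scrape", "dedup", "score"]`: re-running ROI after a fresh score does not require a migration first.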
Data Sources
Active (Free — Phases 2-4)
| Source | Script | Type | Est. Events | Rate Limit |
|---|---|---|---|---|
| 10times.com | scrape_10times.py | scrape | 50-100K | 1.5s delay |
| EventsEye.com | scrape_eventseye.py | scrape | 3-5K | 1.0s delay |
| ConferenceIndex.org | scrape_conference_index.py | scrape | 5-10K | 1.0s delay |
| AllEvents.in | scrape_allevents.py | scrape | 20-50K | 2.0s delay |
| Industry orgs | scrape_industry_sites.py | scrape | 100-500 | 2.0s delay |
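The Rate Limit column is a minimum gap between consecutive requests to the same source. The sketch below shows one way to enforce it, assuming a per-source throttle object. `REQUEST_DELAYS` and `PoliteThrottle` are hypothetical names; the real delays live in config.py. The clock and sleep functions are injectable so the throttle can be tested without actually waiting.

```python
import time
from typing import Callable, Optional

# Per-source delays in seconds, copied from the table above.
# Hypothetical dict; config.py is the real source of these values.
REQUEST_DELAYS: dict[str, float] = {
    "10times": 1.5,
    "eventseye": 1.0,
    "conference_index": 1.0,
    "allevents": 2.0,
    "industry_sites": 2.0,
}


class PoliteThrottle:
    """Enforce a minimum gap between consecutive requests to one source."""

    def __init__(
        self,
        delay: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least `delay` seconds have passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scraper would create `PoliteThrottle(REQUEST_DELAYS["10times"])` once and call `wait()` before every HTTP request. Using a monotonic clock keeps the gap correct even if the system time changes mid-run.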
Deferred (Paid — Phase 5)
| Source | Monthly Cost | When to Add |
|---|---|---|
| SerpApi Google Events | $50 | If free sources have >20% coverage gaps |
| Meetup Pro | $30 | If tech/meetup events underrepresented |
| AllEvents.in API | TBD | If HTML scraping proves unreliable |
| PredictHQ | $500-2000 | Only if enterprise intelligence needed |
Scoring Models
Solanasis Relevance Score
Measures how valuable an event is for finding Solanasis clients.
High-value signals (+10-15 pts):
- Wealth management / RIA keywords (+15)
- Colorado-based events (+10)
- Cybersecurity keywords (+12)
- Estate planning keywords (+10)
Moderate signals (+3-10 pts):
- Healthcare IT keywords (+10)
- Nonprofit keywords (+8)
- Western states location (+5)
- 500+ attendees (+5)
- Free or <$200 entry (+3)
Negative signals:
- Consumer event (music festival, etc.) (-10)
- Academic-only event (-5)
- Expensive >$1000 entry (-2)
Tiers: A (35+), B (22-34), C (10-21), D (<10)
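The minimal sketch below shows how the additive model and tier cut-offs combine. The point values and tier thresholds come from the lists above. Several parts are hypothetical and flagged in the code comments:

- the keyword lists;
- the set of "Western" states;
- the rule that the Colorado bonus replaces the Western-states bonus rather than stacking with it;
- the `description`, `attendees`, and `ticket_price` fields.

score_events.py may differ on any of these.

```python
import re

# Weights from the signal lists above; keyword lists are hypothetical stand-ins.
KEYWORD_SIGNALS: list[tuple[int, tuple[str, ...]]] = [
    (15, ("wealth management", "ria", "registered investment adviser")),
    (12, ("cybersecurity", "infosec")),
    (10, ("estate planning",)),
    (10, ("healthcare it", "health it")),
    (8, ("nonprofit", "non-profit")),
    (-10, ("music festival",)),   # crude proxy for "consumer event"
    (-5, ("academic",)),          # crude proxy for "academic-only event"
]
# Hypothetical: the playbook does not say which states count as "Western".
WESTERN_STATES = {"AZ", "CA", "ID", "MT", "NM", "NV", "OR", "UT", "WA", "WY"}


def _mentions(text: str, keywords: tuple[str, ...]) -> bool:
    # Word boundaries keep "ria" from matching inside words like "criteria".
    return any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords)


def solanasis_score(event: dict) -> int:
    """Additive relevance score; description/attendees/ticket_price are assumed fields."""
    text = f"{event.get('event_name', '')} {event.get('description', '')}".lower()
    score = sum(points for points, kws in KEYWORD_SIGNALS if _mentions(text, kws))

    state = event.get("location_state") or ""
    if state == "CO":
        score += 10   # assumption: replaces (does not stack with) the Western bonus
    elif state in WESTERN_STATES:
        score += 5

    if (event.get("attendees") or 0) >= 500:
        score += 5

    price = event.get("ticket_price")
    if price is not None:
        if price < 200:
            score += 3   # free or under $200
        elif price > 1000:
            score -= 2
    return score


def solanasis_tier(score: int) -> str:
    """Map a solanasis_relevance_score to its tier."""
    if score >= 35:
        return "A"
    if score >= 22:
        return "B"
    if score >= 10:
        return "C"
    return "D"
```

Under these assumptions, a 600-person, $150 "Colorado Wealth Management Summit" held in CO scores 15 + 10 + 5 + 3 = 33, which is Tier B.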
Matchkeyz Fit Score
Measures conference suitability for Regenerosity Rounds partnership.
Sweet spot: 500-2000 attendees, annual recurrence, conscious business/impact focus, charges $200+
Tiers: A (30+), B (20-29), C (10-19), D (<10)
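The playbook gives the Matchkeyz sweet-spot criteria and tier thresholds but not per-criterion point values. The sketch below therefore only reports which criteria an event meets and maps an existing score to a tier; it deliberately invents no weights. The `is_annual`, `attendees`, `ticket_price`, and `description` fields and the impact keyword list are hypothetical.

```python
import re

# Hypothetical keyword list for "conscious business / impact focus".
IMPACT_KEYWORDS = ("conscious business", "impact", "sustainability", "b corp", "social enterprise")


def matchkeyz_sweet_spot(event: dict) -> dict[str, bool]:
    """Report which sweet-spot criteria an event meets (no point weights implied)."""
    text = f"{event.get('event_name', '')} {event.get('description', '')}".lower()
    attendees = event.get("attendees") or 0
    price = event.get("ticket_price")
    return {
        "size_500_2000": 500 <= attendees <= 2000,
        "annual": bool(event.get("is_annual")),  # hypothetical recurrence flag
        "impact_focus": any(re.search(rf"\b{re.escape(k)}\b", text) for k in IMPACT_KEYWORDS),
        "paid_200_plus": price is not None and price >= 200,
    }


def matchkeyz_tier(score: int) -> str:
    """Map a matchkeyz_fit_score to its tier using the thresholds above."""
    if score >= 30:
        return "A"
    if score >= 20:
        return "B"
    if score >= 10:
        return "C"
    return "D"
```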
ROI / Bang-for-Buck
ROI = (Solanasis Score * Attendees) / (Ticket Cost + Travel Estimate)
Travel estimates (USD): Colorado = $500, other US = $1,500
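A worked version of the ROI formula, as a minimal Python sketch. The line above only gives a travel figure for other US locations. Applying $1,500 to every non-Colorado event, Canadian ones included, is therefore an assumption of this sketch.

```python
# Travel estimates in USD from the line above. Using 1500 for every
# non-Colorado event (including Canada) is an assumption.
TRAVEL_ESTIMATES = {"CO": 500}
DEFAULT_TRAVEL = 1500


def roi_score(solanasis_score: float, attendees: int, ticket_cost: float, location_state: str) -> float:
    """ROI = (Solanasis Score * Attendees) / (Ticket Cost + Travel Estimate)."""
    travel = TRAVEL_ESTIMATES.get(location_state, DEFAULT_TRAVEL)
    return (solanasis_score * attendees) / (ticket_cost + travel)


print(roi_score(30, 1000, 0, "CO"))    # 30000 / 500  -> 60.0
print(roi_score(30, 1000, 500, "TX"))  # 30000 / 2000 -> 15.0
```

Because travel sits in the denominator, a free Colorado event beats an otherwise identical out-of-state event by 3x even before ticket prices differ. This is why the Colorado Events view is described as the highest-ROI slice.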
Adding a New Scraping Source
- Check robots.txt and ToS: verify the source allows scraping
- Add source to config.py: add an entry to the `SOURCE_CONFIGS` dict
- Create scraper: add `scrapers/scrape_{source}.py` following the standard interface:
  ```python
  def scrape(config: dict, dry_run: bool = False) -> list[dict]:
      """Scrape events. Returns list of normalized event dicts."""
      ...


  def normalize(raw: dict) -> dict:
      """Normalize to standard EVENT_COLUMNS schema."""
      ...
  ```
- Register in run_pipeline.py: add to the `SCRAPER_MAP` dict
- Add to event_sources table: insert a row in Supabase
- Test: run `python3 scrapers/scrape_{source}.py --dry-run`
- Add test: add a normalize test in `tests/test_scrapers.py`
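The steps above can be sketched as a skeleton module that fills in the standard interface. It is a sketch under several assumptions, each also flagged in the code comments:

- `EVENT_COLUMNS` is a subset of the columns listed under Database Schema.
- The raw field names (`id`, `title`, `start`, `city`, ...) are hypothetical.
- The `listing_urls` config key is hypothetical.
- Fetching and parsing are left unimplemented because they are source-specific.

The robots.txt check uses the standard library's `urllib.robotparser` on an already-downloaded robots.txt body.

```python
"""scrapers/scrape_example.py: hypothetical skeleton for a new source."""
from urllib import robotparser

# Subset of events-table columns from the Database Schema section; the real
# EVENT_COLUMNS in config.py may contain more fields.
EVENT_COLUMNS = [
    "event_name", "event_type", "start_date", "end_date",
    "location_city", "location_state", "location_country",
    "organizer_name", "source", "source_id",
]
SOURCE = "example"  # hypothetical source key, matching its SOURCE_CONFIGS entry


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against a robots.txt body that has already been downloaded."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def normalize(raw: dict) -> dict:
    """Normalize to standard EVENT_COLUMNS schema (raw keys here are hypothetical)."""
    row = {col: None for col in EVENT_COLUMNS}
    row.update(
        event_name=(raw.get("title") or "").strip(),
        start_date=raw.get("start"),
        end_date=raw.get("end") or raw.get("start"),  # single-day events
        location_city=raw.get("city"),
        location_state=raw.get("state"),
        location_country=raw.get("country", "US"),
        organizer_name=raw.get("organizer"),
        source=SOURCE,
        source_id=str(raw["id"]),  # stable ID feeds UNIQUE(source, source_id)
    )
    return row


def scrape(config: dict, dry_run: bool = False) -> list[dict]:
    """Scrape events. Returns list of normalized event dicts."""
    urls = config.get("listing_urls", [])  # hypothetical config key
    if dry_run:
        for url in urls:
            print(f"[dry-run] would fetch {url}")
        return []
    # Fetch each page (with a per-source delay), parse raw event dicts,
    # and return [normalize(raw) for raw in parsed_events].
    raise NotImplementedError("fetch/parse logic is source-specific")
```

Keeping `normalize` a pure function of one raw dict is what makes the `tests/test_scrapers.py` normalize tests possible without network access.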
Automation
Weekly Cron
The pipeline runs weekly via WSL crontab:
```
# Weekly event pipeline scrape: Sunday 4:00 AM Denver time
0 4 * * 0 cron-wrapper.py supabase -- python3 run_pipeline.py --run
```
Logs: event-pipeline/logs/cron-weekly.log
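Cron evaluates `0 4 * * 0` (minute 0, hour 4, any day of month, any month, day-of-week 0 = Sunday) in the system's local time zone. "Denver time" therefore holds only if the WSL instance's zone is America/Denver. To check when the next run should land, the minimal sketch below computes the next firing time on a naive local clock. The function name is hypothetical. Note that cron numbers Sunday as 0, while Python's `weekday()` numbers it 6.

```python
from datetime import datetime, timedelta


def next_weekly_run(now: datetime, cron_weekday: int = 0, hour: int = 4, minute: int = 0) -> datetime:
    """Next firing of a weekly cron entry, on the same naive local clock as `now`."""
    python_weekday = (cron_weekday - 1) % 7  # cron: Sun=0..Sat=6; Python: Mon=0..Sun=6
    days_ahead = (python_weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    if candidate <= now:  # already past this week's slot
        candidate += timedelta(days=7)
    return candidate


print(next_weekly_run(datetime(2026, 4, 1, 12, 0)))  # 2026-04-05 04:00:00
```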
Manual Trigger
```bash
cd event-pipeline
secret run supabase -- python3 run_pipeline.py --run
```
Directus Views
The events collection in Directus (db.solanasis.com) has these bookmarks:
- Upcoming Events — Future events, active only
- Solanasis Tier A — Highest-value events for client acquisition
- Matchkeyz Tier A — Best conference partnership prospects
- Colorado Events — Local events (highest ROI)
- Conferences (500+ attendees) — Large conferences
- Not Contacted — Events where organizer hasn’t been reached out to
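The same slices can be pulled outside the Directus UI through its REST API. `GET /items/events` accepts `filter` (JSON filter rules such as `_eq` and `_gte`), `fields`, and `sort` query parameters, plus the `$NOW` dynamic variable. The sketch below builds those URLs.

It rests on several assumptions:

- The filter JSON is a hypothetical reconstruction of the bookmarks, not their saved definitions.
- The "CO" encoding of `location_state` is assumed.
- The active-only condition is omitted because its column is not documented here.
- Views needing undocumented columns (attendee count, contact status) are left out.

```python
import json
from urllib.parse import urlencode

DIRECTUS_URL = "https://db.solanasis.com"

# Hypothetical reconstructions of four bookmarks as Directus filter rules.
# "CO" state encoding and the missing active-only flag are assumptions.
VIEW_FILTERS: dict[str, dict] = {
    "Upcoming Events": {"start_date": {"_gte": "$NOW"}},
    "Solanasis Tier A": {"solanasis_relevance_tier": {"_eq": "A"}},
    "Matchkeyz Tier A": {"matchkeyz_fit_tier": {"_eq": "A"}},
    "Colorado Events": {"location_state": {"_eq": "CO"}},
}


def view_url(name: str, fields: str = "event_name,start_date,location_city,location_state") -> str:
    """Build a GET /items/events URL that approximates one bookmark."""
    query = urlencode({
        "filter": json.dumps(VIEW_FILTERS[name]),
        "fields": fields,
        "sort": "start_date",
    })
    return f"{DIRECTUS_URL}/items/events?{query}"
```

Send the request with an `Authorization: Bearer <token>` header for a Directus user or static token that can read the events collection.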
Outreach Workflow
After the pipeline scores and links organizers, the outreach stages handle enrichment and campaign delivery.
Apollo Enrichment (Stages 7-8)
```bash
cd event-pipeline
# Export organizers needing emails to Apollo CSV
secret run supabase -- python3 export_for_apollo.py --plan # Preview
secret run supabase -- python3 export_for_apollo.py --run # Generate CSV
secret run supabase -- python3 export_for_apollo.py --run --track matchkeyz # MK only
# [Manual] Upload CSV to Apollo.io, download enriched results to data/raw/apollo_export_organizers.csv
# Import Apollo enrichment back to crm_contacts
secret run supabase -- python3 import_apollo_enrichment.py --plan # Preview matches
secret run supabase -- python3 import_apollo_enrichment.py --run # Apply updates
```
Brevo Campaign Push (Stage 9)
```bash
# Push enriched contacts to Brevo (separate accounts per track)
secret run supabase -- python3 push_to_brevo.py --plan # Preview
secret run supabase -- python3 push_to_brevo.py --run --track matchkeyz --batch 10 # Test batch
secret run supabase -- python3 push_to_brevo.py --run --track both # Full push
```
Status Sync (Stage 10)
```bash
# Sync engagement data from Brevo back to Supabase
secret run supabase -- python3 sync_brevo_status.py --plan # Preview
secret run supabase -- python3 sync_brevo_status.py --run --track both # Sync all
```
Campaign Tracks
| Track | Brevo Account | Sender Domain | Score Field | Min Tier |
|---|---|---|---|---|
| Matchkeyz | Matchkeyz Brevo | matchkeyz.io | matchkeyz_fit_score | C (10+) |
| Solanasis | Solanasis Brevo | solanasishq.com | solanasis_relevance_score | C (10+) |
Design spec: solanasis-docs/specs/2026-04-05-event-organizer-outreach-design.md
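Each track selects contacts by its own score field against the same Tier C floor (score 10+), and `--batch` caps how many go out in one push. The sketch below shows that selection and batching, assuming each contact dict already carries its email and its linked event's score fields. That flattening is hypothetical; push_to_brevo.py may join these differently. `--track both` would simply run the selection once per track against the matching Brevo account.

```python
from typing import Iterator

# Score field per track, from the Campaign Tracks table.
TRACK_SCORE_FIELD = {
    "matchkeyz": "matchkeyz_fit_score",
    "solanasis": "solanasis_relevance_score",
}
MIN_SCORE = 10  # Tier C floor for both tracks


def eligible(contacts: list[dict], track: str) -> list[dict]:
    """Contacts with an email whose (assumed, flattened) event score meets the track floor."""
    field = TRACK_SCORE_FIELD[track]
    return [c for c in contacts if c.get("email") and (c.get(field) or 0) >= MIN_SCORE]


def batches(items: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items, like --batch 10."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Because the floor is checked per track, one organizer can qualify for Matchkeyz outreach while falling below the Solanasis floor, or the reverse.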
Troubleshooting
Pipeline preflight fails
```bash
secret run supabase -- python3 run_pipeline.py --check-only
```
Fix any FAIL items before running.
Scraper returns 0 events
- Check if the source website changed its HTML structure
- Try with `--dry-run` to see what URLs would be fetched
- Check rate limiting: some sources block rapid requests
Dedup too aggressive (merging different events)
- Increase threshold: `python3 dedup_events.py --run --threshold 0.8`
- Reduce date tolerance: `python3 dedup_events.py --run --date-tolerance 1`
Migration errors
- Check DB connection: `secret run supabase -- python3 -c "from supabase.supabase_client import SupabaseClient; SupabaseClient()"`
- Check schema matches: `docker exec supabase-db psql -U supabase_admin -d postgres -c '\d events'`
Database Schema
events table
Key columns: event_name, event_type, start_date, end_date, location_city, location_state, location_country, organizer_name, solanasis_relevance_score, solanasis_relevance_tier, matchkeyz_fit_score, matchkeyz_fit_tier, roi_score, source, source_id
Dedup key: UNIQUE(source, source_id)
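Upserting against `UNIQUE(source, source_id)` has one sharp edge. Postgres rejects a single `INSERT ... ON CONFLICT DO UPDATE` that touches the same key twice, failing with "ON CONFLICT DO UPDATE command cannot affect row a second time". If a scrape batch contains repeated `(source, source_id)` pairs, collapsing them first avoids that failure. The playbook does not say whether migrate_to_supabase.py already does this, so the helper below is a hypothetical sketch.

```python
def collapse_by_key(rows: list[dict], key: tuple[str, ...] = ("source", "source_id")) -> list[dict]:
    """Keep the last row seen per key; keys stay in first-seen order."""
    latest: dict[tuple, dict] = {}
    for row in rows:
        # Re-assigning an existing dict key keeps its original position.
        latest[tuple(row[k] for k in key)] = row
    return list(latest.values())
```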
event_sources table
Tracks source health: source_name, source_type, last_scrape_at, last_scrape_count, last_scrape_status, is_active
File Layout
```
event-pipeline/
  config.py                     # Paths, scoring models, source configs
  requirements.txt              # Python dependencies
  run_pipeline.py               # Orchestrator (entry point)
  dedup_events.py               # Cross-source deduplication
  score_events.py               # Dual Solanasis + Matchkeyz scoring
  analyze_roi.py                # Bang-for-buck ROI analysis
  migrate_to_supabase.py        # Upsert to Supabase
  link_organizers.py            # Match organizers to crm_contacts
  export_for_apollo.py          # Export orgs to Apollo CSV
  import_apollo_enrichment.py   # Import Apollo emails to crm_contacts
  push_to_brevo.py              # Push contacts to Brevo accounts
  sync_brevo_status.py          # Sync Brevo engagement to Supabase
  templates.py                  # Template assignment logic
  scrapers/
    scrape_10times.py           # 10times.com scraper
    scrape_eventseye.py         # EventsEye.com scraper
    scrape_conference_index.py  # ConferenceIndex.org scraper
    scrape_allevents.py         # AllEvents.in HTML scraper
    scrape_industry_sites.py    # FPA, NAPFA, NAEPC, etc.
  data/
    raw/                        # Per-source raw scrape CSVs
    intermediate/               # Deduped intermediate files
    output/                     # Scored + ROI analysis outputs
  tests/
    test_scoring.py             # Scoring model tests
    test_dedup.py               # Dedup logic tests
    test_scrapers.py            # Scraper normalize tests
    test_apollo_export.py       # Apollo export tests
    test_apollo_import.py       # Apollo import tests
    test_brevo_push.py          # Brevo push tests
    test_brevo_sync.py          # Brevo sync tests
    test_templates.py           # Template assignment tests
  logs/                         # Cron job logs
```