Baserow People Table Migration — Continuation Prompt
Purpose: Self-contained instructions for migrating LinkedIn pipeline unified contacts into the existing Baserow People table. A fresh Claude session should be able to execute (or debug) this migration with no prior context.
1. Context
Dmitri’s LinkedIn data mining pipeline (14 Python scripts) parses a LinkedIn
data export (36 CSVs, ~2,500 connections, ~2,700 messages) into a scored,
tiered unified contact database. The final output is contacts_unified.csv
with 2,478 rows, each scored for warmth (relationship strength) and
strategic value (ICP fit), assigned to tiers A/B/C/D.
These contacts need to live in the existing People table in self-hosted
Baserow at baserow.solanasis.com — NOT a separate table. The People table
already has 179 manually-entered rows (53 with LinkedIn URLs). The migration
must dedup against those, add LinkedIn-specific fields, and create filtered
views for tier-based segmentation.
2. What’s Already Built
Pipeline Scripts (all in solanasis-scripts/linkedin-pipeline/)
| Script | Purpose | Status |
|---|---|---|
| config.py | Paths, scoring weights, ICP keywords | Done |
| privacy_filter.py | Spam detection, PII redaction | Done |
| parse_connections.py | Connections.csv → connections_clean.csv (2,498 rows) | Done |
| parse_messages.py | messages.csv → messages_parsed.json (2,718 msgs, 1,329 convos) | Done |
| parse_invitations.py | Invitations.csv → invitations_clean.csv (1,549 rows) | Done |
| parse_engagement.py | Comments/Shares/Reactions → engagement_activity.csv (787) | Done |
| parse_profile_data.py | Profile/Positions/AdTargeting → profile_summary.json | Done |
| build_unified_contacts.py | Join all Stage 1 outputs, score, tier → contacts_unified.csv | Done |
| analyze_network.py | → network-intelligence-report.md | Done |
| analyze_gtm_alignment.py | → gtm-alignment-report.md | Done |
| analyze_voice.py | → voice-analysis-report.md | Done |
| generate_voice_enrichment_queue.py | → voice_llm_queue.json (bug: 0 convos) | Needs fix |
| prepare_erpnext_import.py | → erpnext_leads.csv, erpnext_contacts.csv | Done |
| push_to_erpnext.py | Stub with Frappe REST patterns | Stub only |
| migrate_to_baserow.py | → Baserow People table import | Ready to run |
Pipeline Output (in solanasis-scripts/linkedin-pipeline/data/output/)
- contacts_unified.csv: 2,478 rows, 33 columns (the migration source)
- contacts_unified.json: same data as JSON
- connections_clean.csv: 2,498 parsed connections
- messages_parsed.json: 1,329 conversations
- message_stats_per_contact.csv: 562 contacts with message stats
- invitations_clean.csv: 1,549 invitations
- engagement_activity.csv: 787 engagement records
- conversations/: per-contact conversation exports
Reports (in solanasis-docs/linkedin-analysis/)
- network-intelligence-report.md (26K)
- gtm-alignment-report.md (51K)
- voice-analysis-report.md (102K)
Tests
- tests/test_parse_connections.py (27 tests)
- tests/test_privacy_filter.py (57 tests)
- tests/test_scoring.py (91 tests)
- Total: 175 tests, all passing
3. Baserow Environment
- URL: https://baserow.solanasis.com (self-hosted)
- Credentials: In solanasis-scripts/.env (parent of linkedin-pipeline/)
  - BASEROW_BASE_URL
  - BASEROW_DB_TOKEN (for row CRUD via Token auth)
  - BASEROW_EMAIL + BASEROW_PASSWORD (for schema ops via JWT auth)
  - BASEROW_DATABASE_ID
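The split between Token auth (row CRUD) and JWT auth (schema operations) can be sketched as below. This is a minimal illustration, not the code in migrate_to_baserow.py: the helper names `missing_settings` and `baserow_headers` are hypothetical, and the sketch assumes Baserow's usual `Authorization: Token ...` and `Authorization: JWT ...` header formats.

```python
# Hypothetical helpers; the real script may organize this differently.
REQUIRED_SETTINGS = (
    "BASEROW_BASE_URL",
    "BASEROW_DB_TOKEN",
    "BASEROW_EMAIL",
    "BASEROW_PASSWORD",
    "BASEROW_DATABASE_ID",
)


def missing_settings(env):
    """Return the names of required settings that are absent or empty."""
    return [key for key in REQUIRED_SETTINGS if not env.get(key)]


def baserow_headers(env, jwt=None):
    """Build the Authorization header.

    Row CRUD uses the database token. Schema operations (adding fields,
    creating views) need a JWT obtained by logging in with
    BASEROW_EMAIL / BASEROW_PASSWORD; pass that JWT in when you have it.
    """
    if jwt is not None:
        return {"Authorization": f"JWT {jwt}"}
    return {"Authorization": f"Token {env['BASEROW_DB_TOKEN']}"}
```

In practice `env` would be `os.environ` after python-dotenv has loaded solanasis-scripts/.env. Checking `missing_settings` first gives a clear error before any request is made.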
Existing Tables
| Table | ID | Rows | Purpose |
|---|---|---|---|
| Tag | 264 | 50 | Tags (link_row target for People.Tags) |
| Location | 265 | 32 | Locations (link_row target) |
| Organization | 266 | 72 | Organizations (link_row target) |
| People | 267 | 179 | Target table for LinkedIn import |
| Foundation Prospects | 271 | 2,446 | Separate pipeline (foundation-pipeline/) |
| Meeting Notes | 272 | 38 | Meeting notes (link_row target) |
People Table Schema (ID: 267) — Existing 20 Fields
| Field | Type | Notes |
|---|---|---|
| Name | text | Primary field |
| Tags | link_row | → Tag (264) |
| Location | link_row | → Location (265) |
| Title | text | |
| Organization | link_row | → Organization (266) |
| Phone Number | text | |
| LinkedIn | url | 53 of 179 rows have this populated |
| | url | |
| | url | |
| | url | |
| Blog | url | |
| Website | url | |
| Notes | long_text | |
| Interest Form Message | long_text | |
| Response to Interest Form | long_text | |
| Connected From | text | Set to “LinkedIn Pipeline” for imports |
| Referral Source | text | |
| LinkedIn Initial Outreach | date | |
| Meeting Notes | link_row | → Meeting Notes (272) |
New Fields Added by Migration (14 fields)
| Field | Type | Source CSV Column |
|---|---|---|
| Company | text | company |
| Warmth Score | number | warmth_score |
| Strategic Score | number | strategic_value_score |
| Relationship Tier | single_select (A/B/C/D) | relationship_tier |
| Message Count | number | message_count_total |
| Last Contact Date | date | last_message_date |
| Days Since Contact | number | days_since_last_contact |
| Decay Flag | boolean | relationship_decay_flag |
| Invitation Direction | single_select (INCOMING/OUTGOING) | invitation_direction |
| Segment Tags | text | segment_tags |
| ICP Match Details | long_text | icp_match_details |
| Conversation Count | number | conversation_count |
| First Message Date | date | first_message_date |
| Connection Date | date | connected_on |
4. Tier Distribution
| Tier | Count | Criteria |
|---|---|---|
| A | 145 | Combined score >= 50 |
| B | 457 | Combined score 30-49 |
| C | 866 | Combined score 15-29 |
| D | 1,010 | Combined score < 15 |
| Total | 2,478 | |
5. Migration Logic
Field Mapping (CSV → Baserow)
full_name -> Name (text, primary)
position -> Title (text)
company -> Company (text, NEW)
email -> Email (email)
linkedin_url -> LinkedIn (url)
connected_on -> Connection Date (date, NEW)
warmth_score -> Warmth Score (number, NEW)
strategic_value_score -> Strategic Score (number, NEW)
relationship_tier -> Relationship Tier (single_select, NEW)
message_count_total -> Message Count (number, NEW)
last_message_date -> Last Contact Date (date, NEW)
days_since_last_contact -> Days Since Contact (number, NEW)
relationship_decay_flag -> Decay Flag (boolean, NEW)
invitation_direction -> Invitation Direction (single_select, NEW)
segment_tags -> Segment Tags (text, NEW)
icp_match_details -> ICP Match Details (long_text, NEW)
conversation_count -> Conversation Count (number, NEW)
first_message_date -> First Message Date (date, NEW)
(static) -> Connected From = "LinkedIn Pipeline"
segment_tags + tier -> Tags (link_row, existing) — new rows only
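The mapping above could be applied to one CSV row roughly as follows. This is a sketch under stated assumptions, not the script's actual code: `to_baserow_row` is a hypothetical name, dates are assumed to already be in a format Baserow accepts, and resolving single_select values and Tags to their Baserow IDs is left out.

```python
# CSV column -> Baserow field name, taken from the mapping above.
FIELD_MAP = {
    "full_name": "Name",
    "position": "Title",
    "company": "Company",
    "email": "Email",
    "linkedin_url": "LinkedIn",
    "connected_on": "Connection Date",
    "warmth_score": "Warmth Score",
    "strategic_value_score": "Strategic Score",
    "relationship_tier": "Relationship Tier",
    "message_count_total": "Message Count",
    "last_message_date": "Last Contact Date",
    "days_since_last_contact": "Days Since Contact",
    "relationship_decay_flag": "Decay Flag",
    "invitation_direction": "Invitation Direction",
    "segment_tags": "Segment Tags",
    "icp_match_details": "ICP Match Details",
    "conversation_count": "Conversation Count",
    "first_message_date": "First Message Date",
}
NUMBER_FIELDS = {
    "Warmth Score", "Strategic Score", "Message Count",
    "Days Since Contact", "Conversation Count",
}
BOOLEAN_FIELDS = {"Decay Flag"}


def to_baserow_row(csv_row):
    """Convert one contacts_unified.csv row (dict of strings) to a payload."""
    row = {"Connected From": "LinkedIn Pipeline"}  # static value
    for source, target in FIELD_MAP.items():
        raw = (csv_row.get(source) or "").strip()
        if not raw:
            continue  # leave empty cells unset rather than sending ""
        if target in NUMBER_FIELDS:
            number = float(raw)
            row[target] = int(number) if number.is_integer() else number
        elif target in BOOLEAN_FIELDS:
            # Assumption: the CSV writes booleans as True/False, 1/0 or yes/no.
            row[target] = raw.lower() in {"true", "1", "yes"}
        else:
            row[target] = raw
    return row
```

Rows read with `csv.DictReader` can be passed in directly. For the UPDATE path, the payload would need to be trimmed to the 14 new fields so manually-entered values are not overwritten.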
Dedup Strategy
- Fetch all existing People rows from Baserow
- Build an index: normalized_linkedin_url -> row_id
- For each unified contact:
  - If LinkedIn URL matches an existing row: UPDATE (scoring fields only)
  - If no match: CREATE (full field mapping + Tags)
- Update does NOT overwrite: Name, Title, Email, Notes, Tags, Phone, or other manually-entered fields. Only updates the 14 new LinkedIn fields.
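The dedup steps above can be sketched as follows. The normalization rules here (drop scheme, `www.`, query string and trailing slash, then lowercase) are an assumption about what "normalized" means; the function names are hypothetical and the real logic in migrate_to_baserow.py may differ.

```python
from urllib.parse import urlsplit


def normalize_linkedin_url(url):
    """Reduce a LinkedIn URL to host + path so trivial variants compare equal."""
    if not url or not url.strip():
        return ""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # tolerate URLs stored without a scheme
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/").lower()


def build_url_index(existing_rows):
    """Map normalized LinkedIn URL -> Baserow row id for rows that have one."""
    index = {}
    for row in existing_rows:
        key = normalize_linkedin_url(row.get("LinkedIn"))
        if key:
            index[key] = row["id"]
    return index


def partition_contacts(contacts, index):
    """Split contacts into creates and (row_id, contact) updates."""
    creates, updates = [], []
    for contact in contacts:
        key = normalize_linkedin_url(contact.get("linkedin_url"))
        if key and key in index:
            updates.append((index[key], contact))
        else:
            creates.append(contact)
    return creates, updates
```

Contacts without a LinkedIn URL have nothing to match on, so this sketch treats them as creates.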
Tag Handling
Segment tags from the pipeline (e.g., boulder-local, startup) are mapped
to display names and created in the Tag table (264) if they don’t exist:
| Pipeline Key | Tag Display Name |
|---|---|
| foundation | Foundation |
| nonprofit | Nonprofit |
| wealth-mgmt | Wealth Management |
| fractional-exec | Fractional Executive |
| tech | Tech |
| startup | Startup |
| coliving-community | Coliving / Community |
| boulder-local | Boulder Local |
| connector | Connector |
| spiritual-wellness | Spiritual / Wellness |
Plus tier tags: Tier A, Tier B, Tier C, Tier D
Tags are set as link_row values (list of Tag row IDs) on new rows only.
Existing rows keep their manually-curated tags untouched.
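A minimal sketch of the tag resolution described above, assuming `segment_tags` is a comma-separated string in the CSV (the separator is not stated here) and that a name-to-row-ID lookup for the Tag table has already been fetched. Both function names are hypothetical.

```python
# Pipeline key -> Tag display name, from the table above.
TAG_DISPLAY_NAMES = {
    "foundation": "Foundation",
    "nonprofit": "Nonprofit",
    "wealth-mgmt": "Wealth Management",
    "fractional-exec": "Fractional Executive",
    "tech": "Tech",
    "startup": "Startup",
    "coliving-community": "Coliving / Community",
    "boulder-local": "Boulder Local",
    "connector": "Connector",
    "spiritual-wellness": "Spiritual / Wellness",
}


def tag_names_for(segment_tags, tier, separator=","):
    """Return display names for a contact's segment tags plus its tier tag."""
    names = []
    for key in (segment_tags or "").split(separator):
        name = TAG_DISPLAY_NAMES.get(key.strip())
        if name and name not in names:
            names.append(name)
    if tier:
        names.append(f"Tier {tier}")
    return names


def resolve_tag_ids(names, tag_ids_by_name):
    """Translate display names to Tag row IDs (the link_row value)."""
    return [tag_ids_by_name[name] for name in names if name in tag_ids_by_name]
```

The resulting list of IDs is what would be sent as the Tags value on newly created rows only.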
Batch Import
- Baserow batch limit: 200 rows per API call
- Rate limit handling: auto-retry on HTTP 429 with Retry-After header
- 0.3s sleep between batches to avoid hammering the server
- Creates come before updates (Phase 4a, then 4b)
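The batching and retry behavior listed above can be sketched like this. The HTTP call is injected as a `send` callable returning `(status_code, headers)` so the sketch stays independent of httpx. All names are hypothetical, and `Retry-After` is assumed to be a number of seconds rather than an HTTP date.

```python
import time

BATCH_SIZE = 200  # Baserow batch limit per API call


def chunked(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_with_retry(send, batch, max_retries=5, sleep=time.sleep):
    """Call send(batch); on HTTP 429 wait for Retry-After and try again."""
    for attempt in range(max_retries + 1):
        status, headers = send(batch)
        if status != 429:
            return status
        if attempt < max_retries:
            sleep(float(headers.get("Retry-After", 1)))
    raise RuntimeError("still rate limited after retries")


def import_in_batches(send, rows, pause=0.3, sleep=time.sleep):
    """Send rows in batches, pausing between batches; return rows sent."""
    sent = 0
    for batch in chunked(rows):
        send_with_retry(send, batch, sleep=sleep)
        sent += len(batch)
        sleep(pause)  # avoid hammering the server
    return sent
```

With 2,451 creates this produces 13 batches, matching the "Batch 1/13" lines in the expected output. A real `send` would also need to surface non-429 errors rather than ignoring the returned status.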
6. Views Created
| View Name | Filter | Sort |
|---|---|---|
| A-Tier Contacts | Relationship Tier = A | Warmth Score DESC |
| B-Tier Contacts | Relationship Tier = B | Warmth Score DESC |
| C-Tier Contacts | Relationship Tier = C | Warmth Score DESC |
| Reactivation Targets | Decay Flag = true | Warmth Score DESC |
| LinkedIn Connections | Connected From = “LinkedIn Pipeline” | Warmth Score DESC |
The default “Grid” view still shows all People rows (original + LinkedIn).
7. How to Run
Prerequisites
- Python dependencies installed:
      cd solanasis-scripts/linkedin-pipeline
      pip install -r requirements.txt
  (Needs: httpx, python-dotenv, chardet, plus pandas/nltk for analysis scripts)
- contacts_unified.csv exists:
      python build_unified_contacts.py --run
  Should produce data/output/contacts_unified.csv with 2,478 rows.
- Baserow credentials in parent .env:
      # solanasis-scripts/.env
      BASEROW_BASE_URL=https://baserow.solanasis.com
      BASEROW_DB_TOKEN=your-token
      BASEROW_EMAIL=your-email
      BASEROW_PASSWORD=your-password
      BASEROW_DATABASE_ID=54
Run Commands
# Step 1: Dry run — review field additions, dedup stats, sample row
python migrate_to_baserow.py --plan
# Step 2: Execute full migration (all 2,478 contacts)
python migrate_to_baserow.py --run
# Alternative: Only import A+B tier (602 contacts)
python migrate_to_baserow.py --run --tier A,B
# Alternative: Only create/update views (no data import)
python migrate_to_baserow.py --run --views-only
Expected Output
Phase 1: Schema migration...
+ Company (text)
+ Warmth Score (number)
... (14 new fields)
Phase 2: Tag setup...
Created 14 new tags: Foundation, Nonprofit, ...
Phase 3: Dedup analysis...
Existing People rows: 179
With LinkedIn URL (dedup candidates): 53
New rows: 2,451
Updates: 27
Phase 4a: Creating 2,451 new rows...
Batch 1/13: 200 rows (200 total)
Batch 2/13: 200 rows (400 total)
...
Created 2,451 rows
Phase 4b: Updating 27 existing rows...
Batch 1/1: 27 rows (27 total)
Updated 27 rows
Phase 5: Creating views...
Created view: A-Tier Contacts
Created view: B-Tier Contacts
...
Verification
Previous rows: 179
Created: 2,451
Updated: 27
Final count: 2,630
Expected: 2,630
Row count matches
8. Verification Checklist
After migration, verify in Baserow UI (baserow.solanasis.com):
- People table row count is ~2,630 (179 existing + ~2,451 new)
- New fields visible: Warmth Score, Strategic Score, Relationship Tier, etc.
- A-Tier Contacts view shows ~145 contacts sorted by warmth
- B-Tier Contacts view shows ~457 contacts
- Reactivation Targets view shows contacts with Decay Flag = true
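- LinkedIn Connections view shows ~2,451 rows (the newly created rows; the 27 updated rows keep their existing Connected From value, because updates touch only the 14 new fields)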
- Original 179 rows still have their manually-entered data intact
- Spot-check known contacts:
- Search “Tim Lockie” — should be A or B tier
- Search “Kevin Roerty” — should have message stats
- Check a manually-entered contact — scoring fields updated, name/notes unchanged
- Tags column shows tier + segment tags on new rows
- Connected From = “LinkedIn Pipeline” on all imported rows
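The row-count checks in the list above follow from simple arithmetic, sketched here with the figures from the expected output (179 existing rows, 2,478 contacts, 27 dedup matches). The function name is hypothetical.

```python
import math


def expected_counts(existing_rows, contacts, matched, batch_size=200):
    """Derive the numbers the verification step should report."""
    created = contacts - matched  # contacts with no matching LinkedIn URL
    return {
        "created": created,
        "updated": matched,
        "final": existing_rows + created,  # updates add no rows
        "create_batches": math.ceil(created / batch_size),
    }
```

Calling `expected_counts(179, 2478, 27)` gives 2,451 created, 27 updated, a final count of 2,630, and 13 create batches. If the dry run reports a different number of matches, the expected final count shifts accordingly.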
9. Scoring Model Reference
Warmth Scoring (relationship strength)
| Signal | Points |
|---|---|
| Bidirectional messages (both parties sent) | +20 |
| 5+ messages exchanged | +15 |
| Recent message (last 30 days) | +10 |
| Recent message (last 90 days) | +8 |
| Dmitri sent invitation | +5 |
| They sent invitation | +5 |
| Dmitri follows them | +3 |
| Connected 1yr+ with zero messages | -10 |
| Only spam messages from contact | -5 |
Strategic Value Scoring (ICP fit)
| Signal | Points |
|---|---|
| Title matches ICP (CEO, CTO, ED, etc.) | +15 |
| Company matches target vertical | +10 |
| Fractional C-suite keywords | +8 |
| Colorado/Boulder location | +5 |
| Connector role (investor, advisor, board) | +5 |
| Recent engagement (last 60 days) | +3 |
Tier Thresholds (warmth + strategic combined)
| Tier | Score Range |
|---|---|
| A | 50+ |
| B | 30-49 |
| C | 15-29 |
| D | < 15 |
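The thresholds above translate directly into a small function. This is a sketch of the rule as tabulated, with a hypothetical name; the pipeline's own implementation in build_unified_contacts.py may differ in details such as rounding.

```python
def assign_tier(warmth_score, strategic_score):
    """Map the combined score (warmth + strategic) to a tier letter."""
    combined = warmth_score + strategic_score
    if combined >= 50:
        return "A"
    if combined >= 30:
        return "B"
    if combined >= 15:
        return "C"
    return "D"
```

For example, a contact with warmth 35 (bidirectional messages plus 5+ messages exchanged) and strategic value 15 (ICP title match) has a combined score of 50 and lands in Tier A.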
10. Known Issues & Edge Cases
- Voice enrichment queue bug: generate_voice_enrichment_queue.py loads 0 conversations from messages_parsed.json. Likely a JSON structure mismatch. Not blocking for Baserow migration.
- Organization link_row not populated: The Company field is stored as plain text, not linked to the Organization table (266). This avoids creating 2,000+ Organization entries. Curate manually for A/B tier.
- Location link_row not populated: The pipeline doesn’t extract clean location data from LinkedIn. Use the boulder-local segment tag as a proxy.
- Follows matching is name-based: The follows_and_interests.csv from LinkedIn has no URLs, only names. Only 6 of 2,329 follows matched by exact name. This means dmitri_follows_them is underreported.
- Engagement count: Spec predicted ~1,085 engagement records but pipeline produced 787. This is correct: the spec counted raw lines, but Comments.csv and Shares.csv have multiline quoted fields.
- Re-running the migration: The script is NOT idempotent for creates. Running it twice would duplicate the ~2,451 new rows. To re-run safely:
  - Delete all rows where Connected From = “LinkedIn Pipeline”
  - Then run again
  - Updates (27 dedup matches) are safe to re-run
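The cleanup step before a re-run can be sketched as selecting the IDs of pipeline-created rows and grouping them into batches. The names are hypothetical, and the sketch assumes the delete call accepts lists of row IDs with the same 200-row limit as creates; it performs no API calls itself.

```python
def pipeline_row_ids(rows, marker="LinkedIn Pipeline"):
    """Return IDs of rows whose Connected From equals the pipeline marker."""
    return [
        row["id"]
        for row in rows
        if (row.get("Connected From") or "").strip() == marker
    ]


def delete_batches(row_ids, size=200):
    """Group row IDs into lists of at most `size` for batched deletion."""
    return [row_ids[start:start + size] for start in range(0, len(row_ids), size)]
```

Reviewing the selected IDs before deleting is worthwhile, since any manually-entered row that happens to carry the same Connected From value would be removed too.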
11. Future Work
- Periodic refresh: Re-run the pipeline with a fresh LinkedIn export, then re-run the migration with --run to update scores. Would need an idempotent mode (upsert by LinkedIn URL for all rows, not just existing).
- Organization linking: For A/B tier contacts, manually or script-link Company text to Organization table entries.
- Baserow automations: Set up Baserow webhooks/automations to notify when a contact’s decay flag flips to true.
- ERPNext sync: The pipeline also generates erpnext_leads.csv and erpnext_contacts.csv. The push_to_erpnext.py stub has Frappe REST patterns but isn’t implemented.
12. File Locations Quick Reference
solanasis-scripts/
  .env                                  # Baserow credentials
  linkedin-pipeline/
    config.py                           # Paths, scoring, ICP keywords
    migrate_to_baserow.py               # THIS migration script
    build_unified_contacts.py           # Produces contacts_unified.csv
    data/output/contacts_unified.csv    # 2,478 rows, the migration source
    data/output/contacts_unified.json   # Same as JSON
    requirements.txt                    # Python dependencies
solanasis-docs/
  linkedin-analysis/
    linkedin-pipeline-continuation-prompt.md   # Pipeline build spec (526 lines)
    baserow-migration-continuation-prompt.md   # THIS document
    network-intelligence-report.md
    gtm-alignment-report.md
    voice-analysis-report.md