Baserow People Table Migration — Continuation Prompt

Purpose: Self-contained instructions for migrating LinkedIn pipeline unified contacts into the existing Baserow People table. A fresh Claude session should be able to execute (or debug) this migration with no prior context.


1. Context

Dmitri’s LinkedIn data mining pipeline (14 Python scripts) parses a LinkedIn data export (36 CSVs, ~2,500 connections, ~2,700 messages) into a scored, tiered unified contact database. The final output is contacts_unified.csv with 2,478 rows, each scored for warmth (relationship strength) and strategic value (ICP fit), assigned to tiers A/B/C/D.

These contacts need to live in the existing People table in self-hosted Baserow at baserow.solanasis.com — NOT a separate table. The People table already has 179 manually-entered rows (53 with LinkedIn URLs). The migration must dedup against those, add LinkedIn-specific fields, and create filtered views for tier-based segmentation.


2. What’s Already Built

Pipeline Scripts (all in solanasis-scripts/linkedin-pipeline/)

Script | Purpose | Status
config.py | Paths, scoring weights, ICP keywords | Done
privacy_filter.py | Spam detection, PII redaction | Done
parse_connections.py | Connections.csv -> connections_clean.csv (2,498 rows) | Done
parse_messages.py | messages.csv -> messages_parsed.json (2,718 msgs, 1,329 convos) | Done
parse_invitations.py | Invitations.csv -> invitations_clean.csv (1,549 rows) | Done
parse_engagement.py | Comments/Shares/Reactions -> engagement_activity.csv (787) | Done
parse_profile_data.py | Profile/Positions/AdTargeting -> profile_summary.json | Done
build_unified_contacts.py | Join all Stage 1 outputs, score, tier -> contacts_unified.csv | Done
analyze_network.py | -> network-intelligence-report.md | Done
analyze_gtm_alignment.py | -> gtm-alignment-report.md | Done
analyze_voice.py | -> voice-analysis-report.md | Done
generate_voice_enrichment_queue.py | -> voice_llm_queue.json (bug: 0 convos) | Needs fix
prepare_erpnext_import.py | -> erpnext_leads.csv, erpnext_contacts.csv | Done
push_to_erpnext.py | Stub with Frappe REST patterns | Stub only
migrate_to_baserow.py | -> Baserow People table import | Ready to run

Pipeline Output (in solanasis-scripts/linkedin-pipeline/data/output/)

  • contacts_unified.csv — 2,478 rows, 33 columns (the migration source)
  • contacts_unified.json — same data as JSON
  • connections_clean.csv — 2,498 parsed connections
  • messages_parsed.json — 1,329 conversations
  • message_stats_per_contact.csv — 562 contacts with message stats
  • invitations_clean.csv — 1,549 invitations
  • engagement_activity.csv — 787 engagement records
  • conversations/ — per-contact conversation exports

Reports (in solanasis-docs/linkedin-analysis/)

  • network-intelligence-report.md (26K)
  • gtm-alignment-report.md (51K)
  • voice-analysis-report.md (102K)

Tests

  • tests/test_parse_connections.py (27 tests)
  • tests/test_privacy_filter.py (57 tests)
  • tests/test_scoring.py (91 tests)
  • Total: 175 tests, all passing

3. Baserow Environment

  • URL: https://baserow.solanasis.com (self-hosted)
  • Credentials: In solanasis-scripts/.env (parent of linkedin-pipeline/)
    • BASEROW_BASE_URL
    • BASEROW_DB_TOKEN (for row CRUD via Token auth)
    • BASEROW_EMAIL + BASEROW_PASSWORD (for schema ops via JWT auth)
    • BASEROW_DATABASE_ID
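
The two credential sets map to two different Authorization headers. A minimal sketch of how they might be built, assuming the variables above are already loaded into the environment (for example with python-dotenv); the helper names are hypothetical and not taken from migrate_to_baserow.py:

```python
import os


def token_headers(env=os.environ):
    """Headers for row CRUD, using the database token."""
    return {"Authorization": f"Token {env['BASEROW_DB_TOKEN']}"}


def login_payload(env=os.environ):
    """Request body that exchanges email/password for a JWT."""
    return {"email": env["BASEROW_EMAIL"], "password": env["BASEROW_PASSWORD"]}


def jwt_headers(jwt):
    """Headers for schema operations, given a JWT returned by the login call."""
    return {"Authorization": f"JWT {jwt}"}
```

The login endpoint path and the name of the token field in its response differ between Baserow versions, so check them against the API docs of the self-hosted instance before relying on this.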

Existing Tables

Table | ID | Rows | Purpose
Tag | 264 | 50 | Tags (link_row target for People.Tags)
Location | 265 | 32 | Locations (link_row target)
Organization | 266 | 72 | Organizations (link_row target)
People | 267 | 179 | Target table for LinkedIn import
Foundation Prospects | 271 | 2,446 | Separate pipeline (foundation-pipeline/)
Meeting Notes | 272 | 38 | Meeting notes (link_row target)

People Table Schema (ID: 267) — Existing 20 Fields

Field | Type | Notes
Name | text | Primary field
Tags | link_row -> Tag (264) |
Location | link_row -> Location (265) |
Title | text |
Organization | link_row -> Organization (266) |
Phone Number | text |
Email | email |
LinkedIn | url | 53 of 179 rows have this populated
Instagram | url |
Twitter | url |
Facebook | url |
Blog | url |
Website | url |
Notes | long_text |
Interest Form Message | long_text |
Response to Interest Form | long_text |
Connected From | text | Set to “LinkedIn Pipeline” for imports
Referral Source | text |
LinkedIn Initial Outreach | date |
Meeting Notes | link_row -> Meeting Notes (272) |

New Fields Added by Migration (14 fields)

Field | Type | Source CSV Column
Company | text | company
Warmth Score | number | warmth_score
Strategic Score | number | strategic_value_score
Relationship Tier | single_select (A/B/C/D) | relationship_tier
Message Count | number | message_count_total
Last Contact Date | date | last_message_date
Days Since Contact | number | days_since_last_contact
Decay Flag | boolean | relationship_decay_flag
Invitation Direction | single_select (INCOMING/OUTGOING) | invitation_direction
Segment Tags | text | segment_tags
ICP Match Details | long_text | icp_match_details
Conversation Count | number | conversation_count
First Message Date | date | first_message_date
Connection Date | date | connected_on

4. Tier Distribution

Tier | Count | Criteria
A | 145 | Combined score >= 50
B | 457 | Combined score 30-49
C | 866 | Combined score 15-29
D | 1,010 | Combined score < 15
Total | 2,478 |

5. Migration Logic

Field Mapping (CSV -> Baserow)

full_name              -> Name (text, primary)
position               -> Title (text)
company                -> Company (text, NEW)
email                  -> Email (email)
linkedin_url           -> LinkedIn (url)
connected_on           -> Connection Date (date, NEW)
warmth_score           -> Warmth Score (number, NEW)
strategic_value_score  -> Strategic Score (number, NEW)
relationship_tier      -> Relationship Tier (single_select, NEW)
message_count_total    -> Message Count (number, NEW)
last_message_date      -> Last Contact Date (date, NEW)
days_since_last_contact -> Days Since Contact (number, NEW)
relationship_decay_flag -> Decay Flag (boolean, NEW)
invitation_direction   -> Invitation Direction (single_select, NEW)
segment_tags           -> Segment Tags (text, NEW)
icp_match_details      -> ICP Match Details (long_text, NEW)
conversation_count     -> Conversation Count (number, NEW)
first_message_date     -> First Message Date (date, NEW)
(static)               -> Connected From = "LinkedIn Pipeline"
segment_tags + tier    -> Tags (link_row, existing) — new rows only
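
The mapping above could be applied to one CSV row roughly as follows. This is a sketch under stated assumptions, not the code in migrate_to_baserow.py: it assumes scores and counts are whole numbers, dates are already in ISO format, booleans arrive as strings like "True", and empty cells should be left unset. The names FIELD_MAP and to_baserow_row are hypothetical. Tags are omitted because they need row IDs from the Tag table, and single_select values may need translating to option IDs depending on the Baserow version.

```python
# CSV column -> Baserow field name, copied from the mapping above.
FIELD_MAP = {
    "full_name": "Name",
    "position": "Title",
    "company": "Company",
    "email": "Email",
    "linkedin_url": "LinkedIn",
    "connected_on": "Connection Date",
    "warmth_score": "Warmth Score",
    "strategic_value_score": "Strategic Score",
    "relationship_tier": "Relationship Tier",
    "message_count_total": "Message Count",
    "last_message_date": "Last Contact Date",
    "days_since_last_contact": "Days Since Contact",
    "relationship_decay_flag": "Decay Flag",
    "invitation_direction": "Invitation Direction",
    "segment_tags": "Segment Tags",
    "icp_match_details": "ICP Match Details",
    "conversation_count": "Conversation Count",
    "first_message_date": "First Message Date",
}
NUMBER_FIELDS = {
    "Warmth Score", "Strategic Score", "Message Count",
    "Days Since Contact", "Conversation Count",
}
BOOLEAN_FIELDS = {"Decay Flag"}


def to_baserow_row(csv_row):
    """Convert one contacts_unified.csv row (dict of strings) to a payload dict."""
    row = {"Connected From": "LinkedIn Pipeline"}  # static value for imports
    for csv_col, field in FIELD_MAP.items():
        value = (csv_row.get(csv_col) or "").strip()
        if not value:
            continue  # assumption: skip empty cells rather than send ""
        if field in NUMBER_FIELDS:
            row[field] = int(float(value))  # assumption: integer-valued numbers
        elif field in BOOLEAN_FIELDS:
            row[field] = value.lower() in {"true", "1", "yes"}
        else:
            row[field] = value
    return row
```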

Dedup Strategy

  1. Fetch all existing People rows from Baserow
  2. Build an index: normalized_linkedin_url -> row_id
  3. For each unified contact:
    • If LinkedIn URL matches an existing row: UPDATE (scoring fields only)
    • If no match: CREATE (full field mapping + Tags)
  4. Update does NOT overwrite: Name, Title, Email, Notes, Tags, Phone, or other manually-entered fields. Only updates the 14 new LinkedIn fields.
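
The dedup steps above can be sketched as below. The normalization rules (lowercasing, dropping the scheme, "www.", query string, and trailing slash) are an assumption about what "normalized" means here, and the function names are hypothetical; the real script may normalize differently.

```python
from urllib.parse import urlsplit


def normalize_linkedin_url(url):
    """Reduce a LinkedIn URL to a comparable key such as 'linkedin.com/in/jane-doe'."""
    if not url or not url.strip():
        return ""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # let urlsplit see a host for scheme-less values
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/").lower()


def plan_dedup(existing_rows, contacts):
    """Split contacts into creates and (row_id, contact) updates by LinkedIn URL."""
    index = {}
    for row in existing_rows:
        key = normalize_linkedin_url(row.get("LinkedIn"))
        if key:
            index[key] = row["id"]

    creates, updates = [], []
    for contact in contacts:
        key = normalize_linkedin_url(contact.get("linkedin_url"))
        row_id = index.get(key) if key else None
        if row_id is None:
            creates.append(contact)
        else:
            updates.append((row_id, contact))
    return creates, updates
```

Contacts without a LinkedIn URL always fall into the create list in this sketch, since there is nothing to match them on.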

Tag Handling

Segment tags from the pipeline (e.g., boulder-local, startup) are mapped to display names and created in the Tag table (264) if they don’t exist:

Pipeline Key | Tag Display Name
foundation | Foundation
nonprofit | Nonprofit
wealth-mgmt | Wealth Management
fractional-exec | Fractional Executive
tech | Tech
startup | Startup
coliving-community | Coliving / Community
boulder-local | Boulder Local
connector | Connector
spiritual-wellness | Spiritual / Wellness

Plus tier tags: Tier A, Tier B, Tier C, Tier D

Tags are set as link_row values (list of Tag row IDs) on new rows only. Existing rows keep their manually-curated tags untouched.
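
As an illustration of the tag handling described above, the sketch below turns a contact's segment_tags and tier into Tag row IDs. It assumes segment_tags is a comma-separated string and that tag_index maps display names to Tag row IDs fetched from table 264; both are assumptions, and the function names are hypothetical.

```python
TAG_DISPLAY_NAMES = {
    "foundation": "Foundation",
    "nonprofit": "Nonprofit",
    "wealth-mgmt": "Wealth Management",
    "fractional-exec": "Fractional Executive",
    "tech": "Tech",
    "startup": "Startup",
    "coliving-community": "Coliving / Community",
    "boulder-local": "Boulder Local",
    "connector": "Connector",
    "spiritual-wellness": "Spiritual / Wellness",
}


def tag_names_for(segment_tags, tier):
    """Display names for a contact: mapped segment tags plus one tier tag."""
    keys = [key.strip() for key in (segment_tags or "").split(",")]
    names = [TAG_DISPLAY_NAMES[key] for key in keys if key in TAG_DISPLAY_NAMES]
    if tier:
        names.append(f"Tier {tier}")
    return names


def tag_ids_for(names, tag_index):
    """link_row value: Tag row IDs for the names present in tag_index."""
    return [tag_index[name] for name in names if name in tag_index]
```

For example, tag_names_for("boulder-local, startup", "A") gives ["Boulder Local", "Startup", "Tier A"]; unknown pipeline keys are silently dropped in this sketch.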

Batch Import

  • Baserow batch limit: 200 rows per API call
  • Rate limit handling: auto-retry on HTTP 429 with Retry-After header
  • 0.3s sleep between batches to avoid hammering the server
  • Creates come before updates (Phase 4a, then 4b)
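
A minimal sketch of the batching and retry behaviour listed above. To keep it independent of any HTTP library, the actual request is passed in as a callable send(payload) that returns (status_code, headers, body); that interface is hypothetical. It also assumes Retry-After carries a number of seconds rather than an HTTP date.

```python
import time

BATCH_SIZE = 200  # Baserow batch limit per API call


def chunked(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_with_retry(send, payload, max_retries=5, sleep=time.sleep):
    """Call send(payload), waiting and retrying while the server answers 429."""
    for attempt in range(max_retries + 1):
        status, headers, body = send(payload)
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(float(headers.get("Retry-After", 1)))
    raise RuntimeError("still rate limited after retries")


def import_batches(rows, send, pause=0.3, sleep=time.sleep):
    """Send rows in batches of 200 with a short pause between batches."""
    sent = 0
    for batch in chunked(rows):
        status, _body = send_with_retry(send, {"items": batch}, sleep=sleep)
        if status >= 400:
            raise RuntimeError(f"batch failed with HTTP {status}")
        sent += len(batch)
        sleep(pause)
    return sent
```

With 2,451 rows this produces 13 batches (12 of 200 and one of 51), matching the "Batch 1/13" lines in the expected output.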

6. Views Created

View Name | Filter | Sort
A-Tier Contacts | Relationship Tier = A | Warmth Score DESC
B-Tier Contacts | Relationship Tier = B | Warmth Score DESC
C-Tier Contacts | Relationship Tier = C | Warmth Score DESC
Reactivation Targets | Decay Flag = true | Warmth Score DESC
LinkedIn Connections | Connected From = “LinkedIn Pipeline” | Warmth Score DESC

The default “Grid” view still shows all People rows (original + LinkedIn).


7. How to Run

Prerequisites

  1. Python dependencies installed:

    cd solanasis-scripts/linkedin-pipeline
    pip install -r requirements.txt
    

    (Needs: httpx, python-dotenv, chardet, plus pandas/nltk for analysis scripts)

  2. contacts_unified.csv exists:

    python build_unified_contacts.py --run
    

    Should produce data/output/contacts_unified.csv with 2,478 rows.

  3. Baserow credentials in parent .env:

    # solanasis-scripts/.env
    BASEROW_BASE_URL=https://baserow.solanasis.com
    BASEROW_DB_TOKEN=your-token
    BASEROW_EMAIL=your-email
    BASEROW_PASSWORD=your-password
    BASEROW_DATABASE_ID=54
    

Run Commands

# Step 1: Dry run — review field additions, dedup stats, sample row
python migrate_to_baserow.py --plan
 
# Step 2: Execute full migration (all 2,478 contacts)
python migrate_to_baserow.py --run
 
# Alternative: Only import A+B tier (602 contacts)
python migrate_to_baserow.py --run --tier A,B
 
# Alternative: Only create/update views (no data import)
python migrate_to_baserow.py --run --views-only

Expected Output

Phase 1: Schema migration...
  + Company (text)
  + Warmth Score (number)
  ... (14 new fields)

Phase 2: Tag setup...
  Created 14 new tags: Foundation, Nonprofit, ...

Phase 3: Dedup analysis...
  Existing People rows: 179
  With LinkedIn URL (dedup candidates): 53
  New rows:     2,451
  Updates:      27

Phase 4a: Creating 2,451 new rows...
  Batch 1/13: 200 rows (200 total)
  Batch 2/13: 200 rows (400 total)
  ...
  Created 2,451 rows

Phase 4b: Updating 27 existing rows...
  Batch 1/1: 27 rows (27 total)
  Updated 27 rows

Phase 5: Creating views...
  Created view: A-Tier Contacts
  Created view: B-Tier Contacts
  ...

Verification
  Previous rows:   179
  Created:         2,451
  Updated:         27
  Final count:     2,630
  Expected:        2,630
  Row count matches

8. Verification Checklist

After migration, verify in Baserow UI (baserow.solanasis.com):

  • People table row count is ~2,630 (179 existing + ~2,451 new)
  • New fields visible: Warmth Score, Strategic Score, Relationship Tier, etc.
  • A-Tier Contacts view shows ~145 contacts sorted by warmth
  • B-Tier Contacts view shows ~457 contacts
  • Reactivation Targets view shows contacts with Decay Flag = true
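  • LinkedIn Connections view shows ~2,451 rows (the created rows), assuming updates leave Connected From untouched as the dedup rules state; it shows ~2,478 only if the 27 updated rows also carry the marker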
  • Original 179 rows still have their manually-entered data intact
  • Spot-check known contacts:
    • Search “Tim Lockie” — should be A or B tier
    • Search “Kevin Roerty” — should have message stats
    • Check a manually-entered contact — scoring fields updated, name/notes unchanged
  • Tags column shows tier + segment tags on new rows
  • Connected From = “LinkedIn Pipeline” on all imported rows
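
To cross-check the view counts in the checklist against the source data, the tiers can be counted directly from contacts_unified.csv. A small sketch, assuming only that the CSV has the relationship_tier column named in the field mapping; the function name is hypothetical.

```python
import csv
from collections import Counter


def tier_counts(csv_file):
    """Count rows per relationship_tier in an open contacts_unified.csv file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["relationship_tier"].strip() for row in reader)
```

Usage, from solanasis-scripts/linkedin-pipeline:

```python
with open("data/output/contacts_unified.csv", newline="", encoding="utf-8") as f:
    print(tier_counts(f))
```

The counts should line up with the tier distribution in section 4 if the CSV is the one described in this document.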

9. Scoring Model Reference

Warmth Scoring (relationship strength)

Signal | Points
Bidirectional messages (both parties sent) | +20
5+ messages exchanged | +15
Recent message (last 30 days) | +10
Recent message (last 90 days) | +8
Dmitri sent invitation | +5
They sent invitation | +5
Dmitri follows them | +3
Connected 1yr+ with zero messages | -10
Only spam messages from contact | -5

Strategic Value Scoring (ICP fit)

Signal | Points
Title matches ICP (CEO, CTO, ED, etc.) | +15
Company matches target vertical | +10
Fractional C-suite keywords | +8
Colorado/Boulder location | +5
Connector role (investor, advisor, board) | +5
Recent engagement (last 60 days) | +3

Tier Thresholds (warmth + strategic combined)

Tier | Score Range
A | 50+
B | 30-49
C | 15-29
D | < 15
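
The thresholds translate directly into a small function. This is a sketch of the rule as stated in the table (combined score = warmth + strategic), with a hypothetical function name; the pipeline's own implementation lives in build_unified_contacts.py and may differ in detail.

```python
def assign_tier(warmth_score, strategic_value_score):
    """Map the combined score to a tier using the thresholds above."""
    combined = warmth_score + strategic_value_score
    if combined >= 50:
        return "A"
    if combined >= 30:
        return "B"
    if combined >= 15:
        return "C"
    return "D"
```

For instance, a contact with warmth 35 and strategic value 15 lands exactly on the A threshold, while warmth -10 and strategic value 15 (combined 5) falls into D.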

10. Known Issues & Edge Cases

  1. Voice enrichment queue bug: generate_voice_enrichment_queue.py loads 0 conversations from messages_parsed.json. Likely a JSON structure mismatch. Not blocking for Baserow migration.

  2. Organization link_row not populated: The Company field is stored as plain text, not linked to the Organization table (266). This avoids creating 2,000+ Organization entries. Curate manually for A/B tier.

  3. Location link_row not populated: The pipeline doesn’t extract clean location data from LinkedIn. Use boulder-local segment tag as a proxy.

  4. Follows matching is name-based: The follows_and_interests.csv from LinkedIn has no URLs, only names. Only 6 of 2,329 follows matched by exact name. This means dmitri_follows_them is underreported.

  5. Engagement count: Spec predicted ~1,085 engagement records but pipeline produced 787. This is correct — spec counted raw lines, but Comments.csv and Shares.csv have multiline quoted fields.

  6. Re-running the migration: The script is NOT idempotent for creates. Running it twice would duplicate the ~2,451 new rows. To re-run safely:

    • Delete all rows where Connected From = “LinkedIn Pipeline”
    • Then run again
    • Updates (the 27 dedup matches among the 53 candidate rows) are safe to re-run
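
The cleanup step before a re-run could start from a helper like the one below, which picks out the row IDs to delete from rows already fetched from the People table. It is a sketch with a hypothetical function name; it assumes rows are dicts keyed by user field names with an "id" key, and that no manually entered row carries the same Connected From value.

```python
def pipeline_row_ids(rows, marker="LinkedIn Pipeline"):
    """IDs of rows whose Connected From field equals the import marker."""
    return [
        row["id"]
        for row in rows
        if (row.get("Connected From") or "").strip() == marker
    ]
```

The resulting IDs would then be deleted in batches of at most 200. Baserow exposes a batch-delete call for rows, but confirm the exact endpoint and payload against the instance's API docs, and review the ID list before deleting anything.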

11. Future Work

  • Periodic refresh: Re-run the pipeline with a fresh LinkedIn export, then re-run the migration to update scores. Because creates are not idempotent, this first needs an upsert mode (match by LinkedIn URL for all rows, not just the pre-existing ones).

  • Organization linking: For A/B tier contacts, manually or script-link Company text to Organization table entries.

  • Baserow automations: Set up Baserow webhooks/automations to notify when a contact’s decay flag flips to true.

  • ERPNext sync: The pipeline also generates erpnext_leads.csv and erpnext_contacts.csv. The push_to_erpnext.py stub has Frappe REST patterns but isn’t implemented.


12. File Locations Quick Reference

solanasis-scripts/
  .env                                    # Baserow credentials
  linkedin-pipeline/
    config.py                             # Paths, scoring, ICP keywords
    migrate_to_baserow.py                 # THIS migration script
    build_unified_contacts.py             # Produces contacts_unified.csv
    data/output/contacts_unified.csv      # 2,478 rows, the migration source
    data/output/contacts_unified.json     # Same as JSON
    requirements.txt                      # Python dependencies

solanasis-docs/
  linkedin-analysis/
    linkedin-pipeline-continuation-prompt.md   # Pipeline build spec (526 lines)
    baserow-migration-continuation-prompt.md   # THIS document
    network-intelligence-report.md
    gtm-alignment-report.md
    voice-analysis-report.md