Baserow People Table Migration — Continuation Prompt

Purpose: Self-contained instructions for migrating LinkedIn pipeline unified contacts into the existing Baserow People table. A fresh Claude session should be able to execute (or debug) this migration with no prior context.

1. Context

Dmitri’s LinkedIn data mining pipeline (14 Python scripts) parses a LinkedIn data export (36 CSVs, ~2,500 connections, ~2,700 messages) into a scored, tiered unified contact database. The final output is contacts_unified.csv with 2,478 rows, each scored for warmth (relationship strength) and strategic value (ICP fit), assigned to tiers A/B/C/D.

These contacts need to live in the existing People table in self-hosted Baserow at baserow.solanasis.com — NOT a separate table. The People table already has 179 manually-entered rows (53 with LinkedIn URLs). The migration must dedup against those, add LinkedIn-specific fields, and create filtered views for tier-based segmentation.

2. What’s Already Built

Pipeline Scripts (all in `solanasis-scripts/linkedin-pipeline/`)

| Script | Purpose | Status |
| --- | --- | --- |
| `config.py` | Paths, scoring weights, ICP keywords | Done |
| `privacy_filter.py` | Spam detection, PII redaction | Done |
| `parse_connections.py` | Connections.csv → connections_clean.csv (2,498 rows) | Done |
| `parse_messages.py` | messages.csv → messages_parsed.json (2,718 msgs, 1,329 convos) | Done |
| `parse_invitations.py` | Invitations.csv → invitations_clean.csv (1,549 rows) | Done |
| `parse_engagement.py` | Comments/Shares/Reactions → engagement_activity.csv (787) | Done |
| `parse_profile_data.py` | Profile/Positions/AdTargeting → profile_summary.json | Done |
| `build_unified_contacts.py` | Join all Stage 1 outputs, score, tier → contacts_unified.csv | Done |
| `analyze_network.py` | → network-intelligence-report.md | Done |
| `analyze_gtm_alignment.py` | → gtm-alignment-report.md | Done |
| `analyze_voice.py` | → voice-analysis-report.md | Done |
| `generate_voice_enrichment_queue.py` | → voice_llm_queue.json (bug: 0 convos) | Needs fix |
| `prepare_erpnext_import.py` | → erpnext_leads.csv, erpnext_contacts.csv | Done |
| `push_to_erpnext.py` | Stub with Frappe REST patterns | Stub only |
| `migrate_to_baserow.py` | → Baserow People table import | Ready to run |

Pipeline Output (in `solanasis-scripts/linkedin-pipeline/data/output/`)

contacts_unified.csv — 2,478 rows, 33 columns (the migration source)
contacts_unified.json — same data as JSON
connections_clean.csv — 2,498 parsed connections
messages_parsed.json — 1,329 conversations
message_stats_per_contact.csv — 562 contacts with message stats
invitations_clean.csv — 1,549 invitations
engagement_activity.csv — 787 engagement records
conversations/ — per-contact conversation exports

Reports (in `solanasis-docs/linkedin-analysis/`)

network-intelligence-report.md (26K)
gtm-alignment-report.md (51K)
voice-analysis-report.md (102K)

Tests

tests/test_parse_connections.py (27 tests)
tests/test_privacy_filter.py (57 tests)
tests/test_scoring.py (91 tests)
Total: 175 tests, all passing

3. Baserow Environment

URL: https://baserow.solanasis.com (self-hosted)
Credentials: In solanasis-scripts/.env (parent of linkedin-pipeline/)
- BASEROW_BASE_URL
- BASEROW_DB_TOKEN (for row CRUD via Token auth)
- BASEROW_EMAIL + BASEROW_PASSWORD (for schema ops via JWT auth)
- BASEROW_DATABASE_ID
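
A minimal sketch of how these variables might be loaded and turned into request headers. It assumes Baserow's documented header prefixes (`Token` for database tokens, `JWT` for user sessions); `BaserowSettings`, `load_settings`, `row_headers`, and `schema_headers` are hypothetical names, not taken from migrate_to_baserow.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BaserowSettings:
    """Hypothetical container for the variables listed above."""
    base_url: str
    db_token: str
    email: str
    password: str
    database_id: int


REQUIRED = (
    "BASEROW_BASE_URL",
    "BASEROW_DB_TOKEN",
    "BASEROW_EMAIL",
    "BASEROW_PASSWORD",
    "BASEROW_DATABASE_ID",
)


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ) and fail fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing in .env: " + ", ".join(missing))
    return BaserowSettings(
        base_url=env["BASEROW_BASE_URL"].rstrip("/"),
        db_token=env["BASEROW_DB_TOKEN"],
        email=env["BASEROW_EMAIL"],
        password=env["BASEROW_PASSWORD"],
        database_id=int(env["BASEROW_DATABASE_ID"]),
    )


def row_headers(settings):
    """Headers for row CRUD (database token auth)."""
    return {"Authorization": f"Token {settings.db_token}"}


def schema_headers(jwt):
    """Headers for schema operations; the JWT comes from a login with email + password."""
    return {"Authorization": f"JWT {jwt}"}
```

In the real script, python-dotenv's `load_dotenv()` would normally populate `os.environ` from solanasis-scripts/.env before `load_settings()` is called. How the JWT is obtained and refreshed depends on the installed Baserow version, so check its API documentation.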

Existing Tables

| Table | ID | Rows | Purpose |
| --- | --- | --- | --- |
| Tag | 264 | 50 | Tags (link_row target for People.Tags) |
| Location | 265 | 32 | Locations (link_row target) |
| Organization | 266 | 72 | Organizations (link_row target) |
| People | 267 | 179 | Target table for LinkedIn import |
| Foundation Prospects | 271 | 2,446 | Separate pipeline (foundation-pipeline/) |
| Meeting Notes | 272 | 38 | Meeting notes (link_row target) |

People Table Schema (ID: 267) — Existing 20 Fields

| Field | Type | Notes |
| --- | --- | --- |
| Name | text | Primary field |
| Tags | link_row | → Tag (264) |
| Location | link_row | → Location (265) |
| Title | text | |
| Organization | link_row | → Organization (266) |
| Phone Number | text | |
| Email | email | |
| LinkedIn | url | 53 of 179 rows have this populated |
| Instagram | url | |
| Twitter | url | |
| Facebook | url | |
| Blog | url | |
| Website | url | |
| Notes | long_text | |
| Interest Form Message | long_text | |
| Response to Interest Form | long_text | |
| Connected From | text | Set to “LinkedIn Pipeline” for imports |
| Referral Source | text | |
| LinkedIn Initial Outreach | date | |
| Meeting Notes | link_row | → Meeting Notes (272) |

New Fields Added by Migration (14 fields)

| Field | Type | Source CSV Column |
| --- | --- | --- |
| Company | text | company |
| Warmth Score | number | warmth_score |
| Strategic Score | number | strategic_value_score |
| Relationship Tier | single_select (A/B/C/D) | relationship_tier |
| Message Count | number | message_count_total |
| Last Contact Date | date | last_message_date |
| Days Since Contact | number | days_since_last_contact |
| Decay Flag | boolean | relationship_decay_flag |
| Invitation Direction | single_select (INCOMING/OUTGOING) | invitation_direction |
| Segment Tags | text | segment_tags |
| ICP Match Details | long_text | icp_match_details |
| Conversation Count | number | conversation_count |
| First Message Date | date | first_message_date |
| Connection Date | date | connected_on |

4. Tier Distribution

| Tier | Count | Criteria |
| --- | --- | --- |
| A | 145 | Combined score >= 50 |
| B | 457 | Combined score 30-49 |
| C | 866 | Combined score 15-29 |
| D | 1,010 | Combined score < 15 |
| Total | 2,478 | |

5. Migration Logic

Field Mapping (CSV → Baserow)

full_name              -> Name (text, primary)
position               -> Title (text)
company                -> Company (text, NEW)
email                  -> Email (email)
linkedin_url           -> LinkedIn (url)
connected_on           -> Connection Date (date, NEW)
warmth_score           -> Warmth Score (number, NEW)
strategic_value_score  -> Strategic Score (number, NEW)
relationship_tier      -> Relationship Tier (single_select, NEW)
message_count_total    -> Message Count (number, NEW)
last_message_date      -> Last Contact Date (date, NEW)
days_since_last_contact-> Days Since Contact (number, NEW)
relationship_decay_flag-> Decay Flag (boolean, NEW)
invitation_direction   -> Invitation Direction (single_select, NEW)
segment_tags           -> Segment Tags (text, NEW)
icp_match_details      -> ICP Match Details (long_text, NEW)
conversation_count     -> Conversation Count (number, NEW)
first_message_date     -> First Message Date (date, NEW)
(static)               -> Connected From = "LinkedIn Pipeline"
segment_tags + tier    -> Tags (link_row, existing) — new rows only
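
The mapping above can be sketched as a lookup table plus a small coercion step. This is an illustration under stated assumptions, not the code in migrate_to_baserow.py: `FIELD_MAP`, `coerce`, and `build_row` are hypothetical names, dates are assumed to already be ISO `YYYY-MM-DD` strings, and booleans are assumed to be written as `True`/`False` in the CSV.

```python
# CSV column -> (Baserow field name, value kind). Field names are copied from the mapping above.
FIELD_MAP = {
    "full_name": ("Name", "text"),
    "position": ("Title", "text"),
    "company": ("Company", "text"),
    "email": ("Email", "text"),
    "linkedin_url": ("LinkedIn", "text"),
    "connected_on": ("Connection Date", "date"),
    "warmth_score": ("Warmth Score", "number"),
    "strategic_value_score": ("Strategic Score", "number"),
    "relationship_tier": ("Relationship Tier", "select"),
    "message_count_total": ("Message Count", "number"),
    "last_message_date": ("Last Contact Date", "date"),
    "days_since_last_contact": ("Days Since Contact", "number"),
    "relationship_decay_flag": ("Decay Flag", "boolean"),
    "invitation_direction": ("Invitation Direction", "select"),
    "segment_tags": ("Segment Tags", "text"),
    "icp_match_details": ("ICP Match Details", "text"),
    "conversation_count": ("Conversation Count", "number"),
    "first_message_date": ("First Message Date", "date"),
}


def coerce(value, kind):
    """Turn a raw CSV cell into a JSON-friendly value; empty cells become None."""
    text = "" if value is None else str(value).strip()
    if text == "":
        return None
    if kind == "number":
        number = float(text)
        return int(number) if number.is_integer() else number
    if kind == "boolean":
        # Assumption: the CSV writes booleans as True/False (case-insensitive).
        return text.lower() in {"true", "1", "yes"}
    # text, date (assumed ISO already), and select values pass through unchanged.
    return text


def build_row(csv_row):
    """Build a payload keyed by Baserow field names for one CSV row."""
    row = {"Connected From": "LinkedIn Pipeline"}
    for column, (field, kind) in FIELD_MAP.items():
        value = coerce(csv_row.get(column), kind)
        if value is not None:
            row[field] = value
    return row
```

Two things are deliberately left out. Tags are link_row values and need Tag row IDs, so they are resolved separately. The single_select values stay as text here; a real import would probably have to translate them to the select option IDs that Baserow assigns when the field is created.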

Dedup Strategy

1. Fetch all existing People rows from Baserow.
2. Build an index: normalized_linkedin_url -> row_id.
3. For each unified contact:
   - If the LinkedIn URL matches an existing row: UPDATE (the 14 new LinkedIn fields only)
   - If there is no match: CREATE (full field mapping + Tags)

Updates do NOT overwrite Name, Title, Email, Notes, Tags, Phone Number, or any other manually entered field. They write only the 14 new LinkedIn fields.
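
A minimal sketch of the dedup pass, assuming that "normalized" means lowercasing, dropping the scheme, a leading `www.`, any query string, and the trailing slash. The normalization rules in the actual script may differ, and `normalize_linkedin_url` and `plan_migration` are hypothetical names.

```python
from urllib.parse import urlsplit


def normalize_linkedin_url(url):
    """Reduce a profile URL to a comparable key such as 'linkedin.com/in/jane-doe'."""
    if not url:
        return ""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url.lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return f"{host}{path}" if path else ""


def plan_migration(existing_rows, contacts):
    """Split contacts into creates and (row_id, contact) updates."""
    index = {}
    for row in existing_rows:
        key = normalize_linkedin_url(row.get("LinkedIn"))
        if key:
            # Keep the first row seen if two existing rows share a URL.
            index.setdefault(key, row["id"])

    creates, updates = [], []
    for contact in contacts:
        key = normalize_linkedin_url(contact.get("linkedin_url"))
        if key and key in index:
            updates.append((index[key], contact))
        else:
            creates.append(contact)
    return creates, updates
```

Contacts without a LinkedIn URL can never match, so in this sketch they always fall into the create list.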

Tag Handling

Segment tags from the pipeline (e.g., boulder-local, startup) are mapped to display names and created in the Tag table (264) if they don’t exist:

| Pipeline Key | Tag Display Name |
| --- | --- |
| foundation | Foundation |
| nonprofit | Nonprofit |
| wealth-mgmt | Wealth Management |
| fractional-exec | Fractional Executive |
| tech | Tech |
| startup | Startup |
| coliving-community | Coliving / Community |
| boulder-local | Boulder Local |
| connector | Connector |
| spiritual-wellness | Spiritual / Wellness |

Plus tier tags: Tier A, Tier B, Tier C, Tier D

Tags are set as link_row values (list of Tag row IDs) on new rows only. Existing rows keep their manually-curated tags untouched.
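
The tag resolution can be sketched as follows. The sketch assumes `segment_tags` is a comma-separated string (the delimiter is not stated in this document) and that a name-to-ID lookup for the Tag table has already been fetched; `tag_names_for`, `missing_tags`, and `tag_ids_for` are hypothetical names.

```python
TAG_DISPLAY_NAMES = {
    "foundation": "Foundation",
    "nonprofit": "Nonprofit",
    "wealth-mgmt": "Wealth Management",
    "fractional-exec": "Fractional Executive",
    "tech": "Tech",
    "startup": "Startup",
    "coliving-community": "Coliving / Community",
    "boulder-local": "Boulder Local",
    "connector": "Connector",
    "spiritual-wellness": "Spiritual / Wellness",
}
TIERS = ("A", "B", "C", "D")


def tag_names_for(segment_tags, tier):
    """Display names for one contact: mapped segment tags plus its tier tag."""
    # Assumption: segment_tags is comma-separated, e.g. "boulder-local, startup".
    keys = [key.strip() for key in (segment_tags or "").split(",") if key.strip()]
    names = [TAG_DISPLAY_NAMES[key] for key in keys if key in TAG_DISPLAY_NAMES]
    if tier in TIERS:
        names.append(f"Tier {tier}")
    return names


def missing_tags(existing_names):
    """Tag names that still have to be created in the Tag table (264)."""
    wanted = list(TAG_DISPLAY_NAMES.values()) + [f"Tier {tier}" for tier in TIERS]
    return [name for name in wanted if name not in set(existing_names)]


def tag_ids_for(names, tag_ids_by_name):
    """link_row value for the Tags field: a list of Tag row IDs."""
    return [tag_ids_by_name[name] for name in names]
```

With no matching tags in the Tag table yet, `missing_tags([])` returns 14 names (10 segment tags plus 4 tier tags), which lines up with the "Created 14 new tags" line in the expected output.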

Batch Import

Baserow batch limit: 200 rows per API call
Rate limit handling: auto-retry on HTTP 429 with Retry-After header
0.3s sleep between batches to avoid hammering the server
Creates come before updates (Phase 4a, then 4b)
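
The batching and retry rules above can be sketched without any HTTP library by passing the request function in as a parameter. `chunked`, `send_with_retry`, and `import_rows` are hypothetical names, and the sketch assumes `Retry-After` carries a number of seconds rather than an HTTP date.

```python
import time

BATCH_SIZE = 200      # Baserow batch limit per API call
PAUSE_SECONDS = 0.3   # pause between batches


def chunked(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_with_retry(send, batch, max_attempts=5, sleep=time.sleep):
    """Call send(batch); on HTTP 429 wait for Retry-After seconds and try again."""
    for _ in range(max_attempts):
        response = send(batch)
        if response.status_code != 429:
            return response
        # Assumption: Retry-After is given in seconds; fall back to 1 second.
        sleep(float(response.headers.get("Retry-After", 1)))
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")


def import_rows(rows, send, sleep=time.sleep):
    """Send all rows in batches and return how many were sent."""
    sent = 0
    for batch in chunked(rows):
        send_with_retry(send, batch, sleep=sleep)
        sent += len(batch)
        sleep(PAUSE_SECONDS)
    return sent
```

In the real script, `send` would presumably wrap an httpx call to Baserow's batch rows endpoint with `{"items": batch}` as the JSON body. Injecting `send` and `sleep` keeps the retry logic testable without a server.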

6. Views Created

| View Name | Filter | Sort |
| --- | --- | --- |
| A-Tier Contacts | Relationship Tier = A | Warmth Score DESC |
| B-Tier Contacts | Relationship Tier = B | Warmth Score DESC |
| C-Tier Contacts | Relationship Tier = C | Warmth Score DESC |
| Reactivation Targets | Decay Flag = true | Warmth Score DESC |
| LinkedIn Connections | Connected From = “LinkedIn Pipeline” | Warmth Score DESC |

The default “Grid” view still shows all People rows (original + LinkedIn).
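
One way to keep the view definitions in a single place is a small declarative list. The sketch below also applies the same filter and sort to rows already held in memory, which can help when checking what a view should contain. `ViewSpec`, `VIEW_SPECS`, and `preview` are hypothetical names, and the translation of each spec into Baserow view, filter, and sort API calls is intentionally not shown.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ViewSpec:
    name: str
    filter_field: str
    filter_value: Any
    sort_field: str = "Warmth Score"
    descending: bool = True


VIEW_SPECS = [
    ViewSpec("A-Tier Contacts", "Relationship Tier", "A"),
    ViewSpec("B-Tier Contacts", "Relationship Tier", "B"),
    ViewSpec("C-Tier Contacts", "Relationship Tier", "C"),
    ViewSpec("Reactivation Targets", "Decay Flag", True),
    ViewSpec("LinkedIn Connections", "Connected From", "LinkedIn Pipeline"),
]


def preview(spec, rows):
    """Apply the view's equality filter and sort to in-memory rows."""
    selected = [row for row in rows if row.get(spec.filter_field) == spec.filter_value]
    return sorted(
        selected,
        key=lambda row: row.get(spec.sort_field) or 0,  # missing scores sort as 0
        reverse=spec.descending,
    )
```

For example, `preview(VIEW_SPECS[0], rows)` returns the Tier A rows ordered from warmest to coldest.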

7. How to Run

Prerequisites

Python dependencies installed:
```
cd solanasis-scripts/linkedin-pipeline
pip install -r requirements.txt
```
(Needs: httpx, python-dotenv, chardet, plus pandas/nltk for analysis scripts)
contacts_unified.csv exists:
```
python build_unified_contacts.py --run
```
Should produce data/output/contacts_unified.csv with 2,478 rows.

Baserow credentials in parent .env:

# solanasis-scripts/.env
BASEROW_BASE_URL=https://baserow.solanasis.com
BASEROW_DB_TOKEN=your-token
BASEROW_EMAIL=your-email
BASEROW_PASSWORD=your-password
BASEROW_DATABASE_ID=54

Run Commands

# Step 1: Dry run — review field additions, dedup stats, sample row
python migrate_to_baserow.py --plan
 
# Step 2: Execute full migration (all 2,478 contacts)
python migrate_to_baserow.py --run
 
# Alternative: Only import A+B tier (602 contacts)
python migrate_to_baserow.py --run --tier A,B
 
# Alternative: Only create/update views (no data import)
python migrate_to_baserow.py --run --views-only

Expected Output

Phase 1: Schema migration...
  + Company (text)
  + Warmth Score (number)
  ... (14 new fields)

Phase 2: Tag setup...
  Created 14 new tags: Foundation, Nonprofit, ...

Phase 3: Dedup analysis...
  Existing People rows: 179
  With LinkedIn URL (dedup candidates): 53
  New rows:     2,451
  Updates:      27

Phase 4a: Creating 2,451 new rows...
  Batch 1/13: 200 rows (200 total)
  Batch 2/13: 200 rows (400 total)
  ...
  Created 2,451 rows

Phase 4b: Updating 27 existing rows...
  Batch 1/1: 27 rows (27 total)
  Updated 27 rows

Phase 5: Creating views...
  Created view: A-Tier Contacts
  Created view: B-Tier Contacts
  ...

Verification
  Previous rows:   179
  Created:         2,451
  Updated:         27
  Final count:     2,630
  Expected:        2,630
  Row count matches
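
The numbers in this output follow from three inputs: the rows already in People, the contacts in the CSV, and the number of dedup matches. A small sketch (the function name `expected_counts` is hypothetical) reproduces them:

```python
import math


def expected_counts(previous_rows, contacts, matched, batch_size=200):
    """Derive the create/update/final counts the migration should report."""
    created = contacts - matched
    return {
        "created": created,
        "updated": matched,
        "create_batches": math.ceil(created / batch_size),
        "update_batches": math.ceil(matched / batch_size),
        # Updates modify existing rows, so only creates change the row count.
        "final_rows": previous_rows + created,
    }


counts = expected_counts(previous_rows=179, contacts=2478, matched=27)
print(counts)
```

With 27 matches this gives 2,451 creates in 13 batches, 1 update batch, and a final count of 2,630. If a real run reports a different number of matches, the other figures shift accordingly.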

8. Verification Checklist

After migration, verify in Baserow UI (baserow.solanasis.com):

9. Scoring Model Reference

Warmth Scoring (relationship strength)

| Signal | Points |
| --- | --- |
| Bidirectional messages (both parties sent) | +20 |
| 5+ messages exchanged | +15 |
| Recent message (last 30 days) | +10 |
| Recent message (last 90 days) | +8 |
| Dmitri sent invitation | +5 |
| They sent invitation | +5 |
| Dmitri follows them | +3 |
| Connected 1yr+ with zero messages | -10 |
| Only spam messages from contact | -5 |
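
A minimal sketch of how the warmth table could be applied to one contact. It makes two assumptions that are not confirmed by this document: the 30-day and 90-day recency bonuses are mutually exclusive (only the higher one applies), and the score is not clamped at zero. `ContactSignals` and `warmth_score` are hypothetical names; the authoritative logic lives in build_unified_contacts.py and config.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactSignals:
    """Hypothetical per-contact inputs for the warmth table above."""
    bidirectional: bool = False
    message_count: int = 0
    days_since_last_message: Optional[int] = None
    invitation_direction: Optional[str] = None  # "INCOMING" or "OUTGOING"
    dmitri_follows_them: bool = False
    days_connected: int = 0
    only_spam: bool = False


def warmth_score(s):
    score = 0
    if s.bidirectional:
        score += 20
    if s.message_count >= 5:
        score += 15
    if s.days_since_last_message is not None:
        # Assumption: the two recency bonuses do not stack.
        if s.days_since_last_message <= 30:
            score += 10
        elif s.days_since_last_message <= 90:
            score += 8
    if s.invitation_direction in ("INCOMING", "OUTGOING"):
        score += 5  # same bonus whoever sent the invitation
    if s.dmitri_follows_them:
        score += 3
    if s.days_connected >= 365 and s.message_count == 0:
        score -= 10
    if s.only_spam:
        score -= 5
    return score
```

Under these assumptions, a contact with a two-way thread of 6 messages, last contact 10 days ago, an outgoing invitation, and a follow scores 20 + 15 + 10 + 5 + 3 = 53.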

Strategic Value Scoring (ICP fit)

| Signal | Points |
| --- | --- |
| Title matches ICP (CEO, CTO, ED, etc.) | +15 |
| Company matches target vertical | +10 |
| Fractional C-suite keywords | +8 |
| Colorado/Boulder location | +5 |
| Connector role (investor, advisor, board) | +5 |
| Recent engagement (last 60 days) | +3 |

Tier Thresholds (warmth + strategic combined)

| Tier | Score Range |
| --- | --- |
| A | 50+ |
| B | 30-49 |
| C | 15-29 |
| D | < 15 |
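
The thresholds translate directly into a lookup on the combined score. A short sketch (`assign_tier` is a hypothetical name):

```python
def assign_tier(warmth, strategic):
    """Map the combined warmth + strategic score to a tier letter."""
    combined = warmth + strategic
    if combined >= 50:
        return "A"
    if combined >= 30:
        return "B"
    if combined >= 15:
        return "C"
    return "D"


# Boundary values land in the higher tier: 50 -> A, 30 -> B, 15 -> C.
print([assign_tier(w, s) for w, s in [(35, 15), (20, 10), (10, 5), (10, 4)]])
```

Checking the boundary values against a few real rows in contacts_unified.csv is a quick way to confirm that the pipeline uses the same inclusive lower bounds.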

10. Known Issues & Edge Cases

Voice enrichment queue bug: generate_voice_enrichment_queue.py loads 0 conversations from messages_parsed.json. Likely a JSON structure mismatch. Not blocking for Baserow migration.
Organization link_row not populated: The Company field is stored as plain text, not linked to the Organization table (266). This avoids creating 2,000+ Organization entries. Curate manually for A/B tier.
Location link_row not populated: The pipeline doesn’t extract clean location data from LinkedIn. Use boulder-local segment tag as a proxy.
Follows matching is name-based: The follows_and_interests.csv from LinkedIn has no URLs, only names. Only 6 of 2,329 follows matched by exact name. This means dmitri_follows_them is underreported.
Engagement count: Spec predicted ~1,085 engagement records but pipeline produced 787. This is correct — spec counted raw lines, but Comments.csv and Shares.csv have multiline quoted fields.
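Re-running the migration: The script is NOT idempotent for creates. Running it twice would duplicate every row created by the first run (2,451 rows if 27 contacts match existing rows, as in the Expected Output; 2,425 if all 53 rows with a LinkedIn URL match). To re-run safely:
- Delete all rows where Connected From = “LinkedIn Pipeline”
- Then run again
- Updates (the dedup matches, at most 53 rows) are safe to re-run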
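
Before a re-run, the rows to delete can be selected from a fresh fetch of the People table. The sketch below only picks the IDs; `import_created_row_ids` is a hypothetical name, and the optional `protect_ids` guard is an added safety assumption for the case where a manually entered row happens to carry the same Connected From value.

```python
IMPORT_MARKER = "LinkedIn Pipeline"


def import_created_row_ids(rows, protect_ids=()):
    """IDs of rows created by the import, excluding any explicitly protected rows."""
    protected = set(protect_ids)
    return [
        row["id"]
        for row in rows
        if row.get("Connected From") == IMPORT_MARKER and row["id"] not in protected
    ]


rows = [
    {"id": 12, "Name": "Manual entry", "Connected From": "Conference"},
    {"id": 180, "Name": "Imported 1", "Connected From": "LinkedIn Pipeline"},
    {"id": 181, "Name": "Imported 2", "Connected From": "LinkedIn Pipeline"},
]
print(import_created_row_ids(rows))  # [180, 181]
```

The resulting IDs could then be removed in batches of up to 200, either through Baserow's batch delete endpoint or by filtering and deleting in the UI. Reviewing the list before deleting is worthwhile because the deletion cannot be scoped any more precisely than the marker value.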

11. Future Work

Periodic refresh: Re-run the pipeline with a fresh LinkedIn export, then re-run the migration with --run to update scores. This first requires an idempotent mode (upsert by LinkedIn URL for all rows, not only the rows that existed before the first import).
Organization linking: For A/B tier contacts, manually or script-link Company text to Organization table entries.
Baserow automations: Set up Baserow webhooks/automations to notify when a contact’s decay flag flips to true.
ERPNext sync: The pipeline also generates erpnext_leads.csv and erpnext_contacts.csv. The push_to_erpnext.py stub has Frappe REST patterns but isn’t implemented.

12. File Locations Quick Reference

solanasis-scripts/
  .env                                    # Baserow credentials
  linkedin-pipeline/
    config.py                             # Paths, scoring, ICP keywords
    migrate_to_baserow.py                 # THIS migration script
    build_unified_contacts.py             # Produces contacts_unified.csv
    data/output/contacts_unified.csv      # 2,478 rows, the migration source
    data/output/contacts_unified.json     # Same as JSON
    requirements.txt                      # Python dependencies

solanasis-docs/
  linkedin-analysis/
    linkedin-pipeline-continuation-prompt.md   # Pipeline build spec (526 lines)
    baserow-migration-continuation-prompt.md   # THIS document
    network-intelligence-report.md
    gtm-alignment-report.md
    voice-analysis-report.md
