Estate Planning Attorney Pipeline — Continuation Prompt
Purpose: Paste this entire document as the first message in a new Claude Code session (on the server or any machine with the same repo layout) to build and run the full estate planning attorney cold email prospecting pipeline.
Last updated: 2026-03-22
Session workspace: The _solanasis container folder (same layout as C:\Users\zasya\Documents\_solanasis)
Estimated build time: 3-4 hours for full pipeline + first discovery run
What to Build
Build a complete estate planning attorney cold email prospecting pipeline at solanasis-scripts/attorney-pipeline/. This pipeline discovers estate planning attorneys nationally, enriches them with emails and signals, scores and tiers them, and generates daily outreach briefs with pre-written emails.
This is NOT a greenfield build. Clone heavily from the existing fcto-pipeline/ (same data shape: people-at-firms). Reuse the foundation pipeline’s email enrichment patterns. Adapt the Cold Outreach Kit v1 email templates.
Strategic Context (Why Estate Planning Attorneys)
Estate planning attorneys are the #1 smartcut into the wealth management ecosystem:
- They sit at the center of every wealth team (RIA + CPA + attorney)
- One attorney client = $170K Year 1 potential via the flywheel (attorney → CPA referral → RIA referral)
- March-May is the buying window (malpractice insurance renewals, bar conferences, summer associate urgency)
- They handle the MOST sensitive data (SSNs, financial accounts, trust documents, medical info, beneficiary designations)
- 66% of law firms lack incident response plans (ABA TechReport 2023)
- ABA Rules 1.6(c) and 1.1 require “reasonable efforts” to protect client data (nationwide, enforceable)
- Malpractice carriers are now asking cybersecurity questions on renewal applications
Positioning: “Data Protection for Client Trust” — NOT “cybersecurity assessment.” Frame everything as compliance with ABA ethical obligations, not IT security.
Key playbooks to reference:
- solanasis-docs/playbooks/Estate_Attorney_Cold_Outreach_Kit_v1.md — email templates, language shifts, phone scripts, objection handling
- solanasis-docs/playbooks/Estate_Planning_Attorney_Smartcut_Playbook.md — full 9-part strategy
Data Acquisition Strategy: Multi-Layer Free Scraping
No paid tools required. Stack multiple free data sources, each handled by a Python script Claude Code builds and runs.
Layer 1: DuckDuckGo Search (PRIMARY, runs locally, free)
Clone from fcto-pipeline/discover_prospects.py. Already proven pattern. Law firm websites are well-indexed; this works even better for attorneys than it did for fCTOs.
Attorney-specific queries (run per city for each target state):
"estate planning attorney" {city} {state}"trusts and estates" attorney {city} law firm"elder law attorney" {city} {state}"estate planning" "managing partner" {city}"wealth transfer attorney" {city}site:linkedin.com "estate planning attorney" {city}site:justia.com estate planning attorney {state}site:avvo.com estate planning attorney {city}
Expected yield: 50-100 unique firm/attorney records per state.
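As a sketch, the per-city query expansion could look like the following; `QUERY_PATTERNS` and `CITIES` here are illustrative stand-ins for the real lists that will live in config.py:

```python
# Expand attorney query patterns across target cities, deduplicating
# queries that only vary by state (e.g., the site:justia.com pattern).
QUERY_PATTERNS = [
    '"estate planning attorney" {city} {state}',
    '"trusts and estates" attorney {city} law firm',
    '"elder law attorney" {city} {state}',
    'site:justia.com estate planning attorney {state}',
]

CITIES = [("Denver", "CO"), ("Boulder", "CO")]

def build_queries(patterns, cities):
    """Return one DuckDuckGo query per (pattern, city) pair, deduplicated."""
    seen, queries = set(), []
    for city, state in cities:
        for pattern in patterns:
            q = pattern.format(city=city, state=state)
            if q not in seen:
                seen.add(q)
                queries.append(q)
    return queries

queries = build_queries(QUERY_PATTERNS, CITIES)
```

Deduplication matters because state-only patterns (like the Justia one) repeat for every city in that state.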
Layer 2: gosom/google-maps-scraper (BULK, Docker, free, no API key)
This machine should have Docker. The gosom/google-maps-scraper (3.5K+ GitHub stars, MIT license, actively maintained) is the single highest-yield free tool.
- Input: text file with one search query per line (e.g., “estate planning attorney in Denver CO”)
- Output: CSV with 33+ fields including name, address, phone, website, rating, review count, email
- Built-in email extraction from business websites
- No API key needed, no rate limits (self-hosted)
docker pull gosom/google-maps-scraper
# Run with query file:
docker run -v $(pwd)/data:/data gosom/google-maps-scraper -input /data/queries.txt -results /data/output.csv
Build server/discover_gmaps_scraper.py to:
- Generate query files from the city list (one query per city: “estate planning attorney in {city} {state}”)
- Run the Docker container
- Import the CSV output into the pipeline
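A minimal sketch of the query-file generation half; `TIER_CITIES` is an illustrative stand-in for the city lists in config.py:

```python
from pathlib import Path

# Write one query file per tier, one search per line, in the format the
# gosom/google-maps-scraper Docker container consumes.
TIER_CITIES = {
    "tier1": [("Denver", "CO"), ("Miami", "FL")],
    "tier2": [("Chicago", "IL")],
}

def write_query_files(out_dir="queries"):
    """Write {tier}.txt files for the Docker scraper; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for tier, cities in TIER_CITIES.items():
        lines = [f"estate planning attorney in {city} {state}" for city, state in cities]
        path = out / f"{tier}.txt"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths.append(path)
    return paths

paths = write_query_files()
```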
Layer 3: Website Scraping for Emails (free, unlimited)
For every firm with a website URL from Layers 1-2, scrape their contact/about/team pages.
Reuse these exact patterns:
- foundation-pipeline/enrich_emails.py — Jina Reader (r.jina.ai) for clean text, regex email extraction, email quality classification (named_person / role_based / generic), SKIP_PREFIXES, PREFERRED_PREFIXES
- fcto-pipeline/enrich_websites.py — trafilatura for text extraction, httpx async scraping, website cache system, Jina Reader as fallback for JS-heavy sites, keyword signal detection with regex
Scrape these pages on each firm website: /contact, /about, /attorneys, /our-team, /people, /staff
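The extraction step over already-fetched page text could look like this sketch; the prefix sets are illustrative, not the exact foundation-pipeline lists:

```python
import re

# Extract and classify emails from page text (fetched via trafilatura or
# Jina Reader upstream). Junk prefixes are skipped; role-based addresses
# are kept but ranked below named-person addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SKIP_PREFIXES = {"noreply", "no-reply", "privacy", "webmaster"}
ROLE_PREFIXES = {"info", "office", "contact", "admin", "intake"}

def extract_emails(page_text):
    """Return sorted (email, quality) pairs: named_person or role_based."""
    results = []
    for email in sorted(set(EMAIL_RE.findall(page_text))):
        prefix = email.split("@")[0].lower()
        if prefix in SKIP_PREFIXES:
            continue
        quality = "role_based" if prefix in ROLE_PREFIXES else "named_person"
        results.append((email.lower(), quality))
    return results
```

Keeping the function pure (text in, tuples out) lets the fetch layer (httpx, cache, Jina fallback) be swapped without touching the classification logic.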
Layer 4: Email Pattern Guessing + DNS Verification (free, unlimited)
When website scraping finds attorney names but no email, generate candidates using law firm email patterns.
Most common law firm email formats (in order of frequency):
- {first_initial}{last}@domain — most common for law firms (e.g., jsmith@smithlaw.com)
- {first}.{last}@domain
- {first}{last}@domain
- {first}@domain
Verification:
- DNS MX record lookup (does the domain accept email?) — dnspython package
- Optional: SMTP RCPT TO check (does the specific address exist?)
Build guess_emails.py — takes name + domain, generates candidates, verifies via DNS MX, outputs best guess with confidence score.
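A minimal sketch of that core logic; pattern weights mirror the spec later in this document, and the MX lookup assumes the dnspython package is installed:

```python
import re

# Generate candidate addresses from a name + domain using common law firm
# patterns, highest-frequency first.
PATTERNS = [
    ("{f}{last}", 0.35),       # jsmith@domain
    ("{first}.{last}", 0.25),  # john.smith@domain
    ("{first}{last}", 0.15),   # johnsmith@domain
    ("{first}", 0.10),         # john@domain
]

def candidates(name, domain):
    """Return (email, confidence) guesses for a full name and domain."""
    parts = re.sub(r"[^a-z\s]", "", name.lower()).split()
    if len(parts) < 2:
        return []  # need at least first + last name
    first, last = parts[0], parts[-1]
    return [(p.format(f=first[0], first=first, last=last) + "@" + domain, conf)
            for p, conf in PATTERNS]

def domain_accepts_mail(domain):
    """True if the domain publishes MX records (needs dnspython + network)."""
    import dns.resolver  # pip install dnspython
    try:
        return bool(dns.resolver.resolve(domain, "MX"))
    except Exception:
        return False
```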
Layer 5: theHarvester (domain OSINT, free)
Open-source tool (15.9K GitHub stars). Queries 40+ sources to find emails for a domain.
theHarvester -d smithlawfirm.com -b all
Use on high-value A-tier firms where Layers 3-4 didn’t find an email. Install: pip install theHarvester (requires Python 3.12+).
Layer 6: Outscraper Google Maps API (500 free records/month)
Managed API, no subscription. 500 free business records/month, then $3/1K.
pip install outscraper
Use as supplemental data when DDG + gmaps scraper don’t cover a city well enough.
Layer 7: Apify Justia Scraper ($5/mo free credits)
Pre-built scraper for Justia’s lawyer directory. Estate planning practice area filter. Returns name, firm, phone, email, bio, practice areas, social links.
$5/month free Apify platform credits = ~1,250 attorney listings/month.
Direct scraping of Justia returns 403 (Cloudflare). Apify handles this.
Pipeline Architecture
solanasis-scripts/attorney-pipeline/
## Core scripts (LOCAL)
config.py # Scoring, keywords, paths, tiers, city lists
templates.py # 5 email templates + 2 follow-ups
discover_ddg.py # DuckDuckGo discovery
import_prospects.py # Import from all sources (DDG, gmaps, Justia CSVs)
enrich_websites.py # Scrape firm websites for emails + signals
guess_emails.py # Email pattern generation + DNS verification
score_prospects.py # Attorney-specific scoring model
generate_outreach_csv.py # CRM-ready CSV with mailto links
generate_daily_outreach.py # Daily markdown brief (3-5 prospects/day)
migrate_to_baserow.py # Optional CRM migration
requirements.txt
## Server scripts (DOCKER)
server/
discover_gmaps_scraper.py # Generates query files + runs Docker scraper
SERVER_SETUP.md # Deployment guide
queries/ # Generated query files (one per state)
data/
raw/ # Source CSVs from each discovery method
intermediate/ # Imported, enriched, scored; website_cache/
output/ # Final outreach queue, outreach_tracker.json
What to Clone From (Exact File Paths)
| New Script | Clone From | Key Adaptations |
|---|---|---|
config.py | fcto-pipeline/config.py | Replace keywords, scoring model, city lists, add practice area regex |
templates.py | fcto-pipeline/templates.py | Replace with 5 attorney templates from Cold Outreach Kit v1 |
discover_ddg.py | fcto-pipeline/discover_prospects.py | Replace QUERIES list, adapt name extraction for law firm patterns |
import_prospects.py | fcto-pipeline/import_prospects.py | Add gmaps scraper and Justia column mappings |
enrich_websites.py | fcto-pipeline/enrich_websites.py + foundation-pipeline/enrich_emails.py | Combine: trafilatura + Jina, email regex + quality classification, attorney keywords |
guess_emails.py | New (see spec below) | Email pattern generation, DNS MX verification |
score_prospects.py | fcto-pipeline/score_prospects.py | Attorney-specific signals and thresholds |
generate_outreach_csv.py | fcto-pipeline/generate_outreach_csv.py | Attorney column names |
generate_daily_outreach.py | fcto-pipeline/generate_daily_outreach.py | 4-touch cadence (Day 1/4/8/14), phone scripts |
migrate_to_baserow.py | fcto-pipeline/migrate_to_baserow.py | Attorney table/fields |
server/discover_gmaps_scraper.py | New | Query file generation + Docker runner |
Scoring Model
SCORING = {
    # Core fit signals
    "estate_planning_focus": 15,    # Practice area confirmed via website/directory
    "firm_size_sweet_spot": 12,     # 2-15 attorneys (has budget, no in-house IT)
    "has_email": 8,                 # Direct email discovered
    # Market signals
    "top_wealth_state": 5,          # CA, NY, FL, TX, CO
    "has_website": 5,               # Enables enrichment
    "no_security_vendor": 5,        # No cybersecurity/CISO/SOC 2 mentions on site
    "no_it_team": 5,                # No IT director/CISO listed on team page
    # Authority signals
    "actec_fellow": 5,              # ACTEC membership (premium prospect)
    "mentions_confidentiality": 3,  # Website mentions client data protection (buying signal)
    "high_google_rating": 3,        # 4.5+ stars on Google
    "has_phone": 3,                 # Phone available for Day 8 follow-up
    "free_consultation": 2,         # Justia flag (accessible/open to conversations)
    # Negative signals
    "solo_practitioner": -10,       # Rarely has budget
    "has_security_vendor": -8,      # Already has cybersecurity covered
    "large_firm": -8,               # 20+ attorneys = has in-house IT
    "litigation_focus": -5,         # Primary practice is litigation, not estate planning
}
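A minimal scoring pass could look like the following sketch; `WEIGHTS` is a small subset of the full SCORING model, and the cutoffs restate the TIERS thresholds (A >= 35, B >= 22, C >= 10, else D):

```python
# Sum matched boolean signals from an enriched row, then map the total
# to a tier and produce a human-readable breakdown string.
WEIGHTS = {"estate_planning_focus": 15, "firm_size_sweet_spot": 12,
           "has_email": 8, "solo_practitioner": -10}
TIER_CUTOFFS = [("A", 35), ("B", 22), ("C", 10)]

def score_row(row):
    """Return (score, tier, breakdown) for one enriched prospect row."""
    score = sum(pts for signal, pts in WEIGHTS.items() if row.get(signal))
    tier = next((t for t, cutoff in TIER_CUTOFFS if score >= cutoff), "D")
    breakdown = ", ".join(f"{s}:{p:+d}" for s, p in WEIGHTS.items() if row.get(s))
    return score, tier, breakdown
```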
TIERS = {"A": 35, "B": 22, "C": 10}  # D = below 10
Geographic Scope: National (Top Wealth States)
Target Cities by Tier
Tier 1 (highest wealth density):
- CA: Los Angeles, San Francisco, San Diego, San Jose, Palo Alto, Beverly Hills, Newport Beach
- NY: Manhattan, Brooklyn, White Plains, Garden City, Long Island
- FL: Miami, Palm Beach, Fort Lauderdale, Naples, Tampa, Jacksonville
- TX: Houston, Dallas, Austin, San Antonio, Fort Worth
- CO: Denver, Boulder, Colorado Springs, Fort Collins, Lakewood
Tier 2 (strong wealth markets):
- IL: Chicago, Naperville, Evanston
- MA: Boston, Cambridge, Newton, Wellesley
- CT: Greenwich, Stamford, Hartford, New Haven
- NJ: Princeton, Morristown, Hackensack, Newark
- PA: Philadelphia, Pittsburgh, King of Prussia
- WA: Seattle, Bellevue, Tacoma
Tier 3 (secondary markets):
- AZ: Scottsdale, Phoenix, Tucson
- GA: Atlanta, Buckhead, Savannah
- NC: Charlotte, Raleigh, Durham
- VA: McLean, Arlington, Richmond
- MN: Minneapolis, St. Paul
- OH: Cleveland, Columbus, Cincinnati
- MD: Bethesda, Baltimore, Annapolis
Email Templates (from Cold Outreach Kit v1)
Critical Voice Rules (MUST follow in all templates)
- NO em dashes (use semicolons, periods, commas, parentheses)
- NO “seamless”, “frictionless”, “audit”, “genuinely”, “SMBs”
- Language shifts: “data protection review” not “cybersecurity assessment”; “reasonable efforts verification” not “compliance audit”; “exposure points for client data” not “vulnerabilities”; “breach notification readiness” not “incident response plan”
- Under 125 words per email body
- End with a specific question, not generic CTA
- Plain text only, no HTML, no images, no tracking pixels
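These rules are mechanically checkable. A sketch of a voice-compliance linter, with the banned list and limits taken from the rules above:

```python
import re

# Flag voice-rule violations in a generated email body: em dashes,
# banned words, length over 125 words, missing closing question.
BANNED = ["seamless", "frictionless", "audit", "genuinely", "smbs"]

def voice_violations(body):
    """Return a list of voice-rule violations for one email body."""
    problems = []
    if "\u2014" in body:  # em dash character
        problems.append("em dash")
    lowered = body.lower()
    problems += [f"banned word: {w}" for w in BANNED if re.search(rf"\b{w}\b", lowered)]
    if len(body.split()) > 125:
        problems.append("over 125 words")
    if not body.rstrip().endswith("?"):
        problems.append("does not end with a question")
    return problems
```

Running this over each generated email before sending automates the "voice check" item in the verification checklist.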
5 Template Variants
Template 1: Ethical Duty (default A/B-tier)
- Subject: Quick question about client data protection at {firm_name}
- Hook: ABA Rule 1.6(c) “reasonable efforts” + “66% of law firms lack incident response plans”
- When: Default for any attorney with email, A/B-tier
Template 2: Malpractice Insurance (insurance renewal signals)
- Subject: Malpractice insurance and cybersecurity at {firm_name}
- Hook: “Your malpractice carrier is asking 3 specific cybersecurity questions this renewal”
- When: Firms with insurance renewal timing, or mentions on website
Template 3: Data Sensitivity (trusts/wealth/elder law)
- Subject: Protecting {firm_name}'s trust and estate documents
- Hook: Estate data = SSNs + financial accounts + family relationships + medical info of multiple generations
- When: Practice areas include trusts, wealth transfer, elder law, special needs
Template 4: Peer Proof (B-tier, casual)
- Subject: How estate firms are handling the cybersecurity question
- Hook: What we’re seeing from similar firms + specific question about their approach
- When: B-tier, casual opener
Template 5: Industry Authority (out-of-state, national positioning)
- Subject: Estate planning data protection, {first_name}
- Hook: Position as a specialist firm serving estate practices nationally with specific stats
- When: Out-of-state prospects where “local peer” doesn’t apply
Follow-Up Cadence
- Day 4: Follow-up email (malpractice insurance angle from Kit Email 2)
- Day 8: Phone call (90-second script from Kit: lead with ABA obligation + documented proof)
- Day 14: Breakup email (“closing the loop” + $200K average breach cost for professional services)
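The cadence translates directly into tracker dates; a sketch (touch names are illustrative), noting that Day 1 is the send date itself, so Day 4 is three calendar days later:

```python
from datetime import date, timedelta

# Derive the 4-touch schedule (Day 1/4/8/14) from the initial send date.
CADENCE = [("initial_email", 0), ("follow_up_email", 3),
           ("phone_call", 7), ("breakup_email", 13)]

def touch_schedule(sent: date):
    """Map each touch to its calendar date; Day 1 is the send date."""
    return {name: sent + timedelta(days=offset) for name, offset in CADENCE}

schedule = touch_schedule(date(2026, 3, 23))
```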
Template Assignment Logic
def assign_template(row):
    tier = row["prospect_tier"]
    state = row["state"]
    practice = row.get("practice_keywords_found", "")
    # Trusts/wealth/elder law + A/B tier -> Data Sensitivity
    if any(k in practice for k in ["trust", "wealth", "elder"]) and tier in ("A", "B"):
        return 3
    # A-tier default -> Ethical Duty
    if tier == "A":
        return 1
    # Colorado -> Template 1 (Ethical Duty with local angle in first line)
    if state == "CO":
        return 1
    # B-tier -> Peer Proof
    if tier == "B":
        return 4
    # Out-of-state C-tier -> Industry Authority
    return 5
guess_emails.py Spec (New Script)
"""Email pattern generation + DNS MX verification for law firms.
Given an attorney name + firm domain, generates likely email addresses
using common law firm patterns, verifies the domain has MX records,
and returns the best candidate with a confidence score.
Usage:
python guess_emails.py --csv data/intermediate/prospects_enriched.csv
python guess_emails.py --test "John Smith" "smithlaw.com"
"""
# Pattern priority (law firm specific):
PATTERNS = [
("{f}{last}", 0.35), # jsmith@domain (most common for law firms)
("{first}.{last}", 0.25), # john.smith@domain
("{first}{last}", 0.15), # johnsmith@domain
("{first}", 0.10), # john@domain
("{f}.{last}", 0.08), # j.smith@domain
("{last}", 0.05), # smith@domain
("{first}_{last}", 0.02), # john_smith@domain
]
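Step 3 of the verification chain below (cross-checking guesses against theHarvester finds) could be sketched as a re-ranking pass; the boost amount is an assumption:

```python
# If theHarvester already found real addresses on this domain, boost any
# pattern guess that matches a known address, then re-rank by confidence.
def rank_against_known(guesses, known_emails):
    """guesses: [(email, confidence)]; boost guesses seen in known_emails."""
    known = {e.lower() for e in known_emails}
    ranked = [(email, min(conf + 0.5, 1.0) if email.lower() in known else conf)
              for email, conf in guesses]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```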
# Verification chain:
# 1. Check domain has MX records (dnspython) — if no MX, skip entire domain
# 2. Generate all pattern candidates
# 3. If theHarvester found emails for this domain, match pattern to known emails
# 4. Output best guess with confidence score (0.0-1.0)
server/discover_gmaps_scraper.py Spec (New Script)
"""Generate query files for gosom/google-maps-scraper and import results.
Modes:
python discover_gmaps_scraper.py --generate # Create query files in queries/
python discover_gmaps_scraper.py --import FILE # Import CSV from gmaps scraper
The --generate mode creates one query file per state tier, with one search
query per line: "estate planning attorney in {city} {state}"
The --import mode reads the gmaps scraper CSV output and converts it to
the pipeline's standard CSV format in data/raw/gmaps_batch{N}.csv
"""
server/SERVER_SETUP.md Content
Step-by-step guide:
1. docker pull gosom/google-maps-scraper
2. Copy queries/ directory to server
3. Run: docker run -v $(pwd):/data gosom/google-maps-scraper -input /data/queries/tier1.txt -results /data/tier1_results.csv -lang en -depth 1
4. Copy results CSV back to local machine: data/raw/gmaps_tier1.csv
5. Import: python import_prospects.py --gmaps data/raw/gmaps_tier1.csv
Website Keyword Signals (for enrich_websites.py)
Practice Area Keywords (must-have to confirm estate focus)
PRACTICE_KEYWORDS = [
r"\bestate\s+planning\b",
r"\btrusts?\s+(?:and|&)\s+estates?\b",
r"\belder\s+law\b",
r"\bprobate\b",
r"\bwealth\s+transfer\b",
r"\bspecial\s+needs\s+trust\b",
r"\bguardianship\b",
r"\bconservator\b",
r"\bwill(?:s)?\s+(?:and|&)\s+trust\b",
r"\bfamily\s+(?:wealth|office)\b",
]
Tech Covered Keywords (negative; they already have security)
TECH_COVERED_KEYWORDS = [
r"\bcybersecurity\b",
r"\bsecurity\s+assessment\b",
r"\bSOC\s+2\b",
r"\bincident\s+response\b",
r"\bCISO\b",
r"\binformation\s+security\b",
r"\bdata\s+protection\s+officer\b",
]
Buying Signal Keywords (positive; they conceptually care)
BUYING_SIGNAL_KEYWORDS = [
r"\bclient\s+confidentiality\b",
r"\bprotect(?:ing)?\s+(?:your|client)\s+(?:legacy|interest|data|information)\b",
r"\bprivacy\b",
r"\bsecure\s+(?:portal|file|document|client)\b",
r"\bABA\b",
r"\bethical\s+obligation\b",
r"\bmalpractice\b",
]
Firm Size Detection
FIRM_SIZE_KEYWORDS = {
"solo": [r"\bsolo\s+practi(?:ce|tioner)\b", r"\blaw\s+office\s+of\b"],
"small": [r"\b[2-9]\s+attorney", r"\bboutique\s+(?:firm|practice)\b"],
"mid": [r"\b1[0-9]\s+attorney", r"\b20\s+attorney"],
"large": [r"\b[3-9]\d\s+attorney", r"\b\d{3,}\s+attorney", r"\bnational\s+firm\b"],
}
CRM Output Columns
CRM_COLUMNS = [
"prospect_id",
"attorney_name",
"first_name",
"firm_name",
"title", # Managing Partner, Partner, Associate, etc.
"email",
"email_source", # website_scrape, pattern_guess, theHarvester, directory
"email_quality", # named_person, role_based, generic
"phone",
"website",
"linkedin_url",
"city",
"state",
"practice_areas", # Comma-separated from website/directory
"source", # ddg, gmaps, justia, actec, manual
"google_rating",
"google_reviews_count",
"firm_size_estimate", # solo, small, mid, large
"practice_confirmed", # True if estate planning confirmed from website
"has_security_vendor", # True if TECH_COVERED_KEYWORDS found
"has_it_team", # True if CISO/IT director found on site
"buying_signals_found", # Comma-separated matched keywords
"prospect_score",
"prospect_tier", # A, B, C, D
"score_breakdown", # Human-readable scoring explanation
"template_variant", # 1-5
"personalized_first_line",
"mailto_link",
"outreach_status", # Not Sent, Sent, Replied, Meeting Booked, Not Interested, Bounced
"date_sent",
"follow_up_date",
"notes",
"last_updated",
]
Daily Outreach Configuration
- Default count: 3-5 prospects/day (attorneys require more personalization than foundations)
- Output file: solanasis-docs/daily-outreach/YYYY-MM-DD-attorney.md
- Follow-up cadence: Day 1 (initial) / Day 4 (email) / Day 8 (phone) / Day 14 (breakup)
- Tracker: data/output/outreach_tracker.json
- Template rotation: Never send same template two consecutive days
- Personalization: ALWAYS customize first line with firm name, city, or specific detail from their website
- Anti-spam rules: Plain text only, space sends 2+ hours apart, max 5-10/day total across all pipelines, no HTML/images/tracking/attachments/UTMs/link shorteners
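The mailto_link column in the CRM schema can be built with the standard library; a sketch, with the example address and copy purely illustrative:

```python
from urllib.parse import quote

# Build a mailto: URI with URL-encoded subject and body so links in the
# daily brief open a pre-filled plain-text draft.
def build_mailto(email, subject, body):
    return f"mailto:{email}?subject={quote(subject)}&body={quote(body)}"

link = build_mailto("jsmith@smithlaw.com",
                    "Quick question about client data protection at Smith Law",
                    "Hi John,\n\nDoes Thursday work?")
```

Encoding the body (including newlines as %0A) keeps the link valid across mail clients.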
Execution Order
1. pip install -r requirements.txt
2. Create directory structure: data/raw/, data/intermediate/, data/intermediate/website_cache/, data/output/, server/queries/
3. Build config.py
4. Build templates.py (read Cold Outreach Kit v1 for exact email copy; adapt to voice rules)
5. Build discover_ddg.py (clone fCTO, swap queries)
6. Run discover_ddg.py for Tier 1 states → CSV in data/raw/
7. Build server/discover_gmaps_scraper.py (generate query files)
8. Run gmaps scraper via Docker: docker run ... → CSV in data/raw/
9. Build import_prospects.py (consolidate all CSVs, deduplicate by domain)
10. Run import_prospects.py → data/intermediate/prospects_imported.csv
11. Build enrich_websites.py (clone fCTO + foundation email patterns)
12. Run enrich_websites.py → data/intermediate/prospects_enriched.csv
13. Build guess_emails.py
14. Run guess_emails.py on prospects without email → updates enriched CSV
15. Build score_prospects.py
16. Run score_prospects.py → data/intermediate/prospects_scored.csv
17. Build generate_outreach_csv.py
18. Run generate_outreach_csv.py → data/output/attorney_outreach_queue.csv
19. Build generate_daily_outreach.py
20. Run generate_daily_outreach.py → solanasis-docs/daily-outreach/YYYY-MM-DD-attorney.md
21. Review generated emails for voice compliance, then start sending
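The deduplicate-by-domain step in import_prospects.py might be sketched as follows; records without a website are kept rather than collapsed:

```python
from urllib.parse import urlparse

# Normalize each record's website to a bare domain (strip scheme and www.)
# and keep only the first record seen per domain.
def dedupe_by_domain(records):
    seen, kept = set(), []
    for rec in records:
        url = rec.get("website", "")
        host = urlparse(url if "://" in url else "https://" + url).netloc.lower()
        domain = host.removeprefix("www.")
        if domain and domain in seen:
            continue  # duplicate firm, different source
        if domain:
            seen.add(domain)
        kept.append(rec)
    return kept
```

Keeping the first occurrence means source order matters; importing the richest source (e.g., the gmaps scraper with its 33+ fields) first preserves the most data.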
Verification Checklist
- discover_ddg.py --dry-run shows correct query list and time estimate
- discover_ddg.py produces CSV with 50+ results per Tier 1 state
- import_prospects.py consolidates sources and deduplicates
- enrich_websites.py --limit 10 discovers emails and keyword signals correctly
- guess_emails.py --test "John Smith" "smithlawfirm.com" generates patterns and verifies MX
- score_prospects.py produces bell-curve tier distribution (few A, many B/C, some D)
- generate_daily_outreach.py outputs markdown with pre-written emails and mailto links
- Voice check: 3 generated emails have NO em dashes, use correct language shifts, under 125 words
- CAN-SPAM check: emails include physical address + opt-out text
- gosom/google-maps-scraper Docker runs and produces CSV (if Docker available)
What NOT to Do
- Do NOT use Apollo, ZoomInfo, or any paid data tool. Free layers are sufficient.
- Do NOT scrape Justia or FindLaw directly (both return 403). Use Apify if needed.
- Do NOT scrape NAELA (reCAPTCHA protected). Small enough to browse manually.
- Do NOT say “cybersecurity assessment” in any template. Always “data protection review.”
- Do NOT claim Colorado mandates cybersecurity CLE. It does NOT as of 2026.
- Do NOT send more than 10 emails/day total across all pipelines from solanasis.com.
- Do NOT use HTML, images, tracking pixels, UTMs, or link shorteners in cold emails.
- Do NOT commit .env files or API keys to git.
Reference: Solanasis Signature
Dmitri Sunshine
Founder, Solanasis
solanasis.com | 303-900-8969
Reference: Key Stats for Templates
- 66% of law firms lack incident response plans (ABA TechReport 2023)
- $200K average breach cost for professional services firms
- ABA Rule 1.6(c): “reasonable efforts to prevent the inadvertent or unauthorized disclosure”
- ABA Rule 1.1 Comment 8: keep abreast of “benefits and risks associated with relevant technology”
- 294K identity theft tax returns intercepted in 2023
- Estate files contain SSNs of clients AND all beneficiaries (often including children and grandchildren)