Estate Planning Attorney Pipeline — Continuation Prompt
Purpose: Paste this entire document as the first message in a new Claude Code session (on the server or any machine with the same repo layout) to build and run the full estate planning attorney cold email prospecting pipeline.
Last updated: 2026-03-22
Session workspace: The _solanasis container folder (same layout as C:\Users\zasya\Documents\_solanasis)
Estimated build time: 3-4 hours for full pipeline + first discovery run
What to Build
Build a complete estate planning attorney cold email prospecting pipeline at solanasis-scripts/attorney-pipeline/. This pipeline discovers estate planning attorneys nationally, enriches them with emails and signals, scores and tiers them, and generates daily outreach briefs with pre-written emails.
This is NOT a greenfield build. Clone heavily from the existing fcto-pipeline/ (same data shape: people-at-firms). Reuse the foundation pipeline’s email enrichment patterns. Adapt the Cold Outreach Kit v1 email templates.
Strategic Context (Why Estate Planning Attorneys)
Estate planning attorneys are the #1 smartcut into the wealth management ecosystem:
- They sit at the center of every wealth team (RIA + CPA + attorney)
- One attorney client = $170K Year 1 potential via the flywheel (attorney → CPA referral → RIA referral)
- March-May is the buying window (malpractice insurance renewals, bar conferences, summer associate urgency)
- They handle the MOST sensitive data (SSNs, financial accounts, trust documents, medical info, beneficiary designations)
- 66% of law firms lack incident response plans (ABA TechReport 2023)
- ABA Rules 1.6(c) and 1.1 require “reasonable efforts” to protect client data (nationwide, enforceable)
- Malpractice carriers are now asking cybersecurity questions on renewal applications
Positioning: “Data Protection for Client Trust” — NOT “cybersecurity assessment.” Frame everything as compliance with ABA ethical obligations, not IT security.
Key playbooks to reference:
- solanasis-docs/playbooks/Estate_Attorney_Cold_Outreach_Kit_v1.md — email templates, language shifts, phone scripts, objection handling
- solanasis-docs/playbooks/Estate_Planning_Attorney_Smartcut_Playbook.md — full 9-part strategy
Data Acquisition Strategy: Multi-Layer Free Scraping
No paid tools required. Stack multiple free data sources, each handled by a Python script Claude Code builds and runs.
Layer 1: DuckDuckGo Search (PRIMARY, runs locally, free)
Clone from fcto-pipeline/discover_prospects.py. Already proven pattern. Law firm websites are well-indexed; this works even better for attorneys than it did for fCTOs.
Attorney-specific queries (run per city for each target state):
"estate planning attorney" {city} {state}"trusts and estates" attorney {city} law firm"elder law attorney" {city} {state}"estate planning" "managing partner" {city}"wealth transfer attorney" {city}site:linkedin.com "estate planning attorney" {city}site:justia.com estate planning attorney {state}site:avvo.com estate planning attorney {city}
Expected yield: 50-100 unique firm/attorney records per state.
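As a sketch, the per-city query expansion could look like the following; `QUERY_PATTERNS` and `CITIES` here are illustrative stand-ins for the real lists that will live in config.py:

```python
# Expand attorney query patterns across target cities, deduplicating
# queries that only vary by state (e.g., the site:justia.com pattern).
QUERY_PATTERNS = [
    '"estate planning attorney" {city} {state}',
    '"trusts and estates" attorney {city} law firm',
    '"elder law attorney" {city} {state}',
    'site:justia.com estate planning attorney {state}',
]

CITIES = [("Denver", "CO"), ("Boulder", "CO")]

def build_queries(patterns, cities):
    """Return one DuckDuckGo query per (pattern, city) pair, deduplicated."""
    seen, queries = set(), []
    for city, state in cities:
        for pattern in patterns:
            q = pattern.format(city=city, state=state)
            if q not in seen:
                seen.add(q)
                queries.append(q)
    return queries

queries = build_queries(QUERY_PATTERNS, CITIES)
```

Deduplication matters because state-only patterns (like the Justia one) repeat for every city in that state.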
Layer 2: gosom/google-maps-scraper (BULK, Docker, free, no API key)
This machine should have Docker. The gosom/google-maps-scraper (3.5K+ GitHub stars, MIT license, actively maintained) is the single highest-yield free tool.
- Input: text file with one search query per line (e.g., “estate planning attorney in Denver CO”)
- Output: CSV with 33+ fields including name, address, phone, website, rating, review count, email
- Built-in email extraction from business websites
- No API key needed, no rate limits (self-hosted)
docker pull gosom/google-maps-scraper
# Run with query file:
docker run -v $(pwd)/data:/data gosom/google-maps-scraper -input /data/queries.txt -results /data/output.csv
Build server/discover_gmaps_scraper.py to:
- Generate query files from the city list (one query per city: “estate planning attorney in {city} {state}”)
- Run the Docker container
- Import the CSV output into the pipeline
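A minimal sketch of the query-file generation half; `TIER_CITIES` is an illustrative stand-in for the city lists in config.py:

```python
from pathlib import Path

# Write one query file per tier, one search per line, in the format the
# gosom/google-maps-scraper Docker container consumes.
TIER_CITIES = {
    "tier1": [("Denver", "CO"), ("Miami", "FL")],
    "tier2": [("Chicago", "IL")],
}

def write_query_files(out_dir="queries"):
    """Write {tier}.txt files for the Docker scraper; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for tier, cities in TIER_CITIES.items():
        lines = [f"estate planning attorney in {city} {state}" for city, state in cities]
        path = out / f"{tier}.txt"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths.append(path)
    return paths

paths = write_query_files()
```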
Layer 3: Website Scraping for Emails (free, unlimited)
For every firm with a website URL from Layers 1-2, scrape their contact/about/team pages.
Reuse these exact patterns:
- foundation-pipeline/enrich_emails.py — Jina Reader (r.jina.ai) for clean text, regex email extraction, email quality classification (named_person / role_based / generic), SKIP_PREFIXES, PREFERRED_PREFIXES
- fcto-pipeline/enrich_websites.py — trafilatura for text extraction, httpx async scraping, website cache system, Jina Reader as fallback for JS-heavy sites, keyword signal detection with regex
Scrape these pages on each firm website: /contact, /about, /attorneys, /our-team, /people, /staff
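The extraction step over already-fetched page text could look like this sketch; the prefix sets are illustrative, not the exact foundation-pipeline lists:

```python
import re

# Extract and classify emails from page text (fetched via trafilatura or
# Jina Reader upstream). Junk prefixes are skipped; role-based addresses
# are kept but ranked below named-person addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SKIP_PREFIXES = {"noreply", "no-reply", "privacy", "webmaster"}
ROLE_PREFIXES = {"info", "office", "contact", "admin", "intake"}

def extract_emails(page_text):
    """Return sorted (email, quality) pairs: named_person or role_based."""
    results = []
    for email in sorted(set(EMAIL_RE.findall(page_text))):
        prefix = email.split("@")[0].lower()
        if prefix in SKIP_PREFIXES:
            continue
        quality = "role_based" if prefix in ROLE_PREFIXES else "named_person"
        results.append((email.lower(), quality))
    return results
```

Keeping the function pure (text in, tuples out) lets the fetch layer (httpx, cache, Jina fallback) be swapped without touching the classification logic.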
Layer 4: Email Pattern Guessing + DNS Verification (free, unlimited)
When website scraping finds attorney names but no email, generate candidates using law firm email patterns.
Most common law firm email formats (in order of frequency):
- {first_initial}{last}@domain — most common for law firms (e.g., jsmith@smithlaw.com)
- {first}.{last}@domain
- {first}{last}@domain
- {first}@domain
Verification:
- DNS MX record lookup (does the domain accept email?) — dnspython package
- Optional: SMTP RCPT TO check (does the specific address exist?)
Build guess_emails.py — takes name + domain, generates candidates, verifies via DNS MX, outputs best guess with confidence score.
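A minimal sketch of that core logic; pattern weights mirror the spec later in this document, and the MX lookup assumes the dnspython package is installed:

```python
import re

# Generate candidate addresses from a name + domain using common law firm
# patterns, highest-frequency first.
PATTERNS = [
    ("{f}{last}", 0.35),       # jsmith@domain
    ("{first}.{last}", 0.25),  # john.smith@domain
    ("{first}{last}", 0.15),   # johnsmith@domain
    ("{first}", 0.10),         # john@domain
]

def candidates(name, domain):
    """Return (email, confidence) guesses for a full name and domain."""
    parts = re.sub(r"[^a-z\s]", "", name.lower()).split()
    if len(parts) < 2:
        return []  # need at least first + last name
    first, last = parts[0], parts[-1]
    return [(p.format(f=first[0], first=first, last=last) + "@" + domain, conf)
            for p, conf in PATTERNS]

def domain_accepts_mail(domain):
    """True if the domain publishes MX records (needs dnspython + network)."""
    import dns.resolver  # pip install dnspython
    try:
        return bool(dns.resolver.resolve(domain, "MX"))
    except Exception:
        return False
```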
Layer 5: theHarvester (domain OSINT, free)
Open-source tool (15.9K GitHub stars). Queries 40+ sources to find emails for a domain.
theHarvester -d smithlawfirm.com -b all
Use on high-value A-tier firms where Layers 3-4 didn’t find an email. Install: pip install theHarvester (requires Python 3.12+).
Layer 6: Outscraper Google Maps API (500 free records/month)
Managed API, no subscription. 500 free business records/month, then $3/1K.
pip install outscraper
Use as supplemental data when DDG + gmaps scraper don’t cover a city well enough.
Layer 7: Apify Justia Scraper ($5/mo free credits)
Pre-built scraper for Justia’s lawyer directory. Estate planning practice area filter. Returns name, firm, phone, email, bio, practice areas, social links.
$5/month free Apify platform credits = ~1,250 attorney listings/month.
Direct scraping of Justia returns 403 (Cloudflare). Apify handles this.
Pipeline Architecture
solanasis-scripts/attorney-pipeline/
## Core scripts (LOCAL)
config.py # Scoring, keywords, paths, tiers, city lists
templates.py # 5 email templates + 2 follow-ups
discover_ddg.py # DuckDuckGo discovery
import_prospects.py # Import from all sources (DDG, gmaps, Justia CSVs)
enrich_websites.py # Scrape firm websites for emails + signals
guess_emails.py # Email pattern generation + DNS verification
score_prospects.py # Attorney-specific scoring model
generate_outreach_csv.py # CRM-ready CSV with mailto links
generate_daily_outreach.py # Daily markdown brief (3-5 prospects/day)
migrate_to_baserow.py # Optional CRM migration
requirements.txt
## Server scripts (DOCKER)
server/
discover_gmaps_scraper.py # Generates query files + runs Docker scraper
SERVER_SETUP.md # Deployment guide
queries/ # Generated query files (one per state)
data/
raw/ # Source CSVs from each discovery method
intermediate/ # Imported, enriched, scored; website_cache/
output/ # Final outreach queue, outreach_tracker.json
What to Clone From (Exact File Paths)
| New Script | Clone From | Key Adaptations |
|---|---|---|
config.py | fcto-pipeline/config.py | Replace keywords, scoring model, city lists, add practice area regex |
templates.py | fcto-pipeline/templates.py | Replace with 5 attorney templates from Cold Outreach Kit v1 |
discover_ddg.py | fcto-pipeline/discover_prospects.py | Replace QUERIES list, adapt name extraction for law firm patterns |
import_prospects.py | fcto-pipeline/import_prospects.py | Add gmaps scraper and Justia column mappings |
enrich_websites.py | fcto-pipeline/enrich_websites.py + foundation-pipeline/enrich_emails.py | Combine: trafilatura + Jina, email regex + quality classification, attorney keywords |
guess_emails.py | New (see spec below) | Email pattern generation, DNS MX verification |
score_prospects.py | fcto-pipeline/score_prospects.py | Attorney-specific signals and thresholds |
generate_outreach_csv.py | fcto-pipeline/generate_outreach_csv.py | Attorney column names |
generate_daily_outreach.py | fcto-pipeline/generate_daily_outreach.py | 4-touch cadence (Day 1/4/8/14), phone scripts |
migrate_to_baserow.py | fcto-pipeline/migrate_to_baserow.py | Attorney table/fields |
server/discover_gmaps_scraper.py | New | Query file generation + Docker runner |
Scoring Model
SCORING = {
    # Core fit signals
    "estate_planning_focus": 15,    # Practice area confirmed via website/directory
    "firm_size_sweet_spot": 12,     # 2-15 attorneys (has budget, no in-house IT)
    "has_email": 8,                 # Direct email discovered
    # Market signals
    "top_wealth_state": 5,          # CA, NY, FL, TX, CO
    "has_website": 5,               # Enables enrichment
    "no_security_vendor": 5,        # No cybersecurity/CISO/SOC 2 mentions on site
    "no_it_team": 5,                # No IT director/CISO listed on team page
    # Authority signals
    "actec_fellow": 5,              # ACTEC membership (premium prospect)
    "mentions_confidentiality": 3,  # Website mentions client data protection (buying signal)
    "high_google_rating": 3,        # 4.5+ stars on Google
    "has_phone": 3,                 # Phone available for Day 8 follow-up
    "free_consultation": 2,         # Justia flag (accessible/open to conversations)
    # Negative signals
    "solo_practitioner": -10,       # Rarely has budget
    "has_security_vendor": -8,      # Already has cybersecurity covered
    "large_firm": -8,               # 20+ attorneys = has in-house IT
    "litigation_focus": -5,         # Primary practice is litigation, not estate planning
}
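A minimal scoring pass could look like the following sketch; `WEIGHTS` is a small subset of the full SCORING model, and the cutoffs restate the TIERS thresholds (A >= 35, B >= 22, C >= 10, else D):

```python
# Sum matched boolean signals from an enriched row, then map the total
# to a tier and produce a human-readable breakdown string.
WEIGHTS = {"estate_planning_focus": 15, "firm_size_sweet_spot": 12,
           "has_email": 8, "solo_practitioner": -10}
TIER_CUTOFFS = [("A", 35), ("B", 22), ("C", 10)]

def score_row(row):
    """Return (score, tier, breakdown) for one enriched prospect row."""
    score = sum(pts for signal, pts in WEIGHTS.items() if row.get(signal))
    tier = next((t for t, cutoff in TIER_CUTOFFS if score >= cutoff), "D")
    breakdown = ", ".join(f"{s}:{p:+d}" for s, p in WEIGHTS.items() if row.get(s))
    return score, tier, breakdown
```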
TIERS = {"A": 35, "B": 22, "C": 10}  # D = below 10
Geographic Scope: National (Top Wealth States)
Target Cities by Tier
Tier 1 (highest wealth density):
- CA: Los Angeles, San Francisco, San Diego, San Jose, Palo Alto, Beverly Hills, Newport Beach
- NY: Manhattan, Brooklyn, White Plains, Garden City, Long Island
- FL: Miami, Palm Beach, Fort Lauderdale, Naples, Tampa, Jacksonville
- TX: Houston, Dallas, Austin, San Antonio, Fort Worth
- CO: Denver, Boulder, Colorado Springs, Fort Collins, Lakewood
Tier 2 (strong wealth markets):
- IL: Chicago, Naperville, Evanston
- MA: Boston, Cambridge, Newton, Wellesley
- CT: Greenwich, Stamford, Hartford, New Haven
- NJ: Princeton, Morristown, Hackensack, Newark
- PA: Philadelphia, Pittsburgh, King of Prussia
- WA: Seattle, Bellevue, Tacoma
Tier 3 (secondary markets):
- AZ: Scottsdale, Phoenix, Tucson
- GA: Atlanta, Buckhead, Savannah
- NC: Charlotte, Raleigh, Durham
- VA: McLean, Arlington, Richmond
- MN: Minneapolis, St. Paul
- OH: Cleveland, Columbus, Cincinnati
- MD: Bethesda, Baltimore, Annapolis
Email Templates (from Cold Outreach Kit v1)
Critical Voice Rules (MUST follow in all templates)
- NO em dashes (use semicolons, periods, commas, parentheses)
- NO “seamless”, “frictionless”, “audit”, “genuinely”, “SMBs”
- Language shifts: “data protection review” not “cybersecurity assessment”; “reasonable efforts verification” not “compliance audit”; “exposure points for client data” not “vulnerabilities”; “breach notification readiness” not “incident response plan”
- Under 125 words per email body
- End with a specific question, not generic CTA
- Plain text only, no HTML, no images, no tracking pixels
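These rules are mechanically checkable. A sketch of a voice-compliance linter, with the banned list and limits taken from the rules above:

```python
import re

# Flag voice-rule violations in a generated email body: em dashes,
# banned words, length over 125 words, missing closing question.
BANNED = ["seamless", "frictionless", "audit", "genuinely", "smbs"]

def voice_violations(body):
    """Return a list of voice-rule violations for one email body."""
    problems = []
    if "\u2014" in body:  # em dash character
        problems.append("em dash")
    lowered = body.lower()
    problems += [f"banned word: {w}" for w in BANNED if re.search(rf"\b{w}\b", lowered)]
    if len(body.split()) > 125:
        problems.append("over 125 words")
    if not body.rstrip().endswith("?"):
        problems.append("does not end with a question")
    return problems
```

Running this over each generated email before sending automates the "voice check" item in the verification checklist.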
5 Template Variants
Template 1: Ethical Duty (default A/B-tier)
- Subject: Quick question about client data protection at {firm_name}
- Hook: ABA Rule 1.6(c) “reasonable efforts” + “66% of law firms lack incident response plans”
- When: Default for any attorney with email, A/B-tier
Template 2: Malpractice Insurance (insurance renewal signals)
- Subject: Malpractice insurance and cybersecurity at {firm_name}
- Hook: “Your malpractice carrier is asking 3 specific cybersecurity questions this renewal”
- When: Firms with insurance renewal timing, or mentions on website
Template 3: Data Sensitivity (trusts/wealth/elder law)
- Subject: Protecting {firm_name}'s trust and estate documents
- Hook: Estate data = SSNs + financial accounts + family relationships + medical info of multiple generations
- When: Practice areas include trusts, wealth transfer, elder law, special needs
Template 4: Peer Proof (B-tier, casual)
- Subject: How estate firms are handling the cybersecurity question
- Hook: What we’re seeing from similar firms + specific question about their approach
- When: B-tier, casual opener
Template 5: Industry Authority (out-of-state, national positioning)
- Subject: Estate planning data protection, {first_name}
- Hook: Position as a specialist firm serving estate practices nationally with specific stats
- When: Out-of-state prospects where “local peer” doesn’t apply
Follow-Up Cadence
- Day 4: Follow-up email (malpractice insurance angle from Kit Email 2)
- Day 8: Phone call (90-second script from Kit: lead with ABA obligation + documented proof)
- Day 14: Breakup email (“closing the loop” + $200K average breach cost for professional services)
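The cadence translates directly into tracker dates; a sketch (touch names are illustrative), noting that Day 1 is the send date itself, so Day 4 is three calendar days later:

```python
from datetime import date, timedelta

# Derive the 4-touch schedule (Day 1/4/8/14) from the initial send date.
CADENCE = [("initial_email", 0), ("follow_up_email", 3),
           ("phone_call", 7), ("breakup_email", 13)]

def touch_schedule(sent: date):
    """Map each touch to its calendar date; Day 1 is the send date."""
    return {name: sent + timedelta(days=offset) for name, offset in CADENCE}

schedule = touch_schedule(date(2026, 3, 23))
```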
Template Assignment Logic
def assign_template(row):
    tier = row["prospect_tier"]
    state = row["state"]
    practice = row.get("practice_keywords_found", "")
    # Trusts/wealth/elder law + A/B tier -> Data Sensitivity
    if any(k in practice for k in ["trust", "wealth", "elder"]) and tier in ("A", "B"):
        return 3
    # A-tier default -> Ethical Duty
    if tier == "A":
        return 1
    # Colorado -> Template 1 (Ethical Duty with local angle in first line)
    if state == "CO":
        return 1
    # B-tier -> Peer Proof
    if tier == "B":
        return 4
    # Out-of-state C-tier -> Industry Authority
    return 5
guess_emails.py Spec (New Script)
"""Email pattern generation + DNS MX verification for law firms.
Given an attorney name + firm domain, generates likely email addresses
using common law firm patterns, verifies the domain has MX records,
and returns the best candidate with a confidence score.
Usage:
python guess_emails.py --csv data/intermediate/prospects_enriched.csv
python guess_emails.py --test "John Smith" "smithlaw.com"
"""
# Pattern priority (law firm specific):
PATTERNS = [
("{f}{last}", 0.35), # jsmith@domain (most common for law firms)
("{first}.{last}", 0.25), # john.smith@domain
("{first}{last}", 0.15), # johnsmith@domain
("{first}", 0.10), # john@domain
("{f}.{last}", 0.08), # j.smith@domain
("{last}", 0.05), # smith@domain
("{first}_{last}", 0.02), # john_smith@domain
]
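Step 3 of the verification chain below (cross-checking guesses against theHarvester finds) could be sketched as a re-ranking pass; the boost amount is an assumption:

```python
# If theHarvester already found real addresses on this domain, boost any
# pattern guess that matches a known address, then re-rank by confidence.
def rank_against_known(guesses, known_emails):
    """guesses: [(email, confidence)]; boost guesses seen in known_emails."""
    known = {e.lower() for e in known_emails}
    ranked = [(email, min(conf + 0.5, 1.0) if email.lower() in known else conf)
              for email, conf in guesses]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```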
# Verification chain:
# 1. Check domain has MX records (dnspython) — if no MX, skip entire domain
# 2. Generate all pattern candidates
# 3. If theHarvester found emails for this domain, match pattern to known emails
# 4. Output best guess with confidence score (0.0-1.0)
server/discover_gmaps_scraper.py Spec (New Script)
"""Generate query files for gosom/google-maps-scraper and import results.
Modes:
python discover_gmaps_scraper.py --generate # Create query files in queries/
python discover_gmaps_scraper.py --import FILE # Import CSV from gmaps scraper
The --generate mode creates one query file per state tier, with one search
query per line: "estate planning attorney in {city} {state}"
The --import mode reads the gmaps scraper CSV output and converts it to
the pipeline's standard CSV format in data/raw/gmaps_batch{N}.csv
"""
server/SERVER_SETUP.md Content
Step-by-step guide:
1. docker pull gosom/google-maps-scraper
2. Copy queries/ directory to server
3. Run: docker run -v $(pwd):/data gosom/google-maps-scraper -input /data/queries/tier1.txt -results /data/tier1_results.csv -lang en -depth 1
4. Copy results CSV back to local machine: data/raw/gmaps_tier1.csv
5. Import: python import_prospects.py --gmaps data/raw/gmaps_tier1.csv
Website Keyword Signals (for enrich_websites.py)
Practice Area Keywords (must-have to confirm estate focus)
PRACTICE_KEYWORDS = [
r"\bestate\s+planning\b",
r"\btrusts?\s+(?:and|&)\s+estates?\b",
r"\belder\s+law\b",
r"\bprobate\b",
r"\bwealth\s+transfer\b",
r"\bspecial\s+needs\s+trust\b",
r"\bguardianship\b",
r"\bconservator\b",
r"\bwill(?:s)?\s+(?:and|&)\s+trust\b",
r"\bfamily\s+(?:wealth|office)\b",
]
Tech Covered Keywords (negative; they already have security)
TECH_COVERED_KEYWORDS = [
r"\bcybersecurity\b",
r"\bsecurity\s+assessment\b",
r"\bSOC\s+2\b",
r"\bincident\s+response\b",
r"\bCISO\b",
r"\binformation\s+security\b",
r"\bdata\s+protection\s+officer\b",
]
Buying Signal Keywords (positive; they conceptually care)
BUYING_SIGNAL_KEYWORDS = [
r"\bclient\s+confidentiality\b",
r"\bprotect(?:ing)?\s+(?:your|client)\s+(?:legacy|interest|data|information)\b",
r"\bprivacy\b",
r"\bsecure\s+(?:portal|file|document|client)\b",
r"\bABA\b",
r"\bethical\s+obligation\b",
r"\bmalpractice\b",
]
Firm Size Detection
FIRM_SIZE_KEYWORDS = {
"solo": [r"\bsolo\s+practi(?:ce|tioner)\b", r"\blaw\s+office\s+of\b"],
"small": [r"\b[2-9]\s+attorney", r"\bboutique\s+(?:firm|practice)\b"],
"mid": [r"\b1[0-9]\s+attorney", r"\b20\s+attorney"],
"large": [r"\b[3-9]\d\s+attorney", r"\b\d{3,}\s+attorney", r"\bnational\s+firm\b"],
}
CRM Output Columns
CRM_COLUMNS = [
"prospect_id",
"attorney_name",
"first_name",
"firm_name",
"title", # Managing Partner, Partner, Associate, etc.
"email",
"email_source", # website_scrape, pattern_guess, theHarvester, directory
"email_quality", # named_person, role_based, generic
"phone",
"website",
"linkedin_url",
"city",
"state",
"practice_areas", # Comma-separated from website/directory
"source", # ddg, gmaps, justia, actec, manual
"google_rating",
"google_reviews_count",
"firm_size_estimate", # solo, small, mid, large
"practice_confirmed", # True if estate planning confirmed from website
"has_security_vendor", # True if TECH_COVERED_KEYWORDS found
"has_it_team", # True if CISO/IT director found on site
"buying_signals_found", # Comma-separated matched keywords
"prospect_score",
"prospect_tier", # A, B, C, D
"score_breakdown", # Human-readable scoring explanation
"template_variant", # 1-5
"personalized_first_line",
"mailto_link",
"outreach_status", # Not Sent, Sent, Replied, Meeting Booked, Not Interested, Bounced
"date_sent",
"follow_up_date",
"notes",
"last_updated",
]
Daily Outreach Configuration
- Default count: 3-5 prospects/day (attorneys require more personalization than foundations)
- Output file: solanasis-docs/daily-outreach/YYYY-MM-DD-attorney.md
- Follow-up cadence: Day 1 (initial) / Day 4 (email) / Day 8 (phone) / Day 14 (breakup)
- Tracker: data/output/outreach_tracker.json
- Template rotation: Never send same template two consecutive days
- Personalization: ALWAYS customize first line with firm name, city, or specific detail from their website
- Anti-spam rules: Plain text only, space sends 2+ hours apart, max 5-10/day total across all pipelines, no HTML/images/tracking/attachments/UTMs/link shorteners
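The mailto_link column in the CRM schema can be built with the standard library; a sketch, with the example address and copy purely illustrative:

```python
from urllib.parse import quote

# Build a mailto: URI with URL-encoded subject and body so links in the
# daily brief open a pre-filled plain-text draft.
def build_mailto(email, subject, body):
    return f"mailto:{email}?subject={quote(subject)}&body={quote(body)}"

link = build_mailto("jsmith@smithlaw.com",
                    "Quick question about client data protection at Smith Law",
                    "Hi John,\n\nDoes Thursday work?")
```

Encoding the body (including newlines as %0A) keeps the link valid across mail clients.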
Execution Order
1. pip install -r requirements.txt
2. Create directory structure: data/raw/, data/intermediate/, data/intermediate/website_cache/, data/output/, server/queries/
3. Build config.py
4. Build templates.py (read Cold Outreach Kit v1 for exact email copy; adapt to voice rules)
5. Build discover_ddg.py (clone fCTO, swap queries)
6. Run discover_ddg.py for Tier 1 states → CSV in data/raw/
7. Build server/discover_gmaps_scraper.py (generate query files)
8. Run gmaps scraper via Docker: docker run ... → CSV in data/raw/
9. Build import_prospects.py (consolidate all CSVs, deduplicate by domain)
10. Run import_prospects.py → data/intermediate/prospects_imported.csv
11. Build enrich_websites.py (clone fCTO + foundation email patterns)
12. Run enrich_websites.py → data/intermediate/prospects_enriched.csv
13. Build guess_emails.py
14. Run guess_emails.py on prospects without email → updates enriched CSV
15. Build score_prospects.py
16. Run score_prospects.py → data/intermediate/prospects_scored.csv
17. Build generate_outreach_csv.py
18. Run generate_outreach_csv.py → data/output/attorney_outreach_queue.csv
19. Build generate_daily_outreach.py
20. Run generate_daily_outreach.py → solanasis-docs/daily-outreach/YYYY-MM-DD-attorney.md
21. Review generated emails for voice compliance, then start sending
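The deduplicate-by-domain step in import_prospects.py might be sketched as follows; records without a website are kept rather than collapsed:

```python
from urllib.parse import urlparse

# Normalize each record's website to a bare domain (strip scheme and www.)
# and keep only the first record seen per domain.
def dedupe_by_domain(records):
    seen, kept = set(), []
    for rec in records:
        url = rec.get("website", "")
        host = urlparse(url if "://" in url else "https://" + url).netloc.lower()
        domain = host.removeprefix("www.")
        if domain and domain in seen:
            continue  # duplicate firm, different source
        if domain:
            seen.add(domain)
        kept.append(rec)
    return kept
```

Keeping the first occurrence means source order matters; importing the richest source (e.g., the gmaps scraper with its 33+ fields) first preserves the most data.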
Verification Checklist
- discover_ddg.py --dry-run shows correct query list and time estimate
- discover_ddg.py produces CSV with 50+ results per Tier 1 state
- import_prospects.py consolidates sources and deduplicates
- enrich_websites.py --limit 10 discovers emails and keyword signals correctly
- guess_emails.py --test "John Smith" "smithlawfirm.com" generates patterns and verifies MX
- score_prospects.py produces bell-curve tier distribution (few A, many B/C, some D)
- generate_daily_outreach.py outputs markdown with pre-written emails and mailto links
- Voice check: 3 generated emails have NO em dashes, use correct language shifts, under 125 words
- CAN-SPAM check: emails include physical address + opt-out text
- gosom/google-maps-scraper Docker runs and produces CSV (if Docker available)
What NOT to Do
- Do NOT use Apollo, ZoomInfo, or any paid data tool. Free layers are sufficient.
- Do NOT scrape Justia or FindLaw directly (both return 403). Use Apify if needed.
- Do NOT scrape NAELA (reCAPTCHA protected). Small enough to browse manually.
- Do NOT say “cybersecurity assessment” in any template. Always “data protection review.”
- Do NOT claim Colorado mandates cybersecurity CLE. It does NOT as of 2026.
- Do NOT send more than 10 emails/day total across all pipelines from solanasis.com.
- Do NOT use HTML, images, tracking pixels, UTMs, or link shorteners in cold emails.
- Do NOT commit .env files or API keys to git.
Reference: Solanasis Signature
Dmitri Sunshine
Founder, Solanasis
solanasis.com | 303-900-8969
Reference: Key Stats for Templates
- 66% of law firms lack incident response plans (ABA TechReport 2023)
- $200K average breach cost for professional services firms
- ABA Rule 1.6(c): “reasonable efforts to prevent the inadvertent or unauthorized disclosure”
- ABA Rule 1.1 Comment 8: keep abreast of “benefits and risks associated with relevant technology”
- 294K identity theft tax returns intercepted in 2023
- Estate files contain SSNs of clients AND all beneficiaries (often including children and grandchildren)