I Built an App with Cursor, Made $30K, and Quit My Job.
Step-by-step “vibe coded” Cursor prompts to build the Creator Hunter–style app
Goal: build an app that matches startups with influencers (creator directory + search + save lists + paywall), using Next.js + Clerk + Supabase, deployed on Vercel, with an ingestion job using a scraping API—the same general flow described in the video (Bolt for MVP → Cursor for production polish + Supabase + Clerk).
Phase 0 — Cursor setup prompts (project guardrails)
Prompt 0.1 — “Set rules + scope”
Cursor prompt:
You are my senior full-stack engineer. We’re building “Creator Hunter”: a SaaS that matches startups with influencers.
Stack: Next.js (App Router) + TypeScript + Tailwind + shadcn/ui, Clerk auth, Supabase Postgres, Stripe (optional), Vercel deploy, and a scraping API for ingestion.
Output: propose the folder structure, main routes, DB schema, and a 10-step implementation plan. Keep it MVP-first and production-safe (no secrets in client, RLS policies, server routes only for privileged actions).
Ask clarifying questions ONLY if absolutely required—otherwise make reasonable assumptions and proceed.
✅ Expected output: a clear plan + structure. Save it in your repo as docs/plan.md.
Phase 1 — Create the Next.js app + UI system
Prompt 1.1 — “Bootstrap app”
Create a new Next.js app (App Router) called creator-hunter with TypeScript, Tailwind, and ESLint. Add shadcn/ui.
Add a basic layout, a top nav, an /app route group for authenticated pages, and / as a marketing placeholder.
Provide the exact commands + files to create/edit.
Prompt 1.2 — “Install deps + config”
Install and configure dependencies: @clerk/nextjs, @supabase/supabase-js, zod.
Add a safe env var template in .env.example and a .env.local placeholder.
Ensure no server secrets are used in client components.
Common error this avoids: accidentally exposing service keys via NEXT_PUBLIC_*.
Phase 2 — Clerk authentication + route protection
Prompt 2.1 — “Add Clerk auth”
Integrate Clerk with Next.js App Router:
add Clerk provider in root layout
add middleware to protect /app/*
create /sign-in and /sign-up pages using Clerk components
after login, redirect to /app/search
Show all code changes.
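For reference, a minimal middleware sketch using Clerk's clerkMiddleware/createRouteMatcher helpers (available in recent @clerk/nextjs versions; the exact protect() call varies slightly by version):

// middleware.ts: protect /app/* with Clerk (sketch)
import { clerkMiddleware, createRouteMatcher } from "@clerk/nextjs/server";

const isProtected = createRouteMatcher(["/app(.*)"]);

export default clerkMiddleware(async (auth, req) => {
  // Unauthenticated requests to /app/* get redirected to sign-in
  if (isProtected(req)) await auth.protect(); // older versions: auth().protect()
});

export const config = {
  // Run on everything except static files and Next internals
  matcher: ["/((?!_next|.*\\..*).*)", "/(api|trpc)(.*)"],
};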
Prompt 2.2 — “Auth sanity test”
Add a simple /app dashboard page that displays the signed-in user's email and a sign-out button.
Ensure unauthenticated users are redirected to /sign-in.
Common error + fix: login redirect loops → check middleware matcher + Clerk URLs.
Phase 3 — Supabase database + schema
Prompt 3.1 — “Design schema for creators + lists”
Create a Supabase SQL migration for these tables:
creators (platform, handle, display_name, bio, avatar_url, country, language, followers, avg_views, engagement_rate, contact_email nullable, contact_url nullable, last_scraped_at, scrape_status, scrape_fail_count, scrape_last_error)
lists (user_id, name)
list_items (list_id, creator_id, note)
Add unique constraint on (platform, handle).
Add indexes for filtering/search (platform, followers, country, and a full text index on bio/display_name/handle).
Output: supabase/migrations/0001_init.sql
Prompt 3.2 — “RLS policies”
Enable RLS and write SQL policies:
creators: readable by authenticated users
lists: only owner can CRUD
list_items: only owner (via list ownership) can CRUD
Output: supabase/migrations/0002_rls.sql
Keep policies strict and deny-by-default.
Common errors + fixes
“403 RLS violation” when reading/writing → policy missing or auth.uid() mismatch.
Duplicates in creators → missing unique constraint or no handle normalization.
Phase 4 — Supabase clients (server vs browser)
Prompt 4.1 — “Supabase client setup”
Create lib/supabase/client.ts for browser use with the anon key, and lib/supabase/server.ts for server-only use.
Server client must support reading the current Clerk user and (optionally) using service role ONLY in server routes.
Add comments warning never to expose service keys.
Common error: using service role key in client bundle.
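A minimal sketch of the two clients, assuming @supabase/supabase-js and the env var names from the config section later in this guide:

// lib/supabase/client.ts: browser client, anon key only (safe to expose)
import { createClient } from "@supabase/supabase-js";

export const supabaseBrowser = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
);

// lib/supabase/server.ts: server-only client; NEVER import from client components
import { createClient } from "@supabase/supabase-js";

export function supabaseAdmin() {
  // Service role bypasses RLS; use only inside server routes/actions
  return createClient(
    process.env.NEXT_PUBLIC_SUPABASE_URL!,
    process.env.SUPABASE_SERVICE_ROLE_KEY!
  );
}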
Phase 5 — Creator search (core product)
Prompt 5.1 — “Build /app/search page UI”
Build the /app/search page with a filter sidebar and a results table/grid:
Filters: platform, followers min/max, country, keyword.
Results show: avatar, name, handle, followers, engagement_rate, tags (optional).
Use shadcn components. Keep design clean, modern, cohesive.
Prompt 5.2 — “Implement search query server-side”
Implement a server action or route handler that accepts validated filters (Zod) and queries Supabase with pagination.
Ensure it’s safe, parameterized, and fast.
Wire the UI to call it and render results with loading + empty states.
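A sketch of what that handler might look like (the route path, filter names, and defaults are illustrative; the supabase-js calls eq, gte, ilike, order, and range are the standard query builder):

// app/api/search/route.ts: validated, paginated creator search (sketch)
import { z } from "zod";
import { NextResponse } from "next/server";
import { supabaseAdmin } from "@/lib/supabase/server";

const Filters = z.object({
  platform: z.enum(["tiktok", "youtube", "instagram", "x"]).optional(),
  minFollowers: z.coerce.number().int().min(0).catch(0),
  country: z.string().length(2).optional(),
  q: z.string().trim().max(100).optional(),
  page: z.coerce.number().int().min(1).catch(1),
});

export async function POST(req: Request) {
  // TODO: verify the Clerk session here before querying
  const parsed = Filters.safeParse(await req.json());
  if (!parsed.success) {
    return NextResponse.json({ error: "Invalid filters" }, { status: 400 });
  }
  const f = parsed.data;
  const pageSize = 25; // hard cap: never let the client pick an unbounded size
  const from = (f.page - 1) * pageSize;

  let query = supabaseAdmin()
    .from("creators")
    .select("id, platform, handle, display_name, avatar_url, followers, engagement_rate")
    .gte("followers", f.minFollowers);

  if (f.platform) query = query.eq("platform", f.platform);
  if (f.country) query = query.eq("country", f.country);
  if (f.q) query = query.ilike("display_name", `%${f.q}%`);

  const { data, error } = await query
    .order("followers", { ascending: false })
    .range(from, from + pageSize - 1);

  if (error) return NextResponse.json({ error: "Search failed" }, { status: 500 });
  return NextResponse.json({ results: data });
}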
Common errors + fixes
Slow queries → missing indexes or returning too many columns/rows.
Broken filters → missing Zod validation; use defaults.
Phase 6 — Creator detail page + paywall
Prompt 6.1 — “Creator profile route”
Create an /app/creator/[id] page that shows profile details and recent metrics.
Add “Save to list” button.
If user is not paid (we’ll implement later), hide/blur contact fields and show an Upgrade CTA.
Prompt 6.2 — “Entitlements stub”
Create lib/entitlements.ts with an isPro(userId) stub that returns false for now.
Use it to gate contact fields and export actions (server-side, not just UI).
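A minimal version of the stub (the "server-only" package makes any client-side import fail at build time; the subscription lookup in the comment is what Phase 8 would fill in):

// lib/entitlements.ts: authoritative, server-side entitlement check
import "server-only";

export async function isPro(userId: string): Promise<boolean> {
  // TODO (Phase 8): replace with a real lookup, e.g. read the user's row
  // from a subscriptions table and return status === "active".
  return false; // everyone is free-tier until billing ships
}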
Common error: paywall bypass via direct API calls → always enforce server-side.
Phase 7 — Lists (shortlist workflow)
Prompt 7.1 — “Lists CRUD”
Build the /app/lists page: create list, rename list, delete list, and view list items.
Add list selector modal from search results to save creators into a list.
Enforce ownership server-side and rely on RLS as a second layer.
Prompt 7.2 — “Notes per saved creator”
Add note editing for list items with optimistic UI.
Validate inputs and handle errors gracefully.
Phase 8 — Billing (optional but typical)
Prompt 8.1 — “Stripe checkout”
Add Stripe billing with a simple Pro plan:
/app/billing page
create the checkout session server-side
a webhook route to update a subscriptions table
isPro checks subscription status
Include safe handling for test vs live keys and webhook signature verification.
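The raw-body detail is the part that usually breaks; a sketch of the handler (stripe.webhooks.constructEvent is Stripe's actual verification API; the subscription-table update is left as a stub):

// app/api/webhooks/stripe/route.ts: verify the signature against the RAW body
import Stripe from "stripe";
import { NextResponse } from "next/server";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function POST(req: Request) {
  const rawBody = await req.text(); // do NOT call req.json() before verifying
  const signature = req.headers.get("stripe-signature")!;

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(
      rawBody,
      signature,
      process.env.STRIPE_WEBHOOK_SECRET!
    );
  } catch {
    return NextResponse.json({ error: "Invalid signature" }, { status: 400 });
  }

  if (event.type === "customer.subscription.updated" ||
      event.type === "customer.subscription.deleted") {
    // TODO: upsert the subscriptions table so isPro() reflects reality
  }
  return NextResponse.json({ received: true });
}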
Common errors + fixes
Webhook signature fails → need raw body verification in Next route handlers.
Test/live mismatch → keep keys consistent per environment.
Phase 9 — Data ingestion (“the database”) with scraping API
Prompt 9.1 — “Ingestion tables + job design”
Add ingestion support:
create a table ingestion_queue (platform, profile_url, handle nullable, priority, status, attempts, last_error, next_run_at)
write a job runner design: Discover → Fetch → Normalize → Upsert creators
include retry with exponential backoff and rate limiting
Output: an SQL migration + a /api/jobs/run-ingestion route handler (server-only).
Prompt 9.2 — “Scraper integration”
Implement lib/scraper.ts that calls SCRAPER_API (a provider-agnostic wrapper).
Include robust error handling: timeouts, 401/429/403, and structured logs.
Add a “dry run” mode to test a single profile scrape without writing to DB.
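A wrapper sketch under these assumptions: SCRAPER_BASE_URL / SCRAPER_API_KEY come from the env template later in this guide, and the /profile endpoint and response shape are placeholders for whatever your provider actually exposes:

// lib/scraper.ts: provider-agnostic profile scrape with timeout + dry run
export interface RawProfile {
  handle: string;
  displayName?: string;
  bio?: string;
  followers?: string | number; // providers return "1.2M", "45,000", etc.
}

export async function scrapeProfile(
  profileUrl: string,
  opts: { dryRun?: boolean; timeoutMs?: number } = {}
): Promise<RawProfile> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), opts.timeoutMs ?? 30_000);
  try {
    const res = await fetch(`${process.env.SCRAPER_BASE_URL}/profile`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.SCRAPER_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ url: profileUrl }),
      signal: controller.signal,
    });
    if (!res.ok) {
      // 401 = bad key, 403 = blocked, 429 = rate-limited; log, then let the caller retry
      throw new Error(`scrape failed: HTTP ${res.status} for ${profileUrl}`);
    }
    const data = (await res.json()) as RawProfile;
    if (opts.dryRun) console.log("dry run, skipping DB write:", data);
    return data;
  } finally {
    clearTimeout(timer);
  }
}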
Prompt 9.3 — “Normalization + upsert”
Implement lib/normalize.ts to normalize scraped data into our schema.
Include parsing for counts like "1.2M", "45K", commas, and missing values.
Upsert into creators using the unique key (platform, handle).
Update last_scraped_at and the scrape status fields.
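The count parser is the piece most worth unit-testing first; a sketch (parseCount is a hypothetical helper name):

// lib/normalize.ts: strict count parsing, "1.2M" -> 1200000, "—" -> null
export function parseCount(raw: string | number | null | undefined): number | null {
  if (raw == null) return null;
  if (typeof raw === "number") return Number.isFinite(raw) ? Math.round(raw) : null;

  const s = raw.trim().replace(/,/g, "");
  if (s === "" || s === "—" || s.toUpperCase() === "N/A") return null; // NULL, not 0

  const m = s.match(/^(\d+(?:\.\d+)?)([KMB])?$/i);
  if (!m) return null; // unparseable -> NULL, so it never masquerades as real data

  const unit = (m[2] ?? "").toUpperCase();
  const mult = unit === "K" ? 1e3 : unit === "M" ? 1e6 : unit === "B" ? 1e9 : 1;
  return Math.round(parseFloat(m[1]) * mult);
}

// parseCount("1.2M") === 1200000; parseCount("45K") === 45000; parseCount("—") === null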
Common ingestion errors + fixes
403/blocked/captcha: add backoff + provider anti-bot options + cooldown per creator.
Cost explosion: refresh selectively; don’t scrape posts every time.
Data inconsistency: strict parsing + set NULLs, not 0.
Phase 10 — Reliability hardening before launch
Prompt 10.1 — “Rate limiting + abuse controls”
Add rate limiting to search and export endpoints (and ingestion triggers).
Enforce max page size and max export rows.
Add caching for common searches if helpful.
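One common way to do this on Vercel is Upstash's @upstash/ratelimit (an assumption here, not something the original stack prescribes); a sketch:

// lib/ratelimit.ts: per-user sliding-window limit (assumes Upstash Redis env vars)
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const limiter = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN
  limiter: Ratelimit.slidingWindow(30, "60 s"), // e.g., 30 searches per minute
});

export async function assertWithinLimit(userId: string) {
  const { success } = await limiter.limit(`search:${userId}`);
  if (!success) throw new Error("Rate limit exceeded"); // map to HTTP 429 in the route
}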
Prompt 10.2 — “Observability”
Add structured logging for server routes and ingestion jobs: request id, user id, job id.
Store ingestion failures in DB fields.
Add an admin-only page /app/admin/ingestion to view the queue + recent failures.
Prompt 10.3 — “Security audit”
Perform a security review of the repo:
ensure no secrets are exposed
ensure server actions and API routes enforce auth + entitlements
confirm RLS policies are correct
confirm ingestion routes cannot be triggered by the public
Produce a checklist and recommended fixes.
“When Cursor gets stuck” prompt (use anytime)
You introduced an error. Diagnose it step-by-step.
First: explain the root cause in 1–2 sentences.
Then: provide the minimal diff to fix it.
Finally: add a regression check (test or manual steps).
Do not rewrite unrelated files.
If you want, tell me which platform you’re ingesting first (TikTok / YouTube / IG / X) and whether you want billing in MVP, and I’ll tailor these prompts into an even tighter sequence (with exact file names and the order to run migrations/deploy).
Environment, API keys, and configuration
Below is a practical “setup + config” guide for a Creator Hunter–style app using the stack described in the video: Cursor/Bolt → Next.js, Supabase, Clerk, Vercel, plus an API for scraping, and optional Framer for the marketing site.
0) Prereqs
Node.js 20 LTS (recommended)
Git + GitHub
Accounts: Supabase, Clerk, Vercel
(Optional but common) Stripe for billing, Upstash/QStash or similar for jobs, Scraping provider (Apify/Bright Data/ScraperAPI/etc.)
1) Project bootstrap (local environment)
Create project:
npx create-next-app@latest creator-hunter --ts --eslint --tailwind --app
cd creator-hunter
Install dependencies you'll almost certainly need:
npm i @clerk/nextjs @supabase/supabase-js zod
Create .env.local (never commit this file).
2) Environment variables (what to put in .env.local)
A) Clerk (Auth)
From the Clerk dashboard → your application → API keys:
# Clerk (public keys are safe in browser)
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_test_...
CLERK_SECRET_KEY=sk_test_...
# Clerk URLs (recommended defaults)
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up
NEXT_PUBLIC_CLERK_AFTER_SIGN_IN_URL=/app
NEXT_PUBLIC_CLERK_AFTER_SIGN_UP_URL=/app
# Webhooks (if using)
CLERK_WEBHOOK_SECRET=whsec_...
Common errors + fixes
“Missing Clerk publishable key”: you forgot NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY or misspelled it.
Wrong environment keys: using pk_live locally or mixing test/live keys. Keep a clean separation.
Redirect loop after login: mismatch between Clerk redirect URLs and your middleware/protected routes. Ensure /app is protected and /sign-in is not.
B) Supabase (DB/API)
From Supabase project settings → API:
# Public URL + anon key can be used client-side
NEXT_PUBLIC_SUPABASE_URL=https://xxxx.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJ...
# Service role key MUST stay server-side only (never expose to browser)
SUPABASE_SERVICE_ROLE_KEY=eyJ...
Common errors + fixes
403 / RLS errors when querying tables: Row Level Security is enabled but you haven’t added policies.
Fix: add minimal SELECT policies for creators (and proper owner-based policies for lists, etc.), or use server routes with the service role key for admin operations.
Accidentally shipping the service role key to the client: any variable starting with NEXT_PUBLIC_ is exposed.
Fix: never prefix service keys with NEXT_PUBLIC_. Keep them server-only.
Supabase client using the wrong key: you used the service role key in the browser, or the anon key on the server when you needed elevated permissions.
Fix: use the anon key in the client; use the service role key only in server routes/actions.
C) Scraping API (data ingestion)
Pick a provider and store its key:
SCRAPER_API_KEY=...
SCRAPER_BASE_URL=https://api.someprovider.com
Common errors + fixes
401 Unauthorized: wrong key, key disabled, or using the wrong base URL.
IP/rate limits: ingestion fails intermittently.
Fix: throttle requests, add retries with exponential backoff, queue jobs, and log failures.
Blocked pages / captchas: common with social platforms.
Fix: use a provider that supports rendering/proxies, or rely on platform APIs where possible.
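For the retries-with-backoff fix above, a generic helper you can wrap scraper calls in (a sketch; the attempt count and delays are arbitrary starting points):

// lib/retry.ts: exponential backoff with jitter for flaky scrape calls
export async function withRetry<T>(
  fn: () => Promise<T>,
  { attempts = 4, baseMs = 1_000 } = {}
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i === attempts - 1) break; // out of retries
      // 1s, 2s, 4s, ... plus jitter so parallel workers don't retry in lockstep
      const delay = baseMs * 2 ** i + Math.random() * 500;
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw lastError;
}

// usage: const profile = await withRetry(() => scrapeProfile(url));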
D) Billing (optional, but common) — Stripe
If you’re paywalling access:
STRIPE_SECRET_KEY=sk_test_...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...
# Your app URL (important for webhooks and redirects)
NEXT_PUBLIC_APP_URL=http://localhost:3000
Common errors + fixes
Webhook signature verification fails: STRIPE_WEBHOOK_SECRET is wrong, or you parse the body before verifying.
Fix: use Stripe's raw-body verification pattern for Next.js route handlers.
Test/live mismatch: using live publishable key with test secret key.
Fix: keep environments consistent.
E) Vercel deployment env parity
When you deploy, copy the same variables into Vercel:
Vercel Project → Settings → Environment Variables
Add variables for Production, Preview, and optionally Development.
Common errors + fixes
Works locally, fails on Vercel: missing env vars in Vercel (very common).
Fix: ensure every required env var exists in Vercel environments.
Wrong NEXT_PUBLIC_APP_URL in prod: causes bad redirects / webhook URLs.
Fix: set NEXT_PUBLIC_APP_URL to your Vercel domain in production.
3) Configuration steps inside each service
Clerk configuration
Add Allowed Redirect URLs:
Local: http://localhost:3000/*
Prod: https://your-domain.com/*
If using webhooks (recommended for syncing users/subscriptions):
Create a webhook endpoint (e.g., /api/webhooks/clerk)
Copy whsec_... into CLERK_WEBHOOK_SECRET
Common error
Webhook never fires: endpoint URL wrong or not publicly reachable.
Fix: use a stable ngrok URL in local dev, or test on Vercel Preview/Prod.
Supabase configuration
Create tables (creators, lists, list_items, etc.)
Enable RLS and create policies:
creators: allow read to authenticated users (or public if you want)
lists / list_items: owner-only access
(If using Clerk) Decide how you map users: store the Clerk user id in your users table (string) or map it to a UUID.
Common errors
Type mismatch (uuid vs string) on user id fields:
Fix: pick one approach early. A common pattern is users.clerk_user_id TEXT UNIQUE plus an internal id UUID.
RLS blocks inserts:
Fix: add INSERT/UPDATE policies or do writes in server routes using the service role key.
Vercel configuration
Set environment variables in Vercel.
Confirm the build uses Node 20 (Vercel generally does; you can also set "engines" in package.json):
{ "engines": { "node": ">=20" } }
Common errors
Build fails due to missing env vars: fix by adding them in Vercel.
Edge vs Node runtime mismatches (webhook libraries, crypto):
Fix: ensure webhook route uses Node runtime if required.
4) A quick “sanity checklist” to catch setup issues fast
Run these checks before building features:
Auth loads: visit /sign-in → can sign in.
Protected routes: visit /app while signed out → should redirect to sign-in.
Supabase connectivity: a server route can query creators without crashing.
RLS behavior: client-side reads work (if intended) and user-owned tables are protected.
Ingestion key works: a single "test scrape" route returns a 200 with expected JSON.
5) The most common setup mistakes (and the fixes)
Putting secrets in NEXT_PUBLIC_*
✅ Fix: only publishable keys get NEXT_PUBLIC_. Service keys never do.
RLS not understood (everything returns 401/403)
✅ Fix: add minimal policies or move queries server-side with the service role.
Mismatch between local/prod redirect URLs
✅ Fix: set allowed redirect URLs in Clerk + ensure NEXT_PUBLIC_APP_URL is correct.
Environment variables missing on Vercel
✅ Fix: copy .env.local values into Vercel env var settings (Production + Preview).
Test/live key mismatch (Clerk/Stripe)
✅ Fix: keep test keys together and live keys together; don't mix.
If you tell me which scraping provider you’re using and whether you’re deploying to a custom domain or just Vercel, I can give you an exact env var template + the recommended webhook URLs and “allowed origins” values for that setup.
Below are practical, technical instructions for building the data ingestion “database” layer (the part Paulius said he spent months building, using an “API for scraping” + Supabase).
1) Ingestion goals and constraints (define before coding)
What you’re trying to build
A pipeline that continuously produces/updates rows in creators (and optionally creator_posts) so your app has:
enough creators to be useful on day 1
“fresh enough” stats to trust (followers/views/engagement)
consistent fields across platforms (TikTok/YouTube/IG/X)
Constraints you must decide now (or you’ll rewrite later)
Platform scope: start with one platform (cheapest + simplest).
Data freshness: how often to refresh (e.g., every 7–30 days).
Cost ceiling: scraping can get expensive fast.
Compliance: avoid collecting/selling sensitive personal data; treat emails as “only if publicly listed”.
2) Database schema: ingestion-ready fields
Minimum creators columns to support ingestion + refresh:
Identity: platform (enum), handle (string), profile_url, platform_creator_id (string, if available)
Content: display_name, bio, avatar_url, category_tags (array or join table), country, language
Metrics: followers, avg_views, engagement_rate
Ingestion metadata (critical): created_at, updated_at, last_scraped_at, scrape_status (ok | failed | blocked), scrape_fail_count, scrape_last_error (text)
Hard requirement: create a unique constraint like:
UNIQUE(platform, handle)
This makes “upsert” reliable and prevents duplicates.
3) The ingestion flow (3-stage pipeline)
Think of ingestion as Discover → Fetch → Normalize/Upsert.
Stage A — Discover (build a queue of creators to scrape)
You need a list of profile URLs/handles. Common discovery methods:
Keyword/niche discovery: search by niche terms (“fitness”, “SaaS”, “marketing”).
Seed accounts: start from known creators, expand via “similar creators”.
Hashtag discovery: find accounts posting under hashtags.
Output: a queue table (or task list) of candidates:
platform, profile_url, handle (maybe blank initially), priority
Tip: Keep discovery separate from profile scraping. Discovery is noisy; profile scraping should be clean and repeatable.
Stage B — Fetch (scrape profile + optional posts)
Use your “API for scraping” provider (as mentioned in the video).
Profile scrape should return at least:
handle, display name, bio
follower count
links/contact fields that are publicly visible
profile avatar
country/language if available
Optional post scrape (recommended for better sorting):
last 12–30 posts
views/likes/comments + timestamps
Then compute: avg_views = mean(views), and engagement_rate = (likes + comments) / views or / followers (choose one and be consistent).
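In code, those two derived metrics are only a few lines (a sketch; this version normalizes interactions by followers, per the "choose one and be consistent" rule):

// Derive avg_views and engagement_rate from the last N posts (sketch)
interface Post { views: number; likes: number; comments: number }

export function deriveMetrics(posts: Post[], followers: number) {
  if (posts.length === 0 || followers <= 0) {
    return { avgViews: null, engagementRate: null }; // NULL, never fake zeros
  }
  const avgViews = posts.reduce((sum, p) => sum + p.views, 0) / posts.length;
  const interactions = posts.reduce((sum, p) => sum + p.likes + p.comments, 0);
  // Average interactions per post, relative to follower count
  const engagementRate = interactions / posts.length / followers;
  return { avgViews: Math.round(avgViews), engagementRate };
}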
Stage C — Normalize + Upsert into Supabase
Normalize raw data from each platform into your schema.
Rules
Convert counts to integers (handle “1.2M”, “45K” properly)
Trim/clean bios (remove zero-width chars)
Set missing metrics to NULL, not 0 (0 looks like "real" data)
Use upsert keyed by (platform, handle), or (platform, platform_creator_id) if stable
Recommended upsert behavior
Always update “volatile” fields (followers, avg_views, engagement_rate, avatar_url)
Only update “semi-stable” fields (bio/display_name) if changed
Preserve manual overrides (e.g., curated tags) by separating them into a different table
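With supabase-js, that upsert looks roughly like this (onConflict is the real supabase-js option; it relies on the UNIQUE(platform, handle) constraint from section 2):

// Batched upsert of normalized creator rows (server-side only)
import { supabaseAdmin } from "@/lib/supabase/server";

export async function upsertCreators(rows: Record<string, unknown>[]) {
  const { error } = await supabaseAdmin()
    .from("creators")
    .upsert(rows, {
      onConflict: "platform,handle", // matches UNIQUE(platform, handle)
      ignoreDuplicates: false,       // update existing rows so volatile fields refresh
    });
  if (error) throw new Error(`upsert failed: ${error.message}`);
}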
4) Scheduling: initial load vs refresh jobs
You’ll run two job types:
A) Initial load
Goal: populate first 1k–10k creators.
Run in batches (e.g., 50–200 at a time).
Stop when cost or quality threshold is met.
B) Refresh
Goal: keep creators fresh.
Pick candidates by:
last_scraped_at, oldest first, or "popular creators" more frequently (their stats drift faster)
Example schedule:
top creators weekly
long tail monthly
5) Implementation blueprint (how to wire it)
Minimal job setup (works on Vercel)
Cron trigger (Vercel Cron) hits /api/jobs/refresh. That endpoint:
selects N creators to refresh
enqueues tasks (or processes sequentially if small)
Worker function:
calls scraper API
normalizes
upserts to Supabase
updates last_scraped_at, scrape_status, and the error fields
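Wired together, the cron-triggered endpoint might look like this sketch (the CRON_SECRET check matches how Vercel Cron authenticates; the batch size and column names are assumptions):

// app/api/jobs/refresh/route.ts: Vercel Cron entry point (sketch)
import { NextResponse } from "next/server";
import { supabaseAdmin } from "@/lib/supabase/server";
import { scrapeProfile } from "@/lib/scraper";

export async function GET(req: Request) {
  // Vercel Cron sends "Authorization: Bearer <CRON_SECRET>" when CRON_SECRET is set
  if (req.headers.get("authorization") !== `Bearer ${process.env.CRON_SECRET}`) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  const db = supabaseAdmin();
  const { data: stale } = await db
    .from("creators")
    .select("id, profile_url")
    .order("last_scraped_at", { ascending: true, nullsFirst: true })
    .limit(50); // small batches keep the function under its timeout

  for (const creator of stale ?? []) {
    try {
      const raw = await scrapeProfile(creator.profile_url);
      // TODO: normalize `raw` (lib/normalize.ts) and upsert the volatile fields
      await db
        .from("creators")
        .update({ last_scraped_at: new Date().toISOString(), scrape_status: "ok" })
        .eq("id", creator.id);
    } catch (err) {
      await db
        .from("creators")
        .update({ scrape_status: "failed", scrape_last_error: String(err) })
        .eq("id", creator.id);
    }
  }
  return NextResponse.json({ processed: stale?.length ?? 0 });
}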
Better setup (recommended once it grows)
Use a queue (Upstash/QStash, etc.)
One “scheduler” enqueues tasks
Many workers process tasks, rate-limited per platform
6) Common errors at the ingestion stage (and fixes)
1) Duplicates everywhere
Symptoms
Same creator appears multiple times with slight handle differences.
Causes
No unique constraint, or you key on unstable IDs/URLs.
Fix
Enforce UNIQUE(platform, handle) and normalize handle casing (lower(handle)). Prefer stable platform IDs if available.
2) Scraper returns inconsistent data (“1.2M”, “1,200,000”, “—”)
Symptoms
Sorting is broken; analytics are nonsense.
Fix
Build a strict "parse counts" utility and unit test it. Treat "—" / "N/A" as NULL.
3) Getting blocked / CAPTCHA / 403 spikes
Symptoms
Large waves of failures, then recovery.
Fix
Add:
exponential backoff
retries with jitter
rate limits per platform
rotating proxies / your provider's anti-bot options
Track scrape_fail_count and cool down creators after X failures.
4) Cost explosion
Symptoms
The scraping bill grows faster than revenue.
Fix
Make refresh selective:
refresh only creators that appear in user searches/lists
refresh top creators more often, long tail less often
Cache post-scrapes: don’t fetch posts every time.
5) RLS / permissions block your upserts
Symptoms
401/403 from Supabase when writing.
Fix
Do ingestion server-side using the service role key (never client-side), or create a dedicated Supabase Edge Function with elevated permissions.
6) Rate-limited database writes (or slow queries)
Symptoms
Upserts time out; refresh jobs never finish.
Fix
Batch writes (upsert arrays of 50–500)
Add indexes for search/filter fields
Avoid updating unchanged columns (saves write amplification)
7) Data quality issues ruin trust (wrong profiles, wrong niche tags)
Symptoms
Users complain results aren't relevant.
Fix
Add validation rules:
minimum follower threshold
language/country detection sanity checks
Add an internal “review queue” for flagged creators.
7) Quick “first database” plan (fastest way to ship)
Choose one platform
Build discovery from 20–50 niche keywords
Scrape ~2,000 profiles (no posts yet)
Add keyword search + filters
Only then add posts + engagement calculations
If you tell me which platform you’re starting with and which scraping provider/API you plan to use, I can turn this into a concrete runbook with: exact endpoints to create, queue table schema, batch sizes, retry rules, and a recommended refresh schedule.
Below is a Security & Reliability checklist specifically for a Creator Hunter–style SaaS (Next.js + Clerk + Supabase + scraping ingestion), focused on the stage after your app “works” but before you trust it with users, payments, and ongoing ingestion. This aligns with the risks implicit in Paulius’s stack and workflow.
Security & Reliability Checklist (with failure modes)
1) Secrets & environment safety (highest priority)
✅ Checklist
All service secrets (Supabase service role, scraping API keys, Stripe secret key) are:
server-only
never prefixed with NEXT_PUBLIC_
.env.local is git-ignored
Vercel has separate env vars for:
Production
Preview
Secrets rotated at least once before launch
❌ Common errors
Error: Service role key exposed to the browser
Happens when devs accidentally add the NEXT_PUBLIC_ prefix
Result: full database compromise
Fix
Audit env vars:
NEXT_PUBLIC_* → safe to expose
everything else → server-only
Run a repo-wide search for leaked keys before launch
2) Authentication & authorization boundaries
✅ Checklist
All /app/* routes are protected by Clerk middleware
Server routes never trust client input for identity
User access is enforced in both:
API logic
Supabase Row Level Security (RLS)
Ownership checks exist for:
lists
saved creators
exports
❌ Common errors
Error: “It works locally but users can see each other’s data”
Cause: missing or incorrect RLS policies
Fix
Enforce deny-by-default RLS
Explicitly allow:
SELECT on shared tables (e.g., creators)
SELECT/INSERT/UPDATE/DELETE only where user_id = auth.uid()
3) Paywall & entitlement enforcement
✅ Checklist
Feature gating happens:
server-side (authoritative)
not just in the UI
Subscription status cached short-term (e.g., 5–10 min)
Webhooks update entitlements asynchronously
Graceful downgrade if billing fails
❌ Common errors
Error: Users bypass paywall via direct API calls
Cause: paywall only enforced in frontend
Fix
Check entitlements in every server action:
exports
full contact access
advanced filters
4) Scraping & ingestion security
✅ Checklist
Ingestion runs only server-side
Scraper endpoints are:
authenticated
not publicly callable
Rate limits enforced per platform
Errors are logged with:
timestamp
platform
creator identifier
❌ Common errors
Error: Public API route lets anyone trigger scraping
Result: massive scraping bill or IP bans
Fix
Protect ingestion routes with:
secret token
or internal cron-only access
Never expose ingestion endpoints to the client
5) Input validation & injection safety
✅ Checklist
All user input validated with a schema (e.g., Zod):
search filters
text fields
exports
SQL queries are parameterized
Free-text search sanitized
❌ Common errors
Error: Search breaks or returns nonsense
Cause: unvalidated filters (NaN ranges, empty arrays)
Fix
Validate before building queries
Default invalid input to safe values (not empty queries)
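Zod's .catch() (available in recent versions) is one concise way to implement "default to safe values"; a sketch:

// Safe-by-default filter parsing: invalid input falls back instead of crashing
import { z } from "zod";

const SearchFilters = z.object({
  minFollowers: z.coerce.number().int().min(0).catch(0),   // "abc" -> 0
  page: z.coerce.number().int().min(1).max(1000).catch(1), // caps pagination too
  q: z.string().trim().max(100).catch(""),                 // never undefined
});

// SearchFilters.parse({ minFollowers: "NaN", page: -5 })
// -> { minFollowers: 0, page: 1, q: "" }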
6) Rate limiting & abuse prevention
✅ Checklist
Rate limits on:
search endpoints
exports
ingestion triggers
Pagination enforced (no unlimited queries)
Hard caps on export size
❌ Common errors
Error: One user tanks performance with massive queries
Cause: missing limits + full table scans
Fix
Enforce:
max page size
max export rows
Add indexes for all filterable fields
7) Database reliability & performance
✅ Checklist
Indexes exist for:
platform
follower count
country/language
full-text search
Batch upserts used for ingestion
Timeouts configured for long jobs
❌ Common errors
Error: Ingestion jobs randomly fail or time out
Cause: row-by-row writes, no batching
Fix
Upsert in batches (50–500 rows)
Skip unchanged rows when possible
8) Error handling & observability
✅ Checklist
Every server route:
catches errors
returns non-leaky messages
Logs include:
request id
user id (if authenticated)
job id (for ingestion)
Alerts for:
webhook failures
ingestion failure spikes
❌ Common errors
Error: “Something went wrong” with no clue why
Cause: swallowed errors or console-only logs
Fix
Centralize logging
Store ingestion errors in DB (scrape_last_error)
Track failure counts
9) Data integrity & trustworthiness
✅ Checklist
Distinguish between:
scraped data
manually curated data
Null used instead of fake zeros
Last-updated timestamps shown (or tracked internally)
❌ Common errors
Error: Users lose trust due to stale/wrong data
Cause: no freshness tracking
Fix
Track last_scraped_at
Refresh high-traffic creators more often
Deprioritize long-tail refreshes
10) Backup & recovery
✅ Checklist
Supabase automatic backups enabled
Ability to:
disable ingestion
revoke keys
pause exports
Manual “kill switch” env flag exists
❌ Common errors
Error: Bad ingestion corrupts large parts of DB
Cause: no rollback or isolation
Fix
Write ingestion into staging tables first (optional)
Validate before merging into main tables
11) Pre-launch security sanity test (do this once)
Before launch, answer yes to all:
Can a logged-in user access another user’s list? → No
Can a free user export full data? → No
Can I trigger ingestion from the browser console? → No
If Stripe/Clerk goes down, does the app fail safely? → Yes
If scraping fails for 24h, does the app still function? → Yes
The meta-mistake most builders make
They assume:
“Because it’s just a directory / SaaS MVP, security can wait.”
In reality, scraping + subscriptions + search is exactly the combo that:
leaks data
racks up surprise bills
destroys trust early
If you want, I can:
turn this into a launch-day checklist
provide example RLS policies
or create a threat model specific to scraping + paid exports