I Built an App with Cursor, Made $30K, and Quit My Job.

Step-by-step “vibe coded” Cursor prompts to build the Creator Hunter–style app

Goal: build an app that matches startups with influencers (creator directory + search + save lists + paywall), using Next.js + Clerk + Supabase, deployed on Vercel, with an ingestion job using a scraping API—the same general flow described in the video (Bolt for MVP → Cursor for production polish + Supabase + Clerk).

Phase 0 — Cursor setup prompts (project guardrails)

Prompt 0.1 — “Set rules + scope”

Cursor prompt:

You are my senior full-stack engineer. We’re building “Creator Hunter”: a SaaS that matches startups with influencers.
Stack: Next.js (App Router) + TypeScript + Tailwind + shadcn/ui, Clerk auth, Supabase Postgres, Stripe (optional), Vercel deploy, and a scraping API for ingestion.
Output: propose the folder structure, main routes, DB schema, and a 10-step implementation plan. Keep it MVP-first and production-safe (no secrets in client, RLS policies, server routes only for privileged actions).
Ask clarifying questions ONLY if absolutely required—otherwise make reasonable assumptions and proceed.

✅ Expected output: a clear plan + structure. Save it in your repo as docs/plan.md.

Phase 1 — Create the Next.js app + UI system

Prompt 1.1 — “Bootstrap app”

Create a new Next.js app (App Router) called creator-hunter with TypeScript, Tailwind, and ESLint. Add shadcn/ui.
Add a basic layout, a top nav, an /app route group for authenticated pages, and keep / as a marketing placeholder.
Provide the exact commands + files to create/edit.

Prompt 1.2 — “Install deps + config”

Install and configure dependencies: @clerk/nextjs, @supabase/supabase-js, zod.
Add a safe env var template in .env.example and a .env.local placeholder.
Ensure no server secrets are used in client components.

Common error this avoids: accidentally exposing service keys via NEXT_PUBLIC_*.

Phase 2 — Clerk authentication + route protection

Prompt 2.1 — “Add Clerk auth”

Integrate Clerk with Next.js App Router:

add Clerk provider in root layout

add middleware to protect /app/*

create /sign-in and /sign-up pages using Clerk components

after login redirect to /app/search
Show all code changes.

Prompt 2.2 — “Auth sanity test”

Add a simple /app dashboard page that displays the signed-in user’s email and a sign-out button.
Ensure unauthenticated users are redirected to /sign-in.

Common error + fix: login redirect loops → check middleware matcher + Clerk URLs.
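
For reference, a minimal sketch of the middleware Cursor should produce, using Clerk's clerkMiddleware/createRouteMatcher helpers (exact export names vary slightly between @clerk/nextjs versions; the matcher is the usual place redirect loops come from):

// middleware.ts — protect everything under /app, leave marketing + auth pages public
import { clerkMiddleware, createRouteMatcher } from "@clerk/nextjs/server";

const isProtectedRoute = createRouteMatcher(["/app(.*)"]);

export default clerkMiddleware(async (auth, req) => {
  // Unauthenticated users hitting a protected route get sent to /sign-in
  if (isProtectedRoute(req)) await auth.protect();
});

export const config = {
  // Skip static assets and _next internals; include API routes
  matcher: ["/((?!_next|.*\\..*).*)", "/(api|trpc)(.*)"],
};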

Phase 3 — Supabase database + schema

Prompt 3.1 — “Design schema for creators + lists”

Create a Supabase SQL migration for these tables:

creators (platform, handle, display_name, bio, avatar_url, country, language, followers, avg_views, engagement_rate, contact_email nullable, contact_url nullable, last_scraped_at, scrape_status, scrape_fail_count, scrape_last_error)

lists (user_id, name)

list_items (list_id, creator_id, note)
Add unique constraint on (platform, handle).
Add indexes for filtering/search (platform, followers, country, and a full text index on bio/display_name/handle).
Output: supabase/migrations/0001_init.sql

Prompt 3.2 — “RLS policies”

Enable RLS and write SQL policies:

creators: readable by authenticated users

lists: only owner can CRUD

list_items: only owner (via list ownership) can CRUD
Output: supabase/migrations/0002_rls.sql
Keep policies strict and deny-by-default.

Common errors + fixes

  • “403 RLS violation” when reading/writing → policy missing or auth.uid() mismatch.

  • Duplicates in creators → missing unique constraint, or handle normalization.

Phase 4 — Supabase clients (server vs browser)

Prompt 4.1 — “Supabase client setup”

Create lib/supabase/client.ts for browser use with anon key and lib/supabase/server.ts for server-only use.
Server client must support reading the current Clerk user and (optionally) using service role ONLY in server routes.
Add comments warning never to expose service keys.

Common error: using service role key in client bundle.
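
A minimal sketch of the two clients (file names from the prompt; env var names from the config section later in this post):

// lib/supabase/client.ts — browser-safe client: anon key only, RLS applies
import { createClient } from "@supabase/supabase-js";

export const supabaseBrowser = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
);

// lib/supabase/server.ts — server-only client; the service role key bypasses RLS,
// so this file must never be imported from a client component
import { createClient } from "@supabase/supabase-js";

export function supabaseAdmin() {
  return createClient(
    process.env.NEXT_PUBLIC_SUPABASE_URL!,
    process.env.SUPABASE_SERVICE_ROLE_KEY!,
    { auth: { persistSession: false } }
  );
}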

Phase 5 — Creator search (core product)

Prompt 5.1 — “Build /app/search page UI”

Build /app/search page with a filter sidebar and results table/grid:
Filters: platform, followers min/max, country, keyword.
Results show: avatar, name, handle, followers, engagement_rate, tags (optional).
Use shadcn components. Keep design clean, modern, cohesive.

Prompt 5.2 — “Implement search query server-side”

Implement a server action or route handler that accepts validated filters (Zod) and queries Supabase with pagination.
Ensure it’s safe, parameterized, and fast.
Wire the UI to call it and render results with loading + empty states.
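
A rough sketch of what the validated, paginated handler might look like. Column names come from the Phase 3 schema, the admin client is the server-only one from Phase 4, and the platform enum values are placeholders; Clerk enforces who can call the route since the service-role client bypasses RLS:

// app/api/search/route.ts — validate filters, enforce auth, query with a hard page cap
import { z } from "zod";
import { NextResponse } from "next/server";
import { auth } from "@clerk/nextjs/server";
import { supabaseAdmin } from "@/lib/supabase/server";

const filtersSchema = z.object({
  platform: z.enum(["tiktok", "youtube", "instagram", "x"]).optional(),
  minFollowers: z.coerce.number().int().min(0).default(0),
  maxFollowers: z.coerce.number().int().positive().optional(),
  country: z.string().trim().max(2).optional(),
  keyword: z.string().trim().max(100).optional(),
  page: z.coerce.number().int().min(1).default(1),
});

const PAGE_SIZE = 25; // hard cap — never let the client pick an unbounded size

export async function POST(req: Request) {
  const { userId } = await auth();
  if (!userId) return NextResponse.json({ error: "Unauthenticated" }, { status: 401 });

  const parsed = filtersSchema.safeParse(await req.json());
  if (!parsed.success) return NextResponse.json({ error: "Invalid filters" }, { status: 400 });

  const f = parsed.data;
  const from = (f.page - 1) * PAGE_SIZE;

  let query = supabaseAdmin()
    .from("creators")
    .select("id, platform, handle, display_name, avatar_url, followers, engagement_rate")
    .gte("followers", f.minFollowers)
    .order("followers", { ascending: false })
    .range(from, from + PAGE_SIZE - 1);

  if (f.platform) query = query.eq("platform", f.platform);
  if (f.maxFollowers) query = query.lte("followers", f.maxFollowers);
  if (f.country) query = query.eq("country", f.country);
  if (f.keyword) query = query.ilike("display_name", `%${f.keyword}%`);

  const { data, error } = await query;
  if (error) return NextResponse.json({ error: "Query failed" }, { status: 500 });
  return NextResponse.json({ results: data, page: f.page });
}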

Common errors + fixes

  • Slow queries → missing indexes or returning too many columns/rows.

  • Broken filters → missing Zod validation; use defaults.

Phase 6 — Creator detail page + paywall

Prompt 6.1 — “Creator profile route”

Create /app/creator/[id] page that shows profile details and recent metrics.
Add “Save to list” button.
If the user is not on a paid plan (the entitlement check comes in the next prompt), hide/blur contact fields and show an Upgrade CTA.

Prompt 6.2 — “Entitlements stub”

Create lib/entitlements.ts with isPro(userId) stub that returns false for now.
Use it to gate contact fields and export actions (server-side, not just UI).

Common error: paywall bypass via direct API calls → always enforce server-side.
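
A sketch of the stub plus one server-side gate. The route path and response shape are illustrative; the point is that the check runs on the server, so hiding the field in the UI alone is not enough:

// lib/entitlements.ts — stub for now; later backed by the subscriptions table
export async function isPro(userId: string): Promise<boolean> {
  // TODO: once Stripe is wired up, look up an active subscription for userId
  return false;
}

// Illustrative usage in a server route that returns contact fields
import { auth } from "@clerk/nextjs/server";
import { NextResponse } from "next/server";

export async function GET() {
  const { userId } = await auth();
  if (!userId) return NextResponse.json({ error: "Unauthenticated" }, { status: 401 });
  if (!(await isPro(userId))) {
    return NextResponse.json({ error: "Upgrade required" }, { status: 403 });
  }
  // ...load and return contact_email / contact_url for paying users only
  return NextResponse.json({ contact_email: null, contact_url: null });
}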

Phase 7 — Lists (shortlist workflow)

Prompt 7.1 — “Lists CRUD”

Build /app/lists page: create list, rename list, delete list, and view list items.
Add list selector modal from search results to save creators into a list.
Enforce ownership server-side and rely on RLS as a second layer.

Prompt 7.2 — “Notes per saved creator”

Add note editing for list items with optimistic UI.
Validate inputs and handle errors gracefully.

Phase 8 — Billing (optional but typical)

Prompt 8.1 — “Stripe checkout”

Add Stripe billing with a simple Pro plan:

/app/billing page

create checkout session server-side

webhook route to update subscriptions table

isPro checks subscription status
Include safe handling for test vs live keys and webhook signature verification.

Common errors + fixes

  • Webhook signature fails → need raw body verification in Next route handlers.

  • Test/live mismatch → keep keys consistent per environment.
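
The raw-body point is the one that bites most people: Stripe's signature check needs the exact bytes it sent, so read req.text() and never req.json() first. A hedged sketch with the event handling kept minimal:

// app/api/webhooks/stripe/route.ts
import Stripe from "stripe";
import { NextResponse } from "next/server";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function POST(req: Request) {
  const rawBody = await req.text(); // raw body — required for signature verification
  const signature = req.headers.get("stripe-signature");
  if (!signature) return NextResponse.json({ error: "Missing signature" }, { status: 400 });

  let event: Stripe.Event;
  try {
    event = stripe.webhooks.constructEvent(rawBody, signature, process.env.STRIPE_WEBHOOK_SECRET!);
  } catch {
    return NextResponse.json({ error: "Invalid signature" }, { status: 400 });
  }

  switch (event.type) {
    case "checkout.session.completed":
    case "customer.subscription.updated":
    case "customer.subscription.deleted":
      // TODO: upsert the subscriptions row here so isPro() reflects reality
      break;
  }

  return NextResponse.json({ received: true });
}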

Phase 9 — Data ingestion (“the database”) with scraping API

Prompt 9.1 — “Ingestion tables + job design”

Add ingestion support:

create table ingestion_queue (platform, profile_url, handle nullable, priority, status, attempts, last_error, next_run_at)

write a job runner design: Discover → Fetch → Normalize → Upsert creators

include retry with exponential backoff and rate limiting
Output SQL migration + a /api/jobs/run-ingestion route handler (server-only).

Prompt 9.2 — “Scraper integration”

Implement lib/scraper.ts that calls SCRAPER_API (provider-agnostic wrapper).
Include robust error handling: timeouts, 401/429/403, and structured logs.
Add a “dry run” mode to test a single profile scrape without writing to DB.
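
A provider-agnostic sketch of lib/scraper.ts. The /profile endpoint and request body are placeholders — adapt them to whatever your scraping provider actually exposes; SCRAPER_BASE_URL and SCRAPER_API_KEY come from the config section below:

// lib/scraper.ts — thin wrapper with a timeout and typed, retry-aware errors
export class ScrapeError extends Error {
  constructor(message: string, public status?: number, public retryable = false) {
    super(message);
  }
}

export async function fetchProfile(platform: string, profileUrl: string): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 30_000); // 30s hard timeout

  try {
    const res = await fetch(`${process.env.SCRAPER_BASE_URL}/profile`, {
      method: "POST",
      signal: controller.signal,
      headers: {
        Authorization: `Bearer ${process.env.SCRAPER_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ platform, url: profileUrl }),
    });

    if (res.status === 401) throw new ScrapeError("Unauthorized — check the API key", 401, false);
    if (res.status === 403) throw new ScrapeError("Blocked / anti-bot", 403, true); // cool down this creator
    if (res.status === 429) throw new ScrapeError("Rate limited", 429, true);       // back off and retry
    if (!res.ok) throw new ScrapeError(`Scrape failed (${res.status})`, res.status, true);

    return res.json();
  } finally {
    clearTimeout(timer);
  }
}

The "dry run" mode can simply call fetchProfile for one handle, log the normalized result, and skip the upsert step entirely.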

Prompt 9.3 — “Normalization + upsert”

Implement lib/normalize.ts to normalize scraped data into our schema.
Include parsing for counts like “1.2M”, “45K”, commas, and missing values.
Upsert into creators using unique key (platform, handle).
Update last_scraped_at and scrape status fields.
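
The count parser is the piece worth unit-testing first. A small sketch, under the rule that unknown formats become NULL and never 0:

// lib/normalize.ts — "1.2M" -> 1200000, "45K" -> 45000, "1,200,000" -> 1200000, "—" -> null
export function parseCount(raw: string | number | null | undefined): number | null {
  if (raw == null) return null;
  if (typeof raw === "number") return Number.isFinite(raw) ? Math.round(raw) : null;

  const cleaned = raw.trim().replace(/,/g, "");
  if (cleaned === "" || cleaned === "—" || cleaned.toUpperCase() === "N/A") return null;

  const match = cleaned.match(/^(\d+(?:\.\d+)?)\s*([KMB])?$/i);
  if (!match) return null; // unknown format: NULL, never 0

  const multipliers: Record<string, number> = { K: 1e3, M: 1e6, B: 1e9 };
  const multiplier = multipliers[(match[2] ?? "").toUpperCase()] ?? 1;
  return Math.round(parseFloat(match[1]) * multiplier);
}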

Common ingestion errors + fixes

  • 403/blocked/captcha: add backoff + provider anti-bot options + cooldown per creator.

  • Cost explosion: refresh selectively; don’t scrape posts every time.

  • Data inconsistency: strict parsing + set NULLs, not 0.

Phase 10 — Reliability hardening before launch

Prompt 10.1 — “Rate limiting + abuse controls”

Add rate limiting to search and export endpoints (and ingestion triggers).
Enforce max page size and max export rows.
Add caching for common searches if helpful.
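
One common pattern on Vercel is Upstash's sliding-window limiter. A sketch assuming the @upstash/ratelimit and @upstash/redis packages and their REST env vars:

// lib/rate-limit.ts — shared limiter for search/export endpoints
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

export const searchLimiter = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN
  limiter: Ratelimit.slidingWindow(30, "1 m"), // 30 searches per identifier per minute
});

// In the search route handler, before querying:
//   const { success } = await searchLimiter.limit(`search:${userId}`);
//   if (!success) return NextResponse.json({ error: "Too many requests" }, { status: 429 });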

Prompt 10.2 — “Observability”

Add structured logging for server routes and ingestion jobs: request id, user id, job id.
Store ingestion failures in DB fields.
Add an admin-only page /app/admin/ingestion to view queue + recent failures.

Prompt 10.3 — “Security audit”

Perform a security review of the repo:

ensure no secrets are exposed

ensure server actions and API routes enforce auth + entitlements

confirm RLS policies are correct

confirm ingestion routes cannot be triggered by the public
Produce a checklist and recommended fixes.

“When Cursor gets stuck” prompt (use anytime)

You introduced an error. Diagnose it step-by-step.
First: explain the root cause in 1–2 sentences.
Then: provide the minimal diff to fix it.
Finally: add a regression check (test or manual steps).
Do not rewrite unrelated files.

If you want, tell me which platform you’re ingesting first (TikTok / YouTube / IG / X) and whether you want billing in MVP, and I’ll tailor these prompts into an even tighter sequence (with exact file names and the order to run migrations/deploy).


Environment, API keys, and configuration

Below is a practical “setup + config” guide for a Creator Hunter–style app using the stack described in the video: Cursor/Bolt → Next.js, Supabase, Clerk, Vercel, plus an API for scraping, and optional Framer for the marketing site.

0) Prereqs

  • Node.js 20 LTS (recommended)

  • Git + GitHub

  • Accounts: Supabase, Clerk, Vercel

  • (Optional but common) Stripe for billing, Upstash/QStash or similar for jobs, Scraping provider (Apify/Bright Data/ScraperAPI/etc.)

1) Project bootstrap (local environment)

  1. Create project:

    npx create-next-app@latest creator-hunter --ts --eslint --tailwind --app
    cd creator-hunter
    
  2. Install dependencies you’ll almost certainly need:

    npm i @clerk/nextjs @supabase/supabase-js zod
    
  3. Create .env.local (never commit this file).

2) Environment variables (what to put in .env.local)

A) Clerk (Auth)

From the Clerk dashboard → your application → API keys:

# Clerk (public keys are safe in browser)
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_test_...
CLERK_SECRET_KEY=sk_test_...

# Clerk URLs (recommended defaults)
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up
NEXT_PUBLIC_CLERK_AFTER_SIGN_IN_URL=/app
NEXT_PUBLIC_CLERK_AFTER_SIGN_UP_URL=/app

# Webhooks (if using)
CLERK_WEBHOOK_SECRET=whsec_...

Common errors + fixes

  • “Missing Clerk publishable key”: you forgot NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY or misspelled it.

  • Wrong environment keys: using pk_live locally or mixing test/live keys. Keep a clean separation.

  • Redirect loop after login: mismatch between Clerk redirect URLs and your middleware/protected routes. Ensure /app is protected and /sign-in is not.

B) Supabase (DB/API)

From Supabase project settings → API:

# Public URL + anon key can be used client-side
NEXT_PUBLIC_SUPABASE_URL=https://xxxx.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJ...

# Service role key MUST stay server-side only (never expose to browser)
SUPABASE_SERVICE_ROLE_KEY=eyJ...

Common errors + fixes

  • 403 / RLS errors when querying tables: Row Level Security is enabled but you haven’t added policies.

    • Fix: add minimal SELECT policies for creators (and proper owner-based policies for lists, etc.), or use server routes with the service role key for admin operations.

  • Accidentally shipping service role key to the client: any variable starting with NEXT_PUBLIC_ is exposed.

    • Fix: never prefix service keys with NEXT_PUBLIC_. Keep them server-only.

  • Supabase client using the wrong key: you used service role key in browser or anon key on server when you needed elevated permissions.

    • Fix: use anon key in client; use service role key only in server routes/actions.

C) Scraping API (data ingestion)

Pick a provider and store its key:

SCRAPER_API_KEY=...
SCRAPER_BASE_URL=https://api.someprovider.com

Common errors + fixes

  • 401 Unauthorized: wrong key, key disabled, or using the wrong base URL.

  • IP/rate limits: ingestion fails intermittently.

    • Fix: throttle requests, add retries with exponential backoff, queue jobs, and log failures.

  • Blocked pages / captchas: common with social platforms.

    • Fix: use a provider that supports rendering/proxies, or rely on platform APIs where possible.

D) Billing (optional, but common) — Stripe

If you’re paywalling access:

STRIPE_SECRET_KEY=sk_test_...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=pk_test_...
STRIPE_WEBHOOK_SECRET=whsec_...

# Your app URL (important for webhooks and redirects)
NEXT_PUBLIC_APP_URL=http://localhost:3000

Common errors + fixes

  • Webhook signature verification fails: STRIPE_WEBHOOK_SECRET wrong, or you parse the body before verifying.

    • Fix: use Stripe’s raw body verification pattern for Next.js route handlers.

  • Test/live mismatch: using live publishable key with test secret key.

    • Fix: keep environments consistent.

E) Vercel deployment env parity

When you deploy, copy the same variables into Vercel:
Vercel Project → Settings → Environment Variables

  • Add variables for Production, Preview, and optionally Development.

Common errors + fixes

  • Works locally, fails on Vercel: missing env vars in Vercel (very common).

    • Fix: ensure every required env var exists in Vercel environments.

  • Wrong NEXT_PUBLIC_APP_URL in prod: causes bad redirects / webhook URLs.

    • Fix: set NEXT_PUBLIC_APP_URL to your Vercel domain in production.

3) Configuration steps inside each service

Clerk configuration

  1. Add Allowed Redirect URLs:

    • Local: http://localhost:3000/*

    • Prod: https://your-domain.com/*

  2. If using webhooks (recommended for syncing users/subscriptions):

    • Create webhook endpoint (e.g., /api/webhooks/clerk)

    • Copy whsec_... into CLERK_WEBHOOK_SECRET

Common error

  • Webhook never fires: endpoint URL wrong or not publicly reachable.

    • Fix: use a stable ngrok URL in local dev, or test on Vercel Preview/Prod.
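
Clerk signs webhook payloads with Svix, so the handler verifies the raw body before trusting anything. A sketch assuming the svix package:

// app/api/webhooks/clerk/route.ts
import { Webhook } from "svix";
import { NextResponse } from "next/server";

export async function POST(req: Request) {
  const payload = await req.text(); // raw body — required for signature verification
  const wh = new Webhook(process.env.CLERK_WEBHOOK_SECRET!);

  try {
    const event = wh.verify(payload, {
      "svix-id": req.headers.get("svix-id") ?? "",
      "svix-timestamp": req.headers.get("svix-timestamp") ?? "",
      "svix-signature": req.headers.get("svix-signature") ?? "",
    }) as { type: string; data: Record<string, unknown> };

    // e.g. sync user.created / user.deleted into your own users table here
    return NextResponse.json({ received: true, type: event.type });
  } catch {
    return NextResponse.json({ error: "Invalid signature" }, { status: 400 });
  }
}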

Supabase configuration

  1. Create tables (creators, lists, list_items, etc.)

  2. Enable RLS and create policies:

    • creators: allow read to authenticated users (or public if you want)

    • lists/list_items: owner-only access

  3. (If using Clerk) Decide how you map users:

    • Store Clerk user id in your users table (string) or map to UUID.

Common errors

  • Type mismatch (uuid vs string) on user id fields:

    • Fix: pick one approach early. A common pattern is: users.clerk_user_id TEXT UNIQUE and internal id UUID.

  • RLS blocks inserts:

    • Fix: add INSERT/UPDATE policies or do writes in server routes using the service role key.

Vercel configuration

  1. Set environment variables in Vercel.

  2. Confirm build uses Node 20 (Vercel generally does; you can also set "engines" in package.json):

    { "engines": { "node": ">=20" } }
    

Common errors

  • Build fails due to missing env vars: fix by adding them in Vercel.

  • Edge vs Node runtime mismatches (webhook libraries, crypto):

    • Fix: ensure webhook route uses Node runtime if required.

4) A quick “sanity checklist” to catch setup issues fast

Run these checks before building features:

  1. Auth loads: visit /sign-in → can sign in.

  2. Protected routes: visit /app while signed out → should redirect to sign-in.

  3. Supabase connectivity: server route can query creators without crashing.

  4. RLS behavior: client-side read works (if intended) and user-owned tables are protected.

  5. Ingestion key works: a single “test scrape” route returns a 200 with expected JSON.

5) The most common setup mistakes (and the fixes)

  • Putting secrets in NEXT_PUBLIC_*
    ✅ Fix: only publishable keys get NEXT_PUBLIC_. Service keys never do.

  • RLS not understood (everything returns 401/403)
    ✅ Fix: add minimal policies or move queries server-side with service role.

  • Mismatch between local/prod redirect URLs
    ✅ Fix: set allowed redirect URLs in Clerk + ensure NEXT_PUBLIC_APP_URL correct.

  • Environment variables missing on Vercel
    ✅ Fix: copy .env.local values into Vercel env var settings (Production + Preview).

  • Test/live key mismatch (Clerk/Stripe)
    ✅ Fix: keep test keys together and live keys together; don’t mix.

If you tell me which scraping provider you’re using and whether you’re deploying to a custom domain or just Vercel, I can give you an exact env var template + the recommended webhook URLs and “allowed origins” values for that setup.


Below are practical, technical instructions for building the data ingestion “database” layer (the part Paulius said he spent months building, using an “API for scraping” + Supabase).

1) Ingestion goals and constraints (define before coding)

What you’re trying to build

A pipeline that continuously produces/updates rows in creators (and optionally creator_posts) so your app has:

  • enough creators to be useful on day 1

  • “fresh enough” stats to trust (followers/views/engagement)

  • consistent fields across platforms (TikTok/YouTube/IG/X)

Constraints you must decide now (or you’ll rewrite later)

  • Platform scope: start with one platform (cheapest + simplest).

  • Data freshness: how often to refresh (e.g., every 7–30 days).

  • Cost ceiling: scraping can get expensive fast.

  • Compliance: avoid collecting/selling sensitive personal data; treat emails as “only if publicly listed”.

2) Database schema: ingestion-ready fields

Minimum creators columns to support ingestion + refresh:

  • Identity:

    • platform (enum)

    • handle (string)

    • profile_url

    • platform_creator_id (string, if available)

  • Content:

    • display_name, bio, avatar_url

    • category_tags (array or join table)

    • country, language

  • Metrics:

    • followers, avg_views, engagement_rate

  • Ingestion metadata (critical):

    • created_at, updated_at

    • last_scraped_at

    • scrape_status (ok|failed|blocked)

    • scrape_fail_count

    • scrape_last_error (text)

Hard requirement: create a unique constraint like:

  • UNIQUE(platform, handle)
    This makes “upsert” reliable and prevents duplicates.

3) The ingestion flow (3-stage pipeline)

Think of ingestion as Discover → Fetch → Normalize/Upsert.

Stage A — Discover (build a queue of creators to scrape)

You need a list of profile URLs/handles. Common discovery methods:

  1. Keyword/niche discovery: search by niche terms (“fitness”, “SaaS”, “marketing”).

  2. Seed accounts: start from known creators, expand via “similar creators”.

  3. Hashtag discovery: find accounts posting under hashtags.

Output: a queue table (or task list) of candidates:

  • platform, profile_url, handle (maybe blank initially), priority

Tip: Keep discovery separate from profile scraping. Discovery is noisy; profile scraping should be clean and repeatable.

Stage B — Fetch (scrape profile + optional posts)

Use your “API for scraping” provider (as mentioned in the video).

Profile scrape should return at least:

  • handle, display name, bio

  • follower count

  • links/contact fields that are publicly visible

  • profile avatar

  • country/language if available

Optional post scrape (recommended for better sorting):

  • last 12–30 posts

  • views/likes/comments + timestamps

Then compute:

  • avg_views = mean(views)

  • engagement_rate = (likes+comments)/views or /followers (choose one and be consistent)
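
A small sketch of that derived-metric step. The post shape is an assumption, and the per-view denominator is one valid choice — just keep it consistent across platforms:

// Derive avg_views and engagement_rate from the last N scraped posts
interface ScrapedPost {
  views: number | null;
  likes: number | null;
  comments: number | null;
}

export function computeMetrics(posts: ScrapedPost[]) {
  const withViews = posts.filter((p) => p.views != null && p.views > 0);
  if (withViews.length === 0) return { avg_views: null, engagement_rate: null };

  const totalViews = withViews.reduce((sum, p) => sum + (p.views ?? 0), 0);
  const interactions = withViews.reduce((sum, p) => sum + (p.likes ?? 0) + (p.comments ?? 0), 0);

  return {
    avg_views: Math.round(totalViews / withViews.length),
    engagement_rate: Number((interactions / totalViews).toFixed(4)), // per-view variant
  };
}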

Stage C — Normalize + Upsert into Supabase

Normalize raw data from each platform into your schema.

Rules

  • Convert counts to integers (handle “1.2M”, “45K” properly)

  • Trim/clean bios (remove zero-width chars)

  • Set missing metrics to NULL not 0 (0 looks like “real” data)

  • Use upsert keyed by (platform, handle) or (platform, platform_creator_id) if stable

Recommended upsert behavior (see the code sketch after this list)

  • Always update “volatile” fields (followers, avg_views, engagement_rate, avatar_url)

  • Only update “semi-stable” fields (bio/display_name) if changed

  • Preserve manual overrides (e.g., curated tags) by separating them into a different table
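
A sketch of that upsert with supabase-js. The onConflict columns must match the UNIQUE(platform, handle) constraint, this runs server-side with the service-role client, and NormalizedCreator is an illustrative type:

// lib/ingest.ts — batch upsert of normalized creators, keyed on (platform, handle)
import { supabaseAdmin } from "@/lib/supabase/server";

interface NormalizedCreator {
  platform: string;
  handle: string;
  display_name: string | null;
  bio: string | null;
  avatar_url: string | null;
  followers: number | null;
  avg_views: number | null;
  engagement_rate: number | null;
}

export async function upsertCreators(rows: NormalizedCreator[]) {
  const BATCH = 200; // batch writes instead of row-by-row
  for (let i = 0; i < rows.length; i += BATCH) {
    const batch = rows.slice(i, i + BATCH).map((r) => ({
      ...r,
      handle: r.handle.toLowerCase(), // normalize casing so the unique key holds
      last_scraped_at: new Date().toISOString(),
      scrape_status: "ok",
    }));

    const { error } = await supabaseAdmin()
      .from("creators")
      .upsert(batch, { onConflict: "platform,handle" });

    if (error) throw new Error(`Upsert batch failed: ${error.message}`);
  }
}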

4) Scheduling: initial load vs refresh jobs

You’ll run two job types:

A) Initial load

  • Goal: populate first 1k–10k creators.

  • Run in batches (e.g., 50–200 at a time).

  • Stop when cost or quality threshold is met.

B) Refresh

  • Goal: keep creators fresh.

  • Pick candidates by:

    • last_scraped_at oldest first

    • or “popular creators” more frequently (their stats drift faster)

  • Example schedule:

    • top creators weekly

    • long tail monthly

5) Implementation blueprint (how to wire it)

Minimal job setup (works on Vercel)

  1. Cron trigger (Vercel Cron) hits /api/jobs/refresh (sketched after this list).

  2. That endpoint:

    • selects N creators to refresh

    • enqueues tasks (or processes sequentially if small)

  3. Worker function:

    • calls scraper API

    • normalizes

    • upserts to Supabase

    • updates last_scraped_at, scrape_status, error fields
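
A sketch of the cron-triggered endpoint from step 1. The Authorization check is the usual Vercel Cron pattern (set a CRON_SECRET env var); fetchProfile and upsertCreators refer to the earlier sketches:

// app/api/jobs/refresh/route.ts — callable by the scheduler only, never by the public
import { NextResponse } from "next/server";
import { supabaseAdmin } from "@/lib/supabase/server";

export async function GET(req: Request) {
  // Vercel Cron sends "Authorization: Bearer <CRON_SECRET>" when CRON_SECRET is set
  if (req.headers.get("authorization") !== `Bearer ${process.env.CRON_SECRET}`) {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }

  // Pick oldest-first refresh candidates, small batch per run
  const { data: stale, error } = await supabaseAdmin()
    .from("creators")
    .select("id, platform, handle, profile_url")
    .order("last_scraped_at", { ascending: true, nullsFirst: true })
    .limit(50);

  if (error) return NextResponse.json({ error: error.message }, { status: 500 });

  // For each candidate: fetchProfile -> normalize -> upsertCreators, then update
  // last_scraped_at / scrape_status / scrape_last_error on success or failure.
  return NextResponse.json({ selected: stale?.length ?? 0 });
}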

Better setup (recommended once it grows)

  • Use a queue (Upstash/QStash, etc.)

  • One “scheduler” enqueues tasks

  • Many workers process tasks, rate-limited per platform

6) Common errors at the ingestion stage (and fixes)

1) Duplicates everywhere

Symptoms

  • Same creator appears multiple times with slight handle differences.

Causes

  • No unique constraint, or you key on unstable IDs/URLs.

Fix

  • Enforce UNIQUE(platform, handle) and normalize handle casing (lower(handle)).

  • Prefer stable platform IDs if available.

2) Scraper returns inconsistent data (“1.2M”, “1,200,000”, “—”)

Symptoms

  • Sorting breaks and analytics become meaningless.

Fix

  • Build a strict “parse counts” utility and unit test it.

  • Treat “—/N/A” as NULL.

3) Getting blocked / CAPTCHA / 403 spikes

Symptoms

  • Large waves of failures, then recovery.

Fix

  • Add:

    • exponential backoff

    • retries with jitter

    • rate limits per platform

    • rotate proxies / use provider’s anti-bot options

  • Track scrape_fail_count and cool down creators after X failures.

4) Cost explosion

Symptoms

  • Scraping bill grows faster than revenue.

Fix

  • Make refresh selective:

    • refresh only creators that appear in user searches/lists

    • refresh top creators more often, long tail less often

  • Cache post-scrapes: don’t fetch posts every time.

5) RLS / permissions block your upserts

Symptoms

  • 401/403 from Supabase when writing.

Fix

  • Do ingestion server-side using service role key (never client-side).

  • Or create a dedicated Supabase Edge Function with elevated permissions.

6) Rate-limited database writes (or slow queries)

Symptoms

  • Upserts time out; refresh jobs never finish.

Fix

  • Batch writes (upsert arrays of 50–500)

  • Add indexes for search/filter fields

  • Avoid updating unchanged columns (reduces write amplification)

7) Data quality issues ruin trust (wrong profiles, wrong niche tags)

Symptoms

  • Users complain results aren’t relevant.

Fix

  • Add validation rules:

    • minimum follower threshold

    • language/country detection sanity checks

  • Add an internal “review queue” for flagged creators.

7) Quick “first database” plan (fastest way to ship)

  1. Choose one platform

  2. Build discovery from 20–50 niche keywords

  3. Scrape ~2,000 profiles (no posts yet)

  4. Add keyword search + filters

  5. Only then add posts + engagement calculations

If you tell me which platform you’re starting with and which scraping provider/API you plan to use, I can turn this into a concrete runbook with: exact endpoints to create, queue table schema, batch sizes, retry rules, and a recommended refresh schedule.

Below is a Security & Reliability checklist specifically for a Creator Hunter–style SaaS (Next.js + Clerk + Supabase + scraping ingestion), focused on the stage after your app “works” but before you trust it with users, payments, and ongoing ingestion. This aligns with the risks implicit in Paulius’s stack and workflow.

Security & Reliability Checklist (with failure modes)

1) Secrets & environment safety (highest priority)

✅ Checklist

  • All service secrets (Supabase service role, scraping API keys, Stripe secret key) are:

    • server-only

    • never prefixed with NEXT_PUBLIC_

  • .env.local is git-ignored

  • Vercel has separate env vars for:

    • Production

    • Preview

  • Secrets rotated at least once before launch

❌ Common errors

Error: Service role key exposed to the browser

  • Happens when devs accidentally prefix with NEXT_PUBLIC_

  • Result: full database compromise

Fix

  • Audit env vars:

    • NEXT_PUBLIC_* → safe to expose

    • everything else → server-only

  • Run a repo-wide search for leaked keys before launch

2) Authentication & authorization boundaries

✅ Checklist

  • All /app/* routes are protected by Clerk middleware

  • Server routes never trust client input for identity

  • User access is enforced in both:

    • API logic

    • Supabase Row Level Security (RLS)

  • Ownership checks exist for:

    • lists

    • saved creators

    • exports

❌ Common errors

Error: “It works locally but users can see each other’s data”

  • Cause: missing or incorrect RLS policies

Fix

  • Enforce deny-by-default RLS

  • Explicitly allow:

    • SELECT on shared tables (e.g., creators)

    • SELECT/INSERT/UPDATE/DELETE only where user_id = auth.uid()

3) Paywall & entitlement enforcement

✅ Checklist

  • Feature gating happens:

    • server-side (authoritative)

    • not just in the UI

  • Subscription status cached short-term (e.g., 5–10 min)

  • Webhooks update entitlements asynchronously

  • Graceful downgrade if billing fails

❌ Common errors

Error: Users bypass paywall via direct API calls

  • Cause: paywall only enforced in frontend

Fix

  • Check entitlements in every server action:

    • exports

    • full contact access

    • advanced filters

4) Scraping & ingestion security

✅ Checklist

  • Ingestion runs only server-side

  • Scraper endpoints are:

    • authenticated

    • not publicly callable

  • Rate limits enforced per platform

  • Errors are logged with:

    • timestamp

    • platform

    • creator identifier

❌ Common errors

Error: Public API route lets anyone trigger scraping

  • Result: massive scraping bill or IP bans

Fix

  • Protect ingestion routes with:

    • secret token

    • or internal cron-only access

  • Never expose ingestion endpoints to the client

5) Input validation & injection safety

✅ Checklist

  • All user input validated with a schema (e.g., Zod):

    • search filters

    • text fields

    • exports

  • SQL queries are parameterized

  • Free-text search sanitized

❌ Common errors

Error: Search breaks or returns nonsense

  • Cause: unvalidated filters (NaN ranges, empty arrays)

Fix

  • Validate before building queries

  • Default invalid input to safe values (not empty queries)

6) Rate limiting & abuse prevention

✅ Checklist

  • Rate limits on:

    • search endpoints

    • exports

    • ingestion triggers

  • Pagination enforced (no unlimited queries)

  • Hard caps on export size

❌ Common errors

Error: One user tanks performance with massive queries

  • Cause: missing limits + full table scans

Fix

  • Enforce:

    • max page size

    • max export rows

  • Add indexes for all filterable fields

7) Database reliability & performance

✅ Checklist

  • Indexes exist for:

    • platform

    • follower count

    • country/language

    • full-text search

  • Batch upserts used for ingestion

  • Timeouts configured for long jobs

❌ Common errors

Error: Ingestion jobs randomly fail or time out

  • Cause: row-by-row writes, no batching

Fix

  • Upsert in batches (50–500 rows)

  • Skip unchanged rows when possible

8) Error handling & observability

✅ Checklist

  • Every server route:

    • catches errors

    • returns error messages that don’t leak internals (no stack traces or SQL)

  • Logs include:

    • request id

    • user id (if authenticated)

    • job id (for ingestion)

  • Alerts for:

    • webhook failures

    • ingestion failure spikes

❌ Common errors

Error: “Something went wrong” with no clue why

  • Cause: swallowed errors or console-only logs

Fix

  • Centralize logging

  • Store ingestion errors in DB (scrape_last_error)

  • Track failure counts

9) Data integrity & trustworthiness

✅ Checklist

  • Distinguish between:

    • scraped data

    • manually curated data

  • Null used instead of fake zeros

  • Last-updated timestamps shown (or tracked internally)

❌ Common errors

Error: Users lose trust due to stale/wrong data

  • Cause: no freshness tracking

Fix

  • Track last_scraped_at

  • Refresh high-traffic creators more often

  • Deprioritize long-tail refreshes

10) Backup & recovery

✅ Checklist

  • Supabase automatic backups enabled

  • Ability to:

    • disable ingestion

    • revoke keys

    • pause exports

  • Manual “kill switch” env flag exists

❌ Common errors

Error: Bad ingestion corrupts large parts of DB

  • Cause: no rollback or isolation

Fix

  • Write ingestion into staging tables first (optional)

  • Validate before merging into main tables

11) Pre-launch security sanity test (do this once)

Before launch, answer yes to all:

  • Can a logged-in user access another user’s list? → No

  • Can a free user export full data? → No

  • Can I trigger ingestion from the browser console? → No

  • If Stripe/Clerk goes down, does the app fail safely? → Yes

  • If scraping fails for 24h, does the app still function? → Yes

The meta-mistake most builders make

They assume:

“Because it’s just a directory / SaaS MVP, security can wait.”

In reality, scraping + subscriptions + search is exactly the combo that:

  • leaks data

  • racks up surprise bills

  • destroys trust early

If you want, I can:

  • turn this into a launch-day checklist

  • provide example RLS policies

  • or create a threat model specific to scraping + paid exports