How to Integrate AI Into an Existing Website: A Developer's Guide

A practical guide to adding AI to an existing website—chatbots, RAG over your own content, and natural-language dashboards—with real Next.js code, model and cost trade-offs, and security pitfalls to avoid.

"Can we just add AI to our website?" is the most common question I get from clients right now. The honest answer: yes, and it's far less work than most people expect—if you scope it correctly. The mistake teams make is treating "add AI" as one giant project instead of choosing the one feature that actually moves a metric.

This guide walks through the realistic ways to add AI to a site you already run, then builds the most popular one—a retrieval-augmented chatbot that answers from your content—with real Next.js code.

What "adding AI" actually means

There are four distinct features hiding behind the phrase "add AI," and they have very different costs:

Feature	What it does	Effort	Best for
Support chatbot (RAG)	Answers questions from your docs/content	Medium	Reducing support load
Smart search	Semantic search instead of keyword match	Medium	Content-heavy sites, docs
Content generation	Drafts copy, summaries, replies	Low	Internal tooling, dashboards
Natural-language dashboard	"Show me last month's revenue by region" → query + chart	High	Internal analytics

Pick one. A focused chatbot that deflects 30% of support tickets is worth more than four half-finished features. You can always add the next one once the first proves its value.

If you want help scoping which feature fits your business, that's exactly what my AI integration service is built around.

The architecture (it's simpler than you think)

Every one of these features follows the same shape. You never call a model directly from the browser—your API does it, so your keys stay secret and you control cost.

plaintext

Browser widget  →  Your API route  →  LLM (Claude / GPT)
                        │
                        └──→ Vector DB (for RAG/search)

For a site already on Next.js, the "API route" is just a route handler. Nothing new to deploy.

Building a RAG chatbot over your own content

"RAG" (retrieval-augmented generation) is the technique that lets a model answer from your data instead of its training. Without it, a model confidently invents your refund policy. With it, the model is handed your real policy and asked to answer using only that.

It has two phases: ingestion (run once up front, then again whenever your content changes) and querying (run on every message).

Phase 1: Ingest your content

Chunk your pages into ~500-token pieces, turn each into an embedding (a vector), and store it. I'll use Supabase with the pgvector extension because it's a Postgres table you already know how to manage.

sql

-- one-time setup
create extension if not exists vector;
 
create table documents (
  id bigserial primary key,
  content text,
  embedding vector(1536)  -- matches text-embedding-3-small
);

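typescript

// scripts/ingest.ts: run whenever your content changes
import OpenAI from 'openai';
import { createClient } from '@supabase/supabase-js';

// Both clients read credentials from the environment (OPENAI_API_KEY for OpenAI).
const openai = new OpenAI();
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

// Turn one chunk of text into a 1536-dimension embedding vector.
async function embed(text: string) {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return res.data[0].embedding;
}

export async function ingest(chunks: string[]) {
  for (const content of chunks) {
    const embedding = await embed(content);
    // supabase-js returns errors instead of throwing, so check explicitly.
    const { error } = await supabase.from('documents').insert({ content, embedding });
    if (error) throw new Error(`Insert failed: ${error.message}`);
  }
}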
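
The ingest function above takes a list of chunks but leaves the splitting itself open. Here is a minimal sketch of one way to produce them, assuming roughly 4 characters per token (a rule of thumb for English text, not a real tokenizer) and splitting on blank lines. The chunkText name and the 2000-character default are my own placeholders, not part of any library.

```typescript
// Hypothetical helper: split text into chunks of at most maxChars characters,
// breaking on paragraph boundaries where possible.
// Assumption: ~4 characters per token, so 2000 chars is roughly 500 tokens.
export function chunkText(text: string, maxChars = 2000): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = '';

  for (const para of paragraphs) {
    // A single paragraph longer than the limit is hard-split into slices.
    if (para.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }

    // Otherwise, keep appending paragraphs until the next one would overflow.
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}
```

You would then call it as ingest(chunkText(pageText)). Overlapping chunks and a real tokenizer are common refinements once you start tuning retrieval quality.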

Phase 2: Answer a question

When a user asks something, embed their question, find the most similar chunks, and hand them to Claude as context. Define a similarity search function in Postgres:

sql

create or replace function match_documents(query_embedding vector(1536), match_count int)
returns table (content text, similarity float)
language sql stable as $$
  select content, 1 - (embedding <=> query_embedding) as similarity
  from documents
  order by embedding <=> query_embedding
  limit match_count;
$$;
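
For intuition: pgvector's <=> operator is cosine distance, which is why the function returns 1 minus that distance as a similarity score. The same calculation in TypeScript looks like the sketch below. It is illustrative only, since Postgres does this work for you, and cosineSimilarity is a hypothetical name.

```typescript
// Cosine similarity between two equal-length vectors.
// 1 means same direction, 0 means orthogonal, -1 means opposite.
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Ordering by embedding <=> query_embedding ascending is therefore the same as ordering by this similarity descending.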

Then the route handler that ties it together:

typescript

// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk';
import OpenAI from 'openai';
import { createClient } from '@supabase/supabase-js';

const anthropic = new Anthropic();
const openai = new OpenAI();
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

const MODEL = 'claude-haiku-4-5-20251001'; // fast + cheap for support; see Anthropic's docs for current model IDs

export async function POST(req: Request) {
  // 0. Reject malformed or empty input before spending any tokens
  const body = await req.json().catch(() => null);
  const question = typeof body?.question === 'string' ? body.question.trim() : '';
  if (!question) {
    return Response.json({ error: 'A non-empty "question" string is required.' }, { status: 400 });
  }

  // 1. Embed the question and retrieve relevant chunks
  const { data: emb } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: question,
  });
  const { data: docs, error } = await supabase.rpc('match_documents', {
    query_embedding: emb[0].embedding,
    match_count: 5,
  });
  if (error) {
    // supabase-js returns errors instead of throwing
    return Response.json({ error: 'Retrieval failed.' }, { status: 500 });
  }

  const context = (docs ?? []).map((d: { content: string }) => d.content).join('\n\n');

  // 2. Ask Claude to answer using ONLY that context
  const message = await anthropic.messages.create({
    model: MODEL,
    max_tokens: 1024,
    system:
      'You are a support assistant. Answer using only the provided context. ' +
      'If the answer is not in the context, say you do not know and offer to connect them to a human.',
    messages: [
      { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
    ],
  });

  // 3. Return the first text block of the reply
  const first = message.content[0];
  const answer = first?.type === 'text' ? first.text : '';
  return Response.json({ answer });
}

That's a working RAG chatbot backend in a few dozen lines. The frontend is a simple widget that POSTs to /api/chat and renders the reply. If your site is already on Next.js or React, you can reuse the same patterns from my Web3 wallet integration guide for the client-side fetch and state handling.
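
As a rough sketch of the client side, the helper below posts a question to /api/chat and returns the answer text. The askAssistant name and the FetchLike type are hypothetical. The fetch function is passed in so the helper can be exercised without a network, and it assumes the route responds with JSON of the form { answer }, as the handler above does.

```typescript
// Minimal shape of the fetch function this helper needs. The browser's global
// fetch satisfies it; tests can pass a stub instead.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical client helper: send one question, get back the answer text.
export async function askAssistant(question: string, fetchImpl: FetchLike): Promise<string> {
  const trimmed = question.trim();
  if (!trimmed) throw new Error('Question is empty');

  const res = await fetchImpl('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ question: trimmed }),
  });
  if (!res.ok) throw new Error(`Chat request failed with status ${res.status}`);

  // Be defensive about the response shape rather than trusting it blindly.
  const data = (await res.json()) as { answer?: unknown } | null;
  return typeof data?.answer === 'string' ? data.answer : '';
}
```

In a widget you would call askAssistant(inputValue, fetch) from a submit handler and render the returned string, showing a fallback message if it throws.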

Why split embeddings (OpenAI) and generation (Claude)? You don't have to. I use a dedicated embeddings model for retrieval because it's cheap and accurate, and Claude for the answer because it follows the "only use this context" instruction reliably. Use whatever stack you already pay for.

Choosing a model (and controlling cost)

The model choice is where projects quietly blow their budget. Match the model to the job:

Job	Model tier	Why
Support chat, classification, extraction	Small/fast (e.g. Claude Haiku)	Cheap, low latency, plenty smart
Summaries, content drafting	Mid (e.g. Claude Sonnet)	Better writing, still affordable
Complex multi-step reasoning, agents	Large (e.g. Claude Opus)	Use only where it earns its cost

Two cost levers that matter more than the model:

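Cache your system prompt. If you send the same long instructions on every request, prompt caching cuts input cost dramatically. Providers only cache prompts above a minimum length, so this pays off for long instructions or fixed reference material, not for a two-sentence system prompt like the one above.
Cap max_tokens. A support bot rarely needs 1,000 tokens, so the 1024 in the route above is a ceiling worth lowering once you see real answer lengths. Smaller caps mean lower bills and faster replies.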
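
To see what these levers are worth, it helps to do the arithmetic before you ship. The sketch below estimates monthly usage cost from traffic and token counts. Every name and number in it is a placeholder of mine, and the prices are inputs you should take from your provider's current price list, not values I'm asserting.

```typescript
// Illustrative cost model. All fields are assumptions you supply yourself.
interface UsageEstimate {
  conversationsPerMonth: number;
  turnsPerConversation: number;
  inputTokensPerTurn: number; // system prompt + retrieved context + question
  outputTokensPerTurn: number; // bounded above by max_tokens
  inputPricePerMTok: number; // USD per million input tokens (placeholder)
  outputPricePerMTok: number; // USD per million output tokens (placeholder)
}

export function estimateMonthlyCost(u: UsageEstimate): {
  monthlyUsd: number;
  perConversationUsd: number;
} {
  const turns = u.conversationsPerMonth * u.turnsPerConversation;
  const inputUsd = ((turns * u.inputTokensPerTurn) / 1e6) * u.inputPricePerMTok;
  const outputUsd = ((turns * u.outputTokensPerTurn) / 1e6) * u.outputPricePerMTok;
  const monthlyUsd = inputUsd + outputUsd;
  const perConversationUsd =
    u.conversationsPerMonth > 0 ? monthlyUsd / u.conversationsPerMonth : 0;
  return { monthlyUsd, perConversationUsd };
}
```

With placeholder inputs of 1,000 conversations a month, 4 turns each, 3,000 input and 300 output tokens per turn, and $1 and $5 per million tokens, this comes to $18 a month, or 1.8 cents per conversation. Note that input tokens dominate here, which is why retrieving fewer or shorter chunks often saves more than switching models.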

The security pitfalls nobody mentions

This is where I see the most damage in production:

Never expose your API key client-side. Calls go through your server route, always. A key in frontend code will be scraped and drained within hours.
Rate-limit per user/IP. Without it, one bad actor can run up a five-figure bill overnight. Add a limiter at the route.
Defend against prompt injection. If your RAG content includes user-generated text (reviews, comments), treat it as untrusted—never let retrieved text override your system instructions, and don't give the model tools it can misuse.
Don't send secrets as context. Whatever you retrieve and pass in can end up in a response. Keep PII and credentials out of the indexed content.
Log and monitor. Track tokens, latency, and "I don't know" rates so you can see quality and cost drift early.
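
The rate-limiting point is the easiest one to act on. Below is a minimal sketch of a fixed-window limiter keyed by user ID or IP. The createRateLimiter name is hypothetical, and it assumes a single long-lived server process. On serverless or multi-instance hosting the in-memory Map is not shared between instances, so you would need a shared store instead.

```typescript
// Hypothetical fixed-window limiter: at most `limit` requests per key
// in each window of `windowMs` milliseconds.
// Caveat: entries are never evicted here, so memory grows with distinct keys.
export function createRateLimiter(
  limit: number,
  windowMs: number,
  now: () => number = Date.now, // injectable clock, handy for tests
): (key: string) => boolean {
  const windows = new Map<string, { start: number; count: number }>();

  // Returns true if the request is allowed, false if the key is over its limit.
  return function allow(key: string): boolean {
    const t = now();
    let w = windows.get(key);
    if (!w || t - w.start >= windowMs) {
      // First request for this key, or the previous window has expired.
      w = { start: t, count: 0 };
      windows.set(key, w);
    }
    if (w.count >= limit) return false;
    w.count += 1;
    return true;
  };
}
```

In the route handler you would create one limiter at module scope, call it with the caller's identifier at the top of POST, and return a 429 response when it returns false. How you obtain a trustworthy client IP depends on your hosting setup.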

Build it yourself or hire out?

Build it yourself if you have a developer comfortable with API routes, you're starting with one feature, and you can own the ongoing tuning. The code above is genuinely most of a v1.

Bring in help if you need RAG quality tuned (chunking, reranking, evals), a natural-language dashboard that writes safe SQL, or production-grade cost/security hardening. That's the work behind my AI chatbot and natural-language dashboard services—the last 20% that separates a demo from something you'd put in front of customers.

Frequently asked questions

How much does it cost to add an AI chatbot to a website? The model usage for a small-to-mid support bot is typically a few cents per conversation—often $20–$200/month depending on volume. The larger cost is engineering: a focused v1 is days, not months. Choosing a small model and capping output keeps usage bills low.

Can I add AI without rebuilding my site? Yes. For most stacks you add one backend endpoint and one frontend widget. Nothing about your existing pages, CMS, or database has to change.

What's the difference between a chatbot and RAG? A plain chatbot answers from the model's training data and will guess about your business. RAG retrieves your actual content first and constrains the model to answer from it, so answers are grounded in your real information and can cite it if you return the retrieved sources alongside the reply.

Do I need a vector database? Only for RAG and semantic search. Simple content generation or a Q&A bot over a tiny, fixed FAQ can skip it. When you do need one, pgvector on Postgres (e.g. Supabase) is usually enough before reaching for a dedicated service.

Which is better, Claude or GPT, for a website assistant? Both are excellent. I default to Claude for instruction-following ("answer only from this context") and use a small, fast tier for support workloads. The right answer is whichever fits your existing tooling and budget—the architecture above works with either.

Thinking about adding AI to your site? I help teams ship focused, production-ready AI features—see AI integration for existing websites or get in touch.