What technologies does Refat Bhuyan specialise in?

Refat specialises in full-stack JavaScript: React, Next.js, Node.js, Express, and MongoDB (MERN stack). On the AI side he works with LangChain, OpenAI GPT-4o, RAG pipelines, and Pinecone. For cloud he deploys on AWS, Azure, GCP, and Vercel. He also builds MCP servers for AI agent tooling.

Is Refat Bhuyan available for hire or freelance projects?

Yes. Refat is open to remote full-time roles, senior contract work (1–6 months), and founding engineer conversations for early-stage products. He works with clients in the UK, US, EU, UAE, Australia, and Singapore. His timezone (GMT+6) overlaps well with UAE (GMT+4), UK (GMT+0/+1), and Australian mornings.

Can Refat Bhuyan build AI-powered applications using LangChain or OpenAI?

Yes. Refat has built production AI systems including a document-grounded customer support chatbot for a UK fintech client using LangChain, GPT-4o-mini, and a Pinecone vector store.

Does Refat Bhuyan work with international clients remotely?

Yes. Refat has been working remotely with international clients for over two years, currently full-time with Cunard Consulting Ltd in the UK. He is available for overlap calls with UK and US timezones.

What is Refat Bhuyan's typical project budget range?

Projects range from $500 for focused API integrations up to $5,000+ for full-stack SaaS builds. For ongoing retainer or contract work rates are discussed based on scope.

Can Refat Bhuyan rescue an AI project that is broken or stuck?

Yes — AI project rescue is one of the most common requests Refat handles. Many clients built 70-80% of a product using tools like Cursor, Claude Code, or ChatGPT and then hit walls: broken RAG pipelines, LangChain hallucinations, apps that work locally but fail in production. Refat audits the codebase, identifies root causes, re-architects where needed, and ships a working, production-ready system. Most rescues are completed in 1–4 weeks.

Does Refat Bhuyan build websites for small and local businesses?

Yes. Refat has helped local businesses — restaurants, retail shops, clinics, and service providers — launch and grow online. One example is EcoEats, a local food delivery business that grew from zero to over 150,000 customers after Refat built their full-stack platform with online ordering, SEO, and an AI-powered chatbot. Services include website builds, online booking/ordering systems, Google-optimised SEO, and 24/7 AI chatbots.

What is context engineering and does Refat Bhuyan offer it?

Context engineering is the practice of designing the full information context provided to AI models — including system prompts, retrieval strategies, tool definitions, memory architecture, and conversation structure — to maximise AI reliability and performance. It goes far beyond basic prompt engineering. Refat applies context engineering when building RAG systems, AI agents, MCP servers, and LangChain-based applications. He is available for consulting and implementation.

Can Refat Bhuyan fix or finish a half-built web application?

Yes. Project rescue — taking over incomplete, broken, or poorly built applications — is a core service. Refat performs a rapid code audit (usually within 48 hours), identifies the issues, proposes a fix plan, and executes it. He has rescued Next.js apps, Node.js APIs, React frontends, and AI integrations. A previous developer vanishing or going silent is a common starting point.

AI Engineering29 May 2026·7 min read

Context Engineering: Why Your AI App Keeps Getting It Wrong

Your AI app hallucinates, ignores your data, or gives confidently wrong answers. The problem is almost certainly not the model — it's your context architecture. Here's what context engineering actually is and a checklist to fix it.

context engineeringLangChainRAGAI developmentfix AI app

You gave your AI app careful instructions. You wrote the prompts. You connected it to your database. And it still returns wrong answers, hallucinates facts, or ignores the context you gave it.

The problem is almost certainly not the model.

It's your context architecture.

What "Context" Actually Means

When you send a message to an AI model, the model only knows three things:

What you told it to be (system prompt)
What data you retrieved and passed to it
The conversation history so far

That's it. The model has no memory. No instinct. No common sense beyond its training. Every response is only as good as what you put in the context window.

Context engineering is the discipline of designing exactly what goes into that window — and what stays out.

The Myths That Break Most AI Apps

Myth 1: "Better prompts will fix my AI app."

The reality: Prompts are about 10% of why an AI app works. The other 90% is what data you retrieve, when you retrieve it, how you chunk it, what you filter out, and how you structure conversation history.

A brilliant prompt with the wrong documents retrieved = wrong answer. A mediocre prompt with exactly the right context = correct answer.

I've seen this dozens of times. The developer rewrites the prompt 20 times. Marginally better. Then I look at the retrieval layer — it's fetching the three least relevant documents in the entire database.

Fix the retrieval. The prompt becomes almost irrelevant.

Myth 2: "More context = better answers."

The reality: This is one of the most damaging mistakes.

Models exhibit what researchers call "lost in the middle" behaviour — they attend strongly to the start and end of the context window, and largely ignore everything in between.

If you retrieve 20 documents when you need 2, the model frequently uses the wrong one.

The principle: give the model exactly what it needs, nothing more.

Good retrieval is about relevance, not volume.

Myth 3: "My retrieval is working, so why are answers wrong?"

The reality: Retrieval and generation are two separate failure modes.

Common generation failures even with good retrieval:

The retrieved text is too long and the key fact is buried in the middle
The question is ambiguous and the model picks the wrong interpretation
The context contains contradictions the model does not resolve
The model defaults to its training data instead of the retrieved content

Fix: Add a re-ranking step after retrieval. Score retrieved documents by relevance before passing to the model. Cut anything below threshold.

Myth 4: "AI hallucinations are a model problem — nothing I can do."

The reality: Hallucinations are almost always a context problem.

When a model hallucinates, it fills in gaps with plausible-sounding content. The gap exists because the context did not have the right information at the right time.

Solutions that actually work:

Grounding: Force the model to only answer from retrieved content — "If the answer is not in the provided documents, say: I don't have that information."
Citation: Require the model to cite which document it is drawing from. This forces precision and makes hallucinations instantly visible.
Confidence gates: If retrieval similarity is below 0.65, skip the LLM entirely and return a fallback response.

Myth 5: "Prompt engineering and context engineering are the same thing."

The reality:

Prompt engineering = how you phrase your instructions.
Context engineering = the entire information architecture of your AI system.

Context engineering includes:

Retrieval strategy — which documents to fetch, how many, by what method
Chunking — how to split documents (chunk size matters enormously for retrieval quality)
Re-ranking — scoring and filtering retrieved results before they reach the model
Memory architecture — what to remember across sessions, how to summarise long conversations
Tool definitions — how you describe available functions to the model
Conversation structure — how you format dialogue history passed to each API call
What you leave out — deliberately removing information that would confuse or distract

Prompt engineering is one tool inside this larger architecture.

The Diagnostic Checklist

If your AI app is behaving badly, run through this in order:

Retrieval

Are you retrieving the right documents? (Log what gets retrieved for failing queries)
Are your chunks the right size? (200–500 tokens is usually optimal)
Are you re-ranking retrieved results by relevance score?
Are you passing too many documents? (Try cutting to top 3)

Generation

Is the model instructed to only use retrieved content?
Are you requiring citations in responses?
Is the conversation history growing too long? (Add summarisation above 4,000 tokens)
Are there contradictions in your context?

Memory

Does the model need cross-session memory? (Add persistent store)
Is conversation history trimmed when it gets long?
Are you passing duplicate information?

Error handling

Do you have fallback responses when retrieval confidence is low?
Are you logging what context went into each failed response?
Do you have retry logic with exponential backoff?

What Good Context Engineering Looks Like

This pipeline works for most RAG applications:

User query
    ↓
Query rewriting        (rephrase for better retrieval)
    ↓
Retrieval              (top 10 candidates)
    ↓
Re-ranking             (cut to top 3 by relevance score)
    ↓
Relevance gate         (if max score < 0.65 → return fallback)
    ↓
Context assembly       (docs + conversation history + system prompt)
    ↓
Generation             (with citation requirement)
    ↓
Response validation    (does it cite? does it stay grounded?)
    ↓
User

Most broken AI apps skip steps 3–6 entirely.

Where to Start If Your App Is Broken Right Now

Step 1 — Log your retrieval. For every failing query, log exactly which documents were retrieved and their similarity scores. You will usually see the problem immediately.

Step 2 — Test retrieval in isolation. Disconnect the LLM and just test whether the right documents come back for your test queries. Fix this layer first.

Step 3 — Add a relevance threshold. If the best retrieved document scores below 0.65 cosine similarity, do not call the LLM. Return: "I don't have reliable information on that." This alone eliminates most hallucinations.

Step 4 — Reduce context. Cut retrieved documents from 10 to 3. You will almost certainly see immediate improvement.

Step 5 — Add grounding instructions. Add to your system prompt: "Only answer using the provided context. Never use information from outside the provided documents."

If you've tried the above and it's still broken, the issue is likely architectural. I do AI project rescue — free 48-hour audit, honest diagnosis, fixed in 1–2 weeks.

Book a free 30-min call or message me on WhatsApp.

Md Refat Bhuyan

Full-Stack Developer & AI Engineer · Cunard Consulting Ltd, UK

Hire Me WhatsApp

← Read more posts