// RAG + signals = grounded foresight

Building a Trend Radar with LLMs (Without Hallucinating the Future)

"Can the LLM tell us which technology will matter in three years?"

This was the question we kept getting from strategy teams at our partner organisations. It's a fair thing to want. It's also a great way to build a system that produces confident-sounding nonsense.

Here's how we built JIH Trend Radar to actually be useful — and the design choices that kept the LLM honest.

What "trend detection" really is

Strip the buzzwords away, and trend detection is three jobs:

  1. Signal collection — find evidence that something is changing (papers, patents, news, repos, talks).
  2. Pattern recognition — spot which signals are clustering, growing, or fading.
  3. Interpretation — explain to a non-expert what it means for their domain.

For decades, this was a job for human analysts. LLMs are very good at job (3) and surprisingly bad at jobs (1) and (2) on their own. The whole architecture is about giving the LLM only job (3), grounded in artefacts produced by very unflashy infrastructure.

[Figure: the four-stage pipeline. Collect (papers, patents) → Cluster (unsupervised ML) → Score (trend velocity) → Explain (LLM + RAG).]
The LLM only enters the pipeline once we already have evidence and a candidate trend.

Where LLMs hallucinate trends

If you ask GPT-4 "what are the emerging trends in industrial AI?", you get a perfectly readable list. Some items are real. Some are last year's news. Some are entirely made up — confident sentences with no underlying signal.

This isn't the LLM being broken; it's the LLM doing its job. It's trained to produce plausible language. "What are emerging trends" doesn't have an objectively correct answer, so the model returns plausible language about emerging trends.

The fix is to never ask that question directly. Instead, you build a system that finds the actual emerging signals first, and only then asks the LLM to describe them.
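In code, that rule can be as blunt as a guard in front of the model call. Here is a minimal sketch, assuming the model is any function that maps a prompt string to a completion string; `CandidateTrend` and `describe_trend` are hypothetical names for illustration, not Trend Radar's actual code:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CandidateTrend:
    """A cluster that already survived clustering and scoring (hypothetical type)."""
    label: str
    evidence: list[str] = field(default_factory=list)  # retrieved source snippets


def describe_trend(trend: CandidateTrend, llm: Callable[[str], str]) -> str:
    """Ask the model to describe a trend we found, never to invent one."""
    if not trend.evidence:
        # No evidence means no LLM call: never fall back to an open-ended question.
        raise ValueError(f"no evidence for candidate trend {trend.label!r}")
    sources = "\n".join(f"- {snippet}" for snippet in trend.evidence)
    prompt = f"Describe the trend '{trend.label}' using only these sources:\n{sources}"
    return llm(prompt)
```

The open-ended "what are the emerging trends?" prompt simply has no code path here.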

What we actually do

Stage 1 — Collect

We pull papers from arXiv and CORDIS, patents from EPO, repos from GitHub, and news from a curated set of feeds. Everything goes into a vector store with date, source, and domain tags.
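As a rough sketch of what each stored record might carry, here is a toy in-memory stand-in for the vector store that keeps the three tags next to the embedding. `SignalDoc`, `InMemorySignalStore` and the tag values are hypothetical; a real deployment would use an actual vector database and its own metadata filters:

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class SignalDoc:
    doc_id: str
    text: str
    published: date
    source: str                   # e.g. "arxiv", "cordis", "epo", "github", "news"
    domain: str                   # e.g. "industrial-ai"
    embedding: tuple[float, ...]  # produced by whatever embedding model you use


class InMemorySignalStore:
    """Toy stand-in for a vector store: documents plus the tags used for filtering."""

    def __init__(self) -> None:
        self._docs: dict[str, SignalDoc] = {}

    def add(self, doc: SignalDoc) -> None:
        # Keyed by id, so re-ingesting the same document overwrites it.
        self._docs[doc.doc_id] = doc

    def filter(
        self,
        domain: Optional[str] = None,
        sources: Optional[Iterable[str]] = None,
        since: Optional[date] = None,
    ) -> list[SignalDoc]:
        allowed = set(sources) if sources is not None else None
        return [
            d for d in self._docs.values()
            if (domain is None or d.domain == domain)
            and (allowed is None or d.source in allowed)
            and (since is None or d.published >= since)
        ]
```

The date tag is what later stages bucket by month; the source tag feeds the diversity check.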

Stage 2 — Cluster

Unsupervised clustering on embeddings gives us candidate "topics" — groups of documents that talk about the same thing. We deduplicate, then check whether each cluster's volume is growing, flat, or shrinking over time. This is the part the LLM doesn't touch.
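The post doesn't name the clustering algorithm, so the sketch below swaps in a simple greedy cosine-threshold clustering (a production system would more likely run something like HDBSCAN or k-means over the embeddings). The growth label comes from a least-squares slope over monthly counts, normalised by the mean. The threshold of 0.8 and tolerance of 0.1 are illustrative assumptions, and deduplication is left out:

```python
import math
from collections import Counter
from datetime import date


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy_cluster(embeddings, threshold=0.8):
    """Put each vector in its most similar cluster if the centroid is close enough, else open a new one."""
    centroids, members = [], []
    for i, vec in enumerate(embeddings):
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            centroids.append(list(vec))
            members.append([i])
        else:
            members[best].append(i)
            n = len(members[best])
            # Running mean keeps the centroid current as members arrive.
            centroids[best] = [(m * (n - 1) + v) / n for m, v in zip(centroids[best], vec)]
    return members


def monthly_counts(dates, months):
    """Documents per (year, month) bucket, in the order given by `months`."""
    per_month = Counter((d.year, d.month) for d in dates)
    return [per_month.get(m, 0) for m in months]


def growth_trend(counts, tolerance=0.1):
    """Label a monthly count series by its least-squares slope relative to its mean."""
    n = len(counts)
    if n < 2:
        return "flat"
    mean_y = sum(counts) / n
    if mean_y == 0:
        return "flat"
    mean_x = (n - 1) / 2
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    den = sum((x - mean_x) ** 2 for x in range(n))
    rel_slope = num / den / mean_y
    if rel_slope > tolerance:
        return "growing"
    if rel_slope < -tolerance:
        return "shrinking"
    return "flat"
```

For example, `growth_trend([1, 2, 4, 8])` is `"growing"` and `growth_trend([5, 5, 5, 5])` is `"flat"`. Normalising by the mean keeps a small cluster that doubles from being ranked below a huge, stagnant one.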

Stage 3 — Score

For each candidate cluster, we compute a few simple metrics: month-over-month growth, source diversity (a trend only in arXiv is interesting; a trend in arXiv + patents + news is more so), and a basic novelty score (how different is this cluster from clusters six months ago?).
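The three metrics are simple enough to write down directly. This is a sketch under stated assumptions: source diversity is taken as a plain count of distinct source types (an entropy-style measure would be an equally valid choice), and novelty is one minus the best cosine match against centroids from an older snapshot. The function names are hypothetical:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mom_growth(counts):
    """Relative change from the previous month to the latest one; None when undefined."""
    if len(counts) < 2 or counts[-2] == 0:
        return None
    return (counts[-1] - counts[-2]) / counts[-2]


def source_diversity(sources):
    """Distinct source types in the cluster: arXiv-only scores 1, arXiv + patents + news scores 3."""
    return len(set(sources))


def novelty(centroid, old_centroids):
    """1 minus the best cosine match among centroids from an earlier snapshot (e.g. six months back)."""
    if not old_centroids:
        return 1.0
    return 1.0 - max(cosine(centroid, old) for old in old_centroids)
```

A cluster whose centroid is nearly identical to one from six months ago scores close to 0 on novelty, however fast it is growing.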

Most clusters fail this stage. That's the point — we want to drop noise before the LLM ever sees it.
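Dropping noise then comes down to a gate over those scores. The thresholds below are hypothetical placeholders, not values from Trend Radar; in practice they would be tuned against analyst feedback:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterScores:
    cluster_id: str
    mom_growth: Optional[float]  # None when growth is undefined (e.g. empty previous month)
    source_diversity: int
    novelty: float


# Hypothetical thresholds for illustration only.
MIN_GROWTH = 0.10
MIN_SOURCES = 2
MIN_NOVELTY = 0.25


def passes_gate(s: ClusterScores) -> bool:
    """A cluster must clear every bar; a single weak metric is enough to drop it."""
    return (
        s.mom_growth is not None
        and s.mom_growth >= MIN_GROWTH
        and s.source_diversity >= MIN_SOURCES
        and s.novelty >= MIN_NOVELTY
    )


def shortlist(scores: list[ClusterScores]) -> list[str]:
    """Ids of the clusters that go on to the Explain stage."""
    return [s.cluster_id for s in scores if passes_gate(s)]
```

Requiring all three bars at once is deliberately strict: it is cheaper to miss a borderline trend than to hand the LLM noise to narrate.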

Stage 4 — Explain

For the small set of clusters that make it through, we hand the LLM:

Then we let the LLM do what it's good at — turning a stack of artefacts into one paragraph a human can read in 30 seconds.
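A minimal sketch of that hand-off, assuming the inputs are a trend label plus the cluster's retrieved chunks, each numbered so the model can cite it. The prompt wording and `build_explain_prompt` are hypothetical:

```python
def build_explain_prompt(trend_label, chunks):
    """chunks: list of (doc_id, text). Returns the prompt and a map from marker number to doc_id."""
    lines = [
        f"Candidate trend: {trend_label}",
        "Write one paragraph a non-expert can read in 30 seconds explaining what is changing.",
        "End every sentence with at least one citation such as [1] pointing to the sources below.",
        "Do not state anything the sources do not support.",
        "",
        "Sources:",
    ]
    citation_map = {}
    for n, (doc_id, text) in enumerate(chunks, start=1):
        lines.append(f"[{n}] {text}")
        citation_map[n] = doc_id  # lets the UI resolve [n] back to the source document
    return "\n".join(lines), citation_map
```

Short numeric markers are easier for the model to reproduce faithfully than long document ids; the returned map translates them back afterwards.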

The honesty layer

Every claim the LLM makes is rendered with a citation marker in the UI. Click it, and you get the source document. No citation, no claim: we drop any sentence the model can't tie back to a retrieved chunk.
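Here is a sketch of the "no citation, no claim" filter, assuming the model was asked to cite numbered markers like `[1]` and that a map from marker to document id is available. `keep_cited_sentences` is a hypothetical name, and the regex sentence splitter is deliberately naive (it will mis-split abbreviations such as "e.g."):

```python
import re

CITATION = re.compile(r"\[(\d+)\]")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def keep_cited_sentences(answer, citation_map):
    """Return (sentence, doc_ids) for sentences citing at least one known chunk; drop the rest."""
    kept = []
    for sentence in SENTENCE_END.split(answer.strip()):
        refs = [int(n) for n in CITATION.findall(sentence)]
        # Markers the prompt never issued (e.g. [9]) count as no citation at all.
        valid = [citation_map[n] for n in refs if n in citation_map]
        if valid:
            kept.append((sentence, valid))
    return kept
```

Note the limit of this check: it proves a sentence points at a real retrieved chunk, not that the chunk actually supports the sentence.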

The LLM is a translator, not an oracle. Anything it says that isn't traceable to a real document is, by construction, deleted before the user sees it.

This is the difference between a "trend radar" and a "hype radar." The hype version produces five exciting paragraphs and lets the reader figure out which are real. Ours produces three paragraphs, each with sources you can read in two clicks.

What still goes wrong

This isn't a solved problem. Three things still break regularly:

  1. The retrieval misses a key document. Then the LLM's interpretation is built on incomplete evidence. We mitigate with overlapping retrievers (BM25 + dense + a knowledge-graph lookup) but it still happens.
  2. The cluster is too broad. "AI in healthcare" is technically a cluster but it's not a trend — it's a field. The fix is human-in-the-loop curation at the cluster level.
  3. The metric is gameable. Anything that uses publication counts can be inflated by spam papers. We're slowly adding citation-weighted versions of every metric.
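The post doesn't say how the BM25, dense and knowledge-graph results are merged. One common option is reciprocal rank fusion, swapped in here as an assumption: each retriever contributes `1 / (k + rank)` per document, so anything ranked highly by even one retriever survives the merge. `k = 60` is the constant usually used with this method:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids (best first) into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

With `bm25 = ["a", "b", "c"]`, `dense = ["b", "d"]` and `kg = ["e", "b"]`, document `"b"` comes out on top because all three retrievers found it, while `"e"`, seen only by the graph lookup, still ranks ahead of lower-ranked text hits.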

The bigger pattern

I'm increasingly convinced that "useful LLM apps" almost always look like Trend Radar from the inside: a lot of unsexy retrieval and scoring infrastructure, with the LLM as the final mile that turns artefacts into language. The LLM gets the credit and the press; the boring parts make the difference between a tool people use and a demo they retweet.

If you want to see it live: jih-trendradar.eu.
