How We Track Whether LLMs Mention Your Site
A behind-the-scenes look at how Vergrank tracks whether ChatGPT, Claude, Perplexity and Gemini mention your site — and how the daily tracking engine actually works.
The part of Vergrank people keep asking about isn’t the SEO automation; it’s how we track whether LLMs mention you. So this post is the technical answer: every day we turn your keywords into real user questions, put them to four LLMs plus a plain search baseline, and measure whether you show up. Here’s how that actually works.
First, a bit of context on why I built this at all.
A classic engineer mistake
I’m the founder of Vergrank, a side project I originally built for myself.
In this new AI era, I create products much faster than before. I was already building products in the pre-AI days, and my biggest problem was always the same: marketing. I’d spend months building something, do very little marketing, get discouraged by the lack of traction, and move on to the next project. A classic engineer mistake.
So I built an SEO tool designed to put marketing on autopilot — it generates pages, tracks competitors, monitors rankings, and handles a lot of the repetitive work involved in SEO. But this post isn’t really about the SEO features. It’s about the part people find most interesting: tracking whether LLMs are recommending you, and who they recommend instead.
The core idea
The approach is simpler than people expect. Once a day, we:
- Take every keyword you’re targeting.
- Turn each one into a realistic question a person would actually ask an LLM.
- Run that question across multiple LLM providers, with web search on.
- Store the full responses.
- Parse them to figure out who got mentioned, who got cited, and where.
Then we analyze the responses to see:
- Whether your website is mentioned
- Which competitors are being recommended
- Which new competitors we discover through the responses
- How your visibility changes over time
- Which sources and providers different LLMs appear to lean on
Each of those steps has a few engineering decisions worth unpacking. Here’s the whole pipeline on one page:
Step 1 — From keyword to a real question
People don’t type at LLMs the way they type at Google. Nobody opens ChatGPT and types “marketing automation nonprofits”. They ask “What’s the best marketing automation tool for a small nonprofit?”
So we don’t query the raw keyword. When a keyword is added, we run it through a tiny, cheap model (gpt-5-nano) with a single instruction: rewrite this SEO keyword as one natural-language question a user might type into ChatGPT or Perplexity. That derived question is what we actually ask: one question per keyword, stored alongside it so every run is consistent and comparable.
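To make the "generate once, store, reuse" idea concrete, here is a minimal Python sketch. It is an illustration, not Vergrank's actual code: the `INSTRUCTION` wording, the `derive_question` function, the `complete` callable (a stand-in for whatever client calls the small model), and the in-memory `cache` are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical instruction text; the post only describes it in prose.
INSTRUCTION = (
    "Rewrite this SEO keyword as one natural-language question a user "
    "might type into ChatGPT or Perplexity. Reply with the question only."
)


def derive_question(keyword: str, complete: Callable[[str], str],
                    cache: dict[str, str]) -> str:
    """Return the stored question for a keyword, generating it on first use.

    `complete` is any function that sends a prompt to a small model and
    returns its text reply (assumed interface). `cache` stands in for the
    database column that stores the question next to the keyword.
    """
    key = " ".join(keyword.lower().split())  # normalize case and whitespace
    if key not in cache:
        prompt = f"{INSTRUCTION}\n\nKeyword: {key}"
        cache[key] = complete(prompt).strip()
    return cache[key]
```

Because the question is generated once and then read back from storage, every daily run asks the same wording, which is what keeps results comparable from day to day.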
Step 2 — Fan out across five engines, every day
A scheduled job runs once a day (06:00 UTC). For each keyword’s question, it queries five engines in parallel:
| Engine | Model | Search |
|---|---|---|
| OpenAI | gpt-5-mini | native web_search tool |
| Anthropic | claude-sonnet-4-6 | native web search tool |
| Google | gemini-2.5-flash | googleSearch grounding |
| Perplexity | sonar-pro | built-in |
| Brave Search | — (search API) | direct results |
Four of those are LLMs answering with web search enabled, so they return an answer plus the citations they leaned on. Brave isn’t an LLM — it’s a plain search API we run as a grounded-search baseline, a stand-in for “what the web says” before a model editorializes over it.
Every call is wrapped so it can retry on failure and won’t double-write if it runs twice — the run has to be safe to re-attempt, because at five engines times every keyword times every client, something always flakes.
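The retry-plus-idempotency pattern described above can be sketched roughly like this. Everything here is assumed for illustration: the `Engine` callable shape, the `(keyword_id, engine, run_date)` key, the backoff numbers, and the plain dict standing in for the database. Real provider SDK calls would live inside each engine function.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical shape: an engine takes a question and returns the raw answer.
Engine = Callable[[str], Awaitable[str]]


async def call_with_retry(engine: Engine, question: str,
                          attempts: int = 3, base_delay: float = 0.5) -> str:
    """Retry a flaky call with exponential backoff; re-raise the last error."""
    for attempt in range(attempts):
        try:
            return await engine(question)
        except Exception:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be >= 1")  # only reached if attempts < 1


async def run_keyword(question: str, keyword_id: int, run_date: str,
                      engines: dict[str, Engine], store: dict,
                      base_delay: float = 0.5) -> None:
    """Query every engine concurrently; skip snapshots that already exist."""
    async def one(name: str, engine: Engine) -> None:
        key = (keyword_id, name, run_date)  # idempotency key (assumed)
        if key in store:
            return  # an earlier attempt already wrote this snapshot
        try:
            store[key] = await call_with_retry(engine, question,
                                               base_delay=base_delay)
        except Exception:
            pass  # leave the slot empty so a later re-run can fill it

    await asyncio.gather(*(one(n, e) for n, e in engines.items()))
```

Keying each write on keyword, engine, and date means a re-run of the whole job only fills in the slots that failed, rather than duplicating the ones that succeeded.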
Step 3 — Did we get mentioned?
When an answer comes back, we extract two things: the raw answer text and the list of cited URLs.
Detecting a mention is deliberately boring and literal, because false positives are worse than misses:
- Citation match: we normalize every cited URL down to its hostname (lowercase, strip www.) and check whether your domain (or a subdomain of it) is in the list. If it is, we also record the position, since being cited first is not the same as being cited fifth.
- Brand match: we check whether your brand name appears in the answer text, even when no link is attached, since models often name a product without linking it.
That gives us, per engine per day: were you mentioned, and where.
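A minimal Python sketch of both checks, using only the standard library. The function names are made up for this example, and the whole-word rule in `brand_mentioned` is an assumption; the post only says the brand name has to appear in the answer text.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def hostname(url: str) -> str:
    """Lowercased hostname with a leading 'www.' removed ('' if none parses)."""
    if "//" not in url:
        url = "//" + url  # let urlsplit treat a bare 'example.com/x' as a host
    try:
        host = (urlsplit(url).hostname or "").lower()
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        return ""
    return host[4:] if host.startswith("www.") else host


def citation_position(cited_urls: list[str], domain: str) -> Optional[int]:
    """1-based position of the first citation on `domain` or a subdomain."""
    domain = domain.lower()
    for position, url in enumerate(cited_urls, start=1):
        host = hostname(url)
        # The leading dot stops 'notexample.com' from matching 'example.com'.
        if host == domain or host.endswith("." + domain):
            return position
    return None


def brand_mentioned(answer: str, brand: str) -> bool:
    """Case-insensitive whole-word match of the brand name (assumed rule)."""
    pattern = r"(?<!\w)" + re.escape(brand) + r"(?!\w)"
    return re.search(pattern, answer, flags=re.IGNORECASE) is not None
```

Comparing hostnames rather than raw URL strings is what keeps this literal: a substring check on the full URL would count `notexample.com` or a `?ref=example.com` query parameter as a hit.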
Step 4 — A per-URL citation graph
Knowing you were mentioned is only half the value. The more useful question is who else the model put in front of the user. So for every response we don’t just store “mentioned: yes/no” — we store one row per cited URL and classify each one:
- You — your own domain.
- A known competitor — matched against your competitor list by domain, domain aliases, or name.
- A neutral source — Reddit, Wikipedia, YouTube, news, review sites, docs, and so on, bucketed by type.
The result is a citation graph: for any keyword, on any engine, on any day, we can see the exact set of sources the model used and how you ranked among them.
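As a sketch of the one-row-per-URL idea, the Python below classifies each cited URL and keeps its position. The `Competitor` type, the tiny `NEUTRAL_BUCKETS` map, and the row fields are illustrative assumptions, not the real schema. It matches competitors by domain and aliases only (name matching is left out), and it adds an explicit "unknown" kind for domains that fit none of the three buckets.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class Competitor:
    name: str
    domains: set[str] = field(default_factory=set)  # primary domain + aliases


# Illustrative only; a real list would be much longer.
NEUTRAL_BUCKETS = {
    "reddit.com": "forum",
    "wikipedia.org": "reference",
    "youtube.com": "video",
}


def _host(url: str) -> str:
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def _matches(host: str, domain: str) -> bool:
    return host == domain or host.endswith("." + domain)


def classify(url: str, own_domain: str,
             competitors: list[Competitor]) -> tuple[str, str]:
    """Return (kind, label) for one cited URL."""
    host = _host(url)
    if _matches(host, own_domain):
        return ("you", own_domain)
    for comp in competitors:
        if any(_matches(host, d) for d in comp.domains):
            return ("competitor", comp.name)
    for domain, bucket in NEUTRAL_BUCKETS.items():
        if _matches(host, domain):
            return ("neutral", bucket)
    return ("unknown", host)  # assumed fourth kind: nothing matched


def citation_rows(cited_urls: list[str], own_domain: str,
                  competitors: list[Competitor]) -> list[dict]:
    """One row per cited URL, keeping its 1-based position in the answer."""
    rows = []
    for position, url in enumerate(cited_urls, start=1):
        kind, label = classify(url, own_domain, competitors)
        rows.append({"position": position, "url": url,
                     "kind": kind, "label": label})
    return rows
```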
Step 5 — Discovering competitors you didn’t know about
This falls out of the citation graph almost for free. Any cited domain that isn’t you and isn’t a known competitor is, by definition, a candidate. After each run we aggregate those unknown domains per account, and once one shows up across at least two different queries — i.e. it’s a pattern, not a fluke — we surface it as a suggested competitor to review.
That’s how the system finds rivals you’ve never heard of: not from a list we maintain, but from whoever the LLMs keep recommending instead of you.
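The "at least two different queries" rule is a small aggregation. Here is one way it could look in Python, assuming the unknown citations for an account arrive as `(query_id, domain)` pairs. That input shape and the function name are assumptions for the sketch.

```python
from collections import defaultdict
from typing import Iterable


def suggest_competitors(unknown_citations: Iterable[tuple[int, str]],
                        min_queries: int = 2) -> list[tuple[str, int]]:
    """Domains cited in at least `min_queries` distinct queries.

    Returns (domain, distinct_query_count) pairs, most widespread first.
    """
    queries_by_domain: dict[str, set[int]] = defaultdict(set)
    for query_id, domain in unknown_citations:
        # A set per domain, so repeat citations within one query count once.
        queries_by_domain[domain].add(query_id)
    hits = [(domain, len(queries))
            for domain, queries in queries_by_domain.items()
            if len(queries) >= min_queries]
    return sorted(hits, key=lambda h: (-h[1], h[0]))
```

Counting distinct queries rather than raw citations matters: a domain cited five times in one answer is still a single data point, while one that appears once each in three different answers is a pattern.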
Step 6 — Visibility over time
Because every run writes a dated snapshot per keyword and engine, trend lines come for free. You can pull the latest mention status across all engines as a matrix, the top competitors cited over a recent window, or the multi-week history for a single keyword on a single engine. The goal is to answer “is our visibility inside LLMs going up or down?” — not just “are we in there today?”
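One simple way to turn dated snapshots into a trend line is a weekly mention rate for a single keyword on a single engine. The sketch below assumes snapshots are available as `(day, mentioned)` pairs; the function name and the choice of ISO weeks are mine, not something the pipeline is known to use.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable


def weekly_mention_rate(
    snapshots: Iterable[tuple[date, bool]],
) -> dict[tuple[int, int], float]:
    """Share of daily runs with a mention, keyed by (ISO year, ISO week)."""
    totals: dict[tuple[int, int], list[int]] = defaultdict(lambda: [0, 0])
    for day, mentioned in snapshots:
        iso = day.isocalendar()
        key = (iso[0], iso[1])            # (ISO year, ISO week number)
        totals[key][0] += int(mentioned)  # runs with a mention
        totals[key][1] += 1               # all runs that week
    return {key: hits / runs for key, (hits, runs) in sorted(totals.items())}
```

Averaging by week smooths out the day-to-day noise in model answers, so a rising rate is more likely to reflect a real change in visibility than one lucky response.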
What I’m actually seeing so far
There’s a lot still to improve, and I need more time and data before drawing strong conclusions.
That said, I’m already seeing progress. Today my site shows up in LLM responses when I query using my own domain name. In the beginning, the models couldn’t even find my website when I searched for the domain itself. It’s still early, but it’s genuinely interesting to watch visibility inside LLMs evolve as a site’s authority and content grow.
This is also exactly the loop Vergrank is built to close: generate the content, measure whether the models pick it up, and feed that back into what to write next.

PS: I’ve attached an image of the agents working — though a few of them look like they’re not working :P