The Data Point
The study found that ChatGPT cites 88.46% of URLs with the “search” ref_type, while Reddit URLs, despite making up a large share of the data points, are cited only 1.93% of the time. Reddit alone accounts for 67.8% of non-cited URLs. Cited URLs also show consistently higher similarity between their titles and the original prompt: a cosine similarity of 0.602, versus 0.484 for non-cited URLs.
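To make the title-to-prompt similarity concrete, here is a minimal sketch of cosine similarity computed over simple word-count vectors. This is an assumption for illustration only: the text representation behind the study's figures is not specified here, so scores from this sketch are not comparable to 0.602 or 0.484. The function names and example strings are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the words the two strings share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical prompt and page titles, not taken from the study.
prompt = "best running shoes for flat feet"
close_title = "Best Running Shoes for Flat Feet: A Buyer's Guide"
loose_title = "What shoes do you all wear? Mine hurt"

print(cosine_similarity(prompt, close_title))
print(cosine_similarity(prompt, loose_title))
```

In this toy setup, the title that restates the prompt scores higher than the conversational, forum-style title, which is the direction of the gap the study reports.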
Why the Algorithm Does This
The finding traces back to ChatGPT’s retrieval process, which uses a gatekeeping layer to decide which pages are worth opening and citing. That initial decision rests on the title, snippet, and URL, and search results dominate the citation pool. The study suggests that ChatGPT prioritizes search results for their relevance and credibility, whereas Reddit content, though useful for understanding topics and gauging consensus, is less likely to be cited.
The Creator / Developer Play
To increase the likelihood of citation, GEO (generative engine optimization) practitioners can optimize their content for search so that their pages rank high in search results. Creating content that matches ChatGPT’s internal sub-questions can also improve relevance and citation rates. Tools like Brand Radar can help identify content gaps, which targeted pages on specific topics and questions can then fill.
What the Research Doesn’t Cover
The study has limitations, including its sample size and its focus on ChatGPT 5.2 prompts from February 2025. The analysis also shows why citation studies must account for data composition and retrieval mechanics: both can distort the findings. Further research is needed to understand the implications fully and to explore other AI engines and their citation mechanisms.
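The point about data composition can be illustrated with a small sketch. The counts below are hypothetical, chosen only to show the arithmetic, and are not taken from the study. Both datasets have identical per-source citation rates, yet the pooled rate changes sharply depending on how many URLs each source contributes.

```python
def citation_rate(cited: int, total: int) -> float:
    """Share of URLs that were cited; 0.0 when there are no URLs."""
    return cited / total if total else 0.0


def pooled_rate(groups: dict[str, tuple[int, int]]) -> float:
    """Citation rate over all sources combined, given (cited, total) per source."""
    cited = sum(c for c, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return citation_rate(cited, total)


# Hypothetical (cited, total) counts. Per-source rates are the same in both
# datasets: 90% for "search" and 2% for "reddit". Only the mix differs.
search_heavy = {"search": (900, 1000), "reddit": (2, 100)}
reddit_heavy = {"search": (90, 100), "reddit": (20, 1000)}

for name, groups in [("search-heavy", search_heavy), ("reddit-heavy", reddit_heavy)]:
    per_source = {src: citation_rate(c, t) for src, (c, t) in groups.items()}
    print(name, per_source, f"pooled={pooled_rate(groups):.1%}")
```

Because the headline number moves with the source mix, reporting per-source rates alongside any pooled figure makes a citation study easier to interpret.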