Bibliometrics for Beginners: Reading the Future in Citations
Bibliometrics sounds like the most boring word in academia. It's actually one of the most useful tools we have for spotting where research-driven fields are about to inflect, and you don't need a PhD to use it.
Here's the friendly version.
The core idea
Scholars are noisy sensors. Each individual paper is a single, biased data point. But thousands of papers together produce a remarkably consistent signal about what's interesting, what's growing, and what's fading. Bibliometrics is the art of reading that signal.
Three things to count
1. Publication volume per year. The simplest signal. A field with rapidly growing publication counts is attracting more researchers, more grant money, or both. Watch the second derivative more than the first: a steady rise is healthy, an inflection is a story.
2. Citation velocity. A paper that gets cited 200 times in its first year is doing something. A paper that is still picking up 50 citations a year in year ten is influential. Both matter, but velocity matters more for foresight, because it tells you what's being built on right now.
3. Co-authorship density. When the same authors keep showing up together across institutions, the field has a small core. When the network suddenly fans out, the field is broadening, which is often a sign that it is moving toward commercial application.
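Here is a minimal sketch of the three counts in plain Python. The function names and input shapes are made up for illustration, and a couple of choices are assumptions rather than anything standard: "second derivative" is read as the second difference of yearly counts, and "density" as the usual graph density (observed co-author pairs divided by possible pairs).

```python
from itertools import combinations


def second_difference(counts_by_year):
    """Second difference of yearly publication counts.

    Returns {year: c[y] - 2*c[y-1] + c[y-2]} for every year that has
    two consecutive predecessors in the data. Years with gaps are skipped.
    """
    years = sorted(counts_by_year)
    result = {}
    for older, middle, year in zip(years, years[1:], years[2:]):
        if year - older != 2:  # the three years are not consecutive
            continue
        result[year] = (
            counts_by_year[year]
            - 2 * counts_by_year[middle]
            + counts_by_year[older]
        )
    return result


def citation_velocity(citing_years, as_of_year, window=1):
    """Citations per year over the `window` years ending at `as_of_year`.

    `citing_years` holds one publication year per citing paper.
    """
    start = as_of_year - window + 1
    recent = sum(1 for y in citing_years if start <= y <= as_of_year)
    return recent / window


def coauthor_density(papers):
    """Density of the co-authorship graph built from author lists.

    Assumption: density = distinct co-author pairs / possible pairs.
    A value near 1.0 suggests a small, tight core; a falling value
    over time suggests the network is fanning out.
    """
    authors = set()
    pairs = set()
    for author_list in papers:
        unique = sorted(set(author_list))
        authors.update(unique)
        pairs.update(combinations(unique, 2))
    n = len(authors)
    if n < 2:
        return 0.0
    return len(pairs) / (n * (n - 1) / 2)
```

A positive second difference means growth is accelerating, which is the inflection worth looking at. To see the network fan out, compute the density separately for each year's papers and watch it change.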
Where it gets useful
Combine the three signals and patterns emerge:
- Pre-take-off: publication volume rising, citation velocity high, network still small and tight.
- Take-off: publication volume rising fast, citation velocity high but spreading, network fanning out across geographies.
- Maturity: publication volume flat or declining, citation velocity steady on a few canonical papers, network stable.
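The three patterns above can be written down as a rough rule table. This is only a sketch: the thresholds, the labels for the network state, and the function name are all hypothetical, and the maturity rule ignores the "few canonical papers" detail for simplicity. Tune everything against a field you already understand.

```python
# Hypothetical thresholds on year-over-year publication growth.
RISING = 0.05        # above this, volume counts as "rising"
RISING_FAST = 0.25   # above this, volume counts as "rising fast"


def classify_stage(volume_growth, velocity_high, network):
    """Map the three signals to a stage label.

    volume_growth: year-over-year growth as a fraction (0.10 = +10%).
    velocity_high: True if recent papers are being cited quickly.
    network: one of "tight", "fanning_out", "stable" (labels assumed here).
    """
    rising = volume_growth > RISING
    rising_fast = volume_growth > RISING_FAST

    if rising_fast and velocity_high and network == "fanning_out":
        return "take-off"
    if rising and velocity_high and network == "tight":
        return "pre-take-off"
    if not rising and network == "stable":
        return "maturity"
    return "unclear"  # signals disagree; look at the data by hand
```

The "unclear" branch is deliberate. When the signals disagree, that is usually more informative than forcing a label.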
You can't predict the future. You can read what stage a field is in, today, with high confidence. That's already a lot.
Two pitfalls
1. Self-citation inflation. Some fields cite themselves into prominence. Always look at the diversity of citing authors, not just the count.
2. Naming drift. "Large language model" was rarely used as a term in 2018. Counting papers with that exact phrase misses everything that's now called an LLM but was published as "transformer-based language model." Cluster papers by embedding similarity rather than matching literal terms.
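For the first pitfall, one simple check is to compare the raw citation count with how many distinct outside authors are behind it. The sketch below assumes you already have author lists for a paper and for each paper citing it. It treats any citing paper that shares an author with the cited paper as a self-citation, which is a deliberately crude definition, and the function name and output keys are made up for illustration.

```python
def citation_diversity(paper_authors, citing_author_lists):
    """Summarise who is citing a paper, not just how often.

    paper_authors: authors of the cited paper.
    citing_author_lists: one author list per citing paper.
    """
    own = set(paper_authors)
    total = len(citing_author_lists)
    self_citations = 0
    outside_authors = set()

    for authors in citing_author_lists:
        authors = set(authors)
        if authors & own:
            # Shares at least one author with the cited paper.
            self_citations += 1
        else:
            outside_authors.update(authors)

    return {
        "citations": total,
        "self_citation_share": self_citations / total if total else 0.0,
        "distinct_outside_authors": len(outside_authors),
    }
```

A high citation count with a high self-citation share and few distinct outside authors is the inflation pattern to watch for. Author names are messy in practice, so stable author IDs work better than name strings when your data source provides them.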
Where to start
OpenAlex, Semantic Scholar, and Crossref give you free APIs over most of the academic record. Pull a few thousand papers in a domain you know, count the three signals, and the field will look different than it does from the inside.
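As a starting point, here is a sketch that asks OpenAlex for publication counts per year for a search phrase, using only the standard library. It assumes the works endpoint accepts `search` and `group_by=publication_year` and returns a `group_by` list of objects with `key` and `count` fields. Check the current OpenAlex documentation before relying on that shape. The function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_counts_url(query):
    """Build a URL that groups matching works by publication year."""
    params = {"search": query, "group_by": "publication_year"}
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


def parse_counts(payload):
    """Turn the assumed `group_by` response into {year: count}, sorted by year.

    Groups whose key is not a plain year (for example "unknown") are skipped.
    """
    counts = {}
    for group in payload.get("group_by", []):
        key = str(group.get("key", ""))
        if key.isdigit():
            counts[int(key)] = int(group["count"])
    return dict(sorted(counts.items()))


def fetch_counts(query, timeout=30):
    """Fetch yearly publication counts for `query` (makes a network call)."""
    with urlopen(build_counts_url(query), timeout=timeout) as response:
        return parse_counts(json.load(response))
```

Calling `fetch_counts("graph neural network")` should give you a year-to-count mapping to work from. Free-text search inherits the naming-drift problem, so treat the result as a first look rather than a clean measurement.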
Bibliometrics doesn't tell you what to do. It tells you what's actually happening underneath the news cycle, which is usually the first step.