Topics endpoint enables summarization, content organization, and trend analysis
We are creating new content online at an unprecedented rate. Globally, we compose 3.6 trillion words every day on email and social media, the equivalent of 36 million books.[1] Managing that volume of text and deriving value from it is feasible only through automation.
Babel Street Text Analytics (formerly Rosette) includes a topic extraction endpoint that can help users do just that. For a given input, /topics extracts “keyphrases” and corresponding “concepts.” Keyphrases are significant phrases or words taken directly from the text that Text Analytics deems to be representative of the content. Concepts are themes detected within the text that may not be explicitly mentioned in the input.
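As a rough illustration of what a call to /topics might look like, the Python sketch below builds (but does not send) an HTTP POST request. The base URL, the API key header name, and the "content" request field are placeholders and assumptions, not confirmed details of the product; check your account documentation for the real values.

```python
import json
import urllib.request

# Hypothetical values: substitute the base URL and key header from your own
# account documentation. Neither is taken from the article.
BASE_URL = "https://api.example.com/rest/v1"  # placeholder, not a real service URL
API_KEY_HEADER = "X-Api-Key"                  # assumed header name


def build_topics_request(text: str, api_key: str) -> urllib.request.Request:
    """Build, without sending, a POST request for the /topics endpoint."""
    # The "content" field name is an assumption about the request body.
    payload = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/topics",
        data=payload,
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


req = build_topics_request("The opioid crisis is a national issue.", api_key="YOUR_KEY")
print(req.get_method(), req.full_url)
```

With real values in place, the request could be sent with urllib.request.urlopen(req) and the JSON body of the reply decoded with json.loads.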
Flash “gist” your documents
Topic extraction goes further in summarizing than entity extraction or categorization because /topics is not constrained by a finite list of recognized entity types or categories.
In its most basic use, topic extraction allows users to quickly review a list of keyphrases and concepts to get the gist of an article or document. On a macro level, the same principle can be applied to a corpus of documents to understand what ideas are most common amongst them. Knowing the keyphrases and concepts in each document enables users to automatically tag, sort, and organize their data, making it more useful to analysts and database managers.
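The corpus-level idea can be sketched in a few lines of Python. The example assumes topics have already been extracted for each document and stored as plain lists of strings; the document IDs, the sample topics, and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def most_common_topics(doc_topics: dict[str, list[str]], n: int = 3) -> list[tuple[str, int]]:
    """Count how many documents mention each topic (case-insensitive)."""
    counts: Counter = Counter()
    for topics in doc_topics.values():
        # A set ensures each topic is counted once per document.
        counts.update({t.lower() for t in topics})
    return counts.most_common(n)


def build_tag_index(doc_topics: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each topic to the IDs of the documents tagged with it."""
    index: defaultdict = defaultdict(list)
    for doc_id, topics in doc_topics.items():
        for topic in sorted({t.lower() for t in topics}):
            index[topic].append(doc_id)
    return dict(index)


# Hypothetical corpus: document ID -> topics extracted for that document.
corpus = {
    "doc1": ["Opioid Crisis", "Heroin"],
    "doc2": ["opioid crisis", "Drug policy reform"],
    "doc3": ["Heroin", "Opioid Crisis"],
}

print(most_common_topics(corpus, n=2))
print(build_tag_index(corpus)["heroin"])
```

The counts show which ideas recur across the corpus, and the index works as a simple tag lookup for sorting and search.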
Taking topic extraction a step further, users can discover trending topics and track how they change over time. For example, marketers and product managers can analyze customer requests and complaints, as well as assess whether their campaigns or new products are shifting customer opinions. Government analysts can follow changes in public opinion and intelligence reports to power anticipatory intelligence systems. Content recommendation engines can automatically rotate suggestions to subscribers according to public interest in addition to personal preferences.
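One simple way to approximate trend tracking is to compare the share of documents that mention each topic in two time periods. The Python sketch below does this under the assumption that topics are already extracted per document; the function name and the sample data are hypothetical, and a production system would likely use more robust statistics.

```python
from collections import Counter


def topic_trends(earlier: list[list[str]], later: list[list[str]]) -> dict[str, float]:
    """Change in the share of documents mentioning each topic between two periods."""

    def shares(docs: list[list[str]]) -> dict[str, float]:
        if not docs:
            return {}
        # Count each topic at most once per document, ignoring case.
        counts = Counter(t for topics in docs for t in {t.lower() for t in topics})
        return {topic: count / len(docs) for topic, count in counts.items()}

    before, after = shares(earlier), shares(later)
    return {
        topic: after.get(topic, 0.0) - before.get(topic, 0.0)
        for topic in before.keys() | after.keys()
    }


# Hypothetical customer feedback, one topic list per document.
last_month = [["pricing"], ["pricing", "shipping"]]
this_month = [["shipping"], ["shipping"], ["pricing", "shipping"], ["returns"]]

trends = topic_trends(last_month, this_month)
for topic, change in sorted(trends.items(), key=lambda item: item[1], reverse=True):
    print(f"{topic}: {change:+.2f}")
```

Positive values indicate topics gaining attention, negative values topics that are fading.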
Topic extraction in action
Take the following excerpt from an article about the opioid abuse epidemic:
US Attorney General: The Opioid Crisis is America's 'Top Lethal Issue'
U.S. Attorney General Trevor Oscar called the opioid crisis “America's top lethal issue” Tuesday, saying that a “comprehensive antidote” was needed to address the crisis. Speaking from the National Alliance for Drug Endangered Children national conference in Green Bay, Wisconsin, Oscar thanked the audience for their work in making the crisis' effects on children known. “Our country, despite the record deaths, I don’t think has fully recognized the damage this addiction nightmare is doing to us,” said Oscar. “And as you understand this epidemic is taking a heavy toll on the most innocent and vulnerable: our children. And yet, in the national conversation about drug abuse, these children are too often forgotten.” Oscar said that the solution has “three-pillars”: prevention, enforcement, and treatment. Oscar added that the prevention step in particular had been discussed at a meeting with top officials, including State Secretary Nicholas Matthews and White House Chief of Staff Brian Smith, the day before. Earlier this month, President Peter Jones vowed that the U.S. would “win” the battle against the heroin and opioid plague, but he stopped short of declaring a national emergency as his handpicked commission had recommended.
The /topics endpoint identifies the following keyphrases:
- “State Secretary Nicholas Matthews”
- “Top Lethal Issue”
- “U.S.”
- “Drug Endangered Children national conference”
- “crisis”
- “U.S. Attorney General Trevor Oscar”
- “US Attorney General”
- “opioid plague”
- “Opioid Crisis”
…and the following concepts:
- “Substance abuse”
- “Harm reduction”
- “Heroin”
- “Controlled Substances Act”
- “Opioid”
- “Trevor Oscar”
- “Drug policy reform”
- “Infinite Crisis”
The keyphrases found include entities such as the person “State Secretary Nicholas Matthews” and the place “U.S.,” as well as more abstract phrases like “Opioid Crisis” that are central to the text. The concepts list goes a step further, recognizing that the excerpt is about “substance abuse” and “drug policy reform,” although neither theme is explicitly stated in the text.
The returned keyphrases and concepts are ranked by salience, the relative importance of each phrase to the overall topic of the text. Salience scores are not currently exposed to the end user, but expect them in a future release when /topics moves from “labs” to fully supported.
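Because results come back already ranked, a client can keep only the first few items of each list without needing the scores themselves. The Python sketch below assumes a response shaped as two lists, "keyphrases" and "concepts", whose items each carry a "phrase" field; that shape and the function name are assumptions for illustration, and the sample values are the first entries from the lists in this article.

```python
def top_topics(response: dict, k: int = 3) -> dict[str, list[str]]:
    """Keep the first k keyphrases and concepts, preserving the returned order."""
    return {
        kind: [item["phrase"] for item in response.get(kind, [])[:k]]
        for kind in ("keyphrases", "concepts")
    }


# Assumed response shape, filled with values from the example in this article.
sample_response = {
    "keyphrases": [
        {"phrase": "State Secretary Nicholas Matthews"},
        {"phrase": "Top Lethal Issue"},
        {"phrase": "U.S."},
        {"phrase": "Drug Endangered Children national conference"},
    ],
    "concepts": [
        {"phrase": "Substance abuse"},
        {"phrase": "Harm reduction"},
        {"phrase": "Heroin"},
    ],
}

summary = top_topics(sample_response, k=2)
print(summary["keyphrases"])
print(summary["concepts"])
```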
Note: The concept extraction feature of the /topics endpoint is designed for documents, not short strings such as social media posts. Because concept extraction extrapolates from the given text, calls on short strings return very noisy results with many false positives.
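One way to act on this note is to discard concepts whenever the input is too short to trust them. The Python sketch below uses a word-count threshold; the value of 50 and the function name are hypothetical choices for illustration, not a documented limit, and should be tuned on your own data.

```python
MIN_WORDS_FOR_CONCEPTS = 50  # hypothetical threshold, not a documented limit


def usable_concepts(text: str, concepts: list[str],
                    min_words: int = MIN_WORDS_FOR_CONCEPTS) -> list[str]:
    """Return the concepts only when the input text is long enough to trust them."""
    if len(text.split()) < min_words:
        return []  # short inputs tend to yield noisy concepts, so drop them
    return list(concepts)


short_post = "Opioid crisis is out of control"
long_document = " ".join(["word"] * 200)  # stand-in for a full-length article

print(usable_concepts(short_post, ["Infinite Crisis"]))
print(usable_concepts(long_document, ["Substance abuse", "Heroin"]))
```

Keyphrases are taken directly from the text, so this guard applies only to concepts.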
Topic extraction can help you summarize and extract key information from articles and documents, and automatically tag them for improved content management and document search.
End Notes
[1] Sam Roves, Smarter Than You Think
Disclaimer: All names, companies, and incidents portrayed in this document are fictitious. No identification with actual persons (living or deceased), places, companies, or products is intended or should be inferred.