How it’s used and how it works

Entity extraction (also known as named entity recognition, or NER) is a type of natural language processing technology that enables computers to analyze text as it is naturally written. Specifically, it pulls out the most important data points (entities) in unstructured text (think news, webpages, text fields). Entities include names of people, places, organizations, and products, as well as dates, email addresses, and phone numbers. Extracted entities can populate a database record about the text. This structure enables higher-level analyses, such as finding relationships between entities, detecting events, and analyzing sentiment around entities.

What is named entity recognition used for?

Better search for e-commerce, business research

Extracted entities make keyword search more accurate. Keywords only match words, whereas entity extraction uses context to know when, for example, “Paris” refers to a city, to a person (“Paris Jones”), or to no entity at all (“plaster of Paris”). In e-commerce, extracting price, clothing features, size, and other product attributes from descriptions lets shoppers filter a search from 200 results down to a browsable 20.

Brand monitoring and intelligence gathering

Want to know what people are saying about a new product launch or about their experience at your hotel? NER is an enabling technology for sentiment analysis, which can track social media buzz or uncover new rivals. Intelligence agencies that track specific people and organizations of interest in message streams can distinguish between similarly named entities (e.g., Richie Fox the astronaut versus Richie Fox the hockey referee) by linking each mention to an entity knowledge base using the context surrounding the entity. (Does the text refer to space or hockey?)
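The linking step can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the knowledge base, its entity IDs, and the keyword lists are all invented for the example, and the scoring is a simple count of shared context words.

```python
import re

# Hypothetical mini knowledge base: two entities share one name.
# The IDs and keyword sets are assumptions made up for this sketch.
KNOWLEDGE_BASE = {
    "richie_fox_astronaut": {
        "name": "Richie Fox",
        "keywords": {"space", "astronaut", "orbit", "launch", "mission"},
    },
    "richie_fox_referee": {
        "name": "Richie Fox",
        "keywords": {"hockey", "referee", "penalty", "puck", "rink"},
    },
}


def link_entity(mention, text):
    """Return the ID of the candidate whose keywords overlap most with the text.

    Returns None when no candidate has the mention's name, when no
    candidate shares any keyword with the text, or when the best score is tied.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        entity_id: len(entry["keywords"] & words)
        for entity_id, entry in KNOWLEDGE_BASE.items()
        if entry["name"] == mention
    }
    if not scores:
        return None
    best = max(scores.values())
    winners = [entity_id for entity_id, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


print(link_entity("Richie Fox", "Richie Fox called a penalty in the hockey game."))
```

Returning None when the context gives no signal is a deliberate choice here: for monitoring use cases, an unlinked mention is usually safer than a wrong link.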

Knowledge graphs, event extraction, fact extraction

Several technologies built on NER push the boundaries of what is possible:

Knowledge graphs visualize the relationships between entities (who is affiliated with which organizations and locations).
Fact extraction answers factual questions (What kills bacteria?).
Event extraction finds who did what to whom, when, and where.

Especially for these advanced technologies, entity extraction must be highly accurate and must chain together different mentions of the same entity, a task known as coreference resolution.
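To make the idea of chaining mentions concrete, here is a deliberately naive Python sketch. It assumes the mentions have already been extracted and groups them purely by overlapping name tokens; the function name and data layout are illustrative. Real coreference resolution also has to handle pronouns, nicknames, and context, none of which this sketch attempts.

```python
def chain_mentions(mentions):
    """Group mentions whose tokens are contained in (or contain) a chain's longest name."""
    chains = []  # each chain: {"canonical": str, "mentions": [str, ...]}
    for mention in mentions:
        tokens = set(mention.lower().split())
        for chain in chains:
            canonical_tokens = set(chain["canonical"].lower().split())
            if tokens <= canonical_tokens or canonical_tokens <= tokens:
                chain["mentions"].append(mention)
                # Keep the longest form seen so far as the chain's canonical name.
                if len(tokens) > len(canonical_tokens):
                    chain["canonical"] = mention
                break
        else:
            # No existing chain matched, so this mention starts a new one.
            chains.append({"canonical": mention, "mentions": [mention]})
    return chains


for chain in chain_mentions(["Richie Fox", "Fox", "Paris Jones", "Jones", "Richie Fox"]):
    print(chain["canonical"], "<-", chain["mentions"])
```

A token-overlap rule like this would wrongly merge two different people who share a surname, which is one reason the surrounding context matters.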

How entity extraction works

Different techniques are used to extract different types of entities.

Machine learning trains models to extract entities such as person, location, and organization where word meaning varies depending on context (e.g., Paris). A corpus of text containing thousands of examples of each entity type is annotated by humans. Then an algorithm trains a statistical model on that data to “learn rules” for predicting which words represent which entity type.
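The following Python sketch shows the shape of that workflow on a toy scale: a handful of hand-annotated sentences, a training step, and a prediction step that uses neighboring words as context. It is a simplified count-based stand-in, not the statistical model a real NER system would use, and the label names (PER, LOC, O for "not an entity"), the features, and the four training sentences are all assumptions chosen for illustration. A real corpus would contain thousands of examples per entity type.

```python
from collections import defaultdict

# Tiny hand-annotated corpus (illustrative only).
TRAINING = [
    (["He", "lives", "in", "Paris"], ["O", "O", "O", "LOC"]),
    (["She", "flew", "to", "Paris"], ["O", "O", "O", "LOC"]),
    (["Paris", "Jones", "spoke"], ["PER", "PER", "O"]),
    (["Paris", "Jones", "said", "hello"], ["PER", "PER", "O", "O"]),
]


def features(tokens, i):
    """Describe a token by itself and its immediate neighbors."""
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    next_word = tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>"
    return [f"word={tokens[i].lower()}", f"prev={prev_word}", f"next={next_word}"]


def train(examples):
    """Count how often each feature occurs with each label."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens, labels in examples:
        for i, label in enumerate(labels):
            for feat in features(tokens, i):
                counts[feat][label] += 1
    return counts


def predict(counts, tokens):
    """Label each token with the label its features co-occurred with most often."""
    predictions = []
    for i in range(len(tokens)):
        scores = defaultdict(int)
        for feat in features(tokens, i):
            for label, n in counts.get(feat, {}).items():
                scores[label] += n
        # Default to "O"; another label wins only with a strictly higher score.
        best_label, best_score = "O", scores.get("O", 0)
        for label, score in sorted(scores.items()):
            if score > best_score:
                best_label, best_score = label, score
        predictions.append(best_label)
    return predictions


model = train(TRAINING)
print(predict(model, ["They", "met", "in", "Paris"]))
print(predict(model, ["Paris", "Jones", "arrived"]))
```

The same word, “Paris,” receives a different label in each sentence because the neighboring words differ, which is the behavior keyword matching alone cannot provide.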

The accuracy of a machine learning model depends on the algorithm used and, even more so, on the quality of the training and test data. Deep learning models can be more accurate than traditional machine learning models, but are currently much slower. Optimizing a model's accuracy for a particular set of data means adapting the statistical model to that data.

The exact match method matches words against a list of entities for each entity type. This method is appropriate for entity types that are finite and unambiguous, such as nationalities. However, since exact match doesn’t consider context, it cannot distinguish between the nationality “Polish” and the common word “polish.”
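A minimal Python sketch of exact matching, assuming a short illustrative list of nationalities and case-insensitive lookup (both are choices made for this example):

```python
import re

# Illustrative list; a real one would cover every nationality of interest.
NATIONALITIES = {"polish", "french", "japanese", "brazilian"}


def exact_match(text, entity_list=NATIONALITIES):
    """Return (word, start_offset) for every word found in the list, ignoring case."""
    return [
        (m.group(), m.start())
        for m in re.finditer(r"[A-Za-z]+", text)
        if m.group().lower() in entity_list
    ]


print(exact_match("The Polish team won."))
print(exact_match("Please polish the silver."))
```

The second call returns a match even though no nationality is mentioned. Making the lookup case-sensitive would only partly help, because a sentence such as “Polish the silver” starts with a capitalized common word.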

Pattern matching is effective for finding entities that follow a particular pattern, such as email addresses, URLs, and phone numbers.
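Pattern matching is commonly done with regular expressions. The Python sketch below uses simplified patterns that are assumptions for illustration: the phone pattern covers only one North American style (three digits, three digits, four digits), and the email and URL patterns are far looser than the full formats allow.

```python
import re

# Simplified, illustrative patterns; real-world formats vary much more widely.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "URL": re.compile(r"https?://[^\s<>\"']+"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def extract_patterns(text):
    """Return a dict mapping each entity type to the list of matches in the text."""
    found = {}
    for entity_type, pattern in PATTERNS.items():
        # Trim sentence punctuation that may trail a match, such as a final period.
        found[entity_type] = [m.rstrip(".,;:!?)") for m in pattern.findall(text)]
    return found


sample = "Email jane.doe@example.com or call 555-123-4567. Details: https://example.com/contact."
print(extract_patterns(sample))
```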

Applications that analyze big data to find insight from patterns and themes in unstructured text depend on entity extraction, and the number of such applications will only continue to grow.

Disclaimer: All names, companies, and incidents portrayed in this document are fictitious. No identification with actual persons (living or deceased), places, companies, or products is intended or should be inferred.

