Astaire Avenue, Garland Drive, Lamarr Avenue, Skelton Circle, and Hepburn Circle are real street names in Culver City, CA, and they are just as prone to spelling errors as person names. In fact, street, city, state/province, and building names frequently include personal names. Identification often involves more than a name: it draws on an entire set of identity attributes, such as address and date of birth.

Extending Babel Street Name Match (formerly Rosette) to match “embedded names” in addresses was the logical next step. Through the /address-similarity endpoint, Name Match applies the same algorithmic smarts to postal addresses that it applies to personal names.

How address matching works

Name Match accepts fielded addresses or unstructured address strings, which it parses into address fields. Depending on the type of field, Name Match applies the appropriate algorithm. For alphanumeric fields like postal code or street number, it applies edit distance, which looks at character-level additions, substitutions, and deletions.
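To make the edit distance idea concrete, here is a minimal sketch of the classic Levenshtein distance applied to alphanumeric fields. It is an illustration only, not Name Match's implementation; the function names and the normalization into a 0-to-1 similarity are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of character-level
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def alphanumeric_similarity(a: str, b: str) -> float:
    """Hypothetical 0..1 score: 1.0 for identical values, lower as more
    edits are needed relative to the longer value."""
    a, b = a.strip().upper(), b.strip().upper()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

Under this sketch, the house numbers “1351” and “135” are one deletion apart, so they score 0.75, while two identical postal codes score 1.0.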

Our specialized name matching algorithms compare text fields like “street name,” “house (aka, building name),” “city,” “province/state,” and “country.” Within each of the text fields, Name Match considers:

VARIATION	EXAMPLE
Phonetics and spelling differences	100 Montvale Ave vs. 100 Montvail Av
Missing address field components	100 Montvale Ave vs. 100 Montvale
Differences in upper and lowercase	100 Montvale Ave vs. 100 MONTVALE AVE
Reordered address components within a field	100 Montvale Ave. vs. 100 Avenue Montvale
Address field abbreviations	Montvale St. vs. Montvale Street

When comparing two addresses, Name Match matches every field of one address against every field of the other address to look for the best match.
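The within-field variations listed in the table can be illustrated with a small token-based comparison. This is a rough sketch under stated assumptions, not the Name Match algorithm: it uses Python's difflib as a stand-in for phonetic and spelling similarity, treats a shared prefix as a likely abbreviation, and the 0.9 prefix score is a made-up value.

```python
from difflib import SequenceMatcher

PREFIX_SCORE = 0.9  # hypothetical score for abbreviation-style prefix matches


def tokens(value: str) -> list:
    """Lowercase the field and strip trailing punctuation from each word."""
    cleaned = (word.strip(".,").lower() for word in value.split())
    return [word for word in cleaned if word]


def token_similarity(a: str, b: str) -> float:
    if a == b:
        return 1.0
    if a.startswith(b) or b.startswith(a):  # "st" vs "street", "av" vs "ave"
        return PREFIX_SCORE
    return SequenceMatcher(None, a, b).ratio()  # stand-in for spelling similarity


def text_field_similarity(a: str, b: str) -> float:
    """Order-insensitive, case-insensitive comparison of two text fields."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    shorter, longer = sorted((ta, tb), key=len)
    total = sum(max(token_similarity(s, t) for t in longer) for s in shorter)
    # Dividing by the longer token count penalizes missing components.
    return total / len(longer)
```

With this sketch, “100 Montvale Ave” and “100 MONTVALE AVE” score 1.0, as do reordered components, while “100 Montvale Ave” against “100 Montvale” drops to two thirds because one component is missing.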

Increased accuracy with address field groups

When Name Match parses unstructured addresses, data may be misfielded. Name Match groups related fields (such as “state, stateDistrict, island” or “city, cityDistrict, suburb”) so that a match between data in related fields reduces the impact of misfielding during address parsing.

For example, if an address that includes the cityDistrict “Williamsburg” is parsed to assign “Williamsburg” to “city,” there will only be a small match penalty for the mismatched fields, because city and cityDistrict belong to the same address field group.

FIELD	ADDRESS 1	ADDRESS 2
House	Hawaii Paradise Apartments
House Number	1351	1351
Road	Washington St.	Washington Street
Unit	312	312
cityDistrict	Brooklyn
City	Williamsburg	Brooklyn
State	NY	New York

On the other hand, if the house field value “Hawaii Paradise Apartments” matches “Hawaii” in the state field of a different address, a large penalty will be assessed because these fields don’t belong to the same address field group.
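The interaction between cross-field matching and address field groups can be sketched as follows. Everything here is an assumption for illustration: the two groups come from the examples above, but the weight values (1.0, 0.9, 0.3), the function names, and the use of difflib as the text similarity are hypothetical and do not reflect Name Match's actual scoring.

```python
from difflib import SequenceMatcher

# Groups taken from the examples in the text; the real product may define more.
FIELD_GROUPS = [
    {"state", "stateDistrict", "island"},
    {"city", "cityDistrict", "suburb"},
]

# Hypothetical weights: small penalty within a group, large penalty across groups.
SAME_FIELD, SAME_GROUP, DIFFERENT_GROUP = 1.0, 0.9, 0.3


def field_weight(field_a: str, field_b: str) -> float:
    if field_a == field_b:
        return SAME_FIELD
    if any(field_a in group and field_b in group for group in FIELD_GROUPS):
        return SAME_GROUP
    return DIFFERENT_GROUP


def text_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(addr1: dict, addr2: dict) -> dict:
    """For each field of addr1, find the best-scoring field of addr2."""
    results = {}
    for f1, v1 in addr1.items():
        candidates = ((f2, text_similarity(v1, v2) * field_weight(f1, f2))
                      for f2, v2 in addr2.items())
        results[f1] = max(candidates, key=lambda pair: pair[1],
                          default=(None, 0.0))
    return results
```

For instance, comparing {"city": "Williamsburg"} with {"cityDistrict": "Williamsburg", "city": "Brooklyn"} pairs city with cityDistrict at a score of 0.9 under these weights, whereas a house value matched against a state value would be scaled by 0.3.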

Locale- and language-specific support

Currently, address matching in Name Match supports U.S., Canada, and UK locales. Locale support means:

Postal code structure and geographic mapping are understood. Thus, even if a postal code is irregularly formatted, it is still recognized. For example, Canadian postal codes are in the pattern “A1A 1A1,” where A is a letter and 1 is a digit, with a space after the third character. If the space were missing or in another position, Name Match would still recognize the Canadian postal code pattern.
Common address abbreviations for supported locales are handled through override files, which map common address words to their abbreviations.
For example, “Pennsylvania” maps to “PA”; “Street” maps to “St.”; “Calle” maps to “CLL.” Spanish street designations like “Calle” are common in California and other parts of the U.S.
Stop words, such as “the” in “the United States,” are removed.
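The three locale behaviors above can be sketched in a few lines. This is an illustrative approximation, not the product's logic: the override table and stop word list are tiny hypothetical stand-ins for the real override files, and the postal code pattern is simplified (it does not exclude the letters that never appear in real Canadian postal codes).

```python
import re

# Hypothetical stand-ins for locale override and stop word files.
OVERRIDES = {"pennsylvania": "pa", "street": "st", "calle": "cll"}
STOP_WORDS = {"the"}

# Simplified pattern: letter-digit-letter digit-letter-digit, no separators.
CANADIAN_POSTAL = re.compile(r"^[A-Z]\d[A-Z]\d[A-Z]\d$")


def normalize_canadian_postal_code(raw: str):
    """Return the code as 'A1A 1A1', or None if it does not fit the pattern.
    Spaces and hyphens are ignored wherever they appear."""
    compact = re.sub(r"[\s-]+", "", raw).upper()
    if CANADIAN_POSTAL.match(compact):
        return f"{compact[:3]} {compact[3:]}"
    return None


def normalize_text_field(value: str) -> list:
    """Lowercase, drop punctuation and stop words, and apply overrides."""
    words = re.findall(r"[^\W_]+", value.lower())
    return [OVERRIDES.get(word, word) for word in words
            if word not in STOP_WORDS]
```

With this sketch, “k1a0b1” and “K1 A0B1” both normalize to “K1A 0B1”, and “Montvale St.” and “Montvale Street” reduce to the same token list.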

Language-only Chinese support is available for matching two addresses written in Chinese script (Hanzi) or one address in Latin script and the other in Chinese script. While Name Match provides Chinese stop words and basic overrides for common Chinese address abbreviations, as of version 7.36.0, it is not yet customized to a particular Chinese-speaking locale or country.

How fuzzy date matching works

Fuzzy date matching complements address and name matching. Name Match can compare partial dates and misordered date components (DDMMYY vs. MMDDYY) for the Gregorian calendar. In particular, the matching engine considers several aspects of dates:

Time: The number of days between Date 1 and Date 2
Year: The difference of the year fields of Date 1 and Date 2
Month: The difference of the month fields of Date 1 and Date 2
Day: The difference of the day fields of Date 1 and Date 2 (even if they are close in time, 1 and 30 are considered far apart)
String distance: Date 1 and Date 2 are converted to a standard format; then the string distance score is calculated based on the edit distance between the two strings.
Time proximity: Based on a given interval of years, Name Match computes the chronological distance between dates in years to determine similarity.
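The aspects listed above can be computed with Python's standard library, as in the sketch below. It is an assumption-laden illustration rather than the engine's scoring: the feature names, the YYYYMMDD standard format, the linear time-proximity formula, the default five-year interval, and the day/month swap check are all hypothetical choices, and partial dates are not handled.

```python
from datetime import date


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def standard_format(d: date) -> str:
    return f"{d.year:04d}{d.month:02d}{d.day:02d}"  # assumed format: YYYYMMDD


def date_features(d1: date, d2: date, year_interval: float = 5.0) -> dict:
    days_apart = abs((d1 - d2).days)
    years_apart = days_apart / 365.25
    return {
        "time_days": days_apart,
        "year_diff": abs(d1.year - d2.year),
        "month_diff": abs(d1.month - d2.month),
        "day_diff": abs(d1.day - d2.day),
        "string_distance": edit_distance(standard_format(d1),
                                         standard_format(d2)),
        # 1.0 for identical dates, falling linearly to 0.0 at year_interval.
        "time_proximity": max(0.0, 1.0 - years_apart / year_interval),
        # Flags a possible DDMM vs. MMDD mix-up within the same year.
        "day_month_swapped": d1.year == d2.year
                             and (d1.day, d1.month) == (d2.month, d2.day),
    }
```

Comparing January 30, 1990 with February 1, 1990 shows why the aspects are scored separately: the dates are only 2 days apart in time, yet the day fields differ by 29.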

Date matching is currently available through the Name Match Enterprise SDK or its Elasticsearch integration.

Who needs fuzzy records matching

Names, addresses, and dates are critical data points to check when matching records in many domains.

Know your customer for financial compliance

Financial institutions are required by know your customer (KYC) regulations to avoid transacting business with known bad actors. Often these customers are other businesses. Suppose a new business customer requests a line of credit from a bank. Before approving it, the bank asks for information, such as the customer’s places of business and names of its executive board. The bank compares the provided information against business directory listings to verify the customer is who it says it is. For example, if a business is applying from the Cayman Islands, but the directory listing shows no offices there, that might be a red flag. Similarly, if the name of the executive applying for credit isn’t listed in the directory, that will be considered in the risk calculation of whether to take on this new client.

Assigning unique IDs

In any system where the use of Social Security numbers or other personal IDs is restricted by privacy rules, such as education or health care, a unique ID might be assigned to each person. These records include person names, nicknames, dates of birth, and current and previous addresses. Using the Elasticsearch plugin, the various record fields can be weighted differently, depending on which data are known to be the most dependable. An overall match score can then take into account the individual scores from fuzzy matching the name, address, and date fields. Being able to positively identify matching or non-matching records prevents the creation of duplicate records, which are costly to ferret out and correct.
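One simple way to picture field weighting is a weighted average of per-field scores, sketched below. This is not how the Elasticsearch plugin computes its score; the function name, the field names, and the weights are hypothetical, chosen only to show how more dependable fields can count for more.

```python
from typing import Mapping


def overall_match_score(scores: Mapping[str, float],
                        weights: Mapping[str, float]) -> float:
    """Weighted average of per-field match scores (each in 0..1).
    Fields without a weight are ignored."""
    weighted = [(scores[field], weights[field])
                for field in scores if field in weights]
    total_weight = sum(weight for _, weight in weighted)
    if total_weight == 0:
        return 0.0
    return sum(score * weight for score, weight in weighted) / total_weight


# Hypothetical example: the name is trusted twice as much as the other fields.
example = overall_match_score(
    {"name": 0.9, "address": 0.8, "date_of_birth": 1.0},
    {"name": 2.0, "address": 1.0, "date_of_birth": 1.0},
)
```

Raising the weight of a field that is known to be reliable, such as date of birth in a well-curated registry, pulls the overall score toward that field's result.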