Entity Resolution for Better Identification and Intelligence


The Problem - Entity Identification and Unification

Real world entities such as people, vehicles, incidents and many others are represented in information systems as sets of properties. A real world person's properties may include a social security number (SSN), name, date of birth (DOB), height, weight, hair color, addresses, etc. These representations may differ for a single entity, both within a single information system and across disparate systems. For example, a person may be represented in multiple database records, each with some identical, some conflicting and some missing values for the same properties.

For instance, consider the following records representing a person, which may exist in one or more information systems:

Property        Record 1    Record 2    Record 3    Record 4
Last Name       Doe         Doe         Doe         Doe
First Name      John        John        John        Jon
Middle Initial  A.          A.          Null        No data
SSN             012345678   012345679   Null        012345678
DOB             01/01/1970  01/01/1970  01/01/1970  01/01/1971

Do these records represent one, two, three or four different real world people? Imagine that each record contains values for dozens more properties (height, weight, hair color, addresses, aliases, etc.). When a user searches for information about the real world "John Doe", which values of the properties will they see? If we want to analyze which people are linked together in a social network in various contexts (family, business, criminal, etc.), it becomes even more crucial to determine which records represent the same real world entity.

Svivot's Solution - Unified Entities  

Svivot's LEC (Logical Entity Constructor) entity resolution software can group together entity-related data records (from one or more sources) into a "Unified Entity" representing a single real world entity. The system can use simple rules or more complex reconciliation algorithms. An example of a simple rule is: two data records with exactly the same social security number are considered to be the same unified person. An example of reconciliation is finding two records whose social security numbers are identical except for a single digit.
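To make the distinction concrete, the following is a minimal Python sketch of the two kinds of check described above. It is not LEC's implementation; the function names (normalize_ssn, same_ssn, ssn_off_by_one_digit) are hypothetical, and the handling of missing values as "no evidence" is an assumption made for the illustration.

```python
def normalize_ssn(ssn):
    """Return only the digits of an SSN, or None when the value is missing."""
    if ssn is None:
        return None
    digits = "".join(ch for ch in str(ssn) if ch.isdigit())
    return digits or None


def same_ssn(a, b):
    """Simple rule: both SSNs are present and exactly equal."""
    a, b = normalize_ssn(a), normalize_ssn(b)
    return a is not None and a == b


def ssn_off_by_one_digit(a, b):
    """Reconciliation: same length and exactly one differing digit."""
    a, b = normalize_ssn(a), normalize_ssn(b)
    if a is None or b is None or len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1
```

Applied to the sample records, Records 1 and 4 satisfy the simple rule, while Records 1 and 2 satisfy only the reconciliation check. Record 3, which has no SSN, satisfies neither.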

The system administrator determines which combinations of rules and reconciliations are used to compare individual records and groups of records. The system automatically calculates an overall similarity score for the records, and those that exceed a particular threshold are grouped together into a unified entity.
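The scoring and threshold step can be sketched as follows. This is only an illustration under stated assumptions: the weights, the similarity formula (weighted share of agreeing properties among those present in both records) and the pairwise, transitive grouping are all hypothetical choices, not LEC's actual calculation. The sample data is the four-record table from earlier in this article, with missing values stored as None.

```python
from itertools import combinations

RECORDS = {
    1: {"last": "Doe", "first": "John", "mi": "A.", "ssn": "012345678", "dob": "01/01/1970"},
    2: {"last": "Doe", "first": "John", "mi": "A.", "ssn": "012345679", "dob": "01/01/1970"},
    3: {"last": "Doe", "first": "John", "mi": None, "ssn": None, "dob": "01/01/1970"},
    4: {"last": "Doe", "first": "Jon", "mi": None, "ssn": "012345678", "dob": "01/01/1971"},
}

# Hypothetical weights an administrator might assign to each property.
WEIGHTS = {"last": 2, "first": 2, "mi": 1, "ssn": 3, "dob": 2}


def similarity(a, b):
    """Weighted share of agreeing properties among those present in both records."""
    agree = total = 0
    for prop, weight in WEIGHTS.items():
        if a[prop] is None or b[prop] is None:
            continue  # a missing value counts neither for nor against
        total += weight
        if a[prop] == b[prop]:
            agree += weight
    return agree / total if total else 0.0


def unify(records, threshold):
    """Group records whose pairwise similarity reaches the threshold (transitively)."""
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(records, 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), []).append(rid)
    return sorted(groups.values())


print(unify(RECORDS, 0.65))  # [[1, 2, 3], [4]]
print(unify(RECORDS, 0.5))   # [[1, 2, 3, 4]]
```

With these assumed weights, lowering the threshold from 0.65 to 0.5 pulls Record 4 into the same unified entity, which shows how sensitive the outcome is to the configured threshold. Note also that transitive grouping can chain records together through an intermediate record; a stricter design would compare each record against the whole group.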

LEC retains the original records, which can be displayed together with the unified entity so that the user can examine them in detail.

LEC also creates correlations between different unified entities which have some similar and some conflicting attributes. For instance, two different unified person entities with the same name and date of birth but conflicting social security numbers can be linked, indicating that although they have conflicting properties, they may represent the same real world entity.
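A correlation of this kind can be sketched in Python as shown below. The data model (each unified entity holding a set of observed values per property), the entity identifiers and the function name correlate are hypothetical and chosen only for this illustration; the source does not describe how LEC stores or computes correlations.

```python
from itertools import combinations

# Hypothetical unified entities: each property holds the set of values
# observed across the entity's original records.
ENTITIES = {
    "U1": {"name": {"John Doe"}, "dob": {"01/01/1970"}, "ssn": {"012345678"}},
    "U2": {"name": {"John Doe"}, "dob": {"01/01/1970"}, "ssn": {"012345679"}},
    "U3": {"name": {"Jane Roe"}, "dob": {"02/02/1980"}, "ssn": {"987654321"}},
}


def correlate(entities, must_match=("name", "dob"), conflict_on=("ssn",)):
    """Link entity pairs that agree on must_match but conflict on conflict_on."""
    links = []
    for a, b in combinations(sorted(entities), 2):
        ea, eb = entities[a], entities[b]
        # Agreement: the two entities share at least one value for each property.
        agrees = all(ea[p] & eb[p] for p in must_match)
        # Conflict: both have values for the property, but none in common.
        conflicts = [p for p in conflict_on
                     if ea[p] and eb[p] and not (ea[p] & eb[p])]
        if agrees and conflicts:
            links.append((a, b, conflicts))
    return links


print(correlate(ENTITIES))  # [('U1', 'U2', ['ssn'])]
```

The link records which properties conflict, so an analyst reviewing U1 and U2 can see why the two entities were kept separate but flagged as possibly the same person.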

The rules that define the criteria for unification, and the minimum probability threshold for grouping an individual record into a specific unified entity, are configurable and easily adjusted, which makes the system adaptable to a wide variety of applications.

All of the information pertaining to the unified entities, the links to the original data records and correlations between the unified entities is recorded in a central database.  

Therefore, the results of the entity resolution are available for use by any standard search engine or analytical application. Svivot's Contextor software is especially useful for visualizing the relationships and data, as well as for effectively producing intelligence with its information sharing and link analysis applications.
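As an illustration of how a central database of this kind might be queried by another application, here is a small Python sketch using an in-memory SQLite database. The schema is an assumption made for the example: the table and column names (unified_entity, source_record, correlation) and the sample rows are hypothetical and do not describe LEC's actual storage format.

```python
import sqlite3

# Hypothetical schema: unified entities, links to original records,
# and correlations between unified entities.
SCHEMA = """
CREATE TABLE unified_entity (
    id INTEGER PRIMARY KEY,
    entity_type TEXT NOT NULL
);
CREATE TABLE source_record (
    id INTEGER PRIMARY KEY,
    source_system TEXT NOT NULL,
    unified_entity_id INTEGER REFERENCES unified_entity(id)
);
CREATE TABLE correlation (
    entity_a INTEGER NOT NULL REFERENCES unified_entity(id),
    entity_b INTEGER NOT NULL REFERENCES unified_entity(id),
    reason TEXT NOT NULL,
    PRIMARY KEY (entity_a, entity_b)
);
"""


def build_demo_db():
    """Create an in-memory database populated with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO unified_entity VALUES (?, ?)",
                     [(1, "person"), (2, "person")])
    conn.executemany("INSERT INTO source_record VALUES (?, ?, ?)",
                     [(1, "system_a", 1), (2, "system_b", 2), (3, "system_a", 1)])
    conn.execute("INSERT INTO correlation VALUES (?, ?, ?)",
                 (1, 2, "same name and DOB, conflicting SSN"))
    conn.commit()
    return conn


def records_for(conn, entity_id):
    """Return the ids of the original records behind one unified entity."""
    rows = conn.execute(
        "SELECT id FROM source_record WHERE unified_entity_id = ? ORDER BY id",
        (entity_id,),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(records_for(conn, 1))  # [1, 3]
```

Keeping the original records in their own table, linked to the unified entity by a key, lets any SQL-capable tool move from a unified entity back to the source data it was built from.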