The SN-Sphere ETL Framework broadens the range of data sources available for intelligence analysis. Its objective is to integrate heterogeneous types of Link Support Information (LSI) from multiple sources in preparation for effective information sharing and analysis. Data pumps and a process manager carry out the ETL (extraction, transformation, and loading) of data in various types and formats.
The data can be extracted from practically any format including, but not limited to, SQL-based databases, XML files, text files, MS Excel documents, MS Access databases, and legacy systems. The data used in a specific solution can therefore come from almost any source.
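As an illustration of format-independent extraction, the following minimal sketch reads a delimited text source and an XML source into the same list-of-dictionaries shape. It is not the SN-Sphere data pump implementation: the function names `extract_csv` and `extract_xml`, the field names, and the sample data are all assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def extract_csv(text):
    """Return one dict per row of a delimited text source (hypothetical helper)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def extract_xml(text, record_tag):
    """Return one dict per <record_tag> element, keyed by child tag name (hypothetical helper)."""
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "").strip() for child in elem}
        for elem in root.iter(record_tag)
    ]


# Invented sample data standing in for two different source formats.
csv_source = "name,license\nAda Lovelace,D123\n"
xml_source = (
    "<people><person><name>Ada Lovelace</name>"
    "<ssn>000-00-0001</ssn></person></people>"
)

# Both extractors yield the same record shape, so later stages need not
# know which format a record came from.
records = extract_csv(csv_source) + extract_xml(xml_source, "person")
```

Giving every extractor the same output shape is one way to keep the transformation stage independent of the source format.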
The data is transformed into canonical forms that are designed to optimize analysis. This step applies the business logic specific to each type of data in order to build the LSI. The processing also attempts to identify records in different data sources that refer to the same real-world entity. For instance, the same person may be identified by a driver's license number in one data source and by a social security number in another.
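The entity matching described above can be sketched as follows. The source does not state which matching rule SN-Sphere uses, so this example assumes a deliberately simple one: records are treated as the same person when their normalized name and date of birth agree. The names `canonical_key` and `resolve`, the record layout, and the sample data are hypothetical.

```python
from collections import defaultdict


def canonical_key(record):
    """Assumed match key: case-folded, whitespace-normalized name plus date of birth."""
    name = " ".join(record["name"].casefold().split())
    return (name, record["dob"])


def resolve(records):
    """Group records that share a match key and merge their identifiers."""
    merged = defaultdict(dict)
    for record in records:
        merged[canonical_key(record)].update(record["ids"])
    return dict(merged)


# Invented records: the first two describe the same person under
# different identifiers and with different name formatting.
records = [
    {"name": "Ada  Lovelace", "dob": "1815-12-10", "ids": {"drivers_license": "D123"}},
    {"name": "ada lovelace", "dob": "1815-12-10", "ids": {"ssn": "000-00-0001"}},
    {"name": "Charles Babbage", "dob": "1791-12-26", "ids": {"ssn": "000-00-0002"}},
]

entities = resolve(records)
```

Exact-key matching of this kind misses spelling variants and can merge distinct people who share a name and birth date, so a production system would likely need fuzzier comparison and conflict handling.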
Finally, the transformed data is loaded into a central database that stores the entity data and LSI. Other SN-Sphere components and external applications use this database for processing and analysis.
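A minimal sketch of the load step, using an in-memory SQLite database in place of the central database. The schema is an assumption for illustration only: the source does not describe the actual tables, so the `entity` and `link` tables, their columns, and the `load` function are hypothetical.

```python
import sqlite3


def load(conn, entities, links):
    """Create the assumed tables if needed and insert entities and links."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entity ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS link ("
        "source_id INTEGER REFERENCES entity(id), "
        "target_id INTEGER REFERENCES entity(id), "
        "kind TEXT NOT NULL)"
    )
    # The connection context manager commits on success and rolls back
    # on error, so a failed batch does not leave partial data behind.
    with conn:
        conn.executemany("INSERT INTO entity (id, name) VALUES (?, ?)", entities)
        conn.executemany(
            "INSERT INTO link (source_id, target_id, kind) VALUES (?, ?, ?)", links
        )


conn = sqlite3.connect(":memory:")
load(
    conn,
    entities=[(1, "Ada Lovelace"), (2, "Charles Babbage")],
    links=[(1, 2, "correspondent")],
)

entity_count = conn.execute("SELECT COUNT(*) FROM entity").fetchone()[0]
link_count = conn.execute("SELECT COUNT(*) FROM link").fetchone()[0]
```

Storing entities and links in separate tables lets downstream components query relationships without re-reading the original sources.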
The data fusion elements of a solution are optional. The other SN-Sphere components can also use one or more existing normalized databases in which data has been collected by other systems.