Data used by NAS for link analysis is pre-processed into a proprietary structure called PreStar™, which allows link analysis to run many times faster than it would on the unprocessed data. Link analysis on large quantities of data (millions or billions of records) without pre-processing is impractical because of the very long processing time it would require.
Pre-processing runs on standard servers. Each server uses the definitions created in LSI Designer to determine which data to process and where to store the results. It can be started manually by the system administrator or run as a scheduled task. It can also be started automatically by the Data Fusioner, so that as soon as data has been pumped into the database, the appropriate PreStar server starts processing it. The main features include:
SN-Sphere™ uses a paging mechanism that allows the ETL, entity identification and pre-processing into PreStar form to take place simultaneously on multiple servers. This makes the processing scalable to virtually any quantity of data.
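The general idea of paging work across several workers can be sketched as follows. This is only an illustration under stated assumptions: the actual SN-Sphere paging mechanism is not described here, threads stand in for separate servers, and the names pages, process_page and process_all, as well as the per-page task (counting links per initiating ID), are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def pages(records, page_size):
    """Yield consecutive fixed-size pages (slices) of the record list."""
    for start in range(0, len(records), page_size):
        yield records[start:start + page_size]


def process_page(page):
    """Hypothetical per-page work: count links per initiating ID (A)."""
    counts = {}
    for a, _b in page:
        counts[a] = counts.get(a, 0) + 1
    return counts


def process_all(records, page_size=2, workers=2):
    """Process pages concurrently (threads stand in for servers) and merge."""
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_page, pages(records, page_size)):
            for key, n in partial.items():
                merged[key] = merged.get(key, 0) + n
    return merged


# Illustrative (A, B) pairs; the phone numbers are made up.
records = [
    ("555-0100", "555-0101"),
    ("555-0100", "555-0102"),
    ("555-0103", "555-0100"),
]
result = process_all(records)
```

Because each page is processed independently and the partial results are merged afterwards, adding workers in this sketch does not change the result, only how the work is divided.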
Storage volume and compression
Storage volume is an important factor in the cost and maintenance of a data warehouse that stores tens of billions of records. LSI is normally stored in databases and data warehouses in ABD structured tables where:
A = ID of one entity such as a phone number initiating a call
B = ID of a linked entity such as a phone number receiving the call
D = Descriptive call attributes (date, time, etc.)
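As a minimal sketch, one ABD row could be modelled as shown below. The class name AbdRecord, the field names and the sample values are assumptions made for illustration; the source does not specify a schema beyond the A, B and D roles.

```python
from dataclasses import dataclass, field


@dataclass
class AbdRecord:
    a: str  # A: ID of one entity, e.g. the phone number initiating a call
    b: str  # B: ID of the linked entity, e.g. the number receiving the call
    d: dict = field(default_factory=dict)  # D: descriptive attributes


# Hypothetical call record; the values are invented.
call = AbdRecord(
    a="555-0100",
    b="555-0101",
    d={"date": "2024-01-15", "time": "09:30:00"},
)
```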
Storing tens of billions of records in ABD structured tables can take terabytes of space. PreStar stores the pre-processed data in a more space-efficient format. Once the data has been transformed into the PreStar tables, it can be deleted from the original ABD tables. If needed, it can later be reconstructed from the PreStar tables back into ABD tables for use with other standard tools.
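The PreStar format itself is proprietary and is not described here. The sketch below only illustrates the general principle that a pre-processed layout can remain reversible: it groups ABD rows by A, so each initiating ID is stored once, and then rebuilds the flat rows. The functions to_grouped and to_abd and the grouping scheme are assumptions, not the actual PreStar structure.

```python
from collections import defaultdict


def to_grouped(abd_rows):
    """Group flat (A, B, D) rows by A so each A value is stored only once."""
    grouped = defaultdict(list)
    for a, b, d in abd_rows:
        grouped[a].append((b, d))
    return dict(grouped)


def to_abd(grouped):
    """Rebuild flat (A, B, D) rows from the grouped form."""
    return [(a, b, d) for a, links in grouped.items() for b, d in links]


# Invented sample rows; D is shortened to a single timestamp string.
rows = [
    ("555-0100", "555-0101", "2024-01-15 09:30"),
    ("555-0103", "555-0100", "2024-01-15 10:05"),
    ("555-0100", "555-0102", "2024-01-16 14:45"),
]
grouped = to_grouped(rows)
restored = to_abd(grouped)
```

The rebuilt rows contain the same records as the input, although their order may differ from the original table.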
Much of the data in the PreStar tables is stored in BLOB (binary large object) fields, and PreStar compresses the data in those fields. The compressed data requires about one third of the storage volume that the uncompressed data would need.
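To illustrate compressing link data before it is written to a BLOB field, here is a sketch using Python's standard zlib module. The compression algorithm PreStar uses is not stated in the source, so zlib, the JSON serialization and the helper names pack_blob and unpack_blob are all assumptions. The ratio obtained on this synthetic, highly repetitive sample is not representative of real data.

```python
import json
import zlib


def pack_blob(links):
    """Serialize a list of links to compact JSON and compress it with zlib."""
    raw = json.dumps(links, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 9)


def unpack_blob(blob):
    """Decompress a blob and restore the original list of links."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Synthetic sample: 200 invented links for one initiating ID.
links = [
    ["555-%04d" % (100 + i % 20), "2024-01-%02d 09:30" % (1 + i % 28)]
    for i in range(200)
]
raw_size = len(json.dumps(links, separators=(",", ":")).encode("utf-8"))
blob = pack_blob(links)
ratio = len(blob) / raw_size  # fraction of the uncompressed size
```

Calling unpack_blob(blob) returns the original list, which shows that this kind of compression is lossless.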