PreStar™

 

Data used by NAS for link analysis is pre-processed into a proprietary structure called PreStar™, which allows link analysis to run many times faster than it would on the unprocessed data. Link analysis on large quantities of data (millions or billions of records) is not practical without pre-processing, because the processing time required would be very long.

Pre-processing runs on standard servers. Each PreStar server uses the definitions created in LSI Designer to determine which data to process and where to store the results. A server can be started manually by the system administrator or run as a scheduled task. It can also be started automatically by the Data Fusioner, so that the appropriate PreStar server begins processing as soon as the data has been pumped into the database.

The main features include:

  • Speed: Analysis is many times faster than on unprocessed data.

  • Reduced storage volume: Pre-processing compresses the data to about one third of its unprocessed size.

  • Modularity: The system architecture is modular and the amount of data which can be pre-processed is a function of the hardware infrastructure (servers, disks, network speed, etc.).

SN-Sphere™ uses a paging mechanism that allows ETL, entity identification and pre-processing into PreStar form to run simultaneously on multiple servers. This makes the processing scalable to virtually any quantity of data.
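The paging idea can be illustrated with a minimal Python sketch. It is not the SN-Sphere implementation, which is not described here. It assumes that records can be split into fixed-size pages and that each page can be processed independently, and it uses a thread pool as a stand-in for multiple servers. The names pages, process_page and process_all are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def pages(records, page_size):
    """Yield successive pages (lists of at most page_size records)."""
    it = iter(records)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


def process_page(page):
    """Hypothetical stand-in for the per-page work (ETL, entity
    identification, pre-processing). Here it only counts the links
    initiated by each entity A."""
    counts = {}
    for a, _b, _d in page:
        counts[a] = counts.get(a, 0) + 1
    return counts


def process_all(records, page_size=2, workers=4):
    """Process pages concurrently and merge the per-page results.
    Worker threads stand in for separate servers (an assumption)."""
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_page, pages(records, page_size)):
            for key, n in partial.items():
                merged[key] = merged.get(key, 0) + n
    return merged


# Invented sample data: (A, B, D) call records.
records = [
    ("555-0100", "555-0101", "2020-01-01 09:00"),
    ("555-0100", "555-0102", "2020-01-01 09:05"),
    ("555-0101", "555-0100", "2020-01-01 09:10"),
    ("555-0102", "555-0100", "2020-01-02 10:00"),
    ("555-0100", "555-0101", "2020-01-02 10:30"),
]
link_counts = process_all(records)
```

Because each page is handled independently, adding workers increases throughput without changing the merge step, which is the property the paging mechanism relies on.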

Storage volume and compression

Storage volume is an important factor in the cost and maintenance of a data warehouse that stores tens of billions of records. LSI is normally stored in databases and data warehouses in ABD-structured tables, where:

A = ID of one entity such as a phone number initiating a call

B = ID of a linked entity such as a phone number receiving the call

D = Descriptive call attributes (date, time, etc.)
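As a rough illustration of the ABD layout, the following Python sketch models one row as a small record type. The class name AbdRecord, the field names and the sample values are assumptions made for this example and do not come from the product. The helper links_of shows why link analysis on a raw ABD table is slow: finding the links of one entity requires scanning every row.

```python
from dataclasses import dataclass, field


@dataclass
class AbdRecord:
    a: str                                  # A: ID of one entity (e.g. calling number)
    b: str                                  # B: ID of the linked entity (e.g. called number)
    d: dict = field(default_factory=dict)   # D: descriptive attributes (date, time, etc.)


def links_of(records, entity_id):
    """Return every record in which entity_id appears as A or B.
    This is a full scan: the cost grows with the total number of rows."""
    return [r for r in records if r.a == entity_id or r.b == entity_id]


# Invented sample data.
calls = [
    AbdRecord("555-0100", "555-0101", {"date": "2020-01-01", "time": "09:00"}),
    AbdRecord("555-0102", "555-0100", {"date": "2020-01-01", "time": "09:05"}),
    AbdRecord("555-0101", "555-0102", {"date": "2020-01-02", "time": "10:00"}),
]
found = links_of(calls, "555-0100")
```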

Storing tens of billions of records in ABD-structured tables can take terabytes of space. PreStar stores the pre-processed data in a more space-efficient format. Once the data has been transformed into the PreStar tables, it can be deleted from the original ABD tables. If needed, it can later be reconstructed from the PreStar tables back into ABD tables for use with other standard tools.
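The PreStar format itself is proprietary and is not described here. The sketch below only illustrates the general idea of a lossless transformation: ABD rows are grouped by entity A (an assumption chosen for this example, not the actual PreStar layout) and can then be expanded back into the same set of ABD rows. The function names to_grouped and to_abd are hypothetical.

```python
from collections import defaultdict


def to_grouped(abd_rows):
    """Group (A, B, D) rows by A, so A is stored once per entity."""
    grouped = defaultdict(list)
    for a, b, d in abd_rows:
        grouped[a].append((b, d))
    return dict(grouped)


def to_abd(grouped):
    """Expand the grouped form back into flat (A, B, D) rows."""
    return [(a, b, d) for a, links in grouped.items() for b, d in links]


# Invented sample data.
rows = [
    ("555-0100", "555-0101", "2020-01-01 09:00"),
    ("555-0101", "555-0100", "2020-01-01 09:10"),
    ("555-0100", "555-0102", "2020-01-01 09:05"),
]
grouped = to_grouped(rows)
restored = to_abd(grouped)
```

The restored rows contain the same records as the original, although their order may differ, so the original table can safely be deleted once the grouped form exists.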

Much of the data in the PreStar tables is stored in BLOB (binary large object) fields. PreStar compresses the data in those fields. The compressed data requires about one third of the storage volume that the uncompressed data would need.
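The compression algorithm PreStar uses is not stated. As a hedged illustration of compressing link data into a binary field, the sketch below serializes the links of one entity to JSON and compresses them with zlib from the Python standard library. Both choices are assumptions for this example, and the ratio obtained depends entirely on the data, so it should not be read as the PreStar figure. The names pack_links and unpack_links are hypothetical.

```python
import json
import zlib


def pack_links(links):
    """Serialize a list of (B, D) links and compress it into a blob."""
    raw = json.dumps(links, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 9)


def unpack_links(blob):
    """Decompress a blob and restore the list of (B, D) links."""
    return [tuple(item) for item in json.loads(zlib.decompress(blob))]


# Invented sample data: 1000 links to 50 distinct numbers.
links = [
    ("555-01%02d" % (i % 50), "2020-01-%02d 12:00" % (1 + i % 28))
    for i in range(1000)
]
raw_size = len(json.dumps(links, separators=(",", ":")).encode("utf-8"))
blob = pack_links(links)

# Fraction of the original size taken by the compressed blob.
ratio = len(blob) / raw_size
```

Decompressing the blob with unpack_links returns the original links, which is what makes it possible to rebuild the ABD tables later.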