Using the Data Scanner, it is possible to quickly scan raw input data and decide how to use it later.
The core of the DatonixOne Scanner is a fractal engine that converts an input data source into an optimized data set, called a QueryObject.
The fractal engine builds a QueryObject in two steps, named SCAN and QueryBuild: SCAN builds a data structure called the Ready File, and QueryBuild builds indexes on the Ready File, called data complexes.
The Ready File is an exact copy of the input that is extremely compressed and fast at answering any query.
Once the Ready File is built, the engine executes QueryBuild to materialize the results of any possible query on the Ready File; that is, it automatically discovers all meaningful patterns in the data.
The QueryBuild results, i.e. the data patterns, are stored in Arithmetic Storage Groups containing fractal data complex structures named datoni, the word from which our name derives.
This algorithmic approach to analytic processing results in dramatic savings for the Data Scientist, who gains real-time access to any required “Schema on Read” data structure.
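The two-step flow described above can be sketched in outline. The function names below (scan, query_build) and the in-memory representation are illustrative assumptions for this sketch, not the actual datonix API; the real Ready File is a compressed binary structure.

```python
# Illustrative sketch of the SCAN + QueryBuild pipeline described above.
# All names are hypothetical; this is not the datonix implementation.
from collections import defaultdict

def scan(rows):
    """SCAN: build the Ready File, an exact copy of the input.

    A plain list stands in here for the heavily compressed real structure."""
    return list(rows)

def query_build(ready_file, attributes):
    """QueryBuild: materialize indexes (data complexes) on Ready File attributes."""
    indexes = {}
    for attr in attributes:
        idx = defaultdict(list)
        for pos, row in enumerate(ready_file):
            idx[row[attr]].append(pos)  # value -> row positions in the Ready File
        indexes[attr] = dict(idx)
    return indexes

rows = [{"city": "Rome", "sales": 10},
        {"city": "Milan", "sales": 20},
        {"city": "Rome", "sales": 5}]
ready = scan(rows)
query_object = query_build(ready, ["city"])

# A query on "city" is now answered by index lookup instead of a full scan:
rome_rows = [ready[i] for i in query_object["city"]["Rome"]]
```

Once the indexes are materialized, any query over the indexed attributes reduces to a lookup, which is the source of the savings the text describes.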
DATA PROCESSING ALGORITHM
datonixOne's most relevant differentiator is its data processing algorithm, DATA SCAN, which converts data into data structures made of fractal fragments, called “datoni”.
The “datoni” are data complexes that can be categorized into three families:
- ReadyData: the exact copy of input data
- QueryObject: the result of canonical and/or custom processing built upon ReadyData attributes
- Keyback: long pointers that connect ReadyData and QueryObject attributes
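The relationship between the three families can be modelled roughly as follows. The class names mirror the terms above, but the fields and layout are assumptions made for illustration only.

```python
# Illustrative model of the three "datoni" families; fields are assumptions.
from dataclasses import dataclass, field

@dataclass
class ReadyData:
    rows: list  # exact copy of the input data

@dataclass
class Keyback:
    positions: list  # long pointers back into ReadyData rows

@dataclass
class QueryObject:
    # processed attribute value -> Keyback into the ReadyData it derives from
    index: dict = field(default_factory=dict)

ready = ReadyData(rows=[("Rome", 10), ("Milan", 20), ("Rome", 5)])
qo = QueryObject()
for pos, (city, _) in enumerate(ready.rows):
    qo.index.setdefault(city, Keyback(positions=[])).positions.append(pos)

# The Keyback connects a QueryObject attribute back to its ReadyData rows:
matched = [ready.rows[p] for p in qo.index["Rome"].positions]
```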
DATA STORE ORGANIZATION
ReadyData, QueryObject and Keyback are stored in a disk package named the QueryObject data set.
QueryObject ensures a dramatic reduction of main memory requirements, better data access speed, and scalability on any type of Union, Merge, or Join blend. QueryObject is:
– read-only
– binary portable
– extremely compressed
– open to being queried through simple NoSQL or ANSI SQL
– open to tagging/untagging attribute categories for future exploration
– extremely responsive to both simple and intricate queries
Thanks to its fractal representation, it can blend large volumes of external data; in fact, federated queries are possible because the datonix architecture implements grid computing.
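To illustrate the kind of ANSI SQL query the QueryObject is said to answer, the snippet below runs a group-by aggregation. Since the actual datonix SQL interface is not documented here, Python's built-in sqlite3 engine is used purely as a stand-in; the query itself is the point.

```python
# Stand-in demonstration of an ANSI SQL aggregation query; sqlite3
# substitutes here for the datonix engine, which is not shown in this text.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Rome", 10), ("Milan", 20), ("Rome", 5)])

# Total sales per city, ordered alphabetically:
total_by_city = dict(conn.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"))
print(total_by_city)  # {'Milan': 20, 'Rome': 15}
```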
How datonix functionality best serves client requirements
The unique feature datonix introduces is Fractal conversion: the ability to convert any Big Data source into a fractal image, called the QueryObject data set. QueryObject is a SMART Data Store, extremely compressed and ready to be queried.
We believe Datonix is enjoying accelerated success because it is a pragmatic answer for the Data Scientist, from the “Citizen” to the Expert one. Datonix offers the ability to prepare very large data sets in a collaborative environment, with some very useful features like the following:
- to use Excel to read QueryObject data, disconnected from any server, going well beyond Excel's 1-million-record limit. Excel is a worldwide common business language, but it is limited to 1 million records per sheet, and in practice users know that issues start at a few hundred thousand rows. As stated, QueryObject is a binary data set: even if it has been generated on a Linux file system, it can be moved to a Windows workstation and will continue to run without any conversion. QueryObject is extremely compressed, so even if the original data set is 100 gigabytes, the resulting QueryObject will be only a few megabytes, which means a quick download time. And since QueryObject is made of several small data structures, Excel loads them dynamically and transparently during its processing. As a result, users can work in Excel with high performance on an unlimited number of records and fields.
- to obtain data prepared by users in the IT enterprise data hub, or in a specific data lake, at very low cost. As said, datonix processes all the data and offers pre-computed results for any applicable query. As a result, more than 60% of the data mining effort required to prepare usable data for analysis can be saved. For example, there is no reason to reduce data using advanced data mining practices. In addition, R, Python, SAS, SPSS, etc. processes can run on all the data at the speed of disk I/O.
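The idea of pre-computed results for any applicable query can be sketched as follows: aggregates are materialized once for every combination of grouping attributes, so later queries become dictionary lookups rather than scans. The function and variable names are assumptions for this sketch, not the datonix internals.

```python
# Illustrative pre-computation of group-by sums for every attribute subset,
# so any such query is later answered by lookup. Names are assumptions.
from collections import defaultdict
from itertools import combinations

def precompute(rows, attrs, measure):
    """Materialize SUM(measure) grouped by every non-empty subset of attrs."""
    results = {}
    for r in range(1, len(attrs) + 1):
        for group in combinations(attrs, r):
            agg = defaultdict(int)
            for row in rows:
                key = tuple(row[a] for a in group)
                agg[key] += row[measure]
            results[group] = dict(agg)
    return results

rows = [{"city": "Rome", "year": 2023, "sales": 10},
        {"city": "Rome", "year": 2024, "sales": 5},
        {"city": "Milan", "year": 2023, "sales": 20}]
cube = precompute(rows, ("city", "year"), "sales")

# A "query" is now a lookup, not a scan of the raw data:
rome_total = cube[("city",)][("Rome",)]
```

The trade-off, as in any materialization scheme, is doing the aggregation work up front so that downstream tools pay only lookup cost.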