+39 081 8252 685 info@datonix.it

When elephants fly

New perspectives for Hadoop data movement and remote data processing

It’s a great pleasure to announce the general availability of the Hadoop Scanner, our direct integration with the Hadoop Distributed File System (HDFS). With Hadoop Scanner, now available for the Cloudera distribution, you can scan HDFS data sources in a few clicks.


Application class: software middleware
Stakeholder: remote information consumer
User: Hadoop System Programmer

Problem description

Hadoop is well suited to simple queries on very large data sets. When more than a simple full table scan is needed, it is advisable to complement Hadoop with an additional technology.

Because database technology is expensive and has inherent limitations, and because the Hadoop software stack is designed to leave data where it is, i.e. in HDFS, modern solutions for running data processes on Hadoop (e.g. Spark, Flink) are now emerging.

But what happens when data must be moved out of HDFS to run data processes remotely or, vice versa, when remote users need to move large data sets into HDFS?


The problems and high costs of data delivery make the traditional solution untenable: hauling remote data in bulk back to an IT Enterprise Data Hub and integrating and processing it there. Worse still, the more distributed the organization, the more serious the impact of large-scale data movement.
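To give a feel for the cost of bulk data movement, here is a minimal back-of-the-envelope sketch in Python. It is not part of any datonix™ product: the function names, the 80% link efficiency, and the assumption that every remote site sends its data to the hub and receives the same volume back over one shared, serialized link are all illustrative choices.

```python
def transfer_hours(size_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move size_tb terabytes over a network link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable (protocol overhead, contention); 0.8 is a guess.
    """
    if size_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, efficiency in (0, 1]")
    bits_to_move = size_tb * 1e12 * 8            # decimal terabytes -> bits
    usable_bps = link_mbps * 1e6 * efficiency    # megabits/s -> usable bits/s
    return bits_to_move / usable_bps / 3600      # seconds -> hours


def hub_round_trip_hours(size_tb: float, link_mbps: float,
                         sites: int = 1, efficiency: float = 0.8) -> float:
    """Estimate total hours when each site hauls size_tb to the hub and back.

    Assumes (hypothetically) that all transfers share one link and run
    one after another, so the cost grows linearly with the number of sites.
    """
    if sites < 1:
        raise ValueError("sites must be at least 1")
    return 2 * sites * transfer_hours(size_tb, link_mbps, efficiency)


if __name__ == "__main__":
    one_way = transfer_hours(10, 1000)                  # 10 TB over 1 Gbit/s
    three_sites = hub_round_trip_hours(10, 1000, sites=3)
    print(f"one way: {one_way:.1f} h, three sites round trip: {three_sites:.1f} h")
```

Under these assumptions the estimate grows linearly with both the data volume and the number of remote sites, which is the effect described above for distributed organizations.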

Transferring Hadoop’s raw data to remote destinations, blending it there with Remote Dark Data, extracting business intelligence on site, and sending back huge volumes of adjusted data would, in principle, solve the problem.

However, this strategy would require sustaining expensive data movements, as well as an IT architecture that matches the distributed nature of the Remote Data.


With Hadoop Scanner, datonix™ offers new perspectives for Hadoop data movement and remote data processing.

Hadoop Scanner plugs into HDFS, reads the data, and natively shares memory segments with the ultrafast datonix™ data-scan processes. Once converted to the QueryObject™ format, datasets are ultra-compressed and ready to be exchanged with remote, disconnected users.

This way, an existing IT-centric architecture can be complemented with an ideal “pure grid” satellite, the datonixOne, in a way that is as non-disruptive as possible, reduces operating costs, and increases performance.

With datonix™ Voyager it is quick and easy to set up connections to data stored in HDFS, so developers don’t lose focus fighting Java memory errors, garbage collection, file fragmentation, cluster contention, and tedious cluster performance tuning.

For more info contact us.