Dremio Launches Self-Service Data Analytics for Data Scientists

Open source-based data analytics application sets business analysts, data scientists and line-of-business professionals free to do their own querying in natural language.

Just how did Dremio (pronounced DREM-e-oh), which has made the idea of self-service data analytics its business, choose its name?

“Well, there aren’t that many six-letter domain names available, and this was one of them two years ago,” CEO and co-founder Tomer Shiran told eWEEK. “There’s a Google project called Dremel, which was the inspiration for work we did a long time ago, that combined with ‘io’ made it for us.”

In Google’s world, Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. However, in Home Depot’s and Ace Hardware’s world, Dremel is a brand of professional-level power tools for carpentry, glass etching and other specialty arts.

In the IT world of 2017, Dremio is channeling Google but also carving out its own place in the business productivity sector. It makes an open source-based data analytics application that sets business analysts, data scientists and line-of-business professionals free to do their own querying in natural language.

New Approach to Parsing Data Analytics

“We are creating a new approach for data analytics,” Shiran said. “We’re calling it self-service data, and the objective is to put (non-IT) people into the driver’s seat so they can do the analytics themselves.

“They’ve been very dependent upon IT, and have been for the last 20 years, in order to get the data, run the queries and so forth.”

It’s like what Amazon Web Services did for software developers in the last 10 years, Shiran said.

“Instead of developers having to wait for a company to buy servers, rack them and stack them, set up the network and operating systems and waiting two months to get anything into production, AWS solved that problem and made developers self-sufficient,” he said.

“We’re doing something similar here for data, where the end users are business analysts and data scientists, and we are building the first technology to make them independent,” Shiran said.

Eliminates Need for ETL Tools, Data Warehouses

Using existing data sources and business intelligence tools, Dremio eliminates the need for traditional ETL (extract, transform, load) tools, data warehouses, cubes and aggregation tables, as well as the infrastructure, copies of data and effort these systems entail, Shiran said.

The application combines consumer-grade ease-of-use with enterprise-grade security and governance and includes execution and data-acceleration technologies for analytical processing, Shiran said.

Released as a new open source project under the Apache license, the application came out of a four-month-long beta period on July 19. It already has a set of paying customers using it in production.

Dremio is the first Apache Arrow-based distributed query execution engine. Arrow is an open-source columnar in-memory data format for analytics.

Arrow-Based App Brings High Performance

This represents a breakthrough in performance for analytical workloads because Arrow enables extreme hardware efficiency and minimizes serialization and deserialization of in-memory data buffers between Dremio and client technologies such as Python, R, Spark, and other analytical tools, Shiran said.

Arrow is also designed for GPU (graphics processing unit) and FPGA (field-programmable gate array) hardware acceleration, making it a powerful conveyor for machine-learning workloads, he said.
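To make the columnar idea concrete, here is a minimal sketch in plain Python. It uses only the standard library and is not the Arrow API or Dremio code; the sample records and the names `to_columns`, `total_amount_row` and `total_amount_columnar` are assumptions made for illustration. It shows how pivoting records into one typed, contiguous buffer per column lets an aggregate read a single buffer, and how that buffer can be handed to another consumer without copying.

```python
from array import array

# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"region": "EMEA", "amount": 120.0},
    {"region": "APAC", "amount": 80.5},
    {"region": "EMEA", "amount": 42.25},
]

def to_columns(records):
    """Pivot row-oriented records into one sequence per column."""
    return {
        "region": [r["region"] for r in records],
        # array("d") stores the doubles back to back in one typed buffer.
        "amount": array("d", (r["amount"] for r in records)),
    }

def total_amount_row(records):
    """Row layout: every record is visited to pull out one field."""
    return sum(r["amount"] for r in records)

def total_amount_columnar(columns):
    """Columnar layout: the aggregate reads one contiguous buffer."""
    return sum(columns["amount"])

columns = to_columns(rows)

# memoryview exposes the column's buffer to another consumer without copying,
# a rough stand-in for sharing in-memory data without a serialization step.
view = memoryview(columns["amount"])

print(total_amount_row(rows), total_amount_columnar(columns))
```

Both functions return the same total; the difference is in how much data each one has to touch, which is the property a columnar format is built around.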

Key technical capabilities include:

Native Query Push Downs: Instead of performing full table scans for all queries, Dremio optimizes processing into underlying data sources, maximizing efficiency and minimizing demands on operational systems. Dremio rewrites SQL in the native query language of each data source, such as Elasticsearch, MongoDB, and HBase, and optimizes processing for file systems such as Amazon S3 and HDFS. 

Dremio Reflections: Dremio accelerates processing and isolates operational systems from analytical workloads by physically optimizing data for specific query patterns, including columnarizing, compressing, aggregating, sorting, partitioning and co-locating data. Dremio maintains multiple reflections of datasets, each optimized for different workloads and all fully transparent to users. Dremio’s query planner automatically selects the best reflections for each query, which the company says accelerates processing by up to a factor of 1,000.

Comprehensive data lineage: Dremio's Data Graph preserves a complete view of the end-to-end flow of data for analytical processing. Companies have full visibility into how data is accessed, transformed, joined, and shared across all sources and all analytical environments. This transparency facilitates data governance, security, knowledge management, and remediation activities.

Self-service model: Dremio was designed with analysts and data scientists in mind, providing a powerful and intuitive interface for users to easily discover, curate, accelerate, and share data for specific needs, without being dependent on IT. Users can also launch their favorite tools from Dremio directly, including Tableau, Qlik, Power BI, and Jupyter Notebooks.

Built for the cloud: Dremio was designed for modern cloud infrastructure, and is able to take advantage of elastic compute resources as well as object storage such as Amazon S3 for its Reflection Store. Dremio also can analyze data from a wide variety of cloud-native and cloud-deployed data sources.
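The query push-down capability in the list above can be illustrated with a small sketch. This is not Dremio's planner; it is a toy example under stated assumptions, in which the hypothetical function `to_mongo_filter` turns simple SQL-style comparisons into a MongoDB-style filter document so the source does the filtering, while the hypothetical `full_scan_then_filter` shows the alternative of fetching everything and filtering afterward. The sample records and predicate format are invented for illustration; only the MongoDB operator names (`$eq`, `$gt` and so on) come from MongoDB's query language.

```python
import operator

# SQL comparison operators mapped to MongoDB query operators.
MONGO_OPS = {"=": "$eq", "!=": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}

# The same operators as Python functions, used by the no-pushdown baseline.
PY_OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
          ">=": operator.ge, "<": operator.lt, "<=": operator.le}

def to_mongo_filter(predicates):
    """Translate (field, op, value) triples, implicitly ANDed, into a
    MongoDB-style filter document that the data source can evaluate itself."""
    query = {}
    for field, op, value in predicates:
        query.setdefault(field, {})[MONGO_OPS[op]] = value
    return query

def full_scan_then_filter(records, predicates):
    """Baseline without pushdown: read every record, then filter in the engine."""
    return [
        r for r in records
        if all(PY_OPS[op](r[field], value) for field, op, value in predicates)
    ]

# Roughly: SELECT * FROM orders WHERE status = 'shipped' AND total > 100
predicates = [("status", "=", "shipped"), ("total", ">", 100)]

# Hypothetical records standing in for a source collection.
orders = [
    {"id": 1, "status": "shipped", "total": 250},
    {"id": 2, "status": "pending", "total": 900},
    {"id": 3, "status": "shipped", "total": 40},
]

print(to_mongo_filter(predicates))
# {'status': {'$eq': 'shipped'}, 'total': {'$gt': 100}}
```

With pushdown, only the filter document travels to the source and only matching records come back; without it, all three sample records would be transferred before two of them are discarded.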

Mountain View, Calif.-based Dremio, founded in 2015 by Shiran and co-founder Jacques Nadeau and their team of big data experts, has raised more than $15 million. The company’s software is already being used by organizations in the U.S., Europe, Asia and Australia. These include premium vehicle manufacturer Daimler and OVH, Europe's largest cloud-services provider, Shiran said.

Technology providers such as Microsoft, Tableau and Qlik along with open-source communities such as Python Pandas and R are collaborating with Dremio to deliver end-to-end self-service for data analytics, Shiran said.

“In our personal lives, most people expect to get answers to questions in just a few seconds. But in the workplace, it can take months to answer a question,” Shiran said. “We believe there is an enormous opportunity to improve the data experience for people in the workplace, by connecting popular BI and data science tools to the diverse data stores of the modern enterprise.”

Availability

Dremio is distributed as a Community Edition, which is open source and free for anyone, as well as an Enterprise Edition, which is available as part of an annual subscription with support, a commercial license and enterprise features.

Dremio is available for download from the company's website.

Chris J. Preimesberger

Chris J. Preimesberger is Editor of Features & Analysis at eWEEK, responsible in large part for the publication's coverage areas. In his 12 years and more than 3,900 stories at eWEEK, he...