How Podium Data Speeds Self-Service Access to Data Lakes

Podium 3.0 is the latest edition of the company's enterprise data-lake management platform that works primarily with Hadoop installations.

It's one thing to coordinate, monitor and secure all of a company's data streams to servers, cloud applications and storage in an enterprise IT system. It's another thing to make that huge lake of raw data easy and fast to access and process for employees who need the data "yesterday."

There are apps for this, as one might imagine. This article focuses on one of them: Podium Data.

Ostensibly, the 3-year-old company's software can become a Chief Data Officer's best non-human friend very quickly because it makes the connection between line-of-business users and their worlds of big data fast and user-friendly without having to detour through the IT department.

Self-Service Way to Obtain Data for Doing the Job

Lowell, Mass.-based Podium empowers data analysts and other analytically oriented business people to obtain the data they need to do their jobs on a self-service, on-demand basis quickly and efficiently by eliminating hurdles in the data delivery process.

To this end, Podium Data on Jan. 19 released Podium 3.0, the latest edition of its enterprise data-lake management platform that works primarily with Hadoop installations. The release features expanded data preparation and publishing capabilities that enable business users and data analysts to bring new data sources into a secure enterprise lake in less than a week and build and retrieve custom data sets in an hour.

It offers a range of improved capabilities over previous versions: self-service publishing, a new intuitive user interface, user-defined datasets, support for Spark and Spark SQL, and operational, security, and governance reporting based on Podium's metadata.

"My colleagues and I started Podium Data three years ago because we saw an opportunity to totally change the way enterprises approached data management," CEO and co-founder Paul Barth told eWEEK. "We decided to deal with their really big problem, which was always agility.

Common Problems: 'Politics and Dirty Data'

"They would come up with an idea about how to use information analytics to improve a business process, whether it was customer service or marketing, or even financials. Then they'd go to IT and [find out] it's 18 months and $10 million and they'd have to buy a new data warehouse appliance, rewire everything, and so on. It turns a business opportunity into a massive project and an investment.

"We also saw that when you made those investments, they atrophied pretty quickly, because business requirements were changing all the time," Barth said.

Barth and his team determined that the real key to success here was not Hadoop itself but deploying the open source, massively parallel technology Podium selected, he said.

"What I saw as the most common problems [in getting a data lake/analytics initiative up and running] were two things: politics and 'dirty data,'" Barth said. "Those were always what got in the way. You were spending all of your time trying to clean it up.

"When we looked at Hadoop, we said we could reverse the way that data management is deployed in an enterprise. We said there's room for a product here that can automate and leverage all the Hadoop economics but make it accessible to mere mortals and not just programmers."

Seeing Major Speed-Up of Data Movement

As a result, Barth contends, Podium's customers have achieved a 25-fold acceleration in delivery of new data to business users--from six months to less than a week--and a 40 percent reduction in data delivery costs by simplifying and speeding up the delivery of data to the business through a secure, enterprise-scale data lake.

Major new features in Podium v3.0 include:

--New user interface: Revamped end-user experience with intuitive navigation and logical access to capabilities, data, and metadata;

--Spark: Support for Spark compute engine, both in Explore and Prepare modules;

--Datasets: User-definable logical grouping of data, enabling flexible categorization of data by topics, user groups, systems, projects, and so on;

--In-line shortcuts: Access to in-line shortcuts for all actions;

--Publishing: Easy export of data in configurable formats for consumption in other environments, with automatic compliance via data masking rules; and

--Metadata reporting: Podium now exposes its extensive metadata through reports and reporting views, supporting governance, operational, and security reporting needs.
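
To make the idea of publishing with automatic masking more concrete, here is a minimal Python sketch. It is not Podium's API or rule format, which the announcement does not describe; the rule table, column names and the `publish_csv` function are all hypothetical, chosen only to illustrate how masking rules might be applied to sensitive fields at export time.

```python
import csv
import hashlib
import io

# Hypothetical masking rules keyed by column name (an assumption for
# illustration; this is not Podium's actual rule format).
MASKING_RULES = {
    # Keep only the last four characters of the identifier.
    "ssn": lambda v: "***-**-" + v[-4:],
    # Replace the address with a short one-way hash.
    "email": lambda v: hashlib.sha256(v.encode("utf-8")).hexdigest()[:12],
}


def publish_csv(rows, columns, rules=MASKING_RULES):
    """Return CSV text for rows, masking every column that has a rule."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        masked = {
            col: rules[col](row[col]) if col in rules else row[col]
            for col in columns
        }
        writer.writerow(masked)
    return out.getvalue()


rows = [{"name": "Ada", "ssn": "123-45-6789", "email": "ada@example.com"}]
exported = publish_csv(rows, ["name", "ssn", "email"])
print(exported)
```

The point of the sketch is that the rules sit in the export path itself, so a user pulling a data set never has to remember to apply them by hand.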

New Features for Systems Managers

For IT system managers, new enhancements include:

--improvements to data ingestion and data lake administration;

--expanded data sources including Parquet ingest, additional mainframe branch wiring support, and more;

--new Publish module;

--export a file to multiple targets including S3, HDFS, and local;

--replicate to another Hadoop cluster, including partitions and Hive objects; and

--publish to RDBMS supporting any protocol via the Podium Open Connector.

Podium offers enterprise-grade security and ensures that access to data--as well as data protection and authentication--is in accordance with organizational protocols.

Podium 3.0 integrates with Active Directory and Kerberos, and provides entity-level authorization via Podium impersonation in combination with Sentry/Ranger policies and/or HDFS access control lists.

Podium 3.0 is now available directly from Podium. Current customers have immediate access to the newest features as part of their existing subscription.

Chris Preimesberger

Chris Preimesberger is Editor of Features & Analysis at eWEEK, responsible in large part for the publication's coverage areas. In his 12 years and more than 3,900 stories at eWEEK, he has...