Gathering insights from information placed on Microsoft’s cloud-based storage and big data analytics platforms is about to get easier for the company’s enterprise customers.
On June 27, Microsoft unveiled new cloud capabilities that further lower the barriers to big data analytics. They include Azure Data Lake Storage Gen2, a service that brings “together the notion of a Hadoop compatible file system with a scale-out object cloud storage platform,” namely Azure Blob Storage, Tad Brockway, general manager of Azure Storage and Azure Stack at Microsoft, told eWEEK.
Describing Azure Data Lake Storage Gen2 as “the first no-compromise data lake for the industry,” Brockway said the service builds on the original Azure Data Lake offering by adding “true HDFS [Hadoop Distributed File System] compatibility,” tightly integrating the technology with Azure Blob Storage for enterprise-grade levels of scalability and performance.
Competing solutions use client-side file system emulation to interface with cloud object stores, an approach that can subject users to subpar performance and spotty reliability, he claimed. Microsoft’s implementation, by contrast, is “all server-side [and] natively integrated,” providing a more seamless experience with much less storage and data management overhead.
The habit of creating and maintaining on-premises data silos has followed businesses on their journey to the cloud, Brockway said. That turns visions of pervasive data analytics in the workplace into an onerous or simply unattainable prospect for many organizations, hindering their digital transformation efforts. With a “simple API call,” Brockway explained, Azure Data Lake Storage Gen2 grants Azure object storage customers “access to the richer compatibility of a Hadoop-compatible file system without moving data,” removing one of the major roadblocks businesses face in adopting analytics solutions.
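In practice, Hadoop-compatible tools address Data Lake Storage Gen2 data through the ABFS driver’s URI scheme rather than by copying the data out of Blob Storage. The helper below is only an illustration of how such a URI is composed; the function name and example values are our own, not part of any Microsoft SDK.

```python
# Illustrative sketch: Hadoop-compatible clients reach ADLS Gen2 data via the
# ABFS driver's abfss:// URI scheme against the account's dfs endpoint.
# `build_abfss_uri` is a hypothetical helper, not a Microsoft API.

def build_abfss_uri(filesystem: str, account: str, path: str = "") -> str:
    """Build an abfss:// URI for a file system (container) in an ADLS Gen2 account."""
    uri = f"abfss://{filesystem}@{account}.dfs.core.windows.net"
    if path:
        uri += "/" + path.lstrip("/")
    return uri

print(build_abfss_uri("analytics", "contoso", "raw/events/2018/06"))
# abfss://analytics@contoso.dfs.core.windows.net/raw/events/2018/06
```

A Spark or Hive job pointed at such a URI reads the same objects that Blob Storage clients see, which is the “without moving data” point Brockway was making.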
Currently in beta, Azure Data Lake Storage Gen2 also inherits the data protection, security, storage tiering and lifecycle management capabilities found in Azure Blob Storage. Azure Active Directory is natively integrated, and the service supports POSIX (Portable Operating System Interface)-compliant ACLs (Access Control Lists) for tight control over file and folder access.
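Those ACLs follow the familiar POSIX text form, with comma-separated entries of the shape `[default:]scope:[qualifier]:permissions`. The short parser below is our own sketch of that format, written only to make the structure concrete; it is not a Microsoft API.

```python
# Hedged sketch of the POSIX-style ACL entry format ("[default:]scope:
# [qualifier]:permissions", comma-separated). `parse_acl` is a hypothetical
# illustration, not part of any Azure SDK.

def parse_acl(acl: str) -> list:
    entries = []
    for entry in acl.split(","):
        parts = entry.strip().split(":")
        default = parts[0] == "default"
        if default:
            parts = parts[1:]           # "default:" entries apply to new children
        scope, qualifier, perms = parts
        entries.append({
            "default": default,
            "scope": scope,             # user, group, other or mask
            "qualifier": qualifier,     # named principal; empty means the owning class
            "permissions": perms,       # rwx-style triplet
        })
    return entries

for e in parse_acl("user::rwx,user:alice:r-x,group::r-x,other::---"):
    print(e)
```

A directory entry such as `user:alice:r-x` grants a specific principal read and traverse rights while the owner keeps `rwx`, which is the kind of per-file, per-folder control the service exposes.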
There’s also a cost management aspect to consider, Brockway said. Customers can “align the performance requirements of their workloads with the economics of how they’re using our services,” whether their data is kept in hot or cold storage tiers, he said.
Meanwhile, several new Azure Data Factory features have shed the beta label and are now generally available.
The latest additions to the cloud-based data integration service include new control flow data pipeline capabilities that introduce branching, looping, conditional execution and other concepts that allow users to orchestrate complex integration jobs, along with a new, code-free way of managing data pipelines with a web browser. “With a new browser-based user interface, you can accelerate your time to production by building and scheduling your data pipelines using drag and drop,” stated Mark Kromer, senior program manager of Azure Information Management at Microsoft, in a June 27 announcement.
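Under the drag-and-drop surface, those control flow constructs live in the pipeline’s JSON definition. The trimmed fragment below is a hedged sketch of what looping (`ForEach`) and branching (`IfCondition`) activities look like; the pipeline, activity and parameter names are invented for illustration, and the real schema carries more required fields.

```json
{
  "name": "CopyPerRegion",
  "properties": {
    "parameters": { "regions": { "type": "Array" } },
    "activities": [
      {
        "name": "LoopOverRegions",
        "type": "ForEach",
        "typeProperties": {
          "items": { "value": "@pipeline().parameters.regions", "type": "Expression" },
          "activities": [
            {
              "name": "CheckRegionEnabled",
              "type": "IfCondition",
              "typeProperties": {
                "expression": { "value": "@equals(item().enabled, true)", "type": "Expression" }
              }
            }
          ]
        }
      }
    ]
  }
}
```

The same definition can be authored in the new browser-based designer or managed programmatically through the SDKs mentioned below.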
Also new are iterative debugging tools in the Azure Data Factory design environment, new flexible pipeline scheduling options and enhanced SDK support for Python, .NET, REST and PowerShell. Customers can also now “lift and shift” their SQL Server Integration Services packages into Azure Data Factory and build ETL (extract, transform and load) pipelines with on-demand Azure HDInsight clusters or Azure Databricks Notebooks using Apache Spark.