The combination of cloud, "serverless" computing, microservices and container technology is making on-demand, snack-sized computing services increasingly common. The big data world has moved toward this model too, but only gradually. Amazon Web Services (AWS), Microsoft and Google all offer services for provisioning whole Hadoop/Spark clusters.
What these services lack is a dedicated Hadoop and Spark environment paired with a job-oriented pricing model and user interface. Cloudera seems to have filled that gap. At the Strata Data Conference in London, Cloudera, which bills itself as the provider of the leading modern platform for machine learning and advanced analytics, announced the release of Cloudera Altus, a Platform-as-a-Service (PaaS) offering that runs Hadoop jobs as a service.
The features offered by Altus are not terribly fancy and are similar to those of AWS's own Hadoop service. The initial service is tailored for data engineers and allows for the submission and execution of Spark, Hive and MapReduce jobs, which it collectively calls data pipelines. Data engineers can write the code, package it up, and then specify whether it should be executed using MapReduce or Spark. The "pipeline" then runs on a cluster that is created on demand.
“The Cloudera Altus Data Engineering service simplifies the development and operations of elastic data pipelines; putting data engineering jobs front and center and abstracting infrastructure management and operations that can be both time consuming and complex,” said Cloudera in a blog post.
“Altus also reduces the risk associated with cloud migrations. It provides users with familiar tools packaged in an open, unified, enterprise-grade platform service that delivers common storage, metadata, security, and management across multiple data engineering applications.”
Another plus of this service is its flexible pricing. By default you pay by the hour, but you can do so with credits purchased in advance at a discounted price. You can also disregard usage time and take out an annual subscription based on the number of nodes in the cluster. No matter how you pay, you get the same environment with a user interface designed for job submission and monitoring, and you don't really have to think much about deploying or managing a cluster. You do need to specify the number and instance type of the nodes, but even there you or your IT staff can set up default cluster types for different workloads and then never have to worry about it again.
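To make the idea of per-workload cluster defaults concrete, here is a minimal Python sketch. It is not the Altus API or CLI: the `ClusterTemplate` class, the `resolve_cluster` function, the workload keys, the node counts and the instance-type labels are all hypothetical placeholders chosen for illustration. It only shows the general pattern of defining a default node count and instance type once per workload and overriding them when a particular job needs something different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical cluster shape: node count plus instance type."""
    node_count: int
    instance_type: str


# Illustrative defaults an IT team might define once per workload type.
# Engine names, sizes and instance-type labels are made-up placeholders.
DEFAULT_TEMPLATES = {
    "spark": ClusterTemplate(node_count=10, instance_type="large-memory"),
    "hive": ClusterTemplate(node_count=5, instance_type="general-purpose"),
    "mapreduce": ClusterTemplate(node_count=8, instance_type="general-purpose"),
}


def resolve_cluster(engine, node_count=None, instance_type=None):
    """Return the cluster shape for a job, falling back to the workload default."""
    try:
        default = DEFAULT_TEMPLATES[engine.lower()]
    except KeyError:
        raise ValueError(f"no default cluster template for engine {engine!r}")
    return ClusterTemplate(
        node_count=node_count if node_count is not None else default.node_count,
        instance_type=instance_type or default.instance_type,
    )
```

With this in place, a data engineer submitting a Spark job would call `resolve_cluster("spark")` and never mention cluster sizing, while a one-off heavy job could pass `node_count=20` to override only that setting.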
In the end, Altus will increasingly treat Hadoop and Spark like operating systems on which various data services can be offered. And that, of course, is how Hadoop and Spark should be used: as a substrate on which value-added products and services can operate.