June 09, 2017
At the Spark Summit in San Francisco, Databricks had good news for developers who use the Spark ecosystem. The company announced a new serverless platform for Apache Spark that lets developers worry less about cluster management while working with Spark. This confirms the main theme of the event, which was speculated to be ‘simplification of developer experience’.
For those of you new to the world of big data, Databricks is the commercial manifestation of the Apache Spark project. The company focuses on building data tools, like the ones just released, to make it easier for developers to work with Spark.
“SQL is stateless so it isn’t hard to work with, but making data science serverless is hard because it has states,” Databricks CEO Ali Ghodsi explained in an interview with TechCrunch.
On the machine learning end of the spectrum, there was more good news. Databricks announced a new library to support deep learning. Called Deep Learning Pipelines, it is a library of high-level APIs designed to make it possible for data scientists and AI novices to implement neural nets in their big data processing. The APIs are specifically designed to help with tasks like tuning a model’s hyperparameters, loading images, and adapting a general-purpose model to a specific use case.
Both releases focus on helping enterprises deal with massive amounts of data, which is becoming key in the present world of data-driven development. Deep Learning Pipelines also enables developers to run pre-trained models as well as models built using Keras and TensorFlow.
“If you want to distribute TensorFlow, you have to construct graphs manually and direct what goes to what machine,” added Ghodsi. “That’s really hard if you want to run on 100 machines.”