CIO Bulletin

Databricks: Accelerate innovation by unifying data science, engineering, and business.

Databricks was founded out of the UC Berkeley AMPLab by the team that created Apache Spark. The company has been working for the past six years on cutting-edge systems to extract value from Big Data. It believes that Big Data is a huge opportunity that is still largely untapped, and it is working to revolutionize what you can do with it.

Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the team who created Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz, NEA and Battery Ventures, among others, has a global customer base that includes Salesforce, Viacom, Shell and HP.

Bravo! Apache is here.

Apache Spark is a powerful open source processing engine built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley in 2009.

Apache Spark is 100% open source, hosted at the vendor-independent Apache Software Foundation. Databricks is fully committed to maintaining this open development model. The company believes that no computing platform will win in the Big Data space unless it is fully open.

Spark has one of the largest open source communities in Big Data, with over 1,000 contributors from 250+ organizations. Databricks works within the open source community to maintain this momentum.

Since its release, Apache Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at massive scale, collectively processing multiple petabytes of data on clusters of over 8,000 nodes.

“At Databricks, we’re working hard to make Spark easier to use and run than ever, through our efforts on both the Spark codebase and support materials around it. All of our work on Spark is open source and goes directly to Apache,” says the CEO.

Benefits

SPEED

Engineered from the ground up for performance, Spark can be 100x faster than Hadoop for large-scale data processing by exploiting in-memory computing and other optimizations. Spark is also fast when data is stored on disk, and currently holds the world record for large-scale on-disk sorting.

EASE OF USE

Spark has easy-to-use APIs for operating on large datasets. This includes a collection of over 100 operators for transforming data and familiar data frame APIs for manipulating semi-structured data.

A UNIFIED ENGINE

Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning and graph processing. These standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.

Solutions rendered

Advertising and Marketing Technology

The growth of digital advertising and marketing data has created a plethora of opportunities to optimize campaign performance and advertising spend across direct advertising and auctions. Databricks helps you manage the traffic jam of data arriving from multiple sources, such as ad inventory, web traffic, click logs, CRM, and behavioral data, to uncover insights that improve audience targeting, pricing strategies, and conversion rates, increasing campaign ROI and creating new revenue opportunities.

Energy and Utilities

From highly instrumented wells to the proliferation of smart grid technologies, data is becoming a critical element in the discovery, extraction, and delivery of energy, whether it is oil, natural gas, or even wind and solar. Databricks provides a virtual analytics platform that enables real-time analysis of operational and customer data at scale, making modern innovations like predicting weather patterns and optimizing the energy grid a reality.

Customer Case Study

Viacom

Viacom, with its 170 cable, broadcast and online networks in around 160 countries, is transforming itself into a data-driven enterprise, collecting and analyzing petabytes of network data to increase viewer loyalty and revenue.

The Challenges

Improving user experience: Streaming petabytes of video data across the world puts a strain on the delivery systems, resulting in videos failing to load or constantly stuttering as they rebuffer
Growing the audience: Making sense of huge troves of viewing data and determining the best actions to drive viewer retention and loyalty
Targeted advertising: With TV ad sales falling in recent years, Viacom needed to find better ways to engage with its audience via advertising

The Solution

Viacom has built a real-time analytics platform based on Apache® Spark™ and Databricks, which constantly monitors the quality of video feeds and reallocates resources in real time when needed. Databricks has helped Viacom:

Predict trends and issues to provide a superior viewing experience: Reduced video start delay by 33%
Increase customer loyalty: Leveraged data to identify how to increase customer retention by 3.5-7x
Improve ad conversions: Targeted customers with personalized ads based on comScore ratings and viewing behavior

Meet the ace: Ali Ghodsi

Ali is the CEO and co-founder of Databricks, responsible for the growth and international expansion of the company. He previously served as the VP of Engineering and Product Management before taking the role of CEO in January 2016. In addition to his work at Databricks, Ali serves as an adjunct professor at UC Berkeley and is on the board at UC Berkeley’s RiseLab. Ali was one of the creators of the open source project Apache Spark, and ideas from his academic research in the areas of resource management, scheduling, and data caching have been applied to Apache Mesos and Apache Hadoop. Ali received his MBA from Mid-Sweden University in 2003 and his PhD in Distributed Computing from KTH/Royal Institute of Technology in Sweden in 2006.
