CIO Bulletin

Databricks: We unify Data Science, Engineering and Business

The Spark research project, which later got the name Apache Spark™, was started by Databricks’ founders at UC Berkeley. Big Data is a huge, transformative opportunity today. Recognizing the benefits of Big Data, the founders established Databricks in 2013 with a mission to accelerate innovation for its customers by unifying Data Science, Engineering and Business.

Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production.

Databricks lets customers concentrate on their data by offering a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. The company's customer base includes Viacom, Salesforce, Shell, and HP.

Apache Spark is “All-powerful”

Apache Spark was born in 2009 at UC Berkeley. It is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics.

Apache Spark is 100% open source, hosted at the vendor-independent Apache Software Foundation. Databricks is fully dedicated to maintaining this open development model. The company believes that no computing platform will win in the Big Data space unless it is fully open.

“At Databricks, we’re working hard to make Spark easier to use and run than ever, through our efforts on both the Spark codebase and support materials around it. All of our work on Spark is open source and goes directly to Apache,” says Ali Ghodsi, CEO.

How is Spark beneficial?

Accelerated performance

Spark can run up to 100 times faster than Hadoop MapReduce for large-scale data processing by exploiting in-memory computing and other optimizations.

Spark is also fast when data is stored on disk, and it currently holds a world record for large-scale on-disk sorting.

Work effortlessly with Spark

Spark has easy-to-use APIs for operating on large datasets. This includes a collection of over 100 operators for transforming data and familiar data frame APIs for manipulating semi-structured data.

Guaranteed success

Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning and graph processing. These standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.

Here is a closer look at Databricks solutions.

Advertising and Marketing Technology

The growth in digital and marketing data creates tremendous opportunities to improve campaign performance and advertising spend across direct advertising. You can take control of the data jam caused by multiple sources of data such as ad inventory, web traffic, click logs, CRM, and behavioral data to uncover insights that enhance audience targeting, pricing strategies, and conversion rates, increasing campaign ROI and creating new revenue opportunities.

Energy and Utilities

From highly instrumented wells to the proliferation of smart grid technologies, data is becoming crucial to the discovery, extraction, and delivery of energy, whether it is oil, natural gas, or even wind and solar. Databricks offers a virtual analytics platform that enables real-time analysis of operational and customer data at scale, making modern innovations like predicting weather patterns and optimizing the energy grid a reality.

A story to admire!

About Viacom

Viacom, with its 170 cable, broadcast and online networks in around 160 countries, is revamping itself into a data-driven enterprise, collecting and analyzing petabytes of network data to increase viewer loyalty and revenue.

The Challenges

Improving user experience: Streaming petabytes of video data across the world puts a strain on the delivery systems, resulting in videos failing to load or constantly stuttering as they rebuffer.
Growing the audience: Making sense of huge troves of viewing data and determining the best actions to drive viewer retention and loyalty.
Targeted advertising: With TV ad sales falling in recent years, Viacom needed to find better ways to engage with its audience via advertising.

The Solution

Viacom has built a real-time analytics platform based on Apache® Spark™ and Databricks, which constantly monitors the quality of video feeds and reallocates resources in real time when needed. Databricks has helped Viacom:

Predict trends and issues to provide superior viewing experience: Reduced video start delay by 33%
Increase customer loyalty: Leveraged data to identify how to increase customer retention by 3.5-7x
Improve ad conversions: Targeted customers with personalized ads based on comScore ratings and viewing behavior

Meet the powerhouse of Databricks, Ali Ghodsi

Ali is the CEO and Co-Founder of Databricks. He is responsible for the growth and international expansion of the company. He previously served as VP of Engineering and Product Management before becoming CEO in January 2016. Alongside his work at Databricks, Ali serves as an adjunct professor at UC Berkeley and is on the board at UC Berkeley’s RISELab. Ali was one of the creators of the open source project Apache Spark, and ideas from his academic research in the areas of resource management, scheduling, and data caching have been applied to Apache Mesos and Apache Hadoop. Ali received his MBA from Mid-Sweden University in 2003 and his Ph.D. from KTH/Royal Institute of Technology in Sweden in 2006 in the area of Distributed Computing.

“We provide a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business.”

“At Databricks, we are fully committed to maintaining this open development model. We believe that no computing platform will win in the Big Data space unless it is fully open.”