Databricks was founded out of the UC Berkeley AMPLab by the team that created Apache Spark. The company has been working for the past six years on cutting-edge systems to extract value from Big Data. It believes that Big Data is a huge opportunity that is still largely untapped, and it is working to revolutionize what you can do with it.
Databricks’ mission is to accelerate innovation for its customers by unifying data science, engineering, and business. Its Unified Analytics Platform lets data science teams collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that take data from ETL and interactive exploration through to production. The company also helps users focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks is venture-backed by Andreessen Horowitz, NEA, and Battery Ventures, among others, and its global customer base includes Salesforce, Viacom, Shell, and HP.
Bravo! Apache Spark is here.
Apache Spark is a powerful open source processing engine built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley in 2009.
Apache Spark is 100% open source and hosted at the vendor-independent Apache Software Foundation. Databricks is fully committed to maintaining this open development model. The company believes that no computing platform will win in the Big Data space unless it is fully open.
Spark has one of the largest open source communities in Big Data, with over 1,000 contributors from more than 250 organizations. Databricks works within the open source community to maintain this momentum.
Since its release, Apache Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Netflix, Yahoo, and eBay have deployed Spark at massive scale, collectively processing multiple petabytes of data on clusters of over 8,000 nodes.
“At Databricks, we’re working hard to make Spark easier to use and run than ever, through our efforts on both the Spark codebase and the support materials around it. All of our work on Spark is open source and goes directly to Apache,” says the company’s CEO.
SPEED
Engineered from the bottom up for performance, Spark can be up to 100x faster than Hadoop MapReduce for large-scale data processing by exploiting in-memory computing and other optimizations. Spark is also fast when data is stored on disk, and it currently holds the world record for large-scale on-disk sorting.
EASE OF USE
Spark has easy-to-use APIs for operating on large datasets. These include a collection of over 100 operators for transforming data and familiar DataFrame APIs for manipulating semi-structured data.
A UNIFIED ENGINE
Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning and graph processing. These standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.
Advertising and Marketing Technology
The growth of digital advertising and marketing data has created a wealth of opportunities to optimize campaign performance and advertising spend across direct advertising and auctions. Databricks helps you manage the flood of data arriving from many sources, such as ad inventory, web traffic, click logs, CRM, and behavioral data, and uncover insights that improve audience targeting, pricing strategies, and conversion rates, increasing campaign ROI and creating new revenue opportunities.
Energy and Utilities
From highly instrumented wells to the proliferation of smart grid technologies, data is becoming a critical element in the discovery, extraction, and delivery of energy, whether it is oil, natural gas, or even wind and solar. Databricks provides a virtual analytics platform that enables real-time analysis of operational and customer data at scale, making modern innovations such as predicting weather patterns and optimizing the energy grid a reality.
Customer Case Study
Viacom, with its 170 cable, broadcast, and online networks in around 160 countries, is transforming itself into a data-driven enterprise by collecting and analyzing petabytes of network data to increase viewer loyalty and revenue.
Viacom has built a real-time analytics platform based on Apache® Spark™ and Databricks that constantly monitors the quality of video feeds and reallocates resources in real time when needed. Databricks has helped Viacom:
Meet the ace: Ali Ghodsi
Ali is the CEO and co-founder of Databricks, responsible for the growth and international expansion of the company. He previously served as VP of Engineering and Product Management before becoming CEO in January 2016. In addition to his work at Databricks, Ali is an adjunct professor at UC Berkeley and serves on the board of UC Berkeley’s RISELab. Ali was one of the creators of the open source project Apache Spark, and ideas from his academic research in resource management, scheduling, and data caching have been applied to Apache Mesos and Apache Hadoop. Ali received his MBA from Mid-Sweden University in 2003 and his PhD in distributed computing from KTH Royal Institute of Technology in Sweden in 2006.