Pinterest has open-sourced Querybook, its data management solution for enterprise-scale remote engineering collaboration. The tool, which the company uses internally, helps engineers compose queries, create analyses, and collaborate via a notebook interface.
Querybook started in 2017 as an intern project at Pinterest. Early on, the development team decided on a document-like interface where users could write queries and analyze them in one place. The tool was released internally in March 2018. Querybook has since become the go-to solution for big data analytics at Pinterest, averaging 500 daily active users and 7,000 daily query runs.
Every query executed on Querybook is analyzed to extract metadata such as referenced tables and query runners. Querybook uses this information to automatically update its data schema and search ranking, and to show a table’s frequent users and query examples. The more queries are fed to Querybook, the better documented the tables become.
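To make the idea concrete, here is a minimal Python sketch of how executed queries could be mined for referenced tables, frequent users, and example queries. It is an illustration only and not Querybook's actual implementation: the `extract_tables` function, the `TableUsageIndex` class, and the regex-based parsing are all assumptions made for this example. A production system would use a real SQL parser, since this simple pattern also picks up names such as CTE aliases.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set

# Hypothetical, simplified pattern: the identifier following FROM or JOIN,
# optionally schema-qualified (e.g. "pins.events"). Not a full SQL parser.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_tables(query: str) -> Set[str]:
    """Return the lower-cased table names that follow FROM or JOIN."""
    return {name.lower() for name in TABLE_PATTERN.findall(query)}


class TableUsageIndex:
    """Accumulates per-table usage metadata from executed queries."""

    def __init__(self) -> None:
        self.users_by_table: Dict[str, Counter] = defaultdict(Counter)
        self.examples_by_table: Dict[str, List[str]] = defaultdict(list)

    def record(self, user: str, query: str) -> None:
        """Register one executed query against every table it references."""
        for table in extract_tables(query):
            self.users_by_table[table][user] += 1
            self.examples_by_table[table].append(query)

    def frequent_users(self, table: str, limit: int = 3) -> List[str]:
        """Return up to `limit` users, most frequent first."""
        counts = self.users_by_table.get(table.lower(), Counter())
        return [user for user, _ in counts.most_common(limit)]


# Usage with made-up users and table names.
index = TableUsageIndex()
index.record("alice", "SELECT * FROM pins.events e JOIN users u ON e.uid = u.id")
index.record("bob", "select count(*) from pins.events")
index.record("alice", "SELECT 1 FROM pins.events")
top_users = index.frequent_users("pins.events")
```

Each recorded query adds to the per-table counters, so in this sketch a table's list of frequent users and example queries grows richer as more queries run against it.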
Querybook features an admin interface that lets admins configure query engines, table metadata ingestion, and access permissions. Admins can make live changes to Querybook without going through code or config files. They can also create visualizations, including line, bar, stacked area, pie, donut, scatter, and table charts.
“We built Querybook to provide a responsive and simple web user interface for such analysis so data scientists, product managers, and engineers can discover the right data, compose their queries, and share their findings,” Pinterest wrote in a blog post.