Google has launched a new search engine for datasets called Dataset Search. The tool extends Google's support for the scientific community by letting researchers discover data across tens of thousands of online data repositories.
Dataset publication is extremely fragmented and many scientific domains prefer specific repositories. “Once they step out of their unique community, that’s when it gets hard,” says Natasha Noy, a research scientist at Google AI and a member of the Dataset Search project. “We want to make that data discoverable, but keep it where it is.”
Noy illustrates the problem with the experience of a climate scientist who described how hard it can be to find specific datasets. The scientist tracked down an ocean temperature dataset she needed for an upcoming study only after a colleague helped her find it, even though it was hosted in a fairly prominent repository she already knew.
The initial release of Dataset Search will index environmental and social science data, government data, and datasets from news organizations such as ProPublica. If this test version succeeds in pulling in data from many different repositories, more institutions and scientists are expected to come forward and make their data accessible.