Nature, 26 Oct 2021: Catalogue of billions of phrases from 107 million papers could ease computerized searching of the literature.
In a project that could unlock the world’s research papers for easier computerized analysis, an American technologist [Carl Malamud] has released online a gigantic index of the words and short phrases contained in more than 100 million journal articles, including many paywalled papers.
The catalogue, which was released on 7 October and is free to use, holds tables of more than 355 billion words and sentence fragments listed next to the articles in which they appear. It is an effort to help scientists use software to glean insights from published work even if they have no legal access to the underlying papers, says its creator, Carl Malamud. He released the files under the auspices of Public Resource, a non-profit corporation in Sebastopol, California, that he founded.
Malamud says that because his index doesn’t contain the full text of articles, but only sentence snippets up to five words long, releasing it does not breach publishers' copyright restrictions on the re-use of paywalled articles. However, one legal expert says that publishers might question the legality of how Malamud created the index in the first place.
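To make the structure of such an index concrete, here is a minimal sketch in Python, assuming nothing about how the real catalogue was built. It maps every word run of up to five words to the set of articles containing it. The function names, the whitespace tokenization, and the `doi:example-…` identifiers are illustrative assumptions, not details from the released files.

```python
from collections import defaultdict


def ngrams(text, max_n=5):
    """Yield every run of 1..max_n consecutive words in text (lowercased)."""
    words = text.lower().split()  # naive whitespace tokenization (assumption)
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def build_index(articles, max_n=5):
    """Map each n-gram to the set of article IDs in which it appears."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for gram in ngrams(text, max_n):
            index[gram].add(article_id)
    return index


# Hypothetical article IDs and texts, for illustration only.
articles = {
    "doi:example-1": "open access to research papers",
    "doi:example-2": "paywalled research papers remain closed",
}
index = build_index(articles)
print(sorted(index["research papers"]))
```

A lookup returns only article identifiers, never surrounding text, which is the property Malamud relies on: the longest fragment that can be read back out of the index is five words.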
Nature, July 2019: A giant data store quietly being built in India could free vast swathes of science for computer analysis, but is it legal?
Over the past year, Malamud has — without asking publishers — teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 up to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi. “This is not every journal article ever written, but it’s a lot,” Malamud says. It’s comparable to the size of the core collection in the Web of Science database, for instance. Malamud and his JNU collaborator, bioinformatician Andrew Lynn, call their facility the JNU data depot.
2019: How librarians, pirates, and funders are liberating the world’s academic research from paywalls. Featuring Elaine Westworth, Aileen Fyfe, Theodora Bloom et al.
"What’s standing in the way of a full-on revolution? The culture of science."
"But there’s a big thing getting in the way of a revolution: prestige-obsessed scientists who continue to publish in closed-access journals. They’re like the road workers who keep paying fees to build infrastructure they can’t freely access. Until that changes, the walls will remain firmly intact."
The latest strategy for addressing the serials crisis that has fueled the crisis in scholarly publishing across the disciplines is the establishment of transformative open access agreements.
By Prabir Purkayastha, 26 Dec 2020
Three academic publishers are asking for the blocking in India of Sci-Hub and Libgen, two websites that provide free downloads of research publications and books to research scholars and students. The three (Elsevier Ltd., Wiley India Pvt. Ltd. and the American Chemical Society) have filed a petition in the Delhi High Court, which is scheduled to be heard next on January 6.