Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.
CouchDB is written in Erlang, but can be easily accessed from any environment that provides a means of making HTTP requests. A multitude of third-party client libraries make this even easier for a variety of programming languages and environments.
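Because the API is plain HTTP and JSON, a client needs nothing beyond a standard HTTP library. The sketch below only builds the requests, so it runs without a server. It assumes a CouchDB instance on the default local port 5984; the database name `articles` and the sample document are hypothetical. `PUT /{db}/{docid}` stores a document and `GET /{db}/{docid}` reads it back.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local CouchDB server on its default port. Names below are illustrative.
COUCH_URL = "http://localhost:5984"


def _doc_url(db: str, doc_id: str) -> str:
    # Percent-encode the id so characters such as "/" stay inside one path segment.
    return f"{COUCH_URL}/{db}/{urllib.parse.quote(doc_id, safe='')}"


def build_put_document(db: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that stores `doc` as JSON under `doc_id`."""
    return urllib.request.Request(
        url=_doc_url(db, doc_id),
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get_document(db: str, doc_id: str) -> urllib.request.Request:
    """Build a GET request that fetches the stored document back as JSON."""
    return urllib.request.Request(url=_doc_url(db, doc_id), method="GET")
```

Passing either request to `urllib.request.urlopen` sends it to a running server. Updating an existing document additionally requires including its current `_rev` value in the body, which is how CouchDB detects conflicting writes.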
Extract, Transform, and Load (ETL) is a process in data warehousing that involves
* extracting data from outside sources,
* transforming it to fit business needs (which can include quality levels), and ultimately
* loading it into the end target, i.e. the data warehouse.
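The three ETL steps can be sketched as three small functions. This is a toy illustration, not a warehouse tool: the CSV text stands in for an outside source, the "quality level" is a single hypothetical rule (a customer name is required), and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an outside source.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5
3,,12.50
"""


def extract(text: str) -> list[dict]:
    """Extract: parse rows from the outside source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: normalise names, convert amounts to integer cents, drop rows failing the quality rule."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip().lower()
        if not customer:  # quality rule (assumed): every order needs a customer name
            continue
        cents = round(float(row["amount"]) * 100)
        cleaned.append((int(row["order_id"]), customer, cents))
    return cleaned


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
```

Running `load(transform(extract(RAW_CSV)), sqlite3.connect(":memory:"))` leaves two rows in `orders`; the third source row is rejected during the transform step.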
The FacetedDBLP search interface lets users search computer science publications in the DBLP collection, starting from a keyword, and shows the result set along with a set of facets, e.g., distinguishing publication years, authors, or conferences. It is the first large-scale application that uses GrowBag graphs to create a computer-science-specific topic facet, with which a user can characterize the result set in terms of main research topics and filter it according to certain subtopics.
FacetedDBLP builds upon the DBLP++ data set, which extends DBLP (as of 2008-11-21) with additional keywords and abstracts as available on public web pages. We have also corrected some of the links to electronic editions that were broken in DBLP. A brief description of the GrowBag facet within FacetedDBLP can be found in our JCDL paper; a detailed description of the algorithm is available on the GrowBag project page.
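Faceted browsing of a result set boils down to two operations: counting how many hits fall under each value of a facet, and narrowing the result set to one selected value. The sketch below shows only that generic mechanism on hypothetical records; it does not reproduce FacetedDBLP's data or the GrowBag topic facet.

```python
from collections import Counter

# Hypothetical result set for some keyword query.
RESULTS = [
    {"title": "Paper A", "year": 2007, "venue": "JCDL", "authors": ["Smith", "Lee"]},
    {"title": "Paper B", "year": 2008, "venue": "SIGIR", "authors": ["Lee"]},
    {"title": "Paper C", "year": 2008, "venue": "JCDL", "authors": ["Kim"]},
]


def _values(record: dict, field: str) -> list:
    """Treat single-valued and multi-valued (list) facet fields uniformly."""
    value = record[field]
    return value if isinstance(value, list) else [value]


def facet_counts(records: list[dict], field: str) -> Counter:
    """Count how many records carry each value of a facet field (each record counted once per value)."""
    counts: Counter = Counter()
    for rec in records:
        counts.update(set(_values(rec, field)))
    return counts


def refine(records: list[dict], field: str, value) -> list[dict]:
    """Filter the result set to records whose facet field contains the selected value."""
    return [rec for rec in records if value in _values(rec, field)]
```

Selecting a facet value and then recomputing the counts on the refined list is what lets a user drill down step by step, e.g. first by venue, then by year.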
This site is hosted by the Computation Facility at the Harvard-Smithsonian Center for Astrophysics.
The SAO/NASA Astrophysics Data System (ADS) is a Digital Library portal for researchers in Astronomy and Physics, operated by the Smithsonian Astrophysical Observatory (SAO) under a NASA grant. The ADS maintains three bibliographic databases containing more than 7.2 million records: Astronomy and Astrophysics, Physics, and arXiv e-prints. The main body of data in the ADS consists of bibliographic records, which are searchable through highly customizable query forms, and full-text scans of much of the astronomical literature which can be browsed or searched via our full-text search interface. Integrated in its databases, the ADS provides access and pointers to a wealth of external resources, including electronic articles, data catalogs and archives. We currently have links to over 7.9 million records maintained by our collaborators.
Infochimps.org
Free Redistributable Rich Data Sets
There are many sources to find out something about everything. Until now, there’s been no good place for you to find out everything about something.
The infochimps.org community is assembling and interconnecting the world's best repository for raw data: a sort of giant free almanac, with tables on everything you can put in a table. Built by data nerds, used by data nerds, it's a central source for the information you need to power the projects the world needs.
Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers.
Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers.
Use Google Spreadsheets API to create a database in the cloud
Mr. Jeffrey W Scudder (Google)
30min Intermediate
case study, cloud computing, google spreadsheets, online database, web, web services
I'll show you how to create a Python module that wraps the Google Spreadsheets Data API web service in an interface that makes it look like a local database. Using this tool, your application can run anywhere with Internet connectivity, and users will be able to take their data with them. The benefit of using Google Spreadsheets as a back end is that it provides a simple UI that is easy for non-programmers to interact with. From the application developer's perspective, provisioning and running a Spreadsheets-based database costs nothing. This example module provides a toolkit that simplifies interactions with the Google Spreadsheets API for a specific use case: using a spreadsheet like a remote database.
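The core idea of making a remote spreadsheet "look like a local database" is a thin facade over a row store. The sketch below is hypothetical and does not use the Google Spreadsheets API: `RowBackend`, `InMemoryBackend`, and `Table` are illustrative names, and the in-memory backend stands in for the remote worksheet so the example runs offline. A real module would implement `list_rows` and `append_row` with calls to the web service.

```python
from typing import Protocol


class RowBackend(Protocol):
    """Hypothetical interface a remote worksheet client would implement (one worksheet = one table)."""

    def list_rows(self) -> list[dict[str, str]]: ...

    def append_row(self, row: dict[str, str]) -> None: ...


class InMemoryBackend:
    """Offline stand-in for the remote worksheet."""

    def __init__(self) -> None:
        self._rows: list[dict[str, str]] = []

    def list_rows(self) -> list[dict[str, str]]:
        return [dict(r) for r in self._rows]  # return copies, as a remote fetch would

    def append_row(self, row: dict[str, str]) -> None:
        self._rows.append(dict(row))


class Table:
    """Database-like facade: insert records and query them by column equality."""

    def __init__(self, backend: RowBackend, columns: list[str]) -> None:
        self.backend = backend
        self.columns = columns

    def insert(self, **values) -> None:
        unknown = set(values) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        # Spreadsheet cells hold text, so every value is stored as a string.
        self.backend.append_row({c: str(values.get(c, "")) for c in self.columns})

    def find(self, **criteria) -> list[dict[str, str]]:
        rows = self.backend.list_rows()
        return [r for r in rows if all(r.get(k) == str(v) for k, v in criteria.items())]
```

Keeping the backend behind a small interface means application code written against `Table` does not change when the storage moves from a local stand-in to a shared spreadsheet.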
This glossary explains terms that
* are specific to the GMOD project, or
* are computing terms that are used in the GMOD project.
This glossary does not define biology terms.
GMOD: the Generic Model Organism Database project, a collection of open source software tools for creating and managing genome-scale biological databases. You can use it to create a small laboratory database of genome annotations or a large web-accessible community database. GMOD tools are in use at many large and small community databases.
It’s a way to build free Dabble databases under a Creative Commons license. Functionally, it’s the same as our paid service except that data you keep in a free application is publicly accessible.