HUD USER provides interested researchers with access to the original electronic data sets generated by PD&R-sponsored data collection efforts, including the American Housing Survey and HUD median family income limits, as well as microdata from research initiatives on topics such as housing discrimination, the HUD-insured multifamily housing stock, and the public housing population.
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. The collection has become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering.
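One quick way to check the "(nearly) evenly" partitioning is to count documents per newsgroup. The sketch below assumes the collection has been unpacked into one subdirectory per newsgroup with one file per document; the function names and the directory path in the usage note are illustrative, not part of the data set's distribution.

```python
from pathlib import Path


def count_documents_per_group(root):
    """Return {newsgroup_name: number_of_files} for each subdirectory of root.

    Assumes one subdirectory per newsgroup and one file per document.
    """
    counts = {}
    for group_dir in sorted(Path(root).iterdir()):
        if group_dir.is_dir():
            # Each regular file is treated as one newsgroup document.
            counts[group_dir.name] = sum(
                1 for entry in group_dir.iterdir() if entry.is_file()
            )
    return counts


def balance_ratio(counts):
    """Smallest group size divided by the largest; 1.0 means perfectly even."""
    if not counts:
        return 0.0
    largest = max(counts.values())
    return min(counts.values()) / largest if largest else 0.0
```

For example, `balance_ratio(count_documents_per_group("20_newsgroups"))` (with a hypothetical local path) should come out close to 1.0 if the groups are about the same size.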
Various US databases provided by federal government agencies, covering Census, Labor Statistics, Transportation, and Economics. Also: a 3D version of the PubChem Library and Annotated Human Genome Data.
StatLib is a system for distributing statistical software, datasets, and information. Started in 1989, it is hosted by the Department of Statistics at Carnegie Mellon University.
Collection of economic, social and environmental time series data from sources including the United Kingdom government, the Federal Reserve System and the European Central Bank. You can build graphs and embed them into your blogs and websites; if the data they're based on is updated, the graphs are updated too. You can also set up alerts and have Timetric email you when something interesting happens to a value you're watching. Also has an API.
Open Energy Info (OpenEI) is a linked open data platform that connects the world’s energy data, bringing together energy information to provide improved analyses, unique visualizations, and real-time access to data. OpenEI follows guidelines set by the White House’s Open Government Initiative, which is focused on transparency, collaboration, and participation. OpenEI strives to provide open access to this energy information in order to spur creativity and drive innovation in the energy sector.