Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but provides hints for web crawlers to do a better job of crawling your site.
Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.
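To make the description above concrete, here is a minimal sketch of generating such a file with Python's standard library. The element names (`urlset`, `url`, `loc`, `lastmod`, `changefreq`, `priority`) and the namespace follow the Sitemap 0.90 protocol. The `build_sitemap` helper, the example.com URLs, and the metadata values are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap 0.90 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a Sitemap XML string for a list of URL entries.

    Each entry is a dict with a required "loc" key and optional
    "lastmod", "changefreq", and "priority" keys (hypothetical helper).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        # The metadata is optional; emit only what the entry provides.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


# Illustrative entries only; these URLs and values are made up.
example_entries = [
    {
        "loc": "https://www.example.com/",
        "lastmod": "2024-01-15",
        "changefreq": "daily",
        "priority": "1.0",
    },
    {"loc": "https://www.example.com/about", "changefreq": "yearly"},
]

sitemap_xml = build_sitemap(example_entries)
print(sitemap_xml)
```

The second entry omits `lastmod` and `priority` to show that only the URL itself is required. As the description notes, the metadata serves as hints for crawlers and does not guarantee inclusion in a search engine.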
Earth lets you find files across a large network of machines and track disk usage in real time. It consists of a daemon that indexes filesystems in real time and reports all changes back to a central database, which can then be queried through a simple yet powerful web interface. Think of it as Spotlight or Beagle, but operating-system independent, with a central database for multiple machines and a web application that allows novel ways of exploring your data.
M. Liu, R. Cai, M. Zhang, and L. Zhang. Proceedings of the 20th ACM international conference on Information and knowledge management, pp. 87-92. New York, NY, USA, ACM (2011)
B. Bahmani, R. Kumar, M. Mahdian, and E. Upfal. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 24-32. New York, NY, USA, ACM (2012)
B. Krause, R. Jäschke, A. Hotho, and G. Stumme. HT '08: Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, pp. 157-166. New York, NY, USA, ACM (2008)
R. Jäschke, B. Krause, A. Hotho, and G. Stumme. Proceedings of the Second International Conference on Weblogs and Social Media (ICWSM 2008), pp. 192-193. Menlo Park, CA, USA, AAAI Press (2008)