Schema.org is a set of extensible schemas that enables webmasters to embed structured data on their web pages for use by search engines and other applications.
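As a minimal sketch of what embedding such structured data can look like, the snippet below builds a Schema.org Person description and wraps it in a JSON-LD script element, one of the syntaxes Schema.org markup can be written in. The helper name person_jsonld and the sample values are assumptions made up for illustration.

```python
import json


def person_jsonld(name, job_title, affiliation):
    """Return an HTML script element carrying a Schema.org Person as JSON-LD.

    Hypothetical helper for illustration; the property names used
    (name, jobTitle, affiliation) come from the Schema.org Person type.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "affiliation": {"@type": "Organization", "name": affiliation},
    }
    # Escape "</" so a value can never close the script element early;
    # "\/" is a legal JSON escape, so the payload stays valid JSON.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


if __name__ == "__main__":
    # Placeholder values, not taken from any real page.
    print(person_jsonld("Jane Doe", "Research Fellow", "Example University"))
```

The returned string can be placed in a page's head or body, where a consuming application can read the JSON payload without parsing the visible HTML.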
The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The library already provides specific strategies for common tasks (for example, news article extraction) and can easily be extended for individual problem settings.
Extraction is very fast (milliseconds), needs only the input document (no global or site-level information), and is usually quite accurate.
Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.
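To illustrate the general idea of single-document boilerplate removal, here is a minimal Python sketch. It is not boilerpipe's API or its actual classifiers. It assumes a much simpler heuristic: split the page into blocks, then keep a block only if it has enough words and few of them sit inside links. The names BlockCollector and extract_main_text, the tag sets, and both thresholds are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (assumption: a small illustrative set).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3",
              "article", "section", "nav", "header", "footer"}
# Tags whose text content is never shown to the reader.
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts the words inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # one (words, link_word_count) tuple per block
        self._words = []
        self._link_words = 0
        self._in_link = 0
        self._skip = 0

    def flush(self):
        # Close the current block, if it contains any text.
        if self._words:
            self.blocks.append((self._words, self._link_words))
        self._words, self._link_words = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return
        words = data.split()
        self._words.extend(words)
        if self._in_link:
            self._link_words += len(words)


def extract_main_text(html, min_words=10, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by link text."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush()
    kept = []
    for words, link_words in collector.blocks:
        link_density = link_words / len(words)
        if len(words) >= min_words and link_density <= max_link_density:
            kept.append(" ".join(words))
    return "\n".join(kept)
```

Like the approach described above, the sketch looks only at the input document and needs no site-level information. Navigation bars and footers tend to be short and link-heavy, so they fall below the thresholds, while article paragraphs pass. Real pages would need tuned thresholds and more robust block detection.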
REDAXO is more than just a content management system. Thanks to its range of features, it can also be used for a wide variety of information management solutions. REDAXO is licensed under the GNU GPL and may therefore be used free of charge, including for commercial purposes.
Our system offers a large number of features, and we have compiled the most important ones here. Thanks to REDAXO's modular design, any modules and AddOns can be added as needed.
Linda Yueh is a Fellow in Economics at St. Edmund Hall, University of Oxford. She is an Associate of the Globalisation Programme of the Centre for Economic Performance at the London School of Economics and Political Science (LSE), a Fellow of The Royal Society for the Encouragement of Arts, Manufactures & Commerce (RSA), and a Member of the Bar of New York State. She is also visiting the London Business School in 2008-09.
D. Shen, Z. Chen, Q. Yang, H. Zeng, B. Zhang, Y. Lu, and W. Ma. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 242-249. New York, NY, USA, ACM, (2004)
A. Sun, E. Lim, and W. Ng. Proceedings of the 4th International Workshop on Web Information and Data Management, pages 96-99. New York, NY, USA, ACM, (2002)