The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
The library already provides specific strategies for common tasks (for example, news article extraction) and can easily be extended for individual problem settings.
Content extraction is very fast (milliseconds), requires only the input document (no global or site-level information), and is usually quite accurate.
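In practice you would call one of boilerpipe's ready-made extractors. For news articles, for example, ArticleExtractor.INSTANCE.getText(html) returns the main text. To illustrate the underlying idea without depending on the library, here is a minimal, hypothetical sketch that keeps or drops pre-split text blocks based on two shallow features, word count and link density. This is similar in spirit to the shallow text features boilerpipe is built on. The class name, the TextBlock input, and the thresholds are assumptions for illustration only, and this is not boilerpipe's actual classifier.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: boilerplate removal using shallow text features.
 * NOT boilerpipe's real algorithm; thresholds are illustrative assumptions.
 */
public class ShallowBoilerplateFilter {

    /** One block of page text (e.g. a paragraph) plus how many of its words are inside links. */
    public static final class TextBlock {
        private final String text;
        private final int linkedWords;

        public TextBlock(String text, int linkedWords) {
            this.text = text;
            this.linkedWords = linkedWords;
        }

        public String text() {
            return text;
        }

        /** Number of whitespace-separated words in the block. */
        public int wordCount() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }

        /** Fraction of the block's words that are link text (0.0 for an empty block). */
        public double linkDensity() {
            int words = wordCount();
            return words == 0 ? 0.0 : (double) linkedWords / words;
        }
    }

    // Hypothetical thresholds, chosen only to make the example readable.
    public static final int MIN_WORDS = 10;
    public static final double MAX_LINK_DENSITY = 0.33;

    /** A block counts as content if it is long enough and not dominated by links. */
    public static boolean isContent(TextBlock b) {
        return b.wordCount() >= MIN_WORDS && b.linkDensity() <= MAX_LINK_DENSITY;
    }

    /** Joins the text of all blocks classified as content, one block per line. */
    public static String extract(List<TextBlock> blocks) {
        List<String> kept = new ArrayList<>();
        for (TextBlock b : blocks) {
            if (isContent(b)) {
                kept.add(b.text().trim());
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        List<TextBlock> page = List.of(
                new TextBlock("Home News Sports Contact", 4),   // navigation: all links
                new TextBlock("The city council approved the new budget on Monday "
                        + "after a long debate about school funding.", 0),
                new TextBlock("Copyright 2024", 0));            // footer: too short
        // Prints only the article sentence.
        System.out.println(extract(page));
    }
}
```

Because each block is judged only from its own text, the sketch needs no site-level information. A real extractor would typically also take neighbouring blocks and the HTML structure into account.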
Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0.
JSmooth is a Java executable wrapper generator with advanced JRE detection features. It builds standard Windows executable binaries (.exe) that contain all the information needed to launch your Java application, such as the classpath, the required JVM version, and the Java properties.