JaMoPP is a set of Eclipse plug-ins that can be used to parse Java source code into EMF-based models and vice versa. JaMoPP consists of:
a complete Java5 Ecore Metamodel,
a complete Java5 EMFText Syntax, and
an implementation of Java5's static semantics analysis.
Through JaMoPP, every Java program can be processed like any other EMF model. JaMoPP therefore bridges the gap between modelling and Java programming. It enables the application of arbitrary EMF-based tools to full Java programs. Since JaMoPP is developed through metamodelling and code generation, it is now possible to extend Java and to embed Java into other modelling languages using standard metamodelling techniques and tools. To ensure the quality of JaMoPP, it has been successfully tested on a large code base.
This is the home page of the ParsCit project, which performs reference string parsing, sometimes also called citation parsing or citation extraction. It is architected as a supervised machine learning procedure that uses Conditional Random Fields as its learning mechanism. You can download the code below, parse strings online, or send batch jobs to our web service (coming soon!). The code contains the training data, the feature generator, and shell scripts to connect the system to a web service (used here too).
Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
OpenNLP is an organizational center for open source projects related to natural language processing. It hosts a variety of Java-based NLP tools which perform sentence detection, tokenization, POS tagging, chunking and parsing, named-entity detection, and coreference using the OpenNLP Maxent machine learning package.
MSTParser is a non-projective dependency parser that searches for maximum spanning trees over directed graphs. Models of dependency structure are based on large-margin discriminative training methods. Projective parsing is also supported.
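To make the idea of "searching for maximum spanning trees over directed graphs" concrete, here is a minimal sketch in Python. It is not MSTParser's code or API: it uses plain exhaustive search over head assignments, which is only practical for very short sentences, where a real parser would use an efficient spanning-tree algorithm. The arc scores, the three-word sentence, and the names `is_tree`, `best_tree`, and `SCORES` are all hypothetical and chosen only for illustration.

```python
from itertools import product

ROOT = 0  # node 0 is an artificial root; words are numbered 1..n


def is_tree(heads):
    """heads[i] is the head of word i+1.

    The assignment is a valid dependency tree if every word reaches
    ROOT by following head links without entering a cycle.
    """
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != ROOT:
            if node in seen:  # cycle (a self-loop is also caught here)
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def best_tree(scores):
    """Return (best_score, heads) for the highest-scoring tree.

    scores[h][d] is the score of the arc from head h to dependent d.
    Exhaustive search: tries every head assignment, so it is
    exponential in sentence length and meant only as an illustration.
    """
    n = len(scores) - 1
    best_score, best_heads = float("-inf"), None
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        total = sum(scores[h][d] for d, h in enumerate(heads, start=1))
        if total > best_score:
            best_score, best_heads = total, heads
    return best_score, best_heads


# Hypothetical arc scores for the sentence "John saw Mary".
# Rows are heads, columns are dependents: 0=ROOT, 1=John, 2=saw, 3=Mary.
SCORES = [
    [0, 9, 10, 9],   # ROOT -> John / saw / Mary
    [0, 0, 20, 3],   # John -> saw / Mary
    [0, 30, 0, 30],  # saw  -> John / Mary
    [0, 11, 0, 0],   # Mary -> John
]

score, heads = best_tree(SCORES)
print(score, heads)  # 70 (2, 0, 2): "saw" attaches to ROOT and heads both other words
```

Note that picking the best head for each word independently would attach "John" to "saw" and "saw" to "John", which forms a cycle. The tree constraint is what forces the search to attach "saw" to the root instead. This sketch also places no projectivity restriction on the arcs, so it corresponds to the non-projective setting.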
This is an insanely long and gnarly essay about implementing, then optimizing, the low-level bits of a pure-Ruby XML parser. If you obsess about XML reading, deterministic finite automata, or Ruby code optimization, you may find some part of it interestin