Ok, I admit it. I don’t know when I’d be likely to do a project that needs this, but it is very cool.
OpenNLP is both the name of a group of open source projects related to natural language processing (NLP), and the name of a library of NLP tools written in Java by Jason Baldridge, Tom Morton, and Gann Bierner. My C# port is based upon the latest version (1.2.0) of the Java OpenNLP tools, released in April 2005. Development of the Java library is ongoing, and I hope to update the C# port as new developments occur.
Tools included in the C# port are: a sentence splitter, a tokenizer, a part-of-speech tagger, a chunker (used to "find non-recursive syntactic annotations such as noun phrase chunks"), a parser, and a name finder. The Java library also includes a tool for co-reference resolution, but the code for this feature is in flux and has not yet been ported to C#. All of these tools are driven by maximum entropy models processed by the SharpEntropy library.
