Stanford Pattern-based Information Extraction and Diagnostics (SPIED) is a pattern-based entity extraction and visualization that provides code for two components, Learning entities from unlabeled text starting with seed sets using patterns in an iterative fashion and Visualizing and diagnosing the output from one to two systems.
Stanford CoreNLP provides a set of natural language analysis tools that can give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, and mark up the structure of sentences in terms of phrases and word dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, extract open-class relations between mentions, etc.
Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'.
Stanford Phrasal is a statistical phrase-based machine translation system, written in Java that provides much the same functionality as the core of Moses it include: providing an easy to use API for implementing new decoding model features, the ability to translating using phrases that include gaps (Galley et al. 2010), and conditional extraction of phrase-tables and lexical reordering models.
TokensRegex is a generic framework defining patterns over text (sequences of tokens) and mapping it to semantic objects represented as Java objects thta emphasizes describing text as a sequence of tokens (words, punctuation marks, etc.), which may have additional attributes, and writing patterns over those tokens, rather than working at the character level, as with standard regular expression packages.
Stanford Topic Modeling Toolbox (TMT) brings topic modeling tools to social scientists and others who wish to perform analysis on datasets that have a substantial textual component, it has the ability to import and manipulate text from cells in Excel and other spreadsheets, train topic models (LDA, Labeled LDA, and PLDA new) to create summaries of the text, select parameters (such as the number of topics) via a data-driven process and generate rich Excel-compatible outputs for tracking word usage across topics, time, and other groupings of data.
Stanford Word Segmenter currently supports Arabic and Chinese that provided segmentation schemes have been found to work well for a variety of applications the system requires Java 1.8+ to be installed, it recommend at least 1G of memory for documents that contain long sentences. For files with shorter sentences (e.g., 20 tokens), decrease the memory requirement by changing the option java -mx1g in the run scripts.
Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions"). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is a similar package which operates on dependency graphs (class SemanticGraph, called semgrex.