Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'.
Stanford Classifier is a machine learning tool that will take data items and place them into one of k classes it can also give a probability distribution over the class assignment for a data item.
Stanford CoreNLP provides a set of natural language analysis tools that can give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, and mark up the structure of sentences in terms of phrases and word dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, extract open-class relations between mentions, etc.
Stanford Phrasal is a statistical phrase-based machine translation system, written in Java that provides much the same functionality as the core of Moses it include: providing an easy to use API for implementing new decoding model features, the ability to translating using phrases that include gaps (Galley et al. 2010), and conditional extraction of phrase-tables and lexical reordering models.
Stanford Pattern-based Information Extraction and Diagnostics (SPIED) is a pattern-based entity extraction and visualization that provides code for two components, Learning entities from unlabeled text starting with seed sets using patterns in an iterative fashion and Visualizing and diagnosing the output from one to two systems.
Stanford Tokenizer is an ancillary tool that uses tokenization to provide the ability to split text into sentences. PTBTokenizer mainly targets formal English writing rather than SMS-speak.
TokensRegex is a generic framework defining patterns over text (sequences of tokens) and mapping it to semantic objects represented as Java objects thta emphasizes describing text as a sequence of tokens (words, punctuation marks, etc.), which may have additional attributes, and writing patterns over those tokens, rather than working at the character level, as with standard regular expression packages.
Stanford Topic Modeling Toolbox (TMT) brings topic modeling tools to social scientists and others who wish to perform analysis on datasets that have a substantial textual component, it has the ability to import and manipulate text from cells in Excel and other spreadsheets, train topic models (LDA, Labeled LDA, and PLDA new) to create summaries of the text, select parameters (such as the number of topics) via a data-driven process and generate rich Excel-compatible outputs for tracking word usage across topics, time, and other groupings of data.
Stanford Word Segmenter currently supports Arabic and Chinese that provided segmentation schemes have been found to work well for a variety of applications the system requires Java 1.8+ to be installed, it recommend at least 1G of memory for documents that contain long sentences. For files with shorter sentences (e.g., 20 tokens), decrease the memory requirement by changing the option java -mx1g in the run scripts.
SUTime is a library for recognizing and normalizing time expressions it can be used to annotate documents with temporal information. It is a deterministic rule-based system designed for extensibility.
Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions"). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is a similar package which operates on dependency graphs (class SemanticGraph, called semgrex.