Natural language processing (NLP) allows applications to interact with human language using deep learning algorithms. NLP algorithms take language as input and can produce a variety of outputs depending on the task they were trained for. These outputs include automatic summarization, language translation, part-of-speech tagging, parsing or grammatical analysis, and sentiment analysis, among others. NLP algorithms can also provide voice recognition and natural language generation, which converts data into understandable human language. Examples of NLP uses include chatbots, translation applications, and social media monitoring tools that scan Facebook and Twitter for mentions. Natural language processing algorithms are an example of a deep learning algorithm and may be a pre-built offering in an AI platform.

To qualify for inclusion in the Natural Language Processing category, a product must:

  • Provide a deep learning algorithm specifically for human language interaction
  • Connect with language data pools to learn a specific solution or function
  • Consume the language as an input and provide an outputted solution

Natural Language Processing (NLP) Software Grid® Overview

The best Natural Language Processing (NLP) Software products are determined by customer satisfaction (based on user reviews) and market presence (based on products’ scale, focus, and influence) and placed into four categories on the Grid®:
  • Products in the Leader quadrant are rated highly by G2 Crowd users and have substantial Market Presence scores. Leaders include: Google Cloud Translation API
  • High Performers are highly rated by their users, but have not yet achieved the Market Presence of the Leaders. High Performers include: spaCy
  • Contenders have significant Market Presence and resources, but have received below average user Satisfaction ratings or have not yet received a sufficient number of reviews to validate the solution. Contenders include: NVivo
  • Niche solutions do not have the Market Presence of the Leaders. They may have been rated positively on customer Satisfaction, but have not yet received enough reviews to validate them. Niche products include: IBM SPSS Text Analytics for Surveys (IBM Stafs)
G2 Crowd Grid® for Natural Language Processing (NLP)
    Dynamically translate between thousands of available language pairs

    spaCy is a Python NLP library that helps users get their work out of papers and into production.

    See the big picture fast with NVivo 12 – the most powerful software for gaining richer insights from qualitative and mixed-methods data. Purpose-built software for qualitative and mixed-methods research.

    IBM SPSS Text Analytics for Surveys software lets you transform unstructured survey text into quantitative data and gain insight using sentiment analysis. The solution uses natural language processing (NLP) technologies specifically designed for survey text.

    FuzzyWuzzy is a fuzzy string matching library for Python that uses Levenshtein distance to calculate the differences between sequences.
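
    As a rough illustration of the Levenshtein distance mentioned above, the following standard-library sketch counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The function names `levenshtein` and `similarity` are hypothetical, and the 0-100 score is an assumed normalization for illustration only; it is not FuzzyWuzzy's own API or scoring formula.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed 0-100 score: 100 means identical, 0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)
```

    Calling `levenshtein("kitten", "sitting")` returns 3 (two substitutions and one insertion), and `similarity` rescales that distance by the longer string's length so that strings of different lengths can be compared.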

    IBM Watson Tone Analyzer is a service that uses linguistic analysis to detect three types of tones from text: emotion, social tendencies, and language style. Emotions identified include anger, fear, joy, sadness, and disgust. Identified social tendencies draw on the Big Five personality traits used by some psychologists, including openness, conscientiousness, extroversion, agreeableness, and emotional range. Identified language styles include confident, analytical, and tentative.

    MITIE (MIT Information Extraction) is a set of tools for named entity extraction and binary relation detection, as well as for training custom extractors and relation detectors.

    Text-Processing is a service for sentiment analysis, stemming and lemmatization, part-of-speech tagging and chunking, phrase extraction, and named entity recognition.

    Epic is a high performance statistical parser written in Scala, along with a framework for building complex structured prediction models.

    Microsoft Language Understanding Intelligent Service (LUIS) is a service that enables users to quickly deploy an HTTP endpoint that takes the sentences sent to it and interprets them in terms of the intention they convey and the key entities that are present. Its web interface lets users custom-design a set of intentions and entities relevant to an application and guides them through the process of building a language understanding system.

    NLTK is a platform for building Python programs to work with human language data that provides interfaces to corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.

    retext is an ecosystem of plug-ins for processing natural language.

    TextBlob is a Python (2 and 3) library for processing textual data that provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

    Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Amazon Comprehend identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; and automatically organizes a collection of text files by topic.

    Apache cTAKES is a natural language processing system for extraction of information from electronic medical record clinical free-text.

    Azure Translator Speech API, part of the Microsoft Cognitive Services API collection, is a cloud-based machine translation service. The API enables businesses to add end-to-end, real-time, speech translations to their applications or services.

    Azure Translator Text API is a cloud-based machine translation service supporting multiple languages. Translator is used to build applications, websites, tools, or any solution requiring multilanguage support.

    ClearTK (Machine Learning for UIMA) is a framework for developing machine learning and natural language processing components within the Apache Unstructured Information Management Architecture.

    Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, it is designed to be used in business environments on distributed GPUs and CPUs. It aims to be cutting-edge plug and play, favoring convention over configuration, which allows for fast prototyping for non-researchers.

    Frog is an integration of memory-based natural language processing (NLP) modules that tokenizes, tags, lemmatizes, and morphologically segments word tokens in Dutch text files. It also assigns a dependency graph to each sentence, identifies the base phrase chunks in the sentence, and attempts to find and label all named entities.

    Ngram is an index for golang that supports Unicode. It is append-only (data can't be deleted from the index), GC friendly (all strings are pooled and compressed), and application agnostic (there is no notion of a document or anything else that the user needs to implement).

    Derive insights from unstructured text using Google machine learning

    IBM Watson Natural Language Classifier is a service that enables developers without a background in machine learning or statistical algorithms to create natural language interfaces for their applications. It interprets the intent behind text and returns a corresponding classification with associated confidence levels. The return value can then be used to trigger a corresponding action, such as redirecting the request or answering a question.

    Jellyfish is a Python library for doing approximate and phonetic matching of strings.

    Kapiche uses the power of Natural Language Processing to analyse your unstructured data, letting you get on with the process of creating recommendations. Be it open survey responses, online reviews, or social media, unstructured data is the key to knowing what your customers want. However, drawing this information into a readily understood format can be difficult and time consuming. That’s where Kapiche fills the gap.

    Knwl.js is a JavaScript library that parses through text for dates, times, phone numbers, emails, places, and more.

    MALLET (MAchine Learning for LanguagE Toolkit) is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

    Microsoft Bing Spell Check API is a tool that helps users correct spelling errors, recognize the difference among names, brand names, and slang, and understand homophones as they're typing.

    Microsoft Linguistic Analysis APIs provide access to natural language processing (NLP) tools that identify the structure of text. They provide three types of analysis: sentence separation and tokenization, part-of-speech tagging, and constituency parsing.

    Microsoft Web Language Model API is a REST-based cloud service that provides tools for natural language processing. Using this API, a user's application can leverage the power of big data through language models trained on web-scale corpora collected by Bing in the EN-US market.

    Natural is a general natural language facility for Node.js that supports tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity, and some inflections.

    Natural language Understanding Toolkit (nut) is an implementation of Cross-Language Structural Correspondence Learning (CLSCL)

    All the talk about qualitative data analysis is for naught if you can’t understand language as it is spoken. That is what Natural Language Processing (NLP) is all about. NewSci NLP brings this power to organizations seeking to extract insights from their unstructured data. Just as you know what a person is saying when you hear, “I’m hungry, I want an apple” vs. “I really want an Apple™ instead of a PC,” so now can a computer. NewSci NLP enables a computer to understand the people, places, and things important to your organization. This, in turn, allows your unstructured data to be analyzed just like your structured data. With NewSci NLP your organization will enjoy qualitative analysis (the Why behind the numbers) alongside your quantitative analytics. It uses models customized to your organization, the domain in which you operate, the quality of your recordings, and even local and regional dialects to deliver the highest level of transcription accuracy. It captures your organization’s domain and unique characteristics to enable deep Natural Language Understanding analysis and Natural Language Generation. Your NewSci Ontology will be your Rosetta Stone for unlocking the value hidden in your unstructured data. The NewSci Insight Reservoir™ brings governance and insight to the data lake. You enjoy all the benefits of a state-of-the-art Big Data lake, including access to hundreds of data connectors for ingesting information, transformation tools for quality assurance and data enhancement, and cataloging of your data down to the field level, while at the same time having unmatched data governance capabilities. Unlike a passive data lake, the NewSci Insight Reservoir™ is a powerful cognitive computing platform where you can perform machine learning, deep learning, and natural language processing on all your structured and unstructured data. NewSci NLP connects directly to your NewSci Insight Reservoir™ to extract meaning from your text and make it available for analysis. Machine and Deep Learning algorithms can be created, and perfected, as data enters the Insight Reservoir™, increasing the value in real time. And all of the insights can easily be made available for visualization tools including Tableau®, Qlik®, and MS Power BI®. Jump out of the data lake and get your organization into the NewSci Insight Reservoir™.

    nlpjs is a JavaScript natural language processing library.

    The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution, which are usually required to build more advanced text processing services. It also includes maximum entropy and perceptron based machine learning.

    PyNLPl is a Python library for Natural Language Processing that contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model.
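
    To make the n-gram, frequency-list, and simple-language-model tasks named above concrete, here is a minimal standard-library sketch. It does not use PyNLPl's API; the helper names `ngrams`, `frequency_list`, and `bigram_probability` are hypothetical, and the whitespace tokenization and unsmoothed maximum-likelihood estimate are simplifying assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(tokens, n=1):
    """Return (n-gram, count) pairs, most frequent first."""
    return Counter(ngrams(tokens, n)).most_common()


def bigram_probability(tokens, prev, word):
    """Unsmoothed estimate of P(word | prev) from bigram counts."""
    bigram_counts = Counter(ngrams(tokens, 2))
    # Count prev only where it is followed by another token.
    context_counts = Counter(tokens[:-1])
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


# Naive whitespace tokenization, assumed here for brevity.
tokens = "the cat sat on the mat the cat slept".split()
top_bigram = frequency_list(tokens, 2)[0]
p_cat_given_the = bigram_probability(tokens, "the", "cat")
```

    In this toy corpus the bigram ("the", "cat") occurs twice and "the" is followed by a token three times, so the estimated probability of "cat" after "the" is 2/3. A practical language model would add smoothing so that unseen n-grams do not receive zero probability.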

    SnowNLP is a library written in Python that simplifies Chinese text processing.

    Stanford NER is a Java implementation of a Named Entity Recognizer that labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names.

    Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'.

    TokensRegex is a generic framework for defining patterns over text (sequences of tokens) and mapping them to semantic objects represented as Java objects. It emphasizes describing text as a sequence of tokens (words, punctuation marks, etc.), which may have additional attributes, and writing patterns over those tokens, rather than working at the character level, as with standard regular expression packages.

    Stanford Topic Modeling Toolbox (TMT) brings topic modeling tools to social scientists and others who wish to perform analysis on datasets that have a substantial textual component. It can import and manipulate text from cells in Excel and other spreadsheets, train topic models (LDA, Labeled LDA, and PLDA) to create summaries of the text, select parameters (such as the number of topics) via a data-driven process, and generate rich Excel-compatible outputs for tracking word usage across topics, time, and other groupings of data.

    Stanford Word Segmenter currently supports Arabic and Chinese. The provided segmentation schemes have been found to work well for a variety of applications. The system requires Java 1.8+ to be installed, and at least 1G of memory is recommended for documents that contain long sentences. For files with shorter sentences (e.g., 20 tokens), decrease the memory requirement by changing the option java -mx1g in the run scripts.

    textacy is a Python library for performing higher-level natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) offloaded to another library, textacy focuses on tasks facilitated by the ready availability of tokenized, POS-tagged, and parsed text.

    TextAnalysis.jl is a Julia package for text analysis. Its manual is designed to get users started doing text analysis in Julia and assumes that they are already familiar with the basic methods of text analysis.

    Treat is a toolkit for natural language processing and computational linguistics in Ruby. It aims to build a language- and algorithm-agnostic NLP framework for Ruby with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language parsing, part-of-speech tagging, keyword extraction, and named entity recognition.

    Stanford Parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb.

    Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first.

    ABBYY Mobile OCR Engine enables developers to integrate Optical Character Recognition (OCR) into mobile and small-footprint applications. It enables images and photographs to be transformed into searchable and editable document formats.

    Intelligent Service Robot is a dialog platform that enables smart dialog through various dialog-enabling clients, such as websites, mobile apps, and robots. Users can use domain-specific knowledge bases, configure their own knowledge base for customized smart dialogs and use Intelligent Service Robot to facilitate self-service through multi-round dialog. Intelligent Service Robot can also integrate with third-party APIs to enable complex scenarios such as order search, shipping tracking, and self-service returns

    Build a model tailored to your solution, then deploy and maintain it with ease

    Breeze is a numerical processing library for Scala.

    BLLIP Parser is a statistical natural language parser including a generative constituent parser ("first-stage") and a discriminative maximum entropy reranker ("second-stage").

    Botsplash is designed to enable users to engage their customers over an NLP- and AI-powered chat platform.

    cogcomp-nlp is a collection of natural language processing libraries that comes with a detailed README and instructions on how to use it.

    Cogito API is a ready to deploy and fully configured API series that helps developers accelerate creation and deployment of unique applications that leverage large volumes of unstructured information from multiple sources. Cogito API is easily deployed or integrated for faster evaluation and analysis of content such as web pages, social media data or any big data sets or real-time information streams.

    Colibri Core is software that counts and extracts patterns from large corpus data, extracts various statistics on the extracted patterns, and computes relations between the extracted patterns.

    The vendor has wrapped its Retina Engine into an easy-to-use, powerful platform for fast semantic search, semantic classification, and semantic filtering. It can process any kind of text, independently of language and length, and enables users to process terabytes of data orders of magnitude faster than other methods.

    CRF++ is a simple, customizable, open source implementation of Conditional Random Fields (CRFs) for segmenting and labeling sequential data. CRF++ is designed for generic purposes and can be applied to a variety of NLP tasks, such as Named Entity Recognition, Information Extraction, and Text Chunking.

    Datumbox API offers a large number of off-the-shelf Classifiers and Natural Language Processing services which can be used in a broad spectrum of applications including: Sentiment Analysis, Topic Classification, Language Detection, Subjectivity Analysis, Spam Detection, Reading Assessment, Keyword and Text Extraction and more.

    ezCAC is NLP-based, HIPAA-compliant computer-assisted coding software. It is designed to bring hospital clinical data and all patient-related data together in one intuitive Enterprise Computer-Assisted Coding (CAC) platform.

    ezCDI is NLP-based computer-assisted clinical documentation improvement software.

    FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala that provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.

    FoLiA is an XML-based annotation format suitable for the representation of linguistically annotated language resources. Its intended use is as a format for storing and/or exchanging language resources, including corpora.

    Grooper is a document capture and data transformation software platform that incorporates modern technology to help companies manage their documents and data.

    IBM Watson Discovery News provides news and blog content that is enriched with natural language processing to allow for highly targeted search and trend analysis.

    AI-powered virtual assistant that automatically listens in and captures key moments, notes and insights from every meeting you have.

    Inbenta, a global leader in artificial intelligence, utilizes patented natural language processing technology to provide a highly accurate search solution for customer support, e-commerce and chatbots. Inbenta's semantic search engine understands & delivers results based on the meaning behind customers’ search queries, not the individual keywords, leading to improved customer satisfaction, lower support costs and stronger ROI. The result: industry-leading 90%+ self-service rates.

    Intellexer SDK incorporates natural language processing tools for semantic analysis of unstructured text data.

    A platform that indexes voice data to make it searchable and automatically records and transcribes calls so nothing is ever forgotten or lost.

    KoNLPy is a Python package for natural language processing (NLP) of the Korean language.

    LingPipe is a tool kit for processing text using computational linguistics. It is used for tasks such as finding the names of people, organizations, or locations in news, automatically classifying Twitter search results into categories, and suggesting correct spellings of queries.

    Merlin is a deep learning framework written in Julia. It aims to provide a fast, flexible, and compact deep learning library for machine learning.

    MeTA is a modern C++ data sciences toolkit that offers text tokenization, including deep semantic features like parse trees; inverted and forward indexes with compression and various caching strategies; a collection of ranking functions for searching the indexes; topic models; classification algorithms; graph algorithms; language models; a CRF implementation (POS-tagging, shallow parsing); wrappers for liblinear and libsvm (including libsvm dataset parsers); UTF8 support for analysis on various languages; and multithreaded algorithms.

    MonkeyLearn is an AI platform that allows you to analyze text with Machine Learning to automate business workflows and save hours of manual data processing.

    MXNet is a flexible and efficient library for deep learning. It supports both imperative and symbolic programming; calculates the gradient automatically for training a model; runs on CPUs or GPUs, on clusters, servers, desktops, or mobile phones; and supports distributed training on multiple CPU/GPU machines, including AWS, GCE, Azure, and Yarn clusters.

    Natural Language Processing for JVM languages (NLP4J) provides NLP tools readily available for research in various disciplines, frameworks for fast development of efficient and robust NLP components, and an API for manipulating computational structures in NLP (e.g., dependency graphs).

    NLP Compromise is a natural language processing library that runs on the client side.

    Omnitraq extracts critical business insights through our award-winning and patented technology from call center calls, web media, video, audio, and text data. By delivering these insights at low cost, with speed, and at scale, Omnitraq can provide both SMB and Enterprise clients with a suite of affordable and high impact BI tools.

    Puck is a high-speed, high-accuracy parser for natural languages, intended for use with grammars trained with the Berkeley Parser and on NVIDIA cards.

    Reading Buddy Software™ is advanced speech recognition reading software that listens, responds, and teaches as your child reads. It’s like having a tutor in your computer.

    Salience is a text analytics engine that can be integrated into a user's application or deployed behind a firewall.

    Sonix is an online platform that combines automated transcription and editing. We built the world's first AudioText Editor™ that allows users to edit audio in a revolutionary new way: Edit audio by editing text. Sonix integrates with Adobe Audition, Adobe Premiere, Final Cut Pro, Audacity, and Hindenburg.

    Spitch is a Swiss provider of solutions based on Automatic Speech Recognition (ASR) and voice biometrics, Voice User Interfaces (VUI), and natural language voice data analytics.

    Stanford CoreNLP provides a set of natural language analysis tools that can give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, and mark up the structure of sentences in terms of phrases and word dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, extract open-class relations between mentions, etc.

    Stanford.NLP.NET is a set of toolkits for various major computational linguistics problems that can be incorporated into applications with human language technology needs.

    Stanford Phrasal is a statistical phrase-based machine translation system, written in Java, that provides much the same functionality as the core of Moses. Its features include an easy-to-use API for implementing new decoding model features, the ability to translate using phrases that include gaps (Galley et al. 2010), and conditional extraction of phrase tables and lexical reordering models.

    Stanford Pattern-based Information Extraction and Diagnostics (SPIED) is a pattern-based entity extraction and visualization tool. It provides code for two components: learning entities from unlabeled text, starting with seed sets and using patterns in an iterative fashion; and visualizing and diagnosing the output from one to two systems.

    Stanford Tokenizer is an ancillary tool that uses tokenization to provide the ability to split text into sentences. PTBTokenizer mainly targets formal English writing rather than SMS-speak.

    SUTime is a library for recognizing and normalizing time expressions. It can be used to annotate documents with temporal information. It is a deterministic rule-based system designed for extensibility.

    Synthesys is a solution that adds the brainpower of thousands of people to a team. It reads through all the data and highlights the important people, places, organizations, events, and facts being discussed; resolves the highlighted points and determines what's important; and connects the dots, figuring out what the final picture means by comparing it with the opportunities, risks, and anomalies the user is looking for.

    ParallelDots Text Analytics APIs provide a convenient and diverse set of Natural Language Understanding (NLU) algorithms in fourteen different languages to find the sentiment or emotion of any document, find prominent entities in it, or remove expletives from it.

    ThickStat's LIBRO is a voice-enabled solution to modernize the experience with public libraries and increase user involvement. LIBRO brings library services through Amazon Alexa/Echo and also through a mobile application.

    TiMBL is open source software that is used in natural language processing as a machine learning classifier component, but its use extends to virtually any supervised machine learning domain.

    Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions"). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is a similar package which operates on dependency graphs (class SemanticGraph), called semgrex.

    Ucto is a tool that tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, that can be used to make text suited for further processing such as indexing, part-of-speech tagging, or machine translation.

    Unbabel provides AI-powered human translation designed for dynamic content like emails, support tickets, and knowledge centers.

    VoiceBase is defining the future of deep learning and communications by providing unparalleled access to spoken information for businesses to make better decisions. With flexible APIs developers and enterprises build scalable solutions with VoiceBase by embedding speech-to-text, conversational analytics, and predictive analytics capabilities into any big voice application. VoiceBase’s customers include Amazon Web Services, Twilio, Nasdaq, HireVue and Veritone. The company is privately held and is based in San Francisco, California.

