Natural language processing (NLP) allows applications to interact with human language using a deep learning algorithm. NLP algorithms take language as input and can produce a variety of outputs, depending on the task they were trained for. These outputs can include automatic summarization, language translation, part-of-speech tagging, parsing or grammatical analysis, and sentiment analysis, among others. NLP algorithms can also provide voice recognition and natural language generation, which converts data into understandable human language. Some examples of NLP uses include chatbots, translation applications, and social media monitoring tools that scan Facebook and Twitter for mentions. Natural language processing algorithms are an example of a deep learning algorithm and may be a pre-built offering in an AI platform.
To qualify for inclusion in the Natural Language Processing category, a product must:
IBM SPSS Text Analytics for Surveys software lets you transform unstructured survey text into quantitative data and gain insight using sentiment analysis. The solution uses natural language processing (NLP) technologies specifically designed for survey text.
IBM Watson Tone Analyzer is a service that uses linguistic analysis to detect three types of tone in text: emotion, social tendencies, and language style. Emotions identified include anger, fear, joy, sadness, and disgust. Social tendencies are drawn from the Big Five personality traits used by some psychologists: openness, conscientiousness, extroversion, agreeableness, and emotional range. Language styles identified include confident, analytical, and tentative.
NLTK is a platform for building Python programs to work with human language data that provides interfaces to corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
TextBlob is a Python (2 and 3) library for processing textual data that provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Amazon Comprehend identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; and automatically organizes a collection of text files by topic.
Azure Translator Speech API, part of the Microsoft Cognitive Services API collection, is a cloud-based machine translation service. The API enables businesses to add end-to-end, real-time, speech translations to their applications or services.
Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, it is designed for use in business environments on distributed GPUs and CPUs. It aims to be cutting-edge plug and play, favoring convention over configuration, which allows fast prototyping for non-researchers.
Frog is an integration of memory-based natural language processing (NLP) modules for Dutch. It tokenizes, tags, lemmatizes, and morphologically segments word tokens in Dutch text files, assigns a dependency graph to each sentence, identifies the base phrase chunks in the sentence, and attempts to find and label all named entities.
IBM Watson Document Conversion is a service that provides an application programming interface (API) that enables developers to transform a document into a new format. The input is a single PDF, Word, or HTML document, and the output is an HTML document, a text document, or answer units that can be used with other Watson services.
IBM Watson Language Translator is a service that provides domain-specific translation using statistical machine translation techniques. It offers multiple domain-specific translation models, plus three levels of self-service customization for text with very specific language.
IBM Watson Natural Language Classifier is a service that enables developers without a background in machine learning or statistical algorithms to create natural language interfaces for their applications. It interprets the intent behind text and returns a corresponding classification with associated confidence levels. The return value can then be used to trigger a corresponding action, such as redirecting the request or answering a question.
Microsoft Language Understanding Intelligent Service (LUIS) is a service that enables users to quickly deploy an HTTP endpoint that takes the sentences sent to it and interprets them in terms of the intention they convey and the key entities that are present. Its web interface lets users custom-design a set of intentions and entities relevant to an application and guides them through the process of building a language understanding system.
Microsoft Linguistic Analysis APIs is a tool that provides access to natural language processing (NLP) components that identify the structure of text. It provides three types of analysis: sentence separation and tokenization, part-of-speech tagging, and constituency parsing.
Microsoft Web Language Model API is a REST-based cloud service that provides tools for natural language processing. Using this API, a user's application can leverage the power of big data through language models trained on web-scale corpora collected by Bing in the EN-US market.
Natural Language Understanding Toolkit (nut) is an implementation of Cross-Language Structural Correspondence Learning (CLSCL).
Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other token), although computational applications generally use more fine-grained POS tags like 'noun-plural'.
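To make the input and output of a part-of-speech tagger concrete, here is a minimal Python sketch. It is a toy lookup tagger and not the method used by the product described above; the lexicon, the Penn Treebank-style tag names, and the function name tag_tokens are assumptions made for illustration.

```python
# Toy lookup-based tagger: each token gets the tag listed in a tiny
# hand-written lexicon, with a crude suffix fallback for unknown words.
# LEXICON and tag_tokens are hypothetical names used only in this sketch.
LEXICON = {
    "the": "DT",      # determiner
    "dog": "NN",      # noun, singular
    "dogs": "NNS",    # noun, plural (the fine-grained 'noun-plural' tag)
    "chase": "VBP",   # verb, non-3rd-person present
    "chases": "VBZ",  # verb, 3rd-person singular present
    "cat": "NN",
    "cats": "NNS",
    ".": ".",
}


def tag_tokens(tokens):
    """Return (token, tag) pairs using the lexicon, then a simple fallback."""
    tagged = []
    for token in tokens:
        tag = LEXICON.get(token.lower())
        if tag is None:
            # Fallback assumption: unknown words ending in "s" are plural
            # nouns, everything else is a singular noun.
            tag = "NNS" if token.lower().endswith("s") else "NN"
        tagged.append((token, tag))
    return tagged


tagged_sentence = tag_tokens(["The", "dogs", "chase", "the", "cat", "."])
```

Real taggers replace the lookup table with a statistical model that uses the surrounding words to resolve ambiguous tokens, but the output has the same shape: one tag per token.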
TokensRegex is a generic framework for defining patterns over text (sequences of tokens) and mapping them to semantic objects represented as Java objects. It emphasizes describing text as a sequence of tokens (words, punctuation marks, etc.), which may have additional attributes, and writing patterns over those tokens, rather than working at the character level, as with standard regular expression packages.
Stanford Topic Modeling Toolbox (TMT) brings topic modeling tools to social scientists and others who wish to perform analysis on datasets that have a substantial textual component. It can import and manipulate text from cells in Excel and other spreadsheets, train topic models (LDA, Labeled LDA, and PLDA) to create summaries of the text, select parameters (such as the number of topics) via a data-driven process, and generate rich Excel-compatible outputs for tracking word usage across topics, time, and other groupings of data.
Stanford Word Segmenter currently supports Arabic and Chinese. The provided segmentation schemes have been found to work well for a variety of applications. The system requires Java 1.8+ to be installed, and at least 1G of memory is recommended for documents that contain long sentences. For files with shorter sentences (e.g., 20 tokens), decrease the memory requirement by changing the option java -mx1g in the run scripts.
textacy is a Python library for performing higher-level natural language processing (NLP) tasks, built on the high-performance spaCy library. With the basics (tokenization, part-of-speech tagging, dependency parsing, etc.) offloaded to another library, textacy focuses on tasks facilitated by the ready availability of tokenized, POS-tagged, and parsed text.
Treat is a toolkit for natural language processing and computational linguistics in Ruby. It aims to build a language- and algorithm-agnostic NLP framework for Ruby, with support for tasks such as document retrieval, text chunking, segmentation and tokenization, natural language parsing, part-of-speech tagging, keyword extraction, and named entity recognition.
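The difference between token-level and character-level patterns can be sketched in a few lines of Python. This is not TokensRegex syntax or its Java API; the token attributes, the function find_matches, and the example pattern are all hypothetical and chosen only to show the idea of writing a pattern over tokens and mapping the matches to objects.

```python
# Sketch of matching a pattern over a token sequence rather than over
# characters. Tokens are dicts with attributes; a pattern is a list of
# predicates, one per token position. All names here are hypothetical.

def find_matches(tokens, pattern):
    """Return (start, end) index pairs where every predicate matches in order."""
    matches = []
    width = len(pattern)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(pred(tok) for pred, tok in zip(pattern, window)):
            matches.append((start, start + width))
    return matches


tokens = [
    {"word": "Pay", "pos": "VB"},
    {"word": "50", "pos": "CD"},
    {"word": "dollars", "pos": "NNS"},
    {"word": "by", "pos": "IN"},
    {"word": "Friday", "pos": "NNP"},
]

# Pattern: a cardinal number immediately followed by a plural noun.
amount_pattern = [
    lambda t: t["pos"] == "CD",
    lambda t: t["pos"] == "NNS",
]

spans = find_matches(tokens, amount_pattern)

# Map each matched span to a simple semantic object (here, a dict).
amounts = [
    {"value": int(tokens[start]["word"]), "unit": tokens[start + 1]["word"]}
    for start, _ in spans
]
```

Because the predicates see whole tokens and their attributes, the pattern can refer to part-of-speech tags directly, which a character-level regular expression over the raw string cannot do.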
Captricity’s AI-powered automation enables paper to travel at the speed of digital. Captricity is used by eight of the top ten U.S. insurance companies and other enterprises to extract and enhance data from any customer channel—including handwritten documents—and deliver it seamlessly into downstream business systems.
Cogito API is a ready to deploy and fully configured API series that helps developers accelerate creation and deployment of unique applications that leverage large volumes of unstructured information from multiple sources. Cogito API is easily deployed or integrated for faster evaluation and analysis of content such as web pages, social media data or any big data sets or real-time information streams.
Cortical.io has wrapped its Retina Engine into an easy-to-use, powerful platform for fast semantic search, semantic classification, and semantic filtering that can process any kind of text, independently of language and length. It enables users to process terabytes of data orders of magnitude faster than other methods.
CRF++ is a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data. CRF++ is designed for generic purposes and can be applied to a variety of NLP tasks, such as named entity recognition, information extraction, and text chunking.
Datumbox API offers a large number of off-the-shelf Classifiers and Natural Language Processing services which can be used in a broad spectrum of applications including: Sentiment Analysis, Topic Classification, Language Detection, Subjectivity Analysis, Spam Detection, Reading Assessment, Keyword and Text Extraction and more.
Inbenta, a global leader in artificial intelligence, utilizes patented natural language processing technology to provide a highly accurate search solution for customer support, e-commerce and chatbots. Inbenta's semantic search engine understands & delivers results based on the meaning behind customers’ search queries, not the individual keywords, leading to improved customer satisfaction, lower support costs and stronger ROI. The result: industry-leading 90%+ self-service rates.
Kapiche uses the power of Natural Language Processing to analyse your unstructured data, letting you get on with the process of creating recommendations. Be it open survey responses, online reviews, or social media, unstructured data is the key to knowing what your customers want. However, drawing this information into a readily understood format can be difficult and time consuming. That’s where Kapiche fills the gap.
LingPipe is a toolkit for processing text using computational linguistics. It is used for tasks such as finding the names of people, organizations, or locations in news, automatically classifying Twitter search results into categories, and suggesting correct spellings of queries.
MeTA is a modern C++ data sciences toolkit that provides text tokenization, including deep semantic features like parse trees; inverted and forward indexes with compression and various caching strategies; a collection of ranking functions for searching the indexes; topic models; classification algorithms; graph algorithms; language models; a CRF implementation (POS tagging, shallow parsing); wrappers for liblinear and libsvm (including libsvm dataset parsers); UTF-8 support for analysis on various languages; and multithreaded algorithms.
MXNet is a flexible and efficient library for deep learning that supports both imperative and symbolic programming, calculates the gradient automatically for training a model, runs on CPUs or GPUs, on clusters, servers, desktops, or mobile phones, and supports distributed training on multiple CPU/GPU machines, including AWS, GCE, Azure, and Yarn clusters.
Natural Language Processing for JVM languages (NLP4J) provides tools readily available for research in various disciplines, frameworks for fast development of efficient and robust NLP components, and an API for manipulating computational structures in NLP (e.g., dependency graphs).
Omnitraq extracts critical business insights through our award-winning and patented technology from call center calls, web media, video, audio, and text data. By delivering these insights at low cost, with speed, and at scale, Omnitraq can provide both SMB and Enterprise clients with a suite of affordable and high impact BI tools.
Sonix is an online platform that combines automated transcription and editing. We built the world's first AudioText Editor™ that allows users to edit audio in a revolutionary new way: Edit audio by editing text. Sonix integrates with Adobe Audition, Adobe Premiere, Final Cut Pro, Audacity, and Hindenburg.
Stanford CoreNLP provides a set of natural language analysis tools that can give the base forms of words, their parts of speech, and whether they are names of companies, people, etc.; normalize dates, times, and numeric quantities; mark up the structure of sentences in terms of phrases and word dependencies; indicate which noun phrases refer to the same entities; indicate sentiment; extract open-class relations between mentions; etc.
Stanford Phrasal is a statistical phrase-based machine translation system, written in Java, that provides much the same functionality as the core of Moses. Its features include an easy-to-use API for implementing new decoding model features, the ability to translate using phrases that include gaps (Galley et al. 2010), and conditional extraction of phrase tables and lexical reordering models.
Stanford Pattern-based Information Extraction and Diagnostics (SPIED) is a pattern-based entity extraction and visualization tool. It provides code for two components: learning entities from unlabeled text, starting with seed sets and using patterns in an iterative fashion, and visualizing and diagnosing the output from one to two systems.
Synthesys is a solution that adds the brainpower of thousands of people to a team. It reads through all data and highlights the important people, places, organizations, events, and facts being discussed; resolves the highlighted points and determines what's important; and connects the dots and figures out what the final picture means by comparing it with the opportunities, risks, and anomalies that users are looking for.
Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions"). Tregex comes with Tsurgeon, a tree transformation language. Also included from version 2.0 on is a similar package, called semgrex, which operates on dependency graphs (class SemanticGraph).
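As a rough illustration of matching on tree relationships combined with regular expressions on node labels, the following Python sketch finds nodes whose label matches one pattern and that dominate a node matching another. It does not use Tregex or its pattern syntax; the nested-tuple tree encoding and the helper names subtrees, dominates, and find_dominating are assumptions made for this example.

```python
import re

# A parse tree as nested tuples: (label, child, child, ...); leaves are strings.
TREE = (
    "S",
    ("NP", ("DT", "the"), ("NN", "dog")),
    ("VP", ("VBZ", "chases"), ("NP", ("NNS", "cats"))),
)


def subtrees(tree):
    """Yield the tree itself and every non-leaf descendant, in pre-order."""
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def dominates(tree, label_regex):
    """True if some proper descendant of `tree` has a label matching the regex."""
    return any(
        re.fullmatch(label_regex, node[0])
        for child in tree[1:]
        if isinstance(child, tuple)
        for node in subtrees(child)
    )


def find_dominating(tree, parent_regex, descendant_regex):
    """Return nodes matching parent_regex that dominate a descendant_regex node."""
    return [
        node
        for node in subtrees(tree)
        if re.fullmatch(parent_regex, node[0]) and dominates(node, descendant_regex)
    ]


# Noun phrases that contain a plural noun somewhere beneath them.
plural_nps = find_dominating(TREE, r"NP", r"NNS")
# Clause or verb-phrase nodes that contain any noun tag.
noun_parents = [n[0] for n in find_dominating(TREE, r"S|VP", r"NN.*")]
```

The first query corresponds roughly to what a Tregex pattern such as NP << NNS expresses, where << is the dominance relation.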
Ucto is a tool that tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, that can be used to make text suited for further processing such as indexing, part-of-speech tagging, or machine translation.
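The preprocessing steps named here (separating words from punctuation, splitting sentences, changing case) can be sketched with Python's standard library. This is a simplified illustration and not Ucto's implementation; the regular expression and the function names tokenize and split_sentences are assumptions, and the naive sentence rule would mis-split abbreviations such as "e.g.".

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+(?:'\w+)? keeps simple contractions such as "It's" together;
    # [^\w\s] emits each punctuation mark as its own token.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def split_sentences(tokens):
    """Group tokens into sentences, closing one at each '.', '!' or '?'."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack a sentence terminator
        sentences.append(current)
    return sentences


text = "Hello, world! It's a test."
tokens = tokenize(text)
sentences = split_sentences(tokens)
# Case change as a further preprocessing step.
lowered = [[tok.lower() for tok in sentence] for sentence in sentences]
```

The result is one list of tokens per sentence, a shape that downstream steps such as indexing or part-of-speech tagging can consume directly.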
VoiceBase is defining the future of deep learning and communications by providing unparalleled access to spoken information for businesses to make better decisions. With flexible APIs developers and enterprises build scalable solutions with VoiceBase by embedding speech-to-text, conversational analytics, and predictive analytics capabilities into any big voice application. VoiceBase’s customers include Amazon Web Services, Twilio, Nasdaq, HireVue and Veritone. The company is privately held and is based in San Francisco, California.