Machine learning algorithms make predictions or decisions based on data. These learning algorithms can be embedded within applications to provide automated, artificial intelligence (AI) features or be used in an AI platform to build brand new applications. In both cases, a connection to a data source is necessary for the algorithm to learn and adapt over time. There are many different types of machine learning algorithms that perform a variety of tasks and functions. Specific types include association rule learning, Bayesian networks, clustering, decision tree learning, genetic algorithms, learning classifier systems, and support vector machines, among others.
These learning algorithms may be developed with supervised learning or unsupervised learning. Supervised learning trains an algorithm on labeled examples, that is, inputs paired with known correct outputs, so that it learns a general pattern it can apply to new inputs. Humans are typically needed to provide the labels for this type of learning. Unsupervised learning, on the other hand, requires no labeled outputs: the algorithm finds structure in the input data, such as groups of similar items, on its own. Unsupervised techniques are also used within some deep learning algorithms. Reinforcement learning is a third form of machine learning, in which an algorithm learns how to act from the feedback it receives from its situation or environment. For example, self-driving cars are often cited as an application of reinforcement learning because they react based on their surroundings on the road. If a traffic light is red, the car stops.
Machine learning algorithms are used by developers when using an AI platform to build an application or to embed AI within an existing application. End users of intelligent applications may not be aware that an everyday software tool is utilizing a machine learning algorithm to provide some form of automation. Additionally, machine learning solutions for businesses may be offered in a machine-learning-as-a-service model.
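The difference between supervised and unsupervised learning can be illustrated with a small sketch. The data, the label names, and both helper functions below are hypothetical and chosen only for illustration; real projects would normally use one of the libraries listed on this page.

```python
from statistics import mean

# Hypothetical toy data: hours of daily use for six devices.
hours = [1.0, 1.5, 2.0, 8.0, 9.0, 9.5]

# Supervised learning: every training example has a human-provided label.
labels = ["light", "light", "light", "heavy", "heavy", "heavy"]


def predict_nearest(x, xs, ys):
    """Return the label of the training example closest to x (1-nearest neighbour)."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[nearest]


def two_means(xs, iterations=10):
    """Unsupervised learning: split xs into two groups with a basic 1-D k-means.

    No labels are used; the groups emerge from the data alone.
    Assumes xs contains at least two distinct values.
    """
    low, high = min(xs), max(xs)  # initial cluster centres
    for _ in range(iterations):
        group_low = [x for x in xs if abs(x - low) <= abs(x - high)]
        group_high = [x for x in xs if abs(x - low) > abs(x - high)]
        low, high = mean(group_low), mean(group_high)
    return group_low, group_high


supervised_guess = predict_nearest(7.0, hours, labels)
clusters = two_means(hours)
print(supervised_guess)
print(clusters)
```

The supervised function can name its answer only because the labels were supplied; the unsupervised function returns two unnamed groups that a person would still have to interpret.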
To qualify for inclusion in the Machine Learning category, a product must:
Scikit-learn is a machine learning library for the Python programming language that offers various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
Microsoft Bing Web Search API is a service that retrieves web documents indexed by Bing and narrows down the results by result type, freshness, and more. It brings intelligent search to apps and harnesses the ability to comb billions of webpages, images, videos, and news with a single API call.
Crab, also known as scikits.recommender, is a Python framework for building recommender engines that integrates with the world of scientific Python packages (numpy, scipy, matplotlib). It provides a rich set of components from which users can construct a customized recommender system from a set of algorithms, and it is meant to be usable in various contexts, including science and engineering.
MLlib is Spark's machine learning (ML) library, which aims to make practical machine learning scalable and easy. It provides common learning algorithms such as classification, regression, clustering, and collaborative filtering; feature extraction, transformation, dimensionality reduction, and selection; tools for constructing, evaluating, and tuning ML Pipelines; saving and loading of algorithms, models, and Pipelines; and utilities for linear algebra, statistics, data handling, etc.
Our platform leverages human-in-the-loop practices to train, test, and tune machine learning models. At Figure Eight, we know that AI isn’t magic. We know what it takes to create AI that isn’t just a science project, but AI that works in the real world. And we provide the crucial ingredients that make it happen. We believe that AI is the combination of three important components: training data, machine learning, and humans-in-the-loop.
Microsoft Cognitive Toolkit is an open-source, commercial-grade toolkit that empowers users to harness the intelligence within massive datasets through deep learning. It provides uncompromised scaling, speed, and accuracy with commercial-grade quality and compatibility with the programming languages and algorithms users already use.
XGBoost is an optimized distributed gradient boosting library designed to be efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework and provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way.
Microsoft Machine Learning Server is your flexible enterprise platform for analyzing data at scale, building intelligent apps, and discovering valuable insights across your business with full support for Python and R. Machine Learning Server meets the needs of all constituents of the process, from data engineers and data scientists to line-of-business programmers and IT professionals. It offers a choice of languages and features algorithmic innovation that brings the best of open-source and proprietary worlds together.
The ML-Agents SDK allows researchers and developers to transform games and simulations created using the Unity Editor into environments where intelligent agents can be trained using Deep Reinforcement Learning, Evolutionary Strategies, or other machine learning methods through a simple-to-use Python API.
Weka is a collection of machine learning algorithms for data mining tasks that can either be applied directly to a dataset or called from your own Java code. It contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization, and it is well suited for developing new machine learning schemes.
Bolt supports discriminative learning of linear predictors (e.g. SVM or logistic regression) using fast online learning algorithms. It is aimed at large-scale, high-dimensional, and sparse machine learning problems, in particular problems encountered in information retrieval and natural language processing.
HLearn is a high-performance machine learning library written in Haskell that aims to discover the "best possible" interface for machine learning. This involves two competing demands: the library should be as fast as low-level libraries written in C/C++/Fortran/Assembly, but it should be as flexible as libraries written in high-level languages like Python/R/Matlab.
htm.java is a Hierarchical Temporal Memory implementation in Java, an official community-driven Java port of the Numenta Platform for Intelligent Computing (NuPIC). It provides a Java version of NuPIC that has a 1-to-1 correspondence to all systems, functionality, and tests provided by Numenta's open source implementation, while observing the tenets, standards, and conventions of Java language best practices and development.
Intel Data Analytics Acceleration Library (or Intel DAAL) is a software development library that is highly optimized for Intel architecture processors. It provides building blocks for all data analytics stages, from data preparation to data mining and machine learning.
Microsoft Academic Knowledge API is a service that allows users to interpret queries for academic intent and retrieve rich information from the Microsoft Academic Graph (MAG). The MAG knowledge base is a web-scale heterogeneous entity graph comprised of entities that model scholarly activities: field of study, author, institution, paper, venue, and event.
Pattern is a web mining module for the Python programming language that has tools for data mining (Google, Twitter, and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis, and visualization.
Sparkling Water is a tool that allows users to combine the fast, scalable machine learning algorithms of H2O with the capabilities of Spark. Users can drive computation from Scala/R/Python and utilize the H2O Flow UI, providing an ideal machine learning platform for application developers.
Amazon DSSTNE (Deep Scalable Sparse Tensor Network Engine) is an open source software library for training and deploying recommendation models with sparse inputs, fully connected hidden layers, and sparse outputs. Models with weight matrices that are too large for a single GPU can still be trained on a single host. It has been used at Amazon to generate personalized product recommendations for Amazon customers. It is designed for production deployment of real-world applications which need to emphasize speed and scale over experimental flexibility.
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data.
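To make the nearest-neighbor problem concrete, here is a minimal exact (brute-force) search in plain Python. It is not Annoy's API and not its algorithm; it only shows the question that an approximate library answers faster on large datasets. The function name and sample points are hypothetical.

```python
import math


def nearest_neighbors(points, query, k=1):
    """Return the indices of the k points closest to query (Euclidean distance).

    This exact search compares the query against every point, so its cost
    grows linearly with the number of points.
    """
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]


# Hypothetical 2-D points.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (1.2, 0.9)]
result = nearest_neighbors(points, (1.0, 1.0), k=2)
print(result)
```

Approximate methods trade a small amount of accuracy for much lower query time, which matters once the point set no longer fits a linear scan.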
Apache SAMOA is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. It enables development of new ML algorithms without directly dealing with the complexity of underlying distributed stream processing engines (DSPEs, such as Apache Storm, Apache Flink, and Apache Samza). Users can develop distributed streaming ML algorithms once and execute them on multiple DSPEs.
Apache SystemML is a machine learning platform that provides an optimal workplace for machine learning using big data. It can be run on top of Apache Spark, where it automatically scales your data, line by line, determining whether your code should be run on the driver or an Apache Spark cluster.
Azure Bing Custom Search is an easy-to-use, ad-free custom search tool that lets you deliver the search results you want. Bing Custom Search allows you to select the slices of the web that you want to search over and control the ranking when searching over your targeted web space.
The DataRobot automated machine learning platform captures the knowledge, experience and best practices of the world’s leading data scientists to deliver unmatched levels of automation and ease-of-use for machine learning initiatives. DataRobot enables users of all skill levels – from business people to analysts to data scientists – to build and deploy highly-accurate machine learning models in a fraction of the time of traditional modeling methods.
DecisionTree.jl is a Julia classifier with an implementation of the ID3 algorithm with post-pruning (pessimistic pruning), parallelized bagging (random forests), adaptive boosting (decision stumps), cross validation (n-fold), and support for mixed nominal and numerical data.
Dlib Machine Learning is a tool that contains a wide range of machine learning algorithms, designed to be highly modular, quick to execute, and simple to use via a clean and modern C++ API. It is used in a wide range of applications including robotics, embedded devices, mobile phones, and large high-performance computing environments.
Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Its training algorithms are multi-threaded and scale well to multicore hardware, and they can also make use of a GPU to further speed processing time. A GUI-based workbench is also provided to help model and train machine learning algorithms.
gago is a genetic algorithm library written in Go that is architected in a modular way. It allows using different evolutionary models; includes speciation, migration, and parallel populations; allows implementing custom genetic operators; has no external dependencies; and has high test coverage. It is actively maintained, and its author states that it will remain a priority for a very long time.
Learning Based Java is a modeling language for the rapid development of software systems with one or more learned functions, designed for use with the Java™ programming language. It offers a convenient, declarative syntax for classifier and constraint definition directly in terms of the objects in the programmer's application.
Apache Mahout is software that builds an environment for quickly creating scalable, performant machine learning applications. It provides three major features: a simple and extensible programming environment and framework for building scalable algorithms; a wide variety of premade algorithms for Scala + Apache Spark, H2O, and Apache Flink; and Samsara, a vector math experimentation environment with R-like syntax which works at scale.
Microsoft Entity Linking Intelligence Service is a web service that helps developers with tasks relating to entity linking. Given a specific paragraph within a document, this service will recognize and identify each separate entity based on its context.
Microsoft Knowledge Exploration Service is a service that offers a fast and effective way to add interactive search and refinement to applications. It allows users to build a compressed index from structured data, author a grammar that interprets natural language queries, and provide interactive query formulation with auto-completion suggestions.
Milk is a machine learning toolkit in Python that focuses on supervised classification with several classifiers available: SVMs (based on libsvm), k-NN, random forests, decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems.
All the talk about qualitative data analysis is for naught if you can’t understand language as it is spoken. That is what Natural Language Processing (NLP) is all about. NewSci NLP brings this power to organizations seeking to extract insights from their unstructured data. Just as you know what a person is saying when you hear, “I’m hungry, I want an apple” vs. “I really want an Apple™ instead of a PC,” so now can a computer. NewSci NLP enables a computer to understand the people, places, and things important to your organization. This, in turn, allows your unstructured data to be analyzed just like your structured data. With NewSci NLP your organization will enjoy qualitative analysis (the Why behind the numbers) alongside your quantitative analytics. Uses models customized to your organization, the domain in which you operate, the quality of your recordings, and even local and regional dialects to deliver the highest level of transcription accuracy. Captures your organization’s domain and unique characteristics to enable deep Natural Language Understanding analysis and Natural Language Generation. Your NewSci Ontology will be your Rosetta Stone for unlocking the value hidden in your unstructured data. The NewSci Insight Reservoir™ brings governance and insight to the data lake. You enjoy all the benefits of a state-of-the-art Big Data lake, including access to hundreds of data connectors for ingesting information, transformation tools for quality assurance and data enhancement, and cataloging of your data down to the field level, while at the same time having unmatched data governance capabilities. Unlike a passive data lake, the NewSci Insight Reservoir™ is a powerful cognitive computing platform where you can perform machine learning, deep learning, and natural language processing on all your structured and unstructured data. NewSci NLP connects directly to your NewSci Insight Reservoir™ to extract meaning from your text and make it available for analysis.
Machine and Deep Learning algorithms can be created and perfected as data enters the Insight Reservoir™, increasing the value in real time. And all of the insights can easily be made available for visualization tools including Tableau®, Qlik®, and MS Power BI®. Jump out of the data lake and get your organization into the NewSci Insight Reservoir™.
Pattern Recognition and Machine Learning is a MATLAB implementation of the algorithms described in the textbook of the same name.
Push is a programming language designed for evolutionary computation, to be used as the programming language within which evolving programs are expressed. It has a stack-based execution architecture in which there is a separate stack for each data type, which allows programs to manipulate their own code as they run and thereby to implement arbitrary and potentially novel control structures.
Random Forest is an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
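The aggregation step described above (mode for classification, mean for regression) can be sketched in a few lines of Python. The "trees" here are hypothetical stand-in functions rather than trained decision trees, and the feature names are invented for illustration; training the trees on bootstrap samples is not shown.

```python
from statistics import mean, mode


def forest_classify(trees, sample):
    """Classification: each tree votes and the most common class wins."""
    votes = [tree(sample) for tree in trees]
    return mode(votes)


def forest_regress(trees, sample):
    """Regression: the forest output is the mean of the tree predictions."""
    return mean(tree(sample) for tree in trees)


# Hypothetical stand-ins for already-trained trees.
class_trees = [
    lambda x: "spam" if x["links"] > 3 else "ham",
    lambda x: "spam" if x["caps_ratio"] > 0.5 else "ham",
    lambda x: "spam" if x["links"] > 10 else "ham",
]
value_trees = [
    lambda x: 2.0 * x["links"],
    lambda x: 10.0,
    lambda x: 14.0,
]

email = {"links": 6, "caps_ratio": 0.7}
label = forest_classify(class_trees, email)
score = forest_regress(value_trees, email)
print(label, score)
```

Because each tree can err in a different way, combining many of them tends to give a more stable prediction than any single tree.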
Reproducible Experiment Platform (REP) is a software infrastructure to support a collaborative ecosystem for computational science. It is a Python-based solution for research teams that allows running computational experiments on shared datasets, obtaining repeatable results, and making consistent comparisons of the obtained results.
RGP is a simple modular Genetic Programming (GP) system built in pure R. This system supports Symbolic Regression by GP through the familiar R model formula interface. GP individuals are represented as R expressions, an (optional) type system enables domain-specific function sets containing functions of diverse domain and range types, and a basic set of genetic operators for variation (mutation and crossover) and selection is provided.
SHARK is a fast, modular, feature-rich open-source C++ machine learning library that provides methods for linear and nonlinear optimization, kernel-based learning algorithms, neural networks, and various other machine learning techniques and is compatible with Windows, Solaris, MacOS X, and Linux.
Steam AI engine is an end-to-end platform that streamlines the entire process of building and deploying smart applications. Data scientists and developers can launch turnkey compute environments for collaboratively training and deploying predictive models and integrate those models into real-time smart applications.
tgp provides Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models.
Libra Toolkit is a collection of algorithms for learning and inference with discrete probabilistic models, including Bayesian networks (BNs), Markov networks (MNs), dependency networks (DNs), sum-product networks (SPNs), and arithmetic circuits (ACs). It focuses more on structure learning, especially for tractable models in which exact inference is efficient. Each algorithm in Libra is implemented as a command-line program suitable for interactive use or scripting, with consistent options and file formats throughout the toolkit.
topik is a topic modeling toolbox that provides a full suite and high-level interface for applying topic modeling. It includes many utilities beyond statistical modeling algorithms and wraps all of its features into an easily callable function and a command line interface. It is built on top of existing natural language and topic modeling libraries and primarily provides a wrapper around them for quick and easy exploratory analysis of your text data sets.
ToPS is an object-oriented framework implemented in C++ that facilitates the integration of probabilistic models for sequences over a user-defined alphabet. It contains implementations of eight distinct models to analyze discrete sequences: Independent and identically distributed model, Variable-Length Markov Chain (VLMC), Inhomogeneous Markov Chain, Hidden Markov Model, Pair Hidden Markov Model, Profile Hidden Markov Model, Similarity Based Sequence Weighting, and Generalized Hidden Markov Model (GHMM).
yahmm is a module that implements Hidden Markov Models (HMMs) with a compositional, graph-based interface. Models can be constructed node by node and edge by edge, built up from smaller models, loaded from files, baked (into a form that can be used to calculate probabilities efficiently), trained on data, and saved.
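As an illustration of what calculating probabilities efficiently means for an HMM, the following is a minimal sketch of the standard forward algorithm in plain Python. It does not use or reproduce yahmm's interface; the weather states, symbols, and probability tables are hypothetical.

```python
from itertools import product


def forward_probability(observations, states, start_p, trans_p, emit_p):
    """Return P(observations) under an HMM using the forward algorithm.

    alpha[s] holds the probability of the observations seen so far,
    with the hidden chain ending in state s.
    """
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model.
states = ["rainy", "sunny"]
symbols = ["walk", "shop", "clean"]
start_p = {"rainy": 0.6, "sunny": 0.4}
trans_p = {
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.4, "sunny": 0.6},
}
emit_p = {
    "rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

p_walk = forward_probability(["walk"], states, start_p, trans_p, emit_p)

# Sanity check: probabilities of all length-2 sequences should sum to 1.
total = sum(
    forward_probability(list(seq), states, start_p, trans_p, emit_p)
    for seq in product(symbols, repeat=2)
)
print(p_walk, total)
```

The forward pass costs time proportional to the sequence length times the square of the number of states, instead of enumerating every possible hidden path.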
Accord.NET Framework is a .NET machine learning framework combined with audio and image processing libraries, completely written in C#. It is a framework for building production-grade computer vision, computer audition, signal processing, and statistics applications, even for commercial use.
Aerosolve is a machine learning package built for humans. Its library is meant to be used with sparse, interpretable features such as those that commonly occur in search (search keywords, filters) or pricing (number of rooms, location, price). It is not as interpretable for problems with very dense, non-human-interpretable features such as raw pixels or audio samples.
APEX is an AI-enhanced technology platform intended to provide solutions for your business end to end. With APEX you gain access to the same powerful AI capabilities and tools used by the tech unicorns at a fraction of the cost. APEX allows you to realize the full benefits of the AI technologies, while sustaining governance, flexibility, scalability, tool compatibility, and collaboration. Through the integration of the most advanced open source and proprietary 2021.AI technological components, APEX enhances data governance, increases maintainability and quality of the AI models. APEX can be installed either on-premises, or consumed in private or public cloud. APEX offers 3 editions: Front, Go, and Enterprise, all capable of delivering immediate business value for companies of all sizes, in all the stages of AI maturity and ambitions.
Machine Learning Platform For AI provides end-to-end machine learning services, including data processing, feature engineering, model training, model prediction, and model evaluation. Machine Learning Platform For AI combines all of these services to make AI more accessible than ever.
AstroML is a Python module for machine learning and data mining that provides a community repository for fast Python implementations of common tools and routines used for statistical data analysis in astronomy and astrophysics, along with a uniform and easy-to-use interface to freely available astronomical datasets.
bayesian-bandit.js is an adaptation of the Bayesian Bandit code from Probabilistic Programming and Bayesian Methods for Hackers, specifically d3bandits.js. The code has been rewritten to be more idiomatic and also usable as a browser script or npm package, and it includes unit tests.
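The general Bayesian bandit idea can be sketched as Thompson sampling with Beta posteriors. The sketch below is in Python rather than JavaScript and does not mirror the package's API; the class name, method names, and payout rates are hypothetical.

```python
import random


class BetaBandit:
    """Thompson sampling for arms with win/lose (Bernoulli) rewards."""

    def __init__(self, n_arms, seed=None):
        self.wins = [0] * n_arms
        self.losses = [0] * n_arms
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one sample from each arm's Beta(wins + 1, losses + 1) posterior
        # and play the arm whose sample is largest.
        samples = [
            self.rng.betavariate(w + 1, l + 1)
            for w, l in zip(self.wins, self.losses)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1


def simulate(true_rates, rounds, seed=0):
    """Play the bandit against hypothetical fixed payout rates."""
    env_rng = random.Random(seed)
    bandit = BetaBandit(len(true_rates), seed=seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = bandit.select_arm()
        reward = env_rng.random() < true_rates[arm]
        bandit.update(arm, reward)
        pulls[arm] += 1
    return pulls, bandit


pulls, bandit = simulate([0.2, 0.8], rounds=2000)
print(pulls)
```

Early on, both arms are tried because their posteriors are wide; as evidence accumulates, the better arm is selected far more often.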
BioPy is a collection of biologically-inspired algorithms written in Python. Some are more focused on artificial models of biological computation, such as Hopfield Neural Networks, while others are inherently more biologically-focused, such as the basic genetic programming module included in this project.
C5.0 provides decision trees and rule-based models for pattern recognition that extract informative patterns from data.
CORElearn is a suite of machine learning algorithms written in C++ with an R interface. It contains several model learning techniques for classification and regression; these methods can be used, for example, to discretize numeric attributes. An additional feature is the OrdEval algorithm and its visualization, used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model of customer satisfaction.
Fido is a lightweight, open-source, and highly modular C++ machine learning library targeted towards embedded electronics and robotics. It includes implementations of trainable neural networks, reinforcement learning methods, genetic algorithms, and a full-fledged robotic simulator.
Cloud AutoML is a suite of machine learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs by leveraging Google's state-of-the-art transfer learning and Neural Architecture Search technology.
Cloud Datalab is a powerful interactive tool created to explore, analyze, transform and visualize data and build machine learning models on Google Cloud Platform. It runs on Google Compute Engine and connects to multiple cloud services easily so you can focus on your data science tasks.
H2O is a tool that makes it possible for anyone to easily apply machine learning and predictive analytics to solve today's most challenging business problems. It combines the power of highly advanced algorithms, the freedom of open source, and the capacity of truly scalable in-memory processing for big data on one or many nodes.
IBM Watson Knowledge Catalog powers intelligent, self-service discovery of data, models and more, activating them for artificial intelligence, machine learning and deep learning. Access, curate, categorize and share data, knowledge assets and their relationships, wherever they reside.
LIONoso is a comprehensive Machine Learning and Intelligent Optimization tool for non-profit research and academic use. Users adopt it for orchestrating heterogeneous components, automating processes through the arrangement, coordination, and management of complex software components that connect data, experiments, simulators, models, and decisions.
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. It focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
MachineLearning is a package that represents the very beginnings of an attempt to consolidate common machine learning algorithms written in pure Julia and present a consistent API. It is targeted towards the machine learning practitioner working with a dataset that fits in memory on a single machine.
mboost functions as a gradient descent algorithm (boosting) for optimizing general risk functions, utilizing component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive, and interaction models to potentially high-dimensional data.
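Component-wise boosting with least squares base-learners can be sketched as follows for squared-error loss. This is a plain-Python illustration of the general technique, not mboost's R interface; the function name, the step length `nu`, and the toy data are assumptions made for the example.

```python
def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Fit a linear model by component-wise L2 boosting.

    At every step, each single feature is fitted to the current residuals
    by least squares, and only the best-fitting feature's coefficient is
    updated, shrunk by the step length nu.
    """
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    coef = [0.0] * p
    residuals = [yi - intercept for yi in y]
    for _ in range(n_steps):
        best = None  # (sum of squared errors, feature index, slope)
        for j in range(p):
            col = [row[j] for row in X]
            denom = sum(v * v for v in col)
            if denom == 0:
                continue  # a constant-zero feature cannot explain anything
            beta = sum(v * r for v, r in zip(col, residuals)) / denom
            sse = sum((r - beta * v) ** 2 for v, r in zip(col, residuals))
            if best is None or sse < best[0]:
                best = (sse, j, beta)
        if best is None:
            break
        _, j, beta = best
        coef[j] += nu * beta
        residuals = [r - nu * beta * row[j] for r, row in zip(residuals, X)]
    return intercept, coef


# Hypothetical data: y depends only on the first feature.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 1.0], [1.0, -1.0], [2.0, 0.0]]
y = [3.0 * row[0] + 5.0 for row in X]
intercept, coef = componentwise_l2_boost(X, y)
print(intercept, coef)
```

Because only one coefficient changes per step, features that never win a step keep a coefficient of exactly zero, which is why this style of boosting is often used for variable selection in high-dimensional data.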
MLBase.jl is a Swiss Army knife for machine learning. It does not implement specific machine learning algorithms; instead, it provides a collection of useful tools to support machine learning programs, including data manipulation and preprocessing, score-based classification, performance evaluation (e.g. evaluating ROC), cross validation, and model tuning (i.e. searching for the best settings of parameters).
mlpack is a scalable machine learning library, written in C++, that aims to provide fast, extensible implementations of cutting-edge machine learning algorithms. It provides these algorithms as simple command-line programs and C++ classes which can then be integrated into larger-scale machine learning solutions.
NuPIC is an open source project based on a theory of the neocortex called Hierarchical Temporal Memory (HTM) that can be used to analyze streaming data. It learns the time-based patterns in data, predicts future values, and detects anomalies. The project includes discussion groups on HTM theory, research on extending HTM, and source code for complete applications based on HTM.
partykit: A Toolkit for Recursive Partytioning provides infrastructure for representing, summarizing, and visualizing tree-structured regression and classification models. This unified infrastructure can be used for reading/coercing tree models from different sources ('rpart', 'RWeka', 'PMML'), yielding objects that share functionality for print()/plot()/predict() methods.
Pattern Recognition Toolbox for MATLAB is a tool that provides an easy-to-use and robust interface to dozens of pattern classification tools, making cross-validation, data exploration, and classifier development rapid and simple. It gives users the power to apply sophisticated data analysis techniques to their problems.
Pebl is a Python library and command line application for learning the structure of a Bayesian network given prior knowledge and observations. It can learn with observational and interventional data; handles missing values and hidden variables using exact and heuristic methods; provides several learning algorithms and makes creating new ones simple; has facilities for transparent parallel execution using several cluster and cloud resources; calculates edge marginals and consensus networks; and presents results in a variety of formats.
pyhsmm (Bayesian inference in HSMMs and HMMs) is a Python library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
Rmalschains is a package that implements an algorithm family for continuous optimization called memetic algorithms with local search chains (MA-LS-Chains). Memetic algorithms are hybridizations of genetic algorithms with local search methods suited for continuous optimization.
Saul is a modeling language implemented as a domain specific language (DSL) in Scala that facilitates designing machine learning models with arbitrary configurations for the application programmer. This includes interacting with raw data and setting it in a flexible graph structure (i.e. a data model) using the original available data structures, relational feature extraction by flexible querying from the data model graph, and designing flexible learning models, including various configurations in which learners interact.
Scribe helps salespeople save about 2 hours every day and make more sales calls instead, by bringing sales to Slack. Scribe is used by people at Salesforce, Uber, General Assembly, and other top companies, and is backed by Y Combinator. This in-Slack sales bot brings emails to Slack, suggests smart replies that you can edit and send directly via Slack, and lets you update your CRM with the click of a button in Slack itself, so that anyone can scale their sales conversations and save time while doing it.
sofia-ml is a suite of fast incremental algorithms for machine learning that can be used for training models for classification, regression, ranking, or combined regression and ranking. It is intended to aid researchers and practitioners who require fast methods for classification and ranking on large, sparse data sets.