Machine learning algorithms make predictions or decisions based on data. These learning algorithms can be embedded within applications to provide automated, artificial intelligence (AI) features or be used in an AI platform to build brand new applications. In both cases, a connection to a data source is necessary for the algorithm to learn and adapt over time. There are many different types of machine learning algorithms that perform a variety of tasks and functions. They include more specific techniques such as association rule learning, Bayesian networks, clustering, decision tree learning, genetic algorithms, learning classifier systems, and support vector machines, among others.
These learning algorithms may be developed with supervised learning or unsupervised learning. Supervised learning trains an algorithm on labeled examples, feeding it inputs paired with the expected outputs until it infers a pattern that generalizes to new data. Human involvement is necessary for this type of learning, because someone has to supply the labels. Unsupervised learning, on the other hand, requires no labels on the input data. Unsupervised algorithms find structure in the data on their own and are a feature of many deep learning algorithms. Reinforcement learning is a third form of machine learning, in which algorithms learn how to react based on their situation or environment. For example, self-driving cars are often cited as an instance of reinforcement learning because they react to their surroundings on the road: if a traffic light is red, the car stops.
Developers use machine learning algorithms when building an application on an AI platform or when embedding AI within an existing application. End users of intelligent applications may not be aware that an everyday software tool is using a machine learning algorithm to provide some form of automation. Additionally, machine learning solutions for businesses may be offered in a machine-learning-as-a-service model.
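To make the supervised/unsupervised distinction concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any product in this category implements these methods; the function names, the one-dimensional height data, and the labels are all assumptions made up for the example. The supervised part learns from labeled samples (a nearest-centroid classifier), while the unsupervised part groups the same numbers without ever seeing a label (a simple k-means).

```python
from statistics import mean


def fit_nearest_centroid(samples, labels):
    """Supervised: learn one centroid per label from labeled 1-D samples."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in grouped.items()}


def predict_nearest_centroid(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def kmeans_1d(samples, k, iterations=20):
    """Unsupervised: group unlabeled 1-D samples into k clusters."""
    # Deterministic initialization: pick evenly spaced values from the sorted data.
    ordered = sorted(samples)
    step = max(1, len(ordered) // k)
    centers = [ordered[min(i * step, len(ordered) - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in samples:
            nearest = min(range(k), key=lambda i: abs(centers[i] - x))
            clusters[nearest].append(x)
        # Keep the previous center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)


# Hypothetical data: heights in centimeters, with human-supplied labels.
heights = [150, 152, 155, 180, 183, 185]
labels = ["short", "short", "short", "tall", "tall", "tall"]

centroids = fit_nearest_centroid(heights, labels)   # needs the labels
print(predict_nearest_centroid(centroids, 178))
print(kmeans_1d(heights, k=2))                      # never sees the labels
```

The only difference in what the two functions receive is the `labels` list, which is the human-provided part that supervised learning depends on.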
To qualify for inclusion in the Machine Learning category, a product must:
Scikit-learn is a machine learning library for the Python programming language that offers various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN. It is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
Microsoft Bing Web Search API is a service that retrieves web documents indexed by Bing and narrows down the results by result type, freshness, and more. It brings intelligent search to apps and harnesses the ability to comb billions of webpages, images, videos, and news with a single API call.
Support vector machines (SVMs) and support vector regression (SVR) are supervised learning models with associated learning algorithms that analyze data and recognize patterns. They are used for classification and regression analysis.
Microsoft Bing Image Search API is a service that provides a similar (but not exact) experience to Bing.com/Images (overview on MSDN). It allows partners to send a search query to Bing and get back a list of relevant images.
Crab, also known as scikits.recommender, is a Python framework for building recommender engines that integrates with the world of scientific Python packages (NumPy, SciPy, matplotlib). It provides a rich set of components from which users can construct a customized recommender system from a set of algorithms, and it is usable in various contexts, including science and engineering.
IBM Watson Personality Insights is a tool that extracts and analyzes a spectrum of personality attributes to help discover actionable insights about people and entities, and in turn guides end users to highly personalized interactions.
Microsoft Cognitive Toolkit is an open-source, commercial-grade toolkit that empowers users to harness the intelligence within massive datasets through deep learning. It provides uncompromised scaling, speed, and accuracy with commercial-grade quality and compatibility with the programming languages and algorithms you already use.
MLlib is Spark's machine learning (ML) library, which aims to make practical machine learning scalable and easy. It provides common learning algorithms such as classification, regression, clustering, and collaborative filtering; feature extraction, transformation, dimensionality reduction, and selection; tools for constructing, evaluating, and tuning ML Pipelines; saving and loading of algorithms, models, and Pipelines; and utilities for linear algebra, statistics, data handling, etc.
Our platform leverages human-in-the-loop practices to train, test, and tune machine learning models. At Figure Eight, we know that AI isn’t magic. We know what it takes to create AI that isn’t just a science project, but AI that works in the real world. And we provide the crucial ingredients that make it happen. We believe that AI is the combination of three important components: training data, machine learning, and humans-in-the-loop.
Weka is a collection of machine learning algorithms for data mining tasks that can either be applied directly to a dataset or called from your own Java code. It contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization, and it is well suited for developing new machine learning schemes.
Bolt provides discriminative learning of linear predictors (e.g., SVM or logistic regression) using fast online learning algorithms. It is aimed at large-scale, high-dimensional, and sparse machine learning problems, in particular those encountered in information retrieval and natural language processing.
HLearn is a high-performance machine learning library written in Haskell that aims to discover the "best possible" interface for machine learning. This involves two competing demands: the library should be as fast as low-level libraries written in C/C++/Fortran/Assembly, but it should be as flexible as libraries written in high-level languages like Python/R/Matlab.
kernlab provides kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression, and dimensionality reduction. Its methods include support vector machines, spectral clustering, kernel PCA, Gaussian processes, and a QP solver.
Pattern is a web mining module for the Python programming language that has tools for data mining (Google, Twitter, and Wikipedia APIs, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis, and visualization.
Sparkling Water is a tool that allows users to combine the fast, scalable machine learning algorithms of H2O with the capabilities of Spark. Users can drive computation from Scala/R/Python and use the H2O Flow UI, providing an ideal machine learning platform for application developers.
The ML-Agents SDK allows researchers and developers to transform games and simulations created using the Unity Editor into environments where intelligent agents can be trained using Deep Reinforcement Learning, Evolutionary Strategies, or other machine learning methods through a simple to use Python API.
XGBoost is an optimized distributed gradient boosting library that is efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework and provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way.
Accord.MachineLearning contains Support Vector Machines, Decision Trees, Naive Bayesian models, K-means, Gaussian Mixture models and general algorithms such as Ransac, Cross-validation and Grid-Search for machine-learning applications.
Amazon DSSTNE (Deep Scalable Sparse Tensor Network Engine) is an open source software library for training and deploying recommendation models with sparse inputs, fully connected hidden layers, and sparse outputs. Models with weight matrices that are too large for a single GPU can still be trained on a single host. It has been used at Amazon to generate personalized product recommendations for Amazon customers. It is designed for production deployment of real-world applications that need to emphasize speed and scale over experimental flexibility.
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings for searching for points in space that are close to a given query point. It creates large read-only file-based data structures that are mmapped into memory so that many processes may share the same data.
Apache SAMOA is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. It enables development of new ML algorithms without directly dealing with the complexity of underlying distributed stream processing engines (DSPEs, such as Apache Storm, Apache Flink, and Apache Samza). Users can develop distributed streaming ML algorithms once and execute them on multiple DSPEs.
Apache SystemML is a machine learning platform that provides an optimal workplace for machine learning using big data. It can be run on top of Apache Spark, where it automatically scales your data, line by line, determining whether your code should be run on the driver or an Apache Spark cluster.
Azure Bing Custom Search is an easy-to-use, ad-free custom search tool that lets you deliver the search results you want. Bing Custom Search allows you to select the slices of the web that you want to search over and control the ranking when searching over your targeted web space.
DecisionTree.jl is a Julia classifier with an implementation of the ID3 algorithm with post pruning (pessimistic pruning), parallelized bagging (random forests), adaptive boosting (decision stumps), cross validation (n-fold), and support for mixed nominal and numerical data.
Disco is a lightweight, open-source framework for distributed computing based on the MapReduce paradigm. It distributes and replicates data and schedules jobs efficiently. It includes the tools needed to index billions of data points and query them in real time.
Dlib Machine Learning is a tool that contains a wide range of machine learning algorithms, designed to be highly modular, quick to execute, and simple to use via a clean and modern C++ API. It is used in a wide range of applications including robotics, embedded devices, mobile phones, and large high-performance computing environments.
Encog is an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Its training algorithms are multi-threaded, scale well to multicore hardware, and can also make use of a GPU to further reduce processing time. A GUI-based workbench is also provided to help model and train machine learning algorithms.
GoLearn is a 'batteries included' machine learning library for Go that implements the scikit-learn interface of Fit/Predict, making it easy to swap out estimators for trial and error. It includes helper functions for data, like cross validation and train and test splitting.
htm.java is a Hierarchical Temporal Memory implementation in Java, an official community-driven Java port of the Numenta Platform for Intelligent Computing (NuPIC). It provides a Java version of NuPIC that has a 1-to-1 correspondence to all systems, functionality, and tests provided by Numenta's open source implementation, while observing the tenets, standards, and conventions of Java language best practices and development.
Infrrd's high-accuracy document digitizing and automated data capturing OCR solutions improve cost efficiencies in the business environment, reducing the need for manual document sorting and manual data entry. Infrrd's OCR solutions have been providing substantial returns on the original investment in sectors such as retail, finance, vendor management systems, and back office and BPOs. The machine learning algorithms used by the OCR learn intuitively and scan invoices, receipts, business documents, and handwritten documents with ease.
Intel Data Analytics Acceleration Library (or Intel DAAL) is a software development library that is highly optimized for Intel architecture processors. It provides building blocks for all data analytics stages, from data preparation to data mining and machine learning.
Learning Based Java is a modeling language for the rapid development of software systems with one or more learned functions. Designed for use with the Java programming language, it offers a convenient, declarative syntax for classifier and constraint definition directly in terms of the objects in the programmer's application.
Apache Mahout is software that builds an environment for quickly creating scalable, performant machine learning applications. It provides three major features: a simple and extensible programming environment and framework for building scalable algorithms; a wide variety of premade algorithms for Scala + Apache Spark, H2O, and Apache Flink; and Samsara, a vector math experimentation environment with R-like syntax that works at scale.
Microsoft Academic Knowledge API is a service that allows users to interpret queries for academic intent and retrieve rich information from the Microsoft Academic Graph (MAG). MAG is a web-scale heterogeneous entity graph, a knowledge base comprised of entities that model scholarly activities: field of study, author, institution, paper, venue, and event.
Microsoft Entity Linking Intelligence Service is a web service that helps developers with tasks relating to entity linking. Given a specific paragraph within a document, this service will recognize and identify each separate entity based on its context.
Microsoft Knowledge Exploration Service is a service that offers a fast and effective way to add interactive search and refinement to applications. It allows users to build a compressed index from structured data, author a grammar that interprets natural language queries, and provide interactive query formulation with auto-completion suggestions.
Microsoft Machine Learning Server is your flexible enterprise platform for analyzing data at scale, building intelligent apps, and discovering valuable insights across your business with full support for Python and R. Machine Learning Server meets the needs of all constituents of the process – from data engineers and data scientists to line-of-business programmers and IT professionals. It offers a choice of languages and features algorithmic innovation that brings the best of open-source and proprietary worlds together.
Milk is a machine learning toolkit in Python that focuses on supervised classification with several classifiers available: SVMs (based on libsvm), k-NN, random forests, decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems.
Naive Bayesian Classification for Golang performs classification into an arbitrary number of classes on sets of strings.
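As a rough illustration of what naive Bayesian classification over sets of strings involves, the following Python sketch counts words per class and scores new input with log probabilities. It is not the Go library's API: the class name, method names, add-one (Laplace) smoothing, and the training words are all assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over lists of strings, any number of classes."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.doc_counts = Counter()              # class -> documents seen
        self.vocabulary = set()

    def learn(self, words, label):
        """Record one training document (a list of strings) for a class."""
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def log_scores(self, words):
        """Return log P(class) + sum of log P(word | class) for each class."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(docs / total_docs)
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def classify(self, words):
        """Return the class with the highest log score."""
        scores = self.log_scores(words)
        return max(scores, key=scores.get)


# Hypothetical training data with three classes.
model = NaiveBayes()
model.learn(["tall", "handsome", "rich"], "good")
model.learn(["bald", "poor", "ugly"], "bad")
model.learn(["maybe", "unsure"], "neutral")
print(model.classify(["rich", "tall"]))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.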
Pattern Recognition and Machine Learning is a MATLAB implementation of the algorithms described in the book of the same name.
Push is a programming language designed for evolutionary computation, to be used as the language in which evolving programs are expressed. It has a stack-based execution architecture with a separate stack for each data type, which allows programs to manipulate their own code as they run and thereby to implement arbitrary and potentially novel control structures.
Random Forest is an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
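The aggregation step described above (mode of the classes for classification, mean prediction for regression) can be sketched in a few lines of Python. This sketch deliberately skips tree training; the "trees" are hypothetical hand-written functions standing in for trained decision trees, and the feature names are invented for the example.

```python
from collections import Counter
from statistics import mean


def forest_classify(trees, x):
    """Classification: return the class predicted by the most trees (the mode)."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


def forest_regress(trees, x):
    """Regression: return the mean of the individual tree predictions."""
    return mean([tree(x) for tree in trees])


# Hypothetical stand-ins for trained classification trees.
classification_trees = [
    lambda m: "spam" if m["links"] > 3 else "ham",
    lambda m: "spam" if m["caps_ratio"] > 0.5 else "ham",
    lambda m: "spam" if m["length"] < 20 else "ham",
]

# Hypothetical stand-ins for trained regression trees.
regression_trees = [
    lambda h: 10.0 if h["rooms"] < 3 else 20.0,
    lambda h: 12.0 if h["area"] < 80 else 22.0,
    lambda h: 14.0 if h["age"] > 30 else 24.0,
]

message = {"links": 5, "caps_ratio": 0.1, "length": 10}
house = {"rooms": 2, "area": 60, "age": 40}
print(forest_classify(classification_trees, message))
print(forest_regress(regression_trees, house))
```

In a real forest each tree would be trained on a different bootstrap sample of the data, which is what makes averaging their outputs useful.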
Reproducible Experiment Platform (REP) is a software infrastructure that supports a collaborative ecosystem for computational science. It is a Python-based solution for research teams that allows running computational experiments on shared datasets, obtaining repeatable results, and making consistent comparisons of the obtained results.
RGP is a simple modular Genetic Programming (GP) system built in pure R. The system supports symbolic regression by GP through the familiar R model formula interface. GP individuals are represented as R expressions, an (optional) type system enables domain-specific function sets containing functions of diverse domain and range types, and a basic set of genetic operators for variation (mutation and crossover) and selection is provided.
SHARK is a fast, modular, feature-rich open-source C++ machine learning library that provides methods for linear and nonlinear optimization, kernel-based learning algorithms, neural networks, and various other machine learning techniques and is compatible with Windows, Solaris, MacOS X, and Linux.
Steam AI engine is an end-to-end platform that streamlines the entire process of building and deploying smart applications. Data scientists and developers can launch turnkey compute environments for collaboratively training and deploying predictive models and integrate those models into real-time smart applications.
tgp provides Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models.
Libra Toolkit is a collection of algorithms for learning and inference with discrete probabilistic models, including Bayesian networks (BNs), Markov networks (MNs), dependency networks (DNs), sum-product networks (SPNs), and arithmetic circuits (ACs). It focuses more on structure learning, especially for tractable models in which exact inference is efficient. Each algorithm in Libra is implemented as a command-line program suitable for interactive use or scripting, with consistent options and file formats throughout the toolkit.
yahmm is a module that implements Hidden Markov Models (HMMs) with a compositional, graph-based interface. Models can be constructed node by node and edge by edge, built up from smaller models, loaded from files, baked (into a form that can be used to calculate probabilities efficiently), trained on data, and saved.
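For readers unfamiliar with what "calculating probabilities" on a hidden Markov model means, the sketch below computes the probability of an observation sequence with the forward algorithm. It does not use yahmm or reproduce its interface; the dictionary-based model layout, the state names, and all probabilities are assumptions made up for the illustration.

```python
def forward_probability(observations, states, start_p, trans_p, emit_p):
    """Probability of an observation sequence under an HMM (forward algorithm)."""
    # alpha[s] = probability of the observations so far, ending in state s.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state weather model.
STATES = ("Rainy", "Sunny")
START_P = {"Rainy": 0.6, "Sunny": 0.4}
TRANS_P = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
EMIT_P = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

print(forward_probability(["walk", "shop", "clean"], STATES, START_P, TRANS_P, EMIT_P))
```

The forward algorithm costs time proportional to the sequence length times the square of the number of states, rather than enumerating every possible hidden state path.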
Accord.NET Framework is a .NET machine learning framework combined with audio and image processing libraries, written completely in C#. It is a framework for building production-grade computer vision, computer audition, signal processing, and statistics applications, even for commercial use.
Aerosolve is a machine learning package built for humans. Its library is meant to be used with sparse, interpretable features such as those that commonly occur in search (search keywords, filters) or pricing (number of rooms, location, price). It is not as interpretable for problems with very dense, non-human-interpretable features such as raw pixels or audio samples.
APEX is an AI-enhanced technology platform intended to provide solutions for your business end to end. With APEX you gain access to the same powerful AI capabilities and tools used by the tech unicorns at a fraction of the cost. APEX allows you to realize the full benefits of the AI technologies, while sustaining governance, flexibility, scalability, tool compatibility, and collaboration. Through the integration of the most advanced open source and proprietary 2021.AI technological components, APEX enhances data governance, increases maintainability and quality of the AI models. APEX can be installed either on-premises, or consumed in private or public cloud. APEX offers 3 editions: Front, Go, and Enterprise, all capable of delivering immediate business value for companies of all sizes, in all the stages of AI maturity and ambitions.
AstroML is a Python module for machine learning and data mining that provides a community repository for fast Python implementations of common tools and routines used for statistical data analysis in astronomy and astrophysics, as well as a uniform and easy-to-use interface to freely available astronomical datasets.
bayesian-bandit.js is an adaptation of the Bayesian Bandit code from Probabilistic Programming and Bayesian Methods for Hackers, specifically d3bandits.js. The code has been rewritten to be more idiomatic and usable as a browser script or npm package, and it includes unit tests.
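The idea behind a Bayesian bandit can be sketched in Python as Thompson sampling with Beta posteriors: sample a plausible win rate for each arm, pull the arm with the highest sample, and update that arm's counts. This is a generic sketch, not a port of bayesian-bandit.js; the class and function names, the uniform Beta(1, 1) prior, and the simulated payout rates are assumptions for illustration.

```python
import random


class BayesianBandit:
    """Thompson sampling over arms with win/loss (Bernoulli) rewards."""

    def __init__(self, n_arms, rng=None):
        self.wins = [0] * n_arms
        self.losses = [0] * n_arms
        self.rng = rng if rng is not None else random.Random()

    def select_arm(self):
        """Sample each arm's Beta posterior and pick the largest sample."""
        samples = [
            self.rng.betavariate(w + 1, l + 1)  # Beta(1, 1) prior is assumed
            for w, l in zip(self.wins, self.losses)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        """Record a win (truthy reward) or a loss for the given arm."""
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1


def simulate(true_rates, rounds, seed=0):
    """Run the bandit against hypothetical arms and count pulls per arm."""
    rng = random.Random(seed)
    bandit = BayesianBandit(len(true_rates), rng)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = bandit.select_arm()
        reward = rng.random() < true_rates[arm]
        bandit.update(arm, reward)
        pulls[arm] += 1
    return pulls


# Hypothetical payout rates; the third arm is the best one.
print(simulate([0.2, 0.5, 0.8], rounds=2000))
```

Early on the samples are spread widely, so every arm gets tried; as evidence accumulates, the posterior for the best arm sharpens and it receives most of the pulls.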
BioPy is a collection of biologically inspired algorithms written in Python. Some are more focused on artificial models of biological computation, such as Hopfield neural networks, while others are inherently more biologically focused, such as the basic genetic programming module included in this project.
C5.0 builds decision trees and rule-based models for pattern recognition, extracting informative patterns from data.
Clusterone is a cloud-agnostic deep learning platform that enables teams to bridge the AI gap through scalable model training, distributed computing or many concurrent experiments, flexible infrastructure with zero DevOps, and the lowest computing cost.
CORElearn is a suite of machine learning algorithms written in C++ with an R interface that contains several model learning techniques for classification and regression. These methods can be used, for example, to discretize numeric attributes. An additional feature is the OrdEval algorithm and its visualization, used for evaluation of data sets with ordinal features and class, enabling analysis according to the Kano model of customer satisfaction.
The DataRobot automated machine learning platform captures the knowledge, experience and best practices of the world’s leading data scientists to deliver unmatched levels of automation and ease-of-use for machine learning initiatives. DataRobot enables users of all skill levels – from business people to analysts to data scientists – to build and deploy highly-accurate machine learning models in a fraction of the time of traditional modeling methods.
While most methods for treating data rely on old, expensive software and valuable human resources, the AI will transform your big data into powerful predictive inferences to be deployed in real time, 24/7, at a competitive price.
Dynamic Predictive Audiences is customer segmentation software from simMachines. This machine learning platform is capable of making actionable predictions and recommendations with the justification behind each recommendation.
Fido is a lightweight, open-source, and highly modular C++ machine learning library that is targeted towards embedded electronics and robotics. It includes implementations of trainable neural networks, reinforcement learning methods, genetic algorithms, and a full-fledged robotic simulator.
gago is a genetic algorithm library written in Go that is architected in a modular way. It allows using different evolutionary models, includes speciation, migration, and parallel populations, allows implementing custom genetic operators, has no external dependencies, has high test coverage, and is actively maintained. The author states that it will remain a priority for a very long time.
H2O is a tool that makes it possible for anyone to easily apply machine learning and predictive analytics to solve today's most challenging business problems. It combines the power of highly advanced algorithms, the freedom of open source, and the capacity of truly scalable in-memory processing for big data on one or many nodes.
IBM Watson Tradeoff Analytics is a service that helps people make decisions when balancing multiple objectives. It uses a mathematical filtering technique called Pareto optimization, which enables users to explore tradeoffs when considering multiple criteria for a single decision.
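Pareto filtering itself is simple to sketch in Python: an option is kept only if no other option is at least as good on every objective and strictly better on at least one. The sketch below is a generic illustration and does not reflect the service's actual API; the function names, the convention that lower values are better, and the laptop data are assumptions.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one.

    Lower values are assumed to be better for every objective.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(options):
    """Keep only the options that no other option dominates."""
    return {
        name: scores
        for name, scores in options.items()
        if not any(dominates(other, scores) for other in options.values())
    }


# Hypothetical laptops scored on (price in dollars, weight in kilograms).
laptops = {
    "A": (1000, 2.0),
    "B": (1200, 1.5),
    "C": (1300, 2.2),  # costs more and weighs more than A
    "D": (900, 2.5),
}
print(sorted(pareto_front(laptops)))
```

The remaining options are the genuine tradeoffs: choosing among them means giving up something on one objective to gain on another, which is the part left to the decision maker.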
LIONoso is a comprehensive machine learning and intelligent optimization tool for non-profit research and academic use. Users adopt it for orchestrating heterogeneous components: automating processes and arranging, coordinating, and managing complex software components that connect data, experiments, simulators, models, and decisions.
Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. It focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data.
MachineLearning is a package that represents the very beginnings of an attempt to consolidate common machine learning algorithms written in pure Julia and present a consistent API. It is targeted towards the machine learning practitioner working with a dataset that fits in memory on a single machine.
mboost provides a functional gradient descent algorithm (boosting) for optimizing general risk functions, using component-wise (penalised) least squares estimates or regression trees as base-learners for fitting generalized linear, additive, and interaction models to potentially high-dimensional data.
MLBase.jl is a Swiss Army knife for machine learning. It does not implement specific machine learning algorithms; instead, it provides a collection of useful tools to support machine learning programs, including data manipulation and preprocessing, score-based classification, performance evaluation (e.g., evaluating ROC), cross validation, and model tuning (i.e., searching for the best parameter settings).
mlpack is a scalable machine learning library, written in C++, that aims to provide fast, extensible implementations of cutting-edge machine learning algorithms. It provides these algorithms as simple command-line programs and C++ classes, which can then be integrated into larger-scale machine learning solutions.
NuPIC is an open source project based on a theory of the neocortex called Hierarchical Temporal Memory (HTM) that can be used to analyze streaming data. It learns the time-based patterns in data, predicts future values, and detects anomalies. The project includes discussion groups on HTM theory, research on extending HTM, and source code for complete applications based on HTM.
partykit: A Toolkit for Recursive Partytioning provides infrastructure for representing, summarizing, and visualizing tree-structured regression and classification models. This unified infrastructure can be used for reading/coercing tree models from different sources ('rpart', 'RWeka', 'PMML'), yielding objects that share functionality for print()/plot()/predict() methods.
Pattern Recognition Toolbox for MATLAB is a tool that provides an easy-to-use and robust interface to dozens of pattern classification tools, making cross-validation, data exploration, and classifier development rapid and simple. It gives users the power to apply sophisticated data analysis techniques to their problems.
Pebl is a Python library and command-line application for learning the structure of a Bayesian network given prior knowledge and observations. It can learn with observational and interventional data, handles missing values and hidden variables using exact and heuristic methods, provides several learning algorithms and makes creating new ones simple, has facilities for transparent parallel execution using several cluster and cloud resources, calculates edge marginals and consensus networks, and presents results in a variety of formats.
pyhsmm is a Python library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
Rmalschains is a package that implements an algorithm family for continuous optimization called memetic algorithms with local search chains (MA-LS-Chains). Memetic algorithms are hybridizations of genetic algorithms with local search methods and are suited for continuous optimization.
SAS FACTORY MINER is software that automatically builds and retrains hundreds of predictive models across multiple segments and picks the best model for each segment to reveal new opportunities, expose hidden risks, and fuel smarter, well-timed decisions.
Saul is a modeling language implemented as a domain-specific language (DSL) in Scala that facilitates designing machine learning models with arbitrary configurations for the application programmer. This includes interacting with raw data and setting it in a flexible graph structure (i.e., a data model) using the original available data structures, relational feature extraction by flexible querying from the data model graph, and designing flexible learning models, including various configurations in which learners interact.
Scribe is an AI-Powered Sales Assistant for inside sales teams. Scribe engages your lead list, identifies warm leads to reach out to, handles objections and books demos for you. All you have to do is talk to Scribe in Slack, just like with any other team member! You can hire Scribe in 10 minutes, train her based on your best SDR's conversations, scale her up as you need and never have to worry about her quitting.
sofia-ml is a suite of fast incremental algorithms for machine learning that can be used for training models for classification, regression, ranking, or combined regression and ranking. It is intended to aid researchers and practitioners who require fast methods for classification and ranking on large, sparse data sets.
topik is a topic modeling toolbox that provides a full suite and high-level interface for applying topic modeling. It includes many utilities beyond statistical modeling algorithms and wraps all of its features into an easily callable function and a command-line interface. It is built on top of existing natural language and topic modeling libraries and primarily provides a wrapper around them, for quick and easy exploratory analysis of your text data sets.
ToPS is an object-oriented framework implemented in C++ that facilitates the integration of probabilistic models for sequences over a user-defined alphabet. It contains implementations of eight distinct models to analyze discrete sequences: independent and identically distributed model, Variable-Length Markov Chain (VLMC), Inhomogeneous Markov Chain, Hidden Markov Model, Pair Hidden Markov Model, Profile Hidden Markov Model, Similarity Based Sequence Weighting, and Generalized Hidden Markov Model (GHMM).