Browse tools in the Data Pipelines space.
329 tools
ADAM is a specialized genomics data processing tool built on top of Apache Spark. It provides efficient storage and quer...
Aerosolve is a machine learning library designed to be human-friendly and easy to use. It provides a flexible framework ...
Provides shared MongoDB schemas and a connection service specifically designed for Multiplayer AI agents, facilitating d...
This package provides shared MongoDB schemas and a connection setup specifically tailored for Multiplayer AI agent servi...
Airbyte is an open-source ELT data integration platform with 350+ pre-built connectors. It enables teams to sync data fr...
Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. It allows for the creation of ...
AIToolbox is a comprehensive toolbox of AI modules written in Swift, offering functionalities like Graphs/Trees, Linear ...
Aligned describes data dependencies in ML systems to reduce technical debt, similar to how dbt works for data warehouses...
Ambrosia is a tool designed to help clean and refine large language model datasets by leveraging the power of other LLMs...
Amundsen is a data discovery and metadata engine that helps improve the productivity of teams interacting with data by p...
ANEE is an Adaptive Neural Execution Engine that optimizes transformers through per-token sparse inference, dynamic laye...
Apache Atlas provides open-source capabilities for managing and governing metadata to build comprehensive data catalogs....
Apache Flink is an open-source framework for distributed stream and batch data processing. It provides high throughput, ...
Apache SINGA is an open-source machine learning library designed to support the development of scalable deep learning mo...
Apache Spark is an open-source framework designed to process big data with speed and efficiency. It supports various dat...
Arrikto provides a simple and ultra-fast storage solution designed specifically for the hybrid Kubernetes world, enablin...
Autograd automatically differentiates native Torch code, making it easier to implement gradient-based optimization metho...
avsc is a comprehensive library that allows developers to interact with Apache Avro schemas, data files, and RPC service...
Azkaban is a batch workflow job scheduler created at LinkedIn to manage and schedule Hadoop jobs. It provides an easy-to...
Bee-Queue is a high-performance Redis-backed job queue designed to efficiently manage and distribute tasks in distribute...
Bender is a high-performance deep learning framework that leverages the power of Apple's Metal API for efficient neural ...
BIDMach is a CPU- and GPU-accelerated machine learning library designed to provide fast and efficient computation. It sup...
BIDMat is a high-performance matrix library designed to support large-scale exploratory data analysis, offering both CPU...
Big Data For Chimps is an open-source tool that simplifies the setup and management of big data infrastructure, making i...
Biopython is a set of freely available tools for biological computation. It provides Python libraries and programs to ha...
Brain.js is a neural network library that allows developers to create and train neural networks directly in the browser ...
Brain.js is a powerful yet simple-to-use neural network library that allows developers to create, train, and deploy mach...
BrainCore is an iOS and macOS neural network framework designed to facilitate the development of deep learning applicati...
Brainstorm is a successor to PyBrain, offering fast, flexible, and enjoyable development of neural network models. It's ...
Breze is a Theano-based Python library designed to facilitate the creation of deep learning models, particularly focusin...
Showing 30 of 329