This packages provides simple Distributed Data manipulation and processing routines for Julia.
The design of the package and data manipulation approach is deliberately "imperative" and "hands-on", to allow as much user influence on the actual way the data are moved and stored in the cluster as possible. It is based on the
Distributed package and its infrastructure of remote workers. The basic
Distributed package functions
fetch are then wrapper (very lightly) to create a simple yet powerful data manipulation interface.
There are also various extra functions to easily run distributed data transformations, MapReduce-style algorithms, store and load the data remotely on worker-local storage (e.g. to prevent memory exhaustion) and others.
To start quickly, you can read the tutorial:
- DistributedData tutorial
- DistributedData.jl on Slurm clusters
A full reference to all functions is given here: