Apache Spark

The resilient distributed dataset (RDD) is the architectural core of Apache Spark. It is a read-only multiset of data items, distributed over a cluster of machines and maintained in a fault-tolerant way.
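
The properties named above (read-only, multiset, partitioned, fault-tolerant) can be illustrated with a small single-process sketch in Python. It does not use Spark's API. The names ToyRDD, from_items and lose_partition are hypothetical, as is the round-robin partitioning. The sketch also assumes that fault tolerance comes from recomputing a lost partition from its parent dataset, which is a simplification of how a real cluster behaves.

```python
from collections import Counter


class ToyRDD:
    """Toy model of an RDD: immutable, partitioned, and rebuilt from lineage."""

    def __init__(self, num_partitions, compute, parent=None):
        self.num_partitions = num_partitions
        self._compute = compute   # maps a partition index to a tuple of items
        self.parent = parent      # lineage link; None for a source dataset
        self._cache = {}          # stands in for partition data held on machines

    @classmethod
    def from_items(cls, items, num_partitions):
        data = tuple(items)
        # Round-robin split; each slice plays the role of one machine's share.
        chunks = [data[i::num_partitions] for i in range(num_partitions)]
        return cls(num_partitions, lambda i: chunks[i])

    def map(self, fn):
        # A transformation never changes this dataset; it returns a new one
        # that remembers this dataset as its parent.
        return ToyRDD(
            self.num_partitions,
            lambda i: tuple(fn(x) for x in self.partition(i)),
            parent=self,
        )

    def partition(self, i):
        # Recompute the partition from lineage if it is not currently held.
        if i not in self._cache:
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i):
        # Simulate a machine failure by discarding one partition's data.
        self._cache.pop(i, None)

    def collect(self):
        items = []
        for i in range(self.num_partitions):
            items.extend(self.partition(i))
        return items


# Duplicates are allowed and order carries no meaning, so compare as multisets.
numbers = ToyRDD.from_items([3, 1, 3, 2, 1, 3], num_partitions=3)
squares = numbers.map(lambda x: x * x)

before = Counter(squares.collect())
squares.lose_partition(1)            # pretend the machine holding partition 1 failed
after = Counter(squares.collect())   # partition 1 is rebuilt from `numbers`
```

In this sketch, before and after are equal: the lost partition is rebuilt by reapplying the map function to the matching partition of the parent, and the parent itself is never modified.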