Apache Spark is a distributed open-source, general-purpose framework for clustered computing. It is designed with computational speed in mind, from machine learning to stream processing to complex SQL queries. It can easily process and distribute work on large datasets across multiple computers.
High availability is the description of a system designed to be fault-tolerant, highly dependable, operates continuously without intervention, or having a single point of failure. These systems are highly sought after to increase the availability and uptime required to keep an infrastructure running without issue. The following characteristics define a High Availability system.