Friday, September 4, 2015

Spark Overview

Spark is a cluster computing framework, i.e. a framework that ties together multiple machines, their storage devices, and redundant interconnections so that they appear as a single, highly available system.

Spark offers the following features:

- open source, released under the BSD license
- in-memory processing
- multi-language APIs in Scala, Java and Python
- rich set of parallel operators such as map, filter, and reduce (see the sketch after this list)
- runs on Apache Mesos, YARN, and Amazon EC2, or in standalone mode
- best suited to highly iterative jobs
- efficient for interactive data mining jobs
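
To make these features concrete, here is a minimal sketch (not part of the original post) using Spark's Scala API as it stood around 2015, built on SparkContext. It chains parallel operators over an RDD and caches an intermediate result in memory; the HDFS path, log format, and object name are hypothetical placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A minimal, illustrative sketch of Spark's Scala API: chained parallel
// operators plus in-memory caching. The input path is a placeholder.
object FeatureSketch {
  def main(args: Array[String]): Unit = {
    // local[*] runs the job on all local cores, handy for testing.
    val conf = new SparkConf().setAppName("FeatureSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // textFile builds a Resilient Distributed Dataset (RDD) partitioned
    // across the cluster; nothing is read until an action runs.
    val lines = sc.textFile("hdfs://namenode:9000/logs/app.log")

    // Parallel operators compose lazily.
    val errors = lines.filter(_.contains("ERROR")).cache() // keep in memory for reuse

    println(s"error lines: ${errors.count()}") // first action materializes and caches

    // A second pass reuses the cached RDD instead of re-reading HDFS.
    val topWords = errors.flatMap(_.split("\\s+"))
                         .map(w => (w, 1))
                         .reduceByKey(_ + _)   // parallel aggregation
                         .take(10)
    println(topWords.mkString(", "))

    sc.stop()
  }
}
```

Note that operators like filter and map are lazy; only the actions count and take trigger computation, and the call to cache() is what lets the second pass skip re-reading the file.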

The Need for Spark

Spark is a result of the rapid, ongoing developments in the Big Data world. "Big Data" describes data characterized by the three V's: Volume, Variety, and Velocity. Storing and processing Big Data calls for specialized frameworks. Hadoop is the predominant established framework, composed of a file system, the Hadoop Distributed File System (HDFS), and a processing engine, MapReduce. Spark is a strong contender to MapReduce: thanks to its in-memory processing capability, it delivers lightning-fast results compared to MapReduce.

When to use Spark?

Beyond common data processing applications, Spark is best applied where in-memory operations make up a major share of the processing. It specializes in the following types of application (a sketch of the first follows the list):

1. Iterative algorithms
2. Interactive data mining
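
As a sketch of why in-memory caching matters for iterative algorithms, the toy gradient descent below fits a single weight w in y = w * x. Each iteration re-scans the same dataset, so caching it avoids re-reading it on every pass. The data is synthetic, and the learning rate and iteration count are arbitrary choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A toy iterative job (illustrative only): gradient descent fitting y = w * x.
// The cached RDD is re-scanned on every iteration, which is exactly the
// access pattern where Spark's in-memory processing beats disk-based MapReduce.
object IterativeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("IterativeSketch").setMaster("local[*]"))

    // Synthetic (x, y) points drawn from y = 3x, cached in memory once.
    val points = sc.parallelize(1 to 1000)
                   .map(i => (i.toDouble, 3.0 * i))
                   .cache()
    val n = points.count()

    var w = 0.0   // the single weight being fitted
    val lr = 1e-6 // arbitrary learning rate for this sketch
    for (_ <- 1 to 50) {
      // Each pass scans the cached RDD rather than re-reading input data.
      val gradient = points.map { case (x, y) => (w * x - y) * x }.sum() / n
      w -= lr * gradient
    }
    println(s"fitted w = $w (true value 3.0)")
    sc.stop()
  }
}
```

Each iteration launches a new parallel job over the same cached data; MapReduce, by contrast, would write and re-read intermediate state on disk between passes.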

Who is using Spark?

Spark was developed at UC Berkeley's AMPLab. Apart from Berkeley, which runs large-scale applications such as spam filtering and traffic prediction on Spark, 14 other companies, including Conviva and Quantifind, have contributed to it.
