Apache Spark, a cluster computing framework, is widely used to solve big data problems in distributed environments. However, its efficiency has not been fully analyzed across different numbers of nodes or for large-scale computational geometry operations such as Geometry Union, Convex Hull, Closest and Farthest Pair, and Spatial Range, Join, and Aggregation on small, medium, and large spatial datasets. In this paper, we implement these operations using functions built into the Apache Spark framework, such as Map and Map Partitions, and analyze their performance and efficiency in single-node and multi-node configurations.
Apache Spark, Hadoop Distributed File System (HDFS), Computational Geometry, Geometric Algorithms, Distributed Database
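
As an illustration of the partition-wise pattern that Map Partitions enables, the following minimal sketch computes a convex hull in two stages: a local hull per partition, then a final hull over the surviving candidate points. It is written in plain Python so that it runs without a cluster; the partitions are simulated as lists, and the names `convex_hull`, `local_hull`, and `distributed_hull` are hypothetical and not taken from the paper. The monotone chain algorithm used here is an assumption chosen for brevity, not necessarily the method evaluated in this work.

```python
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def local_hull(partition: Iterator[Point]) -> Iterator[Point]:
    """Iterator in, iterator out: the shape of a function that could be
    passed to a per-partition transformation such as RDD.mapPartitions."""
    return iter(convex_hull(partition))


def distributed_hull(partitions: List[List[Point]]) -> List[Point]:
    """Simulates the two-stage computation on in-memory partitions."""
    # Stage 1: each partition keeps only its own hull vertices.
    candidates = [p for part in partitions for p in local_hull(iter(part))]
    # Stage 2: a final hull over the (much smaller) candidate set.
    return convex_hull(candidates)


if __name__ == "__main__":
    points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3), (3, 1)]
    partitions = [points[:3], points[3:5], points[5:]]
    print(distributed_hull(partitions))
```

The two-stage design works because any vertex of the global hull must also be a vertex of the hull of its own partition, so discarding interior points locally does not change the final result while reducing the data gathered in the second stage.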