Title: Execution of an Advanced Data Analytics by Integrating Spark with MongoDB

Year of Publication: Sep - 2016
Page Numbers: 39-48
Authors: Vijayalakshmi Chakravarthy
Conference Name: The International Conference on Data Mining, Multimedia, Image Processing and their Applications (ICDMMIPA2016), Malaysia


Spark has several advantages over other big data and MapReduce technologies such as Hadoop and Storm. It provides a comprehensive, unified framework for big data processing across data sets that are diverse both in nature (text data, graph data, etc.) and in source (batch vs. real-time streaming data). Spark SQL is an easy-to-use and powerful API provided by Apache Spark that makes it much easier to read and write data for analysis. The MongoDB Connector for Apache Spark is a powerful integration that enables developers and data scientists to create new insights and drive real-time action on live, operational, and streaming data. This paper presents experiments with the MongoDB Connector for Apache Spark, showing how the Spark SQL library can be used to store, retrieve, and query structured and semi-structured datasets, such as BSON documents, in MongoDB, a leading open-source, non-relational (NoSQL) database.
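As a rough illustration of the workflow the abstract describes, the sketch below loads a MongoDB collection into a Spark DataFrame, registers it as a temporary view, and runs a Spark SQL query over it. It is not taken from the paper. It assumes the 10.x release of the MongoDB Connector for Apache Spark, which registers the `mongodb` data source and the `connection.uri`, `database`, and `collection` options; older connector releases use different names. The URI, database, collection, and query are hypothetical placeholders, and the caller is expected to supply a `SparkSession` that already has the connector on its classpath.

```python
from typing import Any, Dict


def mongo_read_options(uri: str, database: str, collection: str) -> Dict[str, str]:
    """Build reader options using the connector 10.x option names (an assumption)."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }


def query_collection(spark: Any, options: Dict[str, str], view: str, sql: str) -> Any:
    """Load a collection as a DataFrame, expose it as a temp view, and query it."""
    # "mongodb" is the data source name assumed for connector 10.x.
    reader = spark.read.format("mongodb")
    for key, value in options.items():
        reader = reader.option(key, value)
    df = reader.load()

    # Registering a temp view makes the collection addressable from Spark SQL.
    df.createOrReplaceTempView(view)
    return spark.sql(sql)


# Hypothetical connection details and query, for illustration only.
OPTIONS = mongo_read_options("mongodb://localhost:27017", "demo", "people")
QUERY = "SELECT name, age FROM people WHERE age > 30"
```

With a live session, `query_collection(spark, OPTIONS, "people", QUERY).show()` would print the matching rows. Passing the session in as an argument keeps the helper independent of how Spark and the connector are installed.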