Future Technology

Apache Hadoop for large-scale data processing using machine learning techniques

Authors: Nidaa Ghalib Ali, Mohanaed Ajmi Falih, Ali Ajmi Falih

Corresponding Author: Nidaa Ghalib Ali (inb.nedaa10@atu.edu.iq), Iraq

Abstract

As Big Data volumes grow and data becomes more varied, organizations need more advanced processing technology. This paper focuses on Big Data and discusses its defining characteristics: Volume, Variety, and Velocity, known as the 3Vs of Big Data, along with Valence and Veracity. As organizations contend with these complexities, Apache Spark appears to be a technology that can overcome the limitations of Hadoop MapReduce and enable real-time analytics. The study evaluates the K-Nearest Neighbors (KNN) algorithm on structured data, Decision Tree regression on unstructured data, and logistic regression on semi-structured data. The algorithms performed well on structured data; however, all the models failed to make accurate predictions on unstructured data. An examination of the frameworks' performance also demonstrates the computational efficiency of Apache Hadoop and Apache Spark, with Spark outperforming Hadoop in processing speed across all data types and algorithms. These results indicate that Big Data requires advanced analytical tools: Apache Spark is a modern, high-performance data processing framework that enables organizations to manage Big Data in real time.
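The abstract pairs KNN on structured data with a comparison of Hadoop MapReduce and Spark but gives no implementation details. As a rough illustration of how a KNN classifier can be split into map and reduce phases, here is a minimal Python sketch. It assumes a tiny in-memory dataset divided into hypothetical partitions and simulates both phases in a single process; it does not call any Hadoop or Spark API, and the function names (`map_partition`, `reduce_neighbors`, `knn_mapreduce`) are hypothetical, not taken from the paper.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A labeled record from structured (tabular) data: (feature vector, class label).
Record = Tuple[Tuple[float, ...], str]
Neighbor = Tuple[float, str]  # (distance to the query, class label)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_partition(partition: List[Record], query: Sequence[float], k: int) -> List[Neighbor]:
    """Map phase (simulated): score every record in one partition against the
    query and emit only the k locally nearest (distance, label) pairs."""
    scored = ((euclidean(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def reduce_neighbors(local_results: List[List[Neighbor]], k: int) -> str:
    """Reduce phase (simulated): merge the per-partition candidates, keep the
    global k nearest, and return the majority label. On a tied vote, the label
    encountered first in distance order (i.e. with the closest neighbor) wins."""
    merged = heapq.nsmallest(k, (pair for result in local_results for pair in result))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def knn_mapreduce(partitions: List[List[Record]], query: Sequence[float], k: int = 3) -> str:
    """Classify `query` with KNN, running the map step once per partition
    (sequentially here; a real cluster would run these tasks in parallel)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    local_results = [map_partition(p, query, k) for p in partitions]
    return reduce_neighbors(local_results, k)


if __name__ == "__main__":
    # Toy structured dataset split across two hypothetical partitions.
    partitions = [
        [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((8.0, 8.0), "B")],
        [((0.9, 1.1), "A"), ((7.5, 8.2), "B"), ((8.3, 7.9), "B")],
    ]
    print(knn_mapreduce(partitions, (1.0, 1.0)))  # A
    print(knn_mapreduce(partitions, (8.0, 8.0)))  # B
```

Emitting only k candidates per partition keeps the shuffle small: each of the global k nearest neighbors must also be among the k nearest within its own partition, so the reducer sees at most k × (number of partitions) pairs and still returns the same answer as a single-machine KNN.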

Keywords

Big Data, Hadoop, Spark, Machine learning

Cite:

Ghalib Ali, N., Ajmi Falih, M., & Ajmi Falih, A. (2026). Apache Hadoop for large-scale data processing using machine learning techniques. Future Technology, 5(3), 128–138. Retrieved from https://fupubco.com/futech/article/view/762