International Journal of Engineering and Management Research (IJEMR)
  • Year: 2016
  • Volume: 6
  • Issue: 5

Hadoop File Management System

  • Authors: Swaraj Pritam Padhy1, Sashi Bhusan Maharana2
  • Total Page Count: 6
  • Pages: 281 to 286

1Student, Department of Computer Science & Engineering, Centurion University of Technology & Management, Odisha, Parlakhemundi, Gajapati, Odisha, India

2Faculty, Department of Computer Science & Engineering, Centurion University of Technology & Management, Odisha, Parlakhemundi, Gajapati, Odisha, India

Published online on 24 October 2017.

Abstract

Hadoop is an open-source Java software framework that supports data-intensive distributed applications. It enables applications to work with thousands of nodes and petabytes of data. The two major pieces of Hadoop are HDFS, Hadoop's own file system, and MapReduce. HDFS is designed to scale to petabytes of storage and runs on top of the file systems of the underlying operating systems.
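HDFS reaches petabyte scale by splitting each file into large fixed-size blocks and storing several replicas of every block on different servers. The short sketch below shows the resulting arithmetic. It assumes the Hadoop 2.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); neither value is stated in this paper, and both are configurable. The helper names block_count and raw_storage_bytes are hypothetical.

```python
# Assumed defaults, not stated in the abstract: Hadoop 2.x ships with a
# 128 MB block size (dfs.blocksize) and a replication factor of 3
# (dfs.replication). Both are configurable per cluster and per file.
BLOCK_SIZE_BYTES = 128 * 1024 * 1024
REPLICATION = 3


def block_count(file_size_bytes: int, block_size: int = BLOCK_SIZE_BYTES) -> int:
    """Number of fixed-size blocks a file is split into (the last may be partial)."""
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division avoids float rounding on petabyte-scale sizes.
    return (file_size_bytes + block_size - 1) // block_size


def raw_storage_bytes(file_size_bytes: int, replication: int = REPLICATION) -> int:
    """Disk consumed across the cluster. A partial last block occupies only its
    actual length, so raw usage is the logical size times the replica count."""
    return max(file_size_bytes, 0) * replication


if __name__ == "__main__":
    one_pib = 1024 ** 5
    print(block_count(one_pib))                      # 8388608
    print(raw_storage_bytes(one_pib) // 1024 ** 5)   # 3 (PiB of raw disk)
```

Under these assumptions a single pebibyte of data becomes roughly 8.4 million blocks and consumes three times its size in raw disk, which is why the block-to-server bookkeeping matters so much at this scale.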

The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the cluster's resources can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!. HDFS stores file system metadata and application data separately: metadata is kept on a dedicated server, the NameNode, while application data is stored on other servers called DataNodes.
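The separation of metadata from application data can be illustrated with a toy, in-memory model. This is not the HDFS API: the classes NameNode, DataNode and MiniCluster below are hypothetical stand-ins. The metadata server records only which blocks make up a file and where each block's replicas live, while the storage servers hold the bytes. The 4-byte block size and the round-robin placement are illustrative assumptions; real HDFS uses far larger blocks and a rack-aware placement policy.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class DataNode:
    """Toy storage server: holds block contents and nothing else."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class NameNode:
    """Toy metadata server: knows where blocks live but never stores file bytes."""
    files: dict = field(default_factory=dict)      # path -> [block_id, ...]
    locations: dict = field(default_factory=dict)  # block_id -> [DataNode name, ...]
    next_block_id: int = 0


class MiniCluster:
    def __init__(self, datanode_names, block_size=4, replication=2):
        names = list(datanode_names)
        if not 0 < replication <= len(names):
            raise ValueError("replication must be between 1 and the number of DataNodes")
        self.namenode = NameNode()
        self.datanodes = {n: DataNode(n) for n in names}
        self.block_size = block_size
        self.replication = replication
        # Round-robin placement for illustration only; real HDFS uses a
        # rack-aware placement policy.
        self._ring = cycle(names)

    def write(self, path: str, data: bytes) -> None:
        block_ids = []
        for start in range(0, len(data), self.block_size):
            chunk = data[start:start + self.block_size]
            block_id = self.namenode.next_block_id
            self.namenode.next_block_id += 1
            # Consecutive ring positions are distinct because replication <= node count.
            targets = [next(self._ring) for _ in range(self.replication)]
            for name in targets:
                self.datanodes[name].blocks[block_id] = chunk  # application data
            self.namenode.locations[block_id] = targets       # metadata only
            block_ids.append(block_id)
        self.namenode.files[path] = block_ids

    def read(self, path: str) -> bytes:
        # Ask the NameNode for block locations, then fetch bytes from DataNodes.
        out = bytearray()
        for block_id in self.namenode.files[path]:
            replica = self.namenode.locations[block_id][0]
            out += self.datanodes[replica].blocks[block_id]
        return bytes(out)
```

For example, writing b"hello hdfs!" to a three-node MiniCluster produces three blocks (block IDs 0, 1 and 2), each stored on two different DataNodes, while the NameNode's maps contain only block IDs and server names. Because a client only asks the metadata server where blocks are and then reads the bytes directly from the storage servers, the metadata server stays off the data path.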

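The NameNode keeps the namespace in memory and persists it as a checkpoint, a saved image of the namespace, together with a journal that records every subsequent modification. For improved durability, redundant copies of the checkpoint and journal can be made at other servers. During restarts, the NameNode restores the namespace by reading the checkpointed image and replaying the journal.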
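The checkpoint-plus-journal recovery described above can be sketched as a small write-ahead log. This is a minimal illustration under stated assumptions, not the HDFS implementation: real HDFS uses its own fsimage and edit-log formats, whereas the file names checkpoint.json and journal.log, the operation records, and the names apply, ToyNameNode and demo are all hypothetical. Each storage directory stands in for a separate location holding a redundant copy.

```python
import json
import os
import tempfile


def apply(namespace: dict, op: dict) -> None:
    """Apply one journal record to the in-memory namespace (path -> metadata)."""
    kind = op["op"]
    if kind == "create":
        namespace[op["path"]] = {"replication": op.get("replication", 3)}
    elif kind == "delete":
        namespace.pop(op["path"], None)
    elif kind == "rename":
        namespace[op["dst"]] = namespace.pop(op["src"])
    else:
        raise ValueError(f"unknown journal op: {kind}")


class ToyNameNode:
    def __init__(self, storage_dirs):
        # Each directory stands in for a separate place (disk or server)
        # holding a redundant copy of the checkpoint and journal.
        self.storage_dirs = list(storage_dirs)
        self.namespace = {}

    def log(self, op: dict) -> None:
        # Write-ahead: make the record durable in every journal copy, then apply it.
        line = json.dumps(op) + "\n"
        for d in self.storage_dirs:
            with open(os.path.join(d, "journal.log"), "a", encoding="utf-8") as f:
                f.write(line)
                f.flush()
                os.fsync(f.fileno())
        apply(self.namespace, op)

    def checkpoint(self) -> None:
        # Save the whole namespace image and start a fresh, empty journal.
        # (Not crash-atomic; a real system must guard this step more carefully.)
        for d in self.storage_dirs:
            with open(os.path.join(d, "checkpoint.json"), "w", encoding="utf-8") as f:
                json.dump(self.namespace, f)
            open(os.path.join(d, "journal.log"), "w", encoding="utf-8").close()

    @classmethod
    def restart(cls, storage_dirs) -> "ToyNameNode":
        # Restore from the first listed copy: read the checkpoint, replay the journal.
        nn = cls(storage_dirs)
        d = nn.storage_dirs[0]
        image = os.path.join(d, "checkpoint.json")
        if os.path.exists(image):
            with open(image, encoding="utf-8") as f:
                nn.namespace = json.load(f)
        journal = os.path.join(d, "journal.log")
        if os.path.exists(journal):
            with open(journal, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        apply(nn.namespace, json.loads(line))
        return nn


def demo() -> dict:
    with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
        nn = ToyNameNode([d1, d2])
        nn.log({"op": "create", "path": "/a"})
        nn.log({"op": "create", "path": "/b"})
        nn.checkpoint()
        nn.log({"op": "rename", "src": "/a", "dst": "/c"})
        nn.log({"op": "delete", "path": "/b"})
        # "Crash", then recover using only the second (redundant) copy.
        return ToyNameNode.restart([d2]).namespace
```

In demo, the checkpoint captures /a and /b, the journal then records a rename and a delete, and restarting from the redundant copy alone rebuilds the namespace {"/c": {"replication": 3}}. Checkpointing periodically keeps the journal short, so replay on restart stays fast.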

Keywords

HDFS, Hadoop, Cluster, Hadoop file management system