Hadoop File Management System

Swaraj Pritam Padhy; Sashi Bhusan Maharana

Year: 2016
Volume: 6
Issue: 5

Author:
Swaraj Pritam Padhy¹, Sashi Bhusan Maharana²
Total Page Count: 6
Page Number: 281 to 286

¹Student, Department of Computer Science & Engineering, Centurion University of Technology & Management, Odisha, Parlakhemundi, Gajapati, Odisha, India

²Faculty, Department of Computer Science & Engineering, Centurion University of Technology & Management, Odisha, Parlakhemundi, Gajapati, Odisha, India

Online published on 24 October, 2017.

Abstract

Hadoop is a Java software framework that supports data-intensive distributed applications and is developed under an open-source license. It enables applications to work with thousands of nodes and petabytes of data. One of the two major pieces of Hadoop is HDFS, Hadoop's own file system. It is designed to scale to petabytes of storage and runs on top of the file systems of the underlying operating systems.

The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!. HDFS stores file system metadata and application data separately.

For improved durability, redundant copies of the checkpoint and journal can be made at other servers. During restarts the NameNode restores the namespace by reading the checkpoint and replaying the journal.
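
The restart procedure described above can be illustrated with a minimal sketch. It is not HDFS code: the `Namespace` class, the `restore_namespace` function, and the two-operation journal format (`create`, `delete`) are hypothetical names invented here, assuming only that a checkpoint is a saved snapshot of the namespace and the journal is an ordered log of changes made after that snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Toy in-memory namespace: just a set of paths (hypothetical model)."""
    paths: set = field(default_factory=set)

    def apply(self, op: str, path: str) -> None:
        # Apply a single journal record to the namespace.
        if op == "create":
            self.paths.add(path)
        elif op == "delete":
            self.paths.discard(path)
        else:
            raise ValueError(f"unknown journal operation: {op}")


def restore_namespace(checkpoint, journal):
    """Rebuild the namespace: load the checkpoint, then replay the journal in order."""
    namespace = Namespace(paths=set(checkpoint))
    for op, path in journal:
        namespace.apply(op, path)
    return namespace


# Snapshot taken at the last checkpoint (illustrative data).
checkpoint = ["/data", "/data/a.log"]
# Changes recorded after the checkpoint, in the order they happened.
journal = [("create", "/data/b.log"), ("delete", "/data/a.log")]

restored = restore_namespace(checkpoint, journal)
print(sorted(restored.paths))  # ['/data', '/data/b.log']
```

Because the checkpoint and journal together determine the namespace, keeping redundant copies of both on other servers is enough to rebuild it after a failure.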

Keywords

HDFS, Hadoop, Cluster, Hadoop file management system
