Replication


Replication simply means duplication, and it is central to Hadoop's storage technique. When we load a file into HDFS, it is divided into blocks of equal size; by default the block size is 64 MB. Hadoop maintains 3 replicas of each block, which means that to store a 1 TB file on HDFS we need enough hardware to hold 3 TB. Each block is stored on three different data nodes.
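
If you want to see or change the replication factor programmatically, the HDFS Java client exposes it directly. Below is a minimal sketch assuming the standard org.apache.hadoop.fs.FileSystem API; the path /user/demo/sample.txt is just a hypothetical example. It also shows how the 1 TB versus 3 TB arithmetic falls out of the replication factor.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // dfs.replication defaults to 3; clusters may override it.
        conf.set("dfs.replication", "3");
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path, for illustration only.
        Path file = new Path("/user/demo/sample.txt");
        FileStatus status = fs.getFileStatus(file);

        short replication = status.getReplication();
        long sizeBytes = status.getLen();

        // With 3 replicas, raw disk consumed is roughly 3x the logical size.
        System.out.println("Replication factor : " + replication);
        System.out.println("Logical size (MB)  : " + sizeBytes / (1024 * 1024));
        System.out.println("Raw storage (MB)   : " + replication * sizeBytes / (1024 * 1024));

        // The factor can also be changed per file after the fact.
        fs.setReplication(file, (short) 2);
    }
}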

If a data node dies while the data is being processed, the name node uses one of the remaining two replicas of each affected block, so processing continues without interruption. For example, suppose our file is divided into 5 blocks: each block is stored on three different data nodes, and all the metadata is kept in the Name Node, as shown in the following figure.

[Figure: a file split into five blocks (B1-B5), each block replicated on three different Data Nodes, with the block metadata stored in the Name Node]
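
You can actually watch this block-to-data-node mapping by asking the Name Node for a file's block locations. Here is a small sketch using the same HDFS Java client (again with a hypothetical file path); with the default factor of 3, each block should list three different hosts, exactly as in the figure.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical file; replace with any file already on HDFS.
        Path file = new Path("/user/demo/sample.txt");
        FileStatus status = fs.getFileStatus(file);

        // The Name Node answers this from its metadata: it knows
        // which Data Nodes hold a replica of every block.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (int i = 0; i < blocks.length; i++) {
            System.out.println("Block B" + (i + 1) + " -> "
                    + String.join(", ", blocks[i].getHosts()));
        }
    }
}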
Now, after processing has started, if Data Node 2 goes down (because of a hardware failure or some other problem), we lose the copies of three blocks (B1, B5, B2) stored on that node. The Name Node immediately locates the remaining replicas of those blocks on other data nodes and processing continues.
