Does anyone know how many bytes each file occupies in the namenode of HDFS? I want to estimate how many files a single namenode with 32 GB of memory can store.
Each file, directory, or block occupies about 150 bytes of namenode memory. [1] A file also consumes at least one block entry, so each file effectively costs about 300 bytes; assuming 3x replication and counting the block entry once per replica, that comes to roughly 900 bytes per file. On that basis, a namenode with 32 GB of RAM can support at most about 38 million files (assuming the namenode is the bottleneck).
In practice, however, the number will be much lower, because not all of the 32 GB is available to the namenode for holding the namespace map; the real limit is the namenode's JVM heap, not the machine's total RAM. You can raise the limit by allocating more heap space to the namenode on that machine, for example as shown below.
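As an illustration only (the 24 GB value here is a made-up example, not a recommendation), in Hadoop 2.x you could raise the namenode heap via HADOOP_NAMENODE_OPTS in hadoop-env.sh:

    # hadoop-env.sh: give the namenode JVM a larger heap (example value)
    export HADOOP_NAMENODE_OPTS="-Xmx24g ${HADOOP_NAMENODE_OPTS}"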
Replication also affects this, but to a lesser degree than the estimate above suggests: each additional replica adds only about 16 bytes to the memory requirement. [2]
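To make the arithmetic concrete, here is a small Python sketch of the estimate. The max_files function and the usable_fraction parameter are hypothetical names of mine; the 150-byte and 16-byte constants come from [1] and [2], and the 0.5 heap fraction is just an illustrative guess:

    # Back-of-the-envelope namenode capacity estimate (a sketch, not an exact model).
    OBJECT_BYTES = 150        # per file, directory, or block entry [1]
    EXTRA_REPLICA_BYTES = 16  # per replica beyond the first [2]
    GiB = 1024 ** 3

    def max_files(heap_bytes, replication=3, blocks_per_file=1,
                  usable_fraction=1.0):
        """Estimate how many files a given namenode heap can track."""
        per_file = (OBJECT_BYTES                             # the file itself
                    + blocks_per_file * OBJECT_BYTES         # its block entries
                    + blocks_per_file * (replication - 1) * EXTRA_REPLICA_BYTES)
        return int(heap_bytes * usable_fraction) // per_file

    # The coarse model above simply triples the 300-byte file+block cost:
    print((32 * GiB) // 900)                          # ~38 million files
    # The 16-bytes-per-replica model from [2] is less pessimistic:
    print(max_files(32 * GiB))                        # ~103 million files
    print(max_files(32 * GiB, usable_fraction=0.5))   # ~51 million files

Note the gap between the two models: the 900-byte figure deliberately overstates the per-replica cost, which is why the 38 million number above is a safe upper bound rather than a precise one.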
[1] https://blog.cloudera.com/small-files-big-foils-addressing-the-associated-metadata-and-application-challenges/
[2] http://search-hadoop.com/c/HDFS:/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java%7C%7CBlockInfo