
I'm running into a fairly common problem when configuring a Hadoop cluster (actually, it's using Cloudera's pseudo-distributed cluster on a single machine), where the number of files Hadoop has open exceeds the operating system's open-file limit. Cloudera recommends adding the following line to /etc/security/limits.conf:



hdfs hard nofile 16384 # hdfs is my 'hadoop' user equivalent


and, since I'm running kernel 2.6.32, also editing /etc/sysctl.conf:



fs.epoll.max_user_instances = 4096
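For what it's worth, my understanding is that the limits.conf entry should show up as the hard nofile limit in a fresh hdfs login shell (assuming the hdfs account has a usable shell), and the sysctl change can be applied and checked without a reboot:

[bash]$ sudo su - hdfs -c 'ulimit -Hn'       # should report 16384 once the limits.conf entry is actually applied
[bash]$ sudo sysctl -p                       # re-read /etc/sysctl.conf without rebooting
[bash]$ sysctl fs.epoll.max_user_instances   # should report 4096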


After making these changes and restarting my server, I am still getting the same error! It still appears that the hdfs user's open-file limit has not been raised beyond the default of 1024:



[bash]$ sudo lsof | awk '{print $3}' | sort | uniq -c | sort -nr
2145 root
1495 hdfs
610 mapred
359 www-data
305 rdm
116 mysql
83 rabbitmq
32 messagebus
30 snmp
25 ntp
23 syslog
16 daemon
1 USER
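Counting lsof lines only shows how many files are open, not what the limit actually is; to see the limit a running daemon is operating under, something like the following seems more direct (assuming the hdfs daemons are the usual java processes):

[bash]$ sudo grep 'Max open files' /proc/$(pgrep -u hdfs java | head -n1)/limits   # prints the soft and hard limits for one running hdfs daemon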


As I've done more research, it's becoming clear that increasing these open-file limits is highly system dependent (even within Ubuntu; here, here, and here), so I wanted to see what the Ubuntu method is. Does anyone know how to increase these limits in Ubuntu 10.04?



I definitely prefer solutions that do not increase the limits for all users, but at this point I would be willing to try anything. Thanks for your help!


Answers

In order to set these limits, I did a combination of things found here and here. Since I want to restrict these file limits to the hdfs and mapred users, I added each of these users to the hadoop group on my system and edited /etc/security/limits.conf to include the line:



@hadoop hard nofile 16384
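For completeness, adding the two users to the group was just the usual usermod call; the hadoop group may already exist depending on how the Cloudera packages were installed:

[dsftar01 ~]$ sudo groupadd hadoop             # only needed if the group doesn't already exist
[dsftar01 ~]$ sudo usermod -a -G hadoop hdfs
[dsftar01 ~]$ sudo usermod -a -G hadoop mapred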


The @hadoop entry allows both users to open up to 16384 files at once, which is apparently important in pseudo-distributed mode. I also had to add the following line to /etc/pam.d/common-session:



session required pam_limits.so
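If I remember right, /etc/pam.d/su pulls in common-session on Ubuntu, so a quick sanity check that pam_limits is now being consulted is to open a new su session for the hdfs user and ask for its hard limit:

[dsftar01 ~]$ sudo su -s /bin/bash -c 'ulimit -Hn' hdfs   # should now report 16384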


The pam_limits entry is what makes these limits actually apply to daemon sessions like those for hdfs and mapred. After restarting the server, everything appears to be working: hdfs currently has more than the default 1024 files open:



[dsftar01 ~]$ sudo lsof | awk '{if(NR>1) print $3}' | sort | uniq -c | sort -nr
1972 root
1530 hdfs
608 mapred
360 www-data
166 rdm
97 mysql
83 rabbitmq
41 nobody
35 syslog
31 messagebus
30 snmp
25 ntp
16 daemon
