TagCloud

Thursday, January 3, 2013

hadoop installation script (cluster setup)

hadoop 1.0.4 cluster setup ( CentOS 6.3 x64 )

1. core-site.xml
vi /srv/hadoop/conf/core-site.xml

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp/hadoop-${user.name}</value>
    </property>
</configuration>
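fs.default.name points at the hostname hadoop01, so every node has to resolve the master (and each other) by name. If DNS is not available, a minimal /etc/hosts sketch for all nodes could look like this (the IP addresses are placeholders, not taken from this setup):

# /etc/hosts on every node (placeholder addresses)
192.168.0.101   hadoop01
192.168.0.102   hadoop02
192.168.0.103   hadoop03
192.168.0.104   hadoop04
192.168.0.105   hadoop05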


2. hdfs-site.xml
vi /srv/hadoop/conf/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/data/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/data/data1/hdfs,/data/data2/hdfs</value> <!-- it is generally better to point each directory at a different physical device -->
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value> <!-- default replication factor is 3; use 1 for a single node, 2 for two nodes -->
    </property>
</configuration>


3. mapred-site.xml
vi /srv/hadoop/conf/mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>hadoop01:9001</value>
    </property>
</configuration>


4. create data directory (for each server)
mkdir -p /data/name
mkdir -p /data/data1
mkdir -p /data/data2
chown -R luvu.luvu /data/name
chown -R luvu.luvu /data/data1
chown -R luvu.luvu /data/data2

5. add master host
vi /srv/hadoop/conf/masters # master host
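The masters file holds the host that runs the secondary namenode; judging from the start-dfs.sh output in step 11, that is hadoop01 here:

# /srv/hadoop/conf/masters
hadoop01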


6. add slave host
vi /srv/hadoop/conf/slaves # slave host list
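The slaves file lists one datanode/tasktracker host per line; matching the datanodes that come up in step 11:

# /srv/hadoop/conf/slaves
hadoop02
hadoop03
hadoop04
hadoop05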


7. copy to all slaves
scp -r /srv/hadoop/conf hadoop02:/srv/hadoop/
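To push the configuration to every slave in one go, a loop over the slaves file works as well (a sketch, assuming passwordless ssh is already set up for the hadoop user):

for host in $(cat /srv/hadoop/conf/slaves); do
    scp -r /srv/hadoop/conf ${host}:/srv/hadoop/
done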


8. create data node directory ( for slaves )
/srv/hadoop/bin/slaves.sh mkdir -p /data/data1/hdfs
/srv/hadoop/bin/slaves.sh mkdir -p /data/data2/hdfs
Note: if the datanode log shows permission-related warnings, adjust the permissions on the hdfs directories to match (755 is required by default).
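If that warning shows up, the permissions (and the ownership from step 4) can be fixed on all slaves at once with slaves.sh, for example:

/srv/hadoop/bin/slaves.sh chmod -R 755 /data/data1/hdfs /data/data2/hdfs
/srv/hadoop/bin/slaves.sh chown -R luvu:luvu /data/data1/hdfs /data/data2/hdfs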

9. format namenode
/srv/hadoop/bin/hadoop namenode -format
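A successful format populates the dfs.name.dir from step 2, which can be checked with:

ls /data/name/current    # should contain VERSION, fsimage, edits after the format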

10. firewall
iptables -I INPUT -p tcp --dport 9000 -j ACCEPT # for namenode (fs.default.name)
iptables -I INPUT -p tcp --dport 9001 -j ACCEPT # for jobtracker (mapred.job.tracker)
iptables -I INPUT -p tcp --dport 50030 -j ACCEPT # for jobtracker web (mapreduce)
iptables -I INPUT -p tcp --dport 50060 -j ACCEPT # for tasktracker web
iptables -I INPUT -p tcp --dport 50070 -j ACCEPT # for namenode web (hdfs)
iptables -I INPUT -p tcp --dport 50075 -j ACCEPT # for datanode web
iptables -I INPUT -p tcp --dport 50010 -j ACCEPT # for datanode data transfer
service iptables save
service iptables restart
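A quick sanity check that the rules survived the restart (a sketch):

iptables -L INPUT -n | grep -E '9000|9001|500[0-9]{2}'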

11. start hdfs
$ /srv/hadoop/bin/start-dfs.sh
starting namenode, logging to /srv/hadoop/libexec/../logs/hadoop-luvu-namenode-hadoop01.out
hadoop02: starting datanode, logging to /srv/hadoop/libexec/../logs/hadoop-luvu-datanode-hadoop02.out
hadoop04: starting datanode, logging to /srv/hadoop/libexec/../logs/hadoop-luvu-datanode-hadoop04.out
hadoop03: starting datanode, logging to /srv/hadoop/libexec/../logs/hadoop-luvu-datanode-hadoop03.out
hadoop05: starting datanode, logging to /srv/hadoop/libexec/../logs/hadoop-luvu-datanode-hadoop05.out
hadoop01: starting secondarynamenode, logging to /srv/hadoop/libexec/../logs/hadoop-luvu-secondarynamenode-hadoop01.out
 
12. check hdfs status
- http://hadoop01:50070
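Besides the web UI, the daemons and datanode registration can be checked from the shell on hadoop01 (run as the hadoop user):

jps                                      # should list NameNode and SecondaryNameNode (DataNode on the slaves)
/srv/hadoop/bin/hadoop dfsadmin -report  # should show all datanodes and their capacity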


13. start mapreduce
/srv/hadoop/bin/start-mapred.sh

14. check mapreduce status
- http://hadoop01:50030
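To confirm HDFS and MapReduce end to end, the example jar bundled with the release can be run; a sketch, assuming the jar name that ships in the 1.0.4 tarball:

/srv/hadoop/bin/hadoop jar /srv/hadoop/hadoop-examples-1.0.4.jar pi 10 100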


Note: if a datanode reports "Server Not Available Yet ..."
- http://wiki.apache.org/hadoop/ServerNotAvailable


Reference :
http://hadoop.apache.org
http://hadoop.apache.org/docs/stable/cluster_setup.html
http://hadoop.apache.org/docs/r1.0.4/file_system_shell.html ( file system shell command )