TagCloud

Thursday, January 3, 2013

hadoop installation script

hadoop 1.0.4 installation ( CentOS 6.3 x64 )

1. pre-installation
1) java # requirement: a JDK must be installed
2) ssh and sshd ( yum install openssh-clients )
3) create and register an ssh-key (ssh-keygen -t rsa); a quick check sketch follows below
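
A minimal pre-flight sketch for these three items, assuming a single node and this post's user luvu (key type and paths are just one choice):
# 1) confirm a JDK is installed and on the PATH
java -version
# 2) confirm sshd is running (CentOS 6 init script)
service sshd status
# 3) generate an RSA key if missing, then register it for passwordless login
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys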

2. hadoop 1.0.4 download, untar
- wget http://mirror.apache-kr.org/hadoop/common/hadoop-1.0.4/hadoop-1.0.4.tar.gz
- tar -xvzf hadoop-1.0.4.tar.gz
- mv hadoop-1.0.4 /srv/hadoop

3. configuration
vi /srv/hadoop/conf/hadoop-env.sh # set JAVA_HOME etc. (example below)
vi /etc/hosts # register the hadoop host name list
chown -R luvu:luvu /srv/hadoop
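
A sketch of those two edits, assuming the JDK lives under /usr/java/default (an assumed path; substitute your own) and reusing the hadoop01 / 192.168.10.201 pair from the format log below:
# in /srv/hadoop/conf/hadoop-env.sh: uncomment and point at your JDK
export JAVA_HOME=/usr/java/default
# in /etc/hosts: map the hadoop host name to its address
192.168.10.201   hadoop01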

4. standalone operation test
Standalone execution
By default Hadoop is configured to run in non-distributed mode, as a single Java process, which is useful for debugging.
The following example copies the files from the unpacked conf directory into an input directory, then finds and prints every line that matches the given regular expression.
cd /srv/hadoop
mkdir input # input dir
cp conf/*.xml input
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' 
cat output/*
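
With the stock 1.0.4 conf files the job typically finds one match, so the output looks something like this (counts vary if you have changed the configs):
1       dfsadmin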

5. Pseudo-Distributed Operation
# 1) vi conf/core-site.xml # NameNode URI
<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://hadoop01:9000</value>
        </property>
</configuration>

# 2) vi conf/hdfs-site.xml # one replica is enough on a single node
<configuration>
        <property>         
                <name>dfs.replication</name>         
                <value>1</value>     
        </property>
</configuration>

# 3) vi conf/mapred-site.xml # JobTracker address
<configuration>
        <property>         
                <name>mapred.job.tracker</name>         
                <value>hadoop01:9001</value>         
        </property>
</configuration>

# 4) Setup passphraseless ssh
# Now check that you can ssh to the localhost without a passphrase:
ssh localhost
# If you cannot ssh to localhost without a passphrase, execute the following commands:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa 
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 644 ~/.ssh/authorized_keys

# 5) Execution
# Format a new distributed-filesystem:
bin/hadoop namenode -format
13/01/03 14:40:25 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop01/192.168.10.201
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
13/01/03 14:40:25 INFO util.GSet: VM type       = 64-bit
13/01/03 14:40:25 INFO util.GSet: 2% max memory = 19.33375 MB
13/01/03 14:40:25 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/01/03 14:40:25 INFO util.GSet: recommended=2097152, actual=2097152
13/01/03 14:40:25 INFO namenode.FSNamesystem: fsOwner=luvu
13/01/03 14:40:25 INFO namenode.FSNamesystem: supergroup=supergroup
13/01/03 14:40:25 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/01/03 14:40:25 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/01/03 14:40:25 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/01/03 14:40:25 INFO namenode.NameNode: Caching file names occuring more than 10 times 
13/01/03 14:40:26 INFO common.Storage: Image file of size 110 saved in 0 seconds.
13/01/03 14:40:26 INFO common.Storage: Storage directory /tmp/hadoop-luvu/dfs/name has been successfully formatted.
13/01/03 14:40:26 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop01/192.168.10.201
************************************************************/

Start the Hadoop daemons:
$ bin/start-all.sh
starting namenode, logging to /srv/hadoop/libexec/../logs/hadoop-luvu-namenode-hadoop01.out
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is e3:30:16:1a:a5:a9:09:cf:d5:6b:f2:d0:ac:ad:1b:8c.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
localhost: starting datanode, logging to /srv/hadoop/libexec/../logs/hadoop-luvu-datanode-hadoop01.out
localhost: starting secondarynamenode, logging to /srv/hadoop/libexec/../logs/hadoop-luvu-secondarynamenode-hadoop01.out
starting jobtracker, logging to /srv/hadoop/libexec/../logs/hadoop-luvu-jobtracker-hadoop01.out
localhost: starting tasktracker, logging to /srv/hadoop/libexec/../logs/hadoop-luvu-tasktracker-hadoop01.out
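
To confirm that all five daemons came up, jps (bundled with the JDK) should list them, roughly:
jps
# typical output; PIDs will differ
# 12345 NameNode
# 12456 DataNode
# 12567 SecondaryNameNode
# 12678 JobTracker
# 12789 TaskTracker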

6. firewall
iptables -I INPUT -p tcp --dport 50030 -j ACCEPT # JobTracker web UI
iptables -I INPUT -p tcp --dport 50070 -j ACCEPT # NameNode web UI
iptables -I INPUT -p tcp --dport 50060 -j ACCEPT # TaskTracker web UI
iptables -I INPUT -p tcp --dport 50075 -j ACCEPT # DataNode web UI
service iptables save
service iptables restart
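
A quick check that the rules are in place (the pattern matches the four UI ports):
iptables -L INPUT -n | grep -E '500(30|60|70|75)'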

7. check service
NameNode - http://hadoop01:50070/
JobTracker - http://hadoop01:50030/
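
If the box has no browser, curl (assuming it is installed) can verify both UIs; a 200 response code means the page is being served:
curl -s -o /dev/null -w '%{http_code}\n' http://hadoop01:50070/
curl -s -o /dev/null -w '%{http_code}\n' http://hadoop01:50030/
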
8. Test MapReduce
# Copy the input files into the distributed filesystem:
bin/hadoop fs -put conf input
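# Optionally confirm the upload (input lands under the user's HDFS home, e.g. /user/luvu):
bin/hadoop fs -ls input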

# Run some of the examples provided:
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

# Examine the output files:

1) Copy the output files from the distributed filesystem to the local filesystem and examine them:
bin/hadoop fs -get output output 
cat output/*
# or 
2) View the output files on the distributed filesystem:
bin/hadoop fs -cat output/*

9. stop service
$ bin/stop-all.sh


Next Step:
- Hadoop Cluster Setting

Reference:
- http://hadoop.apache.org
- http://hadoop.apache.org/docs/r1.0.4/file_system_shell.html ( file system shell commands )