Hadoop Debug Scripts
Published: 2019-06-12


A Hadoop job may consist of many map tasks and reduce tasks, so debugging one is often a complicated process. It is good practice to first test a Hadoop job with unit tests, running it against a subset of the data. However, sometimes it is necessary to debug a Hadoop job in distributed mode. To support such cases, Hadoop provides a mechanism called debug scripts. This recipe explains how to use them.
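
For the unit-testing step, one common approach (not part of this recipe) is the MRUnit library, which runs a mapper or reducer in memory against a handful of records. Below is a minimal sketch, assuming MRUnit and JUnit 4 are on the classpath; the inline TokenizerMapper is a throwaway word-count mapper defined here only so the example is self-contained.

package chapter3;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {

    // A plain word-count mapper, defined inline only to keep the sketch
    // self-contained; substitute your real mapper class.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    @Test
    public void testMapperEmitsWordCounts() throws Exception {
        // MRUnit runs the mapper in memory against a single record and
        // fails the test if the emitted (key, value) pairs differ.
        MapDriver.newMapDriver(new TokenizerMapper())
                .withInput(new LongWritable(0), new Text("hello hadoop hello"))
                .withOutput(new Text("hello"), new IntWritable(1))
                .withOutput(new Text("hadoop"), new IntWritable(1))
                .withOutput(new Text("hello"), new IntWritable(1))
                .runTest();
    }
}

Because the driver verifies the exact outputs declared with withOutput(), the job's core logic can be checked long before it ever runs on a cluster.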

A debug script is a shell script that Hadoop executes whenever a task encounters an error. The script has access to the $script, $stdout, $stderr, $syslog, and $jobconf properties as environment variables populated by Hadoop. You can find a sample script in resources/chapter3/debugscript. Debug scripts can be used to copy all the log files to a single location, e-mail them to a single e-mail account, or perform some analysis. The sample script below appends each of these values to a single log file:
#!/bin/sh
# Append the failing task's debug information to a single log file.
# Assumes $HADOOP_HOME is set in the task's environment.
LOG_FILE=$HADOOP_HOME/error.log
echo "Run the script" >> $LOG_FILE
echo $script >> $LOG_FILE
echo $stdout >> $LOG_FILE
echo $stderr >> $LOG_FILE
echo $syslog >> $LOG_FILE
echo $jobconf >> $LOG_FILE

When you run this recipe, pay attention to the path from which the debug script is resolved; otherwise Hadoop will not find it. The driver below handles this by uploading the script to HDFS and symlinking it into each task's working directory through the DistributedCache.

package chapter3;

import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordcountWithDebugScript {

    private static final String scriptFileLocation = "resources/chapter3/debugscript";
    private static final String HDFS_ROOT = "/debug";

    public static void setupFailedTaskScript(JobConf conf) throws Exception {
        // Create a directory on HDFS where we'll upload the fail script.
        FileSystem fs = FileSystem.get(conf);
        Path debugDir = new Path(HDFS_ROOT);

        // Who knows what's already in this directory; let's just clear it...
        if (fs.exists(debugDir)) {
            fs.delete(debugDir, true);
        }
        // ...and then make sure it exists again.
        fs.mkdirs(debugDir);

        // Upload the local script into HDFS.
        fs.copyFromLocalFile(new Path(scriptFileLocation),
                new Path(HDFS_ROOT + "/fail-script"));

        FileStatus[] list = fs.listStatus(new Path(HDFS_ROOT));
        if (list == null || list.length == 0) {
            System.out.println("No File found");
        } else {
            for (FileStatus f : list) {
                System.out.println("File found " + f.getPath());
            }
        }

        conf.setMapDebugScript("./fail-script");
        conf.setReduceDebugScript("./fail-script");

        // This creates a symlink from the task's working directory to the
        // cache directory on the mapper node.
        DistributedCache.createSymlink(conf);
        URI fsUri = fs.getUri();
        String mapUriStr = fsUri.toString() + HDFS_ROOT + "/fail-script#fail-script";
        System.out.println("added " + mapUriStr + " to distributed cache");
        URI mapUri = new URI(mapUriStr);
        // The following copies the script URI into the job's distributed cache.
        DistributedCache.addCacheFile(mapUri, conf);
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        setupFailedTaskScript(conf);

        Job job = new Job(conf, "word count");
        job.setJarByClass(FaultyWordCount.class);
        job.setMapperClass(FaultyWordCount.TokenizerMapper.class);
        job.setReducerClass(FaultyWordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Clear the output directory so the job can be rerun.
        FileSystem.get(conf).delete(new Path(args[1]), true);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
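
The FaultyWordCount class referenced above comes from the book's sample code and is not reproduced in this post. As a rough, hypothetical sketch of what it might look like, here is a word count whose mapper deliberately throws on a trigger word so that the debug script actually fires; the trigger word "fail-now" and the class shape are assumptions for illustration, not the book's code.

package chapter3;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical stand-in for the book's FaultyWordCount: a word count
// whose mapper fails on purpose so the debug script is triggered.
public class FaultyWordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                String token = itr.nextToken();
                // Deliberate fault: fail the task when the trigger word
                // appears, causing Hadoop to run the configured debug script.
                if ("fail-now".equals(token)) {
                    throw new RuntimeException("Injected failure for token: " + token);
                }
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}

With something like this in place, running the job over input containing the trigger word makes map tasks fail, Hadoop invokes ./fail-script on the task node, and the script appends $script, $stdout, $stderr, $syslog, and $jobconf to $HADOOP_HOME/error.log.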

Digested from the Hadoop MapReduce Cookbook.

Reposted from: https://www.cnblogs.com/huaxiaoyao/p/4413488.html
