Hadoop MapReduce 02: Custom WordCount Example
Create the MapperTask
Create a Java class that extends the Mapper parent class. The four generic type parameters are:
| Parameter | Description |
| --- | --- |
| K1 | Type of the input key. Input is read line by line by default, and K1 is the byte offset of each line. |
| V1 | Type of the input value: the contents of one line. |
| K2 | Type of the key returned after the user's map logic runs. |
| V2 | Type of the value returned after the user's map logic runs. |
Note that this data travels over the network between stages, so the types must be serializable. Hadoop provides Writable wrapper types for the common Java types:
| Java type | Serializable (Writable) type |
| --- | --- |
| Integer | IntWritable |
| Long | LongWritable |
| Double | DoubleWritable |
| Float | FloatWritable |
| String | Text |
| null | NullWritable |
| Boolean | BooleanWritable |
| … | … |
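Hadoop does not use standard Java serialization; a Writable type writes itself to a compact binary stream via DataOutput and reads itself back via DataInput. To illustrate the idea with nothing but the JDK, here is a minimal sketch of the pattern (the class name `WordCountPair` is made up for this example; it is not a Hadoop class):

```java
import java.io.*;

// Minimal sketch of the Writable pattern: a (word, count) pair that
// serializes itself to a compact binary stream, the way Hadoop's
// Text and IntWritable do. WordCountPair is a made-up example class.
public class WordCountPair {
    private String word;
    private int count;

    public WordCountPair() {}
    public WordCountPair(String word, int count) { this.word = word; this.count = count; }

    // Analogous to Writable.write(DataOutput)
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(count);
    }

    // Analogous to Writable.readFields(DataInput)
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        count = in.readInt();
    }

    public String getWord() { return word; }
    public int getCount() { return count; }

    public static void main(String[] args) throws IOException {
        // Round-trip: serialize to bytes, then deserialize a fresh instance.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new WordCountPair("hello", 2).write(new DataOutputStream(bos));
        WordCountPair copy = new WordCountPair();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(copy.getWord() + "=" + copy.getCount()); // prints hello=2
    }
}
```

In real Hadoop code you would implement `org.apache.hadoop.io.Writable` (or `WritableComparable` for keys) instead of defining these methods ad hoc.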
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapperTask extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on spaces and emit (word, 1) for every word.
        String line = value.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
```
Create the ReducerTask
Create a Java class that extends the Reducer parent class. Its four generic type parameters are:
| Parameter | Description |
| --- | --- |
| KEYIN | Matches the map stage's KEYOUT. |
| VALUEIN | Matches the map stage's VALUEOUT. |
| KEYOUT | Output key type of the reduce logic. |
| VALUEOUT | Output value type of the reduce logic. |
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducerTask extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mappers for this word.
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}
```
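You can sanity-check this map/reduce logic without a cluster by simulating the shuffle stage with an ordinary sorted map. The sketch below uses only the JDK (`WordCountSim` is a made-up name for illustration, not part of Hadoop or of the code above):

```java
import java.util.*;

// Plain-JDK simulation of the wordcount pipeline: "map" each line into
// (word, 1) pairs, group values by key (what Hadoop's shuffle does),
// then "reduce" each group by summing. WordCountSim is a made-up class.
public class WordCountSim {
    public static Map<String, Integer> wordCount(List<String> lines) {
        // Shuffle stand-in: collect all 1s per word, keys kept sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {                       // map stage
            for (String word : line.split(" ")) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) { // reduce stage
            int count = 0;
            for (int v : e.getValue()) count += v;
            result.put(e.getKey(), count);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello java", "hello shell");
        System.out.println(wordCount(lines)); // prints {hello=2, java=1, shell=1}
    }
}
```

The grouping step is exactly what makes each `reduce` call receive one word together with all of its 1s.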
Create the driver class

```java
package com.bobo.mr.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WcTest {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(true);
        Job job = Job.getInstance(conf);
        job.setJarByClass(WcTest.class);

        // Wire up the map and reduce implementations.
        job.setMapperClass(MyMapperTask.class);
        job.setReducerClass(MyReducerTask.class);

        // Types emitted by the mapper.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Input and output paths come from the command line;
        // the output directory must not exist yet.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
```
Package and deploy
Package the project into a jar with Maven.
Upload and test
Create a directory for the wordcount example in HDFS and upload the test files:
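Before uploading, the two input files need to exist locally. Their contents are arbitrary text to be counted; the sample lines below are made up for illustration:

```shell
# Create two small sample input files locally (contents are arbitrary).
printf 'hello java\nspring springmvc\n' > a.txt
printf 'hello shell\nphp mybatis\n' > b.txt
cat a.txt b.txt
```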
```shell
hadoop fs -mkdir -p /hdfs/wordcount/input
hadoop fs -put a.txt b.txt /hdfs/wordcount/input/
```
Run the job:

```shell
hadoop jar hadoop-demo-0.0.1-SNAPSHOT.jar com.bobo.mr.wc.WcTest /hdfs/wordcount/input /hdfs/wordcount/output/
```
The run succeeds:

```
[root@hadoop-node01 ~]# hadoop jar hadoop-demo-0.0.1-SNAPSHOT.jar com.bobo.mr.wc.WcTest /hdfs/wordcount/input /hdfs/wordcount/output/
19/04/03 16:56:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop-node01/192.168.88.61:8032
19/04/03 16:56:46 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/04/03 16:56:48 INFO input.FileInputFormat: Total input paths to process : 2
19/04/03 16:56:49 INFO mapreduce.JobSubmitter: number of splits:2
19/04/03 16:56:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554281786018_0001
19/04/03 16:56:52 INFO impl.YarnClientImpl: Submitted application application_1554281786018_0001
19/04/03 16:56:53 INFO mapreduce.Job: The url to track the job: http://hadoop-node01:8088/proxy/application_1554281786018_0001/
19/04/03 16:56:53 INFO mapreduce.Job: Running job: job_1554281786018_0001
19/04/03 16:57:14 INFO mapreduce.Job: Job job_1554281786018_0001 running in uber mode : false
19/04/03 16:57:14 INFO mapreduce.Job:  map 0% reduce 0%
19/04/03 16:57:38 INFO mapreduce.Job:  map 100% reduce 0%
19/04/03 16:57:56 INFO mapreduce.Job:  map 100% reduce 100%
19/04/03 16:57:57 INFO mapreduce.Job: Job job_1554281786018_0001 completed successfully
19/04/03 16:57:57 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=181
		FILE: Number of bytes written=321388
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=325
		HDFS: Number of bytes written=87
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=1
		Rack-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=46511
		Total time spent by all reduces in occupied slots (ms)=12763
		Total time spent by all map tasks (ms)=46511
		Total time spent by all reduce tasks (ms)=12763
		Total vcore-milliseconds taken by all map tasks=46511
		Total vcore-milliseconds taken by all reduce tasks=12763
		Total megabyte-milliseconds taken by all map tasks=47627264
		Total megabyte-milliseconds taken by all reduce tasks=13069312
	Map-Reduce Framework
		Map input records=14
		Map output records=14
		Map output bytes=147
		Map output materialized bytes=187
		Input split bytes=234
		Combine input records=0
		Combine output records=0
		Reduce input groups=10
		Reduce shuffle bytes=187
		Reduce input records=14
		Reduce output records=10
		Spilled Records=28
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=1049
		CPU time spent (ms)=5040
		Physical memory (bytes) snapshot=343056384
		Virtual memory (bytes) snapshot=6182891520
		Total committed heap usage (bytes)=251813888
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=91
	File Output Format Counters
		Bytes Written=87
```
Check the result:

```
[root@hadoop-node01 ~]# hadoop fs -cat /hdfs/wordcount/output/part-r-00000
ajax	1
bobo烤鸭	1
hello	2
java	2
mybatis	1
name	1
php	1
shell	2
spring	2
springmvc	1
```
OK~
Previous: Hadoop MapReduce 01 (built-in wordcount example)
Next: Hadoop MapReduce 03 (wc example flow analysis)
Original: https://blog.51cto.com/u_15494758/5433343
Author: 波波烤鸭
Title: Hadoop之MapReduce02【自定义wordcount案例】