Read the score table from HDFS, drop each student's highest and lowest score, compute the total and average of the remaining scores, and write the result back to HDFS.
Sample input:
10001 22 42 60 32 77
10002 35 70 65 31 90
Expected result:
10001 134 45
10002 170 57
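To sanity-check the expected numbers: for student 10001, dropping the top score 77 and the bottom score 22 leaves 42 + 60 + 32 = 134, and 134 / 3 rounds to 45. A small plain-Java sketch of that per-row arithmetic (the class and helper names are illustrative only, not part of the job):

```java
import java.util.Arrays;

public class TrimmedScore {
    // Total after dropping one highest and one lowest score
    static int trimmedTotal(int[] scores) {
        int sum = Arrays.stream(scores).sum();
        int max = Arrays.stream(scores).max().getAsInt();
        int min = Arrays.stream(scores).min().getAsInt();
        return sum - max - min;
    }

    // Average of the remaining scores, rounded to the nearest integer
    static long trimmedAverage(int[] scores) {
        return Math.round((double) trimmedTotal(scores) / (scores.length - 2));
    }

    public static void main(String[] args) {
        int[] row1 = {22, 42, 60, 32, 77};
        int[] row2 = {35, 70, 65, 31, 90};
        System.out.println("10001 " + trimmedTotal(row1) + " " + trimmedAverage(row1));
        System.out.println("10002 " + trimmedTotal(row2) + " " + trimmedAverage(row2));
    }
}
```

Running this prints 134 45 and 170 57, matching the expected result above.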
This calls for a MapReduce program. One possible implementation:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ScoreAnalyzer {

    public static class ScoreMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Text outKey = new Text();
        private final Text outValue = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split on any whitespace so both space- and tab-separated lines work
            String[] parts = value.toString().trim().split("\\s+");
            if (parts.length < 4) {
                // Need at least three scores, otherwise nothing remains
                // after dropping the highest and the lowest
                return;
            }
            int maxScore = Integer.MIN_VALUE;
            int minScore = Integer.MAX_VALUE;
            int totalScore = 0;
            for (int i = 1; i < parts.length; i++) {
                int score = Integer.parseInt(parts[i]);
                // Track the highest and lowest score
                maxScore = Math.max(maxScore, score);
                minScore = Math.min(minScore, score);
                totalScore += score;
            }
            // Drop the highest and lowest score, then compute the total
            // and the average (rounded to the nearest integer)
            int adjustedTotal = totalScore - maxScore - minScore;
            long average = Math.round((double) adjustedTotal / (parts.length - 3));
            // Emit student ID -> "total average"
            outKey.set(parts[0]);
            outValue.set(adjustedTotal + " " + average);
            context.write(outKey, outValue);
        }
    }

    public static class ScoreReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Each student ID appears on a single input line, so the
            // reducer only forwards the result computed in the mapper
            for (Text value : values) {
                context.write(key, value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "score analysis");
        // Configure the MapReduce job
        job.setJarByClass(ScoreAnalyzer.class);
        job.setMapperClass(ScoreMapper.class);
        job.setReducerClass(ScoreReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Run the job and wait for it to finish
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Submit the MapReduce job from a terminal on the local machine:
hadoop jar myScore.jar ScoreAnalyzer [InputPath] [OutputPath]
where [InputPath] is the HDFS path of the input file and [OutputPath] is the HDFS output directory (it must not already exist, or the job will fail).
The job writes its result to [OutputPath] on HDFS (in a file such as part-r-00000), which should contain:
10001 134 45
10002 170 57
Because [OutputPath] is an HDFS path, the result is already on HDFS and no separate upload step is needed. If a locally produced result file does need to be uploaded, use:
hadoop fs -put [LocalOutputPath] [HDFSOutputPath]
where [LocalOutputPath] is the path of the local file and [HDFSOutputPath] is the destination path on HDFS.