How to compute pairwise group sums across rows in Hive?


I have a game table like this, and I want the sum of AMT for every group of two, every group of three, and every group of four.

For example:

Groups of two:
  AB,(29.33+19.22)
  AC,(29.33+25.81)
.....
  BC,(19.22+25.81)
....
  CD,(25.81+16.79)
....
  EF,(20.78+25.32)

Groups of three:
    ABC,(29.33+19.22+25.81)
    ABD,(29.33+19.22+16.79)
   .....
    ACD,(29.33+25.81+16.79)
   .....
    BCD,(19.22+25.81+16.79)
   .....
    

I won't write out the groups of four.
Repeated elements can be excluded; for example, BBC,(19.22+19.22+25.81) is not needed.
Note: the parenthesized expression is the result I want; the leading column can use any other alias.
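
For reference, the example sums above imply the mapping A=29.33, B=19.22, C=25.81, D=16.79, E=20.78, F=25.32 (the original table screenshot is not reproduced here). A minimal sketch of equivalent seed data, using a hypothetical table name game:

-- Hypothetical seed data inferred from the sums quoted above;
-- the real game table only appears as a screenshot in the original post.
create table game(gname varchar(20), amt decimal(10,2));
insert into game values
  ('A',29.33),('B',19.22),('C',25.81),
  ('D',16.79),('E',20.78),('F',25.32);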
  

-- Groups of 3: self-join the table three times; the gname inequalities
-- keep each unordered combination exactly once (no AAB, and ABC but never BAC).
select t1.gname,t2.gname,t3.gname,concat(t1.gname,'+',t2.gname,'+',t3.gname,'=',t1.amt + t2.amt + t3.amt)
  from test_20210402 t1
 join test_20210402 t2
 join test_20210402 t3
 where t1.gname < t2.gname
   and t2.gname < t3.gname
order by t1.gname,t2.gname,t3.gname;


-- Groups of 4: same pattern, one more self-join and one more inequality.
select t1.gname,t2.gname,t3.gname,t4.gname,concat(t1.gname,'+',t2.gname,'+',t3.gname,'+',t4.gname,'=',t1.amt + t2.amt + t3.amt + t4.amt)
  from test_20210402 t1
 join test_20210402 t2
 join test_20210402 t3
 join test_20210402 t4
 where t1.gname < t2.gname
   and t2.gname < t3.gname
   and t3.gname < t4.gname
order by t1.gname,t2.gname,t3.gname,t4.gname;
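
Because these self-joins carry no ON clause, Hive plans them as cross products and prints the "is a cross product" warnings visible in the logs below. On a cluster running in strict mode such queries may be rejected outright; a hedged workaround, assuming these property names match your Hive release:

-- Assumption: property names differ across Hive releases; check yours first.
set hive.mapred.mode=nonstrict;                  -- legacy strict-mode switch
set hive.strict.checks.cartesian.product=false;  -- Hive 2.x+ strict check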

Test run:

hive> 
    > 
    > select t1.gname,t2.gname,t3.gname,concat(t1.gname,'+',t2.gname,'+',t3.gname,'=',t1.amt + t2.amt + t3.amt)
    >   from test_20210402 t1
    >  join test_20210402 t2
    >  join test_20210402 t3
    >  where t1.gname < t2.gname
    >    and t2.gname < t3.gname
    > order by t1.gname,t2.gname,t3.gname;
Warning: Map Join MAPJOIN[13][bigTable=t1] in task 'Stage-1:MAPRED' is a cross product
Query ID = root_20210406094633_9629ee26-61a7-4407-a105-812b168a8d83
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1616576763401_0002
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/yarn application -kill application_1616576763401_0002
Hive on Spark Session Web UI URL: http://hp4:33610

Query Hive on Spark job[0] stages: [0]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-0 ........         0      FINISHED      2          2        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 01/01    [==========================>>] 100%  ELAPSED TIME: 6.10 s     
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 6.10 second(s)
Spark Job[0] Metrics: TaskDurationTime: 5895, ExecutorCpuTime: 1005, JvmGCTime: 272, BytesRead / RecordsRead: 9028 / 12, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 0 / 0, ShuffleBytesWritten / ShuffleRecordsWritten: 0 / 0
Launching Job 2 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1616576763401_0002
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/yarn application -kill application_1616576763401_0002
Hive on Spark Session Web UI URL: http://hp4:33610

Query Hive on Spark job[1] stages: [1, 2]
Spark job[1] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-1 ........         0      FINISHED      1          1        0        0       0  
Stage-2 ........         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 2.02 s     
--------------------------------------------------------------------------------------
Spark job[1] finished successfully in 2.02 second(s)
Spark Job[1] Metrics: TaskDurationTime: 1160, ExecutorCpuTime: 834, JvmGCTime: 32, BytesRead / RecordsRead: 8646 / 6, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 302 / 20, ShuffleBytesWritten / ShuffleRecordsWritten: 302 / 20
OK
A       B       C       A+B+C=6
A       B       D       A+B+D=7
A       B       E       A+B+E=8
A       B       F       A+B+F=9
A       C       D       A+C+D=8
A       C       E       A+C+E=9
A       C       F       A+C+F=10
A       D       E       A+D+E=10
A       D       F       A+D+F=11
A       E       F       A+E+F=12
B       C       D       B+C+D=9
B       C       E       B+C+E=10
B       C       F       B+C+F=11
B       D       E       B+D+E=11
B       D       F       B+D+F=12
B       E       F       B+E+F=13
C       D       E       C+D+E=12
C       D       F       C+D+F=13
C       E       F       C+E+F=14
D       E       F       D+E+F=15
Time taken: 29.681 seconds, Fetched: 20 row(s)

hive> 
    > 
    > 
    > select t1.gname,t2.gname,t3.gname,t4.gname,concat(t1.gname,'+',t2.gname,'+',t3.gname,'+',t4.gname,'=',t1.amt + t2.amt + t3.amt + t4.amt)
    >   from test_20210402 t1
    >  join test_20210402 t2
    >  join test_20210402 t3
    >  join test_20210402 t4
    >  where t1.gname < t2.gname
    >    and t2.gname < t3.gname
    >    and t3.gname < t4.gname
    > order by t1.gname,t2.gname,t3.gname,t4.gname;
Warning: Map Join MAPJOIN[15][bigTable=t1] in task 'Stage-1:MAPRED' is a cross product
Query ID = root_20210406094947_0124d39d-d808-4510-b438-637a9d18c119
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-6 ........         0      FINISHED      3          3        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 01/01    [==========================>>] 100%  ELAPSED TIME: 1.00 s     
--------------------------------------------------------------------------------------
Spark job[4] finished successfully in 1.00 second(s)
Spark Job[4] Metrics: TaskDurationTime: 501, ExecutorCpuTime: 139, JvmGCTime: 0, BytesRead / RecordsRead: 15855 / 18, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 0 / 0, ShuffleBytesWritten / ShuffleRecordsWritten: 0 / 0
Launching Job 2 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-7 ........         0      FINISHED      1          1        0        0       0  
Stage-8 ........         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 1.00 s     
--------------------------------------------------------------------------------------
Spark job[5] finished successfully in 1.01 second(s)
Spark Job[5] Metrics: TaskDurationTime: 231, ExecutorCpuTime: 92, JvmGCTime: 25, BytesRead / RecordsRead: 9623 / 6, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 265 / 15, ShuffleBytesWritten / ShuffleRecordsWritten: 265 / 15
WARNING: Spark Job[5] Spent 11% (25 ms / 231 ms) of task time in GC
OK
A       B       C       D       A+B+C+D=10
A       B       C       E       A+B+C+E=11
A       B       C       F       A+B+C+F=12
A       B       D       E       A+B+D+E=12
A       B       D       F       A+B+D+F=13
A       B       E       F       A+B+E+F=14
A       C       D       E       A+C+D+E=13
A       C       D       F       A+C+D+F=14
A       C       E       F       A+C+E+F=15
A       D       E       F       A+D+E+F=16
B       C       D       E       B+C+D+E=14
B       C       D       F       B+C+D+F=15
B       C       E       F       B+C+E+F=16
B       D       E       F       B+D+E+F=17
C       D       E       F       C+D+E+F=18
Time taken: 2.268 seconds, Fetched: 15 row(s)
hive> 
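
A quick sanity check on the row counts: with six rows there are C(6,3) = 20 three-element combinations and C(6,4) = 15 four-element combinations, which matches the 20 and 15 rows fetched above.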

 

This only needs a table self-join. Here is the code for groups of two, along with the test table setup.

 

create table test_20210402(gname varchar(20),amt decimal(10,2));

insert into test_20210402 values ('A',1),('B',2),('C',3),('D',4),('E',5),('F',6);

 
-- Groups of 2: a single self-join with one inequality.
select t1.gname,t2.gname,concat(t1.gname,'+',t2.gname,'=',t1.amt + t2.amt)
  from test_20210402 t1
 join test_20210402 t2
 where t1.gname < t2.gname
order by t1.gname,t2.gname;
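
Since the question allows any alias for the leading column, here is a minimal variant (an untested sketch against the same table) that returns one combined label plus the numeric sum:

-- Sketch: one combined label column plus the raw sum, closer to the
-- "AB,(29.33+19.22)" shape requested in the question.
select concat(t1.gname, t2.gname) as grp,
       t1.amt + t2.amt            as total_amt
  from test_20210402 t1
  join test_20210402 t2
 where t1.gname < t2.gname
 order by grp;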

Test log attached:

hive> 
    > 
    > create table test_20210402(gname varchar(20),amt decimal(10,2));
OK
Time taken: 0.082 seconds
hive> insert into test_20210402 values ('A',1),('B',2),('C',3),('D',4),('E',5),('F',6);
Query ID = root_20210402180323_d81e216b-96a0-4cb1-a695-6c05f5951de0
Total jobs = 3
Launching Job 1 out of 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-12 .......         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 01/01    [==========================>>] 100%  ELAPSED TIME: 1.00 s     
--------------------------------------------------------------------------------------
Spark job[11] finished successfully in 1.00 second(s)
Spark Job[11] Metrics: TaskDurationTime: 171, ExecutorCpuTime: 55, JvmGCTime: 0, BytesRead / RecordsRead: 4563 / 6, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 0 / 0, ShuffleBytesWritten / ShuffleRecordsWritten: 0 / 0
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://nameservice1/user/hive/warehouse/test.db/test_20210402/.hive-staging_hive_2021-04-02_18-03-23_791_2194857612929860244-1/-ext-10000
Loading data to table test.test_20210402
OK
Time taken: 1.738 seconds
hive> 
    > 
    > 
    > 
    > select t1.gname,t2.gname,concat(t1.gname,'+',t2.gname,'=',t1.amt + t2.amt)
    >   from test_20210402 t1
    >  join test_20210402 t2
    >  where t1.gname < t2.gname
    > order by t1.gname,t2.gname;
Warning: Map Join MAPJOIN[11][bigTable=t1] in task 'Stage-1:MAPRED' is a cross product
Query ID = root_20210402180738_7d5a15f1-f796-46e3-bc45-ebb44e65419d
Total jobs = 2
Launching Job 1 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-16 .......         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 01/01    [==========================>>] 100%  ELAPSED TIME: 1.00 s     
--------------------------------------------------------------------------------------
Spark job[14] finished successfully in 1.00 second(s)
Spark Job[14] Metrics: TaskDurationTime: 126, ExecutorCpuTime: 50, JvmGCTime: 0, BytesRead / RecordsRead: 5515 / 6, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 0 / 0, ShuffleBytesWritten / ShuffleRecordsWritten: 0 / 0
Launching Job 2 out of 2
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  
--------------------------------------------------------------------------------------
Stage-17 .......         0      FINISHED      1          1        0        0       0  
Stage-18 .......         0      FINISHED      1          1        0        0       0  
--------------------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 1.00 s     
--------------------------------------------------------------------------------------
Spark job[15] finished successfully in 1.00 second(s)
Spark Job[15] Metrics: TaskDurationTime: 290, ExecutorCpuTime: 152, JvmGCTime: 20, BytesRead / RecordsRead: 8088 / 6, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 212 / 15, ShuffleBytesWritten / ShuffleRecordsWritten: 212 / 15
OK
A       B       A+B=3
A       C       A+C=4
A       D       A+D=5
A       E       A+E=6
A       F       A+F=7
B       C       B+C=5
B       D       B+D=6
B       E       B+E=7
B       F       B+F=8
C       D       C+D=7
C       E       C+E=8
C       F       C+F=9
D       E       D+E=9
D       F       D+F=10
E       F       E+F=11
Time taken: 2.237 seconds, Fetched: 15 row(s)
hive>
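
The 15 rows fetched here again match C(6,2) = 15. The same pattern extends to any group size k: add one more self-join tk and one more inequality t(k-1).gname < tk.gname.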