How can I use one script to analyze 600 data files in batches, 20 at a time?

This is part of my data. I already have an analysis script, but it processes the samples one at a time; working through all 600 would take about half a year, so I need a loop that can run 20 of them at once. Note that the sample IDs are not consecutive. Please help me with a loop~
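At the moment I run the script one sample at a time, roughly like this (assuming the script below is saved as pipeline.sh and my non-consecutive IDs sit in sample_list.txt, one per line; both names are just placeholders):

bash pipeline.sh < sample_list.txt    # works through the list strictly one sample at a time

What I'm after is something that launches 20 of these at once and waits before starting the next 20.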

#!/usr/bin/bash

doc=/data/re-sequencedata/cleandata                      # directory holding the *_clean.fq.gz files
Ref=/data/re-sequencedata/mappingZ/Reference/hic.fa      # reference FASTA
FM_index=/data/re-sequencedata/mappingZ/index/hic        # bwa index prefix
gatkdoc=/home/yuning/software/gatk-4.1.9.0/gatkdoc       # (not used below)
t=20                                                     # bwa threads
p_Xmx=50g                                                # Java heap for picard
tmpdir="tmpdir"                                          # temp directory for picard/gatk
bam_dir="."                                              # (not used below)
G_Xmx=50g                                                # Java heap for gatk

# Sample IDs are read from standard input, one per line,
# e.g.  bash pipeline.sh < sample_list.txt
while read -r id
do
     di=${id}
     echo "===========processing ${di} file============="
     # Decompress both mates and collect the distinct read-group prefixes
     # (instrument:run:flowcell:lane) from the read names.
     gunzip -c ${doc}/${di}_1_clean.fq.gz >${di}_1_clean.fq
     gunzip -c ${doc}/${di}_2_clean.fq.gz >${di}_2_clean.fq
     grep "^@A00" ${di}_1_clean.fq | cut -d ':' -f1-4 | sort | uniq >${di}.RG.table &&

     # Split each mate into one FASTQ per read group (-A 3 keeps the full 4-line record;
     # --no-group-separator stops GNU grep from inserting "--" lines between records).
     while read -r rg
     do
       affix=$(echo "$rg" | tr ":" "-")
       grep --no-group-separator -A 3 "${rg}" ${di}_1_clean.fq >${di}-${affix}_1_clean.fq
       grep --no-group-separator -A 3 "${rg}" ${di}_2_clean.fq >${di}-${affix}_2_clean.fq
     done <${di}.RG.table
     rm ${di}_1_clean.fq ${di}_2_clean.fq &&

     # Map each read-group FASTQ pair with bwa mem, attaching the matching @RG header line.
     while read -r rg
     do
       affix=$(echo "$rg" | tr ":" "-")
       bwa mem -t $t -M -R "@RG\tID:${affix}\tSM:${di}\tPL:illumina" $FM_index ${di}-${affix}_1_clean.fq ${di}-${affix}_2_clean.fq >${di}-${affix}.sam && rm ${di}-${affix}_*.fq
     done <${di}.RG.table

     # Turn every per-RG SAM into an "I=<file>" argument and merge them into one
     # coordinate-sorted BAM (xargs -L 1000000 packs all arguments into a single java call).
     ls -v ${di}-*.sam | xargs -I [] echo "I="[] | xargs -L 1000000 \
     java -Xmx${p_Xmx} -jar /home/yuning/software/picard/build/libs/picard.jar MergeSamFiles \
        O=${di}.merged.sorted.bam \
        SORT_ORDER=coordinate \
        CREATE_INDEX=true \
        VALIDATION_STRINGENCY=LENIENT \
        REFERENCE_SEQUENCE=${Ref} \
        TMP_DIR=${tmpdir} 2>${di}.merge.log \
        && rm ${di}-*.sam &&

     # Mark PCR/optical duplicates in the merged BAM.
     java -Xmx${p_Xmx} -jar /home/yuning/software/picard/build/libs/picard.jar MarkDuplicates \
        I=${di}.merged.sorted.bam \
        O=${di}.merged.sorted.dedup.bam \
        M=${di}.Marked_dup_metrics.txt \
        CREATE_INDEX=true \
        VALIDATION_STRINGENCY=LENIENT \
        REFERENCE_SEQUENCE=${Ref} \
        TMP_DIR=${tmpdir} 2>${di}.markdup.log \
        && rm ${di}.merged.sorted.bam* &&

      # Per-sample temporary directory for HaplotypeCaller (-d tests for a directory, not a file).
      if [ ! -d "${tmpdir}/${di}_h" ]; then
          mkdir -p ${tmpdir}/${di}_h
      else
          rm -r ${tmpdir}/${di}_h/*
      fi

     # Per-sample variant calling, emitting a GVCF; the temp dir is removed on success.
     gatk --java-options "-Djava.io.tmpdir=${tmpdir}/${di}_h -Xmx${G_Xmx}" \
        HaplotypeCaller \
        -ERC GVCF \
        -R ${Ref} \
        -I ${di}.merged.sorted.dedup.bam \
        -O ${di}.gvcf.gz \
        1>${di}.gvcf.log 2>&1 \
        && rm -r ${tmpdir}/${di}_h
done

The above is the script I use for the analysis, taken from https://blog.csdn.net/Gossie/article/details/109296315. It works, but it analyzes the samples one at a time and takes far too long. I'm not very comfortable with loops myself, so I'd really appreciate one~~~ Also, how do I set up temporary folders for the bwa, picard, and gatk steps? The script sometimes stops with an out-of-memory error.
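On the temporary-directory / out-of-memory part of the question: when picard or gatk die with an out-of-memory error it is usually the Java heap (-Xmx) that is too small, while the temp directory matters when a small /tmp partition fills up ("no space left on device"). A rough sketch of how the temp directories can be set up; the path below is only an example, and picard's TMP_DIR= plus gatk's --tmp-dir / -Djava.io.tmpdir are the relevant options:

# Example only: put the temp directory on a disk with plenty of space and create it up front.
tmpdir=/data/re-sequencedata/tmp      # example path; pick any large disk
mkdir -p "${tmpdir}"

# picard tools: pass TMP_DIR=${tmpdir} (the script above already does this).
# gatk 4:       either --tmp-dir ${tmpdir} or --java-options "-Djava.io.tmpdir=${tmpdir}".
# bwa mem:      streams straight to stdout and has no temp-dir option; if it runs out of
#               memory, reduce -t or the number of samples running at the same time.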

# Batch driver: collect one ID per sample (strip the _1_clean.fq.gz suffix so each
# sample appears only once), then launch xxx.sh for 20 samples at a time and wait
# for each batch to finish before starting the next.
files=($(ls *_1_clean.fq.gz | sed 's/_1_clean\.fq\.gz$//'))
filel=${#files[@]}
idx=0
while [[ $idx -lt $filel ]]
do
    for ((j = 0; j < 20 && idx < filel; j++))
    do
        ./xxx.sh "${files[$idx]}" &
        idx=$((idx + 1))
    done
    wait    # block until every job in the current batch has exited
done

Just turn xxx.sh into the corresponding single-sample processing script.
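To connect the two pieces, a minimal sketch of what that xxx.sh could look like, assuming the per-sample pipeline from the question is saved as pipeline.sh (a placeholder name) and, as written, reads sample IDs from standard input:

#!/usr/bin/env bash
# Sketch only: xxx.sh receives one sample ID as $1 and feeds it to the
# per-sample pipeline above (assumed saved as pipeline.sh).
set -euo pipefail
echo "$1" | bash pipeline.sh

If you prefer, the whole driver loop can also be replaced by xargs, which keeps 20 jobs running at all times instead of waiting for the slowest sample in each batch:

ls *_1_clean.fq.gz | sed 's/_1_clean\.fq\.gz$//' | xargs -n 1 -P 20 ./xxx.sh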