Perl: how to extract information from text files in multiple folders, merge it into a single xls, and automatically output one column per folder

As the title says, I wrote a simple script to pull data from past analysis results:

#!/usr/bin/perl
#
use warnings;
use strict;
use Statistics::Descriptive;

my $tdir = "/share/tumor171";
my $f20 = "/share/q20.xls";
my $f30 = "/share/q30.xls";
my $pad = "/share/Product_AverageDepth.xls";
my $cpeff = "/share/CaptureEffiency.xls";
my @dirs = glob("$tdir/tvoa*");

open Q20, '>',$f20 or die $!;
open Q30, '>',$f30 or die $!;
open AD, '>',$pad or die $!;
open CE, '>',$cpeff or die $!;

foreach my $dirs (@dirs){
    my $qc = "$dirs/result/sampleqc.xls";
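    # Take the last path component (the tvoa* directory name) instead of a
    # hard-coded index, which breaks whenever the depth of $tdir changes
    my $batchdir = (split /\//, $dirs)[-1];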
    print "$batchdir\n";
    my $batchnum = (split /_/, $batchdir)[1];
    print "$batchnum\n";
    open QC,'<',$qc or die $!;
    print Q20 "$batchnum \n";
    print Q30 "$batchnum \n";
    print AD "$batchnum \n";
    print CE "$batchnum \n";
    while (my $line = <QC>){
        chomp $line;    # drop the newline so the last column stays clean
        # Split each row once and pick the four columns (0-based indices)
        my ($AverageDepth, $q20, $q30, $captureeffiency) = (split /\t/, $line)[7, 13, 14, 20];
        print Q20 "$q20\n";
        print Q30 "$q30\n";
        print AD "$AverageDepth\n";
        print CE "$captureeffiency\n";
    }
    close QC; 
    
}
close Q20;close Q30;close AD;close CE;

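Right now every output file contains a single column, as the screenshot shows:
[image]
Is there a way to split the output into columns automatically by sample name, so that the results from the first sample go into the first column of the table, the results from the second sample go into the second column, and so on?
It would be nice if Perl had a column separator.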
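
In a tab-delimited .xls the tab character already acts as the column separator, so one possible approach is to buffer the values per batch and write the table row by row at the end. The sketch below is only an illustration under that assumption: `columns_to_rows` is a hypothetical helper, and the batch names and numbers are made up. In the script above, each `print Q20 "$q20\n"` would become something like `push @{ $q20{$batchnum} }, $q20;`.

```perl
use strict;
use warnings;

# Hypothetical helper: turn { batch => [values...] } into tab-separated rows,
# one column per batch, padding shorter columns with empty cells.
sub columns_to_rows {
    my ($columns, @order) = @_;

    # The longest column decides how many data rows are needed
    my $nrows = 0;
    for my $name (@order) {
        my $n = scalar @{ $columns->{$name} };
        $nrows = $n if $n > $nrows;
    }

    my @rows = (join "\t", @order);    # header row: batch names
    for my $i (0 .. $nrows - 1) {
        push @rows, join "\t", map { $columns->{$_}[$i] // '' } @order;
    }
    return @rows;
}

# Made-up values standing in for the q20 column of two batches
my %q20 = (
    batch01 => [ 97.1, 96.8, 97.5 ],
    batch02 => [ 95.9, 96.2 ],
);
print "$_\n" for columns_to_rows(\%q20, sort keys %q20);
```

Holding everything in memory is fine for QC tables of this size; the `//` operator needs Perl 5.10 or later.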

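General usage: @somearray = split(/:+/, $string); # the parentheses are optional. If $string is omitted, split operates on the default variable $_. The separator goes between the two slashes and can be any regular expression, which makes split extremely powerful.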
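
A minimal illustration of the two forms just described; the sample strings are invented for the demo.

```perl
use strict;
use warnings;

my $string = "a:b::c:::d";
my @somearray = split /:+/, $string;    # a run of colons counts as one separator
print join(",", @somearray), "\n";      # a,b,c,d

my @words;
for ("x y  z") {                        # inside the loop $_ holds the string
    @words = split / +/;                # no EXPR given, so split works on $_
}
print scalar(@words), "\n";             # 3
```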

The Perl documentation also describes a less commonly seen form: split /PATTERN/, EXPR, LIMIT. The key is the LIMIT parameter, which can save a lot of work. If LIMIT is given and positive, the string is split into at most LIMIT fields. If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of pop would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified. Note that splitting an EXPR that evaluates to the empty string always returns the empty list, regardless of the LIMIT specified.
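
The LIMIT rules are easiest to see on a line with trailing empty fields. This is a small sketch with an invented input string:

```perl
use strict;
use warnings;

my $line = "a\tb\tc\t\t";                # three values, then two empty fields

my @default = split /\t/, $line;         # no LIMIT: trailing empty fields are stripped
my @limited = split /\t/, $line, 2;      # positive LIMIT: at most 2 fields
my @all     = split /\t/, $line, -1;     # negative LIMIT: trailing empty fields are kept
my @none    = split /\t/, "";            # empty string: always the empty list

print scalar(@default), "\n";            # 3
print scalar(@limited), "\n";            # 2 (the second field is "b\tc\t\t")
print scalar(@all), "\n";                # 5
print scalar(@none), "\n";               # 0
```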

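By specifying LIMIT, a split on a very long line (one that would yield tens of thousands of fields) can return just the leading columns you care about, which reduces memory use and run time. For example, in typical genotype data the first column is usually the sample name, and rows are kept or discarded according to that name. Because the last of the LIMIT fields holds the unsplit remainder of the line, LIMIT must be one more than the number of fields you want: my ($firstfield) = split /\t/, $someline, 2; (with LIMIT 1 nothing is split at all and $firstfield would be the whole line). If you need several of the leading columns, this form is efficient on large files: my (undef, $var1, undef, undef, undef, $var2) = split /\t/, $someline, 7;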
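
As a quick check of how LIMIT behaves when only the leading columns are wanted, here is a sketch on a generated line; the sample name and the g1..g10000 marker values are invented. The final field returned by split always carries the unsplit remainder, so LIMIT is set to one more than the number of fields to isolate.

```perl
use strict;
use warnings;

# Hypothetical genotype row: a sample name followed by 10000 marker columns
my $someline = join "\t", "sample01", map("g$_", 1 .. 10000);

# LIMIT 2: the first field is split off, everything else stays in one piece
my ($firstfield) = split /\t/, $someline, 2;

# LIMIT 7: fields 1 to 6 are split out, the 7th holds the unsplit remainder
my (undef, $var1, undef, undef, undef, $var2) = split /\t/, $someline, 7;

print "$firstfield $var1 $var2\n";       # sample01 g1 g5
```

When split is assigned to a list of variables and LIMIT is omitted, Perl applies a limit of one more than the number of variables on its own, so the explicit value mainly documents the intent.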