Why isn't the buffer size always a multiple of 4096 when reading a file line by line?

The sample code is:

// test.go
package main

import (
    "bufio"
    "os"
)

func main() {
    if len(os.Args) != 2 {
        println("Usage:", os.Args[0], "")
        os.Exit(1)
    }
    fileName := os.Args[1]
    fp, err := os.Open(fileName)
    if err != nil {
        println(err.Error())
        os.Exit(2)
    }
    defer fp.Close()
    r := bufio.NewScanner(fp)
    var lines []string
    for r.Scan() {
        lines = append(lines, r.Text())
    }
}

c:\>go build test.go

c:\>test.exe test.txt

I then monitored the process with Process Monitor while it was running. Part of the output is:

test.exe  ReadFile  SUCCESS      Offset: 4,692,375, Length: 8,056
test.exe  ReadFile  SUCCESS      Offset: 4,700,431, Length: 7,198
test.exe  ReadFile  SUCCESS      Offset: 4,707,629, Length: 8,134
test.exe  ReadFile  SUCCESS      Offset: 4,715,763, Length: 7,361
test.exe  ReadFile  SUCCESS      Offset: 4,723,124, Length: 8,056
test.exe  ReadFile  SUCCESS      Offset: 4,731,180, Length: 4,322
test.exe  ReadFile  END OF FILE  Offset: 4,735,502, Length: 8,192

The equivalent Java code is:

//Test.java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class Test {
    public static void main(String[] args) {
        try {
            FileInputStream in = new FileInputStream("test.txt");
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;
            while ((strLine = br.readLine()) != null) {
                // Discard the line; only the read pattern matters here.
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

c:\>javac Test.java

c:\>java Test

Then part of the monitoring output is:

java.exe  ReadFile  SUCCESS       Offset: 4,694,016, Length: 8,192
java.exe  ReadFile  SUCCESS       Offset: 4,702,208, Length: 8,192
java.exe  ReadFile  SUCCESS       Offset: 4,710,400, Length: 8,192
java.exe  ReadFile  SUCCESS       Offset: 4,718,592, Length: 8,192
java.exe  ReadFile  SUCCESS       Offset: 4,726,784, Length: 8,192
java.exe  ReadFile  SUCCESS       Offset: 4,734,976, Length: 526
java.exe  ReadFile  END OF FILE   Offset: 4,735,502, Length: 8,192

As you can see, the buffer size in Java is 8192 and it reads 8192 bytes each time. Why does the Length in Go change with every read?

I have also tried bufio.ReadString('\n') and bufio.ReadBytes('\n'), and both of them have the same problem.

[Update] I have tested the sample in C:

//test.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        FILE * fp;
        char * line = NULL;
        size_t len = 0;
        ssize_t read;
        fp = fopen("test.txt", "r");
        if (fp == NULL)
                exit(EXIT_FAILURE);
        while ((read = getline(&line, &len, fp)) != -1) {
                /* read is an ssize_t, so %zd is the matching conversion */
                printf("Retrieved line of length %zd :\n", read);
        }
        if (line)
                free(line);
        fclose(fp);
        return EXIT_SUCCESS;
}

The output is similar to that of the Java code (the buffer size is 65536 on my system). So why is Go so different here?

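Reading the source of bufio.Scanner's Scan method shows that, while the buffer starts at 4096 bytes (and grows when a line does not fit), each read asks only for as much as the "empty" space left in the buffer. Any partial line left over from the previous read stays in the buffer, so the request shrinks by that many bytes. Specifically, this part: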

n, err := s.r.Read(s.buf[s.end:len(s.buf)])

Performance-wise, I'm almost positive that whatever file system you're using is smart enough to read ahead and cache the data, so the buffer size shouldn't make much of a difference.

This may be the reason:

In all of the examples you cite, the Scan function output is determined by line-endings.

Go's default scan function splits by line (http://golang.org/pkg/bufio/#Scanner.Scan):

the default split function breaks the input into lines with line termination stripped

bufio.ReadString('\n') and bufio.ReadBytes('\n') have the same problem because they also stop at the \n character.

Try removing all newlines from your test file and check whether the ReadFile records still show lengths that are not multiples of 4096.

As some have suggested, what you're seeing may actually be due to the IO strategy used by the bufio package.