How can I make Golang to Postgres queries faster? Are there any alternatives?

I am using Golang and Postgres to filter some financial data. I have a Postgres database with a single table containing the data for a single stock market (if that's the correct term). The table has columns for id, symbol, date, open, high, low, close and volume. It has 6,610,598 rows in total, covering 2,174 distinct stocks (symbols).

I want to filter the data in that table and save the result to another table, so the first table holds the raw data and the second holds the cleaned data.

There are three parameters: a date (EVALDATE) and two integers (MINCTD and MINDP). First, we select only the stocks that pass our minimum calendar trading days parameter (MINCTD). This is done in Go as follows:

// symbols holds the result of: SELECT DISTINCT symbol FROM table_name;
var filteredSymbols []string
for _, symbol := range symbols {
    var count int
    // One query, and one round trip to the server, per symbol.
    row := db.QueryRow(
        "SELECT count(*) FROM table_name WHERE symbol = $1 AND date >= $2;",
        symbol, EVALDATE)
    if err := row.Scan(&count); err != nil {
        return err
    }
    if count >= MINCTD {
        filteredSymbols = append(filteredSymbols, symbol)
    }
}

In short, this keeps only the symbols that have at least MINCTD rows from EVALDATE up to the current date (the latest date in the data). This step took 30 minutes.

If a symbol passes this first filter, it goes through a second filter that checks whether, within the same period (EVALDATE to LATEST_DATE), it has enough rows with complete data (no OHLC column without a value). The following query is run for each symbol that passed the first filter:

Select count(*) from table_name where symbol='symbol' and date>= 'EVALDATE' and open != 0 and high != 0 and low != 0 and close != 0;

This query took 36 minutes.

After getting the slice of symbols that passed both filters, I fetch their data again with another Postgres query and then bulk insert it into the other table.

Taking 1 hour and 6 minutes is not acceptable. What should I do? Should I fetch all the data and then filter it in memory in Go?

A couple of things I noticed in the question:

Try to avoid scanning 6 million+ rows just to arrive at 2,174 values (i.e. avoid Select distinct symbol from table_name;). Do you have (or can you build) a "master table" of symbols with the symbol as its primary key?

Combine your two queries into one, so that each symbol's rows are read only once, for example:

select
       count(*) c1
     , count(case when open != 0 and high != 0 and low != 0 and close != 0 then 1 end) as c2
from table_name 
where symbol='symbol' 
and date>= 'EVALDATE' 
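The combined query needs only one pass because the CASE expression yields NULL for rows that fail the test, and count(expr) ignores NULLs, so c1 and c2 are computed side by side. The sketch below mirrors that single-pass logic in plain Go; the Bar row type and the counts function are hypothetical illustrations, not part of the question's code.

```go
package main

import "fmt"

// Bar is a hypothetical in-memory row holding the question's columns.
type Bar struct {
	Symbol                 string
	Date                   string // ISO "YYYY-MM-DD", so string order is date order
	Open, High, Low, Close float64
}

// counts returns, for one symbol, c1 (rows on or after evalDate) and
// c2 (those rows whose OHLC values are all non-zero) in a single pass,
// like count(*) and count(case when ... then 1 end) in the SQL above.
func counts(rows []Bar, symbol, evalDate string) (c1, c2 int) {
	for _, r := range rows {
		if r.Symbol != symbol || r.Date < evalDate {
			continue // outside the WHERE clause
		}
		c1++
		if r.Open != 0 && r.High != 0 && r.Low != 0 && r.Close != 0 {
			c2++
		}
	}
	return c1, c2
}

func main() {
	rows := []Bar{
		{"AAA", "2018-01-02", 1, 2, 1, 2},
		{"AAA", "2018-01-03", 0, 2, 1, 2}, // incomplete: open is 0
		{"AAA", "2017-12-29", 1, 2, 1, 2}, // before evalDate
		{"BBB", "2018-01-02", 1, 2, 1, 2}, // other symbol
	}
	c1, c2 := counts(rows, "AAA", "2018-01-01")
	fmt.Println(c1, c2) // 2 1
}
```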

An index on (symbol, date) would also help, because every one of these queries filters on a symbol and a date range.
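Such an index can be created with, for example, CREATE INDEX ON table_name (symbol, date);. It helps because a B-tree keeps its entries sorted by (symbol, date), so the database can jump to the first entry for a symbol on or after EVALDATE and read only that range instead of every row. The Go sketch below is only a rough in-memory analogy of that range lookup, assuming a hypothetical slice of keys kept in the same sort order; it uses two binary searches via sort.Search rather than a full scan.

```go
package main

import (
	"fmt"
	"sort"
)

// key is a hypothetical index entry: the (symbol, date) pair.
type key struct {
	Symbol string
	Date   string // ISO "YYYY-MM-DD"
}

// less orders keys by symbol, then by date, like a composite index.
func less(a, b key) bool {
	if a.Symbol != b.Symbol {
		return a.Symbol < b.Symbol
	}
	return a.Date < b.Date
}

// countSince counts entries for symbol with date >= evalDate using two
// binary searches (O(log n)) instead of scanning every key.
// idx must already be sorted according to less.
func countSince(idx []key, symbol, evalDate string) int {
	lo := key{symbol, evalDate}
	// First entry at or after (symbol, evalDate).
	start := sort.Search(len(idx), func(i int) bool { return !less(idx[i], lo) })
	// First entry belonging to a later symbol; always >= start.
	end := sort.Search(len(idx), func(i int) bool { return idx[i].Symbol > symbol })
	return end - start
}

func main() {
	idx := []key{
		{"AAA", "2017-12-29"},
		{"AAA", "2018-01-02"},
		{"AAA", "2018-01-03"},
		{"BBB", "2018-01-02"},
	}
	fmt.Println(countSince(idx, "AAA", "2018-01-01")) // 2
}
```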

Using Go to issue a single set-based SQL statement, I cleaned 7,914,698 rows for 3,142 symbols in 28.7 seconds, compared with 3,960 seconds (1 hour and 6 minutes) for 6,610,598 rows and 2,174 symbols in the question.

Output:

$ go run clean.go
clean: 7914698 rows 28.679295705s

$ psql
psql (9.6.6)

peter=# \d clean
         Table "public.clean"
 Column |       Type       | Modifiers 
--------+------------------+-----------
 id     | integer          | 
 symbol | text             | not null
 date   | date             | not null
 close  | double precision | 
 volume | integer          | 
 open   | double precision | 
 high   | double precision | 
 low    | double precision | 
Indexes:
    "clean_pkey" PRIMARY KEY, btree (symbol, date)

peter=# SELECT COUNT(*) FROM clean;
  count  
---------
 7914698

peter=# SELECT COUNT(DISTINCT symbol) FROM clean;
 count 
-------
  3142

peter=# \q
$

clean.go:

package main

import (
    "database/sql"
    "fmt"
    "strconv"
    "time"

    _ "github.com/lib/pq"
)

func clean(db *sql.DB, EVALDATE time.Time, MINCTD, MINDP int) (int64, time.Duration, error) {
    start := time.Now()

    tx, err := db.Begin()
    if err != nil {
        return 0, 0, err
    }
    committed := false
    defer func() {
        if !committed {
            tx.Rollback()
        }
    }()

    {
        const query = `DROP TABLE IF EXISTS clean;`
        if _, err := tx.Exec(query); err != nil {
            return 0, 0, err
        }
    }

    var nRows int64
    {
        const query = `
            CREATE TABLE clean AS
                SELECT id, symbol, date, close, volume, open, high, low 
                FROM unclean 
                WHERE symbol IN (
                    SELECT symbol
                    FROM unclean
                    WHERE date >= $1
                    GROUP BY symbol
                    HAVING 
                        COUNT(*) >= $2
                        AND 
                        COUNT(CASE WHEN NOT (open >0 AND high >0 AND low >0 AND close >0) THEN 1 END) <= $3
                )
                ORDER BY symbol, date
            ;
        `
        // $1 is a bound parameter, so the date string must not carry SQL quotes.
        EVALDATE := EVALDATE.Format("2006-01-02")
        MINCTD := strconv.Itoa(MINCTD)
        MINDP := strconv.Itoa(MINDP)
        res, err := tx.Exec(query, EVALDATE, MINCTD, MINDP)
        if err != nil {
            return 0, 0, err
        }
        nRows, err = res.RowsAffected()
        if err != nil {
            return 0, 0, err
        }
    }

    {
        const query = `ALTER TABLE clean ADD PRIMARY KEY (symbol, date);`
        _, err := tx.Exec(query)
        if err != nil {
            return 0, 0, err
        }
    }

    if err = tx.Commit(); err != nil {
        return 0, 0, err
    }
    committed = true

    since := time.Since(start)

    {
        const query = `ANALYZE clean;`
        if _, err := db.Exec(query); err != nil {
            return nRows, since, err
        }
    }

    return nRows, since, nil
}

func main() {
    db, err := sql.Open("postgres", "user=peter password=peter dbname=peter")
    if err != nil {
        fmt.Println(err)
        return
    }
    defer db.Close()
    var ( // one year
        EVALDATE = time.Now().AddDate(-1, 0, 0)
        MINCTD   = 240
        MINDP    = 5
    )
    nRows, since, err := clean(db, EVALDATE, MINCTD, MINDP)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println("clean:", nRows, "rows", since)
    return
}

Playground: https://play.golang.org/p/qVOQQ6mcU-1
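Note that in clean() MINDP acts as the maximum number of incomplete rows a symbol may have since EVALDATE, and a row counts as incomplete unless all four prices are positive. To make the GROUP BY ... HAVING subquery's semantics concrete, here is a plain-Go sketch of the same selection over a hypothetical Bar slice; the type and the qualifying function are illustrations only (for example, for unit-testing the rule without a database), not a replacement for doing the work in Postgres.

```go
package main

import (
	"fmt"
	"sort"
)

// Bar is a hypothetical in-memory row with the columns the subquery reads.
type Bar struct {
	Symbol                 string
	Date                   string // ISO "YYYY-MM-DD"
	Open, High, Low, Close float64
}

// qualifying mirrors the subquery in clean():
// WHERE date >= evalDate GROUP BY symbol
// HAVING COUNT(*) >= minCTD AND (incomplete rows) <= minDP
func qualifying(rows []Bar, evalDate string, minCTD, minDP int) []string {
	type tally struct{ total, incomplete int }
	bySymbol := map[string]*tally{}
	for _, r := range rows {
		if r.Date < evalDate {
			continue // WHERE date >= evalDate
		}
		t := bySymbol[r.Symbol]
		if t == nil {
			t = &tally{}
			bySymbol[r.Symbol] = t
		}
		t.total++
		if !(r.Open > 0 && r.High > 0 && r.Low > 0 && r.Close > 0) {
			t.incomplete++
		}
	}
	var out []string
	for sym, t := range bySymbol {
		if t.total >= minCTD && t.incomplete <= minDP {
			out = append(out, sym)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	rows := []Bar{
		{"AAA", "2018-01-02", 1, 2, 1, 2},
		{"AAA", "2018-01-03", 1, 2, 1, 2},
		{"BBB", "2018-01-02", 1, 2, 1, 2},
		{"BBB", "2018-01-03", 0, 2, 1, 2}, // incomplete
		{"CCC", "2018-01-02", 1, 2, 1, 2}, // too few days
	}
	fmt.Println(qualifying(rows, "2018-01-01", 2, 0)) // [AAA]
}
```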

