Golang: slow Scan() for multiple rows - PostgreSQL

I am running a query in Golang where I select multiple rows from my PostgreSQL database.
I am using the following imports for my query:
"database/sql"
"github.com/lib/pq"
I have narrowed it down to the loop that scans the results into my struct.
// Returns about 400 rows
rows, err = db.Query("SELECT * FROM infrastructure")
if err != nil {
return nil, err
}
var arrOfInfra []model.Infrastructure
for rows.Next() {
obj, ptrs := model.InfrastructureInit()
rows.Scan(ptrs...)
arrOfInfra = append(arrOfInfra, *obj)
}
rows.Close()
The above code takes about 8 seconds to run; the query itself is fast, but the loop over rows.Next() takes nearly the entire 8 seconds to complete.
Any ideas? Am I doing something wrong, or is there a better way?
My configuration for my database
// host, port, dbname, user, password masked for obvious reasons
db, err := sql.Open("postgres", "host=... port=... dbname=... user=... password=... sslmode=require")
if err != nil {
panic(err)
}
// I have tried using the default, or setting it to a high number (100), but it doesn't seem to help with my situation
db.SetMaxIdleConns(1)
db.SetMaxOpenConns(1)
UPDATE 1:
I placed print statements in the for loop. Below is my updated snippet
for rows.Next() {
obj, ptrs := model.InfrastructureInit()
rows.Scan(ptrs...)
arrOfInfra = append(arrOfInfra, *obj)
fmt.Println("Len: " + fmt.Sprint(len(arrOfInfra)))
fmt.Println(obj)
}
I noticed that in this loop, it will actually pause halfway through and continue after a short break. It looks like this:
Len: 221
Len: 222
Len: 223
Len: 224
<a short pause about 1 second, then prints Len: 225 and continues>
Len: 226
Len: 227
...
..
.
and this happens again later on at another row count, and again after a few hundred more records.
UPDATE 2:
Below is a snippet of my InfrastructureInit() method
func InfrastructureInit() (*Infrastructure, []interface{}) {
irf := new(Infrastructure)
var ptrs []interface{}
ptrs = append(ptrs,
&irf.Base.ID,
&irf.Base.CreatedAt,
&irf.Base.UpdatedAt,
&irf.ListingID,
&irf.AddressID,
&irf.Type,
&irf.Name,
&irf.Description,
&irf.Details,
&irf.TravellingFor,
)
return irf, ptrs
}
I am not exactly sure what is causing this slowness, but as a quick patch I currently use a Redis database on my server to pre-cache my infrastructures, saving them as strings. It seems to be okay for now, but I now have to maintain both Redis and my Postgres.
I am still puzzled over this weird behavior, but I'm not exactly sure how rows.Next() works - does it make a query to the database every time I call rows.Next()?

What do you think about just doing it like this?
defer rows.Close()
var arrOfInfra []*Infrastructure
for rows.Next() {
irf := &Infrastructure{}
err = rows.Scan(
&irf.Base.ID,
&irf.Base.CreatedAt,
&irf.Base.UpdatedAt,
&irf.ListingID,
&irf.AddressID,
&irf.Type,
&irf.Name,
&irf.Description,
&irf.Details,
&irf.TravellingFor,
)
if err == nil {
arrOfInfra = append(arrOfInfra, irf)
}
}
Hope this helps.

I went down a weird path myself while consolidating my understanding of how rows.Next() works and what might be impacting performance, so I thought I'd share this here for posterity (despite the question being asked a long time ago).
Related to:
I am still puzzled over this weird behavior, but I'm not exactly sure how
rows.Next() works - does it make a query to the database every time I
call rows.Next()?
It doesn't make a 'query', but it does read (transfer) data from the database through the driver on each iteration, which means it can be impacted by, for example, bad network performance. This is especially true if your database is not local to the machine where you are running your Go code.
One approach to confirm whether network performance is the issue would be to run your Go app on the same machine as your database (if possible).
Assuming the columns being scanned are not extremely large and do not have custom conversions, reading ~400 rows should take on the order of 100 ms at most (in a local setup).
For example - I had a case where I needed to read about 100k rows of about 300 B per row, and that took ~4s (local setup).
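If it helps, a minimal sketch of how to time the two phases separately (the connection string and table are taken from the question; Scan is omitted because Next() alone already reads each row's data from the connection):

package main

import (
    "database/sql"
    "fmt"
    "log"
    "time"

    _ "github.com/lib/pq"
)

func main() {
    // Connection details are placeholders - use your own DSN.
    db, err := sql.Open("postgres", "host=... port=... dbname=... user=... password=... sslmode=require")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    queryStart := time.Now()
    rows, err := db.Query("SELECT * FROM infrastructure")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
    fmt.Println("Query returned after:", time.Since(queryStart)) // usually fast - only the first batch is fetched

    iterStart := time.Now()
    count := 0
    for rows.Next() {
        // Each Next() reads the next row's data from the connection,
        // so network latency accumulates here rather than in db.Query.
        count++
    }
    if err := rows.Err(); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Iterated over %d rows in %v\n", count, time.Since(iterStart))
}

If the iteration time collapses when this runs on the same machine as the database, the per-row data transfer over the network is the bottleneck rather than the query or the scanning.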


pq: date/time field value out of range: "22/02/2022"

I have this query:
SELECT w.machine_name, l.trasaction_time,
l.loyaltycard_number, l.recipt_no,
l.totaltrans_amount, l.amount_paid,
l.reward_points, l.redeemed_points,
cashier FROM loyalty l
JOIN warehouses w
ON l.machine_ip = w.machine_ip
WHERE l.machine_name = $1
AND redeemed_points != $2
AND trasaction_time BETWEEN $3 AND $4
ORDER BY trasaction_time DESC;
I have HTML datepickers for the transaction_time, which is in the format dd/mm/yyyy.
Anytime I select a date range where the first number is greater than 12 (e.g. 22/02/2022), I get the above error.
I suspected the formatting was the problem, so I found in the docs how to set the PostgreSQL date style to DMY. After doing that, I still get the same error.
However, when I run the same query in the Postgres CLI like so:
SELECT w.machine_name, l.trasaction_time,
l.loyaltycard_number, l.recipt_no,
l.totaltrans_amount, l.amount_paid,
l.reward_points, l.redeemed_points,
cashier FROM loyalty l
JOIN warehouses w
ON l.machine_ip = w.machine_ip
WHERE l.machine_name = 'HERMSERVER'
AND redeemed_points != 0
AND trasaction_time BETWEEN '14/11/21' AND '22/02/22'
ORDER BY trasaction_time DESC;
I get the expected result, so I don't know what I am doing wrong.
I want to know how I can make the database treat the date from the datepicker as dd/mm/yyyy instead of mm/dd/yyyy. I am using Google Cloud SQL Postgres.
This is the code for the handler that gets the data from the datepicker:
err := r.ParseForm()
if err != nil {
app.clientError(w, http.StatusBadRequest)
}
startDate := r.PostForm.Get("startDate")
endDate := r.PostForm.Get("endDate")
outlet := r.PostForm.Get("outlet")
reportType := r.PostForm.Get("repoType")
if reportType == "0" {
rReport, err := app.models.Reports.GetRedeemedReport(startDate, endDate, outlet, reportType)
if err != nil {
app.serverError(w, err)
return
}
app.render(w, r, "tranxreport.page.tmpl", &templateData{
Reports: rReport,
})
} else if reportType == "1" {
rReport, err := app.models.Reports.GetAllReport(startDate, endDate, outlet)
if err != nil {
app.serverError(w, err)
return
}
app.render(w, r, "tranxreport.page.tmpl", &templateData{
Reports: rReport,
})
} else {
app.render(w, r, "tranxreport.page.tmpl", &templateData{})
}
As per the comments, while it should be possible to change DateStyle, there are a few issues with this:
The SET datestyle command changes the style for the current session. As the database/sql package uses connection pooling, this is of limited use.
You may be able to use "the DateStyle parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment variable on the server" but this may not be available where Postgres is offered as a managed service. Note that making this change also means your software will fail if the parameter is not set (and this is easily done when moving to a new server).
A relatively simple solution is to edit your query to use TO_DATE e.g.:
BETWEEN TO_DATE($3,'DD/MM/YYYY') AND TO_DATE($4,'DD/MM/YYYY')
However, while this will work, it makes your database code dependent upon the format of the data sent into your API. This means that the introduction of a new date picker, for example, could break your code in a way that is easily missed (testing at the start of the month works either way).
A better solution may be to use a standard format for the date in your API (e.g. ISO 8601) and/or pass the dates to your database functions as a time.Time. However this does require care due to time zones, daylight saving etc.
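For illustration, a minimal sketch of that last approach inside the handler above, assuming the datepicker values arrive as dd/mm/yyyy strings and that GetRedeemedReport is changed to accept time.Time values (a hypothetical signature):

// Parse the dd/mm/yyyy form values into time.Time before they reach the database layer.
const datePickerLayout = "02/01/2006" // Go reference date written as dd/mm/yyyy

start, err := time.Parse(datePickerLayout, startDate)
if err != nil {
    app.clientError(w, http.StatusBadRequest)
    return
}
end, err := time.Parse(datePickerLayout, endDate)
if err != nil {
    app.clientError(w, http.StatusBadRequest)
    return
}

// Hypothetical change: GetRedeemedReport now takes time.Time parameters, so the
// driver sends unambiguous timestamps and DateStyle no longer matters.
rReport, err := app.models.Reports.GetRedeemedReport(start, end, outlet, reportType)

Note that time.Parse interprets the value in UTC; if the transaction times are stored in a local time zone, time.ParseInLocation would be the more careful choice.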

mongodb-rust-driver perform poorly on find and get large amount of data compare to go-driver

I have a database consisting of 85.4k documents with an average size of 4 KB.
I wrote some simple code in Go to find and fetch over 70k documents from the database using mongodb-go-driver:
package main
import (
"context"
"log"
"time"
"go.mongodb.org/mongo-driver/mongo"
"go.mongodb.org/mongo-driver/mongo/options"
)
func main() {
localC, _ := mongo.Connect(context.TODO(), options.Client().ApplyURI("mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb"))
localDb := localC.Database("sampleDB")
collect := localDb.Collection("sampleCollect")
localCursor, _ := collect.Find(context.TODO(), bson.M{
"deleted": false,
})
log.Println("start")
start := time.Now()
var result []map[string] interface{} = make([]map[string] interface{}, 0)
localCursor.All(context.TODO(), &result)
log.Println(len(result))
log.Println("done")
log.Println(time.Now().Sub(start))
}
This completes in around 20 seconds:
2021/03/21 01:36:43 start
2021/03/21 01:36:56 70922
2021/03/21 01:36:56 done
2021/03/21 01:36:56 20.0242869s
After that, I tried to implement the same thing in Rust using mongodb-rust-driver:
use mongodb::{
bson::{doc, Document},
error::Error,
options::FindOptions,
Client,
};
use std::time::Instant;
use tokio::{self, stream::StreamExt};
#[tokio::main]
async fn main() {
let client = Client::with_uri_str("mongodb://localhost:27017/")
.await
.unwrap();
let db = client.database("sampleDB");
let coll = db.collection("sampleCollect");
let find_options = FindOptions::builder().build();
let cursor = coll
.find(doc! {"deleted": false}, find_options)
.await
.unwrap();
let start = Instant::now();
println!("start");
let results: Vec<Result<Document, Error>> = cursor.collect().await;
let es = start.elapsed();
println!("{}", results.iter().len());
println!("{:?}", es);
}
But it takes almost 1 minute to complete the same task on a release build:
$ cargo run --release
Finished release [optimized] target(s) in 0.43s
Running `target\release\rust-mongo.exe`
start
70922
51.1356069s
May I know whether the Rust performance in this case is considered normal, or have I made some mistake in my Rust code that could be improved?
EDIT
As suggested in the comments, here is the example document.
The discrepancy here was due to some known bottlenecks in the Rust driver that have since been addressed in the latest beta release (2.0.0-beta.3); so, upgrading your mongodb dependency to use that version should solve the issue.
Re-running your examples with 10k copies of the provided sample document, I now see the Rust one taking ~3.75s and the Go one ~5.75s on my machine.

What's the recommended way to use s3manager.Downloader?

Recently, I tried to improve the upload and download experience, and I found that s3manager.Uploader does an amazing job of improving the upload experience for larger objects by parallelizing them. The code below works well; we can make full use of our bandwidth (far better than PutObject).
uploader := s3manager.NewUploaderWithClient(s.Client, func(u *s3manager.Uploader) {
u.Concurrency = 128
})
So I tried to use a similar approach to improve the download speed, but it seems that whatever concurrency is used, it can't accelerate the download; the download speed is always below 15 MB/s.
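What I was testing looked roughly like this (the bucket, key, and expected size are placeholders, and the aws.WriteAtBuffer target is an assumption based on the WriteAt profile further down):

downloader := s3manager.NewDownloaderWithClient(s.Client, func(d *s3manager.Downloader) {
    d.Concurrency = 128           // parts fetched in parallel
    d.PartSize = 64 * 1024 * 1024 // bytes per ranged GetObject request
})

// Every part is written into one in-memory buffer via WriteAt, which
// serializes on a single mutex (see the WriteAt implementation below).
buf := aws.NewWriteAtBuffer(make([]byte, 0, expectedSize)) // expectedSize is a placeholder
n, err := downloader.Download(buf, &s3.GetObjectInput{
    Bucket: aws.String("my-bucket"), // placeholder
    Key:    aws.String("my-key"),    // placeholder
})
if err != nil {
    log.Fatal(err)
}
log.Printf("downloaded %d bytes", n)

Pre-sizing the buffer (or handing Download an *os.File, which also satisfies io.WriterAt) at least avoids the repeated grow-and-copy inside that lock, though the lock itself remains.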
By the way, the GetObject API works pretty well with very large objects; it makes full use of our bandwidth when we download large objects, and GetObject can reach our bandwidth limit of about 1250 MB/s.
So I tried to do some benchmark tests; below is the aws-sdk-go WriteAtBuffer.WriteAt implementation that the go tool pprof profile pointed me at (the profile image is not included here):
func (b *WriteAtBuffer) WriteAt(p []byte, pos int64) (n int, err error) {
pLen := len(p)
expLen := pos + int64(pLen)
b.m.Lock()
defer b.m.Unlock()
if int64(len(b.buf)) < expLen {
if int64(cap(b.buf)) < expLen {
if b.GrowthCoeff < 1 {
b.GrowthCoeff = 1
}
newBuf := make([]byte, expLen, int64(b.GrowthCoeff*float64(expLen)))
copy(newBuf, b.buf)
b.buf = newBuf
}
b.buf = b.buf[:expLen]
}
copy(b.buf[pos:], p)
return pLen, nil
}
Here's my question: why is s3manager.Downloader so slow? Is a data race slowing down the WriteAt interface? What's the recommended way to use s3manager.Downloader?

Batching large result sets using Rx

I've got an interesting question for Rx experts. I have a relational table keeping information about events. An event consists of an id, a type, and the time it happened. In my code, I need to fetch all the events within a certain, potentially wide, time range.
SELECT * FROM events WHERE event.time > :before AND event.time < :after ORDER BY time LIMIT :batch_size
To improve reliability and deal with large result sets, I query the records in batches of size :batch_size. Now, I want to write a function that, given :before and :after, will return an Observable representing the result set.
Observable<Event> getEvents(long before, long after);
Internally, the function should query the database in batches. The distribution of events along the time scale is unknown. So the natural way to address batching is this:
fetch first N records
if the result is not empty, use the last record's time as a new 'before' parameter, and fetch the next N records; otherwise terminate
if the result is not empty, use the last record's time as a new 'before' parameter, and fetch the next N records; otherwise terminate
... and so on (the idea should be clear)
My question is:
Is there a way to express this function in terms of higher-level Observable primitives (filter/map/flatMap/scan/range etc), without using the subscribers explicitly?
So far, I've failed to do this, and come up with the following straightforward code instead:
private void observeGetRecords(long before, long after, Subscriber<? super Event> subscriber) {
long start = before;
while (start < after) {
final List<Event> records;
try {
records = getRecordsByRange(start, after);
} catch (Exception e) {
subscriber.onError(e);
return;
}
if (records.isEmpty()) break;
records.forEach(subscriber::onNext);
start = Iterables.getLast(records).getTime();
}
subscriber.onCompleted();
}
public Observable<Event> getRecords(final long before, final long after) {
return Observable.create(subscriber -> observeGetRecords(before, after, subscriber));
}
Here, getRecordsByRange implements the SELECT query using DBI and returns a List. This code works fine, but it lacks the elegance of high-level Rx constructs.
NB: I know that I can return Iterator as a result of SELECT query in DBI. However, I don't want to do that, and prefer to run multiple queries instead. This computation does not have to be atomic, so the issues of transaction isolation are not relevant.
Although I don't fully understand why you want such time-reuse, here is how I'd do it:
BehaviorSubject<Long> start = BehaviorSubject.create(0L);
start
.subscribeOn(Schedulers.trampoline())
.flatMap(tstart ->
getEvents(tstart, tstart + twindow)
.publish(o ->
o.takeLast(1)
.doOnNext(r -> start.onNext(r.time))
.ignoreElements()
.mergeWith(o)
)
)
.subscribe(...)

sqlite3 results set in swift returning extra data

I am trying to fetch a number of records from a sqlite3 database and load them into an array. The code I have written, which seems to function correctly, at least as far as retrieving the correct number of records with the right values from the db, is:
while(results?.next() == true) {
println("Got a result")
var sname = results?.stringForColumn("surname")
var fname = results?.stringForColumn("firstname")
println("Retrieved \(sname) ,\(fname)")
}
The problem I have is that when I try to access the variables in the println statement, what it yields is:
Retrieved Optional("Smiles") ,Optional("Dick")
I have seemingly tried everything to get just the values, but I keep getting the Optional(...) wrapper added. Any ideas?
Try this approach (note that force-unwrapping with ! will crash if a value is nil):
println("Retrieved \(sname!) ,\(fname!)")