Couchbase Update query divide - nosql

I am trying to update documents using an UPDATE query statement in Couchbase.
EX)
UPDATE Users SET cityIndex = 1 where Users.city= "NewYork";
There is so much data that I want to split the UPDATE into batches of 3,000 to 4,000 documents. How should I proceed?
There is a PRIMARY INDEX.

The Eventing Function method that vsr alluded to is quite simple (7 lines sans comments) and you run it as a one-off point tool, deploying it from Eventing. Note there is no need for any index for this to work.
// To run configure the settings for this Function, UpdateAllCityIndex, as follows:
//
// Version 7.0+
// "Listen to Location"
// bulk.data.yourcollection
// "Eventing Storage"
// rr100.eventing.metadata
// Binding(s)
// 1. "binding type", "alias name...", "bucket.scope.collection", "Access"
// ---------------------------------------------------------------------------
// "bucket alias", "src_col", "bulk.data.Users", "read and write"
//
// Version 6.X
// "Source Bucket"
// yourbucket
// "MetaData Bucket"
// metadata
// Binding(s)
// 1. "binding type", "alias name...", "bucket", "Access"
// ---------------------------------------------------------------------------
// "bucket alias", "src_col", "Users", "read and write"
//
// For more performance set the workers to the number of physical cores
function OnUpdate(doc, meta) {
    // only process documents with the city field
    if (!doc.city) return;
    // only update New York if cityIndex isn't already 1 or does not exist
    if (doc.city === "NewYork" && (!doc.cityIndex || doc.cityIndex !== 1)) {
        doc.cityIndex = 1;
        // write back the updated doc via the alias
        src_col[meta.id] = doc;
    }
}

Option 1)
You can use Couchbase Eventing; see Case 2 of
https://docs.couchbase.com/server/current/eventing/eventing-example-data-enrichment.html
and the other examples at
https://docs.couchbase.com/server/current/eventing/eventing-examples.html
Option 2)
CREATE INDEX ix1 ON Users (city, cityIndex);

UPDATE Users AS u
SET u.cityIndex = 1
WHERE u.city = "NewYork" AND (u.cityIndex IS MISSING OR u.cityIndex != 1)
LIMIT 4000;

Run the statement repeatedly until it reports zero mutations; the predicate skips documents already updated by earlier batches, and the IS MISSING check also catches documents that have no cityIndex field yet.

Using a primary index, you can issue multiple queries over a (presumably stable) primary index and iterate over it. A little more complicated, but generalized.
rq is the bucket, s is the scope, t1 is the collection.
CREATE COLLECTION rq.s.t1;
CREATE PRIMARY INDEX ON rq.s.t1;
First query:
UPDATE rq.s.t1 USE KEYS [(
    SELECT RAW META().id
    FROM rq.s.t1
    ORDER BY META().id
    LIMIT 10)]
SET x = 1
RETURNING MAX(META().id);
Second through Nth query, repeated until you're done (nothing gets returned at the end):
Take the MAX(META().id) value returned by the previous query and use it in the WHERE clause.
UPDATE rq.s.t1 USE KEYS [(
    SELECT RAW META().id
    FROM rq.s.t1
    WHERE META().id > "007dd444-fa39-498f-b070-6cd0d41abe3d"
    ORDER BY META().id
    LIMIT 10)]
SET x = 1
RETURNING META().id;
You can optimize this loop by starting the first iteration with META().id compared against "".
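With that simplification every pass runs the same statement and only the lower bound changes between iterations. A minimal sketch of the unified query, using a N1QL named parameter $last_id (an assumption; you can equally splice the literal in as shown above):

UPDATE rq.s.t1 USE KEYS [(
    SELECT RAW META().id
    FROM rq.s.t1
    WHERE META().id > $last_id /* "" on the first pass, then the previous MAX(META().id) */
    ORDER BY META().id
    LIMIT 10)]
SET x = 1
RETURNING MAX(META().id);

Loop, feeding the returned maximum back into $last_id, until the RETURNING clause comes back empty.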

Inserting many rows causes locking conflicts with Hibernate and Postgres, leaving the table empty

We are benchmarking some queries to see if they will still work reliably for "a lot of" data. (1 million isn't that much to be honest, but Postgres already fails here, so it evidently is.)
Our Java code for calling these queries looks something like this:
@PersistenceContext
private EntityManager em;

@Resource
private UserTransaction utx;

for (int i = 0; i < 20; i++) {
    this.utx.begin();
    for (int inserts = 0; inserts < 50_000; inserts++) {
        em.createNativeQuery(SQL_INSERT).executeUpdate();
    }
    this.utx.commit();
    for (int parameter = 0; parameter < 25; parameter++) {
        long time = System.currentTimeMillis();
        Assert.assertNotNull(this.em.createNativeQuery(SQL_SELECT).getResultList());
        System.out.println(i + " iterations \t" + parameter + "\t" + (System.currentTimeMillis() - time) + "ms");
    }
}
Or with plain JDBC:
Connection connection = //...
for (int i = 0; i < 20; i++) {
    for (int inserts = 0; inserts < 50_000; inserts++) {
        try (Statement statement = connection.createStatement()) {
            statement.execute(SQL_INSERT);
        }
    }
    for (int parameter = 0; parameter < 25; parameter++) {
        long time = System.currentTimeMillis();
        try (Statement statement = connection.createStatement()) {
            statement.execute(SQL_SELECT);
        }
        System.out.println(i + " iterations \t" + parameter + "\t" + (System.currentTimeMillis() - time) + "ms");
    }
}
The queries we tried were a simple INSERT into a table with JSON and an INSERT over two tables with about 25 rows. The SELECT has one or two JOINs and is pretty easy. One set of queries is (I had to anonymize the SQL, else I wouldn't have been allowed to post it):
CREATE TABLE ts1.p (
    id integer NOT NULL,
    CONSTRAINT p_pkey PRIMARY KEY ("id")
);
CREATE TABLE ts1.m (
    pId integer NOT NULL,
    mId character varying(100) NOT NULL,
    a1 character varying(50),
    a2 character varying(50),
    CONSTRAINT m_pkey PRIMARY KEY (pId, mId)
);
CREATE SEQUENCE ts1.seq_p;
/*
 * SQL_INSERT
 */
WITH p AS (
    INSERT INTO ts1.p (id)
    VALUES (nextval('ts1.seq_p'))
    RETURNING id AS pId
)
INSERT INTO ts1.m (pId, mId, a1, a2)
VALUES ((SELECT pId FROM p), 'M1', '11', '12'),
       ((SELECT pId FROM p), 'M2', '13', '14'),
       /* ... about 20 to 25 rows of values */
/*
 * SQL_SELECT
 */
WITH userInput (mId, a1, a2) AS (
    VALUES
        ('M1', '11', '11'),
        ('M2', '12', '15'),
        /* ... about "parameter" rows of values */
)
SELECT m.pId, COUNT(m.a1) AS matches
FROM userInput u
LEFT JOIN ts1.m m ON (m.mId) = (u.mId)
WHERE (m.a1 IS NOT DISTINCT FROM u.a1) AND
      (m.a2 IS NOT DISTINCT FROM u.a2) OR
      (m.a1 IS NULL AND m.a2 IS NULL)
GROUP BY m.pId
/* plus HAVING, additional WHERE clauses etc. according to the use case, but that just speeds up the query */
When executing, we get the following output (the values are supposed to rise steadily and linearly):
271ms
414ms
602ms
820ms
995ms
1192ms
1396ms
1594ms
1808ms
1959ms
110ms
33ms
14ms
10ms
11ms
10ms
21ms
8ms
13ms
10ms
As you can see, after some value (usually at around 300,000 to 500,000 inserts) the time needed for the query drops significantly. Sadly we can't really debug what the result is at that point (other than that it's not null), but we assume it's an empty list, because the database tables are empty.
Let me repeat that: After half a million INSERTS, Postgres clears tables.
Of course that's not acceptable at all.
We tried different queries, all of easy to medium difficulty, and all produced this behavior, so we assume it's not the queries.
We thought that maybe the sequence returned a value too high for an integer column, so we dropped and recreated the sequence.
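For reference, how far the sequence has advanced can be checked with a direct select on it (here for the ts1.seq_p sequence above):

SELECT last_value, is_called FROM ts1.seq_p;

An integer column overflows at 2,147,483,647, so a few million nextval() calls come nowhere near it.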
Once there was this exception:
org.postgresql.util.PSQLException : FEHLER: Verklemmung (Deadlock) entdeckt
Detail: Prozess 1620 wartet auf AccessExclusiveLock-Sperre auf Relation 2001098 der Datenbank 1937678; blockiert von Prozess 2480.
Which translates to:
org.postgresql.util.PSQLException : ERROR: deadlock detected
Detail: Process 1620 waits for AccessExclusiveLock on relation 2001098 of database 1937678; blocked by process 2480.
But I don't think this error has anything to do with the clearing of the table. We just tested against the wrong database, so multiple queries were run on the same table. Normally we have one database per benchmark test.
Of course it's important that we find out what the error is, so that we can decide if there is any risk to our customers losing their data (because again, on error the database empties some table of its choice).
Postgres version: PostgreSQL 10.6, compiled by Visual C++ build 1800, 64-bit
We tried PostgreSQL 9.6.11, compiled by Visual C++ build 1800, 64-bit, too. And we never had the same problem there (even though that could just be luck, since it's not 100% reproducible).
Do you have any idea what the error is? Or how we could debug it? The entire benchmark test runs for an hour, so there is no immediate feedback.

Merge rows with connecting dates

I've got a large Excel sheet with customer and subscription data. From this table I would like to merge records/rows with connecting stop_ and start_dates and show the result in a new worksheet. A simplified version of the data is shown below.
Customer_id  subscription_id  start_date  stop_date
1034         RV4              30-4-2012   30-1-2015
1035         AB7              30-1-2014   30-3-2014
1035         AB6              30-1-2014   30-3-2014
1035         AB7              30-12-2013  30-1-2014
1035         AB7              12-12-2012  30-12-2013
1035         AB7              12-9-2010   14-1-2011
So, the formula has to check the customer_id and the subscription_id. When two or more rows match and the stop_date of one row connects with the start_date of another, the merge must produce a single row with the start_date of the first and the stop_date of the last. This also has to work when there are multiple rows with connecting dates. All rows that don't match these criteria stay the same after the extraction. So the result will be like this:
Customer_id  subscription_id  start_date  stop_date
1034         RV4              30-4-2012   30-1-2015
1035         AB6              30-1-2014   30-3-2014
1035         AB7              12-12-2012  30-3-2014
1035         AB7              12-9-2010   14-1-2011
A dynamic solution would be ideal, since new data will be added to the original sheet. While I know this is possible when you're certain that the rows you're looking for are always below each other, that is not the case here, and it wouldn't give a very dynamic solution.
So some kind of array function would be needed in Excel I guess but after searching a lot I couldn't find a suitable solution. I've also got MATLAB available but no clue where to start in that program with a problem like this.
A dynamic solution may be possible, but if the dataset is large it might bog things down quite a bit, because you'd need it to run every time a cell was changed.
Basically the best way I can see to approach this is to create unique keys out of your customer_id and subscription_id, then collect all of the date ranges under that key and merge them.
Something like this should get you started (requires a reference to Microsoft Scripting Runtime):
Public Sub LinkSubscriptionDates()
    Dim data As Dictionary, source As Worksheet, target As Worksheet
    Set source = ActiveSheet
    Set data = GetSubscriptions(source)
    Set target = source.Parent.Worksheets.Add
    'Copy headers
    target.Range(target.Cells(1, 1), target.Cells(1, 4)).Value = _
        source.Range(source.Cells(1, 1), source.Cells(1, 4)).Value
    Dim row As Long
    row = 2
    Dim key As Variant, item As Variant
    For Each key In data.Keys
        For Each item In data(key)
            target.Cells(row, 1) = Split(key, "|")(0)
            target.Cells(row, 2) = Split(key, "|")(1)
            target.Cells(row, 3) = Split(item, "|")(0)
            target.Cells(row, 4) = Split(item, "|")(1)
            row = row + 1
        Next item
    Next key
End Sub

Private Function GetSubscriptions(source As Worksheet) As Dictionary
    Dim subscrips As Dictionary
    Set subscrips = New Dictionary
    Dim row As Long
    Dim cust As String, subs As String, starting As String, ending As String
    'Gather all the data as pairs of customer|subscription, starting|ending
    For row = 2 To source.UsedRange.Rows.Count
        cust = source.Cells(row, 1).Value
        subs = source.Cells(row, 2).Value
        'Valid customer/subscription?
        If cust <> vbNullString And subs <> vbNullString Then
            starting = source.Cells(row, 3).Value
            ending = source.Cells(row, 4).Value
            'Has an ending and starting date?
            If starting <> vbNullString And ending <> vbNullString Then
                Dim key As String
                key = cust & "|" & subs
                'New combo?
                If Not subscrips.Exists(key) Then
                    subscrips.Add key, New Collection
                    subscrips(key).Add starting & "|" & ending
                Else
                    subscrips(key).Add starting & "|" & ending
                    Set subscrips(key) = MergeDates(subscrips(key))
                End If
            End If
        End If
    Next row
    Set GetSubscriptions = subscrips
End Function

Private Function MergeDates(dates As Collection) As Collection
    Dim candidate As Long, index As Long
    Dim values() As String, test() As String
    Dim merge As Boolean
    For index = 1 To dates.Count
        values = Split(dates(index), "|")
        'Check to see if it can be merged with any other row.
        For candidate = index + 1 To dates.Count
            test = Split(dates(candidate), "|")
            'Two ranges connect if each one starts on or before the other ends.
            If CDate(test(0)) <= CDate(values(1)) And _
               CDate(test(1)) >= CDate(values(0)) Then
                dates.Remove candidate
                merge = True
                Exit For
            End If
        Next candidate
        If merge Then Exit For
    Next index
    If merge Then
        'Pull the other row out of the collection.
        dates.Remove index
        values(0) = IIf(CDate(test(0)) < CDate(values(0)), _
                        CDate(test(0)), CDate(values(0)))
        values(1) = IIf(CDate(test(1)) > CDate(values(1)), _
                        CDate(test(1)), CDate(values(1)))
        'Put the merged date range back in.
        dates.Add values(0) & "|" & values(1)
        'Recurse in case the merged range now connects with another.
        Set MergeDates = MergeDates(dates)
        Exit Function
    End If
    Set MergeDates = dates
End Function
It really needs to be fleshed out with data validation, error trapping, etc., and it currently just puts the resulting data on a new worksheet. All the work gets done in the GetSubscriptions function, so you can just grab the returned Dictionary from it and do whatever you need to do with the data.

How to get a specific row's page number in pagination

I have a question about MySQL paging. A user's record is displayed in a table along with many other users' records, and the table is sorted/paged. Now I need to display the page containing the user's row directly after the user logs in. How can I achieve this?
create table t_users (id int auto_increment primary key, username varchar(100));

insert t_users (username) values
    ('jim'),('bob'),('john'),('tim'),('tom'),
    ('mary'),('elise'),('karl'),('karla'),('bob'),
    ('jack'),('jacky'),('jon'),('tobias'),('peter');
I searched Google but found no answer, so please help.
There are two steps for this:
1. Determine the row's position in your sorted table.
Copied and tweaked from: https://stackoverflow.com/a/7057822/2391142
Use this SQL...
SELECT z.rank FROM (
    SELECT id, @rownum := @rownum + 1 AS rank
    FROM t_users, (SELECT @rownum := 0) r
    ORDER BY id ASC
) AS z WHERE id = 1;
...replacing the ORDER BY id ASC with whatever your actual sort order is, and replacing the number 1 in WHERE id = 1 with the id provided in that index.php?u=id URL.
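If you're on MySQL 8.0 or later, a window function does the same job without the user-variable trick. A sketch against the t_users table above (RANK is a reserved word in 8.0, hence the rn alias):

SELECT z.rn FROM (
    SELECT id, ROW_NUMBER() OVER (ORDER BY id ASC) AS rn
    FROM t_users
) AS z WHERE z.id = 1;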
2. Determine the page number based on the row's position.
Use this PHP to determine the needed page number...
$rows_per_page = 50;
$user_row_position = [result you got from step 1];
$page = ceil($user_row_position / $rows_per_page);
...replacing the 50 with whatever your real rows-per-page limit is, and putting the real SQL result in $user_row_position.
And voila. You'll have the destination page number in the $page variable and hopefully you can take it from there.
EDIT
After further discussion in the comments, use this bit of PHP:
$page = 0;
$limit = 10;

// If a user ID is specified, then look up the page number it's on.
if (isset($_GET['u'])) {
    // Check the given ID is valid to avoid SQL injection risks.
    if (is_numeric($_GET['u'])) {
        // Look up the user's position in the list.
        $query = mysqli_fetch_array(mysqli_query($link, "SELECT z.rank FROM (SELECT id, @rownum := @rownum + 1 AS rank FROM sites, (SELECT @rownum := 0) r WHERE online='0') AS z WHERE id=" . $_GET['u']));
        $position = $query[0];
        if (is_numeric($position)) {
            // Convert the result to a number before doing math on it.
            $position = (int) $position;
            $page = ceil($position / $limit);
        }
    }
}

// If a page number is specified, and wasn't already set by looking up a user, then use it.
if ($page == 0 && isset($_GET['page'])) {
    // Check your given page number is valid too.
    if (is_numeric($_GET['page'])) {
        $page = (int) $_GET['page'];
    }
}

// Notice that if anything fails in the above checks, we just pretend it never
// happened and keep using the default page and start number of 0.

// Determine the starting row based off the page number (never negative).
$start = $page > 0 ? ($page - 1) * $limit : 0;

// Get the list of sites for the provided page only.
$query = mysqli_query($link, "SELECT * FROM sites WHERE online='0' LIMIT " . $start . ", " . $limit);
while ($row = mysqli_fetch_array($query)) {
    // Stuff to render your rows goes here.
    // You can use $row['fieldname'] to extract fields for this row.
}

Linq - Limit number of results by same value of a field

I have a problem with the creation of a LINQ to Entities (Oracle 11g) request. Here is the situation:
I have a table TREATMENT with three columns (simplified version): ID, STATE and APPLICATION. Here is a sample:
ID  STATE  APPLICATION
1   A      MAJ
2   A      FLUX
3   A      FLUX
4   R      REF
5   A      REF
Now, my objectives are to retrieve the data with these rules:
State must be A (added)
Number of rows per application below a max value
The max value is reduced by the number of rows with State = R (per application)
Example: if the max value is 1, I must retrieve rows 1 and 2. (Row 5 can't be retrieved since there is already a REF row with state R, namely row 4.)
I managed to handle the case where the number of R rows is equal to or greater than the max value (those applications are excluded), but I don't see how to limit the number of results in order to respect the max value.
Here is the request:
using (Entities bdd = new Entities())
{
    var treatments = from trt in bdd.TREATMENT
                     let app = from t in bdd.TREATMENT
                               where t.STATE == "R"
                               group t by t.APPLICATION into grouped
                               where grouped.Count() >= maxPerApplication
                               select grouped.Key
                     where trt.STATE == "A" && !app.Contains(trt.APPLICATION)
                     orderby trt.ID
                     select new TreatmentDto()
                     {
                         Id = trt.ID
                     };
    result = treatments.ToList();
}
In SQL, I would use an inner query and ROWNUM to limit the number of results, but I don't see how to do that here. The only solution I see is to do the request in two parts, but I want to avoid that in order to keep the information consistent.
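For reference, the SQL I have in mind would look roughly like this (a sketch using ROW_NUMBER rather than a bare ROWNUM; :max_per_application stands for the max value):

SELECT ranked.ID
FROM (
    SELECT t.ID,
           ROW_NUMBER() OVER (PARTITION BY t.APPLICATION ORDER BY t.ID) AS rn,
           (SELECT COUNT(*)
            FROM TREATMENT r
            WHERE r.STATE = 'R'
              AND r.APPLICATION = t.APPLICATION) AS r_count
    FROM TREATMENT t
    WHERE t.STATE = 'A'
) ranked
WHERE ranked.rn <= :max_per_application - ranked.r_count
ORDER BY ranked.ID;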
I found a solution; not sure if it's the best, but it works:
using (Entities bdd = new Entities())
{
    var treatments = from trt in bdd.TREATMENT
                     where trt.STATE == "A" &&
                           (from trt2 in bdd.TREATMENT
                            where trt2.STATE == "A" && trt2.APPLICATION == trt.APPLICATION && trt2.ID <= trt.ID
                            select trt2).Count() <= maxPerApplication - (from appp in bdd.TREATMENT
                                                                         where appp.STATE == "R"
                                                                               && appp.APPLICATION.Equals(trt.APPLICATION)
                                                                         select appp).Count()
                     select new TreatmentDto()
                     {
                         Id = trt.ID
                     };
    result = treatments.ToList();
}

Upsert in Postgres using node.js

I'm trying to do an insert-or-update in a Postgres database using node.js with the pg module (version 0.5.4).
So far I have this code:
(...)
client.query({
    text: "update users set is_active = 0, ip = $1 where id = $2",
    values: [ip, id]
}, function(u_err, u_result) {
    debug(socket_id, "update query result: ", u_result);
    debug(socket_id, "update query error: ", u_err);
    date_now = new Date();
    var month = date_now.getMonth() + 1;
    if (!u_err) {
        client.query({
            text: 'insert into users (id, first_name, last_name, is_active, ip, date_joined) values' +
                  '($1, $2, $3, $4, $5, $6)',
            values: [
                result.id,
                result.first_name,
                result.last_name,
                1,
                ip,
                date_now.getFullYear() + "-" + month + "-" + date_now.getDate() + " " + date_now.getHours() + ":" + date_now.getMinutes() + ":" + date_now.getSeconds()
            ]
        }, function(i_err, i_result) {
            debug(socket_id, "insert query result: ", i_result);
            debug(socket_id, "insert query error: ", i_err);
        });
    }
});
The problem is that, although both queries work, it always runs both instead of only running the insert when the update fails.
The debug functions in code output something like:
UPDATE
Object { type="update query result: ", debug_value={...}}
home (linha 56)
Object { type="update query error: ", debug_value=null}
home (linha 56)
Object { type="insert query result: "}
home (linha 56)
Object { type="insert query error: ", debug_value={...}}
Insert
Object { type="update query result: ", debug_value={...}}
home (linha 56)
Object { type="update query error: ", debug_value=null}
home (linha 56)
Object { type="insert query result: ", debug_value={...}}
home (linha 56)
Object { type="insert query error: ", debug_value=null}
** EDIT **
ANSWER FROM node-postgres developer:
It's possible to retrieve number of rows affected by an insert and
update. It's not fully implemented in the native bindings, but does
work in the pure javascript version. I'll work on this within the
next week or two. In the mean time use pure javascript version and
have a look here:
https://github.com/brianc/node-postgres/blob/master/test/integration/client/result-metadata-tests.js
** END EDIT **
Can anyone help?
The immediate answer to your question is to use a stored procedure to do an upsert.
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
Something like this works fine with the pg module.
client.query({
    text: "SELECT upsert($1, $2, $3, $4, $5, $6)",
    values: [
        obj.id,
        obj.first_name,
        obj.last_name,
        1,
        ip,
        date_now.getFullYear() + "-" + month + "-" + date_now.getDate() + " " + date_now.getHours() + ":" + date_now.getMinutes() + ":" + date_now.getSeconds()
    ]
}, function(u_err, u_result) {
    if (u_err) {
        // this is a real error, handle it
    }
    // otherwise your data is updated or inserted properly
});
Of course this assumes that you're using some kind of model object that has all the values you need, even if they aren't changing. You have to pass them all into the upsert. If you're stuck doing it the way you've shown here, you should probably check the actual error object after the update to determine if it failed because the row is already there, or for some other reason (which is real db error that needs to be handled).
Then you've gotta deal with the potential race condition between the time your update failed and the time your insert goes through. If some other function tries to insert with the same id, you've got a problem. Transactions are good for that. That's all I got right now. Hope it helps.
I had this issue when connecting to a PG instance using the JDBC. The solution I ended up using was:
UPDATE table SET field='C', field2='Z' WHERE id=3;
INSERT INTO table (id, field, field2)
SELECT 3, 'C', 'Z'
WHERE NOT EXISTS (SELECT 1 FROM table WHERE id=3);
The UPDATE does nothing if the record doesn't exist and the INSERT does nothing if the record does exist. It works pretty well and is an SQL-based solution vs a stored procedure.
Here's the initial question:
Insert, on duplicate update in PostgreSQL?
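On PostgreSQL 9.5 and later there is also INSERT ... ON CONFLICT, which does the whole thing in one atomic statement and sidesteps the update/insert race discussed above. A sketch against the users table from the question, assuming id is its primary key:

INSERT INTO users (id, first_name, last_name, is_active, ip, date_joined)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (id)
DO UPDATE SET is_active = EXCLUDED.is_active,
              ip = EXCLUDED.ip;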
I have an electronic component database to which I add components that I either salvage from e-waste or buy new, and the way I did it was:
const upsertData = (request, response) => {
    const {
        category, type, value, unit, qty,
    } = request.body;
    pool.query(`DO $$
        BEGIN
        IF EXISTS
            ( SELECT 1
              FROM elab
              WHERE category='${category}'
                AND type='${type}'
                AND value='${value}'
                AND unit='${unit}'
            )
        THEN
            UPDATE elab
            SET qty = qty + ${qty}
            WHERE category='${category}'
              AND type='${type}'
              AND value='${value}'
              AND unit='${unit}';
        ELSE
            INSERT INTO elab
                (category, type, value, unit, qty)
            VALUES ('${category}', '${type}', '${value}', '${unit}', ${qty});
        END IF;
        END
        $$;`, (error, results) => {
        if (error) {
            throw error;
        }
        response.status(201).send('Task completed lol');
    });
};
The reason for this was that the only unique column any entry had was the ID, which is generated automatically; none of the other columns is unique, only the whole entry is. E.g. you can have a 100 kOhm resistor as a potentiometer or a "normal" one, and you can have a potentiometer with values other than 100 kOhm, so only the whole entry is unique.
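As an aside, interpolating request values straight into the SQL string leaves this endpoint open to SQL injection. The same update-else-insert can be written as a single parameterized statement (a sketch against the same elab table; like the original, it is not race-free under concurrent requests):

WITH updated AS (
    UPDATE elab
    SET qty = qty + $5
    WHERE category = $1 AND type = $2 AND value = $3 AND unit = $4
    RETURNING 1
)
INSERT INTO elab (category, type, value, unit, qty)
SELECT $1, $2, $3, $4, $5
WHERE NOT EXISTS (SELECT 1 FROM updated);

Passed to pool.query with [category, type, value, unit, qty] as the values array, the placeholders are bound safely by the driver.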