Slick insert into H2, but no data inserted - scala

I'm sure I am missing something really stupidly obvious here - I have a unit test for a very simple Slick 3.2 setup. The DAO has basic retrieve and insert methods as follows:
override def questions: Future[Seq[Tables.QuestionRow]] =
  db.run(Question.result)

override def createQuestion(title: String, body: String, authorUuid: UUID): Future[Long] =
  db.run(Question returning Question.map(_.id) += QuestionRow(0L, UUID.randomUUID().toString, title, body, authorUuid.toString))
And I have some unit tests - for the tests I'm using an in-memory H2 database and a setup script (passed via the JDBC URL) to initialise two basic rows in the table.
The unit test for retrieving works fine and fetches the two rows inserted by the init script. I have just added a simple unit test to create a row and then retrieve them all, assuming it will fetch three rows, but no matter what I do it only ever retrieves the initial two:
it should "create a new question" in {
  whenReady(questionDao.createQuestion("Question three", "some body", UUID.randomUUID)) { s =>
    whenReady(questionDao.questions) { q =>
      println(s)
      println(q.map(_.title))
      assert(true)
    }
  }
}
The output shows that the original s (the ID returned from the auto-increment) is 3, as I would expect (I have also tried the insert without the returning step, just letting it return the number of rows inserted, which returns 1, as expected), but looking at the values returned in q, it's only ever the first two rows inserted by the init script.
What am I missing?

My assumptions are that your JDBC URL is something like jdbc:h2:mem:test;INIT=RUNSCRIPT FROM 'init.sql' and that no connection pooling is used.
There are two scenarios:
(1) the connection is opened with keepAliveConnection = true (or with DB_CLOSE_DELAY=-1 appended to the JDBC URL) and init.sql is something like:
DROP TABLE IF EXISTS QUESTION;
CREATE TABLE QUESTION(...);
INSERT INTO QUESTION VALUES(null, ...);
INSERT INTO QUESTION VALUES(null, ...);
(2) the connection is opened with keepAliveConnection = false (the default, without DB_CLOSE_DELAY=-1 appended to the JDBC URL) and init.sql is something like:
CREATE TABLE QUESTION(...);
INSERT INTO QUESTION VALUES(null, ...);
INSERT INTO QUESTION VALUES(null, ...);
The call to questionDao.createQuestion will open a new connection to your H2 database and will trigger the initialization script (init.sql).
In both scenarios, right after this call, the database contains a QUESTION table with 2 rows.
In scenario (2), after this call the connection is closed, and according to the H2 documentation:
By default, closing the last connection to a database closes the database. For an in-memory database, this means the content is lost. To keep the database open, add ;DB_CLOSE_DELAY=-1 to the database URL. To keep the content of an in-memory database as long as the virtual machine is alive, use jdbc:h2:mem:test;DB_CLOSE_DELAY=-1.
The call to questionDao.questions will then open a new connection to your H2 database and will trigger the initialization script (init.sql) again.
In scenario (1) the first connection is kept alive (and with it the database content), but the new connection will re-execute the initialization script (init.sql), erasing the database content.
In both scenarios questionDao.createQuestion returns 3, as expected, but the inserted row is then lost, so the subsequent call to questionDao.questions runs against a freshly initialized database.
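A possible fix (an assumption, since the exact setup isn't shown): keep the in-memory database alive across connections and make the init script idempotent, with no DROP, so that re-running it on each new connection doesn't erase existing rows:

```
jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;INIT=RUNSCRIPT FROM 'init.sql'

-- init.sql: no DROP, safe to re-run on every new connection
CREATE TABLE IF NOT EXISTS QUESTION(...);
MERGE INTO QUESTION KEY(ID) VALUES(1, ...);
MERGE INTO QUESTION KEY(ID) VALUES(2, ...);
```

MERGE INTO ... KEY(...) makes the seed rows an upsert rather than an insert, so repeated runs leave them (and any rows added by tests) in place.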

Related

Database calls, 484ms apart, are producing incorrect results in Postgres

We have "things" sending data to AWS IoT. A rule forwards the payloads to a Lambda which is responsible for inserting or updating the data into Postgres (AWS RDS). The Lambda is written in Python and uses pg8000 for interacting with the DB. The Lambda event looks like this:
{
  "event_uuid": "8cd0b9b1-be93-49f8-1234-af4381052672",
  "date": "2021-07-08T16:09:25.138809Z",
  "serial_number": "a1b2c3",
  "temp": "34"
}
Before inserting the data into Postgres, a query is run on the table to look for any existing event_uuids which are required to be unique. For a specific reason, there is no UNIQUE constraint on the event_uuid column. If the event_uuid does not exist, the data is inserted. If the event_uuid does exist, the data is updated. This all works great, except for the following case.
THE ISSUE: one of our things is sending two of the same payloads in very quick succession. It's an issue with one of our things but it's not something we can resolve at the moment and we need to account for it. Here are the timestamps from CloudWatch of when each payload was received:
2021-07-08T12:10:09.288-04:00
2021-07-08T12:10:09.772-04:00
As a result of the payloads being received 484ms apart, the Lambda is inserting both payloads instead of inserting the first and performing an update with the second one.
Any ideas on how to get around this?
Here is part of the Lambda code...
conn = make_conn()
event_query = f"""
    SELECT json_build_object('uuid', uuid)
    FROM samples
    WHERE event_uuid='{event_uuid}'
    AND serial_number='{serial_number}'
"""
event_resp = fetch_one(conn, event_query)
if event_resp:
    update_sample_query = f"""
        UPDATE samples SET temp={temp} WHERE uuid='{event_resp['uuid']}'
    """
else:
    insert_sample_query = f"""
        INSERT INTO samples (uuid, event_uuid, temp)
        VALUES ('{uuid4()}', '{event_uuid}', {temp})
    """
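For reference, the check-then-write flow above can be sketched with parameterized queries, which also avoid the SQL-injection risk of the f-strings. This is a runnable illustration using sqlite3 as a stand-in for the pg8000 calls; make_conn and the table/column names follow the question:

```python
import sqlite3
from uuid import uuid4

def make_conn():
    # Stand-in for the real Postgres connection in the question.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (uuid TEXT, event_uuid TEXT, serial_number TEXT, temp REAL)"
    )
    return conn

def upsert_sample(conn, event_uuid, serial_number, temp):
    # Look for an existing row; parameters are bound, not interpolated.
    row = conn.execute(
        "SELECT uuid FROM samples WHERE event_uuid=? AND serial_number=?",
        (event_uuid, serial_number),
    ).fetchone()
    if row:
        conn.execute("UPDATE samples SET temp=? WHERE uuid=?", (temp, row[0]))
    else:
        conn.execute(
            "INSERT INTO samples (uuid, event_uuid, serial_number, temp) VALUES (?, ?, ?, ?)",
            (str(uuid4()), event_uuid, serial_number, temp),
        )
    conn.commit()
```

Note this does not by itself close the 484 ms race between two concurrent Lambda invocations; the check and the write still need to be serialized (e.g. inside one transaction holding a lock) for that.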

JDBC batch for multiple prepared statements

Is it possible to batch together commits from multiple JDBC prepared statements?
In my app the user will insert one or more records along with records in related tables. For example, we'll need to update a record in the "contacts" table, delete related records in the "tags" table, and then insert a fresh set of tags.
UPDATE contacts SET name=? WHERE contact_id=?;
DELETE FROM tags WHERE contact_id=?;
INSERT INTO tags (contact_id,tag) values (?,?);
// insert more tags as needed here...
These statements need to be part of a single transaction, and I want to do them in a single round trip to the server.
To send them in a single round trip, there are two choices: create a Statement and call .addBatch(sql) for each command, or create a PreparedStatement per command, set parameter values with .setString(), .setInt(), etc., then call .addBatch().
The problem with the first choice is that sending a full SQL string in each .addBatch() call is inefficient, and you don't get the benefit of sanitized parameter inputs.
The problem with the second choice is that it may not preserve the order of the SQL statements. For example,
Connection con = ...;
PreparedStatement updateState = con.prepareStatement("UPDATE contacts SET name=? WHERE contact_id=?;");
PreparedStatement deleteState = con.prepareStatement("DELETE FROM tags WHERE contact_id=?;");
PreparedStatement insertState = con.prepareStatement("INSERT INTO tags (contact_id,tag) values (?,?);");
updateState.setString(1, "Bob");
updateState.setInt(2, 123);
updateState.addBatch();
deleteState.setInt(1, 123);
deleteState.addBatch();
... etc ...
... now add more parameters to updateState, and addBatch()...
... repeat ...
con.commit();
In the code above, are there any guarantees that all of the statements will execute in the order we called .addBatch(), even across different prepared statements? Ordering is obviously important; we need to delete tags before we insert new ones.
I haven't seen any documentation that says that ordering of statements will be preserved for a given connection.
I'm using Postgres and the default Postgres JDBC driver, if that matters.
The batch is per statement object, so a batch is executed per executeBatch() call on a Statement or PreparedStatement object. In other words, this only executes the statements (or value sets) associated with the batch of that statement object. It is not possible to 'order' execution across multiple statement objects. Within an individual batch, the order is preserved.
If you need statements executed in a specific order, then you need to explicitly execute them in that order. This either means individual calls to execute() per value set, or using a single Statement object and generating the statements on the fly. Due to the potential for SQL injection, this last approach is not recommended.
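The per-statement batching described above can be illustrated outside JDBC as well. This sketch (Python/sqlite3, purely illustrative) preserves cross-statement order by executing each statement explicitly, with executemany standing in for a per-statement batch of value sets:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (contact_id INTEGER, tag TEXT);
    INSERT INTO contacts VALUES (123, 'Alice');
    INSERT INTO tags VALUES (123, 'old-tag');
""")

def replace_tags(conn, contact_id, name, tags):
    # One transaction; statements run in the order written, so the
    # DELETE is guaranteed to happen before the INSERTs.
    with conn:
        conn.execute("UPDATE contacts SET name=? WHERE contact_id=?", (name, contact_id))
        conn.execute("DELETE FROM tags WHERE contact_id=?", (contact_id,))
        # executemany is the per-statement "batch": many value sets, one statement.
        conn.executemany("INSERT INTO tags (contact_id, tag) VALUES (?, ?)",
                         [(contact_id, t) for t in tags])

replace_tags(conn, 123, "Bob", ["friend", "work"])
```

The equivalent JDBC shape is one execute()/executeBatch() per prepared statement, issued in the required order within a single transaction.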

how read-through work in ignite

My cache is empty, so SQL queries return null.
Read-through means that on a cache miss, Ignite will automatically go down to the underlying DB (or persistent store) to load the corresponding data.
If new data is inserted into the underlying DB table, do I have to bring the cache server down to load the newly inserted data from the DB table, or will it sync automatically?
Does this work the same as Spring's @Cacheable, or differently?
It looks to me like the answer is no. Cache SQL queries don't work since there is no data in the cache, but when I tried cache.get I got the following results:
case 1:
System.out.println("data == " + cache.get(new PersonKey("Manish", "Singh")).getPhones());
result ==> data == 1235
case 2 :
PersonKey per = new PersonKey();
per.setFirstname("Manish");
System.out.println("data == " + cache.get(per).getPhones());
throws an error (stack trace attached as images)
Read-through semantics can be applied when there is a known set of keys to read. This is not the case with SQL, so in case your data is in an arbitrary 3rd party store (RDBMS, Cassandra, HBase, ...), you have to preload the data into memory prior to running queries.
However, Ignite provides native persistence storage [1], which eliminates this limitation. It allows using any Ignite APIs without having anything in memory, and this includes SQL queries as well. Data will be fetched into memory on demand as you use it.
[1] https://apacheignite.readme.io/docs/distributed-persistent-store
When you insert something into the database and it is not in the cache yet, then get operations will retrieve missing values from DB if readThrough is enabled and CacheStore is configured.
But currently it doesn't work this way for SQL queries executed on cache. You should call loadCache first, then values will appear in the cache and will be available for SQL.
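The get-vs-SQL distinction above can be shown with a minimal read-through sketch (Python, purely illustrative; not the Ignite API): key lookups can fall through to the store, but a SQL-style scan only sees what is already in memory.

```python
class ReadThroughCache:
    def __init__(self, load_from_db):
        self.data = {}
        self.load_from_db = load_from_db  # stand-in for a configured CacheStore

    def get(self, key):
        # Known key: on a miss, read through to the backing store.
        if key not in self.data:
            value = self.load_from_db(key)
            if value is not None:
                self.data[key] = value
        return self.data.get(key)

    def sql_scan(self, predicate):
        # SQL-style query: runs only over what is already in memory;
        # there is no known key set to read through for.
        return [v for v in self.data.values() if predicate(v)]

db = {("Manish", "Singh"): "1235"}  # the backing store
cache = ReadThroughCache(db.get)
```

A scan before any get returns nothing even though the store has data; after cache.get(("Manish", "Singh")) the row is in memory and the scan sees it.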
When you perform your second get, the exact combination of firstname and lastname is looked up in the DB. It is converted into a CQL query containing a lastname=null condition, which fails because lastname cannot be null.
UPD:
To get all records that have firstname column equal to 'Manish' you can first do loadCache with an appropriate predicate and then run an SQL query on cache.
cache.loadCache((k, v) -> v.firstname.equals("Manish"));
SqlFieldsQuery qry = new SqlFieldsQuery("select firstname, lastname from Person where firstname='Manish'");
try (FieldsQueryCursor<List<?>> cursor = cache.query(qry)) {
    for (List<?> row : cursor)
        System.out.println("firstname:" + row.get(0) + ", lastname:" + row.get(1));
}
Note that loadCache is a heavy operation that requires running over all records in the DB, so it shouldn't be called too often. You can provide null as the predicate, in which case all records will be loaded from the database.
Also to make SQL run fast on cache, you should mark firstname field as indexed in QueryEntity configuration.
In your case 2, have you tried specifying lastname as well? By your stack trace it's evident that Cassandra expects it to be not null.

EF CF slow on generating insert statements

I have a project that pulls data from a service (returning XML) which it deserializes into objects/entities.
I'm using EF Code First and testing works fine until it comes to a big chunk of data - not too big, only 150K records. I used SQL Profiler to check the SQL statements and they run really fast, but there is a huge slowdown in generating the insert statements.
Simply put, the data model is simple: class Client has five child object sets and one many-to-many relationship.
IDs for the model are provided by the service, so I cleaned up duplicate instances of the same entity (same ID).
var clientList = service.GetAllClients(); // returns IEnumerable<Client> // 10K clients
var filteredList = Client.RemoveDuplicateInstancesSameEntity(clientList); // returns IEnumerable<Client>

int cur = 0;
int batch = 100;
while (true)
{
    logger.Trace("POINT A : get next batch");
    var importSegment = filteredList.Skip(cur).Take(batch).OrderBy(x => x.Id);

    if (!importSegment.Any())
        break;

    logger.Trace("POINT B: Saving to DB");
    importSegment.ForEach(c => repository.addClient(c));

    logger.Trace("POINT C: calling persist");
    repository.persist();

    cur = cur + batch;
}
The logic is simple: break it up into batches to speed up the process. Each 100 clients create about 1000 insert statements (for child records and one many-to-many table).
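Independent of EF, the batching loop above can be sketched as (Python, illustrative only):

```python
def in_batches(items, size):
    """Yield consecutive slices of `items`, `size` elements at a time."""
    cur = 0
    while True:
        segment = items[cur:cur + size]
        if not segment:
            break
        yield segment
        cur += size
```

Each yielded segment corresponds to one add-then-persist round in the loop above.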
I'm using the profiler and logging to analyze this. The log shows POINT B as the last step every time, but I don't see any insert statements in the profiler yet. Then, two minutes later, I see all the insert statements, followed by POINT B for the next batch - and two minutes again.
Did I do anything wrong, or is there a setting or anything I can do to improve this?
Inserting 1K records seems fast. The database is wiped when the process starts, so there are no records in it. It doesn't seem to be an issue with SQL slowness, but with EF generating the insert statements?
Although the project works, it is slow. I want to speed it up and understand more about EF when it comes to big chunks of data. Or is this normal?
The first 100 are fast, and then it gets slower and slower. It seems like the issue is at POINT B. Is it an issue with too much data - the repo/DbContext can't handle it in a timely manner?
The repo inherits from DbContext, and addClient is simply:
dbcontext.Client.Add(client)
Thank you very much.

Row-Level Update Lock using System.Transactions

I have a MSSQL procedure with the following code in it:
SELECT Id, Role, JurisdictionType, JurisdictionKey
FROM dbo.SecurityAssignment WITH(UPDLOCK, ROWLOCK)
WHERE Id = @UserIdentity
I'm trying to move that same behavior into a component that uses OleDb connections, commands, and transactions to achieve the same result. (It's a security component that uses the SecurityAssignment table shown above. I want it to work whether that table is in MSSQL, Oracle, or Db2)
Given the above SQL, if I run a test using the following code
Thread backgroundThread = new Thread(
    delegate()
    {
        using (var transactionScope = new TransactionScope())
        {
            Subject.GetAssignmentsHavingUser(userIdentity);
            Thread.Sleep(5000);
            backgroundWork();
            transactionScope.Complete();
        }
    });
backgroundThread.Start();
Thread.Sleep(3000);
var foregroundResults = Subject.GetAssignmentsHavingUser(userIdentity);
Where Subject.GetAssignmentsHavingUser runs the SQL above and returns a collection of results, and backgroundWork is an Action that updates rows in the table, like this:
delegate
{
    Subject.UpdateAssignment(newAssignment(user1, role1));
}
Then the foregroundResults returned by the test should reflect the changes made in the backgroundWork action.
That is, I retrieve a list of SecurityAssignment rows with UPDLOCK, ROWLOCK applied by the SQL, and subsequent queries against those rows don't return until that update lock is released - thus the foregroundResults in the test include the updates made in the backgroundThread.
This all works fine.
Now, I want to do the same with database-agnostic SQL, using OleDb transactions and isolation levels to achieve the same result. And I can't, for the life of me, figure out how to do it. Is it even possible, or does this row-level locking only apply at the DB level?