Why does deleteMany use 2 queries? - prisma

I noticed that deleteMany uses two queries when I specify a where. It first selects the primary keys of the rows to delete, and then removes them with a DELETE FROM WHERE id IN (...) query.
What is the reason for this? Instead of the WHERE id IN (...) query, it would make more sense to me to select the rows to delete in the DELETE query itself.
As an example:
await this.prismaService.cardSet.deleteMany({ where: { steamAccountId: steamAccount.id } });
This runs:
SELECT "public"."CardSet"."id" FROM "public"."CardSet" WHERE "public"."CardSet"."steamAccountId" = $1;
DELETE FROM "public"."CardSet" WHERE "public"."CardSet"."id" IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,...);
The following seems more efficient to me:
DELETE FROM "public"."CardSet" WHERE "public"."CardSet"."steamAccountId" = $1;

I can't comment on what technical decisions led the Prisma library authors to translate a deleteMany operation into two separate SQL statements.
However, ORMs (Prisma and others) in general don't always generate perfectly efficient SQL queries. In many cases, it is definitely possible to write more optimal queries directly in SQL. This is a tradeoff you have to be mindful of when using ORMs.
If your use case really needs this deleteMany operation to be as efficient as possible, you could consider using Prisma's $queryRaw feature to directly write a more efficient SQL query. You can find more information in the Raw database access article of the Prisma docs.
In my opinion, unless you're sure that a manual SQL query would really improve your performance in some meaningful way, I would not bother for this particular case.
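As a rough sketch of what that could look like, reusing the model and column names from the question: Prisma also exposes $executeRaw, the raw-query variant meant for statements that do not return rows (INSERT/UPDATE/DELETE), so the delete can happen in a single round trip.
// Sketch only: single-statement delete via Prisma's raw API. Values
// interpolated into the tagged template are sent as bound parameters,
// not concatenated into the SQL string.
const deleted = await this.prismaService.$executeRaw`
  DELETE FROM "public"."CardSet"
  WHERE "steamAccountId" = ${steamAccount.id}
`;
$executeRaw resolves to the number of affected rows, which is roughly the same information deleteMany returns as { count }.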

Related

In mongo if I have a bunch of IDs, is it more performant to query per ID, or by $in: IDs?

I'm wondering how $in works behind the scenes, and what optimizations are made. Does it loop through the database, looking for the required items, or know immediately where those are? Do indexes matter in those operations?
I'm trying to be as efficient as possible by making one query and fetching the documents I need in one go, but maybe when providing a single ID, which is guaranteed to be indexed, it's faster, and worth the multiple queries.
I guess there is a factor of how many documents we're talking about; in my case it's only a few. I assume with a lot of IDs it may be worth it to just query them in one go, but maybe not. I'm not too experienced with Mongo.
Generally, it is better to reduce network round trips to the database.
In your case, using the $in operator is better, because if you make a separate request to the database for each id, you will incur many round trips.
When you send your query to the database, it will try to create the most efficient execution plan for it, and if there are any indexes that can help achieve a more efficient plan, the database will use them.
Mongo creates an index on the _id field of each document by default.
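A small sketch with the Node.js MongoDB driver to contrast the two approaches (the connection string, database and collection names are made up for illustration):
import { MongoClient, ObjectId } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
await client.connect();
const items = client.db('shop').collection('items');
const ids: ObjectId[] = [ /* a handful of ObjectIds */ ];

// One query per id: N round trips to the server.
const oneByOne = await Promise.all(ids.map((_id) => items.findOne({ _id })));

// One query total: the default _id index is used to look up every value in the list.
const inOneGo = await items.find({ _id: { $in: ids } }).toArray();
Both variants use the _id index for each lookup; the difference is almost entirely in the number of round trips.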

Best strategy for creating MongoDB indexes for Parse Server?

I have a Parse application with a number of possible queries triggered by Cloud functions. I understand that creating indexes can speed up those queries. But I'm wondering what the best strategy for indexing is.
For example, let's say these are some typical queries:
new Parse.Query('Customer')
.equalTo('email', email)
.equalTo('zipCode', zipCode)
new Parse.Query('Customer')
.equalTo('email', email)
.equalTo('areaCode', areaCode)
I could make a compound index for every such query (email+zipCode and email+areaCode, etc), but I have a lot of different queries, so this would be difficult to maintain, and my understanding is that too many indexes can hurt write performance.
If I instead make a single index for every field that is included in queries (email, areaCode, and zipCode for these examples), will all these compound queries be able to make use of the single indexes? I read about "index intersection" in the MongoDB docs, but it's not clear whether this is something that will happen automatically in a Parse query.
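For concreteness, the two strategies being compared look roughly like this at the MongoDB level (Parse's Customer class maps to a collection here; the field names are the ones from the queries above, and this is only an illustration, not a recommendation):
import { MongoClient } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
await client.connect();
const customers = client.db('parse').collection('Customer');

// Strategy 1: one compound index per query shape. Fast and predictable,
// but the number of indexes grows with the number of query shapes.
await customers.createIndex({ email: 1, zipCode: 1 });
await customers.createIndex({ email: 1, areaCode: 1 });

// Strategy 2: one single-field index per queried field. The planner may
// combine them via index intersection, but it often just picks the most
// selective one (email here) and filters the remaining predicates from
// the documents it fetches.
await customers.createIndex({ email: 1 });
await customers.createIndex({ zipCode: 1 });
await customers.createIndex({ areaCode: 1 });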

Eclipselink batch fetch VS join fetch

When should I use "eclipselink.join-fetch", when should I use "eclipselink.batch" (batch type = IN)?
Is there any limitations for join fetch, such as the number of tables being fetched?
The answer is always specific to your query, the specific use case, and the database, so there is no hard rule on when to use one over the other, or whether to use either at all. You cannot determine what to use unless you are serious about performance and willing to test both under production load conditions - just like any query performance tweaking.
Join-fetch is just what it says, causing all the data to be brought back in the one query. If your goal is to reduce the number of SQL statements, it is perfect. But it comes at a cost, as inner/outer joins, Cartesian joins etc. can increase the amount of data being sent across and the work the database has to do.
Batch fetching is one extra query (1+1), and can be done a number of ways. IN collects all the foreign key values and puts them into one statement (more statements if you have >1000 values on Oracle). JOIN is similar to join-fetch, as it uses the criteria from the original query to select over the relationship, but it won't return as much data, as it only fetches the required rows. EXISTS is very similar, using a subquery to filter.

How can I use Mongo $hint so the query would NOT use any index

This question is somewhat strange, but I bumped into it in a current implementation of mine:
I want to privilege inserts over everything else in my application, and it came to my mind that the $hint command could also be used to make Mongo NOT use an index.
Is that possible? Is that a sound question, considering what $hint is supposed to do?
Thanks
To force the query optimizer to not use indexes (do a table scan), use:
db.collection.find().hint({$natural:1})
Not sure if this achieves what you want (prioritizing inserts over other activity), though.
I don't think inserts work the way you think.
An insert will catalogue its needed fields into the btrees, depending on the number of indexes on the collection itself. As such, to privilege inserts you would have to destroy all indexes on the collection.
Using $natural order hinting will therefore make no difference to the balance of reads and writes. Not to mention that $natural order is just the disk insertion order, an index you cannot effectively use in a query, so hinting it will force a full table scan.
However, that does not actually privilege anything, since maintaining the btrees is part of inserting data, so there is no way, via indexes, to prioritise inserts.
Also, the write lock and the read lock are two completely different things, so again I am not sure your question makes sense.
Are you perhaps looking for an atomic lock to ensure that you update or insert data before it is read?

What is the fundamental difference between MongoDB / NoSQL which allows faster aggregation (MapReduce) compared to MySQL

Greetings!
I have the following problem. I have a table with a huge number of rows which I need to search and then group the search results by many parameters. Let's say the table is
id, big_text, price, country, field1, field2, ..., fieldX
And we run a request like this
SELECT .... WHERE
[use FULLTEXT index to MATCH() big_text] AND
[use some random clauses that anyway render indexes useless,
like: country IN (1,2,65,69) and price<100]
This will be displayed as search results, and then we need to take these results and group them by a number of fields to generate search filters
(results) GROUP BY field1
(results) GROUP BY field2
(results) GROUP BY field3
(results) GROUP BY field4
This is a simplified case of what I need; the actual task at hand is even more problematic, for example sometimes the first results query also does its own GROUP BY. An example of such functionality would be this site
http://www.indeed.com/q-sales-jobs.html
(search results plus filters on the left)
I've done, and am still doing, deep research on how MySQL functions, and at this point I totally don't see this being possible in MySQL. Roughly speaking, a MySQL table is just a heap of rows lying on the HDD, and indexes are tiny versions of these tables sorted by the index field(s) and pointing to the actual rows. That's a super oversimplification of course, but the point is I don't see how it is possible to fix this at all, i.e. how to use more than one index, or be able to do fast GROUP BYs (by the time the query reaches GROUP BY the index is completely useless because of range searches and other things). I know that MySQL (and similar databases) have various helpful things such as index merges, loose index scans and so on, but this is simply not adequate - the queries above will still take forever to execute.
I was told that the problem can be solved by NoSQL, which makes use of some radically new ways of storing and dealing with data, including aggregation tasks. What I want is a quick, schematic explanation of how it does this. I just want a quick glimpse at it so that I can really see that it is possible, because at the moment I can't understand how it could be done at all. I mean, data is still data and has to be placed in memory, and indexes are still indexes with all their limitations. If this is indeed possible, I'll then start studying NoSQL in detail.
PS. Please don't tell me to go and read a big book on NoSQL. I've already done this for MySQL only to find out that it is not usable in my case :) So I wanted to have some preliminary understanding of the technology before getting a big book.
Thanks!
There are essentially 4 types of "NoSQL", but three of the four are actually similar enough that an SQL syntax could be written on top of them (including MongoDB and its crazy query syntax [and I say that even though JavaScript is one of my favorite languages]).
Key-Value Storage
These are simple NoSQL systems, like Redis, that are basically really fancy hash tables. You have a value you want to get later, so you assign it a key and stuff it into the database; you can only query a single object at a time, and only by a single key.
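A minimal sketch of that access pattern with the node-redis client (the key and value here are invented for illustration):
import { createClient } from 'redis';

const redis = createClient({ url: 'redis://localhost:6379' });
await redis.connect();

// Store a value under a key you choose...
await redis.set('customer:42', JSON.stringify({ email: 'a@example.com', zipCode: '90210' }));

// ...and later get it back, but only by that exact key. There is no
// "find all customers in this zip code" in a plain key-value store.
const raw = await redis.get('customer:42');
const customer = raw ? JSON.parse(raw) : null;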
You definitely don't want this.
Document Storage
This is one step above Key-Value Storage and is what most people talk about when they say NoSQL (such as MongoDB).
Basically, these are objects with a hierarchical structure (like XML files, JSON files, and any other sort of tree structure in computer science), but the values of different nodes in the tree can be indexed. They have a higher "speed" on lookups relative to traditional row-based SQL databases because they sacrifice performance on joining.
If you're looking up data in your MySQL database from a single table with tons of columns (assuming it's not a view/virtual table), and assuming you have it indexed properly for your query (that may be your real problem here), document databases like MongoDB won't give you any Big-O benefit over MySQL, so you probably don't want to migrate over for just this reason.
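To make the "indexable nodes of a tree" point concrete, here is a small sketch with the Node.js MongoDB driver (the collection name, document shape, and field names are invented for illustration):
import { MongoClient } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
await client.connect();
const listings = client.db('demo').collection('listings');

// A hierarchical document: nested nodes instead of flat columns.
await listings.insertOne({
  title: 'Sales position',
  price: 80,
  location: { country: 65, city: 'Berlin' },
  tags: ['sales', 'junior'],
});

// Any node of that tree can be indexed and queried directly,
// including nested fields and array members.
await listings.createIndex({ 'location.country': 1, price: 1 });
const cheap = await listings
  .find({ 'location.country': { $in: [1, 2, 65, 69] }, price: { $lt: 100 } })
  .toArray();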
Columnar Storage
These are the most like SQL databases. In fact, some (like Sybase) implement an SQL syntax while others (Cassandra) do not. They store the data in columns rather than rows, so adding and updating are expensive, but most queries are cheap because each column is essentially implicitly indexed.
But, if your query can't use an index, you're in no better shape with a Columnar Store than a regular SQL database.
Graph Storage
Graph Databases expand beyond SQL. Anything that can be represented with graph theory (including Key-Value, Document, and SQL databases) can be represented by a Graph Database, like neo4j.
To do this, Graph Databases make joins as cheap as possible (as opposed to Document Databases); they have to, because even a simple "row" query would require many joins to retrieve.
A table-scan type query would probably be slower than in a standard SQL database because of all of the extra joins needed to retrieve the data (which is stored in a disjointed fashion).
So what's the solution?
You've probably noticed that I haven't answered your question, exactly. I'm not saying "you're finished," but the real problem is how the query is being performed.
Are you absolutely sure you can't better index your data? There are things such as Multiple Column Keys that could improve the performance of your particular query. Microsoft's SQL Server has a full text key type that would be applicable to the example you provided, and PostgreSQL can emulate it.
The real advantage most NoSQL databases have over SQL databases is Map-Reduce -- specifically, the integration of a fast, full, Turing-complete language in which query constraints can be written. The querying function can be written to quickly "fail out" of non-matching records, or to return early with a success on records that meet "priority" requirements, while doing the same in SQL is a bit more cumbersome.
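As a rough sketch of the idea, here is MongoDB's classic mapReduce command issued through the driver's generic command interface (the collection and field names are invented, and note that this command has since been deprecated in favor of the aggregation pipeline):
import { Code, MongoClient } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
await client.connect();
const db = client.db('demo');

// map and reduce are arbitrary JavaScript executed inside the server;
// the map function can "fail out" of a record simply by not emitting.
const mapFn = new Code(`function () {
  if (this.price >= 100) return;   // skip non-matching records cheaply
  emit(this.country, 1);           // group key -> partial count
}`);
const reduceFn = new Code(`function (key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}`);

const result = await db.command({
  mapReduce: 'listings',
  map: mapFn,
  reduce: reduceFn,
  query: { country: { $in: [1, 2, 65, 69] } },
  out: { inline: 1 },
});
console.log(result.results); // e.g. [ { _id: 65, value: 12 }, ... ]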
Finally, however, the exact problem you're trying to solve: text search with optional filtering parameters, is more generally known as a search engine, and there are very specialized engines to handle this particular problem. I'd recommend Apache Solr to perform these queries.
Basically, dump the text field, the "filter" fields, and the primary key of the table into Solr, let it index the text field, run the queries through it, and if you need the full record after that, query your SQL database for the specific keys you got back from Solr. It uses some more memory and requires a second process, but it will probably best suit your needs here.
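A hedged sketch of that flow (the Solr core name, URL, field names, and the mysql2 client are all assumptions; the filters mirror the example query from the question):
import mysql from 'mysql2/promise';

// 1) Ask Solr for matching primary keys: the full-text match plus the cheap filters.
const params = new URLSearchParams({
  q: 'big_text:"sales manager"',                   // full-text part
  fq: 'country:(1 2 65 69) AND price:[* TO 100]',  // structured filters
  fl: 'id',                                        // only return the primary key
  rows: '50',
});
const res = await fetch(`http://localhost:8983/solr/listings/select?${params}`);
const ids: number[] = (await res.json()).response.docs.map((d: { id: number }) => d.id);

// 2) Fetch the full rows from MySQL only for those keys.
const conn = await mysql.createConnection({ host: 'localhost', user: 'app', database: 'shop' });
const [rows] = await conn.query('SELECT * FROM listings WHERE id IN (?)', [ids]);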
Why all of this text to get to this answer?
Because the title of your question doesn't really have anything to do with the content of your question, so I answered both. :)