ISSUE IN CASE STATEMENT VS CORRELATED QUERY (DB2 Coursera IBM Course) [closed] - nosql

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am currently enrolled in the IBM course on Coursera. I have written two queries, one using a subquery and one using a CASE statement with a correlated query. The answers initially look the same, but the values in the other columns differ. Is the problem in my query?

I believe the issue is that multiple values of local_name have the same value of lang_num. That means that the sorting has ties.
The ORDER BY clause in SQL is not stable. This has a technical meaning: running ORDER BY twice on the same data is not guaranteed to produce the same results when keys have the same value. The reason is simple: SQL tables represent unordered sets, so there is no default ordering for rows that have the same key values.
The solution is easy. Just make sure that the order by keys uniquely identify each row:
order by lang_num, local_name
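A minimal sketch of the tie-breaking idea, using SQLite from Python (the table and data are made up to mirror the question; only the column names `local_name` and `lang_num` come from it). With `local_name` as a second key, every row's position is unique, so the output is reproducible:

```python
import sqlite3

# Hypothetical table mirroring the question: several local_name values
# share the same lang_num, so sorting on lang_num alone has ties.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE langs (local_name TEXT, lang_num INTEGER)")
conn.executemany(
    "INSERT INTO langs VALUES (?, ?)",
    [("Deutsch", 2), ("Svenska", 1), ("English", 2), ("Suomi", 1)],
)

# Sorting on lang_num alone leaves the order within each tie group up
# to the engine; adding local_name as a second key makes the full sort
# key unique per row and therefore deterministic.
rows = conn.execute(
    "SELECT local_name, lang_num FROM langs ORDER BY lang_num, local_name"
).fetchall()
print(rows)
# -> [('Suomi', 1), ('Svenska', 1), ('Deutsch', 2), ('English', 2)]
```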

Related

When is it better to pull all data and filter, or pull the data filtered [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am working with Spark (pyspark) and MongoDB as the database.
We are running into some performance issues, and the answers I found here were not directly related to Big Data.
We pull our entire MongoDB collection and then filter in Spark, and when we apply some filters, the columns we don't filter on are still present in the Spark DataFrame (I'll explain this last case better later).
My questions, besides a general understanding of the question's title:
Pull and filter, or filter and pull? If there is no clear answer, what are the parameters to start taking into account?
Say I have a Spark DataFrame with columns A, B, C and I filter only on C; would it be better (assuming I pulled everything) to then drop A and B?
Any links or readings regarding this are welcome.
1 - Pull filtered data; it is more efficient to pull only the data you want. Most databases are optimized for filter operations. The ideal case is when you can partition your data on your filtering columns (in your case, column C, I guess).
2 - I am not sure, but I think it's better to drop the columns you don't use, mainly to reduce the shuffle size if you have a shuffle. It also makes your DataFrame clearer.
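The difference between the two strategies can be sketched without Spark or MongoDB at all; the only thing that matters is how many rows and columns cross the wire. Here SQLite stands in for the remote datastore, and the table and filter value are invented:

```python
import sqlite3

# Stand-in for the remote datastore (MongoDB in the question). The
# docs table and its columns a, b, c are hypothetical.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE docs (a INTEGER, b INTEGER, c INTEGER)")
src.executemany("INSERT INTO docs VALUES (?, ?, ?)",
                [(i, i * 2, i % 10) for i in range(1000)])

# Strategy 1 (what the question describes): pull everything, then
# filter client-side. All 1000 rows are transferred.
pulled_all = src.execute("SELECT a, b, c FROM docs").fetchall()
filtered_late = [r for r in pulled_all if r[2] == 3]

# Strategy 2: push the filter (and the projection) to the source.
# Only the matching rows, and only the needed column, are moved.
filtered_early = src.execute("SELECT c FROM docs WHERE c = 3").fetchall()

print(len(pulled_all), len(filtered_late), len(filtered_early))
# -> 1000 100 100
```

With the real MongoDB Spark connector, applying the filter on the DataFrame before any action gives the connector a chance to push it down to the server, which is the same idea.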

Best ways to apply joins inside an UPDATE query in PostgreSQL (performance-wise) [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I want to update a table column after checking multiple conditions on multiple tables.
I think SET ... FROM is often more practical, and faster than a subquery. See this understandable example.
UPDATE t_name AS t
SET attr1 = r.attr2
FROM any_list AS r
WHERE t.anylist_id = r.id;
Joins are executed by the RDBMS with an execution plan that optimizes data loading and processing, unlike a subquery, where the database may run each query and load all of its data before doing the processing.
More on subqueries can be found here
The subquery will generally only be executed far enough to determine whether at least one row is returned, not all the way to completion like in joins. It is unwise to write a subquery that has any side effects (such as calling sequence functions); whether the side effects occur or not may be difficult to predict.
I don't know how well they perform, but I see at least the following two possibilities:
A subquery
UPDATE ... SET ... FROM syntax (which can be found in the PostgreSQL documentation, too)
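The first possibility, a correlated subquery, can be sketched end to end with SQLite from Python (the schema here is a guess based on the answer's table and column names; the data is invented). The PostgreSQL-specific `UPDATE ... FROM` equivalent is shown in a comment:

```python
import sqlite3

# Hypothetical schema loosely based on the answer: t_name rows point
# at any_list via anylist_id, and attr1 should be copied from attr2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE any_list (id INTEGER PRIMARY KEY, attr2 TEXT);
    CREATE TABLE t_name (id INTEGER PRIMARY KEY,
                         anylist_id INTEGER, attr1 TEXT);
    INSERT INTO any_list VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO t_name VALUES (10, 1, NULL), (11, 2, NULL);
""")

# Possibility 1: a correlated subquery (portable). In PostgreSQL the
# UPDATE ... FROM form of the same update would be:
#   UPDATE t_name AS t SET attr1 = r.attr2
#   FROM any_list AS r WHERE t.anylist_id = r.id;
conn.execute("""
    UPDATE t_name
    SET attr1 = (SELECT attr2 FROM any_list
                 WHERE any_list.id = t_name.anylist_id)
""")

result = conn.execute("SELECT id, attr1 FROM t_name ORDER BY id").fetchall()
print(result)
# -> [(10, 'alpha'), (11, 'beta')]
```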

Is mongodb a no-go for this application? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
Good sirs.
I've just started planning a new project, and it seems that I should stick with a relational database (even though I want to play with mongo). Tell me if I'm mistaken!
There will be box models, each of which can contain hundreds to thousands of items.
At any time, the user can move an item to another box.
for example, using some Railsy pseudocode...
item = Item(5676)
item.box // returns 24
item.update(box:25)
item.box // returns 25
This sounds like a simple SQL join table to me, but an expensive array manipulation operation for mongodb.
Or is removing an object out of one (huge) array and inserting it in another (huge) array not a big problem for mongo?
Thanks for any wisdom. I've only just started with mongo.
If you want to use big arrays, stay away from MongoDB. I say this from personal experience. There are two big problems with arrays. If they start to grow, the document grows and needs to be moved on disk, which is a very, very slow operation. Plus, if you need to scan an array to get to the 10000th element, that will be very slow, as it needs to check the 9999 before it.
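The relational design the question leans toward can be sketched briefly (SQLite via Python here; table and column names are made up, the ids come from the pseudocode). The point is that moving an item is a one-row UPDATE no matter how many items the boxes contain:

```python
import sqlite3

# Each item row carries a box_id, so "move item to another box" is a
# single-row UPDATE, not an array manipulation on either box.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boxes (id INTEGER PRIMARY KEY);
    CREATE TABLE items (id INTEGER PRIMARY KEY,
                        box_id INTEGER REFERENCES boxes(id));
    INSERT INTO boxes VALUES (24), (25);
    INSERT INTO items VALUES (5676, 24);
""")

# item.update(box: 25) from the pseudocode becomes:
conn.execute("UPDATE items SET box_id = 25 WHERE id = 5676")

box = conn.execute("SELECT box_id FROM items WHERE id = 5676").fetchone()
print(box)
# -> (25,)
```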

How to properly design classes [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm trying to design an application in which a user can perform multiple operations, like adding or deleting workstations or applications from the database. How do I design it?
To deal with this, I've found two solutions, but I haven't been able to choose the best one:
1st solution, 2nd solution.
Is this right?
Any brilliant suggestion, please?
Thanks a lot!
To think this out a bit: a user can have one or more operations, i.e. a one-to-many relationship with the operation.
An operation can delete one or more workstations or applications, again a one-to-many relationship.
Therefore, I think your 1st solution captures it nicely.
I think solution 1 is better. Solution 2 requires that you insert 2 records into the 2 many-to-many associative tables for each single operation. This is more complex, and probably unnecessarily so.
In solution 1 the Operation table becomes the only associative table: 1 operation, 1 insert. You may have to make some of the referencing keys nullable, depending on your requirements, but this is manageable. Simpler and sufficient for the needs you expressed.

why doesn't PostgreSQL have ON DUPLICATE KEY? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Is this a perpetually denied feature request or something specific to Postgres?
I've read that Postgres has potentially higher performance than InnoDB, but also potentially a larger chance of less serialization (I apologize that I don't have a source; please give that statement a wide berth because of my noobity and memory), and I wonder if it might have something to do with that.
Postgres is amazingly functional compared to MySQL, and that's why I've switched. Already I've cut down lines of code & unnecessary replication immensely.
This is just a small annoyance, but I'm curious whether it's considered unnecessary because of the UPDATE-then-INSERT workaround, or whether it's very difficult to develop (possibly versus the perceived added value), like boost::lockfree::queue's ability to pass "anything", or whether it's something else.
PostgreSQL committers are working on a patch to introduce "INSERT ... ON DUPLICATE KEY" functionality, which is functionally equivalent to an "upsert". MySQL and Oracle already have this functionality (in Oracle it is called "MERGE").
A link to the PostgreSQL archives where the functionality is discussed and a patch introduced: http://www.postgresql.org/message-id/CAM3SWZThwrKtvurf1aWAiH8qThGNMZAfyDcNw8QJu7pqHk5AGQ@mail.gmail.com
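For the record, that work eventually landed in PostgreSQL 9.5 as INSERT ... ON CONFLICT. SQLite later adopted the same syntax (version 3.24+), which lets us sketch an upsert from Python here; the `kv` table is made up:

```python
import sqlite3

# Upsert sketch using INSERT ... ON CONFLICT (PostgreSQL 9.5+ syntax,
# also supported by SQLite 3.24+). "excluded" refers to the row that
# would have been inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")

upsert = """
    INSERT INTO kv (k, v) VALUES (?, ?)
    ON CONFLICT (k) DO UPDATE SET v = excluded.v
"""
conn.execute(upsert, ("a", 1))   # plain insert
conn.execute(upsert, ("a", 2))   # key exists -> becomes an update

kv = conn.execute("SELECT k, v FROM kv").fetchall()
print(kv)
# -> [('a', 2)]
```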