In my table, I have several thousand records. This table stores books. Let's say some books are special, so they are given an ID greater than 1500000. Other books, which are not special, have ordinary IDs from 1 to 15879 (or anything less than 1500000).
Now, I want to get the biggest ID of a non-special book, using Entity Framework.
You need to filter for the books with an ID less than 1500000 and take the Max() of those:
context.Books.Where(b => b.Id < 1500000).Max(b => b.Id);
Context:
I have three Postgres tables:
authors - stores the id, author's full name, credentials, and awards
books - stores the id, title, book-length, summary, and an image of the front cover
authorBookRelations - connects Authors and Books by storing the author_id and book_id
An author can be connected to any book, but books are not connected. Books can have the same name, but each has its own id that is unique. Multiple authors can author a single book.
My question:
If I want to get all titles that match a given list of titles and are by a specific author what would be the best way to do that?
What I have so far:
Currently, I do two SELECT queries and a filtering function to "join" the two queries.
SELECT query #1 - get all of the book_ids associated with a particular author:
SELECT book_id FROM authorBookRelations WHERE author_id = 5
SELECT query #2 - get all of the titles that are in a given list of titles:
SELECT * FROM books WHERE title IN ('arbitraryTitle_1', 'arbitraryTitle_2', etc.)
Filter function (python) - filter titles for any that are not written by that specific author:
filtered_list = [x for x in query_2_results if x.id in query_1_results]
I get the correct books with this method, but can't help but feel that this is not a good way to do it/won't scale well. What would you suggest as a way to speed up this query? Instead of two separate db calls and a filtering function, could I do it all in one call by searching the list of titles against the filtered rows in table "books" that were filtered by the output from the query against authorBookRelations? ... that was horribly worded ... so something like this:
SELECT *
FROM (
    SELECT book_id
    FROM authorBookRelations
    WHERE author_id = 5) AS foobar
WHERE title IN ('arbitraryTitle_1', 'arbitraryTitle_2', etc.)
UPDATE:
Trying out this seems to have cut my total query/processing time by half:
select *
from (select *
      from books
      where id in (
          select book_id
          from authorBookRelations
          where author_id = 5
      )) as foo
where foo.title in ('arbitraryTitle_1', 'arbitraryTitle_2', etc.)
The performance problem will be the IN operator when the list has a large number of items.
For two or three items, PG can sometimes use an index to seek the data.
But when there are many more items, a scan will be the only option.
If you want to speed up this query, INSERT your list into a temporary table, then add an index and rewrite the query as a join between this temp table and your original query.
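A minimal sketch of that temp-table join. It uses Python's built-in sqlite3 as a stand-in for Postgres purely so the example is self-contained. The table and column names follow the question; the sample rows, the wanted_titles temp table, and the titles_by_author helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE authorBookRelations (author_id INTEGER, book_id INTEGER);
    -- Illustrative data: two different books share the title 'A'.
    INSERT INTO books VALUES (1, 'A'), (2, 'B'), (3, 'A'), (4, 'C');
    INSERT INTO authorBookRelations VALUES (5, 1), (5, 2), (7, 3), (5, 4);
""")


def titles_by_author(conn, author_id, titles):
    """Return (id, title) rows for books by author_id whose title is in titles."""
    # Load the title list into a temp table; the PRIMARY KEY gives it an index.
    conn.execute("DROP TABLE IF EXISTS temp.wanted_titles")
    conn.execute("CREATE TEMP TABLE wanted_titles (title TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO wanted_titles VALUES (?)",
        [(t,) for t in titles],
    )
    # One round trip: join books to the relation table and to the title list.
    return conn.execute(
        """
        SELECT b.id, b.title
        FROM books AS b
        JOIN authorBookRelations AS r ON r.book_id = b.id
        JOIN wanted_titles AS w ON w.title = b.title
        WHERE r.author_id = ?
        ORDER BY b.id
        """,
        (author_id,),
    ).fetchall()


result = titles_by_author(conn, 5, ["A", "C"])
print(result)
```

Book 3 also has the title 'A' but belongs to a different author, so it is filtered out by the join rather than by application code. Whether the temp table actually beats a plain IN list on Postgres depends on the list size and is worth measuring.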
I am trying to create a simple DB of Customers/Products/Orders to practice aggregation and similar skills. I need some help designing the relations between those collections correctly, so that I can actually store the details in the right way.
I'm going for a minimal build and minimal information:
Customer:
(1) customer_id ,
(2)orders (this will be an array with all the order_id's this customer did)
Product:
(1) product_id
Order:
(1) order_id ,
(2) products (an array with all the products in this order); I wonder how to add a quantity for each product,
(3) user_id (stores the user who placed this order), or do I not even need this field?
I hope someone can tell me whether this is the right way of thinking; I come from SQL and this is what I would do there, I guess.
// I'm not using embedded documents because if I want to change a specific field value, it would have to change in all places
The idea of the SaaS tool is to have dynamic tables with dynamic custom fields and values of different types. We were thinking of following the "force.com/salesforce.com" example, but it seems too complicated to maintain going forward, and its huge level of abstraction makes some reports hard to create. So we came up with a simple idea, but we want to be sure it is a reasonably good approach.
This is the architecture we have today (in a few steps).
Each tenant has its own separate database on the cluster (Postgres 12).
TABLE table: used to keep all of those tables as a reference. This entity has a ManyToOne relation to the META table and a OneToMany relation to the DATA table.
META table: used for metadata configuration. It has a OneToMany relation to FIELDS (which holds the name of each field, its type, e.g. TEXT/INTEGER/BOOLEAN/DATETIME etc., and the attribute value as a string, only as a reference).
DATA table: has a ManyToOne relation to TABLES and 50 nullable character varying columns named attribute1...50.
Example flow today:
When a user wants to open the data of a TABLE, e.g. "CARS", we load the META table with all the FIELDS (to get the fields for this query). The user specifies that he wants to query the Brand, Class, Year, and Price columns.
Our logic looks up the references for Brand, Class, Year, and Price in META > FIELDS, so we know that Brand = attribute2, Class = attribute5, Year = attribute6, and Price = attribute7.
We parse the request into a query, e.g. SELECT [attr...2,5,6,7] FROM DATA, and then show the results to the user. If the user decides to filter this data, e.g. Year > 2017 AND Class = 'A', we use SQL's CAST(), for example SELECT CAST(attribute6 AS int), attribute5 FROM DATA WHERE CAST(attribute6 AS int) > 2017 AND attribute5 = 'A';, so we can support most of SQL's features.
However, going forward we are a bit worried about:
Managing such an environment for more tenants as the number of tables grows (e.g. 50 per customer, with roughly 1-5 million rows per TABLE; 5 million is the maximum we allow, and for bigger data we have BigQuery), which gives us 50-250 million rows in a single DATA_X table. This might affect query performance, especially since we let users write simple WHERE conditions (less, equal, null, etc.) in an abstraction language, e.g. GET CARS [BRAND,CLASS,PRICE...] FILTER [EQ(CLASS,A),MT(YEAR,2017)], designed to be similar to JQL (Jira Query Language).
Transaction locks: we allow batch uploads of CSV into DATA_X, so when someone loads e.g. 1 GB of data, it more or less locks the DATA table for other systems.
Keeping many NULL columns, which can affect space a bit. For now we are not too worried: at TABLE creation the customer decides how many columns he wants, and based on that we assign the TABLE to one of the hardcoded entities DATA_5, DATA_10, DATA_15, DATA_20, DATA_30, DATA_50, where the number is the limit on attribute columns. Those entities are distinct, and we also support migration if the customer decides to switch from 5 to 10 attributes, etc.
We are at a very early stage, so we can and should make those changes before we scale. We knew this was most likely not the best approach, but we kept it to run the project for small customers, for whom it is working just fine for now.
We were also thinking about JSONB objects, but that is not an option, as we want to keep getting the data simple.
What do you think about this solution? (FYI, DATA has a composite PRIMARY key (ID, TABLEID) and a built-in column CreatedAt which is used for most of the queries, so there will be at most 3 indexes.)
If it seems bad, what would you recommend as an alternative, based on the details I shared (basically a schema-less RDBMS)?
IMHO, I anticipate issues when you want to join tables, and also from using CAST etc.
We followed the approach below, which may be of help to you.
We have a table called Cars, plus a couple of tables, CarsMeta and CarsExtension. The underlying Cars table has all the fields common to all tenants. The CarsMeta table records which types of columns you can use to extend the Cars entity. The CarsExtension table has columns like StringCol1...5, IntCol1....5, LongCol1...10.
This way, you can also filter the data easily:
If the filter is on the base table, perform the search there; if results are found, match their ids against the CarsExtension table to get the extended rows for this entity.
If the filter is on the extended fields, search the extension table and match the results against the base entity ids.
The extension table is organized like this:
id - UniqueId
entityid - uniqueid (points to the primary key of the entity)
StringCol1 - string,
...
IntCol1 - int,
...
In this case, it is easy to join the entity and get its data along with the extension fields.
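A rough sketch of that join, again using Python's sqlite3 so it can run on its own. The tenant_id and brand columns on Cars, the sample rows, and the mapping of "Class" to StringCol1 and "Year" to IntCol1 (which CarsMeta would normally record) are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base table: fields common to all tenants (columns here are illustrative).
    CREATE TABLE Cars (id INTEGER PRIMARY KEY, tenant_id INTEGER, brand TEXT);
    -- Extension table: typed spare columns, one row per base entity.
    CREATE TABLE CarsExtension (
        id INTEGER PRIMARY KEY,
        entityid INTEGER REFERENCES Cars(id),
        StringCol1 TEXT,     -- assumed to hold "Class" for this tenant
        IntCol1 INTEGER      -- assumed to hold "Year" for this tenant
    );
    INSERT INTO Cars VALUES (1, 10, 'Audi'), (2, 10, 'BMW'), (3, 10, 'Audi');
    INSERT INTO CarsExtension VALUES
        (1, 1, 'A', 2016), (2, 2, 'A', 2019), (3, 3, 'B', 2020);
""")

# Filter on extended fields (Year > 2017 AND Class = 'A') and join back
# to the base entity. IntCol1 is already an integer, so no CAST is needed.
rows = conn.execute(
    """
    SELECT c.id, c.brand, e.StringCol1 AS class, e.IntCol1 AS year
    FROM Cars AS c
    JOIN CarsExtension AS e ON e.entityid = c.id
    WHERE e.IntCol1 > ? AND e.StringCol1 = ?
    ORDER BY c.id
    """,
    (2017, "A"),
).fetchall()
print(rows)
```

Because the extension columns are typed, comparisons such as Year > 2017 work on real integers and can be indexed directly, which is the main difference from storing everything in character varying columns.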
If the table metadata and the data are inferred from separate tables, this will be difficult to maintain over a long period of time and with a huge volume of data.
HTH
I am making an application for a restaurant.
For some food items, there are some add-ons available - e.g. Toppings for Pizza.
My current design for Order Table-
FoodId || AddOnId
If a customer opts for multiple add-ons for a single food item (say Topping and Cheese Dip for a Pizza), how do I manage that?
Solutions I thought of -
Ids separated by commas in the AddOnId column (bad idea, I guess)
Saving every combination of add-ons as a separate add-on in the Addon Master Table.
Making another Trans table only for the add-ons of an ordered food item.
Please suggest.
PS - I searched a lot for a similar question but couldn't find one.
Your relationship works like this:
(1 Order) has (1 or more Food Items) which have (0 or more toppings).
The most detailed structure for this will be 3 tables (in addition to Food Item and Topping):
Order
Order to Food Item
Order to Food Item to Topping
Now, for some additional details. Let's start fleshing out the tables with some fields...
Order
    OrderId
    Cashier
    Server
    OrderTime
Order to Food Item
    OrderToFoodItemId
    OrderId
    FoodItemId
    Size
    BaseCost
Order to Food Item to Topping
    OrderToFoodItemId
    ToppingId
    LeftRightOrWhole
Notice how much information you can now store about an order that is not dependent on anything except that particular order?
While it may appear to be more work to maintain more tables, the truth is that it structures your data, allowing you many added advantages... not the least of which is being able to more easily compose sophisticated reports.
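The three tables above could be sketched like this, using Python's sqlite3 so the example is self-contained. The column types, the composite primary key on the topping table, and all sample values are assumptions added for illustration; the Food Item and Topping master tables are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Order" (
        OrderId INTEGER PRIMARY KEY,
        Cashier TEXT, Server TEXT, OrderTime TEXT
    );
    CREATE TABLE OrderToFoodItem (
        OrderToFoodItemId INTEGER PRIMARY KEY,
        OrderId INTEGER NOT NULL REFERENCES "Order"(OrderId),
        FoodItemId INTEGER NOT NULL,
        Size TEXT, BaseCost REAL
    );
    CREATE TABLE OrderToFoodItemToTopping (
        OrderToFoodItemId INTEGER NOT NULL
            REFERENCES OrderToFoodItem(OrderToFoodItemId),
        ToppingId INTEGER NOT NULL,
        LeftRightOrWhole TEXT,
        PRIMARY KEY (OrderToFoodItemId, ToppingId)  -- assumed key
    );
""")

# One order with a pizza (two toppings) and a drink (no toppings).
conn.execute('INSERT INTO "Order" VALUES (1, ?, ?, ?)',
             ("Ann", "Bob", "2024-01-01 12:00"))
conn.executemany("INSERT INTO OrderToFoodItem VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 100, "Large", 12.5), (2, 1, 200, "Small", 2.0)])
conn.executemany("INSERT INTO OrderToFoodItemToTopping VALUES (?, ?, ?)",
                 [(1, 7, "Whole"), (1, 8, "Left")])

# Toppings per ordered item; LEFT JOIN keeps items that have no toppings.
topping_counts = conn.execute("""
    SELECT i.OrderToFoodItemId, COUNT(t.ToppingId)
    FROM OrderToFoodItem AS i
    LEFT JOIN OrderToFoodItemToTopping AS t
           ON t.OrderToFoodItemId = i.OrderToFoodItemId
    WHERE i.OrderId = 1
    GROUP BY i.OrderToFoodItemId
    ORDER BY i.OrderToFoodItemId
""").fetchall()
print(topping_counts)
```

Each topping row hangs off one ordered item rather than off the order or the food item in general, which is what lets a single pizza carry any number of add-ons.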
You want to model two many-to-many relationships, by the sound of it.
I.e. many products (food items) can belong to many orders, and many addons can belong to many products:
Orders
    Id
Products
    Id
OrderLines
    Id
    OrderId
    ProductId
Addons
    Id
ProductAddons
    Id
    ProductId
    AddonId
Option 1 is certainly a bad idea as it breaks even first normal form.
Why don't you go for a many-to-many relationship?
Situation: one food can have many toppings, and one topping can be on many foods.
You have a food table, a toppings table, and a FoodToppings bridge table.
This is just a brief idea; expand the database to fit your requirements.
You're right, the first one is a bad idea, because it violates normal form and would be hard to maintain (e.g. if you remove an addon, you would need to parse strings to remove its id from each row, which is really slow).
There is nothing wrong with the table you already have, but its primary key will be (foodId, addonId) and not foodId alone.
Alternatively, you can add a separate "id" column to avoid a compound primary key.
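A small sketch of the compound-key variant, using Python's sqlite3. The table name OrderAddOn and the sample ids are hypothetical; only the (FoodId, AddOnId) key comes from the discussion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OrderAddOn (
        FoodId  INTEGER NOT NULL,
        AddOnId INTEGER NOT NULL,
        PRIMARY KEY (FoodId, AddOnId)   -- one row per food/add-on pair
    )
""")

# The same food item can appear on several rows, one per add-on.
conn.executemany("INSERT INTO OrderAddOn VALUES (?, ?)", [(1, 10), (1, 11)])

# Repeating an existing pair violates the compound key.
try:
    conn.execute("INSERT INTO OrderAddOn VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

addons_for_food_1 = [
    row[0]
    for row in conn.execute(
        "SELECT AddOnId FROM OrderAddOn WHERE FoodId = 1 ORDER BY AddOnId"
    )
]
print(duplicate_rejected, addons_for_food_1)
```

Removing an add-on then becomes a plain DELETE on AddOnId instead of string parsing on every row.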
I'm working on an iPhone app with a GAE backend. I currently have a database of ~8000 products and each product has 5 keywords, mined from reviews, that are the words used most often to describe the product. Once I deploy the app, I'd like to allow users to add new products, and add their 5 keywords to existing products. So, when "reviewing" an existing product, they would add their 5 words, and these would be reflected in the Top 5 words if they push a word over into the Top 5. These keywords will be selected via a large whitelist with indirect selection so I can control the user input. I'd like the application to scale to thousands of users without hitting my backend too hard.
My question is:
What's the most efficient database schema for keeping track of all the words for a product and calculating the top 5 for each product once it's updated?
My two ideas (which may be terrible):
Have a "words" column which contains a 2d array, one dimension is the word, the other is the count for that word. They would then be incremented/decremented as needed.
Have a database with each word as a column and each product as a row and the corresponding row/column would contain the count.
The easiest way to do this would be to have a 'tags' kind, defined something like this (you haven't specified a backend language, so I'm assuming Python):
from google.appengine.ext import db

class Tag(db.Model):
    # Tags should be child entities of Products and have key name based on the tag
    # eg, created with Tag(parent=a_product, key_name='awesome', ...)
    count = db.IntegerProperty(required=True, default=0)

    @classmethod
    def increment_tags(cls, product, tag_names):
        def _tx():
            # Fetch all requested tags at once; missing ones come back as None
            tags = cls.get_by_key_name(tag_names, parent=product)
            for i, tag in enumerate(tags):
                if tag is None:
                    # New tag
                    tags[i] = tag = cls(key_name=tag_names[i], parent=product)
                tag.count += 1
            db.put(tags)
        return db.run_in_transaction(_tx)

    @classmethod
    def get_top_product_tags(cls, product, num=5):
        return [x.key().name() for x
                in cls.all().ancestor(product).order('-count').fetch(num)]
The increment_tags method increments the count property on all the relevant tags. Since they all have the same parent entity, they're in the same entity group, so it can do this in a single transaction.
The get_top_product_tags method does a simple datastore query to find the num top ranked tags for a product.
You should use a normalized schema and let SQL and the database engine be your friend. Have a single table with a design like this:
create table KeywordUse
( AppID int
, UserID int
, Sequence int
, Word varchar(50) -- or whatever makes sense
)
You can also have an identity primary key if you like, but AppID + UserID + Sequence is a candidate key (i.e. the combination of these three must be unique).
To find the top 5 keywords for any app, do a SQL query like this:
select top 5
  count(AppID) as Frequency -- If you have an identity PK count that instead.
, Word
from KeywordUse
where AppID = @AppIDVariable...
group by Word, AppID
order by count(AppID) desc
If you are really, really worried about performance you could denormalize the results of this query into a table that shows the words for each app. Then you'd have to work out how often to refresh that snapshot.
REVISED ANSWER:
As Nick Johnson so generously pointed out, aggregate functions are not available in GQL. However, the philosophy of my answer remains unchanged. Let the database engine do its job.
The table should be AppID, Word, and Frequency. (AppID and Word are the PK.) Each use of a word is then added to its Frequency as it is applied. When you want to know the top five words for an app, you select by AppID = @Value and order by Frequency (descending) with a limit of 5.
You would need a separate table to track user keywords if that is important.
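A minimal sketch of the AppID / Word / Frequency design, written against Python's sqlite3 so it runs standalone. It is plain SQL rather than GQL, so it illustrates the table shape and the query only, not the App Engine datastore API. The table name KeywordFrequency, the helper names, and the sample words are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE KeywordFrequency (
        AppID     INTEGER NOT NULL,
        Word      TEXT    NOT NULL,
        Frequency INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (AppID, Word)
    )
""")


def add_words(conn, app_id, words):
    """Record one use of each word for the given app."""
    with conn:  # commit all increments together, or roll back on error
        for word in words:
            # Create the row on first use, then bump the counter.
            conn.execute(
                "INSERT OR IGNORE INTO KeywordFrequency (AppID, Word) "
                "VALUES (?, ?)",
                (app_id, word),
            )
            conn.execute(
                "UPDATE KeywordFrequency SET Frequency = Frequency + 1 "
                "WHERE AppID = ? AND Word = ?",
                (app_id, word),
            )


def top_words(conn, app_id, limit=5):
    """Return the most frequent words for an app, ties broken alphabetically."""
    rows = conn.execute(
        "SELECT Word FROM KeywordFrequency WHERE AppID = ? "
        "ORDER BY Frequency DESC, Word LIMIT ?",
        (app_id, limit),
    )
    return [row[0] for row in rows]


add_words(conn, 1, ["fast", "cheap", "sturdy"])
add_words(conn, 1, ["fast", "sturdy"])
add_words(conn, 1, ["fast"])
top = top_words(conn, 1, limit=2)
print(top)
```

Keeping a running count means the top-five lookup never has to aggregate over individual uses; it only reads and sorts the rows for one app.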