Search Logic removing records with no association from results when ordering by that association - searchlogic

I'm using Searchlogic to filter and order my results, but it removes records from the results when I order by an association that is not present on every record.
For example, say I have a User model that can have one Vehicle but does not have to. If I have a results table that can be ordered by the user's vehicle make, I would expect all users without a vehicle record to be treated as having an empty string, and therefore ordered at the beginning, followed by the users that do have vehicles, ordered by make name.
Unfortunately, all the user records that do not have a vehicle are removed from the results.
Is there any way around this while still using Searchlogic? I find it extremely useful.

I think you'll have to explicitly assign a default vehicle that has an empty name
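The records most likely disappear because ordering by an association makes Searchlogic join the vehicles table with an INNER JOIN, which excludes users that have no vehicle row. In plain SQL terms (table and column names here are assumptions), the difference looks like this:

-- INNER JOIN: users without a vehicle are dropped from the result
SELECT users.*
FROM users
INNER JOIN vehicles ON vehicles.user_id = users.id
ORDER BY vehicles.make ASC;

-- LEFT OUTER JOIN: users without a vehicle are kept, with a NULL make;
-- whether NULLs sort first or last by default depends on the database
-- (e.g. first in MySQL/SQLite, last in PostgreSQL for ASC)
SELECT users.*
FROM users
LEFT OUTER JOIN vehicles ON vehicles.user_id = users.id
ORDER BY vehicles.make ASC;

If you can get the ordering scope to use an outer join (or fall back to a hand-written named scope for that one ordering), the vehicle-less users stay in the results; otherwise the default empty-named vehicle suggested above achieves the same effect.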

Related

Sqlalchemy way to handle select based on outdated data

Assume I have a table sets with a filters field containing an array of key-value mappings, a table items to which a select query must be applied to extract rows based on these filters, and an association table for the M:M relation that links each set with each item. I am looking for a method or mechanism to cancel the select query if sets.filters was updated in the meantime; otherwise the M:M relation will be built incorrectly, based on filters that are no longer current.
The concrete scenario where the problem occurs is:
1. Receive a file with item data, parse it, and insert into items, returning the new relevant ids (primary keys here);
2. After insertion, select the filters from the relevant sets;
3. Take the item ids and select from items using those filters;
4. Update the M:M association table for all the items returned in step 3.
Unfortunately, between steps 3 and 4 (or even earlier), an API call makes an update to one of the sets rows, changing its filters. As a result the M:M table is invalid, because a filter was changed (say the filters contained a weight <= 100 kilos expression, but after the update it has become weight <= 50 kilos; any new items with weight greater than 50 should then obviously not be in the M:M table).
Is there some efficient way to cancel the select query on items during the transaction? Or maybe there is a more robust query to use. My idea is to roll back the changes after the fact by checking the sets.modified_at column, but that seems like extra work that wastes disk and CPU time.
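A minimal sketch of the optimistic variant mentioned above (re-checking sets.modified_at instead of rolling back afterwards), assuming PostgreSQL-style SQL; the table, column and parameter names are illustrative:

-- Step 2: read the filters together with their last-modified timestamp
SELECT id, filters, modified_at
FROM sets
WHERE id = :set_id;

-- Step 4: only build the M:M rows if the filters are still the ones read in step 2;
-- if this INSERT affects zero rows, the filters changed and steps 2-4 are redone
INSERT INTO set_items (set_id, item_id)
SELECT :set_id, i.id
FROM items i
WHERE i.id IN (:new_item_ids)
  AND i.weight <= 100            -- condition built from the filters read in step 2
  AND EXISTS (
        SELECT 1 FROM sets s
        WHERE s.id = :set_id
          AND s.modified_at = :modified_at_from_step_2
      );

A pessimistic alternative is to SELECT ... FOR UPDATE the sets row in step 2, so that the API call's update blocks until the transaction that builds the M:M rows has committed.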

DynamoDB model that supports queries on any given attribute

The application we're designing has a function where users can dynamically add new elements to an entity that then need to be efficiently searched. The number of these elements is essentially unlimited. Our team has been looking at DynamoDB as a data store option, and we've been wrestling with the key/value model and how to get this dynamic data under an index for efficient querying.
I think I have a single-table solution that handles the problem elegantly and also allows for querying on any given attribute in the data store, but am disturbed that I can't find an example of it anywhere else. Hopefully it's not fundamentally flawed in some way - I would appreciate any critique!
The model is essentially the Entity-Attribute-Value approach used for adding dynamic or sparse data to RDBMSs. So instead of storing different entities/objects in a DynamoDB table like so:
PK       SK    SK-1     SK-2     SK-3     SK-N...          PK       SK    SK-1     SK-N...
               Key      Key      Key      Key        -->                  Name     Money
Entity   Id    Value    Value    Value    Value            Person   22    Fred     30000
... which lets me query things like "all persons where name = Fred", but where you would eventually run out of sort key indexes, and you would need to know which index goes with which key before you query. Instead, the data could be stored in EAV format like so:
PK    SK & GSI-PK    GSI-SK            PK    SK & GSI-PK     GSI-SK
Id    Entity#Key     Value       -->   22    Person#Name     Fred
Id    Entity#Key     Value             22    Person#Money    30000
Id    Entity#Key     Value             22    Person#Sex      M
Id    Entity#Key     Value             22    Person#DOB      09/00
Now, with one global secondary index (GSI-1 PK over Entity.Key and GSI-1 SK over Value) I can do a range search on any value for any key and get a list of Ids that match. Users can add their attributes or even entirely new entities and have them persisted in a way that's instantly indexed without us having to revamp the DynamoDB schema.
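For instance, if you use PartiQL, a range query against that GSI might look roughly like the following (the table name entities, the index name GSI-1, and the attribute names sk and value are assumptions standing in for the layout above):

SELECT *
FROM "entities"."GSI-1"
WHERE "sk" = 'Person#Name'
  AND "value" BETWEEN 'F' AND 'G'

Each matching item carries the entity's Id, so the result is exactly the list of matching ids described above.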
The one major downside to this approach that I can think of is that data returned from a query on an Entity#Key-Value only contains values for that key and the entity Id, not the entire entity. That's fine for charts and graphs but a problem if you want to get a grid-type result with one query. I also worry about hot partition keys on the index, but hopefully we could solve that with intelligent write sharding.
That's pretty much it. With a few tweaks the model can be extended to support logging all changes on each key and allow some nice time-series queries against those changes. My question is whether anyone has found it useful to take an EAV-type approach to a KV store like DynamoDB, or whether there's another way to handle querying a dynamic schema?
You can have the pk be the id of the entity, and then a sort key of {attributeName}. You may still want to keep the base entity as an item with fields like createdAt, etc.
So you might have:
PK           SORT                Attributes
#Entity#22   #Entity#Details     createdAt=2020
#Entity#22   #Attribute#name     key=name value=Fred
#Entity#22   #Attribute#money    key=money value=30000
To get all the attributes of an entity you simply do a query on pk = {id} with no filter. You cannot dynamically sort by every given attribute; this is exactly what DynamoDB is not good at, I repeat, that case is exactly what NoSQL performs poorly at.
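Concretely, fetching everything for entity 22 in the layout above is a single partition query; in PartiQL form it might look like this (assuming the table is called entities):

SELECT *
FROM "entities"
WHERE PK = '#Entity#22'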
What you can do is use streaming to do aggregation. So you can for instance store the top 10 wealthiest people:
PK               SORT    Attributes
#Money#Highest   #1      id=#Entity#22 value=30000
#Money#Highest   #2      id=#Entity#52 value=30000
Which you would calculate with DynamoDB Streams. But you couldn't dynamically index values: DynamoDB works by effectively copying data from one form to another so that it can be retrieved efficiently. So you would be copying your entire database for each new attribute you wanted to search by, or otherwise you would have to use Scans, and that wouldn't make any sense, because you would get no benefit from using DynamoDB if all you ever did was Scan all the time.
Your processes need to be very well understood to make good use of DynamoDB. If you want to index data at will and run all sorts of different queries, you probably want an SQL database or Elasticsearch.

@BatchFetch type JOIN

I'm confused about this annotation on an entity field whose type is another entity:
@BatchFetch(value = BatchFetchType.JOIN)
In the docs of EclipseLink for BatchFetch they explain it as following:
For example, consider an object with an EMPLOYEE and PHONE table in
which PHONE has a foreign key to EMPLOYEE. By default, reading a list
of employees' addresses requires n queries, one for each
employee's address. With batch fetching, you use one query for all the
addresses.
but I'm confused about the meaning of specifying BatchFetchType.JOIN. I mean, doesn't BatchFetch already do a join when it retrieves the list of records associated with the employee? The address/phone records are retrieved using the foreign key, so that is a join in itself, right?
The BatchFetch type is an optional parameter, and for JOIN the docs say:
JOIN – The original query's selection criteria is joined with the
batch query
What does this mean? Isn't the batch query a join itself?
Joining the relationship and returning the referenced data with the main data is a fetch join. So a query that brings in 1 Employee that has 5 phones results in 5 rows being returned, with the Employee data duplicated for each row. When that is less than ideal, say for a query over 1000 employees, you resort to a separate batch query for the phone numbers. Such a query runs once to return the 1000 employee rows, and then a second query runs to return all the employee phones needed to build the Employees that were read in.
The three batch query types listed here determine how this second batch query gets built (sketched below). They will perform differently based on the data and database tuning.
JOIN - Works much the same way a fetch join would, except it only returns the Phone data.
EXISTS - This causes the DB to execute the initial query on Employees, but uses the data in an EXISTS subquery to then fetch the Phones.
IN - EclipseLink aggregates all the Employee IDs or values used to reference Phones, and uses them to filter Phones directly.
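For a rough sense of the difference, the second (batch) query for the Employee/Phone example might be shaped like the following, assuming the original query selected employees by some criterion such as e.SALARY > 50000 (these are illustrative sketches, not the literal SQL EclipseLink generates):

-- JOIN: the original selection criteria is joined into the batch query,
-- but only PHONE columns come back
SELECT p.* FROM PHONE p
JOIN EMPLOYEE e ON p.EMP_ID = e.ID
WHERE e.SALARY > 50000;

-- EXISTS: the original criteria runs inside an EXISTS subquery
SELECT p.* FROM PHONE p
WHERE EXISTS (SELECT 1 FROM EMPLOYEE e
              WHERE e.ID = p.EMP_ID AND e.SALARY > 50000);

-- IN: the ids of the Employees already read are collected in memory
-- and used to filter PHONE directly
SELECT p.* FROM PHONE p
WHERE p.EMP_ID IN (1, 2, 3);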
Best way to find out is always to try it out with SQL logging turned on to see what it generates for your mapping and query. Since these are performance options, you should test them out and record the metrics to determine which works best for your application as its dataset grows.

Filter and display database audit / changelog (activity stream)

I'm developing an application with SQLAlchemy and PostgreSQL. Users of the system modify data in 8 or so tables. Consider this contrived example schema:
I want to add visible logging to the system to record what has changed, but not necessarily how it has changed. For example: "User A modified product Foo", "User A added user B" or "User C purchased product Bar". So basically I want to store:
Who made the change
A message describing the change
Enough information to reference the object that changed, e.g. the product_id and customer_id when an order is placed, so the user can click through to that entity
I want to show each user a list of recent and relevant changes when they log in to the application (a bit like the main timeline in Facebook etc). And I want to store subscriptions, so that users can subscribe to changes, e.g. "tell me when product X is modified", or "tell me when any products in store S are modified".
I have seen the audit trigger recipe, but I'm not sure it's what I want. That audit trigger might do a good job of recording changes, but how can I quickly filter it to show recent, relevant changes to the user? Options that I'm considering:
Have one column per ID type in the log and subscription tables, with an index on each column
Use full text search, combining the ID types as a tsvector
Use an hstore or json column for the IDs, and index the contents somehow
Store references as URIs (strings) without an index, and walk over the logs in reverse date order, using application logic to filter by URI
Any insights appreciated :)
Edit: It seems what I'm talking about is an activity stream. The suggestion in this answer to filter by time first sounds pretty good.
Since the objects all use uuid for the id field, I think I'll create the activity table like this:
Have a generic reference to the target object, with a uuid column with no foreign key, and an enum column specifying the type of object it refers to.
Have an array column that stores generic uuids (maybe as text[]) of the target object and its parents (e.g. parent categories, store and organisation), and search the array for matching subscriptions. That way a subscription for a parent category can match a child in one step (denormalised).
Put a btree index on the date column, and (maybe) a GIN index on the array UUID column.
I'll probably filter by time first to reduce the amount of searching required. Later, if needed, I'll look at using GIN to index the array column (this partially answers my question "Is there a trick for indexing an hstore in a flexible way?")
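In DDL terms, that design might look roughly like this (a sketch assuming PostgreSQL; the table and column names are guesses consistent with the query below):

CREATE TABLE activity (
    id          uuid PRIMARY KEY,
    created     timestamptz NOT NULL DEFAULT now(),
    object_type text NOT NULL,      -- enum-like column naming the kind of target object
    object_ref  uuid[] NOT NULL,    -- target object uuid plus its parents (category, store, organisation)
    title       text,               -- denormalised title of the target object
    message     text NOT NULL       -- human-readable description of the change
);

CREATE TABLE subscription (
    user_id    uuid NOT NULL,
    object_id  uuid NOT NULL,       -- matches any element of activity.object_ref
    subscribed boolean NOT NULL DEFAULT true
);

CREATE INDEX activity_created_idx    ON activity USING btree (created);
CREATE INDEX activity_object_ref_idx ON activity USING gin (object_ref);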
Update: This is working well. The SQL to fetch a timeline looks something like this:
SELECT *
FROM (
SELECT DISTINCT ON (activity.created, activity.id)
*
FROM activity
LEFT OUTER JOIN unnest(activity.object_ref) WITH ORDINALITY AS act_ref
ON true
LEFT OUTER JOIN subscription
ON subscription.object_id = act_ref.act_ref
WHERE activity.created BETWEEN :lower_date AND :upper_date
AND subscription.user_id = :user_id
ORDER BY activity.created DESC,
activity.id,
act_ref.ordinality DESC
) AS sub
WHERE sub.subscribed = true;
Joining with unnest(...) WITH ORDINALITY, ordering by ordinality, and selecting distinct on the activity ID filters out activities that have been unsubscribed from at a deeper level. If you don't need to do that, then you could avoid the unnest and just use the array containment @> operator, and no subquery:
SELECT *
FROM activity
JOIN subscription ON activity.object_ref @> ARRAY[subscription.object_id]
WHERE subscription.user_id = :user_id
AND activity.created BETWEEN :lower_date AND :upper_date
ORDER BY activity.created DESC;
You could also join with the other object tables to get the object titles - but instead, I decided to add a title column to the activity table. This is denormalised, but it doesn't require a complex join with many tables, and it tolerates objects being deleted (which might be the action that triggered the activity logging).

Swap the order of items in a SQLite database

I retrieve an ordered list of items from a table of items in an SQLite database. How can I swap the ids so that the order of two items in the table is swapped?
The id shouldn't determine position or ordering. It should be an immutable identifier.
If you need to represent order in the database you need to create a separate orderNumber column. A couple of options are (1) values that span a range or (2) a pointer to the next item (like a linked list); the first option is sketched below.
For ranges: spanning a range helps you avoid rewriting the orderNumber column for every item after the insert point. For example, within the range, the first insert gets 1, the second insert gets the maximum of the range, and a third insert between the first and second gets a mid-range number; if you reposition an item, you assign it the mid-point of the two items it lands between. One downside is that if the list gets enough churn (minimized by a large span) you may have to rebalance the ranges. The advantage of this solution is that you can get the ordered list just by ordering by this column in the SQL statement.
For a linked list: if the table has a next column that points to the id that comes after it in order, you only need to update a couple of rows to insert something. The upside is that it's simple. The downside is that you can't order in the SQL statement; you're relying on the code that reads the list to sort it.
One other variation is that you could pull the ordering out of that table altogether. For example, you could have an ordered-list table with listId, itemId, orderNumber. That allows you to have one or more logical ordered lists of the items in the table it references.
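A minimal sketch of the orderNumber/range option in SQLite, including the swap from the original question (table and column names are assumptions):

-- Items carry an explicit ordering column; id stays an immutable identifier
CREATE TABLE items (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    orderNumber INTEGER NOT NULL
);

-- Read the list in display order
SELECT id, name FROM items ORDER BY orderNumber;

-- Insert between two existing rows by taking the mid-point of their positions
INSERT INTO items (name, orderNumber)
VALUES ('new item', (:order_before + :order_after) / 2);

-- Swap two items: read both positions, then write them back the other way round
BEGIN;
SELECT id, orderNumber FROM items WHERE id IN (:id_a, :id_b);
UPDATE items SET orderNumber = :order_of_b WHERE id = :id_a;
UPDATE items SET orderNumber = :order_of_a WHERE id = :id_b;
COMMIT;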
Some other references:
How to store ordered items which often change position in DB
Best way to save a ordered List to the Database while keeping the ordering
https://dba.stackexchange.com/questions/5683/how-to-design-a-database-for-storing-a-sorted-list