OrientDB - Create Edges if not exist with subselect

I have a unique constraint on edge:
CREATE CLASS hasAssignee extends E
CREATE PROPERTY hasAssignee.out LINK Assignable
CREATE PROPERTY hasAssignee.in LINK User
CREATE INDEX UniqueHasAssignee ON hasAssignee(out,in) UNIQUE
I want to create multiple edges in one query if they don't yet exist. If they do exist, then either replace them or simply don't add them. Something like this, but without the possibility of errors:
CREATE EDGE hasAssignee FROM ( SELECT FROM Project WHERE id=:id ) TO ( SELECT FROM User WHERE login in :login )
From what I've noticed, UPSERT can deal with only one edge at a time (at least I couldn't produce a query parsable by Studio).
I see a few possible solutions (but I don't see anything like them in the docs):
something like CREATE IFNOTEXISTS
Soft indexes that don't throw an error but simply don't create additional edges
Remove the unique index, create duplicates, and somehow remove the duplicates after each query (?)
Is it possible to do something like that in OrientDB? Should I file a feature request?

There is an open feature request for this:
https://github.com/orientechnologies/orientdb/issues/4436
My suggestion is to vote on it.
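In the meantime, one workaround that matches the "replace them" option from the question is to delete any existing edges between the selected vertices and recreate them inside a transactional SQL batch. A sketch (same classes as above; verify the DELETE EDGE FROM/TO syntax against your OrientDB version):
BEGIN;
DELETE EDGE hasAssignee
  FROM ( SELECT FROM Project WHERE id = :id )
  TO   ( SELECT FROM User WHERE login IN :login );
CREATE EDGE hasAssignee
  FROM ( SELECT FROM Project WHERE id = :id )
  TO   ( SELECT FROM User WHERE login IN :login );
COMMIT;
Because the whole batch runs in one transaction, the unique index never sees a duplicate pair.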

Alternative way to list all part tables that have existing entries/tuples

I have a single master Subject table which has several part tables. One convenient thing about Table.delete() is that it displays a prompt with all existing entries that were created under that Subject. Aside from delete(), is there an alternative way to print which part table entries were created under a single Subject entry?
Thank you
To get the entry count of all the part tables restricted by some restriction (e.g. subject_name), you can do something like this:
restriction = {'subject_name': 'my_star_subject'}
for part_table in Subject.parts(as_objects=True):
    part_table_query = part_table & restriction
    print(f'{part_table.table_name}: {len(part_table_query)}')
It's a slow process, but I think I would do the following to see where entries were:
(subject.Subject & 'subject="<NAME>"').descendants(as_objects=True)
Not sure if you'd be better off with children (one level down) or descendants (all the way down). delete covers the full set of descendants, using table.delete_quick(get_count=True).
EDIT:
To just get the counts, you might want:
[print(i.table_name,len(i)) for i in (subject.Subject & 'subject="<NAME>"').descendants(as_objects=True)]

Filter and display database audit / changelog (activity stream)

I'm developing an application with SQLAlchemy and PostgreSQL. Users of the system modify data in 8 or so tables. Consider a contrived example schema of products, stores, customers, and orders.
I want to add visible logging to the system to record what has changed, but not necessarily how it has changed. For example: "User A modified product Foo", "User A added user B" or "User C purchased product Bar". So basically I want to store:
Who made the change
A message describing the change
Enough information to reference the object that changed, e.g. the product_id and customer_id when an order is placed, so the user can click through to that entity
I want to show each user a list of recent and relevant changes when they log in to the application (a bit like the main timeline in Facebook etc). And I want to store subscriptions, so that users can subscribe to changes, e.g. "tell me when product X is modified", or "tell me when any products in store S are modified".
I have seen the audit trigger recipe, but I'm not sure it's what I want. That audit trigger might do a good job of recording changes, but how can I quickly filter it to show recent, relevant changes to the user? Options that I'm considering:
Have one column per ID type in the log and subscription tables, with an index on each column
Use full text search, combining the ID types as a tsvector
Use an hstore or json column for the IDs, and index the contents somehow
Store references as URIs (strings) without an index, and walk over the logs in reverse date order, using application logic to filter by URI
Any insights appreciated :)
Edit: It seems what I'm talking about is an activity stream. The suggestion in this answer to filter by time first sounds pretty good.
Since the objects all use uuid for the id field, I think I'll create the activity table like this:
Have a generic reference to the target object, with a uuid column with no foreign key, and an enum column specifying the type of object it refers to.
Have an array column that stores generic uuids (maybe as text[]) of the target object and its parents (e.g. parent categories, store and organisation), and search the array for matching subscriptions. That way a subscription for a parent category can match a child in one step (denormalised).
Put a btree index on the date column, and (maybe) a GIN index on the array UUID column.
I'll probably filter by time first to reduce the amount of searching required. Later, if needed, I'll look at using GIN to index the array column (this partially answers my question "Is there a trick for indexing an hstore in a flexible way?")
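For reference, a sketch of the two tables this describes, with column names inferred from the query below (the enum values and exact types are assumptions):
-- placeholder enum values
CREATE TYPE object_type AS ENUM ('product', 'store', 'customer', 'order');

CREATE TABLE activity (
    id          uuid PRIMARY KEY,
    created     timestamptz NOT NULL DEFAULT now(),
    object_type object_type NOT NULL,  -- what kind of object the activity refers to
    object_ref  text[] NOT NULL,       -- uuids of the target object and its parents
    title       text                   -- denormalised display title (see the end of this answer)
);

CREATE TABLE subscription (
    user_id    uuid    NOT NULL,
    object_id  text    NOT NULL,       -- matched against entries of activity.object_ref
    subscribed boolean NOT NULL DEFAULT true
);

CREATE INDEX activity_created_idx ON activity USING btree (created);
CREATE INDEX activity_object_ref_idx ON activity USING gin (object_ref);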
Update: this is working well. The SQL to fetch a timeline looks something like this:
SELECT *
FROM (
    SELECT DISTINCT ON (activity.created, activity.id) *
    FROM activity
    LEFT OUTER JOIN unnest(activity.object_ref) WITH ORDINALITY AS act_ref
        ON true
    LEFT OUTER JOIN subscription
        ON subscription.object_id = act_ref.act_ref
    WHERE activity.created BETWEEN :lower_date AND :upper_date
        AND subscription.user_id = :user_id
    ORDER BY activity.created DESC,
        activity.id,
        act_ref.ordinality DESC
) AS sub
WHERE sub.subscribed = true;
Joining with unnest(...) WITH ORDINALITY, ordering by ordinality, and selecting distinct on the activity ID filters out activities that have been unsubscribed from at a deeper level. If you don't need to do that, then you could avoid the unnest and just use the array containment operator @> (wrapping the subscription ID in an array), with no subquery:
SELECT *
FROM activity
JOIN subscription
    ON activity.object_ref @> ARRAY[subscription.object_id]
WHERE subscription.user_id = :user_id
    AND activity.created BETWEEN :lower_date AND :upper_date
ORDER BY activity.created DESC;
You could also join with the other object tables to get the object titles - but instead, I decided to add a title column to the activity table. This is denormalised, but it doesn't require a complex join with many tables, and it tolerates objects being deleted (which might be the action that triggered the activity logging).

Insert record in table if does not exist in iPhone app

I am obtaining a JSON array from a URL and inserting the data into a table. Since the contents of the URL are subject to change, I want to make a second connection to the URL, check for updates, and insert any new records into my table using sqlite3.
The issues that I face are:
1) My table doesn't have a primary key.
2) The URL lists the changes on the same day. Hence, if I run my app multiple times, I get duplicate entries when I insert values into my database. I want a check so that duplicate entries for the day are removed. The problem could be solved by adding a constraint, but since the URL itself has duplicate values, I find that difficult.
The only way I can see to do it, if you have no primary key or anything unique to each record, is to go through the new entries as they come in and, for each one, check whether the exact same data already exists in the database. If it doesn't, you add it; if it does, you skip over it.
You could even create a unique key yourself for each entry as a concatenation of every column of the table. That way you can quickly check whether the entry already exists in the database.
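A sketch of that concatenated-key idea (the items table and its columns are hypothetical stand-ins for your schema):
-- derive a key from every data column and enforce uniqueness on it
ALTER TABLE items ADD COLUMN entry_key TEXT;
UPDATE items SET entry_key = title || '|' || body || '|' || day;
CREATE UNIQUE INDEX IF NOT EXISTS items_entry_key ON items(entry_key);

-- fast existence check before inserting a new record
SELECT EXISTS (SELECT 1 FROM items WHERE entry_key = ?1);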
I see two possibilities depending on your setup:
You have a column set up as UNIQUE (this can be through a PRIMARY KEY or not). In this case, you can use the ON CONFLICT clause:
http://www.sqlite.org/lang_conflict.html
If you find this construct a little confusing, you can instead use "INSERT OR REPLACE" or "INSERT OR IGNORE" as described here:
http://www.sqlite.org/lang_insert.html
You do not have a column set up as UNIQUE. In this case, you will need to SELECT first to check for duplicate data, and based on the result INSERT, UPDATE, or do nothing.
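For the first case, a minimal sketch (again with a hypothetical items table; a multi-column UNIQUE constraint works too):
CREATE TABLE IF NOT EXISTS items (
    title TEXT,
    body  TEXT,
    day   TEXT,
    UNIQUE (title, body, day)
);

-- rows that would violate the constraint are silently skipped;
-- INSERT OR REPLACE would overwrite the existing row instead
INSERT OR IGNORE INTO items (title, body, day) VALUES (?1, ?2, ?3);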
A more common & robust way to handle this is to associate a timestamp with each data item on the server. When your app interrogates the server it provides the timestamp corresponding to the last time it synced. The server then queries its database and returns all values that are timestamped later than the timestamp provided by the app. Then it also returns a new timestamp value for the app to store, to use on the next sync.
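A sketch of the server-side query for that scheme (updated_at and the parameter names are assumptions):
-- return everything changed since the client's last sync,
-- then hand back max(updated_at) as the new sync point
SELECT *
FROM items
WHERE updated_at > :last_sync
ORDER BY updated_at;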

Dynamic auditing of data with PostgreSQL trigger

I'm interested in using the following audit mechanism in an existing PostgreSQL database.
http://wiki.postgresql.org/wiki/Audit_trigger
but would like (if possible) to make one modification. I would also like to log the primary key's value so it can be queried later. So, I would like to add a field named something like "record_id" to the "logged_actions" table. The problem is that every table in the existing database has a different primary key field name. The good news is that the database has a very consistent naming convention: it's always <table name>_id. So, if a table is named "employee", the primary key is "employee_id".
Is there any way to do this? Basically, I need something like OLD.FieldByName(x) or OLD[x] to get the value out of the id field to put into the record_id field in the new audit record.
I do understand that I could just create a separate, custom trigger for each table that I want to keep track of, but it would be nice to have it be generic.
edit: I also understand that the key value does get logged in either the old/new data fields. But what I would like is to make querying for the history easier and more efficient. In other words:
select * from audit.logged_actions where table_name = 'xxxx' and record_id = 12345;
another edit: I'm using PostgreSQL 9.1
Thanks!
You didn't mention your version of PostgreSQL, which is very important when writing answers to questions like this.
If you're running PostgreSQL 9.0 or newer (or are able to upgrade), you can use the approach documented by Pavel here:
http://okbob.blogspot.com/2009/10/dynamic-access-to-record-fields-in.html
In general, what you want is to reference a dynamically named field in a record-typed PL/PgSQL variable like 'NEW' or 'OLD'. This has historically been annoyingly hard, and is still awkward, but is at least possible in 9.0.
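On 9.0 and later the hstore extension gives you exactly this kind of dynamic access, since hstore(OLD) turns the whole record into key/value pairs. A sketch (the logged_actions columns here are illustrative, not the exact recipe from the wiki):
CREATE EXTENSION hstore;  -- 9.1 syntax; on 9.0 run the contrib install script

CREATE OR REPLACE FUNCTION audit.log_with_record_id() RETURNS trigger AS $$
DECLARE
    pk_col text := TG_TABLE_NAME || '_id';  -- relies on your naming convention
    pk_val text;
BEGIN
    pk_val := hstore(OLD) -> pk_col;        -- dynamic field access (use NEW for INSERTs)
    INSERT INTO audit.logged_actions (table_name, record_id, action)
    VALUES (TG_TABLE_NAME, pk_val, TG_OP);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;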
Your other alternative - which may be simpler - is to write your audit triggers in plperlu, where dynamic field references are trivial.
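For comparison, a sketch of the same idea in PL/Perl, where the row arrives as a plain hash and the dynamic lookup is trivial (column names again illustrative):
CREATE OR REPLACE FUNCTION audit.log_with_record_id_perl() RETURNS trigger AS $$
    my $pk_col = $_TD->{table_name} . '_id';  # the naming convention again
    my $pk_val = $_TD->{old}{$pk_col};        # dynamic field access is a hash lookup
    spi_exec_query('INSERT INTO audit.logged_actions (table_name, record_id) VALUES ('
        . quote_literal($_TD->{table_name}) . ', ' . quote_literal($pk_val) . ')');
    return;  # intended as an AFTER ROW trigger, where the return value is ignored
$$ LANGUAGE plperlu;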

Create a new FileMaker layout showing unique records based on one field and a count for each

I have a table like this:
Application,Program,UsedObject
It can have data like this:
A,P1,ZZ
A,P1,BB
A,P2,CC
B,F1,KK
I'd like to create a layout to show:
Application,# of Programs
A,2
B,1
The point is to count the distinct programs.
For the life of me I can't make this work in FileMaker. I've created a summary field to count programs resetting after each group, but because it doesn't eliminate the duplicate programs I get:
A,3
B,1
Any help much appreciated.
Create a summary field:
cntApplication = Count of Application
Do this by going into Define Fields and creating a field called cntApplication of type Summary. In the options dialogue make the summary field a count of Application.
Now create a new layout with a sub-summary part and no body part. The sub-summary should be sorted on Application. Put the Application and cntApplication fields in the sub-summary. If you enter Browse mode and sort by Application you ought to get the data you want.
You can also create a calc field with the formula
GetSummary(cntApplication; Application)
This will allow you to use the total number of Applications within a record.
Since I also generate the data in this form, the solution I've adopted is to fill two tables in FileMaker. One provides the summary view, the other the detailed view.
I think that your problem is down to duplicate records and an inadequate key.
Create a text field called "App_Prog". In the options box set it to an auto-enter calc, unchecking the 'Do not replace...' option, and use the following calc:
Application & "_" & Program
Now create a self join to the table using App_Prog as the field on both sides, and call this 'MatchingApps'.
Now create (if you don't already have one) a unique serial number field, 'Counter' say, and make sure that you enter a value in each record. (Find all, click in the field, and use the serial number option in 'Replace Field Contents...'.)
Now add a new calc field - Is_Duplicate with the following calc...
If (Counter = MatchingApps::Counter; "Master Record" ; "Duplicate")
Finally, find all, click in the 'Application' field, and use 'Replace Field Contents...' with a calculation to force the auto-enter calc for 'App_Prog' to come up with a value.
Where does this get you? You should now have a set of records that are marked either "Master Record" or "Duplicate". Do a find on "Master Record", and then you can perform your summary (by Application) to count distinct application-program pairs.
If you have access to custom functions (you need FileMaker Pro Advanced), I'd do it like this:
Add the RemoveDuplicates function as found here (this is a recursive function that takes a list of strings and returns a list of unique values).
In the relationships graph, add another occurrence of your table and add an Application = Application relationship.
Create a calculated field in the table with the calculation looking something like this:
ValueCount(RemoveDuplicates(List(TABLE2::Program)))
You'll find that each record will contain the number of distinct programs for the given application. Showing a summary for each application should be relatively trivial from here.
I think the best way to do this is to create a separate applications table. So as you've given the data, it would have two records, one for A and one for B.
So, with the addition of an Applications table and your existing table, which I'll call Objects, create a relationship from Applications to Objects (with a table occurrence called ObjectsParent) based on the ApplicationName as the match field. Create a self join relationship between Objects and itself with both Application and Program as the match fields. I'll call one of the "table occurrences" ObjectsParent and the other ObjectsChildren. Make sure that there's a primary key field in Objects that is set to auto-enter a serial number or some other method to ensure uniqueness. I'll call this ID.
So your relationship graph has three table occurrences:
Applications::Application = ObjectsParent::Application
ObjectsParent::Application = ObjectsChildren::Application, ObjectsParent::Program = ObjectsChildren::Program
Now create a calculation field in Objects, and calculating from the context of ObjectsParent, give it the following formula:
AppCount = Count( ObjectsChildren::ID )
Create a calculation field in Applications and calculating from the context of the table occurrence you used to relate it to ObjectsParent with the following formula:
AppCount = ObjectsParent::AppCount
The count field in Objects will have the same value for every object with the same application, so it doesn't matter which one you get this data from.
If you now view the data in Applications in list view, you can place the Applications::Application and Applications::AppCount fields on the layout and you should get what you've requested.