Upserting data in TitanDB

I am using TitanDB with Cassandra as storage and ElasticSearch as the index. I found out that every time you add a Vertex in TitanDB, it generates a unique identifier.
All the elements I am adding already have an identifier, which is stored as a property of the Vertex.
My question is:
If I add a Vertex with the same id again, how does TitanDB recognise that it is a duplicate?
Is it possible to update the element on a duplicate key? Or do you first have to query TitanDB? If so, isn't that a terrible waste of time?

There is no direct method for "upsert". As noted above, in the comment on the question, the "getOrCreate" approach is the standard way to do this. So, "yes" you would need to do a lookup via index on your identifier property.
Titan can detect duplicates if you establish your indexed property with a unique constraint:
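// Open a management transaction to change the schema
mgmt = graph.openManagement()
// Look up the existing property key that holds your identifier ('name' here)
name = mgmt.getPropertyKey('name')
// Composite index with a uniqueness constraint on that key
mgmt.buildIndex('byNameUnique', Vertex.class).addKey(name).unique().buildCompositeIndex()
mgmt.commit()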
If the same property value is applied twice now, an exception will be generated on commit of the transaction. Use unique indexes wisely, as they will affect performance, especially if you expect heavy contention on the property that the unique constraint is applied to.
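The "getOrCreate" idea can be sketched independently of Titan. The following is a minimal in-memory illustration in Python, not Titan's API: the class `TinyGraph`, its `get_or_create` method, and the property name `externalId` are all hypothetical names chosen for the example. It only shows the shape of the logic: look up your own identifier in an index first, then update the existing vertex or create a new one.

```python
class TinyGraph:
    """In-memory stand-in for a graph with a unique index on one property.

    Hypothetical helper for illustration only; this is not Titan's API.
    """

    def __init__(self, unique_key):
        self.unique_key = unique_key
        self.vertices = {}   # internal id -> property dict
        self.index = {}      # unique property value -> internal id
        self._next_id = 1

    def get_or_create(self, key_value, **props):
        """Return (internal_id, created), looking up the index first."""
        vid = self.index.get(key_value)
        if vid is not None:
            # Duplicate key: update the existing vertex instead of adding one.
            self.vertices[vid].update(props)
            return vid, False
        vid = self._next_id
        self._next_id += 1
        self.vertices[vid] = {self.unique_key: key_value, **props}
        self.index[key_value] = vid
        return vid, True


g = TinyGraph("externalId")
first = g.get_or_create("abc-1", name="Fred")
second = g.get_or_create("abc-1", name="Frederick")
```

In a real graph database the index lookup and the write would have to happen in the same transaction, and a unique index is what protects against two concurrent callers both taking the "create" branch.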

Related

DynamoDB model that supports queries on any given attribute

The application we're designing has a function where users can dynamically add new elements to an entity that then need to be efficiently searched. The number of these elements is essentially unlimited. Our team has been looking at DynamoDB as a data store option, and we've been wrestling with the key/value model and how to get this dynamic data under an index for efficient querying.
I think I have a single-table solution that handles the problem elegantly and also allows for querying on any given attribute in the data store, but am disturbed that I can't find an example of it anywhere else. Hopefully it's not fundamentally flawed in some way - I would appreciate any critique!
The model is essentially the Entity-Attribute-Value approach used for adding dynamic or sparse data to RDBMs. So instead of storing different entities/objects in a DynamoDB table like so:
PK       SK   SK-1    SK-2    SK-3    SK-N...        PK       SK   SK-1   SK-N...
              Key     Key     Key     Key       -->                Name   Money
Entity   Id   Value   Value   Value   Value          Person   22   Fred   30000
... which lets me query things like "all persons where name = Fred", but where you would eventually run out of sort key indexes and would need to know which index goes with which key before you query, the data could be stored in EAV format like so:
PK   SK & GSI-PK   GSI-SK          PK   SK & GSI-PK    GSI-SK
Id   Entity#Key    Value           22   Person#Name    Fred
Id   Entity#Key    Value    -->    22   Person#Money   30000
Id   Entity#Key    Value           22   Person#Sex     M
Id   Entity#Key    Value           22   Person#DOB     09/00
Now, with one global secondary index (GSI-1 PK over Entity#Key and GSI-1 SK over Value) I can do a range search on any value for any key and get a list of Ids that match. Users can add their own attributes or even entirely new entities and have them persisted in a way that is instantly indexed, without us having to revamp the DynamoDB schema.
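To make the access pattern concrete, here is a small in-memory Python sketch of the EAV layout and its single secondary index. It does not call DynamoDB; `EavStore`, `put`, and `query_range` are hypothetical names, and the sorted list per `Entity#Key` merely stands in for a GSI partition ordered by its sort key. It assumes all values stored under one `Entity#Key` share a comparable type and that Ids are numbers.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class EavStore:
    """In-memory model of the EAV table plus one GSI (illustration only)."""

    def __init__(self):
        self.items = {}                # (id, "Entity#Key") -> value (base table)
        self.gsi = defaultdict(list)   # "Entity#Key" -> sorted [(value, id)]

    def put(self, entity_id, entity, key, value):
        attr = f"{entity}#{key}"
        old = self.items.get((entity_id, attr))
        if old is not None:
            # Overwriting an attribute also moves its index entry.
            self.gsi[attr].remove((old, entity_id))
        self.items[(entity_id, attr)] = value
        insort(self.gsi[attr], (value, entity_id))

    def query_range(self, entity, key, low, high):
        """Ids whose value for Entity#Key lies in [low, high], ordered by value."""
        entries = self.gsi[f"{entity}#{key}"]
        start = bisect_left(entries, (low,))
        stop = bisect_right(entries, (high, float("inf")))
        return [entity_id for _, entity_id in entries[start:stop]]


store = EavStore()
store.put(22, "Person", "Name", "Fred")
store.put(22, "Person", "Money", 30000)
store.put(23, "Person", "Money", 45000)
store.put(24, "Person", "Money", 12000)
matches = store.query_range("Person", "Money", 20000, 50000)
```

As in the real model, the query returns only Ids, so fetching the full entity still takes a second lookup per Id.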
The one major downside to this approach that I can think of is that data returned from a query on an Entity#Key-Value only contains values for that key and the entity Id, not the entire entity. That's fine for charts and graphs but a problem if you want to get a grid-type result with one query. I also worry about hot partition keys on the index, but hopefully we could solve that with intelligent write sharding.
That's pretty much it. With a few tweaks the model can be extended to support the logging of all changes on each key and allow some nice time series queries against those changes, but my question is if anyone has found it useful to take an EAV type approach to a KV store like DynamoDB, or if there's another way to handle querying a dynamic schema?
You can use the id of the entity as the pk, and then a sort key of {attributeName}. You may still want to have the base entity with fields like createdAt, etc.
So you might have:
PK           SORT               Attributes:
#Entity#22   #Entity#Details    createdAt=2020
#Entity#22   #Attribute#name    key=name value=Fred
#Entity#22   #Attribute#money   key=money value=30000
To get all the attributes of an entity, you simply run a query on pk={id} with no filter. What you cannot do is sort or search dynamically by any given attribute: that is exactly the kind of access pattern that DynamoDB, like NoSQL key-value stores in general, handles poorly.
What you can do is use streaming to do aggregation. So you can for instance store the top 10 wealthiest people:
PK               SORT   Attributes:
#Money#Highest   #1     id=#Entity#22 value=30000
#Money#Highest   #2     id=#Entity#52 value=30000
You would calculate this with DynamoDB Streams. But you could not dynamically index values: DynamoDB works by effectively copying data from one form to another so that it can be retrieved efficiently. So you would be copying your entire database for each new attribute you wanted to search by. Otherwise you would have to use Scans, which makes little sense, because you get no benefit from DynamoDB if all you ever do is Scans.
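The aggregation step can be sketched as a pure function that a stream consumer would call for each change to a "money" attribute. This is only an illustration in Python: `apply_change` is a hypothetical name, the leaderboard is a plain list rather than DynamoDB items, and the stream record format is not modelled at all.

```python
def apply_change(leaderboard, entity_id, new_value, size=10):
    """Return a new leaderboard of (value, entity_id) pairs, highest first.

    leaderboard: current list of (value, entity_id) pairs.
    new_value:   the entity's new value, or None if the attribute was removed.
    """
    # Drop any previous entry for this entity, then add the new one.
    kept = [(v, e) for v, e in leaderboard if e != entity_id]
    if new_value is not None:
        kept.append((new_value, entity_id))
    # Highest value first; ties are broken by entity id for a stable order.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return kept[:size]


board = []
for eid, value in [("#Entity#22", 30000), ("#Entity#52", 30000), ("#Entity#7", 50000)]:
    board = apply_change(board, eid, value, size=2)
```

One limitation of keeping only the top entries: when a listed entity's value drops or is removed, the stored list has no record of who should move up, so a real implementation would need to keep extra candidates or recompute occasionally.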
Your processes need to be very well understood to make good use of DynamoDB. If you want to index data at will and run all sorts of different queries, you probably want an SQL database or Elasticsearch.

PostgreSQL array of elements that each are a foreign key

I am attempting to create a DB for my app and one thing I'd like to find the best way of doing is creating a one-to-many relationship between my Users and Items tables.
I know I can make a third table, ReviewedItems, and have the columns be a User id and an Item id, but I'd like to know if it's possible to make a column in Users, let's say reviewedItems, which is an integer array containing foreign keys to Items that the User has reviewed.
If PostgreSQL can do this, please let me know! If not, I'll just go down my third table route.
It may soon be possible to do this: https://commitfest.postgresql.org/17/1252/ - Mark Rofail has been doing some excellent work on this patch!
The patch will (once complete) allow
CREATE TABLE PKTABLEFORARRAY (
    ptest1 float8 PRIMARY KEY,
    ptest2 text
);
CREATE TABLE FKTABLEFORARRAY (
    ftest1 int[],
    FOREIGN KEY (EACH ELEMENT OF ftest1) REFERENCES PKTABLEFORARRAY,
    ftest2 int
);
However, the author currently needs help to rebase the patch (which is beyond my own ability), so if you are reading this and know Postgres internals, please help if you can.
No, this is not possible.
PostgreSQL is a relational DBMS, operating most efficiently on properly normalized data models. Arrays are not relational data structures - by definition they are sets - and while the SQL standard supports defining foreign keys on array elements, PostgreSQL currently does not support it. There is a (dormant? no activity on commitfest since February 2021) effort to implement this - see the other answer to this question - so the functionality might one day be supported.
For the time being you can, however, build a perfectly fine database with array elements linking to primary keys in other tables. Those array elements, however, cannot be declared to be foreign keys, and the DBMS will therefore not maintain referential integrity. Using an appropriate set of triggers (on both the referenced and the referencing tables, as a change in either would have to trigger a check and possible update on the other) one would in principle be able to implement referential integrity over the array elements, but the performance is unlikely to be stellar (because indexes would not be used, for instance).
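For comparison, the "third table" route from the question does get referential integrity from the DBMS. The sketch below uses Python's built-in sqlite3 module purely as a runnable stand-in, not PostgreSQL; the table and column names (`users`, `items`, `reviewed_items`) and the helper functions are assumptions made for the example. The same DDL pattern, with a composite primary key and two foreign keys, carries over to PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Junction table: one row per (user, reviewed item) pair.
CREATE TABLE reviewed_items (
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    item_id INTEGER NOT NULL REFERENCES items(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, item_id)
);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO items VALUES (10, 'kettle'), (11, 'lamp');
INSERT INTO reviewed_items VALUES (1, 10), (1, 11);
""")


def reviewed_item_ids(user_id):
    """Item ids reviewed by a user, in ascending order."""
    rows = conn.execute(
        "SELECT item_id FROM reviewed_items WHERE user_id = ? ORDER BY item_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def try_review(user_id, item_id):
    """Record a review; return False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO reviewed_items VALUES (?, ?)", (user_id, item_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Here the database itself refuses a review of a non-existent item and cleans up links when an item is deleted, which is the part an integer array column cannot provide without triggers.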

MEAN Technology (MongoDB, Express.js, AngularJS, Node.js)

Using the MEAN stack, I need to develop a payroll application. It has two fields, code and name, and both code and name should be unique. How do I implement this in MongoDB?
Are you serious?
But still, do you mean that code and name should each be unique individually, or that the combination of code+name should be unique?
There is a difference. If code and name must each be unique individually, the same code cannot appear in two different documents (and likewise for name). If only the combination must be unique, one code may appear with several different names and vice versa.
I am going to provide a subjective answer, as you haven't shown what you have tried or what you wish to try.
Either use code+name as the _id, which guarantees that the combination is unique,
or keep code and name as separate fields in the document: with a unique index on each of them in the first case, or with a single compound unique index on both in the second case.

CDC multiple insert/delete of the same identity value

I have a table T that contains an ID set as identity and primary key. I have enabled CDC on the table and then later added an XML field that I didn't care about capturing, so I did not do anything further (to recreate the capture table and/or migrate old capture data).
I now have a stored procedure that (among other things) updates only the newly created field (no other field) in table T. I notice that instead of recording an update (operation=3 followed by operation=4), CDC records a delete (operation=1) followed by an insert (operation=2) and all fields are the same (of course since none of them was updated)
I actually noticed this because I had the same identity value inserted and/or deleted more than once, which is not possible (unless identity_insert is on, which is not)
Why does CDC record operation=1 instead of 3 and operation=2 instead of 4?
Is this documented anywhere or is it a bug?
The reason you are seeing a delete/insert pair (operations 1/2) as opposed to an update pair (3/4) is that you are updating a "set" of data that ALSO has a unique constraint on your column.
For SQL Server to make sense of this without violating the unique constraint, it deletes the row and reinserts it (with the "update").
This is not an issue or a defect. It is the way SQL Server works, and CDC innocently logs it as it sees it. Remember, CDC is just a subscriber and replicates things as they happen.
If you need to see an update, you may have to look for the 1/2 "pair" and not ONLY the operation codes 3/4.
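Looking for the 1/2 "pair" could be done on the consumer side roughly as follows. This Python sketch works on change rows already fetched as dictionaries; `__$start_lsn` and `__$operation` are the CDC metadata column names, while the function name `logical_updates`, the `ID` key column, and the integer LSN values are assumptions for the example (real LSNs are binary values). It treats a delete and an insert of the same key under the same LSN as one logical update.

```python
from collections import defaultdict

DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4


def logical_updates(change_rows, key="ID"):
    """Return (key_value, before_row, after_row) for every logical update.

    Rows are grouped by (transaction LSN, key value). A group holding both a
    delete (1) and an insert (2) is reported like a regular 3/4 update pair.
    """
    grouped = defaultdict(dict)
    for row in change_rows:
        grouped[(row["__$start_lsn"], row[key])][row["__$operation"]] = row

    updates = []
    for (_lsn, key_value), ops in grouped.items():
        if DELETE in ops and INSERT in ops:
            updates.append((key_value, ops[DELETE], ops[INSERT]))
        elif UPDATE_BEFORE in ops and UPDATE_AFTER in ops:
            updates.append((key_value, ops[UPDATE_BEFORE], ops[UPDATE_AFTER]))
    return updates


rows = [
    {"__$start_lsn": 100, "__$operation": 1, "ID": 7, "Name": "a"},
    {"__$start_lsn": 100, "__$operation": 2, "ID": 7, "Name": "a"},
    {"__$start_lsn": 101, "__$operation": 2, "ID": 8, "Name": "b"},
    {"__$start_lsn": 102, "__$operation": 3, "ID": 7, "Name": "a"},
    {"__$start_lsn": 102, "__$operation": 4, "ID": 7, "Name": "c"},
]
result = logical_updates(rows)
```

Note that a transaction which genuinely deletes a row and re-inserts the same key would be reported as an update too, so this heuristic only fits workloads where that does not happen.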
Some great articles:
Bounded Update is the term used to describe certain types of UPDATE statements from the publisher that will replicate as DELETE/INSERT pairs on the subscriber. We perform a bounded update for every set based update that changes a column that is part of a unique index or constraint. In other words, if an UPDATE statement touches more than one row and modifies a column that has any UNIQUE constraints, the UPDATE statement is sent to the subscriber as a DELETE/INSERT pair ... read more here:
https://support.microsoft.com/en-us/kb/238254

Insert record in table if does not exist in iPhone app

I am obtaining a JSON array from a URL and inserting the data into a table. Since the contents of the URL are subject to change, I want to make a second connection to the URL, check for updates, and insert new records in my table using sqlite3.
The issues that I face are:
1) My table doesn't have a primary key
2) The URL lists the changes made on the same day. Hence, if I run my app multiple times, I get duplicate entries when I insert values in my database. I want to check for that day's duplicated entries so they can be removed. The problem could be solved by adding a constraint, but since the URL itself contains duplicated values, I find it difficult.
If you have no primary key or anything else that is unique to each record, the only way I can see to do it is this: when your new data comes in, go through the new entries and, for each one, check whether the exact same data already exists in the database. If it doesn't, add it; if it does, skip over it.
You could even create a unique key yourself for each entry, built as a concatenation of every column of the table. That way you can quickly check whether the entry already exists in the database.
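A rough sketch of that idea follows. It is written in Python with the standard sqlite3 module rather than the iOS sqlite3 C API, so treat it as an illustration of the SQL and the key construction only. The table `entries`, its columns, and the helpers `row_key` and `insert_if_new` are hypothetical, and hashing the serialized record is a swapped-in variation on plain concatenation that avoids ambiguous separators.

```python
import hashlib
import json
import sqlite3


def row_key(record):
    """Derive a stable key from all columns of a record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (row_key TEXT PRIMARY KEY, day TEXT, title TEXT)")


def insert_if_new(record):
    """Insert the record unless an identical one is already stored."""
    key = row_key(record)
    exists = conn.execute(
        "SELECT 1 FROM entries WHERE row_key = ?", (key,)
    ).fetchone()
    if exists:
        return False
    with conn:
        conn.execute(
            "INSERT INTO entries (row_key, day, title) VALUES (?, ?, ?)",
            (key, record["day"], record["title"]),
        )
    return True


feed = [
    {"day": "2024-01-01", "title": "a"},
    {"day": "2024-01-01", "title": "a"},   # duplicate inside the same feed
    {"day": "2024-01-01", "title": "b"},
]
results = [insert_if_new(rec) for rec in feed]
```

Because the derived key is the primary key, the lookup is an index search rather than a comparison of every column against every row.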
I see two possibilities depending on your setup:
1) You have a column set up as UNIQUE (this can be through a PRIMARY KEY or not). In this case, you can use the ON CONFLICT clause:
http://www.sqlite.org/lang_conflict.html
If you find this construct a little confusing, you can instead use "INSERT OR REPLACE" or "INSERT OR IGNORE" as described here:
http://www.sqlite.org/lang_insert.html
2) You do not have a column set up as UNIQUE. In this case, you will need to SELECT first to check for duplicate data and, based on the result, INSERT, UPDATE, or do nothing.
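The first possibility might look like the sketch below, again in Python's sqlite3 module as a stand-in for the iOS C API. The table `changes`, its columns, and the `sync` helper are made-up names for illustration. It assumes the pair (day, title) identifies a record, which may not match your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE changes ("
    " day TEXT NOT NULL,"
    " title TEXT NOT NULL,"
    " status TEXT,"
    " UNIQUE (day, title))"   # the constraint that makes OR IGNORE / OR REPLACE work
)


def sync(rows, replace=False):
    """Insert rows; on a UNIQUE conflict either keep the old row or replace it."""
    verb = "INSERT OR REPLACE" if replace else "INSERT OR IGNORE"
    with conn:
        conn.executemany(
            f"{verb} INTO changes (day, title, status) VALUES (?, ?, ?)", rows
        )


def all_rows():
    return conn.execute(
        "SELECT day, title, status FROM changes ORDER BY title"
    ).fetchall()


# The feed itself contains a duplicate; the constraint silently absorbs it.
sync([("d1", "a", "open"), ("d1", "a", "open"), ("d1", "b", "open")])
after_first = all_rows()

sync([("d1", "a", "closed")])                 # ignored: row already exists
after_ignore = all_rows()

sync([("d1", "a", "closed")], replace=True)   # replaces the existing row
after_replace = all_rows()
```

INSERT OR IGNORE suits feeds where an existing row never changes, while INSERT OR REPLACE suits feeds where the newest copy should win.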
A more common & robust way to handle this is to associate a timestamp with each data item on the server. When your app interrogates the server it provides the timestamp corresponding to the last time it synced. The server then queries its database and returns all values that are timestamped later than the timestamp provided by the app. Then it also returns a new timestamp value for the app to store, to use on the next sync.
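The server side of such a timestamp-based sync can be sketched in a few lines. This is an assumption-laden illustration, not a prescribed protocol: `Item`, `changes_since`, and the integer timestamps are hypothetical, and a real server would run the equivalent filter as a database query.

```python
from dataclasses import dataclass


@dataclass
class Item:
    id: int
    payload: str
    updated_at: int  # server-side modification time, e.g. Unix seconds


def changes_since(items, last_sync, now):
    """Return (items modified after last_sync, new timestamp for the client).

    The upper bound `now` is handed back as the next sync token, so anything
    written after this call is picked up by the following sync.
    """
    changed = [it for it in items if last_sync < it.updated_at <= now]
    return changed, now


server_items = [Item(1, "a", 100), Item(2, "b", 200), Item(3, "c", 300)]

# The app last synced at t=150 and asks again at t=250.
changed, token = changes_since(server_items, last_sync=150, now=250)

# Next sync, at t=400, starts from the stored token.
changed_later, token_later = changes_since(server_items, last_sync=token, now=400)
```

The app stores only the returned token, and because each item keeps its server id, repeated syncs update existing rows instead of creating duplicates.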