Is there any way to detect whether an index was generated automatically by the DBMS when adding, e.g., a primary key or a unique constraint?
At the moment I fetch all the indexes of a table using JDBC metadata, but the result also contains the implicitly generated indexes. I now need a way to detect whether a given index was auto-generated or not.
I've already tried to get this information from catalog tables like pg_class or pg_index, but without success.
I don't think there is any way to distinguish those indexes - after all there is no difference between an index created automatically and one that was created manually.
The only way I can think of is to stick to some naming convention for your "manual" indexes. Then you could filter out all those that do not comply with that naming convention.
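For illustration, a minimal sketch of that filtering approach over JDBC metadata; the idx_ prefix, schema, and table name are only assumptions standing in for whatever convention you adopt:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;

static void printManualIndexes(Connection connection, String table) throws SQLException {
    DatabaseMetaData meta = connection.getMetaData();
    // getIndexInfo also returns the indexes PostgreSQL created behind the scenes
    // for PRIMARY KEY and UNIQUE constraints.
    try (ResultSet rs = meta.getIndexInfo(null, "public", table, false, false)) {
        while (rs.next()) {
            String indexName = rs.getString("INDEX_NAME");
            // Assumed convention: manually created indexes start with "idx_".
            if (indexName != null && indexName.startsWith("idx_")) {
                System.out.println("manual index: " + indexName);
            }
        }
    }
}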
In an application that I am currently working on, we need to ensure uniqueness with respect to a tuple of three properties within a specific kind. As such, when creating a new entity, we need to ensure that no entity of that kind with the given tuple already exists.
My naïve approach to this problem was to create a simple query that filters on equality over the three fields. If an entity with the given fields was found, the operation would abort; otherwise a new entity with those fields and other related data would be inserted. However, when trying to insert many entities in parallel, transaction contention would arise.
However, as soon as I added a composite index over those three properties, no contention occurred. I changed nothing in the code; I merely added a composite index for those fields.
I have been digging through all the documentation and searched around for anyone who has had a similar issue, but nobody has ever mentioned this "workaround".
Have I missed something? Perhaps discovered something? Or is this expected behavior; are the built-in indexes not enough?
The main document you'd want to look at is https://cloud.google.com/datastore/docs/concepts/optimize-indexes .
In your case it looks like your merge join ends up locking a number of rows while looking for a non-match. With the composite index, however, you are only looking up the single index entry needed for your query, so there is less contention with the composite index than with a merge join over the built-in single-property indexes.
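As a sketch of the difference in code terms (kind, property names, and values here are made up), the existence check is just an equality filter on all three properties; with only the built-in per-property indexes Datastore has to merge-join them, whereas a composite index over the three properties serves the filter directly. Using the Cloud Datastore Java client:

import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import com.google.cloud.datastore.StructuredQuery.CompositeFilter;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;

static boolean tupleExists(Datastore datastore, String a, String b, String c) {
    // To avoid the merge join, declare a composite index on (propA, propB, propC)
    // in index.yaml (or the equivalent index configuration).
    Query<Entity> query = Query.newEntityQueryBuilder()
            .setKind("MyKind")
            .setFilter(CompositeFilter.and(
                    PropertyFilter.eq("propA", a),
                    PropertyFilter.eq("propB", b),
                    PropertyFilter.eq("propC", c)))
            .setLimit(1)
            .build();
    QueryResults<Entity> results = datastore.run(query);
    return results.hasNext();
}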
I have a large number of records to iterate over (coming from an external data source) and then insert into a MongoDB database.
I do not want to allow duplicates. How can this be done in a way that will not affect performance?
The number of records is around 2 million.
I can think of two fairly straightforward ways to do this in mongodb, although a lot depends upon your use case.
One, you can use the upsert:true option to update, using whatever you define as your unique key as the query for the update. If it does not exist it will insert it, otherwise it will update it.
http://docs.mongodb.org/manual/reference/method/db.collection.update/
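With the Java driver, the first option might look roughly like this (collection and field names are placeholders; externalId stands in for whatever you treat as the unique key):

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import org.bson.Document;

static void upsertRecord(MongoCollection<Document> coll, String externalId, String name) {
    // Insert if no document with this externalId exists, otherwise update it in place.
    coll.updateOne(
            Filters.eq("externalId", externalId),
            Updates.combine(
                    Updates.set("name", name),
                    Updates.set("importedAt", new java.util.Date())),
            new UpdateOptions().upsert(true));
}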
Two, you could just create a unique index on that key and then insert, ignoring the error generated. Exactly how to do this will depend somewhat on the language and driver used, along with the version of MongoDB. This has the potential to be faster when performing batch inserts, but YMMV.
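A rough sketch of the second option with the Java driver (collection and field names are placeholders; the unordered bulk insert and the duplicate-key handling are the important parts):

import com.mongodb.MongoBulkWriteException;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import com.mongodb.client.model.InsertManyOptions;
import org.bson.Document;
import java.util.List;

static void insertIgnoringDuplicates(MongoCollection<Document> coll, List<Document> batch) {
    // One-time setup: a unique index on the deduplication key.
    coll.createIndex(Indexes.ascending("externalId"), new IndexOptions().unique(true));

    try {
        // ordered(false) keeps inserting past duplicate-key errors instead of
        // stopping the batch at the first one.
        coll.insertMany(batch, new InsertManyOptions().ordered(false));
    } catch (MongoBulkWriteException e) {
        // Duplicate-key violations (error code 11000) are expected here and can be ignored;
        // anything else is rethrown.
        boolean onlyDuplicates = e.getWriteErrors().stream().allMatch(err -> err.getCode() == 11000);
        if (!onlyDuplicates) {
            throw e;
        }
    }
}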
2 million is not a huge number and should not affect performance; splitting your records' fields into different collections would be good enough.
I suggest creating a unique index on your unique key before inserting into MongoDB.
The unique index will filter out redundant data, so some duplicate records will be dropped, and you can ignore the resulting errors.
I have a couple of fields in my documents that I want to make sure are unique across a collection if they store non-null values, but I will never need to query on them - e.g. the MD5 hash of a file. As far as I've checked in the MongoDB documentation, the suggestion for this situation is to use a unique and sparse index. My question is: is there any way to avoid creating an index, given that I will never query on the md5 field of any document?
Since you will not be querying on these fields, it is very difficult to say.
You could use query magic, but then you might not have the values available to you; otherwise your only option is to enforce this client-side, which could create race conditions.
There's no way to guarantee uniqueness without creating an index, as the unique index is the only mechanism MongoDB provides to enforce such a constraint.
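For reference, the unique + sparse index the documentation suggests would look something like this with the Java driver (only the md5 field name comes from the question; the rest is a placeholder sketch):

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

static void ensureUniqueMd5(MongoCollection<Document> files) {
    // Documents that do not contain an md5 field at all are left out of a sparse index,
    // so any number of them can coexist; two documents with the same md5 value cannot.
    files.createIndex(Indexes.ascending("md5"),
            new IndexOptions().unique(true).sparse(true));
}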
I have created a SQL database and checked its integrity. Now I want to move these tables into MongoDB, and I've kept to the usual mapping rules: table = collection, row = document, and so on.
But how does one model the following in MongoDB:
create table pruefen
( MatrNr integer references Studenten on delete cascade,
VorlNr integer references Vorlesungen,
PersNr integer references Professoren on delete set null,
Note numeric(2,1) check (Note between 0.7 and 5.0),
primary key (MatrNr, VorlNr));
I've tried DBRef, but it is not a replacement for a foreign key.
And if the application is supposed to take this over, what would that look like?
MongoDB has no cascading deletes. When your application deletes data, it is also responsible for removing any referenced objects itself and any references to the deleted document. But usually when you use on delete in a relational database, you have a case of composition where one parent object owns one or more child objects, and the child objects are meaningless without the parent. In that situation, MongoDB encourages embedding instead of referencing. That means that you create an array in the parent object, and put the complete child documents into that array instead of keeping them in a collection of their own. That way they will be deleted together with the parent, because they are a part of it.
While keeping more than one value in a field is an absolute no-go in SQL, there is nothing wrong with that in MongoDB. That's because the MongoDB query language can easily work with arrays and embedded objects. You can even create indices on fields of sub-documents in arrays, so you can easily search for objects which are embedded in other objects.
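As a sketch of what that embedding could look like for the pruefen example above (the document layout and values are just an illustration, using the Java driver):

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;
import org.bson.Document;
import java.util.Arrays;

static void embeddedExample(MongoCollection<Document> studenten) {
    // The exam results live inside the student document instead of a separate pruefen collection.
    Document student = new Document("matrNr", 26120)
            .append("name", "Fichte")
            .append("pruefungen", Arrays.asList(
                    new Document("vorlNr", 5001).append("persNr", 2126).append("note", 1.0),
                    new Document("vorlNr", 5041).append("persNr", 2125).append("note", 2.0)));
    studenten.insertOne(student);

    // Index on a field of the embedded documents, so queries like
    // Filters.eq("pruefungen.vorlNr", 5001) can use it.
    studenten.createIndex(Indexes.ascending("pruefungen.vorlNr"));

    // Deleting the student also removes the embedded exam results - the "cascade" is implicit.
    studenten.deleteOne(Filters.eq("matrNr", 26120));
}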
When you still want to reference objects from another collection, you can either use a DBRef, or you can use any other unique identifier (uniqueness is one of the few things MongoDB can enforce; to do so, create a unique index with the createIndex command). But MongoDB does not enforce consistency in this case. You can create DBRefs which point to non-existing ObjectIds, and when the document a DBRef points to is deleted, nothing will happen. The application is responsible for making sure that when it deletes a document, all documents which reference it are updated.
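If you do keep pruefen as a separate, referencing collection, the application-side "cascade" described above is just explicit follow-up writes, roughly like this (collection and field names follow the SQL example; everything else is an assumption, and unlike the SQL cascades these are separate, non-atomic operations):

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

// Emulate "references Studenten on delete cascade".
static void deleteStudent(MongoCollection<Document> studenten, MongoCollection<Document> pruefen, int matrNr) {
    studenten.deleteOne(Filters.eq("matrNr", matrNr));
    pruefen.deleteMany(Filters.eq("matrNr", matrNr));
}

// Emulate "references Professoren on delete set null" (unset is the closest MongoDB analogue).
static void deleteProfessor(MongoCollection<Document> professoren, MongoCollection<Document> pruefen, int persNr) {
    professoren.deleteOne(Filters.eq("persNr", persNr));
    pruefen.updateMany(Filters.eq("persNr", persNr), Updates.unset("persNr"));
}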
Constraints cannot be enforced by MongoDB either. It can't even enforce a specific type for a field, due to its schemaless nature. Again, your application is responsible for making sure that the data it puts into MongoDB conforms to your specifications. When you want to automate this, there are object-document mapping frameworks for MongoDB available for many programming languages.
To wrap it all up: MongoDB is not as "smart" as SQL databases. It doesn't do much on its own. It does what it is told to do by the application, not more and not less. But that's the reason why it's so fast (no expensive consistency checks) and flexible (no database modifications necessary to implement new features).
One of the great things about a relational database is that it is really good at keeping the data consistent within the database. One of the ways it does that is by using foreign keys. A foreign key constraint says, roughly: this table has a column whose values must come from another table's column. In MongoDB, there's no guarantee that foreign keys will be preserved; it's up to the programmer to make sure that the data stays consistent in that manner. This may become possible in future versions of MongoDB, but today there's no such option. The alternative to foreign key constraints is embedding data.
My application needs configurable columns, and the titles of these columns get configured at the beginning. With a relational database I would have created generic columns in the table, like CodeA, CodeB, etc., for this need, because it helps with querying on these columns (CodeA = 11) and also with displaying the values (if a column stores a code and a value). But now I am using a non-relational database, Datastore (and I am new to it). Should I follow the same old approach, or should I use a collection (key-value pair) type of structure?
There will be a lot of filters on these columns. Please suggest.
What you've just described is one of the classic scenarios for a Key-Value database. The limitation here is that you will not have many of the set-based tools you're used to.
Most of the K-V databases are really good at loading one "record" or small set thereof. However, they don't tend to be any good at loading anything that may require a join. Given that you're using AppEngine, you probably appreciate this limitation. But it's worth stating.
As an important note, not all K-V databases will allow you to "select by any column". Many K-V stores actually only allow selection by a primary key. If you take a look at MongoDB, you'll find that you can query on any column, which sounds like a necessary feature here.
I would suggest using key/value pairs, where the keys act as your column names and the values are their data.
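A minimal sketch of that in Datastore terms with the Cloud Datastore Java client (kind and property names are made up); each configured "column" simply becomes a property on the entity and can then be filtered on:

import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.KeyFactory;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;

static void dynamicColumnsExample() {
    Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
    KeyFactory keyFactory = datastore.newKeyFactory().setKind("Record");

    // The configured column names are just property names on the entity.
    Entity record = Entity.newBuilder(datastore.allocateId(keyFactory.newKey()))
            .set("CodeA", 11L)
            .set("CodeB", "some value")
            .build();
    datastore.put(record);

    // Filtering on a configured column.
    Query<Entity> query = Query.newEntityQueryBuilder()
            .setKind("Record")
            .setFilter(PropertyFilter.eq("CodeA", 11L))
            .build();
    QueryResults<Entity> results = datastore.run(query);
    while (results.hasNext()) {
        System.out.println(results.next().getKey());
    }
}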