I have a JSON document which I am storing as jsonb in Postgres:
{
  "name": "Mr. Json",
  "dept": {
    "team": {
      "aliases": ["a1", "a2", "a3"],
      "team_name": "xyz"
    },
    "type": "engineering",
    "lead": "Mr. L"
  },
  "hobbies": ["Badminton", "Chess"],
  "is_active": true
}
I have created a GIN index on the column.
I need to do exact-match queries, e.g. find all rows where type='engineering' and lead='Mr. L'.
I am currently doing containment queries like:
data @> '{"dept": {"type": "engineering"}}' and data @> '{"dept": {"lead": "Mr. L"}}'
I looked at the query plan and it shows the GIN index is being used, but I am unsure whether this is correct or whether there is a better way of achieving it.
Will I have to create another index on the nested keys?
Does indexing a jsonb column index the nested keys, or just the top-level ones?
Also, please share a good resource on this.
From the docs:
The default GIN operator class for jsonb supports queries with the top-level key-exists operators ?, ?& and ?| and the path/value-exists operator @>.
The containment operator @> works with nested values. The other operators work only on top-level keys, or on whatever level is used in an expression index. Also, according to the documentation, an expression index on the level you want to query will be faster than a simple index on the whole column (which makes sense, as it is smaller).
If you are only doing containment searches, consider using jsonb_path_ops when building your index. It is smaller and faster.
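As a sketch of those options (assuming a hypothetical table people with the jsonb column data — substitute your own names):

```sql
-- default operator class: supports @>, ?, ?& and ?|
CREATE INDEX people_data_gin ON people USING GIN (data);

-- containment-only alternative: smaller and faster, but supports only @>
CREATE INDEX people_data_path_gin ON people USING GIN (data jsonb_path_ops);

-- expression index on just the nested object you query
CREATE INDEX people_dept_gin ON people USING GIN ((data -> 'dept'));

-- both conditions can be combined into a single containment test
SELECT * FROM people
WHERE data @> '{"dept": {"type": "engineering", "lead": "Mr. L"}}';

-- matching query for the expression index
SELECT * FROM people
WHERE (data -> 'dept') @> '{"type": "engineering", "lead": "Mr. L"}';
```

Note that an expression index is only considered when the query uses the same expression, as in the last statement.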
I am new to PostgreSQL. I created a table with a JSON-type column:
id,country_code
11767,{"country_code": [{"code": "GB01F290/00", "new": 1}, {"code": "DE08F290/00", "new": 1}, {"code": "GB02F290/00", "new": 1}]}
11768,{"country_code": [{"code": "GB01F290/20", "new": 1}, {"code": "GB20F290/23", "new": 1}]}
list = ["GB01F290/00", "GB21F290/41"]
How can I select the rows where country_code:code contains any element of the list?
There is probably a way to create a jsonpath query to do this, but you would need some way to transform your ["GB01F290/00", "GB21F290/41"] into the correct jsonpath. I'm not very good at jsonpath, so I won't go into that.
Another way to do this would be to use the @> operator with the ANY(...) construct. But that takes a PostgreSQL array of jsonb documents as its right-hand side, and each document needs to have a specific structure matching the documents you are querying. One way to express that array of jsonb would be:
'{"{\"country_code\": [{\"code\": \"GB01F290/00\"}]}","{\"country_code\": [{\"code\": \"GB21F290/41\"}]}"}'::jsonb[]
Or another way, with less obnoxious quoting/escaping, would be:
ARRAY['{"country_code": [{"code": "GB01F290/00"}]}'::jsonb, '{"country_code": [{"code": "GB21F290/41"}]}']
A way to obtain that value from your input would be this query:
select array_agg(
  jsonb_build_object(
    'country_code',
    jsonb_build_array(jsonb_build_object('code', x))
  )
)
from jsonb_array_elements('["GB01F290/00", "GB21F290/41"]'::jsonb) as f(x);
But there might be better ways of doing that in Python.
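For example, a minimal sketch in plain Python (standard library only; the function name is made up) that builds one containment document per code:

```python
import json

def build_containment_docs(codes):
    """Build one jsonb document per code, each shaped like
    {"country_code": [{"code": ...}]} so it can be matched with @>."""
    return [json.dumps({"country_code": [{"code": c}]}) for c in codes]

docs = build_containment_docs(["GB01F290/00", "GB21F290/41"])
```

The resulting list can then be passed as the jsonb[] parameter; the exact parameter syntax depends on your database driver.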
Then the query would be:
select * from thetable where country_code @> ANY($1::jsonb[])
Where $1 holds the value given in the first block, the result of the expression in the second block, or the result of the query in the third block. You could also combine the queries into one by using one as a subquery in the other, but that might inhibit the use of indexes.
Note that the column country_code needs to be of type jsonb, not json, for this to work. But that is what it should be anyway.
It would probably be better to choose a different way to store your data in the first place. An array of objects where each object has a unique name (the value of "code", here) is an antipattern; it should instead be an object of objects, with the unique name as the key. Having an object with just one key at the top level, which is the same as the name of the column, is another antipattern. And what is the point of "new": 1 if it is always present (or is that just an artifact of the example you chose)? Does it convey any meaning? If you remove all of that, you are left with just a list of strings. Why use jsonb for that in the first place?
When I read the documentation, I just see an "extractValue" function, but I don't know how it works.
When I run a query like
Select *
from people
WHERE people.belongings @@ to_tsquery('hat & (case | bag)')
(and I have a GIN index on people.belongings)
would this query use the index? What would extractValue do to this query?
=======
And another question: why can't the GiST index index an array's elements individually, the way the GIN index does?
extractValue is a support function used when building the index, not when searching it. In the case of full-text search, it is fed the tsvector and returns the index keys contained in it.
The support function used to get the keys from a tsquery is extractQuery. For full-text search, that is gin_extract_tsquery, defined in src/backend/utils/adt/tsginidx.c if you are interested in the implementation. What it does is convert the tsquery into an internal representation that can be searched in the index.
The actual check whether an index entry matches the search expression is done by gin_tsquery_consistent.
The support functions are described in the documentation.
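As an illustration, you can list the support functions registered for the tsvector GIN operator class in the system catalogs (a sketch against standard PostgreSQL; the exact rows returned vary by version):

```sql
-- list the GIN support functions for tsvector_ops; the output includes
-- gin_extract_tsvector (extractValue), gin_extract_tsquery (extractQuery)
-- and gin_tsquery_consistent
SELECT amprocnum, amproc::regproc
FROM pg_amproc
WHERE amprocfamily = (
    SELECT oid
    FROM pg_opfamily
    WHERE opfname = 'tsvector_ops'
      AND opfmethod = (SELECT oid FROM pg_am WHERE amname = 'gin')
);
```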
So, I read the following definition of indexes from the MongoDB docs.
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures that store a small portion of the collection’s data set in an easy to traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field. The ordering of the index entries supports efficient equality matches and range-based query operations. In addition, MongoDB can return sorted results by using the ordering in the index.
I have a sample database with a collection called pets. Pets have the following structure.
{
  "_id": ObjectId("123abc123abc"),
  "name": "My pet's name"
}
I created an index on the name field using the following code.
db.pets.createIndex({"name":1})
What I expect is that the documents in the collection, pets, will be indexed in ascending order based on the name field during queries. The result of this index can potentially reduce the overall query time, especially if a query is strategically structured with available indices in mind. Under that assumption, the following query should return all pets sorted by name in ascending order, but it doesn't.
db.pets.find({},{"_id":0})
Instead, it returns the pets in the order that they were inserted. My conclusion is that I lack a fundamental understanding of how indices work. Can someone please help me to understand?
Yes, it is a misunderstanding of how indexes work.
Indexes don't change the output of a query, only the way the query is processed by the database engine. So db.pets.find({},{"_id":0}) will always return the documents in natural order, irrespective of whether there is an index or not.
Indexes will be used only when you make use of them in your query. Thus,
db.pets.find({name : "My pet's name"},{"_id":0}) and db.pets.find({}, {_id : 0}).sort({name : 1}) will use the {name : 1} index.
You should run explain on your queries to check if indexes are being used or not.
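For example, in the mongo shell (using the pets collection from the question):

```javascript
// a collection scan shows up as COLLSCAN in the winning plan
db.pets.find({}, {_id: 0}).explain("queryPlanner")

// this sort can use the {name: 1} index, so explain should show IXSCAN
db.pets.find({}, {_id: 0}).sort({name: 1}).explain("queryPlanner")
```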
You may want to refer to the documentation on how indexes work.
https://docs.mongodb.com/manual/indexes/
https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/
I am new to MongoDB and want to create indexes for a specific collection. I have seen people put the digit "1" after the field name when they create an index, for example:
db.users.ensureIndex({user_name: 1})
Now I want to know what this digit means and whether it is necessary.
It's the type of index. MongoDB supports several kinds of indexes; however, only the first two below can be combined into a compound index.
1: ascending B-tree index.
-1: descending B-tree index. Very similar to the ascending index, but the difference can matter for the behavior of compound indexes.
"hashed": a hash-table index. Very fast for lookups by exact value, especially in very large collections, but not usable for inexact queries ($gt, $regex or similar).
"text": a text index designed for searching for words in natural-language strings.
"2d": a geospatial index on a flat plane.
"2dsphere": a geospatial index on a sphere.
For more information, see the documentation of index types.
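For instance, in the mongo shell (the created_at field here is made up for illustration):

```javascript
// ascending single-field index; note that ensureIndex is deprecated
// in favor of createIndex, which takes the same arguments
db.users.createIndex({user_name: 1})

// compound index: ascending on user_name, descending on created_at
db.users.createIndex({user_name: 1, created_at: -1})
```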
It defines the index type on that specific field. For example, the value 1 creates an index in ascending order, while the value -1 creates the index in descending order.
For more information, see the manual.
I created an index over this field:
ws.eId
so a query like this is pretty fast, which uses a BTree cursor:
db.workout.find({"ws.eId" : "648"})
However this query does not use the indexed field, which uses now a Basic cursor:
db.workout.find({"ws":{"eId" : "648"}})
Why is this? How can I make the second query use the indexed field? Or should I just create an index for ws?
The second query searches the field ws for an object with exactly that one field and that value. It can't use the ws.eId index for that, because an object might have more fields than just eId and would then be ineligible for the result set.
To speed up this query, create an index on ws.
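To make the difference concrete (a mongo shell sketch using the collection from the question):

```javascript
// exact object match: only matches documents where ws is exactly {eId: "648"}
db.workout.find({"ws": {"eId": "648"}})

// field-path match: matches whenever ws.eId is "648",
// regardless of any other fields inside ws
db.workout.find({"ws.eId": "648"})

// an index on the whole subdocument supports the exact-object form
db.workout.createIndex({"ws": 1})
```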