Index on JSON field with dynamic keys - postgresql

I'm on PG 9.5 and I have a table Visitors(id, data::json)
Example:
Visitor(id: 1, data: {name: 'Jack', age: 33, is_user: true })
I'd like to perform queries like
Give me all visitors named Jack and age > 25
Give me all visitors who are users, but where name is unspecified (key not in json)
The keys inside the data column are user-specified and as such are dynamic.
Which index makes the most sense in this situation?

You can use a GIN index on a jsonb column, which gives you generalized, dynamic indexing of keys and values inside the JSON value.
CREATE TABLE visitors (
id integer,
data jsonb
);
CREATE INDEX idx_visitors_data ON visitors USING GIN (data);
SELECT * FROM visitors
WHERE data @> '{"is_user": true}' AND NOT data ? 'name';
Unfortunately, GIN indexes don't support numeric range comparisons. So while you could still issue a query for visitors named Jack aged over 25:
SELECT * FROM visitors
WHERE data @> '{"name": "Jack"}' AND ((data ->> 'age')::integer) > 25;
This will only use the index to find the name "Jack", and possibly to find rows which have an "age" key, but the actual test that the ages are over 25 will be done as a scan over the matching rows.
Note that if you really need range comparisons, you can still add non-GIN indexes on specific paths inside the JSON value, if you expect them to appear often enough to make that worthwhile. For example, you could add an index on data -> 'age' that supports range comparisons:
CREATE INDEX idx_visitors_data_age ON visitors ( ((data ->> 'age')::integer) );
(note the extra parentheses; you'll get an error without them).
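With that index in place, the age comparison can be answered by the B-tree, but only if the query repeats the exact expression used in the index definition. For example:
SELECT * FROM visitors
WHERE data @> '{"name": "Jack"}' AND ((data ->> 'age')::integer) > 25;
-- the (data ->> 'age')::integer expression matches the index definition, so the range test can use it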
See this excellent blog post for further information.

You can also look at the JsQuery extension, a query language for the jsonb data type. It provides functionality currently missing from PostgreSQL's built-in jsonb support, such as a simple and effective way to search in nested objects and arrays, plus more comparison operators with index support. Read more here: https://github.com/postgrespro/jsquery.
In your case, you can create a jsonb_path_value_ops index:
CREATE INDEX idx_visitors ON visitors USING GIN (data jsonb_path_value_ops);
and use queries like these:
select * from visitors where data @@ 'name = "Jack" and age > 25';
select * from visitors where data @@ 'not name = * and is_user = true';
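Note that JsQuery is a separate extension and has to be installed first; a minimal sketch, assuming the extension has been built and is available on your server:
CREATE EXTENSION IF NOT EXISTS jsquery;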

I believe the best approach here is to create a raw SQL migration:
Run ./manage.py makemigrations --empty yourApp, where yourApp is the app of the model you want to change indexes for.
Edit the migration, e.g.:
operations = [
    migrations.RunSQL("CREATE INDEX idx_content_taxonomies_categories ON common_content((taxonomies->>'categories'));")
]
Where idx_content_taxonomies_categories is the name of the index, common_content is your table, taxonomies is your JSONField, and categories in this case is the key you want to index.
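If you want a quick sanity check that the migration actually created the index, you can query PostgreSQL's pg_indexes system view afterwards, for example:
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'common_content';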
That should do it. Cheers!

Related

Error when filtering some data with like and jsonb in PostgreSQL

I keep having a problem when filtering some data in PostgreSQL. For example, I want to filter by a JSON value.
My JSON documents are saved in the following way:
"[{\"Brand\":\"Leebo\"},{\"Housing Color\":\"Black\"},{\"Beam Type\":\"High Beam, Low Beam\"}]"
And let's say that I want to filter by
[{\"Brand\":\"Leebo\"}]
Shouldn't I write something like this in the query?
SELECT * FROM public.products
WHERE attributes is not NULL
AND attributes::text LIKE '%{\"Brand\":\"Leebo\"}%';
I also tried
SELECT * FROM public.products WHERE attributes::jsonb @> '"[{\"Material\":\"Artificial Leather\"}]"'
but then I don't receive any data; it only works if I pass exactly the data that is in the column.
Do you know how I could proceed differently? Also, how could I search with whereIn?
You have an array in your JSONB, because those characters ([ and ]) are array delimiters. If you are sure that your JSONB always looks like the example (a single array with Brand in the first element), you can use this:
SELECT * FROM public.products
WHERE attributes IS NOT NULL
AND attributes->0->>'Brand' = 'Leebo'
But if the matching object can sit at any position in the array, use jsonb_array_elements to extract the array elements and then apply JSONB operators such as ->> to each one, as in the sketch below.
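A minimal sketch of that approach, assuming attributes is a jsonb array of single-key objects as in the question:
SELECT p.*
FROM public.products AS p
WHERE EXISTS (
    -- expand the array and match any element by key/value
    SELECT 1
    FROM jsonb_array_elements(p.attributes) AS elem
    WHERE elem->>'Brand' = 'Leebo'
);
Alternatively, the containment test attributes @> '[{"Brand":"Leebo"}]' matches any element of the array and can be supported by a GIN index.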

Postgres/jOOQ replace jsonb[] element

I have a Spring application with jOOQ and a PostgreSQL database with a table (issues) that has the following two columns:
id (Long)
documents (jsonb[]) <- array of jsonb (not jsonb array)
The document JSON structure is in the following format:
{
  "id": (UUID),
  "name": (String),
  "owner": (String)
}
What I want to achieve is to replace each document with a matching id (normally only one) with a new document. I'm struggling with the jOOQ, or even the plain SQL.
I guess I need to write some plain SQL in jOOQ to be able to do this, but that is OK (kept to a minimum). I had an idea to do the following:
Unnest the document column
Filter out the document that should be updated of the array
Append the document that should be updated
Store the whole array
The raw SQL looks like this, but it is still missing the new document to be appended:
UPDATE issues
SET documents = (
    SELECT ARRAY_AGG(doc)
    FROM issues, UNNEST(issues.documents) AS doc
    WHERE doc->>'id' != 'e4e3422f-83a4-493b-8bf9-37980d532538'
)
WHERE issues.id = 1;
My final goal is to write this in jOOQ and append the document to be replaced. I'm using jOOQ 3.11.4.
You should be able to just concatenate arrays in PostgreSQL:
UPDATE issues
SET documents = (
    -- UNNEST refers to the row being updated, so only its own documents are re-aggregated
    SELECT ARRAY_AGG(doc) || '{"id":"e4e3422f-83a4-493b-8bf9-37980d532538","name":"n"}'::jsonb
    FROM UNNEST(issues.documents) AS doc
    WHERE doc->>'id' != 'e4e3422f-83a4-493b-8bf9-37980d532538'
)
WHERE issues.id = 1
Some common array functions will be added to jOOQ in the near future, e.g. array concatenation, but for now you can get away with plain SQL templating, I suspect?

How to filter dates in Couchbase and Scala

I have a simple json:
{
  "id": 1,
  "name": "John",
  "login": "2019-02-13"
}
Documents of this kind are stored in Couchbase. I would like to create an index (or some other database-side mechanism) that filters all documents whose login is older than 30 days. How should I create this in Couchbase, and how do I use it from Scala?
For now I get all documents from the database and filter them in the API, but that is not a very good way. I would like to filter on the database side and retrieve only the documents whose login is older than 30 days.
Currently, in Scala I only have a method to get docs by id:
bucket.get(id, classOf[RawJsonDocument])
I would recommend taking a look at N1QL (which is just SQL for JSON). Here's an example:
SELECT u.*
FROM mybucket u
WHERE DATE_DIFF_STR(NOW_STR(), login, 'day') > 30;
You'll also need an index, something like:
CREATE INDEX ix_login_date ON mybucket (login);
Though I can't promise that's the best index, it will at least get you started.
I used DATE_DIFF_STR and NOW_STR, but there are other ways to manipulate dates. Check out Date Functions in the documentation. And since you are new to N1QL, I'd recommend checking out the interactive N1QL tutorial.
The following query is more efficient, because the predicate can be pushed down into the IndexScan when the index key stands alone on one side of the relational operator. If the predicate wraps the index key in an expression (as in the query above), the engine has to fetch all the values and filter them in the query engine instead.
CREATE INDEX ix_login_date ON mybucket (login);
SELECT u.*
FROM mybucket AS u
WHERE u.login < DATE_ADD_STR(NOW_STR(), -30, 'day');

How to index and sort with pagination using a custom field in MongoDB (e.g. name instead of id)

https://scalegrid.io/blog/fast-paging-with-mongodb/
Example:
{
  _id,
  name,
  company,
  state
}
I've gone through the two scenarios explained in the link above, and it says sorting by object id performs well when retrieving and sorting results. Instead of the default sort on the object id, I want to index my own custom fields "name" and "company", and sort and paginate on these two fields (both hold string values).
I am not sure how we can use $gt or $lt on a name; I am currently blocked on how to provide pagination when a user sorts by name.
How do I index and paginate on two fields?
The answer to your question is:
db.Example.createIndex( { name: 1, company: 1 } )
As for pagination, the link you shared in your question explains it well enough. For example:
db.Example.find({ name: "John", company: "Ireland" }).limit(10);
For sorting:
db.Example.find().sort({ name: 1, company: 1 }).skip(userPassedOffset).limit(userPassedPageSize);
If the user requests the 21st-30th documents after sorting on name and then company, both in ascending order:
db.Example.find().sort({ name: 1, company: 1 }).skip(20).limit(10);
For a basic understanding of indexing in MongoDB:
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
Indexes are special data structures that store a small portion of the collection's data set in an easy-to-traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the field.
Default _id Index
MongoDB creates a unique index on the _id field during the creation of a collection. The _id index prevents clients from inserting two documents with the same value for the _id field. You cannot drop this index on the _id field.
Create an Index
Syntax to execute on Mongo Shell
db.collection.createIndex( <key and index type specification>, <options> )
Ex:
db.collection.createIndex( { name: -1 } )
For ascending use 1; for descending use -1.
The above command only creates the index if an index of the same specification does not already exist.
Index Types
MongoDB provides different index types to support specific types of data and queries, but I would like to mention two important types:
1. Single Field
In addition to the MongoDB-defined _id index, MongoDB supports the creation of user-defined ascending/descending indexes on a single field of a document.
2. Compound Index
MongoDB also supports user-defined indexes on multiple fields, i.e. compound indexes.
The order of fields listed in a compound index has significance. For instance, if a compound index consists of { name: 1, company: 1 }, the index sorts first by name and then, within each name value, sorts by company.
Source for my understanding and answer, and to learn more about MongoDB indexing: MongoDB Indexing

Is there a way to index in postgres for fast substring searches

I have a database and want to be able to run a lookup on a table that's something like:
select * from table where column like 'abc%def%ghi'
or
select * from table where column like '%def%ghi'
Is there a way to index the column so that this isn't too slow?
Edit:
Can I also clarify that the database is read only and won't be updated often.
Options for text search and indexing include:
full-text indexing with dictionary-based search, including support for prefix search, e.g. to_tsvector(mycol) @@ to_tsquery('search:*')
text_pattern_ops indexes to support prefix string matches, e.g. LIKE 'abc%', but not infix searches like '%blah%'. A reverse()d index may be used for suffix searching.
pg_trgm trigram indexes on newer versions, as demonstrated in this recent dba.stackexchange.com post.
An external search and indexing tool like Apache Solr.
From the minimal information given above, I'd say that only a trigram index will be able to help you, since you're doing infix searches on a string and not looking for dictionary words. Unfortunately, trigram indexes are huge and rather inefficient; don't expect some kind of magical performance boost, and keep in mind that they take a lot of work for the database engine to build and keep up to date.
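For illustration, a minimal sketch of a trigram index; tablename and columname are placeholders, and the pg_trgm extension must be installed:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX i_table_col_trgm ON tablename USING GIN (columname gin_trgm_ops);
-- LIKE patterns with leading wildcards can then use the index:
SELECT * FROM tablename WHERE columname LIKE '%def%ghi';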
If, for instance, you just need to look up specific substrings across an entire table, you can create a substring index:
CREATE INDEX i_test_sbstr ON tablename (substring(columname, 5, 3));
-- start at position 5, go for 3 characters
It is important that the substring() parameters in the index definition are the same as the ones you use in your query.
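For example, a query that can use that index has to repeat the indexed expression exactly:
SELECT * FROM tablename WHERE substring(columname, 5, 3) = 'def';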
ref: http://www.postgresql.org/message-id/BANLkTinjUhGMc985QhDHKunHadM0MsGhjg@mail.gmail.com
For the LIKE operator, use one of the operator classes varchar_pattern_ops or text_pattern_ops:
create index test_index on test_table (col varchar_pattern_ops);
That will only work if the pattern does not start with a %; in that case another strategy is required.
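To make the distinction concrete, a short sketch against the test_table above:
-- can use the text_pattern_ops index (left-anchored pattern):
SELECT * FROM test_table WHERE col LIKE 'abc%';
-- cannot use it (leading wildcard); a trigram index would be needed instead:
SELECT * FROM test_table WHERE col LIKE '%def%ghi';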