I am fairly new to the PostgreSQL world.
I store the following JSON objects in a jsonb column in PostgreSQL, one object per row.
{"cid":"CID1","Display":"User One CID1","F-Name":"Craig","LName":"One"}
{"cid":"CID1","Display":"User One CID1","F-Name":"Leo","LName":"One"}
{"cid":"CID2","OrderNo":"Ordr One Ord1","O-Name":"Michael","LName":"One"}
{"cid":"CID2","OrderNo":"Ordr One Ord1","O-Name":"Sam","LName":"One"}
{"cid":"CID3","InvocNo":"Invc One Inv1","I-Name":"Ron","LName":"One"}
{"cid":"CID3","InvocNo":"Invc One Inv1","I-Name":"Books","LName":"One"}
So N such objects are stored as N rows in a jsonb column (named res). I have a requirement to run text-match ("contains"-type) queries against these JSON objects on keys such as 'Display', 'OrderNo', 'InvocNo', 'F-Name', 'O-Name', etc.
The generated JSON is dynamic, and the keys of one JSON object may not match those of another. I am currently creating a GIN index on the res column like this:
CREATE INDEX gin_idx ON mytable USING gin (res)
Filter queries on these keys show no improvement with the GIN index in place. My database is filled with 50,000 rows of such data.
The only key common to all of these JSON objects is 'cid'.
Which type of index is best suited to this scenario, given that a key present in one JSON object may be missing from another?
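For reference, the default jsonb GIN operator class accelerates containment (@>) and key-existence (?) operators, not substring matches on values, so a filter only benefits from gin_idx when it is written in one of those forms. A minimal sketch against the table above (not from the original post):

-- Can use gin_idx: containment on an exact key/value pair
SELECT * FROM mytable WHERE res @> '{"LName": "One"}';

-- Can use gin_idx: rows where a given key exists at all
SELECT * FROM mytable WHERE res ? 'OrderNo';

-- Cannot use gin_idx: substring ("contains") match on a value
SELECT * FROM mytable WHERE res ->> 'Display' LIKE '%One%';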
I've inherited the task of retrieving data from a Postgres table.
The table has ~1m rows, and there are about 145k rows that I wish to retrieve. These 145k rows have a common string in one of their columns, batch_name, that I can use to search for them.
The table has two columns payload & result that are of type JSON. The result column contains the data that I wish to retrieve.
When I make even the simplest queries to the table:
SELECT * FROM table_name WHERE batch_name = 'an_id' limit 10
The request takes ~7-10 seconds to return data.
This is despite the fact that the batch_name column has an index on it and it's of type varchar(255)
Whilst investigating this, I've discovered that the JSON objects in the result column and payload column can be absolutely gigantic objects. When prettified, they are sometimes ~27k lines long.
These gigantic JSON objects seem to be the root cause of the problem.
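To dig into this, one simple check (a sketch, not from the original post) is to compare the plan and timing of the full-row query against one that skips the large JSON columns; if the narrow query is fast, the time is going into detoasting and transferring the huge result/payload values rather than into finding the rows:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM table_name WHERE batch_name = 'an_id' LIMIT 10;

-- Same filter, but only the column actually needed
SELECT result FROM table_name WHERE batch_name = 'an_id' LIMIT 10;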
My questions are:
What can I do to improve the efficiency of this query? Or is the ultimate solution here to just modify the table such that we are no longer storing gigantic JSON objects?
Given that I don't need to actually query fields in these JSON objects (but I DO need to retrieve them), would simply storing them as strings improve efficiency?
Why is storing large JSON objects SO inefficient?
Thanks in advance for any help, it's much appreciated.
I need to write CSV data into MongoDB (version 4.4.4). Right now I'm using MongoEngine as a data layer for my application.
Each CSV has at least 4 million records and 8 columns.
What is the fastest way to bulk insert (if the data doesn't exist yet) or update (if the data is already in the collection)?
Right now I'm doing the following:
for inf in list:
    # .get() raises DoesNotExist when nothing matches; .first() returns None instead
    daily_report = DailyReport.objects.filter(company_code=inf.company_code, date=inf.date).first()
    if daily_report is not None:
        inf.id = daily_report.id
    inf.save()
The list is a list of DailyReports built from the CSV data.
The _id is auto-generated. However, for business purposes the primary keys are the variables company_code (StringField) and date (DateTimeField).
The DailyReport class has a unique compound index made of the following fields: company_code and date.
The code above traverses the list and, for each DailyReport, looks for an existing DailyReport in the database with the same company_code and date. If one exists, its id is assigned to the DailyReport built from the CSV data, and the object is then saved using MongoEngine's save() method.
The write is performed one object at a time and is incredibly slow.
Any ideas on how to make this process faster?
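For reference, one common approach is to push everything into a single bulk_write of upserts keyed on company_code and date. This is only a sketch: it assumes the underlying pymongo collection is reachable via MongoEngine's Document._get_collection(), and that each item in the list exposes company_code and date attributes.

from pymongo import UpdateOne

operations = []
for inf in list:  # the list of DailyReports built from the CSV
    doc = inf.to_mongo().to_dict()
    doc.pop("_id", None)  # let MongoDB keep or assign the _id
    operations.append(
        UpdateOne(
            {"company_code": inf.company_code, "date": inf.date},
            {"$set": doc},
            upsert=True,
        )
    )

# One round trip per batch instead of one query plus one save per row;
# for millions of rows, send the operations in chunks (e.g. 10,000 at a time).
DailyReport._get_collection().bulk_write(operations, ordered=False)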
I'm converting a one-to-one relationship into a one-to-many relationship. The old relationship was just a foreign key on the parent record. The new relationship will be an array of foreign keys on the parent record.
(Using Postgres dialect, BTW.)
First I'll add a new JSONB column, which will hold an array of UUIDs.
Then I'll run a query to update all existing rows such that the value from the old column is now stored in the new column (as the first element in an array).
Finally, I'll remove the old column.
I'm looking for help with step 2: writing the update statement that will update all rows, setting the value of the new column based on the value of the old column. Basically, I'm trying to figure out how to express this SQL query using Sequelize:
UPDATE "myTable"
SET "newColumn" = json_build_array("oldColumn")
-- ^^ this really works, btw
Where:
newColumn is type JSONB, and should hold an array (of UUIDs)
oldColumn is type UUID
names are double-quoted because they're mixed case in the DB (shrug)
Expressed using Sequelize sugar, that might be something like:
const { models } = require('../sequelize')
await models.MyModel.update({ newColumn: [ 'oldColumn' ] })
...except that would result in saving an array that contains the string "oldColumn" rather than an array whose first element is the value in that row's oldColumn column.
My experience, and the Sequelize documentation, are focused on working with individual rows via the standard instance methods. I could do that here, but it'd be much better to have the database engine do the work internally instead of forcing it to transfer every row to Node and back again.
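One way that seems to do this in a single statement is to pass Sequelize's fn/col helpers as the new value, using jsonb_build_array to match the JSONB column type. This is a sketch I have not verified against this exact schema:

const { Sequelize } = require('sequelize')
const { models } = require('../sequelize')

// jsonb_build_array("oldColumn") is evaluated inside Postgres, so no rows travel to Node
await models.MyModel.update(
  { newColumn: Sequelize.fn('jsonb_build_array', Sequelize.col('oldColumn')) },
  { where: {} } // empty where = update every row
)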
Looking for whatever is the most Sequelize-idiomatic way of doing this, if there is one.
Any help is appreciated.
I am new to the PostgreSQL world. We chose this database so that we could run filter queries (contains, less than, greater than, etc.) against our JSON results. The JSON results are dynamic, and we cannot know in advance which keys will be generated in the output. The table (result_id (int64), jsondata (jsonb)) data looks like this:
id1,{k1:vab,k2:abc,k3:def}
id1,{k1:abv,k2:v7,k3:ghu}
id1,{k1:v5,k2:vdd,k3:vew}
id1,{k1:v6,k2:v9s,k3:ved}
id2,{k4:vw,k5:vds,k6:vdss}
id2,{k4:v1,k5:fgg,k6:dd}
id2,{k4:qw,k5:gfd,k6:ess}
id2,{k4:er,k5:dfs,k6:fss}
My queries would be something like
Select * from table where result_id = 'id1' and jsondata->'k1' contains 'ab'
My script outputs JSON content that I store in this table.
Each JSON key is represented as a grid column, and the key's values are the column data. The grid offers filtering capabilities, which means filtering on the JSON data.
My problem is that filtering can happen on any JSON key, but the key names are not static. The keys in the JSON output may change when the script is changed, so previously indexed keys would become irrelevant. If the script is not changed, the keys remain constant.
How do I apply indexing so that these JSON filter operations become faster? The same table contains many keys, both within a single JSON row and across rows. Wouldn't it be inefficient to index every key just to make filtering efficient?
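For reference, a sketch of the two usual options (the table name mytable is a placeholder). The default jsonb GIN index helps containment and key-existence queries on any key, while a substring ("contains") filter on a specific key needs an expression index, for example a pg_trgm trigram index on that key:

-- Default GIN index: speeds up containment/existence filters on any key
CREATE INDEX idx_jsondata_gin ON mytable USING gin (jsondata);
SELECT * FROM mytable WHERE result_id = 'id1' AND jsondata @> '{"k1": "abv"}';

-- Substring filters on one key need an expression index, e.g. a trigram index
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_jsondata_k1_trgm ON mytable USING gin ((jsondata ->> 'k1') gin_trgm_ops);
SELECT * FROM mytable WHERE result_id = 'id1' AND jsondata ->> 'k1' LIKE '%ab%';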
I have multiple MySQL tables and I want a single index covering all of them.
Each table has a primary key of auto-increment int type.
The structure of the collected data is the same for each table (so no conflict there), but since the IDs collide it seems that I have to query each index separately (unless you can give me a hint on how to avoid the ID collision).
The question is: if I query each index separately, are the weights of the returned results comparable between indexes?
unless you can give me a hint of how to avoid ID collision
See for example
http://sphinxsearch.com/forum/view.html?id=13078
You can just arrange for the ids to be offset differently. The 'sphinx document id' doesn't have to match the real primary key, but having a simple mapping makes the application simpler.
You have a choice between one index, one source (a single SQL query that UNIONs all the tables together); one index, many sources (a source per table, all feeding one index); or many indexes (one index per table, each with its own source). Whichever way you choose will give the same query results.
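For example, a trimmed-down sphinx.conf sketch of the one-index, many-sources approach (connection settings omitted; table and column names are placeholders), where each source shifts its ids into a disjoint range:

source src_table1
{
    # type, sql_host, sql_user, sql_pass, sql_db omitted
    sql_query = SELECT id * 10 + 1 AS id, title, content FROM table1
}

source src_table2
{
    sql_query = SELECT id * 10 + 2 AS id, title, content FROM table2
}

index combined
{
    source = src_table1
    source = src_table2
    path   = /var/lib/sphinx/combined
}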
If I query each index separately does it means that the weight of returned results are comparable between indexes?
Pretty much. The difference should be negligible; it doesn't matter which way round you do it.