MongoDB CSV export: skip missing keys/values

I have a large DB with around 1 million entries, and I'm using PyMongo to convert it to CSV. The problem is that some of the key/value pairs are missing or stored in a different order in the DB.
This results in the CSV fields being shifted.
DB
[
  {
    "_id": "123abc",
    "device": [
      {
        "Chanel": "1",
        "Vendor": "undefined",
        "first_seen": "1611606675",
        "last_seen": "1612606695"
      },
      {
        "Chanel": "1",
        "first_seen": "1612342635",
        "Vendor": "abc",
        "last_seen": "1631606928"
      },
      {
        "Chanel": "1",
        "first_seen": "1612342635",
        "Vendor": "abc"
      }
    ]
  }
]
TRIED
1. Hardcoding the headers, comparing each key against them, and writing the value; if a key does not exist, write 'N/A'.
with open(fn, 'w', encoding="utf-8") as wf:
    header = ['_id', 'Chanel', 'Vendor', 'first_seen', 'last_seen']
    for key in header:
        wf.write(key + ',')
    wf.write('\n')
    # timeframe is a list of all documents within a certain timeframe
    for batch in timeframe:
        for data_items in batch['device']:
            for key, value in data_items.items():
                if key not in header:
                    wf.write('NA')
                else:
                    value = str(value)
                    value = value.replace(',', '-')
                    wf.write(f'{value},')
            wf.write('\n')
        wf.write('\n')
return fn
2. Converting to JSON and using Pandas to convert the JSON to CSV - same issue.
EXPECTED OUTPUT
_id | Chanel | Vendor | first_seen | last_seen
------------------------------------------------------
123abc | 1 | undefined | 1611606675 | 1612606695
------------------------------------------------------
123abc | 1 | abc | 1612342635 | 1631606928
------------------------------------------------------
123abc | 1 | abc | 1612342635 | N/A
------------------------------------------------------
ACTUAL OUTPUT
_id | Chanel | Vendor | first_seen | last_seen
------------------------------------------------------
123abc | 1 | undefined | 1611606675 | 1612606695
------------------------------------------------------
123abc | 1 | 1612342635 | abc | 1631606928
------------------------------------------------------
123abc | 1 | 1612342635 | abc |
------------------------------------------------------
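A fix that keeps the columns aligned regardless of key order or missing keys is to write each row with csv.DictWriter, which fills absent fields from restval. This is a minimal sketch assuming documents shaped like the example above; the connection URI, database and collection names are illustrative:
import csv
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # illustrative connection
collection = client["mydb"]["devices"]             # illustrative db/collection names

header = ['_id', 'Chanel', 'Vendor', 'first_seen', 'last_seen']

with open('devices.csv', 'w', encoding='utf-8', newline='') as wf:
    writer = csv.DictWriter(wf, fieldnames=header, restval='N/A', extrasaction='ignore')
    writer.writeheader()
    for doc in collection.find():              # or iterate your `timeframe` batches
        for item in doc.get('device', []):
            # Columns are emitted in `header` order; missing keys become 'N/A',
            # unexpected keys are ignored instead of shifting the row.
            writer.writerow({'_id': doc['_id'], **item})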

Related

MongoDB $in query multiple predicates

I have a table as follows:
------------------------------------------------------
| row_num | person_id | org_id | other columns |
|---------|-----------|--------|---------------|
| 0       | person_0  | org_0  | .             |
| 1       | person_1  | org_0  | .             |
| 2       | person_2  | org_0  | .             |
| 3       | person_3  | org_0  | .             |
------------------------------------------------------
| 3       | person_0  | org_1  | .             |
| 4       | person_1  | org_1  | .             |
| 5       | person_2  | org_1  | .             |
| 6       | person_3  | org_1  | .             |
------------------------------------------------------
| 6       | person_0  | org_2  | .             |
| 7       | person_1  | org_2  | .             |
| 8       | person_2  | org_2  | .             |
| 9       | person_3  | org_2  | .             |
------------------------------------------------------
The primary key is (person_id, org_id). This combination is guaranteed to be unique.
Let's say I have lists of person_ids and corresponding org_ids for certain persons, and I want to fetch their records from the collection.
persons = [("person_0", "org_0"), ("person_1", "org_1"), ("person_3", "org_1")]
person_ids, org_ids = zip(*persons)
In this case the expected output is the columns from rows 0, 4, and 6.
I can always find the answer by intersecting the results of the following two queries, but I was wondering if there is a smarter way to do this:
db.collection.find({person_id: {$in: person_ids}})
db.collection.find({org_id: {$in: org_ids}})
If you want to combine the conditions with an OR operator, you can use this command:
db.collection.find({
  $or: [
    { person_id: { $in: person_ids } },
    { org_id: { $in: org_ids } }
  ]
})
If you want to combine them with an AND operator, you can use this command:
db.collection.find({
  person_id: { $in: person_ids },
  org_id: { $in: org_ids }
})
You can get the answer in a single query this way:
db.collection.find({
  person_id: { $in: person_ids },
  org_id: { $in: org_ids }
})
Here's a simple demo: https://mongoplayground.net/p/TwYxZRDFVBI
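Note that combining two independent $in filters with AND matches every combination of the listed person_ids and org_ids, not only the exact pairs (for example, person_0 with org_1 would also match). If only the exact (person_id, org_id) pairs are wanted, one option is to build an $or of the pairs. A minimal PyMongo sketch, assuming `db` is an existing database handle and the collection is named as in the question:
persons = [("person_0", "org_0"), ("person_1", "org_1"), ("person_3", "org_1")]

# One $or branch per exact pair, so unrelated combinations are not matched.
query = {"$or": [{"person_id": p, "org_id": o} for p, o in persons]}
results = list(db.collection.find(query))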

Update varchar column data to jsonb data

I'm trying to convert a varchar column into a JSON column, but the column is already filled with varchar values. How can I cast them into JSON objects with additional information?
Table structure with data:
+-----+--------+
| id  | value  |
+-----+--------+
| 1   | value1 |
| 2   | value2 |
| 3   | value3 |
| 4   | value4 |
| 5   | value5 |
| 6   | value6 |
+-----+--------+
Expected Result:
+-----+----------------------------------+
| id  | value                            |
+-----+----------------------------------+
| 1   | {"t1": "val", "value": "value1"} |
| 2   | {"t1": "val", "value": "value2"} |
| 3   | {"t1": "val", "value": "value3"} |
| 4   | {"t1": "val", "value": "value4"} |
| 5   | {"t1": "val", "value": "value5"} |
| 6   | {"t1": "val", "value": "value6"} |
+-----+----------------------------------+
Kindly help me to resolve this query
demo:db<>fiddle
When altering a column type you can add an expression to convert the data from the old type into the new one. You have to add the USING clause in order to do so:
ALTER TABLE mytable
ALTER COLUMN "value" TYPE json USING json_build_object('t1', 'val', 'value', "value");
In this case, json_build_object() is used to create the expected JSON object, including the old value. If you actually need jsonb (as in the title), use TYPE jsonb with jsonb_build_object() instead.

Merge jsonb rows of a column into json build object in postgres

There are three columns: id (integer, auto increment), col_jsonb (jsonb), and date (timestamp).
I want to merge the col_jsonb row values into a JSON build object based on date; the required output is shown below.
Table:
+----+----------------+------------+
| id | col_jsonb | date |
+----+----------------+------------+
| 1 | {"Morning":10} | 2020-08-09 |
| 2 | {"Evening":20} | 2020-08-09 |
| 3 | {"Night":30} | 2020-08-09 |
| 4 | {"Morning":20} | 2020-08-10 |
+----+----------------+------------+
Expected output:
+----+----------------------------------------------+------------+
| id | col_jsonb | date |
+----+----------------------------------------------+------------+
| 1 | [{"Morning":10},{"Evening":20},{"Night":30}] | 2020-08-09 |
| 2 | {"Morning":20} | 2020-08-10 |
+----+----------------------------------------------+------------+
Try this query:
select
    row_number() over (order by date_) as "id",
    jsonb_agg(col_jsonb),
    date_ as "Date"
from example
group by date_
row_number() is added to number the rows, if required. Note that jsonb_agg() always returns an array, so a date with only one row will come back as [{"Morning":20}] rather than as the bare object shown in the expected output.
DEMO

PostgreSQL: import a CSV file

I'm trying to copy a CSV file into an empty table. I already tried to match the columns in the CSV, which failed with the exact same error.
COPY books
FROM '/path/to/file/books.csv' CSV HEADER;
error:
ERROR: extra data after last expected column
CONTEXT: COPY books, line 2: "1,Harry Potter and the Half-Blood Prince (Harry Potter #6),J.K. Rowling/Mary GrandPré,4.57,0439785..."
SQL state: 22P04
Also, I would like publication_date to be of type date so it can be queried. How can that be applied during the copy?
a piece of the CSV file:
bookID | title                          | authors | average_rating | isbn  | isbn13 | num_pages | ratings_count | text_reviews_count | publication_date
---------------------------------------------------------------------------------------------------------------------------------------------------------
1      | harry potter (harry Potter #6) | author  | 4              | "num" | "num"  | 600       | 3243          | 534                | 01/01/2000
SELECT * FROM books
output:
bookID | title             | authors | average_rating | isbn | isbn13 | language_code
----------------------------------------------------------------------------------------
text   | character varying | text    | integer        | text | text   | character varying

num_pages | ratings_count | text_reviews_count | publication_date | publisher
--------------------------------------------------------------------------------
integer   | bigint        | bigint             | date             | character varying
First of all, the number of columns in the CSV file and in the table do not match, but you can specify via the COPY command which columns of the table you need:
COPY books (bookID, title, authors, average_rating, isbn, isbn13, num_pages, ratings_count, text_reviews_count, publication_date)
FROM '/path/to/file/books.csv' CSV HEADER DELIMITER ',';
You can also specify your delimiter.
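If the load is run from client code, a hedged sketch with psycopg2 (connection string is illustrative, the file path follows the question) looks like this. Because publication_date is declared as date in the table, PostgreSQL parses values such as 01/01/2000 while copying (according to the server's DateStyle setting), so no separate conversion step is needed:
import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")  # illustrative connection parameters
copy_sql = """
    COPY books (bookID, title, authors, average_rating, isbn, isbn13,
                num_pages, ratings_count, text_reviews_count, publication_date)
    FROM STDIN WITH (FORMAT csv, HEADER true)
"""
with conn, conn.cursor() as cur, open('/path/to/file/books.csv') as f:
    # copy_expert streams the file to the server; each value is parsed with the
    # input function of its target column, so publication_date becomes a date.
    cur.copy_expert(copy_sql, f)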

PostgreSQL: Get all the outermost keys of a jsonb column

Assume that we have a large table with a jsonb column that holds only JSON objects. How can I get a list of all the outermost keys in the column?
i.e. if the table is something like this
| id | data_column |
| ---| -----------------------------------------------------|
| 1 | {"key_1": "some_value", "key_2": "some_value"} |
| 2 | {"key_3": "some_value", "key_4": "some_value"} |
| 3 | {"key_1": "some_value", "key_4": "some_object"} |
.....
Is it possible to get a result something like this?
| keys |
| -----|
| key_1|
| key_2|
| key_3|
| key_4|
Yes:
SELECT jsonb_object_keys(data_column) FROM test_table;
Or if you want to remove duplicates, order and have keys as column name:
SELECT DISTINCT jsonb_object_keys(data_column) AS keys FROM test_table ORDER by keys;
jsonb_object_keys() / json_object_keys() return the outermost keys of the JSON object.