How to manage BigQuery tables post-Firestore backfill

I am interested in learning how to manage BigQuery after Firestore backfills.
First off, I use the firebase/firestore-bigquery-export#0.1.22 extension with a table named 'n'. After creating this table, two tables are generated: n_raw_changelog and n_raw_latest.
Can I delete either of these tables, and why are the names generated automatically?
Then I ran a backfill, because the collection existed before the BigQuery table was created, using:
npx @firebaseextensions/fs-bq-import-collection \
--non-interactive \
--project blah \
--source-collection-path users \
--dataset n_raw_latest \
--table-name-prefix pre \
--batch-size 300 \
--query-collection-group true
And now the script has added two more tables with extra suffixes,
i.e. n_raw_latest_raw_latest and n_raw_latest_raw_changelog.
Am I supposed to move these records into the previous tables and delete the new ones after the backfill?
Is there something I'm missing, or did I use incorrect naming conventions?

As shown in this tutorial, those two tables are part of the dataset generated by the extension.
For example, suppose we have a collection in Firestore called orders.
When we install the extension, the configuration panel lets us set the Dataset ID and the table name prefix (in this example, firebase_orders and orders).
Then, as soon as we create the first document in the collection, the extension creates the firebase_orders dataset in BigQuery with two resources:
A table of raw data that stores a full change history of the documents within the collection... Note that the table is named orders_raw_changelog using the prefix we configured before.
A view, named orders_raw_latest, which represents the current state of the data within the collection.
So, these are generated by the extension.
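For instance, once those two resources exist, the current state of the collection can be read from the view. A hedged example using the tutorial's names (firebase_orders / orders); the column list follows the extension's changelog schema and may vary slightly between versions:

-- Read the current state of each document from the _raw_latest view
SELECT document_name, timestamp, operation, data
FROM `firebase_orders.orders_raw_latest`
LIMIT 10;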
From the command you posted in your question, I see that you used the fs-bq-import-collection script with the --non-interactive flag and passed the --dataset parameter with the value n_raw_latest.
The --dataset parameter corresponds to the Dataset ID parameter shown in the configuration panel above. Therefore, you are creating a new dataset named n_raw_latest, which will contain the n_raw_latest_raw_changelog table and the n_raw_latest_raw_latest view. In other words, you are creating a brand-new dataset from your current records instead of updating the dataset the extension created when it was installed.
To avoid this, as stated in the documentation, you must use the same Dataset ID that you set when configuring the extension:
${DATASET_ID}: the ID that you specified for your dataset during extension installation
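If you want to confirm what happened before cleaning up, listing the objects inside the accidentally created dataset should show the extra table and view. A sketch using BigQuery's INFORMATION_SCHEMA (assumed to be available for your project and region):

-- List everything inside the accidentally created n_raw_latest dataset
SELECT table_name, table_type
FROM `n_raw_latest.INFORMATION_SCHEMA.TABLES`
ORDER BY table_name;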
See also:
Automated Firestore Replication to BigQuery
Stream Collections to BigQuery - GitHub
Import existing documents - GitHub

Related

Azure Data Factory: get the IDs of the tables which succeeded

I am using Azure Data Factory to copy data from tables in one database to tables in another database. I am using a Lookup activity to get the list of tables that need to be copied, and then a ForEach iterator to copy the data.
I am using the table below to get the list of tables that need to be copied.
The problem: I want to update the flag to 1 when a table is successfully copied. I tried using the log that is generated after the pipeline runs, but I am unable to use it effectively.
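For illustration only, a minimal sketch of the kind of control table and flag update being described; every name here (dbo.TablesToCopy, CopiedFlag, dbo.MarkTableCopied) is an assumption, and the procedure would typically be invoked from a Stored Procedure activity chained to the success output of each Copy activity:

-- Assumed control table driving the Lookup / ForEach
CREATE TABLE dbo.TablesToCopy (
    TableName  NVARCHAR(128) NOT NULL,
    CopiedFlag BIT           NOT NULL DEFAULT 0
);

-- Assumed procedure to run after a table is copied successfully
CREATE PROCEDURE dbo.MarkTableCopied
    @TableName NVARCHAR(128)
AS
BEGIN
    UPDATE dbo.TablesToCopy
       SET CopiedFlag = 1
     WHERE TableName = @TableName;
END;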

How to copy data from a CSV to an Azure SQL Server table?

I have a dataset based on a CSV file. This exposes data as follows:
Name,Age
John,23
I have an Azure SQL Server instance with a table named: [People]
This has columns
Name, Age
I am using the Copy Data activity and trying to copy data from the CSV dataset into the Azure table.
There is no option to indicate the target table name; instead, I have a space to input a Stored Procedure name?
How does this work? Where do I put the target table name in the image below?
You should DEFINITELY have a table name to write to; if you don't have a table, something is wrong with your setup. Make sure you have a table to write to and that the field names in your table match the fields in the CSV file, then follow the steps outlined in the walkthrough below. There are several steps to click through, but all are pretty intuitive, so just follow the instructions step by step and you should be fine.
http://normalian.hatenablog.com/entry/2017/09/04/233320
You can add records into the SQL Database table directly, without stored procedures, by configuring the table value on the Sink Dataset rather than on the Copy Activity itself.
Have a look at the screenshot below, which shows the Table field within my dataset.
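For completeness, the sink table only has to match the CSV columns. A minimal sketch of the target table from the example above (the dbo schema and the column types are assumptions):

-- Target table matching the CSV header Name,Age
CREATE TABLE dbo.People (
    Name NVARCHAR(100),
    Age  INT
);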

How to update PostgreSQL full text search field when relational data changes

I have the following strategy for full text search in my web app, which uses PostgreSQL for relational data storage. As an example I will take the invoices table.
In these tables I have one additional field, added with ALTER TABLE invoices ADD COLUMN tsv tsvector, on which the full text search query is run like this: ... WHERE tsv @@ to_tsquery('query:*') ...
On every full text search table I have set an update trigger that updates the tsv field on every change of the record. The update sets and concatenates the data from different fields into the tsv field, sets the right weights, etc.
The data that gets set into the tsv field can also be relational data from other tables. For example, in the invoices table I have a client_id field, but since I want to search invoices by client name as well, I also include the clients.client_name data in the invoices.tsv field.
My question is: what is the best strategy to keep the relational data in the tsv columns in sync? In the above scenario, if a client name changes I would need to update the tsv field of every invoice for that client.
Should I set up a cron job that does this every night? It could also be done with triggers, but since my database schema is very large I am afraid it might get out of control if I have triggers all over the place.
If you add the client's name into the tsv field you will end up with more complexity. You might want to look into materialized views, as mentioned in this article. The trade-off might be speed in showing results versus the need to refresh the view periodically. As of Postgres 9.4 you can refresh a materialized view concurrently.
Another thing you could do is create an update trigger on the clients table so that, when a client changes, it updates the data in the invoices table as well (see the sketch below).
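A minimal sketch of that second option, assuming tables named clients (id, client_name) and invoices (client_id, tsv) and an existing per-row trigger on invoices that rebuilds tsv. When a client is renamed, a no-op update of the related invoices is enough to make the existing trigger recompute their tsv values:

-- Hypothetical names: clients(id, client_name), invoices(client_id, tsv)
CREATE OR REPLACE FUNCTION clients_resync_invoice_tsv() RETURNS trigger AS $$
BEGIN
  IF NEW.client_name IS DISTINCT FROM OLD.client_name THEN
    -- No-op update: fires the existing UPDATE trigger on invoices,
    -- which rebuilds tsv with the new client name.
    UPDATE invoices SET client_id = client_id WHERE client_id = NEW.id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_clients_resync_invoice_tsv
AFTER UPDATE OF client_name ON clients
FOR EACH ROW EXECUTE PROCEDURE clients_resync_invoice_tsv();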

Using RESTful Oracle APEX

I am building a mobile app using the Appery.io platform, which uses a MongoDB-based database. I need to link this DB to an Oracle database and use APEX to design an interface such that users can query and update the mobile app DB from Oracle, and the Oracle DB can likewise be updated from the mobile app.
In APEX, I use the URI with the GET method:
https://api.appery.io/rest/1/db/collections/Outlet_Details/
And I add the header:
X-Appery-Database-Id
When I run the query in APEX after inserting the Database Id, APEX shows the table/collection Outlet_Details in JSON format. However, not the entire table is shown, due to, I think, the length of the CLOB type.
Now my main problem is that I need to query this table/collection called Outlet_Details by a column named _id. So when I use the following URI:
https://api.appery.io/rest/1/db/collections/Outlet_Details/1234
It returns the specific record that has _id = 1234. However, I do not want to hardcode it. Instead, I need something more like a where condition, so that I can query based on any column value (e.g. userId instead of _id). The equivalent cURL command is as follows:
curl -X GET \
  -H "X-Appery-Database-Id: 544a5cdfe4b03d005b6233b9" \
  -G --data-urlencode 'where={"userId": "1234"}' \
  https://api.appery.io/rest/1/db/collections/outlet_details/
My problem is how to express such a command in APEX, especially the where part.
In this tutorial an Oracle database is used, so using a where condition such as =:DEP and binding it to a variable is pretty straightforward. However, I need to replicate that tutorial against my MongoDB collection.
The other question, which I guess would clarify a lot for me: in the aforementioned tutorial there is a prefix URI that is by default the APEX schema URI. Even when I insert a different URI template, the resulting URI still combines the APEX prefix with the one I inserted. How do I build a service there using a different URI?
I found that APEX takes the where condition as an encoded parameter in the URL, something like:
https://api.appery.io/rest/1/db/collections/Outlet_Details?where=%7B%22Oracle_Flag%22%3A%22Y%22%7D
The header is the same, and there are no input parameters.
This can be done from Application Builder > New Application > Database > Create Application > Shared Components > Create > REST, and then entering the header, URL, etc.
You can refer to this link as a reference: encoded URL
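Outside of the APEX wizard, the same call can also be tested from PL/SQL, which makes the encoding explicit. A hedged sketch (the database id and userId value are placeholders; UTL_URL and APEX_WEB_SERVICE are assumed to be available in your schema):

-- Build the where clause as JSON, URL-encode it, and call the Appery REST endpoint
DECLARE
  l_where    VARCHAR2(200) := '{"userId":"1234"}';
  l_url      VARCHAR2(500);
  l_response CLOB;
BEGIN
  l_url := 'https://api.appery.io/rest/1/db/collections/Outlet_Details'
           || '?where=' || utl_url.escape(l_where, escape_reserved_chars => TRUE);

  apex_web_service.g_request_headers(1).name  := 'X-Appery-Database-Id';
  apex_web_service.g_request_headers(1).value := '544a5cdfe4b03d005b6233b9';

  l_response := apex_web_service.make_rest_request(
                  p_url         => l_url,
                  p_http_method => 'GET');

  dbms_output.put_line(dbms_lob.substr(l_response, 4000));
END;
/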

How to access Library, File, and Field descriptions in DB2?

I would like to write a query against the IBM DB2 system catalog tables (e.g. SYSIBM) that exports the following:
LIBRARY_NAME, LIBRARY_DESC, FILE_NAME, FILE_DESC, FIELD_NAME, FIELD_DESC
I can access the descriptions via the UI, but wanted to generate a dynamic query.
Thanks.
Along with SYSTABLES and SYSCOLUMNS, there is also SYSSCHEMAS, which appears to contain the data you need. Please note that accessing this information through QSYS2 will restrict the rows returned to those objects to which you have some access; the SYSIBM schema appears to disregard this (check the reference: for V6R1 it's about page 1267).
You also shouldn't need to retrieve this with a dynamic query - static with host variables (if necessary) will work just fine.
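A hedged sketch of such a query against the DB2 for i catalogs in QSYS2 (swap in SYSIBM to bypass the authority filtering mentioned above). The *_TEXT description column names are quoted from memory and may differ slightly between releases:

-- MYLIB is a placeholder library name; drop the WHERE clause to export everything
SELECT s.schema_name AS library_name,
       s.schema_text AS library_desc,
       t.table_name  AS file_name,
       t.table_text  AS file_desc,
       c.column_name AS field_name,
       c.column_text AS field_desc
  FROM qsys2.sysschemas s
  JOIN qsys2.systables  t ON t.table_schema = s.schema_name
  JOIN qsys2.syscolumns c ON c.table_schema = t.table_schema
                         AND c.table_name   = t.table_name
 WHERE s.schema_name = 'MYLIB'
 ORDER BY t.table_name, c.ordinal_position;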