Understanding the output files in a Datastore export - google-cloud-storage

We need to export our Datastore DB from Google Cloud to our local development environment. I managed to export it and save it in a folder on Storage. However, there are over a hundred files named "output-{number}". It is not clear to me whether we must use all of them to import the DB locally, or whether I just need one of these outputs.
The export created has the following structure:
default_namespace/
    all_kinds/
        default_namespace_all_kinds.export_metadata
        output-0
        output-1
        ...
        output-N
Is the entire "default_namespace" directory needed to successfully import the data from Prod to Local?
If you need more information please write a comment and I will provide it to you.

Datastore exports are expected to generate many different files, as specified in the docs.
However, the file you should use to perform the import is the one with the extension .overall_export_metadata (example: file-name.overall_export_metadata).
If what you want to do is import the Datastore database into a local instance of the Datastore Emulator, take a look at this documentation.
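If the goal is to load that export into a locally running Datastore emulator, the emulator also exposes an HTTP import endpoint that points at the .overall_export_metadata file. Here is a minimal sketch in Python; the host/port, project ID, and local path are placeholders to adjust (the emulator commonly listens on localhost:8081):

import requests

# Placeholders - adjust to your own setup.
EMULATOR = "http://localhost:8081"
PROJECT_ID = "my-project-id"
METADATA_FILE = "/path/to/export/file-name.overall_export_metadata"

# The import endpoint takes the local path to the .overall_export_metadata
# file and loads all the output-* files it references.
resp = requests.post(
    f"{EMULATOR}/v1/projects/{PROJECT_ID}:import",
    json={"input_url": METADATA_FILE},
)
resp.raise_for_status()
print("Import finished with status", resp.status_code)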

Related

Execute SQL scripts stored in folders on Azure DevOps repo in Java 11 HttpClient

I am version-controlling a database by uploading it to an Azure DevOps repository. The structure of the repository looks something like this:
repo_name
- schema1
- schema2
- schema3
  - Tables
    - table1.sql
    - table2.sql
  - Stored Procedures
    - stored_procedures.sql
  - Functions
    - functions.sql
In my Java program, I initialize a Java 11 HttpClient and fetch the files using HttpRequest. I built a small helper class that takes in a URI, but I thought it would be good to get an expert opinion on how to approach this problem.
When I read the repository, there is still a folder structure, organized by schema, that I need to work through. Within each schema directory, I would execute the CREATE TABLE commands within table1.sql, for instance, and then execute the CREATE PROCEDURE and CREATE FUNCTION commands in their respective files.
My question is: should I write an additional class that traverses the folder structure and executes the files in the order I mentioned above, or should I "flatten" the structure by combining all of the .sql files into one large file and then executing that on the target database? Or is there a better way to approach this outside of the two I proposed?
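For illustration, here is a rough sketch of the "traverse and execute in order" option (shown in Python for brevity; the folder names and the run_sql stub are placeholder assumptions, and it presumes the .sql files have already been fetched to a local directory that mirrors the repo layout):

from pathlib import Path

# Execution order within each schema, mirroring the repo folders.
EXECUTION_ORDER = ["Tables", "Stored Procedures", "Functions"]

def run_sql(script: str) -> None:
    # Stub: replace with a real call to the target database.
    print(f"executing {len(script)} characters of SQL")

def deploy(repo_root: Path) -> None:
    # Walk each schema folder, then each category in dependency order.
    for schema_dir in sorted(p for p in repo_root.iterdir() if p.is_dir()):
        for category in EXECUTION_ORDER:
            category_dir = schema_dir / category
            if not category_dir.is_dir():
                continue
            for sql_file in sorted(category_dir.glob("*.sql")):
                run_sql(sql_file.read_text())

if __name__ == "__main__":
    root = Path("repo_name")  # assumed local copy of the repo
    if root.exists():
        deploy(root)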

Where is a file created via Terraform code stored in Terraform Cloud?

I've been using Terraform for some time, but I'm new to Terraform Cloud. I have a piece of code that, when run locally, creates a .tf file under a folder that I specify, but when I run it with the Terraform CLI on Terraform Cloud this doesn't happen. I'll show it so it is clearer for everyone.
resource "genesyscloud_tf_export" "export" {
directory = "../Folder/"
resource_types = []
include_state_file = false
export_as_hcl = true
log_permission_errors = true
}
So basically, when I run this code locally with terraform apply, it creates a .tf file with everything I need. Where? It goes up one folder and stores the file under the folder "Folder".
But when I execute the same code on Terraform Cloud, this obviously doesn't happen. Does anyone have a workaround for this kind of problem? How can I manage to store this file, for example in a GitHub repo, when executing GitHub Actions? Thanks in advance.
The Terraform Cloud remote execution environment has an ephemeral filesystem that is discarded after a run is complete. Any files you instruct Terraform to create there during the run will therefore be lost after the run is complete.
If you want to make use of this information after the run is complete then you will need to arrange to either store it somewhere else (using additional resources that will write the data to somewhere like Amazon S3) or export the relevant information as root module output values so you can access it via Terraform Cloud's API or UI.
I'm not familiar with genesyscloud_tf_export, but from its documentation it sounds like it will create either one or two files in the given directory:
genesyscloud.tf or genesyscloud.tf.json, depending on whether you set export_as_hcl. (You did, so I assume it'll generate genesyscloud.tf.)
terraform.tfstate if you set include_state_file. (You didn't, so I assume that file isn't important in your case.)
Based on that, I think you could use the hashicorp/local provider's local_file data source to read the generated file into memory once the MyPureCloud/genesyscloud provider has created it, like this:
resource "genesyscloud_tf_export" "export" {
directory = "../Folder"
resource_types = []
include_state_file = false
export_as_hcl = true
log_permission_errors = true
}
data "local_file" "export_config" {
filename = "${genesyscloud_tf_export.export.directory}/genesyscloud.tf"
}
You can then refer to data.local_file.export_config.content to obtain the content of the file elsewhere in your module and declare that it should be written into some other location that will persist after your run is complete.
This genesyscloud_tf_export resource type seems unusual in that it modifies data on local disk and so its result presumably can't survive from one run to the next in Terraform Cloud. There might therefore be some problems on the next run if Terraform thinks that genesyscloud_tf_export.export.directory still exists but the files on disk don't, but hopefully the developers of this provider have accounted for that somehow in the provider logic.

How to make Snakemake recognize Globus remote files using Globus CLI?

I am working in a high performance computing grid environment, where large-scale data transfers are done via Globus. I would like to use Snakemake to pull data from a Globus path, process the data, and then push the processed data to a different Globus path. Globus has a command-line interface.
Pulling the data is no problem, for I'd just create a rule that would run globus transfer to create the requisite local file. But for pushing the data back to Globus, I think I'll need a rule that can "see" that the file is missing at the remote location, and then work backwards to determine what needs to happen to create the file.
I could create local "proxy" files that represent the remote files. For example I could make a rule for creating 'processed_data_1234.tar.gz' output files in a directory. These files would just be created using touch (thus empty), and the same rule will run globus transfer to push the files remotely. But then there's the overhead of making sure that the proxy files don't get out of sync with the real Globus-hosted files.
Is there a more elegant way to do this, akin to the Remote File capability? Is it difficult to add Globus CLI support to Snakemake? Thanks in advance for any advice!
Would it help to create a utility function that would generate a list of all desired files and compare it against the list of files available on globus? Something like this (pseudocode):
def return_needed_files():
    list_needed_files = []  # either hard-coded or specified with some logic
    list_available = []     # as appropriate, e.g. using globus ls
    return [i for i in list_needed_files if i not in list_available]

# include all the needed files in the all rule
rule all:
    input: return_needed_files
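A slightly more concrete (still hypothetical) version of that sketch could shell out to the Globus CLI: globus ls to see what already exists remotely, plus a push rule that runs globus transfer and touches a local marker file, along the lines of the proxy-file idea in the question. In a Snakefile, with placeholder endpoint IDs, paths, and sample names:

import subprocess

# Placeholder endpoint IDs and paths - replace with your own.
SRC = "11111111-2222-3333-4444-555555555555:/local/results"
DEST = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:/remote/processed"
SAMPLES = ["1234", "5678"]  # however you normally enumerate outputs

def remote_files():
    # `globus ls ENDPOINT_ID:PATH` prints one entry per line
    out = subprocess.run(["globus", "ls", DEST],
                         capture_output=True, text=True, check=True).stdout
    return set(out.split())

def return_needed_files(wildcards):
    available = remote_files()
    wanted = [f"processed_data_{s}.tar.gz" for s in SAMPLES]
    # local .done markers stand in for files not yet present remotely
    return [f"pushed/{f}.done" for f in wanted if f not in available]

rule all:
    input: return_needed_files

rule push:
    input: "results/{name}"
    output: touch("pushed/{name}.done")
    params: src=SRC, dest=DEST
    # globus transfer submits an asynchronous task; in practice you may want
    # to `globus task wait` on it before treating the marker as done.
    shell: "globus transfer {params.src}/{wildcards.name} {params.dest}/{wildcards.name}"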

De-serialize JSON metadata to .qvf using Qlik Sense API

I am aware of the Qlik Sense serialize app, where we generate a JSON object containing the metadata of a .qvf file using the Qlik Sense API.
I want to do the reverse operation, i.e. generate the .qvf file back from the JSON metadata.
After much research I found only this GitHub link, and it does not have complete information.
Any solution would be helpful.
Technically you can't create a .qvf directly from JSON. You'll have to create an empty .qvf and then use various APIs to import the JSON.
Qlik has a very nice tool for un-building/building apps (and more): qlik-cli has dedicated commands for unbuild/build.
If you are looking for something more "programmable", then I've created some enigma.js mixins for the same purpose - enigma-mixin. I still need to perform more detailed testing there, but it was working OK with simpler tests.
Update 08/10/2021
Using qlik-cli
After setting up a context (so qlik-cli knows which Qlik environment to talk to), first unbuild an app:
qlik app unbuild --app 11111111-2222-3333-4444-555555555555
This will create a new folder, named <app_name>-unbuild, in the current folder. The folder will contain all the info about the app in JSON and/or YAML files.
Once these files are available, you can use them to build another app. Just to mention that the target app should exist before the build is run:
qlik.exe app build --config ./config.yml --app 55555555-4444-3333-2222-111111111111
The above command will use all the available files (specified in config.yml) and update the target app.
If you don't want all the files to be used and only want to update the data connections, for example, then the build command can be run with different arguments:
qlik.exe app build --connections ./connections.yml --app 55555555-4444-3333-2222-111111111111
This command will only update the data connections in the target app and will not update anything else.

Reading file names from an azure file_storage directory

I have a file storage within my Azure portal which looks roughly like:
- 01_file.txt
- 02_file.txt
- 03_file.txt
In Azure Data Studio I have a dataset which is linked to this file storage.
If possible, I would like to loop through this directory and get a list of all the file names in my ETL pipeline.
I've had a look at the ForEach and Lookup activities, but I can't figure out how to apply them to the directory.
The end result would be a list of file names on which I would then carry out some further procedures before ingesting the data into Azure.
My current workaround is to create a JSON file listing the file names when I load the data into the file storage, and to parse that using Lookup and ForEach, but I'd like to know if there is a better solution using Data Factory.
Please use the Get Metadata activity. You can get the folder's metadata and then obtain the list of file names by accessing the childItems property, e.g. by passing @activity('Get Metadata1').output.childItems (using the name of your Get Metadata activity) to the ForEach activity's Items setting. For more details, please refer to https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity#get-a-folders-metadata
(Pipeline configuration and execution screenshots omitted.)