Regex with Prefix parameter for list_blobs_with_prefix() - google-cloud-storage

I'm trying to get objects from GCP Storage with a prefix using the Python API client, but I'm having a problem with the prefix parameter. I'm able to do it in gsutil with
gsutil ls -h gs://{{bucket-name}}/*/latest/
But not with the Python API.
I'm using the function from the documentation.
I tried passing the prefix parameter as
*/latest/
/*/latest
*
and leaving the delimiter as None. Still not getting any result.
storage_client = storage.Client()
# Note: Client.list_blobs requires at least package version 1.17.0.
blobs = storage_client.list_blobs(bucket_name, prefix=prefix,
                                  delimiter=delimiter)
print('Blobs:')
for blob in blobs:
    print(blob.name)
if delimiter:
    print('Prefixes:')
    for prefix in blobs.prefixes:
        print(prefix)
The expected output is
gs://{{bucket-name}}/product/latest/:
gs://{{bucket-name}}/product/latest/health
gs://{{bucket-name}}/product/latest/index.html

gsutil knows about regexes, but the GCS APIs themselves do not. The APIs only support literal prefixes.
Instead, you'll need to fetch everything and filter with the regex yourself, which is what gsutil is doing in your example.
import re

all_blobs = storage_client.list_blobs(bucket_name)
regex = re.compile(r'.*/latest/.*')
# list_blobs() yields Blob objects, not strings, so match against each blob's name.
blobs = [blob for blob in all_blobs if regex.match(blob.name)]
If you have too many objects for client-side filtering to be practical, I recommend reorganizing your data so that a non-wildcard match sits at the beginning of the path, which lets you filter server-side.
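For instance, if the layout were flipped to latest/{product}/... (a hypothetical reorganization, not your current structure), the wildcard disappears and a literal prefix is enough:
# Hypothetical layout latest/<product>/... lets the server do the filtering.
blobs = storage_client.list_blobs(bucket_name, prefix='latest/')
for blob in blobs:
    print(blob.name)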

Related

Best way to pass parameters to SparkSubmitOperator

So far I have been providing all required variables in the "application" field in the file itself; this, however, feels a bit hacky.
So for example:
spark_clean_store_data = SparkSubmitOperator(
    task_id="my_task_id",
    application="/path/to/my/dags/scripts/clean_store_data.py",
    conn_id="spark_conn",
    dag=dag,
)
So the question is: what is the most Airflow-y/proper way to provide the SparkSubmitOperator with parameters like input data and/or output files?
As per the documentation, you might consider using the following parameters of the SparkSubmitOperator:
files : a comma-separated string that allows you to upload files into the working directory of each executor
application_args : a list of strings that allows you to pass arguments to the application
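A minimal sketch of how that might look, reusing the DAG from the question (the input/output paths and argument names are assumptions, not part of the original code):
spark_clean_store_data = SparkSubmitOperator(
    task_id="my_task_id",
    application="/path/to/my/dags/scripts/clean_store_data.py",
    conn_id="spark_conn",
    # Hypothetical input/output locations; the script reads them via sys.argv or argparse.
    application_args=[
        "--input", "s3://my-bucket/input/{{ ds }}/",
        "--output", "s3://my-bucket/output/{{ ds }}/",
    ],
    dag=dag,
)
application_args is itself a templated field, so Jinja values like {{ ds }} should be rendered before the job is submitted.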

Invalid hashing in Firebase Cloud Storage Rules Playground

I am testing hashing in the rules playground:
This returns "CRexOpCRkV1UtjNvRZCVOczkUrNmGyHzhkGKJXiDswo=", the correct hash of the string "SECRET" :
let expected = hashing.sha256("SECRET");
But this returns "SECRETpath/to/the/file.mp4", the argument itself instead of its hash:
let expected = hashing.sha256("SECRET" + request.resource.name);
Is it a bug in the rules playground?
Can hashing functions be used on dynamic values or is it intentionally prevented?
The strange rules playground behavior has been mentioned here before, this time with Firestore security rules: Firestore rules hashing returns identity
Firebaser here!
There are a few issues at play here. I think the primary source of confusion is that the hashing.sha256 function returns a rules.Bytes type. It appears that the Rules Playground in the Firebase Console incorrectly shows a string value when debugging the bytes type, but that is unrelated to behavior in production. For example, this Rule will always deny:
allow write: if hashing.sha256("SECRET" + request.resource.name) ==
"SECRET" + request.resource.name;
To get the behavior you're looking for, you need to use one of the conversion functions for the rules.Bytes type. Based on your question, you'll probably want the toBase64() function, but toHexString() is also an option. If you try these functions in your Rules, the Playground should start behaving correctly and the Rules will work as expected in production as well. So to put it all together, you'd write:
let expected = hashing.sha256("SECRET" + request.resource.name).toBase64();
For example, the rules listed below would allow you to upload a file called "foo/bar" (as Gqot1HkcleDFQ5770UsfmKDKQxt_-Jp4DRkTNmXL9m4= is the Base64 SHA-256 hash of "SECRETfoo/bar"):
allow write: if hashing.sha256('SECRET' + request.resource.name).toBase64() ==
             "Gqot1HkcleDFQ5770UsfmKDKQxt_-Jp4DRkTNmXL9m4=";
I hope this helps clear things up! Separately, we will look into addressing the incorrect debugging output in the Playground.
After trying with the emulators and the deployed app, it seems that hashing.sha256 does not work on dynamic data in any environment. The behavior is consistent, so I filed a feature request to add this capability to Storage security rules. It would be nice because it would allow passing signed data to the security rule for each file (for example, an upload authorization obtained via a Cloud Function).
As of now, the workaround I imagine is putting the data in a user custom token (or custom claims), so I can pass signed data to the security rule. It is not ideal because I need to re-sign a custom token for every file upload.
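A minimal sketch of that workaround with the Firebase Admin SDK for Python (the service account path, uid, and claim name are assumptions, not from the original post):
import firebase_admin
from firebase_admin import auth, credentials

# Initialize the Admin SDK with a service account key (hypothetical path).
firebase_admin.initialize_app(credentials.Certificate('service-account.json'))

# Embed the per-upload authorization as a custom claim; the client signs in
# with this token and Storage rules can then read it from request.auth.token.
token = auth.create_custom_token('some-uid', {'upload_auth': 'foo/bar'})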

Avoid $( expansion in Qlik Sense

I have a REST API which decrypts the token passed to it and returns the actual value.
The token can sometimes contain "$(" in its value, and this causes issues in the POST call to the API.
[dbtable]:
SELECT X
FROM "table" WHERE key='1234';
Let v_C= Peek('X',0,'dbtable');
//create the json request
Let vRequestBody='[';
Let vRequestBody = vRequestBody&'{"troup":"CB","tt":"CBA","tk":"$(v_C)"}';
Let vRequestBody = vRequestBody&']';
LIB CONNECT TO 'postapi';
RestConnectorMasterTable:
SQL SELECT
"data"
FROM JSON (wrap on) "root"
WITH CONNECTION (BODY "$(vRequestBody)" );
It works for the rest of the values, but for values containing "$(" the value of v_C turns NULL due to $-expansion. Is there a way to avoid the $-expansion and pass the value as-is to the body of the API call in Qlik Sense?
Yes, this is quite common with APIs that expect values which "confuse" Qlik Sense's parser. Generally the way around it is to put in a placeholder and then replace it with the real value later, or to use a chr() call to produce the character you want. I think the latter should work in this situation:
Let vRequestBody = vRequestBody&'{"troup":"CB","tt":"CBA","tk":"' & chr(36) & '(v_C)"}';
Hope that works.

How to reference a DAG's execution date inside of a `KubernetesPodOperator`?

I am writing an Airflow DAG to pull data from an API and store it in a database I own. Following best practices outlined in We're All Using Airflow Wrong, I'm writing the DAG as a sequence of KubernetesPodOperators that run pretty simple Python functions as the entry point to the Docker image.
The problem I'm trying to solve is that this DAG should only pull data for the execution_date.
If I were using a PythonOperator (doc), I could use the provide_context argument to make the execution date available to the function. But judging from the KubernetesPodOperator's documentation, it seems that the Kubernetes operator has no argument that does what provide_context does.
My best guess is that you could use the arguments command to pass in a date range, and since it's templated, you can reference it like this:
my_pod_operator = KubernetesPodOperator(
    # ... other args here
    arguments=['python', 'my_script.py', '{{ ds }}'],
    # arguments continue
)
And then you'd get the start date like you'd get any other argument provided to a Python file run as a script, by using sys.argv.
Is this the right way of doing it?
Thanks for the help.
Yes, that is the correct way of doing it.
Each Operator has template_fields. All the parameters listed in template_fields can render Jinja2 templates and Airflow Macros.
For KubernetesPodOperator, if you check the docs, you will find:
template_fields = ['cmds', 'arguments', 'env_vars', 'config_file']
which means you can pass '{{ ds }}' to any of the four params listed above.
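On the container side, a minimal sketch of how the script could pick up the rendered value (my_script.py and the argument position come from the question's example):
# my_script.py -- invoked as: python my_script.py <rendered {{ ds }}>
import sys

execution_date = sys.argv[1]  # e.g. '2019-05-01', the DAG's execution date
print(f"Pulling data for {execution_date}")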

AWS CloudWatch Log Metric Filter with JSON key has character space

When creating an AWS CloudWatch Log Metric Filter, how would you match terms in JSON Log Events where the key has a character space in the name?
For example, let's assume there's a log line with a JSON element like the following...
{"Event":"SparkListenerLogStart","Spark Version":"2.4.0-SNAPSHOT"}
How would you reference the "Spark Version"? $."Spark Version", $.Spark Version, $.Spark\ Version, and $.[Spark Version] don't work.
I couldn't find the answer in the AWS Filter and Pattern Syntax documentation.
At the time of writing, this is not possible. AWS will probably fix that at some point, but for now the only workaround would be to use the non-JSON syntax and search for the exact string. The following filter:
"\"Spark Version\":\"2.4.0-SNAPSHOT\""
will match:
{"Event":"SparkListenerLogStart","Spark Version":"2.4.0-SNAPSHOT"}