How to change the metadata of all existing objects of a specific file type in Google Cloud Storage? - google-cloud-storage

I have uploaded thousands of files to Google Cloud Storage, and I found out that all the files are missing a Content-Type, so my website cannot serve them correctly.
I wonder if I can set some kind of policy to change the Content-Type of all the files at once. For example, I have a bunch of .html files inside the bucket:
a/b/index.html
a/c/a.html
a/c/a/b.html
a/a.html
.
.
.
Is it possible to set the Content-Type of all the .html files in these different places with one command?

You could do:
gsutil -m setmeta -h Content-Type:text/html gs://your-bucket/**.html

There's no single command to achieve exactly the behavior you are looking for (one command to edit all the objects' metadata); however, gsutil provides a setmeta command that you can use in a bash script to loop through all the objects inside the bucket.
1.- Option (1) is to use the gsutil command "setmeta" in a bash script:
# Pseudo code: list all your object names and iterate, editing the metadata of each one.
for OBJECT in $(gsutil ls gs://[BUCKET_NAME]/**)
do
    # "$OBJECT" is the full gs://[BUCKET_NAME]/[OBJECT_NAME] path.
    gsutil setmeta -h "[METADATA_KEY]:[METADATA_VALUE]" "$OBJECT"
done
2.- You could also write a small C++ program with the client library to achieve the same thing:
#include "google/cloud/storage/client.h"
#include <iostream>
#include <string>

namespace gcs = google::cloud::storage;
using ::google::cloud::StatusOr;

// List all the objects in the bucket and, while looping, edit the metadata of each one.
void UpdateAllObjectsMetadata(gcs::Client client, std::string const& bucket_name,
                              std::string const& key, std::string const& value) {
  for (auto&& object_metadata : client.ListObjects(bucket_name)) {
    if (!object_metadata) continue;  // skip entries that could not be listed
    std::string object_name = object_metadata->name();
    gcs::ObjectMetadata desired = *object_metadata;
    desired.mutable_metadata().emplace(key, value);
    // For the Content-Type itself you could instead call desired.set_content_type("text/html");
    StatusOr<gcs::ObjectMetadata> updated = client.UpdateObject(
        bucket_name, object_name, desired,
        gcs::Generation(object_metadata->generation()));
    if (!updated) std::cerr << "Failed to update " << object_name << "\n";
  }
}

Related

How to set charset to UTF8 for text files with gsutil when uploading to Google Cloud Storage bucket?

We have a (public) Google Cloud Storage bucket that hosts a simple website, meaning both HTML and images.
Our build process uses Google Cloud Build, however the question is not tied to using Cloud Build, but specifically regarding on how to use gsutil properly.
This is our current gsutil task:
# Upload it to the bucket
- name: gcr.io/cloud-builders/gsutil
  dir: "public/"
  args: [
    "-m",                                      # run the cp command in parallel
    "-h", "Cache-Control: public, max-age=0",  # Custom cache control header
    "cp",                                      # copy command
    "-r",                                      # recursively
    ".",                                       # source folder
    "gs://mybucket/"                           # the target bucket and folder
  ]
As you can see, this copies everything in the local public/ folder to the bucket and applies the Cache-Control header on all objects.
According to this:
https://cloud.google.com/storage/docs/gsutil/addlhelp/WorkingWithObjectMetadata
You can specify the content type with
-h "Content-Type:text/html; charset=utf-8"
However, this makes all objects (not only .html files, but also images, etc.) get the content type text/html; charset=utf-8.
(I have even tried -h "Content-Type:; charset=utf-8", but then gsutil fails, saying it's an invalid content type value.)
Is there a way to tell gsutil to apply charset=utf-8 on all objects, without actually overwriting the main content type?
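One possible workaround (a hedged sketch, not from the original thread; the bucket name and extensions are placeholders): keep the copy step as it is, so gsutil keeps inferring each object's main content type from its extension, and then patch the charset per extension in a follow-up step with setmeta, for example:
gsutil -m setmeta -h "Content-Type:text/html; charset=utf-8" "gs://mybucket/**.html"
gsutil -m setmeta -h "Content-Type:text/css; charset=utf-8" "gs://mybucket/**.css"
This only rewrites the metadata of the matched objects, so images and other binaries keep the content type assigned by the copy step.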

How to export IBM Watson conversation history?

Before running the code, install the ibm-watson & ibm-cloud-sdk-core packages and also pip install PyJWT==1.7.1.
I found in the IBM documentation that "For a Python script you can run to export logs and convert them to CSV format, download the export_logs_py.py file from the Watson Assistant GitHub repository."
But I don't really know where & what I should modify in order to connect to my IBM skill.
There is no demo or instruction about where I can find those arguments.
I only found this information in the skill API details, but it seems the script needs more.
Does anyone have an example of how to use the .py they provided?
(I'm a coding beginner and don't really understand every line in the .py.)
The .py shows an error after I run the file without modification:
runfile('C:/export_logs.py', wdir='C:/Users/admin/Downloads')
usage: export_logs.py [-h] [--logtype {ASSISTANT,WORKSPACE,DEPLOYMENT}]
[--language LANGUAGE] [--filetype {CSV,TSV,XLSX,JSON}]
[--url URL] [--version VERSION]
[--totalpages TOTALPAGES] [--pagelimit PAGELIMIT]
[--filter FILTER] [--strip STRIP]
apikey id filename
export_logs.py: error: the following arguments are required: apikey, id, filename
An exception has occurred, use %tb to see the full traceback.
SystemExit: 2
The conversation I want to download:
First of all, Workspaces in IBM Watson Assistant are now called Skills.
To understand what arguments(positional and optional) you need to pass to the Python script, run the below command
python export_logs_py.py -h
Wherever you see workspace, you can replace it with skill.
To export the logs in the .csv file format, run the below command
python export_logs_py.py --filetype CSV --url <URL> <API_KEY> <SKILL_ID> output.csv
Replace the placeholders <URL>, <API_KEY> and <SKILL_ID> with the appropriate values mentioned below.
<URL> & <API_KEY> - You can find them under the Manage page of your Watson Assistant service page
<SKILL_ID> - The same as the one in the image you uploaded. Check this StackOverflow answer for more info.
For Assistant logs, add --logtype ASSISTANT. The default is WORKSPACE.
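Putting it together, a full invocation might look like the line below (everything here is illustrative: the region URL is just one example of the service URL from your Manage page, and the API key and skill ID are made-up placeholders):
python export_logs_py.py --filetype CSV --url https://api.us-south.assistant.watson.cloud.ibm.com MY_API_KEY MY_SKILL_ID output.csv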
You can also find the logs in the UI, under the Analytics section of your Skill.
As you can see, the script reported an error and said that you have to provide the apikey, the id and the (presumably output) filename as parameters. It also showed that additional parameters can be specified.
usage: export_logs.py [-h] [--logtype {ASSISTANT,WORKSPACE,DEPLOYMENT}]
[--language LANGUAGE] [--filetype {CSV,TSV,XLSX,JSON}]
[--url URL] [--version VERSION]
[--totalpages TOTALPAGES] [--pagelimit PAGELIMIT]
[--filter FILTER] [--strip STRIP]
apikey id filename
Your next step could now be to invoke the script again, but provide an API key for Watson Assistant, the skill ID and a filename as additional parameters. Next, I would try something like, e.g., specifying the output type:
export_logs.py --filetype CSV myapikey skillID output.csv
I am not the author of that script, but that is how I would approach it if I wanted to use it.

Getting HASH of individual files within folder uploaded to IPFS

When I upload a folder of .jpg files to IPFS, I get the HASH of that folder - which is cool.
But is each individual file in that folder also getting hashed?
And if so, how do I get the hash of each file?
I basically want to be able to upload a whole bunch of files - like 500 images - and do it all at once, or programmatically, and have the hash of each file be returned to me.
Any way to do this?
Yes! From the command line you get back the CIDs (the Content IDentifiers, aka IPFS hashes) for each file added when you run ipfs add -r <path to directory>:
$ ipfs add -r gifs
added QmfBAEYhJp9ZjGvv8utB3Yv8uuuxsDKjv9rurkHRsYU3ih gifs/martian-iron-man.gif
added QmRBHTH3p4W2xAzgLxvdh8VJvAmWBgchwCr9G98EprwetE gifs/needs-more-dogs.gif
added QmZbffnCcV598QxsUy7WphXCAMZJULZAzy94tuFZzbFcdK gifs/satisfied-with-your-care.gif
added QmTxnmk85ESr97j2xLNFeVZW2Kk9FquhdswofchF8iDGFg gifs/stone-of-triumph.gif
added QmcN71Qh56oSg2YXsEXuf8o6u5CrBXbyYYzgMyAkdkcxxK gifs/thanks-dog.gif
added QmTnuLaivKc1Aj8LBf2iWBHDXsmedip3zSPbQcGi6BFwTC gifs
The root CID for the directory is always the last item in the list.
You can limit the output of that command to just include the CIDs using the --quiet flag
$ ipfs add -r gifs --quiet
QmfBAEYhJp9ZjGvv8utB3Yv8uuuxsDKjv9rurkHRsYU3ih
QmRBHTH3p4W2xAzgLxvdh8VJvAmWBgchwCr9G98EprwetE
QmZbffnCcV598QxsUy7WphXCAMZJULZAzy94tuFZzbFcdK
QmTxnmk85ESr97j2xLNFeVZW2Kk9FquhdswofchF8iDGFg
QmcN71Qh56oSg2YXsEXuf8o6u5CrBXbyYYzgMyAkdkcxxK
QmTnuLaivKc1Aj8LBf2iWBHDXsmedip3zSPbQcGi6BFwTC
Or, if you know the CID for a directory, you can list out the files it contains and their individual CIDs with ipfs ls. Here I list out the contents of the gifs dir from the previous example
$ ipfs ls QmTnuLaivKc1Aj8LBf2iWBHDXsmedip3zSPbQcGi6BFwTC
QmfBAEYhJp9ZjGvv8utB3Yv8uuuxsDKjv9rurkHRsYU3ih 2252675 martian-iron-man.gif
QmRBHTH3p4W2xAzgLxvdh8VJvAmWBgchwCr9G98EprwetE 1233669 needs-more-dogs.gif
QmZbffnCcV598QxsUy7WphXCAMZJULZAzy94tuFZzbFcdK 1395067 satisfied-with-your-care.gif
QmTxnmk85ESr97j2xLNFeVZW2Kk9FquhdswofchF8iDGFg 1154617 stone-of-triumph.gif
QmcN71Qh56oSg2YXsEXuf8o6u5CrBXbyYYzgMyAkdkcxxK 2322454 thanks-dog.gif
You can do it programmatically with the core API in js-ipfs or go-ipfs. Here is an example of adding files from the local file system in Node.js using js-ipfs, from the docs for ipfs.addAll(files) - https://github.com/ipfs/js-ipfs/blob/master/docs/core-api/FILES.md#importing-files-from-the-file-system
There is a super helpful video on how adding files to IPFS works over at https://www.youtube.com/watch?v=Z5zNPwMDYGg
And a walk through of js-ipfs here https://github.com/ipfs/js-ipfs/tree/master/examples/ipfs-101
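If you would rather stay on the command line for the "500 images at once" part of the question, here is a small hedged sketch (it relies only on the added <CID> <name> output format shown above; the images directory and output file name are placeholders):
# capture a "name CID" mapping for every file added under images/
ipfs add -r images | awk '$1 == "added" {print $3, $2}' > cids.txt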

How to associate hash-named .wsp files to my tagged graphite metrics?

I use Graphite tagged metrics with Grafana and Whisper, but http://graphite/tags/delSeries removes something, yet not the .wsp files.
Untagged metrics create .wsp files with human-readable names in the whisper data folder, but tagged metrics create only hash-named folders and .wsp files in the _tagged directory.
Like so:
/whisper
    /data
        /Players
            registrations.wsp
            today_registrations.wsp
        /Gaming
            playing_count.wsp
    /_tagged
        /f58
            /010
                f58010d4cef67599a31f4daaab4a53c4d7fd85a9faea546282d2058c40c7e7b9.wsp
        /f56
            /031
                f56031052aec89dc9cc38e44dbe71b2eb08fb513a3e60d515eb1dc23f5b929d1.wsp
How can I know which .wsp file is associated with my tagged metric?
I'm just running into that problem as well: how to map the actual path/tag metric to its corresponding hashed wsp file.
I don't think you can compute the actual metric name from the hash, but you can do the other way around, by using graphite's encoding methods.
I've quickly written a Python script just for lab purposes:
- It can take several metric names as parameters and returns a mapping.
Just log into your graphite host and create a python script in /opt/graphite/webapp/graphite/tags
#!/opt/graphite/bin/python3
import sys
from utils import TaggedSeries

for line in sys.stdin:
    paths = line.split()
    for path in paths:
        # Normalize first
        parsed = TaggedSeries.parse(path)
        print(path + " -> /opt/graphite/storage/whisper/" + TaggedSeries.encode(parsed.path, '/', True) + ".wsp")
You can then pipe a list of metrics:
# echo "users.count;server=s1" |python mapper.py
users.count;server=s1 -> /opt/graphite/storage/whisper/_tagged/b6c/c91/b6cc916d608e4b145b318669606e79118cc41d316f96735dd43621db4fd2bcaf.wsp
You can also get all your tagged metrics and generate a file that you can later cat into the script. In this example I get all metrics associated with the tag 'server':
# curl -s "http://localhost/tags/findSeries?expr=server=~." | sed s/"\", \""/\\n/g > my_metrics
Then cat your metrics:
# cat my_metrics | python mapper.py
That's a starting point. From there you can easily do some simple scripting for deleting wsp files, like the ones not updated for a month, for example.
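A minimal sketch of that last idea (assuming the default /opt/graphite/storage/whisper path; keep the harmless -print until you are sure of the selection, and only then switch it to -delete):
# tagged .wsp files not modified in the last 30 days
find /opt/graphite/storage/whisper/_tagged -name "*.wsp" -mtime +30 -print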

Copy all files with given extension to output directory using CMake

I've seen that I can use this command in order to copy a directory using cmake:
file(COPY "myDir" DESTINATION "myDestination")
(from this post)
My problem is that I don't want to copy all of myDir, but only the .h files that are in there. I've tried with
file(COPY "myDir/*.h" DESTINATION "myDestination")
but I obtain the following error:
CMake Error at CMakeLists.txt:23 (file):
file COPY cannot find
"/full/path/to/myDIR/*.h".
How can I filter the files that I want to copy to a destination folder?
I've found the solution by myself:
file(GLOB MY_PUBLIC_HEADERS
"myDir/*.h"
)
file(COPY ${MY_PUBLIC_HEADERS} DESTINATION myDestination)
this also works for me:
install(DIRECTORY "myDir/"
        DESTINATION "myDestination"
        FILES_MATCHING PATTERN "*.h")
The alternative approach provided by jepessen does not take into account the fact that sometimes the number of files to be copied is too high. I encountered the issue when doing such a thing (more than 110 files).
Due to a limitation on Windows on the number of characters (2047 or 8191) in a single command line, this approach may randomly fail depending on the number of headers that are in the folder. More info here https://support.microsoft.com/en-gb/help/830473/command-prompt-cmd-exe-command-line-string-limitation
Here is my solution:
file(GLOB MY_HEADERS myDir/*.h)
foreach(CurrentHeaderFile IN LISTS MY_HEADERS)
    add_custom_command(
        TARGET MyTarget PRE_BUILD
        COMMAND ${CMAKE_COMMAND} -E copy_if_different ${CurrentHeaderFile} ${myDestination}
        COMMENT "Copying header: ${CurrentHeaderFile}")
endforeach()
This works like a charm on macOS. However, if you have another target that depends on MyTarget and needs to use these headers, you may get some compile errors due to not-found includes on Windows. Therefore you may want to prefer the following option, which defines an intermediate target.
function(CopyFile ORIGINAL_TARGET FILE_PATH COPY_OUTPUT_DIRECTORY)
    # Copy to the disk at build time so that when the header file changes, it is detected by the build system.
    set(input ${FILE_PATH})
    get_filename_component(file_name ${FILE_PATH} NAME)
    set(output ${COPY_OUTPUT_DIRECTORY}/${file_name})
    set(copyTarget ${ORIGINAL_TARGET}-${file_name})
    add_custom_target(${copyTarget} DEPENDS ${output})
    add_dependencies(${ORIGINAL_TARGET} ${copyTarget})
    add_custom_command(
        DEPENDS ${input}
        OUTPUT ${output}
        COMMAND ${CMAKE_COMMAND} -E copy_if_different ${input} ${output}
        COMMENT "Copying file to ${output}."
    )
endfunction()

foreach(HeaderFile IN LISTS MY_HEADERS)
    CopyFile(MyTarget ${HeaderFile} ${myDestination})
endforeach()
The downside indeed is that you end up with multiple targets (one per copied file), but they should all end up together (alphabetically) since they start with the same prefix ORIGINAL_TARGET -> "MyTarget".