How to watch content from a subfolder of a parent folder using watchman?

I am developing an image uploader that watches files. The images are enclosed in a folder (a subfolder), and that folder is copied into a parent folder that is being watched.
Example:
~/parent/subfolder1/image1.png
~/parent/subfolder2/image2.png
How do I watch these images using the parent folder argument passed to watchman? Or, how do I include subfolders?
This is what I have now:
export PARENT_FOLDER=~/parent/
export UPLOADER_SCRIPT=~/upload_image.sh
watchman -- trigger $PARENT_FOLDER upload_image -- $UPLOADER_SCRIPT
Inside upload_image.sh there is just an upload call with the image file path as its argument:
python upload.py image1.png
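For context, a minimal sketch of what upload_image.sh could look like, assuming watchman appends the changed file names (relative to the watched root) to the command line, as the classic trigger syntax does:
#!/bin/sh
# Hypothetical upload_image.sh: loop over the file names watchman appends
# and hand each one to the uploader.
for image in "$@"; do
    python upload.py "$image"
done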

Use the extended syntax (http://facebook.github.io/watchman/docs/trigger#extended-syntax) so that you can define richer matching expressions. For example, the following matches any .png file using a glob expression, but you can use any of the expression terms to scope this exactly how you need it:
$ watchman -j <<-EOT
["trigger", "/path/to/parent", {
"name": "upload_image",
"expression": ["match", "**/*.png"],
"command": ["/path/to/upload_image.sh"]
}]
EOT
You may also want to take a look at anyof as a way to combine multiple criteria.
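For instance, a sketch that combines two glob terms with anyof (assuming you also wanted to pick up .jpg files) might look like this:
$ watchman -j <<-EOT
["trigger", "/path/to/parent", {
  "name": "upload_image",
  "expression": ["anyof",
    ["match", "**/*.png"],
    ["match", "**/*.jpg"]
  ],
  "command": ["/path/to/upload_image.sh"]
}]
EOT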

Related

How to fetch file path dynamically using pyspark

I have multiple files in my folder. I want to pattern match to check whether a given file is present, and if it is, store its whole file path in a variable.
How can I achieve this in PySpark?
Since you want to store the whole path in a variable, you can achieve this with a combination of dbutils and regular expression pattern matching.
We can use dbutils.fs.ls(path) to return the list of files present in a folder (storage account or DBFS). Assign its return value to a variable called files.
#my sample path- mounted storage account folder.
files = dbutils.fs.ls("/mnt/repro")
Loop through this list. Now using Python's re.match() you can check if the current item's file name matches your pattern. If it matches, append its path to your result variable (list).
from re import match

matched_files = []
for file in files:
    #print(file)
    if match("sample.*csv", file.name):  # "sample.*csv" is pattern to be matched
        matched_files.append(file.path)

#print("Matched files: ", matched_files)

Databricks PySpark environment, find Azure storage account file path of files having same filename pattern

Use case: in a Databricks PySpark environment, I want to check whether there are multiple files with the same file name pattern in the Azure storage account. If they exist, I expect to get the list of file path locations for each matched file.
I tried using dbutils.fs.ls, but it does not support wildcard patterns.
Workaround: get the paths of all files in the folder, then loop over each file to do filename pattern matching and build a list of the required file paths.
Is there any other way to get the file paths without looping?
In Databricks, dbutils.fs.ls() doesn't support wildcard paths. The official documentation covers all of the Databricks utilities, and there is no dbfs utility function that lets you use wildcard paths for matching file names.
You cannot avoid looping here. The following operations were done against a storage account containing some sample files for the demo; they show one way to get the files that match your pattern.
Using the os.listdir() function, you can get the list of all files in your container/directory.
import os

path_dbfs = "dbfs:/mnt/omega/"  # absolute dbfs path to your storage
# using os.listdir() to get all files in container
path = "/dbfs/mnt/omega"
file_names = os.listdir(path)
print(file_names)
['country_data.csv', 'json_input.json', 'json_input.txt', 'person.csv', 'sample_1.csv', 'sample_2.csv', 'sample_3.csv', 'sample_new_date_4.csv', 'store.txt']
Once you have the list of all files, you can use regular expressions with re.search() and the match object's group() method to check whether each file matches the pattern.
import re

# use regex with a loop to get the absolute paths of pattern-matching files
file_to_find_pattern = "sample.*csv"  # pattern to match in this case
# .* indicates 0 or more occurrences of other characters; build it according to your requirement
matched_files = []
for file in file_names:
    val = re.search(file_to_find_pattern, file)
    if val is not None:
        matched_files.append(path_dbfs + val.group())

print(matched_files)
['dbfs:/mnt/omega/sample_1.csv', 'dbfs:/mnt/omega/sample_2.csv', 'dbfs:/mnt/omega/sample_3.csv', 'dbfs:/mnt/omega/sample_new_date_4.csv']

when statement to exclude specified folders and files from showing an extension command in a context menu

In my VS Code extension, I have some commands I wish to have available in the explorer view, and want their availability to be based on what's selected. Let's say I open a folder in VS Code, and the folder has the following:
/a
/a/1.txt
/a/2.txt
/b
/b/3.txt
/b/4.txt
/c
/c/5.txt
/c/6.txt
I wish for a command to be present when folder "c" or anything within it is selected, but not a or b.
Also, I want the logic to exclude specific folders (and their contents), not include specific folders (e.g., I want the logic to be "the command is available to everything except a or b").
In my extension's activate() function, I have:
vscode.commands.executeCommand(
  'setContext',
  'ext.list',
  [
    'a',
    'b'
  ]
);
And in my package.json, I have:
{
  "command": "samplecommand",
  "when": "resourceFilename in ext.list"
}
When I run this, I find that the command is available when I have folders a or b selected, but not when I have any of the items within them selected (like a/1.txt). I tried using a/*, a/.*, and a/*.*, but I haven't found anything that works.
So, when using "resourceFilename in", does it support regex? Or does it only do exact string matching? How can I get a positive match when either the folder or one of the items within the folder is selected?
OK, next: I want to exclude the command in any case, and when I tried:
{
  "command": "samplecommand",
  "when": "!(resourceFilename in ext.list)"
}
...or:
{
  "command": "samplecommand",
  "when": "resourceFilename !in ext.list"
}
...the command doesn't appear for a, b, or c. Is it possible to test for when the filename is not in the list? It appears this always evaluates to false.
So, TL;DR, I'd like to filter commands in the Explorer view and have them not appear for specific folders and the items within those folders. Is this possible?
First question:
You have
"contributes": {
"menus": {
"explorer/context": [
{
"command": "ext.samplecommand",
"when": "resourceFilename in ext.list"
}
]
}
}
and an 'ext.list' of [ 'a', 'b' ].
When I run this, I find that the command is available when I have folders a or b selected, but not when I have any of the items within them selected (like a/1.txt). I tried using a/*, a/.*, and a/*.*, but I haven't found anything that works.
But resourceFilename resolves to something like a for a folder and myFile.js for a file. So "resourceFilename in ext.list" works as you expect for folders, since resourceFilename resolves to a or b if they are selected. But it won't help for files since resourceFilename does not resolve to the folder name but to a file name.
To get the menu item to show for both folders and their included files you could do this:
"menus": {
"explorer/context": [
{
"when": "resourceFilename in ext.list || !explorerResourceIsFolder && resourceDirname =~ /(\\/|\\\\)(bundle|concat)$/",
"command": "ext.samplecommand"
}
]
}
where (bundle|concat) represents the directory names whose files you want included.
This answers your question about whether regexes are supported in when clauses. Yes, see the key-value when clause operator:
There is a key-value pair match operator for when clauses. The expression key =~ value treats the right-hand side as a regular expression to match against the left-hand side.
The above uses resourceDirname =~ /(\\/|\\\\)(bundle|concat)$/, which checks whether resourceDirname ends in \bundle, /bundle, \concat, or /concat (with lots of escapes before the back or forward slashes).
Now, to get to your final destination: filtering the menu so that commands are not shown for specific folders and files.
I do not think there is a way to negate the resourceFilename in ext.list "operator". I seem to recall a github issue on it but I can't find it now.
Update: the not in operator is being added to vscode v1.70. See https://stackoverflow.com/a/70660631/836330. So that might solve your last negation question in a much cleaner fashion.
But there is another way, since we can use a regex in a when clause as we saw above. It has to be a little trickier, though, because we are trying to exclude certain folders and files. Try this:
"menus": {
"explorer/context": [
{
"when": "explorerResourceIsFolder && resourcePath =~ /.*?(?<!(\\\\|\\\/)(bundle|concat))$/ || !explorerResourceIsFolder && resourceDirname =~ /.*?(?<!(\\\\|\\\/)(bundle|concat))$/",
"command": "ext.samplecommand"
}
]
},
explorerResourceIsFolder && resourcePath =~ /.*?(?<!(\\\\|\\\/)(bundle|concat))$/
When the selected resource is a folder, check its resourcePath, which will be its full file path ending in the folder name.
So resourcePath =~ /.*?(?<!(\\\\|\\\/)(bundle|concat))$/ uses a negative lookbehind to match folder paths that do not end with a folder named bundle or concat in this example (your a and b).
!explorerResourceIsFolder && resourceDirname =~ /.*?(?<!(\\\\|\\\/)(bundle|concat))$/
When the selected resource is a file and not a folder, check its resourceDirname, which will be the full path of its containing folder, ending in that folder's name.
So, like above, this uses a negative lookbehind to match only files whose dirname does not end in bundle or concat.
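If you want to sanity-check the lookbehind before wiring it into package.json, the pattern (with the JSON escaping removed) can be tried in any regex engine that supports lookbehinds. Here is a rough sketch using Python's re module (VS Code evaluates when-clause regexes with JavaScript semantics, so treat this only as an approximation):
import re

# The when-clause pattern with the JSON escaping removed: match paths that
# do NOT end in /bundle, \bundle, /concat or \concat.
pattern = r".*?(?<!(\\|/)(bundle|concat))$"

paths = [
    r"/home/me/project/bundle",  # excluded: ends in /bundle
    r"C:\work\project\concat",   # excluded: ends in \concat
    r"/home/me/project/src",     # included
]
for p in paths:
    print(p, "->", "show command" if re.search(pattern, p) else "hide command")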
So, in summary: since the resourceFilename in ext.list type of construct cannot be negated, you have to use the key =~ value type of when clause, be sure to compare it to the right resource name, and "negate" it by using a regex negative lookaround of some sort.
To figure this out, the Developer: Inspect Context Keys command is invaluable. It allows you to see what context keys are available for explorer folders and files, and those keys' values. That is how I found that resourcePath was best for folders and resourceDirname for files, since in both cases the full paths ended with the folder name.
See Inspect Context Keys utility documentation.

Instagram media.json: how do I add its metadata to exif?

I got my photos out of Instagram as a zip file containing all the photos I have there, but they don't have any exif data on them.
The zip file also contains a JSON file called media.json where all this important metadata is. So is there any way to get this metadata into the photos' exif?
Exiftool can import things from files into exif, but first I need to know what format the metadata file has to be in.
This is an example of what the Instagram media.json file content looks like:
{
  "photos": [
    {
      "caption": "#nautitaan #kesä2019",
      "taken_at": "2019-06-08T03:30:25",
      "location": "Jokioinen",
      "path": "photos/201906/b65bbda42ba74424a9d7be0c5163f78d.jpg"
    },
    {
      "caption": "#lupanauttia #kesä2019",
      "taken_at": "2019-06-07T07:42:38",
      "location": "Jokioinen",
      "path": "photos/201906/29fb24838136a1e80439ad7dcae00b4f.jpg"
    }
  ]
}
I only need the taken_at entries; everything else is just a bonus.
Exiftool can read JSON files. If you run the command exiftool -g1 -a -s on your example JSON file, you will get a list of tag names you can use to copy into your image file. Using your example, the result would be
[JSON] PhotosCaption : #nautitaan #kesä2019, #lupanauttia #kesä2019
[JSON] PhotosLocation : Jokioinen, Jokioinen
[JSON] PhotosPath : photos/201906/b65bbda42ba74424a9d7be0c5163f78d.jpg, photos/201906/29fb24838136a1e80439ad7dcae00b4f.jpg
[JSON] PhotosTaken_at : 2019-06-08T03:30:25, 2019-06-07T07:42:38
The problem now is that there are multiple items for each tag name. Exiftool is very flexible about how it reads numbers as time stamps (see exiftool FAQ 5), so if the first entry is the correct one, you can simply use
exiftool -TagsFromFile FILE.Json "-DateTimeOriginal<PhotosTaken_at" FILE.jpg
If you want to use the second entry, you can use the -listitem option.
exiftool -listitem 1 -TagsFromFile FILE.Json "-DateTimeOriginal<PhotosTaken_at" FILE.jpg
Note that the list index starts at 0, so to get the second item, you would index #1.
To bulk copy, assuming that the base filename of the json file is the same as the image file and in the same directory, you could use this command
exiftool -TagsFromFile %d%f.Json "-DateTimeOriginal<PhotosTaken_at" /path/to/image/files/
This command creates backup files. Add -overwrite_original to suppress the creation of backup files. Add -r to recurse into subdirectories. If this command is run under Unix/Mac, reverse any double/single quotes to avoid bash interpretation.
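For example, on Unix/Mac the bulk command with those options added might look like this (single quotes replace the double quotes so the shell does not interpret the < redirection):
exiftool -r -overwrite_original -TagsFromFile %d%f.Json '-DateTimeOriginal<PhotosTaken_at' /path/to/image/files/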

Replace matches of one regex expression with matches from another, across two files

I am currently helping a friend reorganise several hundred images on a database-driven website. I have generated a list of the new, reorganised image paths offline, and I would like to replace each matching image reference in the SQL export of the database with the new path.
EDIT: Here is an example of what I am trying to achieve
The new_paths_list.txt is a file that I generated using a batch script after I had organised all of the existing images into folders. Prior to this all of the images were in just a few folders. A sample of this generated list might be:
image/data/product_photos/telephones/snom/snom_xyz.jpg
image/data/product_photos/telephones/gigaset/giga_xyz.jpg
A sample of my_exported_db.sql (the database exported from the website) might be:
...
,(110,32,'data/phones/snom_xyz.jpg',3),(213,50,'data/telephones/giga_xyz.jpg',0),
...
The result I want is my_exported_db.sql to be:
...
,(110,32,'data/product_photos/telephones/snom/snom_xyz.jpg',3),(213,50,'data/product_photos/telephones/gigaset/giga_xyz.jpg',0),
...
Some pseudo code to illustrate:
1/ Find the first image name in my_exported_db.sql, such as 'snom_xyz.jpg'.
2/ Find the same image name in new_paths_list.txt
3/ If it is present, copy the whole line (the path and filename)
4/ Replace the whole path of this image in my_exported_db.sql with the copied line
5/ Repeat for all other image names in my_exported_db.sql
A regex expression that appears to match image names is:
([^)''"/])+\.(?:jpg|jpeg|gif|png)
and one to match image names, complete with path (for relative or absolute) is:
\bdata[^)''"\s]+\.(?:jpg|jpeg|gif|png)
I have looked around and have seen that Sed or Awk may be capable of doing this, but some pointers would be greatly appreciated. I understand that this will only work accurately if there are no duplicated filenames.
You can use sed to convert new_paths_list.txt into a set of sed replacement commands:
sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt > rules.sed
The file rules.sed will look like this:
s#data/snom_xyz.jpg#image/data/product_photos/telephones/snom/snom_xyz.jpg#
s#data/giga_xyz.jpg#image/data/product_photos/telephones/gigaset/giga_xyz.jpg#
Then use sed again to translate my_exported_db.sql:
sed -i -f rules.sed my_exported_db.sql
I think in some shells it's possible to combine these steps and do without rules.sed:
sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt | sed -i -f - my_exported_db.sql
but I'm not certain about that.
EDIT:
If the images are in several directories under data/, make this change:
sed "s|image/\(.*\(/[^/]*$\)\)|s#[^']*\2#\1#|" new_paths_list.txt > rules.sed