How can I extract a single file from a ZIP archive using Perl's Archive::Zip? - perl

I have a zip file X and I'd like to extract a single file, located at x/x/x/file.txt. How do I do this using Archive::Zip and Perl?

You can use the extractMember method:
extractMember( $memberOrName [, $extractedName ] )
Extract the given member, or match its name and extract it. Returns undef if member doesn't exist in this Zip. If optional second arg is given, use it as the name of the extracted member. Otherwise, the internal filename of the member is used as the name of the extracted file or directory. If you pass $extractedName, it should be in the local file system's format. All necessary directories will be created. Returns AZ_OK on success.
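For example, a minimal sketch (the archive name X.zip is an assumption; the question only calls the file X):

use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );

# Read the archive; Archive::Zip methods report status codes.
my $zip = Archive::Zip->new();
$zip->read('X.zip') == AZ_OK
    or die "Cannot read X.zip\n";

# Extract the member under its internal name (x/x/x/ is created as needed) ...
$zip->extractMember('x/x/x/file.txt') == AZ_OK
    or die "Extraction failed\n";

# ... or give it a different name via the optional second argument.
$zip->extractMember('x/x/x/file.txt', 'file.txt') == AZ_OK
    or die "Extraction failed\n";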

See Archive::Zip::FAQ, "extract file(s) from a Zip". The current version of the example file is online at http://cpansearch.perl.org/src/ADAMK/Archive-Zip-1.30/examples/extract.pl.

Related

Copy Each '.txt' File into respective date folder Based on Date in Filename using data factory

I have to copy files from a source folder to a target folder, both in the same storage account (ADLS). The files in the source folder are in .txt format and have a date embedded in the file name,
eg: RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
and
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
(20221201 and 20221202 are the dates in the file names; date format: yyyymmdd)
I have to create a pipeline that sorts and stores the files in ADLS folders in this hierarchy,
ex: adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
Even if we have multiple files with the same date in the file name, based on that date it has to create a year (YYYY) folder, inside it a month (MM) folder, and inside that a day (DD) folder, as in the example above. Each file should be copied into its respective yyyy/mm/dd folder.
What I have done:
In Get Metadata - gave the argument **childItems** to extract the child items.
A ForEach activity that contains a Copy activity.
In the Copy activity source, the wildcard path is given as *.txt.
For the sink, I took a concat expression using the split and substring functions.
Please check the screenshots of all activities and expressions.
This pipeline is creating the folders based on the date in the file name (like adl/2022/12/01),
but the problem is that it copies all the files into every date (DD) folder
(like adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt)
1.[GET META to extract child items](https://i.stack.imgur.com/GVYgZ.png)
2.[Giving GET META output to FOREACH](https://i.stack.imgur.com/cbo30.png)
3.[Inside FOREACH using COPY ](https://i.stack.imgur.com/U5LK5.png)
4.[Source Data Set](https://i.stack.imgur.com/hyzuC.png)
5.[Sink Data Set](https://i.stack.imgur.com/aiYYm.png) Expression used in the dataset's folder path: @concat('adl','/',dataset().FolderName)
6.[Took parameter for Sink](https://i.stack.imgur.com/QihZR.png)
7.[Sink in copy activity ](https://i.stack.imgur.com/4OzT5.png)
Expression used in the sink for dynamic folders, using the split and substring functions:
@concat(substring(split(item().name,'.')[3],0,4),'/',
substring(split(item().name,'.')[3],4,2),'/',
substring(split(item().name,'.')[3],6,2)
)
**OUTPUT for this pipeline**
adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
**Required Output is**
adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
(i.e. each file should be copied to its respective date folder only; even if we have multiple files for the same date, they should be copied to date folders based on the date in the file name)
I have reproduced the above and got the same result when I followed the steps you have given.
The Copy activity behaved like this because, in the source and sink, you did not give @item().name (the file name for that particular iteration), and you gave *.txt as the wildcard path in the Copy activity source.
That means that for every iteration (for every file name) it copies all .txt files from the source into that iteration's target folder (which is what happened for you).
To avoid this,
Give @item().name as the source wildcard file name.
This way only that iteration's file name goes into the source for the copy.
(OR)
Keep the wildcard file name in the source as it is (*.txt), create a sink dataset parameter for the file name,
and give @item().name to it in the Copy activity sink.
You can do either of the above, or both at the same time. I have checked all 3 scenarios:
1. @item().name in the source wildcard file name.
2. @item().name in the sink dataset file name parameter, keeping the wildcard path the same.
3. Combining both 1 and 2 (@item().name in the wildcard file name and in the sink dataset parameter).
All are working fine and giving the desired result.
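As a sketch of scenario 2, reusing the question's own expressions (the sink dataset parameter names FolderName and FileName are illustrative), the Copy activity sink could pass:
FolderName: @concat(substring(split(item().name,'.')[3],0,4),'/',substring(split(item().name,'.')[3],4,2),'/',substring(split(item().name,'.')[3],6,2))
FileName: @item().name
so that each iteration writes only its own file into its yyyy/MM/dd folder.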

How to fetch file path dynamically using pyspark

I have multiple files in my folder. I want to pattern match to see if any file is present, and if that file is present, store the whole file path in a variable.
How do I achieve this in pyspark?
Since you want to store the whole path in a variable, you can achieve this with a combination of dbutils and regular-expression pattern matching.
We can use dbutils.fs.ls(path) to return the list of files present in a folder (storage account or DBFS). Assign its return value to a variable called files.
# my sample path - a mounted storage account folder.
files = dbutils.fs.ls("/mnt/repro")
Loop through this list. Now, using Python's re.match(), you can check whether the current item's file name matches your pattern. If it matches, append its path to your result list.
from re import match

matched_files = []
for file in files:
    # print(file)  # uncomment to inspect each FileInfo entry
    if match("sample.*csv", file.name):  # "sample.*csv" is the pattern to be matched
        matched_files.append(file.path)
print("Matched files: ", matched_files)

Databricks PySpark environment, find Azure storage account file path of files having same filename pattern

Use Case: In a Databricks PySpark environment, I want to check whether there are multiple files with the same file name pattern in the Azure storage account. If they exist, I expect to get the list of file path locations for each matched file.
I tried using dbutils.fs.ls, but it does not support wildcard patterns.
Workaround: get the paths of all files in the folder, then loop over each file to do file name pattern matching and prepare a list of the required file paths.
Do let me know if there is any other way to get the file paths without looping.
In Databricks, dbutils.fs.ls() doesn't support wildcard paths. The official documentation covers all the Databricks utilities, and there is no dbutils.fs function that accepts wildcard paths for matching file names.
You cannot proceed without using loops. The following operations are done on a storage account with random files as a demo, and show one way to get the files that match your pattern.
Using the os.listdir() function, you can get the list of all files in your container/directory.
import os

path_dbfs = "dbfs:/mnt/omega/"  # absolute dbfs path to your storage

# using os.listdir() to get all files in the container.
path = "/dbfs/mnt/omega"
file_names = os.listdir(path)
print(file_names)
['country_data.csv', 'json_input.json', 'json_input.txt', 'person.csv', 'sample_1.csv', 'sample_2.csv', 'sample_3.csv', 'sample_new_date_4.csv', 'store.txt']
Once you have the list of all files, you can use regular expressions with re.search() and the match object's group() method to check whether each file matches the pattern.
import re

# use regex with a loop to get the absolute paths of pattern-matching files.
file_to_find_pattern = "sample.*csv"  # the pattern to match in this case.
# .* indicates 0 or more occurrences of other characters; build the pattern according to your requirement.
matched_files = []
for file in file_names:
    val = re.search(file_to_find_pattern, file)
    if val is not None:
        matched_files.append(path_dbfs + val.group())
print(matched_files)
['dbfs:/mnt/omega/sample_1.csv', 'dbfs:/mnt/omega/sample_2.csv', 'dbfs:/mnt/omega/sample_3.csv', 'dbfs:/mnt/omega/sample_new_date_4.csv']

Matlab load file in path of script

I have a MATLAB script that needs to load a .mat file from a directory whose location is fixed relative to the script. The script itself could be in different places relative to the current working directory, so the location of the .mat file relative to the working directory is not known. How do I specify the location of the file to load relative to the executing script?
The function mfilename returns the name of the currently running script. By itself it does not return the full path to the script, so specify the 'fullpath' option to get the full path to the actual script, including its name.
You want the directory the file lives in, so first use mfilename to get the full path to the file, then use fileparts to extract the directory. fileparts returns the directory of the file, the file name itself, and the extension; you only want the first output argument and can ignore the others. Once you have the directory, append the relative location of your .mat file to it:
p = mfilename('fullpath');       % full path to the running script, including its name
[pathstr, ~, ~] = fileparts(p);  % keep only the directory part
d = fullfile(pathstr, 'path', 'to', 'your', 'file.mat');
data = load(d);                  % load the .mat file found relative to the script
fullfile builds a path string that is OS independent, so pass each subdirectory on the way to your .mat file as a separate input string until you reach the file you want. d will contain the full path of your .mat file relative to the currently running script, which you can then pass to load, as the last line above shows.

check file existence in progress 4GL

How do I check for the existence of a particular file in code?
E.g.
def var a as character.
a = "abc.p".
run value(a).
---> here I first want to check whether abc.p exists in the workspace or not.
You can use the SEARCH function. Directly from the online manual:
SEARCH function
Searches the directories and libraries defined in the PROPATH environment variable for a file. The SEARCH function returns the full pathname of the file unless it is found in your current working directory. If SEARCH does not find the file, it returns the Unknown value (?).
Syntax
SEARCH ( opsys-file )
opsys-file
A character expression whose value is the name of the file you want to find. The name can include a complete or partial directory path. If opsys-file is a constant string, you must enclose it in quotation marks (" "). The value of opsys-file must be no more than 255 characters long.
Example:
DEFINE VARIABLE cPgm AS CHARACTER NO-UNDO.
cPgm = "test.p".
IF SEARCH(cPgm) <> ? THEN
    RUN VALUE(cPgm).
If you provide a fully qualified pathname, SEARCH checks if the file exists. In this case, SEARCH does not search directories on the PROPATH.
If you do not want to use the PROPATH, you can use the FILE-INFO system handle.
After setting FILE-NAME, you can check FILE-TYPE to see whether the file exists. See also the Progress help for FILE-INFO.
FILE-INFO:FILE-NAME = a.
IF FILE-INFO:FILE-TYPE MATCHES "*F*" THEN
    RUN VALUE(FILE-INFO:FULL-PATHNAME).