How to load data from files last modified within one day from subfolders in Azure Data Flow - azure-data-factory

I have the following directory structure on an Azure container:
-dwh-prod
  -Main_Folder
    -2021-01
      -file1.parquet
    -2021-02
      -file2.parquet
      -file3.parquet
The data is partitioned by year and month into subfolders, and the data files sit inside those subfolders. I want my data flow to load only the files that were added within one day of the pipeline run.
In the source options, under 'Filter by last modified', I set End Time to currentUTC() and Start Time to one day earlier, AddDays(currentUTC(), -1), but it didn't work.
I also tried using currentTimestamp() instead, but to no avail.
How do I go about solving this?

Your expression is correct. In your dataset, change the folder path from MainFolder to Main_Folder, and set Main_Folder/*/*.parquet as the Wildcard paths in your Source options. Then it will work.

I think your solution is close, but I'm not sure changing the folder name is sufficient. I'm also not familiar with "currentUTC"; in pipeline expressions the corresponding function is utcNow.
Below is an outline of how I would approach this problem.
Source Dataset
Add a Parameter for the subfolder (year-month):
and then set the Folder path to an expression like:
Pipeline
You could either pass in the subfolder or calculate it at runtime. My preference would be to pass it in as a parameter:
I would then add variables to calculate the start and end times. Since you are running this daily, I would force the times to the START of the day(s), which avoids any variation caused by the exact run time. I would also use the built-in getPastTime function:
Now use these objects in your Source configuration:
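To make the start-of-day idea above concrete, here is a minimal Java sketch of the window arithmetic only. It is not ADF expression syntax, and the class and method names are hypothetical. It assumes the window runs from the start of the previous UTC day (inclusive) to the start of the current UTC day (exclusive); adjust the bounds if your schedule needs something different.

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical helper: computes a "yesterday" window pinned to UTC midnight,
// so the result does not depend on the exact time the job happens to run.
final class ModifiedWindowSketch {

    // Start of the previous UTC day relative to the run time.
    static Instant windowStart(Instant runTime) {
        return runTime.atZone(ZoneOffset.UTC).toLocalDate()
                .minusDays(1)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant();
    }

    // Start of the current UTC day relative to the run time.
    static Instant windowEnd(Instant runTime) {
        return runTime.atZone(ZoneOffset.UTC).toLocalDate()
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant();
    }

    // Assumption: start is inclusive and end is exclusive.
    static boolean inWindow(Instant lastModified, Instant runTime) {
        return !lastModified.isBefore(windowStart(runTime))
                && lastModified.isBefore(windowEnd(runTime));
    }

    public static void main(String[] args) {
        Instant runTime = Instant.now();
        System.out.println("start = " + windowStart(runTime));
        System.out.println("end   = " + windowEnd(runTime));
    }
}
```

Because both bounds are truncated to midnight, two runs on the same day select the same set of files regardless of when they start.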

Related

ADF Copy only when a new CSV file is placed in the source and copy to the Container

I want to copy a file from the source to the target container, but only when the source file is new (that is, a newer file has been placed in the source). I am not sure how to proceed, or what syntax to use to check whether the source file is newer than the target. Do I have to use two Get Metadata activities to check the source and target last-modified dates, followed by an If Condition? I tried a few ways, but none of them worked.
Any help would be appreciated.
The syntax I used for the condition gives me an error:
#if(greaterOrEquals(ticks(activity('Get Metadata_File').output.lastModified),activity('Get Metadata_File2')),True,False)
error message
The function 'greaterOrEquals' expects all of its parameters to be either integer or decimal numbers. Found invalid parameter types: 'Object'
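The error occurs because the second argument to greaterOrEquals is the whole activity output object rather than a number; it needs the same treatment as the first argument, i.e. ticks(activity('Get Metadata_File2').output.lastModified). Alternatively, you can try one of the Pipeline Templates that ADF offers.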
Use this template to copy new and changed files only by using
LastModifiedDate. This template first selects the new and changed
files only by their attributes "LastModifiedDate", and then copies
them from the data source store to the data destination store. You can
also go to "Copy Data Tool" to get the pipeline for the same scenario
with more connectors.
OR...
You can use Storage Event Triggers to trigger the pipeline with copy activity to copy when each new file is written to storage.
Follow detailed example here: Create a trigger that runs a pipeline in response to a storage event

Azure Factory v2 Wildcard

I am trying to create a new dataset in ADF that looks for csv files that meet a certain naming convention. These files are located within a series of different folders in my Azure Blob Storage.
For instance, in the sample directory below, I am trying to pull out csv files that contain the word "cars".
Folder A
    fastcars.csv
    fasttrucks.csv
Folder B
    slowcars.csv
    slowtrucks.csv
Ideally, I would end up with the files "slowcars.csv" and "fastcars.csv". I've seen examples out there where people were able to wildcard the file name. I have been playing around with that, but have had no luck. (See image below for one example of what I have been doing).
Is what I am trying to do even possible? Would appreciate any advice you guys may have. Please let me know if I can provide further clarification.
According to the description of filename in this documentation,
The file name under the given fileSystem + folderPath. If you want to
use a wildcard to filter files, skip this setting and specify it in
activity source settings.
So you need to specify the wildcard in the activity's source settings, not in the dataset's file path.
An easy sample in a copy activity:
Hope this can help you.
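As a rough illustration of which names a pattern such as */*cars*.csv would select, here is a small Java sketch using the standard glob matcher. This is only an analogy: it assumes the wildcard in the activity source settings treats * the same way a glob does (any run of characters within one path segment), and the class and method names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: filters relative file paths with a glob pattern.
final class WildcardSketch {

    // Returns the entries of fileNames that match the glob, in input order.
    static List<String> matching(String glob, List<String> fileNames) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> hits = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "Folder A/fastcars.csv", "Folder A/fasttrucks.csv",
                "Folder B/slowcars.csv", "Folder B/slowtrucks.csv");
        // One * for the folder level, then any file name containing "cars".
        System.out.println(matching("*/*cars*.csv", files));
    }
}
```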

How to check generated file has been modified in Eclipse plugin development?

Currently the plugin generates a series of files in an IProject. I need to check whether a generated file has since been modified by the user; if it has, I need to handle regeneration differently.
One idea is to check whether Creation Date == Modified Date. When the user has not touched a file, I would delete the old file and create it again, so that the creation date always equals the modified date. However, I do not see how to retrieve these two properties from IFile. Can anyone help me with this?
I am quite new to Eclipse plugin development; can anyone suggest another way around this?
*** Generated files cannot be locked, as they are source code.
The modification stamp of an IFile or more generally an IResource can be obtained with getModificationStamp(). The return value is not strictly a time stamp but should serve your needs, see the JavaDoc for details.
If, however, you would like to track whether the content of a file was changed I would rather compute a hash of the content, for example with a MessageDigest. You can then compare the two hashes to decide whether the file was changed.
This latter approach would regard a file as unchanged if it was changed - saved - changes reverted - saved again. The modification stamp on the other hand would declare the file as changed even though its content is the same again.
Whichever approach you choose, you can store the modification stamp (or content hash) at generation time by using IResource#setPersistentProperty() and later compare it with the current modification stamp. Persistent properties are stored on disk with the platform metadata and maintained across platform shutdown and restart.
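The content-hash variant could look roughly like the following sketch. It uses only java.security.MessageDigest and works on raw bytes so that it stays self-contained; in a plugin you would presumably read the bytes from the IFile and keep the hex string in a persistent property. The class and method names are hypothetical, and SHA-256 is just one reasonable choice of algorithm.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: detects content changes by comparing SHA-256 hashes.
final class ContentHashSketch {

    // Returns the SHA-256 digest of the content as a lowercase hex string.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // True when the current content no longer matches the hash stored at generation time.
    static boolean contentChanged(byte[] currentContent, String storedHash) {
        return !sha256Hex(currentContent).equals(storedHash);
    }

    public static void main(String[] args) {
        byte[] generated = "class A {}".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        String stored = sha256Hex(generated);
        System.out.println(contentChanged(generated, stored));
    }
}
```

As noted above, a file that was edited and then reverted to its generated content hashes to the same value and is treated as unchanged.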
I found the answer:
import org.eclipse.core.resources.IFile;
import org.eclipse.core.resources.IFileState;
import org.eclipse.core.runtime.CoreException;
import org.eclipse.core.runtime.NullProgressMonitor;

private boolean isModified(IFile existingFile) throws CoreException {
    IFileState[] history = existingFile.getHistory(new NullProgressMonitor());
    return history.length > 0;
}
Local history is maintained by the Eclipse platform, so it survives a restart of Eclipse. If the file has been created but never modified, its history is empty.
You can clear the local history by doing:
existingFile.clearHistory(new NullProgressMonitor());

Jenkins: How can I upload a text file and use it as a parameter

I have a txt file that holds a string, and I want to be able to use this string in one of my scripts. Is there a way to set the content of the file as one of the build properties or parameters, so that I can use it in my scripts the same way I use one of the build environment properties?
For example, ${JOB_NAME} holds the job name; in the same way, I want to access the content of the file, which holds some value.
Is it possible?
You can upload a file from your computer to the workspace through the File parameter of the job.
You can use Extended Choice plugin parameter, to read value(s) from a file and display them in a dropdown/radio-button/checkbox for the user to select, dynamically, every time the build is triggered.
You can use EnvInject plugin to read value(s) from a file and inject them into the build as environment variables, so that they can be used by the rest of the build steps/scripts.
Your question is very unclear about what you are trying to do. Pick one of the three methods above based on what you need, or clarify your question.

Creation Date of Compiled Executable (VC++ 2005)

The creation date of an executable linked in VS2005 is not set to the real creation-date of the .exe file. Only a complete re-build will set the current date, a re-link will not do it. Obviously the file is set to some date, which is taken from one of the project-files.
So: is there a way to force the linker to set the creation-date to the real link-date?
Delete the executable as part of a pre-link event.
Edit:
Hah, I forgot about Explorer resetting the creation date if you name a file exactly the same as a file that was recently deleted.
Why are you keying off the creation date anyway?
A complete rebuild will delete that file forcing the linker to create it, hence the reason it gets a new creation date. You could try disabling incremental linking under project properties (Linker | General). If that doesn't do it you could add a build event to delete the exe file and force it to create a new file each time. Both of these things could increase your build time.
Deleting the executable doesn't do the job. That's the problem. Also, I could not identify any project file whose date and time matched that of the newly linked executable. That leads me to conclude that the 'creation date' is taken from information stored inside some project file.
The project has 400000 lines, so a full build is no option.
What about using something like DirDate (or writing a little utility yourself) to set the creation date, and calling it from the post-build step?