Data factory: File move - azure-data-factory

I am working with Data Factory and was wondering whether there is an activity that just moves files without actually reading them, as opposed to "copy data" (which seems to do a read operation).
I am trying to move any files that exist from one folder to another. When there are many files, the process is slow because copy data reads each one.
Any suggestions? This is what my current data source looks like; all I want is that, if any CSV file exists at the location, it gets moved without being read, per se.

Here is the Microsoft link I followed to move files:
https://learn.microsoft.com/en-us/azure/data-factory/solution-template-move-files
The tutorial does not explain everything in detail. For example, it assumes that the user needs parameters. I followed it, but my datasets already pointed to exactly where the files are picked up and where they land, so I left the parameters empty. Neither debugging nor running the trigger moved a file, so the solution didn't work.
I had to remove the parameters created by the template to make this work. Files started moving after that.
So, lesson learned: empty parameters won't work. If you don't need them, remove them.
I also watched this tutorial, in case it's helpful to someone:
https://www.youtube.com/watch?v=u_X_f4z8zoQ

Related

Regular Expressions at folder level in Data Factory

When creating a pipeline, I need to bring in all the files that are in folders named like this:
\folder1\folder2_202201\file1.parquet
\folder1\folder2_202204\file1.parquet
Because the folder2_?????? part changes all the time, I can't use a fixed path; it is dynamic.
Is there a way to write a regular expression for the source, such as \folder1\folder2_*\file1.parquet, so that I pick up everything in folders whose names start with folder2_?
thanks
regards
Osky
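
To illustrate what the pattern in the question would select: it is a glob-style wildcard rather than a regular expression. The sketch below only demonstrates the matching semantics in Python; it does not call Data Factory, and the helper name select_paths and the third sample path are made up for the example.

```python
from fnmatch import fnmatchcase

def select_paths(paths, pattern):
    """Return the paths that match a glob-style pattern (* and ?).

    Note: fnmatch's * also matches '/', so deeper nesting would match too.
    """
    return [p for p in paths if fnmatchcase(p, pattern)]

# Sample paths; the first two come from the question, the third is hypothetical.
paths = [
    "folder1/folder2_202201/file1.parquet",
    "folder1/folder2_202204/file1.parquet",
    "folder1/other_202204/file1.parquet",
]

matched = select_paths(paths, "folder1/folder2_*/file1.parquet")
print(matched)
```

Only the two folder2_ paths are selected; the other_202204 folder is left out.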

VSCode how to clear workspaceState data globally?

I have implemented a backup per workspace functionality in my extension using workspaceState. Since the data can be sensitive - I'd like to clear all workspaceStates on extension deactivation/uninstall.
The ExtensionContext provides no ability to clear all extension related data across different workspaces with their workspaceStates.
So I've considered saving data in the ExtensionContext globalState, tagging each entry with a workspace id. The problem is that the workspace namespace doesn't provide a way to uniquely identify the current workspace. I thought about hashing the workspace name and path, but both of these can change, and any change would destroy the pointer to the data. This is exactly why I can't just write files to internal folders. The only other solution I have is to write the backup data directly to the workspace, and I'd like to avoid that.
How does VSCode maintain the knowledge of which workspaceState belongs to which workspace? How can I tie data to the workspace but have access from anywhere else in VSCode?

Side note: You should avoid saving sensitive data in general. And if necessary, try to encrypt it.
Anyway:
I don't have the full answer, but I was researching something similar (an extension I use crashes due to now-invalid settings in the workspaceState).
I found the storage for the workspace state in this folder (Windows):
%appdata%\Code\User\workspaceStorage\
In there, you find a lot of folders with hex-based names. Inside those folders, I always found two files named state.vscdb and state.vscdb.backup.
There is usually a third file called workspace.json, which helps you figure out whether you are in the correct workspace. (You'd have to iterate through all the folders, though; maybe there is a way to get the folder name from the extension API?)
If you open the state.vscdb file, you find something that looks to me like a serialized set of objects. It contains some separator characters of unknown function, but you also find full paths and names in there that clearly originate from different modules of VS Code, including the extensions.
I don't need to worry about the other cached data; I'm just going to delete the whole folder to fix my current issue. But I'm pretty sure one could figure out how the file is built and edit out the sensitive data if necessary.
The state.vscdb.backup file looks pretty much like what its name suggests: they probably just make a copy of the other file every few minutes so you have a fallback.
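
The iteration over the storage folders mentioned above could be sketched as follows. This is a minimal sketch, not something the extension API provides: the function name list_workspace_storage is made up, and the content of workspace.json is returned as parsed JSON without assuming which keys it holds.

```python
import json
from pathlib import Path

def list_workspace_storage(storage_root):
    """Map each storage folder name to the parsed content of its workspace.json.

    Folders without a workspace.json are skipped.
    """
    result = {}
    for folder in sorted(Path(storage_root).iterdir()):
        marker = folder / "workspace.json"
        if folder.is_dir() and marker.is_file():
            result[folder.name] = json.loads(marker.read_text(encoding="utf-8"))
    return result
```

On Windows, storage_root would be the %appdata%\Code\User\workspaceStorage folder named above; inspecting the returned dictionaries shows which hex-named folder belongs to which workspace.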

To add to the conversation, there are two SQLite state databases:
<user-data-dir>\User\globalStorage\state.vscdb
<user-data-dir>\User\workspaceStorage\<workspace.id>\state.vscdb
Depending on how VS Code was launched you could have a Single Folder Workspace or a Multi-Folder Workspace that is global or local. Globally, the data lives here:
Linux: $HOME/.config/Code/
OS X: $HOME/Library/Application Support/Code/
Windows: %APPDATA%\Code\
Locally, the data will be in the .vscode folder of the current workspace.
In my situation:
I open a new workspace.
I set it up the way I want it to start every time.
I make copies of the two SQLite databases.
I copy the saved databases back into place before launching VS Code.
This leads to a clean VS Code state.
To see how workspace.id is generated check this link.
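
Since the state files are SQLite databases, they can be inspected with Python's built-in sqlite3 module. The sketch below rests on an assumption not stated above: that the entries live in a table called ItemTable with a key column. If that table is absent, the function returns an empty list. It is safest to run it on a copy of the file while VS Code is closed.

```python
import sqlite3

def list_state_keys(db_path):
    """Return the keys stored in a state.vscdb file, or [] if ItemTable is absent."""
    conn = sqlite3.connect(db_path)
    try:
        tables = {
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        if "ItemTable" not in tables:  # assumed table name, see note above
            return []
        return [
            row[0]
            for row in conn.execute("SELECT key FROM ItemTable ORDER BY key")
        ]
    finally:
        conn.close()
```

Listing the keys is a quick way to see which extensions have stored data in a given workspace before deciding what to delete or back up.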

why is my .gitattribute function not taking out my generated js file

Like many GitHub users, I would benefit from being able to exclude a specific file from the language statistics. I'm generating/bundling a JavaScript file from React files and I want to exclude it from the statistics. Here's my .gitattributes file:
BlueSlide/static/js/homepage_compiled.js linguist-generated=true
I'm having trouble finding many examples of this, but the few that I do find look like this (https://help.github.com/en/articles/customizing-how-changed-files-appear-on-github), so I'm not sure why it's not working. Maybe it just takes time for the statistics to update?

Apparently the =true part breaks it, despite it being in their documentation. As soon as I got rid of it, it worked:
BlueSlide/static/js/homepage_compiled.js linguist-generated

How to know if a Catpart is used in some product or not

I have hundreds of Catia V5 catparts and catproducts in a folder on my hard disk. I want to know whether a particular catpart is used in some catproduct or not. If it is not used in any product, I want to delete it and clean up my hard disk. One way to do this is to open all the catproducts one by one and check carefully whether they contain this model. This is a cumbersome process and can lead to serious mistakes. Is there an automatic way to check it? If not, is it possible to write a macro for that purpose?

It is possible with a VBA script. If it's just catpart files that you're looking for in a product, then your script would work as follows:
Query your folder(s) for all catparts and catproducts (use two dictionaries or arrays, one for each file type).
In a loop, open and load each catproduct individually, walk its tree, and compare each child catpart to your compiled list of catparts. If a match is found, move the part to a new "white list" (dictionary or array).
Close the catproduct and check the next one.
When all are done, your original list (dictionary or array) will contain your unused parts.
I'm not sure exactly how your models are built, but you may need to check for additional references/links in your catproducts (additional logic) before doing something like this.
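
The bookkeeping in the steps above can be sketched independently of CATIA. The Python sketch below (not VBA) shows only the list logic; it assumes the child catparts of each catproduct have already been collected by walking the product tree, which is not shown here. The function name find_unused_parts and the sample file names are hypothetical.

```python
def find_unused_parts(all_parts, product_children):
    """Return the parts that no product references, sorted by name.

    all_parts: iterable of catpart file names found in the folder(s).
    product_children: dict mapping each catproduct name to the catpart
        names found while walking its tree (collected elsewhere).
    """
    remaining = set(all_parts)
    used = set()  # the "white list" from the steps above
    for children in product_children.values():
        for child in children:
            if child in remaining:
                remaining.discard(child)
                used.add(child)
    return sorted(remaining)

# Hypothetical sample data for illustration only.
all_parts = ["bolt.CATPart", "bracket.CATPart", "old_cover.CATPart"]
product_children = {
    "assembly.CATProduct": ["bolt.CATPart", "bracket.CATPart"],
}

unused = find_unused_parts(all_parts, product_children)
print(unused)
```

With this sample data, only old_cover.CATPart remains as a deletion candidate; any additional references or links would still need to be checked before deleting anything.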

AUCTeX: Get list of files for current document

I'm working on a function that should help the user when working with (La)TeX documents. In order to provide some additional information to the user, I need to get a list of all (La)TeX files that belong to the document (read: compiled document) the user is currently working on. AUCTeX/RefTeX already has the facilities in place to define a master file on which all child files depend, and from the looks of it there seems to be some internal list of files that belong to the current document.
However, I can't find the appropriate piece of code or function to access this list. I can't even find the list, to be honest. Maybe someone can point me in the right direction.

You probably want to use (reftex-all-document-files). I can't remember the details of how to use it, but in tex-mode.el I ended up checking (and (fboundp 'reftex-scanning-info-available-p) (reftex-scanning-info-available-p)) before calling that function.
