Regular Expressions at folder level in Data Factory - azure-data-factory

When creating a pipeline, I find myself with the need to bring all the files that are in folders called in the following way:
\folder1\folder2_202201\file1.parquet
\folder1\folder2_202204\file1.parquet
Like the folder2_?????? it will be changing all the time, I can't find a way to take a fixed path, since it is dynamic.
Is there any way to make a regular expression which can take the source like: \folder1\folder2_*\file1.parquet so that I can take everything that starts in folder?
thanks
regards
Osky

Related

Data factory: File move

I am working on data factory and was wondering if there are any activities to just "move files" without actually reading them rather than "copy data" (which seems like does a read operation)?
I am trying to move files if any exist from one folder to another and if there are many files, since copy data reads each file, it makes the process slow.
Any suggestion. This is how my current data source looks like and all I want to do is, if there is any csv file exists at the location move it without reading it per say.
So here is a MSFT link I followed to move files.
https://learn.microsoft.com/en-us/azure/data-factory/solution-template-move-files
This tutorial was not very detailed when it comes to explaining everything. Like it assumes that the user needs parameters. I did as it said but my datasets were pointing to exactly where files need to be picked up and land, so I left the parameters empty. Debugging or running the trigger didn't move a file.. solution didn't work.
I had to remove the parameters created in the template to make this work. In case its helpful to some. File move started happening after that.
So lesson learned, empty parameters wont work. If you don't need them remove them.
Also, I watched this tutorial in case its helpful to some one.
https://www.youtube.com/watch?v=u_X_f4z8zoQ

How to know if a Catpart is used in some product or not

I have hundreds of Catia V5 catparts and catproducts in a folder on hard disc. I want to know if a particular catpart is used in some catproduct or not. If it is not used in any product, I want to delete it and clean my hard disc. One way to do it is to open all catproducts one by one and check carefully they contain this model. This is cumbersome process and can lead to serious mistakes. Is there some automatic way to check it? If not, is it possible to write some macro for that purpose?
It is possible with a VBA script. If it's just Catpart file that your looking for in a product, then your script would work as follows
query your folder(s) for all catparts and catproducts.(use 2 dictionaries or arrays, one for each file type each)
Via a loop, Individually open and load each catproduct and essentially walk the tree and compare each child Catpart to your compiled list of catparts. If a match is found, movethe part to a new "white list"(dictionary or array)
Close the catproduct and check the next one.
Then, when all done, your original list(dictionary or array) will be your unused parts.
I'm not sure exactly how your models are built, but you may need to check for additional references/links in your catproducts (additional logic) before doing something like this.

How to build assetBundle without using Selection

I want to write my own editor that would allow me to search the whole Assets folder for object type of X and make an AssetBundle from them.
I went through all the manuals and docs and it seems that they all use the Selection class to build the Assets Bundle.
But what if there are 10 000+ objects scattered in different folders and I don't want to select each manually, yet I still want to track their dependencies. Is there a way to do so?
I have already tried editing Selection.objects, but as much as I understand it's limited to one concrete scene.
Is it possible to skip the whole Selection class and just use BuildPipeline.BuildAssetBundle() method? If yes, how to get the correct list of objects that can be used? Do I need to write my own deep threading loop to get the dependencies?
All you need is a list of objects. That's it. How you get them is irrelevant. You can do that with the Selection class, but if you have other means of finding the assets, that's no problem either.
As for dependencies, keep in mind that your final call will be something like:
BuildPipeline.BuildAssetBundle(main_asset, objects_array, path, BuildAssetBundleOptions.CollectDependencies | BuildAssetBundleOptions.CompleteAssets);
See that BuildAssetBundleOptions.CollectDependencies? It will make sure that all the necessary dependencies for the objects you've specified get included as well.

What is the closest thing MATLAB has to namespaces?

We have a lot of MATLAB code in my lab. The problem is there's really no way to organize it. Since all the functions have to be in the same folder to be called (or you have to add a bunch of folders to MATLAB's path environment variable), it seems that we're doomed have loads of files in the same folder, all in the global namespace. Is there a better way to organize our files and functions? I really wish there were some sort of module system...
MATLAB has a notion of packages which can be nested and include both classes and functions.
Just make a directory somewhere on your path with a + as the first character, like +mypkg. Then, if there is a class or function in that directory, it may be referred to as mypkg.mything. You can also import from a package using import mypkg.mysubpkg.*.
The one main gotcha about moving a bunch of functions into a package is that functions and classes do not automatically import the package they live in. This means that if you have a bunch of functions in different m-files that call each other, you may have to spend a while dropping imports in or qualifying function calls. Don't forget to put imports into subfunctions that call out as well. More info:
http://www.mathworks.com/help/matlab/matlab_oop/scoping-classes-with-packages.html
I don't see the problem with having to add some folder to Matlab's search path. I have modified startup.m so that it recursively looks for directories in my Matlab startup directory, and adds them to the path (it also runs svn update on everything). This way, if I change the directory structure, Matlab is still going to see all the functions the next time I start it.
Otherwise, you can look into object-oriented code, where you store all the methods in a #objectName folder. However, this may lead to a lot of re-writing code that can be avoided by updating the path (there is even a button add with subfolders if you add the folder to the path from the File menu) and doing a bit of moving code.
EDIT
If you would like to organize your code so that some functions are only visible to the functions that call them directly (and if you don't want to re-write in OOP), you put the calling functions in a directory, and within this directory, you create a subdirectory called private. The functions in there will only be visible to the functions in the parent directory. This is very useful if you have to overload some built-in Matlab functions for a subset of your code.
Another way to organize & reuse code is using matlab's object-oriented features. Each Object is customarily in a folder that begins with an "#" and has the file(s) for that class inside. (though the newer syntax does not require this for a class defined in a single file.) Using private folders inside class folders, matlab even supports private class members. Matlab's new class notation is relatively fully-featured, but even the old syntax is useful.
BTW, my startup.m that examines a well-known location that I do my SVN checkouts into, and adds all of the subfolders onto my path automatically.
The package system is probably the best. I use the class system (#ClassName folder), but I actually write objects. If you're not doing that, it's silly just to write a bunch of static methods. One thing that can be helpful is to put all your matlab code into a folder that isn't on the matlab path. Then you can selectively add just the code you need to the path.
So say you have two projects, stored in "c:\matlabcode\foo" and "c"\matlabcode\bar", that both use common code stored in "c:\matlabcode\common," you might have a function "setupPaths.m" like this:
function setupPaths(projectName)
basedir = fullfile('c:', 'matlabcode');
addpath(genpath(fullfile(basedir, projectName)));
switch (projectName)
case {'foo', 'bar'}
addpath(genpath(fullfile(basedir, 'common')));
end
Of course you could extend this. An obvious extension would be to include a text file in each directory saying what other directories should be added to the path to use the functions in that directory.
Another useful thing if you share code is to set up a "user specific/LabMember" directory structure, where you have different lab members save code they are working on. That way you have access to their code if you need it, but don't get clobbered when they write a function with the same name as one of yours.

Xcode - exclude files in a custom configuration - better way?

I'm trying to come up with a way to make it easy to switch out our "mock" data services and our live ones. Basically, we'll have live servers with real web services, but for whatever reason, a developer may want to load data from static files (file urls).
I figured I would solve this problem by creating categories that override the methods that fetch the data, thus leaving original code untouched (it has no concept of the "mock" data). I don't want to litter my code with #ifdef.
I can put an #ifdef at the very beginning of each file that has categories in it, and I can set a custom flag in the configuration settings, but I'd rather just have a way to include or exclude the files depending on the configuration. Is that possible? How do you solve this problem?
See http://lists.apple.com/archives/xcode-users/2009/Jun/msg00153.html
The trick is to define EXCLUDED_SOURCE_FILE_NAMES in the configuration you want to exclude the files from, and set the value of that custom build setting to a list of the file names (or a pattern that matches those, and only those, file names).
I would recommend creating two targets one of which has the mock categories included and another one which does not.
When you want to test, just build the target containing the mock categories. Everything else can remain identical.
If you would like to add a file but do not wont to compile it. Go to (for all your targets) project>build phases>compile source and take out the file that you do not want to compile.