How can I merge multiple .stl files and save them as a single .stl file?
I would like to save all of the following .stl files as a single .stl file in ParaView.
Select all the STL sources and run Filters -> Append Datasets. Save the result.
We have a folder structure in data lake like this:
folder1/subfolderA/parquet files
folder1/subfolderB/parquet files
folder1/subfolderC/parquet files
folder1/subfolderD/parquet files
etc.
All the parquet files have the same schema, and all the parquet files have, amongst other fields, a code field, let's call it code_XX.
Now I want the distinct values of code_XX across all parquet files in all folders.
So if the code_XX value 'A345' occurs multiple times in the parquet files in subfolderA and subfolderC, I want it only once.
Output must be a Parquet file with all unique codes.
Is this doable in Azure Data Factory, and how?
If not, can it be done in Databricks?
You can try the data flow below.
Set the source folder path to pick up all parquet files recursively, and choose a column to store the file names ("Column to store file name" in the source options).
Since it seems you only need the file names in the output parquet file, use a select transformation to forward only that column.
Use an expression in a derived column to extract the file names from the path string:
distinct(array(right(fileNames,locate('/',reverse(fileNames))-1)))
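For reference, the right(..., locate('/', reverse(...)) - 1) part of that expression keeps everything after the last '/' in the stored path. The plain-Java equivalent, with a made-up example path, is:
public class FileNameFromPath {
    public static void main(String[] args) {
        String path = "folder1/subfolderA/part-0001.parquet"; // example path
        // Same idea as the expression: keep everything after the last '/'.
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        System.out.println(fileName); // prints part-0001.parquet
    }
}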
If you have access to a SQL database, it can be done with two Copy activities; no data flows needed.
Copy Activity 1 (Parquet to SQL): ingest all files into a staging table.
Copy Activity 2 (SQL to Parquet): use a query like SELECT DISTINCT code_XX FROM <staging table> as the source.
NOTE:
Use Mapping to only extract the column you need.
Use a wildcard file path with the recursive option enabled to copy all files from subfolders. https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory#blob-storage-as-a-source-type
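To answer the last part of the question: yes, this is also a few lines in Databricks. A minimal sketch using Spark's Java API, assuming the files live under an abfss path; the account, container, and output path are placeholders:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DistinctCodes {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("distinct-codes").getOrCreate();

        // The wildcard picks up the parquet files in every subfolder of folder1.
        Dataset<Row> codes = spark.read()
            .parquet("abfss://data@youraccount.dfs.core.windows.net/folder1/*");

        codes.select("code_XX")
             .distinct()
             .coalesce(1) // write a single parquet output file
             .write()
             .mode("overwrite")
             .parquet("abfss://data@youraccount.dfs.core.windows.net/output/unique_codes");
    }
}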
In ADLS Gen2, the TextFiles folder has 3 CSV files. The column names are different in each file.
We need to convert all 3 CSV files to 3 parquet files and put them in the ParquetFiles folder.
I tried to use a Copy activity, and it fails because the column names contain spaces, which parquet doesn't allow.
To remove the spaces, I used a data flow: source -> select (replace spaces with underscores in the column names) -> sink. This worked for a single file, but when I tried it for all 3 files, it merged the 3 files and generated a single file with incorrect data.
How can I solve this, mainly removing the spaces from the column names in all files? What would the other options be?
Pipeline: a ForEach activity (loop over the CSV files in the folder and pass the current iteration item to the data flow as a parameter) -> a Data Flow activity with a source pointing to that folder (parameterize the file name in the source path).
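If Spark/Databricks is available instead, the same per-file logic (rename the columns, write one parquet output per input) is a short job. A minimal sketch with Spark's Java API; the base path and file names are placeholder assumptions:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("csv-to-parquet").getOrCreate();
        String base = "abfss://data@youraccount.dfs.core.windows.net"; // placeholder
        String[] files = {"file1.csv", "file2.csv", "file3.csv"};      // hypothetical names

        for (String name : files) {
            Dataset<Row> df = spark.read().option("header", "true")
                                   .csv(base + "/TextFiles/" + name);
            // Replace the spaces in every column name so the parquet writer accepts them.
            for (String col : df.columns()) {
                df = df.withColumnRenamed(col, col.replace(" ", "_"));
            }
            // One parquet output per input file, so nothing gets merged.
            df.write().mode("overwrite")
              .parquet(base + "/ParquetFiles/" + name.replace(".csv", ""));
        }
    }
}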
I created 2 datasets, one CSV with a wildcard path, the other parquet. I used a Copy activity with the parquet dataset as the sink and the CSV dataset as the source, and set the copy behavior to Merge files.
I want to read properties from my source file and then add those properties to every record in the file itself.
So I'd like to join the element reading the FileProperties to all rows from the data:
Data1 FileProperty1 FileProperty2
Data2 FileProperty1 FileProperty2
In fact, I just want to add the columns from the property dataset to each row from the data. How can I do that? I tried merge and lookup, but I don't have any ID to match with; I just need to append the columns.
Create your file property variables.
Add a Script Task to the control flow:
Add the variables as read/write.
Add the following script (one property shown as an example):
// Read the file's metadata, then copy its size into the package variable.
System.IO.FileInfo fi = new System.IO.FileInfo("[Your File Path]");
Dts.Variables["FileSize"].Value = fi.Length;
Add your data flow and connect it to the Script Task (script first).
Add a Derived Column transformation to the data flow and add additional columns from your variables.
Use your property file as the source to a Pivot transform to turn the rows into a list of columns. Now that you have all the properties in a single row, simply use the pivot output as a source and use a merge transform to merge these new columns with the other file.
More info on how to use Pivot transform
I have three files with the same schema:
A1 (file) received at 12:30:000.00,
A2 (file) received at 12:35:000.00,
A3 (file) received at 12:40:000.00.
Now I want to fetch the latest file, which is A3.
Note: I have used the tFileList component to fetch the files.
Talend Docs for tFileList:
Order by:
By modified date: most recent to least recent or least recent to most recent.
The Talend Knowledge Base has a load of information about components. Also, the components speak mostly for themselves if you examine them a bit.
tFileList --> tFileProperties --> tJavaRow
tFileList to iterate over the files
tFileProperties to get each file's properties
tJavaRow to save the filepath (in a global variable) of the file with the greatest mtime value (see the sketch below)
After that, a tFileInputDelimited using the global variable as the file name.
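A minimal sketch of what the tJavaRow body could look like, assuming the standard tFileProperties schema (mtime as a Long, abs_path as a String; input_row stands for the incoming row name). The maxMtime and latestFilePath global-variable names are made up for the example:
// Keep track of the most recent file seen so far across the iterations.
Long maxSoFar = (Long) globalMap.get("maxMtime");
if (maxSoFar == null || input_row.mtime > maxSoFar) {
    globalMap.put("maxMtime", input_row.mtime);
    globalMap.put("latestFilePath", input_row.abs_path);
}
The tFileInputDelimited after the iteration would then use ((String)globalMap.get("latestFilePath")) as its file name.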
You can create a job with these components:
tFileList -> tFileProperties -> tAggregateRow -> tLogRow (or any output component)
In tFileList provide the Directory Path.
tFileProperties outputs a schema with the file's properties, like basename, modified time, absolute path, etc.
In tFileProperties, pass the global variable for the filepath, i.e. ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
In tAggregateRow, under the Operations section, select the columns to be displayed and use the Max function for the mtime_string column.
I have 5 files in a folder. The file names are in date format, from "2015-09-10.txt" to "2015-09-15.txt".
If I give the starting file name as 2015-09-11.txt and the end file as 2015-09-13.txt, it should read all the files between these two (i.e. the files for the 11th, 12th, and 13th) and load their data into the database. The other files should not be inserted into the database.
My current Talend job is:
tFileList -> tFileInputDelimited -> tMap (processing) -> tMysqlOutput
You can use this file mask in the tFileList:
"2015-09-1[1-3].txt"
If you have something more complicated, generate the file names using tJavaFlex and iterate over them:
tJavaFlex --(iterate)--> tFileInputDelimited --(main)--> tMap --(main)--> tMysqlOutput
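For example, if the date range crosses a month boundary, a character-class mask won't cut it. A minimal sketch of the loop the tJavaFlex Start/Main/End code sections could implement, assuming a Java 8+ runtime; the currentFile global-variable name is made up:
// Generate one file name per day between the start and end dates.
java.time.LocalDate start = java.time.LocalDate.parse("2015-09-11");
java.time.LocalDate end = java.time.LocalDate.parse("2015-09-13");
for (java.time.LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
    // Hand the name to the iterate link, e.g. through a global variable.
    globalMap.put("currentFile", d.toString() + ".txt"); // 2015-09-11.txt, ...
}
The tFileInputDelimited on the iterate link could then read ((String)globalMap.get("currentFile")) as its file name.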