External files in SAS DI - sas-dis

When I run a job in SAS DI to write records to a .dsv file, the column headers are written across multiple lines instead of a single line: after column 257 the external file wraps the remaining column values onto the next line. How can I resolve this?
I have deleted the external file and re-created it with a new template, and I have also tried increasing the logical record length in the File Parameters of the external file, but the problem persists. I am new to SAS DI and have not been able to find the solution.

Related

Exporting the results of an Anylogic experiment

I have built my model and run the experiment. I cannot seem to find where the data is stored.
I now need to conduct several runs and compare the results. I am using normally distributed repair times, so the results should vary between runs without modifying parameters.
How can I keep the results of each run and then present them all in the same data set?
There are two main options for getting data out of your simulation:
Using the internal AnyLogic database
Using external files like Excel or txt
Step 1: Set up your objects
Internal Database
Create an empty table with the columns you require
External object
Set up either an Excel or text file using the objects provided by AnyLogic in the Connectivity palette.
Step 2: Saving your data
In both cases you need to write your data to the object of your choosing, either as the data is generated or at the end of the simulation run.
Using the internal DB
The best option is to write data using the following command:
insertInto(table_name)
.columns(column_name)
.values(value);
This inserts a new row into the database table that you created. You can save multiple values to multiple columns by passing comma-separated entries to columns and values.
e.g.
insertInto(temperature_output_table)
.columns(scenario_name, time, temperature)
.values("scenario1", 10.5, 102);
External files
2.1) Using Excel
filename.setCellValue(value, sheetName, row, column);
Or, even better, you can write out an entire data set:
excelFile.writeDataSet(dataset, sheetName, row, column);
2.2) Using a text file
fileName.println("value" + "\t" + " value 2");
You can use whatever separator you want: "\t" for tab-separated, "," for comma-separated, and so on.
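To keep several runs in one data set, each written line can carry a run identifier alongside the values. The following is only a minimal sketch in plain Java: the class RunRowFormatter, the method formatRow and the column choice (run, time, value) are hypothetical and not part of AnyLogic. Inside a model, the resulting string would be passed to the text file object's println call shown above.

```java
import java.util.Locale;

public class RunRowFormatter {

    // Builds one delimited line: run number, time and value joined by the separator.
    // Locale.ROOT keeps "." as the decimal mark regardless of the machine's locale.
    public static String formatRow(String separator, int run, double time, double value) {
        return String.join(separator,
                Integer.toString(run),
                String.format(Locale.ROOT, "%.2f", time),
                String.format(Locale.ROOT, "%.2f", value));
    }

    public static void main(String[] args) {
        // Two runs written into the same tab-separated data set
        System.out.println(formatRow("\t", 1, 10.5, 102.0));
        System.out.println(formatRow("\t", 2, 10.5, 98.4));
    }
}
```

With a run column in every line, the output of all runs can be filtered or grouped later in Excel or any other tool.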
Step 3: Finish and export data
Internal Database
At the end of a simulation run, you can simply export the data.
See help here https://anylogic.help/anylogic/connectivity/export-excel.html#exporting-data-to-ms-excel-workbook
P.S. It is possible to automate this with some effort
External file
For Excel, you need to call .writeFile() to finish.
On both objects, you need to call .close() so that they are closed and saved to disk.
FYI
Excel has the option to save on termination.
Read more on using Excel here - https://anylogic.help/anylogic/connectivity/excel-file.html#writing-to-excel-file
And on text file here
https://anylogic.help/anylogic/connectivity/text-file.html#replicated
There is also an example model

Is there any PySpark method to read multiple files with different headers

I have to migrate multiple files (around 2000) in the same folder in Azure Blob Storage. I want to read each file with its own header, as the header is different for every file,
and write it into a destination folder.
Is there any way I can do this in parallel with PySpark?
I am using the code below, but it only picks up the header from the first file, which produces wrong output.
df = spark.read.option("header", "true").parquet("directory/*.parquet")
df.write.option("header", "true").csv("directory")
Please help if you know how I can read all the files, each with its own source header.
Thanks!

Reading file from Google Drive with Talend

I need to read an uploaded file in Google Drive and perform X transformation on it. From what I have read, the only way to do this is to download the file to my local machine with the Talend component and then read it from there.
If that is correct, I cannot figure out what the file name would be, assuming that I don't want to use the exact name of the file.
I found http://meowbi.com/2018/02/23/getting-google-sheet-gdrive-talend/ and it is exactly what I need: read from Google Drive, check the file name, and proceed if the file name is X. What is unclear to me is what they used in tJava.
The output schema of the tGoogleDriveList component's Main row contains a field name that is the file name you're looking for. Using an Iterate row is less straightforward, as you need to extract values from the GlobalMap. In the article you cited, they get the file name via the "tGoogleDriveList_1_TITLE" key of the GlobalMap.
Main row between tGoogleDriveList and tJava
For more details please look into the Talend Reference for Google Drive components. The Listing files and folders in Google Drive section should be particularly relevant to your case.
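As a rough illustration of the GlobalMap lookup described above, the sketch below reads the "tGoogleDriveList_1_TITLE" key and decides whether to proceed. It is an assumption-laden stand-in, not the code from the cited article: the class DriveFileNameCheck, the method shouldProcess, the prefix check and the sample file name are all hypothetical, and a plain HashMap takes the place of the globalMap that Talend provides inside a tJava component.

```java
import java.util.HashMap;
import java.util.Map;

public class DriveFileNameCheck {

    // Returns true when the listed file name starts with the expected prefix.
    // The key name comes from the article; it depends on the component's label in the job.
    public static boolean shouldProcess(Map<String, Object> globalMap, String expectedPrefix) {
        Object title = globalMap.get("tGoogleDriveList_1_TITLE");
        return title != null && title.toString().startsWith(expectedPrefix);
    }

    public static void main(String[] args) {
        // Stand-in for Talend's globalMap; in a real job it is already populated
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("tGoogleDriveList_1_TITLE", "sales_2018.csv"); // hypothetical file name

        System.out.println(shouldProcess(globalMap, "sales_"));
    }
}
```

Matching on a prefix (or a pattern) rather than the full name is one way to avoid hard-coding the exact file name.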

Load multiple multischema delimited file from same directories

Is there a way in Talend to load multiple multi-schema delimited files that are stored in the same directory?
I have tried using the tFileInputMSDelimited component, but I was unable to link it with a tFileList component to loop through the files inside the directory.
Does anyone have an idea how to solve this problem?
To be clearer, each file contains only one batch line but multiple header lines, along with a number of transaction lines, as shown in the sample data below.
The component tFileOutputMSDelimited should suit your needs.
You will need multiple flows going into it.
You can either keep the files and read them or use tHashInput/tHashOutput to get the data directly.
Then you direct all the flows to the tFileOutputMSDelimited (example with tFixedFlowInput, adapt with your flows) :
In it, you can configure which flow is the parent flow containing your ID.
Then you can add the children flows and define the parent and the ID to recognize the rows in the parent flow :

Using Talend Open Studio DI to extract a value from a unique 1st row before continuing to process columns

I have a number of Excel files where there is a line of text (and a blank row) above the header row for the table.
What would be the best way to process the file so I can extract the text from that row AND include it as a column when appending multiple files? Is it possible without having to process each file twice?
Example
This file was created on machine A on 01/02/2013
Task|Quantity|ErrorRate
0102|4550|6 per minute
0103|4004|5 per minute
And end up with the data from multiple similar files
Task|Quantity|ErrorRate|Machine|Date
0102|4550|6 per minute|machine A|01/02/2013
0103|4004|5 per minute|machine A|01/02/2013
0467|1264|2 per minute|machine D|02/02/2013
I put together a small, crude sample of how it can be done. I call it crude because a. it is not dynamic: you can add more files to process, but you need to know how many files there are before building your job, and b. it shows the basic concept but would require more work to suit your needs. For example, in my test files I simply have "MachineA" or "MachineB" in the first line. You will need to parse that data out to obtain the machine name and the date.
Here is how my sample works. Each Excel file is set up as two inputs. For the header, the tFileInput_Excel is configured to read only the first line, while the body tFileInput_Excel is configured to start reading at line 4.
In the tMap they are combined (not joined) into the output schema. This is done for the Machine A and Machine B Excel files, then those tMaps are combined with a tUnite for the final output.
As you can see in the log row the data is combined and includes the header info.
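The parsing step mentioned above (pulling the machine name and date out of the first line) could look roughly like the sketch below. It assumes every first line follows the layout of the example in the question, "This file was created on <machine> on <dd/MM/yyyy>"; the class HeaderLineParser and its parse method are hypothetical names, and in a job the same logic would typically live in a user routine or a tMap expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderLineParser {

    // Assumed layout: "... created on <machine> on <two digits>/<two digits>/<four digits>"
    private static final Pattern HEADER =
            Pattern.compile("created on (.+) on (\\d{2}/\\d{2}/\\d{4})\\s*$");

    // Returns {machine, date}, or null when the line does not match the assumed layout
    public static String[] parse(String firstLine) {
        Matcher m = HEADER.matcher(firstLine);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("This file was created on machine A on 01/02/2013");
        // Prints: machine A|01/02/2013
        System.out.println(parts[0] + "|" + parts[1]);
    }
}
```

The two returned values map directly onto the Machine and Date columns of the desired output.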