I am working on a data sync project between two systems based on a CSV file. In this file you find persons, badges, and profiles. These need to be imported into an access control system. Now I am facing a challenge.
The behavior of the importer is as follows:
The service checks the defined directory for a CSV file and, if one is found, imports the data. If there are multiple files, it fetches them together and imports them as one.
Inside the CSV it is possible that a person has two badges, in which case a second row is created. The importer sees that as a duplicate entry, imports only the first entry from the file, and ignores the rest, but writes them to the log file.
I found out that if the duplicate entries are separated into multiple files and the imports run at different scheduled times, the additional badges are assigned to the person and are not marked as duplicates (one file per badge).
Fixing that in the importer would require a lot of changes, so I am trying to find a workaround: iterate over the CSV, check for duplicates, and, if found, move them into additional files that I can import at a later time, to make sure they get imported.
Does anyone know a function in PowerShell to do so?
example:
Ori-File:
Person1,Badge1,Profile-1
Person1,Badge2,Profile-2
Person1,Badge3,Profile-3
Expected result:
File1:
Person1,Badge1,Profile-1
File2:
Person1,Badge2,Profile-2
File3:
Person1,Badge3,Profile-3
My apologies in advance if this question has already been asked; if so, I could not find it.
So, I have this huge database divided by country, where I need to import each country's database individually and then, in Power Query, append the queries as one.
When I imported the US files, the Power Query automatically generated a Transform File folder with 4 helper queries:
Then I just duplicated the query US - Sales and named it UK - Sales, pointing it to the UK sales folder:
The Transform File folder didn't duplicate, though.
Everything seems to be working just fine right now; however, I'd like to know whether this could be a problem in the near future, because I still have several countries to go. Should I manually import new queries as new connections instead of just duplicating them, or does it just not matter?
Many thanks!
The Transform Files folder group contains the code that is called to transform a list of files; it is reusable code. You can see the Sample File, which serves as the template for the transform actions.
As long as the file arrived at for the Sample File has the same structure as the files you are feeding into the command, you can use any query with any list of files.
One thing you need to make sure of is that the Sample File is not removed from your data source. You may want to create a new dummy file just for that purpose, make sure it won't be deleted, and then point the Sample File query to pull just that file.
The Transform helper queries are special queries: you may edit them, but you cannot delete them and recreate your own manually. They are automatically created by PQ when combining a list of contents and are inherently linked to the parent query.
That said, you cannot replicate them; you must use the Combine function provided by PQ to create the helper queries.
You may, however, avoid duplicating the queries: instead, replicate your steps in the parent query and use a table union to join the lists before combining the contents with the same helper queries.
I have a flow that registers the responses from Forms and saves them to a SharePoint list. When new items are added, it also automatically creates a CSV file in a different location. I am trying to make it work a little differently, so that only the latest item added to the list is saved as a single CSV file. I do not want the full list exported as one CSV file, but each new input as a new file, and so on.
Is this feasible in any way?
It does not have to be done using Power Apps / Power Automate; perhaps a Python or PowerShell script could do it?
Truex, since you want to trigger the CSV file creation on the "list item added" action, this is not feasible with a Python or PowerShell script alone, as scripts cannot subscribe to that trigger. One option is to schedule the script to check for items added to the list since the last run and then generate a CSV for each new item.
Could I know whether there is a method in Talend to load multiple multi-schema delimited files stored in the same directory?
I have tried using the tFileInputMSDelimited component, but I was unable to link it with the tFileList component to loop through the files in the directory.
Does anyone have idea how to solve this problem?
To make it clearer: each file contains only one batch line, but multiple header lines, each followed by a bunch of transaction lines, as shown in the sample data below.
The component tFileOutputMSDelimited should suit your needs.
You will need multiple flows going into it.
You can either keep the files and read them or use tHashInput/tHashOutput to get the data directly.
Then you direct all the flows into the tFileOutputMSDelimited (example with tFixedFlowInput; adapt it to your flows):
In it, you can configure which flow is the parent flow containing your ID.
Then you can add the child flows and define the parent and the ID used to recognize the rows in the parent flow:
We have exported data from Monday.com, and I was wondering whether I would need to write a custom script to import this information, or whether Azure DevOps has a feature that would help me with this import.
What does the data you export contain, and what is its data type? Take work items as an example: if you export work items, save them as a .csv file, and then want to import them into Azure DevOps, you can refer to the following steps:
1. Create a local import.csv file and open it in Visual Studio Code or Excel.
2. The file must contain the Work Item Type and Title fields. You can include other columns as needed.
You can check there for more details.
Attention: your CSV file must contain the ID, Work Item Type, Title, and State fields. If any are missing, you need to edit the file accordingly.
I have a number of Excel files where there is a line of text (and a blank row) above the header row of the table.
What would be the best way to process these files so I can extract the text from that row AND include it as a column when appending multiple files? Is it possible without having to process each file twice?
Example
This file was created on machine A on 01/02/2013
Task|Quantity|ErrorRate
0102|4550|6 per minute
0103|4004|5 per minute
And end up with the data from multiple similar files
Task|Quantity|ErrorRate|Machine|Date
0102|4550|6 per minute|machine A|01/02/2013
0103|4004|5 per minute|machine A|01/02/2013
0467|1264|2 per minute|machine D|02/02/2013
I put together a small, crude sample of how it can be done. I call it crude because (a) it is not dynamic: you can add more files to process, but you need to know how many files in advance of building your job, and (b) it shows the basic concept but would require more work to suit your needs. For example, in my test files I simply have "MachineA" or "MachineB" in the first line. You will need to parse that data to obtain the machine name and the date.
But here is how my sample works. Each Excel file is set up as two inputs: for the header, the tFileInputExcel is configured to read only the first line, while for the body, the tFileInputExcel is configured to start reading at line 4.
In the tMap they are combined (not joined) into the output schema. This is done for the Machine A and Machine B Excel files; then those tMaps are combined with a tUnite for the final output.
As you can see in the log row, the data is combined and includes the header info.