Split CSV files using a specific line

I would like to split the following csv into two csvs
StartOrder,1,SupplierName,
Line,2,12345,2,5,5.50,
Line,3,12345,3,6,5.20,
Line,4,12345,3,7,1.99,
EndOrder,5,booked as soon as possible to deliver.
StartOrder,6,SupplierName
Line,7,100015,2,5,5.50,
Line,8,100015,3,6,5.20,
Line,9,100015,3,7,1.99,
EndOrder,10,booked as soon as possible to deliver.
in order to be:
1st file
StartOrder,1,SupplierName,
Line,2,12345,2,5,5.50,
Line,3,12345,3,6,5.20,
Line,4,12345,3,7,1.99,
EndOrder,5,booked as soon as possible to deliver.
2nd file
StartOrder,6,SupplierName
Line,7,100015,2,5,5.50,
Line,8,100015,3,6,5.20,
Line,9,100015,3,7,1.99,
EndOrder,10,booked as soon as possible to deliver.
I have tried using GroupBy, but it is not working as I expected.
Any help?

This is something I would do with regular expressions.
$orders = Get-Content -Path C:\temp\orders.txt
$orders = [string]::Join("`n", $orders) # join into one string, keeping the line breaks
$output = [regex]::Matches($orders, '(?s)(StartOrder,(\d+).*?deliver\.)') # (?s) lets . match newlines
foreach ($c in $output) {
$order = $c.Groups[2].Value # order number, used as the file name
""
$c.Groups[0].Value # content of the order
$c.Groups[0].Value | Out-File "C:\temp\$order.txt" -Force
}
This will create 1.txt and 6.txt, each with the content of its order.
EDIT: The only issue was that it did not keep the line breaks; that is fixed now.
The regex is fairly simple; there is more detail at https://regex101.com/r/J0Xsu7/1
This will give you file 1.txt with
StartOrder,1,SupplierName,
Line,2,12345,2,5,5.50,
Line,3,12345,3,6,5.20,
Line,4,12345,3,7,1.99,
EndOrder,5,booked as soon as possible to deliver.
This will give you file 6.txt with
StartOrder,6,SupplierName
Line,7,100015,2,5,5.50,
Line,8,100015,3,6,5.20,
Line,9,100015,3,7,1.99,
EndOrder,10,booked as soon as possible to deliver.
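To see what the regex captures without touching the file system, here is a minimal sketch that runs the same idea against an in-memory sample. The shortened sample text and the FileName/Content property names are assumptions made for illustration, not part of the original answer.

```powershell
# Sample text held in memory so the sketch does not depend on C:\temp\orders.txt
$orders = @'
StartOrder,1,SupplierName,
Line,2,12345,2,5,5.50,
EndOrder,5,booked as soon as possible to deliver.
StartOrder,6,SupplierName
Line,7,100015,2,5,5.50,
EndOrder,10,booked as soon as possible to deliver.
'@

# (?s) lets . match newlines; group 1 is the order number;
# .*? is lazy, so each match stops at the first "deliver." it reaches
$found = [regex]::Matches($orders, '(?s)StartOrder,(\d+).*?deliver\.')

# One object per order: the file name it would get and the text it would contain
$blocks = foreach ($m in $found) {
    [PSCustomObject]@{
        FileName = $m.Groups[1].Value + '.txt'
        Content  = $m.Groups[0].Value
    }
}
$blocks | Format-List
```

Each object in $blocks could then be piped to Out-File, as in the answer above.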

Related

Create Record using Headers from a .csv

<EDIT: I kind of have it working, but in order to get it to work, my template csv has to have a blank line for every line I am going to be adding to it. So, if I could figure out how to add lines to the imported empty (just a header row) csv file, I could then use export-csv at the end. (It would be somewhat slower, but it would at least work.)>
I am creating a .csv file in PowerShell. The output file has 140 columns. Many of them are null.
I started out just doing
$out = 'S-'+$Snum+',,,,,TRUE,,,,,'+'S-'+$Snum+',"'
$out = $out + '{0:d9}' -f $item.SupplierCode2
until I had filled all the columns with the correct value. But, the system that is reading the output keeps changing the column locations. So, I wanted to take the header row from the template for the system and use that to name the columns. Then, if the columns change location, it won't matter because I will be referring to it by name.
Because there are so many columns, I'm trying to avoid a solution that has me enter all the column names. By using a blank .csv with just the headers, I can just paste that into the csv whenever it changes and I won't have to change my code.
So, I started by reading my csv file in so I can use the headers.
$TempA = Import-Csv -Path $Pathta -Encoding Default
Then I was hoping I could do something like this:
$TempA.'Supplier Key' = "S-$Snum"
$TempA.'Auto Complete' = "TRUE"
$TempA.'Supplier ID' = "S-$Snum"
$tempA.'Supplier Data - Supplier Reference ID' = '{0:d9}' -f $item.SupplierCode2
I would only need to fill in the fields that have values, everything else would be null.
Then I was thinking I could write out this record to a file. My old write looked like this
$writer2.WriteLine($out)
I wanted to write the line from the new csv line instead
$writer2.WriteLine($TempA)
I'd rather use streams if I can because the files are large and using add-Content really slows things down.
I know I need to do something to add a line to $TempA and I would like each loop to start with a new line (with all nulls) because there are times when certain lines only have a small subset of the values populated.
Clearly, I'm not taking the correct approach here. I'd really appreciate any advice anyone can give me.
Thank you.

If you only want to fill in certain fields, and don't mind using Export-Csv you can use the -append and -force switches, and it will put the properties in the right places. For example, if you had the template CSV file with only the column names in it you could do:
$Output = ForEach($item in $allItems){
[PSCustomObject]@{
'Supplier Key' = "S-$Snum"
'Auto Complete' = "TRUE"
'Supplier ID' = "S-$Snum"
'Supplier Data - Supplier Reference ID' = '{0:d9}' -f $item.SupplierCode2
}
}
$Output | Export-Csv -Path $Pathta -Append -Force
That would create objects with only the four properties that you are interested in, and then output them to the CSV in the correct columns, adding commas as needed to create blank values for all other columns.
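A small, self-contained sketch of that behaviour, assuming a hypothetical four-column template written to the temp folder (the column "Notes" and the value S-42 are made up for the demonstration):

```powershell
# Hypothetical template: a CSV that contains only a header row
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'template-demo.csv'
Set-Content -Path $path -Value '"Supplier Key","Notes","Auto Complete","Supplier ID"'

# The object has only three of the four columns, listed in a different order
$row = [PSCustomObject]@{
    'Supplier ID'   = 'S-42'
    'Supplier Key'  = 'S-42'
    'Auto Complete' = 'TRUE'
}

# -Append reuses the header already in the file; -Force accepts the mismatched property set
$row | Export-Csv -Path $path -Append -Force

# Read the file back to check where each value landed, then clean up
$rows = @(Import-Csv -Path $path)
$rows | Format-List
Remove-Item -Path $path
```

Reading the file back with Import-Csv is only there to show the column placement; in the real script the Export-Csv call would be the last step.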

Renaming several files using CSV in Powershell

I have a specific need and I am stuck on it.
Users will put files with various names in a directory, and I need to rename them according to a fixed pattern so that another app can consume them.
Example:
In directory -> Target
file1-dd-mm-yyyy -> file1
file2 -> thisfile2
flie45224 -> file123
So as you can see, the names can contain variable parts: dates, IDs, etc.
The target name will always be the same, but the source file name can differ, because of a date for example.
At first I had 2 files, so I hard-coded the script ("if Test-Path blabla, do this, else exit"), but it now seems I will have 37 different files, all with different names. So I thought about using an Excel sheet (CSV?), but I can't really find any help for this need.
The goal is to have a sheet that acts as a "pattern": if you find a file like the one in 'A1', rename it as in 'A2', etc.
Do you have any ideas?
Thanks in advance :)

I understand you need a CSV with the following columns:
A1: regex/pattern to match
A2: transform rule, which should be dynamic
The trick is to use a scriptblock if you want to use variables like the file name.
Basically, your CSV will be:
A1;A2
"^file1-\d\d-\d\d-\d\d\d\d$";"file1"
"^file2$";"this$($file.Name)"
"^flie*";"file123"
And the code would be:
$myRules = Import-Csv "C:\xxx\test.csv" -Delimiter ";"
$files = gci C:\temp\
foreach ($file in $files) {
foreach ($rule in $myRules) {
if ($file.Name -match $rule.A1) {
Write-host "$($file.Name) is matching pattern ""$($rule.A1)"". Applying rename to ""$($rule.A2)"""
$NewScriptBlock = [scriptblock]::Create("Rename-Item -Path ""$($file.FullName)"" -NewName ""$($rule.A2)""") # path quoted so names with spaces work
$NewScriptBlock.Invoke()
}
}
}
Which gives before:
file1-01-02-0344
file2
flie45224
Output during the execution:
file1-01-02-0344 is matching pattern "^file1-\d\d-\d\d-\d\d\d\d$". Applying rename to "file1"
file2 is matching pattern "^file2$". Applying rename to "this$($file.Name)"
flie45224 is matching pattern "^flie*". Applying rename to "file123"
And after:
file1
thisfile2
file123
Explanations
The first foreach iterates over the files. Then, for each of those files, we check whether one of the rules matches, using -match $rule.A1.
In the first example you can see a regex in use (\d matches a digit). For the other cases I kept it simple, as you didn't clarify the rules, but this will be your homework :)
Then in the transform rules, you can use the filename variable as shown in the second transform rule: this$($file.Name)
NB: it could be a good idea to add a flag to leave the loop for the current file once it has been renamed, to avoid unnecessary checks, and to display a message if the file hasn't matched any pattern.
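The suggestion in the NB could look roughly like the sketch below. It works on plain strings instead of real files, and the two rules and three names are hypothetical, so nothing is renamed; it only shows the "stop at the first match, warn when nothing matches" logic.

```powershell
# Hypothetical rules and file names, held in memory so nothing is renamed on disk
$rules = @(
    [PSCustomObject]@{ A1 = '^file1-\d\d-\d\d-\d{4}$'; A2 = 'file1' }
    [PSCustomObject]@{ A1 = '^flie';                   A2 = 'file123' }
)
$names = 'file1-01-02-0344', 'flie45224', 'unknown.txt'

$plan = foreach ($name in $names) {
    $target = $null
    foreach ($rule in $rules) {
        if ($name -match $rule.A1) {
            $target = $rule.A2
            break   # first matching rule wins; skip the remaining rules for this file
        }
    }
    if ($null -eq $target) {
        Write-Warning "$name did not match any pattern"
    }
    # Emit the decision for this name (NewName stays $null when no rule matched)
    [PSCustomObject]@{ Name = $name; NewName = $target }
}
$plan
```

In the real script, the body of the if block would call Rename-Item before the break.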

Seems a bit odd; I suspect you should be able to use a regex pattern instead, but I can't say without seeing the CSV contents. Otherwise try this. It assumes the CSV has column headers called First and Second, where First matches the base name (no extension) of the file and Second contains the base name you want it changed to.
$csv = Import-Csv -Path c:\filenames.csv
Get-ChildItem -Path c:\temp | %{if($csv.First -contains $_.BaseName){Rename-Item -Path $_.FullName -NewName "$($csv.Second[$csv.First.IndexOf($_.BaseName)]+$_.Extension)"}}
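As a variation on the one-liner, the lookup can also be done through a hashtable built once from the CSV. This is only a sketch: the sample rows and the helper name Get-MappedName are invented for illustration, and the CSV is parsed from text rather than read from c:\filenames.csv.

```powershell
# Hypothetical mapping, parsed from text so no CSV file is needed
$csv = @'
First,Second
report_old,report
data2023,data
'@ | ConvertFrom-Csv

# Build the lookup once: old base name -> new base name
$map = @{}
foreach ($row in $csv) { $map[$row.First] = $row.Second }

# Hypothetical helper: returns the new file name, or the original name if there is no mapping
function Get-MappedName {
    param([string]$FileName, [hashtable]$Map)
    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    $ext  = [System.IO.Path]::GetExtension($FileName)
    if ($Map.ContainsKey($base)) { $Map[$base] + $ext } else { $FileName }
}

Get-MappedName -FileName 'report_old.txt' -Map $map
Get-MappedName -FileName 'other.txt' -Map $map
```

The result of Get-MappedName could be passed to Rename-Item -NewName for each file returned by Get-ChildItem.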

Rename Files with Index(Excel)

Anyone have any ideas on how to rename files by finding an association with an index file?
I have a file/folder structure like the following:
Folder name = "Doe, John EO11-123"
Several files under this folder
The index file(MS Excel) has several columns. It contains the names in 2 columns(First and Last). It also has a column containing the number EO11-123.
What I would like to do is write maybe a script to look at the folder names in a directory, compare/find an associated value in the index file(like that number EO11-123) and then rename all the files under the folder using a 4th column value in the index.
So,
Folder name = "Doe, John EO11-123", index column1 contains same value "EO11-123", use column2 value "111111_000000" and rename all the files under that directory folder to "111111_000000_0", "111111_000000_1", "111111_000000_2" and so on.
This possible with powershell or vbscript?

Ok, I'll answer your questions in your comment first. Importing the data into PowerShell allows you to make an array in powershell that you can match against, or better yet make a HashTable to reference for your renaming purposes. I'll get into that later, but it's way better than trying to have PowerShell talk to Excel and use Excel's search functions because this way it's all in PowerShell and there's no third party application dependencies. As for importing, that script is a function that you can load into your current session, so you run that function and it will automatically take care of the import for you (it opens Excel, then opens the XLS(x) file, saves it as a temp CSV file, closes Excel, imports that CSV file into PowerShell, and then deletes the temp file).
Now, you did not state what your XLS file looks like, so I'm going to assume it's got a header row, and looks something like this:
FirstName | Last Name | Identifier | FileCode
Joe | Shmoe | XA22-573 | JS573
John | Doe | EO11-123 | JD123
If that's not your format, you'll need to either adapt my code, or your file, or both.
So, how do we do this? First, download, save, and if needed unblock the script to Import-XLS. Then we will dot source that file to load the function into the current PowerShell session. Once we have the function we will run it and assign the results to a variable. Then we can make an empty hashtable, and for each record in the imported array create an entry in the hashtable where the 'Identifier' property (in your example above that would be the one that has the value "EO11-123" in it), make that the Key, then make the entire record the value. So, so far we have this:
#Load function into current session
. C:\Path\To\Import-XLS.ps1
$RefArray = Import-XLS C:\Path\To\file.xls
$RefHash = @{}
$RefArray | ForEach{ $RefHash.Add($_.Identifier, $_) }
Now you should be able to reference the identifier to access any of the properties for the associated record such as:
PS C:\> $RefHash['EO11-123'].FileCode
JD123
Now, we just need to extract that name from the folder, and rename all the files in it. Pretty straight forward from here.
Get-ChildItem c:\Path\to\Folders -directory | Where{$_.Name -match "(?<= )(\S+)$"}|
ForEach{
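$Files = @(Get-ChildItem $_.FullName) # @() keeps this an array even when the folder holds a single file
$NewName = $RefHash[$Matches[1]].FileCode # use the captured identifier directly as the hashtable key
For($i = 0;$i -lt $Files.Count;$i++){ # start at 0 so the first file is renamed too
$Files[$i] | Rename-Item -NewName "${NewName}_$i" # braces keep PowerShell from reading the variable name as $NewName_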
}
}
Edit: Ok, let's break down the rename process here. It is a lot of piping here, so I'll try and take it step by step. First off we have Get-ChildItem that gets a list of folders for the path you specify. That part's straight forward enough. Then it pipes to a Where statement, that filters the results checking each one's name to see if it matches the Regular Expression "(?<= )(\S+)$". If you are unfamiliar with how regular expressions work you can see a fairly good breakdown of it at https://regex101.com/r/zW8sW1/1. What that does is matches any folders that have more than one "word" in the name, and captures the last "word". It saves that in the automatic variable $Matches, and since it captured text, that gets assigned to $Matches[1]. Now the code breaks down here because your CSV isn't laid out like I had assumed, and you want the files named differently. We'll have to make some adjustments on the fly.
So, those folder that pass the filter will get piped into a ForEach loop (which I had a typo in previously and had a ( instead of {, that's fixed now). So for each of those folders it starts off by getting a list of files within that folder and assigning them to the variable $Files. It also sets up the $NewName variable, but since you don't have a column in your CSV named 'FileCode' that line won't work for you. It uses the $Matches automatic variable that I mentioned earlier to reference the hashtable that we setup with all of the Identifier codes, and then looks at a property of that specific record to setup the new name to assign to files. Since what you want and what I assumed are different, and your CSV has different properties we'll re-work both the previous Where statement, and this line a little bit. Here's how that bit of the script will now read:
Get-ChildItem c:\Path\to\Folders -directory | Where{$_.Name -match "^(.+?), .*? (\S+)$"}|
ForEach{
$Files = Get-ChildItem $_.FullName
$NewName = $Matches[2] + "_" + $Matches[1]
That now matches the folder name in the Where statement and captures 2 things. The first thing it grabs is everything at the beginning of the name before the comma. Then it skips everything until it gets to the last piece of text at the end of the name and captures everything after the last space. New breakdown on RegEx101: https://regex101.com/r/zW8sW1/2
So you want the ID_LName, which can be gotten from the folder name, there's really no need to even use your CSV file at this point I don't think. We build the new name of the files based off the automatic $Matches variable using the second capture group and the first capture group and putting an underscore between them. Then we just iterate through the files with a For loop basing it off how many files were found. So we start with the first file in the array $Files (record 0), add that to the $NewName with an underscore, and use that to rename the file.
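The naming logic described above can be sketched as a small function that works on names only, so it can be tried without touching any folder. The function name Get-RenamePlan and the sample file names are hypothetical, and keeping the original file extension is an assumption that goes beyond what the question states.

```powershell
# Hypothetical helper: computes the new names for the files of one folder
function Get-RenamePlan {
    param([string]$FolderName, [string[]]$FileNames)

    # Group 1: text before the comma (last name); group 2: last "word" (the ID)
    if ($FolderName -match '^(.+?), .*? (\S+)$') {
        $base = $Matches[2] + '_' + $Matches[1]
        for ($i = 0; $i -lt $FileNames.Count; $i++) {
            # Assumption: keep each file's extension
            $ext = [System.IO.Path]::GetExtension($FileNames[$i])
            [PSCustomObject]@{
                OldName = $FileNames[$i]
                NewName = "${base}_$i$ext"
            }
        }
    }
}

$plan = Get-RenamePlan -FolderName 'Doe, John EO11-123' -FileNames 'a.pdf', 'b.pdf'
$plan
```

Against real folders, the same regex would sit in the Where statement and each NewName would be passed to Rename-Item; adding -WhatIf to Rename-Item on a first run shows the planned renames without performing them.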

handling a CSV with line feed characters in a column in powershell

Currently, I have a system which creates a delimited file like the one below in which I've mocked up the extra line feeds which are within the columns sporadically.
Column1,Column2,Column3,Column4
Text1,Text2[LF],text3[LF],text4[CR][LF]
Text1,Text2[LF][LF],text3,text4[CR][LF]
Text1,Text2,text3[LF][LF],text4[CR][LF]
Text1,Text2,text3[LF],text4[LF][LF][CR][LF]
I've been able to remove the line feeds causing me concern by using Notepad++ using the following REGEX to ignore the valid carriage return/Line feed combinations:
(?<![\r])[\n]
I am unable however to find a solution using powershell, because I think when I get-content for the csv file the line feeds within the text fields are ignored and the value is stored as a separate object in the variable assigned to the get-content action. My question is how can I apply the regex to the csv file using replace if the cmdlet ignores the line feeds when loading the data?
I've also tried the following method below to load the content of my csv which doesn't work either as it just results in one long string, which would be similar to using -join(get-content).
[STRING]$test = [io.file]::ReadAllLines('C:\CONV\DataOutput.csv')
$test.replace("(?<![\r])[\n]","")
$test | out-file .\DataOutput_2.csv

Nearly there, may I suggest just 3 changes:
use ReadAllText(…) instead of ReadAllLines(…)
use -replace … instead of .Replace(…), only then will the first argument be treated as a regex
do something with the replacement result (e.g. assign it back to $test)
Sample code:
[STRING]$test = [io.file]::ReadAllText('C:\CONV\DataOutput.csv')
$test = $test -replace '(?<![\r])[\n]',''
$test | out-file .\DataOutput_2.csv
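The effect of the regex can be checked on a small in-memory string before running it against the real file. This is a minimal sketch with made-up sample text; `r and `n are PowerShell's escapes for carriage return and line feed.

```powershell
# Made-up sample: stray LFs inside the fields, CRLF as the real row terminator
$raw = "Column1,Column2`r`nText1,Text2`n,text3`n`n,text4`r`n"

# The negative lookbehind (?<!\r) removes only the LFs that are NOT preceded by a CR
$clean = $raw -replace '(?<!\r)\n', ''

# Show the result split into rows
$clean -split "`r`n"
```

The CRLF pairs survive because each of their line feeds is preceded by a carriage return, so the lookbehind rejects them.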

Count characters in string then insert delimiter using PowerShell

I have a Linux server that will be generating several files throughout the day that need to be inserted into a database; using PuTTY I can sftp them off to a server running SQL 2008. The problem is the structure of the file itself: it has a string of text whose parts are to be placed in different columns, but bulk insert in SQL tries to put it all into one column instead of six. PowerShell may not be the best method, but I have seen on several sites how it can find and replace or append to the end of the line; can it count and insert?
So the file looks like this: '18240087A +17135555555 3333333333', where 18, 24, 00, 87, A are different columns, then there is a blank space between the A and the +, that is character count 10-19 which is another column, then characters 20-30 are a column, characters 31-36 are a space which is new column and so on. So I want to insert a '|' or a ',' so that sql understands where the columns end. Is this possible for PowerShell to count randomly?
This may not be the way to respond to all who did answer, i apologize in advance. As this is my first PowerShell script, I appreciate the input from each of you. This is an Avaya SIP server that is generating CDR records, which I must pull from the server and insert in to SQL for later reports. The file exported looks like this:
18:47 10/15
18470214A +14434444444 3013777777 CME-SBC HHHH-CM 4 M00 0
At first I just thought to delete the first line and run a script against the output, which I modified from Kieranties post:
$test = Get-Content C:\Share\CDR\testCDR.txt
$pattern = "^(.{2})(.{2})(.{1})(.{2})(.{1})(.{1})\s*(.{15})(.{10})\s*(.{7})\s*(.{7})\s*(.{1})\s*(.{1})(.{1})(.{1})\s*(.*)$"
if($test -match $pattern){
$result = $matches.Values | select -first ($matches.Count-1)
[array]::Reverse($result, 0, $result.Length)
$result = $result -join "|"
$result | Out-File c:\Share\CDR\results1.txt
}
But then i realized I need that first line as it contains the date. I can try to work that out another way though.
I also now see that there are times when the file contains 2 or more lines of CDR info, such as:
18:24 10/15
18240087A +14434444444 3013777777 CME-SBC HRSA-CM 4 M00 0
18240096A +14434444445 3013777778 CME-SBC HRSA-CM 4 M00 0
Whereas the .ps1 file I made does not give the second string, so I tried adding in this:
foreach ($Data in $test)
{
$Data = $Data -split(',')
and it fails to run. How can I do multiple lines (and possibly that first line)? If you know of a tutorial that can help, that's greatly appreciated as well!

PowerShell is a great tool that I love and it can do many things. I see that you are using SQL Server 2008. Depending on the edition of SQL Server you have running on the server, it most likely has SQL Server Integration Services (SSIS), which is an Extract, Transform, and Load (ETL) tool designed to help migrate data in many scenarios, such as yours. The file you describe here sounds like a fixed-width file, which SSIS can easily handle and import, and SQL Server has great ways to automate the loads if this is a recurring need (which it sounds like), including the automation of the sftp task, and even running PowerShell scripts as part of the ETL (I've done that several times).
If your file truly is fixed width and you want to use PowerShell to transform it into a delimited file, the regex approach you have in your answer works well, or there are several approaches using the System.String methods, like .insert() which allows you to insert a delimiter character using a character index in your line (use Get-Content to read the file and create one String object per line, then loop through them using Foreach loop or Foreach-Object and the pipeline). A slightly more difficult approach would be to use the .Substring() method. You could build your new String line using Substring to extract each column and concatenating those values with a delimiter. That's probably a lot for someone new to PowerShell, but one of the best ways to learn and gain proficiency with it is to practice writing the same script multiple ways. You can learn new techniques that may solve other problems you might encounter in the future.
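A minimal sketch of the .Substring() approach, assuming the column widths are known in advance. The function name ConvertTo-Delimited and the widths used in the example call are hypothetical; the real widths have to come from the file's specification.

```powershell
# Hypothetical helper: cuts a fixed-width line into fields and joins them with a delimiter
function ConvertTo-Delimited {
    param(
        [string]$Line,
        [int[]]$Widths,
        [string]$Delimiter = '|'
    )
    $pos = 0
    $fields = foreach ($w in $Widths) {
        if ($pos -ge $Line.Length) { break }            # line is shorter than the layout
        $len = [Math]::Min($w, $Line.Length - $pos)     # do not read past the end of the line
        $Line.Substring($pos, $len).Trim()              # drop the padding spaces
        $pos += $w
    }
    $fields -join $Delimiter
}

# Assumed layout for the sample line: five 2-character columns, then 13 and 10 characters
$result = ConvertTo-Delimited -Line '18240087A +17135555555 3333333333' -Widths 2,2,2,2,2,13,10
$result
```

To process a whole file, each line returned by Get-Content would be passed to the function inside a foreach loop or ForEach-Object.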

This is one way (really ugly IMO; I think it can be done better):
$a = '18240087A +17135555555 3333333333'
$b = @( ($a[0..1] -join ''), ($a[2..3] -join ''), ($a[4..5] -join ''),
($a[6..7] -join ''), ($a[8] -join ''), ($A[10..19] -join ''),
($a[20..30] -join ''), ($a[31..36] -join ''))
$c = $b -join '|'
$c
18|24|00|87|A|+171355555|55 33333333|33
I don't know if this is the right splitting for you, but by changing the values in each [x..y] you can get what best fits your need. Remember that character arrays are 0-based, so the first char is 0 and so on.

I don't quite follow the splitting rules. What kind of software writes the text file anyway? Maybe it can be instructed to change the structure?
That being said, inserting pipes is easy enough with .Insert()
$a= '18240087A +17135555555 3333333333'
$a.Substring(0, $a.IndexOf('+')).Insert(2, '|').insert(5,'|').insert(8, '|').insert(11, '|').insert(13, '|')
# Output: 18|24|00|87|A| (followed by the space that preceded the +)
# Rest of the line:
$a.Substring($a.IndexOf('+')+1)
# Output: 17135555555 3333333333
From there you can proceed to splitting the rest of the row data.

I've improved my answer based on your response (note, it's probably best you update your actual question to include that information!)
The nice thing about Get-Content in Powershell is that it returns the content as an array split on the end of line characters. Couple that with allowing multiple assignment from an array and you end up with some neat code.
The following has a function to process each line based on your modified version of my original answer. It's then wrapped by a function which processes the file.
This reads the given file, setting the first line to $date and the rest of the content to $content. It then creates an output file, adds the date to the output, then loops over the rest of the content, performing the regex check and adding the parsed version of the content if the check is successful.
Function Parse-CDRFileLine {
Param(
[string]$line
)
$pattern = "^(.{2})(.{2})(.{1})(.{2})(.{1})(.{1})\s*(.{15})(.{10})\s*(.{7})\s*(.{7})\s*(.{1})\s*(.{1})(.{1})(.{1})\s*(.*)$"
if($line -match $pattern){
$result = $matches.Values | select -first ($matches.Count-1)
[array]::Reverse($result, 0, $result.Length)
$result = $result -join "|"
$result
}
}
Function Parse-CDRFile{
Param(
[string]$filepath
)
# Read content, setting first line to $date, the rest to $content
$date,$content = Get-Content $filepath
# Create the output file, overwrite if necessary
$outputFile = New-Item "$filepath.out" -ItemType file -Force
# Add the date line
Set-Content $outputFile $date
# Process the rest of the content
$content |
? { -not([string]::IsNullOrEmpty($_)) } |
% { Add-Content $outputFile (Parse-CDRFileLine $_) }
}
Parse-CDRFile "C:\input.txt"
I used your sample input and the result I get is:
18:24 10/15
18|24|0|08|7|A|+14434444444 30|13777777 C|ME-SBC |HRSA-CM|4|M|0|0|0
18|24|0|09|6|A|+14434444445 30|13777778 C|ME-SBC |HRSA-CM|4|M|0|0|0
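The line parser above depends on the order in which $matches.Values happens to enumerate, which is why it needs the Reverse call. As a hedged alternative sketch (not the original answer's method), the capture groups can be read in order from a [regex] match object. The pattern is the same one used above; the function name Convert-CDRLine is hypothetical.

```powershell
# Hypothetical alternative to Parse-CDRFileLine: reads the capture groups in their defined order
function Convert-CDRLine {
    param([string]$Line)
    $pattern = '^(.{2})(.{2})(.{1})(.{2})(.{1})(.{1})\s*(.{15})(.{10})\s*(.{7})\s*(.{7})\s*(.{1})\s*(.{1})(.{1})(.{1})\s*(.*)$'
    $m = [regex]::Match($Line, $pattern)
    if ($m.Success) {
        # Groups[0] is the whole match, so skip it and keep groups 1..15
        $fields = $m.Groups | Select-Object -Skip 1 | ForEach-Object { $_.Value }
        $fields -join '|'
    }
}

$parsed = Convert-CDRLine '18240087A +14434444444 3013777777 CME-SBC HRSA-CM 4 M00 0'
$parsed
```

Because the pattern is unchanged, the fields are cut at the same positions as in the result shown above; only the way the groups are collected differs.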
There are an incredible number of resources out there, but one I particularly suggest is Douglas Finke's PowerShell for Developers. It's short, concise and full of great info that will get you thinking in the right mindset with PowerShell.