Editing a specific column of data in a text file with PowerShell

So I’ve had a request to edit a CSV file by replacing column values with a set of unique numbers. Below is a sample of the original input file, with a header line followed by a couple of rows. Note that the rows have NO column headers.
HH ### SLH304 01100001 2 20200606 20200605 011100
DD|GZFD|IE|USD|20200605|EUR200717||
DD|GZFD|IE|USD|20200605|EUR200717||
What I’m looking to do is replace the values in, say, column 3 with a unique number.
So far I have the following …
$i=0
$txtin = Get-Content "C:\Temp\InFile.txt" | ForEach {"$($_.split('|'))"-replace $_[2],$i++} |Out-File C:\Temp\csvout.txt
… but this isn’t working as it removes the delimiter and adds numbers in the wrong places …
HH0###0000000SLH30400110000100000002000000202006060202006050011100
1D1D1 1G1Z1F1D1 1I1E1 1U1S1D1 12101210101610151 1E1U1R1210101711171 1 1
2D2D2 2G2Z2F2D2 2I2E2 2U2S2D2 22202220202620252 2E2U2R2220202721272 2 2
Ideally I want it to look like this, whereby the values of 'IE' have been replaced by '01' and '02' in each row ...
HH ### SLH304 01100001 2 20200606 20200605 011100
DD|GZFD|01|USD|20200605|EUR200717||
DD|GZFD|02|USD|20200605|EUR200717||
Any ideas on how to resolve would be much appreciated.

I think spreading this out into multiline code will make it easier:
$txtin = Get-Content 'C:\Temp\InFile.txt'
# loop through the lines, skipping the first line
for ($i = 1; $i -lt $txtin.Count; $i++) {
    $parts = $txtin[$i].Split('|') # or use regex -split '\|'
    if ($parts.Count -ge 3) {
        $parts[2] = '{0:00}' -f $i # update the 3rd column
        $txtin[$i] = $parts -join '|' # rejoin the parts with '|'
    }
}
$txtin | Out-File -FilePath 'C:\Temp\csvout.txt'
Output will be:
HH ### SLH304 01100001 2 20200606 20200605 011100
DD|GZFD|01|USD|20200605|EUR200717||
DD|GZFD|02|USD|20200605|EUR200717||
Updated to use the more robust check suggested by mklement0. This avoids errors when a line does not have at least three parts after the split.
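For completeness, the same transformation could also be done as a single pipeline that streams the file instead of loading it into memory first (a sketch; the counter variable lives in the enclosing scope):
$i = 0
Get-Content 'C:\Temp\InFile.txt' | ForEach-Object {
    if ($i -gt 0 -and ($parts = $_.Split('|')).Count -ge 3) {
        $parts[2] = '{0:00}' -f $i # number the 3rd column per data line
        $parts -join '|'
    } else {
        $_ # header (or malformed) line passes through unchanged
    }
    $i++
} | Out-File 'C:\Temp\csvout.txt'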

Related

How can I loop through each record of a text file to replace a string of characters

I have a large .txt file containing records in which a date string in each record needs to be incremented by 2 days; the new date then replaces the field to the right of it, which contains dashes (--------). For example, a record contains the following data:
1440149049845_20191121000000 11/22/2019 -------- 0.000 0.013
I am replacing the -------- dashes with 11/24/2019 (2 days added to the date 11/22/2019) so that it shows as:
1440149049845_20191121000000 11/22/2019 11/24/2019 0.000 0.013
I have the replace working on a single record but need to loop through the entire .txt file to update all of the records. Here is what I tried:
$inputRecords = get-content '\\10.12.7.13\vipsvr\Rancho\MRDF_Report\_Report.txt'
foreach ($line in $inputRecords)
{
$item -match '\d{2}/\d{2}/\d{4}'
$inputRecords -replace '-{2,}',([datetime]$matches.0).adddays(2).tostring('MM/dd/yyyy') -replace '\b0\.000\b','0.412'
}
I get a PS error stating: "Cannot convert null to type "System.DateTime"
I'm sorry but why are we using RegEx for something this simple?
I could see it if there were differently formatted lines in the file (you'd want to make sure you aren't manipulating unintended lines), but that's not indicated in the question. Even then, it doesn't seem like you need to match anything within the line itself. The file appears to be delimited on spaces, which makes a simple split a lot easier.
Example:
$File = "C:\temp\Test.txt"
$Output =
ForEach( $Line in Get-Content $File)
{
$TmpArray = $Line.Split(' ')
$TmpArray[2] = (Get-Date $TmpArray[1]).AddDays(2).ToString('M/dd/yyyy')
$TmpArray -join ' '
}
For the 3rd element in the array, do the calculation and reassign the value...
Notice there's no use of the += operator, which is very slow compared to simply assigning the output to a variable. I wouldn't make a thing of it, but considering we don't know how big the file is... Also, the string format given before, 'mm/dd/yyyy', will result in 00 for the month (for example '00/22/2019'), so I changed that to 'M/dd/yyyy'.
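A quick way to see the difference between those format specifiers (a sketch; assumes a US-style date culture when parsing):
(Get-Date '11/22/2019').ToString('mm/dd/yyyy') # 00/22/2019 - 'mm' is minutes, not month
(Get-Date '11/22/2019').ToString('MM/dd/yyyy') # 11/22/2019 - 'MM' is the zero-padded month
(Get-Date '11/22/2019').ToString('M/dd/yyyy')  # 11/22/2019 - 'M' is the month without padding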
You can still add logic to skip unnecessary lines if it's needed...
You can send $Output to a file with something like $Output | Out-File <FilePath>
Or this can be converted to a single pipeline that outputs directly to a file using | ForEach{...} instead of ForEach(.. in ..) If the file is truly huge and holding $Output in memory is an issue this is a good alternative.
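For reference, that pipeline variant would look something like this (a sketch; the output path is made up):
Get-Content $File | ForEach{
    $TmpArray = $_.Split(' ')
    $TmpArray[2] = (Get-Date $TmpArray[1]).AddDays(2).ToString('M/dd/yyyy')
    $TmpArray -join ' '
} | Out-File 'C:\temp\Test_out.txt' # hypothetical output path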
Let me know if that helps.
You mostly had the right idea, but here are a few suggested changes (not exactly in this order):
Use a new file instead of trying to replace the old file.
Iterate a line at a time, replace the ------, write to the new file.
Use '-match' instead of '-replace', because, as you will see below, you need to manipulate the capture more than a simple '-replace' allows.
Use [datetime]::parseexact instead of trying to just force cast the captured text.
[string[]]$inputRecords = Get-Content ".\linesource.txt"
[string]$outputRecords = ""
[regex]$logPattern = "^([\d_]+) ([\d/]+) (-+) (.*)$"
foreach ($line in $inputRecords) {
    [string]$newLine = ""
    if ($line -match $logPattern) {
        # note: 'MM' is the month specifier; 'mm' would be minutes
        $origDate = [datetime]::ParseExact($Matches[2], 'MM/dd/yyyy', $null)
        $replacementDate = $origDate.AddDays(2)
        $newLine = $Matches[1]
        $newLine += " " + $origDate.ToString('MM/dd/yyyy')
        $newLine += " " + $replacementDate.ToString('MM/dd/yyyy')
        $newLine += " " + $Matches[4]
    } else {
        $newLine = $line
    }
    $outputRecords += "$newLine`n"
}
$outputRecords
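To follow the first suggestion and actually write the result to a new file rather than just printing it, something like this would do (a sketch; the output file name is made up):
$outputRecords | Set-Content '.\linesource_updated.txt' # hypothetical output file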
Even if you don't use the whole solution, hopefully at least parts of it will be helpful to you.
Using the suggested code from adamt8 and Steven, I added 2 echo statements to show what gets displayed in the variables $logpattern and $line, since it is not recognizing the pattern of characters to be updated. This is what the echo displays:
Options MatchTimeout RightToLeft
CalNOD01 1440151020208_20191205000000 12/06/2019 12/10/2019
None -00:00:00.0010000 False
CalNOD01 1440151020314_20191205000000 12/06/2019 --------
None -00:00:00.0010000 False
This is the rendered output:
CalNOD01 1440151020208_20191205000000 12/06/2019 12/10/2019
CalNOD01 1440151020314_20191205000000 12/06/2019 --------
The code that was used was posted as a screenshot (not reproduced here).

Remove particular characters from lines and concatenate them

I have a problem where I need to cut specific characters from a line and then concatenate the line with the next lines, separated by commas.
Consider a text file abc.txt from which I need the last 3 lines. The last 3 lines are in this format:
11/7/2000 17:22:54 - Hello world.
19/7/2002 8:23:54 - Welcome to the new technology.
24/7/2000 9:00:13 - Eco earth
I need to remove the starting time stamp from each line and then concatenate the lines as
Hello world.,Welcome to the new technology,Eco earth.
The time stamp is not static, and I want to make use of a regex.
I tried the following:
$Words = (Get-Content -Path .\abc.txt|Select-Object -last 3|Out-String)
$Words = $Words -split('-')
$regex = "[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}):[0-9]{1,3}"
The output I get looks like this:
11/7/2000 17:22:54
Hello world
19/7/2002 8:23:54
Welcome to the new technology.
24/7/2000 9:00:13
Eco earth
There is no need to create a Regex that tries to figure out the timestamp part, because you want to skip that anyway.
This should work:
# read the file and get the last three lines as string array
$txt = Get-Content -Path 'D:\abc.txt' -Tail 3
# loop through the array and change the lines as you go
for ($i = 0; $i -lt $txt.Count; $i++) {
    $txt[$i] = ($txt[$i] -split '-', 2)[-1].Trim()
}
# finally, join the array with commas
$txt -join ','
Output:
Hello world.,Welcome to the new technology.,Eco earth
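For completeness, the same logic can be compressed into a single pipeline (a sketch):
(Get-Content -Path 'D:\abc.txt' -Tail 3 | ForEach-Object { ($_ -split '-', 2)[-1].Trim() }) -join ','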
try this:
Get-Content "C:\temp\example.txt" | %{
$array=$_ -split "-", 2
$array[1].Trim()
}
When you have, for example, "DATE - blablabla" and you do .Split("-") on it, you get:
Date
blablabla
What you can do is $string.Split("-")[Which_Line], like so:
$string="12/15/18 08:05:10 - Hello World."
$string=$string.Split("-")[1]
Returns: Hello world. (with spaces before)
Now you can apply the Trim() function to the string; it removes spaces before and after your string:
$string=$string.Trim()
Gives you Hello world.
For your case, if it's static usage (always 3 lines):
$Words = Get-Content -Path .\abc.txt | Select-Object -Last 3
$end = $Words[0].Split("-")[1].Trim() + "," + $Words[1].Split("-")[1].Trim() + "," + $Words[2].Split("-")[1].Trim()

Use Get-Content or Import-Csv to read the 1st column of the 2nd line in a CSV

So I have a csv file which is 25MB.
I only need to get the value stored in the 2nd line, first column, and use it later in the PowerShell script.
e.g data
File_name,INVNUM,ID,XXX....850 columns
ABCD,123,090,xxxx.....850 columns
ABCD,120,091,xxxx.....850 columns
xxxxxx5000+ rows
So my first column data is always the same, and I just need to get this filename from the first column, 2nd row.
Should I try to use Get-Content or Import-Csv for this use case?
Thanks,
Mickey
TessellatingHeckler's helpful answer contains a pragmatic, easy-to-understand solution that is most likely fast enough in practice; the same goes for Robert Cotterman's helpful answer which is concise (and also faster).
If performance is really paramount, you can try the following, which uses the .NET framework directly to read the lines - but given that you only need to read 2 lines, it's probably not worth it:
$inputFile = "$PWD/some.csv" # be sure to specify a *full* path
$isFirstLine=$true
$fname = foreach ($line in [IO.File]::ReadLines($inputFile)) {
    if ($isFirstLine) { $isFirstLine = $false; continue } # skip header line
    $line -replace '^([^,]*),.*', '$1' # extract 1st field from 2nd line and exit
    break # exit
}
Note: A conceptually simpler way to extract the 1st field is to use ($line -split ',')[0], but with a large number of columns the above -replace-based approach is measurably faster.
Update: TessellatingHeckler offers 2 ways to speed up the above:
Use of $line.Substring(0, $line.IndexOf(',')) in lieu of $line -replace '^([^,]*),.*', '$1' in order to avoid relatively costly regex processing.
To lesser gain, use of a [System.IO.StreamReader] instance's .ReadLine() method twice in a row rather than [IO.File]::ReadLines() in a loop.
Here's a performance comparison of the approaches across all answers on this page (as of this writing).
To run it yourself, you must download functions New-CsvSampleData and Time-Command first.
For more representative results, the timings are averaged across 1,000 runs:
# Create a sample CSV file with 850 columns and 100 rows.
$testFileName = "test-$PID.csv"
New-CsvSampleData -Columns 850 -Count 100 | Set-Content $testFileName
# Compare the execution speed of the various approaches:
Time-Command -Count 1000 {
    # Import-Csv
    Import-Csv -LiteralPath $testFileName |
        Select-Object -Skip 1 -First 1 -ExpandProperty 'col1'
}, {
    # ReadLines(), -replace
    $inputFile = $PWD.ProviderPath + "/$testFileName"
    $isFirstLine = $true
    foreach ($line in [IO.File]::ReadLines($inputFile)) {
        if ($isFirstLine) { $isFirstLine = $false; continue } # skip header line
        $line -replace '^([^,]*),.*', '$1' # extract 1st field from 2nd line and exit
        break # exit
    }
}, {
    # ReadLines(), .Substring / IndexOf
    $inputFile = $PWD.ProviderPath + "/$testFileName"
    $isFirstLine = $true
    foreach ($line in [IO.File]::ReadLines($inputFile)) {
        if ($isFirstLine) { $isFirstLine = $false; continue } # skip header line
        $line.Substring(0, $line.IndexOf(',')) # extract 1st field from 2nd line and exit
        break # exit
    }
}, {
    # ReadLine() x 2, .Substring / IndexOf
    $inputFile = $PWD.ProviderPath + "/$testFileName"
    $f = [System.IO.StreamReader]::new($inputFile, $true)
    $null = $f.ReadLine(); $line = $f.ReadLine()
    $line.Substring(0, $line.IndexOf(','))
    $f.Close()
}, {
    # Get-Content -Head, .Split()
    ((Get-Content $testFileName -Head 2)[1]).split(',')[1]
} |
    Format-Table Factor, Timespan, Command
Remove-Item $testFileName
Sample output from a single-core Windows 10 VM running Windows PowerShell v5.1 / PowerShell Core 6.1.0-preview.4 on a recent-model MacBook Pro:
Windows PowerShell v5.1:
Factor TimeSpan Command
------ -------- -------
1.00 00:00:00.0001922 # ReadLine() x 2, .Substring / IndexOf...
1.04 00:00:00.0002004 # ReadLines(), .Substring / IndexOf...
1.57 00:00:00.0003024 # ReadLines(), -replace...
3.25 00:00:00.0006245 # Get-Content -Head, .Split()...
25.83 00:00:00.0049661 # Import-Csv...
PowerShell Core 6.1.0-preview.4:
Factor TimeSpan Command
------ -------- -------
1.00 00:00:00.0001858 # ReadLine() x 2, .Substring / IndexOf...
1.03 00:00:00.0001911 # ReadLines(), .Substring / IndexOf...
1.60 00:00:00.0002977 # ReadLines(), -replace...
3.30 00:00:00.0006132 # Get-Content -Head, .Split()...
27.54 00:00:00.0051174 # Import-Csv...
Conclusions:
Calling .ReadLine() twice is marginally faster than the ::ReadLines() loop.
Using -replace instead of Substring() / IndexOf() adds about 60% execution time.
Using Get-Content is more than 3 times slower.
Using Import-Csv | Select-Object is close to 30 times(!) slower, presumably due to the large number of columns; that said, in absolute terms we're still only talking about around 5 milliseconds.
As a side note: execution on macOS seems to be noticeably slower overall, with the regex solution and the cmdlet calls also being slower in relative terms.
Depends what you want to prioritize.
$data = Import-Csv -LiteralPath 'c:\temp\data.csv' |
Select-Object -Skip 1 -First 1 -ExpandProperty 'File_Name'
Is short and convenient. (2nd line meaning 2nd line of the file, or 2nd line of the data? Don't skip any if it's the first line of data).
Select-Object with something like -First 1 will break the whole pipeline when it's done, so it won't wait to read the rest of the 25MB in the background before returning.
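You can see that early-exit behavior with a toy pipeline (a sketch):
# Select-Object -First 1 stops the upstream command once it has its one object,
# so this returns almost instantly instead of iterating a million times
1..1000000 | ForEach-Object { $_ } | Select-Object -First 1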
You could likely speed it up, or reduce memory use, by a minuscule amount if you opened the file, seeked past two newlines, then a comma, then read to another comma, or some other long, detailed code, but I very much doubt it would be worth it.
Same with Get-Content: the way it adds NoteProperties to the output strings means it's likely no easier on memory and not usefully faster than Import-Csv.
You could really shorten it with
(gc c:\file.txt -head 2)[1]
Only reads 2 lines and then grabs index 1 (second line)
You could then split it. And grab index 1 of the split up line
((gc c:\file.txt -head 2)[1]).split(',')[1]
UPDATE: After seeing the new post with timings, I was inspired to do some tests myself (thanks mklement0). This was the fastest I could get to work:
$check = 0
foreach ($i in [IO.FILE]::ReadLines("$filePath")) {
    if ($check -eq 2) { break }
    if ($check -eq 1) { $value = $i.split(',')[1] } # $value = your answer
    $check++
}
Just thought of this: remove the -eq 2 check and put a break after a semicolon once the -eq 1 branch runs. 5 ticks faster. Haven't tested.
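For reference, that untested variant would look something like this (a sketch):
$check = 0
foreach ($i in [IO.FILE]::ReadLines("$filePath")) {
    if ($check -eq 1) { $value = $i.split(',')[1]; break }
    $check++
}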
Here were my results over 40,000 tests:
GC split avg was 1.11307622 Milliseconds
GC split Min was 0.3076 Milliseconds
GC split Max was 18.1514 Milliseconds
ReadLines split avg was 0.3836625825 Milliseconds
ReadLines split Min was 0.2309 Milliseconds
ReadLines split Max was 31.7407 Milliseconds
Stream Reader avg was 0.4464924825 Milliseconds
Stream Reader MIN was 0.2703 Milliseconds
Stream Reader Max was 31.4991 Milliseconds
Import-CSV avg was 1.32440485 Milliseconds
Import-CSV MIN was 0.2875 Milliseconds
Import-CSV Max was 103.1694 Milliseconds
I was able to run 3,000 tests a second on the 2nd and 3rd, and 1,000 tests a second on the first and last. StreamReader was his fastest one. And Import-Csv wasn't bad; I wonder if mklement0 didn't have a column named "file_name" in his test csv? Anyhow, I'd personally use the GC command because it's concise and easy to remember. But this is up to you, and I wish you luck on your scripting adventures.
I'm certain we could start hyperthreading this and get insane results, but when you're talking thousandths of a second is it really a big deal? Especially to get one variable? :D
Here's the StreamReader code I used, for transparency reasons...
$inputFile = "$filePath"
$f = [System.IO.StreamReader]::new($inputFile,$true);
$null = $f.ReadLine(); $line = $f.ReadLine()
$line.Substring(0, $line.IndexOf(','))
$f.Close()
I also noticed this pulls the 1st value of the second line, and I have no idea how to switch it to the 2nd value... It seems to be measuring the width from position 0 to the first comma and then cutting that. If you change the Substring start from 0 to, say, 5, it still measures the length from 0 to the comma, but moves where it starts grabbing... to the 6th character.
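For what it's worth, here is a sketch of how that approach could be pointed at the 2nd value instead: find the 1st and 2nd commas and cut the text between them (or just fall back to a split):
$first = $line.IndexOf(',')                       # position of the 1st comma
$second = $line.IndexOf(',', $first + 1)          # position of the 2nd comma
$line.Substring($first + 1, $second - $first - 1) # the 2nd field
# or, simpler but a little slower:
$line.Split(',')[1]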
The Import-Csv I used was:
$data = Import-Csv -LiteralPath "$filePath" |
Select-Object -Skip 1 -First 1 -ExpandProperty 'FileName'
I tested these on a 90 MB csv with 21 columns and 284k rows, and "FileName" was the second column.

How to import first two values for each line in CSV file | PowerShell

I have a CSV file that generates everyday, and generates with data such as:
windows:NT:v:n:n:d:n:n:n:n:m:n:n
I should also mention that the example is one of 3,900+ lines, and not every line of data has the same number of "columns". What I'm trying to do is import just the first two "columns" of data into a variable. For this example, it would be "Windows" and "NT", nothing else.
How would I go about doing this? I've tried using -Delimiter ':', but without much luck.
The number of lines shouldn't matter.
My approach from my comment (to your previous question) should work: if there is no header and you only want the first two columns, just specify -Header 1,2.
> import-csv .\strange.csv -delim ':' -Header (1..2) |Where 2 -eq 'NT'
1 2
- -
windows NT
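To keep those first two columns in a variable for later use, the same idea can be assigned directly (a sketch; the header names 'OS' and 'Version' are made up):
$firstTwo = Import-Csv .\strange.csv -Delimiter ':' -Header 'OS','Version'
$firstTwo[0].OS      # windows
$firstTwo[0].Version # NT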
Example of building the entire array:
$Splitted_List = [System.Collections.ArrayList]@()
foreach ($Line in Get-Content '.\myfilewithuseragents.txt') {
    $Splitted = $Line -split ":"
    $Splitted_Object = [PSCustomObject]@{
        part1 = $Splitted[0]
        part2 = $Splitted[1]
    }
    $Splitted_List.Add($Splitted_Object) | Out-Null
}
For every line, you just read the line, and with the string from that line you can easily split it:
$useragent = "windows:NT:v:n:n:d:n:n:n:n:m:n:n"
Then the first part can be referenced as $useragent.Split(":")[0], the second as $useragent.Split(":")[1], etc.
Including the foreach loop, that would be something like:
foreach ($useragent in Get-Content '.\myfilewithuseragents.txt') {
    $splitted = $useragent.Split(":")
    $part1 = $splitted[0]
}

Clean up an improperly formatted CSV file

I am downloading an xlsx file from a SharePoint and then converting it into a csv file. However, since the xlsx file contained empty columns that were not deleted, those get exported to the csv file as follows...
columnOne,columnTwo,columnThree,,,,
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
As you can see, the Import-Csv cmdlet will fail with that file because of the extra null titles. I want to know how to count the extra commas at the end. The number of columns is always changing, and the names of the columns are also always changing, so the count has to start from the last non-null title.
Right now, I'm doing the following...
$csvFileEdited = Get-Content $csvFile
$csvFileEdited[0] = $csvFileEdited[0].TrimEnd(',')
$csvFileEdited | Set-Content "$csvFile-temp"
Move-Item "$csvFile-temp" $csvFile -Force
Write-Host "Trim Complete."
This will make the file output like this...
columnOne,columnTwo,columnThree
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
The naming is now accepted by Import-Csv, but as you can see there are still extra null values that are not necessary, since they are null in every row.
If I ran the following code...
$csvFileWithExtraCommas = Get-Content $csvFile
$csvFileWithoutExtraCommas = @()
ForEach ($line in $csvFileWithExtraCommas)
{
    $line = $line.TrimEnd(',')
    $csvFileWithoutExtraCommas += $line
}
$csvFileWithoutExtraCommas | Set-Content "$csvFile-temp"
Move-Item "$csvFile-temp" $csvFile -Force
Write-Host "Trim Complete."
Then it would also remove null values that should stay, because they belong to a non-null title name. This is the output...
columnOne,columnTwo,columnThree
valueOne
,valueTwo
,,valueThree
Here is the desired output:
columnOne,columnTwo,columnThree
valueOne,,
,valueTwo,
,,valueThree
Can anyone help with this?
Update
I'm using the following code to count the extra null titles...
$csvFileWithCommas = Get-Content $csvFile
[int]$csvFileWithExtraCommasNumber = $csvFileWithCommas[0].Length
$csvFileTitlesWithoutExtraCommas = $csvFileWithCommas[0].TrimEnd(',')
[int]$csvFileWithoutExtraCommasNumber = $csvFileTitlesWithoutExtraCommas.Length
$numOfCommas = $csvFileWithExtraCommasNumber - $csvFileWithoutExtraCommasNumber
The value of $numOfCommas is 4. Now the question is: how can I get $line.TrimEnd(',') to only do so 4 times?
Ok... If you really need to do this, you can count the trailing commas from the header and use regex to remove that many from the end of each line. There are other string manipulation approaches, but the regex in this case is pretty clean.
Note that what Bluecakes' answer shows should suffice. Perhaps there are some hidden characters that are not being copied in the question, or perhaps an encoding issue with your real file.
$file = Get-Content "D:\temp\text.csv"
# Number of trailing commas. Compare the length before and after the trim
$numberofcommas = $file[0].Length - $file[0].TrimEnd(",").Length
# Use regex to remove as many commas from the end of each line and convert to csv object.
$file -replace ",{$numberofcommas}$" | ConvertFrom-Csv
The regex is looking for X commas at the end of each line, where X is $numberofcommas. In our case it would look like ,{4}$
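If you then want to persist the cleaned-up data instead of just converting it, a sketch (the output paths are made up):
$clean = $file -replace ",{$numberofcommas}$"
$clean | Set-Content 'D:\temp\text-clean.csv' # keep it as plain CSV text
# or round-trip through objects:
$clean | ConvertFrom-Csv | Export-Csv 'D:\temp\text-objects.csv' -NoTypeInformation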
The source file used with the above code was generated as such:
#"
columnOne,columnTwo,columnThree,,,,
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
"# | set-content D:\temp\text.csv
Are you getting an error when trying to Import-Csv? The cmdlet is smart enough to ignore columns without a heading, with no additional code needed.
I copied your csv file to my H:\ drive:
columnOne,columnTwo,columnThree,,,,
valueOne,,,,,,
,valueTwo,,,,,
,,valueThree,,,,
and then ran $nullcsv = Import-Csv -Path H:\nullcsv.csv and this is what I got:
PS> $nullcsv
columnOne columnTwo columnThree
--------- --------- -----------
valueOne
valueTwo
valueThree
The imported csv only contains 3 values as you would expect:
PS> $nullcsv.count
3
The cmdlet is also correctly accounting for null values in each of the columns:
PS> $nullcsv | Format-List
columnOne : valueOne
columnTwo :
columnThree :
columnOne :
columnTwo : valueTwo
columnThree :
columnOne :
columnTwo :
columnThree : valueThree