extract 2 column rages from TXT; insert comma delimiter; save as CSV - powershell

I have a .txt file that does not have delimiters. I want to extract 2 column ranges from the file, and separate them by a comma delimiter. i then want to save the resulting data as a CSV.
For example, here is a 'raw' string:
abcdefghij
I want the script to convert it to this:
abc,h
I know GC / SC; I just need to know how to do the string manipulation.

Assuming that your columns are delimited by a start and end index, you could do something like this:
get-content raw.txt | %{ "$($_[$0..2] -join ''),$($_[7..7] -join '')"}

Using the -replace operator:
$text = 'abcdefghij'
$text -replace '(.{3}).{4}(.).+','$1,$2'
abc,h

Oh boy, this looks like fun.
How about:
"abcdefghij" | %{$_.Remove(3), $_[7] -join ","}
I think it would be more helpful to see the actual data you're trying to extract from. There are dozens of ways to do string manipulation and your choice is going to depend heavily on what it is you're trying to extract.

Related

I want to remove certain parts of a certain string in Powershell 7.2.4, how do I do it?

I have the string "url": "https://maven.fabricmc.net/net/fabricmc/fabric-installer/0.11.0/fabric-installer-0.11.0.jar". I want to remove everything from the string except https://maven.fabricmc.net/net/fabricmc/fabric-installer/0.11.0/fabric-installer-0.11.0.jar (I want to remove the quotes as well).
How do I go about it?
$str = """url"": ""https://maven.fabricmc.net/net/fabricmc/fabric-installer/0.11.0/fabric-installer-0.11.0.jar"""
$str.Split(""": """)[1].Replace("""","")
The shortest I could think of
Might have the incorrect indexnumber, just test switching '3'
( $str -split """ )[3]
You could use the -replace operator and extract the section you want.
'"url": "https://maven.fabricmc.net/net/fabricmc/fabric-installer/0.11.0/fabric-installer-0.11.0.jar"' -replace '"url": "(.+)"','$1'
However since this appears to be in JSON format you may be better off using ConvertFrom-Json on the entire thing and then accessing the property through dot notation.
It's slightly easier than trying to regex or string match sections of JSON.
$converted = Get-Content file.json | ConvertFrom-Json
$converted.url

Remove commas from numbers in a CSV

I have folder info for all user folders. It is dumped out to a CSV file as follows:
Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29
We are unable to work with the data as is due to the thousands separator in the 3rd column. I could run the report scripts again, but we have a lot of file servers and a large number of users on one in particular, so running it again is very time consuming. The reason the commas are there is that the data was written as a string not a number.
I can import and convert, the only problem is that any number over 1000 will be wrong and then all other data is 1 column off. I would like to replace any comma between 2 numbers. It doesn't seem it would be that hard to do with PowerShell, but I am not having any luck finding anything.
If you assume that columns of data are comma plus space separated and your numbers have no spaces, you can use the -replace operator for this.
$line = 'Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29'
$line -replace '(?<=\d),(?=\d)'
If you are reading the data from a file, you can read the data with Get-Content, replace your data, and update the file with Set-Content.
(Get-Content file.csv) -replace '(?<=\d),(?=\d)' | Set-Content file.csv
If the file is large, you can utilize the faster switch statement.
$data = switch -regex -file file.csv {
'(?<=\d),(?=\d)' { $_ -replace '(?<=\d),(?=\d)' }
default {$_}
}
$data | Set-Content file.csv
Explanation:
(?<=\d) uses a positive lookbehind assertion (?<=) that matches a single digit \d.
(?=\d) uses a positive lookahead assertion (?=) that matches a single digit. You could replace this with (?=\d{3}) to match 3 consecutive digits after the comma.
Since you want to replace the target comma with empty string, you do not need a replacement string.
Typically, it would be best to stick with commands that work with CSV data or files. However, if your data contains commas and you aren't qualifying your text, it may be difficult to distinguish between data and delimiters. If you have a clear way of making that distinction, you are better off using ConvertFrom-Csv for already read data or Import-Csv for files. You will need to define headers either in the files or in the command.
EDIT
It was my oversight that the , in the dataset is not delimited, which causes this answer to not work as expected as the comma is seen as a column separator when parsing the CSV. I'm going to leave it as it does explain how to generally manipulate the data as you'd expect, if the column data were escaped property. However, #AdminOfThings' answer below should work for your specific case here, and will fix the erroneous defined column without relying on parsing the CSV content as a CSV first.
Import the data using Import-Csv, then remove any , in the third column. This assumes that you have no values where , is the decimal separator:
If you have headers in the CSV, you won't need to define header names or get fancy with writing the CSV back out:
Import-Csv -Path \path\to\file.csv | Foreach-Object {
$_.ColumnName = $_.ColumnName -replace ','
} | Export-Csv -NoTypeInformation -Path \path\to\file.csv
The way this works is that we import the CSV as an operable PSCustomObject, then for each line we take whatever the column name with the size is and remove the , from it. Finally, we export the modified PSCustomObject back out to the original CSV.
If you don't have headers, it gets a little trickier since we have to define temporary headers, but Export-Csv doesn't have an option to skip writing out headers:
Import-Csv -Path \path\to\file.csv -Headers Col1, Col2, Col3, Col4, Col5, Col6, Col7 |
Foreach-Object {
$_.Col3 = $_.Col3 -replace ','
} | ConvertTo-Csv | Select-Object -Skip 1 |
Set-Content -Path \path\to\file.csv
This does the same thing as the first block of code, but since we don't want to export the temporary headers, we have to get creative. First, note we reference the target column with the temporary header name. Instead of piping the modified CSV object right to Export-Csv, first we want to convert the object to CSV using ConvertTo-Csv. We then use Select-Object to skip the first line of the converted CSV text, which is the header, so we just have the row data and column values. Finally, we use Set-Content to write the CSV text without the header back to the original file.

how to convert from Newline Delimited text file to csv with Powershell

I have a text file containing the below test data:
1234
\\test
QATest
Silk
Chrome
I have a requirement to convert this text file into a CSV file using Powershell which would look somewhat like
1234,\\test,QATest,Silk,Chrome
Could anybody please suggest the right way?
Read all the lines in, then concatenate them using the -join operator:
$originalLines = Get-Content .\input.txt
$originalLines -join ',' |Set-Content .\output.csv

Replace CRLF in file

I am actually trying to make a .txt file looking like this:
text1
text 2
text3
text4
in something like this:
text1,text2
text3,text4
I have tried a lot of things, like:
foreach($line in $file){
$line=$line -replace'`r`n', ','
}
Or directly by replacing in the file and not the string:
$file=Get-Content('testo.txt')
$file=$file -replace '`r`n',''
But nothing worked. I have never been able to replace the CRLF.
If someone has an idea or anything!
Something even simpler I think. This would depend on the size of the text file since it has to be all read into memory. However this would will work with variable lines of "test" data and white space. I am going to assume that your data does not already contain commas.
$file = (Get-Content "c:\temp\data.txt") -join "," -replace ",{2,}","`r`n"
Set-Content -Path "C:\temp\Newfile.txt" -Value $file
Take the file and convert it into a comma delimited string. What will happen is that you will see a series of groups commas. Using your test data
text1,text 2,,,text3,text4
The we use regex to replace and consecutive comma groups of 2 or more with newlines. Getting the desired output.
text1,text 2
text3,text4
In fact i have done something like this:
$Writer = New-Object IO.StreamWriter "$($PWD.Path)\$OutputFile"
$Writer.Write( [String]::Join(".", $data) )
$Writer.Close()
$data=get-content temp.txt
$data=$data -replace '[#]{2,}',"`r`n"
$data=$data -replace '[#]',","
$data=$data -replace '( ){1,},',"no data"
$data>>temp1.txt
It works and do what I want. It is not exactly efficient, but I wanted it to work quickly.
I will try all the answers when I have the time and say if it works or not.
And maybe I will change this part of my script if I find something more efficient.
Thanks to all!

Count characters in string then insert delimiter using PowerShell

I have a linux server that will be generating several files throughout the day that need to be inserted in to a database; using Putty I can sftp them off to a server running SQL 2008. Problem is is the structure of the file itself, it has a string of text that are to be placed in different columns, but bulk insert in sql tries to put it all in to one column instead of six. Powershell may not be the best method, but I have seen on several sites how it can find and replace or append to the end of the line, can it count and insert?
So the file looks like this: '18240087A +17135555555 3333333333', where 18, 24, 00, 87, A are different columns, then there is a blank space between the A and the +, that is character count 10-19 which is another column, then characters 20-30 are a column, characters 31-36 are a space which is new column and so on. So I want to insert a '|' or a ',' so that sql understands where the columns end. Is this possible for PowerShell to count randomly?
This may not be the way to respond to all who did answer, i apologize in advance. As this is my first PowerShell script, I appreciate the input from each of you. This is an Avaya SIP server that is generating CDR records, which I must pull from the server and insert in to SQL for later reports. The file exported looks like this:
18:47 10/15
18470214A +14434444444 3013777777 CME-SBC HHHH-CM 4 M00 0
At first I just thought to delete the first line and run a script against the output, which I modified from Kieranties post:
$test = Get-Content C:\Share\CDR\testCDR.txt
$pattern = "^(.{2})(.{2})(.{1})(.{2})(.{1})(.{1})\s*(.{15})(.{10})\s*(.{7})\s*(.{7})\s*(.{1})\s*(.{1})(.{1})(.{1})\s*(.*)$"
if($test -match $pattern){
$result = $matches.Values | select -first ($matches.Count-1)
[array]::Reverse($result, 0, $result.Length)
$result = $result -join "|"
$result | Out-File c:\Share\CDR\results1.txt
}
But then i realized I need that first line as it contains the date. I can try to work that out another way though.
I also now see that there are times when the file contains 2 or more lines of CDR info, such as:
18:24 10/15
18240087A +14434444444 3013777777 CME-SBC HRSA-CM 4 M00 0
18240096A +14434444445 3013777778 CME-SBC HRSA-CM 4 M00 0
Whereas the .ps1 file I made does not give the second string, so I tried adding in this:
foreach ($Data in $test)
{
$Data = $Data -split(',')
and it fails to run. How can I do multiple lines (and possibly that first line)? If you know of a tutorial that can help, that's greatly appreciated as well!
PowerShell is a great tool that I love and it can do many things. I see that you are using SQL Server 2008. Depending on the edition of SQL Server you have running on the server, it most likely has SQL Server Integration Services (SSIS), which is an Extract, Transform, and Load (ETL) tool designed to help migrate data in many scenarios, such as yours. The file you describe here is sounds like a fixed width file, which SSIS can easily handle and import and SQL Server has great ways to automate the loads if this is a recurring need (Which it sounds like), including the automation of the sftp task, and even running PowerShell scripts as part of the ETL (I've done that several times).
If your file truly is fixed width and you want to use PowerShell to transform it into a delimited file, the regex approach you have in your answer works well, or there are several approaches using the System.String methods, like .insert() which allows you to insert a delimiter character using a character index in your line (use Get-Content to read the file and create one String object per line, then loop through them using Foreach loop or Foreach-Object and the pipeline). A slightly more difficult approach would be to use the .Substring() method. You could build your new String line using Substring to extract each column and concatenating those values with a delimiter. That's probably a lot for someone new to PowerShell, but one of the best ways to learn and gain proficiency with it is to practice writing the same script multiple ways. You can learn new techniques that may solve other problems you might encounter in the future.
This is a way (really ugly IMO, I think it can better done):
$a = '18240087A +17135555555 3333333333'
$b = #( ($a[0..1] -join ''), ($a[2..3] -join ''), ($a[4..5] -join ''),
($a[6..7] -join ''), ($a[8] -join ''), ($A[10..19] -join ''),
($a[20..30] -join ''), ($a[31..36] -join ''))
$c = $b -join '|'
$c
18|24|00|87|A|+171355555|55 33333333|33
I don't know if is the rigth splitting you need, but changing the values in each [x..y] you can do what better fit your need. Remenber that character array are 0-based, then the first char is 0 and so on.
I don't quite follow the splitting rules. What kind of software writes the text file anyway? Maybe it can be instructed to change the structure?
That being said, inserting pipes is easy enough with .Insert()
$a= '18240087A +17135555555 3333333333'
$a.Substring(0, $a.IndexOf('+')).Insert(2, '|').insert(5,'|').insert(8, '|').insert(11, '|').insert(13, '|')
# Output: 18|24|00|87|A|
# Rest of the line:
$a.Substring($a.IndexOf('+')+1)
# Output: 17135555555 3333333333
From there you can proceed to splitting the rest of the row data.
I've improved my answer based on your response (note, it's probably best you update your actual question to include that information!)
The nice thing about Get-Content in Powershell is that it returns the content as an array split on the end of line characters. Couple that with allowing multiple assignment from an array and you end up with some neat code.
The following has a function to process each line based on your modified version of my original answer. It's then wrapped by a function which processes the file.
This reads the given file, setting the first line to $date and the rest of the content to $content. It then creates an output file adds the date to the output, then loops over the rest of the content performing the regex check and adding the parsed version of the content if the check is successful.
Function Parse-CDRFileLine {
Param(
[string]$line
)
$pattern = "^(.{2})(.{2})(.{1})(.{2})(.{1})(.{1})\s*(.{15})(.{10})\s*(.{7})\s*(.{7})\s*(.{1})\s*(.{1})(.{1})(.{1})\s*(.*)$"
if($line -match $pattern){
$result = $matches.Values | select -first ($matches.Count-1)
[array]::Reverse($result, 0, $result.Length)
$result = $result -join "|"
$result
}
}
Function Parse-CDRFile{
Param(
[string]$filepath
)
# Read content, setting first line to $date, the rest to $content
$date,$content = Get-Content $filepath
# Create the output file, overwrite if neccessary
$outputFile = New-Item "$filepath.out" -ItemType file -Force
# Add the date line
Set-Content $outputFile $date
# Process the rest of the content
$content |
? { -not([string]::IsNullOrEmpty($_)) } |
% { Add-Content $outputFile (Parse-CDRFileLine $_) }
}
Parse-CDRFile "C:\input.txt"
I used your sample input and the result I get is:
18:24 10/15
18|24|0|08|7|A|+14434444444 30|13777777 C|ME-SBC |HRSA-CM|4|M|0|0|0
18|24|0|09|6|A|+14434444445 30|13777778 C|ME-SBC |HRSA-CM|4|M|0|0|0
There are an incredible amount of resources out there but one I particularly suggest is Douglas Finkes Powershell for Developers It's short, concise and full of great info that will get you thinking in the right mindset with Powershell