Combine multiple Get-Content cmdlets in PowerShell

I am a beginner to PowerShell and would like to use it to automate some file editing. Below is my current work to build an "OR"-delimited string:
$inputFile = "C:\Users\David Kao\Desktop\Powershell\Input\input.txt"
$outputFile = "C:\Users\David Kao\Desktop\Powershell\Output\output.txt"
$final = "C:\Users\David Kao\Desktop\Powershell\Output\final.txt"
(Get-Content $inputFile ) -replace ' \[.*','' -replace ' \(.*','' -replace ';','' -replace ',','' -replace '- ',''|
Where-Object { $_ -notmatch '[^\p{IsBasicLatin}]' }|
Sort-Object -Unique |
Set-Content $outputFile
(Get-Content $outputFile) -join '/' -replace '/','" OR "' -replace '\t" OR ', '' -replace '$', '"'|
Set-Content $final
Here is the sample data:
multiple sclerosis [A0484253/AOD/DE/0000006106]
ms [A1145632/BI/AB/BI00548]
multiple sclerosis [A0484254/BI/PT/BI00548]
MS [A0432904/CCPSS/PT/0056346]
MULTIPLE SCLEROSIS [A0433042/CCPSS/PT/0037395]
Multiple sclerosis [A0436411/CCS/MD/6.2.2]
Multiple sclerosis [A0436412/CCS/SD/80]
Multiple sclerosis [A31482484/CCSR_10/SD/NVS005]
disseminated sclerosis [A18685620/CHV/SY/0000008328]
insular sclerosis [A18685621/CHV/SY/0000008328]
MS [A18592794/CHV/SY/0000008328]
MS multiple sclerosis [A18685622/CHV/SY/0000008328]
multiple sclerosis [A18611430/CHV/SY/0000008328]
multiple sclerosis (MS) [A18555705/CHV/PT/0000008328]
multiple sclerosis MS [A18574147/CHV/SY/0000008328]
And the output would be something like:
"multiple sclerosis" OR "MS" OR "insular sclerosis" OR ....
So far, this code works well, but I would like to get rid of the second section and merge it into the first one to make the script more concise, efficient and professional. Something like this:
$inputFile = "C:\Users\helloworld\Desktop\Powershell\Input\input.txt"
$outputFile = "C:\Users\helloworld\Desktop\Powershell\Output\output.txt"
(Get-Content $inputFile ) -replace ' \[.*','' -replace ' \(.*','' -replace ';','' -replace ',','' -replace '- ',''|
Where-Object { $_ -notmatch '[^\p{IsBasicLatin}]' }|
Sort-Object -Unique |
(Get-Content $_) -join '/' -replace '/','" OR "' -replace '\t" OR ', '' -replace '$', '"'|
Set-Content $outputFile
I have googled this a lot but have been stuck for a while.
Can anyone help me out?
Thanks!
I have now tried adding ForEach-Object:
$inputFile = "C:\Users\David Kao\Desktop\Powershell\Input\input.txt"
$outputFile = "C:\Users\David Kao\Desktop\Powershell\Output\output.txt"
(Get-Content $inputFile ) -replace ' \[.*','' -replace ' \(.*','' -replace ';','' -replace ',','' -replace '- ',''|
Where-Object { $_ -notmatch '[^\p{IsBasicLatin}]' }|
Sort-Object -Unique |
ForEach-Object { $_ -join '/' -replace '/','" OR "' -replace '\t" OR ', '' -replace '$', '"'}|
Set-Content $outputFile
But it seems that -join does not work inside ForEach-Object. If I can fix this, I think everything will be fine.
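Inside ForEach-Object, $_ is a single line, so -join only ever has one element to join. Here is a minimal sketch of one way to merge the two sections. It assumes a few hypothetical sample lines in place of (Get-Content $inputFile). The whole pipeline is wrapped in parentheses so that every line is collected into one array first. That array is then joined once, and the outer quotes are added with the -f format operator.

```powershell
# Hypothetical sample lines standing in for (Get-Content $inputFile)
$lines = @(
    'multiple sclerosis [A0484253/AOD/DE/0000006106]'
    'MS [A0432904/CCPSS/PT/0056346]'
    'insular sclerosis [A18685621/CHV/SY/0000008328]'
    'multiple sclerosis (MS) [A18555705/CHV/PT/0000008328]'
)

# The parentheses collect the whole pipeline's output into one array,
# so -join sees every line at once instead of one line at a time.
$joined = ($lines -replace ' \[.*', '' -replace ' \(.*', '' |
    Where-Object { $_ -notmatch '[^\p{IsBasicLatin}]' } |
    Sort-Object -Unique) -join '" OR "'

# Add the leading and trailing double quotes
$result = '"{0}"' -f $joined
$result
```

In the real script, $lines would be replaced by (Get-Content $inputFile), and $result would be piped to Set-Content. This removes the need for the intermediate output file.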

I would do the following, which assumes words are separated by single spaces and that names contain only letters and spaces.
$inputfile = Get-Content inputfile.txt
($inputfile -replace '(^[a-z][a-z ]*) [^a-z].*$','"$1"' -ne '' |
sort -unique) -join ' OR ' | Set-Content output.txt
In the regex, capture group 1 is everything matched within the first () grouping; it is substituted into the replacement string as $1.
-ne '' removes empty lines. If those blank lines contain spaces, you may need an additional -notmatch '^\s*$'.
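The following small sketch shows the capture group and the -ne '' filter at work. It runs on two sample lines from the question plus an empty line; the sample array exists only for illustration.

```powershell
$lines = @(
    'multiple sclerosis (MS) [A18555705/CHV/PT/0000008328]'
    ''
    'MS [A0432904/CCPSS/PT/0056346]'
)

# -replace is case-insensitive by default, so [a-z] also matches upper case.
# Group 1 captures the leading letters and spaces; '$1' in the single-quoted
# replacement refers to it, and the literal double quotes wrap it.
# Lines that don't match (here the empty one) pass through unchanged,
# and -ne '' then drops the empty strings from the array.
$quoted = $lines -replace '(^[a-z][a-z ]*) [^a-z].*$', '"$1"' -ne ''
$quoted
```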
If you have a custom definition of uniqueness, the code will need to be altered slightly instead of just using sort -unique.
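As one illustration, assume uniqueness should be case-sensitive and the first occurrence of each item should be kept in its original order. This definition is an assumption chosen for the example. Sort-Object -Unique does not fit it, because it compares case-insensitively and reorders the items. A .NET HashSet with an ordinal comparer can do the filtering instead:

```powershell
$items = '"MS"', '"ms"', '"multiple sclerosis"', '"MS"'

# HashSet.Add returns $true only the first time a value is added, so
# Where-Object keeps first occurrences. StringComparer.Ordinal makes
# the comparison case-sensitive.
$seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::Ordinal)
$unique = $items | Where-Object { $seen.Add($_) }
$unique -join ' OR '
```

For a different rule (for example, ignoring a trailing " MS"), the value passed to $seen.Add could be normalized first while the original item is still output.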

here's another way to do this ... [grin]
what it does ...
creates a set of sample data to work with
when ready to work with real data, replace the entire #region/#endregion block with a Get-Content call.
splits the lines on the open [
filters out the lines that have a closing ']'
that gets rid of the unwanted remainder from the split.
filters out any lines that contain 2-or-more consecutive spaces
this gets rid of those odd lines that have just 4 spaces on them.
trims away any leading or trailing spaces
sorts the items and removes any exact dupes
joins the items with double-quoted OR strings
assigns that to $Result
adds the required leading and trailing double quotes to the previous string
assigns that to $FinalResult
displays that last item on screen
the code ...
#region >>> fake reading in a plain text file
# in real life, use Get-Content
$InStuff = @'
multiple sclerosis [A0484253/AOD/DE/0000006106]
ms [A1145632/BI/AB/BI00548]
multiple sclerosis [A0484254/BI/PT/BI00548]
MS [A0432904/CCPSS/PT/0056346]
MULTIPLE SCLEROSIS [A0433042/CCPSS/PT/0037395]
Multiple sclerosis [A0436411/CCS/MD/6.2.2]
Multiple sclerosis [A0436412/CCS/SD/80]
Multiple sclerosis [A31482484/CCSR_10/SD/NVS005]
disseminated sclerosis [A18685620/CHV/SY/0000008328]
insular sclerosis [A18685621/CHV/SY/0000008328]
MS [A18592794/CHV/SY/0000008328]
MS multiple sclerosis [A18685622/CHV/SY/0000008328]
multiple sclerosis [A18611430/CHV/SY/0000008328]
multiple sclerosis (MS) [A18555705/CHV/PT/0000008328]
multiple sclerosis MS [A18574147/CHV/SY/0000008328]
'@ -split [System.Environment]::NewLine
#endregion >>> fake reading in a plain text file
# split on the open "["
$Result = (($InStuff -split '\[').
Where({
# filter out the lines that have a closing "]"
$_ -notmatch '\]' -and
# filter out the lines that have two-or-more consecutive spaces
$_ -notmatch '\s{2,}'
}).
# trim away any leading/trailing spaces
Trim() |
# sort the items and toss out any dupes.
# does a preliminary join with double-quoted OR strings
Sort-Object -Unique) -join '" OR "'
# adds leading and trailing double quotes
$FinalResult = '"{0}"' -f $Result
$FinalResult
output ...
"disseminated sclerosis" OR "insular sclerosis" OR "ms" OR "MS multiple sclerosis" OR "multiple sclerosis" OR "multiple sclerosis (MS)" OR "multiple sclerosis MS"

Related

How to replace a value in a text file using a PowerShell script

My file contains the following data (no header):
DEPOSIT ADD 123456789 (VALUE)(VARIABLE) NNNN VALUEVARIABLE
DEPOSIT ADD 234567890 (VALUE)(P75) NNNN VALUEVARIABLE
DEPOSIT ADD 345678901 (VALUE)(VARIABLE) NNNN VALUEVARIABLE
This is a tab-delimited text file.
There are 5 columns in total ("123456789 (VALUE)(VARIABLE)" is a single column value).
My requirements are:
I need to fetch only the rows that contain P75 and update them in the same file.
In those rows I have to replace the values in Col3, Col4 and Col5; all other rows should be unaffected.
from
DEPOSIT ADD 234567890 (VALUE)(P75) NNNN VALUEVARIABLE
to
DEPOSIT ADD 234567890 (VTG)(SPVTG) TCM VTGSPVTG
Only the records that contain P75 should be updated like this. The replacement values are the same for all selected records.
The script I have written is:
$original_file='C:\Path\20200721130155_copy.txt' -header Col1,Col2,Col3,Col4,Col5,| Select Col3,Col4,Col5
(Get-Content $original_file) |ForEach-Object {
if($_.Col3 -match '(VALUE)(P75)')
{
$_ -replace '(VALUE)(P75)', '(VTG)(SPVTG)' `
-replace 'VALUEVARIABLE', 'VTGSPVTG' `
-replace 'NNNN', 'TCM' `
}
$_
}| Set-Content $original_file+'_new.txt' -Force
I am getting an output file with the same content; the file is not being updated.
Please advise.
Thanks
You can do the following:
$newfile = "{0}_new.txt" -f $original_file
Get-Content $original_file | Foreach-Object {
if ($_ -match '\(VALUE\)\(P75\)') {
$_ -replace '\(VALUE\)\(P75\)','(VTG)(SPVTG)' -replace 'VALUEVARIABLE', 'VTGSPVTG' -replace 'NNNN', 'TCM'
} else {
$_
}
} | Set-Content $newfile -Force
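Since -replace uses regex, you must backslash-escape special regex characters like ( and ). Also, $_ is a plain string here, so $_.Col3 is $null and the if condition in your script is never true.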
Since $_ is the current line read by Get-Content (without -Raw, it emits one string per line), you need to output $_ unchanged for lines you don't want to modify. For lines you do want to change, $_ -replace 'regex','text' outputs the line with the text replaced.
Alternatively, you can apply the same logic above in a switch statement, which is more efficient:
$newfile = "{0}_new.txt" -f $original_file
$(switch -regex -file $original_file {
'\(VALUE\)\(P75\)' {
$_ -replace '\(VALUE\)\(P75\)','(VTG)(SPVTG)' -replace 'VALUEVARIABLE', 'VTGSPVTG' -replace 'NNNN', 'TCM'
}
default { $_ }
}) | Set-Content $newfile -Force
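Instead of escaping ( and ) by hand, you can let .NET build the escaped pattern from the literal text with [regex]::Escape. Here is a short sketch using two hypothetical tab-delimited lines in the question's format:

```powershell
# [regex]::Escape turns a literal string into a regex that matches it exactly,
# backslash-escaping metacharacters such as ( and ).
$pattern = [regex]::Escape('(VALUE)(P75)')   # -> \(VALUE\)\(P75\)

# Hypothetical tab-delimited sample lines in the question's format
$lines = @(
    "DEPOSIT`tADD`t123456789 (VALUE)(VARIABLE)`tNNNN`tVALUEVARIABLE"
    "DEPOSIT`tADD`t234567890 (VALUE)(P75)`tNNNN`tVALUEVARIABLE"
)

$updated = foreach ($line in $lines) {
    if ($line -match $pattern) {
        # Only lines containing the literal text (VALUE)(P75) are changed
        $line -replace $pattern, '(VTG)(SPVTG)' -replace 'VALUEVARIABLE', 'VTGSPVTG' -replace 'NNNN', 'TCM'
    }
    else {
        # Every other line is output unchanged
        $line
    }
}
$updated
```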
You can use Import-Csv -Delimiter "`t" to import the data of the original file.
Then loop over the items and change the values if Col3 matches the search text.
$original_file = 'C:\Path\20200721130155_copy.txt'
$out_file = $original_file -replace '\.txt$', '_new.txt'
(Import-Csv -Path $original_file -Delimiter "`t" -Header 'Col1','Col2','Col3','Col4','Col5') | ForEach-Object {
if( $_.Col3 -like '*(*)*(P75)') {
$_.Col3 = $_.Col3 -replace '\([^)]+\)\(P75\)$', '(VTG)(SPVTG)'
$_.Col4 = 'TCM'
$_.Col5 = 'VTGSPVTG'
}
# rejoin the fields and output the line
$_.PsObject.Properties.Value -join "`t"
} | Set-Content -Path $out_file -Force
The output will be:
DEPOSIT ADD 123456789 (VALUE)(VARIABLE) NNNN VALUEVARIABLE
DEPOSIT ADD 234567890 (VTG)(SPVTG) TCM VTGSPVTG
DEPOSIT ADD 345678901 (VALUE)(VARIABLE) NNNN VALUEVARIABLE
Regex details for -replace:
\( Match the character “(” literally
[^)] Match any character that is NOT a “)”
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\) Match the character “)” literally
\( Match the character “(” literally
P75 Match the characters “P75” literally
\) Match the character “)” literally
$ Assert position at the end of the string (or before the line break at the end of the string, if any)
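To see the pattern in isolation, here it is applied to a few Col3 values. The third value is invented to show that [^)]+ accepts any text inside the first pair of parentheses, as long as the value ends in (P75).

```powershell
# Hypothetical Col3 values; only values ending in "(...)(P75)" are rewritten
$col3 = '234567890 (VALUE)(P75)', '345678901 (VALUE)(VARIABLE)', '456789012 (OTHER)(P75)'
$new = $col3 -replace '\([^)]+\)\(P75\)$', '(VTG)(SPVTG)'
$new
```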

Change pipe delimited file to comma delimited in Powershell

I have a pipe-delimited .TXT file. I need to change the delimiter to a comma but still keep the file extension as .TXT. The file looks like this:
Column 1 |Column 2
13|2019-09-30
96|2019-09-26
173|2019-09-25
I am using Windows PowerShell 5.1 for my script.
I am using the following code:
$file = New-Object System.IO.StreamReader -Arg "c:\file.txt"
$outstream = [System.IO.StreamWriter] "c:\out.txt"
while ($line = $file.ReadLine()) {
$s = $line -replace '|', ','
$outstream.WriteLine($s)
}
$file.close()
$outstream.close()
Instead of just replacing the pipe with a comma, the output file looks like this:
C,o,l,u,m,n, 1 , |,C,o,l,u,m,n, 2
1,3,|,2,0,1,9,-,0,9,-,3,0
9,6,|2,0,1,9,-,0,9,-,2,6
1,7,3,|,2,0,1,9,-,0,9,-,2,5
The only problem with your code is in how you try to replace the | characters in the input:
$s = $line -replace '|', ',' # WRONG
PowerShell's -replace operator expects a regex (regular expression) as its first RHS operand, and | is a regex metacharacter (has special meaning)[1]; to use it as a literal character, you must \-escape it:
# '\'-escape regex metacharacter '|' to treat it literally.
$s = $line -replace '\|', ','
While PowerShell's -replace operator is very flexible, in simple cases such as this one you can alternatively use the [string] type's .Replace() method, which performs literal string replacements and therefore doesn't require escaping (it's also faster than -replace):
# Use literal string replacement.
# Note: .Replace() is case-*sensitive*, unlike -replace
$s = $line.Replace('|', ',')
[1] | denotes an alternation in a regex, meaning that the subexpressions on either side are matched against the input string and one of them matching is sufficient; if your full regex is just |, it effectively matches the empty string before and after each character in the input, which explains your symptom; e.g., 'foo' -replace '|', '#' yields #f#o#o#
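The four variants can be compared on one line of the sample data. This sketch also uses [regex]::Escape, a .NET method not mentioned above, which produces the escaped pattern for you:

```powershell
$line = '13|2019-09-30'

# Unescaped '|' is an empty alternation: it matches the empty string at
# every position, so a comma is inserted around every character.
$wrong = $line -replace '|', ','

# Escaped, '|' is matched literally.
$regex = $line -replace '\|', ','

# [regex]::Escape('|') returns '\|', so the pattern need not be escaped by hand.
$escaped = $line -replace [regex]::Escape('|'), ','

# .Replace() performs a plain literal (case-sensitive) replacement.
$literal = $line.Replace('|', ',')
```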
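You can use Import-Csv and Export-Csv by specifying -Delimiter. Because the output path is the same as the input path here, wrap Import-Csv in parentheses so that the whole file is read before Export-Csv starts writing to it. (Note that Export-Csv in Windows PowerShell 5.1 encloses every field in double quotes.)
(Import-Csv -Delimiter '|' -Path "c:\file.txt") | Export-Csv -Delimiter ',' -Path "c:\file.txt" -NoTypeInformation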
You will find the -split and -join operators to be of interest.
Get-Content -Path "C:\File.TXT" | ForEach-Object { ($_ -split "\|") -join "," } | Set-Content -Path "C:\Out.TXT"
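One design point separates the text-based approaches (-split/-join, -replace) from the CSV cmdlets. The text-based approaches do not quote fields, so a field that itself contains a comma becomes ambiguous after conversion. The CSV cmdlets quote such fields. Here is a short sketch with hypothetical data, using ConvertFrom-Csv/ConvertTo-Csv (the in-memory counterparts of Import-Csv/Export-Csv):

```powershell
# Hypothetical data: the first field of the second line contains a comma
$data = 'Name|Date', 'Smith, John|2019-09-30'

# Plain split/join: the embedded comma now looks like a field separator
$naive = $data | ForEach-Object { ($_ -split '\|') -join ',' }

# CSV cmdlets: fields are quoted, so the embedded comma stays inside its field
$csv = $data | ConvertFrom-Csv -Delimiter '|' | ConvertTo-Csv -NoTypeInformation
```

For data that never contains commas or quotes, the simpler split/join version is fine.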

Split until end of line in PowerShell

I'm trying to write a PowerShell script that adds a hyphen after every 2 characters in a text file. I have done it, but I am facing the following issue.
Code:
$file = get-content .\textfile.txt
($file -split "([a-z0-9]{2})" | ?{ $_.length -ne 0 }) -join "-" | Set-Content .\textfile.txt
If I have values like the following in a .txt file:
000000000000
111111111111
the output comes out like this:
00-00-00-00-00-00-11-11-11-11-11-11
I need output like this:
00-00-00-00-00-00
11-11-11-11-11-11
Please suggest what I should change.
Get-Content removes all the newlines, and outputs strings to the pipeline, one for each line.
$file is an array of two strings, @('000000000000', '111111111111'). When you -split it, the split applies to both strings, and the result is @('00', '00', '00', '00', '00', '00', '11', '11', '11', '11', '11', '11'), so you can no longer tell where the lines start or end.
To fix it, you need to process each line separately:
(Get-Content .\textfile.txt) | ForEach-object {
($_ -split "([a-z0-9]{2})" |? { $_ }) -join "-"
} | Set-Content .\textfile.txt
Or use a replace instead, which works within each line rather than merging the lines together:
(gc .\textfile.txt) -replace '([a-z0-9]{2})\B', '$1-' | sc textfile.txt
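The \B (not a word boundary) assertion stops it from putting a - at the end of the line. Every character here is a word character, so the position after the last pair is a word boundary, and the match fails there.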
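Here is a quick check of the -replace version on the sample lines. It also includes a hypothetical odd-length line to show what happens to a leftover single character.

```powershell
$lines = '000000000000', '111111111111', '12345'

# -replace applies to each array element separately, so each line keeps
# its own result; \B prevents a hyphen after the final pair of a line.
$result = $lines -replace '([a-z0-9]{2})\B', '$1-'
$result
```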
As answered on the TechNet Forums at https://social.technet.microsoft.com/Forums/en-US/e60e33d2-f065-4219-82cf-5797aaf10891/split-foreach-line?forum=winserverpowershell
This adds a dash (-) after every two characters in a line. Because that also leaves a dash at the end of an even-length line, the result is trimmed.
$inputFile = Get-Content -Path C:\temp\file.txt
$newFile = foreach ($line in $inputFile) {
($line -replace '(..)', '$1-').trim('-')
}
$newFile | Set-Content -Path C:\temp\file.txt
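The same loop can be run on in-memory strings to show its behaviour. The sketch uses one line from the question and a hypothetical odd-length line. One caveat of the design: Trim('-') would also strip any hyphen that was already at the start or end of the original line.

```powershell
$lines = '000000000000', '12345'
$newLines = foreach ($line in $lines) {
    # '(..)' only matches complete pairs, so an odd trailing character is left alone;
    # .Trim('-') removes the dash added after the final pair of an even-length line.
    ($line -replace '(..)', '$1-').Trim('-')
}
$newLines
```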

Sort and Export-Csv

I have a CSV file containing rows like the following extract:
"EmployeeID","FirstName","LastName","Location","Department","TelephoneNo","Email"
"000001 ","abc ","def ","Loc1"," "," ","name1@company.com "
"000023 ","ghi ","jkl ","Loc2"," "," ","name2@company.com "
"000089 ","mno ","pqr ","Loc2"," "," ","name3@company.com "
How do I sort the rows and save them as a CSV file while keeping the quotes?
I have the following PowerShell script, which works with CSV files whose columns are not enclosed in double quotes:
Get-Content $Source -ReadCount 1000 |
ConvertFrom-Csv -Delimiter $Delimiter |
Sort-Object -Property $NamesOfColumns -Unique |
ForEach-Object {
# Each of the values in $ColumnValueFormat must be executed to get the property from the loop variable ($_).
$values = foreach ($value in $ColumnValueFormat) {
Invoke-Expression $value
}
# Then the values can be passed in as an argument for the format operator.
$ShowColsByNumber -f $values
} |
Add-Content $Destination;
The $Source, $Delimiter, $NamesOfColumns and $ColumnValueFormat are given or built dynamically.
For a non-quoted CSV file, $ColumnValueFormat contains:
$_.EmployeeID.Trim()
$_.FirstName.Trim()
$_.LastName.Trim()
$_.Location.Trim()
$_.Department.Trim()
$_.TelephoneNo.Trim()
$_.Email.Trim()
For a quoted CSV file, $ColumnValueFormat contains:
$_."EmployeeID".Trim()
$_."FirstName".Trim()
$_."LastName".Trim()
$_."Location".Trim()
$_."Department".Trim()
$_."TelephoneNo".Trim()
$_."Email".Trim()
I am having two problems:
First, the column headings are surrounded by double quotes. The problem seems to be that $ColumnValueFormat places double quotes around the column headers, so the rows are not processed. (If I remove the double quotes, I am not sure the internals of the cmdlet will recognize the column headings when processing the rows.)
Second, a problem I came across at the last minute: if the last column is blank, it is treated as $null, and when Invoke-Expression $value executes (where $value holds the last column expression, $_.Email.Trim(), on a non-quoted CSV file), it fails. If I place the statement in a try/catch block, the error is simply ignored, the last column is not added to the $values array, and it fails again.
Quotes around property names are used syntactically to access names with spaces, not to write quotes to the output.
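Here is a small sketch of that point, using a hypothetical two-column extract. ConvertFrom-Csv already removes the surrounding quotes from headers and values, so the property is simply named EmployeeID, and both spellings below read the same value.

```powershell
# ConvertFrom-Csv strips the CSV quoting from headers and values
$row = '"EmployeeID","Email"', '"000001 ","name1@company.com "' | ConvertFrom-Csv

# Both forms read the same property; the quotes are only PowerShell syntax.
$plain  = $row.EmployeeID.Trim()
$quoted = $row."EmployeeID".Trim()

# Quotes are needed when a property name contains a space.
$obj = [pscustomobject]@{ 'Employee ID' = '000001 ' }
$spaced = $obj.'Employee ID'.Trim()
```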
The Export-Csv cmdlet doesn't have an option to force quotes, so we'll have to export the CSV manually. We also have to replace empty values, which are $null after ConvertFrom-Csv, with an empty string. In case only some fields are needed, we use the Select-Object cmdlet with the -Index parameter.
Get-Content $Source |
ConvertFrom-Csv |
%{ $header = $false } {
if (!$header) {
$header = $true
'"' + (
($_.PSObject.Properties.Name.trim() |
select -index 1,6
) -join '","'
) + '"'
}
'"' + (
($_.PSObject.Properties.Value |
%{ if ($_) { $_.trim() } else { '' } } |
select -index 1,6
) -join '","'
) + '"'
} | Out-File $Destination
The above code is great for pass-through processing of large CSV files because it doesn't keep the entire file in memory. Otherwise it's possible to simplify the code a bit:
$csv = Get-Content $Source | ConvertFrom-Csv
$csv | %{
'"' + (
($csv[0].PSObject.Properties.Name.trim() |
select -index 1,6
) -join '","'
) + '"'
} {
'"' + (
($_.PSObject.Properties.Value |
%{ if ($_) { $_.trim() } else { '' } } |
select -index 1,6
) -join '","'
) + '"'
} | Out-File $Destination
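The code above trims and re-quotes the fields but does not sort them, although the question asks for sorting. Here is a minimal sketch that adds a Sort-Object step before the lines are rebuilt. It makes two assumptions: the rows are sorted by EmployeeID, and a few hypothetical in-memory lines stand in for Get-Content $Source. All columns are kept here rather than selected with -Index.

```powershell
# Hypothetical sample lines in the question's format (unsorted on purpose)
$lines = @(
    '"EmployeeID","FirstName","LastName","Location","Department","TelephoneNo","Email"'
    '"000023 ","ghi ","jkl ","Loc2"," "," ","name2@company.com "'
    '"000001 ","abc ","def ","Loc1"," "," ","name1@company.com "'
)

# ConvertFrom-Csv strips the quotes; sort the resulting objects by a column
$rows = $lines | ConvertFrom-Csv | Sort-Object -Property EmployeeID

# Rebuild the header and each row with every field trimmed and re-quoted
$header = '"' + ($rows[0].PSObject.Properties.Name -join '","') + '"'
$body = foreach ($row in $rows) {
    $values = foreach ($p in $row.PSObject.Properties) {
        # Empty fields can come back as $null, so substitute ''
        if ($p.Value) { $p.Value.Trim() } else { '' }
    }
    '"' + ($values -join '","') + '"'
}
$output = @($header) + $body
$output
```

In the script, $output would then be written with Set-Content $Destination.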

Find and replace a character only in certain column positions in each line

I'm trying to write a script to find all the periods in the first 11 characters or the last 147 characters of each line (lines have a fixed width of 193 characters, so I'm attempting to ignore characters 12 through 45).
First I want a script that will just find all the periods in the first or last part of each line. Then, if I find any, I would like to replace them with 0's, but ignore periods in the 12th through 45th characters, leaving those in place. It would scan all the *.dat files in the directory and create period-free copies in a subfolder. So far I have:
$data = get-content "*.dat"
foreach($line in $data)
{
$line.substring(0,12)
$line.substring(46,147)
}
Then I run this with > Output.txt and then do a select-string Output.txt -pattern ".". As you can see, I'm a long way from my goal: at present my program mashes all the files together, and I haven't figured out how to do any replacement yet.
Get-Item *.dat |
ForEach-Object {
$file = $_
$_ |
Get-Content |
ForEach-Object {
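$beginning = $_.Substring(0,11) -replace '\.','0'   # characters 1-11
$middle = $_.Substring(11,34)                         # characters 12-45, left as-is
$end = $_.Substring(45) -replace '\.','0'             # character 46 to the end of the line
'{0}{1}{2}' -f $beginning,$middle,$end
} |
Set-Content -Path (Join-Path $OutputDir $file.Name)   # $OutputDir: the existing subfolder for the period-free copies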
}
You can use the PowerShell -replace operator to replace "." with "0", and use Substring as you do to build up the three portions of the string you're interested in. This outputs an updated line for each line of your input.
$data = get-content "*.dat"
foreach($line in $data)
{
($line.SubString(0,11) -replace "\.","0") + $line.SubString(11,34) + ($line.SubString(45) -replace "\.","0")
}
Note that the -replace operator performs a regular expression match and the "." is a special regular expression character so you need to escape it with a "\".
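The question gives the boundaries in two slightly different ways ("characters 12 through 45" versus "last 147 characters" of a 193-character line). It may therefore help to make the protected range a parameter. This is a minimal sketch built around a hypothetical helper, Update-PeriodsOutsideRange, which is not part of the answers above. The helper uses the same Substring and -replace technique as those answers and takes a 0-based start index and a length.

```powershell
# Hypothetical helper: replace '.' with '0' everywhere except in a protected
# range given as a 0-based start index and a length; that range is returned untouched.
function Update-PeriodsOutsideRange {
    param(
        [string]$Line,
        [int]$Start,
        [int]$Length
    )
    $head = $Line.Substring(0, $Start) -replace '\.', '0'
    $mid  = $Line.Substring($Start, $Length)
    $tail = $Line.Substring($Start + $Length) -replace '\.', '0'
    $head + $mid + $tail
}

# Characters 12 through 45 (1-based) are index 11 with length 34
$sample = '.' * 193
$cleaned = Update-PeriodsOutsideRange -Line $sample -Start 11 -Length 34
```

If the range turns out to be characters 12 through 46, only -Length changes (to 35).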