Get a part of a text into a variable - PowerShell

I have a .txt file whose content is delimited by blank lines.
Eg.
Question1
What is your favourite colour?
Question2
What is your hobby?
Question3
What kind of music do you like?
...and so on.
I would like to put each of the text questions into an array.
I tried this:
$path=".\Documents\Questions.txt"
$shareArray= gc $path
But that puts every line into a separate array element.
Can someone give me a tip?
Thanks

This is a different approach which can handle multi-line questions and doesn't need a separating blank line. The split is on ^Question, but that text is not excluded from the result. Output goes to Out-GridView.
## LotPings 2016-11-26
$InFile = ".\Questions.txt"
## prepare Table
$Table = New-Object system.Data.DataTable
$col = New-Object system.Data.DataColumn "QuestionNo",([string])
$Table.columns.add($col)
$col = New-Object system.Data.DataColumn "QuestionBody",([string])
$Table.columns.add($col)
## prepare RegEx for the split
$Delimiter = [regex]'Question'
$Split = "(?!^)(?=$Delimiter)"
(Get-Content $InFile -Raw) -split $Split |
ForEach-Object {
If ($_ -match '(?smi)^(?<QuestionNo>Question\d+)( *)(?<QuestionBody>.*)$') {
$Row = $Table.Newrow()
$Row.QuestionNo = $matches.QuestionNo.Trim()
$Row.QuestionBody = $matches.QuestionBody.Trim()
$Table.Rows.Add($Row)
} else {Write-Host "no Match"}
}
$Table | Out-Gridview

If you just want an array, you could filter on whether the lines have ? at the end or not:
$Questions = Get-Content $path |Where-Object {$_.Trim() -match '\?$'}
If you want to store the questions by the preceding name, you could use a hashtable.
Start by reading the file as a single string, then split by two consecutive line breaks:
$Blocks = (Get-Content $path -Raw) -split '\r?\n\r?\n' |ForEach-Object { $_.Trim() }
If you want both lines in each array item, you can stop here.
Otherwise split each block into a "question name" and "question" part, use those to populate your hashtable:
$Questions = @{}
$Blocks |ForEach-Object {
$Name,$Question = $_ -split '\r?\n'
$Questions[$Name.Trim()] = $Question
}
Now you can access the questions like:
$Questions['Question1']

other solution 1
select-string -Path "C:\temp\test.txt" -Pattern "Question\d+" -Context 0,1 |
% {$_.Context.PostContext} |
out-file "c:\temp\result.txt"

other solution 2
$template=@"
{Question*:Question1}
{Text:an example of question?}
{Question*:Question2}
{Text:other example of question with Upper and digits 12}
"@
gc "C:\temp\test.txt" | ConvertFrom-String -TemplateContent $template | select Text

other solution 3
gc "C:\temp\test.txt" | ?{$_ -notmatch 'Question\d+' -and $_ -ne "" }

Related

Powershell - Remove Duplicate lines in TXT based on ID

I have a TXT-File with thousands of lines. The number after the first Slash is the image ID.
I want to delete all lines so that only one line remains for every ID. Which of the lines is getting killed doesn't matter.
I tried to pipe the TXT to a CSV with PowerShell and work with the unique parameter, but it didn't work. Any ideas how I can iterate through the TXT and remove lines, so that always only one line per unique ID remains? :/
Status Today
thumbnails/4000896042746/2021-08-17_4000896042746_small.jpg
thumbnails/4000896042746/2021-08-17_4000896042746_smallX.jpg
thumbnails/4000896042333/2021-08-17_4000896042746_medium.jpg
thumbnails/4000896042444/2021-08-17_4000896042746_hugex.jpg
thumbnails/4000896042333/2021-08-17_4000896042746_tiny.jpg
After the script
thumbnails/4000896042746/2021-08-17_4000896042746_small.jpg
thumbnails/4000896042333/2021-08-17_4000896042746_medium.jpg
thumbnails/4000896042444/2021-08-17_4000896042746_hugex.jpg
If it concerns a "TXT-File with thousands of lines", I would use the PowerShell pipeline for this because (if correctly set up) it will perform about the same but uses far less memory.
Performance improvements might actually be leveraged from using a HashTable (or a HashSet), which looks items up by hash (and is therefore much faster than e.g. grouping).
(I am pleading to get an accelerated HashSet #16003 into PowerShell)
$Unique = [System.Collections.Generic.HashSet[string]]::new()
Get-Content .\InFile.txt |ForEach-Object {
if ($Unique.Add(($_.Split('/'))[-2])) { $_ }
} | Set-Content .\OutFile.txt
To add to iRon's great answer, I've done a speed comparison of 5 different ways to do it, using 250k lines of the OP's example.
Using a Get-Content -Raw read and writing with Set-Content is the fastest way to do it, at least in these examples, as it is nearly 3x faster than plain Get-Content with Set-Content.
I was curious to see how the HashSet method stacked up against the System.Collections.ArrayList one, and as you can see from the results below, it's not too dissimilar.
Edit note: got the -Raw switch to work, as the raw text needed splitting on newlines.
$fileIn = "C:\Users\user\Desktop\infile.txt"
$fileOut = "C:\Users\user\Desktop\outfile.txt"
# All examples below tested with 250,000 lines
# In order from fastest to slowest
#
# EXAMPLE 1 (Fastest)
#
# [Finished in 2.4s]
# Using the -raw switch only with Get-Content
$Unique = [System.Collections.Generic.HashSet[string]]::new()
$fileInSplit = (Get-Content -raw $fileIn).Split([Environment]::NewLine,[StringSplitOptions]::None)
$fileInSplit |ForEach-Object {
if ($Unique.Add(($_.Split('/'))[-2])) { $_ }
} | Set-Content $fileOut
#
# EXAMPLE 2 (2nd fastest)
#
# [Finished in 2.5s]
# Using the -raw switch with Get-Content
# Using [IO.File] for write only
$Unique = [System.Collections.Generic.HashSet[string]]::new()
$fileInSplit = (Get-Content -raw $fileIn).Split([Environment]::NewLine,[StringSplitOptions]::None)
$contentToWriteArr = New-Object System.Collections.ArrayList
$fileInSplit |ForEach-Object {
if ($Unique.Add(($_.Split('/'))[-2])) { [void]$contentToWriteArr.Add($_) }
}
[IO.File]::WriteAllLines($fileOut, $contentToWriteArr)
#
# EXAMPLE 3 (3rd fastest example)
#
# [Finished in 2.7s]
# Using [IO.File] for the read and write
$Unique = [System.Collections.Generic.HashSet[string]]::new()
$fileInSplit = [IO.File]::ReadAllLines($fileIn)
$contentToWriteArr = [Collections.Generic.HashSet[string]]::new()
$fileInSplit |ForEach-Object {
if ($Unique.Add(($_.Split('/'))[-2])) { $contentToWriteArr.Add($_) | out-null }
}
[IO.File]::WriteAllLines($fileOut, $contentToWriteArr)
#
# EXAMPLE 4 (4th fastest example)
#
# [Finished in 2.8s]
# Using [IO.File] for the read only
$Unique = [System.Collections.Generic.HashSet[string]]::new()
$fileInSplit = [IO.File]::ReadAllLines($fileIn)
$fileInSplit |ForEach-Object {
if ($Unique.Add(($_.Split('/'))[-2])) { $_ }
} | Set-Content $fileOut
#
# EXAMPLE 5 (5th fastest example)
#
# [Finished in 2.9s]
# Using [IO.File] for the read and write
# This is using a System.Collections.ArrayList instead of a HashSet
$Unique = [System.Collections.Generic.HashSet[string]]::new()
$fileInSplit = [IO.File]::ReadAllLines($fileIn)
$contentToWriteArr = New-Object System.Collections.ArrayList
$fileInSplit |ForEach-Object {
if ($Unique.Add(($_.Split('/'))[-2])) { $contentToWriteArr.Add($_) | out-null }
}
[IO.File]::WriteAllLines($fileOut, $contentToWriteArr)
#
# EXAMPLE 6 (Slowest example) - As per iRons answer
#
# [Finished in 7.2s]
$Unique = [System.Collections.Generic.HashSet[string]]::new()
$fileInSplit = Get-Content $fileIn
$fileInSplit |ForEach-Object {
if ($Unique.Add(($_.Split('/'))[-2])) { $_ }
} | Set-Content $fileOut
Here a solution that uses a calculated property to create an object that contains the ID and the FileName. Then I group the result based on the ID, iterate over each group and select the first FileName:
$yourFileList = @(
'thumbnails/4000896042746/2021-08-17_4000896042746_small.jpg',
'thumbnails/4000896042746/2021-08-17_4000896042746_smallX.jpg',
'thumbnails/4000896042333/2021-08-17_4000896042746_medium.jpg',
'thumbnails/4000896042444/2021-08-17_4000896042746_hugex.jpg',
'thumbnails/4000896042333/2021-08-17_4000896042746_tiny.jpg'
)
$yourFileList |
Select-Object @{Name = "Id"; Expression = { ($_ -split '/')[1] } }, @{ Name = 'FileName'; Expression = { $_ } } |
Group Id |
ForEach-Object { $_.Group[0].FileName }
You can group by custom property. So if you know what's your ID then you just have to group by that and then take the first element from the group:
$content = Get-Content "path_to_your_file";
$content = ($content | group { ($_ -split "/")[1] } | % { $_.Group[0] });
$content | Out-File "path_to_your_result_file"

How to remove the entire row when any one field of CVS is null in powershell?

ProcessName UserName            PSComputerName
----------- --------            --------------
AnyDesk     NT-AUTORITÄT\SYSTEM localhost
csrss                           dc-01
ctfmon      SAD\Administrator   rdscb-01
            SAD\Administrator   srv-01
Remove the second and last row here
Based on your comments, if $data is read from a CSV file and contains custom objects, you can do the following:
$data | where { $_.PsObject.Properties.Value -notcontains $null -and $_.PsObject.Properties.Value -notcontains '' }
This will apply to every property and won't require supplying named properties.
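As a quick sketch of that filter in action (the objects here are typed out by hand to stand in for Import-Csv output):

```powershell
# Hand-built stand-ins for rows from Import-Csv (illustration only)
$data = @(
    [pscustomobject]@{ ProcessName = 'AnyDesk'; UserName = 'NT-AUTORITÄT\SYSTEM'; PSComputerName = 'localhost' }
    [pscustomobject]@{ ProcessName = 'csrss';   UserName = '';                    PSComputerName = 'dc-01' }
    [pscustomobject]@{ ProcessName = 'ctfmon';  UserName = 'SAD\Administrator';   PSComputerName = 'rdscb-01' }
)
$clean = $data | Where-Object {
    # keep a row only when no property value is $null or ''
    $_.PSObject.Properties.Value -notcontains $null -and
    $_.PSObject.Properties.Value -notcontains ''
}
# $clean keeps the AnyDesk and ctfmon rows; the csrss row is dropped
```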
There are more elegant ways, but here is a kind of ugly answer to illustrate this...
$Data = @"
"ProcessName","UserName","PSComputerName"
"AnyDesk","NT-AUTORITÄT\SYSTEM","localhost"
"csrss","","dc-01"
"ctfmon","SAD\Administrator","rdscb-01"
"","SAD\Administrator","srv-01"
"@ | Out-File -FilePath 'D:\Temp\ProcData.csv'
$headers = (
(Get-Content -Path 'D:\Temp\ProcData.csv') -replace '"','' |
select -First 1
) -split ','
$data = Import-Csv -Path 'D:\Temp\ProcData.csv'
$colCnt = $headers.count
$lineNum = 0
:newline
foreach ($line in $data)
{
$lineNum++
for ($i = 0; $i -lt $colCnt; $i++)
{
# test to see if contents of a cell is empty
if (-not $line.$($headers[$i]))
{
Write-Warning -Message "$($lineNum): $($headers[$i]) is blank"
continue newline
}
}
"$($lineNum): OK"
# Perform other actions with good data
}
<#
# Results
1: OK
WARNING: 2: UserName is blank
3: OK
WARNING: 4: ProcessName is blank
#>

Replace first duplicate without regex and increment

I have a text file and I have 3 of the same numbers somewhere in the file. I need to add incrementally to each using PowerShell.
Below is my current code.
$duped = Get-Content $file | sort | Get-Unique
while ($duped -ne $null) {
$duped = Get-Content $file | sort | Get-Unique | Select -Index $dupecount
$dupefix = $duped + $dupecount
echo $duped
echo $dupefix
(Get-Content $file) | ForEach-Object {
$_ -replace "$duped", "$dupefix"
} | Set-Content $file
echo $dupecount
$dupecount = [int]$dupecount + [int]"1"
}
Original:
12345678
12345678
12345678
Intended Result:
123456781
123456782
123456783
$filecontent = (get-content C:\temp\pos\bart.txt )
$output = $null
[int]$increment = 1
foreach($line in $filecontent){
if($line -match '12345679'){
$line = [int]$line + $increment
$line
$output += "$line`n"
$increment++
}else{
$output += "$line`n"
}
}
$output | Set-Content -Path C:\temp\pos\bart.txt -Force
This works in my test with 5 lines:
a word
12345679
a second word
12345679
a third word
the output would be :
a word
12345680
a second word
12345681
a third word
Let's see if I understand the question correctly:
You have a file with X-amount of lines:
a word
12345678
a second word
12345678
a third word
You want to catch each instance of 12345678 and add 1 increment to it so that it would become:
a word
12345679
a second word
12345679
a third word
Is that what you are trying to do?
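As an aside, if the intended result in the question is taken literally (the counter is appended as an extra digit rather than added arithmetically), a minimal sketch could look like this (the file path is just the one from the answer above):

```powershell
$counter = 0
(Get-Content C:\temp\pos\bart.txt) | ForEach-Object {
    if ($_ -match '^12345678$') {
        $counter++
        "$_$counter"    # 12345678 becomes 123456781, 123456782, 123456783, ...
    } else {
        $_
    }
} | Set-Content C:\temp\pos\bart.txt
```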

String trim and split

I have a text file that I read and I need to get the values from.
Example text file:
[Site 01]
DBServer=LocalHost
DBName=Database01
Username=admin
Password=qwerty
[Site 02]
DBServer=192.168.0.10
DBName=Database02
Username=admin
Password=qwerty
Currently my code reads through the file and creates an array entry for each DBServer= line that is found; this text file can have many sites:
$NumOfSites = Get-Content $Sites |
Select-String -Pattern "DBServer=" -Context 0,3
$i = 0
$NumOfSites | ForEach-Object {
$svr = $NumOfSites[$i] -isplit "\n" |
% { ($_ -isplit 'DBServer=').Trim()[1] }
$db = $NumOfSites[$i] -isplit "\n" |
% { ($_ -isplit 'DBName='.Trim())[1] }
$uid = $NumOfSites[$i] -isplit "\n" |
% { ($_ -isplit 'Username='.Trim())[1] }
$pswd = $NumOfSites[$i] -isplit "\n" |
% { ($_ -isplit 'Password='.Trim())[1] }
$i = $i+1
}
I can't get each attribute to split out properly as a clean string variable without extra spaces.
I just need to extract the info to put into an SQL connection line as variables from the format of the file example I have.
Other than the record headers (i.e. [Site 01]), the rest can be handled by ConvertFrom-StringData just fine, so we can convert the records to objects directly by splitting on the header rows. ConvertFrom-StringData turns a multi-line string into a hashtable, and you can cast that to [PSCustomObject] and voilà, you have objects that are easy to use.
$NumOfSites = Get-Content $Sites -raw
$SiteObjects = $NumOfSites -split '\[.+?\]'|%{[PSCustomObject](ConvertFrom-StringData -StringData $_)}
Then you can manipulate $SiteObjects however you see fit (output to CSV if you want, or filter on any property using Select-Object). Or, if you're looking to make connections you can loop through it building your connections as needed...
ForEach($Connection in $SiteObjects){
$ConStr = "Server = {0}; Database = {1}; Integrated Security = False; User ID = {2}; Password = {3};" -f $Connection.DBServer.Trim(), $Connection.DBName.Trim(), $Connection.Username.Trim(), $Connection.Password.Trim()
<Do stuff with SQL>
}
Edit: Updating my answer since the sample text was changed to add <pre> and </pre>. We just need to remove those, and since the OP is getting errors about methods on null values, we'll filter out null/empty blocks as well.
$NumOfSites = Get-Content $Sites -raw
$SiteObjects = $NumOfSites -replace '<.*?>' -split '\[.+?\]' | ?{$_} |%{[PSCustomObject](ConvertFrom-StringData -StringData $_)}
ForEach($Connection in $SiteObjects){
$svr = $Connection.DBServer.Trim()
$db = $Connection.DBName.Trim()
$uid = $Connection.Username.Trim()
$pwd = $Connection.Password.Trim()
}
Here's a suggestion if you only care about getting the value after the equals:
Get-Content Example.txt |
ForEach-Object {
Switch -Regex ($_) {
'dbs.+=' { $svr = ($_ -replace '.+=').Trim() }
.. etc ..
}
}
Get-Content piped to ForEach-Object will interpret each line as its own object.
Edit:
You were most of the way there, but it's unnecessary to -split the lines
$NumOfSites = Get-Content $Sites | Select-String -pattern "DBServer=" -Context 0,3
$NumOfSites | ForEach-Object {
Switch -Wildcard (@($_.Line) + $_.Context.PostContext) {
'DBS*=*' { $svr = ($_ -replace '.+=').Trim() }
'DBN*=*' { $db = ($_ -replace '.+=').Trim() }
'U*=*' { $uid = ($_ -replace '.+=').Trim() }
'P*=*' { $pw = ($_ -replace '.+=').Trim() }
}
}

Powershell - reading ahead and While

I have a text file in the following format:
.....
ENTRY,PartNumber1,,,
FIELD,IntCode,123456
...
FIELD,MFRPartNumber,ABC123,,,
...
FIELD,XPARTNUMBER,ABC123
...
FIELD,InternalPartNumber,3214567
...
ENTRY,PartNumber2,,,
...
...
the ... indicates there is other data between these fields. The ONLY thing I can be certain of is that the field starting with ENTRY is a new set of records. The rows starting with FIELD can be in any order, and not all of them may be present in each group of data.
I need to read in a chunk of data
Search for any field matching the
string ABC123
If ABC123 found, search for the existence of the
InternalPartNumber field & return that row of data.
I have not seen a way to use Get-Content that can read in a variable number of rows as a set & be able to search it.
Here is the code I currently have, which will read a file, searching for a string & replacing it with another. I hope this can be modified to be used in this case.
$ftype = "*.txt"
$fnames = gci -Path $filefolder1 -Filter $ftype -Recurse|% {$_.FullName}
$mfgPartlist = Import-Csv -Path "C:\test\mfrPartList.csv"
foreach ($file in $fnames) {
$contents = Get-Content -Path $file
foreach ($partnbr in $mfgPartlist) {
$oldString = $mfgPartlist.OldValue
$newString = $mfgPartlist.NewValue
if (Select-String -Path $file -SimpleMatch $oldString -Debug -Quiet) {
$stringData = $contents -imatch $oldString
$stringData = $stringData -replace "[\n\r]","|"
foreach ($dataline in $stringData) {
$file +"|"+$stringData+"|"+$oldString+"|"+$newString|Out-File "C:\test\Datachanges.txt" -Width 2000 -Append
}
$contents = $contents -replace $oldString, $newString
Set-Content -Path $file -Value $contents
}
}
}
Is there a way to read & search a text file in "chunks" using Powershell? Or to do a Read-ahead & determine what to search?
Assuming your file isn't too big to read into memory all at once:
$Text = Get-Content testfile.txt -Raw
($Text -split '(?ms)^(?=ENTRY)') |
foreach {
if ($_ -match '(?ms)^FIELD\S+ABC123')
{$_ -replace '(?ms).+(^Field\S+InternalPartNumber.+?$).+','$1'}
}
FIELD,InternalPartNumber,3214567
That reads the entire file in as a single multiline string, and then splits it at the beginning of any line that starts with 'ENTRY'. Then it tests each segment for a FIELD line that contains 'ABC123', and if it does, removes everything except the FIELD line for the InternalPartNumber.
This is not my best work as I have just got back from vacation. You could use a while loop reading the text and set an entry flag to gobble up the text in chunks. However, if your files are not too big, you could just read the whole text file at once and use regex to split it into chunks and process those accordingly.
$pattern = "ABC123"
$matchedRowToReturn = "InternalPartNumber"
$fileData = Get-Content "d:\temp\test.txt" | Where-Object{$_ -match '^(entry|field)'} | Out-String
$parts = $fileData | Select-String '(?smi)(^Entry).*?(?=^Entry|\Z)' -AllMatches | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
$parts | Where-Object{$_ -match $pattern} | Select-String "$matchedRowToReturn.*$" | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
What this will do is read in the text file, drop any lines that are not ENTRY or FIELD related, join them into one long string, and split that into chunks beginning with lines that start with the word "Entry".
Then we drop those chunks that do not contain $pattern. From the remaining matches we extract the InternalPartNumber line and present it.
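If the file really is too large to read at once, the while-loop / entry-flag idea mentioned above could be sketched roughly like this (untested, and the field names are taken from the sample data):

```powershell
$pattern = 'ABC123'
$chunk = @()

# Emit the InternalPartNumber line of a chunk when the chunk mentions $pattern
function Test-Chunk ($lines) {
    if (($lines -match $pattern) -and ($hit = $lines -match 'InternalPartNumber')) { $hit }
}

Get-Content "d:\temp\test.txt" | ForEach-Object {
    if ($_ -match '^ENTRY') {    # a new record starts: handle the previous one
        Test-Chunk $chunk
        $chunk = @()
    }
    $chunk += $_
}
Test-Chunk $chunk                # don't forget the final record
```

This streams the file line by line, so only one record's worth of lines is held in memory at a time.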