Merge Text Files and Prepend Filename and Line Number - Powershell

I've been trying to adapt the answer to this question: powershell - concatenate N text files and prepend filename to each line
My desired output based on an example of 2 .txt files:
First.txt
lines of
data
Second.txt
more lines
of
data
Output.txt
First1 lines of
First2 data
Second1 more lines
Second2 of
Second3 data
The most similar question I could find has the following answer:
Select-String '^' *.txt >output.txt
Would give:
C:\A\Filepath\First.txt:1:lines of
C:\A\Filepath\First.txt:2:data
C:\A\Filepath\Second.txt:1:more lines
C:\A\Filepath\Second.txt:2:of
C:\A\Filepath\Second.txt:3:data
So I was hoping to use -replace to remove the filepath, keep the file name (but remove .txt:), keep the line number (but replace the final : with a space) and keep the text from the line.
Any help reaching the desired output.txt would be appreciated. Thanks

Not beautiful but this is one approach.
Get-ChildItem *.txt |
%{$FILENAME=$_.BaseName;$COUNT=1;get-content $_ |
%{"$FILENAME"+"$COUNT"+" " + "$_";$COUNT++}}|
Out-File Output.txt

The select-string approach is very interesting. The way I would go about it is to use Get-Content. The advantage there is that each line has a readcount property that represents the line number.
Get-ChildItem "C:\temp\*.file" | ForEach-Object{
$fileName = $_.BaseName
Get-content $_ | ForEach-Object{
"{0}{1} {2}" -f $fileName,$_.ReadCount,$_
}
} | Add-Content "C:\temp\output.txt"
Take each file and use Get-Content. With each line we process we send to the output stream a formatted line matching your desired output. No need to count the lines as $_.ReadCount already knows.
Select-String still works
You just need to manipulate the output to match what you want. Using Get-Member we can check the properties of what select-string returns to get our desired output.
Select-String '^' "c:\temp\*.txt" | ForEach-Object{
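# Filename includes the extension (First.txt), so strip it to get "First1 ..." as in the desired output
"{0}{1} {2}" -f [IO.Path]::GetFileNameWithoutExtension($_.Filename),$_.LineNumber,$_.Line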
} | Add-Content "C:\temp\output.txt"
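The question asked about doing this with -replace. As a rough sketch (not taken from either answer), a single regex can reshape text in the path:line:text format shown in the question. It assumes every file ends in .txt and that paths use backslashes; the sample strings are the ones from the question.

```powershell
# Sample lines in the path:linenumber:text format shown in the question
$raw = @(
    'C:\A\Filepath\First.txt:1:lines of'
    'C:\A\Filepath\Second.txt:3:data'
)

# Drop the directory, keep the base name, drop ".txt:", and turn the colon
# after the line number into a space. The rest of the line is left untouched.
$result = $raw -replace '^.*\\([^\\]+)\.txt:(\d+):', '$1$2 '
$result
```

The replacement string is single-quoted so that PowerShell passes $1 and $2 to the regex engine instead of expanding them as variables.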

Related

Powershell Select-String

I need your help with PowerShell.
I need to use Select-String with a fixed date held in a variable, and write the result to result.txt with Set-Content.
Example: $Date = "01.07.2020"
I also need to select the lines whose date is earlier than the one in the variable.
My code: Get-Content -Path log.txt | Select-String "?????" | Set-Content $result.txt
log.txt contains many lines like " Creation date 01.07.2020 " and " Creation date 01.06.2020 ".
123.txt
Creation date 01.07.2020
Creation date 02.05.2020
Creation date 01.06.2020
Creation date 28.08.2020
Example script
$file = "C:\Users\userprofile\Desktop\test\123.txt"
$regexpattern = "\d{2}\.\d{2}\.\d{4}"
$content = Get-Content $file | Where-object { $_ -match $regexpattern}
foreach($line in $content){
$line.Substring(13,11)
}
I used a regex to find the lines you want to output. We keep only the content that matches the regex, then for each line found I use Substring to pull the date out. You could also put together a regex for this if you wanted to. Since we know the lines all have the same number of characters, it is safe to use the Substring method. With the sample lines above, Substring(13,11) returns the date with the space in front of it; call .Trim() on the result if you do not want that space.
To write that output to a file, append | Out-File "C:\Users\userprofile\desktop\test\output.txt" -Append to the $line.Substring(13,11) line.
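The answer above extracts the dates but does not compare them. One possible way to cover the "earlier than" part of the question is to parse each date and compare it as a [datetime]. This is only a sketch: it assumes the dates are day.month.year (the sample value 28.08.2020 suggests so), and the variable names $cutoff and $selected are made up for the example.

```powershell
# Sample lines from the question
$lines = @(
    'Creation date 01.07.2020'
    'Creation date 02.05.2020'
    'Creation date 01.06.2020'
    'Creation date 28.08.2020'
)

# Assumption: dates are written day.month.year
$format  = 'dd.MM.yyyy'
$culture = [cultureinfo]::InvariantCulture
$cutoff  = [datetime]::ParseExact('01.07.2020', $format, $culture)

# Keep the lines whose date is on or before the cutoff (use -lt for strictly earlier)
$selected = foreach ($line in $lines) {
    if ($line -match '(\d{2}\.\d{2}\.\d{4})') {
        $date = [datetime]::ParseExact($Matches[1], $format, $culture)
        if ($date -le $cutoff) { $line }
    }
}
$selected
```

Comparing the raw strings would not work here, because "02.05.2020" sorts after "01.07.2020" as text even though it is the earlier date.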

Is there a way to merge similar lines using Powershell?

Suppose I have two csv files. One is
id_number,location_code,category,animal,quantity
12212,3,4,cat,2
29889,7,6,dog,2
98900,
33221,1,8,squirrel,1
the second one is:
98900,2,1,gerbil,1
The second file may have a newline or something at the end (maybe or maybe not, I haven't checked), but only the one line of content. There may be three or four or more different varieties of the "second" file, but each one will have a first element (98900 in this example) that corresponds to an incomplete line in the first file similar to what is in this example.
Is there a way using powershell to automatically merge the line in the second (plus any additional similar) csv file into the matching line(s) of the first file, so that the resulting file is:
12212,3,4,cat,2
29889,7,6,dog,2
98900,2,1,gerbil,1
33221,1,8,squirrel,1
main.csv
id_number,location_code,category,animal,quantity
12212,3,4,cat,2
29889,7,6,dog,2
98900,
33221,1,8,squirrel,1
correction_001.csv
98900,2,1,gerbil,1
merge code used at the commandline, or in the .ps1 file of your choice
$myHeader = @('id_number','location_code','category','animal','quantity')
#Stage all the correction files: last correction in the most recent file wins
$ToFix = @{}
filter Plumbing_Import-Csv($Header){import-csv -LiteralPath $_ -Header $Header}
ls correction*.csv | sort -Property LastWriteTime | Plumbing_Import-Csv $myHeader | %{$ToFix[$_.id_number]=$_}
function myObjPipe($Header){
begin{
function TextTo-CsvField([String]$text){
#text fields which contain comma, double quotes, or new-line are a special case for CSV fields and need to be accounted for
if($text -match '"|,|\n'){return '"'+($text -replace '"','""')+'"'}
return $text
}
function myObjTo-CsvRecord($obj){
return ''+
$obj.id_number +','+
$obj.location_code +','+
$obj.category +','+
(TextTo-CsvField $obj.animal)+','+
$obj.quantity
}
$Header -join ','
}
process{
if($ToFix.Contains($_.id_number)){
$out = $ToFix[$_.id_number]
$ToFix.Remove($_.id_number)
}else{$out = $_}
myObjTo-CsvRecord $out
}
end{
#I assume you'd append any leftover fixes that weren't used
foreach($out in $ToFix.Values){
myObjTo-CsvRecord $out
}
}
}
import-csv main.csv | myObjPipe $myHeader | sc combined.csv -encoding ascii
You could also use ConvertTo-Csv, but my preference is to not have all the extra " cruft.
Edit 1: reduced code redundancy, accounted for \n, fixed appends, and used @OwlsSleeping's suggestion about the -Header cmdlet parameter
also works with these files:
correction_002.csv
98900,2,1,I Win,1
correction_new.csv
98901,2,1,godzilla,1
correction_too.csv
98902,2,1,gamera,1
98903,2,1,mothra,1
Edit 2: convert gc | ConvertTo-Csv over to Import-Csv to fix the front-end \n issues. Now also works with:
correction_003.csv
29889,7,6,"""bad""
monkey",2
This is a simple solution assuming there's always exactly one match, and you don't care about output order. Change the output path to csv1 to overwrite.
I added headers manually in both input files, but you can specify them in Import-Csv instead if you'd rather avoid changing your files.
[array]$MissingLine = Import-Csv -Path "C:\Users\me\Documents\csv2.csv"
[string]$MissingId = $MissingLine[0].id_number
[array]$BigCsv = Import-Csv -Path "C:\Users\me\Documents\csv1.csv" |
Where-Object {$_.id_number -ne $MissingId}
($BigCsv + $MissingLine) |
Export-Csv -Path "C:\Users\me\Documents\Combined.csv" -NoTypeInformation
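If the output order does matter, a variation on the simple approach is to look each row up in a hashtable of corrections and emit the corrected row in its original position. The following is a sketch under stated assumptions, not the method of either answer: the rows are in-memory stand-ins for the files in the question, and $fixes and $merged are illustrative names.

```powershell
# Column names taken from the question's header row
$header = 'id_number', 'location_code', 'category', 'animal', 'quantity'

# In-memory stand-ins for the main file (without its header line) and one correction file
$main = '12212,3,4,cat,2', '29889,7,6,dog,2', '98900,', '33221,1,8,squirrel,1' |
    ConvertFrom-Csv -Header $header
$corrections = '98900,2,1,gerbil,1' | ConvertFrom-Csv -Header $header

# Index the corrections by id so each main row can look up its replacement
$fixes = @{}
foreach ($row in $corrections) { $fixes[$row.id_number] = $row }

# Emit the correction where one exists, otherwise the original row, keeping the original order
$merged = foreach ($row in $main) {
    if ($fixes.ContainsKey($row.id_number)) { $fixes[$row.id_number] } else { $row }
}

$merged | ConvertTo-Csv -NoTypeInformation
```

With real files, Import-Csv -Header $header would take the place of ConvertFrom-Csv, and Export-Csv would take the place of ConvertTo-Csv.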

Remove empty rows from csv in powershell [duplicate]

I know that I can use:
gc c:\FileWithEmptyLines.txt | where {$_ -ne ""} > c:\FileWithNoEmptyLines.txt
to remove empty lines. But how can I remove them with -replace?
I found a nice one liner here >> http://www.pixelchef.net/remove-empty-lines-file-powershell. Just tested it out with several blanks lines including newlines only as well as lines with just spaces, just tabs, and combinations.
(gc file.txt) | ? {$_.trim() -ne "" } | set-content file.txt
See the original for some notes about the code. Nice :)
This piece of code from Randy Skretka works fine for me, but I still had a newline at the end of the file.
(gc file.txt) | ? {$_.trim() -ne "" } | set-content file.txt
So I finally added this:
$content = [System.IO.File]::ReadAllText("file.txt")
$content = $content.Trim()
[System.IO.File]::WriteAllText("file.txt", $content)
You can use -match instead of -ne if you also want to exclude lines that contain only whitespace characters:
@(gc c:\FileWithEmptyLines.txt) -match '\S' | out-file c:\FileWithNoEmptyLines
Not specifically using -replace, but you get the same effect parsing the content using -notmatch and regex.
(get-content 'c:\FileWithEmptyLines.txt') -notmatch '^\s*$' > c:\FileWithNoEmptyLines.txt
To resolve this with RegEx, you need to use the multiline flag (?m):
((Get-Content file.txt -Raw) -replace "(?m)^\s*`r`n",'').trim() | Set-Content file.txt
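To see what the multiline flag does without touching a file, the same kind of -replace can be tried on an in-memory string. This is a small sketch: the sample text is made up, and the pattern uses \r?\n instead of `r`n so that it also copes with LF-only line endings.

```powershell
# Sample text with an empty line and a whitespace-only line (CRLF line endings)
$text = "first`r`n`r`n   `r`nsecond`r`n"

# (?m) makes ^ match at the start of every line, not only at the start of the string.
# \s* then swallows blank and whitespace-only lines up to and including their line break.
$cleaned = $text -replace '(?m)^\s*\r?\n', ''
$cleaned
```

Without (?m), ^ would only match at the very start of the string, so blank lines in the middle of the text would be left in place.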
If you actually want to filter blank lines from a file then you may try this:
(gc $source_file).Trim() | ? {$_.Length -gt 0}
You can't do this by replacing: a replace needs SOMETHING to replace and SOMETHING to replace it with, and here you have neither.
This will remove empty lines or lines with only whitespace characters (tabs/spaces).
[IO.File]::ReadAllText("FileWithEmptyLines.txt") -replace '\s+\r\n+', "`r`n" | Out-File "c:\FileWithNoEmptyLines.txt"
(Get-Content c:\FileWithEmptyLines.txt) |
Foreach { $_ -Replace "Old content", " New content" } |
Set-Content c:\FileWithEmptyLines.txt;
file
PS /home/edward/Desktop> Get-Content ./copy.txt
[Desktop Entry]
Name=calibre
Exec=~/Apps/calibre/calibre
Icon=~/Apps/calibre/resources/content-server/calibre.png
Type=Application*
Start by getting the content of the file and trimming any white space found in each line of the text document. That becomes the object passed to Where-Object, which goes through the array and keeps each member whose string length is greater than 0. That object is then passed on to replace the content of the file you started with. It would probably be better to make a new file...
The last thing to do is read back the newly written file's content and see your awesomeness.
(Get-Content ./copy.txt).Trim() | Where-Object{$_.length -gt 0} | Set-Content ./copy.txt
Get-Content ./copy.txt
This removes trailing whitespace and blank lines from file.txt
PS C:\Users\> (gc file.txt) | Foreach {$_.TrimEnd()} | where {$_ -ne ""} | Set-Content file.txt
Get-Content returns a fixed-size array of lines. You can convert it to a mutable list and delete the necessary lines by index. You can get the indexes of particular lines with a pattern match. After that you can write the result to a new file with Set-Content. With this approach you avoid the empty lines that are left behind when you replace something with "". Note that I don't guarantee perfect performance. I'm not a professional PowerShell developer))
$fileLines = Get-Content $filePath
$neccessaryLine = Select-String -Path $filePath -Pattern 'something'
if (-Not $neccessaryLine) { exit }
$neccessaryLineIndex = $neccessaryLine.LineNumber - 1
$updatedFileContent = [System.Collections.ArrayList]::new($fileLines)
$updatedFileContent.RemoveAt($neccessaryLineIndex)
$updatedFileContent | Set-Content $filePath
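The snippet above handles a single matching line. If several lines can match, one way to keep the index-based approach is to walk the list backwards, so that removing an item does not shift the indexes still to be visited. This is a sketch on in-memory data; the sample lines and the 'something' pattern are placeholders.

```powershell
# Stand-in for the result of Get-Content
$fileLines = 'keep one', 'drop something here', 'keep two', 'something else to drop'

# Copy the fixed-size array into a list that supports RemoveAt
$list = [System.Collections.ArrayList]::new($fileLines)

# Walk backwards: removing index $i only shifts items that were already checked
for ($i = $list.Count - 1; $i -ge 0; $i--) {
    if ($list[$i] -match 'something') { $list.RemoveAt($i) }
}
$list
```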
Set-Content -Path "File.txt" -Value (get-content -Path "File.txt" | Select-String -Pattern '^\s*$' -NotMatch)
This works for me, originally got the line from here and added Joel's suggested '^\s*$': Using PowerShell to remove lines from a text file if it contains a string

Find group of words in a text file and extract the line to new text file

ref.txt contains the words "hi", "hello" and "aloha", as below:
hi
hello
aloha
I have one more file, abc.txt, which contains many words including the above 3 words. I developed a PowerShell script that searches abc.txt for the words and extracts the lines containing them to a new file, done.txt. I use the -match operator to find the words.
How can I use the file ref.txt, which contains the search words, instead of declaring the words in the code?
I would like to develop it in cmd.exe instead of PowerShell.
$source = "C:\temp\abc.txt"
$destination = "C:\temp\done.txt"
$hits = select-string -Path $source -SimpleMatch "hi","hello","aloha"
$filecontents = get-content $source
foreach($hit in $hits)
{
$filecontents[$hit.linenumber-1]| out-file -append $destination
"" |out-file -append $destination
}
This should do the batch trick:
findstr /G:ref.txt abc.txt >> done.txt
This appends all lines in abc.txt that contain any of the strings listed in ref.txt to done.txt.
Have I understood your question correctly?
To cover the PowerShell aspect of this question...
Getting the patterns you want from a file is rather easy, since Select-String supports string arrays for the -Pattern parameter. In its simplest form you could just do something like this:
$patterns = Get-Content c:\temp\ref.txt | Where-Object{$_}
$hits = Select-String c:\temp\test.txt -Pattern $patterns -SimpleMatch
Your file contained a blank line, and I was not sure whether that was on purpose. I used Where-Object{$_} to filter it out just in case. Then just pass the string array $patterns to the -Pattern parameter.
The rest of your code could use a little tune-up. There is no need to read the source file a second time just to output the matches again. Your output is just each matching line followed by an empty line.
$patterns = Get-Content c:\temp\ref.txt | Where-Object{$_}
$results = Select-String c:\temp\test.txt -Pattern $patterns -SimpleMatch
$results.Line | ForEach-Object{"$_`r`n"} | Set-Content C:\temp\out.txt
Probably not the best way to get the desired output but it should work regardless.
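One thing to be aware of with -SimpleMatch is that it looks for substrings, so a short word such as "hi" also matches inside longer words. The sketch below shows the difference on in-memory lines and adds a whole-word variant built with \b word boundaries. The whole-word variant is an assumption about what may be wanted, not part of the answer above, and the sample lines are made up.

```powershell
# Stand-ins for the contents of ref.txt (including its blank line) and of the file to search
$refLines  = 'hi', 'hello', '', 'aloha'
$textLines = 'this is fine', 'hi there', 'aloha friend', 'goodbye'

# Drop blank entries, as Where-Object{$_} does in the answer above
$patterns = $refLines | Where-Object { $_ }

# -SimpleMatch looks for plain substrings, so "hi" also matches inside "this"
$substringHits = ($textLines | Select-String -Pattern $patterns -SimpleMatch).Line

# Whole-word variant: escape each word and wrap it in \b word boundaries
$wordPatterns = $patterns | ForEach-Object { '\b' + [regex]::Escape($_) + '\b' }
$wordHits = ($textLines | Select-String -Pattern $wordPatterns).Line

$substringHits
$wordHits
```

When searching files instead of strings, the same $patterns or $wordPatterns array can be passed to Select-String together with -Path.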

find and delete lines without string pattern in text files

I'm trying to find out how to use powershell to find and delete lines without certain string pattern in a set of files. For example, I have the following text file:
111111
22x222
333333
44x444
This needs to be turned into:
22x222
44x444
given that the string pattern of 'x' is not in any of the other lines.
How can I issue such a command in powershell to process a bunch of text files?
thanks.
dir | foreach { $out = cat $_ | select-string x; $out | set-content $_ }
The dir command lists the files in the current directory; the foreach goes through each file; cat reads the file and pipes it into select-string; select-string finds the lines that contain the specific pattern, which in this case is "x"; the result of select-string is stored in $out; and finally, $out is written to the same file with set-content.
We need the temporary variable $out because you cannot read and write the same file at the same time.
This will process all txt files from the working directory. Each file content is checked and only lines that have 'x' in them are allowed to pass on. The result is written back to the file.
Get-ChildItem *.txt | ForEach-Object{
$content = Get-Content $_.FullName | Where-Object {$_ -match 'x'}
$content | Out-File $_.FullName
}
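The read-then-write order can be tried safely with a small self-contained sketch. It creates a throwaway folder under the temp directory with a sample.txt in it (both names are made up for the example) and wraps Get-Content in parentheses, which has the same effect as a temporary variable: the file is read completely and closed before Set-Content opens it for writing.

```powershell
# Create a scratch folder with one sample file (names are placeholders for this example)
$dir = Join-Path ([IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$file = Join-Path $dir 'sample.txt'
Set-Content -Path $file -Value '111111', '22x222', '333333', '44x444'

# For every .txt file, keep only the lines that contain 'x'
Get-ChildItem -Path $dir -Filter '*.txt' | ForEach-Object {
    $path = $_.FullName
    # The parentheses force the whole file to be read before the pipeline continues
    (Get-Content -Path $path) | Where-Object { $_ -match 'x' } | Set-Content -Path $path
}

# Read the result back, then clean up the scratch folder
$result = Get-Content -Path $file
Remove-Item -Path $dir -Recurse
$result
```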