Use PowerShell to compare two text files and remove duplicate lines - powershell

I have two text files that contain many duplicate lines. I would like to run a PowerShell statement that outputs a new file containing only the values from the second file that are NOT already in the first file. Below is an example of two files.
File1.txt
-----------
Alpha
Bravo
Charlie
File2.txt
-----------
Alpha
Echo
Foxtrot
In this case, only Echo and Foxtrot are not in the first file, so these would be the desired results.
OutputFile.txt
------------
Echo
Foxtrot
I reviewed the link below, which is similar to what I want, but it does not write the results to an output file.
Remove lines from file1 that exist in file2 in Powershell

Here's one way to do it:
# Get unique values from first file
$uniqueFile1 = (Get-Content -Path .\File1.txt) | Sort-Object -Unique
# Get lines in second file that aren't in first and save to a file
Get-Content -Path .\File2.txt | Where-Object { $uniqueFile1 -notcontains $_ } | Out-File .\OutputFile.txt
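Note that -notcontains, like PowerShell comparison operators in general, is case-insensitive; if the comparison needs to be case-sensitive, the c-prefixed operator is a drop-in swap. A minimal sketch, assuming the same file names:
# Case-sensitive variant: 'alpha' and 'Alpha' count as different lines
Get-Content -Path .\File2.txt | Where-Object { $uniqueFile1 -cnotcontains $_ } | Out-File .\OutputFile.txt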

Using the approach in the referenced link will work; however, for every line in the original file, it triggers a read of the second file from disk. This could be painful depending on the size of your files. I think the following approach will meet your needs.
$file1 = Get-Content .\File1.txt
$file2 = Get-Content .\File2.txt
$compareParams = @{
    ReferenceObject  = $file1
    DifferenceObject = $file2
}
Compare-Object @compareParams |
    Where-Object -Property SideIndicator -eq '=>' |
    Select-Object -ExpandProperty InputObject |
    Out-File -FilePath .\OutputFile.txt
This code does the following:
Reads each file into a separate variable
Creates a hashtable for the parameters of Compare-Object (see about_Splatting for more information)
Compares the two files in memory, keeps only the lines unique to the second file, and passes them down the pipeline
Writes the contents of the pipeline to "OutputFile.txt"
If you are comfortable with the overall flow of this, and are only using this in one-off situations, the whole thing can be compressed into a one-liner.
(Compare-Object (gc .\File1.txt) (gc .\File2.txt) | ? SideIndicator -eq '=>').InputObject | Out-File .\OutputFile.txt
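If the files are very large, note that both -notcontains and Compare-Object do linear work per line; a HashSet gives constant-time lookups instead. A minimal sketch, assuming the same file names as above (unlike -notcontains, the HashSet comparison is case-sensitive by default):
# Build a fast lookup set from the first file's lines
$seen = [System.Collections.Generic.HashSet[string]]::new([string[]](Get-Content .\File1.txt))
# Keep only the lines of the second file that are not in the set
Get-Content .\File2.txt | Where-Object { -not $seen.Contains($_) } | Out-File .\OutputFile.txt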

Related

Add eighth and ninth lines to all *.txt files

I have more than 100 txt files in C:\myfolder\*.txt.
When I run this script from "C:\myfolder" I can add eighth and ninth lines to somename.txt:
@echo off
powershell "$f=(Get-Content somename.txt);$f[8]='heretext1';$f | Set-Content somename.txt"
powershell "$f=(Get-Content somename.txt);$f[9]='heretext2';$f | Set-Content somename.txt"
But how can I add eighth and ninth lines to all *.txt files located in C:\myfolder\*.txt?
Can someone explain how to do it, please?
Sorry for my English, and sorry if I didn't explain my problem well. I will try now:
I use "*.uci" files instead of *.txt files; I wrote txt because the uci extension is unknown to most people. These *.uci files hold settings for chess engines that use the UCI protocol.
So when you use the ChessBase program you have a lot of chess engines, and each engine creates its own "enginename.uci" file.
If you want to change the number of cores used on your PC from 1 to 16, you need to do it manually by adding the following information to each *.uci file, like this:
[OPTIONS]
Threads=1
That's why it is better to make a small batch or ps1 script that changes the settings for all engines by adding these two lines with one click.
Perhaps something like this PowerShell script would suit your task:
Get-ChildItem -Path 'C:\myfolder' -Filter '*.txt' | ForEach-Object {
    $LineIndex = 0
    $FileContent = Switch -File $_.FullName {
        Default {
            $LineIndex++
            If ($LineIndex -Eq 8) {@'
heretext1
heretext2
'@}
            $_
        }
    }
    Set-Content -Path $_.FullName -Value $FileContent
}
Note:
Your code isn't adding lines, it is modifying existing lines. The solution below does the same.
Indices [8] and [9] access the 9th and 10th lines, not the 8th and 9th, given that array indexing is 0-based.
You need to call Get-ChildItem with your file-name pattern, C:\myfolder\*.txt, and process each matching file via ForEach-Object:
@echo off
powershell "Get-ChildItem C:\myfolder\*.txt | ForEach-Object { $f=$_ | Get-Content -ReadCount 0; $f[8]='heretext1'; $f[9]='heretext2'; Set-Content $_.FullName $f }"
Due to calling from a batch file (cmd.exe), the PowerShell command is specified on a single line; here's the readable version:
Get-ChildItem C:\myfolder\*.txt | # get all matching files
ForEach-Object { # process each
$f = $_ | Get-Content -ReadCount 0 # read all lines
$f[8] = 'heretext1'; $f[9] = 'heretext2' # update the 9th and 10th line
Set-Content $_.FullName $f # save result back to input file
}
Note:
Consider adding -noprofile after powershell, so as to suppress potentially unnecessary loading of profile files - see the documentation of the Windows PowerShell CLI, powershell.exe.
Using -ReadCount 0 with Get-Content greatly speeds up processing, because all lines are read into a single array at once instead of being streamed one by one, which would require collecting them into an array yourself and is much slower.
Note: If a given file has fewer than 10 lines, the above solution won't work, because you can only assign to existing elements of an array (an array is a fixed-size data structure). If you need to deal with this case, insert the following after the $f = $_ | Get-Content -ReadCount 0 line; it appends empty lines as needed to ensure that at least 10 lines are present:
if ($f.Count -lt 10) { $f += @('') * (10 - $f.Count) }
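Putting the padding together with the loop above, a minimal sketch (same placeholder text; the @(...) wrapper is an addition that guarantees an array even for zero- or one-line files):
Get-ChildItem C:\myfolder\*.txt | ForEach-Object {
    $f = @($_ | Get-Content -ReadCount 0)                   # read all lines at once, always as an array
    if ($f.Count -lt 10) { $f += @('') * (10 - $f.Count) }  # pad short files to 10 lines
    $f[8] = 'heretext1'                                     # 9th line (0-based index)
    $f[9] = 'heretext2'                                     # 10th line
    Set-Content $_.FullName $f                              # write the result back to the input file
}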
The easiest solution I can think of is to use the -Index parameter of Select-Object (note that -Index is 0-based, so 7,8 selects the 8th and 9th lines):
Get-ChildItem -Path .\Desktop\*.txt | % { Get-Content $_.FullName | Select-Object -Index 7,8 } |
Out-File -FilePath .\Desktop\index.txt
Edit: based on your post.

File Size with PowerShell

What I am trying to do is create a PS script to see when a certain folder has a file over 1GB. If it found a file over 1GB, I want it to write a log file with info saying the name of the certain file and its size.
This works, but not fully: if the file is less than 1GB I don't want a log file at all. (Right now this logs the file info for files over 1GB, but if nothing is over 1GB it still creates a log file with no data.) I don't want it to create a log for anything less than 1GB.
Any idea on how to do that?
Thanks!
Ryan
Get-ChildItem -Path C:\Tomcat6.0.20\logs -File -Recurse -ErrorAction SilentlyContinue |
Where-Object {$_.Length -gt 1GB} |
Sort-Object length -Descending |
Select-Object Name,@{n='GB';e={"{0:N2}" -F ($_.Length / 1GB)}} |
Format-List Name,Directory,GB > C:\Users\jensen\Desktop\FolderSize\filesize.log
First, run your filter and store the results in a variable:
$items = Get-ChildItem -Path C:\Tomcat6.0.20\logs -File -Recurse -ErrorAction SilentlyContinue |
Where-Object {$_.Length -gt 1GB} |
Sort-Object Length -Descending |
Select-Object Name,@{n='GB';e={"{0:N2}" -F ($_.Length / 1GB)}}
Then pipe that to Out-File to your desired output path. This example will output a file to the Desktop of the user running the script, change as needed:
$items | Out-File -FilePath $ENV:USERPROFILE\Desktop\filesize.log -Force
The -Force parameter will overwrite an existing filesize.log file if one already exists.
To make sure you don't write blank files, first collect the minimal results that match your filter, and test whether they contain anything at all.
If they don't, you can end the script; if they do, you can go on to sort and select the data and output it to a log file.
# Collect matching files
$Matched = GCI -Path "C:\Tomcat6.0.20\logs" -File -R -ErrorA "SilentlyContinue" | ? {
    $_.Length -gt 1GB
}
# Check if $Matched contains results before further processing; otherwise, we're done!
IF ([bool]$Matched) {
    # If here, we have data, so select what we need and output it to the log file:
    $Matched | Sort Length -D | FT Name,Directory,@{
        n="GB";e={"{0:N2}" -F ($_.Length / 1GB)}
    } -Auto | Out-File "C:\Users\jensen\Desktop\FolderSize\filesize_$(Get-Date -F "yyyy-MM-dd_HH-mm-ss").log"
}
In the above script, I fixed the $. to be $_., and separated Matching the 1GB files from Manipulating them, and Outputting them to a file.
We simply test whether any files over 1GB were matched by checking whether the variable has any results or is $NULL/undefined.
If not, there is no need to take any further action.
Only when 1GB files are matched do we sort them and select the details you wanted; instead of Select-Object we just use Format-Table (FT) with -AutoSize to get nice-looking output that is much easier to review for this sort of data.
(Note that Format-Table selects and formats the info into a table in one step, saving the redundant step of using Select-Object to get the data and then piping (|) it to Format-List or Format-Table. Select-Object is best used when later steps need to perform further "object-oriented" operations on the data.)
Then we pipe that output to Out-File to save it all to a log file, and I also changed the log file name to contain the current date and time in ISO format, filesize_$(Get-Date -F "yyyy-MM-dd_HH-mm-ss").log, so you can save each run and review them later, instead of having one gigantic file or no history of runs.

Combine TXT files in a directory into one file, with a column added at the end for the file name

I have got a set of txt files in a directory that I want to merge together.
The contents of all the txt files are in the same format as follows:
IPAddress Description DNSDomain
--------- ----------- ---------
{192.168.1.2} Microsoft Hyper-V Network Adapter
{192.168.1.30} Microsoft Hyper-V Network Adapter #2
I have the below code that combines all the txt files in to one txt file called all.txt.
copy *.txt all.txt
From this all.txt I can't see which lines came from which txt file. Any ideas on code that would add an extra column to the combined file containing the name of the file each row came from?
As per the comments above, you've put the output of Format-Table into a text file. Note that Format-Table output may look structured on screen, but it is just lines of text. By doing that you have made the data harder to work with.
If you just want a few properties from the results of the Get-WMIObject cmdlet, use Select-Object which (in the use given here) will effectively filter the data for just the properties you want.
Instead of writing text to a simple file, you can preserve the tabular nature of the data by writing to a structured file (i.e. CSV):
Get-WmiObject -Class Win32_NetworkAdapterConfiguration -Filter IPEnabled=TRUE -ComputerName SERVERNAMEHERE |
Select-Object PSComputerName, IPAddress, Description, DNSDomain |
Export-Csv 'C:\temp\server.csv'
Note that we were able to include the PSComputerName property in each line of data, effectively giving you the extra column of data you wanted.
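As an aside, Get-WmiObject is no longer available in PowerShell 6+; Get-CimInstance is the modern replacement and accepts essentially the same query. A sketch, keeping the placeholder computer name from above:
Get-CimInstance -ClassName Win32_NetworkAdapterConfiguration -Filter 'IPEnabled=TRUE' -ComputerName SERVERNAMEHERE |
Select-Object PSComputerName, IPAddress, Description, DNSDomain |
Export-Csv 'C:\temp\server.csv'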
So much for getting the raw data. One way you could read in all the CSV data and write it out again might look like this:
Get-ChildItem *.csv -Exclude all.csv |
Foreach-Object {Import-Csv $_} |
Export-Csv all.csv
Note that we exclude the output file in the initial cmdlet to avoid reading from and writing to the same file endlessly.
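One caveat: in Windows PowerShell 5.1 and earlier, Export-Csv prepends a '#TYPE ...' header line to the output; -NoTypeInformation suppresses it (PowerShell 7+ omits it by default). A sketch:
Get-ChildItem *.csv -Exclude all.csv |
Foreach-Object {Import-Csv $_} |
Export-Csv all.csv -NoTypeInformation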
If you don't have the luxury to collect the data again you'll need to spool the files together. Spooling files together is done with Get-Content, something like this:
Get-ChildItem *.txt -Exclude all.txt |
Foreach-Object {Get-Content $_ -Raw} |
Out-File all.txt
In your case, you want to suffix each line, which is trickier, as you need to process the files line by line:
$files = Get-ChildItem *.txt
foreach($file in $files) {
$lines = Get-Content $file
foreach($line in $lines) {
"$line $($file.Name)" | Out-File all.txt -Append
}
}
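As a side note, Out-File -Append inside the inner loop reopens the output file once per line, and the unfiltered *.txt pattern would pick up all.txt itself on a second run. A sketch that writes once and excludes the output file, assuming the same layout:
Get-ChildItem *.txt -Exclude all.txt | ForEach-Object {
    $name = $_.Name
    # emit each line with its source file name appended
    Get-Content $_ | ForEach-Object { "$_ $name" }
} | Set-Content all.txt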

Compare two files and update the differences to the 2nd file

I am trying to use PowerShell with Get-Content to read 2 files and apply the changes in file 1 to file 2. Here is my code:
Compare-Object (Get-Content c:\file1) (Get-Content c:file2) | diff > (Get-Content c:file2)
and it's not working. I need it to append any changes to the 2nd file.
A couple of issues here:
You are calling diff, but in PowerShell that is an alias for Compare-Object, as you can see from get-alias diff. I am guessing that was not expected.
If you want to append the differences that occur in the first file, you need to filter the output of Compare-Object accordingly.
So with that in mind I present...
$file1 = "c:\file1"
$file2 = "c:\file2"
Compare-Object (Get-Content $file1) (Get-Content $file2) | Where-Object{$_.SideIndicator -eq "<="} | Add-Content $file2
$_.SideIndicator -eq "<=" will only allow the entries that are unique to $file1 to continue through the pipe to Add-Content. If you just look at the output of Compare-Object before the Where-Object, you can get a good idea of what's going on.
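If you want to inspect that raw output, run the comparison on its own; each result carries an InputObject and a SideIndicator ('<=' for lines only in the reference file, '=>' for lines only in the difference file). A sketch:
# Show the raw differences with their side indicators
Compare-Object (Get-Content $file1) (Get-Content $file2) | Format-Table InputObject, SideIndicator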

Using PowerShell, read multiple known file names, append text of all files, create and write to one output file

I have five .sql files and know the name of each file. For this example, call them one.sql, two.sql, three.sql, four.sql and five.sql. I want to append the text of all files and create one file called master.sql. How do I do this in PowerShell? Feel free to post multiple answers to this problem because I am sure there are several ways to do this.
My attempt does not work and creates a file with several hundred thousand lines.
PS C:\sql> get-content '.\one.sql' | get-content '.\two.sql' | get-content '.\three.sql' | get-content '.\four.sql' | get-content '.\five.sql' | out-file -encoding UNICODE master.sql
Get-Content one.sql,two.sql,three.sql,four.sql,five.sql > master.sql
Note that > is equivalent to Out-File -Encoding Unicode. I only tend to use Out-File when I need to specify a different encoding.
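For example, to write UTF-8 instead of the UTF-16LE ("Unicode") that > produces in Windows PowerShell, a sketch:
Get-Content one.sql,two.sql,three.sql,four.sql,five.sql | Out-File -Encoding UTF8 master.sql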
There are some good answers here, but if you have a whole lot of files and maybe don't know all of the names, this is what I came up with:
$vara = get-childitem -name "path"
$varb = foreach ($a in $vara) {gc "path\$a"}
example
$vara = get-childitem -name "c:\users\test"
$varb = foreach ($a in $vara) {gc "c:\users\test\$a"}
You can obviously pipe this directly into Add-Content or whatever, but I like to capture the output in variables so I can manipulate it later on.
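For instance, once captured, the combined content can be written out in a single step; a sketch using the variable names above:
# Write everything captured in $varb to one output file
$varb | Out-File -Encoding UNICODE master.sql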
See if this works better
get-childitem "one.sql","two.sql","three.sql","four.sql","five.sql" | get-content | out-file -encoding UNICODE master.sql
I needed something similar. Chris Berry's post helped, but I think this is more efficient:
gci -name "*PathToFiles*" | gc > master.sql
The first part gci -name "*PathToFiles*" gets you your file list. This can be done with wildcards to just get your .sql files i.e. gci -name "\\share\folder\*.sql"
This then pipes to Get-Content and redirects the output to your master.sql file. As noted by Keith Hill, you can use Out-File in place of > to better control your output if needed.
I think the logical way of solving this is to use Add-Content:
$files = Get-ChildItem '.\one.sql', '.\two.sql', '.\three.sql', '.\four.sql', '.\five.sql'
$files | foreach { Get-Content $_ | Add-Content '.\master.sql' -encoding UNICODE }
However, Get-Content is usually very slow when reading multiple very large files. If that's your case, this article could help: http://keithhill.spaces.live.com/blog/cns!5A8D2641E0963A97!756.entry
What about:
Get-Content .\one.sql,.\two.sql,.\three.sql,.\four.sql,.\five.sql | Set-Content .\master.sql
Here is how I concatenate the sql files from the Sql folder:
# Set the current location of the script to use relative path
Set-Location $PSScriptRoot
# Concatenate all the sql files
$concatSql = Get-Content -Path .\Sql\*.sql
# Append the combined sql to a single file (Add-Content appends; use Set-Content to overwrite instead)
Add-Content -Path concatFile.sql -Value $concatSql
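If the concatenation order matters, it may be safer to enumerate and sort the files explicitly rather than relying on wildcard enumeration order; a minimal sketch, assuming the same folder layout:
# Sort by name so the merge order is deterministic
Get-ChildItem .\Sql\*.sql | Sort-Object Name | Get-Content | Set-Content concatFile.sql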