Get specific content from XML and compare - powershell

I have many compressed files, each containing an XML file with Package ID = GUID.
In any given file the Package ID can appear more than once; some of them will be all zeros and should be ignored. The goal is to collect all the Package IDs from all the compressed files and compare them. If two files have the same Package ID, I want to know which files those are.
So far I have a script that opens all the compressed files, reads the text file inside, and replaces a string (not as XML):
$fileToEdit = "vip.manifest"
$toReplace = '<Prop Name="WarningDuringUpgrade" Value="True"'
$replaceWith = '<Prop Name="WarningDuringUpgrade" Value="False"'

# ZipFile lives in this assembly (needed on Windows PowerShell 5.1)
Add-Type -AssemblyName System.IO.Compression.FileSystem

function ModifyFiles($Manifests)
{
    foreach ($file in $Manifests) {
        try {
            $zip = [System.IO.Compression.ZipFile]::Open($file, "Update")
        }
        catch {
            Write-Warning $_.Exception.Message
            continue
        }
        $entries = $zip.Entries.Where({ $_.Name -like $fileToEdit })
        foreach ($entry in $entries) {
            # read the entry, replace the string, then rewrite it in place
            $reader = [System.IO.StreamReader]::new($entry.Open())
            $content = $reader.ReadToEnd().Replace($toReplace, $replaceWith)
            $reader.Close()
            $reader.Dispose()
            $writer = [System.IO.StreamWriter]::new($entry.Open())
            $writer.BaseStream.SetLength(0)
            $writer.Write($content)
            $writer.Flush()
            $writer.Close()
            $writer.Dispose()
            Write-Host "$entry for $file was updated"
        }
        $zip.Dispose()
    }
}
The tag is called Package, with an ID attribute holding the GUID.
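Since the manifest is XML, one way to approach the stated goal is to parse each entry with [xml] rather than doing string work. A minimal sketch (untested; it assumes $Manifests is the same list of archive paths as above and that the element is literally <Package ID="...">):

# Sketch: collect Package IDs from every archive's vip.manifest, skip
# all-zero GUIDs, and report IDs that appear in more than one archive.
Add-Type -AssemblyName System.IO.Compression.FileSystem
$idIndex = @{}
foreach ($file in $Manifests) {
    $zip = [System.IO.Compression.ZipFile]::OpenRead($file)
    foreach ($entry in $zip.Entries.Where({ $_.Name -eq 'vip.manifest' })) {
        $reader = [System.IO.StreamReader]::new($entry.Open())
        [xml]$manifest = $reader.ReadToEnd()
        $reader.Dispose()
        foreach ($pkg in $manifest.SelectNodes('//Package')) {
            $id = $pkg.ID
            # skip empty and all-zero IDs (e.g. 000000000 or 0000...-0000)
            if ($id -and $id -notmatch '^[0-]+$') {
                if (-not $idIndex.ContainsKey($id)) { $idIndex[$id] = @() }
                $idIndex[$id] += $file
            }
        }
    }
    $zip.Dispose()
}
# IDs present in more than one distinct archive, with the archives that hold them
$idIndex.GetEnumerator() | Where-Object { ($_.Value | Select-Object -Unique).Count -gt 1 }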


Find string in file and replace with value from another file for multilines - powershell

I would like to replace each line in file A with the matching line from file B if it exists; if not, append "LINK NOT EXIST" to the line. There is a small difference between the files: file A doesn't have the number at the end of each row, which should be added from file B.
A line from file A, replaced with the matching line from file B:
L5020|http://linktosite.de|URL  ->  L5020|http://linktosite.de|URL|P555
I tried to do this with a CSV map, but without success, as file A changes every day; if a line changes position or a new line is added to file A, the map no longer works.
$file = "C:\Users\XXX\Desktop\URL\MAP.csv"
$mapping = Import-Csv $file -Encoding UTF8 -Delimiter ";"
$original_file = "C:\Users\XXX\Desktop\URL\fileA.txt"
$destination_file = "C:\Users\XXX\Desktop\URL\Output.txt"
$content = Get-Content $original_file
for ($i = 0; $i -lt $content.Length; $i++) {
    foreach ($map in $mapping) {
        #If([string]::IsNullOrEmpty($content[$i])) {
        If ($InputString -like ($content[$i])) {   # note: $InputString is never assigned
            $content[$i] = "$($map.HEADER1)|NOTEXIST"
        }
        ElseIf ($content[$i] -eq "$($map.HEADER1)") {
            $content[$i] = $map.HEADER2
        }
    }
}
There are a hundred lines in each file, but the files are not the same. There are a few scenarios:
File A has more rows than file B
File B has more rows than file A
FileA
L5020|http://linktosite.de|URL
L100|http://sitelink.de|URL
L50|http://abcde.de|URL
L511|http://bbcccddeee.de|URL
L300|http://link123456.de|URL
L5450|http://randomlink.de|URL_DE
L5460|http://randomwebsitelink.de|URL_DE
FileB
L5020|http://linktosite.de|URL|P555
L511|http://bbcccddeee.de|URL|P540
L100|http://sitelink.de|URL|P523
L50|http://abcde.de|URL|P53
Results for scenario 1:
L5020|http://linktosite.de|URL|P555
L100|http://sitelink.de|URL|P523
L50|http://abcde.de|URL|P53
L511|http://bbcccddeee.de|URL|P540
L300|http://link123456.de|URL|LINK NOT EXIST
L5450|http://randomlink.de|URL_DE|LINK NOT EXIST
L5460|http://randomwebsitelink.de|URL_DE|LINK NOT EXIST
$fileA = Get-Content -Path fileA.txt
$fileB = Get-Content -Path fileB.txt
$result = @()
foreach ($lineA in $fileA) {
    $linkExistsInFileB = $false
    foreach ($lineB in $fileB) {
        if ($lineB -ilike "$lineA*") {
            $linkExistsInFileB = $true
            $result += $lineB
            break
        }
    }
    if (-not $linkExistsInFileB) {
        $result += "$lineA|LINK NOT EXIST"
    }
}
$result
Output:
L5020|http://linktosite.de|URL|P555
L100|http://sitelink.de|URL|P523
L50|http://abcde.de|URL|P53
L511|http://bbcccddeee.de|URL|P540
L300|http://link123456.de|URL|LINK NOT EXIST
L5450|http://randomlink.de|URL_DE|LINK NOT EXIST
L5460|http://randomwebsitelink.de|URL_DE|LINK NOT EXIST
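A side note: if the files grow much beyond a few hundred lines, the nested loop becomes O(n*m). A sketch of a hashtable variant, assuming the first pipe-delimited field (e.g. L5020) is a unique key:

# Sketch: index file B by its first field for constant-time lookups.
$indexB = @{}
foreach ($lineB in Get-Content fileB.txt) {
    $key = ($lineB -split '\|')[0]
    $indexB[$key] = $lineB
}
$result = foreach ($lineA in Get-Content fileA.txt) {
    $key = ($lineA -split '\|')[0]
    if ($indexB.ContainsKey($key)) { $indexB[$key] }
    else { "$lineA|LINK NOT EXIST" }
}
$result | Set-Content Output.txt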

How can I convert an RTF document to docx

I have found something similar on here, but when I try running it I get errors.
I was therefore wondering if it would be possible to make a PowerShell script that can take .RTF documents and convert them all to .docx documents.
Use this to convert rtf (and docx) files to .doc; a .docx variant follows after the attribution link:
Function Convert-Dir($path) {
    $Word = New-Object -ComObject Word.Application

    # convert .docx files to .doc
    $Files = Get-ChildItem "$($path)\*.docx" -Recurse
    foreach ($File in $Files) {
        # open the document, build the target name by swapping the extension
        $Doc = $Word.Documents.Open($File.FullName)
        $Name = ($Doc.FullName).Replace("docx", "doc")
        if (!(Test-Path $Name)) {
            # 0 = wdFormatDocument, i.e. save as a .doc file
            Write-Host $Name
            $Doc.SaveAs([ref]$Name, [ref]0)
        }
        $Doc.Close()
    }

    # convert .rtf files to .doc
    $Files = Get-ChildItem "$($path)\*.rtf" -Recurse
    foreach ($File in $Files) {
        $Doc = $Word.Documents.Open($File.FullName)
        $Name = ($Doc.FullName).Replace("rtf", "doc")
        if (!(Test-Path $Name)) {
            Write-Host $Name
            $Doc.SaveAs([ref]$Name, [ref]0)
        }
        $Doc.Close()
    }

    $Word.Quit()
}
Convert-Dir "RtfFilePath";
Code from and attribution: https://gist.github.com/rensatsu/0a66a65c3a508ecfd491#file-rtfdocxtodoc-ps1
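Note that the gist above actually saves to .doc (SaveAs format 0 = wdFormatDocument). If you specifically need .docx, here is a minimal sketch using format 12 (wdFormatXMLDocument); the function name is mine:

# Sketch: convert every .rtf under $path to .docx via Word COM automation.
function Convert-RtfToDocx($path) {
    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    try {
        foreach ($file in Get-ChildItem "$path\*.rtf" -Recurse) {
            $doc = $word.Documents.Open($file.FullName)
            $target = [System.IO.Path]::ChangeExtension($file.FullName, 'docx')
            if (!(Test-Path $target)) {
                # 12 = wdFormatXMLDocument (.docx) in the WdSaveFormat enumeration
                $doc.SaveAs([ref]$target, [ref]12)
            }
            $doc.Close()
        }
    }
    finally {
        $word.Quit()
        [System.Runtime.InteropServices.Marshal]::ReleaseComObject($word) | Out-Null
    }
}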

Search large .log for specific string quickly without streamreader

Problem: I need to search a large log file that is currently being used by another process. I cannot stop this other process or put a lock on the .log file. I need to quickly search this file, and I can't read it all into memory. I get that StreamReader() is the fastest, but I can't figure out how to avoid it attempting to grab a lock on the file.
$p = "Seachterm:Search"
$files = "\\remoteserver\c\temp\tryingtofigurethisout.log"
$SearchResult= Get-Content -Path $files | Where-Object { $_ -eq $p }
The below doesn't work because I can't get a lock on the file.
$reader = New-Object System.IO.StreamReader($files)
$lines = @()
if ($reader -ne $null) {
    while (!$reader.EndOfStream) {
        $line = $reader.ReadLine()
        if ($line.Contains($p)) {
            $lines += $line
        }
    }
}
$lines | Select-Object -Last 1
This takes too long:
Get-Content $files -ReadCount 500 |
    ForEach-Object { $_ -match $p }
I would greatly appreciate any pointers in how I can go about quickly and efficiently (memory wise) searching a large log file.
Perhaps this will work for you. It reads the lines of the file as fast as possible, but differently from your second approach (which is roughly what [System.IO.File]::ReadAllLines() would do).
To collect the lines, it uses a List object, which performs faster than appending to an array with +=.
$p = "Seachterm:Search"
$path = "\\remoteserver\c$\temp\tryingtofigurethisout.log"

if (!(Test-Path -Path $path -PathType Leaf)) {
    Write-Warning "File '$path' does not exist"
}
else {
    try {
        $fileStream = [System.IO.FileStream]::new($path, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
        $streamReader = [System.IO.StreamReader]::new($fileStream)

        # or use this syntax:
        # $fileMode = [System.IO.FileMode]::Open
        # $fileAccess = [System.IO.FileAccess]::Read
        # $fileShare = [System.IO.FileShare]::ReadWrite
        # $fileStream = New-Object -TypeName System.IO.FileStream $path, $fileMode, $fileAccess, $fileShare
        # $streamReader = New-Object -TypeName System.IO.StreamReader -ArgumentList $fileStream

        # use a List object of type String or an ArrayList to collect the strings quickly
        $lines = New-Object System.Collections.Generic.List[string]

        # read the lines as fast as you can and add them to the list
        while (!$streamReader.EndOfStream) {
            $lines.Add($streamReader.ReadLine())
        }

        # close and dispose the objects used
        $streamReader.Close()
        $fileStream.Dispose()

        # do the 'Contains($p)' after reading the file to not slow that part down
        $lines.ToArray() | Where-Object { $_.Contains($p) } | Select-Object -Last 1
    }
    catch [System.IO.IOException] {}
}
Basically, it does what your second code does, but with the difference that a bare StreamReader opens the file with [System.IO.FileShare]::Read, whereas this code opens the file with [System.IO.FileShare]::ReadWrite.
Note that exceptions may still be thrown because another application has write access to the file, hence the try {...} catch {...}.
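If memory is a tight constraint, a variation on the same FileShare::ReadWrite idea filters while reading instead of buffering every line first. A sketch, keeping only the last matching line:

# Variation: near-constant memory, since no list of all lines is kept.
$fileStream = [System.IO.FileStream]::new($path, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
$streamReader = [System.IO.StreamReader]::new($fileStream)
$lastMatch = $null
while (!$streamReader.EndOfStream) {
    $line = $streamReader.ReadLine()
    if ($line.Contains($p)) { $lastMatch = $line }
}
$streamReader.Close()
$fileStream.Dispose()
$lastMatch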
Hope that helps

Too many if else statements

I am trying to check whether the string "03-22-2019" exists in each file (and show the result either way in the output). The files are ".sql" or ".txt":
execution\development\process.sql
insert.sql
production\logs.txt
restore.sql
rebuild.txt
I am trying the code below, but ended up with too many if/else branches. The file paths above are stored in the $paths variable. I need to split each path on "\" and get the last part of the path to do something else with it.
if ($paths -like "*.sql") {
    if ($paths.Contains("\")) {
        $last = $paths.Split("\")[-1] # get the last part of the path
        $result = "03-22-2019" # get this date from somewhere else
        if ($result) {
            # check if the pattern string exists in that file
            $SEL = Select-String -Path <path location> -Pattern $result
            if ($SEL -ne $null) {
                Write-Host "`n $last Contains Matching date"
            } else {
                Write-Host "`n $last Does not Contain date"
            }
        } else {
            Write-Host "`ndate field is blank"
        }
    } else {
        $result = "03-22-2019" # get this date from somewhere else
        if ($result) {
            # check if the pattern string exists in that file
            $SEL = Select-String -Path <path location> -Pattern $result
            if ($SEL -ne $null) {
                Write-Host "`n $last Contains Matching date"
            } else {
                Write-Host "`n $last Does not Contain date"
            }
        } else {
            Write-Host "`ndate field is blank"
        }
    }
} elseif ($paths -like "*.txt") {
    if ($paths.Contains("\")) {
        $last = $paths.Split("\")[-1] # get the last part of the path
        $result = "03-22-2019" # get this date from somewhere else
        if ($result) {
            # check if the pattern string exists in that file
            $SEL = Select-String -Path <path location> -Pattern $result
            if ($SEL -ne $null) {
                Write-Host "`n $last Contains Matching date"
            } else {
                Write-Host "`n $last Does not Contain date"
            }
        } else {
            Write-Host "`ndate field is blank"
        }
    } else {
        $result = "03-22-2019" # get this date from somewhere else
        if ($result) {
            # check if the pattern string exists in that file
            $SEL = Select-String -Path <path location> -Pattern $result
            if ($SEL -ne $null) {
                Write-Host "`n $last Contains Matching date"
            } else {
                Write-Host "`n $last Does not Contain date"
            }
        } else {
            Write-Host "`ndate field is blank"
        }
    }
} else {
    Write-Host "other file types"
}
I would make this a little simpler; below is an example that determines whether a file contains the date:
$paths = @("c:\path1", "c:\path2\subdir")
$output = ForEach ($path in $paths) {
    # -Include only takes effect with a wildcard in the path (or -Recurse)
    $files = Get-ChildItem -Path (Join-Path $path '*') -File -Include "*.sql", "*.txt"
    $last = ($path -split "\\")[-1] # contains the last part of the path
    ForEach ($file in $files) {
        If (Select-String -Path $file -Pattern "03-22-2019") {
            "$($file.FullName) contains the date."
        }
        else {
            "$($file.FullName) does not contain the date."
        }
    }
}
$output # outputs whether or not a file has the date string
The outer ForEach loop iterates over the paths in $paths. Inside that loop, you can do whatever you need to with each path $path. I used $last to store the last part of the path for the current iteration; you have not said what you want to do with it.
The inner ForEach checks each .txt and .sql file for the date text 03-22-2019. $output stores a string indicating whether each .txt and .sql file contains the date string.
If your paths contain the file names, then you can use the following alternatives to grab the file name (last part of the path):
$path | split-path -leaf # inside of the outer ForEach loop
# Or
$file.name # inside of the inner ForEach loop
Looks like you should start with a foreach after your $paths variable like this:
foreach ($path in $paths) {
    if ($path -like "*.sql") { # note the use of one single item of the $paths array you have
        $last = $path.Split("\")[-1]
        $result = ... # whatever method you want to use to return a DateTime object
        # Now, "if ($result)" on its own doesn't make much sense. Do you want to compare
        # a date to an older date or something? Maybe: if ($result -ge (Get-Date).AddDays(-1))
        # { Do your stuff }
    }
}
Doing something like:
if ($paths -like "*.sql")
Doesn't work because $paths is an array and you are making a string comparison and never the two shall meet. Now, if you are trying to find if a string is inside a file, you should use something like "Get-Content" or "Import-Csv"
You can use the "Get-Date" cmdlet to get many different formats for the date. Read about that here. If you are trying to compare multiple dates against multiple files, I would start with a for loop on the files like I did up there, and then a for loop on each file for an array of dates. Maybe something like this:
foreach ($path in $paths) {
    foreach ($date in $dates) {
        # Get the contents of a file and store it in a variable
        # Search for the string in that variable and store the results in a variable
        # Write to the console
    } # End foreach ($date in $dates)
} # End foreach ($path in $paths)
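As a rough sketch of how those comments might be filled in (the $dates array and the output format are assumptions on my part):

# Sketch: check each file path for each date string.
$dates = @('03-22-2019')   # hypothetical: fill from wherever the dates come from
foreach ($path in $paths) {
    $last = Split-Path $path -Leaf
    foreach ($date in $dates) {
        # -Quiet makes Select-String return a plain boolean
        if (Select-String -Path $path -Pattern ([regex]::Escape($date)) -Quiet) {
            Write-Host "$last contains $date"
        } else {
            Write-Host "$last does not contain $date"
        }
    }
}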
Post some more updated code and let's see what you come up with.

Optimize Word document keyword search

I'm trying to search for keywords across a large number of MS Word documents and return the results to a file. I've got a working script, but I wasn't aware of the scale, and what I've got isn't nearly efficient enough; it would take days to plod through everything.
The script as it stands takes keywords from CompareData.txt, runs each one through all the files in a specific folder, then appends the result to a file.
So when I'm done I will know how many files have each specific keyword.
[cmdletBinding()]
Param(
    $Path = "C:\willscratch\"
) #end param

$findTexts = (Get-Content c:\scratch\CompareData.txt)

Foreach ($findText in $findTexts)
{
    $matchCase = $false
    $matchWholeWord = $true
    $matchWildCards = $false
    $matchSoundsLike = $false
    $matchAllWordForms = $false
    $forward = $true
    $wrap = 1
    $application = New-Object -ComObject word.application
    $application.visible = $False
    $docs = Get-ChildItem -Path $Path -Recurse -Include *.docx
    $i = 1
    $totaldocs = 0

    Foreach ($doc in $docs)
    {
        Write-Progress -Activity "Processing files" -Status "Processing $($doc.FullName)" -PercentComplete ($i / $docs.Count * 100)
        $document = $application.documents.open($doc.FullName)
        $range = $document.content
        $null = $range.movestart()
        $wordFound = $range.find.execute($findText, $matchCase,
            $matchWholeWord, $matchWildCards, $matchSoundsLike,
            $matchAllWordForms, $forward, $wrap)
        if ($wordFound)
        {
            $doc.fullname
            $document.Words.count
            $totaldocs++
        } #end if $wordFound
        $document.close()
        $i++
    } #end foreach $doc

    $application.quit()
    "There are $totaldocs total files with $findText" | Out-File -Append C:\scratch\output.txt

    #clean up stuff
    [System.Runtime.InteropServices.Marshal]::ReleaseComObject($range) | Out-Null
    [System.Runtime.InteropServices.Marshal]::ReleaseComObject($document) | Out-Null
    [System.Runtime.InteropServices.Marshal]::ReleaseComObject($application) | Out-Null
    Remove-Variable -Name application
    [gc]::collect()
    [gc]::WaitForPendingFinalizers()
}
What I'd like to do is figure out a way to search each file for everything in CompareData.txt once, rather than iterate through it a bunch of times. If I was dealing with a small set of data, the approach I've got would get the job done - but I've come to find out that both the data in CompareData.txt and the source Word file directory will be very large.
Any ideas on how to optimize this?
Right now you're doing this (pseudocode):
foreach $Keyword {
    create Word Application
    foreach $File {
        load Word Document from $File
        find $Keyword
    }
}
That means that if you have 100 keywords and 10 documents, you're opening and closing 100 instances of Word and loading a thousand Word documents before you're done.
Do this instead:
create Word Application
foreach $File {
    load Word Document from $File
    foreach $Keyword {
        find $Keyword
    }
}
So you only launch one instance of Word and only load each document once.
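Applied to the original script, that restructuring looks roughly like this (a sketch; the Find options are carried over from the question and the counting is simplified, with $Path assumed to be the script parameter):

# search options carried over from the question
$matchCase = $false; $matchWholeWord = $true; $matchWildCards = $false
$matchSoundsLike = $false; $matchAllWordForms = $false; $forward = $true; $wrap = 1

$findTexts = Get-Content C:\scratch\CompareData.txt
$application = New-Object -ComObject Word.Application
$application.Visible = $false
$totals = @{}
foreach ($doc in Get-ChildItem -Path $Path -Recurse -Include *.docx) {
    $document = $application.Documents.Open($doc.FullName)
    foreach ($findText in $findTexts) {
        # take a fresh Range per keyword, since Find.Execute moves the range on a hit
        $range = $document.Content
        $wordFound = $range.Find.Execute($findText, $matchCase, $matchWholeWord,
            $matchWildCards, $matchSoundsLike, $matchAllWordForms, $forward, $wrap)
        if ($wordFound) { $totals[$findText]++ }
    }
    $document.Close()
}
$application.Quit()
$totals.GetEnumerator() | ForEach-Object {
    "There are $($_.Value) total files with $($_.Key)"
} | Out-File C:\scratch\output.txt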
As noted in the comments, you may optimize the whole process by using the OpenXML SDK, rather than launching Word:
(assuming you've installed OpenXML SDK in its default location)
# Import the OpenXML library
Add-Type -Path 'C:\Program Files (x86)\Open XML SDK\V2.5\lib\DocumentFormat.OpenXml.dll'

# Grab the keywords and file names
$Keywords = Get-Content C:\scratch\CompareData.txt
$Documents = Get-ChildItem -Path $Path -Recurse -Include *.docx

# hashtable to store results per document
$KeywordMatches = @{}

# store OpenXML word document type in a variable as a shorthand
$WordDoc = [DocumentFormat.OpenXml.Packaging.WordprocessingDocument] -as [type]

foreach ($Docx in $Documents)
{
    # create array to hold matched keywords
    $KeywordMatches[$Docx.FullName] = @()

    # open document, wrap content stream in streamreader
    $Document = $WordDoc::Open($Docx.FullName, $false)
    $DocumentStream = $Document.MainDocumentPart.GetStream()
    $DocumentReader = New-Object System.IO.StreamReader $DocumentStream

    # read entire document
    $DocumentContent = $DocumentReader.ReadToEnd()

    # test for each keyword
    foreach ($Keyword in $Keywords)
    {
        $Pattern = [regex]::Escape($Keyword)
        $WordFound = $DocumentContent -match $Pattern
        if ($WordFound)
        {
            $KeywordMatches[$Docx.FullName] += $Keyword
        }
    }

    $DocumentReader.Dispose()
    $Document.Dispose()
}
Now you can show the number of matched keywords for each document:
$KeywordMatches.GetEnumerator() | Select-Object @{N='File';E={$_.Key}}, @{N='Count';E={$_.Value.Count}}