Flatten LaTeX file with PowerShell

I would like to make a simple PowerShell script that:
Takes an input .tex file
replaces occurrences of \input{my_folder/my_file} with the file content itself
outputs a new file
My first step is to match the different file names so as to import them, although the following code outputs not only the file names but also \include{file1}, \include{file2}, etc.
$ms = Get-Content ms.tex -Raw
$environment = "input"
$inputs = $ms | Select-String "\\(?:input|include)\{([^}]+)\}" -AllMatches | Foreach {$_.matches}
Write-Host $inputs
I thought using parentheses would create a capture group, but this fails. Can you explain to me why, and what is the proper way of getting just the file names instead of the full match?
On regex101 this regexp \\(?:input|include)\{([^}]+)\} seems to work fine.

You are looking for Positive lookbehind and positive lookahead:
@'
Some line
\input{my_folder/my_file}
Other line
'@ | Select-String '(?<=\\input{)[^}]+(?=})' -AllMatches | Foreach {$_.matches}
Result
Groups : {0}
Success : True
Name : 0
Captures : {0}
Index : 18
Length : 17
Value : my_folder/my_file
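As for why the original attempt printed the full match: the parentheses do create a capture group, but each Match object's Value is the whole match, and the group has to be read from Groups[1]. A minimal sketch, assuming a hypothetical sample string in place of the real ms.tex:

```powershell
# Hypothetical sample text standing in for the contents of ms.tex.
$ms = "Intro`n\input{sections/intro}`n\include{sections/method}`n"

# Same pattern as in the question; group 1 captures the text between the braces.
$pattern = '\\(?:input|include)\{([^}]+)\}'

$fileNames = $ms |
    Select-String $pattern -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }   # read the capture group, not the full match

$fileNames
```

This keeps the original regex unchanged, so it also covers \include, which the lookbehind version above does not.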

Related

Powershell script to sum only specific records from file(s)

I am trying to work with a directory full of files.
I want to find specific rows within the file,
from those rows, extract a numeric value
and then sum up these values across all files in a directory.
It would look like this...
File1.txt
bread:123
ham:456
eggs:789
File2.txt
bread:999
mayo:789
eggs:123
and so on...
I want to find the row with eggs, extract the number, and sum these numbers together across files.
I found these snippets in other posts, but they are only segments, and I still have trouble understanding how to use pipes, variables and braces.
dir . -filter "*.txt" -Recurse -name | foreach{(GC $_).Count} | measure-object -sum
#?
Get-Content | Select-String -Pattern "eggs*"
#?
$record -split ":"
I want the script to say "eggs = 912" which would be 123 + 789 = 912
Here is a possible solution:
$pattern = 'eggs'
$sum = Get-ChildItem . -File -Recurse -Filter *.txt |
Get-Content |
Where-Object { $_ -match $pattern } |
ForEach-Object { ($_ -split ':')[1] } |
Measure-Object -Sum |
ForEach-Object Sum
"$pattern = $sum"
Output:
eggs = 912
Get-ChildItem finds all files recursively that match the filter
Get-Content reads each line of every file and passes that on in the pipeline
Where-Object includes only lines that match the given RegEx pattern
The ForEach-Object line splits the line at : and extracts the sub string, which is at array index [1].
Measure-Object accumulates all numbers (it converts strings to double, if necessary). Internally, it creates a variable in its begin block, accumulates the pipeline input to this variable in its process block and outputs the variable value in its end block.
The last ForEach-Object line is necessary because Measure-Object actually outputs an object with a Sum property, but we only want the value of that property, not the entire object. If you removed that line, you would have to write "$pattern = $($sum.Sum)" instead, to access the Sum property of the sum object.
You can treat them as CSV files. Import-Csv doesn't take wildcards for the filename.
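The begin/process/end behaviour described for Measure-Object can be illustrated with a small function. This is only a sketch of the general pattern, not the cmdlet's actual implementation; the name Measure-SimpleSum is made up for the example:

```powershell
# Hypothetical helper that mimics the accumulate-then-output pattern.
function Measure-SimpleSum {
    begin   { $total = 0.0 }              # runs once, before any pipeline input
    process { $total += [double] $_ }     # runs once per pipeline item
    end     { $total }                    # runs once, after the last item
}

$sum = '123', '789' | Measure-SimpleSum
"eggs = $sum"
```

Nothing is written to the pipeline until the end block runs, which is why the sum only appears after all files have been read.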
import-csv file1.txt,file2.txt -Delimiter : -Header item,amount |
where item -eq eggs | measure -sum amount
Count : 2
Average :
Sum : 912
Maximum :
Minimum :
StandardDeviation :
Property : amount
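If the file names are not known in advance, one possible workaround is to let Get-ChildItem enumerate the files and call Import-Csv once per file. A minimal sketch, assuming the sample data from the question is written to a temporary directory (the directory and its contents are created only for this illustration):

```powershell
# Create a throwaway directory with the two sample files from the question.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'File1.txt') -Value 'bread:123', 'ham:456', 'eggs:789'
Set-Content -Path (Join-Path $dir 'File2.txt') -Value 'bread:999', 'mayo:789', 'eggs:123'

# Enumerate the files, import each one as colon-delimited CSV, then filter and sum.
$result = Get-ChildItem -Path $dir -Filter *.txt |
    ForEach-Object { Import-Csv -Path $_.FullName -Delimiter ':' -Header item, amount } |
    Where-Object item -eq 'eggs' |
    Measure-Object -Property amount -Sum

"eggs = $($result.Sum)"

# Clean up the temporary files.
Remove-Item -Recurse -Force $dir
```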

Export the matched output to a CSV as Column 1 and Column 2 using Powershell

I have the code below to match the patterns and save the results in a CSV file. I need to save the matches of regex1 and regex2 as col1 and col2 in the CSV instead of saving everything in the first column.
$inputfile = ( Get-Content D:\Users\naham1224\Desktop\jil.txt )
$FilePath = "$env:USERPROFILE\Desktop\jil2.csv"
$regex1 = "(insert_job: [A-Za-z]*_*\S*)"
$regex2 = "(machine: [A-Z]*\S*)"
$inputfile |
Select-String -Pattern $regex2,$regex1 -AllMatches |
ForEach-Object {$_.matches.groups[1].value} |
Add-Content $FilePath
Input file contains : input.txt
/* ----------------- AUTOSYS_DBMAINT ----------------- */
insert_job: AUTOSYS_DBMAINT job_type: CMD
command: %AUTOSYS%\bin\DBMaint.bat
machine: PWISTASASYS01
owner: svc.autosys#cbs
permission:
date_conditions: 1
days_of_week: su,mo,tu,we,th,fr,sa
start_times: "03:30"
description: "Runs DBmaint process on AE Database - if fails - MTS - will run next scheduled time"
std_out_file: ">$$LOGS\dbmaint.txt"
std_err_file: ">$$LOGS\dbmaint.txt"
alarm_if_fail: 0
alarm_if_terminated: 0
send_notification: 0
notification_msg: "Check DBMaint output in autouser.PD1\\out directory"
notification_emailaddress: jnatal#cbs.com
/* ----------------- TEST_ENV ----------------- */
insert_job: TEST_ENV job_type: CMD
command: set
machine: PWISTASASYS01
owner: svc.autosys#cbs
permission:
date_conditions: 1
days_of_week: su,mo,tu,we,th,fr,sa
start_times: "03:30"
description: "output env"
std_out_file: ">C:\Users\svc.autosys\Documents\env.txt"
std_err_file: ">C:\Users\svc.autosys\Documents\env.txt"
alarm_if_fail: 1
alarm_if_terminated: 1
Current output: [image not available]
Expected output: [image not available]
I am trying various ways to do so but no luck. any suggestions and help is greatly appreciated.
Here is how I would do this:
$inputPath = 'input.txt'
$outputPath = 'output.csv'
# RegEx patterns to extract data.
$patterns = @(
'(insert_job): ([A-Za-z]*_*\S*)'
'(machine): ([A-Z]*\S*)'
)
# Create an ordered Hashtable to collect columns for one row.
$row = [ordered] @{}
# Loop over all occurrences of the patterns in input file
Select-String -Path $inputPath -Pattern $patterns -AllMatches | ForEach-Object {
# Extract key and value from current match
$key = $_.matches.Groups[ 1 ].Value
$value = $_.matches.Value
# Save one column of current row.
$row[ $key ] = $value
# If we have all columns of current row, output it as PSCustomObject.
if( $row.Count -eq $patterns.Count ) {
# Convert hashtable to PSCustomObject and output (implicitly)
[PSCustomObject] $row
# Clear Hashtable in preparation for next row.
$row.Clear()
}
} | Export-Csv $outputPath -NoTypeInformation
Output CSV:
"insert_job","machine"
"insert_job: AUTOSYS_DBMAINT","machine: PWISTASASYS01"
"insert_job: TEST_ENV","machine: PWISTASASYS01"
Remarks:
Using Select-String with parameter -Path we don't have to read the input file beforehand.
An ordered Hashtable (a dictionary) is used to collect all columns, until we have an entire row to output. This is the crucial step to produce multiple columns instead of outputting all data in a single column.
Converting the Hashtable to a PSCustomObject is necessary because Export-Csv expects objects, not dictionaries.
While the CSV looks like your "expected output" and you possibly have good reason to expect it like that, in a CSV file the values normally shouldn't repeat the column names. To remove the column names from the values, simply replace $value = $_.matches.Value with $value = $_.matches.Groups[ 2 ].Value, which results in an output like this:
"insert_job","machine"
"AUTOSYS_DBMAINT","PWISTASASYS01"
"TEST_ENV","PWISTASASYS01"
As for what you have tried:
Add-Content writes only plain text files from string input. While you could use it to create CSV files, you would have to add separators and escape strings all by yourself, which is easy to get wrong and more hassle than necessary. Export-Csv, on the other hand, takes objects as input and takes care of all the CSV format details automatically.
As zett42 mentioned, Add-Content is not the best fit for this. Since you are looking for multiple values separated by commas, Export-Csv is something you can use. Export-Csv will take objects from the pipeline, convert them to lines of comma-separated properties, add a header line, and save the result to a file.
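To illustrate the escaping that would otherwise have to be done by hand, here is a small sketch using ConvertTo-Csv, which produces the same text Export-Csv would write. The description value is invented for the example and deliberately contains a comma and double quotes:

```powershell
# Hypothetical row; the description contains characters that need CSV escaping.
$row = [ordered] @{
    insert_job  = 'TEST_ENV'
    description = 'output "env", verbose'
}

# Convert the dictionary to an object, then to CSV text lines.
$csv = [PSCustomObject] $row | ConvertTo-Csv -NoTypeInformation
$csv
```

The embedded quotes come out doubled and the whole value is wrapped in quotes, so the comma inside it is not mistaken for a column separator.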
I took a little bit of a different approach here with my solution. I've combined the different regex patterns into one which will give us one match that contains both the job and machine names.
$outputPath = "$PSScriptRoot\output.csv"
# one regex to match both job and machine in separate matching groups
$regex = '(?s)insert_job: (\w+).+?machine: (\w+)'
# Filter for input files
$inputfiles = Get-ChildItem -Path $PSScriptRoot -Filter input*.txt
# Loop through each file
$inputfiles |
ForEach-Object {
$path = $_.FullName
Get-Content -Raw -Path $path | Select-String -Pattern $regex -AllMatches |
ForEach-Object {
# Loop through each match found in the file.
# Should be 2, one for AUTOSYS_DBMAINT and another for TEST_ENV
$_.Matches | ForEach-Object {
# Create objects with the values we want that we can output to csv file
[PSCustomObject]@{
# remove next line if not needed in output
InputFile = $path
Job = $_.Groups[1].Value # 1st matching group contains job name
Machine = $_.Groups[2].Value # 2nd matching group contains machine name
}
}
}
} | Export-Csv $outputPath # Pipe our objects to Export-Csv
Contents of output.csv
"InputFile","Job","Machine"
"C:\temp\powershell\input1.txt","AUTOSYS_DBMAINT","PWISTASASYS01"
"C:\temp\powershell\input1.txt","TEST_ENV","PWISTATEST2"
"C:\temp\powershell\input2.txt","AUTOSYS_DBMAINT","PWISTASAPROD1"
"C:\temp\powershell\input2.txt","TEST_ENV","PWISTATTEST1"

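The combined pattern in the last answer relies on two details: (?s) lets . match line breaks, and the lazy .+? stops at the nearest following "machine:". A minimal sketch on a shortened, made-up input (job and host names are hypothetical):

```powershell
# Two job definitions in one multi-line string, shortened for illustration.
$text = "insert_job: JOB_A job_type: CMD`ncommand: set`nmachine: HOST1`n" +
        "insert_job: JOB_B job_type: CMD`ncommand: set`nmachine: HOST2`n"

# (?s) makes '.' match newlines; '.+?' is lazy, so each job pairs with its own machine.
$regex = '(?s)insert_job: (\w+).+?machine: (\w+)'

$pairs = [regex]::Matches($text, $regex) |
    ForEach-Object { '{0} -> {1}' -f $_.Groups[1].Value, $_.Groups[2].Value }

$pairs
```

With a greedy .+ instead, the first match would run to the last "machine:" in the text and the second job would be swallowed.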
How can I write the output of Select-String in powershell without the line numbers?

I am using powershell to filter a textfile using a regular expression. To do that I am using the following command:
Select-String -Pattern "^[0-9]{2}[A-Z]{2}[a-z]{5}" -CaseSensitive rockyou.txt > filter.txt
The issue, however, is that when the results are written to filter.txt, each matched string is preceded by the name of the original file and the line number, e.g.:
rockyou.txt:12345:abcdefg
rockyou.txt:12345:abcdefg
rockyou.txt:12345:abcdefg
How can I make it omit the file name and line numbers?
Select-String outputs an object per match, and each has a Line property containing the original line in which the match occurred. You can grab only the Line value, like so:
... |Select-String ... |Select-Object -ExpandProperty Line |Out-File filter.txt
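A self-contained sketch of the same idea, using a few in-memory strings in place of rockyou.txt (the sample values are made up):

```powershell
# Hypothetical sample lines standing in for the contents of rockyou.txt.
$lines = '01ABcdefg', 'password', '99ZZqwert1'

# Keep only the text of each matching line, dropping file name and line number.
$matched = $lines |
    Select-String -Pattern '^[0-9]{2}[A-Z]{2}[a-z]{5}' -CaseSensitive |
    Select-Object -ExpandProperty Line

$matched
```

Because -ExpandProperty returns plain strings, the result can be redirected to a file without any MatchInfo formatting.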
This way seems to work. Set-content saves the string version of the matchinfo object without any extra blank lines, as opposed to out-file or ">".
get-content rockyou.txt | select-string '^[0-9]{2}[A-Z]{2}[a-z]{5}' -ca |
set-content filter.txt
get-content filter.txt
01ABcdefg
It occurred to me you might still want the filename:
select-string '^[0-9]{2}[A-Z]{2}[a-z]{5}' rockyou.txt -ca |
% { $_.filename + ':' + $_.line } > filter.txt
cat filter.txt
rockyou.txt:01ABcdefg

Display Multiple pipeline Values Of Same Get-Content

I am trying to get the string value in csv file.
$path = "product.csv"
Get-Content $path | Select-String -AllMatches -Pattern "[^\x00-\x79]"
I can successfully grab the strings; however, I wish to display the line numbers followed by the string values.
Example Output:
LineNo String
1 a
2 b
3 c
I did successfully grab the line numbers using the command below. How should I combine it with the first command so that the output looks like the example output?
Get-Content $path | Select-String -AllMatches -Pattern "[^\x00-\x79]" | Select-Object LineNumber
If you want the entire line, select the Line property:
... |Select-Object LineNumber,Line
If you only want the part of the line that was matched by the pattern, you'll need a calculated property to grab the Value from the Matches property:
... |Select-Object LineNumber,@{Name='String';Expression={$_.Matches.Value}}
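A runnable sketch of the calculated-property approach, using in-memory strings and a simple ASCII pattern in place of product.csv and the pattern from the question (both substitutions are assumptions made to keep the example self-contained):

```powershell
# Hypothetical input lines; only lines 1 and 3 contain a match.
$rows = 'a', 'x', 'b' |
    Select-String -Pattern '[ab]' -AllMatches |
    Select-Object LineNumber, @{ Name = 'String'; Expression = { $_.Matches.Value } }

$rows | Format-Table -AutoSize
```

If a line contains several matches, the String column holds all of them as an array.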

.matches in powershell not returning any results

I'm working on a script that is based off of a cryptography script on blog.commandlinekungfu.com. Essentially I want to get the frequency of all of the letters in a text file. In the example, he uses Here-String to store the values, but I want to use Get-Content. Here's the breakdown.
This code works
PS C:\> $foobar = @"
foo
bar
"@
PS C:\> ($foobar | Select-String -AllMatches "[A-Z]").matches
It returns the appropriate values. However, if I have a text file that contains exactly the same information, I get a null value returned.
PS C:\> $text = Get-Content "foobar.txt"
PS C:\> ($text | Select-String -AllMatches "[A-Z]").matches
Returns nothing
Does anyone know why a Here-String works but not Get-Content?
The here-string is treated as one string, whereas with Get-Content you pipe a collection of strings, one per line. You can pipe the file content to the Out-String cmdlet:
(get-content file.txt | out-string | Select-String -AllMatches "[A-Z]").matches
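In PowerShell 3.0 and later, Get-Content also has a -Raw switch that returns the file as a single string, which may be an alternative to Out-String. A minimal sketch, assuming a temporary file with the same two lines as the here-string (the file name is made up):

```powershell
# Write the sample content to a temporary file (hypothetical name).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'foobar-sample.txt'
Set-Content -Path $path -Value 'foo', 'bar'

# -Raw reads the whole file as one string instead of one string per line.
$whole = Get-Content -Path $path -Raw
$found = ($whole | Select-String -AllMatches '[A-Z]').Matches

# Select-String is case-insensitive by default, so all six letters match.
$count = $found.Count
"Matched $count letters"

Remove-Item $path
```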