Convert PDF to CSV via PowerShell

I am trying to take data from a web form that is generated in PDF format and migrate it to CSV so I can use my limited PowerShell knowledge to automate VM builds. This is all done on Linux as well. This is what I have so far, and it seems to work, but I need to automate it all, so I need to use wildcards, as the file names will change.
pdftotext -nopgbrk './AndyTest-2 - Linux - Debian 10_13.pdf' test.txt
Import-Csv test.txt | Export-Csv test.csv

Cool - it sounds like you've already got 90+% of the solution.
Q: Why not just call your PS script with parameters?
EXAMPLE:
$PdfFile=$args[0]
$TxtFile=$args[1]
$CsvFile=$args[2]
pdftotext -nopgbrk $PdfFile $TxtFile
Import-Csv $TxtFile | Export-Csv $CsvFile
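Since the file names will change, you could also skip the parameters and loop over every PDF with a wildcard. This is only a minimal sketch; it assumes pdftotext is on the PATH and that each PDF converts to comma-separated text:
# Convert every PDF in the current directory, deriving the .txt/.csv names from the PDF name
Get-ChildItem -Path . -Filter *.pdf | ForEach-Object {
    $txt = [System.IO.Path]::ChangeExtension($_.FullName, 'txt')
    $csv = [System.IO.Path]::ChangeExtension($_.FullName, 'csv')
    pdftotext -nopgbrk $_.FullName $txt
    Import-Csv $txt | Export-Csv $csv -NoTypeInformation
}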
One possible "gotcha" is that apparently have spaces in your .Pdf filenames. One solution is to quote your variable, e.g.:
"$PdfFile"
Look here for more ideas:
https://social.technet.microsoft.com/Forums/en-US/3a527307-5bb1-40fa-94b3-9af0a3e181f3/
https://social.technet.microsoft.com/Forums/en-US/8e51b6f4-4adf-4253-8228-c410032209f7/
'Hope that helps!

Related

Powershell equivalent of Cat ./* ../path2/* > file.sql

I'm trying to translate a Linux command so it's easily usable by Windows users on a project, but I am not having any luck finding comparable commands in PowerShell.
I have two paths with some SQL and CSV files. What I need is this command:
cat ./* ../path/* > new_file.sql
This takes all content from all files in path1 and then all content from all files in path2 and writes it to a file.
I assumed I could do something similar in Powershell, but apparently the behaviour is wildly different.
What I have tried are:
cat ./*, ../path/* > new_file.sql
Get-Content ./*, ../path2/* | Out-File new_file.sql
They both do the same thing, which seems to... I'm not sure, take the entirety of path2/* for every file in path1? The output quickly balloons to tens of megabytes. The combined content of both directories is perhaps 40 kilobytes.
Anyone know? I cannot find a proper answer to this. Thanks!
EDIT: I think I figured out what the problem is. I guess I should've just used the actual paths in the example. The first path is ./*, and it seems like Get-Content keeps looping over the output file that Out-File creates. I have updated the title and examples to reflect this.
Enumerate the files as a separate step before concatenating their contents (this way Get-Content won't accidentally discover the new file halfway through):
$files = Get-ChildItem ./, ../path2/ -File
$files | Get-Content | Out-File newfile.txt
You can combine these statements in a single pipeline if you wish:
(Get-ChildItem ./, ../path2/ -File) | Get-Content | Out-File newfile.txt
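If you would rather keep the original wildcard one-liner, another sketch is to exclude the output file by name so it can never feed back into the read (assuming new_file.sql is the only name that can collide):
Get-Content ./*, ../path2/* -Exclude new_file.sql | Out-File new_file.sql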

powershell - read all .sql files in a folder and save them all into a single .sql file without changing line ends or line feeds

I manage database servers and often I have to apply scripts into different servers or databases.
Sometimes these scripts are all saved in a directory and need to be open and run in the target server\database.
As I have been looking at automating this task, I came across Run All PowerShell Scripts In A Directory and also How can I execute a set of .SQL files from within SSMS?, and that is exactly what I needed. However, I stumbled over a few issues:
I don't know the file names, so I can't hard-code them like this:
:setvar path "c:\Path_to_scripts\"
:r $(path)\file1.sql
:r $(path)\file2.sql
I tried to add all the .sql files into one big file, but when I copied from PowerShell into SQL, the lines got messed up in many of the procedures that had long lines:
cls
$Radhe = Get-Content 'D:\apply all scripts to SQLPRODUCTION\*.sql' -Raw
$Radhe.Count
$Radhe.LongLength
$Radhe
If I could read all the files in that specific folder and save them all into a single the_scripts_to_run.sql file, without changing the line endings, that would be perfect.
I don't need to use get-content or any command in particular, I just would like to get all my scripts into a big single script with everything in it, without changes.
How can I achieve that?
I even found Merge multiple SQL files into a single SQL file but I want to get it done via powershell.
This should work fine. I'm not sure what you mean by not needing to use Get-Content; you could use [System.IO.File]::ReadAllLines( ) or [System.IO.File]::ReadAllText( ), but this should work fine too. Try it and let me know if it works.
$path = "c:\Path_to_scripts"
$scripts = (Get-ChildItem "$path\*.sql" -Recurse -File).FullName
$merged = [System.Collections.Generic.List[string[]]]::new()
foreach ($script in $scripts) {
    $merged.Add((Get-Content $script))
}
$merged | Out-File "$path\mergedscripts.sql"
This is actually much simpler than the proposed solutions. Get-Content takes a list of paths and supports wildcards, so no loop is required.
$path = 'c:\temp\sql'
Set-Content -Path "$path\the_scripts_to_run.sql" -Value (Get-Content -Path "$path\*.sql" -Raw)
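If the order the scripts run in matters, it may be safer to sort the file list explicitly. Note, too, that the merged file itself matches *.sql, so a second run would pick it up unless you exclude it. A sketch (sorting by Name is an assumption; pick whatever key fits):
$path = 'c:\temp\sql'
$files = Get-ChildItem "$path\*.sql" -Exclude the_scripts_to_run.sql | Sort-Object Name
Set-Content -Path "$path\the_scripts_to_run.sql" -Value ($files | Get-Content -Raw)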
Looks like @Santiago and I had the same idea:
Get-ChildItem -Path "$path" -Filter "*.sql" | ForEach-Object -Process {
Get-Content $_.FullName | Out-File $Path\stuff.txt -Append utf8
}

How to display the file a match was found in using get-content and select-string one liner

I am attempting to search a directory of perl scripts and compile a list of all the other perl scripts executed from those files(intentionally trying to do this through Powershell). A simplistic dependency mapper, more or less.
With the below line of code I get output of every line where a reference to a perl file is found, but what I really need is same output AND the file in which each match was found.
Get-Content -Path "*.pl" | Select-String -Pattern '\w+\.pl' | foreach {Write-Host "$_"}
I have succeeded using some more complicated code, but I think I can simplify it and accomplish most of the work with a couple lines of code (the code above accomplishes half of that).
Running this on a Windows 10 machine, PowerShell v5.1.
I do things like this all the time. You don't need to use Get-Content (it emits bare strings, so the file name is already gone by the time Select-String sees the text); pipe the files themselves instead:
ls -r *.pl | Select-String \w+\.pl
file.pl:1:file2.pl
You don't need to use ls or Get-ChildItem either; Select-String can take a path parameter:
Select-String -Pattern '\w+\.pl' -Path *.pl
which shortens to this in the shell:
sls \w+\.pl *.pl
(if your regex is more complex it might need quotes around it).
For the foreach {write-host} part: you're writing a lot of code to turn useful objects back into less useful strings, and forcibly writing them to the host instead of the standard output stream. You can pick out the data you want with:
sls \w+\.pl *.pl | select filename, {$_.matches[0]}
which will keep them as objects with properties, but render by default as a table.
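To get all the way to the dependency-mapper goal, you can group the matches per file. This is only a sketch; the Script/Dependencies property names are invented for illustration:
Select-String -Pattern '\w+\.pl' -Path *.pl |
    Group-Object Filename |
    ForEach-Object {
        [pscustomobject]@{
            Script       = $_.Name
            Dependencies = $_.Group.Matches.Value | Sort-Object -Unique
        }
    }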

Wrapping Powershell script and files together?

I'm currently using PS2EXE to compile my powershell script into an executable, works very well indeed!
My problem is that this script relies on other files/folders. So instead of shipping these alongside the exe, I want these files 'wrapped' up into the exe along with the PS script. Running the exe would run the PS script, then extract these files/folders out of the exe...
Is this even possible?
Thanks for your help
A PowerShell script that requires external files can be made self-contained by embedding the data within it. The usual way is to convert the data to Base64 and save it as strings inside the PowerShell script. At runtime, recreate the files by decoding the Base64 data.
# First, let's encode the external file as Base64. Do this once.
$Content = Get-Content -Path c:\some.file -Encoding Byte
$Base64 = [Convert]::ToBase64String($Content)
$Base64 | Out-File c:\encoded.txt
# Create a new variable in your script that contains the contents of c:\encoded.txt, like so:
$Base64 = "ABC..."
# Finally, decode the data and create a temp file with the original contents. Delete the file on exit too.
$Content = [Convert]::FromBase64String($Base64)
Set-Content -Path $env:temp\some.file -Value $Content -Encoding Byte
The full sample code is available on a blog.
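One version caveat: on PowerShell 7+, -Encoding Byte no longer exists; the equivalent byte round-trip uses -AsByteStream:
# PowerShell 7+ equivalents of the two byte operations above
$Content = Get-Content -Path c:\some.file -AsByteStream
Set-Content -Path $env:temp\some.file -Value $Content -AsByteStream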

How do I concatenate two text files in PowerShell?

I am trying to replicate the functionality of the cat command in Unix.
I would like to avoid solutions where I explicitly read both files into variables, concatenate the variables together, and then write out the concatenated variable.
Simply use the Get-Content and Set-Content cmdlets:
Get-Content inputFile1.txt, inputFile2.txt | Set-Content joinedFile.txt
You can concatenate more than two files with this style, too.
If the source files are named similarly, you can use wildcards:
Get-Content inputFile*.txt | Set-Content joinedFile.txt
Note 1: PowerShell 5 and older versions allowed this to be done more concisely using the aliases cat and sc for Get-Content and Set-Content respectively. However, these aliases are problematic because cat is a system command in *nix systems, and sc is a system command in Windows systems - therefore using them is not recommended, and in fact sc is no longer even defined as of PowerShell Core (v7). The PowerShell team recommends against using aliases in general.
Note 2: Be careful with wildcards - if you try to output to inputFiles.txt (or similar that matches the pattern), PowerShell will get into an infinite loop! (I just tested this.)
Note 3: Outputting to a file with > does not preserve character encoding! This is why using Set-Content is recommended.
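If the output name really has to match the input wildcard, one defensive sketch (using the inputFiles.txt name from Note 2) is to exclude it from the read:
Get-Content inputFile*.txt -Exclude inputFiles.txt | Set-Content inputFiles.txt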
Do not use >; it messes up the character encoding. Use:
Get-Content files.* | Set-Content newfile.file
In cmd, you can do this:
copy one.txt+two.txt+three.txt four.txt
In PowerShell this would be:
cmd /c copy one.txt+two.txt+three.txt four.txt
While the PowerShell way would be to use gc, the above will be pretty fast, especially for large files. And it can be used on non-ASCII files too, using the /B switch.
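For reference, a binary-safe version of the above puts the /B switch before the source list:
cmd /c copy /B one.txt+two.txt+three.txt four.txt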
You could use the Add-Content cmdlet. Maybe it is a little faster than the other solutions, because it never reads the content of the first file.
gc .\file2.txt | Add-Content -Path .\file1.txt
To concat files in command prompt it would be
type file1.txt file2.txt file3.txt > files.txt
In PowerShell, type is an alias for Get-Content, which means you will get an error when using cmd's type syntax, because Get-Content requires a comma separating the files. The same command in PowerShell would be
Get-Content file1.txt,file2.txt,file3.txt | Set-Content files.txt
I used:
Get-Content c:\FileToAppend_*.log | Out-File -FilePath C:\DestinationFile.log -Encoding ASCII -Append
This appended fine. I added the ASCII encoding to remove the NUL characters Notepad++ was showing without an explicit encoding.
If you need to order the files by specific parameter (e.g. date time):
gci *.log | sort LastWriteTime | % { Get-Content $_ } | Set-Content result.log
You can do something like:
get-content input_file1 > output_file
get-content input_file2 >> output_file
Where > behaves like "out-file", and >> behaves like "out-file -append".
Since most of the other replies often get the formatting wrong (due to the piping), the safest thing to do is as follows:
add-content $YourMasterFile -value (get-content $SomeAdditionalFile)
I know you wanted to avoid reading the content of $SomeAdditionalFile into a variable, but in order to preserve, for example, your newline formatting, I do not think there is a proper way to do it without.
A workaround would be to loop through $SomeAdditionalFile line by line and pipe each line into $YourMasterFile. However, this is overly resource-intensive.
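For completeness, that line-by-line workaround would look something like this (correct, but slow on large files):
foreach ($line in Get-Content $SomeAdditionalFile) {
    Add-Content -Path $YourMasterFile -Value $line
}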
To keep encoding and line endings:
Get-Content files.* -Raw | Set-Content newfile.file -NoNewline
Note: AFAIR, these parameters aren't supported by older PowerShell versions (-Raw needs 3.0, -NoNewline needs 5.0).
I think the "powershell way" could be :
set-content destination.log -value (get-content c:\FileToAppend_*.log )