I have a very large (~250k row and 171 Column) Tab delimited text file that I need to edit. I need to add the letter "H" to the third column on every row.
So I need it to go from 03/20/2020 09:00 03/20/2020 10:00 1269805 ......
to 03/20/2020 09:00 03/20/2020 10:00 H1269805 .....
I actually have this working with the following code:
$source = Get-ChildItem "C:\test\input\*.txt"
$target = "C:\test\test.txt"
$data = Get-Content -Path $source | ConvertFrom-Csv -Delimiter "`t" -Header Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10, Column11, Column12, Column13, Column14, Column15, Column16, Column17, Column18, Column19, Column20,
Column21, Column22, Column23, Column24, Column25, Column26, Column27, Column28, Column29, Column30, Column31, Column32, Column33, Column34, Column35, Column36, Column37, Column38, Column39, Column40,
Column41, Column42, Column43, Column44, Column45, Column46, Column47, Column48, Column49, Column50, Column51, Column52, Column53, Column54, Column55, Column56, Column57, Column58, Column59, Column60,
Column61, Column62, Column63, Column64, Column65, Column66, Column67, Column68, Column69, Column70, Column71, Column72, Column73, Column74, Column75, Column76, Column77, Column78, Column79, Column80,
Column81, Column82, Column83, Column84, Column85, Column86, Column87, Column88, Column89, Column90, Column91, Column92, Column93, Column94, Column95, Column96, Column97, Column98, Column99, Column100,
Column101, Column102, Column103, Column104, Column105, Column106, Column107, Column108, Column109, Column110, Column111, Column112, Column113, Column114, Column115, Column116, Column117, Column118, Column119, Column120,
Column121, Column122, Column123, Column124, Column125, Column126, Column127, Column128, Column129, Column130, Column131, Column132, Column133, Column134, Column135, Column136, Column137, Column138, Column139, Column140,
Column141, Column142, Column143, Column144, Column145, Column146, Column147, Column148, Column149, Column150, Column151, Column152, Column153, Column154, Column155, Column156, Column157, Column158, Column159, Column160,
Column161, Column162, Column163, Column164, Column165, Column166, Column167, Column168, Column169, Column170, Column171
$data | % {
If ($_.Column3) {
#import ID
$_.Column3 = "H$($_.Column3)"
} }
$data | Select Column1, Column2, Column3, Column4, Column5, Column6, Column7, Column8, Column9, Column10, Column11, Column12, Column13, Column14, Column15, Column16, Column17, Column18, Column19, Column20,
Column21, Column22, Column23, Column24, Column25, Column26, Column27, Column28, Column29, Column30, Column31, Column32, Column33, Column34, Column35, Column36, Column37, Column38, Column39, Column40,
Column41, Column42, Column43, Column44, Column45, Column46, Column47, Column48, Column49, Column50, Column51, Column52, Column53, Column54, Column55, Column56, Column57, Column58, Column59, Column60,
Column61, Column62, Column63, Column64, Column65, Column66, Column67, Column68, Column69, Column70, Column71, Column72, Column73, Column74, Column75, Column76, Column77, Column78, Column79, Column80,
Column81, Column82, Column83, Column84, Column85, Column86, Column87, Column88, Column89, Column90, Column91, Column92, Column93, Column94, Column95, Column96, Column97, Column98, Column99, Column100,
Column101, Column102, Column103, Column104, Column105, Column106, Column107, Column108, Column109, Column110, Column111, Column112, Column113, Column114, Column115, Column116, Column117, Column118, Column119, Column120,
Column121, Column122, Column123, Column124, Column125, Column126, Column127, Column128, Column129, Column130, Column131, Column132, Column133, Column134, Column135, Column136, Column137, Column138, Column139, Column140,
Column141, Column142, Column143, Column144, Column145, Column146, Column147, Column148, Column149, Column150, Column151, Column152, Column153, Column154, Column155, Column156, Column157, Column158, Column159, Column160,
Column161, Column162, Column163, Column164, Column165, Column166, Column167, Column168, Column169, Column170, Column171 | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | % { $_ -replace '"', "" } | Select-Object -Skip 1 | Set-Content -Path $target
The problem I have is it takes a long time. I understand it is a large file, but is there any other way to do this faster? I feel like the converting to and from CSV is what is taking the longest, but I may be wrong. The whole process takes roughly 25 minutes to complete. Any help would be great.
To speed up processing, avoid the pipeline, use .NET types for file I/O and use plain-text operations:
# Create the output file.
$outFile = [IO.File]::CreateText($target)
# Loop over all input files.
foreach ($file in Get-ChildItem C:\test\input\*.txt) {
    # Loop over a given file's lines.
    foreach ($line in [IO.File]::ReadLines($file.FullName)) {
        # Prepend 'H' to the 3rd column and append to the output file.
        $outFile.WriteLine(($line -replace '^.*?\t.*?\t', '$&H'))
    }
}
$outFile.Close()
Note:
Be sure to always pass full file paths to .NET methods, because .NET's working directory usually differs from PowerShell's.
.NET file I/O methods default to BOM-less UTF-8 encoding.
The H is inserted in front of the 3rd tab-separated column using PowerShell's regex-based -replace operator.
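For comparison, the same plain-text, line-by-line approach can be sketched outside PowerShell. The following is a minimal Python sketch, assuming UTF-8 input and rows with at least three tab-separated columns; the function names prepend_to_third_column and convert_file are hypothetical and not part of the answer above.

```python
def prepend_to_third_column(line, prefix="H"):
    """Insert prefix in front of the third tab-separated field of one line."""
    # Split off only the first two fields; the rest of the row stays untouched.
    parts = line.split("\t", 2)
    if len(parts) < 3:
        # Fewer than three columns: leave the line as it is.
        return line
    parts[2] = prefix + parts[2]
    return "\t".join(parts)


def convert_file(source_path, target_path):
    """Stream source_path to target_path one line at a time."""
    # newline="" keeps the original line endings (e.g. CRLF) unchanged.
    with open(source_path, encoding="utf-8", newline="") as src, \
         open(target_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(prepend_to_third_column(line))
```

Like the regex above, this never parses the row into 171 fields; it only touches the text up to the second tab.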
I want to be able to run a program through command line and I want to start it with VbScript. I also want to get the output of the command line and assign it to a variable and I want all this to be done silently without cmd windows popping up. I have managed two things separately but not together. Here's what I got so far.
Run the command from cmd and get output:
Dim WshShell, oExec
Set WshShell = WScript.CreateObject("WScript.Shell")
Set oExec = WshShell.Exec("C:\snmpget -c public -v 2c 10.1.1.2 .1.3.6.1.4.1.6798.3.1.1.1.5.1")
x = oExec.StdOut.ReadLine
Wscript.Echo x
The above script works and does what I want except that cmd pops up for a brief moment.
Here's a script that will run silently but won't grab the output
Set WshShell = WScript.CreateObject("WScript.Shell")
Return = WshShell.Run("C:\snmpset -c public -v 2c -t 0 10.1.1.2 .1.3.6.1.4.1.6798.3.1.1.1.7.1 i 1", 0, true)
Is there a way to get these two to work together?
Let me give you some background on why I want to do this. I am basically polling a unit every 5-10 minutes, and I am going to get the script to email or throw a message box when a certain condition occurs, but I don't want to see the cmd window popping up all day long on my computer. Any suggestions?
Thanks
You can redirect output to a file and then read the file:
return = WshShell.Run("cmd /c C:\snmpset -c ... > c:\temp\output.txt", 0, true)
Set fso = CreateObject("Scripting.FileSystemObject")
Set file = fso.OpenTextFile("c:\temp\output.txt", 1)
text = file.ReadAll
file.Close
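For comparison, the overall goal (run a command with no visible window and get its output into a variable) can also be sketched in Python. This is only a sketch under stated assumptions: the helper name run_hidden is hypothetical, and it reads the output through a pipe rather than through a temporary file as the VBScript above does. subprocess.CREATE_NO_WINDOW exists only on Windows, so the sketch falls back to 0 (no special flags) elsewhere.

```python
import subprocess
import sys


def run_hidden(args):
    """Run a command without a console window and return (exit_code, output)."""
    # CREATE_NO_WINDOW is only defined on Windows; 0 means "no special flags".
    flags = getattr(subprocess, "CREATE_NO_WINDOW", 0)
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge error output into the captured text
        universal_newlines=True,   # decode bytes to str
        creationflags=flags,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in command; replace it with the real snmpget/snmpset call.
    code, text = run_hidden([sys.executable, "-c", "print('polled')"])
    print(code, text.strip())
```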
I have taken this and various other comments and created a bit more advanced function for running an application and getting the output.
Example to Call Function: Will output the DIR list of C:\ for Directories only. The output will be returned to the variable CommandResults as well as remain in C:\OUTPUT.TXT.
CommandResults = vFn_Sys_Run_CommandOutput("CMD.EXE /C DIR C:\ /AD",1,1,"C:\OUTPUT.TXT",0,1)
Function
Function vFn_Sys_Run_CommandOutput (Command, Wait, Show, OutToFile, DeleteOutput, NoQuotes)
'Run Command similar to the command prompt, for Wait use 1 or 0. Output returned and
'stored in a file.
'Command = The command line instruction you wish to run.
'Wait = 1/0; 1 will wait for the command to finish before continuing.
'Show = 1/0; 1 will show the command window.
'OutToFile = The file you wish to have the output recorded to.
'DeleteOutput = 1/0; 1 deletes the output file. Output is still returned to variable.
'NoQuotes = 1/0; 1 will skip wrapping the command with quotes; some commands won't work
' if you wrap them in quotes.
'----------------------------------------------------------------------------------------
On Error Resume Next
'On Error Goto 0
Set f_objShell = CreateObject("Wscript.Shell")
Set f_objFso = CreateObject("Scripting.FileSystemObject")
Const ForReading = 1, ForWriting = 2, ForAppending = 8
'VARIABLES
If OutToFile = "" Then OutToFile = "TEMP.TXT"
tCommand = Command
If Left(Command,1)<>"""" And NoQuotes <> 1 Then tCommand = """" & Command & """"
tOutToFile = OutToFile
If Left(OutToFile,1)<>"""" Then tOutToFile = """" & OutToFile & """"
If Wait = 1 Then tWait = True
If Wait <> 1 Then tWait = False
If Show = 1 Then tShow = 1
If Show <> 1 Then tShow = 0
'RUN PROGRAM
f_objShell.Run tCommand & ">" & tOutToFile, tShow, tWait
'READ OUTPUT FOR RETURN
Set f_objFile = f_objFso.OpenTextFile(OutToFile, 1)
tMyOutput = f_objFile.ReadAll
f_objFile.Close
Set f_objFile = Nothing
'DELETE FILE AND FINISH FUNCTION
If DeleteOutput = 1 Then
Set f_objFile = f_objFso.GetFile(OutToFile)
f_objFile.Delete
Set f_objFile = Nothing
End If
vFn_Sys_Run_CommandOutput = tMyOutput
If Err.Number <> 0 Then vFn_Sys_Run_CommandOutput = "<0>"
Err.Clear
On Error Goto 0
Set f_objFile = Nothing
Set f_objShell = Nothing
End Function
I am pretty new to all of this, but I found that if the script is started via CScript.exe (console scripting host) there is no window popping up on exec(): so when running:
cscript myscript.vbs //nologo
any .Exec() calls in myscript.vbs do not open an extra window, meaning that you can use the first variant of your original solution (using Exec).
(Note that the two forward slashes in the above code are intentional, see cscript /?)
Here I found a solution, which works for me:
set wso = CreateObject("Wscript.Shell")
set exe = wso.Exec("cmd /c dir /s /b d:\temp\*.jpg")
sout = exe.StdOut.ReadAll
Another option: have your first script assign the output to the clipboard, and then parse the clipboard value in the second script.
@Mark Cidade
Thanks Mark! This ended a few days of research into how to call this from PHP through WshShell. Thanks to your code, I came up with this:
function __exec($tmppath, $cmd)
{
$WshShell = new COM("WScript.Shell");
$tmpf = rand(1000, 9999).".tmp"; // Temp file
$tmpfp = $tmppath.'/'.$tmpf; // Full path to tmp file
$oExec = $WshShell->Run("cmd /c $cmd -c ... > ".$tmpfp, 0, true);
// return $oExec == 0 ? true : false; // Return True False after exec
return $tmpf;
}
This is what worked for me in my case. Feel free to use and modify as per your needs. You can always add functionality within the function to automatically read the tmp file, assign it to a variable and/or return it and then delete the tmp file.
Thanks again, @Mark!
Dim path As String = GetFolderPath(SpecialFolder.ApplicationData)
Dim filepath As String = path + "\" + "your.bat"
' Create the file if it does not exist.
If Not File.Exists(filepath) Then
' Dispose the returned FileStream so the new file is not left locked.
File.Create(filepath).Dispose()
End If
Dim attributes As FileAttributes
attributes = File.GetAttributes(filepath)
If (attributes And FileAttributes.ReadOnly) = FileAttributes.ReadOnly Then
' Remove from Readonly the file.
attributes = attributes And Not FileAttributes.ReadOnly
File.SetAttributes(filepath, attributes)
Console.WriteLine("The {0} file is no longer RO.", filepath)
Else
End If
If (attributes And FileAttributes.Hidden) = FileAttributes.Hidden Then
' Show the file.
attributes = attributes And Not FileAttributes.Hidden
File.SetAttributes(filepath, attributes)
Console.WriteLine("The {0} file is no longer Hidden.", filepath)
Else
End If
Dim sr As New StreamReader(filepath)
Dim input As String = sr.ReadToEnd()
sr.Close()
Dim output As String = "@echo off"
Dim output1 As String = vbNewLine + "your 1st cmd code"
Dim output2 As String = vbNewLine + "your 2nd cmd code "
Dim output3 As String = vbNewLine + "exit"
Dim sw As New StreamWriter(filepath)
sw.Write(output)
sw.Write(output1)
sw.Write(output2)
sw.Write(output3)
sw.Close()
If (attributes And FileAttributes.Hidden) = FileAttributes.Hidden Then
Else
' Hide the file.
File.SetAttributes(filepath, File.GetAttributes(filepath) Or FileAttributes.Hidden)
Console.WriteLine("The {0} file is now hidden.", filepath)
End If
Dim procInfo As New ProcessStartInfo(path + "\" + "your.bat")
procInfo.WindowStyle = ProcessWindowStyle.Minimized
procInfo.WindowStyle = ProcessWindowStyle.Hidden
procInfo.CreateNoWindow = True
procInfo.FileName = path + "\" + "your.bat"
procInfo.Verb = "runas"
Process.Start(procInfo)
I found this code recently after a lot of searching. Use it if you want to run a .bat file from VB or C#.
It saves your .bat file to the current user's AppData folder if it does not already exist, removes the read-only and hidden attributes, writes your cmd code to the file, sets the hidden attribute again, and then runs the file silently.
If you want to save all of the cmd output to a file, add a redirect to your command like this:
code > C:\Users\Lenovo\Desktop\output.txt
Just replace the word "code" with your .bat file code or command, followed by the path of the output file.
I'm pretty happy with s3cmd, but there is one issue: How to copy all files from one S3 bucket to another? Is it even possible?
EDIT: I've found a way to copy files between buckets using Python with boto:
import time
from boto.s3.connection import S3Connection

def copyBucket(srcBucketName, dstBucketName, maxKeys = 100):
    # awsAccessKey and awsSecretKey must be defined elsewhere.
    conn = S3Connection(awsAccessKey, awsSecretKey)
    srcBucket = conn.get_bucket(srcBucketName)
    dstBucket = conn.get_bucket(dstBucketName)
    resultMarker = ''
    while True:
        # Fetch the next page of keys, starting after the last key already seen.
        keys = srcBucket.get_all_keys(max_keys = maxKeys, marker = resultMarker)
        for k in keys:
            print('Copying ' + k.key + ' from ' + srcBucketName + ' to ' + dstBucketName)
            t0 = time.time()
            dstBucket.copy_key(k.key, srcBucketName, k.key)
            print('%s seconds' % (time.time() - t0))
        if len(keys) < maxKeys:
            print('Done')
            break
        resultMarker = keys[maxKeys - 1].key
Syncing is almost as straight forward as copying. There are fields for ETag, size, and last-modified available for keys.
Maybe this helps others as well.
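The sync idea mentioned above can be sketched without touching S3 at all. This is a minimal sketch, assuming each bucket listing has already been reduced to a dict that maps key name to an (etag, size) tuple; the function name plan_sync and that data shape are hypothetical and are not boto's API.

```python
def plan_sync(src, dst):
    """Return (to_copy, to_delete) lists of key names for a one-way sync.

    src and dst map key name -> (etag, size).
    """
    # Copy keys that are missing from dst or whose etag/size differ.
    to_copy = sorted(name for name, meta in src.items() if dst.get(name) != meta)
    # Delete keys that exist in dst but no longer exist in src.
    to_delete = sorted(name for name in dst if name not in src)
    return to_copy, to_delete


if __name__ == "__main__":
    source = {"a.txt": ("e1", 10), "b.txt": ("e2", 20)}
    target = {"b.txt": ("old", 20), "c.txt": ("e3", 5)}
    print(plan_sync(source, target))
```

Each name in to_copy would then be passed to a copy call such as the copy_key call in the code above, and each name in to_delete would be removed from the destination.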
s3cmd sync s3://from/this/bucket/ s3://to/this/bucket/
For available options, please use:
$ s3cmd --help
AWS CLI seems to do the job perfectly, and has the bonus of being an officially supported tool.
aws s3 sync s3://mybucket s3://backup-mybucket
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
The answer with the most upvotes as I write this is this one:
s3cmd sync s3://from/this/bucket s3://to/this/bucket
It's a useful answer. But sometimes sync is not what you need (it deletes files, etc.). It took me a long time to figure out this non-scripting alternative to simply copy multiple files between buckets. (OK, in the case shown below it's not between buckets. It's between not-really-folders, but it works between buckets equally well.)
# Slightly verbose, slightly unintuitive, very useful:
s3cmd cp --recursive --exclude=* --include=file_prefix* s3://semarchy-inc/source1/ s3://semarchy-inc/target/
Explanation of the above command:
--recursive: In my mind, my requirement is not recursive. I simply want multiple files. But recursive in this context just tells s3cmd cp to handle multiple files. Great.
--exclude: It's an odd way to think of the problem. Begin by recursively selecting all files. Next, exclude all files. Wait, what?
--include: Now we're talking. Indicate the file prefix (or suffix or whatever pattern) that you want to include.
s3://sourceBucket/ s3://targetBucket/: This part is intuitive enough. Though technically it seems to violate the documented example from s3cmd help, which indicates that a source object must be specified:
s3cmd cp s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
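The exclude-then-include selection described above can be modeled in a few lines. This is a rough Python sketch of the rule as this answer describes it (everything is excluded, then matching names are re-included); it uses the standard fnmatch module and is not s3cmd's actual implementation. The function name select_keys is hypothetical.

```python
from fnmatch import fnmatchcase


def select_keys(names, exclude="*", include="file_prefix*"):
    """Return the names that survive an exclude pattern followed by an include pattern."""
    selected = []
    for name in names:
        excluded = fnmatchcase(name, exclude)
        included = fnmatchcase(name, include)
        # A name is kept if it was never excluded, or if include brings it back.
        if not excluded or included:
            selected.append(name)
    return selected


if __name__ == "__main__":
    print(select_keys(["file_prefix_1.csv", "other.csv", "file_prefix_2.csv"]))
```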
You can also use the web interface to do so:
Go to the source bucket in the web interface.
Mark the files you want to copy (use shift and mouse clicks to mark several).
Press Actions->Copy.
Go to the destination bucket.
Press Actions->Paste.
That's it.
I needed to copy a very large bucket so I adapted the code in the question into a multi threaded version and put it up on GitHub.
https://github.com/paultuckey/s3-bucket-to-bucket-copy-py
It's actually possible. This worked for me:
import boto.s3.connection
import boto.s3.bucket
AWS_ACCESS_KEY = 'Your access key'
AWS_SECRET_KEY = 'Your secret key'
conn = boto.s3.connection.S3Connection(AWS_ACCESS_KEY, AWS_SECRET_KEY)
bucket = boto.s3.bucket.Bucket(conn, SRC_BUCKET_NAME)
for item in bucket:
    # Note: here you can put also a path inside the DEST_BUCKET_NAME,
    # if you want your item to be stored inside a folder, like this:
    # item.copy(DEST_BUCKET_NAME, '%s/%s' % (folder_name, item.key))
    item.copy(DEST_BUCKET_NAME, item.key)
Thanks - I use a slightly modified version, where I only copy files that don't exist or are a different size, and check on the destination if the key exists in the source. I found this a bit quicker for readying the test environment:
def botoSyncPath(path):
    """
    Sync keys in specified path from source bucket to target bucket.
    """
    try:
        conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
        srcBucket = conn.get_bucket(AWS_SRC_BUCKET)
        destBucket = conn.get_bucket(AWS_DEST_BUCKET)
        for key in srcBucket.list(path):
            destKey = destBucket.get_key(key.name)
            if not destKey or destKey.size != key.size:
                key.copy(AWS_DEST_BUCKET, key.name)
        for key in destBucket.list(path):
            srcKey = srcBucket.get_key(key.name)
            if not srcKey:
                key.delete()
    except:
        return False
    return True
I wrote a script that backs up an S3 bucket: https://github.com/roseperrone/aws-backup-rake-task
#!/usr/bin/env python
from boto.s3.connection import S3Connection
import re
import datetime
import sys
import time

def main():
    s3_ID = sys.argv[1]
    s3_key = sys.argv[2]
    src_bucket_name = sys.argv[3]
    num_backup_buckets = sys.argv[4]
    connection = S3Connection(s3_ID, s3_key)
    delete_oldest_backup_buckets(connection, num_backup_buckets)
    backup(connection, src_bucket_name)

def delete_oldest_backup_buckets(connection, num_backup_buckets):
    """Deletes the oldest backup buckets such that only the newest NUM_BACKUP_BUCKETS - 1 buckets remain."""
    buckets = connection.get_all_buckets() # returns a list of bucket objects
    num_buckets = len(buckets)
    backup_bucket_names = []
    for bucket in buckets:
        if (re.search('backup-' + r'\d{4}-\d{2}-\d{2}' , bucket.name)):
            backup_bucket_names.append(bucket.name)
    backup_bucket_names.sort(key=lambda x: datetime.datetime.strptime(x[len('backup-'):17], '%Y-%m-%d').date())
    # The buckets are sorted earliest to latest, so we delete from the front and keep the last NUM_BACKUP_BUCKETS - 1
    delete = len(backup_bucket_names) - (int(num_backup_buckets) - 1)
    if delete <= 0:
        return
    for i in range(0, delete):
        print('Deleting the backup bucket, ' + backup_bucket_names[i])
        connection.delete_bucket(backup_bucket_names[i])

def backup(connection, src_bucket_name):
    now = datetime.datetime.now()
    # the month and day must be zero-filled
    new_backup_bucket_name = 'backup-' + now.strftime('%Y-%m-%d')
    print('Creating new bucket ' + new_backup_bucket_name)
    new_backup_bucket = connection.create_bucket(new_backup_bucket_name)
    copy_bucket(src_bucket_name, new_backup_bucket_name, connection)

def copy_bucket(src_bucket_name, dst_bucket_name, connection, maximum_keys = 100):
    src_bucket = connection.get_bucket(src_bucket_name)
    dst_bucket = connection.get_bucket(dst_bucket_name)
    result_marker = ''
    while True:
        keys = src_bucket.get_all_keys(max_keys = maximum_keys, marker = result_marker)
        for k in keys:
            print('Copying ' + k.key + ' from ' + src_bucket_name + ' to ' + dst_bucket_name)
            t0 = time.time()
            dst_bucket.copy_key(k.key, src_bucket_name, k.key)
            print('%s seconds' % (time.time() - t0))
        if len(keys) < maximum_keys:
            print('Done backing up.')
            break
        result_marker = keys[maximum_keys - 1].key

if __name__ == '__main__':
    main()
I use this in a rake task (for a Rails app):
desc "Back up a file onto S3"
task :backup do
  S3ID = "*****"
  S3KEY = "*****"
  SRCBUCKET = "primary-mzgd"
  NUM_BACKUP_BUCKETS = 2
  Dir.chdir("#{Rails.root}/lib/tasks")
  system "./do_backup.py #{S3ID} #{S3KEY} #{SRCBUCKET} #{NUM_BACKUP_BUCKETS}"
end
mdahlman's code didn't work for me but this command copies all the files in the bucket1 to a new folder (command also creates this new folder) in bucket 2.
s3cmd cp --recursive --include=file_prefix* s3://bucket1/ s3://bucket2/new_folder_name/
s3cmd won't cp with only prefixes or wildcards but you can script the behavior with 's3cmd ls sourceBucket', and awk to extract the object name. Then use 's3cmd cp sourceBucket/name destBucket' to copy each object name in the list.
I use these batch files in a DOS box on Windows:
s3list.bat
s3cmd ls %1 | gawk "/s3/{ print \"\\"\"\"substr($0,index($0,\"s3://\"))\"\\"\"\"; }"
s3copy.bat
@for /F "delims=" %%s in ('s3list %1') do @s3cmd cp %%s %2
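The list-then-copy idea behind these batch files can also be sketched in Python. This is a minimal sketch, assuming each line of the listing contains the object URL starting at "s3://" (the same assumption the gawk filter makes); the function names extract_s3_urls and build_copy_commands are hypothetical, and the sketch only builds the command lines rather than running them.

```python
def extract_s3_urls(ls_output):
    """Pull the s3:// URL out of every listing line that contains one."""
    urls = []
    for line in ls_output.splitlines():
        start = line.find("s3://")
        if start != -1:
            # Keep everything from "s3://" to the end, so names with spaces survive.
            urls.append(line[start:].rstrip())
    return urls


def build_copy_commands(ls_output, destination):
    """Return one 's3cmd cp SOURCE DESTINATION' argument list per listed object."""
    return [["s3cmd", "cp", url, destination] for url in extract_s3_urls(ls_output)]


if __name__ == "__main__":
    listing = "2013-01-01 12:00   1234   s3://source/a file.txt\n"
    for command in build_copy_commands(listing, "s3://dest/"):
        print(command)
```

Passing each argument list to subprocess.run would avoid the quoting that the batch version has to do by hand.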
You can also use s3funnel which uses multi-threading:
https://github.com/neelakanta/s3funnel
example (without the access key or secret key parameters shown):
s3funnel source-bucket-name list | s3funnel dest-bucket-name copy --source-bucket source-bucket-name --threads=10