I am using Sphinx search to create indexes and search data in my PostgreSQL database.
I have two questions about it.
If I run the command
/usr/bin/indexer --config /etc/sphinxsearch/sphinx.conf --rotate --all
I get the following output from 'show tables;':
Index                  Type
dist_title_de          distributed
word_title_de          local
word_titlestemmed_de   local
rt_title_de            rt
But if I run the command
/usr/bin/indexer --config /etc/sphinxsearch/sphinx_another_conf_file.conf --rotate --all
Then I get the same output in the terminal, but I don't see the new indexes in 'show tables;'. It seems like the '--config' option of indexer is not working and the only file name that works properly is sphinx.conf. This is problematic, because if I want to reindex Sphinx I have to change the sphinx.conf file itself.
My second question: is it possible to 'add' a new index without deleting the old ones? Currently I use Sphinx like this (every day):
Get new data (datasource1, datasource2, ..., datasource8)
Run indexer --rotate --all (index data from all 8 datasources)
Search for some info in the indexes
Write it to the db
But now I want something like:
Get new data from datasource1
Index datasource1
Get new data from datasource2
Index datasource2 (without deleting the index for datasource1)
Search for something in the index for datasource1
....
Get new data from datasource8 (without deleting the existing indexes)
Index datasource8
etc.
By 'without deleting an index' I mean: currently, if I use the command from the top of this post, I 'lose' my existing indexes and only get the new ones (from sphinx.conf).
My sphinx.conf (only 1 datasource):
source src_title_de
{
type = pgsql
sql_host = #######
sql_user = #######
sql_pass = #######
sql_db = #######
sql_port = 3306 # optional, default is 3306
sql_query = \
SELECT id, group_id, (date_extraction::TIMESTAMP) AS date_extraction, title \
FROM sphinx_test
sql_ranged_throttle = 0
}
index word_title_de
{
source = src_title_de
path = /var/lib/sphinxsearch/data/word_title_de
docinfo = extern
dict = keywords
mlock = 0
morphology = none
stopwords = /var/lib/sphinxsearch/data/stopwords.txt
wordforms = /var/lib/sphinxsearch/data/wordforms_de.txt
min_word_len = 1
}
index word_titlestemmed_de : word_title_de
{
path = /var/lib/sphinxsearch/data/word_titlestemmed_de
morphology = stem_en
}
index dist_title_de
{
type = distributed
local = word_title_de
local = word_titlestemmed_de
agent = localhost:9313:remote1
agent = localhost:9314:remote2,remote3
agent_connect_timeout = 1000
agent_query_timeout = 3000
}
index rt_title_de
{
type = rt
path = /var/lib/sphinxsearch/data/rt_title_de
rt_field = title
rt_field = content
rt_attr_uint = gid
}
indexer
{
mem_limit = 128M
}
searchd
{
listen = 9312:sphinx
listen = 9306:mysql41
log = /var/log/sphinxsearch/searchd.log
query_log = /var/log/sphinxsearch/query.log
read_timeout = 5
client_timeout = 300
max_children = 30
persistent_connections_limit = 30
pid_file = /var/run/sphinxsearch/searchd.pid
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
mva_updates_pool = 1M
max_packet_size = 8M
max_filters = 256
max_filter_values = 4096
max_batch_queries = 32
workers = threads # for RT to work
}
My second file for the 8 datasources looks the same as the one above, with 'source src_title_de', 'index word_title_de', 'index word_titlestemmed_de' and 'index rt_title_de' copy-pasted (CTRL+C, CTRL+V) for the other countries, and with a different data table in 'sql_query'.
On your first question: the --config option only applies to that indexer run. I.e. the --all causes it to index (or try to index) all the plain indexes mentioned in that file.
... but when it sends the signal to reload (which is what --rotate does), searchd just reloads its CURRENT config file, NOT the one you told indexer about.
To get searchd to use a new config file you would have to stop searchd and start it again with the new config file.
So change sphinx.conf directly, rather than using a 'second' file.
Actually the second question has the same answer...
So change sphinx.conf directly, rather than using a 'second' file.
I.e. add your new index to sphinx.conf and use indexer to 'build' it. When indexer has finished, it will tell searchd to 'reload', which will cause searchd to load the new config file AND the new index just built.
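For illustration, here is a minimal sketch of that workflow in Python, assuming a newly added plain index hypothetically named word_title_fr, that sphinx.conf is the config searchd was started with, and that the pymysql package is installed to talk to the SphinxQL listener on port 9306 from the config above:
import subprocess
import pymysql

# Rebuild only the newly added index; --rotate makes the running searchd
# pick up the fresh index files once indexer finishes writing them.
subprocess.check_call([
    "/usr/bin/indexer",
    "--config", "/etc/sphinxsearch/sphinx.conf",
    "--rotate",
    "word_title_fr",  # hypothetical name of the index just added to sphinx.conf
])

# Verify that searchd now serves it (SphinxQL speaks the MySQL protocol).
conn = pymysql.connect(host="127.0.0.1", port=9306, user="")
with conn.cursor() as cur:
    cur.execute("SHOW TABLES")
    for index_name, index_type in cur.fetchall():
        print("%s  %s" % (index_name, index_type))
conn.close()
The indexes already defined in sphinx.conf keep being served while the new one is built, so nothing is lost by adding indexes this way.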
My system is
% uname -or
FreeBSD 11.0-RELEASE-p2
Sphinx version is
% searchd --help
Sphinx 2.2.11-id64-release (95ae9a6)
Sphinx configuration:
index content_rt_template : common_template
{
type = rt
rt_mem_limit = 128M # 128M only...
rt_field = text
rt_attr_string = text
rt_field = title
rt_attr_string = title
rt_field = url
rt_attr_string = url
rt_attr_bigint = item_id
rt_attr_uint = source_id
rt_attr_timestamp = published_date
rt_attr_timestamp = created_date
}
common {
lemmatizer_base = /path/to/sphinx/
}
indexer
{
mem_limit = 128M # 128M only...
}
index content_rt_from_20170501_to_20170601 : content_rt_template
{
path = /path/to/sphinx/data/2017/content_rt_from_20170501_to_20170601
}
index content_rt_from_20170601_to_20170701 : content_rt_template
{
path = /path/to/sphinx/data/2017/content_rt_from_20170601_to_20170701
}
index content_rt_from_20171201_to_20180101 : content_rt_template
{
path = /path/to/sphinx/data/2017/content_rt_from_20171201_to_20180101
}
index content2017
{
type = distributed
local = content_rt_from_20170501_to_20170601
local = content_rt_from_20170601_to_20170701
local = content_rt_from_20171201_to_20180101
}
searchd
{
listen = 127.0.0.1:9417
listen = 9317:mysql41
log = /path/to/sphinx/log/searchd_2017.log
query_log = /path/to/sphinx/log/query_2017.log
read_timeout = 60
max_children = 30
pid_file = /path/to/sphinx/pid/searchd2017.pid
seamless_rotate = 0
preopen_indexes = 0
unlink_old = 1
workers = threads # for RT to work
binlog_path = /path/to/sphinx/data/2017/
}
Used memory before starting Sphinx:
Mem[||||||||| 5.33G/40.0G]
Swp[||||||||||||||||||||||||||||||3.35G/4.00G]
Log on Sphinx start:
% ./start.sh
Sphinx 2.2.11-id64-release (95ae9a6)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/path/to/sphinx/conf/content2017.conf'...
listening on 127.0.0.1:9417
listening on all interfaces, port=9317
WARNING: index 'common_template': key 'path' not found - NOT SERVING
WARNING: index 'content_rt_template': path must be specified - NOT SERVING
WARNING: failed to init process shared rwlock: process shared rwlock is not supported by FreeBSD; ALTER disabled
precaching index 'content_rt_from_20170501_to_20170601'
WARNING: failed to init process shared rwlock: process shared rwlock is not supported by FreeBSD; ALTER disabled
precaching index 'content_rt_from_20170601_to_20170701'
WARNING: failed to init process shared rwlock: process shared rwlock is not supported by FreeBSD; ALTER disabled
precaching index 'content_rt_from_20171201_to_20180101'
precached 3 indexes in 6.520 sec
Sphinx 2.2.11-id64-release (95ae9a6)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/path/to/sphinx/conf/content_dist.conf'...
listening on 127.0.0.1:9312
listening on all interfaces, port=9306
WARNING: index 'common_template': key 'path' not found - NOT SERVING
WARNING: index 'content_rt_template': path must be specified - NOT SERVING
WARNING: failed to init process shared rwlock: process shared rwlock is not supported by FreeBSD; ALTER disabled
precaching index 'content_snippet'
precached 1 indexes in 0.064 sec
Used memory after Sphinx started:
Mem[||||||||||||||| 11.6G/40.0G]
Swp[||||||||||||||||||||||||||||||3.34G/4.00G]
Sphinx uses about 6G. But according to mem_limit and rt_mem_limit it should use no more than 128M * 3 = 384M.
What may be the reason for it using so much memory?
Maybe the reason is this warning?
WARNING: failed to init process shared rwlock: process shared rwlock
UPDATE:
I have tried it on Ubuntu 16.04 and the situation is the same.
rt_mem_limit only limits the size of the RAM chunk itself. Any disk chunks will use their own memory.
Typically it's the attributes that make up the biggest part, as by default they are all held in memory. You can cut memory usage down with http://sphinxsearch.com/docs/current.html#conf-ondisk-attrs
With the option
ondisk_attrs = pool
the RT index does not use much RAM.
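If it helps with diagnosing, here is a small sketch for checking how much memory each index actually holds, assuming the pymysql package and the SphinxQL listener on port 9317 from the content2017.conf above; SHOW INDEX ... STATUS reports per-index RAM and disk usage:
import pymysql

# SphinxQL speaks the MySQL protocol; 9317 is the mysql41 listener
# from the searchd section of content2017.conf above.
conn = pymysql.connect(host="127.0.0.1", port=9317, user="")
with conn.cursor() as cur:
    for index in ("content_rt_from_20170501_to_20170601",
                  "content_rt_from_20170601_to_20170701",
                  "content_rt_from_20171201_to_20180101"):
        cur.execute("SHOW INDEX %s STATUS" % index)
        print(index)
        for variable, value in cur.fetchall():
            print("  %s = %s" % (variable, value))
conn.close()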
I'm trying to figure out Heka configuration, and I made the following test config:
[hekad]
maxprocs = 4
[Dashboard]
type = "DashboardOutput"
address = ":4352"
ticker_interval = 15
[testfile]
type = "LogstreamerInput"
log_directory = "/tmp"
file_match = 'test\.log'
[filewriter]
type = "FileOutput"
path = "/tmp/output.log"
perm = "666"
message_matcher = "TRUE"
encoder = "PayloadEncoder"
[PayloadEncoder]
append_newlines = false
prefix_ts = true
ts_from_message = false
Next I write to the log with while true; do date >> /tmp/test.log; sleep 1; done and I would expect /tmp/output.log to be filled with the same info, yet regardless of whether test.log is written to or not, the output log gets filled with entries like:
[2016/Jan/05:11:01:51 +0200] [2016/Jan/05:11:02:06 +0200] {"encoders":[{"Name":"filewriter-PayloadEncoder"}],"globals":[{"InChanCapacity":{"representation":"count","value":100},"InChanLength":{"representation":"count","value":100},"Name":"inputRecycleChan"},{"InChanCapacity":{"representation":"count","value":100},"InChanLength":{"representation":"count","value":100},"Name":"injectRecycleChan"},{"InChanCapacity":{"representation":"count","value":30},"InChanLength":{"representation":"count","value":0},"Name":"Router","ProcessMessageCount":{"representation":"count","value":260}}],"inputs":[{"Name":"testfile","testfile-bytes":{"representation":"count","value":84419},"testfile-filename":{"representation":"","value":"/tmp/test.log"}}],"outputs":[{"InChanCapacity":{"representation":"count","value":30},"InChanLength":{"representation":"count","value":0},"LeakCount":{"representation":"count","value":0},"MatchAvgDuration":{"representation":"ns","value":1506},"MatchChanCapacity":{"representation":"count","value":30},"MatchChanLength":{"representation":"count","value":0},"Name":"Dashboard"},{"InChanCapacity":{"representation":"count","value":30},"InChanLength":{"representation":"count","value":0},"LeakCount":{"representation":"count","value":0},"MatchAvgDuration":{"representation":"ns","value":550},"MatchChanCapacity":{"representation":"count","value":30},"MatchChanLength":{"representation":"count","value":0},"Name":"filewriter"}],"splitters":[{"Name":"testfile-TokenSplitter-1"}]}
[2016/Jan/05:11:02:06 +0200] [2016/Jan/05:11:02:21 +0200] {"encoders":[{"Name":"filewriter-PayloadEncoder"}],"globals":[{"InChanCapacity":{"representation":"count","value":100},"InChanLength":{"representation":"count","value":100},"Name":"inputRecycleChan"},{"InChanCapacity":{"representation":"count","value":100},"InChanLength":{"representation":"count","value":100},"Name":"injectRecycleChan"},{"InChanCapacity":{"representation":"count","value":30},"InChanLength":{"representation":"count","value":0},"Name":"Router","ProcessMessageCount":{"representation":"count","value":262}}],"inputs":[{"Name":"testfile","testfile-bytes":{"representation":"count","value":84419},"testfile-filename":{"representation":"","value":"/tmp/test.log"}}],"outputs":[{"InChanCapacity":{"representation":"count","value":30},"InChanLength":{"representation":"count","value":0},"LeakCount":{"representation":"count","value":0},"MatchAvgDuration":{"representation":"ns","value":1506},"MatchChanCapacity":{"representation":"count","value":30},"MatchChanLength":{"representation":"count","value":0},"Name":"Dashboard"},{"InChanCapacity":{"representation":"count","value":30},"InChanLength":{"representation":"count","value":0},"LeakCount":{"representation":"count","value":0},"MatchAvgDuration":{"representation":"ns","value":550},"MatchChanCapacity":{"representation":"count","value":30},"MatchChanLength":{"representation":"count","value":0},"Name":"filewriter"}],"splitters":[{"Name":"testfile-TokenSplitter-1"}]}
What is this, why is it written, and how do I disable it?
Update:
I've removed ticker_interval from DashboardOutput, yet the problem persists.
Apparently DashboardOutput overrides ticker_interval from the default of 0 to 5, so to get rid of those entries, ticker_interval = 0 should be added to its config.
I am trying to run the indexer of my Sphinx server.
This is the command I use (with root access) to start the indexing:
indexer --all
When I use the command, this is the response I get:
Sphinx 2.1.9-id64-release (rel21-r4761)
Copyright (c) 2001-2014, Andrew Aksyonoff
Copyright (c) 2008-2014, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/etc/sphinxsearch/sphinx.conf'...
FATAL: no indexes found in config file '/etc/sphinxsearch/sphinx.conf'
This is the sphinx.conf file that is located in /etc/sphinxsearch/
#############################################################################
## indexer settings
#############################################################################
indexer
{
# memory limit, in bytes, kilobytes (16384K) or megabytes (256M)
# optional, default is 32M, max is 2047M, recommended is 256M to 1024M
mem_limit = 1024M
}
#############################################################################
## searchd settings
#############################################################################
searchd
{
listen = 127.0.0.1:9312
listen = 127.0.0.1:9306:mysql41
log = /var/log/sphinxsearch/searchd.log
query_log = /var/log/sphinxsearch/query.log
read_timeout = 5
client_timeout = 300
max_children = 30
pid_file = /var/log/sphinxsearch/searchd.pid
max_matches = 1000
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
mva_updates_pool = 1M
max_packet_size = 8M
max_filters = 256
max_filter_values = 4096
workers = threads # for RT to work
}
index myindex
{
type = rt
path = /var/www/vhosts/user/sphinx/myindex
rt_field = description
rt_field = searchcode
rt_field = weight
rt_field = productid
rt_attr_uint = stockproduct
rt_attr_uint = instock
charset_type = utf-8
min_infix_len = 3
enable_star = 1
expand_keywords = 1
dict = keywords
}
# --eof--
Can someone help me resolve this error?
FATAL: no indexes found in config file '/etc/sphinxsearch/sphinx.conf'
The indexer command only works on traditional disk ('plain') indexes, not real-time indexes.
Because indexer doesn't do anything with type=rt indexes, it doesn't 'see' them, hence your config file has no indexes to index.
I guess in an ideal world it would say 'no plain indexes found' or similar to clarify that it is ignoring RT indexes (the same way it ignores distributed ones).
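For what it's worth, RT indexes are populated over SphinxQL rather than with indexer. A minimal sketch, assuming the pymysql package, a running searchd with the mysql41 listener on 127.0.0.1:9306 from the config above, and made-up example values:
import pymysql

# Field and attribute names match the rt_field / rt_attr_uint lines
# in the 'myindex' definition above; the values are just examples.
conn = pymysql.connect(host="127.0.0.1", port=9306, user="", autocommit=True)
with conn.cursor() as cur:
    cur.execute(
        "INSERT INTO myindex "
        "(id, description, searchcode, weight, productid, stockproduct, instock) "
        "VALUES (%s, %s, %s, %s, %s, %s, %s)",
        (1, "example product description", "ABC-123", "250g", "12345", 1, 10),
    )
conn.close()
Searching then goes through the same SphinxQL connection, e.g. SELECT * FROM myindex WHERE MATCH('something').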
All you have to do is put the sphinx.conf file inside the bin folder, which means it will be inside this path, for example "etc/sphinxsearch/bin/sphinx.conf".
I'm getting the following error while trying a wildcard (*) enabled search in Sphinx 2.0.6:
index products: syntax error, unexpected $undefined near '*'
My search term is iphone 4s*
It's using the products index as defined below.
index users
{
enable_star = 1
docinfo = extern
morphology = stem_en
charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
ignore_chars = U+0021..U+002F,U+003A..U+003F,U+0060
charset_type = utf-8
html_strip = 0
source = gdgt_user
path = /var/lib/sphinxsearch/data/gdgt/users
min_infix_len = 3
min_word_len = 3
}
index products : users
{
enable_star = 1
min_infix_len = 1
min_word_len = 1
source = gdgt_products
path = /var/lib/sphinxsearch/data/gdgt/products
}
I am using the PHP API that can be found in the source tarball.
I am able to see the error when using the search CLI:
search -c app/config/sphinx.compiled.conf -i products -e "ipho*"
Sphinx 2.0.6-id64-release (r3473)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file 'app/config/sphinx.compiled.conf'...
index 'products': search error: .
My PHP code looks like this:
$client = new SphinxClient();
$client->SetServer($serverIp, $serverPort);
$client->SetMaxQueryTime(5000);
$client->SetSortMode(SPH_SORT_RELEVANCE);
$client->SetMatchMode(SPH_MATCH_EXTENDED);
$res = $client->query('ipho*', 'products');
var_dump($res, $client->getLastError(), $client->getLastWarning());
The issue is that the star (*) used for wildcards is also covered by your ignore_chars: it is U+002A, which falls inside the range U+0021..U+002F.
Update it to:
ignore_chars = U+0021..U+0029,U+002B..U+002F,U+003A..U+003F,U+0060
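As a quick sanity check of why the original setting ate the wildcard:
# '*' is U+002A; the original ignore_chars range U+0021..U+002F covered it,
# so the wildcard was stripped before the query reached the parser.
print(hex(ord('*')))             # 0x2a
print(0x21 <= ord('*') <= 0x2f)  # True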
I'm pretty happy with s3cmd, but there is one issue: how do I copy all files from one S3 bucket to another? Is it even possible?
EDIT: I've found a way to copy files between buckets using Python with boto:
from boto.s3.connection import S3Connection
import time

def copyBucket(srcBucketName, dstBucketName, maxKeys = 100):
    conn = S3Connection(awsAccessKey, awsSecretKey)
    srcBucket = conn.get_bucket(srcBucketName)
    dstBucket = conn.get_bucket(dstBucketName)
    resultMarker = ''
    while True:
        keys = srcBucket.get_all_keys(max_keys = maxKeys, marker = resultMarker)
        for k in keys:
            print 'Copying ' + k.key + ' from ' + srcBucketName + ' to ' + dstBucketName
            t0 = time.clock()
            dstBucket.copy_key(k.key, srcBucketName, k.key)
            print time.clock() - t0, ' seconds'
        if len(keys) < maxKeys:
            print 'Done'
            break
        resultMarker = keys[maxKeys - 1].key
Syncing is almost as straightforward as copying: ETag, size, and last-modified fields are available for each key.
Maybe this helps others as well.
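To illustrate the point about ETag, size, and last-modified, here is a minimal sketch of a per-key comparison before copying, using the same boto setup as above (the connection and bucket names are placeholders):
from boto.s3.connection import S3Connection

def syncKey(conn, srcBucketName, dstBucketName, keyName):
    # Copy a single key only if it is missing on the destination
    # or differs by size/ETag.
    srcKey = conn.get_bucket(srcBucketName).get_key(keyName)
    dstBucket = conn.get_bucket(dstBucketName)
    dstKey = dstBucket.get_key(keyName)
    if dstKey is None or dstKey.size != srcKey.size or dstKey.etag != srcKey.etag:
        print('Copying ' + keyName)
        dstBucket.copy_key(keyName, srcBucketName, keyName)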
s3cmd sync s3://from/this/bucket/ s3://to/this/bucket/
For available options, please use:
$s3cmd --help
AWS CLI seems to do the job perfectly, and has the bonus of being an officially supported tool.
aws s3 sync s3://mybucket s3://backup-mybucket
http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
The answer with the most upvotes as I write this is this one:
s3cmd sync s3://from/this/bucket s3://to/this/bucket
It's a useful answer. But sometimes sync is not what you need (it deletes files, etc.). It took me a long time to figure out this non-scripting alternative to simply copy multiple files between buckets. (OK, in the case shown below it's not between buckets. It's between not-really-folders, but it works between buckets equally well.)
# Slightly verbose, slightly unintuitive, very useful:
s3cmd cp --recursive --exclude=* --include=file_prefix* s3://semarchy-inc/source1/ s3://semarchy-inc/target/
Explanation of the above command:
--recursive: In my mind, my requirement is not recursive. I simply want multiple files. But recursive in this context just tells s3cmd cp to handle multiple files. Great.
--exclude: It's an odd way to think of the problem. Begin by recursively selecting all files. Next, exclude all files. Wait, what?
--include: Now we're talking. Indicate the file prefix (or suffix or whatever pattern) that you want to include.
s3://sourceBucket/ s3://targetBucket/: This part is intuitive enough. Though technically it seems to violate the documented example from s3cmd help, which indicates that a source object must be specified: s3cmd cp s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
You can also use the web interface to do so:
Go to the source bucket in the web interface.
Mark the files you want to copy (use shift and mouse clicks to mark several).
Press Actions->Copy.
Go to the destination bucket.
Press Actions->Paste.
That's it.
I needed to copy a very large bucket, so I adapted the code in the question into a multi-threaded version and put it up on GitHub.
https://github.com/paultuckey/s3-bucket-to-bucket-copy-py
It's actually possible. This worked for me:
import boto.s3.connection
import boto.s3.bucket

AWS_ACCESS_KEY = 'Your access key'
AWS_SECRET_KEY = 'Your secret key'
SRC_BUCKET_NAME = 'your-source-bucket'
DEST_BUCKET_NAME = 'your-destination-bucket'

conn = boto.s3.connection.S3Connection(AWS_ACCESS_KEY, AWS_SECRET_KEY)
bucket = boto.s3.bucket.Bucket(conn, SRC_BUCKET_NAME)

for item in bucket:
    # Note: here you can also put a path inside DEST_BUCKET_NAME,
    # if you want your item to be stored inside a folder, like this:
    # item.copy(DEST_BUCKET_NAME, '%s/%s' % (folder_name, item.key))
    item.copy(DEST_BUCKET_NAME, item.key)
Thanks - I use a slightly modified version, where I only copy files that don't exist or are a different size, and check on the destination if the key exists in the source. I found this a bit quicker for readying the test environment:
from boto.s3.connection import S3Connection

# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SRC_BUCKET and AWS_DEST_BUCKET
# are assumed to be defined elsewhere in the module.
def botoSyncPath(path):
    """
    Sync keys in specified path from source bucket to target bucket.
    """
    try:
        conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
        srcBucket = conn.get_bucket(AWS_SRC_BUCKET)
        destBucket = conn.get_bucket(AWS_DEST_BUCKET)

        # Copy keys that are missing on the destination or differ in size.
        for key in srcBucket.list(path):
            destKey = destBucket.get_key(key.name)
            if not destKey or destKey.size != key.size:
                key.copy(AWS_DEST_BUCKET, key.name)

        # Remove keys on the destination that no longer exist in the source.
        for key in destBucket.list(path):
            srcKey = srcBucket.get_key(key.name)
            if not srcKey:
                key.delete()
    except:
        return False
    return True
I wrote a script that backs up an S3 bucket: https://github.com/roseperrone/aws-backup-rake-task
#!/usr/bin/env python
from boto.s3.connection import S3Connection
import re
import datetime
import sys
import time

def main():
    s3_ID = sys.argv[1]
    s3_key = sys.argv[2]
    src_bucket_name = sys.argv[3]
    num_backup_buckets = sys.argv[4]
    connection = S3Connection(s3_ID, s3_key)
    delete_oldest_backup_buckets(connection, num_backup_buckets)
    backup(connection, src_bucket_name)

def delete_oldest_backup_buckets(connection, num_backup_buckets):
    """Deletes the oldest backup buckets such that only the newest NUM_BACKUP_BUCKETS - 1 buckets remain."""
    buckets = connection.get_all_buckets() # returns a list of bucket objects
    num_buckets = len(buckets)

    backup_bucket_names = []
    for bucket in buckets:
        if re.search('backup-' + r'\d{4}-\d{2}-\d{2}', bucket.name):
            backup_bucket_names.append(bucket.name)

    backup_bucket_names.sort(key=lambda x: datetime.datetime.strptime(x[len('backup-'):17], '%Y-%m-%d').date())

    # The buckets are sorted earliest to latest, so we want to keep the last NUM_BACKUP_BUCKETS - 1
    delete = len(backup_bucket_names) - (int(num_backup_buckets) - 1)
    if delete <= 0:
        return

    for i in range(0, delete):
        print 'Deleting the backup bucket, ' + backup_bucket_names[i]
        connection.delete_bucket(backup_bucket_names[i])

def backup(connection, src_bucket_name):
    now = datetime.datetime.now()
    # the month and day must be zero-filled
    new_backup_bucket_name = 'backup-' + str('%02d' % now.year) + '-' + str('%02d' % now.month) + '-' + str('%02d' % now.day)
    print "Creating new bucket " + new_backup_bucket_name
    new_backup_bucket = connection.create_bucket(new_backup_bucket_name)
    copy_bucket(src_bucket_name, new_backup_bucket_name, connection)

def copy_bucket(src_bucket_name, dst_bucket_name, connection, maximum_keys = 100):
    src_bucket = connection.get_bucket(src_bucket_name)
    dst_bucket = connection.get_bucket(dst_bucket_name)

    result_marker = ''
    while True:
        keys = src_bucket.get_all_keys(max_keys = maximum_keys, marker = result_marker)
        for k in keys:
            print 'Copying ' + k.key + ' from ' + src_bucket_name + ' to ' + dst_bucket_name
            t0 = time.clock()
            dst_bucket.copy_key(k.key, src_bucket_name, k.key)
            print time.clock() - t0, ' seconds'
        if len(keys) < maximum_keys:
            print 'Done backing up.'
            break
        result_marker = keys[maximum_keys - 1].key

if __name__ == '__main__':
    main()
I use this in a rake task (for a Rails app):
desc "Back up a file onto S3"
task :backup do
S3ID = "*****"
S3KEY = "*****"
SRCBUCKET = "primary-mzgd"
NUM_BACKUP_BUCKETS = 2
Dir.chdir("#{Rails.root}/lib/tasks")
system "./do_backup.py #{S3ID} #{S3KEY} #{SRCBUCKET} #{NUM_BACKUP_BUCKETS}"
end
mdahlman's code didn't work for me, but this command copies all the files in bucket1 to a new folder (the command also creates this new folder) in bucket2:
cp --recursive --include=file_prefix* s3://bucket1/ s3://bucket2/new_folder_name/
s3cmd won't cp with only prefixes or wildcards, but you can script the behavior with 's3cmd ls sourceBucket' plus awk to extract the object names, and then use 's3cmd cp sourceBucket/name destBucket' to copy each object name in the list.
I use these batch files in a DOS box on Windows:
s3list.bat
s3cmd ls %1 | gawk "/s3/{ print \"\\"\"\"substr($0,index($0,\"s3://\"))\"\\"\"\"; }"
s3copy.bat
#for /F "delims=" %%s in ('s3list %1') do #s3cmd cp %%s %2
You can also use s3funnel which uses multi-threading:
https://github.com/neelakanta/s3funnel
example (without the access key or secret key parameters shown):
s3funnel source-bucket-name list | s3funnel dest-bucket-name copy --source-bucket source-bucket-name --threads=10