I want to dump data from MongoDB within a specific time range.
Dumping one collection works fine:
mongodump --db VnTrader_Tick_ALL_1106 --collection au1712 --out tick_1106 --query "{ datetime: {$gte: new Date(1509973200000), $lt: new Date(1510038900000) }}"
I am wondering how I can dump all collections in my database using the same query.
Thanks a lot
I don't have an answer for this on Windows, but I ran into the same scenario and solved it on Linux with the help of xargs:
echo "show collections" | mongo <dbname> --quiet | grep -v "system.indexes" | xargs -I % mongodump --db <dbname> --collection % --query '<same query as above>'
I finally solved the problem with a naive implementation.
First, generate a dumpcommond.txt file with Python (plain string concatenation):
command_tick = ('mongodump --db ' + HOST_TICK_DB + ' --collection ' + contract +
                ' --out tick_remote_1 --query "{ datetime: {$gte: new Date(' + ds_time +
                '), $lt: new Date(' + de_time + ')}}"')
with open(MONGODB_BIN_PATH + '\\dumpcommond.txt', 'a') as myfile:
    myfile.write(command_tick + '\n')
Open a cmd window in mongo/bin, then copy and paste all the commands from dumpcommond.txt into the cmd window to dump all collections.
I want to make a batch file that reads a query from a .sql script in a directory and exports the results in .csv format. I need to connect to a Postgres server.
I'm trying to do this using this answer: https://stackoverflow.com/a/39049102/9631920.
My file:
#!/bin/bash
# sql_to_csv.sh
echo test1
CONN="psql -U my_user -d my_db -h host -port"
QUERY="$(sed 's/;//g;/^--/ d;s/--.*//g;' 'folder/folder/folder/file.sql' | tr '\n' ' ')"
echo test2
echo "$QUERY"
echo test3
echo "\\copy ($QUERY) to 'folder/folder/folder/file.csv' with csv header" | $CONN > /dev/null
echo query in progress
It shows me the script from the query, then test3, and then stops. What am I doing wrong?
Edit:
My file:
#!/bin/bash
PSQL = "psql -h 250.250.250.250 -p 5432 -U user -d test"
${PSQL} << OMG2
CREATE TEMP VIEW xyz AS
`cat C:\Users\test\Documents\my_query.sql`
;
\copy (select * from xyz) TO 'C:\Users\test\Documents\res.csv';
OMG2
But it doesn't ask for a password, and I don't get any result file.
A shell here-document will solve most of your quoting woes, and a temp view will solve the single-query-on-a-single-line problem.
Example (using a multi-line two-table JOIN):
#!/bin/bash
PSQL="psql -U www twitters"
${PSQL} << OMG
-- Some comment here
CREATE TEMP VIEW xyz AS
SELECT twp.name, twt.*
FROM tweeps twp
JOIN tweets twt
ON twt.user_id = twp.id
AND twt.in_reply_to_id > 3
WHERE 1=1
AND (False OR twp.screen_name ilike '%omg%' )
;
\copy (select * from xyz) TO 'omg.csv';
OMG
If you want the contents of an existing .sql file, you can cat it inside the here document, using a backtick-expansion:
#!/bin/bash
PSQL="psql -X -n -U www twitters"
${PSQL} << OMG2
-- Some comment here
CREATE TEMP VIEW xyz AS
-- ... more comment
-- cat the original file here
`cat /home/dir1/dir2/dir3/myscript.sql`
;
\copy (select * from xyz) TO 'omg.csv';
OMG2
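If you also need the header row that your original one-liner asked for, the same pattern works with the CSV options appended to the \copy line. A sketch using the placeholder paths and connection settings from the question (adjust user, database and host to your setup):
#!/bin/bash
PSQL="psql -X -n -U my_user -d my_db -h host"
${PSQL} << OMG3
CREATE TEMP VIEW xyz AS
-- pull in the query from the .sql file
`cat folder/folder/folder/file.sql`
;
\copy (select * from xyz) TO 'folder/folder/folder/file.csv' WITH CSV HEADER;
OMG3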
Please assist: how can we achieve this?
/home/bin/mongoexport --host pr01.prod.com:3046 --authenticationDatabase admin -u m7#pub.com -p 'orc1*&'
--readPreference secPref --db action --collection pubaction --type=csv -f "_id" --sort "{_id:-1}"
--noHeaderLine --limit 1 -o /opt/app/maxvalue_id_rt24.csv
2021-04-01T00:48:22.721-0400 exported 1 record
$ cat /opt/app/maxvalue_id_rt24.csv
ObjectId(60659ac)
$ cat lastvalue_id_rt24.csv
ObjectId(60654fe)
We put the last value into lastvalue_id_rt24.csv manually for the first run of the automated script.
Our query: db.getCollection('pubaction').find({"_id":{"$gt":ObjectId("60654fe") ,"$lte":ObjectId("60659ac")}})
My requirement here is to cut the value inside the parentheses from maxvalue and lastvalue, and pass x=60654fe and y=60659ac to the next script (see the sketch at the end of this question).
$/home/bin/mongoexport --host pr01.prod.com:3046
--authenticationDatabase admin -u m7#pub.com -p 'orc1*&'
--readPreference secPref --db action
--collection pubaction --type=csv -f "_id,eventId,eventName,timeStamp,recordPublishIndicator"
-q '{"_id":{"$gt":'$minvalue' ,"$lte":'$maxvalue'}}' -o /opt/app/act_id_rt24.csv
So inside the script, for -q '{"_id":{"$gt":ObjectId("x"),"$lte":ObjectId("y")}}', we need to hardcode ObjectId and pass in the x and y values.
For example, from the output above:
-q '{"_id":{"$gt":ObjectId("60654fe") ,"$lte":ObjectId("60659ac")}}'
In the next run of the script below, we must change the logic to put the old ID value in minvalue and the new ID value in maxvalue:
/home/bin/mongoexport --host pr01.prod.com:3046
--authenticationDatabase admin -u m7#pub.com -p 'orc1*&'
--readPreference secPref --db action
--collection pubaction --type=csv -f "_id" --sort "{_id:-1}"
--noHeaderLine --limit 1 -o /opt/app/maxvalue_id_rt24.csv
if [ $? -eq 0 ]
then
    maxvalue=`cat maxvalue_id_rt24.csv`
    echo $maxvalue
    minvalue=`cat lastvalue_id_rt24.csv`
and after sqlldr:
sqlldr report/fru1p control=act_id.ctl ERRORS=500 log=x.log direct=Y
We must change the logic below to put the old ID value in minvalue and the new ID value in maxvalue:
cat maxvalue_id_rt24.csv > lastvalue_id_rt24.csv
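A minimal sketch (bash) of the extraction-and-substitution step, assuming each CSV holds a single line of the form ObjectId(...) as shown above; whether -q accepts ObjectId(...) or requires the extended-JSON {"$oid": "..."} form depends on the mongoexport version, so treat the query syntax as an assumption:
#!/bin/bash
# strip the ObjectId( ... ) wrapper, keeping only the hex value inside the parentheses
maxvalue=$(sed -E 's/ObjectId\((.*)\)/\1/' /opt/app/maxvalue_id_rt24.csv)
minvalue=$(sed -E 's/ObjectId\((.*)\)/\1/' lastvalue_id_rt24.csv)
echo "minvalue=$minvalue maxvalue=$maxvalue"

# splice the extracted values into the query (legacy query parser shown; newer mongoexport
# versions may need {"$oid": "..."} instead of ObjectId("..."))
/home/bin/mongoexport --host pr01.prod.com:3046 \
  --authenticationDatabase admin -u m7#pub.com -p 'orc1*&' \
  --readPreference secPref --db action --collection pubaction \
  --type=csv -f "_id,eventId,eventName,timeStamp,recordPublishIndicator" \
  -q '{"_id":{"$gt":ObjectId("'"$minvalue"'"),"$lte":ObjectId("'"$maxvalue"'")}}' \
  -o /opt/app/act_id_rt24.csv

# after a successful export (and sqlldr load), roll the window forward:
# the old max value becomes the new "last" value for the next run
if [ $? -eq 0 ]; then
  cat /opt/app/maxvalue_id_rt24.csv > lastvalue_id_rt24.csv
fi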
I want to export the last n minutes of data from MongoDB using mongoexport, using the timestamp value embedded in the ObjectId.
Example document:
{
    "_id": ObjectId("5ec428cf85aff1e13c9058a4"),
    "key": "val"
}
Failed attempts
mongoexport -d data -c logs -o out.json -q "{_id:{$gt: new Date(ISODate().getTime() - 1000 * 60 * 18)}}"
mongoexport -d data -c logs -o out.json -q "{'_id.getTimestamp()':{$gt: new Date(ISODate().getTime() - 1000 * 60 * 18)}}"
mongoexport -d data -c logs -o out.json -q "{'this._id.getTimestamp()':{$gt: new Date(ISODate().getTime() - 1000 * 60 * 18)}}"
Any idea how to do this correctly?
I think that when passing the query as JSON you will not be able to pass the new Date() function to it; also, the document _id is of type ObjectId, so you need to convert the _id to an ISODate before comparing.
If you already have a valid date string, e.g. 2020-06-03T00:00:00.000Z, then you can query like this:
mongoexport -d data -c logs -o out.json -q '{"$expr": {"$gt":[{"$toDate":"$_id"}, {"$toDate": "2020-06-03T00:00:00.000Z"}]}}'
So far I have loaded all the parcel tables (with geometry information) in Alaska into PostgreSQL. The tables were originally stored in dump format. Now I want to convert each table in Postgres to a shapefile through the cmd interface using ogr2ogr.
My command is something like this:
ogr2ogr -f "ESRI Shapefile" "G:\...\Projects\Dataset\Parcel\test.shp" PG:"dbname=parceldb host=localhost port=5432 user=postgres password=postgres" -sql "SELECT * FROM ak_fairbanks"
However, the system kept returning this message: Unable to open datasource
PG:dbname='parceldb' host='localhost' port='5432' user='postgres' password='postgres'
With the following drivers.
There is also the pgsql2shp option. For this you need to have the utility installed on your system.
The command to use for this conversion is:
pgsql2shp -u <username> -h <hostname> -P <password> -p 5434 -f <file path to save shape file> <database> [<schema>.]<table_name>
This command has other options as well, which can be seen at this link.
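For example, with the database and table from the ogr2ogr question above (host, user, password and output path are assumptions to adapt):
pgsql2shp -u postgres -h localhost -P postgres -p 5432 -f /path/to/output/ak_fairbanks parceldb public.ak_fairbanks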
Exploring this case based on the comments in another answer, I decided to share my Bash scripts and my ideas.
Exporting multiple tables
To export many tables from a specific schema, I use the following script.
#!/bin/bash
source ./pgconfig
export PGPASSWORD=$password
# To filter by table name, set the table names in the FILTER variable below and remove the # to uncomment it.
# FILTER=("table_name_a" "table_name_b" "table_name_c")
#
# Set the output directory
OUTPUT_DATA="/tmp/pgsql2shp/$database"
#
#
# Remove the Shapefiles after ZIP
RM_SHP="yes"
# Define where pgsql2shp is and format the base command
PG_BIN="/usr/bin"
PG_CON="-d $database -U $user -h $host -p $port"
# creating output directory to put files
mkdir -p "$OUTPUT_DATA"
SQL_TABLES="select table_name from information_schema.tables where table_schema = '$schema'"
SQL_TABLES="$SQL_TABLES and table_type = 'BASE TABLE' and table_name != 'spatial_ref_sys';"
TABLES=($($PG_BIN/psql $PG_CON -t -c "$SQL_TABLES"))
export_shp(){
    SQL="$1"
    TB="$2"
    pgsql2shp -f "$OUTPUT_DATA/$TB" -h $host -p $port -u $user $database "$SQL"
    zip -j "$OUTPUT_DATA/$TB.zip" "$OUTPUT_DATA/$TB.shp" "$OUTPUT_DATA/$TB.shx" "$OUTPUT_DATA/$TB.prj" "$OUTPUT_DATA/$TB.dbf" "$OUTPUT_DATA/$TB.cpg"
}
for TABLE in ${TABLES[@]}
do
    DATA_QUERY="SELECT * FROM $schema.$TABLE"
    SHP_NAME="$TABLE"
    if [[ ${#FILTER[@]} -gt 0 ]]; then
        echo "Has filter by table name"
        if [[ " ${FILTER[@]} " =~ " ${TABLE} " ]]; then
            export_shp "$DATA_QUERY" "$SHP_NAME"
        fi
    else
        export_shp "$DATA_QUERY" "$SHP_NAME"
    fi
    # remove intermediate files
    if [[ "$RM_SHP" = "yes" ]]; then
        rm -f $OUTPUT_DATA/$SHP_NAME.{shp,shx,prj,dbf,cpg}
    fi
done
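A possible way to run it, assuming the script above is saved as export_tables.sh (a hypothetical name) next to the pgconfig file shown further below:
chmod +x export_tables.sh
./export_tables.sh
# one .zip per exported table ends up under OUTPUT_DATA
ls /tmp/pgsql2shp/my_database/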
Splitting data into multiple files
To avoid problems with large tables, where pgsql2shp does not write the data to the shapefile, we can split the data using a paging strategy. In Postgres we can use LIMIT, OFFSET and ORDER BY for paging.
This method assumes that your table has a primary key, which is used to sort the data in my example script.
#!/bin/bash
source ./pgconfig
export PGPASSWORD=$password
# To filter by table name, set the table names in the FILTER variable below and remove the # to uncomment it.
# FILTER=("table_name_a" "table_name_b" "table_name_c")
#
# Set the output directory
OUTPUT_DATA="/tmp/pgsql2shp/$database"
#
#
# Remove the Shapefiles after ZIP
RM_SHP="yes"
# Define where pgsql2shp is and format the base command
PG_BIN="/usr/bin"
PG_CON="-d $database -U $user -h $host -p $port"
# creating output directory to put files
mkdir -p "$OUTPUT_DATA"
SQL_TABLES="select table_name from information_schema.tables where table_schema = '$schema'"
SQL_TABLES="$SQL_TABLES and table_type = 'BASE TABLE' and table_name != 'spatial_ref_sys';"
TABLES=($($PG_BIN/psql $PG_CON -t -c "$SQL_TABLES"))
export_shp(){
    SQL="$1"
    TB="$2"
    pgsql2shp -f "$OUTPUT_DATA/$TB" -h $host -p $port -u $user $database "$SQL"
    zip -j "$OUTPUT_DATA/$TB.zip" "$OUTPUT_DATA/$TB.shp" "$OUTPUT_DATA/$TB.shx" "$OUTPUT_DATA/$TB.prj" "$OUTPUT_DATA/$TB.dbf" "$OUTPUT_DATA/$TB.cpg"
}
for TABLE in ${TABLES[@]}
do
    GET_PK="SELECT a.attname "
    GET_PK="${GET_PK}FROM pg_index i "
    GET_PK="${GET_PK}JOIN pg_attribute a ON a.attrelid = i.indrelid AND a.attnum = ANY(i.indkey) "
    GET_PK="${GET_PK}WHERE i.indrelid = '$schema.$TABLE'::regclass AND i.indisprimary"
    PK=($($PG_BIN/psql $PG_CON -t -c "$GET_PK"))
    MAX_ROWS=($($PG_BIN/psql $PG_CON -t -c "SELECT COUNT(*) FROM $schema.$TABLE"))
    LIMIT=10000
    OFFSET=0
    # base query
    DATA_QUERY="SELECT * FROM $schema.$TABLE"
    # continue until all data are fetched.
    while [ $OFFSET -le $MAX_ROWS ]
    do
        DATA_QUERY_P="$DATA_QUERY ORDER BY $PK OFFSET $OFFSET LIMIT $LIMIT"
        OFFSET=$(( OFFSET+LIMIT ))
        SHP_NAME="${TABLE}_${OFFSET}"
        if [[ ${#FILTER[@]} -gt 0 ]]; then
            echo "Has filter by table name"
            if [[ " ${FILTER[@]} " =~ " ${TABLE} " ]]; then
                export_shp "$DATA_QUERY_P" "$SHP_NAME"
            fi
        else
            export_shp "$DATA_QUERY_P" "$SHP_NAME"
        fi
        # remove intermediate files
        if [[ "$RM_SHP" = "yes" ]]; then
            rm -f $OUTPUT_DATA/$SHP_NAME.{shp,shx,prj,dbf,cpg}
        fi
    done
done
Common config file
Configuration file for PostgreSQL connection used in both examples (pgconfig):
user="postgres"
host="my_ip_or_hostname"
port="5432"
database="my_database"
schema="my_schema"
password="secret"
Another strategy is to choose GeoPackage as the output format, which supports larger file sizes than the shapefile format while remaining portable across operating systems and well supported by GIS software.
ogr2ogr -f GPKG output_file.gpkg PG:"host=my_ip_or_hostname user=postgres dbname=my_database password=secret" -sql "SELECT * FROM my_schema.my_table"
References:
Retrieve primary key columns - Postgres
LIMIT, OFFSET, ORDER BY and Pagination in PostgreSQL
ogr2ogr - GDAL
Hello, I am trying to migrate from MySQL to PostgreSQL.
I have an SQL file which queries some records, and I want to put them into Redis with a mass insert.
In MySQL it was working with the sample command below:
sudo mysql -h $DB_HOST -u $DB_USERNAME -p$DB_PASSWORD $DB_DATABASE --skip-column-names --raw < test.sql | redis-cli --pipe
I rewrote the test.sql file in PostgreSQL syntax:
SELECT
'*3\r\n' ||
'$' || length(redis_cmd::text) || '\r\n' || redis_cmd::text || '\r\n' ||
'$' || length(redis_key::text) || '\r\n' || redis_key::text || '\r\n' ||
'$' || length(sval::text) || '\r\n' || sval::text || '\r\n'
FROM (
SELECT
'SET' as redis_cmd,
'ddi_lemmas:' || id::text AS redis_key,
lemma AS sval
FROM ddi_lemmas
) AS t
and one line of its output looks like:
"*3\r\n$3\r\nSET\r\n$11\r\nddi_lemmas:1\r\n$22\r\nabil+abil+neg+aor+pagr\r\n"
But I couldn't find any example of piping directly from the command line like the MySQL command.
There are some examples that use two stages instead of a direct pipe (first write to a txt file and then load it into Redis):
sudo PGPASSWORD=$PASSWORD psql -U $USERNAME -h $HOSTNAME -d $DATABASE -f test.sql > data.txt
The command above works, but it includes column names, which I don't want.
I am trying to find a way to send the PostgreSQL output directly to Redis.
Could you help me, please?
Solution:
To insert RESP commands from a SQL file (with the help of @teppic):
echo -e "$(psql -U $USERNAME -h $HOSTNAME -d $DATABASE -AEt -f test.sql)" | redis-cli --pipe
From the psql man page, -t will "Turn off printing of column names and result row count footers, etc."
-A turns off alignment, and -q sets "quiet" mode.
It looks like you're outputting RESP commands, in which case you'll have to use the escaped string format to get the newline/carriage return pairs, e.g. E'*3\r\n' (note the E).
It might be simpler to pipe SET commands to redis-cli:
psql -At -c "SELECT 'SET ddi_lemmas:' || id :: TEXT || ' ' || lemma FROM ddi_lemmas" | redis-cli
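A quick sanity check after either import variant, using the key from the example output above:
redis-cli GET ddi_lemmas:1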