Importing planet OSM using osm2pgsql Issues - openstreetmap

I used osm2pgsql to import the entire planet OSM file into my PostGIS database. It took a whole week and finally finished over the weekend when I was not at work. The server also somehow restarted during the weekend, so I did not know whether the import was successful. From my previous experience with osm2pgsql, it would roll back if it encountered errors during the import. It did not roll back this time, so I assumed the import was successful. However, when I inspected the indexes of planet_osm_line, planet_osm_roads, and planet_osm_polygon, I only saw one index on each table instead of two: the planet_osm_*_index was missing. I am not sure what happened. I would greatly appreciate it if someone could tell me what is going on and whether there is any way to fix the problem.
Note: the reason I know the index is missing is that I had previously imported a small region of OSM data successfully and saw two indexes, "*_index" and "*_pkey". The command I used for the import was:
osm2pgsql -s -m -d planet-osm -E 3857 -S default.style planet-osm-*.bz2
Many thanks in advance.
Regards,
Tam
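For reference, the missing geometry indexes can usually be recreated by hand once the data itself has loaded, since osm2pgsql normally builds them as the final step of the import. A minimal sketch, assuming the default geometry column way, the old *_index naming shown above, and the database name from the command (adjust names to match your setup):

# recreate the GiST geometry indexes osm2pgsql would normally build at the end of the import
psql -d planet-osm -c "CREATE INDEX planet_osm_line_index ON planet_osm_line USING gist (way);"
psql -d planet-osm -c "CREATE INDEX planet_osm_roads_index ON planet_osm_roads USING gist (way);"
psql -d planet-osm -c "CREATE INDEX planet_osm_polygon_index ON planet_osm_polygon USING gist (way);"
psql -d planet-osm -c "ANALYZE;"

Building the indexes this way still takes a while on planet-sized tables, but it is typically much faster than re-running the whole import. Note that a missing index does not by itself prove the underlying data is complete.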

Related

Dspace 7.2 - import of data from 5.8 has taken over a month

We inherited an old system (5.8), running on an EOL box, that has been having real issues since Log4j and some necessary network changes. We finally got permission to move to a new version, so we are importing our Library collection into 7.2. We have 36 GB in our assetstore.
/opt/dspace/bin/dspace packager -r -a -k -t AIP -o skipIfParentMissing=true -e <ADMIN_EMAIL> -i /0 /opt/AIP/fac_site.zip
Current Meta-data Entry Count is 1530880
However, this process has been running for about seven weeks now!
Is this normal?
Is there any way we can see how much longer it will take? (management is nervous, understandably, as the current live version is very fragile)
Is there any way to expedite this?
Thanks very much for any assistance that can be offered.
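One rough way to gauge progress (a sketch, not an official DSpace tool): the running metadata entry count the packager reports appears to track rows in the metadatavalue table, so you can poll that table in PostgreSQL and see how quickly it is growing. This assumes the default database name dspace and the standard DSpace schema; adjust to your installation.

# count metadata rows now, wait a while, count again, and compare
psql -d dspace -c "SELECT count(*) FROM metadatavalue;"

Comparing two counts taken some time apart gives a crude rows-per-hour rate you can report to management, even though it cannot predict the total runtime precisely.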

Source of the "unexpected keyword argument 'fetch'" error in pandas to_sql?

I am trying to upload a dataframe to a Heroku PostgreSQL server, which I have done successfully several times before.
Here is my code, where for_db is the name of my Pandas dataframe:
from sqlalchemy import create_engine

engine = create_engine('postgresql://wgam{rest of url}', echo=False)

# write the dataframe to a table on the SQL server
for_db.to_sql('phil_nlp',
              con=engine,
              if_exists='replace')
At first, it was not able to connect because the server URL Heroku gave me had only 'postgres' at the beginning, but I understand it has to be changed to 'postgresql' to work properly, and I have gotten past that initial error.
Now I am getting a new error.
/usr/local/lib/python3.7/dist-packages/sqlalchemy/dialects/postgresql/psycopg2.py in do_executemany(self, cursor, statement, parameters, context)
899 template=executemany_values,
900 fetch=bool(context.compiled.returning),
--> 901 **kwargs
902 )
903
TypeError: execute_values() got an unexpected keyword argument 'fetch'
I don't understand why this would come up. Obviously I never specified such a keyword argument. I've done a lot of searching without any good results. Does anyone know why it would now throw this error in code that was working just last week?
I ran into the same issue running the DataFrame.to_sql method. Adding method='multi' does get it working and is a good workaround.
Investigating it a bit further, it turned out to be an issue with the versions of sqlalchemy and psycopg2 that I had installed. These GitHub issues here and here led me to the following.
The fetch parameter was added in psycopg2 version 2.8. I had version 2.7 and sqlalchemy 1.4.15.
Installing a newer version fixed the problem without the need to add the method='multi' parameter.
pip install psycopg2-binary==2.8.6
Hope this helps anyone else finding this issue.
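If you want to confirm which versions you have before upgrading, a quick check (a sketch; it simply prints whatever is installed in the current environment):

python -c "import psycopg2, sqlalchemy; print(psycopg2.__version__, sqlalchemy.__version__)"

If psycopg2 reports something older than 2.8 alongside SQLAlchemy 1.4.x, the upgrade above should resolve the error.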
I was able to fix this by adding 'multi' as the method parameter:
for_db.to_sql('phil_nlp',
              con=engine,
              if_exists='replace',
              index=False,
              method='multi')
Still not sure what caused the error, but I guess that's the problem fixed :)
worked for me: pip install psycopg2-binary==2.8.6

missing chunk number 0 for toast value 37946637 in pg_toast_2619

Main Issue:
Getting "ERROR: missing chunk number 0 for toast value 37946637 in pg_toast_2619" while selecting from tables.
Steps that led to the issue:
- Used pg_basebackup from a Primary db and tried to restore it onto a Dev host.
- Did a pg_resetxlog -f /${datadir} and started up the Dev db.
- After starting up the Dev db, when I query a varchar column, I keep getting:
psql> select text_col_name from big_table;
ERROR: missing chunk number 0 for toast value 37946637 in pg_toast_2619
This seems to be happening for most varchar columns in the restored db.
Has anyone else seen it?
Does anyone have ideas of why it happens and how to fix it?
pg_resetxlog is a bit of a last-resort utility which you should prefer not to use. The easiest way to make a fully working backup is to use pg_basebackup with the -X s option (that is an uppercase X). With this option, pg_basebackup opens two connections: one to copy all the data files and one to receive all of the WAL that is written during the backup. This way you cannot run into the problem that parts of the WAL you need have already been deleted.
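A sketch of what that looks like on the command line (the host, user, and target directory are placeholders; -X stream is the long form of -X s, and -P just shows progress):

pg_basebackup -h primary.example.com -U replication_user -D /var/lib/postgresql/backup -X stream -P

The resulting directory contains both the data files and the WAL needed to reach a consistent state, so no pg_resetxlog is needed when starting the restored cluster.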
I tried a few things since my original question. I can confirm that the source of my error "ERROR: missing chunk number 0 for toast value 37946637 in pg_toast_2619" was doing a pg_resetxlog during the restore process.
I re-did the restore today, but this time I applied the pg_xlog files from the Primary using recovery.conf. The restored db started up fine and all queries are running as expected.
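For anyone reproducing this, the recovery.conf approach (pre-PostgreSQL 12) boils down to pointing restore_command at wherever the primary's WAL files were copied; the path below is a placeholder:

# in the restored data directory, before starting the Dev db
cat > "$PGDATA/recovery.conf" <<'EOF'
restore_command = 'cp /path/to/copied_pg_xlog/%f "%p"'
EOF

PostgreSQL then replays that WAL on startup and reaches a consistent point, instead of being left with the missing TOAST chunks that pg_resetxlog produced.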

mongorestore for a collection results in "Killed" output and collection isn't fully restored

I typed the following:
root#:/home/deploy# mongorestore --db=dbname --collection=collectionname pathtobackupfolder/collectionname.bson
Here's the output:
2016-07-16T00:08:03.513-0400 checking for collection data in pathtobackupfolder/collectionname.bson
2016-07-16T00:08:03.525-0400 reading metadata file from pathtobackupfolder/collectionname.bson
2016-07-16T00:08:03.526-0400 restoring collectionname from file pathtobackupfolder/collectionname.bson
Killed
What's going on? I can't find anything on Google or Stack Overflow about a mongorestore resulting in "Killed". The backup folder that I'm restoring from contains 12875 documents, yet every time I run the mongorestore it says "Killed" and restores a different number of documents that is less than the total: 4793, 2000, 4000, etc.
The machine that I'm performing this call on is "Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-71-generic x86_64)" from Digital Ocean
Any help is appreciated. Thanks.
After trying the mongorestore command a fifth and sixth time after posting this question, more explicit output came out indicating that it was a memory issue specific to Digital Ocean. I followed https://www.digitalocean.com/community/tutorials/how-to-add-swap-on-ubuntu-14-04 and the restore finished completely without errors.
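For reference, the core of that tutorial is creating and enabling a swap file; a rough sketch (the 4G size is just an example, pick something appropriate for the droplet):

sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

With swap available, the kernel's OOM killer no longer terminates mongorestore when it briefly exceeds physical memory, which is what the bare "Killed" message indicates.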
If you are trying to solve this in Docker, just increase the swap memory in the settings.json file.

Failed to import gs://bucket_name/Cloud.sql

I have stored everything needed for my database in phpMyAdmin and exported my database from it. The export was saved as Cloud.sql, and I then uploaded this SQL file to Google Cloud Storage with the help of this link: https://developers.google.com/cloud-sql/docs/import_export.
Now, after importing the contents of the .sql file using the Import option in the instance's actions, it shows the green working sign, and after a while it stops. When I check the Logs, it shows:
Failed to import gs://bucket_name/Cloud.sql: An unknown problem occurred (ERROR_RDBMS)
I am unable to find out the reason behind the error, as the message is not clear, and I do not know how this can be solved.
Google Cloud SQL probably doesn't know which database the gs://bucket_name/Cloud.sql commands apply to.
From https://groups.google.com/forum/#!topic/google-cloud-sql-discuss/pFGe7LsbUaw:
The problem is that the dump doesn't contain the name of the database to use. If you add a 'USE XXX' at the top of the dump, where XXX is the database you want to use, I would expect the import to succeed.
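A quick way to do that before re-uploading the dump (a sketch using GNU sed; my_database is a placeholder for whatever database name you created on the Cloud SQL instance):

# prepend a USE statement as the first line of the dump
sed -i '1i USE my_database;' Cloud.sql

After re-uploading the edited file to the bucket and re-running the import, the statements in the dump are executed against that database.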
I had a few issues that were spitting out the ERROR_RDBMS error.
It turns out that Google actually does surface more precise error messages now, but you have to go to
https://console.cloud.google.com/sql/instances/{DATABASE_NAME}/operations
and there you will see a description of why the operation failed.