How to set up a CSV file in the db/migration folder? - postgresql

In a usual migration I use the plain SQL COPY ... CSV command, but now the whole Spring Boot project must live in a Git repository under a generic path for cloning (and run on a protected server):
Where is the best place, or "standard Spring folder", for the CSV files used by main/resources/db/migration?
How can I use SQL COPY with a relative path?
A relative path works from the shell:
psql -h localhost -U postgres gcp -c "\
CREATE TABLE question_import (question text, weight integer); \
COPY question_import FROM STDIN WITH CSV HEADER delimiter as ',' \
" < _docs/data/csc-questoes.csv
but PostgreSQL does not support a relative path inside a server-side COPY.
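For reference, psql's client-side \copy meta-command does resolve a relative path against the directory psql was started from, so something like the sketch below works from the shell (note that \copy is a psql feature, so it will not help inside a migration executed over JDBC):
psql -h localhost -U postgres gcp \
    -c "\copy question_import FROM '_docs/data/csc-questoes.csv' WITH (FORMAT csv, HEADER true, DELIMITER ',')"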

Since you're using Spring Boot, why not incorporate the benefits of Liquibase (section 75.5.2 of the Spring Boot reference)?
Apart from the ability to define/version your database schema, there are also methods for loading data, e.g. CSV in your case:
<changeSet author="liquibase-docs" id="loadUpdateData-example">
    <loadUpdateData catalogName="cat"
            encoding="UTF-8"
            file="com/example/users.csv"
            primaryKey="pk_id"
            quotchar="A String"
            schemaName="public"
            separator="A String"
            tableName="person">
        <column name="address" type="varchar(255)"/>
    </loadUpdateData>
</changeSet>
The Spring-Boot-Liquibase Sample project should give you a quickstart.
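As for where to keep the CSV itself: with Spring Boot's Liquibase integration the file attribute is resolved from the classpath (unless relativeToChangelogFile is set), so a layout along the lines below keeps the data versioned next to the changelog (the names are illustrative, not prescribed):
src/main/resources/db/changelog/db.changelog-master.xml   <- referenced by the liquibase.change-log property
src/main/resources/com/example/users.csv                  <- matches file="com/example/users.csv" above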

Related

db2 how to configure external tables using extbl_location, extbl_strict_io

Could you please give an example of how to set up these parameters? I need to create an external table and upload data to it.
I need to know how to configure the parameters extbl_location and extbl_strict_io.
I created a table like this:
CREATE EXTERNAL TABLE textteacher(ID int, Name char(50), email varchar(255)) USING ( DATAOBJECT 'teacher.csv' FORMAT TEXT CCSID 1208 DELIMITER '|' REMOTESOURCE 'LOCAL' SOCKETBUFSIZE 30000 LOGDIR '/tmp/logs' );
and tried to upload data to it:
insert into textteacher (ID,Name,email) select id,name,email from teacher;
and got the exception: [428IB][-20569] The external table operation failed due to a problem with the corresponding data file or diagnostic files. File name: "teacher.csv". Reason code: "1".. SQLCODE=-20569, SQLSTATE=428IB, DRIVER=4.26.14
If I understand the documentation correctly, the extbl_location parameter should point to the directory where the data will be saved. I suppose the full path would look like
$extbl_location+'/'+teacher.csv
I found some documentation about the error:
https://www.ibm.com/support/pages/how-resolve-sql20569n-error-external-table-operation
I tried to run this command in the Docker command line:
/opt/ibm/db2/V11.5/bin/db2 get db cfg | grep -i external
but it does not show any information about external tables.
From the CREATE EXTERNAL TABLE statement documentation, on file-name (excerpt):
When both the REMOTESOURCE option is set to LOCAL (this is its default value) and the extbl_strict_io configuration parameter is set
to NO, the path to the external table file is an absolute path and
must be one of the paths specified by the extbl_location configuration
parameter. Otherwise, the path to the external table file is relative
to the path that is specified by the extbl_location configuration
parameter followed by the authorization ID of the table definer. For
example, if extbl_location is set to /home/xyz and the authorization
ID of the table definer is user1, the path to the external table file
is relative to /home/xyz/user1/.
So, if you use a relative path to a file such as teacher.csv, you must set extbl_strict_io to YES.
For an unload operation, the following conditions apply:
If the file exists, it is overwritten.
Required permissions:
If the external table is a named external table, the owner must have read and write permission for the directory of this file.
If the external table is transient, the authorization ID of the statement must have read and write permission for the directory of this file.
Moreover, inside the directory specified by extbl_location you must create a sub-directory named after the username (in lowercase) of the table owner, and ensure that this user (not the instance owner) has read/write permission on this sub-directory.
Update:
Setup, presuming that user1 runs this INSERT statement:
sudo mkdir -p /home/xyz/user1
# user1 must have an ability to cd to this directory
sudo chown user1:$(id -gn user1) /home/xyz/user1
db2 connect to mydb
db2 update db cfg using extbl_location /home/xyz extbl_strict_io YES
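To double-check that both parameters took effect, something like the following should do (a sketch; the label text in the output differs between Db2 versions, so grep for the parameter names themselves):
db2 connect to mydb
db2 get db cfg for mydb | grep -i -E "extbl_location|extbl_strict_io"
# expected to report EXTBL_LOCATION = /home/xyz and EXTBL_STRICT_IO = YES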

How to change the metadata of all existing objects of a specific file type in Google Cloud Storage?

I have uploaded thousands of files to Google Cloud Storage, and I found out that all the files are missing their content-type, so my website cannot serve them correctly.
I wonder if I can set some kind of policy to change the content-type of all the files at once. For example, I have a bunch of .html files inside the bucket:
a/b/index.html
a/c/a.html
a/c/a/b.html
a/a.html
.
.
.
Is it possible to set the content-type of all the .html files, in their different locations, with one command?
You could do:
gsutil -m setmeta -h Content-Type:text/html gs://your-bucket/**.html
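To spot-check one object afterwards, gsutil stat prints the stored metadata, including Content-Type (a sketch, using one of the paths from the question):
gsutil stat gs://your-bucket/a/b/index.html
# the output should now include a line like: Content-Type: text/html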
There's no single command to achieve exactly the behavior you are looking for (one command to edit all the objects' metadata); however, the gsutil setmeta command edits the metadata of an object, and you can use it in a bash script to loop over all the objects inside the bucket.
1.- Option 1 is to use the gsutil command setmeta in a bash script:
# List all your object names and iterate, editing each object's metadata.
for OBJECT in $(gsutil ls gs://[BUCKET_NAME]/**)
do
    gsutil setmeta -h "[METADATA_KEY]:[METADATA_VALUE]" "$OBJECT"
    # "$OBJECT" is the full gs://[BUCKET_NAME]/[OBJECT_NAME] URL returned by gsutil ls.
done
2.- You could also write a small C++ program with the google-cloud-cpp client library to achieve the same thing:
namespace gcs = google::cloud::storage;
using ::google::cloud::StatusOr;
[](gcs::Client client, std::string bucket_name,
   std::string key, std::string value) {
  // List all the objects in the bucket and, inside the loop, edit each object's metadata.
  for (auto&& object_metadata : client.ListObjects(bucket_name)) {
    if (!object_metadata) continue;  // skip entries that failed to list
    std::string object_name = object_metadata->name();
    gcs::ObjectMetadata desired = *object_metadata;
    desired.mutable_metadata().emplace(key, value);
    StatusOr<gcs::ObjectMetadata> updated =
        client.UpdateObject(bucket_name, object_name, desired,
                            gcs::Generation(object_metadata->generation()));
  }
}

Getting a file exists error while importing into Hive using Sqoop

I am trying to copy the retail_db database tables into a Hive database which I have already created. When I execute the following command:
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--outdir java_files \
--hive-database retail_stage
My MapReduce job stops with the following error:
ERROR tool.ImportAllTablesTool: Encountered IOException running import
job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output
directory hdfs://quickstart.cloudera:8020/user/cloudera/categories
already exists
I am trying to copy the tables into a Hive database, so why does an existing file in HDFS cause the problem? Is there a way to ignore this error or overwrite the existing file?
This is how a sqoop import job works:
sqoop creates/imports the data into a tmp dir in HDFS, which is the user's home dir (in your case /user/cloudera).
Then it copies the data to its actual Hive location (i.e., /user/hive/warehouse).
The categories dir must have existed before you ran the import statement, so delete that dir or rename it if it is important:
hadoop fs -rmr /user/cloudera/categories
OR
hadoop fs -mv /user/cloudera/categories /user/cloudera/categories_1
and re-run sqoop command!
So in short, importing to Hive uses HDFS as the staging area, and sqoop deletes the staging dir /user/cloudera/categories after (successfully) copying to the actual HDFS location - cleaning up the staging/tmp files is the last stage of the sqoop job - so if you try to list the tmp staging dir, you won't find it.
After a successful import: hadoop fs -ls /user/cloudera/categories - the dir will not be there.
Sqoop import to Hive works in 3 steps:
Put the data into HDFS
Create the Hive table if it does not exist
Load the data into the Hive table
You have not mentioned --target-dir or --warehouse-dir, so it will put the data in the HDFS home directory, which I believe is /user/cloudera/ in your case.
Now, the MySQL table categories you might have imported earlier, so the /user/cloudera/categories directory exists and you are getting this exception.
Add any non-existing directory with --target-dir, e.g. --target-dir /user/cloudera/mysqldata. Then sqoop will put all the MySQL tables imported by the above command in that location.
Based on answer #1 above, I found this. I tried it and it works.
So, just add --delete-target-dir
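In context, the flag would slot into the command from the question like this (a sketch; --delete-target-dir is documented for sqoop import, so verify that your Sqoop version also honors it with import-all-tables):
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--delete-target-dir \
--outdir java_files \
--hive-database retail_stage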
You cannot use --hive-import and --hive-overwrite at the same time.
The version where I confirmed this issue:
$ sqoop help import
  --hive-overwrite    Overwrite existing data in the Hive table
$ sqoop version
Sqoop 1.4.6-cdh5.13.0
ref. https://stackoverflow.com/a/22407835/927387

Can I set search_path when importing a shape file with ogr2ogr from GDAL?

I'm importing shape files into PostgreSQL via this command:
ogr2ogr PG:host=localhost dbname=someDbName user=someUserName password=somePassword shapeFile.shp -nln alternateLayerName -nlt someValidGeometry
This works well, but the table goes into the public schema in PostgreSQL. I'd like to choose a different schema. Is there a way to achieve this with ogr2ogr alone?
There's no reference in man ogr2ogr to schema. My web search wasn't fruitful either.
I know I can do an ALTER TABLE some_table set schema a_different_schema, but that would mean adding another step to the process.
$ ogr2ogr --version
GDAL 2.1.0, released 2016/04/25
After a better web search I found, in the GDAL PG driver documentation, that one can use ACTIVE_SCHEMA=string ("Active schema") to set the schema where the table will be created.
I tried it like this:
ogr2ogr -f "PostgreSQL" PG:"dbname=mydb active_schema=layers" country.shp -nln test_table -nlt MULTILINESTRING
And it complains with:
ERROR 1: PQconnectdb failed: invalid connection option "active_schema"
But the table gets created and populated properly, so I guess it's OK.
I found a better solution by using the -lco option:
ogr2ogr -f PostgreSQL "PG:host=localhost port=5432 dbname=some_db user=postgres password=" \
    someShapeFile.shp -nln desiredTableName -nlt someValidGeometry \
    -lco SCHEMA=desiredPostgresqlSchema
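To confirm that the layer landed in the intended schema, a quick check from psql could look like this (a sketch, reusing the illustrative names above):
psql -h localhost -U postgres -d some_db -c '\dt desiredPostgresqlSchema.*'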

Export CSV file to Hive

I have a CSV file which I downloaded from MongoDB and would like to export it to Hive so that I can query and analyze it. However, I suppose I need to export it to HDFS first. I have Hive installed on my system. I used the following command:
CREATE EXTERNAL TABLE reg_log (path STRING, ip STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LOCATION '/home/nazneen/Desktop/mongodb-linux/bin/reqlog_new_mod.csv'
> STORED AS CSVFILE;
This is throwing error. Any pointers would be appreciated.
I believe the 'STORED AS' clause doesn't support the keyword 'CSVFILE' (see here). In your case 'STORED AS TEXTFILE' should be fine.
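A minimal end-to-end sketch, assuming the file is actually comma-delimited and that /user/nazneen is a writable HDFS path (adjust the names as needed); note that LOCATION must point to an HDFS directory rather than a file, and that STORED AS comes before LOCATION:
hdfs dfs -mkdir -p /user/nazneen/reqlog
hdfs dfs -put /home/nazneen/Desktop/mongodb-linux/bin/reqlog_new_mod.csv /user/nazneen/reqlog/
hive -e "
CREATE EXTERNAL TABLE reg_log (path STRING, ip STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/nazneen/reqlog';
"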