Executing Linux Command in Scala-Shell - scala

I'm working on a project where I need to execute some Linux commands (a sqoop command) from my Scala application. Below is a sample command I tried running against MySQL on my VM:
import sys.process._
"sqoop eval --connect jdbc:mysql://localhost:3306/retail_db --username root --password cloudera --query 'select * from categories'".!
I got the following error:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/06/24 15:25:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
20/06/24 15:25:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure.
Consider using -P instead.
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Error parsing arguments for eval:
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Unrecognized argument: *
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Unrecognized argument: from
20/06/24 15:25:27 ERROR tool.BaseSqoopTool: Unrecognized argument: categories
I also tried this command and got the same error message:
"sqoop eval --connect jdbc:mysql://localhost:3306/retail_db --username root --password cloudera --query 'select * from categories'".!<
Can someone help me figure out what's causing the error? I've tried both single and double quotes, to no avail. I searched SO but couldn't find a solution, which is why I'm posting here.
NOTE: The same command runs successfully from pyspark, as shown below:
>>> import os
>>> import sys
>>> query = "sqoop eval --connect jdbc:mysql://localhost:3306/retail_db --username root --password cloudera --query 'select * from categories'"
>>> os.system(query)
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/06/24 15:28:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
20/06/24 15:28:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure.
Consider using -P instead.
20/06/24 15:28:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
----------------------------------------------------
| category_id | category_department_id | category_name |
----------------------------------------------------
| 1 | 2 | Football |
| 2 | 2 | Soccer |
| 3 | 2 | Baseball & Softball |
| 4 | 2 | Basketball |
| 5 | 2 | Lacrosse |
| 6 | 2 | Tennis & Racquet |

It looks like sqoop receives *, from, and categories as separate arguments, which it doesn't recognize. It works when invoked from the command line (and through Python's os.system, which hands the string to a shell) because the shell interprets the quote marks and passes select * from categories to sqoop as a single argument. In other words, the shell does some pre-processing before handing everything off to the sqoop program.
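To make the shell's pre-processing concrete, here is a small illustration in Python (the language of the pyspark snippet above) rather than Scala. Python's shlex.split approximates POSIX shell quoting rules, while a plain whitespace split, roughly what happens when no shell is involved, ignores the quotes. This only illustrates the idea; it is not a reproduction of what Scala's .! does internally.

```python
import shlex

cmd = ("sqoop eval --connect jdbc:mysql://localhost:3306/retail_db "
       "--username root --password cloudera "
       "--query 'select * from categories'")

# Shell-style tokenizing honours the single quotes: the query stays one argument.
shell_style = shlex.split(cmd)

# Naive whitespace splitting ignores the quotes: the query is broken apart.
naive = cmd.split()

print(shell_style[-1])  # select * from categories
print(naive[-4:])       # ["'select", '*', 'from', "categories'"]
```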
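The .! method (i.e. the Scala ProcessBuilder) launches processes directly, which means that the command elements are not passed to a shell for pre-processing. When .! is called on a single string, Scala splits that string into arguments itself (older versions simply split on whitespace), so the quote marks do not keep the query together. There are two ways to get around this problem.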
1. You can invoke the shell directly and pass the command line to it as a single argument, or
2. you can do most of the obvious pre-processing yourself.
Here's an example of the 2nd option.
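import sys.process._

// Each element of the Seq is passed to sqoop as exactly one argument,
// so the query needs no extra quoting.
Seq("sqoop"
,"eval"
,"--connect"
,"jdbc:mysql://localhost:3306/retail_db"
,"--username"
,"root"
,"--password"
,"cloudera"
,"--query"
,"select * from categories").!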
Each argument is its own Seq element, including the last one, so the whole query reaches sqoop as a single argument.
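Both options can be sketched in Python with subprocess, again as an illustration rather than the answer's Scala. To avoid needing sqoop, SHOW_ARGS is a hypothetical stand-in that just prints the arguments it receives, one per line; /bin/sh is assumed to exist, as on any Linux VM. In Scala, option 1 would presumably look like Seq("sh", "-c", cmd).!, where cmd is the original command string.

```python
import shlex
import subprocess
import sys

# Hypothetical stand-in for sqoop: a tiny Python child process that prints
# each argument it receives on its own line.
SHOW_ARGS = [sys.executable, "-c", "import sys; print('\\n'.join(sys.argv[1:]))"]


def run_argv(argv):
    """Option 2: pass a ready-made argument list; no shell is involved."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def run_via_shell(command_line):
    """Option 1: hand one command-line string to /bin/sh, which applies the quoting."""
    result = subprocess.run(["/bin/sh", "-c", command_line],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Option 2: the query is already a single list element.
print(run_argv(SHOW_ARGS + ["--query", "select * from categories"]))
# ['--query', 'select * from categories']

# Option 1: shlex.quote protects the stand-in's own arguments, while the query
# keeps the single quotes from the original command line.
line = " ".join(shlex.quote(a) for a in SHOW_ARGS) + " --query 'select * from categories'"
print(run_via_shell(line))
# ['--query', 'select * from categories']
```

Option 2 is usually the safer choice, because nothing in the arguments (such as the * in the query) is ever seen by a shell; option 1 is convenient when the command line already exists as a string, but then the quoting has to be right for the shell.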

Related

Postgres replication command "IDENTIFY_SYSTEM" syntax error

When I run the following command on the database server, it returns a syntax error:
psql -h xxxxx -p 5432 -U user -W -d "dbname=db_replication replication=true" -c "IDENTIFY_SYSTEM;"
Response:
ERROR: syntax error at or near "IDENTIFY_SYSTEM"
LINE 1: IDENTIFY_SYSTEM;
The Postgres version is 14.4, which according to the documentation is a supported version.
Running the same command against a Docker container I set up returns:
systemid | timeline | xlogpos | dbname
---------------------+----------+-----------+--------
9999999999999999999 | 1 | 0/FFFFFFF | replic

Installing pg_profile for postgreSql

I am trying to use the pg_profile extension to get AWR-like reports in PostgreSQL.
The GitHub README says:
https://github.com/zubkov-andrei/pg_profile
# cp pg_profile* `pg_config --sharedir`/extension
and then:
postgres=# CREATE EXTENSION pg_profile;
But this fails with:
ERROR: could not open extension control file "/usr/share/postgresql/11/extension/pg_profile.control": No such file or directory
and indeed I can only find pg_profile.control.tpl under this directory (and in the zip file I downloaded from GitHub).
If I instead follow the tutorial instructions, which say:
https://dbtut.com/index.php/2019/03/29/postgresql-awr/
After downloading the file to the server, go the directory you
downloaded and follow the steps below. We’re running the Make command.
make install
Then I get this error:
# make install
Makefile:16: ../../src/Makefile.global: No such file or directory
Makefile:17: /contrib/contrib-global.mk: No such file or directory
make: *** No rule to make target '/contrib/contrib-global.mk'. Stop.
Any idea how to solve these?
I was able to run CREATE EXTENSION in a PG 12.3 database, but to do that:
I removed the line PG_CONFIG = /usr/local/pgsql/bin/pg_config from the Makefile.
Then, as root, I ran:
$ make USE_PGXS=y install
/usr/bin/mkdir -p '/usr/pgsql-12/share/extension'
/usr/bin/mkdir -p '/usr/pgsql-12/share/extension'
/usr/bin/install -c -m 644 .//pg_profile.control '/usr/pgsql-12/share/extension/'
/usr/bin/install -c -m 644 pg_profile--0.1.1.sql pg_profile.control '/usr/pgsql-12/share/extension/'
After that, I could run the following without error:
postgres=# CREATE EXTENSION dblink;
postgres=# CREATE EXTENSION pg_stat_statements;
postgres=# CREATE EXTENSION pg_profile;
postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
--------------------+---------+------------+--------------------------------------------------------------
dblink | 1.2 | public | connect to other PostgreSQL databases from within a database
pg_profile | 0.1.1 | public | PostgreSQL load profile repository and report builder
pg_stat_statements | 1.7 | public | track execution statistics of all SQL statements executed
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
postgres_fdw | 1.0 | public | foreign-data wrapper for remote PostgreSQL servers
(5 rows)
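The original error comes from CREATE EXTENSION not finding pg_profile.control in the server's SHAREDIR/extension directory. A hypothetical Python helper like the one below can check that location after an install. It assumes the pg_config found on PATH belongs to the same installation as the running server, which (as the Makefile edit above suggests) is not guaranteed.

```python
import os
import shutil
import subprocess


def pg_sharedir(pg_config="pg_config"):
    """Return the SHAREDIR reported by `pg_config --sharedir`, or None if not found.

    This is the pg_config on PATH, which may belong to a different PostgreSQL
    installation than the server you run CREATE EXTENSION on.
    """
    exe = shutil.which(pg_config)
    if exe is None:
        return None
    result = subprocess.run([exe, "--sharedir"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def control_file_status(sharedir, extension):
    """Report whether SHAREDIR/extension/<extension>.control exists.

    Returns "ok", "template-only" (only <extension>.control.tpl is present,
    as in the question), or "missing".
    """
    control = os.path.join(sharedir, "extension", extension + ".control")
    if os.path.isfile(control):
        return "ok"
    if os.path.isfile(control + ".tpl"):
        return "template-only"
    return "missing"


# Usage:
# sharedir = pg_sharedir()
# if sharedir is not None:
#     print(control_file_status(sharedir, "pg_profile"))
```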

How to use lobject function in psycopg2

I am trying to insert large binary data into PostgreSQL using psycopg2. I understand the bytea data type is more commonly used, but I'm testing BLOBs for possible future use cases.
The PostgreSQL and psycopg2 versions are below.
pip list | grep psycopg2
psycopg2 (2.5.1)
rpm -qa | grep postgres
postgresql-server-9.2.15-1.el7_2.x86_64
I use python 2.7.5
python -V
Python 2.7.5
Below is my code snippet:
import psycopg2

file = "/home/test/jefferson_love_memorial_514993.jpg"
with open(file,"r") as fd:
    try:
        # First connect to postgresql server
        conn = psycopg2.connect("dbname='sample' user='sample' host='10.1.0.19' password='sample'")
        # Initiate the session with postgresql to write large object instance
        lobj = conn.lobject(0,'r',0)
        # Write the data to database
        lobj.write(fd.read())
    except (psycopg2.Warning, psycopg2.Error) as e:
        print "Exception: {}".format(e)
However, when I execute the code I get no error, but nothing is inserted into the table.
-bash-4.2$ psql -d sample
psql (9.2.15)
Type "help" for help.
sample=# SELECT * FROM pg_largeobject_metadata;
lomowner | lomacl
----------+--------
(0 rows)
sample=# SELECT * FROM pg_largeobject;
loid | pageno | data
------+--------+------
(0 rows)
May I ask what is lacking in my code?
I found the reason.
I had forgotten to call conn.commit() after lobj.write().
After committing, it works perfectly.
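For reference, here is a sketch of the write path with the missing commit added. store_file_as_large_object is a hypothetical name; it assumes conn is an open psycopg2 connection in its default (non-autocommit) mode, and it also opens the file and the large object in binary mode ("rb" and "wb"), since the data is a JPEG rather than text.

```python
def store_file_as_large_object(conn, path, chunk_size=1 << 20):
    """Copy a local file into a new PostgreSQL large object and return its OID.

    Assumes `conn` is an open psycopg2 connection that is not in autocommit mode.
    """
    try:
        # oid=0 asks the server to create a new large object; "wb" = write, binary.
        lobj = conn.lobject(0, "wb")
        with open(path, "rb") as fd:  # read the JPEG as bytes, not text
            for chunk in iter(lambda: fd.read(chunk_size), b""):
                lobj.write(chunk)
        oid = lobj.oid
        lobj.close()
        # The step that was missing: until the transaction commits, the new large
        # object is invisible to other sessions, and it is rolled back if the
        # connection closes without a commit.
        conn.commit()
        return oid
    except Exception:
        conn.rollback()
        raise
```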

when are newlines in psql command line strings significant?

I was trying to break up a long command line involving a psql command string (i.e. psql -c), and this seems to cause errors. For example, with PostgreSQL 9.5 and Ubuntu 16.04:
$ psql -c "\\dt"
works fine, while
$ psql -c "
> \\dt
> "
generates:
ERROR: syntax error at or near "\"
LINE 2: \dt
Just out of curiosity, when is it OK to insert newlines (i.e. \n) into a psql command string?
command must be either a command string that is completely parsable by the server (i.e., it contains no psql-specific features), or a single backslash command.
https://www.postgresql.org/docs/current/static/app-psql.html
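It seems that psql does not treat the string as a backslash command when it starts with a newline: -c only recognizes a backslash command at the very start of the string, so the whole string is sent to the server as SQL, and the server reports a syntax error at the backslash.
As an alternative, you can pipe the commands in with echo, as also described in the documentation. For example: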
$ echo '
> \d
> select 1 as x;' | psql postgres
List of relations
Schema | Name | Type | Owner
--------+-------+------+----------
public | dummy | view | postgres
(1 row)
x
---
1
(1 row)
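The same echo-pipe approach can be driven from a program. In this Python sketch, run_psql_script is a hypothetical helper that hands the whole script to psql on standard input through subprocess.run's input argument; the runner parameter exists only so the function can be exercised without a PostgreSQL server.

```python
import subprocess


def run_psql_script(script, dbname, runner=subprocess.run):
    """Send a multi-line psql script on standard input, like `echo '...' | psql dbname`.

    Backslash commands may appear on any line, because psql processes piped
    input line by line, as it would a script file, instead of as one -c string.
    """
    return runner(["psql", dbname], input=script, text=True,
                  capture_output=True, check=True)


# Same script as the echo example above: a meta-command followed by a query.
SCRIPT = "\n\\d\nselect 1 as x;\n"

# Usage (needs psql on PATH and a reachable server):
# print(run_psql_script(SCRIPT, "postgres").stdout)
```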

automation script for postgresql

Could you please help me figure out how to run an automation script for PostgreSQL after installing it on Ubuntu? I need to automate DB preparation before using it (create tables, insert data, alter permissions).
I need to do it with the user's current privileges.
E.g., my only user is admin_ubuntu, who has all the rights needed to run psql.
All the scripts are written, but how can I run them? Usually I need to edit configs (I believe /etc/postgresql/9.1/main/pg_hba.conf), but I do not want to do that.
So what I need is just to run SQL that does a lot of things. How can I run it? The problem is that a lot has to be set up before SQL can be run on a fresh (empty) OS.
This will be done every time Ubuntu is installed.
You should be able to use shell provisioning. Here is an example of what you can do:
# creating user
sudo -u postgres psql -c "CREATE USER admin WITH PASSWORD 'password';"
# creating new db if needed .. might need 2 (dev/test)
createdb -U vagrant mydevdb
# if you have more complex things you'll need to put that in a create_db.sql file and run the script as
sudo -u postgres psql < create_db.sql
The create_db.sql file can contain any CREATE TABLE statements you need.
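If there are several such files, a small runner can apply them in order. The sketch below is a hypothetical Python helper that mirrors sudo -u postgres psql < create_db.sql for each file. It adds psql's ON_ERROR_STOP variable (an addition, not part of the answer above) so that a failing statement makes psql exit with a non-zero status and no later file is run.

```python
import subprocess

PSQL_AS_POSTGRES = ["sudo", "-u", "postgres", "psql", "-v", "ON_ERROR_STOP=1"]


def provision(sql_files, runner=subprocess.run):
    """Run each SQL file through psql as the postgres OS user, in order.

    Like `sudo -u postgres psql < create_db.sql`, the file is read by the
    current user and fed to psql on standard input. With ON_ERROR_STOP=1,
    psql exits non-zero at the first failing statement; check=True turns
    that into CalledProcessError, so the remaining files are skipped.
    """
    for path in sql_files:
        with open(path, "rb") as fh:
            runner(PSQL_AS_POSTGRES, stdin=fh, check=True)


# Usage (hypothetical file names):
# provision(["create_db.sql", "load_data.sql"])
```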
Sharing some scripts here that show how to create databases, roles, schemas and tables. They are not meant to be taken as best practice or as acceptable for production, but hopefully they will help with getting started with PostgreSQL. Any code fragment surrounded by underscores, e.g. _variable_, indicates a string that should be replaced accordingly. Standard shell commands begin with $.
PostgreSQL Installation
My environment is an Ubuntu server/container where PostgreSQL was installed and started with:
$ apt install postgresql-12
$ pg_ctlcluster 12 main start
This automatically adds a postgres Linux user, which has superuser privileges in PostgreSQL, allowing the installation to be tested with:
$ sudo su postgres
$ psql
This should result in a prompt like postgres=#. If it does, you should be able to follow the steps below with the default /etc/postgresql/12/main/pg_hba.conf settings.
Database and Role Creation
After saving this to a setup.sql file, it can be run with $ sudo -u postgres psql < setup.sql:
CREATE ROLE testadmin WITH LOGIN CREATEDB PASSWORD 'secret5';
CREATE DATABASE testdb OWNER testadmin;
CREATE ROLE _current-linux-user_ WITH LOGIN CREATEDB INHERIT;
GRANT pg_read_server_files TO testadmin;
GRANT pg_read_server_files TO _current-linux-user_;
GRANT testadmin to _current-linux-user_;
While experimenting, it was useful to start everything from scratch by running $ sudo -u postgres psql < teardown.sql:
DROP DATABASE testdb;
DROP ROLE testadmin;
DROP ROLE _current-linux-user_;
Loading the Database From CSV Files
There's a reason we created a role with the same login name as the current user: by default psql connects as a role named after the current Linux user, and the default peer authentication for local connections accepts that match. This lets us connect to the database simply by running $ psql testdb, which shows a prompt like testdb=>.
First we need the CSV files to populate the testdb database; I used the two below for an example food database. Be careful not to leave blank lines at the end of the files, otherwise there will be an ERROR: missing data for column.
categories.csv:
Category ID,Category Name
1,Fruit
2,Nut
3,Vegetable
4,Grain
5,Fungus
6,Alga
7,Seed
items.csv:
Food Name,Category,Nutrition
Peach,1,"Vitamin A, C, Potassium, Magnesium, Iron"
Brazil nut,7,"Iron, Calcium, Protein"
Broccoli,3,"Vitamin C, Magnesium"
Bean,4,"Magnesium, Iron, Calcium, Protein"
Mushroom,5,"Iron, Magnesium, Sodium, Protein"
Now we can run $ psql testdb < schema.sql:
CREATE SCHEMA food
CREATE TABLE food.categories (category_id integer PRIMARY KEY, category text)
CREATE TABLE food.items (id serial, name text, category_id integer REFERENCES food.categories (category_id), nutrition text);
/* Load data from CSV files into tables */
COPY food.categories(category_id, category)
FROM '/path/to/categories.csv' WITH (FORMAT csv, HEADER ON);
COPY food.items(name, category_id, nutrition)
FROM '/path/to/items.csv' WITH (FORMAT csv, HEADER ON);
/* Test */
SELECT * FROM food.categories;
SELECT * FROM food.items;
SELECT name,category FROM food.items INNER JOIN food.categories
ON food.items.category_id = food.categories.category_id;
This results in the following output:
CREATE SCHEMA
COPY 7
COPY 5
category_id | category
-------------+-----------
1 | Fruit
2 | Nut
3 | Vegetable
4 | Grain
5 | Fungus
6 | Alga
7 | Seed
(7 rows)
id | name | category_id | nutrition
----+------------+-------------+------------------------------------------
1 | Peach | 1 | Vitamin A, C, Potassium, Magnesium, Iron
2 | Brazil nut | 7 | Iron, Calcium, Protein
3 | Broccoli | 3 | Vitamin C, Magnesium
4 | Bean | 4 | Magnesium, Iron, Calcium, Protein
5 | Mushroom | 5 | Iron, Magnesium, Sodium, Protein
(5 rows)
name | category
------------+-----------
Peach | Fruit
Brazil nut | Seed
Broccoli | Vegetable
Bean | Grain
Mushroom | Fungus
(5 rows)
Specifying the CSV format lets a data column enclose the delimiter (a comma, the CSV-format default) in quotes. The default text format has no such quoting and uses a tab as its default delimiter, so loading these files with it (e.g. with DELIMITER ',') could lead to an ERROR: extra data after last expected column. Another source of this error is forgetting to include all the fields in the COPY command, e.g. nutrition.
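The quoting behaviour can be checked before running COPY. This Python sketch uses the standard csv module as an illustration only (PostgreSQL does its own parsing); bad_rows is a hypothetical helper that flags rows whose field count differs from the header's, which catches both a stray blank line and rows with extra fields.

```python
import csv
import io

ROW = 'Peach,1,"Vitamin A, C, Potassium, Magnesium, Iron"'

# CSV parsing honours the quotes: the nutrition list stays one field.
csv_fields = next(csv.reader(io.StringIO(ROW)))
# Splitting on the delimiter alone ignores them and produces extra columns.
naive_fields = ROW.split(",")

print(len(csv_fields), len(naive_fields))  # 3 7


def bad_rows(text):
    """Return (row_number, field_count) for every row whose width differs from the header's.

    A blank line parses as a row with zero fields.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    width = len(rows[0])
    return [(n, len(r)) for n, r in enumerate(rows, start=1) if len(r) != width]


ITEMS = (
    "Food Name,Category,Nutrition\n"
    'Peach,1,"Vitamin A, C, Potassium, Magnesium, Iron"\n'
    'Brazil nut,7,"Iron, Calcium, Protein"\n'
)
print(bad_rows(ITEMS))         # []
print(bad_rows(ITEMS + "\n"))  # [(4, 0)]
```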
Login With New User
What about the testadmin user we just created? We can connect to the database with a password as follows:
$ psql -U testadmin -d testdb -h localhost
Password for user testadmin:
psql (12.8 (Ubuntu 12.8-0ubuntu0.20.04.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
testdb=>
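If you forget to use -h localhost, you'll likely get psql: error: FATAL: Peer authentication failed for user "testadmin", because without -h psql connects over the local Unix socket, where the default pg_hba.conf uses peer authentication.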
There is excellent documentation for the above commands on the official site.