Read Parquet files directly while using Postgres - postgresql

I am trying to install a Postgres extension that will let me write Postgres queries that read data directly from Parquet files.
This is the extension I found - https://github.com/pgspider/parquet_s3_fdw
After installing the required dependencies, I went ahead and ran the 'make' command:
make install
but it fails with an error:
Makefile:45: /contrib/contrib-global.mk: No such file or directory
make: *** No rule to make target '/contrib/contrib-global.mk'. Stop.
Has anyone else tried using this extension? Or can you suggest some other way to read data directly from Parquet files while using Postgres? (Please note: conversion from Parquet to any other format is not allowed in my situation.)
Thanks

I'm not sure about the error there, but the FDW you referenced is for accessing Parquet files on S3, which you didn't mention as a requirement. You might want to try a simpler version like https://github.com/adjust/parquet_fdw
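If parquet_fdw does work for you, usage would look roughly like the following sketch (based on that project's README; the column list, types, and file path here are made up and have to match your actual Parquet file):

-- build/install the extension first, then in psql:
CREATE EXTENSION parquet_fdw;
CREATE SERVER parquet_srv FOREIGN DATA WRAPPER parquet_fdw;
CREATE USER MAPPING FOR CURRENT_USER SERVER parquet_srv;

-- hypothetical foreign table matching the file's schema
CREATE FOREIGN TABLE userdata (
    id         int,
    first_name text,
    last_name  text
)
SERVER parquet_srv
OPTIONS (filename '/path/to/userdata.parquet');

SELECT count(*) FROM userdata;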

Related

Error while importing .csv file into MongoDB

While importing a fairly big .csv file into MongoDB using mongoimport, I have corrected every error except one that says: open "some.csv file" access is denied. I am not able to understand where I went wrong here; access to this file is open, though.
Thanks
mongoimport is probably trying to acquire a lock on the file, and either the file is already opened elsewhere or you have insufficient rights. If intrusive alternatives are not an option, simply copying the file should suffice.
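For example (paths, database, and collection names here are made up; the flags are mongoimport's standard CSV options):

# copy the file somewhere the current user can definitely read
cp some.csv /tmp/some_copy.csv
mongoimport --db mydb --collection mycoll --type csv --headerline --file /tmp/some_copy.csv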

Pentaho PDI - Reading data from MongoDB

I have installed Pentaho Data Integration (ce-5.0.1.A-stable) on my machine and I am trying to retrieve information from MongoDB using PDI. I have created a transformation with a MongoDB Input step. Now when I try to configure my MongoDB connection details, I couldn't find any explicit connection type for MongoDB. Could someone please advise on how to configure a MongoDB data source in Pentaho?
I have referred to most of the Pentaho MongoDB docs, but none of the solutions works.
Also, I have tried performing the steps below as described on the official Pentaho site, but I still couldn't find any connection type for MongoDB:
1- Move the following folder out of the data-integration folder structure:
data-integration/plugins/pentaho-big-data-plugin
2- Move the following files out of the data-integration folder structure if they exist:
data-integration/libext/JDBC/pentaho-hadoop-hive-jdbc-shim-1.3.0.jar
data-integration/libext/JDBC/pentaho-hadoop-hive-jdbc-shim-1.3.1.jar
data-integration/libext/JDBC/pentaho-hadoop-hive-jdbc-shim-1.3.2.jar
3- Unzip the file pentaho-big-data-plugin-shimtastic-1.3.3.1.zip from the data-integration/plugins folder.
4- Optionally, remove irrelevant folders under data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations.
5- Copy the file pentaho-hadoop-hive-jdbc-shim-1.3.3.jar into the folder
data-integration/libext/JDBC
6- Unzip the file pentaho-instaview-templates-shimtastic-1.3.3.zip to the following directory:
data-integration/plugins/spoon/agile-bi/platform/pentaho-solutions/system/instaview/templates/Big Data
Any help is really appreciated!
Pentaho does not have a specific database connection for MongoDB, so you will not find it in the Database Connection viewer. The way to connect to MongoDB is to use the MongoDB Input step in PDI. There you will find the connection details section (configure the credentials). You can then connect a JSON Input step to read the results of your MongoDB output.
You can also read about it on the Pentaho Wiki here. Though the documentation seems slightly old, it describes the exact process.
As a side note, you don't need the Big Data shims to connect to MongoDB. It seems you have configured the Hadoop Hive shims; they are not required here.
Hope it helps :)

PostgreSQL 9.x - pg_read_binary_file & inserting files into bytea

I have been looking everywhere (Google, Stack Overflow, etc.) for some documentation on how to use the PostgreSQL pg_read_binary_file() function.
The only meaningful thing I can find is this page in the official documentation.
Every time I try to use this function I get an error.
For example:
SELECT pg_read_binary_file('/some/path/and/file.gif');
ERROR: absolute path not allowed
or
SELECT pg_read_binary_file('file.gif');
ERROR: could not stat file "file.gif": No such file or directory
Do I need to have my file in a specific directory for Postgres to have access to it? If so what directory?
If it matters, the reason I am looking at this function is that I am trying to insert a file into the database without doing crazy things.
As stated by @a_horse_with_no_name and @guedes, the solution is to ensure that the file being uploaded is on the server, in the PGDATA directory.
The Postgres documentation does state this file-location requirement.
Additionally, I made a symlink from another directory to the PGDATA directory so that I would not disturb any of the Postgres data structures. This seems to be working well, and I don't have to do any of the above crazy things.
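For reference, a minimal sketch of the insert into a bytea column, assuming the file has been placed (or symlinked) under the server's data directory and using a table made up for this example:

-- hypothetical table
CREATE TABLE images (name text, data bytea);

-- a relative path is resolved against the data directory (PGDATA)
INSERT INTO images (name, data)
VALUES ('file.gif', pg_read_binary_file('file.gif'));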

HBase Export/Import: Unable to find output directory

I am using HBase for my application, and I am trying to export the data using org.apache.hadoop.hbase.mapreduce.Export as directed here. The issue I am facing is that once the command is executed there are no errors while creating the export, but the specified output directory does not appear in its place. The command I used was:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export table_name db_dump/
I found the solution, so I am posting my own answer.
You must have the following two lines in hadoop-env.sh in the conf directory of Hadoop:
export HBASE_HOME=/home/sitepulsedev/hbase/hbase-0.90.4
export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.90.4.jar:$HBASE_HOME/conf:$HBASE_HOME/hbase-0.90.4-test.jar:$HBASE_HOME/lib/zookeeper-3.3.2.jar:$HBASE_HOME
Save it and restart MapReduce with ./stop-mapred.sh and ./start-mapred.sh.
Now run, from the bin directory of Hadoop:
./hadoop jar ~/hbase/hbase-0.90.4/hbase-0.90.4.jar export your_table /export/your_table
Now you can verify the dump by running:
./hadoop fs -ls /export
Finally, you need to copy the whole thing into your local file system, for which run:
./hadoop fs -copyToLocal /export/your_table ~/local_dump/your_table
Here are the references that helped me out with export/import and with Hadoop shell commands.
Hope this one helps you out!!
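To bring such a dump back later, the reverse direction should look roughly like this (paths and jar version match the example above; as far as I remember, the target table must already exist with the same column families before running the import):

./hadoop fs -copyFromLocal ~/local_dump/your_table /export/your_table
./hadoop jar ~/hbase/hbase-0.90.4/hbase-0.90.4.jar import your_table /export/your_table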
As you noticed, the HBase export tool creates the backup in HDFS. If you instead want the output to be written to your local FS, you can use a file URI. In your example it would be something similar to:
bin/hbase org.apache.hadoop.hbase.mapreduce.Export table_name file:///tmp/db_dump/
Related to your own answer, this would also avoid going through HDFS. Just be very careful if you are running this on a cluster of servers, because each server will write its result files to its own local file system.
This is true for HBase 0.94.6 at least.
Hope this helps
I think the previous answer needs some modification:
Platform: AWS EC2
OS: Amazon Linux
HBase version: 0.96.1.1
Hadoop distribution: Cloudera CDH 5.0.1
MR engine: MRv1
To export data from an HBase table to the local filesystem:
sudo -u hdfs /usr/bin/hbase org.apache.hadoop.hbase.mapreduce.Export -Dmapred.job.tracker=local "table_name" "file:///backups/"
This command will dump the data in HFile format, with the number of files equaling the number of regions of that table in HBase.

PostgreSQL issue: could not access file "$libdir/plpgsql": No such file or directory

I get this exception in PostgreSQL:
org.postgresql.util.PSQLException: ERROR: could not access file "$libdir/plpgsql": No such file or directory
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1721)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1489)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:193)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:337)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:236)
at org.apache.commons.dbcp.DelegatingStatement.executeQuery(DelegatingStatement.java:205)
I searched a lot, and most solutions point to an incorrect installation. But this is my test DB, which has been running without issues for a long time. Also, inserts are working; the issue occurs only on SELECT queries.
Apparently, you moved your PostgreSQL lib directory out of place. To confirm this, try the following in psql:
> SET client_encoding TO iso88591;
ERROR: could not access file "$libdir/utf8_and_iso8859_1": No such file or directory
If you get an error message like this, then my theory is correct. You'll need to find out where those files ended up, or you can reinstall PostgreSQL to restore them.
To find out what $libdir is referring to, run the following command:
pg_config --pkglibdir
For me, this produces:
/usr/lib/postgresql
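A quick way to check whether the library the error complains about is actually in that directory (assuming pg_config for the running server is on your PATH and the library is named plpgsql.so on your platform):

ls "$(pg_config --pkglibdir)" | grep plpgsql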
I had the same problem: another Postgres server instance (8.4) was interfering with the 9.1 one; once the 8.4 instance was removed, it worked.
The other instance can sometimes be removed from the system while still running (e.g. you do a Gentoo update and a depclean without stopping and migrating your data), so the error seems particularly mysterious.
The solution is usually going to be doing a slot install/eselect of the old version (in Gentoo terms, or simply downgrading on other distros), running its pg_dumpall, and then uninstalling/reinstalling the new version and importing the data.
This worked pretty painlessly for me.
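In shell terms, the dump-and-restore part would be something along these lines (the file name is arbitrary; run the dump with the old, working version's binaries and the restore against the freshly reinstalled server):

# with the old version's binaries active
pg_dumpall > /tmp/all_databases.sql
# ... remove the broken install, reinstall the new version, initdb, start it ...
psql -f /tmp/all_databases.sql postgres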