How can I manage permissions for the sqoop target dir and result files on import

When I use sqoop import with the target-dir parameter, the result ends up in a folder containing part files and a _SUCCESS file. How can I manage the permissions for this folder and its files when I use sqoop? I know we can change permissions after the import, but I need to manage permissions using only sqoop.
P.S. I am running sqoop from an Oozie workflow, so perhaps I can use that to specify permissions.
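For illustration, my import looks roughly like this (the connection string, table, and paths are placeholders); the -D fs.permissions.umask-mode line is only a guess at how such control might look, not something I have confirmed works:
# sketch only: placeholder connection, table, and paths; the umask value is an untested guess
sqoop import \
  -D fs.permissions.umask-mode=022 \
  --connect jdbc:mysql://db-host/salesdb \
  --table orders \
  --target-dir /user/etl/orders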

Related

DB2 restore from encrypted back-up

I am trying to restore a DB2 database using an encrypted backup file. The backup zip file contains an .lst file, a .ddl file, over 3000 .ixf files, the same number of message files, and a folder with a few .lob files in it.
I have tried using bind # list_file grant public after placing the .lst and .ixf files in the /bind directory, but the error was that the .ixf files could not be opened.
Any help is appreciated.
What you have is not a backup (encrypted or otherwise) but the output of a db2move export run. Read the db2move documentation to learn how to perform the opposite operation.
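A rough sketch of that reverse operation, assuming the files really are db2move export output (the path and target database name are placeholders, and the command is run from the directory holding the .lst and .ixf files):
# placeholders: adjust the path and database name; -io selects the import mode
cd /path/to/exported/files
db2move TARGETDB import -io replace_create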

Apache Spark Multiple sources found for csv Error

I'm trying to run my Spark program using the spark-submit command (I'm working with Scala). I specified the master address, the class name, the jar file with all the dependencies, the input file and then the output file, but I'm getting this error:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Multiple sources found for csv
(org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2,
org.apache.spark.sql.execution.datasources.csv.CSVFileFormat), please
specify the fully qualified class name.;
What is this error about, and how can I fix it?
Thank you
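For reference, my invocation is roughly of this shape (the master URL, class name, jar, and paths are all placeholders, not my real values):
# placeholders throughout; shown only to illustrate the argument order described above
./spark-submit \
  --master spark://master-host:7077 \
  --class com.example.Main \
  my-app-assembly.jar \
  /data/input.csv /data/output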
You also have some warnings here.
If you run your fat jar with the correct permissions, ./spark-submit should complete without this error.
Check whether the environment variables for Spark are set correctly (~/.bashrc). Also check the permissions on the source CSV file; that may be the problem.
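For instance, a typical ~/.bashrc setup looks roughly like this (the install path is a placeholder for wherever Spark is actually unpacked):
# placeholder path; point SPARK_HOME at the actual Spark installation
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin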
If you are running in a Linux environment, set the permissions on the source CSV folder with:
sudo chmod -R 777 /source_folder
After that, try running ./spark-submit with your fat jar again.

Azure Databricks: How to delete files of a particular extension outside of DBFS using Python

I am able to delete a file with a particular extension from the directory /databricks/driver using a bash command in Databricks:
%%bash
rm /databricks/driver/file*.xlsx
But I am unable to figure out how to access and delete a file outside of DBFS in a Python script.
I think we cannot access files outside of DBFS using dbutils, and the command below returns False because it is looking in DBFS:
dbutils.fs.rm("/databricks/driver/file*.xlsx")
I am eager to be corrected.
Not sure how to do it using dbutils, but I am able to delete it using glob:
import os
from glob import glob

# glob and os.remove work on the driver node's local filesystem,
# so this reaches /databricks/driver directly rather than DBFS
for file in glob('/databricks/driver/file*.xlsx'):
    os.remove(file)

Separate log path and dump path in expdp

I have to run multiple exports in one of our Oracle 12c databases, for which I am using PAR files. Now I want to put the dump file and the log file in separate paths when using expdp.
Please guide me on how to achieve this.
The DIRECTORY parameter specifies the default location to which Export can write the dump file set and the log file.
The DUMPFILE and LOGFILE parameters each allow that default to be overridden by supplying an optional directory object name:
DUMPFILE=[directory_object:]file_name [, ...]
LOGFILE=[directory_object:]file_name
So your parameter file needs to include those overriding directory names in the relevant parameters. Note that they still have to be directory objects, defined in the database and against which you have privileges; you cannot supply native operating system paths directly in any of these parameters.
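For example, a parameter file along these lines keeps the dump file in the default location but sends the log somewhere else; DUMP_DIR and LOG_DIR here are hypothetical directory objects that must already exist in the database and be granted to you:
DIRECTORY=DUMP_DIR
DUMPFILE=export_full.dmp
LOGFILE=LOG_DIR:export_full.log
The LOGFILE entry is the only one naming a different directory object, which is what puts the log on a separate path from the dump file.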

Import dataset into MongoDB

I am trying to insert this database into MongoDB using Studio 3T. I can import the BSON (the countries and timezones) without any issues by selecting the parent folder and using the BSON - mongodump folder option. However, I cannot figure out how to import the split cities dataset.
I have tried all the options available in Studio 3T and attempted to change the filename to .gz, but it always fails to import. I don't know what file format the cities are in.
Normally I do not have any issues importing, but I cannot figure out how to do this. How would I achieve it?
The source DB is here https://github.com/VinceG/world-geo-data
This data is nothing but a big .bson file that has been gzipped and split into several parts. I was not able to import the .bson file successfully. However, I could at least reassemble and unzip it without an error using the following commands and GZip for Windows:
copy /b city_split_aa+city_split_ab+city_split_ac+city_split_ad+city_split_ae cities.bson.gz
gzip -d cities.bson
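Once cities.bson has been extracted, one option outside Studio 3T is to load it with mongorestore; the database name below is a placeholder for whatever database the countries and timezones were imported into:
# placeholder database name; point it at the database holding the other collections
mongorestore --db geodata --collection cities cities.bson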