How to export a huge amount of data from Oracle SQL Developer in PSV format and in multiple files

I have a huge dataset of about 600,000,000 rows. I need to export all of these rows in pipe-separated values format, split across multiple files rather than one huge file, with each file containing 100,000,000 rows (six files in total).
I tried exporting with right click --> Export --> Delimited --> Multiple files, but there is no option to specify the number of rows per file, and the data is exported in .dsv format, not .psv.
Is there any way to achieve this?
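One workaround, since the Export wizard offers no rows-per-file setting: bypass the wizard and stream the rows with a short script, rolling over to a new .psv file every 100,000,000 rows. Below is a minimal sketch using the python-oracledb driver; the connection details and the table name (big_table) are placeholders:

# Stream rows from Oracle and write pipe-separated files,
# starting a new file every ROWS_PER_FILE rows.
import oracledb

ROWS_PER_FILE = 100_000_000  # per the question: six files for ~600M rows

conn = oracledb.connect(user="scott", password="tiger", dsn="dbhost/orclpdb")  # placeholders
cur = conn.cursor()
cur.arraysize = 10_000  # fetch in large batches to reduce round trips
cur.execute("SELECT * FROM big_table")  # hypothetical table name

file_no, rows_in_file, out = 0, 0, None
for row in cur:
    if out is None or rows_in_file == ROWS_PER_FILE:
        if out:
            out.close()
        file_no += 1
        out = open(f"export_{file_no:03d}.psv", "w", encoding="utf-8")
        rows_in_file = 0
    out.write("|".join("" if v is None else str(v) for v in row) + "\n")
    rows_in_file += 1

if out:
    out.close()
conn.close()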

Related

Skip lines while reading csv - Azure Data Factory

I am trying to copy data from Blob to Azure SQL using data flows within a pipeline.
The data files are in csv format and the header is on the 4th row of each file.
I want to use the header exactly as it appears in the csv data file.
I want to loop through all the files and upload data.
Thanks
Add a Surrogate Key transformation and then a Filter transformation to filter out row number 4.
You need to first uncheck the "First row as header" in your CSV dataset. Then you can use the "Skip line count" field in the copy data activity source tab and skip any number of lines you want.
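For quick local validation of such a file (outside ADF), the same skip-the-leading-lines idea can be checked with pandas; this is only an illustration of the concept, not part of the pipeline:

# Illustration only: read a CSV whose header sits on row 4
# by skipping the first three lines (pandas, not ADF).
import pandas as pd

df = pd.read_csv("datafile.csv", skiprows=3, header=0)  # hypothetical file name
print(df.head())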

How to automatically transfer data

I have thousands of csv files and they basically come in 2 formats. In the first format the files have 100 rows and 2 columns; in the second format they have 5 rows and 50 columns. The numbers are given just as an example.
What I want to do is write Matlab code that extracts the complete second row of each first-format csv file and makes it the first row of the corresponding second-format csv file. There are equally many files of each format.
Any help is appreciated.
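The asker wants Matlab, but the row-transfer logic is small enough to sketch; here it is in Python (the logic translates directly to Matlab). The directory layout and the assumption that the files pair up by sorted name are hypothetical:

# Copy row 2 of each format-1 CSV to the top of the matching format-2 CSV.
import csv
import glob

fmt1_files = sorted(glob.glob("format1/*.csv"))  # hypothetical directories
fmt2_files = sorted(glob.glob("format2/*.csv"))

for src, dst in zip(fmt1_files, fmt2_files):
    with open(src, newline="") as f:
        second_row = list(csv.reader(f))[1]  # complete second row of the format-1 file
    with open(dst, newline="") as f:
        existing = list(csv.reader(f))
    with open(dst, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(second_row)  # becomes the new first row
        writer.writerows(existing)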

How to load specific rows and columns from an Excel sheet through PySpark into a HIVE table?

I have an Excel file with 4 worksheets. In each worksheet the first 3 rows are blank, i.e. the data starts at row 4 and continues for thousands of rows.
Note: As per the requirement I am not supposed to delete the blank rows.
My goals are below
1) read the Excel file in Spark 2.1
2) ignore the first 3 rows and read the data from the 4th row down to row 50 (the file has more than 2000 rows)
3) convert all the worksheets in the Excel file to separate CSV files and load them into existing HIVE tables
Note: I have the flexibility of writing separate code for each worksheet.
How can I achieve this?
I can create a DataFrame to read a single file and load it into HIVE, but I guess my requirement needs more than that.
You could for instance use the HadoopOffice library (https://github.com/ZuInnoTe/hadoopoffice/wiki).
There you have the following options:
1) use Hive directly to read the Excel files and CTAS into a table in CSV format
You would need to deploy the HadoopOffice Excel Serde:
https://github.com/ZuInnoTe/hadoopoffice/wiki/Hive-Serde
Then you need to create the table (see the documentation for all the options; the example reads from Sheet1 and skips the first 3 lines):
create external table ExcelTable(<INSERTHEREYOURCOLUMNSPECIFICATION>)
ROW FORMAT SERDE 'org.zuinnote.hadoop.excel.hive.serde.ExcelSerde'
STORED AS
  INPUTFORMAT 'org.zuinnote.hadoop.office.format.mapred.ExcelFileInputFormat'
  OUTPUTFORMAT 'org.zuinnote.hadoop.excel.hive.outputformat.HiveExcelRowFileOutputFormat'
LOCATION '/user/office/files'
TBLPROPERTIES(
  "hadoopoffice.read.simple.decimalFormat"="US",
  "hadoopoffice.read.sheet.skiplines.num"="3",
  "hadoopoffice.read.sheet.skiplines.allsheets"="true",
  "hadoopoffice.read.sheets"="Sheet1",
  "hadoopoffice.read.locale.bcp47"="US",
  "hadoopoffice.write.locale.bcp47"="US"
);
Then do CTAS into a CSV format table:
create table CSVTable ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' AS Select * from ExcelTable;
2) use Spark
Depending on the Spark version you have different options:
for Spark 1.x you can use the HadoopOffice FileFormat, and for Spark 2.x the Spark2 DataSource (the latter also includes support for Python). See the how-tos in the wiki linked above.
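As a sketch of option 2 in PySpark, assuming the Spark2 DataSource accepts the same keys as the TBLPROPERTIES above with the "hadoopoffice." prefix dropped (check the wiki for the exact option names):

# Read the sheet with the HadoopOffice Spark2 data source and
# write it out as CSV for the existing HIVE table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("excel-to-csv").getOrCreate()

df = (spark.read.format("org.zuinnote.spark.office.excel")
      .option("read.locale.bcp47", "US")
      .option("read.sheets", "Sheet1")
      .option("read.sheet.skiplines.num", "3")       # mirrors the TBLPROPERTIES above (assumed)
      .option("read.sheet.skiplines.allsheets", "true")
      .load("/user/office/files"))

# Rows 4..50 of the sheet are the first 47 rows after the skipped lines.
df.limit(47).write.mode("overwrite").csv("/user/office/csv_out")  # hypothetical output path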

Multiple tables with different columns on a single BIRT report

I have a BIRT report with multiple tables that have different datasets and numbers of columns. I generate the output in .xls and convert it to .csv with the ssconvert utility on Unix. But in the .csv file I see extra delimiters for the tables that have fewer columns. For example, here is the .csv output with extra "," characters:
table1 -- this has only 10 columns
5912,,,0,,,0,,0,,,0,,,0,,,0,,,
table2 -- this has 20 columns
'12619493',28/03/2018 17:27:40,sdfsdfasd,'61901492478'1.08,,,1.08,sdfs,,dsf,,sdfadfs,'738331',,434,,,,,,,333,
I tried putting the tables in a grid, but I still see the extra ",". I opened the .xls file and it has the same issue: the cells in Excel are merged.
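If the layout can't be fixed in BIRT itself, one pragmatic workaround is to post-process the .csv and drop the trailing empty fields that the merged cells produce; a sketch, with placeholder file names:

# Strip trailing empty fields produced by padding rows to the widest table.
import csv

with open("report.csv", newline="") as f:
    rows = list(csv.reader(f))

with open("report_clean.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        while row and row[-1] == "":  # drop trailing empty cells only
            row = row[:-1]
        writer.writerow(row)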

Excel 2010 - Pivot using external csv file - how to make dates work?

I have a set of pivot tables that use external csv files as their data sources. The csv files originally contained dates in the format dd/mm/yy (e.g. 31/01/13). The pivot tables did not recognise these as dates. I converted the dates in the csv files to dd/mm/yyyy (e.g. 31/01/2013) but these were still not recognised as dates by the pivot tables.
I tried setting up a calculated field =DATEVALUE(date_from_csv) but when used in the pivot table (I'm using the Max option to select the most recent date) I get #VALUE! errors.
I have tried converting the csv file to xlsx and also importing the data into the workbook that contains the pivot table - but I can't change from the external connection to use the internal data. I don't want to rebuild the pivots as there are a lot of variables and formatting that would take ages to redo.
Any ideas??
The problem was caused by the date column being blank for some rows. I found that if I moved a row that had all its fields filled in to the top (just after the header line), Excel got the formats right and the pivot tables now work!
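If the same fix has to be applied to many csv files, the row shuffle can be scripted; a sketch, with a placeholder file name:

# Move the first fully-populated row to just below the header so
# Excel infers the column formats from it.
import csv

with open("pivot_source.csv", newline="") as f:
    rows = list(csv.reader(f))

header, body = rows[0], rows[1:]
full = next((i for i, r in enumerate(body) if all(cell.strip() for cell in r)), None)
if full is not None:
    body.insert(0, body.pop(full))  # promote the fully-filled row

with open("pivot_source.csv", "w", newline="") as f:
    csv.writer(f).writerows([header] + body)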