How to process a CSV file with 60,000 rows and write it to an H2 database? - spring-batch

I have a CSV file with 60,000 lines. I would like to read and process the file and write the results to an H2 database.
The problem is that only the first 1,000 rows are processed. How can I overcome this problem?
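The job configuration isn't shown, so the exact cause of the 1,000-row cut-off can't be pinpointed here; one thing worth checking is whether the reader has maxItemCount (or a similar limit) set to 1,000, since that caps how many items are read. For reference, a minimal chunk-oriented configuration that streams the whole file into H2 might look like the sketch below (Spring Batch 5 builders; the Person class with firstName/lastName fields, the table, column names, file name, and chunk size are all assumptions, not taken from the question).

import javax.sql.DataSource;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class CsvToH2JobConfig {

    @Bean
    public FlatFileItemReader<Person> reader() {
        return new FlatFileItemReaderBuilder<Person>()
                .name("personReader")
                .resource(new FileSystemResource("people.csv")) // hypothetical file name
                .linesToSkip(1)                                  // skip the header line
                .delimited()
                .names("firstName", "lastName")                  // hypothetical columns
                .targetType(Person.class)
                .build();                                        // note: no maxItemCount set
    }

    @Bean
    public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
        return new JdbcBatchItemWriterBuilder<Person>()
                .dataSource(dataSource)                          // the H2 DataSource
                .sql("INSERT INTO person (first_name, last_name) VALUES (:firstName, :lastName)")
                .beanMapped()
                .build();
    }

    @Bean
    public Step csvToH2Step(JobRepository jobRepository, PlatformTransactionManager txManager,
                            FlatFileItemReader<Person> reader, JdbcBatchItemWriter<Person> writer) {
        return new StepBuilder("csvToH2Step", jobRepository)
                .<Person, Person>chunk(1000, txManager)          // commit every 1,000 items, then keep reading
                .reader(reader)
                .writer(writer)
                .build();
    }

    @Bean
    public Job csvToH2Job(JobRepository jobRepository, Step csvToH2Step) {
        return new JobBuilder("csvToH2Job", jobRepository)
                .start(csvToH2Step)
                .build();
    }
}

With a setup like this the chunk size only controls how often a transaction is committed, not how many rows are read in total, so all 60,000 lines should pass through the step.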

Related

Handling delimited files in Azure Data Factory

I have a very large table with around 28 columns and 900k records.
I converted it to a CSV file (pipe-separated) and then tried to use that file to feed another table using ADF itself.
When I tried to use that file, it kept triggering an error about a column datatype mismatch.
Digging deeper into the data, I found a few rows with a pipe (|) symbol in their text. So when converting the data back, the text after the pipe is treated as belonging to the next column, hence the error.
How can the conversion to CSV be handled efficiently when column values contain the delimiter itself?
Option 1: If possible, I would suggest changing the delimiter to something other than the pipe (|), since the column values themselves contain pipes.
Option 2: In the CSV dataset, select a quote character so the column boundaries can be identified correctly (see the sketch after the steps below).
Step 1: Copy data from table1 to a CSV file (source, sink CSV dataset, and output screenshots omitted).
Step 2: Load the same CSV data into table2 with a copy activity, using the CSV output file of Step 1 (source CSV dataset, sink dataset, and output screenshots omitted).
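To illustrate why Option 2 works, here is a small standalone sketch (plain Java, no ADF involved; the row values are made up) of how a quote character lets a parser keep an embedded pipe inside a single field:

import java.util.ArrayList;
import java.util.List;

public class QuoteDemo {
    public static void main(String[] args) {
        // The middle field contains the delimiter itself, so it is wrapped in quotes.
        String line = "1|\"ACME | Corp\"|NY";

        // A naive split on the pipe produces 4 fields instead of 3.
        System.out.println(line.split("\\|").length);         // 4

        // A quote-aware split (escaped quotes not handled) keeps the field intact.
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;                          // toggle quoted state
            } else if (c == '|' && !inQuotes) {
                fields.add(current.toString());                // delimiter only counts outside quotes
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        System.out.println(fields);                            // [1, ACME | Corp, NY]
    }
}

Setting the quote character on the ADF CSV dataset applies the same rule on both the write and the read side, so the embedded pipe no longer shifts the remaining columns.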

How to export a huge amount of data from Oracle SQL Developer in PSV format and in multiple files

I have a huge dataset of about 600,000,000 rows. I need to export all these rows in pipe-separated values (PSV) format, split across multiple files instead of one huge file, with each file containing 100,000,000 rows.
I tried exporting with right click --> Export --> Delimited --> Multiple files, but there is no option to specify the number of rows in each file, and the data is exported in .dsv format, not .psv format.
Is there any way to achieve this?
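Since the export dialog described above has no rows-per-file setting, one programmatic workaround, sketched here purely as an illustration outside SQL Developer, is a single JDBC pass that writes pipe-separated lines and rolls over to a new .psv file every N rows (the connection string, table, columns, fetch size, and file names below are all placeholders, and the Oracle JDBC driver is assumed to be on the classpath):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PsvExport {
    public static void main(String[] args) throws Exception {
        long rowsPerFile = 100_000_000L;         // target rows per output file
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password")) {
            Statement st = con.createStatement();
            st.setFetchSize(10_000);             // stream rows instead of buffering the whole result
            try (ResultSet rs = st.executeQuery("SELECT col1, col2, col3 FROM big_table")) {
                int cols = rs.getMetaData().getColumnCount();
                long row = 0;
                int fileNo = 0;
                BufferedWriter out = null;
                while (rs.next()) {
                    if (row % rowsPerFile == 0) {                // start a new .psv file every N rows
                        if (out != null) out.close();
                        out = new BufferedWriter(new FileWriter("export_" + (++fileNo) + ".psv"));
                    }
                    StringBuilder line = new StringBuilder();
                    for (int c = 1; c <= cols; c++) {
                        if (c > 1) line.append('|');
                        line.append(rs.getString(c));            // values written as-is; embedded pipes would need quoting
                    }
                    out.write(line.toString());
                    out.newLine();
                    row++;
                }
                if (out != null) out.close();
            }
        }
    }
}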

Data loss while reading a huge file in Spark Scala

import org.apache.spark.sql.functions.monotonically_increasing_id

// Read the file as plain text (one row per line) and add a generated id column.
val data = spark.read
  .text(filepath)
  .toDF("val")
  .withColumn("id", monotonically_increasing_id())
val count = data.count()
This code works fine when reading a file containing up to 50k+ rows, but when a file has more rows than that, it starts losing data: when it reads a file with 1 million+ rows, the final dataframe count only comes to about 65k rows.
I can't see where the problem is in this code or what needs to change so that every row ends up in the final dataframe.
P.S. The largest file this code will have to ingest has almost 14 million+ rows; currently it ingests only about 2 million of them.
Seems related to How do I add an persistent column of row ids to Spark DataFrame?
i.e. avoid using monotonically_increasing_id and follow some of the suggestions from that thread.
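One of the suggestions commonly made in that thread is zipWithIndex, which assigns consecutive 0..n-1 ids in a single pass. A rough sketch of that approach, written here with Spark's Java API to keep the examples on this page in one language (the column names mirror the question; whether it resolves the count discrepancy would still need to be verified against the actual data), could look like this:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class Ingest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("ingest").getOrCreate();
        String filepath = args[0];   // path to the input text file

        // Each input line becomes one row in column "val".
        Dataset<Row> data = spark.read().text(filepath).toDF("val");

        // zipWithIndex assigns consecutive ids in one pass, unlike
        // monotonically_increasing_id, whose ids are only unique, not consecutive.
        JavaRDD<Row> withId = data.javaRDD()
                .zipWithIndex()
                .map(t -> RowFactory.create(t._1().getString(0), t._2()));

        StructType schema = new StructType()
                .add("val", DataTypes.StringType)
                .add("id", DataTypes.LongType);

        Dataset<Row> result = spark.createDataFrame(withId, schema);
        System.out.println(result.count());
    }
}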

How can I extract 7 million records from an Oracle database to a CSV file?

I have already tried the export option in SQL Developer, but it is very time-consuming.
I need to know if there is a quicker way to extract the data to a CSV file.
I think you can try Toad for Oracle: in the data grid view, right-click and choose Export Data Set. You can start with a million rows using WHERE ROWNUM < 1000001.

How tFileInputExcel works in Talend

I'm trying to load an xlsx file into a PostgreSQL database.
My Excel file contains 11,262 rows, but after executing the job I find 14**** rows and I don't know why. I only want my 11,262 rows in the table.
(Screenshots of the job and of the Excel file omitted.)
I'm guessing you didn't check "Stop reading on encountering empty rows", and that the component is reading empty rows from your Excel file.
Please try with that checkbox checked.