How to avoid commas for a specific column in csv file - talend

I have a similar question about an input CSV file. I'm currently loading data from a CSV file to a DB, but I'm getting wrong data in the target table and I'm not sure how to ignore the commas.
I have the input below:
Col1,col2,col3
1,2,3,4
The output should be populated as
Col1 col2 col3
1 2 3,4
i.e. 3,4 should be populated in col3.
Instead I'm getting the data below; it is not populated as above. Can someone please help me? I'm not sure how to do this in Talend.
Col1 col2 col3 col4
1 2 3 4
The 3,4 data was not populated into the same column; I'm not sure how to tell Talend to ignore the comma between 3 and 4.

Use a text enclosure character in the source file:
Col1,col2,col3
"1","2","3,4"
You may also use a different escape character or even a different delimiter; see "CSV Options" in the tFileInputDelimited component.
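The same text-enclosure idea can be illustrated outside Talend with Spark's CSV reader (only a sketch; the file path, app name and master setting are hypothetical):

import org.apache.spark.sql.SparkSession

// With quote = "\"" (the default), the field "3,4" is read as a single value
// even though it contains the delimiter.
val spark = SparkSession.builder().appName("quoted-csv").master("local[*]").getOrCreate()
val df = spark.read
  .option("header", "true")
  .option("quote", "\"")   // text enclosure character
  .option("escape", "\\")  // escape character, the analogue of Talend's CSV Options
  .csv("input.csv")        // hypothetical path to the quoted file shown above

df.show()  // col3 now holds the single value 3,4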

Related

spark: split only one column in dataframe and keep remaining columns as it is

I am reading the file in spark dataframe.
In the first column, I will get two values concatenated with "_".
I need to split the first column into two columns and keep the remaining columns as they are. I am using Scala with Spark.
For example:
col1 col2 col3
a_1 xyz abc
b_1 lmn opq
I need to have new DF as:
col1_1 col1_2 col2 col3
a 1 xyz abc
b 1 lmn opq
Only one column needs to be split into two columns.
I tried the split function with df.select, but then I have to write out the select for all the remaining columns; since different files can have hundreds of columns, I want reusable code that works for all of them.
You can do something like:
import spark.implicits._
import org.apache.spark.sql.functions.split

df.withColumn("_tmp", split($"col1", "_"))      // split col1 on "_" into a temporary array column
  .withColumn("col1_1", $"_tmp".getItem(0))     // first part
  .withColumn("col1_2", $"_tmp".getItem(1))     // second part
  .drop("_tmp")                                 // remove the helper column
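For example, applied to the sample data from the question (a minimal sketch; it assumes an existing SparkSession named spark):

import spark.implicits._
import org.apache.spark.sql.functions.split

// Sample frame matching the question's input
val df = Seq(("a_1", "xyz", "abc"), ("b_1", "lmn", "opq")).toDF("col1", "col2", "col3")

val result = df
  .withColumn("_tmp", split($"col1", "_"))
  .withColumn("col1_1", $"_tmp".getItem(0))
  .withColumn("col1_2", $"_tmp".getItem(1))
  .drop("_tmp")   // add .drop("col1") as well if the original column is no longer needed

result.show()
// +----+----+----+------+------+
// |col1|col2|col3|col1_1|col1_2|
// +----+----+----+------+------+
// | a_1| xyz| abc|     a|     1|
// | b_1| lmn| opq|     b|     1|
// +----+----+----+------+------+

Because only withColumn and drop are used, the remaining columns pass through untouched, so the same snippet works for files with hundreds of columns.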

Row to Column conversion in Talend

I am learning Talend Open Studio. I want to implement a scenario where one row is converted into 3 rows. My source is like
Col1 Col2 Col3
a b c
I want to get the output like below
Col
a
b
c
I have used tcolumntopivotdelimited but it failed.
Here is the solution:
In your tMap, concatenate the columns with a separator such as ";", then normalize the resulting column on that same delimiter (for example with the tNormalize component).
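For illustration only (not Talend), the same concat-then-normalize transformation expressed in Spark/Scala, the other tool used in this thread; the column names come from the question:

import spark.implicits._
import org.apache.spark.sql.functions.{concat_ws, explode, split}

val src = Seq(("a", "b", "c")).toDF("Col1", "Col2", "Col3")

val out = src
  .withColumn("Col", concat_ws(";", $"Col1", $"Col2", $"Col3"))  // the tMap-style concat with ";"
  .select(explode(split($"Col", ";")).as("Col"))                 // the normalize step: one row per value

out.show()
// +---+
// |Col|
// +---+
// |  a|
// |  b|
// |  c|
// +---+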

T-SQL - Insert Statement Ignoring a Column

I have a table with a dozen or so columns. I only know the name of the first column and the last 4 columns (in theory, I might only know the name of one column and not its position).
How can I write a statement which ignores this column? At the moment I do various column counts in ASP and construct a statement that way, but I was wondering if there is an easier way.
UPDATE
INSERT INTO tblName VALUES ('Value for col2', 'Value for col3')
but the table has a col4 and potentially more, which I'd be ignoring.
I basically have a CSV file. This CSV file has no headers. It has 'X' fewer columns than the table I'm inserting into. I would like to insert the data from the CSV into the table.
There are many tables of different structures and many CSV files. I have created an ASP page to take any CSV and upload it to the corresponding table (based on a parameter within the CSV file).
It works fine; I was just wondering whether, when building the INSERT statement, I could ignore certain columns and cut down on my code.
So let's say the CSV has data as follows
123 | 456 | 789
234 | 567 | 873
The table has a structure of
ID | Col1 | Col2 | Col3 | Col4 | Col5
I currently construct an insert statement that says
INSERT INTO tblName VALUES ('123', '456', '789', '', '')
However I was wondering if there was a way I could omit the empty values by somehow "ignoring" the columns. As mentioned, the column names are not known apart from the ones I have no data for.
There is no SQL shortcut for
SELECT * (except column col1) FROM ...
You have to construct your SQL from database metadata, as you are already doing, if I understood you correctly.
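A rough sketch of that metadata-driven approach, using plain JDBC in Scala purely for illustration (the question uses ASP; the connection string, table name and skipped columns are hypothetical):

import java.sql.DriverManager
import scala.collection.mutable.ListBuffer

// Build the INSERT column list from INFORMATION_SCHEMA instead of hard-coding it.
val conn = DriverManager.getConnection("jdbc:sqlserver://localhost;databaseName=mydb;user=...;password=...")
try {
  val skip = Set("ID")  // columns the CSV has no data for
  val rs = conn.createStatement().executeQuery(
    "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS " +
    "WHERE TABLE_NAME = 'tblName' ORDER BY ORDINAL_POSITION")

  val cols = ListBuffer[String]()
  while (rs.next()) cols += rs.getString("COLUMN_NAME")

  val wanted = cols.filterNot(skip.contains)
  val sql = s"INSERT INTO tblName (${wanted.mkString(", ")}) " +
            s"VALUES (${wanted.map(_ => "?").mkString(", ")})"
  println(sql)  // e.g. INSERT INTO tblName (Col1, Col2, Col3, Col4, Col5) VALUES (?, ?, ?, ?, ?)
} finally conn.close()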
You can specify the columns that you want to insert.
So instead of...
INSERT INTO tblName VALUES ('Value for col2', 'Value for col3')
You could specify column names...
INSERT INTO tblName (ColumnName1, ColumnName2) VALUES ('Value for col2', 'Value for col3')

Split a column value into two columns in query output (DB2)

How can I split a column value into two values in the output? I need to have the numerals in one column and the letters in the other.
For example:
Existing
Column
========
678J
2345K
I need the output to be:
Column 1 Column 2
======== ========
678 J
2345 K
The existing column can have 4 or 5 characters, as shown in the example. There is no space.
Thanks in advance!!
You could convert all letters to spaces & strip them away, then do the opposite with digits in the other column:
SELECT trim(translate(mycol, repeat(' ',26), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')) AS col1,
       trim(translate(mycol, repeat(' ',10), '0123456789')) AS col2
FROM mytable
Adjust as necessary to translate additional characters.
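For illustration, the same digits/letters split done with plain regular expressions in Scala (the SQL above uses TRANSLATE and TRIM instead):

val values = Seq("678J", "2345K")
val split = values.map(v => (v.replaceAll("[A-Za-z]", ""), v.replaceAll("[0-9]", "")))
split.foreach { case (digits, letters) => println(s"$digits $letters") }
// 678 J
// 2345 K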
I am not sure about the performance of WarrenT's solution, but it looks like a very heavy solution. It does what it is supposed to do with few constraints on the data. If you know more about the data, you can optimize.
If the string always ends with one and only one letter:
select left(mycol, length(mycol)-1), right(mycol,1) from mytable

Duplicate values returned with joins

I was wondering if there is a way, using a T-SQL join statement (or any other available option), to only display certain values. I will try to explain exactly what I mean.
My database has tables called job, consign, dechead, and decitem. Job, consign, and dechead will only ever have one row per record, but decitem can have multiple rows, all tied to dechead with a foreign key. I am writing a query that pulls various values from each table. This is fine for all the tables except decitem. From dechead I need to pull an invoice value and from decitem I need to grab the net weights. When the results are returned, if a dechead has multiple child decitem rows, it displays the values from both tables on every row. What I need it to do is display the dechead values only once and then all the decitem values.
e.g.
1 ¦123¦£2000¦15.00¦1
2 ¦--¦------¦20.00¦2
3 ¦--¦------¦25.00¦3
Line 1 displays values from dechead and the first decitem row; lines 2 and 3 just display values from decitem. If I then export the query to, say, Excel, I do not have duplicate values in the first two fields of lines 2 and 3,
instead of what I currently get, e.g.
1 ¦123¦£2000¦15.00¦1
2 ¦123¦£2000¦20.00¦2
3 ¦123¦£2000¦25.00¦3
Thanks in advance.
Check out 'group by' for your RDBMS http://msdn.microsoft.com/en-US/library/ms177673%28v=SQL.90%29.aspx
This is a task best left to the application, but if you must do it in SQL, try this:
SELECT
    CASE WHEN RowVal = 1 THEN dt.col1 ELSE NULL END AS Col1,  -- show the dechead values only on the first row of each group
    CASE WHEN RowVal = 1 THEN dt.col2 ELSE NULL END AS Col2,
    dt.Col3,
    dt.Col4
FROM (SELECT
          col1, col2, col3, col4,
          ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col1, Col4) AS RowVal
      FROM ...rest of your big query here...
     ) dt
ORDER BY dt.col1, dt.Col4