Using diff to compare two CSV files and extract values that exist in one CSV but not the other

I have two CSV files, each with one column, as shown in the example below. How can I use diff or any other method to output the values that exist in CSV1 but not in CSV2?
CSV1:
a
b
c
d
e
CSV2:
a
a
a
b
b
b
e
e
e
Expected results:
d
c
Thanks.

This question would be very easy to answer if you had both columns of data as separate tables in a database. In the query below, I simply left join the first table to the second one to arrive at the answer you want:
SELECT t1.value
FROM table1 t1
LEFT JOIN table2 t2
ON t1.value = t2.value
WHERE t2.value IS NULL;
By the way, the expected output is both c and d, because these letters appear in the first csv but not the second.
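To try the LEFT JOIN approach without a database server, here is a minimal Python sketch that loads both CSV files into an in-memory SQLite database (standard-library `csv` and `sqlite3`) and runs the same anti-join. It assumes each file has a single column and no header row; SQLite stands in for whatever database the original demo used, and the function name `only_in_first` is hypothetical.

```python
import csv
import io
import sqlite3
from typing import List


def only_in_first(csv1_text: str, csv2_text: str) -> List[str]:
    """Values in the first CSV that never appear in the second.

    Assumes a single column per file and no header row.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (value TEXT)")
    conn.execute("CREATE TABLE table2 (value TEXT)")
    for table, text in (("table1", csv1_text), ("table2", csv2_text)):
        # csv.reader yields one list per line; blank lines are skipped.
        rows = [(row[0],) for row in csv.reader(io.StringIO(text)) if row]
        conn.executemany(f"INSERT INTO {table} (value) VALUES (?)", rows)
    # The answer's anti-join; ORDER BY is added only for stable output.
    result = conn.execute(
        """
        SELECT t1.value
        FROM table1 t1
        LEFT JOIN table2 t2
          ON t1.value = t2.value
        WHERE t2.value IS NULL
        ORDER BY t1.value
        """
    ).fetchall()
    conn.close()
    return [value for (value,) in result]


if __name__ == "__main__":
    csv1 = "a\nb\nc\nd\ne\n"
    csv2 = "a\na\na\nb\nb\nb\ne\ne\ne\n"
    print(only_in_first(csv1, csv2))  # ['c', 'd']
```

To use real files, pass their contents, e.g. `open("csv1.csv", newline="").read()` (the file names are placeholders). Repeated values in CSV2 do not matter: every joined row for `a`, `b` or `e` has a non-NULL `t2.value` and is filtered out.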

Related

SQL WITH table AS (...) becomes ambiguous

Perhaps I'm approaching this all wrong, in which case feel free to point out a better way to solve the overall question, which is "How do I use an intermediate table for future queries?"
Let's say I've got tables foo and bar, which join on some baz_id, and I want to combine them into an intermediate table to be fed into upcoming queries. I know of the WITH .. AS (...) statement, but am running into the following problem:
WITH foobar AS (
SELECT *
FROM foo
INNER JOIN bar ON bar.baz_id = foo.baz_id
)
SELECT
baz_id
-- some other things as well
FROM
foobar
The issue is that Postgres (9.4) tells me baz_id is ambiguous. I understand this happens because SELECT * includes all the columns from both tables, so baz_id shows up twice, but I'm not sure how to get around it. I was hoping to avoid copying out the column names individually, like
SELECT
foo.var1, foo.var2, foo.var3, ...
bar.other1, bar.other2, bar.other3, ...
FROM foo INNER JOIN bar ...
because there are hundreds of columns in these tables.
Is there some way around this I'm missing, or some altogether different way to approach the question at hand?
WITH foobar AS (
SELECT *
FROM foo
INNER JOIN bar USING(baz_id)
)
SELECT
baz_id
-- some other things as well
FROM
foobar
It leaves only one instance of the baz_id column in the select list.
From the documentation:
The USING clause is a shorthand that allows you to take advantage of the specific situation where both sides of the join use the same name for the joining column(s). It takes a comma-separated list of the shared column names and forms a join condition that includes an equality comparison for each one. For example, joining T1 and T2 with USING (a, b) produces the join condition ON T1.a = T2.a AND T1.b = T2.b.
Furthermore, the output of JOIN USING suppresses redundant columns: there is no need to print both of the matched columns, since they must have equal values. While JOIN ON produces all columns from T1 followed by all columns from T2, JOIN USING produces one output column for each of the listed column pairs (in the listed order), followed by any remaining columns from T1, followed by any remaining columns from T2.
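A small runnable check of this behaviour, sketched with SQLite (Python's `sqlite3`) rather than PostgreSQL 9.4; SQLite's documentation likewise states that for each USING column the right-hand copy is omitted from the joined result. The table contents below are hypothetical miniatures of foo and bar.

```python
import sqlite3


def foobar_demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical miniature versions of foo and bar; both carry baz_id.
    conn.executescript(
        """
        CREATE TABLE foo (baz_id INTEGER, var1 TEXT);
        CREATE TABLE bar (baz_id INTEGER, other1 TEXT);
        INSERT INTO foo VALUES (1, 'f1'), (2, 'f2');
        INSERT INTO bar VALUES (1, 'b1'), (3, 'b3');
        """
    )
    # With USING, SELECT * returns baz_id only once.
    star = conn.execute("SELECT * FROM foo INNER JOIN bar USING (baz_id)")
    star_columns = [col[0] for col in star.description]
    star.fetchall()

    # The CTE from the question, now with USING: baz_id is no longer ambiguous.
    rows = conn.execute(
        """
        WITH foobar AS (
            SELECT *
            FROM foo
            INNER JOIN bar USING (baz_id)
        )
        SELECT baz_id, var1, other1
        FROM foobar
        """
    ).fetchall()
    conn.close()
    return star_columns, rows


if __name__ == "__main__":
    columns, rows = foobar_demo()
    print(columns)  # ['baz_id', 'var1', 'other1']
    print(rows)     # [(1, 'f1', 'b1')]
```

Because the CTE's column list now contains a single baz_id, later queries can refer to it without qualification, and no column names had to be typed out by hand.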

How to get intermediate data from a complex SQL query (PostgreSQL)

I have some complex PostgreSQL queries that take data from several tables joined to each other with LEFT OUTER JOINs.
I need to test these queries, so I need fixtures for the tests that contain only the data I need, not the data of whole tables.
How could I see the intermediate results of these joins so I can use them as fixtures?
For example, I have tables A, B and C and query
SELECT A.column
FROM A
LEFT JOIN B ON A.b_id = B.id
LEFT JOIN C ON A.c_id = C.a_id
How could I get a result like "From table A: {the part of table A taking part in the query}, From table B: {the part of table B taking part in the query}", etc., where each part shows only the data the query needs? Is there any existing tool or method for this?
Unfortunately, EXPLAIN and ANALYSE show only statistics and benchmarks, not data.
Maybe you mean
SELECT A.*
FROM A
LEFT JOIN B ON A.b_id = B.id
LEFT JOIN C ON A.c_id = C.a_id
limit 10
to see what's happening in A from the join?
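Building on the `SELECT A.*` idea, the same FROM clause can be run once per table, projecting only that table's columns with DISTINCT, to list the rows of each table that actually take part in the join. This is a sketch under stated assumptions: SQLite (Python's `sqlite3`) stands in for PostgreSQL, the sample data is hypothetical (`val` stands in for `A.column`), and `KEY_COLUMN` names, for each table, a column assumed never to be NULL in real rows, so the all-NULL rows a LEFT JOIN produces for non-matching rows can be filtered out.

```python
import sqlite3

# The FROM clause of the question's query, reused for every table.
FROM_CLAUSE = """
FROM A
LEFT JOIN B ON A.b_id = B.id
LEFT JOIN C ON A.c_id = C.a_id
"""

# Hypothetical: one column per table that is never NULL in real rows.
KEY_COLUMN = {"A": "A.id", "B": "B.id", "C": "C.a_id"}


def participating_rows(conn):
    """For each table, the distinct rows that actually take part in the join."""
    fixtures = {}
    for table, key in KEY_COLUMN.items():
        sql = f"SELECT DISTINCT {table}.* {FROM_CLAUSE} WHERE {key} IS NOT NULL ORDER BY 1"
        fixtures[table] = conn.execute(sql).fetchall()
    return fixtures


def sample_db():
    """Hypothetical sample data with some rows that match nothing."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE A (id INTEGER, b_id INTEGER, c_id INTEGER, val TEXT);
        CREATE TABLE B (id INTEGER, info TEXT);
        CREATE TABLE C (a_id INTEGER, info TEXT);
        INSERT INTO A VALUES (1, 10, 100, 'x'), (2, 20, 200, 'y'), (3, NULL, NULL, 'z');
        INSERT INTO B VALUES (10, 'b10'), (20, 'b20'), (30, 'unused');
        INSERT INTO C VALUES (100, 'c100'), (999, 'unused');
        """
    )
    return conn


if __name__ == "__main__":
    for table, rows in participating_rows(sample_db()).items():
        print(f"From table {table}: {rows}")
```

In PostgreSQL, each of these per-table SELECTs could then be exported as a fixture file, for example with psql's `\copy (SELECT ...) TO 'file.csv' WITH CSV`.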
Or perhaps
select concat('from table a', a.col1, a.col2...) ,
concat('from table b', b.col1, b.col2...)
from ...
String functions such as concat: http://www.postgresql.org/docs/9.1/static/functions-string.html
It's also worth looking at array_append() in http://www.postgresql.org/docs/9.1/static/functions-array.html

kdb Update entire column with data from another table

I have two partitioned tables. Table A is my main table, and Table B is full of columns that are exact copies of some of the columns in Table A. However, there is one column in Table B that has data I need, because the matching column in Table A is full of nulls.
I would like to get rid of Table B completely, since most of it is redundant, and update the matching column in Table A with the data from the one column in Table B.
Visually,
Table A:
a  b     c   d
--------------
1  null  11  A
2  null  22  B
3  null  33  C

Table B:
a  b    d
---------
1  joe  A
2  bob  B
3  sal  C
I want to fill the b column in Table A with the values from the b column in Table B, and then I no longer need Table B and can delete it. I will have to do this repeatedly since these two tables are given to me daily from two separate sources.
I cannot key these tables, since they are both partitioned.
I have tried:
update columnb:(exec columnb from TableB) from TableA;
but I get a 'length error.
Suggestions on how to approach this in any manner are appreciated.
To replace a column in memory you would do the following.
t1:([]a:1 2 3;b:0N)
a b
---
1
2
3
t2:([]c:`aa`bb`cc;b:5 6 7)
c b
----
aa 5
bb 6
cc 7
t1,'t2
a b c
------
1 5 aa
2 6 bb
3 7 cc
If you are getting length errors, then the tables do not have the same number of rows, and the following would solve it. The obvious problem with this solution is that it will start to repeat data if t2 has fewer rows than t1. You will have to find out why that is.
t1,'count[t1]#t2
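The repetition comes from how q's take (`#`) works: when the requested count exceeds the length of the list, the items wrap around and repeat from the start; when it is smaller, the list is truncated. The sketch below emulates that behaviour in Python purely for illustration (it is not q code, and `q_take` is a hypothetical name covering only non-negative counts).

```python
from itertools import cycle, islice


def q_take(n, items):
    """Emulate q's n#list for n >= 0 and a non-empty list.

    A smaller n truncates; a larger n cycles back to the start of the list.
    """
    return list(islice(cycle(items), n))


t1_a = [1, 2, 3]
t2_b_long = [5, 6, 7, 8]   # t2 has one row more than t1
t2_b_short = [5, 6]        # t2 has one row fewer than t1

print(q_take(len(t1_a), t2_b_long))    # [5, 6, 7] (extra row dropped)
print(q_take(len(t1_a), t2_b_short))   # [5, 6, 5] (5 repeats for a=3)

# Pairing by position: a=3 silently receives the value that belongs to a=1.
print(list(zip(t1_a, q_take(len(t1_a), t2_b_short))))  # [(1, 5), (2, 6), (3, 5)]
```

This is why the row counts should be reconciled first: the join-each (`,'`) pairs rows purely by position, so any mismatch either drops data or silently misassigns it.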
Now for partitions, you will use the amend function to change the b column of the partitioned table, table A, at date 2007.02.23 (or whatever date your partition is).
This loads the b column of tableB into memory to perform the amend. You must perform the amend for each partition.
@[`:2007.02.23/tableA/;`b;:;(count select from tableA where date=2007.02.23)#exec b from select b from tableB where date=2007.02.23]

Issue with the count in PostgreSQL

I want the count of one column. I have 5 columns in the SELECT list, but the count comes out wrong because I have included all of them in the GROUP BY clause. I don't want that particular column in the GROUP BY clause.
If I remove that column from GROUP BY clause it throws the following error:
ERROR: column "pt.name" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT distinct on (pu.id) pu.id, pt.name as package_name, c...
E.g.:
SELECT DISTINCT ON (a) a,b,c,count(d),e
FROM table GROUP BY a,b,c,d,e ORDER BY a
From this I want to remove e from the GROUP BY.
How can I remove that column from GROUP BY so that I can get correct count?
Updated after rereading the question.
You are mixing GROUP BY and DISTINCT ON. What you want (how I understand it) can be done with a window function combined with a DISTINCT ON:
SELECT DISTINCT ON (a)
a, b, c
, count(d) OVER (PARTITION BY a, b, c) AS d_ct
, e
FROM tbl
ORDER BY a, d_ct DESC;
Window functions require PostgreSQL 8.4 or later.
What happens here?
d_ct counts, for each identical (a, b, c) set in the table, how many of its rows have a non-null d.
DISTINCT ON (a) picks exactly one row per a. If you don't ORDER BY more than just a, an arbitrary row is picked.
In my example I also ORDER BY d_ct DESC, so an arbitrary row out of the set with the highest d_ct is picked.
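A runnable sketch of the window-function approach, using SQLite through Python's `sqlite3` with hypothetical sample rows. SQLite has no DISTINCT ON, so this swaps in a different technique to emulate it: `row_number() OVER (PARTITION BY a ORDER BY d_ct DESC)` and keeping `rn = 1`. The window-function count itself is the same as in the answer (SQLite 3.25 or later is assumed).

```python
import sqlite3

SAMPLE = """
CREATE TABLE tbl (a INTEGER, b TEXT, c TEXT, d INTEGER, e TEXT);
INSERT INTO tbl VALUES
    (1, 'x', 'p', 10,   'e1'),
    (1, 'x', 'p', 11,   'e2'),
    (1, 'y', 'q', NULL, 'e3'),
    (2, 'z', 'r', 20,   'e4');
"""

QUERY = """
WITH counted AS (
    -- count(d) as a window function: every row keeps its own e,
    -- and d_ct is the number of non-NULL d values in its (a, b, c) set.
    SELECT a, b, c, count(d) OVER (PARTITION BY a, b, c) AS d_ct, e
    FROM tbl
),
ranked AS (
    -- Stand-in for DISTINCT ON (a) ... ORDER BY a, d_ct DESC.
    SELECT *, row_number() OVER (PARTITION BY a ORDER BY d_ct DESC) AS rn
    FROM counted
)
SELECT a, b, c, d_ct, e
FROM ranked
WHERE rn = 1
ORDER BY a
"""


def one_row_per_a():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SAMPLE)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in one_row_per_a():
        print(row)
```

For a = 1, the rows with e1 and e2 tie on d_ct = 2, so which e comes back is arbitrary, matching the "pseudo-random row" behaviour described above.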
Another, slightly different interpretation of what you might need, with GROUP BY:
SELECT DISTINCT ON (a)
a, b, c
, count(d) AS d_ct
, min(e) AS min_e -- aggregate e in some way
FROM tbl
GROUP BY a, b, c
ORDER BY a, d_ct DESC;
GROUP BY is applied before DISTINCT ON, so the result is very similar to the one above, only the value for e / min_e is different.
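The GROUP BY variant can be sketched the same way, again with SQLite via Python's `sqlite3`, hypothetical sample rows, and `row_number()` swapped in for DISTINCT ON (which SQLite lacks). The grouping produces one row per (a, b, c) with e reduced to min(e) before one row per a is picked.

```python
import sqlite3

SAMPLE = """
CREATE TABLE tbl (a INTEGER, b TEXT, c TEXT, d INTEGER, e TEXT);
INSERT INTO tbl VALUES
    (1, 'x', 'p', 10,   'e1'),
    (1, 'x', 'p', 11,   'e2'),
    (1, 'y', 'q', NULL, 'e3'),
    (2, 'z', 'r', 20,   'e4');
"""

QUERY = """
WITH grouped AS (
    -- GROUP BY runs first: one row per (a, b, c), e aggregated with min().
    SELECT a, b, c, count(d) AS d_ct, min(e) AS min_e
    FROM tbl
    GROUP BY a, b, c
),
ranked AS (
    -- Stand-in for DISTINCT ON (a) ... ORDER BY a, d_ct DESC.
    SELECT *, row_number() OVER (PARTITION BY a ORDER BY d_ct DESC) AS rn
    FROM grouped
)
SELECT a, b, c, d_ct, min_e
FROM ranked
WHERE rn = 1
ORDER BY a
"""


def grouped_one_row_per_a():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SAMPLE)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(grouped_one_row_per_a())  # [(1, 'x', 'p', 2, 'e1'), (2, 'z', 'r', 1, 'e4')]
```

Compared with the window version, d_ct is the same, but e is now deterministic (min(e)) instead of coming from whichever tied row was picked.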

How to select distinct-columns along with one nondistinct-column in DB2?

I need to perform a distinct select on a few columns, one of which is non-distinct. Can I specify which columns make up the distinct group in my SQL statement?
Currently I am doing this.
Select distinct a,b,c,d from TABLE_1 inner join TABLE_2 on TABLE_1.a = TABLE_2.a where TABLE_2.d IS NOT NULL;
The problem I have is I am getting 2 rows for the above SQL because column D holds different values. How can I form a distinct group of columns (a,b&c) ignoring column d, but have column d in my select clause as well?
FYI: I am using DB2
Thanks
Sandeep
SELECT table_1.a, b, c, MAX(d)
FROM table_1
INNER JOIN table_2 ON table_1.a = table_2.a
GROUP BY table_1.a, b, c
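A runnable sketch of the GROUP BY / MAX(d) approach, using SQLite through Python's `sqlite3` instead of DB2. It assumes, hypothetically, that b and c live in table_1 and d in table_2, and the sample rows are invented. MAX ignores NULLs, and the d it returns is the largest value in each group, not necessarily the most recent one.

```python
import sqlite3

SAMPLE = """
CREATE TABLE table_1 (a INTEGER, b TEXT, c TEXT);
CREATE TABLE table_2 (a INTEGER, d INTEGER);
INSERT INTO table_1 VALUES (1, 'b1', 'c1'), (2, 'b2', 'c2');
INSERT INTO table_2 VALUES (1, 5), (1, 7), (2, NULL), (2, 3);
"""

QUERY = """
-- One row per (a, b, c); d is collapsed to its largest non-NULL value.
SELECT t1.a, t1.b, t1.c, MAX(t2.d) AS d
FROM table_1 t1
INNER JOIN table_2 t2 ON t1.a = t2.a
GROUP BY t1.a, t1.b, t1.c
ORDER BY t1.a
"""


def distinct_abc_with_max_d():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SAMPLE)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(distinct_abc_with_max_d())  # [(1, 'b1', 'c1', 7), (2, 'b2', 'c2', 3)]
```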
Well, your question, even with refinements, is still pretty general. So, you get a general answer.
Without knowing more about your table structure or your desired results, it may be impossible to give a meaningful answer, but here goes:
SELECT t1.a, b, c, d
FROM table_1 as t1
JOIN table_2 as t2
ON t2.a = t1.a
AND t2.some_timestamp_column = (SELECT MAX(t3.some_timestamp_column)
FROM table_2 as t3
WHERE t3.a = t2.a)
This assumes that table_1 holds one row per result you want to retrieve, and that the one-to-many relationship between table_1 and table_2 exists because different values of d were stored at unique some_timestamp_column times (some_timestamp_column is a placeholder for your actual timestamp column). If this is the case, the query returns the most recent table_2 record that matches each table_1 row.
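The correlated-subquery approach can be sketched with SQLite via Python's `sqlite3` (not DB2). The timestamps are stored as ISO-8601 text, an assumption that makes string comparison match chronological order; all rows are hypothetical. For a = 1 the latest row has d = 5, whereas MAX(d) would have returned 7, which shows the difference between the two answers.

```python
import sqlite3

SAMPLE = """
CREATE TABLE table_1 (a INTEGER, b TEXT, c TEXT);
CREATE TABLE table_2 (a INTEGER, d INTEGER, some_timestamp_column TEXT);
INSERT INTO table_1 VALUES (1, 'b1', 'c1'), (2, 'b2', 'c2');
INSERT INTO table_2 VALUES
    (1, 7, '2023-06-01 09:00:00'),
    (1, 5, '2024-01-01 09:00:00'),
    (2, 3, '2024-02-01 09:00:00');
"""

QUERY = """
-- Keep only the table_2 row whose timestamp is the latest for its a.
SELECT t1.a, t1.b, t1.c, t2.d
FROM table_1 AS t1
JOIN table_2 AS t2
  ON t2.a = t1.a
 AND t2.some_timestamp_column = (SELECT MAX(t3.some_timestamp_column)
                                 FROM table_2 AS t3
                                 WHERE t3.a = t2.a)
ORDER BY t1.a
"""


def latest_d_per_a():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SAMPLE)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(latest_d_per_a())  # [(1, 'b1', 'c1', 5), (2, 'b2', 'c2', 3)]
```

If two table_2 rows share the same latest timestamp for an a, both are returned, so the uniqueness assumption above matters.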