How to remove everything after a ',' in a Column in PostgreSQL - postgresql

I have a table with a column containing an address.
I want to remove everything after the , in the string.
How do I go about doing that in PostgreSQL?
I've tried using REPLACE, but that only works on specific strings, which is a problem because each row in the column would have a different address.
SELECT *
FROM address_book
r_name | r_address
-------+-----------------------------
xxx    | 123 XYZ st., City, Zipcode
yyy    | 333 abc road, City, Zipcode
zzz    | 222 qwe blvd, City, Zipcode
I need column r_address to return only:
123 XYZ st.
333 abc road
222 qwe blvd

Use the split_part function, like so:
SELECT r_name, split_part(r_address, ',', 1) AS street
FROM address_book
Docs: https://www.postgresql.org/docs/current/functions-string.html
Fiddle: http://sqlfiddle.com/#!17/51afe/1
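The first-field behavior of split_part can be sketched in plain Python: like split_part, taking the text before the first comma returns the whole string when no comma is present, so addresses without a comma are passed through unchanged.

```python
# Plain-Python sketch of what split_part(r_address, ',', 1) returns
addresses = [
    "123 XYZ st., City, Zipcode",
    "333 abc road, City, Zipcode",
    "222 qwe blvd, City, Zipcode",
    "No comma here",  # like split_part, the whole string comes back when ',' is absent
]
streets = [addr.split(",", 1)[0] for addr in addresses]
print(streets)  # ['123 XYZ st.', '333 abc road', '222 qwe blvd', 'No comma here']
```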

Related

Updating a specific word in a column

How can I update St. to Street in this column? I'm having a hard time figuring this out.
Address
125 Center St, New York City, NY 10001
68 Hickory St, Seattle, WA 98101
I am trying to update one word in the column: St. to Street.
I would use an expression like
regexp_replace(col, '\mSt\M\.?', 'Street', 'g')
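In the Postgres pattern, \m and \M are the start-of-word and end-of-word escapes; in Python's re module both correspond to \b. The same substitution can be checked outside the database:

```python
import re

# \m ... \M in the Postgres pattern become \b word boundaries in Python,
# so "St" only matches as a whole word (not inside e.g. "Seattle")
addresses = [
    "125 Center St, New York City, NY 10001",
    "68 Hickory St, Seattle, WA 98101",
]
fixed = [re.sub(r"\bSt\b\.?", "Street", a) for a in addresses]
print(fixed)
```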

Postgres | Get all ids after a promo code is used

I'm trying to get all order ids that used a specific promo code (ABC123). However, I want to see all subsequent orders, rather than just all the ids. For example, if we have the following table:
Account_id | order_id | promo_code
-----------+----------+---------------------------
1          | 123      | NULL (no promo code used)
2          | 124      | ABC123
3          | 125      | HelloWorld!
2          | 125      | NULL
1          | 126      | ABC123
2          | 127      | HelloWorld!
3          | 128      | ABC123
Ideally, what I want to get is this (ordered by account_id):
Account_id | order_id | promo_code
-----------+----------+------------
1          | 126      | ABC123
2          | 124      | ABC123
2          | 125      | NULL
2          | 127      | HelloWorld!
3          | 128      | ABC123
As you can see, promo_code = ABC123 acts like a marker: once an account uses that code, I want that order and every subsequent order_id for that account.
So far, to filter all the account_ids that used this promo_code, I have:
SELECT account_id, order_id, promo_code
FROM orders
WHERE account_id IN (SELECT account_id FROM orders WHERE promo_code = 'ABC123');
This allows me to get the account_ids that have an order where the desired promo_code was used.
Thanks in advance!
Extract all account_ids that used 'ABC123' and the smallest corresponding order_id for each (the t CTE), then join these with the table and filter/order the result set.
with t as
(
  select distinct on (account_id) account_id, order_id
  from the_table
  where promo_code = 'ABC123'
  order by account_id, order_id
)
select the_table.*
from the_table
inner join t on the_table.account_id = t.account_id
where the_table.order_id >= t.order_id -- the subsequent orders
order by the_table.account_id, the_table.order_id;
SQL Fiddle
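The answer's logic (first qualifying order per account, then everything from that order on) can be verified with a small plain-Python sketch over the sample rows:

```python
# Emulates the DISTINCT ON + join: find each account's first ABC123 order,
# then keep that order and all subsequent ones for the account.
rows = [  # (account_id, order_id, promo_code)
    (1, 123, None), (2, 124, "ABC123"), (3, 125, "HelloWorld!"),
    (2, 125, None), (1, 126, "ABC123"), (2, 127, "HelloWorld!"),
    (3, 128, "ABC123"),
]
first_promo = {}
for acct, order, promo in sorted(rows):  # ordered by account_id, order_id
    if promo == "ABC123" and acct not in first_promo:
        first_promo[acct] = order        # smallest ABC123 order per account
result = sorted(r for r in rows
                if r[0] in first_promo and r[1] >= first_promo[r[0]])
for r in result:
    print(r)
```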

PostgreSQL: Count Number of Occurrences in Columns

BACKGROUND
I have three large tables (employee_info, driver_info, school_info) that I have joined together on common attributes using a series of LEFT OUTER JOIN operations. After each join, the resulting number of records increased slightly, indicating that there are duplicate IDs in the data. To try and find all of the duplicates in the IDs, I dumped the ID columns into a temp table like so:
Original Dump of ID Columns
first_name | last_name | employee_id | driver_id | school_id
-----------+-----------+-------------+-----------+----------
Mickey     | Mouse     | 1234        | abcd      | wxyz
Donald     | Duck      | 2423        | heca      | qwer
Mary       | Poppins   | 1111        | acbe      | aaaa
Wiley      | Cayote    | 1234        | strf      | aaaa
Daffy      | Duck      | 1256        | acbe      | pqrs
Bugs       | Bunny     | 9999        | strf      | yxwv
Pink       | Panther   | 2222        | zzzz      | zzaa
Michael    | Archangel | 0000        | rstu      | aaaa
In this overly simplified example, you will see that IDs 1234 (employee_id), strf (driver_id), and aaaa (school_id) are each duplicated at least once. I would like to add a count column for each of the ID columns, and populate them with the count for each ID used, like so:
ID Columns with Counts
first_name | last_name | employee_id | employee_id_count | driver_id | driver_id_count | school_id | school_id_count
-----------+-----------+-------------+-------------------+-----------+-----------------+-----------+----------------
Mickey     | Mouse     | 1234        | 2                 | abcd      | 1               | wxyz      | 1
Donald     | Duck      | 2423        | 1                 | heca      | 1               | qwer      | 1
Mary       | Poppins   | 1111        | 1                 | acbe      | 1               | aaaa      | 3
Wiley      | Cayote    | 1234        | 2                 | strf      | 2               | aaaa      | 3
Daffy      | Duck      | 1256        | 1                 | acbe      | 1               | pqrs      | 1
Bugs       | Bunny     | 9999        | 1                 | strf      | 2               | yxwv      | 1
Pink       | Panther   | 2222        | 1                 | zzzz      | 1               | zzaa      | 1
Michael    | Archangel | 0000        | 1                 | rstu      | 1               | aaaa      | 3
You can see that IDs 1234 and strf each have 2 in the count, and aaaa has 3. After generating this table, my goal is to pull out all records where any of the counts are greater than 1, like so:
All Records with One or More Duplicate IDs
first_name | last_name | employee_id | employee_id_count | driver_id | driver_id_count | school_id | school_id_count
-----------+-----------+-------------+-------------------+-----------+-----------------+-----------+----------------
Mickey     | Mouse     | 1234        | 2                 | abcd      | 1               | wxyz      | 1
Mary       | Poppins   | 1111        | 1                 | acbe      | 1               | aaaa      | 3
Wiley      | Cayote    | 1234        | 2                 | strf      | 2               | aaaa      | 3
Bugs       | Bunny     | 9999        | 1                 | strf      | 2               | yxwv      | 1
Michael    | Archangel | 0000        | 1                 | rstu      | 1               | aaaa      | 3
Real World Perspective
In my real-world work, the JOINed table contains 100 columns, 15 different ID fields, and over 30,000 records, and the final table came out 28 records larger than the original. This may seem like a small amount, but each of the 28 represents a broken link that we must fix.
Is there a simple way to get the counts populated like in the second table above? I have been wrestling with this for hours already, and have not been able to make this work. I tried some aggregate functions, but they cannot be used in table UPDATE operations.
The COUNT function, when used as an analytic (window) function, can do what you want here, e.g.
WITH cte AS (
    SELECT *,
        COUNT(employee_id) OVER (PARTITION BY employee_id) AS employee_id_count,
        COUNT(driver_id) OVER (PARTITION BY driver_id) AS driver_id_count,
        COUNT(school_id) OVER (PARTITION BY school_id) AS school_id_count
    FROM yourTable
)
SELECT *
FROM cte
WHERE employee_id_count > 1
   OR driver_id_count > 1
   OR school_id_count > 1;
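A self-contained check of this approach using Python's built-in sqlite3 (whose window functions behave the same way here; table and column names match the answer). Note that in the sample data driver_id acbe also appears twice, so Daffy is returned as well, even though the question's expected output omits that row.

```python
import sqlite3

# Requires SQLite 3.25+ (window function support); bundled with modern Python.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE yourTable (
    first_name TEXT, last_name TEXT,
    employee_id TEXT, driver_id TEXT, school_id TEXT)""")
con.executemany("INSERT INTO yourTable VALUES (?,?,?,?,?)", [
    ("Mickey", "Mouse", "1234", "abcd", "wxyz"),
    ("Donald", "Duck", "2423", "heca", "qwer"),
    ("Mary", "Poppins", "1111", "acbe", "aaaa"),
    ("Wiley", "Cayote", "1234", "strf", "aaaa"),
    ("Daffy", "Duck", "1256", "acbe", "pqrs"),
    ("Bugs", "Bunny", "9999", "strf", "yxwv"),
    ("Pink", "Panther", "2222", "zzzz", "zzaa"),
    ("Michael", "Archangel", "0000", "rstu", "aaaa"),
])
rows = con.execute("""
WITH cte AS (
    SELECT *,
        COUNT(employee_id) OVER (PARTITION BY employee_id) AS employee_id_count,
        COUNT(driver_id)   OVER (PARTITION BY driver_id)   AS driver_id_count,
        COUNT(school_id)   OVER (PARTITION BY school_id)   AS school_id_count
    FROM yourTable
)
SELECT first_name FROM cte
WHERE employee_id_count > 1
   OR driver_id_count > 1
   OR school_id_count > 1
""").fetchall()
print(sorted(r[0] for r in rows))
```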

Merge two Hive tables (different column sets) - pyspark

I have one Hive table with the schema Name, Contact, Address, Subject:
Name Contact Address Subject
abc 1111 Mumbai maths
egf 2222 nashik science
pqr 3333 delhi history
And another table with the schema Name, Contact:
Name Contact
xyz 4444
mno 2222
Expected Output
Name Contact Address Subject
abc 1111 Mumbai maths
pqr 3333 delhi history
xyz 4444 null null
mno 2222 nashik science
I have tried a join operation but was not able to get the correct output.
Use a full join:
select coalesce(t2.name, t1.name) as name,
       coalesce(t2.contact, t1.contact) as contact,
       t1.address, t1.subject
from table1 t1
full join table2 t2 on t1.contact = t2.contact
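The full-join-plus-coalesce behavior can be sketched in plain Python over the sample data; the dicts here stand in for the two tables, keyed on the contact join column:

```python
# table1: contact -> (name, address, subject); table2: contact -> name
table1 = {
    "1111": ("abc", "Mumbai", "maths"),
    "2222": ("egf", "nashik", "science"),
    "3333": ("pqr", "delhi", "history"),
}
table2 = {"4444": "xyz", "2222": "mno"}

merged = []
for contact in sorted(table1.keys() | table2.keys()):  # full join: union of keys
    name1, address, subject = table1.get(contact, (None, None, None))
    name = table2.get(contact) or name1                # coalesce(t2.name, t1.name)
    merged.append((name, contact, address, subject))
for row in merged:
    print(row)
```

Unmatched rows from either side survive with None (SQL NULL) in the missing columns, which reproduces the expected output above.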

Select a specific row from a table with duplicated entries based on one field

I have a table which holds data in the following format. I would like to create a query that checks whether the reference number is duplicated and returns only the entry with the latest date_issued.
ref_no       | name       | gender | place      | date_issued
xgb/358632/p | John Smith | M      | London     | 02.08.2016
Xgb/358632/p | John Smith | M      | London     | 14.06.2017
Rtu/638932/k | Jane Doe   | F      | Birmingham | 04.09.2017
The result from the query should be;
ref_no       | name       | gender | place      | date_issued
Xgb/358632/p | John Smith | M      | London     | 14.06.2017
Rtu/638932/k | Jane Doe   | F      | Birmingham | 04.09.2017
Is there a fairly straightforward solution for this?
Assuming the date column is of type date or timestamp:
select distinct on (ref_no) * from tablename order by ref_no, date_issued desc;
This works because DISTINCT ON suppresses rows that duplicate the expression in parentheses, keeping only the first row of each group in the sort order.
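The keep-the-latest-per-reference logic can be sketched in plain Python. Note the sample data mixes xgb/Xgb casing; DISTINCT ON compares case-sensitively, so the key is lowercased here as an assumption that both spellings are the same reference:

```python
from datetime import datetime

rows = [  # (ref_no, name, gender, place, date_issued)
    ("xgb/358632/p", "John Smith", "M", "London", "02.08.2016"),
    ("Xgb/358632/p", "John Smith", "M", "London", "14.06.2017"),
    ("Rtu/638932/k", "Jane Doe", "F", "Birmingham", "04.09.2017"),
]
latest = {}
for row in rows:
    key = row[0].lower()  # fold case; DISTINCT ON itself would treat xgb/Xgb as distinct
    issued = datetime.strptime(row[4], "%d.%m.%Y")  # dates are dd.mm.yyyy
    if key not in latest or issued > latest[key][0]:
        latest[key] = (issued, row)   # keep only the most recent row per ref_no
result = [row for _, row in latest.values()]
for row in result:
    print(row)
```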