Merging rows in Postgres with matching fields

I have a table whose rows look like this:
email | interest | major | employed |inserttime
jake#example.com | soccer | | true | 12:00
jake#example.com | | CS | true | 12:01
Essentially, this is a survey application, and users sometimes hit the back button to add new fields. I later changed the INSERT logic to an UPSERT so that it just updates the row where email=currentUsersEmail. However, for the data inserted prior to this code change there are many duplicate entries for single users. I have tried some GROUP BYs with no luck, as it continually says that the ID column must appear in the GROUP BY clause or be used in an aggregate function.
Certainly there will be edge cases with clashing data: for example, a user may have entered true for the employed column in the first row and false in the second. For now I am not going to take this into account.
I simply want to merge or flatten these values into a single row. In this case it would look like:
email | interest | major | employed |inserttime
jake#example.com | soccer | CS | true | 12:01
I am guessing I would take the most recent inserttime. I have been writing the web application in Scala/Play, but for this task I think a language like Python might be easier if I cannot do it directly through psql.

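You can GROUP BY email and flatten the rows with MAX(), which ignores NULLs:
SELECT email,
       MAX(interest) AS interest,
       MAX(major) AS major,
       -- PostgreSQL has no MAX(boolean); if employed is not a boolean column, use MAX(employed)
       bool_or(employed) AS employed,
       MAX(inserttime) AS inserttime
FROM your_table
GROUP BY email;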
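To see the flattening in action, here is a minimal sketch that runs the same kind of GROUP BY query from Python. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and it assumes the blank cells in the question are NULLs and that employed is stored as 0/1. In PostgreSQL a true boolean column would need bool_or() instead of MAX().

```python
import sqlite3

# In-memory SQLite database, used here only as a runnable stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table ("
    "email TEXT, interest TEXT, major TEXT, employed INTEGER, inserttime TEXT)"
)
# The two duplicate rows from the question; blanks are assumed to be NULL.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?, ?)",
    [
        ("jake#example.com", "soccer", None, 1, "12:00"),
        ("jake#example.com", None, "CS", 1, "12:01"),
    ],
)

# MAX() skips NULLs, so each column keeps its non-NULL value per email,
# and MAX(inserttime) picks the most recent insert time.
merged = conn.execute(
    "SELECT email, MAX(interest), MAX(major), MAX(employed), MAX(inserttime) "
    "FROM your_table GROUP BY email"
).fetchall()
print(merged)
```

If two rows hold different non-NULL values for the same column, MAX() simply keeps the larger one, which is the clashing-data case the question sets aside.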

Related

How to filter rows based on a condition and if the condition isn't met, grab another row in Talend?

It was hard to think of a title for this question, so hopefully it makes sense.
I will explain further. I have a flow of data from an Excel file, and each row has one of two words in the last column: either "Open" or "Current".
So let's say I have an input that looks like this:
NAME | SSN | TYPE
John | 12345| Current
Katy | 99999| Current
Sam | 33333| Current
John | 12345| Open
Cody | 55555| Open
The goal is to grab each person only once. Each person is uniquely identified by their SSN. I want to grab the Open row if both Open and Current exist for that person. If only Current exists, then grab that.
So the final output should look like this:
NAME | SSN | TYPE
Katy | 99999| Current
Sam | 33333| Current
John | 12345| Open
Cody | 55555| Open
NOTE: As you can see, the first entry for John has been removed since he had an Open row.
I have attempted this already, but it is sloppy and I figure there must be a better way. Here is an image of what I have done:
[image: Talend flow]
Here's how you can do it:
First sort the data by Name ascending and Type descending (this is important so that, for each person, the Open record comes first). Then filter in the tMap like this:
Numeric.sequence(row2.name, 1, 1) == 1
This only lets a record through if it is the first one we see for that name.
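The sort-then-keep-first idea can be sketched outside Talend as well. The snippet below is only an illustration in Python, not Talend code. It keys on SSN rather than name, since the question says SSN is the unique id, and it uses the sample rows from the question.

```python
# Rows as (name, ssn, type), taken from the example input in the question.
rows = [
    ("John", "12345", "Current"),
    ("Katy", "99999", "Current"),
    ("Sam", "33333", "Current"),
    ("John", "12345", "Open"),
    ("Cody", "55555", "Open"),
]

# Sort by type descending first ("Open" sorts after "Current", so descending
# puts Open rows first), then by SSN. Python's sort is stable, so within each
# SSN the Open row stays ahead of the Current row.
ordered = sorted(rows, key=lambda r: r[2], reverse=True)
ordered = sorted(ordered, key=lambda r: r[1])

seen = set()
result = []
for name, ssn, kind in ordered:
    if ssn not in seen:  # first row for this person, like sequence == 1
        seen.add(ssn)
        result.append((name, ssn, kind))

for row in result:
    print(row)
```

John's Current row is dropped because his Open row sorts ahead of it; everyone else has a single row and passes through unchanged.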

Lookup in LibreOffice Calc to match partial string of the search criterion

I'm putting my banking information in a LibreOffice Calc spreadsheet.
I have columns as follows, that I imported from my bank account:
Date | USD | Description | Category
----------+-------+---------------------------------------------+----------
2/28/2019 | 44.00 | POS 0123 2345 123456 FRED-MEYER #02 | groceries
2/27/2019 | 2.50 | PANDA EXPRESS #123 TIGARD OR - 123546789012 | lunch
These descriptions are very unpredictable, they're whatever merchants want them to be, but they can tell me useful information about what I spent my money on. I've used this information to manually enter the categories. But I'm looking to automate this for common things.
So, I created a separate sheet with lookup values, like so:
Expression | Category
-----------+----------
FRED-MEYER | groceries
PANDA | lunch
I am looking for a formula in the "Category" column of the first sheet to automatically determine categories, based on the lookup table in the second sheet. (Obviously I don't plan for the lookup table to be exhaustive, but whatever I don't put in there, I can enter in the first sheet manually, thus overwriting the formulas.)
I had this working fine in Excel using a nifty construct of SEARCH and MATCH. (I don't even understand it anymore and I don't have Excel to check.) But since I'm now a Linux user, I'm trying to use LibreOffice, and I've not been able to make this work with formulas. I tried SEARCH, MATCH, LOOKUP, VLOOKUP, FIND, with and without regexes and with different options on/off. But no success so far.
I think this is very similar to this question, though it was only answered for Excel (I'm using Calc).
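For reference, the matching logic being asked for can be sketched in Python. This is not a Calc formula, only an illustration of the intended behavior. The function name categorize is hypothetical, and a case-insensitive substring match is assumed.

```python
# Lookup table from the second sheet: (expression, category) pairs.
lookup = [("FRED-MEYER", "groceries"), ("PANDA", "lunch")]


def categorize(description, table=lookup):
    """Return the category of the first expression contained in the description."""
    upper = description.upper()
    for expression, category in table:
        if expression.upper() in upper:
            return category
    return None  # no match: the category is left for manual entry


print(categorize("POS 0123 2345 123456 FRED-MEYER #02"))
print(categorize("PANDA EXPRESS #123 TIGARD OR - 123546789012"))
```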

Is it possible to make multiple fields default to the same date, but also be individually editable?

I am VERY new to Access - I was sort of thrust into designing a database for a research project I'm involved in. So, please bear with me because I know next to nothing :) The problem I am having is thus:
My database is for a medical research project, and is very time and date dependent, by which I mean I need to capture the date and time for each piece of data so that we end up with a sort of timeline of events for each subject.
As is, I have something like the following for each piece of data (each in its own field):
ArrivalDate
ArrivalTime
HeartRateDate
HeartRateTime
HeartRateData
TemperatureDate
TemperatureTime
TemperatureData
BloodPressureDate
BloodPressureTime
BloodPressureData
There are around 200 similar pieces of data that I need to collect for each patient. To avoid having to re-enter the same data over and over, and also to reduce the potential for error, I would like to have all of the date fields in a given patient record default to the first one that is entered, in this case "Arrival Date". However, I also need each date field to be editable without affecting the others. The reason for this is that in the event that a patient's visit occurs over the span of a few days we can accurately record that.
I have tried messing around with the default value setting, as well as setting the control source to reference the "Arrival Date" field, but then of course any changes to one field affect them all. I am not even sure that what I am trying to do is possible but I will appreciate any help and/or suggestions!
Thank you in advance
Having all this data in separate columns of a big table isn't going to work. You don't measure things like temperature or blood pressure only once per patient, do you?
This is a classic one-to-many relation.
You should have a separate Measurements table, looking e.g. like this:
+--------+-----------+---------------+------------------+-----------+
| MeasID | PatientID | MeasType | MeasDateTime | MeasValue |
+--------+-----------+---------------+------------------+-----------+
| 1 | 1 | Temperature | 2017-05-17 14:30 | 38.2 |
| 2 | 1 | BloodPressure | 2017-05-17 14:30 | 130/90 |
| 3 | 1 | Temperature | 2017-05-17 18:00 | 38.5 |
| 4 | 2 | Temperature | etc. | |
+--------+-----------+---------------+------------------+-----------+
As Barmar wrote, there is no reason to have separate columns for date and time.
In the form where measurements are entered, you can use the BeforeInsert event to set MeasDateTime to the current time, with the Now() function.
That way the user never has to enter it manually, but they can still edit it if the measurement was taken at a different time than when the data was entered.
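A minimal sketch of this layout, using SQLite from Python rather than Access so that it runs anywhere. The column names follow the table above. The DEFAULT CURRENT_TIMESTAMP clause plays the role that the BeforeInsert event with Now() would play in Access, which is an assumption of this sketch and not Access syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Measurements ("
    "MeasID INTEGER PRIMARY KEY, "
    "PatientID INTEGER NOT NULL, "
    "MeasType TEXT NOT NULL, "
    "MeasDateTime TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP, "
    "MeasValue TEXT)"
)

# Timestamp omitted: the default fills in the time of entry.
conn.execute(
    "INSERT INTO Measurements (PatientID, MeasType, MeasValue) "
    "VALUES (1, 'Temperature', '38.2')"
)
auto_stamp = conn.execute(
    "SELECT MeasDateTime FROM Measurements WHERE MeasID = 1"
).fetchone()[0]

# Timestamp given explicitly for a measurement taken at a known time.
conn.execute(
    "INSERT INTO Measurements (PatientID, MeasType, MeasDateTime, MeasValue) "
    "VALUES (1, 'BloodPressure', '2017-05-17 14:30', '130/90')"
)

# Editing one row's timestamp does not touch any other row.
conn.execute(
    "UPDATE Measurements SET MeasDateTime = '2017-05-17 18:00' WHERE MeasID = 1"
)
rows = conn.execute(
    "SELECT MeasID, MeasType, MeasDateTime FROM Measurements ORDER BY MeasID"
).fetchall()
print(rows)
```

Because each measurement is its own row, a default timestamp and per-row editing come for free, which is what the roughly 200 separate date fields were trying to achieve.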

Crystal Reports 2011 - Suppressing Information Based on Certain Criteria

I'm going to attempt to word this question without being too confusing.
We have a report in which we want to show each patient and their insurance. Each insurance in the patient's record is numbered by an Order Number. However, we don't only want to show that; I want to add criteria so that if Insurance A has order number 1 under the Patient ID, the report shows all of this patient's insurance. If the patient does not have Insurance A in Order Number 1, do not show this patient or any of their information on the report.
In the code below, guarantor is referring to insurance. So order number and guarantor name is what we're focusing on. Here's the code I've put into Section Expert for the Suppress option. What I assume is if it meets the criteria, TRUE will suppress the information, else FALSE will allow the information. However, this is not sufficient as it suppresses all of the other information.
if {billing_guar_order_no_ep.guarantor_order_number} = "1" AND
{billing_guar_order_no_ep.guarantor_name} = "Medicare" then
false
else
true
What I'm assuming is it will need to iterate or loop through every patient and if it finds this information, list ALL of the patient's information and move forward, else suppress and move forward. I hope this makes sense.
Example:
|Patient ID|Order Number|Guarantor Name|
| -------- | ---------- | ------------ |
|1 | 1|Medicare |
|1 | 2|Medicaid |
|2 | 1|Medicaid |
|2 | 2|Medicare |
In the above example, what I want is for the report to show everything from Patient #1 (including all order numbers) and to not even show Patient #2 in the report. However, what's happening is Patient #1 does show up, but only Order Number 1; it suppresses all the other information.
What am I missing?
The query that you want will be an adaptation of this:
select *
from data d
where exists (
    select 1
    from data
    where pat_id = d.pat_id
    and order_id = 1
    and guarantor_name = 'Medicare'
)
The 'Linking Expert' doesn't support this syntax, so you'll need to use a Command instead.
Process:
get the current query by selecting Database | Show SQL Query ...
create a new report
select 'add command' from within the database expert
paste the query, then adapt it
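As a sanity check of the filter the question describes (keep every row of a patient whose order number 1 is Medicare, drop all other patients), here is a small sketch run against the example data. It uses SQLite from Python as a stand-in for the report's database; the table name data and the columns pat_id and order_id are placeholders, as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (pat_id INTEGER, order_id INTEGER, guarantor_name TEXT)"
)
# Example rows from the question.
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        (1, 1, "Medicare"),
        (1, 2, "Medicaid"),
        (2, 1, "Medicaid"),
        (2, 2, "Medicare"),
    ],
)

# A row is kept when the same patient has Medicare as order number 1,
# regardless of which order number the row itself carries.
shown = conn.execute(
    "SELECT d.pat_id, d.order_id, d.guarantor_name "
    "FROM data d "
    "WHERE EXISTS ("
    "  SELECT 1 FROM data x "
    "  WHERE x.pat_id = d.pat_id "
    "    AND x.order_id = 1 "
    "    AND x.guarantor_name = 'Medicare') "
    "ORDER BY d.pat_id, d.order_id"
).fetchall()
print(shown)
```

Both rows of patient 1 come back and patient 2 is absent, which is the behavior the row-by-row suppress formula cannot produce because it only ever sees the current row.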

Currency Conversion Logic for multiple rows

I have a question regarding the currency conversion logic. Currently, we return a result set which has got many currency fields. Now, the requirement says that the users should have an option to select the output currency format
eg:
Account name | Actual Amount | Estimated Amount | Target Amount
XYZ | $ 2000 | $ 456.78 | $ 890.45
ABC | SD 2000 | SD 456.78 | SD 890.45
If the user now selects Yen as the output format, the result set should be:
Account name | Actual Amount | Estimated Amount | Target Amount
XYZ | ¥ 2233 | ¥ 42356.78 | ¥ 82390.45
ABC | ¥ 21213000 | ¥ 41156.78 | ¥ 82390.45
The exchange rate is available, and I know that we could have a function call in the select statement to convert the currency columns. But, making the function call for each record increases the execution time.
Is there any other logic that could be used to improve the execution time?
When you use a function for the conversion, keep in mind that the function is executed for every single record. So looking only at the overhead of the function call, you get as many executions as you have records.
A better approach is to create a conversion table that holds the exchange rates and join against it. Make sure both the conversion table and your original table have a column that holds the currency, and you are good to go. In the SELECT statement you then output an expression such as rate * Actual Amount.
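A minimal sketch of the join-based conversion, using SQLite from Python. The table names accounts and rates, their columns, and the exchange rates are all made up for illustration and are not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each account row records its own currency.
conn.execute(
    "CREATE TABLE accounts (account_name TEXT, currency TEXT, actual_amount REAL)"
)
# Hypothetical conversion table: one row per (source, target) currency pair.
conn.execute(
    "CREATE TABLE rates (from_currency TEXT, to_currency TEXT, rate REAL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [("XYZ", "USD", 2000.0), ("ABC", "EUR", 2000.0)],
)
# Illustrative rates only, not real exchange rates.
conn.executemany(
    "INSERT INTO rates VALUES (?, ?, ?)",
    [("USD", "JPY", 100.0), ("EUR", "JPY", 120.0)],
)

# The conversion is a plain multiplication on the joined rate,
# so no per-row function call is needed.
converted = conn.execute(
    "SELECT a.account_name, a.actual_amount * r.rate "
    "FROM accounts a "
    "JOIN rates r ON r.from_currency = a.currency "
    "WHERE r.to_currency = ? "
    "ORDER BY a.account_name",
    ("JPY",),
).fetchall()
print(converted)
```

The target currency is passed as a query parameter, so switching the output currency only changes which rows of the rates table take part in the join.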