Trying to find the last character in a non-standard length string - postgresql

I'm a relatively new coder and I've been struggling with the following problem for a few days. I am trying to separate the characters after the last period in an email address so I can group results by them. To do this for the text after the @ symbol, I wrote the following code:
select lower(substring(email, position('@' in email))) as email
This code returns things like @gmail.com or @yahoo.com, which I can then group by in my longer query. However, I would also like to compare the .com results to the .net results. When I type a similar query:
select lower(substring(email, position('.' in email))) as email
it returns everything from the first period in the email address onward. So my email would be returned as .lastname@gmail.com rather than .com. I've experimented with right( and left(, but these don't work with substring in PostgreSQL. Does anyone have any other suggestions? Thanks!

Try this. The trick was using reverse to find where that last period was.
select
substring(email, char_length(email) - position('.' in (reverse(email))) + 1) as Domain
from yourTable
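To see why the arithmetic in that query works, here is a minimal Python sketch that mirrors the same steps using 1-based positions, as SQL does. The function name and the sample addresses are made up for illustration, and the sketch assumes the string contains at least one period.

```python
def last_dot_suffix(email: str) -> str:
    """Mirror substring(email, char_length(email) - position('.' in reverse(email)) + 1)."""
    # position() is 1-based, so add 1 to Python's 0-based index.
    pos_in_reversed = email[::-1].index(".") + 1
    # 1-based position of the last period in the original string.
    start = len(email) - pos_in_reversed + 1
    # substring(s, start) keeps everything from the 1-based start onward.
    return email[start - 1:]


print(last_dot_suffix("first.lastname@gmail.com"))  # .com
print(last_dot_suffix("someone@example.net"))       # .net
```

Reversing the string turns "find the last period" into "find the first period", which is the only search that position() offers.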

Try something like:
select regexp_matches(email, '[^.]+$', 'g')
from your_table

Why does T-SQL think an item is a column name?

Note: I am just starting with SQL/T-SQL and I am not searching for a more elegant solution, just for a fix.
So I'm trying to use the Stack Exchange Data Explorer to count how many times a badge has been awarded. My query goes something like this:
SELECT COUNT(Id)
FROM Badges
WHERE Name = "Scholar"
The results should be the number of times the Badge has been awarded. It, however, returns this error:
Line 3: Invalid column name 'Scholar'.
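Replace the double quotes with single quotes. In T-SQL, string literals use single quotes; with the default QUOTED_IDENTIFIER setting, double quotes delimit identifiers, so "Scholar" is read as a column name.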
SELECT COUNT(Id)
FROM Badges
WHERE Name = 'Scholar'

Grouping By with missing data

I'm trying to aggregate volunteer hours from a Google spreadsheet for a non-profit I volunteer for. We collect each volunteer's e-mail address and the time that each volunteer has contributed. Each volunteer only puts in their e-mail the first time. I've found examples online on how to send e-mails, but I'm having trouble aggregating the data. I think the trouble might be that not every row has an e-mail address associated with it.
I've been able to get the sum of hours worked by volunteer using QUERY(data, "select A, sum(C) Group By A", ) but can't figure out how to get the e-mail associated with each individual.
Thanks for the advice! The VLOOKUP and ArrayFormula functions were new to me. Here's how I solved it:
QUERY(data, "select A, B where B <>'' ", -1)
This allowed me to get the Key-Value pair (Name, Email) for each volunteer (solving the problem of people who volunteered multiple times, but only left their e-mail once). From there, I was able to generate the 'Name:Hours Worked' table off to the right with:
QUERY(data, "select A, sum(C) Group By A", ).
Then, I used VLOOKUP to query my Name-Email table to get the desired result of:
Name-Email-aggregatedHours
Thanks!
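The same three steps (build a name-to-email lookup from the rows that have an e-mail, sum the hours per name, then join the two) can be sketched outside the spreadsheet. The Python below is only an illustration with made-up rows; the column order (name, e-mail, hours) is assumed to match columns A, B, and C.

```python
from collections import defaultdict

# Hypothetical rows: (name, email, hours). The email is filled in only once per volunteer.
rows = [
    ("Ann", "ann@example.org", 2.0),
    ("Bob", "", 1.5),
    ("Ann", "", 3.0),
    ("Bob", "bob@example.org", 4.0),
]

# Step 1: name -> email, taken from rows where the email is not blank.
emails = {name: email for name, email, _ in rows if email}

# Step 2: name -> total hours (the "sum(C) group by A" part).
hours = defaultdict(float)
for name, _, h in rows:
    hours[name] += h

# Step 3: look up each name's email, like VLOOKUP against the name-email table.
summary = [(name, emails.get(name, ""), total) for name, total in sorted(hours.items())]

for line in summary:
    print(line)
```

A volunteer who never entered an e-mail ends up with an empty string here, which is roughly where VLOOKUP would return an error in the sheet.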
You can't achieve this with QUERY alone, but you can apply VLOOKUP to a sorted table:
=ArrayFormula(VLOOKUP(UNIQUE(FILTER(A2:A,A2:A<>"")),SORT(A2:B,2,0),2,0))
This returns the e-mail for each unique name.
First, clean up your data. You should be certain that at least one column has no typos and that this column properly identifies which data corresponds to each volunteer. This is called the key value. One way to do this, though not the only one, is to fill in the missing values for each row. If that would be hard, then:
Create a volunteer list without missing data.
Calculate the time contributed by each volunteer. If you were able to fill in the missing values, you can use QUERY; in this case the QUERY formula has to group by name and e-mail. If not, use SUMIF.

Find a word inside text file using Pentaho Kettle/Spoon/PDI

I am creating a Data Comparison/Verification script using SQL and Spoon PDI. We're moving data between two servers, and to make sure we've got all the data we have SQL queries showing a date then the quantity of rows transferred.
Example:
Serv1: 20150522 | 100
Serv2: 20150522 | 100
The script will then try to union these values, and if it fails we'll get a fail email. However, we wish to change this setup to write the outcome to a text file, and based on that text file send either a pass or fail email.
The idea behind this is we have multiple tables we're comparing, so we wish to write all the outcomes of each comparison (eight) to a text file and based off the final text file, send the outcome - rather than spamming our email inbox if multiple steps fail.
The format of the text file we wish to have is either match -> send email or mismatch [step-name] [date] -> send email.
Usually I wouldn't ask a question if I haven't tried anything first, but I've searched everywhere on Google, tried the knowledge I currently have and nothing is going the way I wish it to. I believe this is due to the logic I am using.
I am not asking for a solution to this, or for someone to do it for me. I am simply asking for guidance along the correct path.
I would do this in a transformation where there are steps for each union where the result of each step is the comparison_name and the result. This would result in a data set at the end that looks something like this:
comparison_name | result
Union A | true
Union B | false
Union C | true
You would then be able to output those results to a text file in another step, so that the result file can be sent out regardless of whether the job passed or failed.
Lastly you would loop through the result row in the stream, and if all are true, you could do an email step to send out a "pass" email, and if one is false, send out a "fail" email.
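As a rough illustration of that flow outside PDI, the Python sketch below collects one row per comparison, writes the outcome lines to a text file, and picks the e-mail type from the combined result. The comparison names, the file name, and the line format are placeholders based on the question, not anything PDI produces by itself.

```python
import tempfile
from pathlib import Path

# Hypothetical results, one row per union/comparison step.
results = [
    ("Union A", True),
    ("Union B", False),
    ("Union C", True),
]


def summarize(rows, run_date):
    """Return (lines for the text file, 'pass' or 'fail')."""
    mismatches = [name for name, ok in rows if not ok]
    if not mismatches:
        return ["match"], "pass"
    return [f"mismatch {name} {run_date}" for name in mismatches], "fail"


lines, outcome = summarize(results, "20150522")

# Write the outcome file; a temp directory is used here only to keep the sketch tidy.
out_file = Path(tempfile.gettempdir()) / "comparison_result.txt"
out_file.write_text("\n".join(lines) + "\n")

print(outcome)  # decides which email to send
```

The point of the design is that only one e-mail goes out per run, however many comparisons failed.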
EDIT:
To get the date of the pass or fail you could either get the date from each individual union query result by adding it to the query like so:
SELECT CURRENT_DATE
Or you could use the Get System Info step in spoon which has multiple ways of injecting the current date into the data stream. (system date fixed, start date range of the transformation, today 00:00:00, etc.)

SQL query with ranking order

All,
I need help with one of my SQL queries. I have a query which pulls up records in ranking order.
Select * from
(select count(*) cnt, customer_cd, smallint(Rank() Over(Order by count(*) Desc)) as rnk
from table.customer
Now, the result looks like this:
Cnt | Customer Cd
110 | 1- Retail
90 | 2-Human resources
20 | 3-Information Technology
11 | Not Standard
I want to remove the description and keep only the customer codes, such as 1, 2, 3, NS, etc. How can I achieve this?
Thanks.
You could use LOCATE to find the position of the hyphen, assuming you always have a hyphen. Then, you could use SUBSTRING to get the portion of the string before the position found by LOCATE.
select substring(customer_cd,0,locate('-',customer_cd))
from table.customer
should show you what you will get.
You do seem to have some data (e.g. "Not Standard") that has no code at all. Such fields will come out as blank. If you want to replace that with some specific code, you can use a CASE...END expression.
select CASE when locate('-',customer_cd) = 0 then ''
else substring(customer_cd,0, locate('-',customer_cd) ) END
from table.customer
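The same rule can be written as a small Python sketch to check the expected values against the sample data: take everything before the first hyphen, and fall back to a default when there is no hyphen. The function name and the choice of "NS" as the fallback are assumptions based on the question, not part of the SQL above.

```python
def customer_code(customer_cd: str, default: str = "NS") -> str:
    """Return the part before the first hyphen, or a default if none is present."""
    pos = customer_cd.find("-")  # -1 when there is no hyphen
    if pos == -1:
        return default
    return customer_cd[:pos].strip()


for value in ["1- Retail", "2-Human resources", "3-Information Technology", "Not Standard"]:
    print(customer_code(value))
```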

Parameter to find all records or exclude NULL

I have found several articles on how to accomplish the reverse of what I want to do with several methods (IS NULL, CASE, COALESCE), but I think at this point I am more confused than ever after reading all this and am probably making the solution harder than it needs to be. I am new to T-SQL and I am currently using VS 2005 to build a basic medical report.
I have the Date Range parameter working properly by using a convert command to ignore the time stamp, giving me all records for the day or date range. I now want to filter the SSRS report by preliminary report date to find either the records with a preliminary report or all records in the table.
The database has NULL if no preliminary report was created
The database has time stamp if preliminary report was created. (showing date and time it was created)
I need to find all records not NULL, or all records. (using a parameter)
I have a parameter "Display Prelim Reports only?" @PrelimOnly with a YES or NO answer.
If I use the following, it shows the records correctly (all records that are not NULL, i.e. only records with a Prelim report/time stamp present):
LIS_Results.Prelim_Report_Date <> '@PrelimOnly' ----User selects YES it passes NULL
However, if the user selects NO, how would I get it to display all records, including those with NULL?
Thank you for any help
Thank you both for your help, it was ultimately a combination of both that got it going. Syntax is as follows.
WHERE (@PrelimOnly = 'NO') AND (CONVERT(VARCHAR(10), LIS_Results.Final_Report_Date, 101) BETWEEN @ReportStartDate AND @ReportEndDate) OR (LIS_Results.Prelim_Report_Date IS NOT NULL) AND (CONVERT(VARCHAR(10), LIS_Results.Final_Report_Date, 101) BETWEEN @ReportStartDate AND @ReportEndDate)
Use an IF statement in T-SQL: if the parameter is YES, select records from the table where the conditions are true and the date field is not null.
If it is NO, select records from the table where the conditions are true.
Since @PrelimOnly can only be YES or NO, use:
SELECT
...
FROM
...
WHERE @PrelimOnly = 'NO' or LIS_Results.Prelim_Report_Date is not null
...
If the parameter is NO, the left hand condition of the OR is satisfied and all rows are returned, otherwise only those non-null rows will be returned, as required.
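To try the OR logic without a report server, the sketch below runs the same WHERE pattern against an in-memory SQLite database through Python's sqlite3 module. It is not SQL Server: the table, the column names, and the :prelim_only named parameter are simplified stand-ins, so treat it only as a check of the filter logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lis_results (id INTEGER, prelim_report_date TEXT)")
conn.executemany(
    "INSERT INTO lis_results VALUES (?, ?)",
    [(1, "2015-05-22 09:30:00"), (2, None), (3, "2015-05-23 14:00:00")],
)

QUERY = """
    SELECT id FROM lis_results
    WHERE :prelim_only = 'NO' OR prelim_report_date IS NOT NULL
    ORDER BY id
"""


def ids_for(prelim_only):
    """Run the filter with the parameter set to 'YES' or 'NO'."""
    return [row[0] for row in conn.execute(QUERY, {"prelim_only": prelim_only})]


print(ids_for("YES"))  # rows with a preliminary report only
print(ids_for("NO"))   # every row, including the NULL one
```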
Here is a small trick:
((LIS_Results.Prelim_Report_Date <> '@PrelimOnly') OR (1=@AllowNull))
If the user selected NO, set the AllowNull argument to 1; otherwise set it to 0.
NOTE: AllowNull is an additional custom argument; you should add it the same way as @PrelimOnly.
Another possible approach:
((LIS_Results.Prelim_Report_Date <> '@PrelimOnly') OR ('NO'='@PrelimOnly'))
For your full query, you should do it like this:
WHERE
(CONVERT(VARCHAR(10), LIS_Results.Final_Report_Date, 101) BETWEEN @ReportStartDate AND @ReportEndDate) AND
(
LIS_Results.Prelim_Report_Date is not null
or
('@PrelimOnly' = 'NO') -- if VS sends an empty string instead of NO, replace it here
)
It was a combination of Iiya's and Ian's answers that got me to the solution; however, the syntax was not complete. The full syntax is as follows.
WHERE (@PrelimOnly = 'NO') AND (CONVERT(VARCHAR(10), LIS_Results.Final_Report_Date, 101)
BETWEEN @ReportStartDate AND @ReportEndDate) OR (LIS_Results.Prelim_Report_Date IS NOT NULL)
AND (CONVERT(VARCHAR(10), LIS_Results.Final_Report_Date, 101) BETWEEN @ReportStartDate AND
@ReportEndDate)
It required the date range condition to be repeated so that both parameters would still work, and the @PrelimOnly = 'NO' condition had to be first.
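Because AND binds more tightly than OR, that WHERE clause is read as (A AND D) OR (P AND D), where A is the parameter check, P is the not-null check, and D is the date range check. One way to see that this matches D AND (A OR P), which states the date range only once, is to enumerate the cases. The sketch uses plain Python booleans and hypothetical names, and it ignores SQL's NULL handling.

```python
from itertools import product


def as_written(a, p, d):
    # a AND d OR p AND d, parsed with AND before OR.
    return a and d or p and d


def factored(a, p, d):
    # Date range check stated once, with the OR grouped in parentheses.
    return d and (a or p)


for a, p, d in product([False, True], repeat=3):
    assert as_written(a, p, d) == factored(a, p, d)
print("equivalent for all 8 combinations")
```

So the repetition is a consequence of leaving out the parentheses around the OR, not something the parameters themselves demand.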