How to select the first N rows of each group - group-by

I have some data in my triple store like:
Subject Predicate Object
-----------------------------------------------------------------------------------
<http://Doc1> http://purl.org/dc/terms/created 2013
<http://Doc1> http://purl.org/dc/terms/contributor John
.
.
<http://Doc2> http://purl.org/dc/terms/created 2014
<http://Doc2> http://purl.org/dc/terms/contributor David
.
.
<http://Doc3> http://purl.org/dc/terms/created 2013
<http://Doc3> http://purl.org/dc/terms/contributor John
.
.
.
I want to select every triple where the subject is Doc1:
SELECT ?subject ?predicate ?object
WHERE {
?subject ?predicate ?object
FILTER ( ?subject = <http://Doc1> )
}
That was easy! This is my output:
Subject Predicate Object
-----------------------------------------------------------------------------------
<http://Doc1> http://purl.org/dc/terms/created 2013
<http://Doc1> http://purl.org/dc/terms/contributor John
..
Now, for each object, I want to return the first N triples. In other words, I want to select N triples where some document was created in 2013, N triples where the contributor is John, and so on.
I tried something like this:
SELECT ?subject ?predicate ?specObject
WHERE {
<http://Doc1> ?predicate ?specObject.
?subject ?predicate ?specObject
}
But this query returns every triple where the object is 2013 or John. I need just the first five triples for each group. How can I build this query?
Thanks!
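To make the intended result concrete, here is a minimal sketch in plain Python rather than SPARQL. It is not a query for the triple store; it only illustrates "first N triples per group", where a group is one (predicate, object) pair attached to Doc1. The sample list and the function name first_n_per_group are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample triples mirroring the data in the question.
TRIPLES = [
    ("http://Doc1", "http://purl.org/dc/terms/created", "2013"),
    ("http://Doc1", "http://purl.org/dc/terms/contributor", "John"),
    ("http://Doc2", "http://purl.org/dc/terms/created", "2014"),
    ("http://Doc2", "http://purl.org/dc/terms/contributor", "David"),
    ("http://Doc3", "http://purl.org/dc/terms/created", "2013"),
    ("http://Doc3", "http://purl.org/dc/terms/contributor", "John"),
]


def first_n_per_group(triples, anchor, n):
    """Return up to n triples for each (predicate, object) pair of `anchor`."""
    # The groups are the (predicate, object) pairs attached to the anchor subject.
    wanted = {(p, o) for s, p, o in triples if s == anchor}
    groups = defaultdict(list)
    for s, p, o in triples:
        key = (p, o)
        # Keep a triple only while its group still has room.
        if key in wanted and len(groups[key]) < n:
            groups[key].append((s, p, o))
    return dict(groups)


all_matches = first_n_per_group(TRIPLES, "http://Doc1", n=5)
one_each = first_n_per_group(TRIPLES, "http://Doc1", n=1)
```

"First" here simply means input order; in a real store you would need an explicit ordering for the result to be deterministic.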

sparql select wikidata group_by and concat

I want to extract a list of players and, for each player, a comma-separated list of the clubs they have played for.
SELECT DISTINCT ?playerLabel
(GROUP_CONCAT(?teamLabel ; separator=',') as ?teams)
WHERE {
?player wdt:P106 wd:Q937857 .
?player wdt:P2574 ?team
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?playerLabel
I have two problems:
I don't get a list of teams for each player, only the name, and the variable ?teams is empty.
If I don't use GROUP_CONCAT and GROUP BY, I get the team ID, but I would prefer the team label.
For example 2 players...:
playerLabel teams
Cristiano Ronaldo Sporting Portugal, Manchester U, Real Madrid, Juventus, Manchester U
Leo Messi Barcelona, PSG
At the very least I need the GROUP_CONCAT and GROUP BY to work, even if it takes more code...
Thanks
You use P2574, which is "National-Football-Teams.com player ID". While National-Football-Teams.com lists all teams a player played for, this data is not accessible through the Wikidata Query Service. But Wikidata itself has a dedicated property for sports team member: P54.
So write ?player wdt:P54 ?team instead of ?player wdt:P2574 ?team.
Additionally, you need to add ?team rdfs:label ?teamLabel . filter (lang(?teamLabel)='en') to be able to use ?teamLabel in GROUP_CONCAT().
Thus, the full working query looks like this (restricted to US players to avoid query timeouts):
SELECT DISTINCT ?playerLabel (GROUP_CONCAT(?teamLabel ; separator=',') as ?teams)
WHERE {
?player wdt:P106 wd:Q937857 .
?player wdt:P27 wd:Q30 .
?player wdt:P54 ?team .
?team rdfs:label ?teamLabel . filter (lang(?teamLabel)='en')
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?playerLabel
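If it helps to see what GROUP BY plus GROUP_CONCAT does outside of Wikidata, here is a small sketch using Python's built-in sqlite3 module, whose group_concat aggregate behaves in a similar way. The table played_for and its rows are made up for illustration (the names are taken from the example in the question); this is not how the Wikidata Query Service stores its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (player, team) pair, like one ?player/?team binding.
conn.execute("CREATE TABLE played_for (player TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO played_for VALUES (?, ?)",
    [
        ("Leo Messi", "Barcelona"),
        ("Leo Messi", "PSG"),
        ("Cristiano Ronaldo", "Juventus"),
    ],
)

# One output row per player; the team labels are joined with a comma.
rows = conn.execute(
    "SELECT player, group_concat(team, ',') "
    "FROM played_for GROUP BY player ORDER BY player"
).fetchall()

# The order inside the concatenated string is not guaranteed, so sort after splitting.
teams_by_player = {player: sorted(teams.split(",")) for player, teams in rows}
```

The point is the same as in the SPARQL query: the aggregate can only concatenate values that are actually bound inside the group, which is why ?teamLabel has to be bound explicitly in the query pattern.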

TSQL PIVOT Function: Invalid column name error and creating calculated column

I'm pretty new to the PIVOT function and have been trying to figure this out for the past day and a half, so after lurking for a long time I decided to create an account and just ask.
I have a table with the layout as follows:
AsOfDt AcctNum MntYr Dt Category Count
4/15/2015 12345 Jan-15 1/18/2015 Registered User 1
4/15/2015 12346 Feb-15 2/7/2015 New Registration User 1
4/15/2015 12347 Jan-15 1/27/2015 Unique Account 1
4/15/2015 12348 Jan-15 1/24/2015 Registered User 1
This is the end result I am trying to achieve:
MntYr Account Population New Registration User Registered User Unique Account
Jan 2015 330984 12 26212 26311
Feb 2015 331897 2953 58702 58894
Mar 2015 343561 950 29498 29638
Apr 2015 343181 675 8845 8916
Grand Total 1349623 4590 123257 123759
Here is the query that I currently have:
WITH BaseQuery AS (
SELECT
MntYr
,Category
,[Count]
FROM [dbo].[rpt_gen_WebPortal_TestingData]
)
SELECT [MntYr]
,'Account Population'
,'Unique Account'
,'Registered User'
,'New Registration User'
FROM BaseQuery
pivot (sum([count]) for MntYr
in ("Jan 2015", "Feb 2015", "Mar 2015", "Apr 2015" )
) AS Pivoting
My first question:
I am getting an error for my MntYr column in the second SELECT statement, "Invalid column name 'MntYr'." I really don't understand why this is throwing an error. What am I doing wrong with trying to pull that column when I explicitly name it in my BaseQuery pull?
My second question:
I would also like to create a calculated field based upon the percentage of (Unique Account / Account Population), but I'm not quite sure how to go about calculated fields in a PIVOT function. Any ideas on how to get started with this one?
Any and all help would be much appreciated!
Thanks.
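Your PIVOT clause is the problem. The column named after FOR is consumed by the pivot, so MntYr no longer exists in the result, which is why you get "Invalid column name 'MntYr'". You want to pivot on Category instead, and the IN list must contain the Category values as bracketed identifiers. In the outer SELECT, 'Account Population' in single quotes is just a string literal, not a column. The calculated percentage can then be written in the outer SELECT using the pivoted columns. You also don't need a CTE. Try this: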
SELECT
MntYr
,[Account Population]
,[Unique Account]
,[Registered User]
,[New Registration User]
,case
when isnull([Account Population],0) = 0 then 0
else 100.0 * [Unique Account] / [Account Population] -- 100.0 avoids integer division when both columns are integers
end Pct
FROM (
SELECT
MntYr
,Category
,[Count]
FROM [dbo].[rpt_gen_WebPortal_TestingData]
) BaseQuery
pivot (sum([Count]) for Category
in ([Account Population]
,[Unique Account]
,[Registered User]
,[New Registration User] )
) AS Pivoting
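For anyone who wants to try the idea without SQL Server, here is a rough sketch using Python's sqlite3 module. SQLite has no PIVOT operator, so this swaps in conditional aggregation (SUM over CASE), which yields the same shape of result: one row per MntYr and one column per Category. The table name data, the column Cnt and the numbers are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for rpt_gen_WebPortal_TestingData, reduced to three columns.
conn.execute("CREATE TABLE data (MntYr TEXT, Category TEXT, Cnt INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("Jan 2015", "Account Population", 200),
        ("Jan 2015", "Unique Account", 50),
        ("Jan 2015", "Registered User", 40),
        ("Feb 2015", "Account Population", 400),
        ("Feb 2015", "Unique Account", 100),
        ("Feb 2015", "Unique Account", 20),
    ],
)

# Conditional aggregation: each CASE picks out one Category, SUM adds it up per MntYr.
query = """
SELECT MntYr,
       SUM(CASE WHEN Category = 'Account Population' THEN Cnt END) AS account_population,
       SUM(CASE WHEN Category = 'Unique Account' THEN Cnt END) AS unique_account
FROM data
GROUP BY MntYr
"""

pivoted = {}
for mnt_yr, population, unique in conn.execute(query):
    # Guard against a missing or zero population, like the CASE in the T-SQL answer.
    pct = 100.0 * (unique or 0) / population if population else 0.0
    pivoted[mnt_yr] = (population, unique, pct)
```

Note that MntYr stays available here because it is the grouping column, which mirrors pivoting on Category rather than on MntYr.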

Get staff name from table STAFF

Can anyone help me with a solution for my code? For your information, I run a query against my answer table, which only consists of id, staff_id, dept_name, question_id, ans, evaluator, and year. Below is my code:
// Make a mysqli Connection
$connect = new mysqli('localhost', 'root', '', 'cpsdatabase');
//Mean by staff Id
$dept_name = $_GET['dept_name'];
$query = "SELECT staff_id,dept_name, AVG(ans)
FROM hodanswer WHERE dept_name='$dept_name'
group by staff_id";
$result=mysqli_query($connect, $query);
// Print out result
while($row = mysqli_fetch_array($result))
{
echo "The mean of staff id = &nbsp". $row['staff_id']."&nbsp&nbsp from department &nbsp".$row['dept_name']." &nbsp &nbsp &nbsp is &nbsp &nbsp". $row['AVG(ans)'];
echo "<br />";
}
I want to find the mean, and I do get the result. My problem is that I want to retrieve the staff name based on the staff id, but the staff name is not in the answer table; it is stored in the staff table. How can I retrieve staff_name from table STAFF and display it with the result from the code above? Please help me.
If I'm understanding your tables correctly, then this should work. It performs an inner join between tables STAFF and hodanswer and returns the staff id and staff name from table STAFF for every staff_id that is present in table hodanswer.
SELECT a.id, a.staff_name FROM STAFF a INNER JOIN hodanswer b ON a.id = b.staff_id;
Search for "SQL INNER JOIN" for more background.
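To tie the join back to the mean from the question, here is a hedged sketch using Python's sqlite3 module instead of PHP/MySQL. It assumes, as the answer above does, that STAFF has columns id and staff_name and that hodanswer.staff_id refers to STAFF.id; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: only the columns needed for the example are created.
conn.executescript(
    """
    CREATE TABLE STAFF (id INTEGER PRIMARY KEY, staff_name TEXT);
    CREATE TABLE hodanswer (staff_id INTEGER, dept_name TEXT, ans REAL);
    INSERT INTO STAFF VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO hodanswer VALUES (1, 'IT', 4), (1, 'IT', 2), (2, 'IT', 5), (2, 'HR', 1);
    """
)

# Join first, then group, so the staff name travels with the average.
query = """
SELECT s.id, s.staff_name, AVG(h.ans) AS mean_ans
FROM hodanswer h
INNER JOIN STAFF s ON s.id = h.staff_id
WHERE h.dept_name = ?
GROUP BY s.id, s.staff_name
ORDER BY s.id
"""

# The department is passed as a bound parameter, not pasted into the SQL string.
means = conn.execute(query, ("IT",)).fetchall()
```

The same SELECT should carry over to MySQL with a prepared statement; binding the department as a parameter also avoids building the query from raw $_GET input.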

SQL Give me the name of all the people that I sent the same file

I have a table that includes the userID that sent the file, the userID that the file was sent to, the filename and the date it was sent.
http://sqlfiddle.com/#!6/855cc6
I'm trying to write a statement that returns one row per filename sent, with the names of all the people I sent that file to listed at the end of the row.
Something like this:
01/08/2014 | "main doc" | "Jon P, Mike S, Ron W"
04/04/2014 | "other doc" | "Jon P, Mike S"
10/10/2014 | "last doc" | "Ron W"
(where the date is the oldest instance of the DateSent datetime field).
Sorry, I don't know how to create functions in SQL Fiddle, so let's assume there is a scalar function named "GetName(UserID)" that returns the name of the user passed as the parameter. It returns one row only.
You can use FOR XML PATH to concatenate values like this:
SELECT DISTINCT
DateSent,
FileName,
SUBSTRING
(
(
SELECT CONCAT(',', t1.SentToUserID) --maybe GetName(t1.SentToUserID)
FROM FileSent t1
WHERE t1.FileName = t2.FileName AND t1.DateSent = t2.DateSent AND t1.UserID = t2.UserID
ORDER BY t1.FileName
FOR XML PATH ('')
), 2, 1000
) [SentFiles]
FROM FileSent t2
ORDER BY DateSent
Sample SQL Fiddle (two slightly different versions).
To get just the minimum date, you can use MIN(DateSent) and GROUP BY on FileName and UserID:
SELECT DISTINCT
MIN(DateSent) DateSent,
FileName,
STUFF ((SELECT CONCAT(',', t1.SentToUserID)
FROM FileSent T1
WHERE t1.FileName = t2.FileName AND t1.UserID = t2.UserID
FOR XML PATH('')
),1,1,'' ) [SentFiles]
FROM FileSent T2
GROUP BY FileName, UserID
SQL Fiddle for this.
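As a rough, non-SQL-Server illustration of the same result shape (oldest date, file name, list of recipients), here is a sketch with Python's sqlite3 module. It swaps FOR XML PATH / STUFF for SQLite's group_concat, and it replaces the GetName function with a join to a hypothetical Users table; the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Users is a hypothetical lookup table standing in for GetName(UserID).
conn.executescript(
    """
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE FileSent (UserID INTEGER, SentToUserID INTEGER,
                           FileName TEXT, DateSent TEXT);
    INSERT INTO Users VALUES (1, 'Me'), (2, 'Jon P'), (3, 'Mike S');
    INSERT INTO FileSent VALUES
        (1, 2, 'main doc', '2014-08-01'),
        (1, 3, 'main doc', '2014-08-05'),
        (1, 2, 'other doc', '2014-04-04');
    """
)

# One row per file sent by the given user: oldest send date plus all recipient names.
query = """
SELECT MIN(f.DateSent), f.FileName, group_concat(u.Name, ', ')
FROM FileSent f
JOIN Users u ON u.UserID = f.SentToUserID
WHERE f.UserID = ?
GROUP BY f.FileName
ORDER BY MIN(f.DateSent)
"""

# Name order inside the concatenated string is not guaranteed, so sort after splitting.
summary = [
    (date_sent, file_name, sorted(names.split(", ")))
    for date_sent, file_name, names in conn.execute(query, (1,))
]
```

Dates are stored as ISO strings here so that MIN and ORDER BY compare them correctly as text.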

How to select first and last records between certain date parameters?

I need a query that extracts only the first and last instance between two date parameters.
I have a table recording financial information, with a financialyearenddate field, linked to the Company table via companyID. Each company is also linked to the programme table and can have multiple programmes. I have a report that pulls the financials for each company on a certain programme, which I have adjusted to pull only the first and last instance (using MIN and MAX). However, I need the first instance after a certain date parameter and the last instance before a certain date parameter.
Example: company ABloggs has financials for 1999, 2000, 2001, 2004, 2006, 2007 and 2009, but the programme ran from 2001 to 2007, so I only want the first and last financial records between those years, i.e. the 2001 and 2007 records. Any help appreciated.
At the moment I am using two queries because I needed the data in a hurry, but I need it in one query, only where the financial year end dates are between the parameters, and only where a company has a minimum of two GVA records.
Query1:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(
SELECT
ccx_companyname,
MAX(ccx_financialyearenddate) AS LatestDate
FROM Filteredccx_gva AS Filteredccx_gva_1
GROUP BY ccx_companyname
) AS min_1
INNER JOIN Filteredccx_gva AS gva
ON min_1.ccx_companyname = gva.ccx_companyname AND
min_1.LatestDate = gva.ccx_financialyearenddate
WHERE (gva.ccx_status = ACTUAL)
Query2:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(
SELECT
ccx_companyname,
MIN(ccx_financialyearenddate) AS FirstDate
FROM Filteredccx_gva AS Filteredccx_gva_1
GROUP BY ccx_companyname
) AS MAX_1
INNER JOIN Filteredccx_gva AS gva
ON MAX_1.ccx_companyname = gva.ccx_companyname AND
MAX_1.FirstDate = gva.ccx_financialyearenddate
WHERE (gva.ccx_status = ACTUAL)
Can't you just add a WHERE clause using the first and last date parameters? Something like this:
SELECT <companyId>, MIN(<date>), MAX(<date>)
FROM <table>
WHERE <date> BETWEEN @firstDate AND @lastDate
GROUP BY <companyId>
declare @programme table (ccx_companyname varchar(max), start_year int, end_year int);
insert @programme values
('ABloggs', 2001, 2007);
declare @companies table (ccx_companyname varchar(max), ccx_financialyearenddate int);
insert @companies values
('ABloggs', 1999)
,('ABloggs', 2000)
,('ABloggs', 2001)
,('ABloggs', 2004)
,('ABloggs', 2006)
,('ABloggs', 2007)
,('ABloggs', 2009);
select c.ccx_companyname, min(ccx_financialyearenddate), max(ccx_financialyearenddate)
from @companies c
join @programme p on c.ccx_companyname = p.ccx_companyname
where c.ccx_financialyearenddate >= p.start_year and c.ccx_financialyearenddate <= p.end_year
group by c.ccx_companyname
having count(*) > 1;
You can combine your two original queries into a single query by computing both the MIN and MAX aggregates in the same GROUP BY derived table. Adding COUNT(*) with HAVING COUNT(*) > 1 ensures that a company has at least two dates. The query should look like this:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(SELECT
ccx_companyname,
ccx_status,
MIN(ccx_financialyearenddate) AS FirstDate,
MAX(ccx_financialyearenddate) AS LastDate,
COUNT(*) AS NumDates
FROM Filteredccx_gva AS Filteredccx_gva_1
WHERE (ccx_status = ACTUAL)
GROUP BY ccx_companyname, ccx_status
HAVING COUNT(*) > 1
) AS MinMax
INNER JOIN Filteredccx_gva AS gva
ON MinMax.ccx_companyname = gva.ccx_companyname AND
(MinMax.FirstDate = gva.ccx_financialyearenddate OR
MinMax.LastDate = gva.ccx_financialyearenddate)
WHERE (gva.ccx_status = MinMax.ccx_status)
ORDER BY gva.ccx_companyname, gva.ccx_financialyearenddate
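Putting the two suggestions together (restrict to the programme window first, then join back to fetch the full first and last rows), here is a small sketch that can be run with Python's sqlite3 module. The table and column names (programme, gva, year_end, turnover) are simplified stand-ins, the years are plain integers as in the table-variable example above, and the 'Solo' company is invented to show the "at least two records" rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-ins for the programme and Filteredccx_gva tables.
conn.executescript(
    """
    CREATE TABLE programme (companyname TEXT, start_year INTEGER, end_year INTEGER);
    CREATE TABLE gva (companyname TEXT, year_end INTEGER, turnover INTEGER);
    INSERT INTO programme VALUES ('ABloggs', 2001, 2007), ('Solo', 2001, 2007);
    INSERT INTO gva VALUES
        ('ABloggs', 1999, 10), ('ABloggs', 2000, 11), ('ABloggs', 2001, 12),
        ('ABloggs', 2004, 13), ('ABloggs', 2006, 14), ('ABloggs', 2007, 15),
        ('ABloggs', 2009, 16), ('Solo', 2003, 5);
    """
)

query = """
SELECT g.companyname, g.year_end, g.turnover
FROM gva g
JOIN (
    -- First and last year inside the programme window, per company.
    SELECT c.companyname,
           MIN(c.year_end) AS first_year,
           MAX(c.year_end) AS last_year
    FROM gva c
    JOIN programme p ON p.companyname = c.companyname
    WHERE c.year_end BETWEEN p.start_year AND p.end_year
    GROUP BY c.companyname
    HAVING COUNT(*) > 1  -- require at least two records in the window
) m ON m.companyname = g.companyname
   AND g.year_end IN (m.first_year, m.last_year)
ORDER BY g.companyname, g.year_end
"""

first_last = conn.execute(query).fetchall()
```

With this sample data only the 2001 and 2007 rows for ABloggs come back; 'Solo' is dropped because it has a single record inside the window.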