What is the equivalent of SQL subquery in MongoDB? - mongodb

Is there a way to combine two finds in MongoDB similar to the SQL subqueries?
What would be the equivalent of something like:
SELECT * FROM TABLE1 WHERE name = (SELECT name FROM TABLE2 WHERE surname = 'Smith');
I need to get an uuid from one collection searching by email and then use it to filter in another. I would like if possible to make it with just one find instead of get the uuid, store it in a variable and then search second time using the variable...
Here are the two that I want combined somehow:
db.getCollection('person').find({email:'perry.goodwin#yahoo.com'}).sort({_id:-1});
db.getCollection('case').find({applicantUuid:'4a17e96c-caf9-4d78-a853-73e190005c63'});

Related

How to hash a query result with sha256 in PostgreSQL?

I want to somehow hash the result of a query in PostgreSQL. I have a query like
SELECT output FROM result;
And it returns a column composed only of integers. So I somehow want to hash the result of this query. Concatenate the values and hash, or somehow hash the query output directly. Simply I need a way to put it inside SELECT sha256(...). So please note that I do not want to get hash of every column entry, but one hash that somehow corresponds to the query output. Any ideas?
PostgreSQL doesn't come with a built-in streaming hash function exposed to the user, so the easiest way is to build the string in memory and then hash it. Of course this won't work with giant result sets. You can use digest from the pg_crypto extension. You also need to order your rows, or else you might get different results on the same data from one execution to the next if you get the rows in different orders.
select digest(string_agg(output::text,' ' order by output),'sha256')
from result;
Replace 1234 with your column name and add [from table_name] to this query:
select encode(digest(1234::text, 'sha256'), 'hex')
Or for multiple rows use this:
select encode(
digest(
(select array_agg(q1)::text[] from (select row(R.*)::text as q1 from (SELECT output FROM result)R)alias)::text
, 'sha256')
, 'hex')

How to combine DISTINCT and ORDER BY in array_agg of jsonb values in postgresSQL

Note: I am using the latest version of Postgres (9.4)
I am trying to write a query which does a simple join of 2 tables, and groups by the primary key of the first table, and does an array_agg of several fields in the 2nd table which I want returned as an object. The array needs to be sorted by a combination of 2 fields in the json objects, and also uniquified.
So far, I have come up with the following:
SELECT
zoo.id,
ARRAY_AGG(
DISTINCT ROW_TO_JSON((
SELECT x
FROM (
SELECT animals.type, animals.name
) x
))::JSONB
-- ORDER BY animals.type, animals.name
)
FROM zoo
JOIN animals ON animals.zooId = zoo.id
GROUP BY zoo.id;
This results in one row for each zoo, with a an aggregate array of jsonb objects, one for each animal, uniquely.
However, I can't seem to figure out how to also sort this by the parameters in the commented out part of the code.
If I take out the distinct, I can ORDER BY original fields, which works great, but then I have duplicates.
If you use row_to_json() you will lose the column names unless you put in a row that is typed. If you "manually" build the jsonb object with json_build_object() using explicit names then you get them back:
SELECT zoo.id, array_agg(za.jb) AS animals
FROM zoo
JOIN (
SELECT DISTINCT ON (zooId, "type", "name")
zooId, json_build_object('animal_type', "type", 'animal_name', "name")::jsonb AS jb
FROM animals
ORDER BY zooId, jb->>'animal_type', jb->>'animal_name'
-- ORDER BY zooId, "type", "name" is far more efficient
) AS za ON za.zooId = zoo.id
GROUP BY zoo.id;
You can ORDER BY the elements of a jsonb object, as shown above, but (as far as I know) you cannot use DISTINCT on a jsonb object. In your case this would be rather inefficient anyway (first building all the jsonb objects, then throwing out duplicates) and at the aggregate level it is plain impossible with standard SQL. You can achieve the same result, however, by applying the DISTINCT clause before building the jsonb object.
Also, avoid use of SQL key words like "type" and standard data types like "name" as column names. Both are non-reserved keywords so you can use them in their proper contexts, but practically speaking your commands could get really confusing. You could, for instance, have a schema, with a table, a column in that table, and a data type each called "type" and then you could get this:
SELECT type::type FROM type.type WHERE type = something;
While PostgreSQL will graciously accept this, it is plain confusing at best and prone to error in all sorts of more complex situations. You can get a long way by double-quoting any key words, but they are best just avoided as identifiers.

Create a query to select two columns; (Company, No. of Films) from the database

I have created a database as part of university assignment and I have hit a snag with the question in the title.
More likely I am being asked to find out how many films each company has made. Which suggests to me a group by query. But I have no idea where to begin. It is only a two mark question but the syntax is not clicking in my head.
My schema is:
CREATE TABLE Movie
(movieID CHAR(3) ,
title CHAR(36),
year NUMBER,
company CHAR(50),
totalNoms NUMBER,
awardsWon NUMBER,
DVDPrice NUMBER(5,2),
discountPrice NUMBER(5,2))
There are other tables but at first glance I don't think they are relevant to this question.
I am using sqlplus10
The answer you need comes from three basic SQL concepts, I'll step through them with you. If you need more assistance to create an answer from these hints, let me know and I can try to keep guiding you.
Group By
As you mentioned, SQL offers a GROUP BY function that can help you.
A SQL Query utilizing GROUP BY would look like the following.
SELECT list, fields, aggregate(value)
FROM tablename
--WHERE goes here, if you need to restrict your result set
GROUP BY list, fields
a GROUP BY query can only return fields listed in the group by statement, or aggregate functions acting on each group.
Aggregate Functions
Your homework question also needs an Aggregate function called Count. This is used to count the results returned. A simple query like the following returns the count of all records returned.
SELECT Count(*)
FROM tablename
The two can be combined, allowing you to get the Count of each group in the following way.
SELECT list, fields, count(*)
FROM tablename
GROUP BY list, fields
Column Aliases
Another answer also tried to introduce you to SQL column aliases, but they did not use SQLPLUS syntax.
SELECT Count(*) as count
...
SQLPLUS column alias syntax is shown below.
SELECT Count(*) "count"
...
I'm not going to provide you the SQL, but instead a way to think about it.
What you want to do is select where the company matches and count the total rows returned. That count is the number of films made by the specified company.
Hope that points you in the right direction.
Select company, count(*) AS count
from Movie
group by company
select * group by company won't work in Oracle.

Combine dynamic columns with select query

I am using a PostgreSQL database. I have one select query:
select userid, name, age from tbluser;
Now there is another table, tblcalculatedtax, which is generated on the fly and their column names are not predefined, the only mapping between that table and this table is userid. I want to get records after joining two tables. How can I get that?
Simpler:
SELECT *
FROM tbluser
JOIN tblcalculatedtax USING (userid)
Details in the fine manual about SELECT.
Your need SQL Joins. Here's a W3Schools tutorial: http://www.w3schools.com/sql/sql_join.asp
To quickly answer your question, though:
SELECT * FROM tbluser
INNER JOIN tblcalculatedtax
ON tbluser.userid=tblcalculatedtext.userid
The * selects all the columns, so you don't need to know their names. Of course, I'm not sure what use a column is to you if you don't know it's name: do you know what data it contains?

Detecting duplicate values in a column of a Datatable while traversing through It

I have a Datatable with Id(guid) and Name(string) columns. I traverse through the data table and run a validation criteria on the Name (say, It should contain only letters and numbers) and then adding the corresponding Id to a List If name passes the validation.
Something like below:-
List<Guid> validIds=new List<Guid>();
foreach(DataRow row in DataTable1.Rows)
{
if(IsValid(row["Name"])
{
validIds.Add((Guid)row["Id"]);
}
}
In addition to this validation I should also check If the name is not repeating in the whole datatable (even for the case-sensitiveness), If It is repeating, I should not add the corresponding Id in the List.
Things I am thinking/have thought about:-
1) I can have another List, check for the "Name" in the same, If It exists, will add the corresponding Guild
2) I cannot use HashSet as that would treat "Test" and "test" as different strings and not duplicates.
3) Take the DataTable to another one where I have the disctict names (this I havent tried and the code might be incorrect, please correct me whereever possible)
DataTable dataTableWithDistinctName = new DataTable();
dataTableWithDistinctName.CaseSensitive=true
CopiedDataTable=DataTable1.DefaultView.ToTable(true,"Name");
I would loop through the original datatable and check the existence of the "Name" in the CopiedDataTable, If It exists, I wont add the Id to the List.
Are there any better and optimum way to achieve the same? I need to always think of performance. Although there are many related questions in SO, I didnt find a problem similar to this. If you could point me to a question similar to this, It would be helpful.
EDIT :- The number of records might vary from 2000-3000.
Thanks
if you are looking to prevent duplicates, it may be grueling work, and I don't know how many records your dealing with at at atime... If a small set, I'd consider doing a query before each attempted insert from your LIVE source based on
select COUNT(*) as CountOnFile from ProductionTable where UPPER(name) = UPPER(name from live data).
If the result set CountOnFile > 0, don't add.
If you are dealing with a large dataset, like a bulk import, I would pull all the data into a temp table, then do a query where NOT IN... something like
create table OkToBeAdded as
select distinct upper( TempTable.Name ) as Name, GUID
from TempTable
where upper( TempTable.Name )
NOT IN ( select upper( LiveTable.Name )
from LiveTable
where upper( TempTable.Name ) = upper( LiveTable.Name )
);
insert into LiveTable ( Name, GUID )
select Name, GUID from OkToBeAdded;
Obviously, the SQL is sample and would need to be adjusted based on your specific back-end source
/* I did this entirely in SQL and avoided ADO.NET*/
/*I Pass the CSV of valid object Ids and split that in a table*/
DECLARE #TableTemp TABLE
(
TempId uniqueidentifier
)
INSERT INTO #TableTemp
SELECT cast(Data AS uniqueidentifier )AS ID FROM dbo.Split1(#ValidObjectIdsAsCSV,',')
/*Self join with table1 for any duplicate rows and update the column value*/
UPDATE Table1
SET IsValidated=1
FROM Table1 AS A INNER JOIN #TableTemp AS Temp
ON A.ID=Temp.TempId
WHERE NOT EXISTS (SELECT Name,Count(Name) FROM Table1
WHERE A.Name=B.Name
GROUP BY Name HAVING Count(Name)>1)