Select distinct rows from MongoDB - mongodb

How do you select distinct records in MongoDB? This is a pretty basic db functionality I believe but I can't seem to find this anywhere else.
Suppose I have a table as follows
----------------
| Name   | Age |
----------------
| John   | 12  |
| Ben    | 14  |
| Robert | 14  |
| Ron    | 12  |
----------------
I would like to run something like SELECT DISTINCT age FROM names WHERE 1;

db.names.distinct('age')

Looks like there is a SQL mapping chart that I overlooked earlier.
Now is a good time to say that a distinct query isn't the best way to go about querying things. Either cache the list of distinct values in another collection or keep your data set small.
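To make concrete what the distinct call returns for the sample data, here is a minimal plain-Python sketch. It does not talk to MongoDB at all: the `names` list, the lowercase field keys, and the `distinct` helper are assumptions made for illustration, and the sorting is only there to give a stable result.

```python
# Sample documents mirroring the table in the question.
# Lowercase keys are an assumption; the question's header uses "Name" / "Age".
names = [
    {"name": "John", "age": 12},
    {"name": "Ben", "age": 14},
    {"name": "Robert", "age": 14},
    {"name": "Ron", "age": 12},
]


def distinct(documents, field):
    """Return the unique values of `field`, like SELECT DISTINCT field."""
    # Documents that lack the field are skipped; a set removes duplicates.
    unique_values = {doc[field] for doc in documents if field in doc}
    # Sorting is not part of "distinct"; it only makes the output stable.
    return sorted(unique_values)


ages = distinct(names, "age")
print(ages)  # [12, 14]
```

The cached-list suggestion amounts to storing a result like `ages` somewhere cheap to read and refreshing it when documents change.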

Related

PostgreSQL arabic case insensitive

I am looking for how to search a database using Arabic text. In Arabic there are some letters that can be written in different ways but in the results they should all show up if one of them is included in the where clause.
The famous example for this would be:
SELECT * FROM persons WHERE name = "اسامة";
+----+--------------+
| id | name |
+----+--------------+
| 3 | أسامه |
| 4 | أسامة |
| 5 | اسامه |
| 6 | اسَامه |
+----+--------------+
4 rows in set (0.00 sec)
I found a good, and probably the most performant, way to do this in MySQL by creating a custom collation (described in this article), but I have no idea how that is done, or whether it is possible at all, in PostgreSQL.
Other approaches that change the query itself to use regex are not useful for my use case.
Can someone please guide me on how to do the same in PostgreSQL?
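One way to think about the matching described above is to fold every name to a canonical spelling and compare the folded values. The sketch below does that in Python rather than in PostgreSQL. The `normalize_arabic` helper and the specific folding rules (drop hamza and diacritics, treat teh marbuta as heh) are assumptions chosen to fit the example rows, not an established standard, and applying the same folding on the database side is not shown.

```python
import unicodedata

TEH_MARBUTA = "\u0629"  # ة
HEH = "\u0647"          # ه


def normalize_arabic(text):
    """Fold common spelling variants so equivalent names compare equal."""
    # NFD splits hamza-carrying alefs into a bare alef plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Dropping combining marks removes the hamza and short-vowel diacritics.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat teh marbuta and heh as the same letter (an assumption of this sketch).
    return stripped.replace(TEH_MARBUTA, HEH)


# The search term and the four stored names from the example, as escapes.
query = "\u0627\u0633\u0627\u0645\u0629"
stored = [
    "\u0623\u0633\u0627\u0645\u0647",
    "\u0623\u0633\u0627\u0645\u0629",
    "\u0627\u0633\u0627\u0645\u0647",
    "\u0627\u0633\u064e\u0627\u0645\u0647",
]

matches = [name for name in stored if normalize_arabic(name) == normalize_arabic(query)]
print(len(matches))  # 4
```

Comparing folded values keeps the WHERE clause a plain equality test, which is the property the custom collation provides in the MySQL case.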

How to optimize inverse pattern matching in Postgresql?

I have Pg version 13.
CREATE TABLE test_schemes (
pattern TEXT NOT NULL,
some_code TEXT NOT NULL
);
Example data
----------- | -----------
pattern | some_code
----------- | -----------
__3_ | c1
__34 | c2
1_3_ | a12
_7__ | a10
7138 | a19
_123|123_ | a20
___253 | a28
253 | a29
2_1 | a30
This table has about 300k rows. I want to optimize a simple query like
SELECT * FROM test_schemes where '1234' SIMILAR TO pattern
----------- | -----------
pattern | some_code
----------- | -----------
__3_ | c1
__34 | c2
1_3_ | a12
_123|123_ | a20
The problem is that this simple query does a full scan of all 300k rows to find the matches. Given this design, how can I make the query faster (for example, with some special index)?
Internally, SIMILAR TO works much like regexes, as running EXPLAIN on the query would show. You may want to just switch to regexes outright, but it is also worth looking at text_pattern_ops indexes to see if they improve the performance.
If the pipe is the only feature of SIMILAR TO (other than those present in LIKE) which you use, then you could process it into a form you can use with the much faster LIKE.
SELECT * FROM test_schemes where '1234' LIKE any(string_to_array(pattern,'|'))
In my hands this is about 25 times faster, and gives the same answer as your example on your example data (augmented with a few hundred thousand rows of garbage to get the table row count up to about where you indicated). It does assume there is no escaping of any pipes.
If you store the data already broken apart, it is about 3 times faster yet, but of course it gives cosmetically different answers.
create table test_schemes2 as select unnest as pattern, some_code from test_schemes, unnest(string_to_array(pattern,'|'));
SELECT * FROM test_schemes2 where '1234' LIKE pattern;
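The split-then-LIKE idea can be tried end to end with the sample rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, which is an assumption made so the example runs without a server. SQLite has no string_to_array or unnest, so the pipe splitting is done in Python instead. It says nothing about the timings quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_schemes (pattern TEXT NOT NULL, some_code TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO test_schemes VALUES (?, ?)",
    [
        ("__3_", "c1"), ("__34", "c2"), ("1_3_", "a12"), ("_7__", "a10"),
        ("7138", "a19"), ("_123|123_", "a20"), ("___253", "a28"),
        ("253", "a29"), ("2_1", "a30"),
    ],
)

# Store one row per alternative, so each pattern is a plain LIKE pattern.
conn.execute("CREATE TABLE test_schemes2 (pattern TEXT NOT NULL, some_code TEXT NOT NULL)")
for pattern, code in conn.execute("SELECT pattern, some_code FROM test_schemes").fetchall():
    alternatives = [(alt, code) for alt in pattern.split("|")]
    conn.executemany("INSERT INTO test_schemes2 VALUES (?, ?)", alternatives)

# Inverse match: the literal is on the left, the stored pattern on the right.
matches = [
    code
    for (code,) in conn.execute(
        "SELECT some_code FROM test_schemes2 WHERE ? LIKE pattern ORDER BY rowid",
        ("1234",),
    )
]
print(matches)  # ['c1', 'c2', 'a12', 'a20']
```

If both alternatives of one original row matched, that code would appear twice here, which is the kind of cosmetic difference mentioned above.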

Know which tables are affected by a connection

I want to know if there is a way to find out which tables are affected by requests made from a connection in PostgreSQL 9.5 or higher.
The purpose is to have the information in a form that lets me know which tables were affected, in which order, and in what way.
More precisely, something like this would suffice:
id | datetime | id_conn | id_query | table | action
---+----------+---------+----------+---------+-------
1 | ... | 2256 | 125 | user | select
2 | ... | 2256 | 125 | order | select
3 | ... | 2256 | 125 | product | select
(this will be the result of a select query from user join order join product).
I know I can retrieve id_conn through "pg_stat_activity", and I can see if there is a running query, but I can't find a "history" of the queries.
The final purpose is to debug the database when incoherent data is inserted into a table (due to a missing constraint). Knowing which connection did the insert will lead me to the faulty script (as I already have the script name and the connection ID linked).

How to properly index strings for lookup and excepts, the PostgreSQL way

Due to infrastructure costs, I've been studying the possibility of migrating a few databases to PostgreSQL. So far I am loving it. But there are a few topics on which I am quite lost. I need some guidance on one of them.
I have an ETL process that queries "deltas" in my database and imports the new data. To do so, I use lookup tables that store hashbytes of some strings to facilitate the lookup. This works in SQL Server, but apparently things work quite differently in PostgreSQL. In SQL Server, using hashbytes + except is suggested when working with millions of rows.
Let's suppose the following table
+----+-------+------------------------------------------+
| Id | Name | hash_Name |
+----+-------+------------------------------------------+
| 1 | Mark | 31e9697d43a1a66f2e45db652019fb9a6216df22 |
| 2 | Pablo | ce7169ba6c7dea1ca07fdbff5bd508d4bb3e5832 |
| 3 | Mark | 31e9697d43a1a66f2e45db652019fb9a6216df22 |
+----+-------+------------------------------------------+
And my lookup table
+------------------------------------------+
| hash_Name |
+------------------------------------------+
| 31e9697d43a1a66f2e45db652019fb9a6216df22 |
+------------------------------------------+
When querying new data (Pablo's hash), I can work from the simplified query below:
SELECT hash_name
FROM mytable
EXCEPT
SELECT hash_name
FROM mylookup
Thinking the PostgreSQL way, how could I achieve this? Should I index and use EXCEPT? Or is there a better way of doing so?
From my research, I couldn't find much regarding storing hashbytes. Apparently, it is a matter of creating indexes and choosing the right index for the job. More precisely: BTREE for single field indexes and GIN for multiple field indexes.
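For comparison, the EXCEPT query and an equivalent anti-join written with NOT EXISTS can be run side by side on the sample rows. This sketch uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made so it runs without a server). Both forms are standard SQL that PostgreSQL also accepts, but nothing here measures which one is faster on millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, hash_name TEXT);
    CREATE TABLE mylookup (hash_name TEXT PRIMARY KEY);
    INSERT INTO mytable VALUES
        (1, 'Mark',  '31e9697d43a1a66f2e45db652019fb9a6216df22'),
        (2, 'Pablo', 'ce7169ba6c7dea1ca07fdbff5bd508d4bb3e5832'),
        (3, 'Mark',  '31e9697d43a1a66f2e45db652019fb9a6216df22');
    INSERT INTO mylookup VALUES ('31e9697d43a1a66f2e45db652019fb9a6216df22');
    """
)

# EXCEPT removes duplicates and returns hashes missing from the lookup table.
except_rows = conn.execute(
    "SELECT hash_name FROM mytable EXCEPT SELECT hash_name FROM mylookup"
).fetchall()

# The same delta as an anti-join; DISTINCT mirrors EXCEPT's de-duplication.
anti_join_rows = conn.execute(
    """
    SELECT DISTINCT t.hash_name
    FROM mytable AS t
    WHERE NOT EXISTS (
        SELECT 1 FROM mylookup AS l WHERE l.hash_name = t.hash_name
    )
    """
).fetchall()

print(except_rows == anti_join_rows)  # True
```

Both queries return only Pablo's hash for this data, so the choice between them is a matter of checking the query plans on the real tables.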

Is it possible to use different forms and create one row of information in a table?

I have been searching for a way to combine two or more rows of one table in a database into one row.
I am currently creating multiple web-based forms that connect to one table in my database. Is there any way to write some MySQL and PHP code that will take separate form submissions and put them into one row of the database instead of multiple rows?
Here is an example of what is going into the database:
This is all in one table with three rows.
Form_ID represents the three different forms that I used to insert the data into the table.
Form_ID | Lot_ID | F_Name | L_Name | Date       | Age
--------+--------+--------+--------+------------+-------
1       | 1      | John   | Evans  | *NULL*     | *NULL*
2       | *NULL* | *NULL* | *NULL* | 2017-07-06 | *NULL*
3       | *NULL* | *NULL* | *NULL* | *NULL*     | 22
This is an example of three separate forms going into one table. Every time the submit button is hit the data just inserts down to the next row of information.
I need some sort of join or update once the submit button is hit to replace the preceding NULL values.
Here is what I want to do after the submit button is hit:
I want it to be combined all into one row but still in one table
Form_ID is still the three separate forms but only in one row now.
Form_ID | Lot_ID | F_Name | L_Name | Date       | Age
--------+--------+--------+--------+------------+-----
1       | 1      | John   | Evans  | 2017-07-06 | 22
My goal is that once one form has been submitted, the next, different form submission replaces the NULL values in the row above it, and so on, to create a single row of information.
I found a way to solve this issue. I used UPDATE tablename SET columnName = newColumnName WHERE Form_ID = newID
So this way, when I want to update rows that have blank spaces, it finds the matching IDs.
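The UPDATE approach can be sketched as follows: the first form inserts the row, and each later form updates that same row by its ID instead of inserting a new one. The sketch uses Python's built-in sqlite3 module in place of MySQL and PHP, and the `submissions` table name is hypothetical; both are assumptions made only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE submissions (
        Form_ID INTEGER PRIMARY KEY,
        Lot_ID  INTEGER,
        F_Name  TEXT,
        L_Name  TEXT,
        Date    TEXT,
        Age     INTEGER
    )
    """
)

# First form: create the row, leaving the later fields NULL.
conn.execute(
    "INSERT INTO submissions (Form_ID, Lot_ID, F_Name, L_Name) VALUES (?, ?, ?, ?)",
    (1, 1, "John", "Evans"),
)

# Second and third forms: fill in the NULLs on the existing row by its ID.
conn.execute("UPDATE submissions SET Date = ? WHERE Form_ID = ?", ("2017-07-06", 1))
conn.execute("UPDATE submissions SET Age = ? WHERE Form_ID = ?", (22, 1))

rows = conn.execute("SELECT * FROM submissions").fetchall()
print(rows)  # [(1, 1, 'John', 'Evans', '2017-07-06', 22)]
```

The placeholders keep form input out of the SQL text, which matters once the values come from a web form.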