SPHINX SE DOUBTS - sphinx

What differences are there between group by and group sort?
How to implement set select in sphinx SE?

groupby: what function and attribute to use for grouping
groupsort: what order to retrieve the groups in.
use 'select'. But be careful that your sphinxSE table has the right columns to use the output
(btw, its sphinxSE not "sphinx SE")

Related

fuzzy match in postgresql

I have two table in my database , agridata and geoname. I am trying to find out geoid column for names in agridata like below
select geonameid , name from geoname where name in (select distinct district_name from agridata );
I want to do a fuzzy match of the names as exact names are not in database. How to go about it ?
You can use a variety of matching algorithms (see here), but I'm not 100% sure they will work with an in clause. I'd imagine you really want to use a soundex join e.g.
select distinct g.geonameid, g.name from geoname g join agridata a on soundex(a.name) = g.name
or similar.
If you've got a huge match set to deal with, you may want to consider using some kind of search index such as ElasticSearch/Solr.
Use extension for PostgreSQL called pg_trgm, implementation of trigram matching.
"We can measure the similarity of two strings by counting the number of trigrams they share. This simple idea turns out to be very effective for measuring the similarity of words in many natural languages"
I used it, it's very fast and gives great results.

ormlite select count(*) as typeCount group by type

I want to do something like this in OrmLite
SELECT *, COUNT(title) as titleCount from table1 group by title;
Is there any way to do this via QueryBuilder without the need for queryRaw?
The documentation states that the use of COUNT() and the like necessitates the use of selectRaw(). I hoped for a way around this - not having to write my SQL as strings is the main reason I chose to use ORMLite.
http://ormlite.com/docs/query-builder
selectRaw(String... columns):
Add raw columns or aggregate functions
(COUNT, MAX, ...) to the query. This will turn the query into
something only suitable for using as a raw query. This can be called
multiple times to add more columns to select. See section Issuing Raw
Queries.
Further information on the use of selectRaw() as I was attempting much the same thing:
Documentation states that if you use selectRaw() it will "turn the query into" one that is supposed to be called by queryRaw().
What it does not explain is that normally while multiple calls to selectColumns() or selectRaw() are valid (if you exclusively use one or the other),
use of selectRaw() after selectColumns() has a 'hidden' side-effect of wiping out any selectColumns() you called previously.
I believe that the ORMLite documentation for selectRaw() would be improved by a note that its use is not intended to be mixed with selectColumns().
QueryBuilder<EmailMessage, String> qb = emailDao.queryBuilder();
qb.selectColumns("emailAddress"); // This column is not selected due to later use of selectRaw()!
qb.selectRaw("COUNT (emailAddress)");
ORMLite examples are not as plentiful as I'd like, so here is a complete example of something that works:
QueryBuilder<EmailMessage, String> qb = emailDao.queryBuilder();
qb.selectRaw("emailAddress"); // This can also be done with a single call to selectRaw()
qb.selectRaw("COUNT (emailAddress)");
qb.groupBy("emailAddress");
GenericRawResults<String[]> rawResults = qb.queryRaw(); // Returns results with two columns
Is there any way to do this via QueryBuilder without the need for queryRaw(...)?
The short answer is no because ORMLite wouldn't know what to do with the extra count value. If you had a Table1 entity with a DAO definition, what field would the COUNT(title) go into? Raw queries give you the power to select various fields but then you need to process the results.
With the code right now (v5.1), you can define a custom RawRowMapper and then use the dao.getRawRowMapper() method to process the results for Table1 and tack on the titleCount field by hand.
I've got an idea how to accomplish this in a better way in ORMLite. I'll look into it.

How to use "DISTINCT ON (field)" in Doctrine 2?

I know how to use "DISTINCT" in Doctrine 2, but I really need to use "DISTINCT ON (field)" and I don't know how to do this with the QueryBuilder.
My SQL query looks like:
SELECT DISTINCT ON (currency) currency, amount FROM payments ORDER BY currency
And this query works perfect, but I can't use it with the QueryBuilder. Maybe I could write this query on some other way?
I would suggest that the SELECT DISTINCT ON (..) construct that PostgreSQL supports is outside the Object Relational Model (ORM) that is central to Doctrine. Or, perhaps put another way, because SELECT DISTINCT ON (..) is rare in SQL implementations Doctrine haven't coded for it.
Regardless of the actual logic for it not working, I would suggest you try Doctrine's "Native SQL". You need to map the results of your query to the ORM.
With NativeQuery you can execute native SELECT SQL statements and map
the results to Doctrine entities or any other result format supported
by Doctrine.
In order to make this mapping possible, you need to describe to
Doctrine what columns in the result map to which entity property. This
description is represented by a ResultSetMapping object.
With this feature you can map arbitrary SQL code to objects, such as
highly vendor-optimized SQL or stored-procedures.
SELECT DISTINCT ON (..) falls into vendor-optimized SQL I think, so using NativeQuery should allow you to access it.
Doctrine QueryBuilder has some limitations. Even if I didn't check if it's was possible with query builder, I do not hesitate to use DQL when I do not know how to write the query with query builder.
Check theses examples at
http://doctrine-orm.readthedocs.org/en/latest/reference/dql-doctrine-query-language.html#dql-select-examples
Hope this help.
INDEX BY can be used in DQL, allowing first result rows indexed by the defined string/int field to be overwritten by following ones with the same index:
SELECT
p.currency,
p.amount
FROM Namespace\To\Payments p INDEX BY p.currency
ORDER BY p.currency ASC
DQL - EBNF - INDEX BY

Sorting data in a oracle table

Is it possible to sort the data inside of an oracle table? Like ascending/descending via a certain column, alphabetically. Oracle 10g express.
You could try
Select *
from some_table
order by some_column asc
This will sort the results by some_column and place them in ascending order. Use desc instead of asc if you want descending order. Or did you mean to have the ordering in the physical storage itself?
I believe it's possible to specify the ordering/sorting of an indexed column in storage. It's probably closest to what you want. I don't usually use this index sort feature, but for more info see: http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10759/statements_5010.htm#i2062718
Perhaps you could use an index organized table - IOT to ensure that the data is stored ordered by index.
Have a look at the physical properties clause of the CREATE TABLE statement:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_7002.htm#i2128663
What is the problem that you are trying to solve though? An IOT may or may not be what you should be using.
As defined by the relational model, the rows and columns in a table are unordered. That's the theory at least.
In practice, if you want the data to come out of a query in a particular order then you should always use the ORDER BY clause. The order of the output is not guarantee unless you provide that.
It would be possible to use an ORDER BY when inserting into a table but that doesn't guarantee the order that data will be output. A query might come out in the same order every time.... but that doesn't mean it will come out in the same order next time.
There were issues when Oracle 10g came out where aggregate queries (with GROUP BY) were not coming out sorted because users had come to rely on the data being sorted as a side-effect of the grouping. With the introduction of the HASH GROUP BY in addition to the SORT GROUP BY people were caught out. This was a useful reminder that ORDER BY should always be used.
What do you really mean ?
Are you just asking for the order by clause ?
http://www.1keydata.com/sql/sqlorderby.html
Oracle 10g Express supports ANSI SQL like most RDBM's so you can sort in the standard manner:
SELECT * FROM Persons ORDER BY LastName
A good tutorial on SQL can be found here: w3schools SQL
Oracle Express does have some limitations compared to the Enterprise Edition but not in the basic SQL dialect it supports.

sqlalchemy group_by error

The following works
s = select([tsr.c.kod]).where(tsr.c.rr=='10').group_by(tsr.c.kod)
and this does not:
s = select([tsr.c.kod, tsr.c.rr, any fields]).where(tsr.c.rr=='10').group_by(tsr.c.kod)
Why?
thx.
It doesn't work because the query isn't valid like that.
Every column needs to be in the group_by or needs an aggregate (i.e. max(), min(), whatever) according to the SQL standard. Most databases have always complied to this but there are a few exceptions.
MySQL has always been the odd one in this regard, within MySQL this behaviour depends on the ONLY_FULL_GROUP_BY setting: https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html
I would personally recommend setting the sql_mode setting to ANSI. That way you're largely compliant to the SQL standard which will help you in the future if you ever need to use (or migrate) to a standards compliant database such as PostgreSQL.
What you are trying to do is somehow valid in mysql, but invalid in standard sql, postgresql and common sense. When you group rows by 'kod', each row in a group has the same 'kod' value, but different values for 'rr' for example. With aggregate functions you can get some aspect of the values in this column for each group, for example
select kod, max(rr) from table group by kod
will give you list of 'kod's and the max of 'rr's in each group (by kod).
That being sad, in the select clause you can only put columns from the group by clause and/or aggregate functions from other columns. You can put whatever you like in where - this is used for filtering. You can also put additional 'having' clause after group that contains aggregate function expression that can also be used as post-group filtering.