Can I remove ORDER BY from a Django ORM query? - django-orm

It seems like Django by default adds ORDER BY to queries. Can I clear it?
from slowstagram.models import InstagramMedia
print InstagramMedia.objects.filter().query
SELECT
`slowstagram_instagrammedia`.`id`,
`slowstagram_instagrammedia`.`user_id`,
`slowstagram_instagrammedia`.`image_url`,
`slowstagram_instagrammedia`.`video_url`,
`slowstagram_instagrammedia`.`created_time`,
`slowstagram_instagrammedia`.`caption`,
`slowstagram_instagrammedia`.`filter`,
`slowstagram_instagrammedia`.`link`,
`slowstagram_instagrammedia`.`attribution_id`,
`slowstagram_instagrammedia`.`likes_count`,
`slowstagram_instagrammedia`.`type`
FROM
`slowstagram_instagrammedia`
ORDER BY
`slowstagram_instagrammedia`.`id`
ASC
```

Actually, just do a query.order_by() is enough.
This is specified in the docs although it is a bit hard to find. The docs say:
If you don’t want any ordering to be applied to a query, not even the default ordering, call order_by() with no parameters.
Here is the implementation of order_by, for your reference -
def order_by(self, *field_names):
"""
Returns a new QuerySet instance with the ordering changed.
"""
assert self.query.can_filter(), \
"Cannot reorder a query once a slice has been taken."
obj = self._clone()
obj.query.clear_ordering(force_empty=False)
obj.query.add_ordering(*field_names)
return obj

You can use: clear_ordering method from query
"""Removes any ordering settings.
If 'force_empty' is True, there will be no ordering in the resulting
query (not even the model's default).
"""
Example:
>>> from products.models import Product
>>> products = Product.objects.filter(shortdesc='falda').order_by('id')
>>> print products.query
SELECT "products_product"."id", "products_product"."shortdesc"
WHERE "products_product"."shortdesc" = falda
ORDER BY "products_product"."id" ASC
>>> products.query.clear_ordering()
>>> print products.query
SELECT "products_product"."id", "products_product"."shortdesc"
WHERE "products_product"."shortdesc" = falda

Try to use .order_by('?') at the end of queryset.

Related

Ecto - casting an array of strings to integers in a fragment

In an Ecto query I'm running against Postgres (13.6), I need to interpolate a list of ids into a fragment - this is generally not something I have a problem with, but in this case, the list of ids is being received as a list of strings that need to be cast to integers (or, more specifically, BIGINT). The query that I think that I need is as follows, which the troublesome bit being ANY(ARRAY?::BIGINT[]):
ModelA
|> where(
[ma],
fragment(
"EXISTS (SELECT * FROM model_b mb WHERE mb.a_id = ? AND mb.c_id = ANY(ARRAY?::BIGINT[]))",
a.id,
^c_ids
)
)
where c_ids would be a list like ["1449441579", "2345556834"]
However, when I run this, I get the error
(Postgrex.Error) ERROR 42703 (undefined_column) column "array$4" does not exist
referring to the generated SQL
ANY(ARRAY$4::BIGINT[])
Of course, I could convert the array of c_ids to integers beforehand in my app code, but I'd like to see if I can get it to cast in the query itself.
Writing the fragment in straight SQL works out just fine:
SELECT * FROM model_b mb WHERE mb.a_id = 1 AND mb.c_id = ANY(ARRAY['1449441579', '2345556834']::BIGINT[]);
What is the idiomatic way to get this kind of array casting to work in an Ecto fragment? Many thanks.
Just to codify my comment, I would do the integer conversion before the query. You can use the dynamic macro to support IN queries:
import Ecto.Query
alias YourApp.Repo
alias YourApp.SomeSchema, as: ModelA
strs = ["1", "2", "3"]
ids = Enum.map(strs, fn x -> String.to_integer(x) end)
conditions = dynamic([tbl], tbl.id in ^ids)
Repo.all(
from ma in ModelA,
where: ^conditions
)
|> IO.inspect()
I did find a post that led me to a potential solution here - https://elixirforum.com/t/interpolating-lists-into-ecto-query-fragment-1/16690/7 - however, I'm not convinced that this is optimal for me.
By using Enum.join and converting my initial array of strings into a string and then allowing Postgres to cast that string to an array of bigints, it worked out:
ModelA
|> where(
[ma],
fragment(
"EXISTS (SELECT * FROM model_b mb WHERE mb.a_id = ? AND mb.c_id = ANY(ARRAY?::BIGINT[]))",
a.id,
^Enum.join(c_ids, ",")
)
)
So I'm posting it here for reference and because it does "work", but I feel sort of silly doing this, because I specifically was trying to avoid processing the list before passing it to PG... so yeah I'd still really like to hear about the "right" way to do this...

How do you use aggregated values within PySpark SQL when() clause?

I am trying to learn PySpark, and have tried to learn how to use SQL when() clauses to better categorize my data. (See here: https://sparkbyexamples.com/spark/spark-case-when-otherwise-example/) What I can't seem to get addressed is how to insert actual scalar values into the when() conditions for comparison's sake explicitly. It seems the aggregate functions return more tabular values than actual float() types.
I keep getting this error message unsupported operand type(s) for -: 'method' and 'method' When I tried running functions to aggregate another column in the original data frame I noticed the result didn't seem to be a flat scaler as much as a table (agg(select(f.stddev("Col")) gives a result like: "DataFrame[stddev_samp(TAXI_OUT): double]") Here is a sample of what I am trying to accomplish if you want to replicate, and I was wondering how you might get aggregate values like the standard deviation and mean within the when() clause so you can use that to categorize your new column:
samp = spark.createDataFrame(
[("A","A1",4,1.25),("B","B3",3,2.14),("C","C2",7,4.24),("A","A3",4,1.25),("B","B1",3,2.14),("C","C1",7,4.24)],
["Category","Sub-cat","quantity","cost"])
psMean = samp.agg({'quantity':'mean'})
psStDev = samp.agg({'quantity':'stddev'})
psCatVect = samp.withColumn('quant_category',.when(samp['quantity']<=(psMean-psStDev),'small').otherwise('not small')) ```
psMean and psStdev in your example are dataframes, you need to use collect() method to extract the scalar values
psMean = samp.agg({'quantity':'mean'}).collect()[0][0]
psStDev = samp.agg({'quantity':'stddev'}).collect()[0][0]
You could also create one variable with all stats as pandas DataFrame and reference to it later in pyspark code:
from pyspark.sql import functions as F
stats = (
samp.select(
F.mean("quantity").alias("mean"),
F.stddev("quantity").alias("std")
).toPandas()
)
(
samp.withColumn('quant_category',
F.when(
samp['quantity'] <= stats["mean"].item() - stats["std"].item(),
'small')
.otherwise('not small')
)
.toPandas()
)

How to add a CASE column to SQLAlchemy output?

So far I've got basically the following:
MyTable.query
.join(…)
.filter(…)
The filter has a complicated case insensitive prefix check:
or_(*[MyTable.name.ilike("{}%".format(prefix)) for prefix in prefixes])
Now I have to retrieve the matching prefix in the result. In pseudo-SQL:
SELECT CASE
WHEN strpos(lower(my_table.foo), 'prefix1') = 0 THEN 'prefix1'
WHEN …
END AS prefix
WHERE prefix IS NOT NULL;
The SQLAlchemy documentation demonstrates how to use CASE within WHERE, which seems like a strange edge case, and not how to use it to get a new column based on an existing one.
The goal is to avoid duplicating the logic and prefix names anywhere.
You know that case is an expression. You can use it in any place where SQL allows it, and link to docs you have provided show how to construct the case expression.
from sqlalchemy import case, literal
from sqlalchemy.orm import Query
q = Query(case([
(literal(1) == literal(1), True),
], else_ = False))
print(q)
SELECT CASE WHEN (:param_1 = :param_2) THEN :param_3 ELSE :param_4 END AS anon_1

How to get only specific rows on DB, when date range fits SQL condition on a 'tsrange' datatype? [duplicate]

I have this query:
some_id = 1
cursor.execute('
SELECT "Indicator"."indicator"
FROM "Indicator"
WHERE "Indicator"."some_id" = %s;', some_id)
I get the following error:
TypeError: 'int' object does not support indexing
some_id is an int but I'd like to select indicators that have some_id = 1 (or whatever # I decide to put in the variable).
cursor.execute('
SELECT "Indicator"."indicator"
FROM "Indicator"
WHERE "Indicator"."some_id" = %s;', [some_id])
This turns the some_id parameter into a list, which is indexable. Assuming your method works like i think it does, this should work.
The error is happening because somewhere in that method, it is probably trying to iterate over that input, or index directly into it. Possibly like this: some_id[0]
By making it a list (or iterable), you allow it to index into the first element like that.
You could also make it into a tuple by doing this: (some_id,) which has the advantage of being immutable.
You should pass query parameters to execute() as a tuple (an iterable, strictly speaking), (some_id,) instead of some_id:
cursor.execute('
SELECT "Indicator"."indicator"
FROM "Indicator"
WHERE "Indicator"."some_id" = %s;', (some_id,))
Your id needs to be some sort of iterable for mogrify to understand the input, here's the relevant quote from the frequently asked questions documentation:
>>> cur.execute("INSERT INTO foo VALUES (%s)", "bar") # WRONG
>>> cur.execute("INSERT INTO foo VALUES (%s)", ("bar")) # WRONG
>>> cur.execute("INSERT INTO foo VALUES (%s)", ("bar",)) # correct
>>> cur.execute("INSERT INTO foo VALUES (%s)", ["bar"]) # correct
This should work:
some_id = 1
cursor.execute('
SELECT "Indicator"."indicator"
FROM "Indicator"
WHERE "Indicator"."some_id" = %s;', (some_id, ))
Slightly similar error when using Django:
TypeError: 'RelatedManager' object does not support indexing
This doesn't work
mystery_obj[0].id
This works:
mystery_obj.all()[0].id
Basically, the error reads Some type xyz doesn't have an __ iter __ or __next__ or next function, so it's not next(), or itsnot[indexable], or iter(itsnot), in this case the arguments to cursor.execute would need to implement iteration, most commonly a List, Tuple, or less commonly an Array, or some custom iterator implementation.
In this specific case the error happens when the classic string interpolation goes to fill the %s, %d, %b string formatters.
Related:
How to implement __iter__(self) for a container object (Python)
Pass parameter into a list, which is indexable.
cur.execute("select * from tableA where id =%s",[parameter])
I had the same problem and it worked when I used normal formatting.
cursor.execute(f'
SELECT "Indicator"."indicator"
FROM "Indicator"
WHERE "Indicator"."some_id" ={some_id};')
Typecasting some_id to string also works.
cursor.execute(""" SELECT * FROM posts WHERE id = %s """, (str(id), ))

AREL: writing complex update statements with from clause

I tried looking for an example of using Arel::UpdateManager to form an update statement with a from clause (as in UPDATE t SET t.itty = "b" FROM .... WHERE ...), couldn.t find any. The way I've seen it, Arel::UpdateManager sets the main engine on initialization and allows to set the various fields and values to update. Is there actually a way to do this?
Another aside would be to find out how to express Postgres posix regex matching into ARel, but this might be impossible by now.
As far as I see the current version of arel gem is not support FROM keyword for the sql query. You can generate a query using the SET, and WHERE keywords only, like:
UPDATE t SET t.itty = "b" WHERE ...
and the code, which copies a value from field2 to field1 for the units table, will be like:
relation = Unit.all
um = Arel::UpdateManager.new(relation.engine)
um.table(relation.table)
um.ast.wheres = relation.wheres.to_a
um.set(Arel::Nodes::SqlLiteral.new('field1 = "field2"'))
ActiveRecord::Base.connection.execute(um.to_sql)
Exactly you can use the additional method to update a relation. So we create the Arel's UpdateManager, assigning to it the table, where clause, and values to set. Values shell be passed to the method as an argument. Then we need to add FROM keyword to the generated SQL request, we add it only if we have access to external table of the specified one by the UPDATE clause itself. And at the last we executes the query. So we get:
def update_relation!(relation, values)
um = Arel::UpdateManager.new(relation.engine)
um.table(relation.table)
um.ast.wheres = relation.wheres.to_a
um.set(values)
sql = um.to_sql
# appends FROM field to the query if needed
m = sql.match(/WHERE/)
tables = relation.arel.source.to_a.select {|v| v.class == Arel::Table }.map(&:name).uniq
tables.shift
sql.insert(m.begin(0), "FROM #{tables.join(",")} ") if m && !tables.empty?
# executes the query
ActiveRecord::Base.connection.execute(sql)
end
The you can issue the the relation update as:
values = Arel::Nodes::SqlLiteral.new('field1 = "field2", field2 = NULL')
relation = Unit.not_rejected.where(Unit.arel_table[:field2].not_eq(nil))
update_relation!(relation, values)