How do I add a migration for uniqueness on two columns where I lowercase one column?

I have two columns that store strings, :column_a and :column_b.
I know that I can do:
add_index :table, [:column_a, :column_b], unique: true
But, I need to achieve the following:
add_index :table, [:column_a, 'lower(column_b)'], unique: true
This of course errors out when I try to migrate.
I get an error that lower(column_b) is not a column.
I am using PostgreSQL.
Honestly, at this point, I'm thinking of just having a column called column_b_lowercase that I index on.

I decided to just use SQL. Here is the code.
class AddUniqueIndexingForLowercaseColumn < ActiveRecord::Migration
  def self.up
    execute "CREATE UNIQUE INDEX table_column_a_lowercase_column_b_index ON table(column_a, lower(column_b))"
  end

  def self.down
    remove_index :table, name: :table_column_a_lowercase_column_b_index
  end
end
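To sanity-check the index in psql, two inserts that differ only in the casing of column_b should conflict (values are hypothetical; "table" is quoted here because it is a reserved word):

INSERT INTO "table" (column_a, column_b) VALUES ('x', 'Foo');
INSERT INTO "table" (column_a, column_b) VALUES ('x', 'FOO');
-- ERROR: duplicate key value violates unique constraint "table_column_a_lowercase_column_b_index"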

ActiveRecord unique constraint on multiple columns

How should I write the ActiveRecord migration to reflect this:
CREATE TABLE table (
  c1 data_type,
  c2 data_type,
  c3 data_type,
  UNIQUE (c2, c3)
);
This adds a unique constraint on one column, but what I'm looking for is to create the unique constraint on the combination of two columns, as explained in the section Creating a UNIQUE constraint on multiple columns.
EDIT
More precisely: I have a table account and a table balance_previous_month.
class CreateBalance < ActiveRecord::Migration[6.1]
  def change
    create_table :balance_previous_month do |t|
      t.decimal :amount, precision: 8, scale: 2
      t.date :value_date
      t.belongs_to :account, foreign_key: true
      t.timestamps
    end
  end
end
Since we're in January, the value date (i.e. balance at the end of the previous month) is 2020-12-31.
I want to put a constraint on the table balance_previous_month so that, per account_id, there can be only one row for a given value_date. The amount can be updated, but a given account can't have two identical value_dates.
The link you added to the other post is not exactly equivalent to your request: one answer talks about enforcing uniqueness through the model, while the other talks about using an index, whereas in your example you are using a constraint. (Check this for more information on the difference between them.)
There are two places where you can enforce uniqueness, the application and the database, and it can be done in both at the same time.
Database
So if you want to enforce uniqueness by using an index you can use this:
def change
  add_index :table, [:c2, :c3], unique: true
end
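For reference, that add_index call generates roughly the following SQL (the index name follows Rails' default convention and may differ):

CREATE UNIQUE INDEX index_table_on_c2_and_c3 ON "table" (c2, c3);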
If you want to add a constraint as in your example, you will have to run a direct SQL query in your migration, as there is no built-in way in Rails to do that.
def up
  execute <<-SQL
    ALTER TABLE table
    ADD UNIQUE (c2, c3)
  SQL
end
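If you want a predictable name for the constraint (handy when you need to drop it later), you can also name it explicitly; a sketch, with a hypothetical constraint name:

ALTER TABLE "table"
ADD CONSTRAINT table_c2_c3_unique UNIQUE (c2, c3);

-- and later, if needed:
ALTER TABLE "table"
DROP CONSTRAINT table_c2_c3_unique;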
Check the link above for more info about the difference between them.
Application
Enforcing uniqueness through the model:
validates :c2, uniqueness: { scope: :c3 }
Thanks to Daniel Sindrestean, this code works:
class CreateBalance < ActiveRecord::Migration[6.1]
  def change
    create_table :balance_previous_month do |t|
      t.decimal :amount, precision: 8, scale: 2
      t.date :value_date
      t.belongs_to :account, foreign_key: true
      t.timestamps
    end

    execute <<-SQL
      ALTER TABLE balance_previous_month
      ADD UNIQUE (account_id, value_date) -- ActiveRecord creates account_id for the belongs_to
    SQL
  end
end
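To verify the constraint after migrating, you can ask Postgres' catalog directly (a sketch; the constraint name is auto-generated):

SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'balance_previous_month'::regclass;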

In Elixir with Postgres, how can I have the database return the enum values which are NOT in use?

I have an EctoEnum.Postgres:
# see: https://en.wikipedia.org/wiki/ISO_4217
defmodule PricingEngine.Pricing.CurrencyEnum do
  @options [
    :AED,
    :AFN,
    # snip...
    :ZWL
  ]

  use EctoEnum.Postgres,
    type: :currency,
    enums: @options

  def values, do: @options
end
This enum has been included in our Postgres database.
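On the Postgres side, that corresponds to an enum type created roughly like this (abbreviated to mirror the module above):

CREATE TYPE currency AS ENUM ('AED', 'AFN', /* snip... */ 'ZWL');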
We also have a structure:
defmodule PricingEngine.Pricing.Currency do
  use Ecto.Schema
  import Ecto.Changeset

  schema "currencies" do
    field(:currency, PricingEngine.Pricing.CurrencyEnum)
    timestamps()
  end

  @doc false
  def changeset(currency, attrs) do
    currency
    |> cast(attrs, [:currency])
    |> validate_required([:currency])
    |> unique_constraint(:currency)
  end
end
We can currently successfully use the following functions to figure out which currencies are active/used:
def active_currency_isos do
  Repo.all(select(Currency, [record], record.currency))
end

defdelegate all_currency_isos,
  to: CurrencyEnum,
  as: :values

def inactive_currency_iso do
  Pricing.all_currency_isos() -- Pricing.active_currency_isos()
end
This works, but I'm led to believe this could be more efficient if we just asked the database for this information.
Any idea(s) how to do this?
If you want to get a list of all the used enums, you should just do a distinct on the currency field. This uses the Postgres DISTINCT ON operator:
from(c in Currency,
  distinct: c.currency,
  select: c.currency
)
This will query the table, unique by the currency column, and return only the currency column values. You should get an array of all of the enums that exist in the table.
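For reference, that query compiles to roughly the following SQL (a sketch; Ecto's generated aliases will differ):

SELECT DISTINCT ON (c0.currency) c0.currency
FROM currencies AS c0;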
There are some efficiency concerns with doing it this way which could be mitigated by materialized views, lookup tables, in-memory cache etc. However, if your data set isn't extremely large, you should be able to use this for a while.
Edit:
Per the response, I will show how to get the unused enums.
There are two ways to do this.
Pure SQL
This query will get all of the used ones and do a difference from the entire set of available enums. The operator we use to do this is EXCEPT and you can get a list of all available enums with enum_range. I will use unnest to turn the array of enumerated types into individual rows:
SELECT unnest(enum_range(NULL::currency)) AS unused_enums
EXCEPT (
  SELECT DISTINCT c.currency
  FROM currencies c
)
You can execute this raw SQL in Ecto by doing this:
Ecto.Adapters.SQL.query!(MyApp.Repo, "SELECT unnest(...", [])
From this you'll get a Postgrex.Result that you'll have to get the values out of:
result
|> Map.get(:rows, [])
|> List.flatten()
|> Enum.map(&String.to_existing_atom/1)
I'm not really sure of a way to code this query up in pure Ecto, but let me know if you figure it out.
In Code
You can do the first query that I posted before with distinct, then do the difference in code.
query =
  from(c in Currency,
    distinct: c.currency,
    select: c.currency
  )

CurrencyEnum.values() -- Repo.all(query)
Either way is probably negligible in terms of performance so it's up to you.

optimising a SQL query with multiple min & max ranges

I'm having big problems with optimizing a SQL query that is taking ages to run on a set of data with ~300,000 rows.
I'm running the query on a stat_records table with a decimal value column and a datetime recorded_at column.
I want to find out the MAX and MIN values in any of the following periods: all time, last year, last 6 months, last 3 months, last month, last 2 weeks.
The way I'm doing it right now, is by running the following SQL query individually for every interval specified above:
SELECT MIN("stat_records"."value")
FROM "stat_records"
INNER JOIN "stats" ON "stats"."id" = "stat_records"."stat_id"
WHERE "stat_records"."object_id" = $1
AND "stats"."identifier" = $2
AND ("stat_records"."recorded_at" BETWEEN $3 AND $4)
[["object_id", 1],
["identifier", "usd"],
["recorded_at", "2018-10-15 20:10:58.418512"],
["recorded_at", "2018-12-15 20:11:59.351437"]]
The table definition is:
create_table "stat_records", force: :cascade do |t|
t.datetime "recorded_at"
t.decimal "value"
t.bigint "coin_id"
t.bigint "object_id"
t.index ["object_id"], name: "index_stat_records_on_object_id"
t.index ["recorded_at", "object_id", "stat_id"], name: "for_upsert", unique: true
t.index ["recorded_at", "stat_id"], name: "index_stat_records_on_recorded_at_and_stat_id", unique: true
t.index ["recorded_at"], name: "index_stat_records_on_recorded_at"
t.index ["stat_id"], name: "index_stat_records_on_stat_id"
t.index ["value"], name: "index_stat_records_on_value"
end
This approach, however, takes forever to complete. I have indexes on the stat_records table on both value and recorded_at columns.
What am I missing here - what should I do to optimise this?
Perhaps there is some better approach where I could execute 1 query, and let postgres do the optimisations for me.
An index can only speed up queries that read a small part of a table (or help with sorting), so you can never expect an index to make the query over the whole time range faster.
Your solution could be materialized views. That way you can pre-aggregate the values and the resulting table is much smaller, so that queries will be faster. The disadvantage is that a materialized view needs to be refreshed regularly and contains slightly stale data in between.
An example:
CREATE MATERIALIZED VIEW stats_per_month AS
SELECT stat_records.object_id,
       stats.identifier,
       date_trunc('month', stat_records.recorded_at) AS recorded_month,
       min(stat_records.value) AS minval
FROM stat_records
INNER JOIN stats ON stats.id = stat_records.stat_id
GROUP BY stat_records.object_id,
         stats.identifier,
         date_trunc('month', stat_records.recorded_at);
If you need month granularity for your query, you just query from the materialized view rather than from the original tables.
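For example, the minimum over the last six months for one object could then be read from the view like this (a sketch, assuming the view above and hypothetical parameter values):

SELECT min(minval)
FROM stats_per_month
WHERE object_id = 1
  AND identifier = 'usd'
  AND recorded_month >= date_trunc('month', now() - interval '6 months');

-- refresh periodically, e.g. from a scheduled job:
REFRESH MATERIALIZED VIEW stats_per_month;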
You could also use a hybrid solution and use the original query for small ranges, where stale data might hurt more. That should be fast with an index on recorded_at.

Using Ecto for Postgres fulltext search on GIN indexes

I have a simple model:
schema "torrents" do
field :name, :string
field :magnet, :string
field :leechers, :integer
field :seeders, :integer
field :source, :string
field :filesize, :string
timestamps()
end
And I want to search based on the name. I added the relevant extensions and indexes to my database and table.
def change do
  create table(:torrents) do
    add :name, :string
    add :magnet, :text
    add :leechers, :integer
    add :seeders, :integer
    add :source, :string
    add :filesize, :string

    timestamps()
  end

  execute "CREATE EXTENSION pg_trgm;"
  execute "CREATE INDEX torrents_name_trgm_index ON torrents USING gin (name gin_trgm_ops);"

  create index(:torrents, [:magnet], unique: true)
end
I'm trying to search using the search term, but I always get zero results.
def search(query, search_term) do
  from(u in query,
    where: fragment("? % ?", u.name, ^search_term),
    order_by: fragment("similarity(?, ?) DESC", u.name, ^search_term)
  )
end
SELECT t0."id", t0."name", t0."magnet", t0."leechers", t0."seeders", t0."source",
t0."filesize", t0."inserted_at", t0."updated_at" FROM "torrents"
AS t0 WHERE (t0."name" % $1) ORDER BY similarity(t0."name", $2) DESC ["a", "a"]
Is something wrong with my search function?
My initial guess is that because you're using the % operator, the minimum similarity threshold is too high for your queries. It defaults to 0.3 (meaning the strings' trigrams must be at least 30% similar); if this threshold isn't met, no results will be returned.
If that is the issue, the threshold is configurable in a couple of ways: you can either use set_limit (docs here), or set the limit on a per-query basis.
The set_limit option can be a bit of a hassle, as it needs to be set per connection every time. Ecto (through db_connection) has an option to set a callback function for after_connect (docs here).
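For reference, the threshold can be inspected and changed in plain SQL (the value below is just an example):

SELECT show_limit();    -- current trigram similarity threshold, 0.3 by default
SELECT set_limit(0.25); -- lower it for the current connection only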
To change the limit per query, you can use the similarity function in the where clause, like this:
def search(query, search_term, limit \\ 0.3) do
  from(u in query,
    where: fragment("similarity(?, ?) > ?", u.name, ^search_term, ^limit),
    order_by: fragment("similarity(?, ?) DESC", u.name, ^search_term)
  )
end
To start, I would try that with a limit of zero to see if you get any results.
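To see the raw scores while debugging, you can also query similarity directly in psql (hypothetical search term):

SELECT name, similarity(name, 'ubuntu') AS score
FROM torrents
ORDER BY score DESC
LIMIT 10;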

Formatting database when using multiple criteria

This should be a relatively simple question. I come from a Python background and don't do a lot of SQL stuff, so I thought I would ask this formatting question here.
Say I've got something that has
Criteria 1: True
Criteria 2: False
Criteria N: True
In PostgreSQL, is it better to set the database up as:
Column: Criteria
Row: [1:True,2:False,N:True]
or set each criterion as a column of its own?
Use three Boolean columns:
CREATE TABLE t (
criteria1 boolean,
criteria2 boolean,
criterian boolean
);
You can then formulate queries that involve these columns:
SELECT *
FROM t
WHERE criteria1 = true
  AND criteria2 = false;
or
SELECT *
FROM t
WHERE criteria1 = false
   OR criterian = true;
Relational databases are designed to do this. In addition, you can create an index on these columns.
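For example, a multicolumn or partial index can target the combinations you filter on most often (a sketch with hypothetical index names):

CREATE INDEX t_criteria1_criteria2_idx ON t (criteria1, criteria2);

-- or a partial index for one hot query path:
CREATE INDEX t_criteria1_partial_idx ON t (criteria1) WHERE criteria2 = false;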