Total number of "1s" in a Postgres bitmask

Is there a way to get the total number of 1's in a Postgres "bit string" type?

# select length(replace(x::text, '0', '')) from ( values ('1010111101'::bit varying) ) as something(x);
length
--------
7
(1 row)
And an approach without string conversion:
# select count(*) from ( select x, generate_series(1, length(x)) as i from ( values ('1010111101'::bit varying) ) as something(x) ) as q where substring(x, i, 1) = B'1';
count
-------
7
(1 row)

If you need it to be really efficient, here's a discussion: Efficiently determining the number of bits set in the contents of a VARBIT field. (On PostgreSQL 14 and later there is also a built-in bit_count() function that does this directly.)

I know this is already an old topic, but I found a cool answer here: https://stackoverflow.com/a/38971017/4420662
Adapted, it would be:
=# select length(regexp_replace((B'1010111101')::text, '[^1]', '', 'g'));
length
--------
7
(1 row)

Based on the discussion here, the above-mentioned thread Efficiently determining the number of bits set in the contents of a VARBIT field,
and the Bit Twiddling Hacks page, I published a PostgreSQL extension: pg_bitcount.
If you install that extension (see instructions there), you can count the number of bits set in a bitstring using:
-- Register the extension in PostgreSQL
create extension pg_bitcount;
-- Use the pg_bitcount function
select public.pg_bitcount(127::bit(8));
select public.pg_bitcount(B'101010101');
I compared a number of different algorithms for performance, and a table lookup seems to be the fastest. All of them are much faster than converting to text and replacing '0' with ''.
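The table-lookup idea is not SQL-specific; as a rough sketch (names and code are mine, not taken from the pg_bitcount extension), the approach precomputes the popcount of every possible byte and then sums lookups over the input:

```python
# Precompute the number of set bits for every possible byte value (0-255).
BYTE_POPCOUNT = [bin(b).count("1") for b in range(256)]

def bitcount(bits: str) -> int:
    """Count the '1' bits in a bit string such as '1010111101'."""
    total = 0
    # Walk the bit string in 8-bit chunks, looking each chunk up in the table.
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        total += BYTE_POPCOUNT[int(chunk, 2)]
    return total

print(bitcount("1010111101"))  # -> 7
```

The lookup table trades a small amount of memory (256 entries) for doing one table read per 8 bits instead of one test per bit.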

There is a simple way to do it using PL/pgSQL here.

The one / first bit? Or the total number of bits flipped on? For the former, just mask the bit with & 1. For the latter, a somewhat nasty query, like:
SELECT (myBit & 1) + ((myBit >> 1) & 1) + ((myBit >> 2) & 1) AS bitCount FROM myBitTable;
(parenthesized to avoid operator-precedence surprises between +, & and >>). I suppose you could also cast to a string and count the 1's in PL/SQL.
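The pattern in that query (shift each bit down to the lowest position, mask it with & 1, and add up the results) can be sketched in Python for an arbitrary bit width:

```python
def bitcount_shift(x: int, width: int = 8) -> int:
    """Sum (x >> i) & 1 over every bit position, mirroring the SQL expression."""
    return sum((x >> i) & 1 for i in range(width))

print(bitcount_shift(0b10101111))  # -> 6
```

This is O(width) per value, which is why the table-lookup approach mentioned above tends to win for wider bitmasks.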

Related

Display float column with leading sign

I am using PHP with PostgreSQL. I have the following query:
SELECT ra, de, concat(ra, de) AS com, count(*) OVER() AS total_rows
FROM mdust
LIMIT :pagesize OFFSET :starts
The columns ra and de are floats; de can be positive or negative, but while a negative value comes back with its - sign, a positive one does not include a +. What I want is for the de column within concat(ra, de) to include the positive or negative sign.
I was looking at this documentation for PostgreSQL, which provides to_char(1, 'S9'). That is exactly what I want, but it only works for integers; I was unable to find such a function for floats.
to_char() works for float as well. You just have to define the desired output format. The simple pattern S9 would truncate fractional digits and fail for numbers > 9.
test=> SELECT to_char(float '0.123' , 'FMS9999990.099999')
test-> , to_char(float '123' , 'FMS9999990.099999')
test-> , to_char(float '123.123', 'FMS9999990.099999');
to_char | to_char | to_char
---------+---------+----------
+0.123 | +123.0 | +123.123
(1 row)
The added FM modifier stands for "fill mode" and suppresses insignificant trailing zeroes (unless forced by a 0 symbol instead of 9) and padding blanks.
Add as many 9s before and after the decimal point as you want digits to be allowed.
You can tailor the desired output format pretty much any way you want. Details in the manual here.
Aside: There are more efficient solutions for paging than LIMIT :pagesize OFFSET :starts:
Optimize query with OFFSET on large table

Conversion does not give the decimal places when the result is a whole number in postgres

I am new to Postgres and am presently migrating from SQL Server to Postgres, and I am facing some problems. Kindly help me with this.
I am not able to convert to decimal whenever the answer is a whole number: in that case,
the decimal conversion gives only the integer part as the answer.
For example: if the result is 48, decimal conversion gives 48, whereas I want 48.00.
You can start by using numeric(4,2) (numeric with an explicit scale) instead of bare decimal, e.g.:
t=# select 48::numeric(4,2);
numeric
---------
48.00
(1 row)
or even:
t=# select 48*1.00;
?column?
----------
48.00
(1 row)
but keep in mind that not seeing zeroes after the decimal point does not mean the number is not decimal. E.g. here it is still float:
t=# select 48::float;
float8
--------
48
(1 row)
You can round the value by using the statement
select round(48, 2);
It will return 48.00. You can also round to more decimal places.

PostgreSQL - making ts_rank take the ts_vector position as-is or defining a custom ts_rank function

I'm performing weighted search on a series of items in an e-commerce platform. The problem I have is ts_rank is giving me the exact same value for different combinations of words, even if the ts_vector gives different positions for each set of words.
Let me illustrate this with an example:
If I give ts_vector the word camas, it gives me the following:
'cam':1
If I give ts_vector the word sofas camas, it gives me the following:
'cam':2 'sof':1
So camas is getting different positions depending on the combination of words.
When I execute the following statement:
select ts_rank(to_tsvector('camas'),to_tsquery('spanish','cama'));
PostgreSQL gives me 0.0607927 as the ts_rank computed value, whereas the computed value for the following statement:
select ts_rank(to_tsvector('sofas camas'),to_tsquery('spanish','cama'));
is the same value: 0.0607927.
How can this be?
The question I have in mind is the following: is there a way for ts_rank to consider the position for the words contained in the ts_vector structure as-is or is there a way to define a custom ts_rank function for me to take the position for the words as explained?
Any help would be greatly appreciated.
As the documentation says about the functions ts_rank and ts_rank_cd:
they consider how often the query terms appear in the document, how close together the terms are in the document, and how important is the part of the document where they occur
That is, these functions ignore other words in the calculation. For example, you can get different results for these queries:
postgres=# select ts_rank(to_tsvector('spanish', 'famoso sofas camas'),to_tsquery('spanish','famoso & cama'));
ts_rank
-----------
0.0985009
(1 row)
postgres=# select ts_rank(to_tsvector('spanish', 'famoso camas'),to_tsquery('spanish','famoso & cama'));
ts_rank
-----------
0.0991032
(1 row)
postgres=# select ts_rank(to_tsvector('spanish', 'sofas camas camas'),to_tsquery('spanish','cama'));
ts_rank
-----------
0.0759909
(1 row)
The documentation also says:
Different applications might require additional information for ranking, e.g., document modification time. The built-in ranking functions are only examples. You can write your own ranking functions and/or combine their results with additional factors to fit your specific needs.
You can get the PostgreSQL source code from GitHub. The function you need is ts_rank_tt.
You can also change the normalization options to take the document length into account, which is ignored by default.
For example, if you add 1 as the third parameter, the rank is divided by 1 + the logarithm of the document length. With your example:
postgres=# select ts_rank(to_tsvector('spanish', 'camas'),to_tsquery('spanish','camas'), 1);
ts_rank
-----------
0.0607927
(1 row)
postgres=# select ts_rank(to_tsvector('spanish', 'sofas camas'),to_tsquery('spanish','camas'), 1);
ts_rank
-----------
0.0383559
(1 row)
Documentation: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING

How to return gaps in numbers stored as char with leading zeros?

I have a table with a char(5) field for tracking Bin Numbers. The numbers are stored with leading zeros. The numbers go from 00200 through 90000. There are a lot of gaps in the numbers already in use and I need to be able to query them out so the user knows which numbers are available to use.
Assume you have a table of valid bin numbers.
Table: bins
bin_num
--
00200
00201
00202
...
90000
Assume your table is named "inventory". The bin numbers returned by this query are the ones that aren't in "inventory".
select bins.bin_num
from bins
left join inventory t2
on bins.bin_num = t2.bin_num
where t2.bin_num is null
order by bins.bin_num
If your version of SQL Server supports analytic functions (and, solely for convenience, common table expressions), you can find most of the gaps like this.
with bin_and_next_bin as (
select bin, lead(bin) over (order by bin) next_bin
from inventory
)
select bin
from bin_and_next_bin
where cast(bin as integer) <> cast(next_bin as integer) - 1
Analytic functions don't require a table of valid bin numbers, although you can make a really strong case that you ought to have such a table in the first place. If you're working in an environment where you don't have such a table, and you're not allowed to build such a table, a common table expression can save the day. (It doesn't show "missing" bin numbers before the first used bin number, though, as it's written here.)
One other disadvantage of this statement is that the WHERE clause isn't sargable; it can't use an index. Yet another is that it assumes bin numbers can be cast to integer. The table-based approach doesn't assume anything about the value or data type of the bin number; it works just as well with mixed alphanumerics as it does with integers or anything else.
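The lead()-based comparison (pair each bin with the next one in sorted order and flag pairs whose numeric values are not consecutive) can be sketched outside SQL; a minimal Python version, with names of my own choosing:

```python
def find_gap_starts(bins):
    """Return the bins whose numeric successor is missing, like the lead() query."""
    nums = sorted(bins, key=int)
    gaps = []
    for current, nxt in zip(nums, nums[1:]):
        if int(nxt) != int(current) + 1:
            gaps.append(current)  # a gap starts right after this bin
    return gaps

print(find_gap_starts(["00200", "00201", "00205", "00206"]))  # -> ['00201']
```

Like the SQL version, this reports the bin just before each gap and cannot see gaps before the first used bin number.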
I was able to get exactly what I needed by reading this article by Pinal Dave
I created a stored procedure that returned the gaps in the bin number sequence starting from the first bin number to the last. In my application I group the bin numbers by Shop (Vehicles would be 1000 through 2000, Buildings 2001 through 3000, etc).
ALTER PROCEDURE [dbo].[spSelectLOG_BinsAvailable]
    (@Shop varchar(9))
AS
BEGIN
    declare @start as varchar(5) = (SELECT b.Start FROM BinShopCodeBlocks b WHERE b.Shop = @Shop)
    declare @finish as varchar(5) = (SELECT b.Finish FROM BinShopCodeBlocks b WHERE b.Shop = @Shop)
    SET NOCOUNT ON;
    WITH CTE
    AS
    (
        SELECT
            CAST(@start AS int) AS Start,
            CAST(@finish AS int) AS Finish
        UNION ALL
        SELECT
            Start + 1,
            Finish
        FROM CTE
        WHERE Start < Finish
    )
    SELECT
        RIGHT('00000' + CAST(Start AS VARCHAR(5)), 5)
    FROM CTE
    WHERE NOT EXISTS
        (SELECT *
         FROM BinMaster b
         WHERE b.BinNumber = RIGHT('00000' + CAST(Start AS VARCHAR(5)), 5))
    OPTION (MAXRECURSION 0);
END

TSQL Select comma list to rows

How do I turn a comma-separated list stored in a single column into multiple rows, one value per row?
For example,
ID | Colour
------------
1 | 1,2,3,4,5
to:
ID | Colour
------------
1 | 1
1 | 2
1 | 3
1 | 4
1 | 5
The usual way to solve this is to create a split function. You can grab one from Google, for example this one from SQL Team. Once you have created the function, you can use it like:
create table colours (id int, colour varchar(255))
insert colours values (1,'1,2,3,4,5')
select colours.id
, split.data
from colours
cross apply
dbo.Split(colours.colour, ',') as split
This prints:
id data
1 1
1 2
1 3
1 4
1 5
Another possible workaround is to use XML (assuming you are working with SQL Server 2005 or greater):
DECLARE @s TABLE
(
    ID INT
    , COLOUR VARCHAR(MAX)
)
INSERT INTO @s
VALUES ( 1, '1,2,3,4,5' )
SELECT s.ID, T.Colour.value('.', 'int') AS Colour
FROM ( SELECT ID
       , CONVERT(XML, '<row>' + REPLACE(Colour, ',', '</row><row>') + '</row>') AS Colour
       FROM @s a
     ) s
CROSS APPLY s.Colour.nodes('row') AS T(Colour)
I know this is an older post but thought I'd add an update. Tally Table and cteTally table based splitters all have a major problem. They use concatenated delimiters and that kills their speed when the elements get wider and the strings get longer.
I've fixed that problem and wrote an article about it, which may be found at the following URL: http://www.sqlservercentral.com/articles/Tally+Table/72993/
The new method blows the doors off of all While Loop, Recursive CTE, and XML methods for VARCHAR(8000).
I'll also tell you that a fellow by the name of "Peter" made an improvement even to that code (in the discussion for the article). The article is still interesting and I'll be updating the attachments with Peter's enhancements in the next day or two. Between my major enhancement and the tweak Peter made, I don't believe you'll find a faster T-SQL-only solution for splitting VARCHAR(8000). I've also solved the problem for this breed of splitters for VARCHAR(MAX) and am in the process of writing an article for that, as well.
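The core idea behind a tally-table splitter is to enumerate character positions and cut the string at the delimiter positions, rather than repeatedly rebuilding the string with concatenation. A rough Python analogue of that position-based cutting (my own sketch, not the article's code):

```python
def tally_split(s: str, delim: str = ","):
    """Split by scanning delimiter positions, the way a tally-table splitter
    walks character positions instead of rebuilding the string each step."""
    # Each element starts at position 0 or just past a delimiter...
    starts = [0] + [i + 1 for i, c in enumerate(s) if c == delim]
    # ...and ends at the next delimiter, or at the end of the string.
    ends = [i for i, c in enumerate(s) if c == delim] + [len(s)]
    return [s[a:b] for a, b in zip(starts, ends)]

print(tally_split("1,2,3,4,5"))  # -> ['1', '2', '3', '4', '5']
```

In T-SQL the `starts`/`ends` positions come from joining the string against a numbers (tally) table, which is what makes the splitter set-based instead of procedural.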