Prevent stemming of words starting with # in PostgreSQL full text search - postgresql

Basically, I want to be able to get an exact match (hashtag included) for queries like this:
=#SELECT to_tsvector('english', '#adoption');
to_tsvector
-------------
'adopt':1
Instead, for words starting with #, I want to see:
=#SELECT to_tsvector('english', '#adoption');
to_tsvector
-------------
'#adoption':1
Is this possible with PostgreSQL full text search?

Before you search or index, you could replace each # character with some other character that you don't use in your texts, but which changes the parser's interpretation:
test=> SELECT alias, lexemes FROM ts_debug('english', '#adoption');
┌───────────┬─────────┐
│   alias   │ lexemes │
├───────────┼─────────┤
│ blank     │         │
│ asciiword │ {adopt} │
└───────────┴─────────┘
(2 rows)
test=> SELECT alias, lexemes FROM ts_debug('english', '/adoption');
┌───────┬─────────────┐
│ alias │   lexemes   │
├───────┼─────────────┤
│ file  │ {/adoption} │
└───────┴─────────────┘
(1 row)
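As a minimal sketch of that approach (the table tweets and its column body are hypothetical, and '/' is assumed not to occur in your texts), you would apply the same replacement when indexing and when querying, so the lexemes stay consistent:
-- hypothetical expression index: '#' is swapped for '/' so hashtags are kept as "file" tokens
CREATE INDEX tweets_fts_idx ON tweets
    USING gin (to_tsvector('english', replace(body, '#', '/')));
-- apply the same replacement to the search term
SELECT *
FROM tweets
WHERE to_tsvector('english', replace(body, '#', '/'))
      @@ plainto_tsquery('english', replace('#adoption', '#', '/'));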

Match all files that are inside a folder and ignore one folder inside

I have the following directory structure:
root
├─ files
│  ├─ folder1
│  │  ├─ file1.js
│  │  └─ file2.js
│  ├─ folder2
│  │  └─ file3.js
│  ├─ file4.js
│  └─ file5.js
└─ config.js
How can I match every file inside of files (and its subdirectories) except the files that are in folder1, in this case file3.js, file4.js and file5.js?
I know I could exclude folder1 with the following: files/!(folder1)/*.js, but this only matches file3.js.
Try **/files/{*.js,!(folder1)*/*.js}. You can test using globster.xyz
There is probably a more elegant way to do this as I am not too familiar with glob, but I think this will get what you are asking for.
import glob

# folder names whose contents should be excluded from the result
exclude_patterns = ['folder1']

# collect every file and directory under ./files recursively
file_list = glob.glob('./files/**/*', recursive=True)

# drop every path that contains one of the excluded folder names
for pattern in exclude_patterns:
    matches = [path for path in file_list if pattern in path]
    for item in matches:
        file_list.remove(item)

print(file_list)
output (note that the directory ./files/folder2 itself also matches the pattern, and the exact order depends on the filesystem):
['./files/folder2', './files/folder2/file3.js', './files/file4.js', './files/file5.js']

Clickhouse: split output on select

When performing a SELECT on ClickHouse, on a MergeTree table that is loaded from a Kafka engine table via a materialized view, a simple select shows the output split into groups in clickhouse-client:
:) select * from customersVisitors;
SELECT * FROM customersVisitors
┌────────day─┬───────────createdAt─┬──────────────────_id─┬──────────mSId─┬────xId─┬─yId─┐
│ 2018-08-17 │ 2018-08-17 11:42:04 │  8761310857292948227 │ DV-1811114459 │ 846817 │   0 │
│ 2018-08-17 │ 2018-08-17 11:42:04 │ 11444873433837702032 │ DV-2164132903 │ 780066 │   0 │
└────────────┴─────────────────────┴──────────────────────┴───────────────┴────────┴─────┘
┌────────day─┬───────────createdAt─┬──────────────────_id─┬────────────────────mSId─┬────xId─┬─yId─┐
│ 2018-08-17 │ 2018-08-17 10:25:11 │ 14403835623731794748 │ DV-07680633204819271839 │ 307597 │   0 │
└────────────┴─────────────────────┴──────────────────────┴─────────────────────────┴────────┴─────┘
3 rows in set. Elapsed: 0.013 sec.
Engine is ENGINE = MergeTree(day, (mSId, xId, day), 8192)
Why does the output appear split into two groups?
If I'm not mistaken, the output is split when the data comes from different blocks, which also often means it was processed in different threads. If you want to get rid of it, wrap your query in an outer select:
select * from (...)
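For the table from the question, that would look something like the following; the idea is that the outer query collects the rows from all blocks before they are sent to the client, so clickhouse-client should print them as a single group:
SELECT * FROM (SELECT * FROM customersVisitors);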
The MergeTree engine is designed for fast WRITE and READ operations.
Fast writes are achieved by inserting data in parts, which are then merged in the background into larger parts for faster reads.
You can see the data parts in the following directory:
ls /var/lib/clickhouse/data/database_name/table_name
If you run the following statement, you will find that the data is now returned in a single group, and a new, merged part appears at the above location:
OPTIMIZE TABLE MY_TABLE_NAME
OPTIMIZE TABLE forces merging of parts, but in usual cases you can just leave that to ClickHouse.
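As a rough illustration (using the customersVisitors table from the question), you can watch the number of active parts drop after forcing the merge:
-- count the active data parts before the merge
SELECT count() FROM system.parts WHERE table = 'customersVisitors' AND active;
OPTIMIZE TABLE customersVisitors FINAL;
-- afterwards there should be fewer (typically one) active parts
SELECT count() FROM system.parts WHERE table = 'customersVisitors' AND active;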

how to use \timing in postgres

I want to know the time it takes to execute a query in Postgres. I have seen a lot of responses that propose using \timing, but I'm a newbie in Postgres and I don't know how to use it. Can anyone help?
Thank you in advance.
You can use \timing only with the command line client psql, since this is a psql command.
It is a switch that turns execution time reporting on and off:
test=> \timing
Timing is on.
test=> SELECT 42;
┌──────────┐
│ ?column? │
├──────────┤
│       42 │
└──────────┘
(1 row)
Time: 0.745 ms
test=> \timing
Timing is off.

Postgres: `cache lookup failed for constraint 34055`

I have an OID that is generating a tuple that is evidently not valid.
This is the error I get when trying to delete rows from a table in psql, after \set VERBOSITY verbose:
delete from my_table where my_column = 'some_value';
ERROR: XX000: cache lookup failed for constraint 34055
LOCATION: ri_LoadConstraintInfo, ri_triggers.c:2832
This is what I found elsewhere.
2827 : /*
2828 :  * Fetch the pg_constraint row so we can fill in the entry.
2829 :  */
2830 : tup = SearchSysCache1(CONSTROID, ObjectIdGetDatum(constraintOid));
2831 : if (!HeapTupleIsValid(tup)) /* should not happen */
2832 :     elog(ERROR, "cache lookup failed for constraint %u", constraintOid);
2833 : conForm = (Form_pg_constraint) GETSTRUCT(tup);
2834 :
2835 : if (conForm->contype != CONSTRAINT_FOREIGN) /* should not happen */
2836 :     elog(ERROR, "constraint %u is not a foreign key constraint",
I read that this means the OID is being referenced in other places. Where are these other places, and does anyone know how to clean something like this up?
I really like the /* should not happen */ comment on line 2831.
I'd say that this means that you have catalog corruption.
Foreign key constraints are internally implemented as triggers. When that trigger fires, it tries to find the constraint that belongs to it. This seems to fail in your case, and that causes the error.
You can see for yourself:
SELECT tgtype, tgisinternal, tgconstraint
FROM pg_trigger
WHERE tgrelid = 'my_table'::regclass;
┌────────┬──────────────┬──────────────┐
│ tgtype │ tgisinternal │ tgconstraint │
├────────┼──────────────┼──────────────┤
│      5 │ t            │        34055 │
│     17 │ t            │        34055 │
└────────┴──────────────┴──────────────┘
(2 rows)
Now try to look up that constraint:
SELECT conname
FROM pg_constraint
WHERE oid = 34055;
┌─────────┐
│ conname │
├─────────┤
└─────────┘
(0 rows)
To recover from such a corruption, you should restore your latest good backup.
You can try to salvage your data by using pg_dumpall to dump the running PostgreSQL cluster, create a new cluster and restore the dump there. If you are lucky, you now have a good copy of your cluster and you can use that. If the dump or the restore fail because of data inconsistencies, you have to use more advanced methods.
As always in case of data corruption, it is best to first stop the cluster with
pg_ctl stop -m immediate
and make a physical backup of the data directory. That way you have a copy if your salvage operation further damages the data.
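A rough sketch of the whole salvage procedure could look like this (the data directory paths and the port 5433 for the new cluster are placeholders; adjust them for your installation):
# stop the damaged cluster and take a physical copy of it first
pg_ctl stop -m immediate -D /var/lib/postgresql/data
cp -a /var/lib/postgresql/data /backup/data_copy
# start the damaged cluster again and create a fresh cluster on another port
pg_ctl start -D /var/lib/postgresql/data
initdb -D /var/lib/postgresql/data_new
pg_ctl start -D /var/lib/postgresql/data_new -o "-p 5433"
# dump everything from the old cluster and restore it into the new one
pg_dumpall -p 5432 | psql -p 5433 -d postgres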

Using box-drawing Unicode characters in batch files

I am making a batch file that uses these characters:
˧ ˥ ˪ ˫
It is not working; it just terminates itself.
I have seen people use characters like this:
Å
It isn't the character itself; it turns into another character. Can someone give me a list of these, showing the character typed (like the one above) and what it turns into?
If you want to write console batch files that use those characters, you need an editor that will save the batch file using the console's code page. To check what that is, type:
C:\>chcp
Active code page: 437
This is the result for my US Windows system. Western European versions of Windows will often be code page 850.
A good editor is Notepad++. Set that encoding in the editor (Encoding, Character sets, Western European, OEM-US) and copy the following characters into it:
@echo off
echo ╔═╦═╗ ┌─┬─┐ ╓─╥─╖ ╒═╤═╕
echo ║ ║ ║ │ │ │ ║ ║ ║ │ │ │
echo ╠═╬═╣ ├─┼─┤ ╟─╫─╢ ╞═╪═╡
echo ║ ║ ║ │ │ │ ║ ║ ║ │ │ │
echo ╚═╩═╝ └─┴─┘ ╙─╨─╜ ╘═╧═╛
Save the file as test.bat and run it from the console:
C:\>test
╔═╦═╗ ┌─┬─┐ ╓─╥─╖ ╒═╤═╕
║ ║ ║ │ │ │ ║ ║ ║ │ │ │
╠═╬═╣ ├─┼─┤ ╟─╫─╢ ╞═╪═╡
║ ║ ║ │ │ │ ║ ║ ║ │ │ │
╚═╩═╝ └─┴─┘ ╙─╨─╜ ╘═╧═╛
When you open the file again in Notepad++, it is possible that you will see something like:
@echo off
echo ÉÍËÍ» ÚÄÂÄ¿ ÖÄÒÄ· ÕÍÑ͸
echo º º º ³ ³ ³ º º º ³ ³ ³
echo ÌÍÎ͹ ÃÄÅÄ´ ÇÄ×Ķ ÆÍØ͵
echo º º º ³ ³ ³ º º º ³ ³ ³
echo ÈÍÊͼ ÀÄÁÄÙ ÓÄÐĽ ÔÍÏ;
Since there is no indication in the file what code page the characters in it represent, Notepad++ may choose the so-called ANSI code page. On US Windows that is Windows-1252. Just select the OEM-US encoding again to display it properly.
The characters you can use depend on the console code page that is set. You can see it with chcp; on many systems it is 850 or 437. You can then look up the characters in that code page, or find one that supports the characters you need and use chcp in your batch file to set it early on (a sketch of that follows below).
Note though that this is a setting for the console, so if someone needs to continue working in the window afterwards, it might not be nice to change their drapes to another colour. Also, code page 65001 is UTF-8, but it has a set of problems and drawbacks that make it rather tricky to use.
Note also that Notepad is not a useful text editor for writing batch files that need more than ASCII, because the legacy encoding in the non-console part of Windows is a different one. This might be what you mean by Å turning into another character.
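If you would rather not depend on whatever code page the console happens to use, here is a minimal sketch of that idea (the value 437 is only an example; use whichever code page your batch file was actually saved in):
@echo off
rem remember the current code page so it can be restored afterwards
for /f "tokens=2 delims=:" %%c in ('chcp') do set "OLDCP=%%c"
rem switch to the code page the batch file was saved in (437 as an example)
chcp 437 >nul
echo ┌─┬─┐
echo ├─┼─┤
echo └─┴─┘
rem restore the previous code page for whoever keeps using this console
chcp %OLDCP% >nul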
You just need this equivalence table, which works in any text editor, including the standard Windows Notepad:
Notepad: ┌┬┐ ├┼┤ └┴┘ ─ │
cmd.exe: ÚÂ¿ ÃÅ´ ÀÁÙ Ä ³
Notepad: ╔╦╗ ╠╬╣ ╚╩╝ ═ ║
cmd.exe: ÉË» Ìι Èʼ Í º
Copy this table into a text file that you save with Unicode encoding. Then, when you want to insert a character in your batch file, just pick the one below the graphic character you want to show.
Note: These characters are correct for code pages 850 or 437.
This table was copied from this answer.