Clickhouse: split output on select - apache-kafka

Performing a select on Clickhouse, on a MergeTree table that is loaded from a KafkaEngine table via a Materialized View, a simple select shows output split in groups in the clickhouse-client:
:) select * from customersVisitors;
SELECT * FROM customersVisitors
┌────────day─┬─────────createdAt───┬──────────────────_id─┬───────────mSId─┬───────xId──┬─yId─┐
│ 2018-08-17 │ 2018-08-17 11:42:04 │ 8761310857292948227 │ DV-1811114459 │ 846817 │ 0 │
│ 2018-08-17 │ 2018-08-17 11:42:04 │ 11444873433837702032 │ DV-2164132903 │ 780066 │ 0 │
└────────────┴─────────────────────┴──────────────────────┴────────────────┴────────────┴─────┘
┌────────day─┬─────────createdAt───┬──────────────────_id─┬───────────────────mSId──┬────────xId─┬─yId─┐
│ 2018-08-17 │ 2018-08-17 10:25:11 │ 14403835623731794748 │ DV-07680633204819271839 │ 307597 │ 0 │
└────────────┴─────────────────────┴──────────────────────┴─────────────────────────┴────────────┴─────┘
3 rows in set. Elapsed: 0.013 sec.
Engine is ENGINE = MergeTree(day, (mSId, xId, day), 8192)
Why does the output appear splitted in two groups?

If I'm not mistaken, the output is split when the data came from different blocks, also often it leads to being processed in different threads. If you want to get rid of it, wrap your query in outer select
select * from (...)

MergeTree Engine is designed for faster WRITE and READ operations.
Fater writes are achieved by inserting data in parts and then the data is merged offline into a single part for faster reads.
you can see the data partition the following directory :
ls /var/lib/clickhouse/data/database_name/table_name
If you run the following query, you will find this that the data is now available in a single group and also a new partition is available at the above location :
optimize table MY_TABLE_NAME
Optimize table forces merging of partition, but in usual cases, you can just leave it on Click house .

Related

Can't get diff output from NestedStack in AWS CDK

I have an issue when I'm working with NestedStack in AWS CDK, the problem is when I'm using the cdk diff command its returns the diff output return to me.
But when I have a diff inside my NestedStack it's just a reference and i really don't know what it will change inside my NestedStack.
[~] AWS::CloudFormation::Stack IAMPolicyStack.NestedStack/IAMPolicyStack.NestedStackResource IAMPolicyStackNestedStackIAMPolicyStackNestedStackResource4B98A1D2
├─ [~] NestedTemplate
│ └─ [~] .Resources:
│ └─ [~] .CDKMetadata:
│ └─ [~] .Properties:
│ └─ [~] .Analytics:
│ ├─ [-] v2:deflate64:H4sIAAAAAAAA/zPSMzLWM1BMLC/WTU7J1s3JTNKr9kstLklNCS5JTM7WcU7LC0otzi8tSk4FsZ3z81IySzLz82p1AipLMvLz9I31DA30TBSzijMzdYtK80oyc1P1giA0AJZoScZcAAAA
│ └─ [+] v2:deflate64:H4sIAAAAAAAA/zPSMzLWM1BMLC/WTU7J1s3JTNKr9kstLklNCS5JTM7WcU7LC0otzi8tSk4FsZ3z81IySzLz82p1AipLMvLz9I31LPUMjRSzijMzdYtK80oyc1P1giA0ALbtmvJcAAAA
Hope someone out there has hit the same issue as me and has a solution about how to get the diff output from NestedStack.
Updated 30/05/2022
The featuer look like its not ready yet, but its coming in the feature based my ticket here - https://github.com/aws/aws-cdk/issues/20392
This commit adds a workaround where you can run
cdk deploy --no-execute
and then view the nested stack changes in the CloudFormation change set that gets created.
Edit: Looks like this has not been merged yet as of 1.163.1, here is the open issue.

Powershell navigating to unknown directory

I have stumbled the unfortunate situation, having to be in a directory in which another directory is located:
C:\Test\[Folder with unknown name]\theFileINeed.txt
The structure mentioned above originates from a Zip-file from an external source. So i can not change the structure.
My goal is to navigate to the Directory with the unknown name, so it is my working directroy and I can execute further commands there. (Like Get-Childitem e.g.)
Is there a simple way to e.g. use the cd command to move into that directory?
I have fiddled around a bit with Resolve-Path but couldn't find a helpful solution.
Thanks in advance.
Consider this structure:
C:\TMP\TEST
├───unknowndir1
│ │ nonuniquefile.txt
│ │ uniquefile.txt
│ │
│ ├───nonuniquesubdir
│ └───uniquesubdir
└───unknowndir2
│ nonuniquefile.txt
│
└───nonuniquesubdir
You could do cd .\test\*\uniquesubdir but you can't cd .\test\*\nonuniquesubdir as you'll gen an error (...) path (...) resolved to multiple containers. The same error is even with cd .\test\*\uniquesubdir\.. as if it didn't even check for existence of uniquesubdir.
So if you want to enter unknown directory based of a file it contains, you'd have to do something like this: cd (Get-Item .\test\*\uniquefile.txt).DirectoryName.
It will fail if you use nonuniquefile.txt as it again resolves to two different directories. You could enter the first of these directories with cd (Get-Item .\test\*\nonuniquefile.txt).DirectoryName[0] if you don't care which of them you use.

Match all files that are inside a folder and ignore one folder inside

I have the following directory structure:
root
├─ files
│ ├─ folder1
│ │ ├─ file1.js
│ | └─ file2.js
│ ├─ folder2
│ │ └─ file3.js
│ ├─ file4.js
| └─ file5.js
└─ config.js
How can I match every file inside of file (and subdirectories) except the files that are in folder1, in this case file3.js, file4.js and file5.js?
I know I could exclude folder1 with the following: files/!(folder1)/*.js, but this only matches file3.js.
Try **/files/{*.js,!(folder1)*/*.js}. You can test using globster.xyz
There is probably a more elegant way to do this as I am not too familiar with glob, but I think this will get what you are asking for.
import glob
exclude_pattern = ['folder1']
file_list = glob.glob('./files/**/*', recursive=True)
for pattern in exclude_pattern:
exclude_patternmatch = list(filter(lambda x: pattern in x, file_list))
for item in exclude_patternmatch:
file_list.remove(item)
print(file_list)
output:
['./files/file6.js', './files/file5.js', './files/folder2/file3.js', './files/folder2/file4.js']

Prevent stemming of words starting with # in PostgreSQL full text search

Basically, I want to be able to get an exact match (hashtag included) for queries like this:
=#SELECT to_tsvector('english', '#adoption');
to_tsvector
-------------
'adopt':1
Instead, I want for words starting with #, to see:
=#SELECT to_tsvector('english', '#adoption');
to_tsvector
-------------
'#adoption':1
Is this possible with psql full text search?
Before you search or index, you could replace each # character with some other character that you don't use in your texts, but which changes the parser's interpretation:
test=> SELECT alias, lexemes FROM ts_debug('english', '#adoption');
┌───────────┬─────────┐
│ alias │ lexemes │
├───────────┼─────────┤
│ blank │ │
│ asciiword │ {adopt} │
└───────────┴─────────┘
(2 rows)
test=> SELECT alias, lexemes FROM ts_debug('english', '/adoption');
┌───────┬─────────────┐
│ alias │ lexemes │
├───────┼─────────────┤
│ file │ {/adoption} │
└───────┴─────────────┘
(1 row)

Postgres: `cache lookup failed for constraint 34055`

I'm have an OID that is generating a tuple that is evidently not valid.
This is the error I get when trying to delete a table in psql after some \set VERBOSITY verbose:
delete from my_table where my_column = 'some_value';
ERROR: XX000: cache lookup failed for constraint 34055
LOCATION: ri_LoadConstraintInfo, ri_triggers.c:2832
This is what I found elsewhere.
2827 : /*
2828 : * Fetch the pg_constraint row so we can fill in the entry.
2829 : */
2830 548 : tup = SearchSysCache1(CONSTROID, ObjectIdGetDatum(constraintOid));
2831 548 : if (!HeapTupleIsValid(tup)) /* should not happen */
2832 0 : elog(ERROR, "cache lookup failed for constraint %u", constraintOid);
2833 548 : conForm = (Form_pg_constraint) GETSTRUCT(tup);
2834 :
2835 548 : if (conForm->contype != CONSTRAINT_FOREIGN) /* should not happen */
2836 0 : elog(ERROR, "constraint %u is not a foreign key constraint",
I read this means the OID is being referenced in other places. Where are these other places and does anyone know how I to clean something like this up?
I really like the /* should not happen */ comment on line 2831.
I'd say that this means that you have catalog corruption.
Foreign key constraints are internally implemented as triggers. When that trigger fires, it tries to find the constraint that belongs to it. This seems to fail in your case, and that causes the error.
You can see for yourself:
SELECT tgtype, tgisinternal, tgconstraint
FROM pg_trigger
WHERE tgrelid = 'my_table'::regclass;
┌────────┬──────────────┬──────────────┐
│ tgtype │ tgisinternal │ tgconstraint │
├────────┼──────────────┼──────────────┤
│ 5 │ t │ 34055 │
│ 17 │ t │ 34055 │
└────────┴──────────────┴──────────────┘
(2 rows)
Now try to look up that constraint:
SELECT conname
FROM pg_constraint
WHERE oid = 34055;
┌─────────┐
│ conname │
├─────────┤
└─────────┘
(0 rows)
To recover from such a corruption, you should restore your latest good backup.
You can try to salvage your data by using pg_dumpall to dump the running PostgreSQL cluster, create a new cluster and restore the dump there. If you are lucky, you now have a good copy of your cluster and you can use that. If the dump or the restore fail because of data inconsistencies, you have to use more advanced methods.
As always in case of data corruption, it is best to first stop the cluster with
pg_ctl stop -m immediate
and make a physical backup of the data directory. That way you have a copy if your salvage operation further damages the data.