Clickhouse-copier DB::Exception: Requested cluster 'xxx' not found

I'm testing Clickhouse-copier for copying data from one cluster to another.
I set up a one-node, one-replica cluster called xxx.
SELECT *
FROM system.clusters
┌─cluster─┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┐
│ xxx     │         1 │            1 │           1 │ 127.0.0.1 │ 127.0.0.1    │ 9000 │        1 │ default │                  │
└─────────┴───────────┴──────────────┴─────────────┴───────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┘
I also created a database cluster_xxx on this cluster, along with two tables, local_data and dist_data.
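Such a database can be created on every node of the cluster with a single ON CLUSTER statement; a minimal sketch (the exact statement used is assumed):
-- assumed form; creates the database on each node of cluster 'xxx'
CREATE DATABASE IF NOT EXISTS cluster_xxx ON CLUSTER xxx;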
CREATE TABLE cluster_xxx.local_data ON CLUSTER xxx
(
    `countryName` String,
    `countryCode` String,
    `indicatorName` String,
    `indicatorCode` String
)
ENGINE = MergeTree()
ORDER BY countryName
SETTINGS index_granularity = 8192;
CREATE TABLE cluster_xxx.dist_data ON CLUSTER xxx
(
    `countryName` String,
    `countryCode` String,
    `indicatorName` String,
    `indicatorCode` String
)
ENGINE = Distributed(xxx, cluster_xxx, local_data);
Then I prepared two config files for Clickhouse-copier.
zookeeper.xml:
<yandex>
    <logger>
        <level>trace</level>
        <size>100M</size>
        <count>3</count>
    </logger>
    <zookeeper>
        <node>
            <host>localhost</host>
            <port>2181</port>
        </node>
    </zookeeper>
</yandex>
and schema.xml:
<yandex>
    <remote_servers>
        <source_cluster>
            <shard>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </source_cluster>
        <target_cluster>
            <shard>
                <replica>
                    <host>192.168.0.110</host>
                    <port>9000</port>
                </replica>
            </shard>
        </target_cluster>
    </remote_servers>
    <max_workers>1</max_workers>
    <tables>
        <table_events>
            <cluster_pull>xxx</cluster_pull>
            <database_pull>cluster_xxx</database_pull>
            <table_pull>dist_data</table_pull>
            <cluster_push>test_cluster</cluster_push>
            <database_push>cluster_test</database_push>
            <table_push>dist_data</table_push>
            <engine>ENGINE=MergeTree('/clickhouse/tables/test_cluster/cluster_test/dist_data', '{replica}')</engine>
            <sharding_key>rand()</sharding_key>
        </table_events>
    </tables>
</yandex>
which I put to ZooKeeper:
zookeeper-client create /clickhouse/description "$(cat schema.xml)"
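The upload can be verified with the stock zkCli get command:
zookeeper-client get /clickhouse/description
clickhouse-copier expects the task configuration at <task-path>/description, which matches the --task-path=/clickhouse used below.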
When I run clickhouse-copier --config-file=zookeeper.xml --task-path=/clickhouse, an exception is thrown:
2019.06.12 23:06:06.668703 [ 1 ] {} <Error> : virtual int
DB::ClusterCopierApp::main(const std::vector<std::basic_string<char> >&): Code: 170, e.displayText() =
DB::Exception: Requested cluster 'xxx' not found, Stack trace:
0. clickhouse-copier(StackTrace::StackTrace()+0x16) [0x6834a66]
1. clickhouse-copier(DB::Exception::Exception(std::string const&, int)+0x1f) [0x317311f]
2. clickhouse-copier(DB::Context::getCluster(std::string const&) const+0x7f) [0x5e6115f]
3. clickhouse-copier(DB::ClusterCopier::init()+0x1181) [0x3213b51]
4. clickhouse-copier(DB::ClusterCopierApp::mainImpl()+0x5dd) [0x320383d]
5. clickhouse-copier(DB::ClusterCopierApp::main(std::vector<std::string, std::allocator<std::string> > const&)+0x1a) [0x315619a]
6. clickhouse-copier(Poco::Util::Application::run()+0x26) [0x6a84ec6]
7. clickhouse-copier(Poco::Util::ServerApplication::run(int, char**)+0x136) [0x6a9f076]
8. clickhouse-copier(mainEntryClickHouseClusterCopier(int, char**)+0x9a) [0x32001aa]
9. clickhouse-copier(main+0x179) [0x314e609]
10. /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f345138a3d5]
11. clickhouse-copier() [0x316fd37]
What might be the reason that Clickhouse-copier doesn't see my cluster? Which step in the configuration process am I missing?
Additional information:
I run Clickhouse-copier on the source machine.
Source and destination machines are VMs running CentOS 7.
The cluster on the destination server is not set up because there was no need for it; the error concerns the source cluster.
The firewall is turned off.

It looks like the error is in schema.xml: the source_cluster and target_cluster tags under remote_servers should be named after the clusters themselves.
You need to replace source_cluster with xxx and target_cluster with test_cluster.
schema.xml:
<yandex>
    <remote_servers>
        <xxx> <!-- ← ← ← -->
            <shard>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </xxx>
        <test_cluster> <!-- ← ← ← -->
            <shard>
                <replica>
                    <host>192.168.0.110</host>
                    <port>9000</port>
                </replica>
            </shard>
        </test_cluster>
    </remote_servers>
    <max_workers>1</max_workers>
    <tables>
        <table_events>
            <cluster_pull>xxx</cluster_pull>
            <database_pull>cluster_xxx</database_pull>
            <table_pull>dist_data</table_pull>
            <cluster_push>test_cluster</cluster_push>
            <database_push>cluster_test</database_push>
            <table_push>dist_data</table_push>
            <engine>ENGINE=MergeTree('/clickhouse/tables/test_cluster/cluster_test/dist_data', '{replica}')</engine>
            <sharding_key>rand()</sharding_key>
        </table_events>
    </tables>
</yandex>
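After replacing the tags, re-upload the task description so that clickhouse-copier picks up the corrected config; with the same CLI as above that would be:
zookeeper-client set /clickhouse/description "$(cat schema.xml)"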

Related

ClickHouse cannot parse a JSON message's timestamp field from Kafka

I have a table in ClickHouse configured to read messages in JSON format from Kafka, but there is an error parsing the time field when I try to read the table:
SELECT *
FROM mydb.kafka
Error TCPHandler: Code: 27. DB::Exception: Cannot parse input: expected '"' before: '.753844305Z"}': (while reading the value of key created_at): while parsing Kafka message (topic: mytopic, partition: 0, offset: 0)': While executing SourceFromInputStream. (CANNOT_PARSE_INPUT_ASSERTION_FAILED)
The JSON message has this field: "created_at":"2021-10-17T14:33:19.753844305Z"
How I created the table:
CREATE TABLE IF NOT EXISTS mydb.kafka
(
    id bigint,
    name String,
    created_at DateTime
)
ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'localhost:9094',
    kafka_topic_list = 'mytopic',
    kafka_group_name = 'sample_group',
    kafka_format = 'JSONEachRow';
There are two types:
DateTime — 2021-10-17T14:33:19 (32 bits)
DateTime64(n) — 2021-10-17T14:33:19.753 (n=3) (64 bits)
But either way you have to enable date_time_input_format=best_effort, because the default input format for DateTime is 2021-10-17 14:33:19:
$ cat /etc/clickhouse-server/users.d/date_time_input_format.xml
<?xml version="1.0"?>
<yandex>
    <profiles>
        <default>
            <date_time_input_format>best_effort</date_time_input_format>
        </default>
    </profiles>
</yandex>
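With that setting in place, the fractional seconds can also be preserved by declaring the column as DateTime64; a minimal sketch of the same table (precision 9 is an assumption, chosen to match the nine fractional digits in the sample value):
CREATE TABLE IF NOT EXISTS mydb.kafka
(
    id bigint,
    name String,
    created_at DateTime64(9)  -- keeps the '.753844305' part of the incoming timestamp
)
ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'localhost:9094',
    kafka_topic_list = 'mytopic',
    kafka_group_name = 'sample_group',
    kafka_format = 'JSONEachRow';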

liquibase lock Could not acquire change log lock in postgresql docker Image

Liquibase lock in Postgres in Docker:
Caused by: liquibase.exception.LockException: Could not acquire change log lock. Currently locked by 85c1e0340e82 (172.18.0.12) since 6/18/20, 11:36 AM
at liquibase.lockservice.StandardLockService.waitForLock(StandardLockService.java:236)
at liquibase.Liquibase.update(Liquibase.java:184)
at liquibase.Liquibase.update(Liquibase.java:179)
at liquibase.integration.spring.SpringLiquibase.performUpdate(SpringLiquibase.java:366)
at liquibase.integration.spring.SpringLiquibase.afterPropertiesSet(SpringLiquibase.java:314)
at org.springframework.boot.autoconfigure.liquibase.DataSourceClosingSpringLiquibase.afterPropertiesSet(DataSourceClosingSpringLiquibase.java:46)
at io.github.jhipster.config.liquibase.AsyncSpringLiquibase.initDb(AsyncSpringLiquibase.java:118)
at io.github.jhipster.config.liquibase.AsyncSpringLiquibase.afterPropertiesSet(AsyncSpringLiquibase.java:103)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1855)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1792)
... 16 common frames omitted
After some research I found a solution.
Find the details of the Docker container:
%> docker ps -a --filter "name=docker-compose"
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
b5b26f985       postgres:12.3       "docker-entrypoint.s…"   5 hours ago         Up 19 minutes       5432/tcp            docker-compose
Get into the container environment:
%> docker exec -it b5b26f985 bash
root@b5b26f985:/# ls
bin  boot  dev  docker-entrypoint-initdb.d  docker-entrypoint.sh  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
Now connect to Postgres:
root@b5b26f985e9b:/# psql -h localhost -U <username>
<username>=# select * from DATABASECHANGELOGLOCK;
 id | locked |       lockgranted       |          lockedby         
----+--------+-------------------------+----------------------------
  1 | t      | 2020-06-18 11:36:08.825 | 85c1e0340e82 (172.18.0.12)
(1 row)
The lock table's column types may differ from system to system and DB to DB, so it is good to check the data types first:
<username>=# \d DATABASECHANGELOGLOCK;
                    Table "public.databasechangeloglock"
   Column    |            Type             | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+---------
 id          | integer                     |           | not null |
 locked      | boolean                     |           | not null |
 lockgranted | timestamp without time zone |           |          |
 lockedby    | character varying(255)      |           |          |
Indexes:
    "databasechangeloglock_pkey" PRIMARY KEY, btree (id)
Release the lock with an update:
=# update DATABASECHANGELOGLOCK set LOCKED=false, LOCKGRANTED=null, LOCKEDBY=null where ID=1;
UPDATE 1
<username>=# SELECT * FROM DATABASECHANGELOGLOCK;
 id | locked | lockgranted | lockedby
----+--------+-------------+----------
  1 | f      |             |
(1 row)
Try now; this should work. Happy coding.
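As an alternative to editing the lock table by hand, the Liquibase CLI ships a command for exactly this purpose; assuming it can run with the same connection settings as the application:
liquibase releaseLocks
(In Liquibase 4.x the command is spelled release-locks.)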

SQL Server 2016 XML Shredding

I have been trying to figure this out for a while without success; I have read about ten posts, some other examples, and the MS help, but it's not resonating. I need to shred some XML data with the following format:
<ncf_report xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://cp.com/rules/client">
<admin>
<quoteback name="abcd">ABCD A</quoteback>
<product_group>Abcd ABcd Abcd</product_group>
<pnc_account>123456</pnc_account>
<pnc_account_name>ABC</pnc_account_name>
<product_reference>123456789</product_reference>
<report_type>ABCDE</report_type>
<status>ABCDE</status>
<ownership>ABCD</ownership>
<report_code>1234</report_code>
<report_description>Abcde</report_description>
<purpose>ABCDEFGH</purpose>
<date_request_ordered>05/05/2020</date_request_ordered>
<date_request_received>05/05/2020</date_request_received>
<date_request_completed>05/05/2020</date_request_completed>
<time_report_processed>1028</time_report_processed>
<multiple_scores_ordered>false</multiple_scores_ordered>
<vendor name="Abcd" address="Abcd" />
<report>
<sequence>0000000001</sequence>
<count>0000000001</count>
</report>
</admin>
<report>
<alerts_scoring>
<scoring>
<score status="Abcd">
<model_label>ABCD</model_label>
<score>123</score>
<rating_state>AB</rating_state>
<classification> ABCD </classification>
<reason_codes>
<code>12</code>
<description>ABCD</description>
</reason_codes>
<reason_codes>
<code>12</code>
<description>ABCD</description>
</reason_codes>
<reason_codes>
<code>12</code>
<description>ABCD ABCD ABCD</description>
</reason_codes>
<reason_codes>
<code>12</code>
<description>ABCD ABCD ABCD</description>
</reason_codes>
</score>
</scoring>
<general>ABCD ABCD ABCD ORIGINAL REPORT DATE: 12/12/2000</general>
<general>ABCD ABCD ABCD</general>
<general> ABCD ABCD ABCD</general>
<general narrativeCode="Abcd Abcd">ABCD ABCD ABCD</general>
<general narrativeCode=" Abcd Abcd">ABCD ABCD ABCD</general>
<general narrativeCode=" Abcd Abcd">ABCD ABCD ABCD</general>
</alerts_scoring>
<vendor_dataset>
<subjects>
<subject type="Abcd" relationship_to_data="Abcd">
<name type="Abcd">
<first>XXXX</first>
<middle>X</middle>
<last>XXXX</last>
</name>
<birth_date>01/01/1900</birth_date>
<ssn>999999999</ssn>
<address type="Abcd" ref="1" />
<address type="Abcd" ref="2" />
<address type="Abcd" ref="3" />
</subject>
</subjects>
<addresses>
<address id="1">
<street1>ABCD</street1>
<city>ABCD</city>
<state>AB</state>
<postalcode>12345</postalcode>
<zip4>1234</zip4>
<date_first_at_address>01/02/1900</date_first_at_address>
<date_last_at_address>01/02/1900</date_last_at_address>
</address>
<address id="2">
<house>123</house>
<street1>ABCDE</street1>
<city>ABCDE</city>
<state>AB</state>
<postalcode>12345</postalcode>
<zip4>1234</zip4>
<date_first_at_address>00/00/1900</date_first_at_address>
<date_last_at_address>00/00/1900</date_last_at_address>
</address>
<address id="3">
<street1>ABCDE</street1>
<city>ABCDE</city>
<state>AB</state>
<postalcode>12345</postalcode>
<zip4>1234</zip4>
<date_first_at_address>00/00/1900</date_first_at_address>
<date_last_at_address>00/00/1900</date_last_at_address>
</address>
</addresses>
</vendor_dataset>
<summary>
<date_oldest_trade>00/00/1900</date_oldest_trade>
<date_latest_trade>00/00/1900</date_latest_trade>
<date_latest_activity>00/00/1900</date_latest_activity>
<includes_bankruptcies flag="true" date="02/02/2009" />
<includes_other_records public_records="false" collection="true" consumer_statement="false" />
<credit_range high="123456" low="1234" number_trade_lines="12" />
<account_status_counters>
<account type="current" description="Pays Account as Agreed" status="1">12</account>
<account type="current" description="Status Not Known" status=" ">7</account>
<account type="former" description="Pays/Paid 30-60 Days or Max 2 Payments Past Due" status="2">5</account>
<account type="former" description="Pays/Paid 60-90 Days or Max 3 Payments Past Due" status="3">4</account>
<account type="former" description="Bad Debt" status="9">6</account>
</account_status_counters>
I am currently going down the path of trying to use the XML procedure, but I could not get to the finish line with OPENXML either. I am trying to extract the data in the account_status_counters block at the bottom of the XML:
EXEC sp_xml_preparedocument @hdoc OUTPUT, @CreditScoreXML
SELECT * FROM OPENXML(@hdoc, '/<ncf_report xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://cp.com/rules/client">/admin/summary/account_status_counters')
WITH
(
[Ref_Number] VARCHAR(10) 'product_reference',
[current_account_type] VARCHAR(10) './account/@type',
[current_account_type_description] VARCHAR(50) './account/@description',
[current_account_type_description] VARCHAR(1) './account/@status'
You can define the namespace for your XML with the WITH XMLNAMESPACES statement; then you can extract the values you need with .value().
I don't understand exactly what information you are trying to extract, but this should put you on the right track (I only put the first row of your XML to save space; you should put the entire XML fragment in the @xml variable):
declare @xml xml
set @xml='
<ncf_report xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://cp.com/rules/client">
...
'
;WITH XMLNAMESPACES ('http://cp.com/rules/client' as ns1)
select
 @xml.value('(ns1:ncf_report/ns1:admin/ns1:product_reference)[1]', 'varchar(10)') as Ref_Number
,@xml.value('(ns1:ncf_report/ns1:report/ns1:summary/ns1:account_status_counters/ns1:account[@type="current" and @status="1"]/@description)[1]', 'varchar(50)') as CurrentDescription
,@xml.value('(ns1:ncf_report/ns1:report/ns1:summary/ns1:account_status_counters/ns1:account[@type="current" and @status="1"])[1]', 'int') as CurrentStatus
,@xml.value('(ns1:ncf_report/ns1:report/ns1:summary/ns1:account_status_counters/ns1:account[@type="current" and @status=" "]/@description)[1]', 'varchar(50)') as CurrentDescription_2
,@xml.value('(ns1:ncf_report/ns1:report/ns1:summary/ns1:account_status_counters/ns1:account[@type="current" and @status=" "])[1]', 'int') as CurrentStatus_2
Given the sample XML above, this query would extract:
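Reading the values off the sample (the account element with status="1" holds 12 with description "Pays Account as Agreed"; the one with status=" " holds 7 with "Status Not Known"), the result should look roughly like this:
Ref_Number | CurrentDescription     | CurrentStatus | CurrentDescription_2 | CurrentStatus_2
-----------+------------------------+---------------+----------------------+----------------
123456789  | Pays Account as Agreed | 12            | Status Not Known     | 7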

Postgres-pgloader-transformation in columns

I am loading a flat file into a Postgres table and need to do a few transformations while reading and loading it, such as:
--> Check for characters; if present, default to some value (Reg_Exp can be used in Oracle). How can such functions be called in the syntax below?
--> TO_DATE-style conversion from a text format
--> Check for NULL and default to some value
--> Trim functions
--> Only a few columns from the source file should be loaded
--> Defaulting values: say the source file has only 3 columns but we need to load 4, so one column should be defaulted with some value
LOAD CSV
FROM 'filename'
INTO postgresql://role@host:port/database_name?tablename
TARGET COLUMNS
(
alphanm,alphnumnn,nmrc,dte
)
WITH truncate,
skip header = 0,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by '|',
batch rows = 100,
batch size = 1MB,
batch concurrency = 64
SET work_mem to '32 MB', maintenance_work_mem to '64 MB';
Kindly help me: how can this be accomplished using pgloader?
Thanks
Here's a self-contained test case for pgloader that reproduces your use-case, as best as I could understand it:
/*
Sorry pgloader version "3.3.2" compiled with SBCL 1.2.8-1.el7 Doing kind
of POC, to implement in real time work. Sample data from file:
raj|B|0.5|20170101|ABCD Need to load only first,second,third and fourth
column; Table has three column, third column should be defaulted with some
value. Table structure: A B C-numeric D-date E-(Need to add default value)
*/
LOAD CSV
FROM inline
(
alphanm,
alphnumnn,
nmrc,
dte [date format 'YYYYMMDD'],
other
)
INTO postgresql:///pgloader?so.raja
(
alphanm,
alphnumnn,
nmrc,
dte,
col text using "constant value"
)
WITH truncate,
fields optionally enclosed by '"',
fields escaped by double-quote,
fields terminated by '|'
SET work_mem to '12MB',
standard_conforming_strings to 'on'
BEFORE LOAD DO
$$ drop table if exists so.raja; $$,
$$ create table so.raja (
alphanm text,
alphnumnn text,
nmrc numeric,
dte date,
col text
);
$$;
raj|B|0.5|20170101|ABCD
Now here's the extract from running the pgloader command:
$ pgloader 41287414.load
2017-08-15T12:35:10.258000+02:00 LOG Main logs in '/private/tmp/pgloader/pgloader.log'
2017-08-15T12:35:10.261000+02:00 LOG Data errors in '/private/tmp/pgloader/'
2017-08-15T12:35:10.261000+02:00 LOG Parsing commands from file #P"/Users/dim/dev/temp/pgloader-issues/stackoverflow/41287414.load"
2017-08-15T12:35:10.422000+02:00 LOG report summary reset
table name read imported errors total time
----------------------- --------- --------- --------- --------------
fetch 0 0 0 0.007s
before load 2 2 0 0.016s
----------------------- --------- --------- --------- --------------
so.raja 1 1 0 0.019s
----------------------- --------- --------- --------- --------------
Files Processed 1 1 0 0.021s
COPY Threads Completion 2 2 0 0.038s
----------------------- --------- --------- --------- --------------
Total import time 1 1 0 0.426s
And here's the content of the target table when the command is done:
$ psql -q -d pgloader -c 'table so.raja'
alphanm │ alphnumnn │ nmrc │ dte │ col
═════════╪═══════════╪══════╪════════════╪════════════════
raj │ B │ 0.5 │ 2017-01-01 │ constant value
(1 row)
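The remaining requirements from the question (trimming and NULL-defaulting) map onto pgloader's per-field options in the source field list, in the same spot where dte [date format 'YYYYMMDD'] appears above; a sketch based on the field options in the pgloader reference (treat the exact option names as assumptions to verify against your pgloader version):
alphanm [null if blanks, trim both whitespace],   -- store NULL for blank input, strip surrounding spaces
Loading only some of the source columns is already covered by the test case: the "other" field is read but, because it is absent from the target column list, it is not loaded, while col text using "constant value" supplies a default for a column that does not exist in the file.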

Unable to set the ID of an instance

What is unique about my situation is that the IDs cannot be randomly assigned, so I set their values within the instances. I created several instances of an entity using the modeler. Below is the XML created:
<cf:entity name="Test4" namespace="Amikids.TimeTracking" categoryPath="/Amikids.TimeTracking">
    <cf:property name="Id" key="true" typeName="int" />
    <cf:property name="Name" />
    <cf:instance>
        <cf:instanceValue name="Id">10</cf:instanceValue>
        <cf:instanceValue name="Name">Test 1</cf:instanceValue>
    </cf:instance>
    <cf:instance>
        <cf:instanceValue name="Id">20</cf:instanceValue>
        <cf:instanceValue name="Name">Test 2</cf:instanceValue>
    </cf:instance>
    <cf:instance>
        <cf:instanceValue name="Id">30</cf:instanceValue>
        <cf:instanceValue name="Name">Test 3</cf:instanceValue>
    </cf:instance>
</cf:entity>
There are 2 things that are not working as expected:
The records inserted do not use the ID specified in the model/XML. Instead they were created incrementally starting at 1:
ID Name
1 Test 1
2 Test 2
3 Test 3
When I build the model a second time, duplicate records are inserted:
ID Name
1 Test 1
2 Test 2
3 Test 3
4 Test 1
5 Test 2
6 Test 3
Although specifying the ID in the instance does not appear to work, as a simple workaround I created the records using code, which allowed me to specify the ID. This was verified with the following code snippet:
Amikids.TimeTracking.Test4 test4 = new Amikids.TimeTracking.Test4();
test4.Id = 100;
test4.Name = "Test 100";
test4.Save();
test4 = new Amikids.TimeTracking.Test4();
test4.Id = 200;
test4.Name = "Test 200";
test4.Save();