Connect QEMU-KVM VMs using vhost-user-client and ovs-dpdk - overlay

My goal is to connect two QEMU-KVM VMs on an overlay network. Each VM is running on a separate physical host and must have a static IP on the network 10.0.0.0/24. To achieve this goal, I want to use an OVS bridge with DPDK. I want to use the vhost-user-client protocol to connect the OVS bridge with the VMs.
My physical setup is the following: two physical machines, each equipped with a Mellanox ConnectX-6 Dx NIC, connected back-to-back (no physical switch). What I want to achieve is this:
+------------------+            +------------------+
|      HOST_1      |            |      HOST_2      |
|                  |            |                  |
|  +------------+  |            |  +------------+  |
|  |    VM_1    |  |            |  |    VM_2    |  |
|  |            |  |            |  |            |  |
|  | +--------+ |  |            |  | +--------+ |  |
|  | | ens_2  | |  |            |  | | ens_2  | |  |
|  | |10.0.0.1| |  |            |  | |10.0.0.2| |  |
|  +-+---+----+-+  |            |  +-+---+----+-+  |
|        |         |            |        |         |
|  vhost-client-1  |            |  vhost-client-1  |
|        |         |            |        |         |
|  +-----+------+  |            |  +-----+------+  |
|  |   bridge   |  |            |  |   bridge   |  |
|  |    br0     |  |            |  |    br0     |  |
|  |192.168.57.1|  |            |  |192.168.57.2|  |
|  +-----+------+  |            |  +-----+------+  |
|        |         |            |        |         |
|    +---+---+     |            |    +---+---+     |
|    | dpdk0 |     |            |    | dpdk0 |     |
+----+---+---+-----+            +----+---+---+-----+
         |                               |
         +-------------------------------+
I successfully created the OVS bridge (here, br0) and the DPDK port (here, dpdk0). On each physical machine, I am able to ping the bridge on the other machine. Then, I created a vhost-user-client port and attached it to the bridge. On each guest, I assigned a static IP according to the above picture, and the ens2 interface is up.
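For reference, a setup like this is typically created with commands along the following lines (a sketch, not my exact history; the names br0, dpdk0 and vhost-client-1, the PCI address, and the bridge IPs are the ones from my setup, and the socket-memory sizing is illustrative):
# One-time DPDK initialisation for OVS
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=1024
# Bridge with the userspace (netdev) datapath
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
# Physical NIC as a DPDK port
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:01:00.0
# vhost-user-client port; QEMU acts as the socket server
ovs-vsctl add-port br0 vhost-client-1 -- set Interface vhost-client-1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhost-client-1
# Underlay address on the bridge (192.168.57.2 on HOST_2)
ip addr add 192.168.57.1/24 dev br0
ip link set br0 up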
However, at this point I am not able to ping VM2 from VM1 or vice versa. It seems that no traffic is exchanged through the vhost-client port at all; ping fails with the Destination Host Unreachable message.
Some useful information:
ovs-vsctl show
    Bridge br0
        datapath_type: netdev
        Port br0
            Interface br0
                type: internal
        Port dpdk0
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0"}
        Port vhost-client-1
            Interface vhost-client-1
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhost-client-1"}
    ovs_version: "2.16.1"
ovs-vsctl -- --columns=name,ofport list Interface
name                : br0
ofport              : 65534

name                : dpdk0
ofport              : 6

name                : vhost-client-1
ofport              : 2
ovs-ofctl dump-flows br0
cookie=0x0, duration=104.689s, table=0, n_packets=0, n_bytes=0, in_port="vhost-client-1" actions=output:dpdk0
cookie=0x0, duration=99.573s, table=0, n_packets=4, n_bytes=924, in_port=dpdk0 actions=output:"vhost-client-1"
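(For context, flows like the two above would be installed with ovs-ofctl add-flow; a minimal sketch of the equivalent commands:)
ovs-ofctl add-flow br0 "in_port=vhost-client-1,actions=output:dpdk0"
ovs-ofctl add-flow br0 "in_port=dpdk0,actions=output:vhost-client-1"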
ovs-ofctl show br0
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000b8cef64def2e
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 2(vhost-client-1): addr:00:00:00:00:00:00
     config:     0
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 6(dpdk0): addr:b8:ce:f6:4d:ef:2e
     config:     0
     state:      0
     current:    AUTO_NEG
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br0): addr:b8:ce:f6:4d:ef:2e
     config:     0
     state:      0
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
Libvirt XML configuration (relevant parts):
<domain type='kvm'>
  <name>ubuntu-server</name>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <interface type='vhostuser'>
      <mac address='52:54:00:16:a5:76'/>
      <source type='unix' path='/tmp/vhost-client-1' mode='server'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
  </devices>
</domain>
Which configuration option am I missing? I have followed several guides, but I am still unable to route any traffic between my VMs.
I suspect that the problem is related to the LINK_DOWN status of the vhost-client-1 port, as reported by the ovs-ofctl show command. I tried to set that status to UP with the command ovs-ofctl mod-port br0 vhost-client-1 up. Even though the command did not fail, nothing changed.
Any thoughts?

Eventually, I managed to solve my problem. Vipin's answer was useful, but did not solve the issue. The configuration option I was missing was the numa option within the cpu element.
I am posting the working configuration just in case it is useful to other people. The first part concerns memory backing (under the domain element):
<memory unit='KiB'>[VM memory size]</memory>
<currentMemory unit='KiB'>[VM memory size]</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
  <locked/>
  <source type='file'/>
  <access mode='shared'/>
  <allocation mode='immediate'/>
  <discard/>
</memoryBacking>
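(Note: this only works if 2 MiB hugepages are actually reserved and mounted on the host; a minimal sketch, with an illustrative page count that must cover the VM memory plus the OVS-DPDK pool:)
echo 2048 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages   # 2048 x 2 MiB = 4 GiB
mount -t hugetlbfs hugetlbfs /dev/hugepages                          # if not already mounted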
But we also needed the numa configuration, even though our machine has just one processor:
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>qemu64</model>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='lahf_lm'/>
  <feature policy='disable' name='svm'/>
  <numa>
    <cell id='0' cpus='0-1' memory='[VM memory size]' unit='KiB' memAccess='shared'/>
  </numa>
</cpu>
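After redefining and restarting the domain, checks along these lines confirm the fix (a sketch; the XML file path is hypothetical):
virsh define ubuntu-server.xml   # reload the updated definition
virsh start ubuntu-server
grep Huge /proc/meminfo          # hugepages should now show as in use
ovs-ofctl show br0               # vhost-client-1 should no longer report LINK_DOWN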

I suspect that the problem is related to the LINK_DOWN status of the vhost-client-1 port
[Answer] Yes, the cause of packets not reaching the VM is that OVS sees the interface as down. Even with a rule in place to forward traffic from dpdk0 to vhost-client-1, the packets will be dropped.
The most likely cause of the link staying down is that the QEMU configuration does not back the guest memory with hugepages. A vhost-user port created by OVS-DPDK resides in hugepage memory, and the guest needs shared access to that area.
A similar DPDK-QEMU Stack Overflow answer addresses the use of the virtio/vhost-user client (via the command line), so please adapt the settings shared in that link to the virsh XML template:
<memory unit='KiB'>[VM memory size]</memory>
<currentMemory unit='KiB'>[VM memory size]</currentMemory>
<memoryBacking>
  <hugepages/>
</memoryBacking>
[EDIT-1] Based on the information shared by @ellere ("Vipin's answer was useful, but did not solve the issue. The configuration option I was missing was the numa option within the cpu element"):
depending on which NUMA socket the memory comes from, please add the appropriate numa tag.
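To find out which NUMA node the hugepages and the NIC belong to before writing the numa cell, something like this can help (a sketch; requires numactl, and the PCI address is the one from the question):
numactl --hardware                                # list NUMA nodes and their memory
cat /sys/bus/pci/devices/0000:01:00.0/numa_node   # NUMA node of the ConnectX-6 port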

Related

Issue in postgresql HA mode switching of Master node

I am new to PostgreSQL configuration. I am trying to configure PostgreSQL in HA mode with the help of pgpool and an Elastic IP. The full setup is on AWS RHEL 8 servers.
pgpool version: 4.1.2
postgres version: 12
I followed the links below during the configuration:
https://www.pgpool.net/docs/pgpool-II-4.1.2/en/html/example-cluster.html#EXAMPLE-CLUSTER-STRUCTURE
https://www.pgpool.net/docs/42/en/html/example-aws.html
https://www.enterprisedb.com/docs/pgpool/latest/03_configuring_connection_pooling/
Currently the postgres and pgpool services are up on all 3 component nodes. But if I stop the master postgres service/server, the whole setup goes down and the standby node does not take the place of the master. Please find the status of the pool nodes when the master is down:
 node_id | hostname     | port | status | lb_weight | role    | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+--------------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | server1      | 5432 | down   | 0.333333  | standby | 0          | false             | 0                 |                   |                        | 2022-10-12 12:10:13
 1       | server2      | 5432 | up     | 0.333333  | standby | 0          | true              | 0                 |                   |                        | 2022-10-13 09:16:07
 2       | server3      | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 |                   |                        | 2022-10-13 09:16:07
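(For reference, a listing like the above is produced by pgpool's SHOW pool_nodes command; a sketch, assuming pgpool listens on its default port 9999 and the host placeholder is filled in:)
psql -h <pgpool-host> -p 9999 -U postgres -c 'SHOW pool_nodes;'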
Any help would be appreciated. Thanks in advance.

Unable to create new database

When I create a new MySQL db, SlashDB's test connection fails.
Here is how I log into mysql:
$ mysql -u 7stud -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 15
Server version: 5.5.5-10.4.13-MariaDB Homebrew
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| chat               |
| ectoing_repo       |
| ejabberd           |
| information_schema |
| mydb               |
| mysql              |
| performance_schema |
| test               |
+--------------------+
8 rows in set (0.00 sec)
mysql> use mydb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+----------------+
| Tables_in_mydb |
+----------------+
| cheetos        |
| greetings      |
| mody           |
| people         |
+----------------+
4 rows in set (0.00 sec)
mysql> select * from people;
+----+--------+------+
| id | name   | info |
+----+--------+------+
|  1 | 7stud  | abc  |
|  2 | Beth   | xxx  |
|  3 | Diane  | xyz  |
|  4 | Kathy  | xyz  |
|  5 | Kathy  | xyz  |
|  6 | Dave   | efg  |
|  7 | Tom    | zzz  |
|  8 | David  | abc  |
|  9 | Eloise | abc  |
| 10 | Jess   | xyz  |
| 11 | Jeffsy | 2.0  |
| 12 | XXX    | xxx  |
| 13 | XXX    | xxx  |
+----+--------+------+
13 rows in set (0.00 sec)
In the SlashDB form for creating a new database, here is the info I entered:
Hostname: 127.0.0.1
Port: 80
Database Login: 7stud
Database Password: **
Database Name: mydb
Then I hit the "Test Connection" button, whereupon I get a spinning wheel, which disappears after a few minutes, but no "Connection Successful" message. What am I doing wrong?
Now, I'm using port 3306:
mysql> SHOW GLOBAL VARIABLES LIKE 'PORT';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| port          | 3306  |
+---------------+-------+
1 row in set (0.00 sec)
but when slashdb tries to connect, I get the error:
Host localhost:3306 is not accessible
Your port is wrong in the database connection: you said that your MySQL is configured on port 3306, but you also posted your SlashDB config for the database with port 80. Please change that to 3306.
Also, check whether you need to enable remote access to MySQL. Even if your SlashDB is running on the same machine as the MySQL database, it uses TCP/IP to connect.
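A quick way to confirm that MySQL accepts TCP connections on 3306 with those credentials is a one-off query from the shell (a sketch using the login from the question):
mysql -h 127.0.0.1 -P 3306 -u 7stud -p mydb -e 'SELECT 1;'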

Privilege separation set gid program

To come to my point, I need to explain the context:
I have a daemon process that opens a POSIX mq for communication. Clients are in the same group as the daemon so that they can communicate with it. The clients also open POSIX mq's and subscribe to the daemon. For the daemon to be able to answer them, the client mq's must have the same group.
So far so good: I made the client setgid (chmod g+s client). On a Qt-based desktop (LXQT), the client starts and works as expected. On a GTK+-based desktop (LXDE on a Raspberry Pi), it fails to start, as GTK+ prevents setuid/setgid programs from using its library.
As a result, I extracted the mq_open() call into an external executable that is setgid (chmod g+s) and uses setegid() to switch to the saved set-group-ID.
The client creates a socketpair(), then fork() and execve() the mq-opener, which sends the fd back through the socketpair (AF_UNIX/SOCK_STREAM) to the client.
The requirements I need to fulfill (see the sketch after this list):
mq's must be readable by all members of the mqclients group
rights on the mq's must be 0660
avoid setgid (chmod g+s) on the client
keep the possible security impact of chmod g+s as small as possible
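A sketch of the permission setup this implies (names taken from the diagram below; on Linux, POSIX mq's are visible under /dev/mqueue):
chgrp mqclients mq-opener   # helper owned by the shared group
chmod g+s mq-opener         # setgid on the small helper instead of the whole client
ls -l /dev/mqueue           # the mq's should show mode 0660 and group mqclients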
Now my points/questions:
I would like to avoid handling SIGCHLD and calling kill()/wait() for the mq-opener in the client and daemon. I would just like to call recvmsg() on the socketpair and get an error if the mq-opener dies for whatever reason. Can I simply install no signal handler for SIGCHLD?
The fork() procedure and connection with the mq-opener is pretty big. Is there a simpler way to do this?
Can the mq-opener (which starts as an unprivileged user) do the double fork() and drop its parent/child connection to the parent? At what point does the parent/child relationship end?
Would it be better to create an mq-opener daemon that just handles the creation of the mq's?
To make it a little clearer, here is a diagram:
+-----------+                      +------------+
|Daemon     |                      |Client      |
+-----------+                      +------------+
|File       |                      |File        |
|User       |                      |User        |
|mqdaemon   |                      |pi          |
|           |                      |            |
|Group      |                      |Group       |
|mqdaemon   |                      |pi          |
|           |                      |            |
|Rights     |    +------------+    |Rights      |
|a-s        |    |mq-opener   |    |a-s         |
+-----------+    +------------+    +------------+
|Process    |    |File        |    |Process     |
|User       |    |User        |    |User        |
|mqdaemon   |    |mq-opener   |    |pi          |
|           |    |            |    |            |
|Group      |    |Group       |    |Group       |
|mqclients  |    |mqclients   |    |pi          |
+----+------+    |            |    +--+---------+
  ^  |           |Rights      |       |  ^
  |  |           |g+s         |       |  |
  |  |           |            |       |  |
  |  |           +------------+       |  |
  |  |   fork()  |Process     | fork()|  |
  |  +---------->|User        |<------+  |
  |              |(forked)    |          |
  |  send_fd()   |            | send_fd()|
  +--------------|Group       |----------+
                 |mqclients   |
                 +------------+

postgresql slow: primary key on 80M row table

I've a 64GB RAM machine, 20 CPUs, and this is the command to create a primary key over a table with 80M rows:
ALTER TABLE ONLY wikipedia_article ADD CONSTRAINT pagelinks_pkey PRIMARY KEY (language, title);
The problems I've got with this are:
only 1 CPU is used, at up to 100% (avg load 99%); the rest aren't used at all
the write speed during the primary key creation is quite low, 2.6 MB/s, and the read speed is missing completely from the report produced by pg_activity
This is the table structure:
Table "public.wikipedia_article"
Column | Type | Modifiers | Storage | Stats target | Description
--------------+------------------+-----------+----------+--------------+-------------
language | text | not null | extended | |
title | text | not null | extended | |
langcount | integer | | plain | |
othercount | integer | | plain | |
totalcount | integer | | plain | |
lat | double precision | | plain | |
lon | double precision | | plain | |
importance | double precision | | plain | |
osm_type | character(1) | | extended | |
osm_id | bigint | | plain | |
infobox_type | text | | extended | |
population | bigint | | plain | |
website | text | | extended | |
Has OIDs: no
The import happens automatically via pg_restore; this is the source:
http://www.nominatim.org/data/wikipedia_article.sql.bin
Any ideas on what I can try to make things faster? I changed the value of the conf variable "maintenance_work_mem" to half of the RAM size. I also tried to change some of the kernel settings, with no joy:
sysctl -w kernel.shmmax=17179869184
sysctl -w kernel.shmall=1048576
sysctl vm.overcommit_memory=1
sysctl vm.swappiness=10
The OS is running on a VM at Digital Ocean, on SSD drives; I was expecting this to work faster on such a VM configuration.
Server info: PostgreSQL 9.3.13 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4, 64-bit
Many thanks, VG
PostgreSQL uses one CPU per statement, so 100% utilization of a single CPU is expected. You can try setting maintenance_work_mem to some higher value; the default is too low. You can try '2GB', for example.
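(Since this is PostgreSQL 9.3, ALTER SYSTEM is not available yet; a sketch of raising the setting just for the session that builds the index follows, where the database name is hypothetical:)
psql -d nominatim -c "SET maintenance_work_mem = '2GB'; ALTER TABLE ONLY wikipedia_article ADD CONSTRAINT pagelinks_pkey PRIMARY KEY (language, title);"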

Unable to start HandlerSocket with mariadb

For some reason, I cannot get HandlerSocket to start listening when I start MariaDB (version 10.0.14). I am using CentOS 6.5.
my.cnf has the following settings:
handlersocket_port = 9998
handlersocket_port_wr = 9999
handlersocket_address = 127.0.0.1
Calling "SHOW GLOBAL VARIABLES LIKE 'handlersocket%'" from the mariaDb prompt shows:
+-------------------------------+-----------+
| Variable_name                 | Value     |
+-------------------------------+-----------+
| handlersocket_accept_balance  | 0         |
| handlersocket_address         | 127.0.0.1 |
| handlersocket_backlog         | 32768     |
| handlersocket_epoll           | 1         |
| handlersocket_plain_secret    |           |
| handlersocket_plain_secret_wr |           |
| handlersocket_port            | 9998      |
| handlersocket_port_wr         | 9999      |
| handlersocket_rcvbuf          | 0         |
| handlersocket_readsize        | 0         |
| handlersocket_sndbuf          | 0         |
| handlersocket_threads         | 16        |
| handlersocket_threads_wr      | 1         |
| handlersocket_timeout         | 300       |
| handlersocket_verbose         | 10        |
| handlersocket_wrlock_timeout  | 12        |
+-------------------------------+-----------+
I can start MariaDB successfully, but when I check to see which ports are actively listening, neither 9998 nor 9999 shows up. I've checked the mysqld.log file, but no errors seem to be occurring.
Answering my own question here:
SELinux needed to be set to permissive mode to get HandlerSocket started.
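On CentOS 6 that amounts to something like this (a sketch; the service name depends on how MariaDB was installed):
setenforce 0                                                            # permissive until reboot
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config   # persist across reboots
service mysql restart
netstat -ltnp | grep -E ':(9998|9999)'                                  # HandlerSocket should now be listening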