cephfs seems to have a problem with linux 5.11 kernels

cephfs seems to have a problem with linux 5.11 kernels - ceph

Upgraded fedora33 lately and found my cephfs mounts won't work anymore. After hours of debugging and looking around, I realized a new kernel 5.11.X was installed. Before I had 5.10.X. Did reboot with 5.10 and everything was fine. To verify the kernel version is the problem I installed a recent ubuntu 21.04 with kernel 5.11.0: showed the same problem. Now I have fixed my kernel to boot to 5.10 and I can live with that, but there seems to be a serious problem with > 5.10 kernels.
I'm using octopus. Any ideas?
Adding ms_mode=legacy does not help.
When I try to mount I get lot's of kernel logs starting with:
Apr 26 09:22:15 ubuntu kernel: libceph: no match of type 2 in addrvec
Apr 26 09:22:15 ubuntu kernel: libceph: corrupt full osdmap (-2) epoch 64001 off 3154 (0000000073edcb82 of 00000000aaa67e88-00000000ea93de62)
Apr 26 09:22:15 ubuntu kernel: osdmap: 00000000: 08 07 72 20 00 00 09 01 9e 12 00 00 86 bb d6 c5 ..r ............
Apr 26 09:22:15 ubuntu kernel: osdmap: 00000010: ae 96 4c 78 8a 5e 50 62 3f 0a e5 24 01 fa 00 00 ..Lx.^Pb?..$....
Apr 26 09:22:15 ubuntu kernel: osdmap: 00000020: 54 f0 53 5d 3a fd ae 0e 3e ea 85 60 07 ab 94 2b T.S]:...>..`...+
Apr 26 09:22:15 ubuntu kernel: osdmap: 00000030: 06 00 00 00 02 00 00 00 00 00 00 00 1d 05 44 01 ..............D.
Apr 26 09:22:15 ubuntu kernel: osdmap: 00000040: 00 00 01 02 02 02 20 00 00 00 20 00 00 00 00 00 ...... ... .....
.....
Apr 26 09:22:15 ubuntu kernel: libceph: osdc handle_map corrupt msg
....
Magnus

I can confirm this. I'm booting my linux with 5.10.X and it works well but when I switch to the 5.11.X I have the corrupt message and cannot attach my rbd volumes.
Something is wrong with it. Can you open an issue to ceph and post here the issue, please?

Related

Control USART RTS pin from driver on embedded board

I'm porting the lirc_serial kernel module to work on our embedded board. We only need to implement the IR transmitter function.
For only the transmitter, the custom driver need only control the RTS pin on /dev/ttyS0, from within the module.
On standard hardware, the driver loads:
00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
NEW APPROACH
I have not figured out how to deactivate the current driver for the serial port, so rather than create a new driver, how would you use the current 8250-dw driver to change the RTS pin on behalf of my kernel module? Something like this?
Using line disciplines per this article on the slip driver looked promising, Just take slip.c and remove the network side of the code. But it needs a user space program (slattach or dip) to open /dev/ttyS0 and activate the line discipline.
Is that possible (or a good idea) from within a kernel module?
In this similar question, How do I open/write/read a uart device from a kernel module?, Ian Abbott suggested backporting serdev to kernel 4.9.
That's getting a bit involved and we're already behind schedule. Is there an easier way?
ORIGINAL QUESTION
However, the embedded board (based on the BayTrail Atom E3845) has the serial port controller in memory mapped I/O:
80860F0A:00: ttyS0 at MMIO 0x90a0c000 (irq = 39, base_baud = 2764800) is a 16550A
80860F0A:01: ttyS1 at MMIO 0x90a0e000 (irq = 40, base_baud = 2764800) is a 16550A
I'm new to driver development. I guess 0x90a0c000 is the physical address of the controller?
To probe the module, I first remapped 0x90a0c000 to a virtual address using ioremap_nocache and then tried to reserve the memory using request_mem_region. That failed.
ioVirtBase = ioremap_nocache(iommap, 8);
TQTRACE("ecp_serial_probe: devm_ioremap for MMIO 0x%X returned 0x%X\n", (uint32_t)iommap, (uint32_t)ioVirtBase);
if (ioVirtBase != NULL)
{
tqDumpBuffer(ioVirtBase, 8);
}
tqRes = request_mem_region((uint32_t)ioVirtBase, 8, ECP_DRIVER_NAME);
TQTRACE("ecp_serial_probe: request_mem_region for 0x%X returned 0x%X\n", (uint32_t)ioVirtBase, (uint32_t)tqRes);
if (!tqRes)
{
TQTRACE("ecp_serial_probe: Cannot request memory at 0x%X\n", (uint32_t)iommap);
return -ENXIO;
}
Is this the correct order of the functions?
Also, it seems request_mem_region fails because the device is under control of 80860F0A ?? There is no such entry in lsmod but there is an entry in /sys/devices.
Do I need to unload that driver to control the USART? How?
# ls -l /sys/devices/platform/80860F0A\:00
lrwxrwxrwx 1 root root 0 Jul 8 23:30 driver -> ../../../bus/platform/drivers/dw-apb-uart
-rw-r--r-- 1 root root 4096 Jul 9 17:02 driver_override
lrwxrwxrwx 1 root root 0 Jul 9 17:14 firmware_node -> ../../LNXSYSTM:00/LNXSYBUS:00/80860F0A:00
-r--r--r-- 1 root root 4096 Jul 9 17:02 modalias
drwxr-xr-x 2 root root 0 Jul 9 17:02 power
lrwxrwxrwx 1 root root 0 Jul 8 23:30 subsystem -> ../../../bus/platform
drwxr-xr-x 3 root root 0 Jul 8 23:30 tty
-rw-r--r-- 1 root root 4096 Jul 9 17:02 uevent
drwxr-xr-x 3 root root 0 Jul 8 23:30 VCOM0001:00
dmesg output below. Dumping the data at the remapped virtual address is not consistent. Sometimes all 0xFF, other times, 00 00 00 00 41 02 1C 48. I don't understand that either...
MARK Tue Jul 9 17:45:35 SGT 2019
ecp_serial: ecp_serial_exit_module()
Spectre V2 : System may be vulnerable to spectre v2
ecp_serial: loading module not compiled with retpoline compiler.
ecp_serial: ecp_serial_init_module()
ecp_serial: ecp_serial_init()
ecp_serial: ecp_serial_probe() iommap=0x90A0C000
ecp_serial: ecp_serial_probe: devm_ioremap for MMIO 0x90A0C000 returned 0xE3296000
ecp_serial: Dump address 0xE3296000:
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000 FF FF FF FF FF FF FF FF
ecp_serial: ecp_serial_probe: request_mem_region for 0xE3296000 returned 0x0
ecp_serial: ecp_serial_probe: Cannot request memory at 0x90A0C000
platform ecp_serial.0: lirc_dev: driver ecp_serial registered at minor = 0
MARK Tue Jul 9 17:46:08 SGT 2019
ecp_serial: ecp_serial_exit_module()
Spectre V2 : System may be vulnerable to spectre v2
ecp_serial: loading module not compiled with retpoline compiler.
ecp_serial: ecp_serial_init_module()
ecp_serial: ecp_serial_init()
ecp_serial: ecp_serial_probe() iommap=0x90A0C000
ecp_serial: ecp_serial_probe: devm_ioremap for MMIO 0x90A0C000 returned 0xE32A2000
ecp_serial: Dump address 0xE32A2000:
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0000 00 00 00 00 41 02 1C 48
ecp_serial: ecp_serial_probe: request_mem_region for 0xE32A2000 returned 0x0
ecp_serial: ecp_serial_probe: Cannot request memory at 0x90A0C000
platform ecp_serial.0: lirc_dev: driver ecp_serial registered at minor = 0
What proc/iomem has to say
90a0c000-90a0cfff : 80860F0A:00
90a0e000-90a0efff : 80860F0A:01
So indeed, the memory is under control of another driver... But how to unload it if it is not listed in lsmod ?
# rmmod 80860F0A:00
ERROR: Module 80860F0A:00 does not exist in /proc/modules
# rmmod 80860F0A
ERROR: Module 80860F0A does not exist in /proc/modules
OS INFO
# uname -a
Linux ecp 4.4.127-1.el6.elrepo.i686 #1 SMP Sun Apr 8 09:44:43 EDT 2018 i686 i686 i386 GNU/Linux
# cat /etc/centos-release
CentOS release 6.6 (Final)

Debugging an unnecessary newline character in a Kubernetes secret

I have an environment variable called GOOGLE_MAPS_DIRECTIONS_API_KEY, populated by a Kubernetes secret YAML:
apiVersion: v1
kind: Secret
metadata:
name: google-maps-directions-api-secret
type: Opaque
data:
GOOGLE_MAPS_DIRECTIONS_API_KEY: QUl...QbUpqTHNJ
The secret was created by copy-pasting the result of running echo -n "AIz..." | base64 on my API key. I've provided the beginning and the end of the key in this code snippet, to show that there is no newline in the key included in the secret file.
Here is what I see when I run cat google-maps-directions-api-key-secret.yaml | hexdump -C:
00000000 61 70 69 56 65 72 73 69 6f 6e 3a 20 76 31 0a 6b |apiVersion: v1.k|
00000010 69 6e 64 3a 20 53 65 63 72 65 74 0a 6d 65 74 61 |ind: Secret.meta|
00000020 64 61 74 61 3a 0a 20 20 6e 61 6d 65 3a 20 67 6f |data:. name: go|
00000030 6f 67 6c 65 2d 6d 61 70 73 2d 64 69 72 65 63 74 |ogle-maps-direct|
00000040 69 6f 6e 73 2d 61 70 69 2d 73 65 63 72 65 74 0a |ions-api-secret.|
00000050 74 79 70 65 3a 20 4f 70 61 71 75 65 0a 64 61 74 |type: Opaque.dat|
00000060 61 3a 0a 20 20 47 4f 4f 47 4c 45 5f 4d 41 50 53 |a:. GOOGLE_MAPS|
00000070 5f 44 49 52 45 43 54 49 4f 4e 53 5f 41 50 49 5f |_DIRECTIONS_API_|
00000080 4b 45 59 3a 20 51 55 6c 36 59 56 4e 35 51 7a 68 |KEY: QUl6YVN5Qzh|
...
000000b0 51 62 55 70 71 54 48 4e 4a |QbUpqTHNJ|
000000b9
But! When I step into a Node.JS interpreter inside of the pod, I see the following:
> process.env.GOOGLE_MAPS_DIRECTIONS_API_KEY
'AIz...jLsI\n'
There is an auxiliary newline character appended to the end of the string!
This is, frankly, extremely frustrating. I have several questions on this subject.
Can you spot my error? E.g. at what point in the secret propagation pipeline am I accidentally inserting that newline?
What Unix command should I use to print a newline character to console in such a way that it is interpreted literally (as a \n), so that I can actually see it?
Is it considered bad practice to inject code removing trailing newlines from environment variables into my container image? I know this is not technically correct, but this hurts like hell.

If you previously created the secret without the -n option to echo, verify the Secret persisted in the API (kubectl get secret/google-maps-directions-api-secret -o yaml) matches the secret in your yaml file, and also verify the consuming app has been redeployed since the secret was updated with the correct value

I don't see anything odd with how your secret looks. As you alluded to, the first thing I would do is exec into the pod, drop into bash, and echo out the environment variable to confirm it's propagated incorrectly. After doing a quick test, the newline should show up fine with a printf:
printf '%s' $GOOGLE_MAPS_DIRECTIONS_API_KEY
If it looks fine when printing it from bash, then the issue is with how node is interpreting it. If it looks messed up, then you need to take another look at how you're generating it.
FYI if the result of process.env is actually your API key, you should revoke it ASAP as you just published it in your question.
As for whether it's bad practice to strip newlines, yes. This can cause unexpected issues down the line if an actual piece of secret information contains a newline.

J1939 RTR Issue

I have and issue with rtr frames using candump and cansend.
Dumping the broadcasted data is no issue.
Architecture -
Raspberry pi with a pican shield reading data from a J1939 simulator.
I run candump to receive all messages on the bus. Then get an ack frame back from the simulator when I execute a cansend for pgn feec. Im requesting a preprogrammed VIN but I get nothing back. Here is what Im seeing from candump:
can0 18FEF500 [8] 7D FF FF 40 25 4B FF FF '}..#%K..'
can0 18FEE900 [8] D1 4B 03 00 D1 4B 03 00 '.K...K..'
can0 18FEF700 [8] FF FF FF FF E0 01 FF FF '........'
can0 18FECA00 [8] 03 FF 00 00 00 00 00 00 '........'
can0 00FEEC00 [0] remote request
can0 18E80000 [8] 01 FF FF FF FF EC FE 00 '........'
can0 0CF00300 [8] FF 7D 7D FF FF FF FF FF '.}}.....'
can0 18FE6C00 [8] FF FF FF FF FF FF 80 7D '.......}'
can0 0CF00400 [8] FF FF 7D 80 7D FF FF FF '..}.}...''
The E800 PGN is a standard ack message.
And message I am sending while candump is running:
cansend can0 00feec00#r
Basically, I'm not getting the PGN for VIN back. Any ideas?

Turns out there are a couple of issues here.
1- #r is not supported with J1939
2- you don't request pgns by asking for that pgn directly. the method is to send data to a specific pgn which handles requests. example below:
EA 00 is the PGN to send data to. Inside the data message lives the pgn we want to request (LSB) so PGN FEE5 is now E5FE. Three bytes are required which is why 00 is in the message below.
Here is the working request for Engine Hours:
cansend 18EA00FF#E5FE00
and the reponse:
21 00 00 00 8F 01 00 00

Mac OS - Deployd - mongod fails to start

I'm trying to use Deployd on my Mac. I've installed mongoDB and added it's bin folder to my $PATH - mongod runs perfectly with my user. The problem appears when I try to run Deployd, mongod fails to run.
I runned it with DEBUG=* dpdand the results I've got are:
starting deployd v0.8.0...
mongod starting mongod +0ms
mongod <Buffer 32 30 31 35 2d 30 33 2d 31 32 54 31 39 3a 34 30 3a 34 31 2e 30 36 30 2b 30 31 30 30 20 49 20 43 4f 4e 54 52 4f 4c 20 20 5b 69 6e 69 74 61 6e 64 6c 69 ... > +158ms
mongod <Buffer 32 30 31 35 2d 30 33 2d 31 32 54 31 39 3a 34 30 3a 34 31 2e 30 36 30 2b 30 31 30 30 20 49 20 43 4f 4e 54 52 4f 4c 20 20 5b 69 6e 69 74 61 6e 64 6c 69 ... > +2ms
server started with options {"port":2403,"db":{"host":"127.0.0.1","port":4660,"name":"-deployd"},"env":"development"} +44ms
socket.io:server initializing namespace / +0ms
socket.io:server creating engine.io instance with opts {"log level":0,"path":"/socket.io"} +1ms
socket.io:server attaching client serving req handler +1ms
mongod <Buffer 32 30 31 35 2d 30 33 2d 31 32 54 31 39 3a 34 30 3a 34 31 2e 31 30 36 2b 30 31 30 30 20 49 20 4e 45 54 57 4f 52 4b 20 20 5b 69 6e 69 74 61 6e 64 6c 69 ... > +5ms
internal-resources constructed +10ms
listening on port 2403
type help for a list of commands
dpd > mongod error: 1 +757ms
mongod killing mongod +0ms
Failed to start MongoDB (Make sure 'mongod' are in your $PATH or use dpd --mongod option. Ref: http://docs.deployd.com/docs/basics/cli.html)
The only way I've got deploy to run is with sudo dpd -d. I've changed /data/db's ownership from root to my user. I also changed the ownership of mongod and ./mongodb/bin.
Does someone knows what I'm missing?
Thanks in advance.

Try to pass the path to your mongod executable via the '-m' parameter
dpd -m /path/to/mongod/
as described here http://docs.deployd.com/docs/basics/cli.html

Have you ensured there are no extra mongo.lock and local files in your data folder.
I had the same issue and deleting these extra files solved the problem.
(I think these are generated when mongo is shut down ungracefully).

Why does my Perl CGI program return a server error?

I recently got into learning cgi and I set up an Ubuntu server in vbox. The first program I wrote was in Python using vim through ssh. Then I installed Eclipse on my Windows 7 station and created the exact same Perl file; just a simple hello world deal.
I tried running it, and I was getting a 500 on it, while the Python code in the same dir (/usr/lib/cgi-bin) was showing up fine. Frustrated, I checked and triple-checked the permissions and that it began with #!/usr/bin/perl. I also checked whether or not AddHandler was set to .pl. Everything was set fine, and on a whim I decided to write the same exact code within the server using vim like I did with the Python file.
Lo and behold, it worked. I compared them, thinking I'd gone mad, and they are exactly the same. So, what's the deal? Why is a file made in Windows 7 on Eclipse different than a file made in Ubuntu server with vim? Do they have different binary headers or something? This can really affect my development environment.
#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "Testing.";
Apache error log:
[Tue Aug 07 12:32:02 2012] [error] [client 192.168.1.8] (2)No such file or directory: exec of '/usr/lib/cgi-bin/test.pl' failed
[Tue Aug 07 12:32:02 2012] [error] [client 192.168.1.8] Premature end of script headers: test.pl
[Tue Aug 07 12:32:02 2012] [error] [client 192.168.1.8] File does not exist: /var/www/favicon.ico
This is the continuing error I get.

I think you have some spurious \r characters on the first line of your Perl script when you write it in Windows.
For example I created the following file on Windows:
#!/usr/bin/perl
code goes here
When viewed with hexdump it shows:
00000000 23 21 2f 75 73 72 2f 62 69 6e 2f 70 65 72 6c 0d |#!/usr/bin/perl.|
00000010 0a 0d 0a 63 6f 64 65 20 67 6f 65 73 20 68 65 72 |...code goes her|
00000020 65 0d 0a |e..|
00000023
Notice the 0d - \r that I've marked out in that. If I try and run this using ./test.pl I get:
zsh: ./test.pl: bad interpreter: /usr/bin/perl^M: no such file or directory
Whereas if I write the same code in Vim on a UNIX machine I get:
00000000 23 21 2f 75 73 72 2f 62 69 6e 2f 70 65 72 6c 0a |#!/usr/bin/perl.|
00000010 0a 63 6f 64 65 20 67 6f 65 73 20 68 65 72 65 0a |.code goes here.|
00000020
You can fix this in one of several ways:
You can probably make your editor save "UNIX line endings" or similar.
You can run dos2unix or similar on the file after saving it
You can use sed: sed -e 's/\r//g' or similar.
Your apache logs should be able to confirm this (If they don't crank up the logging a bit on your development server).

Sure, it can.
One environment might have a module installed that the other might not.
Perl might be installed in different locations in the two environment.
The environments might have different versions of Perl.
The environments might have different operating systems.
The permissions might be setup incorrectly in one of the environments.
etc
But instead of speculating wildly like this, why don't you check the error log for what error you actually got?

No, they are just text files. Of course, it's possible to write unportable programs, trivially by using system() or other similar services which depend on the environment.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

cephfs seems to have a problem with linux 5.11 kernels - ceph

I can confirm this. I'm booting my linux with 5.10.X and it works well but when I switch to the 5.11.X I have the corrupt message and cannot attach my rbd volumes. Something is wrong with it. Can you open an issue to ceph and post here the issue, please?

Related

Control USART RTS pin from driver on embedded board

Debugging an unnecessary newline character in a Kubernetes secret

J1939 RTR Issue

Mac OS - Deployd - mongod fails to start

Why does my Perl CGI program return a server error?

Categories

Resources