Can you desolder and replace RAM on Pi boards? - raspberry-pi

FriendlyElec has come out with a great, compact, but very powerful board called the NanoPi Fire3. It has 8 cores, which is great for programmers like myself who want to use them for clusters and multiprocessing AI machines. However, it only has 1 GB of RAM, so one of the processes I am trying to run can easily consume the entire board's memory. It is fairly easy and straightforward to attach a cheap, board-compatible memory stick to provide more swap space, so that the program can keep running without the board running out of memory entirely. However, swap space is slow, and since the NanoPi only offers USB 2.0, there is no really fast way to use it. If I am wrong about that, please correct me and explain how I could use swap space in a faster manner. Now, assuming I am correct, the only way to truly speed up the board would be to give it more physical RAM. On a Raspberry Pi this is impossible, or at least unwise unless you REALLY know what you are doing, because the Raspberry Pi stacks its RAM directly on top of the CPU. I am not going to post the schematics of the Raspberry Pi, but you are welcome to look them up and fact-check me. However, the NanoPi and other boards from FriendlyElec don't do this.
(Here is the layout of the NanoPi Fire3; image from http://wiki.friendlyarm.com/wiki/index.php/NanoPi_Fire3)
As you can see, the board has 1 GB of DDR3 RAM. It is easy and fairly inexpensive to buy a 16 GB DDR3 RAM chip, which would be a very viable solution for me and probably for many other people trying to work around their slow boards. Is it possible to achieve what I want to do?

Related

What Raspberry Pi would be the best to host these Discord bots?

Because the country is in lockdown and we are learning from home, we can't go to college to use the Raspberry Pis there. I can't spare the money to get one, and I also don't have anything I could use to build a project with it, so I asked if I could host a Discord bot that I'm working on for fun on the Pi. My professor told me to do an analysis of various Pis and various versions of the bot, to find out what I could get away with when choosing one to host it. So here is a hypothetical situation:
There are 3 bots: A, a fun bot; B, a moderation bot; C, a fun bot with a database.
Bot A: Has commands like !blackjack that use reactions and embeds to portray the game (cards are represented by their number values), plus various others. It can play music off YouTube using ytdl, has skip, stop, and other commands, has queues, and can also fetch images and jokes from various sites' APIs with axios.
Bot B: Basic moderation bot, no fun or music commands.
Bot C: Has the same commands as Bot A, except it also connects to MongoDB and stores user data there, so it also has an economy system.
My questions are:
What kind of Raspberry Pi would I need to host each of these bots?
Could I get away with Raspberry Pi Zero for Bot B?
How many servers, and how many people, could each bot handle before it crashes?
I understand that it all depends on the dataflow and how many interactions it has to handle, but the more input I can get on this the better.
Note: All of these hypothetical bots are written using Node.js
Note: I write bots in Python, so these estimates may be a little off.
In general, a simple Discord connection does not use very many resources (e.g., moderation commands that are used occasionally). More servers do not by themselves require more processing power, but one can assume that a bot in more servers will see increased usage.
Making more requests via HTTP and receiving more requests over the gateway will increase resource consumption. Auto-deleting messages may increase resource usage more than expected.
As for bot B(a) (no message filter), you could probably get away with a Raspberry Pi Zero/Zero W for 10-20 servers. Bot B(b) (with a message filter) will require more RAM and CPU power; I would recommend a Raspberry Pi 2 for the word filter.
Writing games on Discord results in many requests for reactions and message edits, and possibly an AI. I am not sure how the economy works on bot C, but using MongoDB should not take much additional CPU power. Depending on the number of servers it is in, you may want a faster SD card and more RAM.
**For bots A and C, it really depends on how much they are used. A small bot (active use in 1-2 servers) would probably only need 1 GB of RAM. For a larger bot, I would recommend investing in 2+ GB of RAM, especially for bot C. If you are planning on making one of the "fun" bots public, I would recommend at least 4 GB of RAM.**
Alternative Option:
Most small (<10 servers) bots can be run on a decent computer (e.g., dual-core 2 GHz, 8 GB RAM) with no significant performance reduction.
TL;DR:
A Pi Zero will work for bot B(a); get more RAM and a better processor for bot B(b). For bots A/C I recommend 2 GB of RAM if private, 4 GB if public, and a faster processor, especially for bot C. Most Discord bots will not crash unless you are absolutely straining the hardware.
A Raspberry Pi 4 (8 GB) could probably run all three bots at once.
You could maybe get away with a Pi Zero W if you're doing a basic bot, but I would recommend a Pi 3 or a Pi 4 for more advanced bots. It can also depend on how much data you are storing. You can run the bot on the Pi itself or use repl.it in Chromium on the Pi.
You would need a high-capacity SD card for databases and unexpected growth.
If you're using a Pi 4 with more than 2 GB of RAM, you could handle around 75 servers with a very good network connection. With a Pi 3 you could maybe handle 40 servers, and with a Pi Zero W around 15. A lot of this depends on the CPU and the network connection, and it assumes each server has around 100 people.
TL;DR - Pi Zero W for basic bots, Pi 3 or 4 for more advanced bots.

Getting set-up for neural networks: external or internal GPU?

I'm not sure I should be asking this here, but this seems like the one place I know of where people have a lot of practical experience with neural networks.
I'm trying to set up a rig for neural nets. I've got a desktop tower, but I'd need to get a bigger power supply for the RTX 2080 GPU I'm considering and possibly more (noisy) fans.
But another option might be to attach an external GPU. An external solution addresses the above issues, might make it possible to add more GPUs if needed, and could make the GPU mobile (usable at home and at the office).
Does anyone know what kind of performance hit an external GPU would involve? The RTX 2080 has a memory bandwidth of 448 GB/s, compared with Thunderbolt 3's transfer rate of 5 GB/s or, if it's possible to connect an external GPU directly to the PCIe bus, maybe 32 GB/s. Then again, I don't know how much bandwidth is needed between the graphics card and the rest of the computer. I'm also not sure whether Linux works with external GPUs.
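To get a feel for those numbers, the bandwidth figures in the question translate into rough transfer times like this (a sketch that ignores latency and protocol overhead; the payload size in the usage note is an arbitrary example):

```c
#include <assert.h>

/* Rough time to move a payload across a link, ignoring latency and
 * protocol overhead: seconds = gigabytes / (gigabytes per second). */
static double transfer_seconds(double payload_gb, double bandwidth_gb_s) {
    return payload_gb / bandwidth_gb_s;
}
```

For a hypothetical 10 GB of training data: about 0.022 s at the card's 448 GB/s, 2 s at Thunderbolt 3's 5 GB/s, and about 0.31 s at 32 GB/s PCIe, which is why how often data crosses the link matters more than the raw numbers.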
Anyone have any relevant info?

How do I write a simple bare-metal program?

Hypothetical here: let's say you have a processor attached to some form of USB storage and a motor. How would I write a simple bare-metal program to tell the motor to move for 10 seconds? I want to learn how to program bare metal, and having a program to look at and analyze would be wonderful. (Any language would be great.)
1) You have to understand how the processor in question boots. There is the core processor itself, then the non-volatile storage. For example, a Raspberry Pi is a little unique in that there is something in logic (or an on-chip ROM?) that reads the SD card and boots up the GPU; the GPU then copies the ARM program to RAM and releases the ARM into that RAM. Most microcontrollers have on-board flash and RAM, and the flash is mapped into the address space the processor boots from, and/or there is a vendor-supplied bootloader that boots the processor and then calls your code.
2) You have to learn how to enable and initialize the peripherals you care to use: a timer, perhaps, if you want to count to 10 seconds.
3) Write the application.
Debugging is the trick. You can sometimes use a hardware debugger via JTAG, or a ROM monitor via GDB or some other program over an interface like a UART. One option that is almost always available is a blinking LED, or a UART to send text or numbers out so you can see what is going on. And of course there is the oscilloscope: you can wiggle GPIOs or do other things and watch them on the scope.
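As a sketch of the UART-as-debug-interface idea, here is polled transmit in C. The flag register, busy bit, and capture buffer are all placeholders modeled as plain variables so this runs on a host; on real hardware these would be memory-mapped and chip-specific, so check the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder "hardware": a flag register and a log standing in for the
 * UART's flag and data registers. Names and bit positions are made up. */
static uint32_t uart_flags;                /* fake UART flag register     */
static char     uart_log[64];              /* captures transmitted bytes  */
static int      uart_len;
#define TX_BUSY (1u << 5)                  /* hypothetical "FIFO full" bit */

/* Polled transmit: spin until the FIFO has room, then write the byte. */
static void uart_putc(char c) {
    while (uart_flags & TX_BUSY) { }       /* wait for room in the FIFO   */
    uart_log[uart_len++] = c;              /* real code writes the data register */
}

static void uart_puts(const char *s) {
    while (*s) uart_putc(*s++);
}
```

Once something like this works on real registers, every later step (clock rates, timers, motor signals) has a way to print evidence.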
Driving a motor is too generic; you need to know specifically what kind of motor it is and how to drive it. There is likely zero chance you are driving it directly from the microcontroller. You might have something external, such as a transistor H-bridge that isolates the microcontroller, or a specific motor driver chip/circuit that you talk to through discrete signals, I2C, SPI, or something else to tell it to drive the motor, plus maybe some analog circuitry to handle the high power (or maybe that chip is a hybrid). So you have to know all that, or at least the programming side of it: what interface and/or what signals have to change state, and in what way, to make the motor do something. It could be as simple as a PWM signal that you create, amplified between you and the motor. PWM may involve first learning how to work one of the timers, then either another peripheral or a subset of the timer, to make a PWM signal out of it. A scope is really helpful here too, or a logic analyzer; or if you have another microcontroller that can sample a GPIO faster than the signal being generated, you can turn it into a logic analyzer.
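To make the PWM step concrete, here is a hypothetical calculation of the two values such a timer typically needs: how many ticks make one full cycle, and how many of those the output stays high. The register names, the struct layout, and the clock rate are assumptions for illustration, not any particular chip's interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PWM peripheral: the timer counts at pwm_clk_hz, the
 * "period" register holds ticks per PWM cycle, and the "compare"
 * register holds how many of those ticks the output is high. */
typedef struct { uint32_t period; uint32_t compare; } pwm_regs;

static pwm_regs pwm_setup(uint32_t pwm_clk_hz, uint32_t freq_hz, uint32_t duty_pct) {
    pwm_regs r;
    r.period  = pwm_clk_hz / freq_hz;       /* ticks in one full cycle    */
    r.compare = r.period * duty_pct / 100;  /* ticks the output stays high */
    return r;
}
```

For example, a 1 MHz timer generating 20 kHz PWM at 25% duty needs a period of 50 ticks with a compare value of 12 (integer division truncates the exact 12.5, which is the kind of rounding a datasheet will warn you about).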
Start with finding a board. Blink an LED. Figure out how to run a timer, then accurately blink the LED. Figure out the clock rates you are really running at instead of guessing. Figure out how to configure the UART and send some characters out of it; now you have a debug interface and knowledge of your timer reference clock speeds, and you can try to count to 10 seconds. Then get into the signals needed for the motor. Expect to blow up a few boards, so buy some spares.
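The "count to 10 seconds" step might look like the following sketch. The counter is modeled as a plain variable so it runs on a host; TIMER_HZ and read_timer() stand in for the board's real timer frequency and memory-mapped counter register, both of which you would have to verify rather than guess.

```c
#include <assert.h>
#include <stdint.h>

#define TIMER_HZ 1000000u  /* assumed 1 MHz timer clock; yours will differ */

static uint32_t fake_counter;  /* stands in for the memory-mapped counter */
static uint32_t read_timer(void) { return fake_counter++; }

/* Busy-wait for the given number of seconds against a free-running
 * counter. The unsigned subtraction makes the comparison correct even
 * when the counter wraps around. */
static void delay_seconds(uint32_t seconds) {
    uint32_t start = read_timer();
    uint32_t ticks = seconds * TIMER_HZ;
    while ((uint32_t)(read_timer() - start) < ticks) {
        /* spin; on real hardware this is where the motor stays enabled */
    }
}
```

Once this delay is trustworthy, "run the motor for 10 seconds" reduces to: assert the motor-enable signal, delay_seconds(10), deassert it.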

How do interrupts work in multi-core system?

I want to write code for the button interrupts on a Raspberry Pi 2. This board uses a quad-core Broadcom BCM2836 CPU (ARM architecture); that means there is only one CPU chip on the board. But I don't know how interrupts work in a multi-core system. I wonder whether the interrupt line is connected to each core or to one CPU. I found the paragraph below via Google:
Interrupts on multi-core systems
On a multi-core system, each interrupt is directed to one (and only one) CPU, although it doesn't matter which. How this happens is under control of the programmable interrupt controller chip(s) on the board. When you initialize the PICs in your system's startup, you can program them to deliver the interrupts to whichever CPU you want to; on some PICs you can even get the interrupt to rotate between the CPUs each time it goes off.
Does this mean that an interrupt can be delivered to any core? I can't understand the information above exactly. If interrupts can fire on every core, I must account for critical sections around data shared between the button interrupt service routines. If each interrupt is only ever delivered to one CPU, I wouldn't have to. Which is correct? To sum up: how do interrupts work in a multi-core system? Is the interrupt line connected to each core or to just one CPU, and do I have to account for critical sections for the same interrupt?
Your quote from Google looks quite generic, perhaps even leaning toward the x86 side, but it doesn't really matter if that were the case.
I would certainly hope that you can control interrupts per CPU, so that one type goes to one core and another type to another core, and likewise that there is a choice to have all of them interrupted if you want that.
Interrupts are orthogonal to shared resources: you have to handle shared resources whether you are in an ISR or not, so the interrupt itself doesn't change that. Having the ability to steer interrupts from one peripheral to one CPU can make the sharing easier, though, in that you could have one CPU own a resource and have the other CPUs make requests to the owning CPU, for example.
Dual, quad, etc. cores don't matter: treat each core as a single CPU, which it is, and solve the interrupt problems as you would for a single CPU. Again, shared resources are shared resources, during interrupts or not. Solve the problem for one CPU, then deal with any sharing.
Being an ARM, each chip vendor's implementation can vary from another's, so there cannot be one universal answer. You have to read the ARM docs for the ARM core (and, if possible, for the specific version, as they can and do vary) as well as the chip vendor's docs for whatever they put around the ARM core. Being a Broadcom in this case, good luck with chip vendor docs; they are limited at best, especially for the Raspberry Pi 2. You might have to dig through the Linux sources. No matter what the architecture (ARM, x86, MIPS, etc.), you have to just read the documentation and do some experiments. Start off by treating each core as a standalone CPU, then deal with sharing of resources if required.
If I remember right, the default case is that just the first core runs kernel7.img off the SD card; the other three spin in a loop, each waiting for its own address to be written, at which point they jump to it and start doing something else. So you quite literally can start off with a single CPU and no sharing, and figure that out. If you choose not to have code on the other CPUs that touches the resource, you're done; if you do, then figure out how to share the resource.

OS memory isolation

I am trying to write a very thin hypervisor that would have the following restrictions:
runs only one operating system at a time (i.e., no OS concurrency, no hardware sharing, no way to switch to another OS)
it should only be able to isolate some portions of RAM (doing some memory translation behind the OS's back: say I have 6 GB of RAM; I want Linux/Windows not to use the first 100 MB, to see just 5.9 GB and use that without knowing what's behind it)
I searched the Internet but found close to nothing on this specific matter. I want to keep as little overhead as possible, and the current hypervisor implementations don't fit my needs.
What you are looking for already exists, in hardware!
It's called an IOMMU [1]. Basically, like page tables, it adds a translation layer between the executed instructions and the actual physical hardware.
AMD calls it IOMMU [2]; Intel calls it VT-d (search for "intel vt-d"; I cannot post more than two links yet).
[1] http://en.wikipedia.org/wiki/IOMMU
[2] http://developer.amd.com/documentation/articles/pages/892006101.aspx
Here are a few suggestions / hints, which are necessarily somewhat incomplete, as developing a from-scratch hypervisor is an involved task.
Make your hypervisor "multiboot-compliant" at first. This will enable it to reside as a typical entry in a bootloader configuration file, e.g., /boot/grub/menu.lst or /boot/grub/grub.cfg.
You want to set aside your 100 MB at the top of memory, e.g., from 5.9 GB up to 6 GB. Since you mentioned Windows, I'm assuming you're interested in the x86 architecture. The long history of x86 means that the first few megabytes are filled with all kinds of legacy device complexities; there is plenty of material on the web about the "hole" between 640 KB and 1 MB. Older ISA devices (many of which still survive in modern systems in "Super I/O" chips) are restricted to performing DMA to the first 16 MB of physical memory. If you try to get between Windows or Linux and its relationship with these first few MB of RAM, you will have a lot more complexity to wrestle with. Save that for later, once you've got something that boots.
As physical addresses approach 4GB (2^32, hence the physical memory limit on a basic 32-bit architecture), things get complex again, as many devices are memory-mapped into this region. For example (referencing the other answer), the IOMMU that Intel provides with its VT-d technology tends to have its configuration registers mapped to physical addresses beginning with 0xfedNNNNN.
This is doubly true for a system with multiple processors. I would suggest you start on a uniprocessor system, disable the other processors from within the BIOS, or at least manually configure your guest OS not to enable the other processors (e.g., for Linux, include 'nosmp' on the kernel command line, say in your /boot/grub/menu.lst).
Next, learn about the "e820" map. Again, there is plenty of material on the web, but perhaps the best place to start is to boot a Linux system and look near the top of the output of 'dmesg'. This is how the BIOS communicates to the OS which portions of the physical memory space are "reserved" for devices or other platform-specific BIOS/firmware uses (e.g., to emulate a PS/2 keyboard on a system with only USB ports).
One way for your hypervisor to "hide" its 100MB from the guest OS is to add an entry to the system's e820 map. A quick and dirty way to get things started is to use the Linux kernel command line option "mem=" or the Windows boot.ini / bcdedit flag "/maxmem".
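A minimal sketch of the e820 approach: shrink the highest RAM region by the amount you want to hide and append a reserved entry covering it, so the guest never touches that range. This assumes a simplified flat array of entries; the real BIOS structure also carries ACPI attribute bits, and a real hypervisor would intercept the INT 15h/E820 query or rewrite the multiboot memory map instead.

```c
#include <assert.h>
#include <stdint.h>

#define E820_RAM      1u
#define E820_RESERVED 2u

/* Simplified e820 entry (real entries also have extended attributes). */
struct e820_entry { uint64_t base, length; uint32_t type; };

/* Hide hide_bytes at the top of the highest RAM region by shrinking it
 * and appending a reserved entry. Returns the new entry count, or 0 if
 * the request doesn't fit. */
static int e820_hide_top(struct e820_entry *map, int n, int cap, uint64_t hide_bytes) {
    int top = -1;
    for (int i = 0; i < n; i++)
        if (map[i].type == E820_RAM && (top < 0 || map[i].base > map[top].base))
            top = i;
    if (top < 0 || map[top].length <= hide_bytes || n >= cap)
        return 0;
    map[top].length -= hide_bytes;                    /* guest sees less RAM */
    map[n].base   = map[top].base + map[top].length;  /* hidden region start */
    map[n].length = hide_bytes;
    map[n].type   = E820_RESERVED;                    /* OS will not use it  */
    return n + 1;
}
```

With a single 6 GB RAM entry and 100 MB hidden, the guest is left seeing 5.9 GB of usable RAM plus a reserved region on top, which matches the "mem="/"/maxmem" effect but survives a guest that ignores those flags.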
There are a lot more details and things you are likely to encounter (e.g., x86 processors begin in 16-bit mode when first powered-up), but if you do a little homework on the ones listed here, then hopefully you will be in a better position to ask follow-up questions.