Unable to disable the hardware prefetcher

I am trying to disable the hardware prefetcher to run some memory benchmarks on an Intel Core i5-2500. The problem is that there is no option whatsoever in my BIOS to enable or disable the prefetcher, so I am trying to use msr-tools instead. But wrmsr fails to write some specific values to the required register (0x1a0).
$ rdmsr -p 0 0x1a0
850089
$ wrmsr -p 0 0x1a0 0x850289
wrmsr: CPU 0 cannot set MSR 0x000001a0 to 0x0000000000850289
The same happens for every CPU. But if I try to write the value 0x850088 (chosen simply for testing), it writes successfully.
Can anyone point out where the problem is and how to solve it?
Also, I find it strange that there is no option to disable the prefetcher in my BIOS. Does that happen with some BIOS versions?
Thanks.

sudo modprobe msr
for i in $(seq 0 $(($(nproc) - 1))); do sudo rdmsr -p $i -f 3:0 0x1a0; done
L2 hardware prefetcher - bit 0
L2 adjacent cache line prefetcher - bit 1
DCU prefetcher - bit 2
DCU IP prefetcher - bit 3
[1]
I get 9 as the response. If a bit is set, that prefetcher is OFF.
So the L2 hardware prefetcher and the DCU IP prefetcher are off for me.
To switch them on:
$ sudo rdmsr -p 0 0x1a0
850089
# now we clear bits 0 and 3
$ sudo wrmsr -p 0 0x1a0 0x850080 # low nibble 0b1001 = 0x9 becomes 0b0000 = 0x0
# check it took
$ sudo rdmsr -p 0 0x1a0
850080
All prefetchers are now on.
To switch them off, it's similar:
$ sudo wrmsr -p 0 0x1a0 0x85008f
# this fails for me; I can't set bit 1, so the L2 adjacent cache line prefetcher stays on
$ sudo wrmsr -p 0 0x1a0 0x85008d
# the others can be switched off
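For convenience, here is a small read-modify-write sketch that applies the same idea to all CPUs. It assumes msr-tools is installed and that MSR 0x1a0 bits 0-3 control the prefetchers on your part as described above; as the question shows, the CPU may still reject some bit combinations.
#!/usr/bin/env bash
# Bits to force ON in MSR 0x1a0 (a set bit means that prefetcher is disabled):
#   bit 0: L2 HW prefetcher, bit 1: L2 adjacent line, bit 2: DCU, bit 3: DCU IP
MASK=0xF                                              # example: try to disable all four
sudo modprobe msr
for cpu in $(seq 0 $(($(nproc) - 1))); do
    cur=$(sudo rdmsr -p "$cpu" -c 0x1a0)              # current value, 0x-prefixed
    new=$(printf '0x%x' $(( cur | MASK )))            # touch only the chosen bits
    sudo wrmsr -p "$cpu" 0x1a0 "$new" || echo "CPU $cpu: write rejected"
done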
[1] https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors

If you have a Nehalem, Westmere, Sandy Bridge, Ivy Bridge, Haswell, or Broadwell Intel CPU, you can enable/disable the various prefetchers with bits 0:3 of MSR 0x1a4. See: https://software.intel.com/content/www/us/en/develop/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors.html
# E.g. showing initially all enabled:
rdmsr -a -c 0x1a4
> 0xf
> <etc>
# Disable all 4 prefetchers, by setting all 4 bits:
wrmsr -a 0x1a4 $(( 2**0 | 2**1 | 2**2 | 2**3))
rdmsr -a -c 0x1a4
> 0xf
> <etc>
This gives a performance improvement of about 5% for me, with an application that has a random-access memory intensive workload.
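As a rough sketch of how such a run could be wrapped, assuming the 0x1a4 layout above applies to your CPU (./my_benchmark is just a placeholder for your workload):
#!/usr/bin/env bash
sudo modprobe msr
orig=$(sudo rdmsr -p 0 -c 0x1a4)                          # assume all cores currently share this value
sudo wrmsr -a 0x1a4 $(printf '0x%x' $(( orig | 0xF )))    # set bits 0-3: all four prefetchers off
./my_benchmark                                            # placeholder: run your memory benchmark here
sudo wrmsr -a 0x1a4 "$orig"                               # restore the original setting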
For older Intel Core / NetBurst architectures, it seems to be a different MSR, 0x1a0, with the ability to enable/disable 2 different prefetchers via bits 9 and 19. See:
https://software.intel.com/content/www/us/en/develop/articles/optimizing-application-performance-on-intel-coret-microarchitecture-using-hardware-implemented-prefetchers.html
To disable/enable them you might do something like this (untested, because I don't have a Core CPU, so I get an error if I try to set these 2 bits):
MSRORIG=$(printf "0x%x\n" $(rdmsr -c 0x1a0))
# To set bits 9 and 19 and disable the prefetchers:
wrmsr -a 0x1a0 $((MSRORIG | 1<<19 | 1<<9))
# To unset and enable:
wrmsr -a 0x1a0 $((MSRORIG & ~(1<<19 | 1<<9)))

Related

xxd not doing anything on alpine linux

I'm trying to use xxd to follow a tutorial but it's not printing anything from the Alpine Linux container that I'm trying to run it in.
I am running: xxd -ps -c 1000 <valid-file-path>. When I do this, it just prints out the usage instructions:
~ # xxd -ps -c 1000 $FILE_PATH
BusyBox v1.31.1 () multi-call binary.
Usage: xxd [OPTIONS] [FILE]
Hex dump FILE (or stdin)
-g N Bytes per group
-c N Bytes per line
-p Show only hex bytes, assumes -c30
-l LENGTH Show only first LENGTH bytes
-s OFFSET Skip OFFSET bytes
I seem to be calling it correctly according to the printed usage instructions. What am I doing wrong?
Alpine comes with BusyBox, which provides stripped-down versions of the utilities that ship with, say, Ubuntu (GNU Coreutils). If you've heard people say "GNU/Linux", this is what they're referring to: many of the utilities you use on the command line were written by the Free Software Foundation's GNU project.
BusyBox's xxd doesn't have the -ps option because it was rewritten to be smaller; it prints the usage instructions because -ps is not valid. If you run this on macOS or a regular Linux distribution, you'll get a version of the original xxd instead.
As you've found, apk add xxd will install this "original" xxd.
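If you'd rather not install anything, a possible workaround is to produce the same plain hex dump with hexdump; this assumes your BusyBox build includes hexdump with the -e format option (untested in your container):
hexdump -v -e '1/1 "%02x"' "$FILE_PATH"; echo
This prints the file as one continuous hex string, much like xxd -ps, just without the -c 1000 line wrapping.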

tshark piping output results in packet counter

When I try to pipe tshark output to anything I cannot see the traffic any longer. Tshark just shows a packet counter. How can I prevent this?
sudo tshark -i enp60s0 -f "tcp" -T fields -e ip.src -e ip.dst -e tcp.srcport -e tcp.dstport -e tcp.checksum -e tcp.options -E header=y | column -t
Context
For context, this is the command and output you are talking about:
$ sudo tshark -i enp60s0 -f "tcp" -T fields -e ip.src -e ip.dst -e tcp.srcport \
-e tcp.dstport -e tcp.checksum -e tcp.options -E header=y | column -t
Capturing on 'Eth: enp60s0'
45
Tshark sends control information, like where it's capturing and the packet count, to stderr instead of stdout. If you don't want to see this control information, send stderr to /dev/null:
$ sudo tshark ... 2>/dev/null | column -t
Method
We can also continuously generate a new capture every second that we can then read with tshark (see tshark's manpage for full details). This is similar to @ChristopherMaynard's solution, but you don't need to wait for the capture to finish. Saving (-w) with -b duration:1 will write a new capture file every second:
#!/usr/bin/env bash
sudo tshark -w temp.pcap -b duration:1 2>/dev/null &
i=0
while true; do
    if [ "$i" != "$(ls -1A . | wc -l)" ]; then
        i="$(ls -1A . | wc -l)"          # remember the new file count
        newfile="$(ls -t | head -n1)"    # most recently written capture file
        sudo tshark -r "$newfile" 2>/dev/null | column -t
    fi
    sleep 0.1
done
Verification
Running this, we get output like the below. Note that we are reading new packet captures, so tshark restarts numbering at 1 for each new capture file it reads.
1 0.000000 192.168.2.1 → 192.168.2.242 DNS 134 Standard query response 0xfd75 No such name PTR 1.2.168.192.in-addr.arpa SOA localhost 6c:96:cf:d8:7f:e7 ← 78:8a:20:d9:f9:11
2 0.000412 192.168.2.242 → 192.168.2.1 DNS 87 Standard query 0x2a9b PTR 249.249.16.104.in-addr.arpa 78:8a:20:0d:05:e7 ← 6c:96:cf:d8:7f:e7
3 0.023726 192.168.2.1 → 192.168.2.242 DNS 149 Standard query response 0x2a9b No such name PTR 249.249.16.104.in-addr.arpa SOA cruz.ns.cloudflare.com 6c:96:cf:d8:7f:e7 ← 78:8a:20:d9:f9:11
4 0.024091 192.168.2.242 → 192.168.2.1 DNS 85 Standard query 0x2f71 PTR 40.2.168.192.in-addr.arpa 78:8a:20:0d:05:e7 ← 6c:96:cf:d8:7f:e7
1 1.026460 192.168.2.242 → 192.168.2.255 UDP 86 57621 → 57621 Len=44 ff:ff:ff:ff:ff:ff ← 6c:96:cf:d8:7f:e7
2 1.048071 192.168.2.1 → 192.168.2.242 DNS 135 Standard query response 0x2f71 No such name PTR 40.2.168.192.in-addr.arpa SOA localhost 6c:96:cf:d8:7f:e7 ← 78:8a:20:d9:f9:11
3 1.048555 192.168.2.242 → 192.168.2.1 DNS 87 Standard query 0xe77d PTR 25.206.252.198.in-addr.arpa 78:8a:20:0d:05:e7 ← 6c:96:cf:d8:7f:e7
4 1.125073 192.168.2.1 → 192.168.2.242 DNS 118 Standard query response 0xe77d PTR 25.206.252.198.in-addr.arpa PTR stackoverflow.com 6c:96:cf:d8:7f:e7 ← 78:8a:20:d9:f9:11
The column command needs to read all of its input in order to decide how wide to make each column, so you can't use column in this context. (You can test this by issuing your tshark command and then, elsewhere, issuing a killall tshark; you will then see all your output.)
Instead, I think you will have to redirect your output to a file, and once you've finished your tshark capture session you can cat file | column -t if you want. If you want to see the output on the screen as well as redirect it to a file for later processing, you can pipe it to tee and give tee the name of the file to write to, for example tshark [options] | tee file, but the output you see won't be as nicely formatted until you later do cat file | column -t.
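For example, a rough sketch of the tee approach, reusing the fields from the question (capture.tsv is a placeholder name):
sudo tshark -i enp60s0 -f "tcp" -T fields -e ip.src -e ip.dst -e tcp.srcport \
  -e tcp.dstport -e tcp.checksum -e tcp.options -E header=y 2>/dev/null | tee capture.tsv
# later, once the capture has been stopped:
column -t < capture.tsv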

Open File keeps growing despite emptied content

How can I pipe a text stream into a file and, while the file is still in use, wipe it for job rotation?
Long version:
I've been struggling for a while onto an apparently minor issue, that's making my experiments impossible to continue.
I have software collecting data continuously from external hardware (a radio telescope project) and storing it in CSV format. Since the installation is at a remote location, once a day I copy the saved data to a safe place and wipe the file content; for the same reason I can NOT stop the hardware/software, so tools such as log rotation aren't an option.
Despite much effort (see my previous post), the wiped file seems to keep growing although it is empty.
Bizarre behaviour: show the file size, truncate the file, show the file size again:
pi@tower /media/data $ ls -la radio.csv; ls -s radio.csv; truncate radio.csv -s 1; ls -la radio.csv; ls -s radio.csv
-rw-r--r-- 1 pi pi 994277 Jan 18 21:32 radio.csv
252 radio.csv
-rw-r--r-- 1 pi pi 1 Jan 18 21:32 radio.csv
0 radio.csv
Then, as soon as more data comes in:
pi@tower /media/data $ ls -la radio.csv; ls -s radio.csv
-rw-r--r-- 1 pi pi 1011130 Jan 18 21:32 radio.csv
24 radio.csv
I thought of piping the output into a sed command and saving right away, with no luck. Also, the filesystem/hardware doesn't seem buggy (I tried different hardware/distros/filesystems).
Would anyone be so nice to give me a hint how to proceed?
Thank you in advance.
I piped the output into tee with the -a option; the file was being kept open by the originating source. tee's APPEND option makes new data go to the current end of the file, so in this case, once the file is zeroed, writing resumes at the beginning instead of at the old offset.
For search engines and future reference:
sudo rtl_power -f 88M:108M:10k -g 1 - | tee -a radio.csv -
Now emptying the file with
echo -n > radio.csv
gets the file zeroed as expected.
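For the daily rotation itself, a minimal sketch (the backup path is a placeholder; this only works cleanly because the writer appends via tee -a, i.e. the file is open in append mode):
cp radio.csv "/media/backup/radio-$(date +%F).csv"   # copy today's data to a safe place
: > radio.csv                                        # then truncate the live file to zero bytes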

Bash: how to make a substitution in a "live" pipe?

In my office firewall I use a command like this:
$ sudo tcpdump -v -s 1500 -i eth0 port 25 | grep 'smtp: S'
to monitor LAN clients sending mail (I need to early detect any possible spammer bot from some client, we have very looooose security policies, here... :-().
So far, so good: I have a continuous output as soon any client sends an email.
But, if I add some filter to get a cleaner output, something like this:
$ sudo tcpdump -v -s 1500 -i eth0 port 25 | grep 'smtp: S' | perl -pe 's/(.*?\)) (.*?)\.\d+ \>(.*)/$2/'
(here I intend to get only source ip/name), I do not get any output until tcpdump output is more than (bash?) buffer size... (or at least I suppose so...).
Nothing changes using 'sed' instead of 'perl'...
Any hint to get a continuous output of filtered data?
Put stdbuf before the first command:
sudo stdbuf -o0 tcpdump ...
But, if I add some filter to get a cleaner output, something like this:
Use the --line-buffered option for grep:
--line-buffered
Use line buffering on output. This can cause a performance
penalty.
Try sed --unbuffered (or -u on some systems, e.g. AIX) to get a streaming version that doesn't wait for EOF.
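Putting these suggestions together, a sketch of an unbuffered version of the original pipeline (tcpdump -l line-buffers tcpdump's own output, and $| = 1 makes perl flush every line):
sudo tcpdump -l -v -s 1500 -i eth0 port 25 \
  | grep --line-buffered 'smtp: S' \
  | perl -pe 'BEGIN { $| = 1 } s/(.*?\)) (.*?)\.\d+ \>(.*)/$2/'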

Multiple simultaneous downloads using Wget?

I'm using wget to download website content, but wget downloads the files one by one.
How can I make wget download using 4 simultaneous connections?
Use aria2:
aria2c -x 16 [url]
# -x 16 = the number of connections
http://aria2.sourceforge.net
Wget does not support multiple socket connections in order to speed up download of files.
I think we can do a bit better than gmarian's answer.
The correct way is to use aria2:
aria2c -x 16 -s 16 [url]
# -x 16, -s 16 = the number of connections
Official documentation:
-x, --max-connection-per-server=NUM: The maximum number of connections to one server for each download. Possible Values: 1-16 Default: 1
-s, --split=N: Download a file using N connections. If more than N URIs are given, first N URIs are used and remaining URLs are used for backup. If less than N URIs are given, those URLs are used more than once so that N connections total are made simultaneously. The number of connections to the same host is restricted by the --max-connection-per-server option. See also the --min-split-size option. Possible Values: 1-* Default: 5
Since GNU parallel was not mentioned yet, let me give another way:
cat url.list | parallel -j 8 wget -O {#}.html {}
I found (probably) a solution.
In the process of downloading a few thousand log files from one server
to the next I suddenly had the need to do some serious multithreaded
downloading in BSD, preferably with Wget as that was the simplest way
I could think of handling this. A little looking around led me to
this little nugget:
wget -r -np -N [url] &
wget -r -np -N [url] &
wget -r -np -N [url] &
wget -r -np -N [url]
Just repeat the wget -r -np -N [url] for as many threads as you need...
Granted, this isn't pretty and there are surely better ways to do it, but if you want something quick and dirty it should do the trick...
Note: the option -N makes wget download only "newer" files, which means it won't overwrite or re-download files unless their timestamp changes on the server.
Another program that can do this is axel.
axel -n <NUMBER_OF_CONNECTIONS> URL
For basic HTTP auth,
axel -n <NUMBER_OF_CONNECTIONS> "user:password@https://domain.tld/path/file.ext"
Ubuntu man page.
A new (but not yet released) tool is Mget.
It has already many options known from Wget and comes with a library that allows you to easily embed (recursive) downloading into your own application.
To answer your question:
mget --num-threads=4 [url]
UPDATE
Mget is now developed as Wget2 with many bugs fixed and more features (e.g. HTTP/2 support).
--num-threads is now --max-threads.
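So with Wget2 the equivalent of the call above would presumably be:
wget2 --max-threads=4 [url]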
I strongly suggest using HTTrack.
e.g.: httrack -v -w http://example.com/
It will mirror with 8 simultaneous connections by default. HTTrack has tons of options to play with; have a look.
As other posters have mentioned, I'd suggest you have a look at aria2. From the Ubuntu man page for version 1.16.1:
aria2 is a utility for downloading files. The supported protocols are HTTP(S), FTP, BitTorrent, and Metalink. aria2 can download a file from multiple sources/protocols and tries to utilize your maximum download bandwidth. It supports downloading a file from HTTP(S)/FTP and BitTorrent at the same time, while the data downloaded from HTTP(S)/FTP is uploaded to the BitTorrent swarm. Using Metalink's chunk checksums, aria2 automatically validates chunks of data while downloading a file like BitTorrent.
You can use the -x flag to specify the maximum number of connections per server (default: 1):
aria2c -x 16 [url]
If the same file is available from multiple locations, you can choose to download from all of them. Use the -j flag to specify the maximum number of parallel downloads for every static URI (default: 5).
aria2c -j 5 [url] [url2]
Have a look at http://aria2.sourceforge.net/ for more information. For usage information, the man page is really descriptive and has a section on the bottom with usage examples. An online version can be found at http://aria2.sourceforge.net/manual/en/html/README.html.
wget can't download over multiple connections; instead you can try another program such as aria2.
Use
aria2c -x 10 -i websites.txt >/dev/null 2>/dev/null &
and in websites.txt put one URL per line, for example:
https://www.example.com/1.mp4
https://www.example.com/2.mp4
https://www.example.com/3.mp4
https://www.example.com/4.mp4
https://www.example.com/5.mp4
Try pcurl:
http://sourceforge.net/projects/pcurl/
It uses curl instead of wget and downloads in 10 segments in parallel.
They always say "it depends", but when it comes to mirroring a website, the best tool is HTTrack. It is super fast and easy to work with. The only downside is its so-called support forum, but you can find your way around using the official documentation. It has both a GUI and a CLI, and it supports cookies; just read the docs. This is the best. (Be careful with this tool, it can download the whole web onto your hard drive.)
httrack -c8 [url]
By default the maximum number of simultaneous connections is limited to 8 to avoid server overload.
Use xargs to make wget work on multiple files in parallel:
#!/bin/bash
mywget()
{
wget "$1"
}
export -f mywget
# run wget in parallel using 8 thread/connection
xargs -P 8 -n 1 -I {} bash -c "mywget '{}'" < list_urls.txt
Aria2 options: the right way to work with files smaller than 20 MB
aria2c -k 2M -x 10 -s 10 [url]
-k 2M splits the file into 2 MB chunks
-k (--min-split-size) has a default value of 20 MB; if you don't set this option and the file is under 20 MB, it will only use a single connection no matter what value you give -x or -s.
You can use xargs.
-P is the number of processes: for example, with -P 4, four links will be downloaded at the same time; with -P 0, xargs launches as many processes as possible and all of the links are downloaded at once.
cat links.txt | xargs -P 4 -I{} wget {}
I'm using GNU parallel:
cat listoflinks.txt | parallel --bar -j ${MAX_PARALLEL:-$(nproc)} wget -nv {}
cat pipes a list of line-separated URLs to parallel
the --bar flag shows a progress bar for the parallel execution
the MAX_PARALLEL env var sets the maximum number of parallel downloads; use it carefully, the default here is the current number of CPUs
Tip: use --dry-run to see what would happen if you executed the command:
cat listoflinks.txt | parallel --dry-run --bar -j ${MAX_PARALLEL} wget -nv {}
make can be parallelised easily (e.g., make -j 4). For example, here's a simple Makefile I'm using to download files in parallel using wget:
BASE=http://www.somewhere.com/path/to
FILES=$(shell awk '{printf "%s.ext\n", $$1}' filelist.txt)
LOG=download.log
all: $(FILES)
	echo $(FILES)

%.ext:
	wget -N -a $(LOG) $(BASE)/$@

.PHONY: all
default: all
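Assuming the above is saved as Makefile next to filelist.txt, you would then run the downloads in parallel with something like:
make -j 8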
Consider using regular expressions or FTP globbing. That way you could start wget multiple times with different groups of filename starting characters, depending on their frequency of occurrence.
This is for example how I sync a folder between two NAS:
wget --recursive --level 0 --no-host-directories --cut-dirs=2 --no-verbose --timestamping --backups=0 --bind-address=10.0.0.10 --user=<ftp_user> --password=<ftp_password> "ftp://10.0.0.100/foo/bar/[0-9a-hA-H]*" --directory-prefix=/volume1/foo &
wget --recursive --level 0 --no-host-directories --cut-dirs=2 --no-verbose --timestamping --backups=0 --bind-address=10.0.0.11 --user=<ftp_user> --password=<ftp_password> "ftp://10.0.0.100/foo/bar/[!0-9a-hA-H]*" --directory-prefix=/volume1/foo &
The first wget syncs all files/folders starting with 0, 1, 2... F, G, H and the second thread syncs everything else.
This was the easiest way to sync between a NAS with one 10G Ethernet port (10.0.0.100) and a NAS with two 1G Ethernet ports (10.0.0.10 and 10.0.0.11). I bound the two wget threads to the different Ethernet ports via --bind-address and ran them in parallel by putting & at the end of each line. That way I was able to copy huge files at 2x 100 MB/s = 200 MB/s in total.
Call Wget for each link and set it to run in background.
I tried this Python code (the original used IPython's ! shell escape to call wget; below is a plain-Python equivalent using subprocess):
import subprocess

with open('links.txt', 'r') as f1:        # open links.txt with read mode
    list_1 = f1.read().splitlines()       # get every line in links.txt

for i in list_1:                          # iterate over each link
    subprocess.run(['wget', '-bq', i])    # call wget in background, quiet mode
Parameters:
-b - run in background
-q - quiet mode (no output)
If you are doing recursive downloads, where you don't know all of the URLs yet, wget is perfect.
If you already have a list of each URL you want to download, then skip down to cURL below.
Multiple Simultaneous Downloads Using Wget Recursively (unknown list of URLs)
# Multiple simultaneous downloads
URL=ftp://ftp.example.com
for i in {1..10}; do
wget --no-clobber --recursive "${URL}" &
done
The above loop will start 10 wget processes, each recursively downloading from the same website; however, they will not overlap or download the same file twice.
Using --no-clobber prevents each of the 10 wget processes from downloading the same file twice (including full relative URL path).
& forks each wget to the background, allowing you to run multiple simultaneous downloads from the same website using wget.
Multiple Simultaneous Downloads Using curl from a list of URLs
If you already have a list of URLs you want to download, curl -Z is parallelised curl, with a default of 50 downloads running at once.
However, for curl, the list has to be in this format:
url = https://example.com/1.html
-O
url = https://example.com/2.html
-O
So if you already have a list of URLs to download, simply format the list, and then run cURL
cat url_list.txt
#https://example.com/1.html
#https://example.com/2.html
touch url_list_formatted.txt
while read -r URL; do
echo "url = ${URL}" >> url_list_formatted.txt
echo "-O" >> url_list_formatted.txt
done < url_list.txt
Download in parallel using curl from list of URLs:
curl -Z --parallel-max 100 -K url_list_formatted.txt
For example,
$ curl -Z --parallel-max 100 -K url_list_formatted.txt
DL% UL% Dled Uled Xfers Live Qd Total Current Left Speed
100 -- 2512 0 2 0 0 0:00:01 0:00:01 --:--:-- 1973
$ ls
1.html 2.html url_list_formatted.txt url_list.txt