wget not saving images while --page-requisites is on

I am trying to save images from a site using wget. I have --page-requisites on the command line, but it doesn't save the images. Everything else works fine; it even saves files with the right extension.
wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
http://leveldesigninspirationmachine.tumblr.com/
Why doesn't it get the images?
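The thread has no accepted answer; one likely cause (an assumption, not confirmed here) is that Tumblr serves images from separate media hosts (e.g. *.media.tumblr.com), and wget does not follow links to other hosts by default, so page requisites hosted elsewhere are skipped. A sketch of a possible fix using host spanning (--no-clobber is dropped because wget ignores it when --convert-links is given):
wget \
--recursive \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--span-hosts \
--domains=tumblr.com \
http://leveldesigninspirationmachine.tumblr.com/
Since -D/--domains matches domain suffixes, tumblr.com also covers the media subdomains.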

Related

wget to mirror a site: make an exception to `--no-parent`

I want to mirror a site using wget:
wget --mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--no-parent \
--wait=2 \
--progress=bar \
--show-progress \
--output-file=$LOG_FILE \
--directory-prefix=$DIR_PATH \
$URL
It has been working well, but I have come across a website where the main page I want to start from is at https://www.website.org/unique_path/here.html, yet it references files and links such as https://www2.website.org/unique_path/there.pdf. However, --no-parent prevents downloading the content under the www2... URL. Is there a way to circumvent this? (Or some option that would work like --no-parent but accept a wildcard expression saying it is OK to download from here and there?)
You are apparently looking for the Spanning Hosts options: enable host spanning with -H, then provide a comma-separated list of acceptable domains via -D. Using your example:
wget <your current options here> -H -D www.website.org,www2.website.org <your URL here>
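Spelled out with the options from the question (a sketch; nothing else changes):
wget --mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--no-parent \
--wait=2 \
--progress=bar \
--show-progress \
--output-file=$LOG_FILE \
--directory-prefix=$DIR_PATH \
-H -D www.website.org,www2.website.org \
$URL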

reject-regex in wget

I'm trying to mirror a section of an intranet TWiki for offline usage as follows:
wget \
--user=twiki \
--password=******** \
--recursive \
-l 2 \
--adjust-extension \
--page-requisites \
--convert-links \
--reject-regex '\?rev=' \
--reject-regex '/twiki/rdiff/' \
--reject-regex '/twiki/attach/' \
--reject-regex '/twiki/edit/' \
--reject-regex '/twiki/oops/' \
--reject-regex '\?raw=on' \
--reject-regex '\?cover=print' \
http://twiki/cgi-bin/twiki/view/SectionToMirror/
For some reason all the --reject-regex rules are ignored and the content I want to reject still appears in the copy. Running the above command without any --reject-regex rules produces the same results.
What am I doing wrong?
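The thread has no answer; a likely explanation (an assumption, not confirmed here) is that wget accepts only one --reject-regex, so each occurrence on the command line overrides the previous one and only '\?cover=print' ends up in effect. Combining all the patterns into a single regex should behave as intended (the default POSIX extended regex syntax supports alternation and grouping):
wget \
--user=twiki \
--password=******** \
--recursive \
-l 2 \
--adjust-extension \
--page-requisites \
--convert-links \
--reject-regex '\?rev=|\?raw=on|\?cover=print|/twiki/(rdiff|attach|edit|oops)/' \
http://twiki/cgi-bin/twiki/view/SectionToMirror/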

Build all packages for an image

Is it possible to build all of the packages for a specific image? I know I can build packages individually, but I would ideally like to build all of them at once with a single command.
Alternatively, is there a way to prevent the do_rootfs task from being executed for a particular image?
Cheers, Donal
First, make an image that contains a packagegroup (or just list your dependencies there):
$ cat sources/meta-custom/recipes-custom/images/only-packages-image.bb
SUMMARY = "All dependencies no image"
LICENSE = "CLOSED"
version = "##DISTRO_VERSION##"
BB_SCHEDULER = "speed"
# an image class is needed so CORE_IMAGE_BASE_INSTALL and do_rootfs exist
inherit core-image
# option 1 - packagegroup, package list can be reused in real image
CORE_IMAGE_BASE_INSTALL += "\
packagegroup-alldeps \
"
# option 2 - list deps here, package list can not be reused in real image
CORE_IMAGE_BASE_INSTALL += "\
lshw \
systemd \
cronie \
glibc \
sqlite \
bash \
python3-dev \
python3-2to3 \
python3-misc \
python3-pyvenv \
python3-modules \
python3-pip \
wget \
apt \
pciutils \
file \
tree \
\
wpa-supplicant \
dhcpcd \
networkmanager \
curl-dev \
curl \
hostapd \
iw \
"
# remove the rootfs step (the ":" no-op keeps the empty shell function valid)
do_rootfs() {
:
}
Second, make your packagegroup if you opted to reuse the list of packages:
$ cat sources/meta-custom/recipes-custom/packagegroups/packagegroup-alldeps.bb
PACKAGE_ARCH = "${MACHINE_ARCH}"
inherit packagegroup
RDEPENDS_${PN} = " \
lshw \
systemd \
cronie \
glibc \
sqlite \
bash \
python3-dev \
python3-2to3 \
python3-misc \
python3-pyvenv \
python3-modules \
python3-pip \
wget \
apt \
pciutils \
file \
tree \
\
wpa-supplicant \
dhcpcd \
networkmanager \
curl-dev \
curl \
hostapd \
iw \
"
Finally, build your new image placeholder:
$ bitbake only-packages-image
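With the emptied do_rootfs, no image file is produced; the packages themselves land in the usual deploy directory for your package format, e.g.:
$ ls build/tmp/deploy/ipk/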
In Yocto >=4.0 this is actually pretty easy to achieve. The packagegroup method did not work for me at all.
I don't know if this works in older versions though.
Create a new file in your custom layer, e.g. meta-custom/classes/norootfs.bbclass and put the following lines in there (as far as I noticed the order does not matter):
deltask do_deploy
deltask do_image
deltask do_rootfs
deltask do_image_complete
deltask do_image_setscene
Then, in your meta-custom/recipes-core/images/myimage.bb, add norootfs to your other inherit commands, e.g. the most basic one:
inherit core-image norootfs
You will notice the number of tasks decreasing by a fair amount (mine went from ~4700 to ~3000). There is no complete rootfs image in build/tmp/deploy/images anymore (apart from the bzImage and modules); just the plain ipk files are left in build/tmp/deploy/ipk.
I got this information by looking at https://docs.yoctoproject.org/ref-manual/tasks.html?highlight=do_image and the .bbclass files in meta/classes, where deltask is frequently used.
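To verify, you can list the image's remaining tasks (a sketch; myimage is the placeholder recipe name from above):
$ bitbake myimage -c listtasks | grep do_rootfs
# no output means do_rootfs was removed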

Magento 2 web configuration stuck

I have a problem installing Magento 2. I set the web configuration and click the Next button, but it does not move on to the next step. I also tried setting the admin URL.
You can try installing it via the command line:
bin/magento setup:install --backend-frontname="adminlogin" \
--key="YOUR MAGENTO2 REPO KEY" \
--db-host="localhost" \
--db-name="DB_NAME" \
--db-user="MYSQL USERNAME" \
--db-password="PASSWORD FOR MYSQL USER" \
--language="en_US" \
--currency="USD" \
--timezone="America/New_York" \
--use-rewrites=1 \
--use-secure=0 \
--base-url="http://YOUR.DOMAIN" \
--base-url-secure="https://YOUR.DOMAIN" \
--admin-user=adminuser \
--admin-password=admin123# \
--admin-email=admin@newmagento.com \
--admin-firstname=admin \
--admin-lastname=user \
--cleanup-database
Try it, and let me see the error log if it doesn't work for you.
Maybe you have trouble with your server or permissions configuration.
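If it is a permissions issue, the ownership/permission reset documented for Magento 2 looks roughly like this (a sketch, assuming the web server group is www-data and the Magento root as the working directory):
cd /var/www/html/magento2   # hypothetical install path
find var generated vendor pub/static pub/media app/etc -type f -exec chmod g+w {} +
find var generated vendor pub/static pub/media app/etc -type d -exec chmod g+ws {} +
chown -R :www-data .        # give the web server group ownership
chmod u+x bin/magento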

Exclude webpages containing a specific string while recursively downloading a website using wget

I'm trying to scrape a website recursively, but I want to exclude some webpages under that domain that contain the string "unnecessary pages". The string is not present in the URL. Here's the original command to build from:
wget -r --no-parent http://www.website.com
For example, I want to scrape Wikipedia, but exclude articles that contain the keyword "drugs".
Any ideas?
Thanks in advance!
One way to do this is with the following options. It will scrape a site beginning at whatever path you choose to start from and will exclude the directories you specify in LIST:
$ wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains somesite.tld \
--no-parent \
--exclude-directories=LIST \
www.somesite.tld/path/to/start
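Note that wget can only filter on URLs, not on page contents, so a string like "unnecessary pages" that appears only in the page body cannot be excluded during the crawl itself. One workaround (a sketch, not part of the answer above) is to crawl first and then delete the files whose contents match:
grep -rlZ "unnecessary pages" www.somesite.tld/ | xargs -0 rm --
Here grep -rlZ prints NUL-separated names of the matching files under the download directory, and xargs -0 rm removes them.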