I am trying to save images from a site using wget. I have --page-requisites on the command line, but it doesn't save the images. Everything else works fine; it even saves the extension.
wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
http://leveldesigninspirationmachine.tumblr.com/
Why doesn't it get the images?
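A likely cause, judging from the command: Tumblr serves post images from a separate media host (e.g. *.media.tumblr.com; the exact domain is an assumption and may have changed), and with --recursive wget will not fetch requisites from other hosts unless host spanning is enabled. A sketch of the fix:
wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--span-hosts \
--domains=leveldesigninspirationmachine.tumblr.com,media.tumblr.com \
http://leveldesigninspirationmachine.tumblr.com/
With --domains, spanning stays limited to the listed domains (matched as suffixes), so the crawl does not wander off to unrelated hosts.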
Related
I want to mirror a site using wget:
wget --mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--no-parent \
--wait=2 \
--progress=bar \
--show-progress \
--output-file=$LOG_FILE \
--directory-prefix=$DIR_PATH \
$URL
Now, it has been working well, but I have come across a website where the main page from which I want to start is under https://www.website.org/unique_path/here.html, but it contains references to files or links like https://www2.website.org/unique_path/there.pdf. However, --no-parent prevents the download of the content under the www2... URL. Is there a way to circumvent this? (Or some option that would work like --no-parent but let me state explicitly, e.g. with a wildcard expression, that it is OK to go and download here and there?)
Is there a way to circumvent this?
You are apparently looking for the Spanning Hosts options: you must pass the -H option, and then you can supply a comma-separated list of acceptable domains via -D. Using your example:
wget <your current options here> -H -D www.website.org,www2.website.org <your URL here>
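Applied to the mirror command from the question, that would look like this (a sketch; keep the rest of your options as they were):
wget --mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--no-parent \
--wait=2 \
-H -D www.website.org,www2.website.org \
https://www.website.org/unique_path/here.html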
I'm trying to mirror a section of an intranet TWiki for offline usage as follows:
wget \
--user=twiki \
--password=******** \
--recursive \
-l 2 \
--adjust-extension \
--page-requisites \
--convert-links \
--reject-regex '\?rev=' \
--reject-regex '/twiki/rdiff/' \
--reject-regex '/twiki/attach/' \
--reject-regex '/twiki/edit/' \
--reject-regex '/twiki/oops/' \
--reject-regex '\?raw=on' \
--reject-regex '\?cover=print' \
http://twiki/cgi-bin/twiki/view/SectionToMirror/
For some reason all the --reject-regex rules are ignored: the content I want to reject still appears in the copy. Running the above command without any --reject-regex rules produces the same results.
What am I doing wrong?
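One thing worth checking (an assumption about the cause, but it matches the symptom): wget accepts only a single --reject-regex, and when the option is repeated each occurrence overwrites the previous one, so at most the last pattern would apply. The rules can be combined into one alternation instead:
wget \
--user=twiki \
--password=******** \
--recursive \
-l 2 \
--adjust-extension \
--page-requisites \
--convert-links \
--reject-regex '\?rev=|/twiki/(rdiff|attach|edit|oops)/|\?raw=on|\?cover=print' \
http://twiki/cgi-bin/twiki/view/SectionToMirror/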
Is it possible to build all of the packages for a specific image? I know I can build packages individually, but ideally I would like to build all of them at once, with a single command.
Alternatively, is there a way to prevent the do_rootfs task from being executed for a particular image?
Cheers, Donal
First, make an image recipe that contains a packagegroup (or just list your dependencies there):
$ cat sources/meta-custom/recipes-custom/images/only-packages-image.bb
SUMMARY = "All dependencies no image"
LICENSE = "CLOSED"
version = "##DISTRO_VERSION##"
BB_SCHEDULER = "speed"
# option 1 - packagegroup, package list can be reused in real image
CORE_IMAGE_BASE_INSTALL += "\
packagegroup-alldeps \
"
# option 2 - list deps here, package list can not be reused in real image
CORE_IMAGE_BASE_INSTALL += "\
lshw \
systemd \
cronie \
glibc \
sqlite \
bash \
python3-dev \
python3-2to3 \
python3-misc \
python3-pyvenv \
python3-modules \
python3-pip \
wget \
apt \
pciutils \
file \
tree \
\
wpa-supplicant \
dhcpcd \
networkmanager \
curl-dev \
curl \
hostapd \
iw \
"
# remove the rootfs step
do_rootfs() {
}
Second, make your packagegroup, if you opted to reuse the list of packages:
$ cat sources/meta-custom/recipes-custom/packagegroups/packagegroup-alldeps.bb
PACKAGE_ARCH = "${MACHINE_ARCH}"
inherit packagegroup
RDEPENDS_${PN} = " \
lshw \
systemd \
cronie \
glibc \
sqlite \
bash \
python3-dev \
python3-2to3 \
python3-misc \
python3-pyvenv \
python3-modules \
python3-pip \
wget \
apt \
pciutils \
file \
tree \
\
wpa-supplicant \
dhcpcd \
networkmanager \
curl-dev \
curl \
hostapd \
iw \
"
Finally, build your new image placeholder:
$ bitbake only-packages-image
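The packages themselves end up under build/tmp/deploy/ (in an ipk/, rpm/, or deb/ subdirectory, depending on your PACKAGE_CLASSES), without a rootfs ever being assembled. One caveat, stated as an assumption about your release: since Yocto 4.0 (kirkstone) the override separator in variable names is a colon, so on current releases the packagegroup assignment would begin like this (package list abbreviated):
RDEPENDS:${PN} = " \
lshw \
systemd \
"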
In Yocto >=4.0 this is actually pretty easy to achieve. The packagegroup method did not work for me at all.
I don't know if this works in older versions though.
Create a new file in your custom layer, e.g. meta-custom/classes/norootfs.bbclass, and put the following lines in there (as far as I noticed, the order does not matter):
deltask do_deploy
deltask do_image
deltask do_rootfs
deltask do_image_complete
deltask do_image_setscene
Then, in your meta-custom/recipes-core/images/myimage.bb, add norootfs to your other inherit commands, e.g. the most basic one:
inherit core-image norootfs
You will notice the number of tasks decreasing by a fair amount (mine went from ~4700 to ~3000), and there is no complete rootfs image in build/tmp/deploy/images anymore (apart from the bzImage and modules); just the plain ipk files in build/tmp/deploy/ipk.
I got this information by looking at https://docs.yoctoproject.org/ref-manual/tasks.html?highlight=do_image and the .bbclass files in meta/classes, where deltask is frequently used.
I have a problem installing Magento 2. I set the web configuration and click the Next button, but it does not go to the next step. I tried setting the admin URL.
You can try installing it via the command line:
bin/magento setup:install --backend-frontname="adminlogin" \
--key="YOUR MAGENTO2 REPO KEY" \
--db-host="localhost" \
--db-name="DB_NAME" \
--db-user="MYSQL USERNAME" \
--db-password="PASSWORD FOR MYSQL USER" \
--language="en_US" \
--currency="USD" \
--timezone="America/New_York" \
--use-rewrites=1 \
--use-secure=0 \
--base-url="http://YOUR.DOMAIN" \
--base-url-secure="https://YOUR.DOMAIN" \
--admin-user=adminuser \
--admin-password=admin123# \
--admin-email=admin@newmagento.com \
--admin-firstname=admin \
--admin-lastname=user \
--cleanup-database
Try it, and if it doesn't work for you, show me the error log. You may have trouble with your server or permissions configuration.
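If permissions are the culprit, the ownership/permission reset from the Magento 2 docs is a reasonable first step (a sketch: www-data as the web server group and the install path are assumptions, adjust for your setup):
cd /var/www/html/magento2   # hypothetical Magento root
find var generated vendor pub/static pub/media app/etc -type f -exec chmod g+w {} +
find var generated vendor pub/static pub/media app/etc -type d -exec chmod g+ws {} +
chown -R :www-data .
chmod u+x bin/magento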
I'm trying to scrape a website recursively, but I want to exclude some webpages under that domain that contain the string "unnecessary pages". The string is not present in the URL, only in the page content. Here's the original command to build from:
wget -r --no-parent http://www.website.com
For example: I want to scrape Wikipedia, but exclude articles that contain the keyword "drugs".
Any ideas?
Thanks in advance!
One way to approximate this is with the following options. It will scrape a site beginning at any path you choose and will exclude the directories you have specified in LIST:
$ wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains somesite.tld \
--no-parent \
--exclude-directories=LIST \
www.somesite.tld/path/to/start
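Note that wget can only filter on URLs (directories, host names, regexes), not on page content, so a string that appears only in the body has to be handled after the download. A post-processing sketch, using the keyword from the question:
# delete mirrored HTML files whose content mentions the keyword
grep -rlZ --include='*.html' 'unnecessary pages' www.somesite.tld/ | xargs -0 rm --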