I'm trying to crawl a website that requires login with wget, but it stops every time it finds the logout URL (https://example.com/logout/).
I've tried excluding the directories, but without success.
This is my command:
wget --content-disposition --header "Cookie: session_cookies" -k -m -r -E -p --level=inf --retry-connrefused -D site.com -X */logout/*,*/settings/* -o log.txt https://example.com/
I've also tried the -R option instead of -X, but that didn't work either.
This can be solved with the --reject-regex option, e.g. --reject-regex logout (see the wget-dev tips).
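For illustration, a minimal sketch of the original command with --reject-regex used in place of -X; the pattern here only covers the logout URL and is an assumption to adapt (the settings pages can be excluded the same way):
wget --content-disposition --header "Cookie: session_cookies" -k -m -r -E -p --level=inf --retry-connrefused -D site.com --reject-regex "logout" -o log.txt https://example.com/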
I'm attempting to execute the following CURL call via commandline:
curl -i -H "Content-Type:application/json" \
-u "m19389#dev.acp.co.com:qE2P/N7y1k.\(" \
-X GET -d "https://wefd.it.co.com:3905/events/com.co.mpm.dev.29160-wgdhfgd-v1/dsd/dsdfds-0-0-1-7c49768976-2g7kq"
I've added quotes around the arguments and all, and I'm definitely including the URL, so I'm not sure what's going on here. Where did I go wrong?
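One likely cause (an assumption, since no error output is shown): -d marks its argument as request body data rather than the target URL, so curl is left with no URL to contact. A sketch of the same call with the URL passed as a plain argument instead:
curl -i -H "Content-Type:application/json" \
-u "m19389#dev.acp.co.com:qE2P/N7y1k.\(" \
-X GET "https://wefd.it.co.com:3905/events/com.co.mpm.dev.29160-wgdhfgd-v1/dsd/dsdfds-0-0-1-7c49768976-2g7kq"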
I tried the following command to create user attributes, but nothing worked:
/opt/keycloak/bin/kcadm.sh create users/b33088e5-321e-4b2f-afa6-7dca1871084e/user-attributes -r master -s name=user-attributes -s 'config."appid"=["APP_ID"]' -s 'config."tenantId"=["T0"]' -s 'config."ugId"=["Admin_UserGroup"]'
Error
Resource not found for url:
https://135.250.45.68:8666/auth/admin/realms/master/users/b33088e5-321e-4b2f-afa6-7dca1871084e/user-attributes
Next I tried this command:
/opt/keycloak/bin/kcadm.sh create components -r master -s name=user-attribute -s providerId=user-attribute -s parentId=1295a70f-25f7-4e45-bcb8-285d7501c6d9 -s 'config."appid"=["APP_ID"]' -s 'config."tenantId"=["T0"]' -s 'config."ugId"=["Admin_UserGroup"]'
It ends with the following error:
HTTP error - 400 Bad Request
After a lot of trial and error, and with the help of my teammate, we found the solution: the following command has to be run through admin-cli to create user attributes.
/opt/keycloak/bin/kcadm.sh create users -s username=admin111 -s enabled=true -r master -s "attributes.tenantId=value" -s "attributes.ugId=ugId" -s "attributes.appId=app"
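If the goal is instead to set attributes on an existing user (as in the first attempt above), a sketch using kcadm's update command against the user's ID may also work; the ID and attribute values below are just the ones from the question, and be aware that updating this way can replace the user's existing attribute map rather than merging into it:
/opt/keycloak/bin/kcadm.sh update users/b33088e5-321e-4b2f-afa6-7dca1871084e -r master -s "attributes.appid=APP_ID" -s "attributes.tenantId=T0" -s "attributes.ugId=Admin_UserGroup"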
I'm trying to semi-mirror a site. What I want is to download all of the MP3s and make sure I'm not re-downloading the ones I already have (hence the "mirror" part). I've typed in the following:
wget -m -nd -e robots=off --random-wait -A "*.mp3" -P FOLDER http://www.example.com/
And it downloads all the MP3s on the current page. It never follows the links to the "Next Page" or the like. I've replaced -m with -N -c -r without success. What other options can I use?
Try:
wget --execute robots=off --recursive --accept mp3,MP3 --random-wait --no-parent --continue --no-clobber http://site.com/
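As a usage note, with an accept list wget should still fetch HTML pages to discover links and delete them afterwards, so recursion into "Next Page" links can continue. A sketch combining the options above with the folder layout from the question (FOLDER and the URL are the question's placeholders):
wget -e robots=off -r -l inf -nd -c --random-wait --no-parent -A "mp3,MP3" -P FOLDER http://www.example.com/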
I'm running wget --recursive --no-parent --adjust-extension --convert-links --page-requisites --restrict-file-names=windows --keep-session-cookies --load-cookies cookies.txt http://DOMAIN/private/ and it correctly downloads the private/index.html file.
I inspected this file and it is the correct page, which is only shown after successful authentication. It contains markup like:
<ul><li><a class="CP___PAGEID_56400" href="http://DOMAIN/private/page1.html">My private page</a></li>...
However, after fetching all the resources (images, etc.), it seems to think it's finished and shuts down after "converting links".
If I omit --no-parent, it keeps going. So is the --no-parent flag somehow confusing wget about subpages?
I finally realized that wget was obeying robots.txt! I changed my command to wget -e robots=off --wait 0.25 --recursive --no-parent ... and got it working. I added --wait 0.25 since I didn't want to hammer the server either.
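For reference, a sketch of the full command with the fix folded into the options from the question (DOMAIN and cookies.txt are the question's placeholders; the 0.25-second wait is just a politeness delay):
wget -e robots=off --wait 0.25 --recursive --no-parent --adjust-extension --convert-links --page-requisites --restrict-file-names=windows --keep-session-cookies --load-cookies cookies.txt http://DOMAIN/private/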
Right now my wget is:
wget -Ncq -e \"convert-links=off\" --load-cookies /dev/null --tries=50 --timeout=45 --no-check-certificate \"$download\" -O $prefix$title.webm &
and it runs in the background. I don't want it to be in the background. How can I fix this?
wget -Ncq -e \"convert-links=off\" --load-cookies /dev/null --tries=50 --timeout=45 --no-check-certificate \"$download\" -O $prefix$title.webm #&
& sends the command to the background.
# starts a comment, so #& comments the ampersand out and the command runs in the foreground.
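Equivalently, the ampersand can simply be dropped, and a job that is already running in the background can be brought back with bash's fg builtin. A sketch, assuming the command is run directly from a script rather than inside another quoted string (so the escaped quotes are not needed):
wget -Ncq -e "convert-links=off" --load-cookies /dev/null --tries=50 --timeout=45 --no-check-certificate "$download" -O "$prefix$title.webm"
fg %1    # bring job 1 back to the foreground if it was already started with &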