Uploading zip file works in curl, but not in Powershell - why? - powershell

I'm switching from regular files to zip files doing an upload, and was told I'd need to use a header in this format - Content-Type: application/zip.
So, I can get my file to upload properly via curl with the following:
curl --verbose --header "Content-Type: application/zip" --data-binary #C:\Junk\test.zip "http://client.xyz.com/submit?username=test#test.com&password=testpassword&job=test"
However, when I write a simple powershell script to do the same thing, I run into problems - the data isn't loaded. I don't know how to get a good error message returned, so I don't know the details, but bottom line the data isn't getting in.
$FullFileName = "C:\Junk\test.zip"
$wc = new-object System.Net.WebClient -Verbose
$wc.Headers.Add("Content-Type: application/zip")
$URL = "http://client.xyz.com/submit?username=test#test.com&password=testpassword&job=test"
$wc.UploadFile( $URL, $FullFileName )
# $wc.UploadData( $URL, $FullFileName )
I've tried using UploadData instead of UploadFile, but that doesn't appear to work either.
Thanks,
Sylvia

I don't necessarily have a solution but I think the issue is that you are trying to upload a binary file using the WebClient object. You most likely will need the UploadData method but I think you are going to have to run the zip file into an array of bytes to upload. That I'm not sure of off the top of my head.
If you haven't, be sure to look at the MSDS docs for this class and methods: http://msdn.microsoft.com/en-us/library/system.net.webclient_methods.aspx

Now that I look at it again I think you need: $wc.Headers.Add("Content-Type", "application/zip") because the collection is key/value paired. Check out this SO question:
WebClient set headers
Also if your still having issues you might need to add a user agent header. I think curl has it's own.
$userAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2;)"
$wc.Headers.Add("user-agent", $userAgent)

Related

SolrCloud Configset API upload returns 500 "KeeperErrorCode = NoNode"

Situation
First of all I must mention that I'm using Solr 8.1.1 and am running the default "solr -e cloud" to do some testing. This is running on a Windows Azure VM. I'm trying to create a PowerShell script that will do some setup on the SolrCloud. The first step in this is uploading a custom Configset. I was using https://lucene.apache.org/solr/guide/8_1/configsets-api.html as guide and the PowerShell command if you take away all the parameters and such boils down to the following:
Invoke-WebRequest -Uri "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=MyConfig" -Method Post -ContentType "application/octet-stream" -InFile "config.zip"
EDIT: For clarity the contents of the ZIP is as follows: https://imgur.com/a/OHR1bf1
Problem
When I run the above command however I'm met with the following error:
Invoke-WebRequest : { "responseHeader":{ "status":500, "QTime":11}, "error":{ "msg":"KeeperErrorCode = NoNode for /configs/MyConfig/lang/contractions_ca.txt", "trace":"org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for /configs/MyConfig/lang/contractions_ca.txt\r\n\tat org.apache.zookeeper.KeeperException.create(KeeperException.java:114)\r\n\tat
org.apache.zookeeper.KeeperException.create(KeeperException.java:54)\r\n\tat org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:792)\r\n\tat
org.apache.solr.common.cloud.SolrZkClient.lambda$create$7(SolrZkClient.java:415)\r\n\tat org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:71)\r\n\tat
org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:415)\r\n\tat org.apache.solr.handler.admin.ConfigSetsHandler.createZkNodeIfNotExistsAndSetData(ConfigSetsHandler.java:201)\r\n\tat
org.apache.solr.handler.admin.ConfigSetsHandler.handleConfigUploadRequest(ConfigSetsHandler.java:181)\r\n\tat org.apache.solr.handler.admin.ConfigSetsHandler.handleRequestBody(ConfigSetsHandler.java:111)\r\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\r\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:796)\r\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:762)\r\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:522)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:397)\r\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\r\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\r\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\r\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\r\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\r\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\r\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\r\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\r\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:502)\r\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)\r\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\r\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\r\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\r\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\r\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\r\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\r\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\r\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)\r\n\tat java.lang.Thread.run(Thread.java:748)\r\n",
"code":500}}
At line:1 char:1
Observations
When I first "failed" I had created a zip file from my config which contained an additional top level folder (ea instead of MyConfig/solrconfig.xml etc my ZIP was MyConfig/MyConfig/solrconfig.xml) and when I used this the command was run successful but the second command (creating a collection) would fail because it could not find solrconfig.xml. This tells me that the ZIP is correctly present in the request and Solr does seem capable of processing it but once I correct it to an actual configset it massively fails?
EDIT: I was asked about this and whether using "conf" in the zip would work. As I mentioned here this results in a successful upload (https://imgur.com/a/JHLZ8td) however as you can see it does not match the other config sets and when you try to create a collection with this set you will get Error CREATEing SolrCore 'Test_shard1_replica_n1': Unable to create core [Test_shard1_replica_n1] Caused by: Can't find resource 'solrconfig.xml' in classpath or '/configs/Sitecore', cwd=C:\solr-8.1.1\server
Question(s)
What am I doing wrong? Is this a bug?
Going back through some work I did on SolrCloud a while ago, I am reminded of one annoying issue I hit:
I got odd issues uploading the schema config zip files if I had created that zip using "Send to Compressed Folder" in the Windows UI, or via Compress-Archive in PowerShell. I found that compressing the data with 7Zip did work, however.
I suspect there's something incompatible between the Windows zip code (which I think is quite old, and something they licensed ages ago?) and how Solr/ZooKeeper deals with extracting the files again?
I just ran into the same issue without using Windows zip code. I was trying to upload a configset to Solr 7.7.3 from a conf directory containing a "lang" subdirectory with a bunch of files. I got the NoNode error for /configs/_myconfigsetname_/lang/stopwords_eu.txt. The configset was being zipped on the fly through a recursive directory walk in Java, sending each filename to the Zip file using Java's ZipOutputStream. The resulting zipped bytes were then sent to Solr/Zookeeper.
This code worked fine for conf directories without subdirectories. It turned out that when there is a subdirectory, it was necessary to create a ZipEntry for the directory (e.g. lang/) before adding files to the Zip stream such as lang/stopwords_eu.txt.

Jenkins http plugin to upload file using rest

I am trying to upload a file to a rest server from jenkins using http plugin. I have a jenkins pipelie where a step involves loading a file(type formData)
to a server using rest.
the server side method uses two parameters:
(#FormDataParam("file") InputStream file, #FormDataParam("fileName") String fileName)
I am using the below method
def filename = "${WORKSPACE}/Test.txt"
data="""{ \"fileName\" : \"Test.txt\" }"""
resp3 = httpRequest consoleLogResponseBody: true,url: "http://<url>",contentType:'APPLICATION_OCTETSTREAM',customHeaders:[[name:'Authorization', value:"Basic ${auth}"]],httpMode: 'POST',multipartName: 'Test.txt',uploadFile: "${filename}",requestBody:data,validResponseCodes: '200'
but when I run the status code is 400 and in the server logs the message is that no filestream and no filename is received i.e not able to get both the arguments.
Please let me know where it is getting wrong
Regards
You can try using curl instead of built-in Jenkins methods:
curl -XPOST http://<url> -H 'Content-Type: application/octet-stream' -H 'Authorization: Basic ${auth}' --data-binary '{\"fileName\" : \"Test.txt\" }'
You can debug it first from within shell. Once it's working, wrap it in sh directive:
sh "curl ..."
Since I was running on windows so bat + curl worked for me .With this workaround I was able to transfer files using jenkins and rest
However using httpRequest from jenkins inbuild library is still not working.

Python Screen Scraper works within Eclipse, but not from command line

I'm writing a simple screen scraping script using python 2.7 with Eclipse PyDev. When running or debugging from within Eclipse everything works fine. However, when I run my program from the command line the server always returns a Response 500 error code. I've tried running the script and the compiled versions from the command line but get the same result -- Response 500. I've also tried some arbitrary things like adding a delay, repeated attempts, etc. but I do not know what Eclipse is doing that is different than python ran the command line.
First, where's a good place to start digging if I encounter something like this again?
Second, any ideas on how to get this working from the command line?
Code snippet below for reference
from requests import Request, Session
content_type = 'application/x-www-form-urlencoded'
headers2 = {"User-Agent" : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
"Content-Type" : content_type,
"Referer" : url
}
url = loginPage
payload = {"email" : username, "password" : password}
req = Request ('POST', url, data=payload, headers=headers2)
prepped = req.prepare()
s = Session()
resp = s.send(prepped)
print resp # Response 200 (good) from both within Eclipse and from cmd
resp = s.get(targetPage)
print resp # Response 200 (good) from Eclipse, Response 500 (generic web error) from cmd
s.get (logOutPage)
s.close()
Got an answer from somebody. Thanks go to user Justinsaccount from reddit.
First, I was using batch files to save typing and not directly using the command line.
Secondly, when printing out the parameters from inside the program and then comparing the eclipse version versus the .bat version, the .bat version was short a few characters which was the give away.
One of the parameters was a url that had a space character: http://somewhere.com/some page.
In strict URL, this turns into: http://somewhere.com/some%20page
When run from the command line http://somewhere.com/some%20page
works just fine. However, in a batch file the % needed to be escaped so what I got was: http://somewhere.com/some0page
which is why the server through an error -- that page didn't exist. What I needed to do was escape the % character: http://somewhere.com/some%%20page. After that change things worked just fine.

wget files from FTP-like listings

So, site that used to use FTP now has an HTTP front-end and won't allow FTP connections. The site in question (for an example directory) will show a page with links to different dates. Inside each of these different dates, there are many files, and I typically just need to get some file with some clear pattern e.g. *h17v04*.hdf. I thought this could work:
wget -I "${PLATFORM}/${PRODUCT}/${YEAR}.*" -r -l 4 \
--user-agent="Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1" \
--verbose -c -np -nc -nd \
-A "*h17v04*.hdf" http://e4ftl01.cr.usgs.gov/$PLATFORM/$PRODUCT/
where PLATFORM=MOLT, PRODUCT=MOD09GA.005 and YEAR=2004, for example. This seems to start looking into all the useful dates, finds the index.html, and then just skips to the next directory, without downloading the relevant hdf file:
--2013-06-14 13:09:18-- http://e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/
Reusing existing connection to e4ftl01.cr.usgs.gov:80.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/index.html'
[ <=> ] 174,182 134K/s in 1.3s
2013-06-14 13:09:20 (134 KB/s) - `e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/index.html' saved [174182]
Removing e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.01/index.html since it should be rejected.
--2013-06-14 13:09:20-- http://e4ftl01.cr.usgs.gov/MOLT/MOD09GA.005/2004.01.02/
[...]
If I ignore the -A option, only the index.html file is downloaded to my system, but it appears it's not parsed and the links are not followed. I don't really know what more is required to make this work, as I can't see why it doesn't!!!
SOLUTION
In the end, the problem was due to an old bug in the local version of wget. However, I ended up writing my own script for downloading MODIS data from the server above. The script is pure Python, and is available from here.
Consider to use pyModis instead of wget which is a Free and Open Source Python based library to work with MODIS data. It offers bulk-download for user selected time ranges, mosaicking of MODIS tiles, and the reprojection from Sinusoidal to other projections, convert HDF format to other formats. See
http://www.pymodis.org/

Powershell: Downloadfile 404 but site exists

I'm having a little problem which looks very simple... but I just don't get it!
I try to download the website content of: http://cspsp.gshi.org/ (if you try to access it via www.cspsp.gshi.org you get to the wrong page....)
For this I do it like that in Powershell:
(New-Object System.Net.WebClient).DownloadFile( 'http://cspsp.gshi.org/', 'save.htm' )
I can acess the website with Firefox and download its contents easily but Powershell always outputs something like that:
The remoteserver returned an Error: (404) Nothing found. (translated from German).
I'm not sure what I'm doing wrong here. Other websites like Google just work fine.
It appears that the site relies on the User-Agent request headers being sent by HTTP clients, and that System.Net.WebClient doesn't send even a default value (at least, it didn't when I hit my own, local servers.)
Either way, this worked for me:
$request = (New-Object System.Net.WebClient)
$request.headers['User-Agent'] = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.40 Safari/537.17"
$request.DownloadFile('http://cspsp.gshi.org/', 'saved.html')