Why is this particular website auto-formatting dates when parsed programmatically?

I am trying to get a list of certain links from a website using Jsoup 1.10.1. The following is a snippet which I've isolated from the rest of my code in an attempt to diagnose the problem:
public static void main(String[] args) throws IOException {
URL link = new URL("https://www.ncdc.noaa.gov/gibbs/availability/1979-01-01");
Document doc = Jsoup.parse(link, 600);
Elements links = doc.select(".availableChannels > a");
System.out.println(links.get(0));
}
In theory this should print the first link under the .availableChannels class on the provided URL, which should be IR.
However, Jsoup instead auto-formats the yyyy-mm-dd date that appears within the a href, and as a result the code snippet prints out IR, which is undesired.
How do I stop Jsoup from automatically formatting dates?
UPDATE
I decided to write a similar program in Python 2.7 to see what would happen if I read from that particular page (https://www.ncdc.noaa.gov/gibbs/availability/1979-01-01). It turns out that the yyyy-mm-dd that appears in IR is still getting formatted into IR when I open and print the page's source using Python.
import urllib
link = "https://www.ncdc.noaa.gov/gibbs/availability/1979-01-01";
f = urllib.urlopen(link);
myfile = f.read();
print myfile;
I guess the question becomes: Why is this particular website automatically formatting dates when accessed through non-standard web browser means? I've changed the question accordingly to reflect this.

It's because you need to set the Accept-Language header on the HTTP request.
"The Accept-Language request HTTP header advertises which languages the client is able to understand, and which locale variant is preferred. Using content negotiation, the server then selects one of the proposals, uses it and informs the client of its choice with the Content-Language response header." (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language)
So it seems that if you don't set the header, the server hosting the website returns a variant that doesn't use the preferred locale settings you get in your browser.
public static void main(String[] args) throws IOException {
URL link = new URL("https://www.ncdc.noaa.gov/gibbs/availability/1979-01-01");
Document doc = Jsoup.connect(link.toString())
.header("Accept-Language", "en-GB").get();
Elements links = doc.select(".availableChannels > a");
System.out.println(links.get(0));
}
Output:
IR
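Since the update to the question used Python, here is a rough Python 3 sketch of the same idea: send an explicit Accept-Language header with the request. The en-GB value comes from the Java snippet above; the function names and the timeout are just illustrative, and whether the site still behaves this way is not something this sketch can confirm.

```python
import urllib.request

URL = "https://www.ncdc.noaa.gov/gibbs/availability/1979-01-01"


def build_request(url, language="en-GB"):
    """Return a request that states the preferred language explicitly."""
    return urllib.request.Request(url, headers={"Accept-Language": language})


def fetch(url, language="en-GB", timeout=10):
    """Download the page body as text (this performs a network call)."""
    with urllib.request.urlopen(build_request(url, language), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example (not run here because it needs network access):
# print(fetch(URL))
```

Building the request separately from fetching it makes it easy to check which headers are actually being sent before going over the network.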


How can REST API pass large JSON?

I am building a REST API and facing this issue: how can a REST API pass very large JSON?
Basically, I want to connect to a database and return the training data. The problem is that the database holds 400,000 records. If I wrap them all into a single JSON document and return it through the GET method, the server throws a heap overflow exception.
What methods can we use to solve this problem?
DBTraining trainingdata = new DBTraining();
@GET
@Produces("application/json")
@Path("/{cat_id}")
public Response getAllDataById(@PathParam("cat_id") String cat_id) {
List<TrainingData> list = new ArrayList<TrainingData>();
try {
list = trainingdata.getAllDataById(cat_id);
Gson gson = new Gson();
Type dataListType = new TypeToken<List<TrainingData>>() {
}.getType();
String jsonString = gson.toJson(list, dataListType);
return Response.ok().entity(jsonString).header("Access-Control-Allow-Origin", "*").header("Access-Control-Allow-Methods", "GET").build();
} catch (SQLException e) {
logger.warn(e.getMessage());
}
return null;
}
The RESTful way of doing this is to create a paginated API. First, add query parameters to set page size, page number, and maximum number of items per page. Use sensible defaults if any of these are missing or set to unrealistic values. Second, modify the database query to retrieve only a subset of the data. Convert that to JSON and use that as the payload of your response. Finally, following HATEOAS principles, provide links to the next page (provided you're not on the last page) and the previous page (provided you're not on the first page). For bonus points, provide links to the first page and last page as well.
By designing your endpoint this way, you get very consistent performance characteristics and can handle data sets that continue to grow.
The GitHub API provides a good example of this.
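The pagination steps described above can be sketched roughly as follows. This is only an illustration in Python: the in-memory list stands in for the database query (which would normally fetch just one page, for example with LIMIT/OFFSET), and the /training URL layout, the parameter names and the default sizes are assumptions, not part of the original API.

```python
import math


def link(page, page_size):
    # Hypothetical URL layout, used only for illustration.
    return f"/training?page={page}&page_size={page_size}"


def paginate(items, page=1, page_size=100, max_page_size=1000):
    """Return one page of items plus HATEOAS-style navigation links."""
    # Fall back to sensible values when the client sends unrealistic ones.
    page_size = min(max(page_size, 1), max_page_size)
    total_pages = max(math.ceil(len(items) / page_size), 1)
    page = min(max(page, 1), total_pages)

    start = (page - 1) * page_size
    links = {"first": link(1, page_size), "last": link(total_pages, page_size)}
    if page > 1:
        links["prev"] = link(page - 1, page_size)
    if page < total_pages:
        links["next"] = link(page + 1, page_size)
    return {"items": items[start:start + page_size], "page": page, "links": links}
```

The "prev" and "next" links are left out on the first and last page respectively, so a client can simply follow "next" until it disappears.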
My suggestion is not to pass the data as JSON but as a file using multipart/form-data. In your file, each line could be a JSON object representing a data record. Then it would be easy to use a FileOutputStream to receive the file, and you can process the file line by line to avoid memory problems.
A Grails example:
if (params.myFile) {
    if (params.myFile instanceof org.springframework.web.multipart.commons.CommonsMultipartFile) {
        def fileName = "/tmp/myReceivedFile.txt"
        // Stream the upload straight to disk instead of holding it in memory
        new FileOutputStream(fileName).leftShift(params.myFile.getInputStream())
    } else {
        // print or signal error
    }
}
You can use curl to pass your file:
curl -F "myFile=@/mySendigFile.txt" http://acme.com/my-service
More details on a similar solution on https://stackoverflow.com/a/13076550/2476435
HTTP has the notion of chunked encoding, which allows you to send an HTTP response body in smaller pieces so that the server does not have to hold the entire response in memory. You need to find out how your server framework supports chunked encoding.
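To illustrate the idea of producing the body in pieces, here is a minimal Python sketch that serializes records one at a time instead of building one huge string. It only shows the generator side; whether the pieces actually go out with chunked transfer encoding depends on the server framework, so treat that part as an assumption to verify for your own stack.

```python
import json


def stream_json_array(records):
    """Yield a JSON array piece by piece instead of building one big string."""
    yield "["
    for index, record in enumerate(records):
        if index:
            yield ","
        # Only one record is serialized and held in memory at a time.
        yield json.dumps(record)
    yield "]"
```

Because `records` can itself be a generator (for example a database cursor), neither the full result set nor the full JSON text ever has to exist in memory at once.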

XPages REST and date format

I have an XPages page which contains the REST service component. I'm using the "documentJsonService".
Awesome component and everything else is working fine, but I'm having issues with the date formats and don't know what to do.
The Notes document I'm reading the data from contains a DateTime item holding a proper date, e.g. 01.09.2014 (Finnish format: dd.MM.yyyy). The REST component returns the date as the string "2014-09-01". This is fine. However, when I do an HTTP POST to the server with the exact same data, Domino turns the "2014-09-01" string into a Notes DateTime item with the value 09.01.2014.
I no longer know what to do. Why does Domino give the date in format A, and when I give it back in the same format, something strange happens?
The same happens in Linux and Windows environments.
Domino version is 9.0.1.
Thanks already. I'm more or less lost with this "feature" :)
I would say: broken as designed. To my knowledge the JSON format returned is always in the form yyyy-mm-dd, while the format expected when posting depends on the browser locale. You would need to "hack around it".
I'm not a big fan of the ready-baked JSON services. I'd rather roll my own, where I can be very specific with the formats and (more importantly) add validation before I write data back. You can find a sample on my blog.
Basically you implement a bean like this:
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.ibm.domino.services.ServiceException;
import com.ibm.domino.services.rest.RestServiceEngine;
import com.ibm.xsp.extlib.component.rest.CustomService;
import com.ibm.xsp.extlib.component.rest.CustomServiceBean;
public class CustomSearchHelper extends CustomServiceBean {
@Override
public void renderService(CustomService service, RestServiceEngine engine) throws ServiceException {
HttpServletRequest request = engine.getHttpRequest();
HttpServletResponse response = engine.getHttpResponse();
response.setHeader("Content-Type", "application/json; charset=UTF-8");
// Your code goes here!
}
}
You need to check in the request which method (GET or POST) was used, but then it is easy to continue. While you are at it: the OpenNTF Domino API makes your life much easier.
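Being "very specific with the formats" and validating before writing back could look something like the following. This is a language-neutral illustration written in Python rather than Domino code, and the function names are made up for the example; the point is only that the incoming string is parsed with one explicit pattern instead of a locale-dependent guess.

```python
from datetime import date, datetime


def parse_iso_date(text):
    """Parse a yyyy-MM-dd string; raise ValueError for anything else."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def format_finnish(value):
    """Format a date as dd.MM.yyyy."""
    return value.strftime("%d.%m.%Y")
```

With an explicit pattern, "2014-09-01" can only ever mean 1 September 2014, and input in any other layout is rejected instead of being silently reinterpreted.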

How to make Rest service with JSONP capability to be run in Sharepoint 2007 (MOSS)?

We need to access a SharePoint 2007 site from JavaScript. Basically we need to use the search.asmx service, but since it supports neither REST nor JSONP it can't be used directly.
The original plan was to write a custom WCF service with support for REST and JSONP. This was a small undertaking, but when I gave the service to the SharePoint guys, none of them could package it as a WSP for installation in SharePoint 2007 and get it working.
According to the question "Rest Webservices for Sharepoint 2007", this might not be so easy, and an HttpModule is required for REST-style URLs. The other idea, running it as a standalone app, might not be enough since I think the service needs access to SPContext.
Would it be possible to just create an Application Page and, in the code-behind, override Render, clear the output buffer, change the MIME type and render the JSON-serialized data? The URL would then be http://spsite/mycustomquery.aspx?q=mysearchtext&start=0&count=200&callback=mycallbackfunction.
An application page would at least support GET, but does it have access to SPContext?
Here is the WCF service I started with.
Contract
[ServiceContract]
public interface IRestSPQuery
{
[OperationContract]
[WebGet(UriTemplate = "query/{queryText}/{startAt}/{count}?callback={callback}", ResponseFormat = WebMessageFormat.Json)]
[JSONPBehavior(callback = "callback")]
ResultTable Query(string queryText, string startAt, string count, string callback);
}
Implementation
public ResultTable Query(string queryText, string startAt, string count, string callback)
{
// http://sharepointsite/_vti_bin/CustomQuery/RestSPQuery.svc/Query/searchtext/0/200?callback=myfunction
KeywordQuery keywordQuery = new KeywordQuery(SPContext.Current.Site);
// The URI template delivers these values as strings; StartRow and RowLimit are ints.
keywordQuery.StartRow = int.Parse(startAt);
keywordQuery.RowLimit = int.Parse(count);
keywordQuery.SortList.Add("Rank", SortDirection.Descending);
keywordQuery.QueryText = queryText;
ResultTableCollection searchResults = keywordQuery.Execute();
ResultTable relevantResultsTable = searchResults[ResultType.RelevantResults];
return relevantResultsTable;
}
You could try adding an ".ashx" file to your solution that implements IHttpHandler. According to this blog article, you can do it by adding an Application Page to your solution but saving it with an ".ashx" extension. The article is written for SharePoint 2010, so you will have to check whether it works for 2007. Following the rest of the article, you should be able to set it up for REST/JSONP.
I ended up creating a custom aspx page that overrides the Render method, outputs JSON/JSONP there, and changes the content type to application/json.
The solution and a ready-to-deploy WSP file can be found here: http://www.filedropper.com/restqueryservice.
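For readers who just want the JSON/JSONP rendering logic, here is a small framework-independent sketch in Python. It is not the code from the WSP above; the function name, the callback validation rule and the application/javascript content type for the JSONP case are my own assumptions about a reasonable implementation.

```python
import json
import re

# Accept only plain (optionally dotted) JavaScript identifiers as callback names.
_CALLBACK = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def render_jsonp(data, callback=None):
    """Return (content_type, body) for plain JSON or for a JSONP call."""
    payload = json.dumps(data)
    if callback is None:
        return "application/json", payload
    if not _CALLBACK.fullmatch(callback):
        raise ValueError("invalid callback name")
    # JSONP is simply the JSON payload wrapped in a call to the client's function.
    return "application/javascript", f"{callback}({payload});"
```

Validating the callback name matters because it is echoed back into a script the browser will execute.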

The definitive guide to posting a Facebook Feed item using pure C#

Does anyone have a definitive way to post to a user's wall, using nothing but the .NET Framework, or Silverlight?
Problems deriving from people's attempts have been asked here on SO, but I cannot find a full, clear explanation of the Graph API spec and a simple example using WebClient or some similar class from System.Net.
Do I have to send all feed item properties as parameters in the query string? Can I construct a JSON object to represent the feed item and send that (with the access token as the only parameter)?
I expect it's no more than a five-line code snippet; otherwise, point me at the spec in the FB docs.
Thanks for your help,
Luke
This is taken from how we post to a user's wall. We place the data for the post in the request body (I think we found this to be more reliable than including all the parameters in the query part of the request); the body has the same format as a URL-encoded query string.
I agree that the documentation is rather poor at explaining how to interact with a lot of resources. Typically I look at the documentation for information on fields and connections, then work with the Graph API Explorer to understand how the request needs to be constructed. Once I've got that down it's pretty easy to implement in C# or whatever. The only SDK I use is Facebook's Javascript SDK. I've found the others (especially 3rd party) are more complicated, buggy, or broken than rolling my own.
private void PostStatus (string accessToken, string userId)
{
UriBuilder address = new UriBuilder ();
address.Scheme = "https";
address.Host = "graph.facebook.com";
address.Path = userId + "/feed";
address.Query = "access_token=" + accessToken;
StringBuilder data = new StringBuilder ();
data.Append ("caption=" + HttpUtility.UrlEncodeUnicode ("Set by app to describe the app."));
data.Append ("&link=" + HttpUtility.UrlEncodeUnicode ("http://example.com/some_resource_to_go_to_when_clicked"));
data.Append ("&description=" + HttpUtility.UrlEncodeUnicode ("Message set by user."));
data.Append ("&name=" + HttpUtility.UrlEncodeUnicode ("App. name"));
data.Append ("&picture=" + HttpUtility.UrlEncodeUnicode ("http://example.com/image.jpg"));
WebClient client = new WebClient ();
string response = client.UploadString (address.ToString (), data.ToString ());
}
I don't know much about .NET or Silverlight, but the Facebook API works with simple HTTP requests.
All the different SDKs (with the exception of the JavaScript one) are mainly just wrappers for the HTTP requests, with the "feature" of adding the access token to all requests.
The parameters are not sent as a query string in every request: in some POST requests you need to send them in the request body (application/x-www-form-urlencoded), and you cannot send the data as JSON.
If the C# SDK is not to your liking, you can simply create one for your exact needs.
As I wrote, you just need to wrap the requests, and you can of course have a method that takes JSON as a parameter and breaks it into the individual parameters to be sent along with the request.
I would point you to the Facebook documentation, but you haven't asked anything specific, so there's nothing to point you to except the landing page.
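The "method that gets a JSON and breaks it into parameters" idea can be sketched in a few lines. This is a Python illustration rather than C#, the function name is hypothetical, and it assumes the feed item is a flat JSON object whose keys match the parameter names the API expects.

```python
import json
from urllib.parse import urlencode


def feed_item_to_form_body(feed_item_json):
    """Turn a flat JSON object into an application/x-www-form-urlencoded body."""
    fields = json.loads(feed_item_json)
    if not isinstance(fields, dict):
        raise ValueError("expected a JSON object")
    # urlencode escapes each key and value and joins them with '&'.
    return urlencode(fields)
```

The resulting string is what would go into the POST body, in the same format as a URL-encoded query string.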

Help with a Windows Service/Scheduled Task that must use a web browser and file dialogs

What I'm Trying To Do
I'm trying to create a solution of any kind that will run nightly on a Windows server, authenticate to a website, check a web page on the site for new links indicating a new version of a zip file, use new links (if present) to download a zip file, unzip the downloaded file to an existing folder on the server, use the unzipped contents (sql scripts, etc.) to build an instance of a database, and log everything that happens to a text file.
Forms App: The Part That Sorta Works
I created a Windows Forms app that uses a couple of WebBrowser controls, a couple of threads, and a few timers to do all that except the running nightly. It works great as a Form when I'm logged in and run it, but I need to get it (or something like it) to run on its own like a Service or scheduled task.
My Service Attempt
So, I created a Windows Service that ticks every hour and, if System.DateTime.Now.Hour >= 22, attempts to launch the Windows Forms app to do its thing. When the Service attempts to launch the Form, this error occurs:
ActiveX control '8856f961-340a-11d0-a96b-00c04fd705a2' cannot be instantiated because the current thread is not in a single-threaded apartment.
which I researched and tried to resolve by either placing the [STAThread] attribute on the Main method of the Service's Program class or using some code like this in a few places including the Form constructor:
webBrowseThread = new Thread(new ThreadStart(InitializeComponent));
webBrowseThread.SetApartmentState(ApartmentState.STA);
webBrowseThread.Start();
I couldn't get either approach to work. In the latter approach, the controls on the Form (which would get initialized inside InitializeComponent) don't get initialized and I get null reference exceptions.
My Scheduled Task Attempt
So, I tried creating a nightly scheduled task using my own credentials to run the Form locally on my dev machine (just testing). It gets farther than the Service did, but gets hung up at the File Download Dialog.
Related Note: To send the key sequences to get through the File Download and File Save As dialogs, my Form actually runs a couple of VBScript files that use WScript.Shell.SendKeys. OK, that's embarrassing to admit, but I tried a few different things, including SendMessage in the Win32 API and referencing IWshRuntimeLibrary to use SendKeys inside my C# code. When I was researching how to get through the dialogs, the Win32 API seemed to be the recommended way to go, but I couldn't figure it out. The VBScript files were the only thing I could get to work, but I'm worried now that this may be the reason why a scheduled task won't work.
Regarding My Choice of WebBrowser Control
I have read about the System.Net.WebClient class as an alternative to the WebBrowser control, but at a glance, it doesn't look like it has what I need to get this done. For example, I needed (or I think I needed) the WebBrowser's DocumentCompleted and FileDownload events to handle the delays in pages loading, files downloading, etc. Is there more to WebClient that I'm not seeing? Is there another class besides WebBrowser that is more Service-friendly and would do the trick?
In Summary
Geez, this is long. Sorry! It would help to even have a high level recommendation for a better way to do what I'm trying to do, because nothing I've tried has worked.
Update 10/22/09
Well, I think I'm closer, but I'm stuck again. I should end up with a decent-sized zip file with several files in it, but the zip file resulting from my code is empty. Here's my code:
// build post request
string targetHref = "http://wwwcf.nlm.nih.gov/umlslicense/kss/login.cfm";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(targetHref);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
// encoding to use
Encoding enc = Encoding.GetEncoding(1252);
// build post string containing authentication information and add to post request
string poststring = "returnUrl=" + fixCharacters(targetDownloadFileUrl);
poststring += getUsernameAndPasswordString();
poststring += "&login2.x=0&login2.y=0";
// convert to required byte array
byte[] postBytes = enc.GetBytes(poststring);
request.ContentLength = postBytes.Length;
// write post to request
Stream postStream = request.GetRequestStream();
postStream.Write(postBytes, 0, postBytes.Length);
postStream.Close();
// get response as stream
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream responseStream = response.GetResponseStream();
// writes stream to zip file
FileStream writeStream = new FileStream(fullZipFileName, FileMode.Create, FileAccess.Write);
ReadWriteStream(responseStream, writeStream);
response.Close();
responseStream.Close();
The code for ReadWriteStream looks like this.
private void ReadWriteStream(Stream readStream, Stream writeStream)
{
// taken verbatim from http://www.developerfusion.com/code/4669/save-a-stream-to-a-file/
int Length = 256;
Byte[] buffer = new Byte[Length];
int bytesRead = readStream.Read(buffer, 0, Length);
// write the required bytes
while (bytesRead > 0)
{
writeStream.Write(buffer, 0, bytesRead);
bytesRead = readStream.Read(buffer, 0, Length);
}
readStream.Close();
writeStream.Close();
}
The building of the post string is taken from my previous forms app that works. I compared the resulting values in poststring for both sets of code (my working forms app and this one) and they're identical.
I'm not even sure how to troubleshoot this further. Anyone see anything obvious as to why this isn't working?
Conclusion 10/23/09
I finally have this working. A couple of important hurdles I had to get over. I had some problems with the ReadWriteStream method code that I got online. I don't know why, but it wasn't working for me. A guy named JB in Claudio Lassala's Virtual Brown Bag meeting helped me to come up with this code which worked much better for my purposes:
private void WriteResponseStreamToFile(Stream responseStreamToRead, string zipFileFullName)
{
// responseStreamToRead will contain a zip file, write it to a file in
// the target location at zipFileFullName
FileStream fileStreamToWrite = new FileStream(zipFileFullName, FileMode.Create);
int readByte = responseStreamToRead.ReadByte();
while (readByte != -1)
{
fileStreamToWrite.WriteByte((byte)readByte);
readByte = responseStreamToRead.ReadByte();
}
fileStreamToWrite.Flush();
fileStreamToWrite.Close();
}
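For comparison, the same "write the response stream to a file" step can be done with a buffered copy instead of byte-by-byte reads. The sketch below is in Python rather than C#, the function name and chunk size are arbitrary choices, and it makes no claim about why the earlier ReadWriteStream version produced an empty file.

```python
import shutil


def write_stream_to_file(read_stream, path, chunk_size=64 * 1024):
    """Copy a binary stream to a file in chunks and return the byte count."""
    with open(path, "wb") as out:
        # copyfileobj loops over read()/write() until the source is exhausted.
        shutil.copyfileobj(read_stream, out, chunk_size)
        return out.tell()
```

Returning the number of bytes written gives a cheap sanity check: a count of zero immediately signals that the response body was empty.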
As Will suggested below, I did have trouble with the authentication. The following code is what worked to get around that issue. A few comments inserted addressing key issues I ran into.
string targetHref = "http://wwwcf.nlm.nih.gov/umlslicense/kss/login.cfm";
HttpWebRequest firstRequest = (HttpWebRequest)WebRequest.Create(targetHref);
firstRequest.AllowAutoRedirect = false; // this is critical, without this, NLM redirects and the whole thing breaks
// firstRequest.Proxy = new WebProxy("127.0.0.1", 8888); // not needed for production, but this helped in order to debug the http traffic using Fiddler
firstRequest.Method = "POST";
firstRequest.ContentType = "application/x-www-form-urlencoded";
// build post string containing authentication information and add to post request
StringBuilder poststring = new StringBuilder("returnUrl=" + fixCharacters(targetDownloadFileUrl));
poststring.Append(getUsernameAndPasswordString());
poststring.Append("&login2.x=0&login2.y=0");
// convert to required byte array
byte[] postBytes = Encoding.UTF8.GetBytes(poststring.ToString());
firstRequest.ContentLength = postBytes.Length;
// write post to request
Stream postStream = firstRequest.GetRequestStream();
postStream.Write(postBytes, 0, postBytes.Length); // Fiddler shows that post and response happen on this line
postStream.Close();
// get response as stream
HttpWebResponse firstResponse = (HttpWebResponse)firstRequest.GetResponse();
// create new request for new location and cookies
HttpWebRequest secondRequest = (HttpWebRequest)WebRequest.Create(firstResponse.GetResponseHeader("location"));
secondRequest.AllowAutoRedirect = false;
secondRequest.Headers.Add(HttpRequestHeader.Cookie, firstResponse.GetResponseHeader("Set-Cookie"));
// get response to second request
HttpWebResponse secondResponse = (HttpWebResponse)secondRequest.GetResponse();
// write stream to zip file
Stream responseStreamToRead = secondResponse.GetResponseStream();
WriteResponseStreamToFile(responseStreamToRead, fullZipFileName);
responseStreamToRead.Close();
sl.logScriptActivity("Downloading update.");
firstResponse.Close();
I want to underscore that setting AllowAutoRedirect to false on the first HttpWebRequest instance was critical to the whole thing working. Fiddler showed two additional requests that occurred when this was not set, and it broke the rest of the script.
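The two-step flow (stop at the redirect, then issue a second request to the new location with the session cookies) can be summarized in a small Python sketch. The helper name and the cookie names in the usage note are hypothetical, and unlike the C# above, which forwards the raw Set-Cookie header, this version keeps only the name=value part of each cookie.

```python
import urllib.request
from urllib.parse import urljoin


def build_second_request(first_url, location, set_cookie_headers):
    """Build the follow-up request from a redirect response's headers."""
    # Location may be relative, so resolve it against the first URL.
    target = urljoin(first_url, location)
    # Keep only the name=value part of each Set-Cookie header.
    pairs = [header.split(";", 1)[0].strip() for header in set_cookie_headers]
    request = urllib.request.Request(target)
    if pairs:
        request.add_header("Cookie", "; ".join(pairs))
    return request
```

For example, a redirect to "/files/update.zip" with the made-up cookies "CFID=123; Path=/" and "CFTOKEN=abc; HttpOnly" would yield a request for that path on the same host carrying the header "Cookie: CFID=123; CFTOKEN=abc".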
You're trying to use UI controls to do something in a Windows service. This will never work.
What you need to do is just use the WebRequest and WebResponse classes to download the contents of the webpage.
var request = WebRequest.Create("http://www.google.com");
var response = request.GetResponse();
var stream = response.GetResponseStream();
You can dump the contents of the stream, parse the text looking for updates, and then construct a new request for the URL of the file you want to download. That response stream will then contain the file, which you can write to the filesystem, and so on.
Before you wonder, GetResponse will block until the response returns, and the stream will block as data is being received, so you don't need to worry about events firing when everything has been downloaded.
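The "parse the text looking for updates" step might look like the following. It is a Python sketch rather than C#, and it assumes the new versions show up as ordinary anchor tags whose href ends in .zip, which may not match the real page; the class and function names are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ZipLinkFinder(HTMLParser):
    """Collect absolute URLs of links that point at .zip files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".zip"):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def find_zip_links(html, base_url):
    """Return every .zip link found in the given HTML text."""
    finder = ZipLinkFinder(base_url)
    finder.feed(html)
    return finder.links
```

Each URL returned this way can then be used for the second request, whose response stream contains the file itself.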
You definitely need to re-think your approach (as you've already begun to do) to eliminate the Forms-based application approach. The service you're describing needs to operate with no UI at all.
I'm not familiar with the details of System.Net.WebClient, but since it "provides common methods for sending data to and receiving data from a resource identified by a URI," it will probably be your answer.
At first glance, WebClient.DownloadFile(...) or WebClient.DownloadFileAsync(...) will do what you need.
The only thing I can add is that once you have scraped your screen and have the fully qualified name of the file you want to download, you could pass it along to the Windows/DOS command 'get', which will fetch files via HTTP. You can also script a command-line FTP client if desired. It's been a long time since I tried something like this in Windows, but I think you're almost there. Once you have fetched the correct file, building a batch file to do everything else should be pretty easy.
If you are more comfortable with Unix, google "unix services for windows"; just keep an eye on the services they start running (DHCP, etc.). There are some nice utilities which will let you treat DOS as a Unix-like shell (ls -l, grep, etc.). Finally, you could try another language like Perl or Python, but I don't think that's the kind of advice you were looking for. :)