Getting data from a website - MATLAB

The website constantly changes the data that it displays, and I want to fetch that data every few seconds and log it in a spreadsheet. The problem is that in order to reach the page, I need a cookie, which I get when I log in. Unfortunately I only know how to program in MATLAB. MATLAB has a function for this, urlread, but it doesn't deal with cookies. What can I do to get to that page? Can anyone help me with this? Please point me in a direction where a programming noob like me can succeed.

You could use wget to download content while using HTTP cookies. I will be using StackOverflow.com as an example target. Here are the steps to follow:
1) Obtain the wget command-line tool. On Mac or Linux, it is usually already available. On Windows, you can get it from the GnuWin32 project or from one of the many other ports (Cygwin, MinGW/MSYS, etc.).
2) Next we need to obtain an authenticated cookie by logging into the website in question. You can use your preferred browser for this.
In Internet Explorer, you can export cookies using "File menu > Import and Export > Export Cookies". In Firefox, I used the Cookie Exporter extension to export cookies to a text file. For Chrome, there should be similar extensions.
Obviously you only need to do this step once, as long as the cookies have not expired!
3) Once you have located the exported cookie file, we can use wget to fetch the web page while supplying it with this cookie. This can of course be performed from inside MATLAB using the SYSTEM function:
%# fetch the page and save it to disk (-O makes the saved filename explicit)
url = 'http://stackoverflow.com/';
cmd = ['wget --cookies=on --load-cookies=./cookies.txt -O index.html ' url];
system(cmd, '-echo');
%# process the page: here I simply view it using the embedded browser
web( ['file:///' strrep(fullfile(pwd,'index.html'),'\','/')] )
Parsing the web page is a whole other topic that I will not go into here. Once you have extracted the data you seek, you can interact with Excel spreadsheets using the XLSREAD and XLSWRITE functions.
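For the logging part, a minimal sketch (parsedValue stands in for whatever you extract from the page):
%# append one row (timestamp + a hypothetical parsed value) to the spreadsheet
parsedValue = 42;                   %# placeholder for the extracted data
newRow = {datestr(now), parsedValue};
if exist('log.xls','file')
    [~,~,raw] = xlsread('log.xls'); %# read the existing rows
    raw = [raw; newRow];
else
    raw = newRow;
end
xlswrite('log.xls', raw);           %# write everything back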
4) Finally, you can wrap all of this in a function and execute it at regular intervals using the TIMER function.
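For instance, a minimal sketch of step 4, where fetchAndLog is a hypothetical function wrapping the steps above:
%# run the hypothetical fetchAndLog function every 30 seconds
t = timer('ExecutionMode','fixedRate', ...  %# fire at a fixed rate
          'Period',30, ...                  %# seconds between executions
          'TimerFcn',@(obj,evt) fetchAndLog());
start(t)
%# later, to stop logging: stop(t); delete(t)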

Try using the java.net.* classes.
You should be able to use them directly in the MATLAB workspace, as described here: http://www.mathworks.co.uk/help/techdoc/matlab_external/f4863.html
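A rough sketch of that approach from inside MATLAB (the cookie name and value are assumptions; paste the session cookie copied from your browser):
%# open a connection that carries the authentication cookie
url = java.net.URL('http://stackoverflow.com/');
conn = url.openConnection();
conn.setRequestProperty('Cookie', 'sessionid=YOURCOOKIEVALUE');
%# read the entire response body into a MATLAB char array
scanner = java.util.Scanner(conn.getInputStream());
scanner.useDelimiter('\A');   %# \A matches the beginning of input, i.e. read everything
page = char(scanner.next());
scanner.close();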

MATLAB has built-in functions for web downloads. For HTTP sites, there are WEBREAD and WEBSAVE. For FTP, there is MGET.
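For example, a minimal sketch that sends a session cookie along with the request (the cookie value is an assumption; reuse the one exported from your browser):
%# pass the cookie as a request header via weboptions
opts = weboptions('HeaderFields', {'Cookie', 'sessionid=YOURCOOKIEVALUE'});
data = webread('http://stackoverflow.com/', opts);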

Related

Can I upload a file to onedrive via Windows 10 command line?

I need to upload a file to OneDrive via the command line. This will be done through a batch file which is distributed to end users.
From searching on Stack Overflow, I found questions like this one which say that you need to register an app and create an app password using Azure. I don't have the necessary permissions to do this in the organization where I work, nor can I do anything that requires an admin account. So I can't install any software - I have to use what comes with Windows 10. I can't use VBA either, as that's blocked.
I've managed to download files from OneDrive without anything like that, using the process described here:
Open the URL in a browser.
Open the developer tools using Ctrl+Shift+I.
Go to the Network tab.
Now click on download. Saving the file isn't required; we only need the network activity while the browser requests the file from the server.
A new entry will appear which looks like "download.aspx?…".
Right-click on that and choose Copy → Copy as cURL.
Paste the copied content directly into the terminal and append '--output file.extension' to save the content to file.extension, since the terminal isn't capable of showing binary data.
Example:
curl https://xyz.sharepoint.com/personal/someting/_layouts/15/download.aspx?UniqueId=cefb6082%2D696e%2D4f23%2D8c7a%2
…. some long text ….
cCtHR3NuTy82bWFtN1JBRXNlV2ZmekZOdWp3cFRsNTdJdjE2c2syZmxQamhGWnMwdkFBeXZlNWx2UkxDTkJic2hycGNGazVSTnJGUnY1Y1d0WjF5SDJMWHBqTjRmcUNUUWJxVnZYb1JjRG1WbEtjK0VIVWx2clBDQWNyZldid1R3PT08L1NQPg==;
cucg=1’ --compressed --output file.extension
I tried to do something similar after clicking 'upload' in the browser, but didn't find anything useful when trying to filter the requests.
I found these two questions, but there is no keyboard shortcut to upload, as far as I can tell. Also, the end user will be uploading a file to a folder I've shared with them from my OneDrive. Opening Chrome or Edge as a minimised window is fine, but I can't just shove a window in their face which automatically clicks on things - they won't like that.
It's just occurred to me that I might be able to use an Office application to Save As the file to the necessary OneDrive folder, where the keyboard shortcuts are pretty stable, but I have no idea how to achieve that via the command line.
The best and most secure way to accomplish this, I think, is going to be the REST API for OneDrive.
(Small Files <4MB)
https://learn.microsoft.com/en-us/onedrive/developer/rest-api/api/driveitem_put_content?view=odsp-graph-online
(Large files)
https://learn.microsoft.com/en-us/onedrive/developer/rest-api/api/driveitem_createuploadsession?view=odsp-graph-online
You still need an Azure AD app registration (which your admin should be able to configure for you) to provide API access to services in Azure. Coding against the API is going to be far easier and less complicated, not to mention more versatile.
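For the small-file case, a minimal sketch of the upload call using curl (which ships with recent Windows 10 builds, so nothing needs installing); the folder path, file name, and ACCESS_TOKEN are placeholders, and the token still has to come from that app registration:
curl -X PUT "https://graph.microsoft.com/v1.0/me/drive/root:/SharedFolder/report.xlsx:/content" ^
  -H "Authorization: Bearer ACCESS_TOKEN" ^
  -H "Content-Type: application/octet-stream" ^
  --data-binary "@report.xlsx"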

Searching inside JSONs in Chrome devtools

Is there a way to search inside all JSON objects from all available responses in the Network tab? Currently it works, but very unreliably. Sometimes, especially with smaller responses, it's fine, but when you have more assets, searching for e.g. a specific parameter value almost always fails. Do you know any smart solution to this issue? I've checked, and the first question associated with this is already a few years old and the Google devs still haven't responded.
Example: I have an object ID in a response body, but cannot find it by searching with Ctrl+F.
I think one way is to save all the responses to a file (manually, or automatically if possible using a browser extension).
Once you have stored all the responses in a file, you can parse the file and search inside it using a script or just a regex.
You can save the responses (as a HAR file) manually (I use Firefox) by right-clicking on a network response inside the developer console panel.
I found that it is the same for Chrome.
Look here:
https://developers.google.com/web/tools/chrome-devtools/network/reference
I didn't check whether there is a way to automatically store all the responses received by a browser. I'm not sure, but I think it isn't possible :/
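If you go the HAR route, a minimal sketch of that post-processing, here in MATLAB (any language with a JSON parser works; the file name and the search string are placeholders):
%# load an exported HAR file and search every response body
har = jsondecode(fileread('requests.har'));
entries = har.log.entries;     %# struct array, or cell array when entries differ in fields
if ~iscell(entries)
    entries = num2cell(entries);
end
for k = 1:numel(entries)
    c = entries{k}.response.content;
    if isfield(c, 'text') && ~isempty(strfind(c.text, 'objectIdYouSeek'))
        fprintf('Found in: %s\n', entries{k}.request.url);
    end
end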

Generate multiple sheets in one pdf file

I am trying to generate a PDF from a Tableau workbook which has two sheets, using the URL method:
E.g.: https://TableauServer/views/workbook/sheet1?:format=pdf&parameter=value
I am doing this in a program which issues the request to that URL. It works fine for one sheet. But how can I generate one PDF file with both sheets in it?
If you first put your two sheets into a single dashboard and then use the URL for the published dashboard (still using the format=pdf parameter), this should work just fine.
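E.g.: https://TableauServer/views/workbook/dashboard1?:format=pdf&parameter=value, where dashboard1 is a hypothetical dashboard containing both sheets.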
We know it's possible because within Tableau's own pages, if you download a PDF, it offers several formatting options, including the option to put all the worksheets in a workbook into a single PDF.
I couldn't find any documentation on it though. What I ended up doing was looking at the network console in the browser (usually F12) when I downloaded the PDF by clicking the Download button. That showed me the endpoint URL and the JSON body the server expected in the request payload.
The endpoint URL wasn't too cryptic and ended with "commands/tabsrv/pdf-export-server". The challenge was to take the JSON in the request payload and find the right settings to get it into a single PDF.
This is a more technical approach, but requires very little coding skill; any language that has functions for HTTP calls will work (I use Python).
If you don't mind doing it outside a browser, tabcmd has lots of functionality to control PDF generation at the command line.
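A hedged sketch of that route (server, credentials, and the view path are placeholders; --fullpdf asks for all the sheets of the workbook in a single PDF, and requires the workbook to be published with tabs shown):
tabcmd login -s https://TableauServer -u yourUser -p yourPassword
tabcmd export "workbook/sheet1" --fullpdf -f combined.pdf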

PowerShell method of downloading a file from a website with a changing URL?

I have been given a task that involves downloading a single file every day from a website. Let's call it "https://test.example.com". I have credentials that allow me to log in to the site, where a Flash interface then presents the files that are available for download. After the file is downloaded, it is then processed in a variety of ways. I have already put together the PowerShell that handles all that; I am just having a hard time automating the actual download of the file.
I used the Flash interface to download a few files while watching the network activity, and found that it is actually pulling the file from this URL:
https://test.example.com/link/EBDB7F67EF3B28XX99NCAD9920160423/file.zip
Therefore, I was able to put this together in order to automatically get the file via my PS script:
$url = 'https://test.example.com/link/EBDB7F67EF3B28XX99NCAD9920160423/file.zip'
$output = "C:\Downloads\file.zip"
Invoke-WebRequest -Uri $url -OutFile $output
However, the long string of numbers in the URL changes every day. The only discernible pattern I can find is that the last eight digits are always the date on which that particular file is posted.
Is there a good way to approach this? I've been experimenting with wildcards and patterns, as well as checking the HTML for elements that I can filter, but I am having a hard time finding the correct solution.
This is very hard to automate. You can't drive Flash from a script unless it is specifically designed for that. As I see it, your only options are:
Contact the site devs if possible; maybe they can give you details on the function that generates the link. This gives me an idea: perhaps you can reverse engineer the Flash code to find that function yourself. Use a Flash decompiler for this.
Simulate a user browsing the Flash site. This can be done in one of the following ways:
AutoHotkey - you can record mouse clicks relative to the browser window and replay the script. Unless the Flash interface is too dynamic and unpredictable, it will work.
Sikuli - another automation language, which relies on recognising image segments on the screen.
Both UI-automation methods produce fragile code, as they depend on browser settings (zoom, theme) and even OS settings. For this reason you will in all probability need to dedicate one machine to it (a virtual machine, of course). Decompiling the Flash code and re-implementing the URL-generating function in PowerShell will make it 100% reliable.
As somebody said in the comments, this is not really a PowerShell question but a browser-automation question.

Download google trends data in MATLAB

I am trying to download google trends data using MATLAB. However, when I run the following command, I am not able to download the data.
!"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" http://www.google.com/trends/trendsReport?q=MSFT&geo=US&content=1&export=1&graph=all_csv
However, when I paste the URL part into Google Chrome, it will download. How can I get this to work in MATLAB?
I'm assuming that you want the data in MATLAB. To do this, it is best not to make a system call to Google Chrome, but rather to use the tools integrated within MATLAB, such as websave.
So you'd want to do something like:
filename = websave('filename.csv', 'http://www.google.com/trends/trendsReport?q=msft&geo=us&cmpt=q&content=1&export=1')
That being said, this may be a little tricky because Google likely requires authentication to access that page, so you would need to use the weboptions input to provide the necessary credentials.
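For example, a sketch along those lines (whether this endpoint accepts basic authentication is an assumption; Google may require a browser-based login instead):
%# attach credentials to the request via weboptions
opts = weboptions('Username','you@gmail.com', 'Password','yourpassword');
filename = websave('trends.csv', ...
    'http://www.google.com/trends/trendsReport?q=msft&geo=us&cmpt=q&content=1&export=1', opts);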