Fetching 1-minute bars from Yahoo Finance - yahoo-api

I'm trying to download 1-minute historical stock prices from Yahoo Finance, both for the current day and previous ones.
Yahoo (just like Google) supports up to 15 days' worth of data via the following API query:
http://chartapi.finance.yahoo.com/instrument/1.0/AAPL/chartdata;type=quote;range=1d/csv
The thing is that data keeps on changing even when the markets are closed! Try refreshing every minute or so and some minute bars change, even from the beginning of the session.
Another interesting thing is that all of these queries return slightly different data for the same bars:
http://chartapi.finance.yahoo.com/instrument/2.0/AAPL/chartdata;type=quote;range=1d/csv
Replace that version number (the 2.0) with 100000 and it will still work, but return slightly different data.
Does anyone understand this?
Is there a modern YQL query that can fetch historical minute data instead of this API?
Thanks!

Historical minute data is not as easily accessible as we all would like. I have found that the most affordable way to gather intraday stock price data is to develop automated scripts that log price information whenever the markets are open.
Similar to the Yahoo data URL that you shared, Bloomberg maintains 1-day intraday price information in JSON format like this: https://www.bloomberg.com/markets/api/bulk-time-series/price/AAPL%3AUS?timeFrame=1_DAY
The URL convention appears easy to input on your own once you have a list of Ticker Symbols and an understanding of the consistent syntax.
To arrive at that URL initially though, without having any idea for guessing / reverse-engineering it, I simply went here https://www.bloomberg.com/quote/AAPL:US and used Developer Tools on my browser and tracked a background GET request which led me to that URL. I wouldn't be surprised if you could employ similar methods on other Price Data-related websites.
You can also write scripts to poll price data as fast as your internet connection allows. One Python package that I find pretty handy is ystockquote.
You can have it request price data every couple of seconds and log that into a daily time-series database.
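For illustration, here is a minimal sketch of that polling approach in Python. It assumes ystockquote's get_price() helper; the symbol, interval and output file are placeholders to adapt to your own setup:

import csv
import datetime
import time

import ystockquote  # pip install ystockquote

SYMBOL = "AAPL"            # placeholder ticker
INTERVAL_SECONDS = 60      # one sample per minute

with open(SYMBOL + "_intraday.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while True:
        now = datetime.datetime.now().isoformat(timespec="seconds")
        try:
            price = ystockquote.get_price(SYMBOL)  # returned as a string
            writer.writerow([now, price])
            f.flush()  # push each row to disk right away
        except Exception as exc:
            print(now, "fetch failed:", exc)
        time.sleep(INTERVAL_SECONDS)

Run it during market hours and you end up with your own 1-minute (or finer) series that no vendor can rewrite after the fact.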

Yes, there are other APIs.
I don't know if it can still help, but if you need intraday data there is an API on RapidAPI called Quotient which lets you pull intraday (1-minute level) and end-of-day market data for FX, crypto, stocks (US, Canadian, UK, Australian, European), ETFs and futures. It also provides earnings, dividends, splits and a lot of other information.

Related

How to deal with large ICS files?

My application (php/laravel, but irrelevant here) holds calendar entries for its users (comparable to a car logbook), and some users want to sync those events to their calendar app of choice. I started looking into the ics standard (RFC 5545 etc.) and created an endpoint that generates those files.
Problem: The files are getting huge. Some users have their entire driving history, with hundreds or thousands of entries, in the application; generating and transferring those MBs of ICS files will take ages (using php, anyway), let alone doing that every time the calendar app tries to sync.
Question(s): What is the preferred way of dealing with huge ICS documents? HTTP headers and caching is one thing, but how do other people solve this problem? Just send events of the last year? Is there a (pagination?) spec that I haven't found yet?
This is historical data, so it is not going to change. You could offer batches by time period and cache the historical batches; last year's data, or anything older than the last 4 weeks, never gets updated, for example. Users then do a one-off import of each historical batch into a separate 'driving history' calendar and either stop subscribing entirely, or subscribe only to, say, the last month.
One cannot import & subscribe into the same calendar, so it does mean they would have at least 2 calendars: 1 historical calendar used for imports, and 1 'current' calendar that updates with yesterday's ride. Of course there is then manual effort for anyone who always wants the old data: as events fall off the 'current' calendar, at some point they'd have to go and import the latest 'old' events.
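The app in the question is PHP/Laravel, but the filtering idea is language-agnostic; here is a rough Python sketch of serving only the recent window in the subscribed feed (the event fields and the in-memory list are made up for illustration; real code would query the database and emit the full set of required ICS properties):

from datetime import datetime, timedelta, timezone

events = [
    {"uid": "ride-1@example.com",
     "start": datetime(2024, 1, 5, 8, 0, tzinfo=timezone.utc),
     "summary": "Ride to Berlin"},
    # ... hundreds more rows from the logbook ...
]

cutoff = datetime.now(timezone.utc) - timedelta(weeks=4)
recent = [e for e in events if e["start"] >= cutoff]  # only the 'current' window

stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//logbook//EN"]
for e in recent:
    lines += [
        "BEGIN:VEVENT",
        "UID:" + e["uid"],
        "DTSTAMP:" + stamp,
        "DTSTART:" + e["start"].strftime("%Y%m%dT%H%M%SZ"),
        "SUMMARY:" + e["summary"],
        "END:VEVENT",
    ]
lines.append("END:VCALENDAR")
ics_feed = "\r\n".join(lines)  # small feed for the subscribed 'current' calendar

Everything older than the cutoff can be rendered once per batch, cached, and offered as a one-off download for import.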

How can I search for past sent emails with Sendgrid?

As Sendgrid's documentation makes clear, their web GUI activity page is only searchable for the past 7 days.
How do I search for activity from farther in the past?
Web API documentation is here, but I can't find anything about just plain searching for info on sent emails. All I see are endpoints for seeing particular categories of emails' various fates, like blocks, bounces, invalid emails, and "filters", which seem like actions and not like filters.
It's got to be possible to just find info about some particular sent email, right?
It's not possible. As you noted, the documentation clearly states that:
Email activity only shows the most recent 7 days. To access data in real time, we recommend that you consider implementing our Event Webhook.
If you want to record all the history associated with your account you should record and save it yourself. You can record all the emails you send provided you have an endpoint to do so. See here: https://sendgrid.com/docs/User_Guide/Settings/parse.html
Later Edit:
"real time" means "as it happens", it does not mean "history searchable at any point in time".
When you use an API, as a developer, the responsibility to log all API calls and responses lies with you. While it's true that bounces aren't necessarily reported in the API call response, the SendGrid API offers several ways in which you can be notified. Personal opinion: I know this functionality is often omitted in the MVP because you need to go to market as soon as possible, but an ELK stack is not that hard to set up.
There are several ways you can look for bounces and other events, as you can see here (a minimal event-webhook receiver is sketched after this list): https://sendgrid.com/docs/Classroom/Track/Bounces/bounce_reports_how_can_i_be_notified.html
Webhook for events: http://sendgrid.com/docs/API_Reference/Webhooks/event.html
Enabling Bounce Forwarding on your account
Bounce API: https://sendgrid.com/docs/API_Reference/Web_API_v3/bounces.html
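As a rough illustration of "record and save it yourself", here is a minimal Flask endpoint the Event Webhook could POST to. The route, log file and exact field handling are assumptions based on SendGrid's documented event payload, so adapt them to your setup:

import json
from flask import Flask, request

app = Flask(__name__)

@app.route("/sendgrid/events", methods=["POST"])
def sendgrid_events():
    events = request.get_json(force=True)  # SendGrid posts a JSON array of events
    with open("sendgrid_events.log", "a") as f:
        for event in events:
            f.write(json.dumps({
                "email": event.get("email"),
                "event": event.get("event"),          # delivered, bounce, open, ...
                "timestamp": event.get("timestamp"),  # unix time
            }) + "\n")
    return "", 200

Point the Event Webhook at that URL and you own the full history; "what happened to email Y on day X" then becomes a search against your own log (or ELK index) instead of a support ticket.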
If you really need to find out what happened on day X with email send Y, you can contact their Support team. They can probably look it up for you.
Personal opinion:
That 7 days is not a random number. I'm willing to bet that SendGrid does in fact log every call you make, but simply won't serve it back beyond that window. When you use the Facebook API, Twitter API, etc., you don't expect them to provide you with historical data of every API call you ever made; that is an ungodly amount of data. We're talking about an API that is used to send probably upwards of millions of emails per day, maybe even more. I believe they actually did the math and decided that recalling older historical data would put an unnecessary strain on the system and take a long time to answer such a request.
I'm sorry if I went on a bit of a rant but people often don't think about the volume of data needed to store such things and how much it would cost to search it.

How scalable is Parse? [closed]

I've been considering using Parse.com's service for my backend, but I'm skeptical about its scalability.
Can it really handle several thousand simultaneous users? If not, is there any good way of transitioning away from it?
I know the question may be old, but I wanted to provide my 2 cents for others out there who may be considering Parse...
Under the simplest of scenarios, parse may work well. As soon as you need to scale up to more complex queries, I have personally found nothing but headaches.
Queries are limited to 1000 records. Initially, you may think this is not an issue, until you start dealing with subqueries and realize weird data is returned because the subquery cuts records off without warning or error. (FYI, the default is 100 records unless you specify a limit of up to 1000, so the problem is even worse if you are not paying attention.)
For some strange reason there is a limit to the number of count queries you can issue per minute (and this limit appears to be really low). Be prepared to throttle your code so you don't hit this limit, otherwise errors are thrown.
Background Jobs do not run reliably. I have had a background job set to run every 5 min, and there are times it takes 20+ min before the job will kick in.
Lots of Timeouts. This is the one that gives me the most heartburn.
A. If you have a cloud function that takes a while to process, you have about 6 or 7 seconds to get it done or it will cut you off.
B. I get the feeling that there is a general instability with the system. Periodically, I run into issues which seems to last for about an hour or so where timeouts happen more frequently (and with relatively simple functions that should return immediately).
I fully regret my decision to use parse, and I am doing all I can to keep the app alive long enough for us to get funding, so we can move off the platform. If anyone has any better alternatives to parse, I am all ears.
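Since the 1000-record ceiling comes up repeatedly in these answers, here is a hedged sketch of the usual workaround against the classic Parse REST API: page with limit/skip until a short page comes back. The application keys and class name are placeholders, and note that every page still counts against the request limit:

import requests

HEADERS = {
    "X-Parse-Application-Id": "YOUR_APP_ID",   # placeholder credentials
    "X-Parse-REST-API-Key": "YOUR_REST_KEY",
}

def fetch_all(class_name, page_size=1000):
    results, skip = [], 0
    while True:
        resp = requests.get(
            "https://api.parse.com/1/classes/" + class_name,
            headers=HEADERS,
            params={"limit": page_size, "skip": skip},
        )
        resp.raise_for_status()
        batch = resp.json()["results"]
        results.extend(batch)
        if len(batch) < page_size:   # a short page means we've reached the end
            return results
        skip += page_size

rows = fetch_all("GameScore")        # hypothetical class name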
[Edit: after three amazing years with the team, I've decided to move on and am no longer a Parse or Facebook employee. The team is in great hands and has done amazing things. The entire backend has been rewritten to increase performance and reliability dramatically. The roadmap is amazing, and I expect great things to come from the team. At the time of my departure, Parse powered over 600,000 applications and served a mind boggling number of requests each day. Were each Parse push to be sent to a unique person, they could form the world's fourth largest country in one day. For future help with Parse, please either post questions here with the parse.com tag or post to the parse-developers Google group.]
Full disclosure: I'm a Parse engineer.
Parse already hosts thousands of apps, let alone users. When we exited beta in late March, we announced over 10,000 applications running on Parse with a 40% month-over-month growth rate. Parse is staffed by a world-class team, many with years of experience in big data and high volume traffic.
We welcome your traffic with open arms; you will be in the company of great teams like Band of the Day and Hipmunk. We are so confident in our services that we built our One Click Export system so people like you can try Parse risk free. If you feel Parse does not meet your performance expectations, we will gladly send you off with all of your data intact.
We chose Parse as the backend for our app.
Conclusion: DON'T.
Stability is a disaster, performance is a disaster too, and so is support (probably because they can't really help you because all the issues are non-reproducible).
Running even the simplest of functions can lead to random timeouts inside Parse (I am talking about simple PFUser login calls for instance):
Error: Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo=0x17e42480 {NSErrorFailingURLStringKey=https://api.parse.com/2/client_events, NSErrorFailingURLKey=https://api.parse.com/2/client_events, NSLocalizedDescription=The request timed out., NSUnderlyingError=0x17d10e60 "The request timed out."} (Code: 100, Version: 1.2.20)
We encounter timeouts on a daily basis, and this is with an app we are testing with 10 users max!
This is the typical one we get back all the time, at completely arbitrary moments and impossible to reproduce. Calling a Cloud Code function that does a few queries and a few inserts:
{"code":124,"message":"Request timed out"}
Try the same 10 minutes later and it runs in less than a second. Try again 20 minutes later and it takes 30 seconds to execute.
Because there is no transactionality it is really a lot of fun when storing for instance 3 objects in 1 Cloud Code function, where Parse decides to bail out of the function randomly after let's say having saved 2 of the 3 objects. Great to keep your database consistent.
The "best" ones we got where these. Mind you, this is the actual data coming back from a Cloud Code function:
{"code":107,"message":"Received an error with invalid JSON from Parse: <!DOCTYPE html>\n<html>\n<head>\n <title>We're sorry, but something went wrong (500)</title>\n <style type=\"text/css\">\n body { background-color: #fff; color: #666; text-align: center; font-family: arial, sans-serif; }\n div.dialog {\n width: 25em;\n padding: 0 4em;\n margin: 4em auto 0 auto;\n border: 1px solid #ccc;\n border-right-color: #999;\n border-bottom-color: #999;\n }\n h1 { font-size: 100%; color: #f00; line-height: 1.5em; }\n </style>\n</head>\n\n<body>\n <!-- This file lives in public/500.html -->\n <div class=\"dialog\">\n <h1>We're sorry, but something went wrong.</h1>\n <p>We've been notified about this issue and we'll take a look at it shortly.</p>\n </div>\n</body>\n</html>\n"}
The stuff I describe here is not something that happens once in a blue moon in our project. Except for the 500 errors (which I encountered twice in a month) all the others are seen on a daily basis.
So yes, it's very easy to get started with, but you must take into account that you are working on an unstable platform, so make sure you've got your retries and exponential backoff systems up and running, because you will need them!
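A generic retry-with-exponential-backoff wrapper is only a few lines; this Python sketch (names and delays are arbitrary) shows the shape of the safety net being recommended here:

import random
import time

def with_backoff(call, max_attempts=5, base_delay=0.5):
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise                      # give up after the last attempt
            # 0.5s, 1s, 2s, 4s ... plus jitter so clients don't retry in lockstep
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.2))

# usage: result = with_backoff(lambda: some_flaky_backend_call())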
What worries me the most is that I have no idea what would happen once 20,000 people start using my app on this backend.
edit:
Right now I have this when doing a PFUser login:
Error: Error Domain=PF_AFNetworkingErrorDomain Code=-1011 "Expected status code in (200-299), got 502" UserInfo=0x165ec090 {NSLocalizedRecoverySuggestion=<html><body><h1>502 Bad Gateway</h1>
The server returned an invalid or incomplete response.
</body></html>
, PF_AFNetworkingOperationFailingURLResponseErrorKey=<NSHTTPURLResponse: 0x16615c10> { URL: https://api.parse.com/2/get } { status code: 502, headers {
"Cache-Control" = "no-cache";
Connection = "keep-alive";
"Content-Length" = 107;
"Content-Type" = "text/html; charset=utf-8";
Date = "Mon, 08 Sep 2014 13:16:46 GMT";
Server = "nginx/1.6.0";
} }, NSErrorFailingURLKey=https://api.parse.com/2/get, NSLocalizedDescription=Expected status code in (200-299), got 502, PF_AFNetworkingOperationFailingURLRequestErrorKey=<NSMutableURLRequest: 0x166f68b0> { URL: https://api.parse.com/2/get }} (Code: 100, Version: 1.2.20)
Isn't it great?
If you're writing a small/simple app (or a throwaway prototype) with little to no logic on the backend, then go for it; but for something larger/scalable it's best to avoid it, and I can say that from first-hand experience. It all sounds good with their user management, push notifications, abstracted storage and what not, but in the end it's not worth the trouble. I was developing the backend for an app on Parse; the clients were very much into it because it sounded cool and promising (strong marketing, I guess), having been bought by Facebook and what not. But a few weeks into production, major issues/limitations with the platform started arising, and what should have been a simple app turned out to be a nightmare to develop and scale.
The result/conclusion of the project:
- broke the time window for a relatively simple app - it should have taken 2-3 months, it took almost a year and still isn't stable/reliable; with a custom stack it would have been done inside the time window for sure, because I made a similar demo project in 5-10 days with a custom Node stack
- lost the client's trust; they're now remaking the app with another team who'll use a custom stack
- lost loads of cash for breaking the time window and trying to make it work
- did so much overtime because of it that it started to affect my health
- I'm never again using a platform/solution that promises to have it all; I'm always going with a custom/proven stack
First there were the stability issues and constant failures of the platform, like server downtime and random errors, but they have all that sorted out (that was at the start to middle of 2014). The following problems remain, though:
you can't debug your code, at least for the time being (there are ways you could make it work with an additional node server and some obscure lib)
the limits are ridiculous: a "scalable" platform that can do 50-60 API requests per second (or more depending on your subscription), which may not sound low until you start doing load testing, and when you hit it your code will constantly fail
API calls are measured like this: calling a server function (Parse job) - 1 call; querying the database - 1 call; another query (because they don't have an advanced/complex query system in place - if you have a more complex database schema you'll realise very soon what I mean) - 1 call; if you need more than 1000 records, guess what - query again; etc. A count query (which you need to do as a separate query) is unreliable and tends to return an approximation for a few thousand entries
creating/saving ~1000+ simple objects is a strain on the platform/database, and deleting 1000 or more objects even more so; these operations are ridiculously fast on normal databases, but on Parse they tend to take 5-10 minutes (if you watch more closely it deletes 20 objects per batch)
no way to use most of the npm packages (only the pure JS ones by including the source directly)
if you go and read the Parse forums you'll see users downvoting/roasting the Parse team constantly for the platform's lack of features and for having to jump through hoops to implement arbitrary logic, like fetching random entries and similar stuff
they support Stripe integration, but if you want to use PayPal or some other payment service (we decided to use PayPal because it has vastly superior country support compared to Stripe) you can't make it work on Parse; for the PayPal integration I had to use a separate server to pull it off
there is no easy way to sync users and handle concurrency issues; you have to use hacks and some funny logic you wouldn't use, or admit to using, anywhere, ever
if you want 100+, let alone 1000+, simultaneous users, good luck pulling that off
when you want to find out the number of entries in a table, you can hit the limit on calling the count query, which is funny, undocumented and totally ridiculous, and in the end it returns an approximate number
modularity is foreign to the platform: the functions you call from your jobs can't last more than a couple of seconds (7 seconds I think), and once you take query time into consideration you're bound to hit that a lot with more complex queries and logic
you can have something like cron jobs, but they can't last more than 15 minutes (which, given the low performance of the platform and the multiple queries needed, is very, very short), they are limited to 2-3-4 simultaneous jobs depending on your subscription fee, and the scheduling system in place is very limited/poor (e.g. you can't edit it from your code, so you have to use hacks to run the same job at two exact times during the day or something similar, and it can't account for daylight saving time, etc.)
when you get an error on the server it can be totally misleading; check the forums for that, I can't remember anything off the top of my head
Push notifications are regularly late as much as 20-30 minutes
An arbitrary example: you want to fetch a random item from their database, your app makes the call to a job that'll provide it (1 API call), the job queries the database, but you have to make 2 calls, first to get the count of the items (1 API call) and then a second one to get a random item (1 API call), this is 3 API calls for that functionality, and with 60 requests per second, 20 users can make that call at a given time before hitting the request limit and the platform going haywire, after you include other users browsing through app screens and stuff, you see where this leads...
If it were any good, wouldn't Facebook, who bought it, ever mention using it for even some of their apps? I'd suggest 3 things:
- first - don't listen to the Parse guy; it's his platform so he has to promote it. Listen to people who have actually used it to build something
- second - if you need a serious and scalable platform and don't want to go fully custom, go for Amazon cloud services or something similar that's tested and reliable
- third - stay away from the platform if you have any server-side experience; if you don't, then go and hire a backend dev for the project - it will be cheaper and you'll get a working solution in the end
I have spent the day looking into parse.com and here is my current opinion based on what I've found (please bear in mind that I have only very brief experience of developing with the SDK as yet)...
Parse.com clearly has some very attractive positives which is why I found myself looking into it, but for the sake of debate I will concentrate on being critical as the great positives are all listed on their website. (Well done parse.com for attempting to solve such a great problem!)...
In the testimonials, Hipmunk is the biggest name I would say. It is listed as an app which uses the data portion of the SDK. Without approaching Hipmunk developers, I can't know for sure but I can't imagine them storing ALL their data in the parse.com cloud.
After trying and browsing most of the apps listed, none really stand out as being hugely dependent on a server back-end, so I find it impossible to get an idea of whether or not scalability has been solved using parse.com based on these.
The website states 40,000 apps and counting. I feel (but do not know) that based on the app gallery, this figure is based on the amount of apps in their user-base, and not real live production apps in the app-stores. The app gallery would feature far more big names if that many apps were using parse.com.
Parse.com is a very new concept, and very different even to its closest rivals. So without concrete evidence on how scalable and stable (and all the rest) it is, then it is very hard for a developer on a project to consider committing to it as there is too much at stake.
I ran tests for my own answer to a similar question and it can be VERY, VERY FAST. However, the results you get may depend on the details of your implementation...
The test compared the Android SDK to Android's native HTTP stack making Parse/REST calls...
Test Details:
Test environment - the newest Android version on a 10-month-old phone over a fast WiFi connection.
(Upload 63 pictures, average file size = 80K.)
Test 1, using the Android SDK. RESULT = slow performance
Test 2, using native REST calls on Android. RESULT = VERY FAST
--EDIT-- as there is interest here....
Regarding HTTP throughput, the Parse SDK (Android) and performance: it may be that parse.com has not optimized the way they implement Android's AsyncTask() in the parse.android SDK. How work that required 8 minutes via the Parse SDK could be done in 3 seconds on an optimized, DIY REST framework (see the links for details on the implementations), I really do not know. If Parse have not fixed their SDK implementation since these comparison tests ran, then you probably don't want their default SDK AsyncTask stuff doing anything approaching a real workload on the network.
The great attraction of Parse (and similar SaaS) is that you can save tens of thousands on back-end development costs. Given that the back-end is often the most expensive aspect of a web app, that headache is suddenly gone - poof.
The problem with Parse and most (all) SaaS is that the region, power, memory, bandwidth, scalability, thresholds, alerts and various actions are out of your control.
Same with Shopify. It's a great SaaS with comprehensive control over products, orders, inventory, and aesthetics -- but zero control over the machine. So today's SaaS is not a heck of a lot different from GoDaddy. They invariably oversell or max out their machines in order to make money, and you are stuck if you really care about ass-kicking performance. You cannot even buy that level of service.
I would like something AT LEAST as powerful and comprehensive as the AWS console. Most techies know and accept that Heroku and Parse are both hosted on AWS. Who cares. So charge more for the added service, but don't deny access to those critical low-level tools that make a Site and App and the user experience zing. Hint to those Parse employees.
At any rate, in answer to the question:
The Parse API is simple JSON. So you can pump out the data in the same JSON format that a Parse application expects.
You might even be able to utilize their PFObject (iOS). At some point, all of that high-level API goes down to a common HTTP request/response. The good thing about REST's generality is that everything is common off-the-shelf: HTTP, URLs, strings, UTF. No funky ORB here.
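To make the "it's just JSON over HTTP" point concrete, here is a hedged sketch of creating an object through the classic Parse REST API with nothing but an HTTP client; the keys and the class/field names are placeholders:

import requests

resp = requests.post(
    "https://api.parse.com/1/classes/GameScore",   # hypothetical class
    headers={
        "X-Parse-Application-Id": "YOUR_APP_ID",
        "X-Parse-REST-API-Key": "YOUR_REST_KEY",
        "Content-Type": "application/json",
    },
    json={"playerName": "Sean Plott", "score": 1337},
)
print(resp.json())  # e.g. {"createdAt": "...", "objectId": "..."}

Any backend that can produce and consume that shape of JSON can, in principle, stand in for (or receive an export from) a Parse application.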
Parse is great to start with, especially the helper functions/features around user management. But I started encountering issues...
Long execution/ping times, the 1000-object limit INCLUDING subqueries, no data centers in Europe (as far as I know).
It would have been a divine platform if they could sort out the performance and stability issues. I somewhat regret developing with it, but I've put 5000+ lines of code into it, so I'm going to stick with it.
Maybe they should separate their DEV apps and PROD apps environments, and only allow PROD apps after some kind of supervision, or create a different environment with only paying customers?
It's 2014; $20/month servers can handle unoptimized websites (60 uncached DB queries on the homepage) with 1 million visits/month. This shouldn't be that hard - come on, Parse!
It's ok for prototyping the apps, especially if the iOS/Android developer doesn't know how to build a DB/API backend himself.
It's not ok at all, when it comes to developing an application with a logic that requires queries more complex than:
SELECT * FROM 'db' WHERE 'column' = 'value' LIMIT 100;
Related queries and inner joins do not exist on Parse. And good luck updating/removing 320,000 records if you need to (that's the number I'm working with now).
The only thing that is really useful is handling Users through the SDK. If I could find good docs or even a tutorial on how to handle/create users from iOS/Android apps using Django and DRF/Tastypie, I would instantly convert everything being developed in our company to use that.
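For what it's worth, the user-creation half of that wish is fairly small in Django REST Framework. This is only a hedged sketch (the URL, field set and absence of token auth are all assumptions), not a full tutorial:

from django.contrib.auth.models import User
from rest_framework import generics, serializers

class UserSerializer(serializers.ModelSerializer):
    password = serializers.CharField(write_only=True)

    class Meta:
        model = User
        fields = ("id", "username", "password")

    def create(self, validated_data):
        # create_user() hashes the password instead of storing it raw
        return User.objects.create_user(**validated_data)

class UserCreateView(generics.CreateAPIView):
    queryset = User.objects.all()
    serializer_class = UserSerializer

# urls.py (hypothetical): path("api/users/", UserCreateView.as_view())

An iOS/Android app then just POSTs {"username": ..., "password": ...} to that endpoint; add token or session auth on top for the login side.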

Realtime backend platform for reporting / dashboards?

I will build a dashboard system for my apps, where a page will have several widgets that draw charts, tables and glyphs representing potentially unrelated data.
The client will be HTML5 and I can push for only modern web browser.
My big problem is what backend to use for this. I want to store "tables" for use in the charts and update the widgets in real time.
For example, an invoicing widget will show how much $$ has been collected today. The "table" will have a row for each invoice's total:
inv = 1; total = 50
Total: 50
and the widget will draw that. When new data is pushed:
inv = 2; total = 100
Total: 150
The widget will show in realtime the total to the end-user.
The data is private to the user's company. Eventually I will need to purge data that is too old (i.e. I only need to keep as much data as is necessary for the end-user to properly evaluate the information they need; for example, only keep 1 month of invoicing totals).
I'm thinking of using something like http://www.firebase.com/ or http://pusher.com/ but I suspect they only solve the "notify in realtime" part of the equation. As far as I understand, they don't let me get past data (i.e. if the data is updated over the weekend and the user opens his dashboard to see what happened).
Then I see http://derbyjs.com/ and the possibility to use mongodb.
I wonder which backend/platform would bring me closest to building this system. I have experience with python/django/.net/postgres but could accept using something else if it better suits this kind of app behavior.
Firebase offers both the "notify in realtime" part that you mention, as well as persistent data storage. Take a look at the tutorial, which walks you through building a real-time persisted chat app (the past chat messages are stored in Firebase and are sent back to the client every time you reload). And you can do much more complicated stuff like the real-time charts/widgets that you mention as well.
The big limitation with Firebase right now is that we're in closed beta and the data is currently unprotected (anybody can read and write your data). The security features are coming soon though.
Some other backend platforms you may want to evaluate are: Meteor and Simperium. Firebase and Simperium are cloud services where your data is stored in the cloud and you don't have to manage any servers of your own, while Meteor and DerbyJS are platforms that you have to install and run on your own server.
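As a hedged sketch of how the invoicing example could work on Firebase, the server writes each new total through Firebase's REST interface and every connected dashboard sees it immediately; the hostname and data layout below are placeholders:

import requests

FIREBASE_URL = "https://your-app.firebaseio.com"   # placeholder instance

def push_invoice(company_id, inv, total):
    # POST appends a new child under /invoices/<company>/today;
    # dashboard clients subscribed to that path update in real time
    resp = requests.post(
        FIREBASE_URL + "/invoices/" + company_id + "/today.json",
        json={"inv": inv, "total": total},
    )
    resp.raise_for_status()

push_invoice("acme", 2, 100)   # the widget recomputes the running total (150)

Purging old data then amounts to deleting (or archiving) the paths older than a month, which fits the retention requirement in the question.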
I would recommend SignalR. It's amazing and you can literally do anything with it. Check it out at www.signalr.net, and if you have any problems simply go to www.jabbr.net - you will find a very helpful community there. I implemented a notification mechanism similar to Facebook's, together with real-time monitoring and a small chat, in the same web site.

Third party data delivery of lots of data

Does anyone know how sites that have a real-time feed of a lot of data work? I am referring to something like a stock site, where they can tell you in real time (well, mostly with a 20-minute delay, but still "real time" as I understand it).
They have thousands of data pieces delivered to them every second, I would imagine: MSFT 25.00 +.23 VOL 12000 ???? for each stock that had a change during some interval.
So, is there just a constant feed of small pushes going on? Or do you think a site will pull from the place that has the real data and say "give me all changes since 12:23:45 CST to now" type query?
I ask this because at work we might have a situation where we need real-time information like this at our application's fingertips, and it won't make sense to hit our third-party provider over and over again every second...
Generally there is a server/client protocol defined between the 2 parties. In the company I work for the connection is maintained at all times.
Here is info on real time data feeds to go with your stock example
NYSE, NASDAQ
It is common for data providers to also have FTP sites with (delayed) batched data. One that comes to mind is the NWS EMWIN
Sites like Twitter feed data to certain approved sites in real-time via XMPP (Wiki link).
In the broadest terms, a push model is going to be the best way of achieving "real time" transfer, particularly if you're talking about a large amount of data.
However you do always have a problem when using a purely push model of how to recover from missed data.
Depending on the nature of your data that may not be a problem (thinking of video delivery as an analogue, where the amount of data is huge but there is sufficient redundancy for it to recover from missing data). And if you have any control over the data you may be able to build some redundancy in. For example, on every change event you can provide absolute values rather than changes, or previous value and new value.
I've done this by attempting to retrieve the stock quote from the live source, and falling back to a timestamped on-disk cache of the quote when the main source fails or times out.
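A minimal Python sketch of that fetch-with-fallback pattern (the quote URL and response shape are placeholders; the point is the cache refresh on success and the stale-but-timestamped fallback on failure):

import json
import time
import requests

CACHE_FILE = "quote_cache.json"

def get_quote(symbol):
    try:
        resp = requests.get("https://example.com/quote/" + symbol, timeout=2)
        resp.raise_for_status()
        quote = {"symbol": symbol,
                 "price": resp.json()["price"],   # assumed response field
                 "ts": time.time()}
        with open(CACHE_FILE, "w") as f:
            json.dump(quote, f)                   # refresh the cache on every success
        return quote
    except Exception:
        with open(CACHE_FILE) as f:
            return json.load(f)                   # stale but timestamped fallback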