Is there a service or software for metering data downloads?

We have a range of web applications here that allow users to download selected data from a number of databases and online services, mainly environmental information. We can track users visiting web pages using tools like Piwik or Google Analytics. We also want to track the amount of resource or data that they use, and possibly apply limits to record downloads.
If this were a single-database system we could track rows delivered within the database. However, here we have an SOA with a range of sources and sinks. What I envisage is a service that other systems can message to register or track the amount of a resource used,
e.g. "User Andrew was sent 125 MB of water quality data."
The central data metering service tracks usage messages from a variety of sources, produces reports and, where appropriate, applies caps or billing limits.
This service might be expanded to include processing as well as data download.
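For concreteness, the kind of call I imagine the other systems making - the endpoint, field names and service here are entirely hypothetical:

```python
# Purely illustrative - the metering endpoint and field names are hypothetical;
# this is just the shape of the usage message I have in mind.
import requests

requests.post(
    "https://metering.example.org/api/usage",
    json={
        "user": "andrew",
        "resource": "water_quality_data",
        "bytes": 125 * 1024 * 1024,  # "User Andrew was sent 125 MB of water quality data"
        "source": "water-quality-download-service",
    },
    timeout=5,
)
```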
I would consider this to be a not unusual requirement but I can't find much in the way of existing software for it - perhaps because I am not using the correct terminology.
So my questions:
What would you call this service - what keywords will lead me to existing systems?
What solutions already exist in this area - in particular FOSS or cloud based systems?
Could something like Google Analytics be persuaded to operate in this fashion?

It would be possible to do this with the Measurement Protocol from Google Universal Analytics, in conjunction with the User ID feature in Analytics and one or more custom dimensions.
The Measurement Protocol is a language-agnostic, vaguely REST-like protocol (insofar as you send a bunch of parameters to an endpoint) for sending tracking data to the Google servers.
User ID is a feature for recognizing authenticated users across devices and multiple visits.
If the various parts of your setup send HTTP calls built to the Measurement Protocol and include the User ID to recognize the user, a value for a custom dimension for the file size (or rather a custom metric, if you want sums and averages), and perhaps a custom dimension for the file name, you can send this to your Analytics account and build a custom report for downloads.
Note that the User ID is an internal ID used to link together visits by the same user from multiple devices - it does not show up in the reports, so it will not let you report on individual users in the Analytics interface (if you want that, you need to include another ID as a custom dimension, and you have to check with the Google TOS what kind of ID is allowed). You would also need a dedicated data view in GA for sessions with a User ID, which will not show unauthenticated users.
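A rough sketch of such a hit, in Python here, though any language that can make an HTTP request will do. The property ID, the event names and the custom metric/dimension indexes are placeholders for whatever you configure in your own Analytics account:

```python
# Rough sketch of a Measurement Protocol "download" hit (Universal Analytics).
# Property ID, event names and the custom metric/dimension indexes below are
# placeholders for whatever is configured in your own Analytics account.
import requests

MP_ENDPOINT = "https://www.google-analytics.com/collect"

def track_download(user_id, file_name, size_mb):
    payload = {
        "v": "1",               # Measurement Protocol version
        "tid": "UA-XXXXXXX-Y",  # your tracking/property ID
        "cid": "555",           # anonymous client ID (still required alongside uid)
        "uid": user_id,         # User ID of the authenticated user
        "t": "event",           # hit type
        "ec": "data-download",  # event category
        "ea": "delivered",      # event action
        "el": file_name,        # event label
        "cm1": size_mb,         # custom metric 1: megabytes delivered
        "cd1": file_name,       # custom dimension 1: file name
    }
    requests.post(MP_ENDPOINT, data=payload, timeout=5)

# "User Andrew was sent 125 MB of water quality data"
track_download("andrew", "water_quality_export.csv", 125)
```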

Related

Flutter – question about architecture, providers and fetching data from server

I am a rather fresh Flutter programmer so please excuse any flaws in the questions below…
I am struggling with a structural/architectural dilemma. Here is the background:
App rationale:
my app allows its users to check small jobs available in their area and, if they have time and are in a suitable location, to carry out the job for remuneration,
the app uses a standard REST API (not Firebase), so the server cannot be relied on to send status-change notifications that would trigger re-fetching of data,
the critical elements are (1) an up-to-date list of jobs for a given address - another user may have already taken on a job at that address (timed refresh of the list, e.g. every 5 minutes) - and (2) keeping track of the user's location and asking the server for jobs if the user relocates by more than 2 km in less than the refresh interval,
The challenge:
I guess that at a basic level the app should have the following providers: (1) auth - providing the authToken, (2) geolocation - regularly checking the user's location, (3) jobList - for a particular location (fetches high-level job descriptions and addresses), (4) jobDetails - fetches exact instructions for carrying out a particular job,
as you can see, (2) geolocation and (3) jobList need to refresh programmatically (at an interval or on some change of geolocation), while (1) auth and (4) jobDetails are triggered by the user.
The Big Question ;) is … what is the proper architecture for the above type of app? More specifically:
should I use services for connecting to the server API, which would in turn be used by the providers?
how do I ensure a programmatic re-fetch of jobList on a timer and on a relocation event from geolocation?
how do I continually listen to location changes to detect a relocation without overwhelming the app with processing?
should I store the (quickly outdating) jobList data just in its object class, or should I use a settings provider or a local db, or maybe there is an easy way of storing the latest JSON response so I don't have to build the settings provider or db mapping?
in all my calls to the Auth API I need to provide the deviceId - how do I make it available across the app? It is pretty static but is needed in authentication, so should checking it be part of the auth provider?
If you could comment on the above or suggest a source of relevant examples I would be really grateful.
Thanks and cheers!
Here are my thoughts:
how do I continually listen to location changes to detect a relocation without overwhelming the app with processing?
You can rely on a third-party package to do this for you, such as geolocator. With it, you can specify the distance the user must have moved before the package notifies you of the change in location.
should I store the (quickly outdating) jobList data just in its object class or ...
Since a job-listing app is likely to use this data often and in various places, I would prefer to use a local db. It would also be helpful in the long run if you plan to do some analytics on the mobile end or gather other insights.
in all my calls to the Auth API I need to provide the deviceId - how do I make it available across the app ...
When your app is initialized, you could fetch the deviceId and store it in shared_preferences. Then, in the auth API calls, you could retrieve it just before making the request.
should I use services for connecting to the server API, which would in turn be used by the providers?
As for geolocation, geolocator can notify you about changes in location, and you could make an API call based on that.
However, if you plan to use a timer-based approach to refresh your job listing, you must realize that your users are likely to face issues arising from inconsistent data. If you have plans to tackle that, this implementation might help. But I strongly feel that a server supporting push notifications, or perhaps a WebSocket approach, would be ideal here.
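The refresh rule itself (a timed refresh combined with the 2 km relocation trigger from the question) is framework-agnostic; here is a rough sketch of that logic in Python, just to show its shape - in the app it would live in the jobList provider and be driven by geolocator's position stream and a timer:

```python
# Framework-agnostic sketch of the refresh rule: re-fetch the job list when the
# timer has expired OR the user has moved more than 2 km since the last fetch.
# (In the Flutter app this would sit in the jobList provider; names are illustrative.)
import math

REFRESH_INTERVAL_S = 5 * 60   # timed refresh: every 5 minutes
MOVE_THRESHOLD_KM = 2.0       # location-triggered refresh: moved more than 2 km

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    r = 6371.0
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def should_refresh(last_fetch_time, last_fetch_pos, now, current_pos):
    """True when the timer has expired or the user relocated beyond the threshold."""
    if now - last_fetch_time >= REFRESH_INTERVAL_S:
        return True
    lat1, lon1 = last_fetch_pos
    lat2, lon2 = current_pos
    return haversine_km(lat1, lon1, lat2, lon2) > MOVE_THRESHOLD_KM
```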

Dropbox app with tiered users

Preface:
I'm hoping to upgrade an existing application by adding cloud backup and syncing of the customer's data. We want this to be as seamless as possible, but also for the customer's only interface to the data to be the application's front end.
Our application can be connected to the oil pipe of a machine and collects data on the oil condition. When a test has completed, we want to push this to the cloud. Because of the distinct per-test nature of the data (as opposed to one big trend), most IoT platforms don't suit it very well, so we're aiming to release a slightly modified version of the application that doesn't have the connection to the sensors; this will be our remote front end.
Since the existing application uses a relatively simple file structure to store its data, if we simply replicate these files in the cloud, the remote front-end version can just download them to the same location and it will work fine. This has led us to Dropbox (or any more appropriate cloud storage system you might recommend).
We hope to use the Dropbox API directly in our application to push and pull the files as necessary. All of this, we believe, is perfectly achievable.
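For reference, the push/pull we have in mind looks roughly like this with the Dropbox Python SDK - sketched in Python purely to show the calls; the access token handling and the per-customer folder layout are assumptions on our part:

```python
# Rough sketch of the push/pull using the Dropbox Python SDK ("pip install dropbox").
# The access token and per-customer folder layout are placeholders/assumptions.
import dropbox

dbx = dropbox.Dropbox("VENDOR_ACCOUNT_ACCESS_TOKEN")

def push_test_result(local_path, customer_id, file_name):
    """Upload a completed test file into the customer's folder in the vendor account."""
    with open(local_path, "rb") as f:
        dbx.files_upload(
            f.read(),
            f"/customers/{customer_id}/{file_name}",
            mode=dropbox.files.WriteMode("overwrite"),
        )

def pull_test_result(local_path, customer_id, file_name):
    """Download the same file on the remote front-end version of the application."""
    dbx.files_download_to_file(local_path, f"/customers/{customer_id}/{file_name}")
```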
Question: Is it possible - and if so, how would we go about it - to set up a user system with the requirements below?
The user's personal Dropbox is not used
Dropbox is completely hidden from the user
The application vendor has a top-level user who has access to all data (for analytics; we do not want to store confidential or sensitive data).
When users log in they only have access to their own folder, and an attacker could not disrupt the overall structure. (We understand that if an attacker got the master account then all is lost, but keeping that secure is an internal issue. As long as the user accounts are isolated, this is okay.)
Alternative question: Is anyone aware of a storage or IoT system that would better suit this use case? We will still require backups/loss prevention as part of the service.

Calendar events from the entire Lotus Notes system

I'm trying to fetch all events from the entire system using the REST API, to synchronize them with my own application. I have been extracting events for every user via the REST API on their own calendar file.
For example:
Fetch johndoe.nsf/api/calendar/events
Fetch jasonmartin.nsf/api/calendar/events
Fetch jeanmoore.nsf/api/calendar/events
etc.
This works with a low number of users, but I need to do it for around 2,500 users, which kills my system.
Is there any central database from which I can extract this data?
I tried this with the resource reservation database, but all I got was an empty response.
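For reference, the per-user loop I am running now looks roughly like this (host, credentials and the list of mail files are placeholders; the exact response shape may differ by Domino version):

```python
# Rough sketch of the current per-user fetching via the Domino calendar REST API.
# Host, credentials and the mail-file list are placeholders; in reality the list
# holds around 2,500 entries, which is what kills the system.
import requests

DOMINO_HOST = "https://domino.example.com/mail"
MAIL_FILES = ["johndoe.nsf", "jasonmartin.nsf", "jeanmoore.nsf"]  # ... etc.

session = requests.Session()
session.auth = ("api_user", "secret")  # placeholder credentials

all_events = []
for nsf in MAIL_FILES:
    resp = session.get(f"{DOMINO_HOST}/{nsf}/api/calendar/events", timeout=30)
    resp.raise_for_status()
    all_events.extend(resp.json().get("events", []))
```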
No, there is no central database of calendar events. There couldn't be: Notes and Domino is a distributed environment, and the information can be spread over dozens of servers.
But you could write a Java or C application that runs on the Domino server and aggregates the information from all the users' calendars into one central database, and that application would probably run faster than your remote calls through the REST API. You would still have to make REST API calls into that central database, though, and the sum of the activity would be greater than what you are dealing with now.
Maybe my iCal freeware tool could help you
http://abdata.ch/publish-ibm-domino-calendar-entries-in-icalendar-format/

Need advice: How to share a potentially large report with remote users?

I am asking for advice on possibly better solutions for the part of the project I'm working on. I'll first give some background and then my current thoughts.
Background
Our clients can use my company's products to generate potentially large data sets for use in their industry. When the data sets are generated, the clients will file a processing request to us.
We want to send the clients a summary email which contains some statistical charts as well as sampling points from the data sets so they can do some initial quality control work. If the data sets are of bad quality, they don't need to file any request.
One problem is that the charts and sampling points can be too large to send in an email. The charts and sampling points we want to include in the emails are pictures. Although we can use a lossy format such as JPEG to save space, we cannot control how many data sets will be included in the summary email, so the total size could still exceed the normal email size limit.
In terms of technologies, we are mainly developing in Python on Ubuntu 14.04.
Goals of the Solution
In general, we want to present a report-like document to the clients so they can do some initial QA. The report may contain external links but does not need to be very interactive. In other words, a static report should be fine.
We want to minimize the steps our clients must take to read the report. For example, if the report can be just an email, the user only needs to 1) log in and 2) open the email. If they use client software, they may skip 1) and just open it and begin to read.
We also want to minimize the burden of maintaining extra user accounts, for both us and our clients. For example, if a solution requires us to register a new user account, it is still acceptable but not ranked very high.
Security is important because our clients don't want their reports to be read by unauthorized third parties.
We want the process automated. The solution should provide a programming interface so that we can automate the report sending/sharing process.
Performance is NOT a critical issue. Our user base is not large - at most in the hundreds, I think. They also don't generate data that frequently, at most once a week. We don't need a real-time response; even a delay of a few hours is acceptable.
My Current Thoughts on a Solution
Possible solution #1: In-house web service. I could set up a server machine and develop our own web service. We would put the report into our database and the clients could then query it over the Internet.
Possible solution #2: Amazon Web Services. AWS is quite mature, but I'm not sure whether it would be expensive, because so far we just want to share a report with our remote clients, which doesn't seem like a big enough job to warrant AWS.
Possible solution #3: Google Drive. I know Google Drive provides an API for uploading and sharing programmatically, but I think we would need to register a dedicated Google account to use it.
Any better solutions?
You could possibly use AWS S3 and CloudFront. Files can easily be loaded into S3 using the AWS SDKs and API. You can then use the API to generate secure links to the files that can only be opened for a specific time and, optionally, from a specific IP.
Files on S3 can also be cleaned up automatically after a specific time, if needed, using lifecycle rules.
Storage and transfer prices are fairly cheap with AWS, and remember that the S3 storage cost quoted is per month, so if you only keep an object stored for a few days, you only pay for those few days.
S3: http://aws.amazon.com/s3/pricing
CloudFront: https://aws.amazon.com/cloudfront/pricing/
Here's a list of the SDKs for AWS:
https://aws.amazon.com/tools/#sdk
Or you can use their command-line tools for Windows batch or PowerShell scripting:
https://aws.amazon.com/tools/#cli
Here's some info on how the private content URLs are created:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
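Since the question mentions Python, here is a rough boto3 sketch of the S3 part - upload the report and hand back a time-limited link; the bucket name, key and expiry are placeholders, and CloudFront signed URLs are a separate, slightly more involved mechanism (see the link above):

```python
# Rough sketch: upload a report to S3 and create a time-limited download link.
# Bucket name, key and expiry are placeholders; credentials come from the usual
# boto3 configuration (environment variables, ~/.aws/credentials, or an IAM role).
import boto3

s3 = boto3.client("s3")

def share_report(local_path, bucket, key, expires_s=3 * 24 * 3600):
    """Upload the report and return a pre-signed URL valid for `expires_s` seconds."""
    s3.upload_file(local_path, bucket, key)
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_s,
    )

url = share_report("summary_report.pdf", "example-client-reports", "acme/2015-06-report.pdf")
# Email `url` to the client; the link stops working after three days.
```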
I would suggest building this service using a mix of your #1 and #2 options. You can do the processing yourselves and, for transferring the data, leverage AWS S3, which is quite cheap.
Example: 100 GB costs approximately $3.
AWS S3 will also be beneficial because you are covered against any disaster in your local environment; your data will be safe in S3.
For security, you can leverage data encryption and signed URLs in AWS S3.

How to balance a REST API and openness to prevent data stealing

One of our web sites is a common "advertise your apartment for free" site. Revenues are directly tied to the amount of public usage and the number of announcements registered (our marketing department's argument).
On the other side, REST pushes you to maintain a clear API when designing your site (our software department's argument), which is an open invitation for any competitor to steal data. In this view, the web server becomes almost an intelligent database.
We have clearly identified our problem, but have no idea how to resolve these constraints. Any tips would help.
Throttle the calls to the data-rich elements by IP to, say, 1,000 per day (or triple what a normal user would use); a rough sketch of such a throttle is at the end of this answer.
If you expose data, then it can be stolen. Also think about search elements that return large data sets, even when they are triggered by JavaScript or forms - I personally have written trawlers that circumvent such measures.
You may also think (if the data is that important) about decrypting it in the client based on keys and authentication sent from the server (but this only raises the bar; it does not remove the ability to steal).
Add a CAPTCHA/reCAPTCHA for users who are scanning too quickly or too much.
In short:
As always, only expose the minimum API needed to do the job (attack-surface minimisation)
Log and throttle
Force sign-in(?) - this at least MAY put off some scanners
Use a CAPTCHA mechanism for users you think may be bots trawling your data
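As promised above, a rough, framework-agnostic sketch of the per-IP daily throttle - the 1,000-calls-per-day limit is the example figure from above, and the in-memory counter is illustrative (a real deployment would usually keep the counters in something shared, such as Redis):

```python
# Rough sketch of a per-IP daily throttle. The limit and in-memory store are
# illustrative; a shared store (e.g. Redis) would be needed across several servers.
from collections import defaultdict
from datetime import date

class DailyIpThrottle:
    def __init__(self, limit=1000):
        self.limit = limit
        self.day = date.today()
        self.counts = defaultdict(int)

    def allow(self, ip):
        """Count this call and return True while the IP is still under today's quota."""
        today = date.today()
        if today != self.day:      # a new day: reset all counters
            self.day = today
            self.counts.clear()
        self.counts[ip] += 1
        return self.counts[ip] <= self.limit

# In a request handler for the data-rich endpoints:
throttle = DailyIpThrottle()
if not throttle.allow("203.0.113.42"):
    pass  # respond with HTTP 429 Too Many Requests instead of the data
```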