Billing by tag in Google Compute Engine - google-cloud-storage

Google Compute Engine allows for a daily export of a project's itemized bill to a storage bucket (.csv or .json). In the daily file I can see X-number of seconds of N1-Highmem-8 VM usage. Is there a mechanism for further identifying costs, such as per tag or instance group, when a project has many of the same resource type deployed for different functional operations?
As an example, Qty: 10 N1-Highmem-8 VMs are deployed to a region in a project. In the daily bill they just display as X seconds of N1-Highmem-8.
Functionally:
2 VMs might run a database 24x7
3 VMs might run a batch analytics operation averaging 2-5 hours each night
5 VMs might perform a batch operation which runs in sporadic 10-minute intervals through the day
The final operation writes data to a specific GCS bucket; the other operations read/write to different buckets.
How might costs be broken out across these four operations each day?

The usage logs do not provide per-tag granularity at this time, and they can be a little tricky to work with, but here is what I recommend to break them down further and get better information out of them.
Your usage logs provide the following fields:
Report Date
MeasurementId
Quantity
Unit
Resource URI
ResourceId
Location
If you look at the MeasurementId, you can filter by the machine type you want to verify. For example, VmimageN1Standard_1 represents an n1-standard-1 machine type.
You can then use the MeasurementId in combination with the Resource URI to find out what your usage is on a more granular (per-instance) scale. For example, the Resource URI for my test machine would be:
https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/ZONE/instances/boyan-test-instance
Note: I've replaced "MY_PROJECT" and "ZONE" here; those values, along with the instance name, will be specific to your output.
If you look at the end of the URI, you can clearly see which instance it refers to, so you can use it to look up the specific instance you're checking.
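For example, here is a minimal Python sketch (assuming the export is a CSV whose header row matches the field names listed above; the file name and machine-type filter are placeholders) that sums Quantity per instance:
import csv
from collections import defaultdict

MACHINE_TYPE = "VmimageN1Standard_1"  # placeholder: the machine type as it appears in MeasurementId

totals = defaultdict(float)
with open("usage_export.csv", newline="") as f:  # placeholder file name
    for row in csv.DictReader(f):
        if MACHINE_TYPE in row["MeasurementId"]:
            # the last path segment of the Resource URI is the instance name
            instance = row["Resource URI"].rsplit("/", 1)[-1]
            totals[instance] += float(row["Quantity"])

for instance, quantity in sorted(totals.items()):
    print(instance, quantity)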
If you are skilled with Excel or other spreadsheet/analysis software, you may be able to do even better; this is just one idea of how you could use the logs. At that point it becomes somewhat a question of creativity, and I am sure you can find good ways to work with the data you get from an export.

September 2017 update.
It is now possible to add user-defined labels and then track usage and billing by these labels for Compute Engine and GCS.
Additionally, by enabling the billing export to BigQuery, it is possible to create custom views, or to query BigQuery from a tool more friendly to finance people, such as Google Docs, Data Studio, or anything else that can connect to BigQuery. A good example is using labels across multiple projects to split costs into something friendlier to the organization, in this case a Data Studio report.
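As a rough sketch (the dataset and table names below are placeholders, and it assumes the standard billing export schema in which labels is a repeated key/value field), a per-label cost breakdown could be pulled from Python with the BigQuery client library:
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT l.value AS label_value, SUM(cost) AS total_cost
    FROM `my_project.billing.gcp_billing_export`,  -- placeholder table name
         UNNEST(labels) AS l
    WHERE l.key = 'function'                       -- placeholder label key
      AND usage_start_time >= TIMESTAMP('2017-09-01')
    GROUP BY label_value
    ORDER BY total_cost DESC
"""
for row in client.query(sql).result():
    print(row.label_value, row.total_cost)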

Related

Streaming data Complex Event Processing over files and a rather long period

My challenge:
We receive files every day with about 200,000 records. We keep the files for approximately 1 year to support re-processing, etc.
For the sake of the discussion, assume it is some sort of long-lasting fulfilment process, with a provisioning-ID that correlates records.
We need to identify flexible patterns in these files and trigger events.
Typical patterns are:
If record A is followed by record B, which is followed by record C, and all records occurred within 60 days, then trigger an event.
If record D or record E was found, but record F did NOT follow within 30 days, then trigger an event.
If both record D and record E were found (irrespective of the order), followed by ... within 24 hours, then trigger an event.
Some patterns require lookups in a DB/NoSQL store or joins for additional information, either to select the record or to put into the event.
"Selecting a record" can be a simple "field-A equals", but can also be "field-A in []", "field-A match ", or "func identify(field-A, field-B)".
"Days" might also be "hours" or "in the previous month", hence more flexible than just "days". Usually we have some date/timestamp in the record. The maximum is currently "within 6 months" (cancel within the setup phase).
The created events (preferably JSON) need to contain data from all records which were part of the selection process.
We need an approach that allows us to flexibly change (add, modify, delete) the patterns, optionally re-processing the input files.
Any thoughts on how to tackle the problem elegantly? Maybe some Python or Java framework, or do any of the public cloud solutions (AWS, GCP, Azure) address this problem space especially well?
Thanks a lot for your help.
After some discussions and reading, we'll first try Apache Flink with the FlinkCEP library. From the docs and blog entries it seems to be able to do the job. It also seems to be AWS's choice, running on their EMR clusters. We didn't find any managed service on GCP or Azure providing this functionality; of course we can always deploy and manage it ourselves. Unfortunately, we didn't find a Python framework.
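For illustration only, the first pattern (A followed by B followed by C, all within 60 days, correlated by provisioning-ID) could be expressed in plain Python roughly as follows; the record layout (provisioning_id, type, timestamp) is an assumption, not the real file format:
from collections import defaultdict
from datetime import datetime, timedelta
import json

def detect_a_then_b_then_c(records, window=timedelta(days=60)):
    # group records by the correlating provisioning-ID
    by_id = defaultdict(list)
    for rec in records:
        by_id[rec["provisioning_id"]].append(rec)

    events = []
    for pid, recs in by_id.items():
        recs.sort(key=lambda r: r["timestamp"])
        for i, a in enumerate(recs):
            if a["type"] != "A":
                continue
            b = next((r for r in recs[i + 1:] if r["type"] == "B"), None)
            if b is None:
                continue
            c = next((r for r in recs if r["type"] == "C" and r["timestamp"] > b["timestamp"]), None)
            if c is not None and c["timestamp"] - a["timestamp"] <= window:
                # the event carries data from all records that matched the pattern
                events.append(json.dumps({"pattern": "A-B-C", "provisioning_id": pid,
                                          "records": [a, b, c]}, default=str))
                break
    return events

sample = [
    {"provisioning_id": "p1", "type": "A", "timestamp": datetime(2019, 1, 1)},
    {"provisioning_id": "p1", "type": "B", "timestamp": datetime(2019, 1, 10)},
    {"provisioning_id": "p1", "type": "C", "timestamp": datetime(2019, 2, 5)},
]
print(detect_a_then_b_then_c(sample))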

Multiple users working remotely on Tally ERP9

I am not very sure whether this is the right forum to ask this question or not.
We have a Tally ERP9 server with multiple licenses. Three of our users now work remotely on the same data. We have set up Google Drive for data syncing, but most of the time it gives issues due to the synchronisation process.
What would be the best solution so that multiple users can work on the same data from remote locations?
This is the answer - http://mirror.tallysolutions.com/Downloads/TallyTips/GettingStartedwithDataSynchronisation.pdf
Thanks to #MitaleeRao...
Edit placed here for brevity:
These are 2 points I've noted regarding the Tally architecture:
The database is a flat file in a tree structure, and there are numerous checkpoints at each level for maintaining this inheritance (e.g., a voucher has inventory entries that have stock items, which have units, etc.).
The SOAP XML protocol that Tally uses does not have multi-threading capabilities - i.e., the Tally server will only accept one request and give a response at a time.
The data synchronisation that Tally has introduced is probably an automation of exporting the XML of all masters/vouchers and importing them into the central database (whether on the Tally.NET server or on a local computer with a static IP). I'm not sure how the Google Drive client works, but I'm assuming it is a variation of the same (i.e., XML-based data export and then import onto a main computer).

Best practices for parameterizing load of multiple CSV files in Data Factory

I am experimenting with Azure Data Factory to replace some other data-load solutions we currently have, and I'm struggling with finding the best way to organize and parameterize the pipelines to provide the scalability we need.
Our typical pattern is that we build an integration for a particular Platform. This "integration" is essentially the mapping and transform of fields from their data files (CSVs) into our Stage1 SQL database, and by the time the data lands in there, the data types should be set properly and the indexes set.
Within each Platform, we have Customers. Each Customer has their own set of data files that get processed in that Customer context -- within the scope of a Platform, all Customer files follow the same schema (or close to it), but they all get sent to us separately. If you looked at our incoming file store, it might look like (simplified, there are 20-30 source datasets per customer depending on platform):
Platform
  Customer A
    Employees.csv
    PayPeriods.csv
    etc.
  Customer B
    Employees.csv
    PayPeriods.csv
    etc.
Each customer lands in their own SQL schema. So after processing the above, I should have CustomerA.Employees and CustomerB.Employees tables. (This allows a little bit of schema drift between customers, which does happen on some platforms. We handle it later in our stage 2 ETL process.)
What I'm trying to figure out is:
What is the best way to set up ADF so I can effectively manage one set of mappings per platform, and automatically accommodate any new customers we add to that platform without having to change the pipeline/flow?
My current thinking is to have one pipeline per platform, and one data flow per file per platform. The pipeline has a variable, "schemaname", which is set using the path of the file that triggered it (e.g. "CustomerA"). Then, depending on the file name, a branching conditional fires the right data flow: e.g. if it's "employees.csv" it runs one data flow, if it's "payperiods.csv" it loads a different data flow. They would all use the same generic target sink datasource, with the table name parameterized and those parameters set in the pipeline using the schema variable and the file name from the conditional branch.
Are there any pitfalls to setting it up this way? Am I thinking about this correctly?
This sounds solid. Just be aware that if you define column-specific mappings with expressions that expect those columns to be present, you may get data flow execution failures if those columns are not present in your customer source files.
The way to protect against that in ADF Data Flow is to use column patterns, which allow you to define mappings that are generic and more flexible.
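As a side note on the parameterization itself (outside ADF), the schema/table derivation described in the question amounts to something like this Python sketch of the naming convention, using the simplified folder layout shown above:
from pathlib import PurePosixPath

def derive_target(blob_path):
    # e.g. "Platform/CustomerA/Employees.csv" -> parts ("Platform", "CustomerA", "Employees.csv")
    parts = PurePosixPath(blob_path).parts
    schema = parts[-2]                     # customer folder -> SQL schema
    table = PurePosixPath(parts[-1]).stem  # file name (without extension) -> table name
    return schema, table

print(derive_target("Platform/CustomerA/Employees.csv"))   # ('CustomerA', 'Employees')
print(derive_target("Platform/CustomerB/PayPeriods.csv"))  # ('CustomerB', 'PayPeriods')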

JMeter to record results on hourly basis

I have a JMeter project with multiple GET and POST requests and assertions for them. I use the Aggregate Results and View Results Tree listeners, but neither of these can store results on an hourly basis. I tried the JMeterPlugins-Standard and JMeterPlugins-Extras packages and the jp@gc - Graphs Generator listener, but all of them use aggregated data instead of hourly data. I would like to get the number of successful and failed requests/assertions per hour; maybe a bar chart would be most suitable for this purpose.
I'm going to suggest a non-conventional, design-level solution: name your samplers dynamically with the hour (or date and hour), so that each hour the name changes and the samples land in a different category.
The code for such a name is:
${__time(dd:hh,)} the rest of sampler name
Such a sampler will appear as a separate row per time slice in the Aggregate Report (here I simulated it with minutes/seconds, but the same will happen with days/hours, just on a larger scale).
Pros and cons of such an approach:
Simple: you can aggregate anything by hour, minute, or any other time slice while the test is running, rather than by analysis after execution.
Not listener-dependent; can be used with pretty much any listener or visualizer.
If you also want overall stats, you will have to sum up every sub-category. So it alters the data, but in a way that can still be added back to the original relatively easily.
Calculating __time before every sampler is not completely free from a performance perspective, but I don't think it will add visible overhead to a script.
You could get the same data by properly aggregating the JTL or CSV (whichever you use) after execution, so it doesn't give you anything that is impossible to achieve using standard methods (see the sketch after this list).
The script needs altering to make this happen; if you have hundreds of samplers, it's going to take a while. And if you want to change back...
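As a sketch of that post-processing route (assuming the default CSV-format JTL with a timeStamp column in epoch milliseconds and a success column), per-hour pass/fail counts could be computed like this:
import csv
from collections import Counter
from datetime import datetime

passed, failed = Counter(), Counter()
with open("results.jtl", newline="") as f:  # placeholder results file name
    for row in csv.DictReader(f):
        hour = datetime.fromtimestamp(int(row["timeStamp"]) / 1000).strftime("%Y-%m-%d %H:00")
        if row["success"].lower() == "true":
            passed[hour] += 1
        else:
            failed[hour] += 1

for hour in sorted(set(passed) | set(failed)):
    print(hour, "passed:", passed[hour], "failed:", failed[hour])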
You might want to use the Filter Results Tool, which has --start-offset and --end-offset parameters; with these you can "cut" your results file into "interesting" pieces and plot them according to your requirements.
You can install the Filter Results Tool using the JMeter Plugins Manager.
Also be aware that, according to JMeter Best Practices, you should:
Use as few Listeners as possible; if using the -l flag as above they can all be deleted or disabled.
Don't use "View Results Tree" or "View Results in Table" listeners during the load test, use them only during scripting phase to debug your scripts.
You can get whatever information you need from the .jtl results file; you can specify the test results location via the -l command-line argument.
To get summarized results per hour, add a Generate Summary Results listener to your test plan:
Generates a summary of the test run so far to the log file and/or standard output
Update the interval in jmeter.properties to your needs (1 hour = 3600 seconds):
summariser.interval=3600
You will get a summary of your requests per hour.
You can try the JMeter Backend Listener. It has integrations with Graphite and InfluxDB. After storing the results in one of these time-series databases, you can display them in a Grafana dashboard. Grafana has its own filtering for showing results on an hourly, daily, monthly basis, and so on.

Service invoked too many times: trigger

We are trying to implement a suite of spreadsheets that will handle budget figures for a set of stores. Everything works fine until we try to implement a spreadsheet that collects data from all store spreadsheets and presents statistics. Due to the limitation of ImportRange (a maximum of 50 uses per spreadsheet document), we have implemented a Google Apps Script instead to handle the importing of data. But now that we have made a copy of the document so there is one for each month, we are getting problems with our time triggers. We have set up a trigger to run the script once every minute, which results in an error message stating: Service invoked too many times: trigger.
What are the limitations here? And how do we best solve this?
We are also getting some other error messages and would like to know how to solve these;
Document tEHGO48zIBIFYRpb7Xhjwqg is missing (perhaps it was deleted?) (line 191)
Exceeded maximum execution time
Service error: Spreadsheets (line 290)
Where can we find documentation describing the different limitations and error messages?
Quota Limits for many services used with Google Apps Scripts have now been published on the Dashboard at:
https://docs.google.com/macros/dashboard
The same just happened to me. It seems there is an unpublished limit:
Premier accounts usually have larger quotas for every limitation.
The argument is that the account is better verified and less likely to exploit the resources.
But neither the regular limitations nor Premier's better quotas are published by Google, and it seems that Googlers can't say it here in the forums either. The only well-defined GAS limitation is the email quota, accessible through:
MailApp.getRemainingDailyQuota()
Which is 500 for regular accounts and 1500 for Premier.
Source: Google Support forums
Solutions are:
Join several scripts into one big trigger, in case there is a limit on the number of triggers
Optimize code (join loops, refresh only the necessary fields, etc.), in case the limit is based on CPU usage
Move minute-timer triggers to onEdit or onOpen triggers whenever possible
Get a Premier account
For your other errors, I haven't encountered anything similar. You should post some details about the script or publish some code so we can debug it.