String Cleaning Function Producing Unexpected Output - data-cleaning

I am looking to clean the profile column of the following dataframe:
    name      profile
6   Pedro     ["\n Design ...
7   Leonardo  ["\n Design ...
8   Daniel    ["\n JavaScript ...
9   Mario     ["\n JavaScript ...
10  Christi   ["\n Design ...
I've tested the following function on individual rows...
def clean_profile(row):
    for index, row in new_df2["profile"].items():
        str_row = str(row)
        clean_row = (
            '""'.join(str_row)
            .replace(",", "")
            .replace('""', "")
            .replace("\\n ", "")
            .replace(" ", "")
        )
    return clean_row
...and found it to transform this str:
'["\\n Design ","\\n Design "]'
to this cleaned string:
'["Design","Design"]'
(the extra replace methods are necessary to clean really messy strings like this one:)
'{"Tools "" Google Analytics ":null," Google Adsense ":null," MailChimp ":null," Google Adwords ","Containers "" Docker ","Digital "" SEO ":null," Email Marketing ":null," Article Writing ":null," Market Research ":null," Social Media ":null," Inbound Marketing ","*Nix "" Ubuntu ":null," Linux ","Java "" Java ","Python "" Django ":null," Python ":null," Flask ","Databases "" MySQL Management ":null," MongoDB Management ":null," PostgreSQL Management ","Visual "" Brand Design ":null," Graphic Design ":null," Logo Design ","HTML "" HTML ","Version Control "" Git ","PHP "" Laravel ":null," Wordpress ":null," PHP ":null," Symfony ","Mobile "" React Native ","Ruby "" Ruby ":null," Sinatra ":null," Rails ","Project Management "" Agile Methodology ":null," Client Management ":null," Scrum ","English "" Written English ":null," Spoken English ","Configuration Management "" Chef ","Webserver "" Nginx ":null," Apache ","CDN "" AWS CloudFront ":null," Cloudflare ","Other "" C++ ","Experience "" Creative Direction ":null," UI/UX Design ":null," Wireframing ","JavaScript "" JavaScript ":null," TypeScript ":null," Redux ":null," Angular JS ":null," Angular ":null," D3.js ":null," Node.js ":null," React ":null," Flux ":null," Express ","CSS "" SASS ":null," LESS ":null," CSS ","Hosting "" Heroku ":null," Digital Ocean ":null," AWS ","Automated Testing "" TDD ":null," Automated Testing ":null," BDD ":null," Jest ","Traditional "" Outbound Marketing ":null," Brand Strategy ","Data Science "" Data Science ":null," Data Analysis ":null," Machine Learning ":null," Data Visualization ":null," R ":null," Statistics "}'
When I loop through all the rows of the dataframe, I get either this, repeated for every row:
["JavaScriptDevOpsPHPJavaScriptDevOpsPHP"]
or this:
<function clean_profile at 0x0845CB20>
I've tried a few different things and nothing has worked. Is anyone able to explain what's going on here, and maybe suggest a better way of cleaning these strings?
Thank you!

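It looks like the function ignores the row argument it receives and instead loops over the whole new_df2["profile"] column, overwriting clean_row on every iteration, so it always returns whatever string you cleaned last. When you apply it to the dataframe, every row therefore gets that same last value. (The <function clean_profile at 0x...> output most likely means the function object itself was assigned or printed without being called.) Let the function clean just the value it is given and let pandas do the looping:
def clean_profile(row):
    # Clean only the value pandas passes in; no loop over new_df2 in here
    str_row = str(row)
    clean_row = (
        '""'.join(str_row)
        .replace(",", "")
        .replace('""', "")
        .replace("\\n ", "")
        .replace(" ", "")
    )
    return clean_row

new_df2["profile"] = new_df2["profile"].apply(clean_profile)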
Additionally, I'd look into str.strip (or pandas' Series.str.strip) for your string cleaning. Here is a good example: https://www.geeksforgeeks.org/clean-the-string-data-in-the-given-pandas-dataframe/
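As an alternative to a long chain of replace calls, a regular expression can pull out each double-quoted token and then strip the padding around it. This is a minimal sketch that assumes every profile value is a string shaped like the two samples in the question, with each skill or category wrapped in double quotes and possibly prefixed by a literal backslash-n; extract_skills is a hypothetical helper name, and it returns a list per row rather than one concatenated string.

```python
import re

# One double-quoted token, e.g. "\n Design " or " Google Analytics "
QUOTED = re.compile(r'"([^"]*)"')


def extract_skills(profile):
    """Return the cleaned, double-quoted tokens found in one profile value.

    Hypothetical helper: assumes each skill or category is wrapped in double
    quotes and may carry a literal backslash-n plus surrounding spaces.
    """
    tokens = []
    for raw in QUOTED.findall(str(profile)):
        # Drop the literal two-character sequence backslash-n, then trim spaces
        token = raw.replace("\\n", "").strip()
        if token:  # skip empty tokens, e.g. from a bare pair of quotes
            tokens.append(token)
    return tokens
```

In pandas you would call it once per row, e.g. new_df2["profile"].apply(extract_skills), and use ", ".join(...) on each list if you need a string again. For the messy dictionary-like sample, category names such as Tools come out alongside the skills, since both are quoted.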

Related

DB monitoring on NewRelic with rubyagent on ECS Fargate with Aurora serverless (PGSQL) not working

We're using the New Relic Ruby agent gem to monitor a Rails 6.0 app deployed on AWS ECS, with RDS Serverless PostgreSQL as the database. However, the database does not show up under External services or Databases in New Relic. I'd like to monitor slow transactions using New Relic's features for that. Any suggestions as to what is happening here?
config/newrelic.yml:
common: &default_settings
  license_key: <%= ENV['NEWRELIC_LICENSE_KEY'] %>
  agent_enabled: auto
  app_name: my_app
  monitor_mode: true
  developer_mode: false
  log_level: info
  attributes:
    include: job.sidekiq.args.*
  browser_monitoring:
    auto_instrument: false
  audit_log:
    enabled: false
  transaction_tracer:
    enabled: true
    transaction_threshold: apdex_f
    record_sql: raw
    stack_trace_threshold: 0.500
    explain_enabled: false
  error_collector:
    enabled: true
    capture_source: true
    ignore_errors: "ActionController::RoutingError,Sinatra::NotFound,ActiveRecord::RecordNotFound,CGI::Session::CookieStore::TamperedWithCookie,ActionController::UnknownAction,AbstractController::ActionNotFound,Mongoid::Errors::DocumentNotFound,Sinatra::NotFound,Sidekiq::Limiter::OverLimit"
development:
  <<: *default_settings
  monitor_mode: false
  developer_mode: true
test:
  <<: *default_settings
  monitor_mode: false
production:
  <<: *default_settings
  monitor_mode: true
The New Relic Ruby agent, like every agent New Relic publishes, has built-in instrumentation: if you query a database using one of the libraries the Ruby agent supports, the agent tracks and times those interactions and, for databases, traces slow SQL, requests explain plans, and so on. (Background: I've been at New Relic for 7 years and am a Product Manager for the APM UI, though not for the agent itself.)
https://docs.newrelic.com/docs/apm/agents/ruby-agent/getting-started/ruby-agent-requirements-supported-frameworks/#databases
The Ruby agent supports a number of libraries for database interaction (listed at the link above), and it pulls enhanced instance details when specific versions of them are used:
https://docs.newrelic.com/docs/apm/agents/ruby-agent/getting-started/ruby-agent-requirements-supported-frameworks/#instance_details
External HTTP requests made with the supported HTTP client libraries listed below (as opposed to instrumented database requests) are shown in the Externals section:
https://docs.newrelic.com/docs/apm/agents/ruby-agent/getting-started/ruby-agent-requirements-supported-frameworks/#http_clients
This might help clarify why your database isn't appearing as expected. My best guess is that you're using a library that New Relic doesn't currently instrument.
If I'm wrong and your library is listed, absolutely file a support ticket so our Tech Support team can investigate. If it isn't listed and your company pays for New Relic, you can ask your sales representative to file a feature request for support of that library.
Also, New Relic open-sources its agents these days, so if you want to write instrumentation for your preferred library and submit it, you can find the agent code here:
https://github.com/newrelic/newrelic-ruby-agent
Hope that's helpful
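To test the guess that the app reaches PostgreSQL through a library the agent doesn't instrument, it can help to list which database-related gems (and versions) the app actually bundles and compare them with the supported-frameworks page linked above. This is a minimal Python sketch that scans a Gemfile.lock; database_gems is a hypothetical helper, and the gem names in DB_GEMS are an assumption to adjust for your app, not New Relic's official list.

```python
import re

# Top-level gems in a Gemfile.lock "specs:" section are indented by exactly
# four spaces, e.g. "    pg (1.2.3)"; their own dependencies use six.
SPEC_LINE = re.compile(r"^ {4}([A-Za-z0-9_.-]+) \(([^)]+)\)$")

# ASSUMPTION: database-related gems worth checking; not an official list.
DB_GEMS = {"pg", "activerecord", "sequel", "mongo", "redis"}


def database_gems(lockfile_text, candidates=DB_GEMS):
    """Return {gem_name: version} for the candidate gems found in a lockfile."""
    found = {}
    for line in lockfile_text.splitlines():
        match = SPEC_LINE.match(line)
        if match and match.group(1) in candidates:
            found[match.group(1)] = match.group(2)
    return found
```

Run it against your lockfile, e.g. database_gems(open("Gemfile.lock").read()), and check each reported gem and version against the Databases section of the requirements page.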

Is there an API for IBM Cloud PowerPC Virtual server instances - power-iaas?

I am looking to implement support for legacy AIX services that can scale.
What I would like is to create an AIX VM, or at least start one, programmatically.
Is there an API for this?
I have found the CLI at
https://cloud.ibm.com/docs/power-iaas-cli-plugin?topic=power-iaas-cli-plugin-power-iaas-cli-reference
I have checked https://cloud.ibm.com/docs?tab=api-docs
I have checked the requests made by the web page.
I see it uses cloud.ibm.com/graphql/doServer
with a payload like
{"operationName":"doServer",
"variables":{"serverId":"SOME-ID","action":"start"},
"query":"mutation doServer($serverId: String!, $action: String!) {\n PowerDoServer(serverId: $serverId, action: $action) {\n success\n __typename\n }\n}\n"}
Thanks!
You could check the IBM Cloud API docs you already referenced, in the compute section. There is a set of APIs for Power Virtual Server instances, including a REST API to create a new instance.
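Here is a minimal Python sketch of what starting an existing VM over that REST API might look like. Everything endpoint-specific is an assumption based on our reading of IBM's Power Cloud API reference and should be verified there: the regional base URL pattern, the .../pvm-instances/{id}/action path, the {"action": "start"} body (mirroring the action value in the GraphQL payload above), and the CRN header. build_start_request is a hypothetical helper that only builds the request; the IAM access token has to be obtained separately.

```python
import json
import urllib.request

# ASSUMPTION: regional endpoint pattern for the Power Cloud API; verify in the docs.
BASE_URL = "https://{region}.power-iaas.cloud.ibm.com"


def build_start_request(region, cloud_instance_id, pvm_instance_id, iam_token, crn):
    """Build (but do not send) a POST asking the API to start one VM."""
    # ASSUMPTION: action path and body shape as we understand the reference
    url = (
        BASE_URL.format(region=region)
        + f"/pcloud/v1/cloud-instances/{cloud_instance_id}"
        + f"/pvm-instances/{pvm_instance_id}/action"
    )
    body = json.dumps({"action": "start"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {iam_token}",  # IAM access token
            "CRN": crn,  # ASSUMPTION: CRN of the Power workspace, sent as a header
            "Content-Type": "application/json",
        },
    )
```

To actually send it, pass the result to urllib.request.urlopen and read the JSON response.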

SSHKit command or Capistrano task to filter/replace tokens on upload

I'm using Capistrano to deploy configuration files for a legacy non-Ruby application that, for arcane legacy reasons, need to be parameterized with the fully qualified name of the target host, e.g.
name: myservice-stg
identifier: myservice-stg-1.example.org:8675
baseURI: http://myservice-stg-1.example.org:8675
Apart from that, for a given environment, there's no difference between the config files, so I'd like to be able to just define a template (example uses Mustache but could be ERB or whatever):
name: myservice-stg
identifier: {{fqhn}}:8675
baseURI: http://{{fqhn}}:8675
My current idea for a hack is just to use gsub and a StringIO:
config_tmpl = File.read('/config/src/config.txt') # read the template as a String (a File object has no gsub)
config_txt = config_tmpl.gsub('{{fqhn}}', host.hostname)
upload!(StringIO.new(config_txt), 'dest/config.txt')
But it seems like there ought to be a more standard, out-of-the box solution.
Tools like Ansible and Chef are great for this, but might be overkill if this is all you're trying to do.
Your proposed solution looks fairly standard. Using ERB (or another templating system) wouldn't be much more work and provides flexibility and reusability down the road:
require 'erb'
template_path = '/config/src/config.txt.erb'
# binding exposes local variables such as host, so run this inside an on roles(...) do |host| block
config_txt = ERB.new(File.read(template_path)).result(binding)
upload! StringIO.new(config_txt), 'dest/config.txt', mode: 0644
The ERB template (config.txt.erb):
name: myservice-stg
identifier: <%= host.hostname %>:8675
baseURI: http://<%= host.hostname %>:8675
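The token-filtering idea itself is language-agnostic. As a minimal sketch in Python (render is a hypothetical helper, not part of Capistrano or SSHKit), a single regular-expression substitution can fill every {{name}} placeholder from a dictionary and fail loudly on a name it doesn't know, which a plain gsub for one token does not.

```python
import re

# A Mustache-style placeholder such as {{fqhn}} (optional inner spaces allowed)
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, values):
    """Replace each {{name}} in template with values[name].

    A placeholder without a value raises KeyError, so a typo in the template
    fails before anything is uploaded instead of shipping a literal {{...}}.
    """
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

The equivalent of the gsub call in the question is render(template, {"fqhn": hostname}) with the target host's name; adding another per-host value only means adding a dictionary key.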

How to use alchemyAPI news data in Bluemix Node-RED?

I am using the Bluemix environment and the Node-RED flow editor. I'm finding the Feature Extract node that comes built into Node-RED for the AlchemyAPI service hard to use.
I tried connecting it to the HTTP request node, the HTTP response node, etc., but got no result. Maybe I am not wiring the connections correctly?
I need this flow to collect tweets and news (using AlchemyAPI News data) about specific companies, give each item a sentiment score, and store the results in IBM HDFS.
Here is the code:
[{"id":"8bd03bb4.742fc8","type":"twitter in","z":"5fa9e76b.a05618","twitter":"","tags":"Ashok Leyland, Tata Communication, Welspun, HCL Info,Fortis H, JSW Steel, Unichem Lab, Graphite India, D B Realty, Eveready Ind, Birla Corporation, Camlin Fine Sc, Indian Economy, Reserve Bank of India, Solar Power, Telecommunication, Telecom Regulatory Authority of India","user":"false","name":"Tweets","topic":"tweets","x":93,"y":92,"wires":[["f84ebc6a.07b14"]]},
{"id":"db13f5f.f24ec08","type":"ibm hdfs","z":"5fa9e76b.a05618","name":"Dec12Alchem","filename":"/12dec_alchem","appendNewline":true,"overwriteFile":false,"x":564,"y":226,"wires":[]},
{"id":"4a1ed314.b5e12c","type":"debug","z":"5fa9e76b.a05618","name":"","active":true,"console":"false","complete":"false","x":315,"y":388,"wires":[]},
{"id":"f84ebc6a.07b14","type":"alchemy-feature-extract","z":"5fa9e76b.a05618","name":"TrailRun","page-image":"","image-kw":"","feed":true,"entity":true,"keyword":true,"title":true,"author":"","taxonomy":true,"concept":true,"relation":"","pub-date":"","doc-sentiment":true,"x":246,"y":160,"wires":[["c0d3872.f3f2c78"]]},
{"id":"c0d3872.f3f2c78","type":"function","z":"5fa9e76b.a05618","name":"To mark tweets","func":"msg.payload={tweet: msg.payload,score:msg.features};\nreturn msg;\n","outputs":1,"noerr":0,"x":405,"y":217,"wires":[["db13f5f.f24ec08","4a1ed314.b5e12c"]]},
{"id":"4181cf8.fbe7e3","type":"http request","z":"5fa9e76b.a05618","name":"News","method":"GET","ret":"obj","url":"https://gateway-a.watsonplatform.net/calls/data/GetNews?apikey=&outputMode=json&start=now-1d&end=now&count=1&q.enriched.url.enrichedTitle.relations.relation=|action.verb.text=acquire,object.entities.entity.type=Company|&return=enriched.url.title","x":105,"y":229,"wires":[["f84ebc6a.07b14"]]},
{"id":"53cc794e.ac3388","type":"inject","z":"5fa9e76b.a05618","name":"GetNews","topic":"News","payload":"","payloadType":"string","repeat":"","crontab":"","once":false,"x":75,"y":379,"wires":[["4181cf8.fbe7e3"]]}]
First you have to bind an Alchemy service instance to your Node-RED application.
Then you can develop your application. Here is a basic example that uses the HTTP and Feature Extract nodes; this is its node flow if you want to try it:
[{"id":"e191029.f1e6f","type":"function","z":"2fc2a93f.d03d56","name":"","func":"msg.payload = msg.payload.url;\nreturn msg;","outputs":1,"noerr":0,"x":276,"y":202,"wires":[["12082910.edf7d7"]]},{"id":"12082910.edf7d7","type":"alchemy-feature-extract","z":"2fc2a93f.d03d56","name":"","page-image":"","image-kw":"","feed":"","entity":true,"keyword":true,"title":true,"author":true,"taxonomy":true,"concept":true,"relation":true,"pub-date":true,"doc-sentiment":true,"x":484,"y":203,"wires":[["8a3837f.f75c7c8","d164d2af.2e9b3"]]},{"id":"8a3837f.f75c7c8","type":"debug","z":"2fc2a93f.d03d56","name":"Alchemy Debug","active":true,"console":"true","complete":"true","x":736,"y":156,"wires":[]},{"id":"fb988171.04678","type":"http in","z":"2fc2a93f.d03d56","name":"Test Alchemy","url":"/test_alchemy","method":"get","swaggerDoc":"","x":103.5,"y":200,"wires":[["e191029.f1e6f"]]},{"id":"d164d2af.2e9b3","type":"http response","z":"2fc2a93f.d03d56","name":"End Test Alchemy","x":749,"y":253,"wires":[]}]
You can use curl to test it, for example (--data-urlencode keeps characters such as & in the target URL intact):
curl -G http://yourapp.mybluemix.net/test_alchemy --data-urlencode "url=<your url here>"
or use your browser as well:
http://yourapp.mybluemix.net/test_alchemy?url=http://myurl_to_test_alchemy
You can see the results in the Node-RED debug tab, or in the application logs:
$ cf logs yourapp --recent
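When calling the test endpoint from a script, the target address should be percent-encoded. For a GET request, Node-RED's http in node puts the query-string parameters in msg.payload, so an unencoded target such as http://example.com/a?b=1&c=2 would have its own & read as a separate parameter, and msg.payload.url in the function node would arrive truncated. A minimal Python sketch; the yourapp.mybluemix.net host is the answer's placeholder, and alchemy_test_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder route from the answer; replace with your own app's URL
APP_ENDPOINT = "http://yourapp.mybluemix.net/test_alchemy"


def alchemy_test_url(page_url, endpoint=APP_ENDPOINT):
    """Return endpoint?url=<page_url> with page_url percent-encoded.

    Encoding keeps ?, & and = inside page_url from being read as extra
    parameters of the /test_alchemy request itself.
    """
    return endpoint + "?" + urlencode({"url": page_url})
```

Passing the result to urllib.request.urlopen triggers the flow with the full target address in msg.payload.url.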

Amazon Simple Email Service - "From" email verification

I am signed up for AWS SES (with an instance, S3 and my website running nicely). I have also received approval for sending email without receiver verification, and approval for "mass production". The only thing left is to have my three "from" email addresses verified. I started to download Perl, as suggested, to run the email-verification scripts, but got nowhere with the installation. I do have my credentials ready to use.
There is an AWS SES API to use for verification, which I can't find... I suspect it has something to do with AWS's SDK, which I could figure out how to install.
So my question: is there a simple, straightforward way to get my email addresses to Amazon for verification via a response email they send? Their documentation is somewhat confusing.
You have to verify your email addresses through their web service (which is what the Perl script does). You can also use the SDKs they publish, which are wrappers around that web service. For example, if you have Visual Studio, you can use the AWS SDK for .NET (also available as a NuGet package: PM> Install-Package AWSSDKForNET) and set up a simple console application that does something like this:
using Amazon.SimpleEmail;
using Amazon.SimpleEmail.Model;

static class Program {
    static void Main() {
        var client = new AmazonSimpleEmailServiceClient("yourAccessKey", "yourSecretKey");
        // SES emails a verification link to this address; clicking it completes verification
        var request = new VerifyEmailAddressRequest { EmailAddress = "yourEmailAddress" };
        client.VerifyEmailAddress(request);
    }
}
They also have SDKs available for PHP and Java that work pretty much the same way.
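The same request is available from Python through boto3, whose SES client exposes verify_email_identity (the newer counterpart of the VerifyEmailAddress call above). This minimal sketch takes the client as a parameter so it can be exercised without AWS credentials; verify_senders is a hypothetical helper, and the region in the usage line below is an assumption.

```python
def verify_senders(ses_client, addresses):
    """Request SES verification for each sender address.

    ses_client is assumed to be a boto3 SES client (or anything with the same
    verify_email_identity method); returns the addresses that were requested.
    """
    requested = list(addresses)  # materialise so generators work too
    for address in requested:
        # SES responds by emailing a confirmation link to this address
        ses_client.verify_email_identity(EmailAddress=address)
    return requested
```

For example, verify_senders(boto3.client("ses", region_name="us-east-1"), ["sales@example.com", "support@example.com", "info@example.com"]) would request verification for three sender addresses; each one still has to click the link in the email SES sends.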
This is an old post, but it came up in a search.
You can now verify email addresses directly from the SES control panel (see screenshot):
[Screenshot: AWS SES Panel]