Use Cygnus to store historical data from Orion ContextBroker in a local Hadoop database - fiware-orion

We are currently working on a project where we use Orion ContextBroker to store information from different sensors and Wirecloud to show them on a web page.
We want to store historical data from these sensors in order to show them in a graph. I have looked around the FIWARE documentation and it recommends storing the data in a Cosmos instance of FI-LAB, through Cygnus.
The thing is that we would like to store that historical data on a local Hadoop-based server we have in our company, not in Cosmos, because we are running this project on a local network without internet access, and also to keep that information stored on our own server.
Is it possible to configure Cygnus to redirect the output data to my file system? If so, which files must be configured to achieve this?
Thank you

The answer is yes. Cygnus is meant to persist context data in any HDFS-based filesystem (such as the one used by Cosmos), so nothing special has to be done when configuring Cygnus.
If you download the latest version (0.7.0 at the time of writing), you will need to configure:
A cygnus_instance_default.conf file, created from cygnus_instance.conf.template. This is the instance configuration. From 0.7.1 it is possible to have multiple instance configurations that are run in parallel, and they all have to be called cygnus_instance_<whatever>.conf.
An agent.conf file, created from agent.conf.template. This is the Flume-specific configuration that you will find described in the README.md; a sketch is shown below.
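For illustration, here is a minimal sketch of the HDFS sink section of agent.conf pointed at a local Hadoop cluster instead of Cosmos. The host, port and username values are placeholders, and the exact property names vary between Cygnus releases (0.7.x uses cosmos_* prefixes even though they can point at any HDFS; later releases rename them to hdfs_*), so check your agent.conf.template:

# OrionHDFSSink persists context data in any WebHDFS/HttpFS-enabled HDFS, not only Cosmos
cygnusagent.sinks = hdfs-sink
cygnusagent.sinks.hdfs-sink.type = es.tid.fiware.fiwareconnectors.cygnus.sinks.OrionHDFSSink
# despite the "cosmos" prefix, these properties may point at your local namenode (placeholder values)
cygnusagent.sinks.hdfs-sink.cosmos_host = your-local-namenode
cygnusagent.sinks.hdfs-sink.cosmos_port = 14000
cygnusagent.sinks.hdfs-sink.cosmos_default_username = your-hdfs-user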

Related

How to monitor over 500 servers using Grafana with SQL Server as the data source

Currently we're monitoring our SQL Servers running on the Windows platform via MS SQL Server Reporting Services using shared data sources. To clarify: we don't store data on a centralized server to monitor the 500+ target servers. We keep monitoring data on the local SQL database servers and use shared data sources in SSRS to create dashboards.
Now our firm is encouraging us to use Grafana for dashboards, since they have purchased Grafana server licensing. What I know of a Grafana instance is that it can be given to us to monitor SQL Servers as described above.
My question is: how would Grafana dynamically connect to those 500+ servers? I see it creates a data source once, but how will I change or create multiple data sources when I have around 1000 servers to monitor?
Please suggest a guide.
You may have to code a bit and use data source provisioning and/or the Grafana data source API so that it picks up the new data sources.
If you set up a system (user data / init script / IaC) where this API is called every time a new server comes up, then you will be able to maintain the data sources without manual effort.
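As a rough sketch of the API approach (Python; POST /api/datasources is the documented Grafana data source API, while the URLs, names and credentials below are placeholders you would supply yourself):

import requests

GRAFANA_URL = "https://grafana.example.com"  # placeholder Grafana address
API_KEY = "YOUR_ADMIN_API_KEY"               # a Grafana API key with admin rights

def register_mssql_datasource(server_name, host):
    """Create one MSSQL data source for a newly provisioned server."""
    payload = {
        "name": f"mssql-{server_name}",
        "type": "mssql",
        "access": "proxy",
        "url": f"{host}:1433",
        "database": "monitoring",        # placeholder database name
        "user": "grafana_reader",        # placeholder read-only login
        "secureJsonData": {"password": "YOUR_PASSWORD"},
    }
    response = requests.post(
        f"{GRAFANA_URL}/api/datasources",
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    response.raise_for_status()

# call this from the init script / IaC hook whenever a new server comes up
register_mssql_datasource("sql042", "sql042.internal")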

Mirth Connect sends old messages when changing servers

I have a Mirth application installed on an Ubuntu server. I tried to move the application from one server to another server (a DRC server). When I moved the application, somehow Mirth kept sending old messages to the channel.
The source of the sending channel uses a Database Reader, and the connector type for the destinations is a TCP Sender. I'm using Mirth Connect version 3.5.2.
Does anyone know why this is happening? Are there any log files that I need to clear when moving the application from one server to another?
This can happen for several reasons: application logic, queued messages. My guess is that you moved the appdata directory along with the installation; if so, you must be seeing the same stats as on the server you moved from.
Mirth stores all channel information, transactions, etc. by default under the appdata folder. If you are using the default settings, it will use a Derby DB. You can connect to that DB with any DB client that supports JDBC, e.g. SQuirreL SQL or DbVisualizer, and that can give you an idea of what's happening.
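For example, with a default install the Derby connection URL would look something like the line below (the path is an assumption; adjust it to your installation, and stop Mirth first, since embedded Derby allows only one connection at a time):

jdbc:derby:/opt/mirthconnect/appdata/mirthdb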
I recommend you make a clean setup. Then you can export/import your channels into your new environment. You can also consider using another DB for Mirth: Oracle, SQL Server, MySQL, etc. The current version is 3.9.10 and it has better support for DBs other than Derby.
As mentioned in the comments, your application logic also matters.

How to take a backup of the Tableau Server Repository (PostgreSQL)

We are using version 2018.3 of Tableau Server. Server stats like user logins are logged into the PostgreSQL DB, and the same are cleared regularly after one week.
Is there any API available in Tableau to connect to the DB and take a backup of the data somewhere like HDFS or any place on a Linux server?
Kindly let me know if there are any ways other than an API as well.
Thanks.
You can enable access to the underlying PostgreSQL repository database with the tsm command. Here is a link to the documentation for your (older) version of Tableau:
https://help.tableau.com/v2018.3/server/en-us/cli_data-access.htm#repository-access-enable
It would be good security practice to limit access to only the (whitelisted) machines that need it, to create or use an existing read-only account to access the repository, and ideally to disable access when your admin programs are complete (i.e. enable access, do your query, disable access).
This way you can have any SQL client code you wish query the repository, create a mirror, create reports, run auditing procedures: whatever you like.
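As a sketch of that enable/query/disable flow (the tsm syntax is taken from the 2018.x documentation linked above; the password, hostname and query are placeholders):

tsm data-access repository-access enable --repository-username readonly --repository-password YOUR_PASSWORD
# the repository listens on port 8060; the database is named "workgroup"
psql -h your-tableau-server -p 8060 -U readonly workgroup -c "SELECT * FROM _users LIMIT 10;"
tsm data-access repository-access disable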
Personally, before writing significant custom code, I'd first see if the info you want is already available another way: in one of the built-in admin views, via the REST API, using the public-domain LogShark or TabMon tools, with the Server Management Add-on (for more recent versions of Tableau), or possibly the new Data Catalog.
I know at least one server admin who clones the whole PostgreSQL repository database periodically so he can analyze stats offline. I'm not sure what approach he uses to clone it. So you have several options.

Is there a way to upload data from FTP server to Amazon S3? [duplicate]

Is there a way to connect to an Amazon S3 bucket with FTP or SFTP rather than the built-in Amazon file transfer interface in the AWS console? Seems odd that this isn't a readily available option.
There are three options.
You can use a native Amazon managed SFTP service (aka AWS Transfer for SFTP), which is easier to set up.
Or you can mount the bucket to a file system on a Linux server and access the files using SFTP like any other files on the server (which gives you greater control).
Or you can just use a (GUI) client that natively supports the S3 protocol (which is free).
Managed SFTP Service
In your Amazon AWS Console, go to AWS Transfer for SFTP and create a new server.
On the SFTP server page, add a new SFTP user (or users).
Permissions of the users are governed by an associated AWS role in the IAM service (for a quick start, you can use the AmazonS3FullAccess policy).
The role must have a trust relationship to transfer.amazonaws.com, as shown below.
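That trust relationship is a standard IAM trust policy; a minimal version looks like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "transfer.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}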
For details, see my guide Setting up an SFTP access to Amazon S3.
Mounting Bucket to Linux Server
Just mount the bucket using the s3fs file system (or similar) on a Linux server (e.g. Amazon EC2) and use the server's built-in SFTP server to access the bucket.
Install s3fs.
Add your security credentials in the form access-key-id:secret-access-key to /etc/passwd-s3fs.
Add a bucket mounting entry to fstab:
<bucket> /mnt/<bucket> fuse.s3fs rw,nosuid,nodev,allow_other 0 0
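Putting those steps together (the key values and bucket name are placeholders):

echo 'ACCESS_KEY_ID:SECRET_ACCESS_KEY' > /etc/passwd-s3fs
chmod 600 /etc/passwd-s3fs          # s3fs refuses credentials files with loose permissions
mkdir -p /mnt/<bucket>
mount /mnt/<bucket>                 # picks up the fstab entry above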
For details, see my guide Setting up an SFTP access to Amazon S3.
Use S3 Client
Or use any free "FTP/SFTP client" that's also an "S3 client", and you do not have to set up anything on the server side. For example, my WinSCP or Cyberduck.
WinSCP even has scripting and a .NET/PowerShell interface, if you need to automate the transfers.
Update
AWS now offers a fully managed SFTP gateway service for S3 that integrates with IAM and can be administered using aws-cli.
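For example (a sketch; check the current aws-cli reference for the exact flags, and note that the server ID, role ARN and names below are placeholders):

aws transfer create-server --identity-provider-type SERVICE_MANAGED
aws transfer create-user --server-id s-1234567890abcdef0 \
  --user-name myuser \
  --role arn:aws:iam::123456789012:role/my-transfer-role \
  --home-directory /my-bucket/home/myuser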
There are theoretical and practical reasons why this isn't a perfect solution, but it does work...
You can install an FTP/SFTP service (such as proftpd) on a Linux server, either in EC2 or in your own data center... then mount a bucket into the filesystem where the FTP server is configured to chroot, using s3fs.
I have a client that serves content out of S3, and the content is provided to them by a 3rd party who only supports ftp pushes... so, with some hesitation (due to the impedance mismatch between S3 and an actual filesystem) but lacking the time to write a proper FTP/S3 gateway server software package (which I still intend to do one of these days), I proposed and deployed this solution for them several months ago and they have not reported any problems with the system.
As a bonus, since proftpd can chroot each user into their own home directory and "pretend" (as far as the user can tell) that files owned by the proftpd user are actually owned by the logged in user, this segregates each ftp user into a "subdirectory" of the bucket, and makes the other users' files inaccessible.
There is a problem with the default configuration, however.
Once you start to get a few tens or hundreds of files, the problem will manifest itself when you pull a directory listing, because ProFTPd will attempt to read the .ftpaccess files over, and over, and over again, and for each file in the directory, .ftpaccess is checked to see if the user should be allowed to view it.
You can disable this behavior in ProFTPd, but I would suggest that the most correct configuration is to configure additional options -o enable_noobj_cache -o stat_cache_expire=30 in s3fs:
-o stat_cache_expire (default is no expire)
specify expire time(seconds) for entries in the stat cache
Without this option, you'll make fewer requests to S3, but you also will not always reliably discover changes made to objects if external processes or other instances of s3fs are also modifying the objects in the bucket. The value "30" in my system was selected somewhat arbitrarily.
-o enable_noobj_cache (default is disable)
enable cache entries for the object which does not exist. s3fs always has to check whether file(or sub directory) exists under object(path) when s3fs does some command, since s3fs has recognized a directory which does not exist and has files or subdirectories under itself. It increases ListBucket request and makes performance bad. You can specify this option for performance, s3fs memorizes in stat cache that the object (file or directory) does not exist.
This option allows s3fs to remember that .ftpaccess wasn't there.
Unrelated to the performance issues that can arise with ProFTPd, which are resolved by the above changes, you also need to enable -o enable_content_md5 in s3fs.
-o enable_content_md5 (default is disable)
verifying uploaded data without multipart by content-md5 header. Enable to send "Content-MD5" header when uploading a object without multipart posting. If this option is enabled, it has some influences on a performance of s3fs when uploading small object. Because s3fs always checks MD5 when uploading large object, this option does not affect on large object.
This is an option which never should have been an option -- it should always be enabled, because not doing this bypasses a critical integrity check for only a negligible performance benefit. When an object is uploaded to S3 with a Content-MD5: header, S3 will validate the checksum and reject the object if it's corrupted in transit. However unlikely that might be, it seems short-sighted to disable this safety check.
Quotes are from the man page of s3fs. Grammatical errors are in the original text.
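Putting the recommended options together, a mount command for this setup might look like the following (bucket name and mount point are placeholders):

s3fs mybucket /srv/ftp -o allow_other -o enable_noobj_cache -o stat_cache_expire=30 -o enable_content_md5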
Answer from 2014 for the people who are down-voting me:
Well, S3 isn't FTP. There are lots and lots of clients that support S3, however.
Pretty much every notable FTP client on OS X has support, including Transmit and Cyberduck.
If you're on Windows, take a look at Cyberduck or CloudBerry.
Updated answer for 2019:
AWS has recently released the AWS Transfer for SFTP service, which may do what you're looking for.
Or spin up a Linux instance for SFTP Gateway in your AWS infrastructure that saves uploaded files to your Amazon S3 bucket.
Supported by Thorntech.
Amazon has released SFTP services for S3, but they only do SFTP (not FTP or FTPES) and they can be cost prohibitive depending on your circumstances.
I'm the Founder of DocEvent.io, and we provide FTP/S Gateways for your S3 bucket without having to spin up servers or worry about infrastructure.
There are also other companies that provide a standalone FTP server that you pay by the month that can connect to an S3 bucket through the software configuration, for example brickftp.com.
Lastly, there are also some AWS Marketplace apps that can help; here is a search link. Many of these spin up instances in your own infrastructure, which means you'll have to manage and upgrade the instances yourself; this can be difficult to maintain and configure over time.
WinSCP now supports the S3 protocol
First, make sure your AWS user with S3 access permissions has an "Access key ID" created. You also have to know the "Secret access key". Access keys are created and managed on the Users page of the IAM Management Console.
Make sure New site node is selected.
On the New site node, select Amazon S3 protocol.
Enter your AWS user Access key ID and Secret access key.
Save your site settings using the Save button.
Login using the Login button.
FileZilla just released a Pro version of their FTP client. It connects to S3 buckets in a streamlined, FTP-like experience. I use it myself (no affiliation whatsoever) and it works great.
As other posters have pointed out, there are some limitations with the AWS Transfer for SFTP service. You need to align it closely with your requirements. For example, there are no quotas, whitelists/blacklists, or file-type limits, and non-key-based access requires external services. There is also a certain overhead relating to user management and IAM, which can become a pain at scale.
We have been running an SFTP S3 proxy gateway for about 5 years now for our customers. The core solution is wrapped in a collection of Docker services and deployed in whatever context is needed, even on-premise or local development servers. The use case for us is a little different, as our solution is focused on data processing and pipelines rather than a file share. In a Salesforce example, a customer will use SFTP as the transport method, sending email, purchase...data to an SFTP/S3 endpoint. This is mapped to an object key on S3. Upon arrival, the data is picked up, processed, routed and loaded to a warehouse. We also have fairly significant auditing requirements for each transfer, something the CloudWatch logs for AWS do not directly provide.
As others have mentioned, rolling your own is an option too. Using AWS Lightsail you can set up a cluster of, say, four $10 2GB instances using either Route 53 or an ELB.
In general, it is great to see AWS offer this service and I expect it to mature over time. However, depending on your use case, alternative solutions may be a better fit.

Talend, MongoDB connection

I am facing a problem with the MongoDB connection.
I have successfully imported the tMongo components into my Talend Open Studio 5.1.1, and by copying the mongo-1.3.jar file to the lib/java folder, my MongoDB jobs are running successfully. But the problem is that even if I provide a fake server path (IP) and a fake port for MongoDB, my job runs without an error and gives me 1 row with no data, and the same goes for the right IP and port.
How do I resolve this?
I think the connection is not working. As you may know, MongoDB only checks whether the connection is actually working when you perform a query on it.
(Yes, it doesn't check for a successful connection when you just connect to it.)
I would suggest instead adding the MongoDB components present in Talend for Big Data by following the steps below:
The components provided for MongoDB are tMongoDBInput, tMongoDBOutput, tMongoDBConnection, etc.
Alternatively, you can download the components from http://www.talendforge.org/exchange/ and search for Mongo instead of using Talend for Big Data, but I would suggest using Talend for Big Data for this.
The components will be in zipped format; unzip them. In Talend for Big Data you will find the components in the Components folder.
Copy these unzipped components to the installation path of TOS:
C:\Talend\TOS_DI-Win32-r84309-V5.1.1\plugins\org.talend.designer.components.localprovider_5.1.1.r84309\components
Copy the mongo-1.3.jar file from the component folder into C:\Talend\TOS_DI-Win32-r84309-V5.1.1\lib\java
On many systems you might not be able to see this file; in that case, proceed with ADMINISTRATOR privileges.
Optional, for a few systems: add the entries for the new components inside index.xml.
Save index.xml.
Restart TOS
Then you will be able to use them as normal components.
Cheers!
The reason for the job running without any error could be the connection/metadata you have used for the Mongo connector. It should not be possible for the job to run without any error even after giving a fake path.
I guess you might have configured (re-modified) the repository connection but are using built-in metadata for the component.