How can you debug stored javascript functions in MongoDB? - mongodb

I'm thinking of moving some workflow logic from C# code to stored JS in MongoDB (for example, when a user sends a message, a bunch of records are to be created in different collections, which right now I do in C#), but I'm concerned about whether I would be able to debug that JS code if things don't work correctly.

There isn't a particular facility for that. One thing you could do is run some of that code in the mongo shell, which can execute exactly the same javascript as the server. The shell doesn't have a debugger but with its interactive prompt it would be much easier to try things, inspect variables, etc.
Personally I would not necessarily recommend moving code into the server. Note that it is possible to send several write operations (such as inserts) in a row and then ask for a single acknowledgement after the whole batch. Thus that scenario is not necessarily slow even if there is some nontrivial network latency.
Alternatively you could run C# code on the same server as the mongod process and thereby get extremely low latency on turnarounds of requests. One way to do that would be to make a web server that is written in C# and encapsulates the logic suggested above.

I guess you can write some debug information into a separate collection and see how things are going, but it seems to me that actual debugging is not possible.

Related

Why it's not enough to prevent SQL injection from front-end?

Almost all the resources about preventing SQL injection talk about preventing it at the front end, at the back end, and at the database level. Why do we need to do all those things?
Is it not enough to do it at the front end, by just preventing the user from sending malicious SQL code as input?
Because most client-side code can be bypassed since it executes on the client's machine. Basically any code that protects against bad input on the client-side is there to provide better feedback for an honest user and also to reduce low-hanging fruit type of attacks.
The back-end code is there to make sure any malicious user who bypassed your front-end security (with a crafted HTTP request or whatever) will not be able to inject bad input into one of your dynamic SQL queries. This is usually achieved by sanitizing input on the back end and using parameterized queries.
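To make the "parameterized queries" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table, the function names and the payload are made up for illustration; the same idea applies to any database driver that supports placeholders.

```python
import sqlite3


def find_user(conn, name):
    # The ? placeholder passes `name` to the database as data, never as SQL text.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


def find_user_unsafe(conn, name):
    # Anti-pattern, shown only for contrast: the input is spliced into the SQL string.
    cur = conn.execute("SELECT id, name FROM users WHERE name = '%s'" % name)
    return cur.fetchall()


# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"
safe_rows = find_user(conn, payload)           # no user has this literal name
unsafe_rows = find_user_unsafe(conn, payload)  # the OR clause matches every row
```

With the placeholder the payload is just an odd-looking name that matches nothing; with string splicing it rewrites the WHERE clause, no matter what the front end did or did not validate.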
Just because you button up your front-end doesn't guarantee SQL Injection safety. All the front end does is show pretty things to the user. The back end is where all the work is done and because the front-end must talk to the back-end in some way means you have a potential security issue.
I don't know if your application will be WinForms or a web application, but that doesn't matter. If it's a Windows application, I can use a program such as Process Explorer to manipulate the data that gets sent to your back end.
If it's a web application, then, similarly, I can use a tool such as Fiddler to manipulate the data that gets sent to your back end.
Moral of the story: always button up your back end, and never let your back end assume that the data it's getting from the front end is hunky-dory!
Defense in depth is a really, really good thing. Consider this: your app takes values as parameters to a query, or perhaps even takes user input to form a query. You do the right thing at the app level and correctly escape the input, so injection attempts do nothing there and data is safely read from or written to the database. Now, what if:
the data that is written to the database is itself malicious code? The next stored procedure that reads from the table may now be executing arbitrary code.
the application code passes the "safe" data to the back end, which then uses it in a stored procedure or function (e.g. de-serialize, cast, etc.)? Once again, you could be executing malicious code.
You could argue that instead of escaping the input, you could parse the input at the app level to strip or reject certain values, strongly type it, regex everywhere, etc., but there are many situations where these restrictions cannot be implemented because the app is intended to support free-flowing text that may legitimately contain suspicious-looking characters, especially if you support international character sets (e.g. names, descriptions, notes, etc.).
Finally, do/should/can DBAs really count on the app or app dev to get everything right every time?

Share a MongoDB instance between Meteor apps without lag in reactivity?

This question has been asked multiple times, here and here, and the answer to get this working is fairly straightforward: add an environment variable to your bash_profile and all Meteor instances on your localhost will share that MONGO_URL.
What I've noticed however is that while this may be the case, there's quite a bit of latency in the "reactivity" of Meteor. I've tested this with two very lean Meteor apps, with empty collections. Inserting a document to a collection from one Meteor app, where my second app is querying that same collection and printing out a field from the documents does work, but there's a noticeable lag before it updates. I've ruled out the possibility of the collection insertion being the source of the lag (simple console.log callback on the client of the first app, logging the id of the newly inserted document).
My purpose for having multiple apps (two to be precise) sharing the same MongoDB is to separate an admin panel from a mobile app without going crazy regarding name-spacing and bloat. This configuration works, but I'm not sure it's the "proper" way of accomplishing the task, and it certainly seems to be causing a performance hit.
Any insight into this matter would be appreciated. Thank you!
EDIT: To clarify, the db URL I'm using is on my localhost, and isn't something hosted online.
When you use an external database, by default meteor will use periodic polling (every few seconds) in order to observe any changes. The delay you are experiencing is a result of this polling process. You can remove the delay and reduce your app's CPU usage by taking advantage of meteor's oplog tailing feature. In order to use it you will:
Get access to a mongodb instance with the oplog turned on.
Set the environment variable MONGO_OPLOG_URL so your app(s) can read the oplog.
Personally, I'd recommend compose.io for this. They provide exactly this as part of their basic elastic deployment. See this post for detailed instructions.
For users who wish to connect to the oplog created locally for you, you can obtain the URL via:
MongoInternals.defaultRemoteCollectionDriver().mongo._oplogHandle._oplogUrl
It should end up looking something like mongodb://127.0.0.1:3001/local

What are the limitations of the flask built-in web server

I'm a newbie in web server administration. I've read multiple times that flask built-in web server is not designed for "production", and must be used only for tests and debug...
But what if my app reaches only a thousand users who occasionally send data to the server?
If it works, when will I have to bother with the configuration of a more sophisticated web server? (I am looking for approximate metrics.)
In a nutshell, I would love to find what the builtin web server can do (with approx thresholds) and what it cannot.
Thanks a lot !
There isn't one right answer to this question, but here are some things to keep in mind:
With the right amount of horizontal scaling, it is quite possible you could keep scaling out use of the debug server forever. When exactly you would need to start scaling (or switch to using a "real" web server) would also depend on the environment you are hosting in, the expectations of the users, etc.
The main issue you would probably run into is that the server is single-threaded by default in older Flask releases (since Flask 1.0 the development server uses threads by default). A single-threaded server handles requests one at a time, serially, so if you are trying to serve more than one request (including favicons and static items like images, CSS and JavaScript files) the requests will take longer. If any given request happens to take a long time (say, 20 seconds) then your entire application is unresponsive for those 20 seconds. This is only the default, of course: you could bump the thread count (or have requests handled in other processes), which might alleviate some issues. But once again, it can still be slow under a "high" load. What counts as a "high" load depends on your application and on the maximum response time you consider acceptable.
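A quick back-of-the-envelope sketch of that serial behaviour, in plain Python. It is a model, not a measurement of Flask itself: the request durations are made-up numbers, and the "concurrent" case assumes an idealised server with one worker per request.

```python
from itertools import accumulate


def serial_completion_times(durations_ms):
    """Finish time of each request when all arrive at t=0 and are handled one at a time."""
    return list(accumulate(durations_ms))


def concurrent_completion_times(durations_ms):
    """Idealised case: each request has its own worker and finishes after its own duration."""
    return list(durations_ms)


# One slow request (20 s) followed by two quick ones (50 ms each); values are illustrative.
requests_ms = [20000, 50, 50]
serial = serial_completion_times(requests_ms)          # [20000, 20050, 20100]
concurrent = concurrent_completion_times(requests_ms)  # [20000, 50, 50]
```

In the serial case the two quick requests are stuck behind the slow one and finish after roughly 20 seconds as well, which is the "entire application is unresponsive" effect described above.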
Another issue is security: if you are concerned at ALL about security (and not just the security of the data in the application itself, but the security of the box that will be running it as well) then you should not use the development server. It is not ready to withstand any sort of attack.
Finally, the development server could just fail outright. It is not designed to be used as a long-running process (days, weeks, months), and so it has not been well tested to work in this capacity.
So, yes, it has limitations. Yes, you could still conceivably use it in production. And yes, I would still recommend using a "real" web server. If you don't like the idea of needing to install something like Apache or Nginx, you can still go with a solution that is still as easy as "run a python script" by using some of the WSGI Standalone servers, which can run a server that is designed to be in production with something just as simple as running python run_app.py in the command line. You typically just need to create a 4-5 line python script to import and create the server object, point it to your Flask app, and run it.
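As a rough idea of what such a run_app.py could look like, here is a sketch under a few assumptions: it uses the third-party waitress package as the standalone WSGI server (any similar server would do), and a tiny hand-written WSGI function stands in for the real application so the file is self-contained. In a real project you would import your Flask object instead, since a Flask app is itself a WSGI callable.

```python
# run_app.py
import sys


def app(environ, start_response):
    # Stand-in WSGI callable; in a real project replace this function with
    # `from myproject import app` (the Flask object).
    body = b"Hello from a WSGI app\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def main(port):
    # Assumption: the `waitress` package is installed (pip install waitress).
    from waitress import serve
    serve(app, host="127.0.0.1", port=port)


# Only start serving when a port is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(int(sys.argv[1]))
```

Started with something like `python run_app.py 8080`, this serves the app on localhost until the process is stopped.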
gunicorn could be run with only the following on the command line, no extra script needed:
gunicorn myproject:app
...where "myproject" is the Python package that contains the app Flask object. Keep in mind that one of the developers of gunicorn would probably recommend against this approach. See https://serverfault.com/questions/331256/why-do-i-need-nginx-and-something-like-gunicorn.
The OP has long since moved on, but for those who encounter this question in the future I would just add that setting up an Apache server, even on a laptop, is free and pretty easy. It can be readily configured for as few or as many features as you want just by uncommenting or commenting out lines in the config file. There might be an even easier GUI method for doing that nowadays, but just editing the configs is simple.

How can I communicate across Perl CGI scripts?

I am searching for efficient ways of communicating between two Perl scripts. I have two scripts; Script 1 generates some data, and I want Script 2 to be able to access that information.
The easiest/dumbest way is to write the data generated by Script 1 to a file and read it later using Script 2. Is there any other way than this? Can I store the data in memory and make it available to Script 2 (with support from my Linux system, of course)? Meaning, malloc some data in Script 1 and make Script 2 able to access it.
There is no guarantee that Script 2 will be run after Script 1, so there should be some way to free that memory using a watchdog timer.
Let me reveal some more context. I am running these scripts on a web server using CGI-Perl. At the click of a button Script 1 is run and it generates an HTML web page. The user can then add some inputs to this generated page and click a button on it. Script 2 should now be able to read the data on the new page. I can post the data back to the web server again, but a more efficient way is to keep a copy of the generated page on the server and make it available to Script 2. I would like to avoid writing the generated page to a file; I was thinking of storing it in memory.
This depends somewhat on your usage... one large set of data? Many small messages? Do you care at all about data persistence? Is it TOTALLY asynchronous?
Some of the options are:
For any but the most high-performance web sites, the best approach is to write out the HTML pages to files! Unless the inter-process communication is benchmarked to be the bottleneck in performance, don't bother with any of the non-file solutions (shared memory, cache, intermediate server).
Specifically for two CGI scripts on the same server: if you run them under mod_perl or some other arrangement that shares a Perl interpreter between the two CGI processes, you can develop a package to serve as a cache. Its package-level variable would be preserved in memory by mod_perl for as long as mod_perl is running, and can thus be used by a writer CGI process and a reader CGI process to communicate. Of course, the usual synchronization/deadlock and persistence issues associated with reader/writer setups need to be considered.
As an alternative, use Apache::Session sessions to store inter-session data.
As you noted, shared memory. For example use IPC::ShareLite, IPC::Cache, or this solution from perlmonks.
Also, please check Chapter 16 Recipe 12 "Sharing Variables in Different Processes" from O'Reilly's "Perl Cookbook" (no link since non-pirated versions aren't online anywhere I know of)
Use a permanent medium. A file is one option. A database is another.
For async, use an intermediate messaging system (MQ, Tibco, something more lightweight). Probably a bit of overkill in this scenario, but a valid option to be aware of. This one is likely to be pretty stable, solid and optimized, but possibly not free and less flexible/tailored.
Or roll your own simple messaging server; it's not THAT complicated for the very simple one you seem to need.
Listen on one port for requests from the first process to store data, listen on another port for requests from the consumer process asking for that data, keep the data in an in-memory storage area, and purge it when it expires (using alarms or a separate watcher child process).
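The storage-and-expiry half of that roll-your-own server can be sketched in a few lines. This is only an illustration and makes some assumptions: it is written in Python rather than Perl, it leaves out the socket handling entirely, and the class name, method names and keys are invented for the example. A watcher (timer, alarm handler or child process) would call purge() periodically.

```python
import time


class ExpiringStore:
    """In-memory key/value store whose entries disappear after a fixed time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be exercised without sleeping
        self._items = {}      # key -> (expires_at, value)

    def put(self, key, value):
        """Called on behalf of the producer (Script 1)."""
        self._items[key] = (self._clock() + self._ttl, value)

    def take(self, key):
        """Called on behalf of the consumer (Script 2): return and remove the value, or None."""
        self.purge()
        entry = self._items.pop(key, None)
        return entry[1] if entry else None

    def purge(self):
        """Drop every expired entry and return how many were removed."""
        now = self._clock()
        expired = [k for k, (deadline, _) in self._items.items() if deadline <= now]
        for k in expired:
            del self._items[k]
        return len(expired)
```

Because Script 2 may never run, the periodic purge() is what plays the role of the watchdog timer: a page that is never collected is simply dropped once its time-to-live has passed.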
You've tagged your question as "cgi". Are they both CGI programs? In that case, they can just talk to each other by making HTTP requests.
However, you'll have to tell a lot more about why you are trying to do this and what you need to accomplish for us to help you. It's certainly easy for Perl programs to communicate with each other in some fashion, but that doesn't mean it's the right answer for you.
When you have complex requirements for interaction among CGI programs, you probably want to move to a web framework that handles a lot of those details for you. Catalyst might be where you'd want to start. There's even a book for it.

How to limit the effect of client modifications to production systems

Our shop has developed a few WEB/SMS/DB solutions for a dozen client installations. The applications have some real-time performance requirements, and are just good enough to function properly. The problem is that the clients (owners of the production servers) are using the same server/database for customizations that are causing problems with the performance of the applications that we created and deployed.
A few examples of clients' customizations:
Adding large tables with many text datatypes for the columns that get cast to other data types in the queries
No primary keys, indexes, or FK constraints
Use of external scripts that use count(*) from table where id = x, in a loop from the script, to determine how to construct more queries later in the same script. (no bulk actions that the planner can optimize or just do everything in a single pass)
All new code files on the server are created/owned by root, with 0777 permissions
The clients don't take suggestions/criticism well. If we just go ahead and try to port/change the scripts ourselves, the old code can come back, clobbering any changes that we make! Or, with our limited knowledge of their use cases, we break functionality while trying to optimize their changes.
My question is this: how can we limit the resources available to queries/applications other than what we create and deploy? Are there any pragmatic options in scenarios like this? We prided ourselves on having an OSS solution, but it seems that it's become a liability.
We use PG 8.3 running on a range of Linux distros. The clients prefer PHP, but shell scripts, Perl, Python, and PL/pgSQL are all used on the system in one form or another.
This problem started about two minutes after the first client was given full access to the first computer, and it hasn't gone away since. Any time someone's priority is getting business-oriented work done quickly, they will be sloppy about it and screw things up for everyone. That's just how things work, because proper design and implementation are harder than cheap hacks. You're not going to solve this problem; all you can do is figure out how to make it easier for the client to work with you than against you. If you do it right, it will look like excellent service rather than nagging.
First off, the database side. There's no way to control query resources in PostgreSQL. The main difficulty is that tools like "nice" control CPU usage, but if the database doesn't fit in RAM it may very well be I/O usage that is killing you. See this developer message summarizing the issues here.
Now, if in fact it's CPU the clients are burning through, you can use two techniques to improve that situation:
Install a C function that changes the process priority (example 1, example 2) and make sure whenever they run something it gets called first (maybe put it into their psql config file, there are other ways).
Write a script that looks for postmaster processes spawned by their userid and renice them, make it run often in cron or as a daemon.
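The second technique could look roughly like the sketch below. It rests on a few assumptions that you should check on your own systems: that backend processes show up in `ps` with a title of the form "postgres: user database host activity", that the script runs with enough privilege (root or the postgres OS user) to renice those processes, and that "clientuser" is a placeholder for the client's database role.

```python
import os
import subprocess


def backend_pids(ps_output, db_user):
    """Pick the PIDs of PostgreSQL backends serving `db_user` from `ps -eo pid,args` output."""
    prefix = "postgres: %s " % db_user
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and parts[1].startswith(prefix):
            pids.append(int(parts[0]))
    return pids


def renice(pids, niceness=10):
    """Lower the CPU priority of the given processes (Unix only)."""
    for pid in pids:
        try:
            os.setpriority(os.PRIO_PROCESS, pid, niceness)
        except ProcessLookupError:
            pass  # the backend exited between listing and renicing


def renice_user_backends(db_user, niceness=10):
    """One pass of the job; run it from cron or in a loop as a daemon."""
    out = subprocess.run(["ps", "-eo", "pid,args"],
                         capture_output=True, text=True, check=True).stdout
    renice(backend_pids(out, db_user), niceness)


# Illustrative ps output, not taken from a real server.
sample_ps = """\
  PID COMMAND
  101 postgres: writer process
  202 postgres: clientuser appdb 10.0.0.5(4312) SELECT
  203 postgres: appuser appdb [local] idle
  204 php /home/clientuser/report.php
"""
matches = backend_pids(sample_ps, "clientuser")
```

As the answer says, this only helps if CPU is the contended resource; it does nothing for I/O-bound queries.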
It sounds like your problem isn't the particular query processes they're running, but rather other modifications they're making to the larger structure. There's only one way to cope with that: you have to treat the client like they're an intruder and use the approaches of that portion of the computer security field to detect when they screw things up. Seriously! Install an intrusion detection system like Tripwire on the server (there are better tools, that's just the classic example), and have it alert you when they touch anything. New file that's 0777? Should jump right out of a proper IDS report.
On the database side, you can't usefully detect modifications directly. Instead, do a pg_dump of the schema into a file every day (pg_dumpall -g and pg_dump -s), then diff that against the last one you delivered and have it alert you when something has changed. If you manage this well, the contact with the client turns into "we noticed you changed something on the server...what is it you're trying to accomplish with that?" which makes you look like you're really paying attention to them. That can turn into a sales opportunity, and they may stop fiddling with things as much just knowing you're going to catch it immediately.
The other thing you should start doing immediately is to install as much version control software as you can on each client box. You should be able to log in to each system, run the appropriate status/diff tool for the install, and see what's changed. Get that mailed to you regularly too. Again, this works best if combined with something that dumps the schema as a component of what it manages. Not enough people use serious version control approaches on the code that lives in the database.
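A minimal sketch of the dump-and-diff step, assuming pg_dump is on the PATH and the script can connect to the database. The function names and the "delivered.sql"/"today.sql" labels are invented for the example, and the alerting itself (mail, ticket, whatever you use) is left out.

```python
import difflib
import subprocess


def dump_schema(dbname):
    """Return the schema-only dump of `dbname` as text (runs `pg_dump -s`)."""
    result = subprocess.run(["pg_dump", "-s", dbname],
                            capture_output=True, text=True, check=True)
    return result.stdout


def schema_changes(baseline, current):
    """Unified diff between two schema dumps; an empty list means nothing changed."""
    return list(difflib.unified_diff(
        baseline.splitlines(), current.splitlines(),
        fromfile="delivered.sql", tofile="today.sql", lineterm=""))
```

A daily job would call dump_schema(), compare the result with the dump saved at delivery time via schema_changes(), and send the diff along whenever the list is not empty.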
That's the main set of technical approaches useful here. The rest of what you've got is a classic consulting client management problem that's far more of a people problem than a computer one. Cheer up, it could be worse--FSM help you if you give them ODBC access and they discover they can write their own queries in Access or something simple like that.