Searching with Sphinx

I'm at an impasse that probably has a simple solution, but I can't see it. I've done everything in the Sphinx documentation up to the Quick Tour, but when I test the search using test.php in PuTTY, it returns zero results.
I've put in all my correct database info in sphinx.conf and I've assembled the SQL query. I'm not getting any errors at all, just that it says it's returning 0 results every time I search.
Is it looking at my databases? Let me know if you need to see any code. searchd is running (as far as I can tell).

Sphinx has 2 different phases:
1) Indexing
2) Searching
I believe from your question that you skipped the step where you index the data (running indexer), so searching has nothing to search through. During indexing, Sphinx pulls all the data from your database into its own index; searches then run against that index, not against your DB.

Make sure that indexer --all shows that it found and indexed actual documents.
Besides the API, there is another convenient way to test Sphinx: SphinxQL.
Add a "listen = 9306:mysql41" line to the searchd section of sphinx.conf, as described in http://astellar.com/2011/12/replacing-mysql-full-text-search-with-sphinx/, and start the daemon.
Then run
mysql -h0 -P 9306
and then fire the query against sphinx
SELECT * FROM <your_sphinx_index>;
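The same sanity check can be run from code over the SphinxQL port; a minimal sketch in Python, assuming the Quick Tour's default index name test1, the 9306 port from above, and the third-party pymysql client (all of which you may need to adjust):

```python
# Minimal SphinxQL sanity check. The index name "test1" is the Quick Tour
# default; host/port must match the "listen" line in sphinx.conf.

def count_query(index):
    # SphinxQL is close enough to SQL that a plain COUNT(*) works
    return "SELECT COUNT(*) FROM {}".format(index)

def count_documents(index, host="127.0.0.1", port=9306):
    import pymysql  # third-party MySQL client; Sphinx speaks the MySQL protocol
    conn = pymysql.connect(host=host, port=port)
    try:
        with conn.cursor() as cur:
            cur.execute(count_query(index))
            return cur.fetchone()[0]
    finally:
        conn.close()

# usage: count_documents("test1")
# a result of 0 means searchd is fine but the index is empty,
# i.e. the indexer step was skipped
```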
Hope that helps!


When testing POST (creating Mongo entries), how do you delete entries in the DB with JMeter after testing, if you don't have DELETE endpoints?

I'm sure I can write an easy script that simply drops the entire collection from the database but that seems very clumsy as a long term solution.
Currently, we don't have delete endpoints that actually DELETE, we have PUT endpoints that mark the entry as "DONT SHOW/REMOVED" and another "undelete endpoint" that restores the viewing since we technically don't want to delete any data in our implementation of this medical database, for liability purposes.
Does JMeter have a way for me to make it talk to Mongo and delete? I know there is a deprecated way to talk to Mongo via JMeter, but I'm not sure about any modern solutions.
Since I can't add unused code into the repo, does this mean the only solution is for me to make an "extra endpoint" outside of the repo that JMeter can access to delete each entry?
Seems like a viable solution just not sure if that's the only way to go about it and if I'm missing something.
The MongoDB Test Elements were deprecated due to low interest: keeping the MongoDB driver shipped with JMeter up to date would require extra effort, and the number of users of the MongoDB Test Elements was not that high.
Mailing List Message
Associated JMeter issue
However, given that you don't test MongoDB per se and plan to use the JMeter MongoDB elements only for setup/teardown actions, I believe you can go ahead.
You can get the MongoDB test elements back by adding the next line to the user.properties file:
not_in_menu=
Setting not_in_menu to an empty value overrides the default list that hides the deprecated elements.
This will "unhide" MongoDB Source Config and MongoDB Script elements which you will be able to use for cleaning up the DB. See How to Load Test MongoDB with JMeter for more information, sample queries, tips and tricks.
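Alternatively, the cleanup can be a small external script run from a tearDown Thread Group via the OS Process Sampler; a sketch in Python with the third-party pymongo driver (the database/collection names and the testRunId marker field are assumptions -- tag your load-test documents however your schema allows):

```python
# Hypothetical post-test cleanup: delete only the documents a JMeter run
# created, identified by a run tag, so real medical records are never touched.

def cleanup_filter(run_id):
    # match only documents tagged by the load test
    return {"testRunId": run_id}

def cleanup(uri, run_id, db="medicaldb", collection="entries"):
    from pymongo import MongoClient  # third-party driver: pip install pymongo
    client = MongoClient(uri)
    result = client[db][collection].delete_many(cleanup_filter(run_id))
    return result.deleted_count

# usage: cleanup("mongodb://localhost:27017", "jmeter-run-001")
```

The run tag keeps the deletion surgical: the script can never match documents that a real user created.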

Why are only empty rows copied when copying db from different server?

Getting my head around mongo atm. I am trying to copy a complete database from a server to my pc:
db.copyDatabase(fromdb, todb, fromhost)
The fromhost db contains 4 collections with rows in them. For some reason the local version of this db has all the collections, but they are empty:
db1 0.000GB
db2 0.000GB
What am I missing? Why are the rows empty?
Q: Why are the rows empty?
A: It looks like something went wrong.
If you haven't already, I would try db.getLastError() to see if there's any error message.
I would also look at this link:
How do I copy a database from one MongoDB server to another?
If you are using --auth, you'll need to include your username/password
in there...
Also you must be on the "destination" server when you run the command.
db.copyDatabase(<from_db>, <to_db>, <from_hostname>, <username>,
<password>);
If all that doesn't work, you might want to try something like
creating a slave of the database you want to copy ...
Finally, review the materials on the MongoDB db.copyDatabase reference page:
https://docs.mongodb.org/manual/reference/method/db.copyDatabase/
(Note that db.copyDatabase was deprecated in MongoDB 4.0 and removed in 4.2; on current versions, use mongodump/mongorestore instead.)
Please post back with any additional details (e.g. error message).
And, if you get it working, please post back what was wrong, and how you fixed it!
Good luck!

Is it possible to use sphinxsearch with postgresql directly without defining an index for each table

I have a PostgreSQL database and I want to use Sphinx search heavily to get a lot of data from many tables (more than 30 tables). Do I have to define an index for each table, or can I just define a listen socket and have it work?
I tried the normal way, which is to define an index for each table, and it's working fine, but I have to define the index for all the tables!
I'm trying to define listen in the searchd section of sphinx.conf, but it's not working.
No. Sphinx doesn't have 'auto-indexes'. They have to be created explicitly.
Frankly, the variations are too many: which fields to include, which rows to include (e.g. excluding 'deleted' rows), which attributes should be included, and so on. It's too much to be deduced universally.
Having said that, the config file can be generated by code. Your code knows how you want each index to work, so it can just generate the config file automatically. But it's probably only worth the trouble if your tables change regularly.
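To illustrate, generating the per-table blocks from a list of tables takes only a few lines; a sketch in Python (the table names, connection settings, and columns in sql_query are placeholders -- every real index needs its own column list):

```python
# Sketch: emit a sphinx.conf source/index pair for each table.
# All connection values and column names below are placeholders.

TEMPLATE = """
source src_{table}
{{
    type      = pgsql
    sql_host  = localhost
    sql_user  = sphinx
    sql_pass  = secret
    sql_db    = mydb
    sql_query = SELECT id, title, body FROM {table}
}}

index idx_{table}
{{
    source = src_{table}
    path   = /var/lib/sphinx/idx_{table}
}}
"""

def generate_conf(tables):
    # one source/index pair per table, concatenated into a single config
    return "".join(TEMPLATE.format(table=t) for t in tables)

# usage: write generate_conf(["articles", "comments", ...]) into sphinx.conf,
# then re-run it whenever the table list changes
```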
I have implemented a Postgres trigger procedure and a Python worker to feed a Sphinx RT index. Take a look:
https://github.com/narg/sphinx-search-feeder

Querying the Sphinx Search Index

I am using the Sphinx search engine, and I have an issue where a few files that definitely should be showing up in the search results are not. I have checked to make sure no info is missing that would prevent these files from appearing.
Is there some way for me to query the index directly to see if these records are in there, or to see whether or not a specific record is there?
I found a similar post on the subject:
Sphinx Search Index
So, it appears it is possible to do, but that post is not detailed enough on how to do it. I am not following what exactly is going on in that post, in other words. Do I just put this directly into the command line?
Or is there a tutorial available on this? I searched and could not locate one.
Sphinx provides connection through mysql's protocol, so you can use any of mysql's clients to connect and execute queries:
http://dev.mysql.com/doc/refman/5.5/en/programs-client.html
If you install the command-line client, you can connect like this:
$ mysql -h0 -P9306
Sphinx supports a custom subset of SQL called SphinxQL; you can use it to query data from the index. There is documentation for SphinxQL:
http://sphinxsearch.com/docs/latest/sphinxql-reference.html
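For the original question -- checking whether a specific record made it into the index -- a query on the document id over that same connection is enough; a sketch in Python (the index name "myindex", the id, and the third-party pymysql client are assumptions):

```python
# Check whether a given document id exists in a Sphinx index via SphinxQL.

def exists_query(index, doc_id):
    # SphinxQL supports filtering on the document id directly
    return "SELECT id FROM {} WHERE id = {}".format(index, int(doc_id))

def document_exists(index, doc_id, host="127.0.0.1", port=9306):
    import pymysql  # third-party MySQL client; Sphinx speaks the MySQL protocol
    conn = pymysql.connect(host=host, port=port)
    try:
        with conn.cursor() as cur:
            cur.execute(exists_query(index, doc_id))
            return cur.fetchone() is not None
    finally:
        conn.close()

# usage: document_exists("myindex", 42) -> True if document 42 was indexed
```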

Is there a PostgreSQL equivalent of SQL Server profiler?

I need to see the queries submitted to a PostgreSQL server. Normally I would use SQL Server Profiler for this in SQL Server land, but I have yet to find how to do it in PostgreSQL. There appear to be quite a few pay-for tools; I am hoping there is an open source variant.
You can use the log_statement config setting to get a list of all the queries sent to the server:
https://www.postgresql.org/docs/current/static/runtime-config-logging.html#guc-log-statement
Just set that, and the log file path, and you'll have the list. You can also configure it to log only long-running queries.
You can then take those queries and run EXPLAIN on them to find out what's going on with them.
https://www.postgresql.org/docs/9.2/static/using-explain.html
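Concretely, a minimal logging setup in postgresql.conf might look like this (the 250 ms threshold is an arbitrary example value; all four settings are standard parameters):

```ini
# postgresql.conf -- log every statement, plus timings for slow ones
log_statement = 'all'                # or 'ddl' / 'mod' to log only subsets
log_min_duration_statement = 250     # additionally log queries slower than 250 ms
logging_collector = on               # write to log files instead of stderr only
log_directory = 'log'                # relative to the data directory
```

Reload or restart the server after changing these; logging every statement has a cost, so on busy production systems prefer log_min_duration_statement alone.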
Adding to Joshua's answer: to see which queries are currently running, simply issue the following statement at any time (e.g. in pgAdmin III's query window):
SELECT datname, procpid, current_query FROM pg_stat_activity;
(On PostgreSQL 9.2 and later, the columns are named pid and query rather than procpid and current_query.)
Sample output:
datname | procpid | current_query
---------------+---------+---------------
mydatabaseabc | 2587 | <IDLE>
anotherdb | 15726 | SELECT * FROM users WHERE id=123 ;
mydatabaseabc | 15851 | <IDLE>
(3 rows)
I discovered pgBadger (https://pgbadger.darold.net/) and it is a fantastic tool that has saved my life many times. Here is an example report.
If you open it and go to the 'top' menu, you can see the slowest queries and the most time-consuming queries.
You can then drill into the details and see nice graphs that show the queries by hour, and the detail button shows the SQL text in a pretty form. The tool is free and perfect for this.
I need to see the queries submitted to a PostgreSQL server
As an option, if you use pgAdmin (pgAdmin 4 v2.1 in my case), you can observe queries via the 'Dashboard' tab.
Update, June 2022: answering the questions in the comments.
Question 1: My long SQL query gets truncated, is there any workaround?
Follow steps below:
Close pgAdmin
Find the postgresql.conf file. On my computer it is located at c:\Program Files\PostgreSQL\13\data\postgresql.conf. If you can't find it, read this answer for more details.
Open the postgresql.conf file and find the property called track_activity_query_size. By default the value is 1024, which means any query longer than 1024 characters will be truncated. Uncomment this property and set a new value, for example:
track_activity_query_size = 32768
Restart PostgreSQL service on your computer
P.S.: now everything is ready, but keep in mind that this change can slightly decrease performance. From a development/debugging standpoint you won't see any difference, but don't forget to revert this property in the production environment. For more details, read this article.
Question 2: I ran my function/method that triggers an SQL query, but I still can't see it in pgAdmin; or sometimes I see it, but it runs so quickly that I can't even expand the session on the 'Dashboard' tab.
Answer: Try running your application in 'debug' mode and set a breakpoint right before you close the connection to the database. While stopped at the breakpoint, click the 'refresh' button on the 'Dashboard' tab in pgAdmin.
You can use the pg_stat_statements extension.
If running the db in Docker, just add this command in docker-compose.yml; otherwise, look at the installation instructions for your setup:
command: postgres -c shared_preload_libraries=pg_stat_statements -c pg_stat_statements.track=all -c max_connections=200
And then in the db run this query:
CREATE EXTENSION pg_stat_statements;
Now, to see the operations that took the most time, run:
SELECT * FROM pg_stat_statements ORDER BY total_time/calls DESC LIMIT 10;
(On PostgreSQL 13 and later, the column is named total_exec_time rather than total_time.)
Or play with other queries over that view to find what you are looking for.
All those tools like pgBadger or pg_stat_statements require access to the server and/or altering the server settings and log settings, which is not such a good idea, especially if it requires a server restart, and because logging slows everything down, including production use.
In addition, extensions such as pg_stat_statements don't really show the queries, let alone in chronological order; and pg_stat_activity doesn't show you anything that isn't running right now, nor queries from users other than you.
Instead of any of that, you can add a TCP proxy between your application and the PostgreSQL server.
The TCP proxy reads all the SQL statements from what goes over the wire from your application to the server and outputs them to the console (or wherever). It also forwards everything to PostgreSQL and returns the answer(s) to your application.
This way, you don't need to stop/start/restart your db server, you don't need admin/root rights on the db server to change the config file, and you don't need any access to the db server at all. All you need to do is change the db connection string in your application (e.g. in your dev environment) to point to the proxy server instead of the SQL server (the proxy server then needs to point to the SQL server). Then you can see, in chronological order, what your <insert_profanity_here> application does on the database, and other people's queries don't show up (which makes it even better than SQL Server Profiler). [Of course, you can also see what other people do if you put the proxy on the db server on the old db port and assign the db a new port.]
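The core of that proxy idea fits in a few lines; a minimal sketch in Python (the ports and the raw-byte logging are simplifications -- a real PostgreSQL-aware proxy would parse the wire protocol to extract just the query text):

```python
# Minimal logging TCP proxy: listens locally, forwards every byte to the real
# server, and prints what flows in each direction.
import socket
import threading

def pipe(src, dst, tag):
    # copy bytes one way until the peer closes, logging the raw traffic
    while True:
        data = src.recv(4096)
        if not data:
            break
        print("[{}] {!r}".format(tag, data))
        dst.sendall(data)

def serve(listen_port, target_host, target_port):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", listen_port))
    srv.listen(5)
    while True:
        client, _ = srv.accept()
        upstream = socket.create_connection((target_host, target_port))
        # one thread per direction so both sides can talk concurrently
        threading.Thread(target=pipe, args=(client, upstream, "app->db"),
                         daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client, "db->app"),
                         daemon=True).start()

# usage: serve(15432, "127.0.0.1", 5432), then point the application's
# connection string at port 15432 instead of 5432
```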
I have implemented this with pg_proxy_net
(runs on Windows, Linux and Mac and doesn't require OS-dependencies, as it is .NET-Core-self-contained-deployment).
That way, you get approximately "the same" as you get with SQL Server Profiler.
Actually, if you aren't interested in other people's queries, what you get with pg_proxy_net is better than what you get with SQL Server Profiler.
Also, on github, I have a command-line MS-SQL-Server profiler that works on Linux/Mac.
And a GUI MS-SQL-Express-Profiler for Windows.
The funny thing is, once you have written one such tool, writing some more is just a piece of cake and done in under a day.
Also, if you want to get pg_stat_statements to work, you need to alter the config file (postgresql.conf) to preload the library and enable tracking, restart the server, and then create the extension:
CREATE EXTENSION pg_stat_statements;
-- D:\Programme\LessPortableApps\SQL_PostGreSQL\PostgreSQLPortable\Data\data\postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
You find the documentation for the PostgreSQL protocol here:
https://www.postgresql.org/docs/current/protocol-overview.html
You can see how the data is written into the TCP-buffer by looking at the source code of a postgresql-client, e.g. FrontendMessages of Npgsql on github:
https://github.com/npgsql/npgsql/blob/main/src/Npgsql/Internal/NpgsqlConnector.FrontendMessages.cs
Also, just in case you have a .NET application (with source code) that uses Npgsql, you might want to have a look at Npgsql.OpenTelemetry.
PS:
To configure the logs, see ChartIO Tutorial and TablePlus.
Cheers !
Happy "profiling" !