Sphinx: multiple searchd on same port

I am working with Sphinx search and have built different indexes for different data.
My problem is that I currently run a separate searchd service for each index, but I want to run them all on the same port. Is it possible to do this in Sphinx?

A single Sphinx daemon can serve multiple indexes. You just need to put all the source and index definitions into a single configuration file and make sure the indexes have different names.
Quick example:
source src1
{
    ...
}

source src2
{
    ...
}

index sphinx_index1
{
    source = src1
    path   = <sphinx_path>/sphinx_index1
    ...
}

index sphinx_index2
{
    source = src2
    path   = <sphinx_path>/sphinx_index2
    ...
}
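To answer the "same port" part: with everything in one file you run a single searchd, and every index defined in that file is reachable through that one listener; you just pass the index name when querying. A minimal sketch of the searchd section (the ports and paths are only examples, adjust them to your setup; 9312 and 9306 are the conventional SphinxAPI and SphinxQL ports):

searchd
{
    listen   = 9312                 # one SphinxAPI port for all indexes
    listen   = 9306:mysql41         # optional SphinxQL listener
    log      = <sphinx_path>/searchd.log
    pid_file = <sphinx_path>/searchd.pid
}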
Hope this helps

Related

Get file content on Github with GraphQL API and Python

Given a list of repositories on GitHub with 'repoName' and 'userName', I need to find all the '.java' files and get the content of those Java files. I tried the REST API first but ran into the rate limit very easily, so now I'm switching to the GraphQL API, but I don't know how to achieve this.
Here is how I would do it:
Algorithm Identify_java_files
Input: the path of a folder
Output: the .java files in that folder
    Identify all entries in the folder of the repository
    For each entry:
        if the type of the entry is "blob":
            if the name of the entry ends with ".java":
                get its content
        else if the type of the entry is "tree":
            Identify_java_files(path of the tree)
You can easily implement this pseudocode in Python (a sketch is given after the queries below). My pseudocode assumes recursion, but it can be done differently; it is just an example. You will need the requests and json libraries.
Here are the queries you might need, and you can use the explorer to test them.
{
  repository(name: "checkout", owner: "actions") {
    defaultBranchRef {
      name
    }
  }
}
This query returns the name of the default branch of the repository. You will need it for the following queries; alternatively you can use a specific branch, but then you have to know its name. I don't know (and don't believe) that you can get all the branch names of a repository this way.
{
  repository(name: "checkout", owner: "actions") {
    object(expression: "main:") {
      ... on Tree {
        entries {
          path
          type
        }
      }
    }
  }
}
This query gets the content of the root folder for a specific repository. The expression "main:" refers to the branch of the repository along with a path. Here the branch is main and the path is empty (it comes after the ":"), meaning we are looking at the root folder. Some repositories use "master" as the default branch, so be sure which branch to use.
To check the content of a file, you can use this accepted answer.
I updated the example so that you can try it.
{
  repository(name: "checkout", owner: "actions") {
    object(expression: "main:.github/workflows/test.yml") {
      ... on Blob {
        text
      }
    }
  }
}
You send your requests to the API using requests or a similar library, and parse the responses as JSON for further processing.
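Here is a minimal sketch of that recursive walk in Python. It assumes a personal access token in TOKEN, and the helper names (run_query, find_java_files) are mine, not part of any official client:

import requests

TOKEN = "<your personal access token>"   # assumed: a token with read access
API_URL = "https://api.github.com/graphql"

TREE_QUERY = """
query($owner: String!, $name: String!, $expr: String!) {
  repository(owner: $owner, name: $name) {
    object(expression: $expr) { ... on Tree { entries { name path type } } }
  }
}
"""

BLOB_QUERY = """
query($owner: String!, $name: String!, $expr: String!) {
  repository(owner: $owner, name: $name) {
    object(expression: $expr) { ... on Blob { text } }
  }
}
"""

def run_query(query, variables):
    # The v4 API is a single POST endpoint; the query and its variables go in the JSON body.
    resp = requests.post(API_URL,
                         json={"query": query, "variables": variables},
                         headers={"Authorization": f"bearer {TOKEN}"})
    resp.raise_for_status()
    return resp.json()["data"]

def find_java_files(owner, name, branch="main", path=""):
    # Recursively yield (path, content) for every .java blob, one query per tree level.
    data = run_query(TREE_QUERY, {"owner": owner, "name": name, "expr": f"{branch}:{path}"})
    for entry in data["repository"]["object"]["entries"]:
        if entry["type"] == "tree":
            yield from find_java_files(owner, name, branch, entry["path"])
        elif entry["type"] == "blob" and entry["name"].endswith(".java"):
            blob = run_query(BLOB_QUERY,
                             {"owner": owner, "name": name, "expr": f"{branch}:{entry['path']}"})
            yield entry["path"], blob["repository"]["object"]["text"]

# Example usage:
# for p, text in find_java_files("actions", "checkout"):
#     print(p, len(text))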
As a side note, I do not think it is possible to achieve this without issuing multiple queries. I recently had to do something similar, and this is my first SO answer, so let me know if anything is unclear.
Edit:
You can use this answer to list all files in a repository.

Get the list of PNG files in File Storage excluding _processed_ folder

As the title says, I need to get only the unprocessed PNG files.
My current approach is the following:
$fileExtensionFilter = $this->objectManager->get(FileExtensionFilter::class);
$fileExtensionFilter->setAllowedFileExtensions('png');
$storage->addFileAndFolderNameFilter([$fileExtensionFilter, 'filterFileList']);

$availablePngFiles = $storage->getFileIdentifiersInFolder($storage->getRootLevelFolder(false)->getIdentifier(), true, true);

foreach ($availablePngFiles as $pngFile) {
    if (!$storage->isWithinProcessingFolder($pngFile)) {
        $pngFileObject = $storage->getFile($pngFile);
    }
}
So this works, but I'd like to avoid the unnecessary isWithinProcessingFolder() lookup and fetch only the original, unprocessed files, which would significantly reduce the number of iterations.
TYPO3 core 7.6.19 ships with only two filters: FileExtensionFilter and FileNameFilter, the latter of which is really a "hidden file" filter.
You could write your own file filter and do the filtering there, but that is more work than keeping those two lines of code.

Cache Slick DBIO Actions

I am trying to speed up "SELECT * FROM <table> WHERE name = ?"-style queries in a Play! + Scala app. I am using Play 2.4 + Scala 2.11 + the play-slick-1.1.1 package, which uses Slick 3.1.
My hypothesis was that Slick generates prepared statements from DBIO actions and then executes them, so I tried to cache them by turning on the flag cachePrepStmts=true.
However, I still see "Preparing statement..." messages in the log, which means the prepared statements are not being cached! How should one instruct Slick to cache them?
If I run the following code, shouldn't the prepared statements be cached at some point?
for (i <- 1 until 100) {
  Await.result(db.run(doctorsTable.filter(_.userName === name).result), 10 seconds)
}
Slick config is as follows:
slick.dbs.default {
  driver = "slick.driver.MySQLDriver$"
  db {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://localhost:3306/staging_db?useSSL=false&cachePrepStmts=true"
    user = "user"
    password = "passwd"
    numThreads = 1 // For now, just one thread in HikariCP
    properties = {
      cachePrepStmts = true
      prepStmtCacheSize = 250
      prepStmtCacheSqlLimit = 2048
    }
  }
}
Update 1
I tried the following, as per pawel's suggestion of using compiled queries:
val compiledQuery = Compiled { name: Rep[String] =>
  doctorsTable.filter(_.userName === name)
}

val stTime = TimeUtil.getUtcTime
for (i <- 1 until 100) {
  FutureUtils.blockFuture(db.run(compiledQuery(name).result), 10)
}
val endTime = TimeUtil.getUtcTime - stTime
Logger.info(s"Time Taken HERE $endTime")
In my logs I still see statements like:
2017-01-16 21:34:00,510 DEBUG [db-1] s.j.J.statement [?:?] Preparing statement: select ...
Also, the timing remains the same. What is the expected output? Should I no longer see these statements? How can I verify that prepared statements are indeed being reused?
You need to use compiled queries, which do exactly what you want.
Just change the above code to:
val compiledQuery = Compiled { name: Rep[String] =>
  doctorsTable.filter(_.userName === name)
}

for (i <- 1 until 100) {
  Await.result(db.run(compiledQuery(name).result), 10 seconds)
}
I extracted name as a parameter above (because you usually want to vary some parameters in your prepared statements), but that part is optional.
For further information you can refer to: http://slick.lightbend.com/doc/3.1.0/queries.html#compiled-queries
For MySQL you also need to set an additional JDBC flag, useServerPrepStmts=true.
HikariCP's MySQL configuration page links to a quite useful document with simple performance-tuning options for the MySQL JDBC driver.
Here are a few that I've found useful (options not exposed by Hikari's API need to be appended to the JDBC URL with &; see the example URL after the list). Be sure to read through the linked document and/or the MySQL documentation for each option; they should be mostly safe to use.
zeroDateTimeBehavior=convertToNull&characterEncoding=UTF-8
rewriteBatchedStatements=true
maintainTimeStats=false
cacheServerConfiguration=true
avoidCheckOnDuplicateKeyUpdateInSQL=true
dontTrackOpenResources=true
useLocalSessionState=true
cachePrepStmts=true
useServerPrepStmts=true
prepStmtCacheSize=500
prepStmtCacheSqlLimit=2048
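For example, combined with the URL from the question, the statement-caching options are simply appended to the query string:
jdbc:mysql://localhost:3306/staging_db?useSSL=false&cachePrepStmts=true&useServerPrepStmts=true&prepStmtCacheSize=500&prepStmtCacheSqlLimit=2048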
Also, note that statements are cached per connection; depending on what you set for Hikari's connection maxLifetime and what the server load is, memory usage will increase accordingly on both server and client (e.g. if you set the connection max lifetime to just under MySQL's default of 8 hours, both server and client will keep N prepared statements alive in memory for the life of each connection).
P.S. I'm curious whether the bottleneck is really statement caching or something specific to Slick.
EDIT:
To log statements, enable the query log. On MySQL 5.7 you would add the following to your my.cnf:
general-log=1
general-log-file=/var/log/mysqlgeneral.log
and then run sudo touch /var/log/mysqlgeneral.log followed by a restart of mysqld. In the general log, server-side prepared statements show up as Prepare and Execute entries, so repeated Execute lines without new Prepare lines indicate that statements are being reused. Comment out the config lines above and restart again to turn query logging off.

Azure Media Indexer - Indexing IAssetFiles

The Media Indexer (to produce the closed-caption files) seems to be set up to index Assets, but our Assets contain multiple IAssetFiles, one of which is the video I need to index. I have successfully used this sample code, but everything seems geared towards Assets rather than AssetFiles. Any tips?
IAsset asset = getAssetByID(dbAsset.containerId);
IMediaProcessor indexer = GetLatestMediaProcessorByName(_mediaProcessorName);
IJob job = _context.Jobs.Create("MediaIndex Job - " + dbAsset.name);
string configuration = "";
ITask task = job.Tasks.AddNew("MediaIndex Task", indexer, configuration, TaskOptions.None);
// Specify the input asset to be indexed.
task.InputAssets.Add(asset);   // <-- need to pass an IAssetFile here
Figured it out. There's a configuration file you can pass into the task, and you can add the filename to the input tag to specify which of the asset files should be indexed.

Extend multiple sources / indexes

I have many web pages that are clones of each other. They have the exact same database structure, just different data in different databases (each clone is for a different country, so everything is separated).
I would like to clean up my sphinx config file so that I don't duplicate the same queries for every site.
I'd like to define a main source (with db auth info) for every clone, a common source for every table I'd like to search, and then sources & indexes for every table and every clone.
But I'm not sure how exactly I should go about doing that.
I was thinking of something along these lines:
index common_index
{
    # charset_type, stopwords, etc
}

source common_clone1
{
    # sql_host, sql_user, ...
}

source common_clone2
{
    # sql_host, sql_user, ...
}

# ...

source table1
{
    # sql_query, sql_attr_*, ...
}

source clone1_table1 : ???
{
    # ???
}

# ...

index clone1_table1 : common_index
{
    source: clone1_table1
    # path, ...
}
# ...
So you can see where I'm confused :)
I thought I could do something like this:
source clone1_table1 : table1, common_clone1 {}
but it's not working obviously.
Basically, what I'm asking is: is there any way to extend two sources/indexes?
If this isn't possible, I'll be "forced" to write a script that generates my sphinx config file, to ease maintenance.
Apparently this isn't possible (I don't know whether it's in the pipeline for the future), so I'll have to resort to generating the config file with some sort of script.
I've created such a script; you can find it on GitHub: sphinx generate config php
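For what it's worth, Sphinx does support inheriting from a single parent, so a partial workaround (short of generating the file) is to inherit the queries from table1 and repeat only the per-clone connection settings in each derived source. A rough sketch, with made-up host/user/database names:

source clone1_table1 : table1
{
    # only the connection details differ per clone; sql_query,
    # sql_attr_*, etc. are inherited from table1
    sql_host = clone1-db-host
    sql_user = clone1_user
    sql_pass = clone1_pass
    sql_db   = clone1_db
}

index clone1_table1 : common_index
{
    source = clone1_table1
    path   = <sphinx_path>/clone1_table1
}

With many clones and tables this still repeats the connection block for every table of a clone, which is exactly the duplication the generator script avoids.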