Extend multiple sources / indexes - sphinx

I have many web pages that are clones of each other. They have the exact same database
structure, just different data in different databases (each clone is for a different country so everything is
separated).
I would like to clean up my sphinx config file so that I don't duplicate the same queries
for every site.
I'd like to define a main source (with db auth info) for every clone, a common source for
every table I'd like to search, and then sources&indexes for every table and every clone.
But I'm not sure how exactly I should go about doing that.
I was thinking of something along these lines:
index common_index
{
    # charset_type, stopwords, etc
}
source common_clone1
{
    # sql_host, sql_user, ...
}
source common_clone2
{
    # sql_host, sql_user, ...
}
# ...
source table1
{
    # sql_query, sql_attr_*, ...
}
source clone1_table1 : ???
{
    # ???
}
# ...
index clone1_table1 : common_index
{
    source = clone1_table1
    # path, ...
}
# ...
So you can see where I'm confused :)
I thought I could do something like this:
source clone1_table1 : table1, common_clone1 {}
but obviously that doesn't work.
Basically, what I'm asking is: is there any way to extend two sources/indexes at once?
If this isn't possible I'll be "forced" to write a script that will generate my sphinx config file to ease maintenance.

Apparently this isn't possible (I don't know if it's planned for the future), so I'll have to resort to generating the config file with some sort of script.
I've created such a script; you can find it on GitHub: sphinx generate config php
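For illustration, here is a minimal sketch of that config-generation approach in Python (the actual script linked above is PHP; the clone names, credentials, queries and paths below are made-up placeholders):

# generate_sphinx_conf.py - emit one source/index pair per clone and per table
CLONES = {
    "clone1": {"sql_host": "localhost", "sql_db": "clone1_db"},
    "clone2": {"sql_host": "localhost", "sql_db": "clone2_db"},
}
TABLES = {
    "table1": "SELECT id, title, body FROM table1",
}

def render(clone, db, table, query):
    # one source block plus one index block, the index inheriting the shared settings
    return f"""
source {clone}_{table}
{{
    sql_host  = {db['sql_host']}
    sql_user  = sphinx
    sql_pass  = secret
    sql_db    = {db['sql_db']}
    sql_query = {query}
}}

index {clone}_{table} : common_index
{{
    source = {clone}_{table}
    path   = /var/lib/sphinx/{clone}_{table}
}}
"""

with open("sphinx.conf", "w") as conf:
    conf.write("index common_index\n{\n    # charset_type, stopwords, ...\n}\n")
    for clone, db in CLONES.items():
        for table, query in TABLES.items():
            conf.write(render(clone, db, table, query))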

Related

Is it possible to deactivate the autogenerated Drop?

I'm testing Alembic for a Python project. The autogeneration is really nice, but the drop operations are not really helpful if, for example, you need to work on customer databases with many different versions.
Being able to activate or deactivate drops for different scenarios would be the best solution.
I made my own configuration in env.py, so I can use more than one base script. But if I add a new script (defining a new table) and autogenerate a migration script, it contains autogenerated drops of all previously migrated tables.
I already looked at the mako file. Is it possible to integrate such a restriction in the mako file?
I found a way to filter the list of migration operations: hand over a filter function to the "process_revision_directives" configuration flag (all of this is configured in env.py).
from alembic.operations import ops

def process_revision_directives(context, revision, directives):
    script = directives[0]
    # process both "def upgrade()", "def downgrade()"
    for directive in (script.upgrade_ops, script.downgrade_ops):
        # now rewrite the list of "ops" such that DropColumnOp and DropTableOp
        # are removed for those tables. Needs a recursive function.
        directive.ops = list(
            _filter_drop_elm(directive.ops)
        )

def _filter_drop_elm(directives):
    # filter out Drop ops from the list of directives and yield the result
    for directive in directives:
        if isinstance(directive, ops.DropTableOp):
            continue
        elif isinstance(directive, ops.ModifyTableOps):
            # ModifyTableOps is a container of ALTER TABLE types of
            # commands. process those in place recursively.
            directive.ops = list(
                _filter_drop_elm(directive.ops)
            )
            if not directive.ops:
                continue
        elif isinstance(directive, ops.AlterTableOp) and isinstance(directive, ops.DropColumnOp):
            continue
        # otherwise if not filtered, yield out the directive
        yield directive
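To wire the filter up, pass it to context.configure() in env.py. Here is a minimal sketch of the relevant part; the engine setup and target_metadata follow the standard Alembic env.py template and are placeholders here:

# env.py (excerpt) - register the filter so autogenerate applies it to each new revision
from alembic import context
from sqlalchemy import engine_from_config, pool

config = context.config
target_metadata = None  # in a real env.py this would be your Base.metadata

def run_migrations_online():
    connectable = engine_from_config(
        config.get_section(config.config_ini_section),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            # the filter defined above strips Drop ops from autogenerated revisions
            process_revision_directives=process_revision_directives,
        )
        with context.begin_transaction():
            context.run_migrations()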

Get file content on Github with GraphQL API and Python

Given a list of repositories on GitHub with 'repoName' and 'userName', I need to find all the '.java' files and get their content. I tried the REST API first but ran into the rate limit very easily, so now I'm switching to the GraphQL API, but I don't know how to achieve that.
Here is how I would do it:
Algo Identify_java_files
Input: path of the folder
Output: java files in the folder

Identify all files in the folder of the repository
For each file
    if the type of the file is "blob":
        if the name of the file ends with ".java":
            get its content
    else if the type of the file is "tree":
        Identify_java_files(path of the tree)
You can easily implement this pseudocode in Python. It assumes recursion, but it can be done differently; it's just an example. You will need the requests and json libraries.
Here are the queries you might need, and you can use the explorer to test them.
{
  repository(name: "checkout", owner: "actions") {
    defaultBranchRef {
      name
    }
  }
}
This query lets you check the name of the default branch of the repository. You will need it for the following queries, or you can use a specific branch, but then you have to know its name. I don't know (and don't believe) that you can get all the branch names for a repository.
{
  repository(name: "checkout", owner: "actions") {
    object(expression: "main:") {
      ... on Tree {
        entries {
          path
          type
        }
      }
    }
  }
}
This query gets the content of the root folder of a specific repository. The expression "main:" refers to the branch of the repository along with a path: here the branch is main and the path (which comes after the ":") is empty, meaning we are looking at the root folder. Some repositories use "master" as the default branch, so be sure of which branch to use.
To check the content of a file, you can use this accepted answer.
I updated the example so you can try it.
{
  repository(name: "checkout", owner: "actions") {
    object(expression: "main:.github/workflows/test.yml") {
      ... on Blob {
        text
      }
    }
  }
}
You send your requests to the API using requests or a similar library, and store the responses as JSON (or similar) for processing.
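For illustration, here is a minimal sketch of the recursive walk in Python; the endpoint and header are the standard GitHub GraphQL ones, while GITHUB_TOKEN and the owner/name values are placeholders you would substitute:

import requests

API = "https://api.github.com/graphql"
HEADERS = {"Authorization": "bearer GITHUB_TOKEN"}  # replace with a real token

QUERY = """
query ($owner: String!, $name: String!, $expr: String!) {
  repository(owner: $owner, name: $name) {
    object(expression: $expr) {
      ... on Tree { entries { name path type } }
      ... on Blob { text }
    }
  }
}
"""

def run(owner, name, expr):
    payload = {"query": QUERY, "variables": {"owner": owner, "name": name, "expr": expr}}
    response = requests.post(API, json=payload, headers=HEADERS)
    response.raise_for_status()
    return response.json()["data"]["repository"]["object"]

def identify_java_files(owner, name, branch="main", path=""):
    # walk one folder: recurse into trees, fetch the text of .java blobs
    node = run(owner, name, f"{branch}:{path}") or {}
    for entry in node.get("entries", []):
        if entry["type"] == "tree":
            yield from identify_java_files(owner, name, branch, entry["path"])
        elif entry["type"] == "blob" and entry["name"].endswith(".java"):
            blob = run(owner, name, f"{branch}:{entry['path']}")
            yield entry["path"], blob["text"]

for path, text in identify_java_files("actions", "checkout"):
    print(path, len(text))

Note that this still issues one query per folder and one per .java file, which ties in with the side note below: multiple queries seem unavoidable.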
As a side note, I do not think it is possible to achieve this without issuing multiple queries. I recently had to do something similar, and this is my first SO answer, so let me know if anything is unclear.
Edit:
You can use this answer to list all files in a repository.

preserve existing code for arbitrary scalafmt settings

I'm trying to gently introduce scalafmt to a large existing codebase and I want it to make virtually no changes except for a handful of noncontroversial settings the whole team can agree on.
With some settings, like maxColumn, I can override the default of 80 to something absurd like 5000 to get no changes. But with other settings, such as continuationIndent.callSite, I have to make choices that will modify the existing code. That setting requires a number, and any value would aggressively introduce changes on the first run over our codebase.
Is there anything I can do in my scalafmt config to preserve all my code except for a few specific settings?
EDIT: I will also accept suggestions of other tools that solve the same issue.
Consider project.includeFilters:
Configure which source files should be formatted in this project.
# manually include files to format.
project.includeFilters = [
  regex1
  regex2
]
For example, say we have project structure with foo, bar, baz, etc. packages like so
someProject/src/main/scala/com/example/foo/*.scala
someProject/src/main/scala/com/example/bar/*.scala
someProject/src/main/scala/com/example/baz/qux/*.scala
...
Then the following .scalafmt.conf
project.includeFilters = [
  "foo/.*"
]
continuationIndent.callSite = 2
...
will format only files in foo package. Now we can proceed to gradually introduce formatting to the codebase package-by-package
project.includeFilters = [
  "foo/.*"
  "bar/.*"
]
continuationIndent.callSite = 2
...
or even file-by-file
project.includeFilters = [
  "foo/FooA\.scala"
  "foo/FooB\.scala"
]
continuationIndent.callSite = 2
...

Get the list of PNG files in File Storage excluding _processed_ folder

As the title says, I need to get only the unprocessed PNG files.
My current approach is the following:
$fileExtensionFilter = $this->objectManager->get(FileExtensionFilter::class);
$fileExtensionFilter->setAllowedFileExtensions('png');
$storage->addFileAndFolderNameFilter([$fileExtensionFilter, 'filterFileList']);
$availablePngFiles = $storage->getFileIdentifiersInFolder($storage->getRootLevelFolder(false)->getIdentifier(), true, true);
foreach ($availablePngFiles as $pngFile) {
    if (!$storage->isWithinProcessingFolder($pngFile)) {
        $pngFileObject = $storage->getFile($pngFile);
    }
}
So, it works, but I'd like to avoid the unnecessary isWithinProcessingFolder() lookup and get only the original unprocessed files, which will significantly reduce the number of loops.
TYPO3 core 7.6.19 ships with only two filters: FileExtensionFilter and FileNameFilter, the latter actually being a hidden-file filter.
You could write your own file filter and do the check in there, but that's far more work than keeping those two lines of code.

sphinx multiple searchd on same port

I am working on Sphinx search.
I have made different indexes for different data.
My problem is that I have different searchd services for different indexes, but I want to run all of these services on the same port. Is it possible to do this in Sphinx?
Each Sphinx daemon can serve multiple indexes. You just need to put all the source and index definitions into a single configuration file and make sure the indexes have different names.
Quick example:
source src1
{
    ...
}
source src2
{
    ...
}
index sphinx_index1
{
    source = src1
    path   = <sphinx_path>/sphinx_index1
    ...
}
index sphinx_index2
{
    source = src2
    path   = <sphinx_path>/sphinx_index2
    ...
}
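As a side note, once all indexes are served by one daemon you can search any of them (or several at once) over a single SphinxQL port. A small Python sketch, assuming the searchd section of the same config contains a "listen = 9306:mysql41" line (pymysql and the port number are my own choices here, not part of the question):

import pymysql

# searchd speaks the MySQL protocol on its SphinxQL port (9306 assumed here)
conn = pymysql.connect(host="127.0.0.1", port=9306, user="", password="")
try:
    with conn.cursor() as cur:
        # one daemon, one port: query a single index or several at once
        cur.execute("SELECT id FROM sphinx_index1, sphinx_index2 WHERE MATCH('hello') LIMIT 10")
        for row in cur.fetchall():
            print(row)
finally:
    conn.close()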
Hope this helps