How does one keep an Elasticsearch index up-to-date using elasticsearch-dsl-py? - elasticsearch-dsl

I developed a small personal information directory that my client accesses and updates through a Django admin interface. That information needs to be searchable, so I set up my Django site to keep that data in a search index. I originally used Haystack and Whoosh for the search index, but I recently had to move away from those tools, and switched to Elasticsearch 5.
Previously, whenever anything in the directory was updated, the code simply cleared the entire search index and rebuilt it from scratch. There are only a few hundred entries in this directory, so that wasn't onerously slow. Unfortunately, attempting to do the same thing in Elasticsearch is very unreliable, due to what I presume to be a race condition of some sort in my code.
Here's the code I wrote that uses elasticsearch-py and elasticsearch-dsl-py:
import elasticsearch
import time
from django.apps import apps
from django.conf import settings
from elasticsearch.helpers import bulk
from elasticsearch_dsl.connections import connections
from elasticsearch_dsl import DocType, Text, Search
# Create the default Elasticsearch connection using the host specified in settings.py.
elasticsearch_host = "{0}:{1}".format(
    settings.ELASTICSEARCH_HOST['HOST'], settings.ELASTICSEARCH_HOST['PORT']
)
elasticsearch_connection = connections.create_connection(hosts=[elasticsearch_host])
class DepartmentIndex(DocType):
    url = Text()
    name = Text()
    text = Text(analyzer='english')
    content_type = Text()

    class Meta:
        index = 'departmental_directory'
def refresh_index():
    # Erase the existing index.
    try:
        elasticsearch_connection.indices.delete(index=DepartmentIndex().meta.index)
    except elasticsearch.exceptions.NotFoundError:
        # If it doesn't exist, the job's already done.
        pass
    # Wait a few seconds to give enough time for Elasticsearch to accept that the
    # DepartmentIndex is gone before we try to recreate it.
    time.sleep(3)
    # Rebuild the index from scratch.
    DepartmentIndex.init()
    Department = apps.get_model('departmental_directory', 'Department')
    bulk(
        client=elasticsearch_connection,
        actions=(b.indexing() for b in Department.objects.all().iterator())
    )
I had set up the Django signals to call refresh_index() whenever a Department got saved. But refresh_index() was frequently crashing due to this error:
elasticsearch.exceptions.RequestError: TransportError(400, u'index_already_exists_exception', u'index [departmental_directory/uOQdBukEQBWvMZk83eByug] already exists')
Which is why I added that time.sleep(3) call. I'm assuming that the index hasn't been fully deleted by the time DepartmentIndex.init() is called, which was causing the error.
My guess is that I've simply been going about this in entirely the wrong way. There's got to be a better way to keep an elasticsearch index up-to-date using elasticsearch-dsl-py, but I just don't know what it is, and I haven't been able to figure it out through their docs.
Searching for "rebuild elasticsearch index from scratch" on Google gives loads of results for "how to reindex your elasticsearch data", but that's not what I want. I need to replace the data with new, more up-to-date data from my app's database.

Maybe this will help: https://github.com/HonzaKral/es-django-example/blob/master/qa/models.py#L137-L146
Either way, you want to have two mechanisms: a batch load of all your data into a new index (https://github.com/HonzaKral/es-django-example/blob/master/qa/management/commands/index_data.py) and, optionally, synchronization using methods or signals as mentioned above.
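For the synchronization piece, here is a minimal sketch of what the signal-based approach could look like with Django, reusing the DepartmentIndex mapping from the question. The field names (name, description), the get_absolute_url() helper and the 'department' content type are assumptions; adjust them to whatever your b.indexing() method currently pulls from the Department model. With this approach DepartmentIndex.init() only needs to run once (e.g. from a management command), the index is never deleted on every save, and the race condition disappears:
import elasticsearch
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

# 'departmental_directory.Department' is a lazy string reference, so this module
# can be imported before the app registry is fully loaded.
@receiver(post_save, sender='departmental_directory.Department')
def index_department(sender, instance, **kwargs):
    # Upsert just this one document, keyed on the model's primary key.
    DepartmentIndex(
        meta={'id': instance.pk},
        url=instance.get_absolute_url(),   # assumed helper
        name=instance.name,                # assumed field
        text=instance.description,         # assumed field
        content_type='department',         # assumed value
    ).save()

@receiver(post_delete, sender='departmental_directory.Department')
def unindex_department(sender, instance, **kwargs):
    try:
        DepartmentIndex.get(id=instance.pk).delete()
    except elasticsearch.exceptions.NotFoundError:
        # Nothing to remove; the document was never indexed.
        pass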

Related

How can I manually edit the list of recently opened files in VS Code?

I rely heavily on the File: Open Recent… command to open frequently used files, but yesterday my local Google Drive folder got moved to a new location and now I can no longer access any of the files in that folder through the Open Recent panel because the paths don't match.
The fix would be as simple as replacing "/Google Drive/" with "/Google Drive/My Drive/" but I have no idea what file contains the list of files that appears in the recently opened panel.
I'm assuming it's somewhere in ~/Library/Application Support/Code but not sure where.
I was wondering the same thing the other day and found this while searching for a solution, so I took some time to investigate it today.
It's been a few weeks since you posted, so hopefully this will still be of help to you.
Also, I'm using Windows and I'm not familiar with macOS, but I think it should be easy enough to adjust the solution.
Location of settings
Those setting are stored in the following file: %APPDATA%\Code\User\globalStorage\state.vscdb.
The file is an sqlite3 database, which is used as a key-value store.
It has a single table named ItemTable and the relevant key is history.recentlyOpenedPathsList.
The value has the following structure:
{
    "entries": [
        {
            "folderUri": "/path/to/folder",
            "label": "...",
            "remoteAuthority": "..."
        }
    ]
}
To view the current list, you can run the following command:
sqlite3.exe -readonly "%APPDATA%\Code\User\globalStorage\state.vscdb" "SELECT [value] FROM ItemTable WHERE [key] = 'history.recentlyOpenedPathsList'" | jq ".entries[].label"
Modifying the settings
Specifically, I was interested in changing the way it's displayed (the label), so I'll detail how I did that, but it should be just as easy to update the path.
Here's the Python code I used to make those edits:
import json, sqlite3
# open the db, get the value and parse it
db = sqlite3.connect('C:/Users/<username>/AppData/Roaming/Code/User/globalStorage/state.vscdb')
history_raw = db.execute("SELECT [value] FROM ItemTable WHERE [key] = 'history.recentlyOpenedPathsList'").fetchone()[0]
history = json.loads(history_raw)
# make the changes you'd like
# ...
# stringify and update
history_raw = json.dumps(history)
db.execute(f"UPDATE ItemTable SET [value] = '{history_raw}' WHERE key = 'history.recentlyOpenedPathsList'")
db.commit()
db.close()
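For the path change the asker originally wanted (the moved Google Drive folder), the "make the changes you'd like" step could look roughly like this. It assumes the entries carry the folderUri and label keys shown above; some entries may use other keys (e.g. for individual files), so check your own data first:
# Point every stored path at the new Google Drive location.
for entry in history.get("entries", []):
    for key in ("folderUri", "label"):
        value = entry.get(key)
        if value and "/Google Drive/" in value:
            entry[key] = value.replace("/Google Drive/", "/Google Drive/My Drive/")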
Code references
For reference (mostly for my future self), here are the relevant source code areas.
The settings are read here.
The File->Open Recent uses those values as-is (see here).
However, when using the Get Started page, the Recents area is populated here. On the Get Started page, the label is presented in a slightly different way:
vscode snapshot
The folder name is the link, and the parent folder is the text beside it.
This is done by the splitName method.
Notes
Before messing around with the settings file, it would be wise to back it up.
I'm not sure how vscode handles and caches the settings, so I think it's best to close all vscode instances before making any changes.
I haven't played around with it too much, so not sure how characters that need to be json-encoded or html-encoded will play out.
Keep in mind that there might be some state saved by other extensions, so if anything weird happens, blame it on that.
For reference, I'm using vscode 1.74.2.
Links
SQLite command-line tools
jq - command-line JSON processor

TTeeGrid is not displaying the data at runtime using data from REST

I created a simple RME for TTeeGrid, perhaps a descendant of TGrid in FireMonkey. As shown below, the data are displayed at design time but not at runtime, except for the headers.
I've been breaking my head over this for weeks already but no luck.
Let me know if you need more details, but what you see in the image is all you get.
I just need help to have the data displayed at runtime as shown in the design time.
UPDATE 1
This issue does not occur with TPrototypeBindSource: the data shown at design time are displayed at runtime. Something is wrong somewhere.
I've never used the TeeGrid before, but the following worked fine first time for me in Delphi Tokyo:
1. Download the TeeGrid trial from Steema.Com and install it.
2. Create a new multi-device app and place a TeeGrid and an FDMemTable on the form.
3. Load FDMemTable1 with the file Parts.Fds from the Delphi samples Data directory. Note that I did not then create any FieldDefs as I mentioned in my earlier comment, as what I'm describing works without them.
4. Set the DataSource property of TeeGrid1 to FDMemTable1. TeeGrid1 immediately creates columns for each of the Parts fields and populates them with data; see the screenshot below. I don't ordinarily include screenshots, but in this case I thought I would, as what I got was so clearly at odds with what you've reported.
Your TeeGrid etc. are obviously more complicated than mine, so the best I can suggest is that you backtrack to step 2 and see if you can replicate my result with your data (either at design time or at run time). It might be worth loading your FDMemTable with some data at design time, as my impression is that live bindings are less grief-prone when the datasource has some data.
Incidentally, fwiw, the results of my own attempts to set up live bindings even with a regular TGrid had been rather patchy, until I discovered that instead of messing with the LB components myself, simply starting with a fresh TGrid, right-clicking on it, and leaving the Live Bindings Wizard to do its stuff consistently works fine.

Unable to run experiment on Azure ML Studio after copying from different workspace

My simple experiment reads from an Azure Storage Table, Selects a few columns and writes to another Azure Storage Table. This experiment runs fine on the Workspace (Let's call it workspace1).
Now I need to move this experiment as is to another workspace(Call it WorkSpace2) using Powershell and need to be able to run the experiment.
I am currently using this Library - https://github.com/hning86/azuremlps
Problem :
When I copy the experiment using 'Copy-AmlExperiment' from WorkSpace 1 to WorkSpace 2, the experiment and all its properties get copied except the Azure Table Account Key.
Now, this experiment runs fine if I manually enter the account Key for the Import/Export Modules on studio.azureml.net
But I am unable to perform this via PowerShell. If I export (Export-AmlExperimentGraph) the copied experiment from WorkSpace2 as JSON, insert the AccountKey into the JSON file, and import (Import-AmlExperiment) it into WorkSpace2, the experiment fails to run.
On PowerShell I get an "Internal Server Error : 500".
While running on studio.azureml.net, I get the notification as "Your experiment cannot be run because it has been updated in another session. Please re-open this experiment to see the latest version."
Is there anyway to move an experiment with external dependencies to another workspace and run it?
Edit: I think the problem is something to do with how the experiment handles the AccountKey. When I enter it manually, it's converted into a JSON array comprising RecordKey and IndexInRecord. But when I upload the JSON experiment with the AccountKey, it remains as-is and does not get resolved into RecordKey and IndexInRecord.
For me, publishing the experiment as a private experiment in the Cortana gallery is one of the most useful options. Only people with the link can see and add the experiment from the gallery. In the link below I've explained the steps I followed.
https://naadispeaks.wordpress.com/2017/08/14/copying-migrating-azureml-experiments/
When the experiment is copied, the pwd is wiped for security reasons. If you want to programmatically inject it back, you have to set another metadata field to signal that this is a plain-text password, not an encrypted password that you are setting. If you export the experiment in JSON format, you can easily figure this out.
I think I found the reason why you are unable to get the credentials back in.
Export the JSON graph to your local disk, then update whatever parameter has to be updated.
You will also notice that the credentials are stored as 'Placeholders' instead of 'Literals', so it makes sense to change them to Literals.
You can do this by traversing the JSON to find the relevant parameters you need to update.
Here is a brief illustration.
Changing the Placeholder to a Literal:
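To locate those parameters without guessing at the exact graph schema, a small, schema-agnostic helper like the following prints the JSON path of every "Placeholder" value in the exported file, so you can see exactly which entries need to become "Literal". The file name experiment.json is just an assumption; use whatever you exported with Export-AmlExperimentGraph:
import json

def find_placeholders(obj, path="$"):
    # Recursively print the JSON path of every value equal to "Placeholder".
    if isinstance(obj, dict):
        for key, value in obj.items():
            find_placeholders(value, "{0}.{1}".format(path, key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            find_placeholders(value, "{0}[{1}]".format(path, index))
    elif obj == "Placeholder":
        print(path)

with open("experiment.json") as f:  # graph exported with Export-AmlExperimentGraph
    find_placeholders(json.load(f))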

Visualize an embedded neo4j instance in a web browser using default visualization

I am using embedded Neo4j, version 3.0.3. Following this guide, I have created Neo4j/Java code. It creates a database, adds two nodes (one for java, one for scala) and adds a relationship.
package examples;

import java.io.File;

import org.neo4j.graphdb.*;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class HelloWorld {

    public static void main(String[] args) {
        GraphDatabaseFactory dbFactory = new GraphDatabaseFactory();
        GraphDatabaseService db = dbFactory.newEmbeddedDatabase(new File("Test_DB"));
        try (Transaction tx = db.beginTx()) {
            Node javaNode = db.createNode(Tutorials.JAVA);
            javaNode.setProperty("TutorialID", "JAVA001");
            Node scalaNode = db.createNode(Tutorials.SCALA);
            scalaNode.setProperty("TutorialID", "SCALA001");
            Relationship relationship = javaNode.createRelationshipTo(scalaNode, TutorialRelationships.JVM_LANGUAGES);
            relationship.setProperty("Id", "1234");
            tx.success();
        }
    }
}

enum Tutorials implements Label {
    JAVA, SCALA, SQL, NEO4J;
}

enum TutorialRelationships implements RelationshipType {
    JVM_LANGUAGES, NON_JVM_LANGUAGES;
}
I program using Eclipse, so all the libraries are imported and I can just click the 'run' button on Eclipse to get the code running, and it seems to work without any issues. Upon running the code, I now have a folder Test_DB in the ~/workspace/project_name/Test_DB directory, where project_name is the name of the overall Eclipse folder. My goal is now to visualize this database in a web browser. The guide I linked to earlier shows an example of this; the user was able to look at the nodes in the web browser (see the bottom of the webpage). Unfortunately, I am using a Linux computer with Firefox, and that tutorial was in Windows, and I can't figure out how to get the visualization.
There have been a few other questions related to this. Unfortunately, some of them (such as this one) propose using software other than the default visualization. I don't own the computer and I have to go through a roundabout process to get external code installed. To be clear what I mean, this link discusses the default Neo4j browser. This is what I would like to see.
This question here directly tackles the same issue, and in fact, it uses the exact same tutorial I used! The answer proposes changing the path in the neo4j-server.properties file. Unfortunately, that file doesn't exist, and upon further analysis, it seems like Neo4j 3.0 changed the configuration naming, which I found out by reading the answer to this similar question. There is now a file conf/neo4j.conf with this information. I entered the following information in the first few lines, keeping the other settings the default:
# The name of the database to mount
dbms.active_database=Test_DB
# Paths of directories in the installation.
dbms.directories.data=/home/username/workspace/project_name/
This does not appear to work. Am I using these settings correctly? When I open the neo4j web browser after running ./bin/neo4j start and click on the database symbol on the left-hand side, I see "Name: Test_DB", but it also says there are no nodes and no relationships in the database, and running a match-all query returns nothing. Is it possible for the browser to connect to my database so it can see the nodes (e.g., the two nodes in my Java code above)?
Or is it that I'm not using this code correctly; does the code somehow have to avoid quitting (i.e., replace tx.success() with something else?) to keep the data there?
Sorry about answering my own question, but I finally figured out how to do this! Here's what happens: according to the GitHub change log for 3.0.0.RC-1:
Databases are now stored in a directory called databases under the directory specified in dbms.directories.data
So what we actually have to do is make sure our data base is in the following location:
/home/username/workspace/project_name/databases/
The issue is that when we run it in Eclipse, we get the database in the following folder:
/home/username/workspace/project_name/
Thus, the solution is to make sure the database folder sits inside a directory named databases, i.e., I would change one line to:
GraphDatabaseService db = dbFactory.newEmbeddedDatabase(new File("databases/Test_DB"));

Zend Lucene with symfony and i18n

I've gone through the Jobeet tutorial for integrating Zend Lucene into a symfony (1.4.8) project in order to add search capabilities to my site's frontend (through indexing). Among other things, the key concept is to call updateLuceneIndex during the model's save action (which needs to be overridden) in order to create/update the index entry for the specific record.
My model has i18n fields, some of which (i.e., name, title) I want inserted into the index. Everything works as expected, but when it comes to saving the i18n fields into the index, all I get are blank values ($this->getName() returns an empty string). I'm inspecting the created index with Luke.
I've concluded that this has nothing to do with Zend Lucene but with symfony. It seems that during save the information for the i18n fields isn't available (or is it?). I've also tried hooking up the update during preSave() and postSave(), but to no avail.
So I want to ask: how am I supposed to get my model's i18n field values during the save action in order to update the index accordingly?
Important note: This happens only during the doctrine:data-load task. If I manually insert or update a record, the index gets updated accordingly.
One last related question. It would be nice if I could save different keywords for each of the languages of the field of the model. How can I get the different values for each field's language inside the model?
The reason for this strange behaviour of symfony is that when you are loading fixtures via the CLI, there is no context loaded (for instance, when you try to get the context instance with sfContext::getInstance(), you'll get a "context instance does not exist" exception).
With no context instance available, there is no "current culture", and with no current culture, there is no value for the i18n fields.
The symfony context actually supports all i18n functionality with the current user's culture ($currentUserCulture = sfContext::getInstance()->getUser()->getCulture()).
This all means two things:
You can't use symfony's "current user culture" capabilities while you are in a CLI session.
If you need to have sfContext::getInstance() somewhere in your code (especially in the models), you have to wrap it in a condition to avoid unexpected and hard-to-find exceptions while in the CLI.
Example of getting the current culture in a model class (it will not pass the condition while in the CLI):
if (sfContext::hasInstance()) {
    sfContext::getInstance()->getUser()->getCulture();
}
So when you can't use the symfony i18n shortcuts (like $record->getName()), you have to work around it.
In your symfony 1/Doctrine models you always have the $this->Translation object available.
So you can access your translation values via something like $this->Translation[$culture].
It's up to you how to work with that: you can use your default culture, $this->Translation[sfConfig::get('sf_default_culture')], or iterate through all your supported cultures from some global configuration (I recommend setting it in one of your configuration files globally across all apps, maybe /config/app.yml).
Example of getting the $record Translation object in any situation:
if (sfContext::hasInstance()) {
    $translation = $this->Translation[sfContext::getInstance()->getUser()->getCulture()];
} else {
    $translation = $this->Translation->getFirst();
    // or: $translation = $this->Translation[$yourPreferedCulture];
}
// you can access the modified fields of the translation object
$translationModified = $translation->getModified();