Exception in sumTypeTopicCounts - MALLET

Hi, I am trying to use MALLET to obtain 500 topics, but I hit the exception below. Is this a known issue, and are there any workarounds?
overflow in merging on type 4975
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
at cc.mallet.topics.ParallelTopicModel.sumTypeTopicCounts(ParallelTopicModel.java:453)
at cc.mallet.topics.ParallelTopicModel.estimate(ParallelTopicModel.java:825)
at cc.mallet.topics.tui.TopicTrainer.main(TopicTrainer.java:245)
I am using mallet-2.0.8RC2.

Recently, I ran MALLET on two different datasets (one around 100M and the other around 1G). This kind of exception usually happened with the larger dataset, when I ran it in parallel with a larger number of iterations (for example, 100). It threw an ArrayIndexOutOfBoundsException in two different files, WorkerRunnable and ParallelTopicModel, at different spots. The problem is that when an index reaches the end of the array, the code logs “overflow in merging on type” and then does nothing to recover. I was able to patch these edge cases by checking the index before accessing the array. With the patch, runs complete without crashing, although I am not sure how it might change the output. It still logs the same “overflow in merging on type” message as before, but it carries on instead of throwing an exception.
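A minimal sketch of the kind of index check described above, assuming a simplified model in which each word type owns a fixed-length list of (topic, count) slots, and a merge either adds to an existing slot for that topic or fills the first empty one. It is written in Python for illustration only: MALLET itself is Java and uses a more compact representation, so this is not the actual patch, and `merge_topic_count` is a hypothetical name. When every slot is taken, the guarded version logs the same message and drops the count instead of reading past the end of the list, which is also why a patched run's output may differ.

```python
import logging

logger = logging.getLogger("merge_sketch")


def merge_topic_count(slots, topic, count, type_id):
    """Merge `count` for `topic` into a fixed-length list of [topic, count] slots.

    Hypothetical simplified model: empty slots are None. Returns True on
    success, or False when every slot holds another topic (count dropped).
    """
    i = 0
    # Check the index BEFORE reading slots[i]; an unguarded scan such as
    # `while slots[i] is not None and slots[i][0] != topic` raises IndexError
    # once the list is full.
    while i < len(slots) and slots[i] is not None and slots[i][0] != topic:
        i += 1
    if i == len(slots):
        logger.warning("overflow in merging on type %d", type_id)
        return False  # dropped count: results can differ from an unpatched run
    if slots[i] is None:
        slots[i] = [topic, count]
    else:
        slots[i][1] += count
    return True
```

With two slots, merging topics 0 and 1 for the same type fills both; a further merge for topic 0 adds to its existing slot, while a merge for topic 2 logs the overflow and returns False.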
I have uploaded the patches to my GitHub; follow the instructions there. They have resolved the issues for me, as I haven't seen this break again under different circumstances. If they don't resolve your issues, you should probably download the latest version from the MALLET GitHub repository, then debug and build it yourself.
I have also uploaded both datasets in case you would like to play with them. Both cover four years of data (1 Jan 2015 to 1 Jan 2019): the smaller one is from StackExchange (Data Science) and the larger one is from Reddit (9 data science subreddits).
Good luck.

Related

Anylogic model stop without message

I have created a model to generate a product that will be cycled through a list of machines. Technically the product list is for a single-day run, but I run the model for long durations to stabilise the model output.
The model runs properly for about 20 months of model time, then suddenly stops without any error message, as shown in the screenshot. I do not know how to debug this, since I do not know where the error comes from.
Has anyone encountered something similar, and could you advise on how to approach this issue? Could it be a memory overload?
Without more details, it's hard to pinpoint the exact reason, but this generally happens when the run is stuck in an infinite while loop or something similar. So check every loop where such a scenario is possible; one (or more) of them is likely causing the issue.
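A minimal sketch of one way to make a runaway loop fail loudly instead of silently hanging: count iterations and raise once a cap is exceeded. It is written in Python for illustration (AnyLogic model code is Java, where the same counter-and-throw pattern applies); `run_guarded`, `LoopGuardError`, and the default cap are hypothetical, and the cap should sit well above any legitimate iteration count in your model.

```python
class LoopGuardError(RuntimeError):
    """Raised when a loop runs longer than its iteration cap."""


def run_guarded(condition, step, max_iterations=1_000_000):
    """Call step() while condition() holds; raise after max_iterations steps.

    Returns the number of steps taken when the loop ends normally.
    """
    iterations = 0
    while condition():
        if iterations >= max_iterations:
            raise LoopGuardError(f"loop exceeded {max_iterations} iterations")
        step()
        iterations += 1
    return iterations
```

For example, `run_guarded(lambda: True, lambda: None, max_iterations=10)` raises `LoopGuardError` instead of freezing the run, so the stop point and the offending loop show up in the error.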

Query (of class "PhabricatorDaemonLogQuery") overheated: examined more than 10 raw rows without finding 1 visible objects

I got this error: Query (of class "PhabricatorDaemonLogQuery") overheated: examined more than 10 raw rows without finding 1 visible objects.
Phabricator had been running fine, and I don't know what caused this error. After I restarted the daemons, Phabricator started running normally again. I would like to know what this error means.
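Reading the message literally, the query fetched raw rows, filtered them by visibility, and gave up after examining more than 10 raw rows without finding even 1 visible object. The Python sketch below models that reading under that assumption; it is not Phabricator's (PHP) implementation, and `load_visible` and `QueryOverheatedError` are hypothetical names whose `want` and `max_raw_rows` defaults mirror the 1 and 10 in the message.

```python
class QueryOverheatedError(RuntimeError):
    """Hypothetical stand-in for the 'overheated' failure in the message."""


def load_visible(raw_rows, is_visible, want=1, max_raw_rows=10):
    """Scan raw rows, keeping those the viewer is allowed to see.

    Succeeds once `want` visible rows are found or the rows run out.
    Raises if more than `max_raw_rows` rows were examined first.
    """
    visible = []
    examined = 0
    for row in raw_rows:
        examined += 1
        if is_visible(row):
            visible.append(row)
            if len(visible) >= want:
                return visible
        if examined > max_raw_rows:
            raise QueryOverheatedError(
                f"examined more than {max_raw_rows} raw rows without finding "
                f"{want} visible objects"
            )
    return visible
```

In this model, running out of rows is not an error; only scanning past the limit without enough visible results is.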

IBM Datastage reports failure code 262148

I realize this is a bad question, but I don't know where else to turn.
Can someone point me to where I can find the list of "reports failure" codes for IBM? I've tried searching for it in the IBM documentation and with a general Google search, but this particular error is unique and I've never seen it before.
I'm trying to find out what code 262148 means.
Background:
I built a DataStage job that has:
ORACLE CONNECTOR -> TRANSFORMER -> HIERARCHICAL DATA
The intent is to pull data from an Oracle table and write the result of the select statement to a JSON file. I'm using the HIERARCHICAL stage to build it. When I test it within the stage, there are no problems, and I see the JSON output.
However, when I run the job, it squawks:
reports failure code 262148
Then the job aborts. There are no warnings, no signs, and no errors prior to this line.
Until I know what it is, I can't troubleshoot.
If someone can point me to where the list of failure codes is, I can proceed.
Thanks!
can someone point me to where I can find the list of reports failure codes for IBM?
Here you go:
https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_73/rzahb/rzahbsrclist.htm
This list does not include your specific error code, but it does categorize many other codes and explains how the code breakdown works. It is not specific to DataStage, but in my experience IBM conventions are generally consistent across products. In this list, every code that starts with a 2 is a disk failure, so maybe run a disk checker. That's the best I've got as far as error codes go.
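One purely arithmetic caveat before leaning on the leading-digit rule: the leading digit depends on the base. If the list's codes are hexadecimal, note that 262148 is 0x40004 in hex, so its leading digit is 4 rather than 2. The sketch below only shows those representations; `describe_code` is a hypothetical helper, and whether DataStage encodes anything in the high and low 16 bits is an assumption that nothing on that page confirms.

```python
def describe_code(code):
    """Show an integer status code in decimal and hex, plus its 16-bit halves."""
    return {
        "decimal": str(code),
        "hex": f"0x{code:X}",
        "high16": code >> 16,      # upper 16 bits
        "low16": code & 0xFFFF,    # lower 16 bits
    }


print(describe_code(262148))
# {'decimal': '262148', 'hex': '0x40004', 'high16': 4, 'low16': 4}
```

The hex form can also be a more useful search term than the decimal one when hunting for the code elsewhere.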
Without knowledge of the product's inner workings, there is not much more you can do beyond checking system health in general (especially disk, network, and permissions in this case). Personally, I prefer to go and get internal knowledge whenever external knowledge proves insufficient. I would start with a network capture, as I'm sure there's a socket involved in the connection between the layers. Compare a capture taken when the select statement runs from within the Hierarchical Data stage with one taken when it runs as part of the job. There may be clues in there, such as reset or refused connections.

Reading custom metrics from the last build for custom baseline comparisons

I'm planning to introduce linting into a rather massive code base. Fixing all existing issues beforehand is not possible, so seeing thousands of linter errors at start is inevitable.
I'd like to record the number of detected errors each time the build runs for master and treat this number as a success / failure threshold. If a new pull request does not exceed the current baseline, its pipeline passes and so the proposed change is good to go. However, if the number of errors increases, I'd like the pipeline to fail, thus preventing the merge.
What I've described boils down to writing values to the Azure DevOps server as a side effect of each build and reading those values back from the previous build. This looks very similar to comparing code coverage; however, I can't find any docs on how to implement the read/write logic manually.
What pipeline task could I use? What else can I leverage to track a custom metric over a number of builds and compare the value with previous? To summarise, my ultimate goal is to gradually lower an arbitrary value from a large number to zero over the course of several months.
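A minimal sketch of the ratchet logic, assuming the pipeline can hand a script the current error count and a small JSON file holding the last master count (for example, by publishing that file from each master build and downloading it in the next run; that wiring is an assumption and not shown). `check_lint_baseline`, the `lint_errors` key, and the command-line layout are hypothetical. A pull request fails if the count rises above the baseline; a master build that does not exceed it writes its own count back, so the threshold only moves down.

```python
import json
import sys
from pathlib import Path


def check_lint_baseline(current_count, baseline_path, is_master):
    """Compare a lint error count with the stored baseline.

    Returns (passed, message). On master, a passing count replaces the
    baseline, so the threshold can only ratchet downwards.
    """
    path = Path(baseline_path)
    baseline = None
    if path.exists():
        baseline = json.loads(path.read_text())["lint_errors"]

    if baseline is not None and current_count > baseline:
        return False, f"lint errors increased: {current_count} > baseline {baseline}"

    if is_master:
        path.write_text(json.dumps({"lint_errors": current_count}))
    return True, f"lint errors: {current_count} (baseline: {baseline})"


def main(argv):
    # Hypothetical CLI: <count> <baseline.json> <true|false for master>
    passed, message = check_lint_baseline(int(argv[0]), argv[1], argv[2] == "true")
    print(message)
    return 0 if passed else 1


if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv[1:]))
```

Because a script that exits non-zero fails its pipeline step, the non-zero return from `main` is what would block the merge; the first master run simply records the starting baseline.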

xlang/s engine event log entry: Failed while creating a X service. Object of type 'Y' cannot be converted to type 'Y'

This event log entry appears to be the same as what is discussed here:
Microsoft.XLANGs.Core.ServiceCreationException : Failed while creating a ABC service
I've investigated the 2 solutions offered in this post, but neither fixed my problem.
I'm running BizTalk 2010 and am seeing the issue with a uniform sequential convoy. Each instance of the orchestration is initially activated as expected. All the shapes before the second receive shape execute without issue. The problem occurs when the orchestration instance receives its second message. Execution does not proceed beyond the receive shape that corresponds to this second message.
Using the Group Hub page, I can see that the second message is associated with the correct service instance. This service instance is suspended and the error message shown above appears in the event log.
Occasionally (about 1 out of every 5 times), the problem mentioned above does NOT occur. That is, subsequent messages are processed by the orchestration. I'm feeding in the same test files each time. Even more interesting, the problem NEVER occurs if I set a breakpoint (in Orchestration Debugger) on a Listen shape just before the second receive shape.
The fact that I don't see the problem when using the debugger makes me wonder if this is a timing issue. Unfortunately, it doesn't seem like I would have much control over the timing.
Does anyone have any idea about how to prevent this problem from occurring?
Thanks
Is there only a single BizTalk host server involved? It wouldn't surprise me if the issue was related to difficulty loading a required assembly from the GAC. If there were multiple BizTalk servers involved, it could be that one of them is the culprit (or only one of them isn't). Of course, it may not be that easy.
Another possibility is the one raised in the second answer on the question you linked: check that a required schema is not deployed more than once. I have had this problem before, and about the only way to figure out that this is what's going on is to look in the BizTalk Admin Console under BizTalk Group > Applications > <AllArtifacts> > Schemas and sort by Target Namespace to see whether any two (or more) rows share the same combination of Target Namespace and Root Name.
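If you get the schema list out of the Admin Console into something scriptable (how you export it, and the row layout below, are assumptions), the check described above reduces to grouping rows by (Target Namespace, Root Name) and reporting any group with more than one entry. `find_duplicate_schemas` and the sample rows are hypothetical, shown in Python.

```python
from collections import defaultdict


def find_duplicate_schemas(rows):
    """rows: iterable of (assembly, target_namespace, root_name) tuples.

    Returns {(target_namespace, root_name): [assemblies, ...]} for every
    namespace/root combination that is deployed more than once.
    """
    groups = defaultdict(list)
    for assembly, target_namespace, root_name in rows:
        groups[(target_namespace, root_name)].append(assembly)
    return {key: assemblies for key, assemblies in groups.items() if len(assemblies) > 1}


# Hypothetical sample rows, not taken from a real deployment.
rows = [
    ("Orders.Schemas", "http://example.com/order", "Order"),
    ("Legacy.Schemas", "http://example.com/order", "Order"),
    ("Orders.Schemas", "http://example.com/order", "Ack"),
]
print(find_duplicate_schemas(rows))
# {('http://example.com/order', 'Order'): ['Orders.Schemas', 'Legacy.Schemas']}
```

Any key in the result is a namespace/root pair that BizTalk could resolve to more than one deployed schema.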
The issue could also be caused by a schema mismatch, where perhaps an older/different version of a schema is deployed than expected, and a field that is only sometimes there (hence why it sometimes works) causes a mismatch.
These are, of course, just theories, without the ability to look into your environment and see the actual BizTalk artifacts.
I filed this issue with Microsoft. It turns out that "the behavior is actually an existing design limitation with the way the XLANG compiler relies on type wrappers." The issue resulted from a very specific scenario. We had an orchestration with a message variable directly referencing a schema and another message variable referencing a multi-part message type based on the same schema. The orchestration, schema, and multi-part message type were each defined in different projects.
Microsoft suggested that we modify one of the variables so that both referenced the schema or both referenced the MMT. Unfortunately, keeping the variables as they were was critical for us. We discovered (and Microsoft confirmed) that moving the definition of the MMT into the same project as the orchestration resolved the issue as well.