When I run CTS after a few hours the adb connection to device becomes unresponsive


I am executing CTS on a Jacinto 6 Evaluation Module (ti-jacinto6evm) and I'm encountering a number of test case failures that I don't understand.
I started by building both AOSP and CTS. Both builds completed without problems. I flashed my test hardware (ti-jacinto6evm) and then followed the instructions for setting up CTS. I have run CTS more than 10 times on the same device and got different results every time. The ti-jacinto6 device hangs at random points during execution of the test cases.
Most of the time the target hangs and shows the following error:
Reason: 'Failed to receive adb shell test output within 600000 ms. Test may have timed out, or adb connection to device became unresponsive'. Check device logcat for details
Device 170090035a700002 shell is unresponsive
05-30 04:52:21 W/TestInvocation: Invocation did not complete due to device 170090035a700002 becoming not available. Reason: Could not find device 170090035a700002
My target hangs on the following test cases:
CtsPreference2TestCases
CtsUiHostTestCases
CtsServicesHostTestCases
CtsTrustedVoiceHostTestCases
CtsTransitionTestCases
CtsAppTestCases
CtsGraphicsTestCases
CtsCameraTestCases
CtsWebkitTestCases
CtsFragmentTestCases
CtsViewTestCases
So I excluded those test cases and ran CTS again with the following command:
run cts --skip-preconditions --exclude-filter CtsPreference2TestCases --exclude-filter CtsServicesHostTestCases --exclude-filter CtsUiHostTestCases --exclude-filter CtsTrustedVoiceHostTestCases --exclude-filter CtsAppTestCases --exclude-filter CtsGraphicsTestCases --exclude-filter CtsTransitionTestCases --exclude-filter CtsCameraTestCases --exclude-filter CtsWebkitTestCases --exclude-filter CtsFragmentTestCases --plan cts
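Note that the command above carries ten filters while the list of hanging modules has eleven entries (CtsViewTestCases is not excluded). A minimal sketch of generating the filter arguments from the module list instead of typing them by hand; the function name build_exclude_filters is hypothetical and not part of CTS:

```shell
#!/bin/bash
# Print "--exclude-filter <module>" for every module name passed as an argument,
# joined on one line so it can be pasted into the CTS console command.
build_exclude_filters() {
    local args=()
    local module
    for module in "$@"; do
        args+=(--exclude-filter "$module")
    done
    printf '%s\n' "${args[*]}"
}
```

The printed string can then be placed between `run cts --skip-preconditions` and `--plan cts` in the console command.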
Problem 1
Some test cases pass on the first run, but when I run CTS a second time, some of the test cases that previously passed now fail.
1st iteration. On this iteration 166 modules passed:
Testcase name                    Passed  Failed  Total executed
armeabi-v7a CtsWebkitTestCases   201     12      213
2nd iteration. On this iteration 91 modules passed:
Testcase name                    Passed  Failed  Total executed
armeabi-v7a CtsWebkitTestCases   80      1       81
Problem 2
When CTS gets stuck on some testcases it shows a TimeoutException:
com.android.ddmlib.TimeoutException
at com.android.ddmlib.AdbHelper.read(AdbHelper.java:767)
at com.android.ddmlib.AdbHelper.read(AdbHelper.java:736)
at com.android.ddmlib.AdbHelper.readAdbResponse(AdbHelper.java:222)
at com.android.ddmlib.AdbHelper.executeRemoteCommand(AdbHelper.java:456)
at com.android.ddmlib.AdbHelper.executeRemoteCommand(AdbHelper.java:382)
at com.android.ddmlib.Device.executeShellCommand(Device.java:617)
at com.android.tradefed.device.NativeDeviceStateMonitor.waitForDeviceShell(NativeDeviceStateMonitor.java:170)
at com.android.tradefed.device.WaitDeviceRecovery.recoverDevice(WaitDeviceRecovery.java:142)
at com.android.tradefed.device.NativeDevice.recoverDevice(NativeDevice.java:1720)
at com.android.tradefed.device.NativeDevice.performDeviceAction(NativeDevice.java:1661)
at com.android.tradefed.device.NativeDevice.runInstrumentationTests(NativeDevice.java:615)
at com.android.tradefed.device.NativeDevice.runInstrumentationTests(NativeDevice.java:698)
at com.android.tradefed.testtype.InstrumentationTest.runWithRerun(InstrumentationTest.java:797)
at com.android.tradefed.testtype.InstrumentationTest.doTestRun(InstrumentationTest.java:740)
at com.android.tradefed.testtype.InstrumentationTest.run(InstrumentationTest.java:643)
at com.android.tradefed.testtype.AndroidJUnitTest.run(AndroidJUnitTest.java:233)
at com.android.compatibility.common.tradefed.testtype.ModuleDef.run(ModuleDef.java:250)
at com.android.compatibility.common.tradefed.testtype.CompatibilityTest.run(CompatibilityTest.java:506)
at com.android.tradefed.invoker.TestInvocation.runTests(TestInvocation.java:761)
at com.android.tradefed.invoker.TestInvocation.prepareAndRun(TestInvocation.java:446)
at com.android.tradefed.invoker.TestInvocation.performInvocation(TestInvocation.java:300)
at com.android.tradefed.invoker.TestInvocation.invoke(TestInvocation.java:886)
at com.android.tradefed.command.CommandScheduler$InvocationThread.run(CommandScheduler.java:567)
What is the reason behind this failure?

There is no need to rerun the tests that already passed. You can continue and run only the tests that failed or were not run. Use the command
l r
to list the results, then use the session ID shown in the first column to continue that run, like this:
run cts --retry 12
where 12 is the session ID displayed in the first column of l r.
An adb disconnection can indeed affect the test cases. I use this small script to reconnect; you can modify it to suit your needs.
cat adb_retry.sh:
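#!/bin/bash
# Usage: ./adb_retry.sh <device address>
# `adb devices` prints a header line and a trailing blank line, so fewer
# than 3 lines of output means that no device is listed.
while :
do
    if (( $(adb devices | wc -l) < 3 )); then
        echo "Connection for $1 dropped out"
        echo retrying
        adb connect "$1"
    fi
    sleep 5
    echo Watching...
done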
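The line-count check in the script above treats any listed device as connected, including one reported as offline or unauthorized. A minimal sketch of a stricter check, assuming the usual `adb devices` layout (one header line, then `serial<TAB>state` lines); the function names count_ready_devices and is_device_ready are hypothetical:

```shell
#!/bin/bash
# Count devices whose state column is exactly "device" (ready for commands).
# Reads `adb devices` output on stdin, so it can be tried without hardware.
count_ready_devices() {
    awk 'NR > 1 && $2 == "device" { n++ } END { print n + 0 }'
}

# Succeed only if the given serial is listed in the "device" state.
is_device_ready() {
    awk -v serial="$1" \
        'NR > 1 && $1 == serial && $2 == "device" { found = 1 } END { exit !found }'
}
```

In the watch loop, the condition could then become something like `adb devices | is_device_ready "$1" || adb connect "$1"`.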

Related

Github CI UITest gives flaky tests 'Unable to monitor event loop'

I am running my UI tests on GitHub CI and the tests are flaky. I don't understand how I can fix it. Animations are disabled and I am running the tests on an iPhone 13 Pro simulator. A lot of tests run green, but some do not. Locally, everything works.
These are some logs:
2022-06-21T13:42:23.2627250Z t = 63.34s Tap Cell
2022-06-21T13:42:23.2707530Z t = 63.34s Wait for com.project.project to idle
2022-06-21T13:42:23.2733620Z t = 63.41s Unable to monitor event loop
2022-06-21T13:42:23.2734250Z t = 63.41s Unable to monitor animations
2022-06-21T13:42:23.2734800Z t = 63.42s Find the Cell
2022-06-21T13:42:24.1158670Z t = 64.45s Find the Cell (retry 1)
2022-06-21T13:42:24.1287900Z t = 64.45s Collecting extra data to assist test failure triage
2022-06-21T13:42:24.2022460Z /Users/runner/work/project/UITestCase.swift:665: error: -[project.UserTagTest testTapInTextView] : Failed to get matching snapshot: Lost connection to the application (pid 12676). (Underlying Error: Couldn’t communicate with a helper application. Try your operation again. If that fails, quit and relaunch the application and try again. The connection to service created from an endpoint was invalidated: failed to check-in, peer may have been unloaded: mach_error=10000003.)
It cannot find the cell, and I believe the cause is what these log lines report:
Unable to monitor event loop
Unable to monitor animations
I know this because I sometimes get different errors than the error above, which says that the connection to the application is lost, right below the Unable to monitor... error logging.
Is there anything I can try? I don't have a reproduction project. This is the command that is executed:
xcodebuild test -project project.xcodeproj -scheme project-iosUITests -destination 'platform=iOS Simulator,name=iPhone 13 Pro,OS=15.5'
The CI runs 35 tests and 5 fail randomly with the "Unable to monitor..." errors. Is there any suggestion to fix this problem?

How to stop reporting FAILED of systemctl unit

I have a systemd service unit with some runtime dependencies that get resolved during boot. It often reports the "FAILED" state during boot. The unit has "Restart=always", so it ultimately starts successfully once boot has finished. But during boot it reports FAILED around 3-4 times, which I want to avoid.
Is there a way to ignore the "FAILED" state of service unit being reported?
(As I know it will succeed once the dependency is resolved or will keep retrying)

I found that a failing exit status of the service's command can be ignored by prepending a hyphen to the executable path in ExecStart.
From manual:
https://www.freedesktop.org/software/systemd/man/systemd.service.html#BusName=
VIZ:
"-" If the executable path is prefixed with "-", an exit code of the command normally considered a failure (i.e. non-zero exit status or abnormal exit due to signal) is recorded, but has no further effect and is considered equivalent to success.
ExecStart=-/sbin/getty
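For context, a minimal sketch of how this might look in a unit for the situation in the question. The binary path and the RestartSec value are assumptions chosen for illustration, not taken from the original unit:

```ini
[Service]
# The leading "-" makes a non-zero exit be treated as success,
# so a failed start attempt is not reported as FAILED.
# /usr/local/bin/myservice is a hypothetical path.
ExecStart=-/usr/local/bin/myservice
Restart=always
# Assumed value: wait between attempts while the dependency comes up.
RestartSec=5
```

With Restart=always the command is still restarted after it exits, so the retry behavior described in the question is kept.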

Mappers fail for pig to insert data into MongoDB

I am trying to import a file from HDFS to MongoDB using MongoInsertStorage with PIG. The files are large, around 5GB. The script runs fine when I run it in local mode with
pig -x local example.pig
However, if I run it in mapreduce mode, most of the mappers fail with the following error:
Error: com.mongodb.ConnectionString.getReadConcern()Lcom/mongodb/ReadConcern;
Container killed by the ApplicationMaster.
Container killed on request.
Exit code is 143 Container exited with a non-zero exit code 143
Can someone help me solve this issue? I also increased the memory allocated to YARN containers, but that hasn't helped.
Some mappers are also timing out after 300 seconds.
The Pig script is as follows:
REGISTER mongo-java-driver-3.2.2.jar
REGISTER mongo-hadoop-core-1.4.0.jar
REGISTER mongo-hadoop-pig-1.4.0.jar
REGISTER mongodb-driver-3.2.2.jar
DEFINE MongoInsertStorage com.mongodb.hadoop.pig.MongoInsertStorage();
SET mapreduce.reduce.speculative true
BIG_DATA = LOAD 'hdfs://example.com:8020/user/someuser/sample.csv' using PigStorage(',') As (a:chararray,b:chararray,c:chararray);
STORE BIG_DATA INTO 'mongodb://insert.some.ip.here:27017/test.samplecollection' USING MongoInsertStorage('', '')

Found a solution.
For the error
Error: com.mongodb.ConnectionString.getReadConcern()Lcom/mongodb/ReadConcern;
Container killed by the ApplicationMaster.
Container killed on request.
Exit code is 143 Container exited with a non-zero exit code 143
I changed the JAR versions: mongo-hadoop-core and mongo-hadoop-pig from 1.4.0 to 2.0.2, and the Mongo Java driver from 3.2.2 to 3.4.2. This eliminated the ReadConcern error on the mappers!
For the timeout, I added this after registering the jars:
SET mapreduce.task.timeout 1800000
I had been using SET mapred.task.timeout, which didn't work.
Hope this helps anyone who has a similar issue!

Solaris svcs command shows wrong status

I have freshly installed an application on Solaris 5.10. When I check with ps -ef | grep hyperic | grep agent, the processes are up and running. When I check the status with the svcs hyperic-agent command, the output shows that the agent is in maintenance mode. The application is working fine and I don't have any issues with it. Please help.

There are several reasons that can lead to that behavior:
The starter (the start/exec property of the service) returned a status different from SMF_EXIT_OK (zero). In that case you can check the logs:
# svcs -x ssh
...
See: /var/svc/log/network-ssh:default.log
If you check the logs, you may see messages like the following, which mean that the starter script failed or is written incorrectly:
[ Aug 11 18:40:30 Method "start" exited with status 96 ]
Another reason for such behavior is that the service faults while it is running (i.e. one of its processes dumps core or receives a kill signal, or all of its processes exit), as described here: https://blogs.oracle.com/lianep/entry/smf_5_fault_retry_models
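To interpret the "exited with status 96" message shown above, the numeric status can be matched against the SMF_EXIT_* constants from /lib/svc/share/smf_include.sh. A minimal sketch; the function names smf_exit_name and method_exit_statuses are hypothetical helpers, not part of SMF:

```shell
#!/bin/sh
# Map a method exit status to its SMF_EXIT_* name.
smf_exit_name() {
    case "$1" in
        0)   echo "SMF_EXIT_OK" ;;
        95)  echo "SMF_EXIT_ERR_FATAL" ;;
        96)  echo "SMF_EXIT_ERR_CONFIG" ;;
        97)  echo "SMF_EXIT_MON_DEGRADE" ;;
        98)  echo "SMF_EXIT_MON_OFFLINE" ;;
        99)  echo "SMF_EXIT_ERR_NOSMF" ;;
        100) echo "SMF_EXIT_ERR_PERM" ;;
        *)   echo "unknown ($1)" ;;
    esac
}

# Read a service log on stdin and print "<method> <status>" for lines such as:
#   [ Aug 11 18:40:30 Method "start" exited with status 96 ]
method_exit_statuses() {
    sed -n 's/.*Method "\([a-z]*\)" exited with status \([0-9][0-9]*\).*/\1 \2/p'
}
```

For example, `method_exit_statuses < /var/svc/log/network-ssh:default.log` lists the statuses recorded in that log, and `smf_exit_name 96` names the one from the message above.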
The underlying facility that SMF uses for this monitoring is System Contracts. You can determine the contract ID of an online service with svcs -v (field CTID):
# svcs -vp svc:/network/smtp:sendmail
STATE          NSTATE        STIME    CTID   FMRI
online         -             Apr_14       68 svc:/network/smtp:sendmail
               Apr_14       1679 sendmail
               Apr_14       1681 sendmail
Then watch events with ctwatch:
# ctwatch 68
CTID    EVID    CRIT ACK CTTYPE  SUMMARY
68      28      crit no  process contract empty
Then there are two options to handle that:
There is a real problem with the service, so it eventually faults. In that case, debug the application.
It is normal behavior of the service, so you should edit and re-import your service manifest to make SMF less paranoid, i.e. configure the ignore_error and duration properties.

CTS_ERROR >>> Failed to execute shell command am instrument

I am using Android 2.2 (API 8) and SDK r7 along with the CTS-2.2_r4 suite.
I updated the SDK_ROOT environment variable in the “android-cts/tools/startcts” script to point at the SDK r7 tools, and “SDK_ROOT/tools” is also included in the PATH environment variable.
I ran “android”, created a new virtual device and started it. This launches the emulator named “emulator-5554”.
Then I started CTS using the commands below:
bash android-cts/tools/startcts
start --plan android
The above command failed with:
Test package: android.app
install met failure [install_failed_insufficient_storage]
CTS_ERROR >>> Failed to execute shell command am instrument -w -r -e package android com.android.cts.app/android.test.InstrumentationCtsTestRunner on device emulator-5554
com.android.ddmlib.ShellCommandUnresponsiveException.
A few more issues are:
CTS_ERROR >>> Got exception while processing command
CTS_ERROR >>> Installing met timeout due to Unknown reason
CTS_ERROR >>> Timeout: ReferenceAppTest
CTS_ERROR >>> Timeout: getDeviceInfo
Any hints on how to avoid the above timeout issues? Thank you very much in advance for a quick response.

I did not get this error on CTS 2.1 r5. However, I am not running the Android SDK tests but my own tests instead. Therefore I will use CTS 2.1 for a while.