愿历尽千帆 归来仍少年

一个SharedPreferences写操作导致的ANR问题分析

字数统计: 1,372阅读时长: 7 min
2019/06/23

问题现象
手机连接WIFI后出现设置弹出无响应的弹框
问题概率
仅出现一次
问题平台
Andorid P 高通平台

日志分析
拿到bugreport,使用chkbugreport解析下(开源工具,将bugreport庞大的日志进行分门别类,并以网页形式展现出来),打开浏览器看下event日志中的时间点

1
06-19 17:53:42.307  1000  1820  1959 I am_anr  : [0,3861,com.android.settings,952745541,Input dispatching timed out (com.android.settings/com.android.settings.MainSettings

解压后的bugreport中有一份ANR文件,查看其中的trace

1
2
3
4
5
6
7
8
9
10
"main" prio=5 tid=1 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x7923e118 self=0x763be14c00
| sysTid=3861 nice=-10 cgrp=default sched=0/0 handle=0x76c1e12548
| state=S schedstat=( 4685634510 1791639950 6133 ) utm=380 stm=88 core=3 HZ=100
| stack=0x7fe6270000-0x7fe6272000 stackSize=8MB
| held mutexes=
at android.app.QueuedWork.processPendingWork(QueuedWork.java:273)
- waiting to lock <0x00210110> (a java.lang.Object) held by thread 19
at android.app.QueuedWork.waitToFinish(QueuedWork.java:184)
at android.app.ActivityThread.handleStopActivity(ActivityThread.java:4305)

可以看出主线程在等锁0x00210110,该锁被thread 19持有,这里看堆栈可以看出卡在了waitToFinish上,界面回调了onStop的生命函数,
是不是可以理解为当时界面走了onStop准备退出,发现有任务没有完成,需要等待,猜测如此,继续看thread 19

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
"queued-work-looper" prio=5 tid=19 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x130c0958 self=0x762df62800
| sysTid=3892 nice=-2 cgrp=default sched=0/0 handle=0x76223ff4f0
| state=D schedstat=( 8250990 10651510 31 ) utm=0 stm=0 core=1 HZ=100
| stack=0x76222fc000-0x76222fe000 stackSize=1041KB
| held mutexes=
kernel: (couldn't read /proc/self/task/3892/stack)
native: #00 pc 000000000007b070 /system/lib64/libc.so (fsync+8)
native: #01 pc 0000000000003670 /system/lib64/libopenjdkjvm.so (JVM_Sync+20)
native: #02 pc 000000000001cd54 /system/lib64/libopenjdk.so (FileDescriptor_sync+40)
at java.io.FileDescriptor.sync(Native method)
at android.os.FileUtils.sync(FileUtils.java:197)
at android.app.SharedPreferencesImpl.writeToFile(SharedPreferencesImpl.java:777)
at android.app.SharedPreferencesImpl.access$900(SharedPreferencesImpl.java:54)
at android.app.SharedPreferencesImpl$2.run(SharedPreferencesImpl.java:642)
- locked <0x08e457c5> (a java.lang.Object)
at android.app.QueuedWork.processPendingWork(QueuedWork.java:286)
- locked <0x00210110> (a java.lang.Object)

这里的堆栈可以看出,当时在做SharedPreference的IO操作,锁的持有对象当时在做processPendingWork,可是为什么Activity的onStop要等SharedPreference写完呢?

再来看下主线程中的堆栈

1
2
at android.app.QueuedWork.waitToFinish(QueuedWork.java:184)
at android.app.ActivityThread.handleStopActivity(ActivityThread.java:4305)

看下ActivityThread.java中的handleStopActivity

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
@Override
public void handleStopActivity(IBinder token, boolean show, int configChanges,
PendingTransactionActions pendingActions, boolean finalStateRequest, String reason) {
final ActivityClientRecord r = mActivities.get(token);
r.activity.mConfigChangeFlags |= configChanges;

final StopInfo stopInfo = new StopInfo();
performStopActivityInner(r, stopInfo, show, true /* saveState */, finalStateRequest,
reason);

if (localLOGV) Slog.v(
TAG, "Finishing stop of " + r + ": show=" + show
+ " win=" + r.window);

updateVisibility(r, show);

// Make sure any pending writes are now committed.
if (!r.isPreHoneycomb()) {
QueuedWork.waitToFinish();
}
//...
}

重点关注其中的waitToFinish()函数

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
/**
* Trigger queued work to be processed immediately. The queued work is processed on a separate
* thread asynchronous. While doing that run and process all finishers on this thread. The
* finishers can be implemented in a way to check weather the queued work is finished.
*
* Is called from the Activity base class's onPause(), after BroadcastReceiver's onReceive,
* after Service command handling, etc. (so async work is never lost)
*/
public static void waitToFinish() {
// xxx ADD
boolean interrupt = false;

long startTime = System.currentTimeMillis();
boolean hadMessages = false;

Handler handler = getHandler();

synchronized (sLock) {
if (handler.hasMessages(QueuedWorkHandler.MSG_RUN)) {
// Delayed work will be processed at processPendingWork() below
handler.removeMessages(QueuedWorkHandler.MSG_RUN);

if (DEBUG) {
hadMessages = true;
Log.d(LOG_TAG, "waiting");
}
}

// We should not delay any work as this might delay the finishers
sCanDelay = false;
}

StrictMode.ThreadPolicy oldPolicy = StrictMode.allowThreadDiskWrites();
try {
// xxx MOD
// processPendingWork();
interrupt = processPendingWork(false);
} finally {
StrictMode.setThreadPolicy(oldPolicy);
}

依据trace,接着看processPendingWork

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
private static boolean processPendingWork(boolean fromMsg) {
long startTime = 0;
int workNdx = 0;
// if (DEBUG) {
startTime = System.currentTimeMillis();
// }
// END

synchronized (sProcessingWork) { //#273 主线程卡在这里等锁
LinkedList<Runnable> work;

synchronized (sLock) {
work = (LinkedList<Runnable>) sWork.clone();
sWork.clear();
// Remove all msg-s as all work will be processed now
getHandler().removeMessages(QueuedWorkHandler.MSG_RUN);
}
if (work.size() > 0) {
for (Runnable w : work) {
w.run(); //#286 tid19 一直在运行该处
.....
}
}

到这里基本明白了是怎么回事了,主线程在等的SharedPreference的写操作writeToFile完成,如果这个队列中的任务没有完成,SharedPreference就会一直持有这个锁,主线程拿不到这个锁,
一直等待,超过五秒后触发ANR,这点在日志中也得到了印证,ANR时间点前后有不少的SharedPreferencesImpl相关的日志输出,往上看大概能看出当时是在做同步数据之类

其实SharedPreference时不论是apply还是commit都会有出现写入磁盘慢的情况发生,特别是数据量大,或者多线程同时访问同一份xml,又或者连续多次commit,都会有可能造成waitToFinish函数耗时变长。虽然apply是异步执行的,主线程依旧需要等它执行结束。这点其实不好规避,所以尽可能的少使用SharedPreference,特别是可能存在多线程操作同一份数据这种情况,毕竟有了加锁,会可能出现排队的情况。

优化方式
尽可能传输的数据量轻量
尽可能减少commit次数
考虑新开线程,但是会有内存开销的存在,如果原本是apply,则在新开的线程改为commit

其实看公司内部代码,发现公司的framework同事有修改waitToFinish函数中调用的processPendingWork函数内容,增加了记录时间,
超过10s直接return false,这是个不错的想法,但是10s对于广播中的情况或许可以cover,对应Activity中超时5秒就ANR的这种情况不能cover

鉴于该问题只出现一次,不做修改

CATALOG