An Analysis of DiskLruCache in OkHttp
I ran into DiskLruCache a while back when analyzing OkHttp, but skipped over it at the time because the focus was on how to implement an HTTP client. Let's now take a look at how OkHttp's cache is designed, partly just to broaden our horizons and see how the folks who wrote it structure a program.
First, a rough structural overview of DiskLruCache:
As the diagram shows, the cache consists of two parts: a LinkedHashMap and a directory on the file system. The map's key is a string and its value is an Entry; each entry corresponds to n × 2 files, where n is the number of values that can be stored per key, and the factor of 2 comes from every value also having a temporary file, so the map is essentially an index. Whenever necessary, an entry in the LinkedHashMap is written out to disk files through an Editor's commit operation, and the get and put methods exposed to callers also operate on the LinkedHashMap through Editors. The directory additionally holds three journal files: journal records the cache's operations, journal.tmp is a scratch file used while the journal is being rebuilt, and journal.bkp is a backup that helps restore the LinkedHashMap's state.
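To make the on-disk shape concrete, a cache directory with valueCount == 2 and a single hypothetical key someKey would look roughly like this (file names follow the Entry constructor we'll see later; the key name is made up):

```
cache-directory/
├── journal            // log of cache operations
├── journal.tmp        // scratch file while the journal is rebuilt
├── journal.bkp        // backup used for recovery
├── someKey.0          // clean file for value 0 of "someKey"
├── someKey.0.tmp      // dirty (in-progress) file for value 0
├── someKey.1          // clean file for value 1
└── someKey.1.tmp      // dirty file for value 1
```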
Honestly, the first time I saw the name DiskLruCache, what I was most curious about was how the LRU part is implemented. Everyone knows the least-recently-used algorithm and understands the principle, but few of us have ever actually implemented it. Here's the good news:
```java
/**
 * @param initialCapacity the initial capacity
 * @param loadFactor      the load factor
 * @param accessOrder     the ordering mode - <tt>true</tt> for
 *                        access-order, <tt>false</tt> for insertion-order
 */
public LinkedHashMap(int initialCapacity,
                     float loadFactor,
                     boolean accessOrder) {
  super(initialCapacity, loadFactor);
  this.accessOrder = accessOrder;
}
```
Look at the last parameter, which selects the linking mode: true orders entries by access, meaning that iterating over the map visits elements in the order they were last accessed. Cap the size of such a LinkedHashMap and you have an LRU cache! That is exactly what DiskLruCache does.
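To see the trick in isolation, here is a minimal, self-contained LRU cache built on that constructor (my own illustration, not OkHttp code; the class name and capacity field are made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache: an access-ordered LinkedHashMap plus an eviction rule.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  public LruCache(int maxEntries) {
    super(16, 0.75f, true); // accessOrder = true is the whole trick
    this.maxEntries = maxEntries;
  }

  // LinkedHashMap calls this after every put; returning true evicts the
  // eldest entry, which in access order is the least recently used one.
  @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries;
  }
}
```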
Let's first walk through the overall flow:
- When we insert or update data, we update the corresponding entry in the LinkedHashMap. The entry itself does not hold the data; it maintains the list of files for its key (the someKey.K(.tmp) files in the diagram, i.e. the clean and dirty files). An Editor bound to that entry is returned; through it you can reach the entry's files and obtain an I/O interface for writing data into the matching tmp file. Finally, commit renames the tmp file to the real file, at which point it becomes visible to readers.
- When we fetch data, we again go through the LinkedHashMap, this time obtaining a Snapshot. The snapshot likewise holds an I/O interface (a source) that reads the corresponding clean file. (A condensed usage sketch of both paths follows below.)
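Put together, using the cache end to end looks roughly like this. It's a sketch assembled from the methods discussed in this article; the create signature and package names belong to the OkHttp 3.x internal API and vary across versions, so treat those details as assumptions:

```java
import java.io.File;
import java.io.IOException;
import okhttp3.internal.cache.DiskLruCache;
import okhttp3.internal.io.FileSystem;
import okio.BufferedSink;
import okio.BufferedSource;
import okio.Okio;

public class DiskLruCacheRoundTrip {
  static void roundTrip(File directory) throws IOException {
    // Open (or create) a cache with 1 value per key and a 10 MiB budget.
    DiskLruCache cache = DiskLruCache.create(
        FileSystem.SYSTEM, directory, /* appVersion */ 1,
        /* valueCount */ 1, /* maxSize */ 10 * 1024 * 1024);

    // Write path: edit -> newSink -> write -> commit.
    DiskLruCache.Editor editor = cache.edit("somekey");
    if (editor != null) { // null means another edit is in progress
      BufferedSink sink = Okio.buffer(editor.newSink(0));
      sink.writeUtf8("hello cache");
      sink.close();
      editor.commit(); // renames the .tmp (dirty) file to the clean file
    }

    // Read path: get -> snapshot -> getSource.
    DiskLruCache.Snapshot snapshot = cache.get("somekey");
    if (snapshot != null) {
      BufferedSource source = Okio.buffer(snapshot.getSource(0));
      String value = source.readUtf8();
      snapshot.close();
      System.out.println(value);
    }
  }
}
```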
Now let's dive in properly, starting with how DiskLruCache initializes itself:
```java
public synchronized void initialize() throws IOException {
  assert Thread.holdsLock(this);

  if (initialized) {
    return; // Already initialized.
  }

  // If a backup file exists, use it.
  if (fileSystem.exists(journalFileBackup)) {
    // If a journal file also exists, ignore the backup and delete it.
    if (fileSystem.exists(journalFile)) {
      fileSystem.delete(journalFileBackup);
    } else {
      fileSystem.rename(journalFileBackup, journalFile);
    }
  }

  // Prefer to pick up where we left off.
  if (fileSystem.exists(journalFile)) {
    try {
      readJournal();
      processJournal();
      initialized = true;
      return;
    } catch (IOException journalIsCorrupt) {
      Platform.get().logW("DiskLruCache " + directory + " is corrupt: "
          + journalIsCorrupt.getMessage() + ", removing");
      delete();
      closed = false;
    }
  }

  rebuildJournal();

  initialized = true;
}
```
The comments tell the story: recover from the backup file if possible, then work with the journal file itself. readJournal first checks the magic number and version, then reads the journal one line at a time in a while loop:
```java
private void readJournalLine(String line) throws IOException {
  int firstSpace = line.indexOf(' ');
  if (firstSpace == -1) {
    throw new IOException("unexpected journal line: " + line);
  }

  int keyBegin = firstSpace + 1;
  int secondSpace = line.indexOf(' ', keyBegin);
  final String key;
  if (secondSpace == -1) {
    key = line.substring(keyBegin);
    if (firstSpace == REMOVE.length() && line.startsWith(REMOVE)) {
      lruEntries.remove(key);
      return;
    }
  } else {
    key = line.substring(keyBegin, secondSpace);
  }

  Entry entry = lruEntries.get(key);
  if (entry == null) {
    entry = new Entry(key);
    lruEntries.put(key, entry);
  }

  if (secondSpace != -1 && firstSpace == CLEAN.length() && line.startsWith(CLEAN)) {
    String[] parts = line.substring(secondSpace + 1).split(" ");
    entry.readable = true;
    entry.currentEditor = null;
    entry.setLengths(parts);
  } else if (secondSpace == -1 && firstSpace == DIRTY.length() && line.startsWith(DIRTY)) {
    entry.currentEditor = new Editor(entry);
  } else if (secondSpace == -1 && firstSpace == READ.length() && line.startsWith(READ)) {
    // This work was already done by calling lruEntries.get().
  } else {
    throw new IOException("unexpected journal line: " + line);
  }
}
```
- If the marker is REMOVE, delete the key from the index (lruEntries is the LinkedHashMap we talked about earlier)
- Look up the entry for the key (creating one if it doesn't exist)
- If the marker is CLEAN, update the recorded lengths of the entry's files
- If DIRTY, the value is in the middle of an edit that was never committed, so the entry is given an Editor
- If READ, the record was merely read; nothing more to do
An example of a journal file's contents should make this obvious:
```
libcore.io.DiskLruCache
1
100
2

CLEAN 3400330d1dfc7f3f7f4b8d4d803dfcf6 832 21054
DIRTY 335c4c6028171cfddfbaae1a9c313c52
CLEAN 335c4c6028171cfddfbaae1a9c313c52 3934 2342
REMOVE 335c4c6028171cfddfbaae1a9c313c52
DIRTY 1ab96a171faeeee38496d8b330771a7a
CLEAN 1ab96a171faeeee38496d8b330771a7a 1600 234
READ 335c4c6028171cfddfbaae1a9c313c52
READ 3400330d1dfc7f3f7f4b8d4d803dfcf6
```
At the end of readJournal, if everything was read without problems, a journalWriter is created specifically for writing to the journal file. By the time readJournal returns, every clean and dirty entry is in lruEntries. The next step, processJournal, deletes the files of every entry that is still dirty (i.e. still has a currentEditor) and discards those entries, because dirty data is inconsistent and cannot be trusted; clean entries contribute their file lengths to the cache's running size.
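The article doesn't quote processJournal, but from the corresponding OkHttp source it is roughly the following (paraphrased from memory, so treat it as a sketch):

```java
// Computes the initial cache size and collects garbage while opening the
// cache. Entries still marked dirty are assumed inconsistent and deleted.
private void processJournal() throws IOException {
  fileSystem.delete(journalFileTmp);
  for (Iterator<Entry> i = lruEntries.values().iterator(); i.hasNext(); ) {
    Entry entry = i.next();
    if (entry.currentEditor == null) {
      for (int t = 0; t < valueCount; t++) {
        size += entry.lengths[t]; // clean entry: count its bytes
      }
    } else {
      entry.currentEditor = null; // dirty entry: discard its files
      for (int t = 0; t < valueCount; t++) {
        fileSystem.delete(entry.cleanFiles[t]);
        fileSystem.delete(entry.dirtyFiles[t]);
      }
      i.remove();
    }
  }
}
```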
If the journal is missing, or any of the above initialization throws, we end up in rebuildJournal:
```java
/**
 * Creates a new journal that omits redundant information. This replaces the current journal if it
 * exists.
 */
private synchronized void rebuildJournal() throws IOException {
  if (journalWriter != null) {
    journalWriter.close();
  }

  BufferedSink writer = Okio.buffer(fileSystem.sink(journalFileTmp));
  try {
    writer.writeUtf8(MAGIC).writeByte('\n');
    writer.writeUtf8(VERSION_1).writeByte('\n');
    writer.writeDecimalLong(appVersion).writeByte('\n');
    writer.writeDecimalLong(valueCount).writeByte('\n');
    writer.writeByte('\n');

    for (Entry entry : lruEntries.values()) {
      if (entry.currentEditor != null) {
        writer.writeUtf8(DIRTY).writeByte(' ');
        writer.writeUtf8(entry.key);
        writer.writeByte('\n');
      } else {
        writer.writeUtf8(CLEAN).writeByte(' ');
        writer.writeUtf8(entry.key);
        entry.writeLengths(writer);
        writer.writeByte('\n');
      }
    }
  } finally {
    writer.close();
  }

  if (fileSystem.exists(journalFile)) {
    fileSystem.rename(journalFile, journalFileBackup);
  }
  fileSystem.rename(journalFileTmp, journalFile);
  fileSystem.delete(journalFileBackup);

  journalWriter = newJournalWriter();
  hasJournalErrors = false;
  mostRecentRebuildFailed = false;
}
```
- First the header is written: the magic number, version, and so on (everything here targets the tmp file)
- Then one line per entry in lruEntries is written into the tmp file
- Finally the tmp file is renamed to journal
As you can see, a rebuild treats lruEntries as the source of truth and writes the DIRTY and CLEAN records back into the journal. But notice: none of the actual values are touched; only the log of operations is rewritten, minus the redundancy. In effect, lruEntries and the journal file jointly determine the validity of the cached data: the former is primarily an index while the latter is a categorized record of operations, and either one is enough to reconstruct a consistent state. (See the worked example below.)
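For instance, if at rebuild time lruEntries held one clean entry and one entry mid-edit (hypothetical keys here; appVersion 100 and valueCount 2 as in the earlier journal example), the rebuilt journal would collapse to just:

```
libcore.io.DiskLruCache
1
100
2

CLEAN key1 832 21054
DIRTY key2
```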
With initialization covered, let's see how an Editor is obtained:
```
 1 private synchronized Editor edit(String key, long expectedSequenceNumber) throws IOException {
 2   initialize();
 3
 4   checkNotClosed();
 5   validateKey(key);
 6   Entry entry = lruEntries.get(key);
 7   if (expectedSequenceNumber != ANY_SEQUENCE_NUMBER && (entry == null
 8       || entry.sequenceNumber != expectedSequenceNumber)) {
 9     return null; // Snapshot is stale.
10   }
11   if (entry != null && entry.currentEditor != null) {
12     return null; // Another edit is in progress.
13   }
14   if (mostRecentTrimFailed || mostRecentRebuildFailed) {
15     executor.execute(cleanupRunnable);
16     return null;
17   }
18
19   // Flush the journal before creating files to prevent file leaks.
20   journalWriter.writeUtf8(DIRTY).writeByte(' ').writeUtf8(key).writeByte('\n');
21   journalWriter.flush();
22
23   if (hasJournalErrors) {
24     return null; // Don't edit; the journal can't be written.
25   }
26
27   if (entry == null) {
28     entry = new Entry(key);
29     lruEntries.put(key, entry);
30   }
31   Editor editor = new Editor(entry);
32   entry.currentEditor = editor;
33   return editor;
34 }
```
- Line 11: if another editor is already working on this entry, return null. (The caller can retry the task when it sees null. The one inconvenience is that a fresh Editor is newed up each time, so there is no lock you can block on while the current editor finishes; you can only poll and check for null. Under very heavy contention it might be worth locking on the entry instead.)
- Line 15: kick off a cleanup if needed; we'll look at what those two booleans mean shortly
- Line 20: the key is marked DIRTY in the journal, recording that this entry is being modified
- Line 28: an Entry may be created here; let's see what's inside one
```java
private final class Entry {
  private final String key;

  /** Lengths of this entry's files. */
  private final long[] lengths;
  private final File[] cleanFiles;
  private final File[] dirtyFiles;

  /** True if this entry has ever been published. */
  private boolean readable;

  /** The ongoing edit or null if this entry is not being edited. */
  private Editor currentEditor;

  /** The sequence number of the most recently committed edit to this entry. */
  private long sequenceNumber;

  private Entry(String key) {
    this.key = key;

    lengths = new long[valueCount];
    cleanFiles = new File[valueCount];
    dirtyFiles = new File[valueCount];

    // The names are repetitive so re-use the same builder to avoid allocations.
    StringBuilder fileBuilder = new StringBuilder(key).append('.');
    int truncateTo = fileBuilder.length();
    for (int i = 0; i < valueCount; i++) {
      fileBuilder.append(i);
      cleanFiles[i] = new File(directory, fileBuilder.toString());
      fileBuilder.append(".tmp");
      dirtyFiles[i] = new File(directory, fileBuilder.toString());
      fileBuilder.setLength(truncateTo);
    }
  }

  // ... (setLengths, writeLengths, snapshot, etc.)
}
```
An Entry maintains a set of dirtyFiles and cleanFiles along with each file's length. In the constructor, valueCount (the number of values per entry, agreed upon at initialization) determines how many files of each kind are created; the data will later be written into them. For example, with valueCount == 2 the key "abc" maps to the clean files abc.0 and abc.1 and the dirty files abc.0.tmp and abc.1.tmp.
Next, how an Editor is constructed:
```java
public final class Editor {
  private final Entry entry;
  private final boolean[] written;
  private boolean done;

  private Editor(Entry entry) {
    this.entry = entry;
    this.written = (entry.readable) ? null : new boolean[valueCount];
  }
  // ...
}
```
Frankly, there isn't much inside Editor; it's more of a wrapper class around an entry. The written array defaults to all false.
So now we have an editor, the index entry exists in lruEntries, and the journal holds a fresh DIRTY record. Something still feels missing... right, no data has actually been written to disk yet. Let's use OkHttp's Cache as an example of how an editor is used to write data into a file:
```java
public void writeTo(DiskLruCache.Editor editor) throws IOException {
  BufferedSink sink = Okio.buffer(editor.newSink(ENTRY_METADATA));

  sink.writeUtf8(url);
  sink.writeByte('\n');
  // ... the sink writes a lot more data here ...

  sink.close();
}
```
Simple, isn't it? Get a sink from the editor, write the data, and close it (here ENTRY_METADATA == 1). The remaining question is what newSink does:
```java
/**
 * Returns a new unbuffered output stream to write the value at {@code index}. If the underlying
 * output stream encounters errors when writing to the filesystem, this edit will be aborted
 * when {@link #commit} is called. The returned output stream does not throw IOExceptions.
 */
public Sink newSink(int index) throws IOException {
  synchronized (DiskLruCache.this) {
    if (done) {
      throw new IllegalStateException();
    }
    if (entry.currentEditor != this) {
      return NULL_SINK;
    }
    if (!entry.readable) {
      written[index] = true;
    }
    File dirtyFile = entry.dirtyFiles[index];
    Sink sink;
    try {
      sink = fileSystem.sink(dirtyFile);
    } catch (FileNotFoundException e) {
      return NULL_SINK;
    }
    return new FaultHidingSink(sink) {
      @Override protected void onException(IOException e) {
        synchronized (DiskLruCache.this) {
          detach();
        }
      }
    };
  }
}
```
If done is set, we're not allowed to continue (done is flipped in commit), and the entry's currentEditor must be this very editor. Then written[index] is set to true to record that this index will be written, and finally we get back a sink over the index-th dirtyFile, wrapped in a FaultHidingSink that detaches the editor on I/O errors. From here on, that sink is what writes the data into the dirty file.
Once the data is written, don't forget to commit it: the contents of the dirtyFiles must be promoted into the cleanFiles before other editors and readers can see them:
```java
public void commit() throws IOException {
  synchronized (DiskLruCache.this) {
    if (done) {
      throw new IllegalStateException();
    }
    if (entry.currentEditor == this) {
      completeEdit(this, true);
    }
    done = true;
  }
}
```
commit itself only deals with done; the real work happens in completeEdit:
```
 1 private synchronized void completeEdit(Editor editor, boolean success) throws IOException {
 2   Entry entry = editor.entry;
 3   if (entry.currentEditor != editor) {
 4     throw new IllegalStateException();
 5   }
 6
 7   // If this edit is creating the entry for the first time, every index must have a value.
 8   if (success && !entry.readable) {
 9     for (int i = 0; i < valueCount; i++) {
10       if (!editor.written[i]) {
11         editor.abort();
12         throw new IllegalStateException("Newly created entry didn't create value for index " + i);
13       }
14       if (!fileSystem.exists(entry.dirtyFiles[i])) {
15         editor.abort();
16         return;
17       }
18     }
19   }
20
21   for (int i = 0; i < valueCount; i++) {
22     File dirty = entry.dirtyFiles[i];
23     if (success) {
24       if (fileSystem.exists(dirty)) {
25         File clean = entry.cleanFiles[i];
26         fileSystem.rename(dirty, clean);
27         long oldLength = entry.lengths[i];
28         long newLength = fileSystem.size(clean);
29         entry.lengths[i] = newLength;
30         size = size - oldLength + newLength;
31       }
32     } else {
33       fileSystem.delete(dirty);
34     }
35   }
36
37   redundantOpCount++;
38   entry.currentEditor = null;
39   if (entry.readable | success) {
40     entry.readable = true;
41     journalWriter.writeUtf8(CLEAN).writeByte(' ');
42     journalWriter.writeUtf8(entry.key);
43     entry.writeLengths(journalWriter);
44     journalWriter.writeByte('\n');
45     if (success) {
46       entry.sequenceNumber = nextSequenceNumber++;
47     }
48   } else {
49     lruEntries.remove(entry.key);
50     journalWriter.writeUtf8(REMOVE).writeByte(' ');
51     journalWriter.writeUtf8(entry.key);
52     journalWriter.writeByte('\n');
53   }
54   journalWriter.flush();
55
56   if (size > maxSize || journalRebuildRequired()) {
57     executor.execute(cleanupRunnable);
58   }
59 }
```
- First, be clear that the second parameter, success, says whether the entry's data should actually be published
- Lines 8-19: if this edit created the entry for the first time, every index must hold a value, which protects data integrity
- Lines 21-35: on success, each dirtyFile is renamed to its cleanFile, which is what actually moves the data over; on failure, the dirtyFile is deleted, abandoning the edit
- Lines 39-54: the matching record is written into the journal, and if there have been many operations a cleanup is scheduled
And that's it: the data is in the cleanFiles, currentEditor has been reset to null, and the write path is complete. A defensive version of the whole write sequence is sketched below.
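Since commit calls completeEdit(this, true) and abort calls it with false, a caller that wants the dirty files cleaned up on failure would write something like this (a sketch reusing the cache, key, and value names from the earlier round-trip example, under the same API assumptions):

```java
// Defensive write pattern: abort on failure so that completeEdit runs
// with success == false and the dirty files are simply deleted.
DiskLruCache.Editor editor = cache.edit(key);
if (editor == null) return; // another edit in progress, or journal unwritable
try {
  BufferedSink sink = Okio.buffer(editor.newSink(0));
  sink.writeUtf8(value);
  sink.close();
  editor.commit(); // dirty -> clean rename plus a CLEAN journal record
} catch (IOException e) {
  editor.abort();  // discards the dirty files instead of publishing them
}
```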
Now for how reads work:
```java
/**
 * Returns a snapshot of the entry named {@code key}, or null if it doesn't exist or is not
 * currently readable. If a value is returned, it is moved to the head of the LRU queue.
 */
public synchronized Snapshot get(String key) throws IOException {
  initialize();

  checkNotClosed();
  validateKey(key);
  Entry entry = lruEntries.get(key);
  if (entry == null || !entry.readable) return null;

  Snapshot snapshot = entry.snapshot();
  if (snapshot == null) return null;

  redundantOpCount++;
  journalWriter.writeUtf8(READ).writeByte(' ').writeUtf8(key).writeByte('\n');
  if (journalRebuildRequired()) {
    executor.execute(cleanupRunnable);
  }

  return snapshot;
}
```
First we obtain a snapshot, then journalWriter appends a READ record to the journal:
```java
/**
 * Returns a snapshot of this entry. This opens all streams eagerly to guarantee that we see a
 * single published snapshot. If we opened streams lazily then the streams could come from
 * different edits.
 */
Snapshot snapshot() {
  if (!Thread.holdsLock(DiskLruCache.this)) throw new AssertionError();

  Source[] sources = new Source[valueCount];
  long[] lengths = this.lengths.clone(); // Defensive copy since these can be zeroed out.
  try {
    for (int i = 0; i < valueCount; i++) {
      sources[i] = fileSystem.source(cleanFiles[i]);
    }
    return new Snapshot(key, sequenceNumber, sources, lengths);
  } catch (FileNotFoundException e) {
    // A file must have been deleted manually!
    // ... (error handling elided)
  }
}
```
The code looks noisy but is actually simple: create a source for each of the cleanFiles (confirming that reads only ever come from cleanFiles) and return a new Snapshot. The snapshot, too, is just a wrapper class:
```java
/** A snapshot of the values for an entry. */
public final class Snapshot implements Closeable {
  private final String key;
  private final long sequenceNumber;
  private final Source[] sources;
  private final long[] lengths;

  private Snapshot(String key, long sequenceNumber, Source[] sources, long[] lengths) {
    this.key = key;
    this.sequenceNumber = sequenceNumber;
    this.sources = sources;
    this.lengths = lengths;
  }
  // ...
}
```
Just like Editor, it's decorator-style: through snapshot.getSource(index) you can read the content.
That wraps up get. Notice the pattern: with separate dirty and clean files, modification and reading can proceed concurrently, which raises the system's throughput. I don't know whether a commit invalidates an already-open source; I haven't studied Okio enough to say. If we were implementing this ourselves, we would definitely have to handle that case, for example by giving each cleanFile a validity flag: lock it during commit, update the flag afterwards, and have readers take the same lock to check whether their data has gone stale. A loose sketch of that idea follows.
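Sketching that very loosely (this is my own illustration of the paragraph above, not anything in OkHttp; class and method names are made up):

```java
// One stamp per clean file. Writers bump it under the lock on commit;
// readers grab the stamp before reading and re-validate afterwards.
class StampedFile {
  private final Object lock = new Object();
  private long stamp = 0;

  long beginRead() {
    synchronized (lock) { return stamp; }
  }

  // True if no commit rewrote the file while we were reading.
  boolean validateRead(long observedStamp) {
    synchronized (lock) { return stamp == observedStamp; }
  }

  void commitRewrite() {
    synchronized (lock) { stamp++; }
  }
}
```

A reader takes a stamp, reads the file, then validates; if validation fails, the bytes may straddle two edits and should be re-read.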
One last small piece: what does cleanup do?
```java
private final Runnable cleanupRunnable = new Runnable() {
  public void run() {
    synchronized (DiskLruCache.this) {
      if (!initialized | closed) {
        return; // Nothing to do
      }

      try {
        trimToSize();
      } catch (IOException ignored) {
        mostRecentTrimFailed = true;
      }

      try {
        if (journalRebuildRequired()) {
          rebuildJournal();
          redundantOpCount = 0;
        }
      } catch (IOException e) {
        mostRecentRebuildFailed = true;
        journalWriter = Okio.buffer(NULL_SINK);
      }
    }
  }
};
```
cleanup mainly keeps the whole cache's size in check so it doesn't grow too large, and also rebuilds the journal when required. If a trim or a rebuild fails, edit (as we saw earlier) refuses to hand out an Editor, so no data can be modified until a later cleanup succeeds.
Let's focus on trimToSize:
```java
private void trimToSize() throws IOException {
  while (size > maxSize) {
    Entry toEvict = lruEntries.values().iterator().next();
    removeEntry(toEvict);
  }
  mostRecentTrimFailed = false;
}

private boolean removeEntry(Entry entry) throws IOException {
  if (entry.currentEditor != null) {
    entry.currentEditor.detach(); // Prevent the edit from completing normally.
  }

  for (int i = 0; i < valueCount; i++) {
    fileSystem.delete(entry.cleanFiles[i]);
    size -= entry.lengths[i];
    entry.lengths[i] = 0;
  }

  redundantOpCount++;
  journalWriter.writeUtf8(REMOVE).writeByte(' ').writeUtf8(entry.key).writeByte('\n');
  lruEntries.remove(entry.key);

  if (journalRebuildRequired()) {
    executor.execute(cleanupRunnable);
  }

  return true;
}
```
trimToSize simply walks lruEntries (note that iteration follows accessOrder, which is exactly where the LRU policy hides), removing entries one by one until size drops below maxSize. removeEntry deletes the entry's cleanFiles (a live editor is first detached so its edit cannot complete normally), writes a REMOVE record into the journal, and drops the entry from lruEntries. A quick demonstration of why the iterator's first element is the LRU victim follows.
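This standalone demonstration (not OkHttp code) shows that the iterator of an access-ordered LinkedHashMap really does hand you the least recently used element first:

```java
import java.util.LinkedHashMap;

public class LruOrderDemo {
  public static void main(String[] args) {
    LinkedHashMap<String, String> m = new LinkedHashMap<>(16, 0.75f, true);
    m.put("a", "1");
    m.put("b", "2");
    m.put("c", "3");
    m.get("a"); // touching "a" moves it to the tail of the iteration order

    System.out.println(m.keySet());                   // [b, c, a]
    System.out.println(m.keySet().iterator().next()); // "b", the LRU victim
  }
}
```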
Summary
- LinkedHashMap can implement an LRU policy, and in this case it doubles as the in-memory index over the disk cache
- A little secret: the DiskLruCache implementation inside Universal-Image-Loader is identical to this one, except its I/O uses InputStream/OutputStream
- Operations are recorded in both the LinkedHashMap and the journal file, effectively two indexes backing each other up, so either one can restore the other's state
- With separate dirtyFiles and cleanFiles, updates and reads can happen concurrently; at commit time the cleanFiles are simply swapped to the new content
- I really haven't dug into the I/O layer itself; that deserves a proper look some day