When I analyzed OkHttp earlier I ran into this DiskLruCache; since my focus at the time was on how to implement an HTTP client, I skipped over it. Let's now look at how the cache in OkHttp is designed, mostly to broaden our horizons and see how this kind of program is structured.

First, a rough outline of how DiskLruCache works:

As you can see, the cache consists of two parts: a LinkedHashMap and a directory on the file system. In the map, each key is a string and each value is an Entry; each entry corresponds to n×2 files, where n means that each key can store n separate values, and the ×2 is because each value also has a temporary file. The map is therefore essentially an index. When needed, the LinkedHashMap writes an entry to the disk files through an Editor's commit operation, and the get and put methods exposed to callers also operate on the LinkedHashMap through Editors. The directory additionally holds three journal.xxx files: the journal file records every cache operation, journal.tmp is a scratch file used while the journal is being rebuilt, and journal.bkp is a backup that helps restore the LinkedHashMap's state.

The first time I saw the name DiskLruCache, what I was most curious about was how the LRU part is implemented. Everyone knows the least-recently-used algorithm and understands the idea, but it seems few of us have actually implemented it. Here is the good news:

/**
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @param  accessOrder     the ordering mode - <tt>true</tt> for
 *         access-order, <tt>false</tt> for insertion-order
 */
public LinkedHashMap(int initialCapacity,
                     float loadFactor,
                     boolean accessOrder) {
    super(initialCapacity, loadFactor);
    this.accessOrder = accessOrder;
}

Note the last parameter, which selects the linking mode: true means access order, i.e. iterating the map with an Iterator visits elements in the order they were last accessed. If you then cap the size of such a LinkedHashMap, you have an LRU cache! That is exactly what DiskLruCache does.
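To make that concrete, here is a minimal sketch of an LRU cache built from nothing but an access-ordered LinkedHashMap and its removeEldestEntry hook. The class name and the capacity are made up for the demo:

```java
import java.util.*;

/** A minimal LRU cache built on LinkedHashMap's access-order mode. */
public class LruCacheDemo {
    /** Inserts four entries into a 3-entry cache and returns the surviving keys. */
    static List<String> demo() {
        int maxEntries = 3;
        // accessOrder = true: iteration goes from least- to most-recently used
        LinkedHashMap<String, String> cache =
            new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > maxEntries; // evict the head (the LRU entry) when over capacity
                }
            };
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3");
        cache.get("a");      // touch "a" so it becomes most-recently used
        cache.put("d", "4"); // evicts "b", now the least-recently used
        return new ArrayList<>(cache.keySet());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [c, a, d], LRU first
    }
}
```

The eviction and the reordering are both handled by LinkedHashMap itself; DiskLruCache relies on exactly this iteration order for its trimming, as we will see later.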

Let's first walk through the overall flow:

  • When we insert or update data, we update the corresponding entry in the LinkedHashMap. The entry does not hold the data itself, but maintains the list of files for that key (the somekeyK.K(.tmp) files in the diagram, i.e. the clean and dirty files). An Editor bound to that entry is returned; through the editor we can reach the entry's file list and obtain an I/O sink, write the data into the matching tmp file, and finally, on commit, rename the tmp file to the real file, at which point it becomes visible to readers.

  • When we fetch data, we again go through the LinkedHashMap and obtain a Snapshot. The snapshot likewise holds an I/O source, which reads the corresponding clean file.
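The write-to-tmp-then-rename publish step described above can be sketched with plain java.nio. The file names and the helper are illustrative, not OkHttp's actual code:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

/** Sketch of the publish step: write into a ".tmp" (dirty) file, then rename it
 *  over the clean file, so readers only ever see complete data. */
public class AtomicPublishDemo {
    static String writeAndPublish(Path dir, String key, String value) throws IOException {
        Path dirty = dir.resolve(key + ".0.tmp"); // the editor writes here
        Path clean = dir.resolve(key + ".0");     // readers only ever open this one
        Files.write(dirty, value.getBytes(StandardCharsets.UTF_8));
        // commit: the rename makes the new value visible to readers in one step
        Files.move(dirty, clean, StandardCopyOption.REPLACE_EXISTING);
        return new String(Files.readAllBytes(clean), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lru-demo");
        System.out.println(writeAndPublish(dir, "somekey", "hello")); // prints hello
    }
}
```

A half-written tmp file is never observable through the clean name, which is the whole point of the dirty/clean split.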

Now let's dive in and see how DiskLruCache initializes itself:

public synchronized void initialize() throws IOException {
    assert Thread.holdsLock(this);

    if (initialized) {
      return; // Already initialized.
    }

    // If a backup file exists, use it
    if (fileSystem.exists(journalFileBackup)) {
      // If a journal file already exists, ignore the backup
      if (fileSystem.exists(journalFile)) {
        fileSystem.delete(journalFileBackup);
      } else {
        fileSystem.rename(journalFileBackup, journalFile);
      }
    }

    // Prefer to pick up where we left off.
    if (fileSystem.exists(journalFile)) {
      try {
        readJournal();
        processJournal();
        initialized = true;
        return;
      } catch (IOException journalIsCorrupt) {
        Platform.get().logW("DiskLruCache " + directory + " is corrupt: "
            + journalIsCorrupt.getMessage() + ", removing");
        delete();
        closed = false;
      }
    }

    rebuildJournal();

    initialized = true;
}

In short: recover from the backup file if one exists, then move on to the journal file itself. readJournal first checks the magic number and the version, then reads the journal one line at a time in a while loop:

private void readJournalLine(String line) throws IOException {
    int firstSpace = line.indexOf(' ');
    if (firstSpace == -1) {
      throw new IOException("unexpected journal line: " + line);
    }

    int keyBegin = firstSpace + 1;
    int secondSpace = line.indexOf(' ', keyBegin);
    final String key;
    if (secondSpace == -1) {
      key = line.substring(keyBegin);
      if (firstSpace == REMOVE.length() && line.startsWith(REMOVE)) {
        lruEntries.remove(key);
        return;
      }
    } else {
      key = line.substring(keyBegin, secondSpace);
    }

    Entry entry = lruEntries.get(key);
    if (entry == null) {
      entry = new Entry(key);
      lruEntries.put(key, entry);
    }

    if (secondSpace != -1 && firstSpace == CLEAN.length() && line.startsWith(CLEAN)) {
      String[] parts = line.substring(secondSpace + 1).split(" ");
      entry.readable = true;
      entry.currentEditor = null;
      entry.setLengths(parts);
    } else if (secondSpace == -1 && firstSpace == DIRTY.length() && line.startsWith(DIRTY)) {
      entry.currentEditor = new Editor(entry);
    } else if (secondSpace == -1 && firstSpace == READ.length() && line.startsWith(READ)) {
      // This work was already done by calling lruEntries.get().
    } else {
      throw new IOException("unexpected journal line: " + line);
    }
}

  • If the marker is REMOVE, delete the key from the index (the lruEntries we see here is the LinkedHashMap mentioned earlier)

  • Look up the entry for the key (creating a new one if absent)

  • If the marker is CLEAN, update the recorded lengths of the entry's files

  • If it is DIRTY, the value is still being edited and has not been committed, so the entry is given an Editor

  • If it is READ, the value was merely read; nothing else needs to be done

An example of the journal file's contents makes this clear:

libcore.io.DiskLruCache
1
100
2

CLEAN 3400330d1dfc7f3f7f4b8d4d803dfcf6 832 21054
DIRTY 335c4c6028171cfddfbaae1a9c313c52
CLEAN 335c4c6028171cfddfbaae1a9c313c52 3934 2342
REMOVE 335c4c6028171cfddfbaae1a9c313c52
DIRTY 1ab96a171faeeee38496d8b330771a7a
CLEAN 1ab96a171faeeee38496d8b330771a7a 1600 234
READ 335c4c6028171cfddfbaae1a9c313c52
READ 3400330d1dfc7f3f7f4b8d4d803dfcf6

At the end of readJournal, if everything was read without problems, a journalWriter dedicated to writing the journal file is created. Once readJournal finishes, all entries with clean and dirty data are in lruEntries. The following processJournal then deletes the files behind every dirty entry: dirty data is inconsistent, so we do not want it, but the entries themselves stay in lruEntries because they were, after all, recently used.
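The replay performed by readJournalLine can be sketched as a small state machine over the four markers. This is a simplified model, not OkHttp's code: it tracks only which keys end up readable, ignoring lengths and editors:

```java
import java.util.*;

/** Replays journal lines the way readJournalLine does, tracking only which
 *  keys end up readable (last committed state CLEAN) versus removed. */
public class JournalReplayDemo {
    static Set<String> replay(List<String> lines) {
        // value == true means the last committed state was CLEAN
        LinkedHashMap<String, Boolean> entries = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(" ");
            String op = parts[0], key = parts[1];
            switch (op) {
                case "CLEAN":  entries.put(key, true); break;      // published; lengths follow
                case "DIRTY":  entries.putIfAbsent(key, false); break; // edit in progress
                case "REMOVE": entries.remove(key); break;
                case "READ":   break;                              // only refreshes LRU order
                default: throw new IllegalArgumentException("unexpected journal line: " + line);
            }
        }
        Set<String> readable = new LinkedHashSet<>();
        for (Map.Entry<String, Boolean> e : entries.entrySet())
            if (e.getValue()) readable.add(e.getKey());
        return readable;
    }

    public static void main(String[] args) {
        List<String> journal = Arrays.asList(
            "CLEAN k1 832 21054",
            "DIRTY k2", "CLEAN k2 3934 2342", "REMOVE k2",
            "DIRTY k3", "CLEAN k3 1600 234",
            "READ k1");
        System.out.println(replay(journal)); // prints [k1, k3]: k2 was removed
    }
}
```

Running this over the example journal above shows why replay alone is enough to rebuild the index: the log is a complete history of every entry.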

If any of this initialization throws, we fall into rebuildJournal:

/**
 * Creates a new journal that omits redundant information. This replaces the current journal if it
 * exists.
 */
private synchronized void rebuildJournal() throws IOException {
    if (journalWriter != null) {
      journalWriter.close();
    }

    BufferedSink writer = Okio.buffer(fileSystem.sink(journalFileTmp));
    try {
      writer.writeUtf8(MAGIC).writeByte('\n');
      writer.writeUtf8(VERSION_1).writeByte('\n');
      writer.writeDecimalLong(appVersion).writeByte('\n');
      writer.writeDecimalLong(valueCount).writeByte('\n');
      writer.writeByte('\n');

      for (Entry entry : lruEntries.values()) {
        if (entry.currentEditor != null) {
          writer.writeUtf8(DIRTY).writeByte(' ');
          writer.writeUtf8(entry.key);
          writer.writeByte('\n');
        } else {
          writer.writeUtf8(CLEAN).writeByte(' ');
          writer.writeUtf8(entry.key);
          entry.writeLengths(writer);
          writer.writeByte('\n');
        }
      }
    } finally {
      writer.close();
    }

    if (fileSystem.exists(journalFile)) {
      fileSystem.rename(journalFile, journalFileBackup);
    }
    fileSystem.rename(journalFileTmp, journalFile);
    fileSystem.delete(journalFileBackup);

    journalWriter = newJournalWriter();
    hasJournalErrors = false;
    mostRecentRebuildFailed = false;
}

  • First the file header is written: the magic number, the version, and so on (all of this writes to the tmp file)

  • Then the contents of lruEntries are written back into the tmp file

  • Finally the tmp file is renamed to become the journal file

As you can see, the rebuild takes lruEntries as the source of truth and writes its DIRTY and CLEAN records back into the journal. Notice, though, that no actual values are touched; only the operation log is rewritten. In effect, lruEntries and the journal file together determine the validity of the cached data: the former is mostly an index, while the latter is a categorized log of operations, and either one alone can hold a consistent state.

With initialization covered, let's see how an Editor is obtained:

 1 private synchronized Editor edit(String key, long expectedSequenceNumber) throws IOException {
 2     initialize();
 3 
 4     checkNotClosed();
 5     validateKey(key);
 6     Entry entry = lruEntries.get(key);
 7     if (expectedSequenceNumber != ANY_SEQUENCE_NUMBER && (entry == null
 8         || entry.sequenceNumber != expectedSequenceNumber)) {
 9       return null; // Snapshot is stale.
10     }
11     if (entry != null && entry.currentEditor != null) {
12       return null; // Another edit is in progress.
13     }
14     if (mostRecentTrimFailed || mostRecentRebuildFailed) {
15       executor.execute(cleanupRunnable);
16       return null;
17     }
18 
19     // Flush the journal before creating files to prevent file leaks.
20     journalWriter.writeUtf8(DIRTY).writeByte(' ').writeUtf8(key).writeByte('\n');
21     journalWriter.flush();
22 
23     if (hasJournalErrors) {
24       return null; // Don't edit; the journal can't be written.
25     }
26 
27     if (entry == null) {
28       entry = new Entry(key);
29       lruEntries.put(key, entry);
30     }
31     Editor editor = new Editor(entry);
32     entry.currentEditor = editor;
33     return editor;
34   }

  • Line 11: if another editor is already working on this entry, return null. (A caller that gets null can retry the task. The one inconvenience, in my view, is that a fresh Editor is created each time, so there is no lock to wait on until the current editor finishes; the caller can only poll the return value. If edits were very frequent, locking per entry might be worth considering.)

  • Line 15: run a cleanup if necessary; we will see what these two booleans mean shortly

  • Line 20: the key is marked DIRTY in the journal file, indicating that this record is being modified

  • Line 28: a new Entry may be created here; let's look at what is inside one:

private final class Entry {
    private final String key;

    /** Lengths of this entry's files. */
    private final long[] lengths;
    private final File[] cleanFiles;
    private final File[] dirtyFiles;

    /** True if this entry has ever been published. */
    private boolean readable;

    /** The ongoing edit or null if this entry is not being edited. */
    private Editor currentEditor;

    /** The sequence number of the most recently committed edit to this entry. */
    private long sequenceNumber;

    private Entry(String key) {
      this.key = key;

      lengths = new long[valueCount];
      cleanFiles = new File[valueCount];
      dirtyFiles = new File[valueCount];

      // The names are repetitive so re-use the same builder to avoid allocations.
      StringBuilder fileBuilder = new StringBuilder(key).append('.');
      int truncateTo = fileBuilder.length();
      for (int i = 0; i < valueCount; i++) {
        fileBuilder.append(i);
        cleanFiles[i] = new File(directory, fileBuilder.toString());
        fileBuilder.append(".tmp");
        dirtyFiles[i] = new File(directory, fileBuilder.toString());
        fileBuilder.setLength(truncateTo);
      }
    }

An Entry maintains arrays of dirtyFiles and cleanFiles plus each file's length. In the constructor, valueCount is the number of files per entry, which was agreed upon at initialization time; that many file handles are created, and data is later written into these files.
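The naming loop in the constructor produces files named key.i and key.i.tmp. The snippet below reproduces just that scheme; the class is made up for the demo:

```java
/** Reproduces Entry's file-name scheme: value i of a key lives in "<key>.<i>"
 *  (clean) and "<key>.<i>.tmp" (dirty). A sketch of the naming only. */
public class EntryNamesDemo {
    static String[] names(String key, int valueCount) {
        String[] out = new String[valueCount * 2];
        StringBuilder fileBuilder = new StringBuilder(key).append('.');
        int truncateTo = fileBuilder.length();
        for (int i = 0; i < valueCount; i++) {
            fileBuilder.append(i);
            out[2 * i] = fileBuilder.toString();     // clean file
            fileBuilder.append(".tmp");
            out[2 * i + 1] = fileBuilder.toString(); // dirty file
            fileBuilder.setLength(truncateTo);       // reuse the builder, as Entry does
        }
        return out;
    }

    public static void main(String[] args) {
        // OkHttp's Cache stores two values per key: metadata and response body
        for (String n : names("somekey", 2)) System.out.println(n);
        // prints somekey.0, somekey.0.tmp, somekey.1, somekey.1.tmp
    }
}
```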

Next, let's see how an Editor is created:

public final class Editor {
    private final Entry entry;
    private final boolean[] written;
    private boolean done;

    private Editor(Entry entry) {
      this.entry = entry;
      this.written = (entry.readable) ? null : new boolean[valueCount];
    }

Honestly, there is not much in Editor; it is more like a wrapper around an entry. The written array defaults to all false.

Good. We now have the editor, the index entry exists in lruEntries, and a new operation record marked DIRTY is in the journal. Something still seems to be missing though: oh right, we have not written any data to disk yet! Let's use OkHttp's Cache as an example of how the editor is used to write data into a file:

public void writeTo(DiskLruCache.Editor editor) throws IOException {
      BufferedSink sink = Okio.buffer(editor.newSink(ENTRY_METADATA));

      sink.writeUtf8(url);
      sink.writeByte('\n');
      // sink writes a lot more data

      sink.close();
}

Simple, isn't it? Get a sink through the editor, write the data, and close it when done (in OkHttp's Cache, ENTRY_METADATA is the index of the metadata value). The remaining question is what newSink does:

/**
 * Returns a new unbuffered output stream to write the value at {@code index}. If the underlying
 * output stream encounters errors when writing to the filesystem, this edit will be aborted
 * when {@link #commit} is called. The returned output stream does not throw IOExceptions.
 */
public Sink newSink(int index) throws IOException {
      synchronized (DiskLruCache.this) {
        if (done) {
          throw new IllegalStateException();
        }
        if (entry.currentEditor != this) {
          return NULL_SINK;
        }
        if (!entry.readable) {
          written[index] = true;
        }
        File dirtyFile = entry.dirtyFiles[index];
        Sink sink;
        try {
          sink = fileSystem.sink(dirtyFile);
        } catch (FileNotFoundException e) {
          return NULL_SINK;
        }
        return new FaultHidingSink(sink) {
          @Override protected void onException(IOException e) {
            synchronized (DiskLruCache.this) {
              detach();
            }
          }
        };
      }
}

If done is set we are not allowed to continue (done is set in commit), and the associated entry's currentEditor must be this editor for the operation to proceed. Then written[index] is set to true, recording that this index will be written, and finally a sink over dirtyFile number index is returned; from then on it is used to write the data into that dirtyFile.
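The FaultHidingSink at the end is an Okio class; the same idea can be sketched over a plain OutputStream: write errors are recorded instead of thrown, so commit can later abort the edit rather than crashing the writer mid-stream. This is an analogy, not Okio's implementation:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Plain-Java sketch of the FaultHidingSink idea. */
public class FaultHidingDemo {
    static class FaultHidingStream extends FilterOutputStream {
        boolean hasErrors = false;
        FaultHidingStream(OutputStream out) { super(out); }
        @Override public void write(int b) {
            try { out.write(b); } catch (IOException e) { hasErrors = true; } // record, don't throw
        }
    }

    /** A stream that always fails, standing in for a full disk. */
    static class FailingStream extends OutputStream {
        @Override public void write(int b) throws IOException { throw new IOException("disk full"); }
    }

    public static void main(String[] args) {
        FaultHidingStream ok = new FaultHidingStream(new ByteArrayOutputStream());
        ok.write('x');
        System.out.println(ok.hasErrors);  // prints false

        FaultHidingStream bad = new FaultHidingStream(new FailingStream());
        bad.write('x');                    // does not throw
        System.out.println(bad.hasErrors); // prints true: a commit would abort this edit
    }
}
```

The javadoc of newSink promises exactly this behavior: "The returned output stream does not throw IOExceptions."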

Once the data is written, do not forget to commit it: the contents of the dirtyFiles must be moved into the cleanFiles before they become visible to readers:

public void commit() throws IOException {
      synchronized (DiskLruCache.this) {
        if (done) {
          throw new IllegalStateException();
        }
        if (entry.currentEditor == this) {
          completeEdit(this, true);
        }
        done = true;
      }
}

commit itself only flips done; the real work is in completeEdit:

 1 private synchronized void completeEdit(Editor editor, boolean success) throws IOException {
 2     Entry entry = editor.entry;
 3     if (entry.currentEditor != editor) {
 4       throw new IllegalStateException();
 5     }
 6 
 7     // If this edit is creating the entry for the first time, every index must have a value.
 8     if (success && !entry.readable) {
 9       for (int i = 0; i < valueCount; i++) {
10         if (!editor.written[i]) {
11           editor.abort();
12           throw new IllegalStateException("Newly created entry didn't create value for index " + i);
13         }
14         if (!fileSystem.exists(entry.dirtyFiles[i])) {
15           editor.abort();
16           return;
17         }
18       }
19     }
20 
21     for (int i = 0; i < valueCount; i++) {
22       File dirty = entry.dirtyFiles[i];
23       if (success) {
24         if (fileSystem.exists(dirty)) {
25           File clean = entry.cleanFiles[i];
26           fileSystem.rename(dirty, clean);
27           long oldLength = entry.lengths[i];
28           long newLength = fileSystem.size(clean);
29           entry.lengths[i] = newLength;
30           size = size - oldLength + newLength;
31         }
32       } else {
33         fileSystem.delete(dirty);
34       }
35     }
36 
37     redundantOpCount++;
38     entry.currentEditor = null;
39     if (entry.readable | success) {
40       entry.readable = true;
41       journalWriter.writeUtf8(CLEAN).writeByte(' ');
42       journalWriter.writeUtf8(entry.key);
43       entry.writeLengths(journalWriter);
44       journalWriter.writeByte('\n');
45       if (success) {
46         entry.sequenceNumber = nextSequenceNumber++;
47       }
48     } else {
49       lruEntries.remove(entry.key);
50       journalWriter.writeUtf8(REMOVE).writeByte(' ');
51       journalWriter.writeUtf8(entry.key);
52       journalWriter.writeByte('\n');
53     }
54     journalWriter.flush();
55 
56     if (size > maxSize || journalRebuildRequired()) {
57       executor.execute(cleanupRunnable);
58     }
59   }

  • First, be clear that the second parameter, success, indicates whether the entry's data should be written back

  • Lines 8-19: on the first write-back, every index must contain data; this guarantees the entry is complete

  • Lines 21-35: on success, each dirtyFile is renamed to its cleanFile, which is the entire act of moving the data into the clean files; on failure the dirtyFile is deleted, abandoning the edit

  • Lines 39-54: the corresponding operation is appended to the journal; if there have been many commits, a cleanup is scheduled

With that, the data is in the cleanFiles, currentEditor has been reset to null, and the write path is complete.

Now let's look at how reads work:

/**
 * Returns a snapshot of the entry named {@code key}, or null if it doesn't exist or is not
 * currently readable. If a value is returned, it is moved to the head of the LRU queue.
 */
public synchronized Snapshot get(String key) throws IOException {
    initialize();

    checkNotClosed();
    validateKey(key);
    Entry entry = lruEntries.get(key);
    if (entry == null || !entry.readable) return null;

    Snapshot snapshot = entry.snapshot();
    if (snapshot == null) return null;

    redundantOpCount++;
    journalWriter.writeUtf8(READ).writeByte(' ').writeUtf8(key).writeByte('\n');
    if (journalRebuildRequired()) {
      executor.execute(cleanupRunnable);
    }

    return snapshot;
}

We first obtain a snapshot, then use journalWriter to append a READ record to the journal file.

/**
 * Returns a snapshot of this entry. This opens all streams eagerly to guarantee that we see a
 * single published snapshot. If we opened streams lazily then the streams could come from
 * different edits.
 */
Snapshot snapshot() {
      if (!Thread.holdsLock(DiskLruCache.this)) throw new AssertionError();

      Source[] sources = new Source[valueCount];
      long[] lengths = this.lengths.clone(); // Defensive copy since these can be zeroed out.
      try {
        for (int i = 0; i < valueCount; i++) {
          sources[i] = fileSystem.source(cleanFiles[i]);
        }
        return new Snapshot(key, sequenceNumber, sources, lengths);
      } catch (FileNotFoundException e) {
        // A file must have been deleted manually! Close any sources opened so far
        // (elided in this excerpt) and report the entry as unreadable.
        return null;
      }
}

The code looks messy but is actually simple: it creates a source for each of the cleanFiles (confirming our earlier point that reads only ever come from cleanFiles) and returns a new Snapshot. The Snapshot, in turn, is just a wrapper class:

/** A snapshot of the values for an entry. */
public final class Snapshot implements Closeable {
    private final String key;
    private final long sequenceNumber;
    private final Source[] sources;
    private final long[] lengths;

    private Snapshot(String key, long sequenceNumber, Source[] sources, long[] lengths) {
      this.key = key;
      this.sequenceNumber = sequenceNumber;
      this.sources = sources;
      this.lengths = lengths;
    }

Just like Editor, it is a decorator; calling snapshot.getSource(index) reads the content.

That covers get. Did you notice the pattern? With separate dirty and clean files, modification and reading can happen at the same time, which improves the system's throughput. I don't know whether a commit invalidates an already-open source; I have not studied Okio, so I won't claim an answer. If we implemented this ourselves, we would definitely have to handle that case, for example by giving each cleanFile a flag, locking it during commit to update the flag, and having readers take the same lock to check whether their data has become stale.
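The read/write isolation can be modeled in memory: a snapshot taken before a commit keeps seeing the old value, because commit swaps the dirty slot into the clean slot rather than mutating what readers already hold. This is a toy analogy of the design, not the real file-based mechanics:

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory analogy of the clean/dirty split: readers snapshot the clean value,
 *  writers stage into a dirty slot, and commit swaps dirty into clean. */
public class CleanDirtyDemo {
    private final Map<String, String> clean = new HashMap<>();
    private final Map<String, String> dirty = new HashMap<>();

    String snapshot(String key) { return clean.get(key); }         // like Snapshot's source
    void edit(String key, String value) { dirty.put(key, value); } // like newSink + write
    void commit(String key) { clean.put(key, dirty.remove(key)); } // like the rename

    public static void main(String[] args) {
        CleanDirtyDemo cache = new CleanDirtyDemo();
        cache.edit("k", "v1");
        cache.commit("k");
        String reader = cache.snapshot("k");     // a reader grabs the published value
        cache.edit("k", "v2");                   // a writer starts a new edit
        System.out.println(reader);              // prints v1: the in-progress edit is invisible
        cache.commit("k");
        System.out.println(cache.snapshot("k")); // prints v2: new readers see the commit
    }
}
```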

One last small piece: what cleanup does:

private final Runnable cleanupRunnable = new Runnable() {
    public void run() {
      synchronized (DiskLruCache.this) {
        if (!initialized | closed) {
          return; // Nothing to do
        }

        try {
          trimToSize();
        } catch (IOException ignored) {
          mostRecentTrimFailed = true;
        }

        try {
          if (journalRebuildRequired()) {
            rebuildJournal();
            redundantOpCount = 0;
          }
        } catch (IOException e) {
          mostRecentRebuildFailed = true;
          journalWriter = Okio.buffer(NULL_SINK);
        }
      }
    }
};

cleanup's main job is to keep the overall size of the cache in check, and it also rebuilds the journal when required. If a trim or a rebuild fails, the edit method we saw earlier refuses to hand out an Editor, so no data can be modified until a later cleanup succeeds.
Let's focus on trimToSize:

private void trimToSize() throws IOException {
    while (size > maxSize) {
      Entry toEvict = lruEntries.values().iterator().next();
      removeEntry(toEvict);
    }
    mostRecentTrimFailed = false;
}

private boolean removeEntry(Entry entry) throws IOException {
    if (entry.currentEditor != null) {
      entry.currentEditor.detach(); // Prevent the edit from completing normally.
    }

    for (int i = 0; i < valueCount; i++) {
      fileSystem.delete(entry.cleanFiles[i]);
      size -= entry.lengths[i];
      entry.lengths[i] = 0;
    }

    redundantOpCount++;
    journalWriter.writeUtf8(REMOVE).writeByte(' ').writeUtf8(entry.key).writeByte('\n');
    lruEntries.remove(entry.key);

    if (journalRebuildRequired()) {
      executor.execute(cleanupRunnable);
    }

    return true;
}

trimToSize simply iterates lruEntries (note that this iteration follows accessOrder, which is exactly where the LRU policy hides) and removes entries one by one until size falls below maxSize. removeEntry detaches any in-progress editor, deletes the entry's cleanFiles and subtracts their lengths from size, appends a REMOVE record to the journal, and drops the entry from lruEntries.
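The eviction loop relies on a property worth seeing in isolation: in an access-ordered LinkedHashMap, iterator().next() always yields the least-recently-used entry. Below is a small stand-in for trimToSize; the sizes are made-up byte counts per entry:

```java
import java.util.LinkedHashMap;

/** Mirrors trimToSize: repeatedly evicting iterator().next() from an
 *  access-ordered LinkedHashMap implements LRU eviction. */
public class TrimDemo {
    static LinkedHashMap<String, Integer> trim(LinkedHashMap<String, Integer> entries, long maxSize) {
        long size = entries.values().stream().mapToLong(Integer::longValue).sum();
        while (size > maxSize) {
            String eldest = entries.keySet().iterator().next(); // the LRU entry
            size -= entries.remove(eldest);                     // like removeEntry
        }
        return entries;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> entries = new LinkedHashMap<>(16, 0.75f, true);
        entries.put("a", 100);
        entries.put("b", 100);
        entries.put("c", 100);
        entries.get("a");                                // touch "a"; "b" is now LRU
        System.out.println(trim(entries, 250).keySet()); // prints [c, a]: "b" was evicted
    }
}
```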

Summary

  • LinkedHashMap can implement the LRU algorithm, and in this case it doubles as the in-memory index of the disk cache

  • Fun fact: the DiskLruCache implementation inside Universal-Image-Loader is identical to this one, except that its I/O uses InputStream/OutputStream

  • Both the LinkedHashMap and the journal file record the operations performed; together with the index, there are effectively two copies of the state that can restore each other

  • With dirtyFiles and cleanFiles, updating and reading can proceed concurrently; on commit the cleanFiles are simply refreshed

  • I really haven't studied the I/O layer; I must dig into it properly when I have time