关于我读.NET 源码那些事

日常更新维护电脑常用软件的时候看到 dnSpy 似乎很久没有更新了，于是去 GitHub 上看了看，发现作者已经把它 archive 了，于是带着些许震惊下载了最新的版本。之后随便找个 .NET 应用反编译了一下，看着 System 命名空间，突然生出了好奇心：“那些类是怎么实现的呢？”

今天遇到了个小问题，询问学长过程中才想起来 .NET 是在 GitHub 上开源的东西：.NET Runtime, 大可以直接翻找其源代码，学长又提供了一个看 .NET 源码的网站：.NET Source Browser

此篇文章并未完全结束，以后更新。

Stack

Link to Stack.cs

点着点着，翻到了 Stack 类的实现，也算是比较好奇这个入门级的数据结构是怎么实现的，于是找了几个好奇的方法看了起来。

// Pops an item from the top of the stack.
// If the stack is empty,  Pop throws an InvalidOperationException.
public virtual object? Pop()
{
    if (_size == 0)
        throw new InvalidOperationException(SR.InvalidOperation_EmptyStack);

    _version++;
    object? obj = _array[--_size];
    _array[_size] = null;     // Free memory quicker.
    return obj;
}

Pop() 函数，这个实现也算是意料之中

// Pushes an item to the top of the stack.
public virtual void Push(object? obj)
{
    if (_size == _array.Length)
    {
        object[] newArray = new object[2 * _array.Length];
        Array.Copy(_array, newArray, _size);
        _array = newArray;
    }
    _array[_size++] = obj;
    _version++;
}

为什么要把空间成 2 倍增长

看到这段代码我疑惑了，为什么栈大小会成倍增长呢？查询相关资料后得到了不少的结论，其中包含空间是 2 的倍数的时候有利于资源的回收等等，这里简单整理一下原因，具体细节可以参照Reference中知乎的解答。

首先考虑成倍增长这件事，为什么不每一次增长一个固定的大小呢？

若以成倍的方式增长，每次增长因子为 $m$ 那么想要向动态数组中添加进入 $n$ 个元素所需扩容的次数最坏即 $log_m(n)$ 次, 在第 $i$ 次重新分配内存的时候会导致复制 $m^i$ 即当前旧数组大小的元素，于是 $n$ 次添加操作花费的时间复杂度为：

$\sum_{i=1}^{log_m n} m ^ i \approx \frac{nm}{m -1}$

由均摊分析可知它的时间复杂度是 $\Theta(1)$

而如果每次增加固定大小 $k$ , 则第 $i$ 次增加时候需要复制的数量就是 $ki$ , 那么想要向动态数组中添加进入 $n$ 个元素所需扩容的次数即 $\frac{n}{k}$ 次, 于是 $n$ 次添加操作花费的时间复杂度为：

$\sum_{i = 1}^{\frac{n}{k}} ki = k \sum_{i = 1}^{\frac{n}{k}} i$

这是一个关于 $n$ 的二次多项式，由均摊分析可知它的时间复杂度是 $\Theta(n)$

因而成倍增加时候可以保证常数级别的时间复杂度。

而至于为什么要用 2 倍，其原因分析就比较众说纷纭了。

这是另一个比较复杂的问题，有人说一直以 2 的倍数扩容利于内存对齐且利于垃圾回收的效率，有人说以 2 为倍数会导致以前的内存无法重用，因此如今很多编译器实现的时候降低到了 1.5, 对此我仍有很多疑惑，故在此不班门弄斧，发表自己的见解了。

PS: 至于为什么默认大小是 10 嘛…开心就好

private const int _defaultCapacity = 10;

维护_version 变量的意义

看这段代码的另一个引起我好奇的地方是为什么要维护一个_version变量，初看反编译的代码并没有搞明白，而后搜索和翻找源码的时候看到了：

private object?[] _array; // Storage for stack elements. Do not rename (binary serialization)
private int _size; // Number of items in the stack. Do not rename (binary serialization)
private int _version; // Used to keep enumerator in sync with collection. Do not rename (binary serialization)

它的作用主要是保持迭代器和集合一致，防止出现一些意外的修改操作，如在foreach中：

If foreach didn’t force read-only access then your binary tree would degenerate into a tree. The entire data structure would be in disarray.

Others

这里是一些翻阅源码时候的小插曲：

原来不止我会这样写代码 (
```
while((num3 = (int)(*ptr2)) != 0)
```

居然有如此粗暴的代码

public object Clone()
{
    return this;
}

自作聪明的以为避免了使用IndexOf

public bool Contains(string value)
{
    return this.IndexOf(value, StringComparison.Ordinal) >= 0;
}

一段第一次看起来“卧槽牛逼”, 细看之后“啊，就这”的代码

if (startIndex % 8 == 0)
{
    return *(long*)ptr;
}
if (BitConverter.IsLittleEndian)
{
    int num = (int)(*ptr) | (int)ptr[1] << 8 | (int)ptr[2] << 16 | (int)ptr[3] << 24;
    int num2 = (int)ptr[4] | (int)ptr[5] << 8 | (int)ptr[6] << 16 | (int)ptr[7] << 24;
    return (long)((ulong)num | (ulong)((ulong)((long)num2) << 32));
}
int num3 = (int)(*ptr) << 24 | (int)ptr[1] << 16 | (int)ptr[2] << 8 | (int)ptr[3];
int num4 = (int)ptr[4] << 24 | (int)ptr[5] << 16 | (int)ptr[6] << 8 | (int)ptr[7];
return (long)((ulong)num4 | (ulong)((ulong)((long)num3) << 32));