Linux process memory allocation basics:
Go process virtual memory layout
Attention: heap profiles do not track memory allocated through CGO or direct system calls (e.g. malloc / mmap), so for programs that use cgo (including anything built with the -race flag), RSS can be far larger than the size of mheap.
>> Pull in the cgosymbolizer library to trace cgo memory calls (runtime.SetCgoTraceback).
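A minimal sketch of the symptom, assuming the github.com/ianlancetaylor/cgosymbolizer package: memory obtained through C.malloc shows up in the process RSS but never in Go's heap accounting. The blank import installs traceback hooks via runtime.SetCgoTraceback so that C frames can at least be symbolized in profiles.

package main

/*
#include <stdlib.h>
#include <string.h>
*/
import "C"

import (
	"fmt"
	"runtime"

	// Blank import registers C traceback/symbolizer hooks through
	// runtime.SetCgoTraceback (assumed package, see lead-in).
	_ "github.com/ianlancetaylor/cgosymbolizer"
)

func main() {
	// 256 MB from the C allocator: it inflates RSS once touched,
	// but is invisible to Go heap profiles and runtime.MemStats.
	p := C.malloc(256 << 20)
	C.memset(p, 1, 256<<20) // touch the pages so they count toward RSS
	defer C.free(p)

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("HeapAlloc = %d MB (the C allocation is not counted)\n", ms.HeapAlloc>>20)
}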
Object allocation flow
- Large objects (> 32KB) are allocated directly from mheap (the page heap).
- Tiny objects (< 16B) are allocated by mcache's tiny allocator.
- Objects between 16B and 32KB are first rounded up to a size class, then allocated from the block of that size class in mcache (see the size-class sketch after this list).
- If mcache has no free block of that size class, it asks mcentral for one.
- If mcentral has no free blocks either, it asks mheap, which uses a BestFit search to find the most suitable mspan. If the mspan found is larger than requested, it is split so that only the requested number of pages is returned; the remaining pages form a new mspan that goes back on mheap's free list.
- If mheap has no usable span, it requests a new run of pages from the operating system (at least 1MB).
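A small, runnable illustration of the size-class step (the 16B/32KB thresholds come from the text above; the 33B -> 48B rounding is a detail of current runtimes and may differ between versions): the heap grows by the size class of each object, not by the requested size.

package main

import (
	"fmt"
	"runtime"
)

func main() {
	const n = 100000
	var before, after runtime.MemStats
	ptrs := make([]*[33]byte, n) // 33B requests land in a larger size class

	runtime.GC()
	runtime.ReadMemStats(&before)
	for i := range ptrs {
		ptrs[i] = new([33]byte)
	}
	runtime.ReadMemStats(&after)

	// Expect roughly 48 bytes per object on current runtimes, not 33.
	fmt.Printf("avg bytes per object: %.1f\n",
		float64(after.HeapAlloc-before.HeapAlloc)/n)
	runtime.KeepAlive(ptrs)
}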
All heap memory allocations ultimately come from the arenas.
Go's allocator design comes from TCMalloc; the concrete implementation may differ from version to version.
The core idea of TCMalloc is to split memory management into several levels so as to shrink lock granularity.
Internally, TCMalloc divides memory management into two parts: thread memory and the page heap.
Some concepts:
mheap
Go manages the heap with an mheap object. There is exactly one, a global variable, and it owns the virtual address space.
mcache
Like TCMalloc, Go gives each logical processor (P) a local thread cache, called an mcache. When a goroutine needs memory it can take it straight from its P's mcache, and because only one goroutine runs on a P at a time, no locking is needed on this path.
The mcache holds cached mspans of every size class.
What is mcache for?
Objects of <= 32KB are allocated directly through mcache, from the mspan of the matching size class.
What happens when mcache has no free space?
A new mspan of the required size class is taken from mcentral's list of mspans.
mcentral
An mcentral collects all spans of a given size class. Every mcentral keeps two lists of mspans, as modeled in the sketch below:
- empty mspanList — spans with no free objects, or spans currently cached in an mcache
- nonempty mspanList — spans that still have free objects
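A toy model of the mcache/mcentral split described above (illustrative only; the names and batch logic here are invented, not the runtime's code): every worker pops from a private cache without locking, and only the refill path takes the central lock. That lock-narrowing is the whole point of the TCMalloc-style design.

package main

import "sync"

// central plays the role of mcentral: a locked pool of "spans"
// (batches of free objects) for one size class.
type central struct {
	mu    sync.Mutex
	spans [][]byte
}

// refill hands an entire batch to a cache under the central lock,
// standing in for "get a new mspan from mcentral".
func (c *central) refill() []byte {
	c.mu.Lock()
	defer c.mu.Unlock()
	if n := len(c.spans); n > 0 {
		s := c.spans[n-1]
		c.spans = c.spans[:n-1]
		return s
	}
	// The real runtime would grow from mheap here.
	return make([]byte, 64)
}

// cache plays the role of mcache: private to one worker (one P),
// so the fast path needs no lock at all.
type cache struct {
	free []byte
	src  *central
}

func (m *cache) alloc() byte {
	if len(m.free) == 0 {
		m.free = m.src.refill() // slow path only
	}
	b := m.free[len(m.free)-1]
	m.free = m.free[:len(m.free)-1]
	return b
}

func main() {
	c := &central{}
	m := &cache{src: c}
	_ = m.alloc() // first call refills from central; later calls are lock-free
}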
Arena
It turns out that Go's virtual memory layout consists of a series of arenas. The initial heap mapping is one arena, e.g. 64MB (as of go 1.11.5); on 64-bit Linux the initial arena size is 64MB.
These arenas are what we call the heap. Within Go, every arena is managed in pages of 8KB granularity.
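A quick sketch of the arithmetic this layout implies (the constants mirror the 64MB/8KB figures above for 64-bit Linux; the real runtime's arenaIndex also applies an arenaBaseOffset, omitted here): any heap address maps to an arena frame and a page within it.

package main

import "fmt"

const (
	heapArenaBytes = 64 << 20                  // one 64MB arena frame
	pageSize       = 8 << 10                   // 8KB management granularity
	pagesPerArena  = heapArenaBytes / pageSize // 8192 pages per arena
)

func main() {
	// A simplified version of the runtime's arena/page lookup.
	addr := uintptr(0x00c000123456) // a typical Go heap address on linux/amd64
	arena := addr / heapArenaBytes
	page := (addr % heapArenaBytes) / pageSize
	fmt.Printf("arena frame %d, page %d of %d\n", arena, page, pagesPerArena)
}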
Code reference:
runtime/mheap.go (go1.17.7)
// Main malloc heap.
// The heap itself is the "free" and "scav" treaps,
// but all the other global data is here too.
//
// mheap must not be heap-allocated because it contains mSpanLists,
// which must not be heap-allocated.
//
//go:notinheap
type mheap struct {
	// lock must only be acquired on the system stack, otherwise a g
	// could self-deadlock if its stack grows with the lock held.
	lock  mutex
	pages pageAlloc // page allocation data structure

	sweepgen     uint32 // sweep generation, see comment in mspan; written during STW
	sweepDrained uint32 // all spans are swept or are being swept
	sweepers     uint32 // number of active sweepone calls

	// allspans is a slice of all mspans ever created. Each mspan
	// appears exactly once.
	//
	// The memory for allspans is manually managed and can be
	// reallocated and move as the heap grows.
	//
	// In general, allspans is protected by mheap_.lock, which
	// prevents concurrent access as well as freeing the backing
	// store. Accesses during STW might not hold the lock, but
	// must ensure that allocation cannot happen around the
	// access (since that may free the backing store).
	allspans []*mspan // all spans out there

	_ uint32 // align uint64 fields on 32-bit for atomics

	// Proportional sweep
	//
	// These parameters represent a linear function from gcController.heapLive
	// to page sweep count. The proportional sweep system works to
	// stay in the black by keeping the current page sweep count
	// above this line at the current gcController.heapLive.
	//
	// The line has slope sweepPagesPerByte and passes through a
	// basis point at (sweepHeapLiveBasis, pagesSweptBasis). At
	// any given time, the system is at (gcController.heapLive,
	// pagesSwept) in this space.
	//
	// It's important that the line pass through a point we
	// control rather than simply starting at a (0,0) origin
	// because that lets us adjust sweep pacing at any time while
	// accounting for current progress. If we could only adjust
	// the slope, it would create a discontinuity in debt if any
	// progress has already been made.
	pagesInUse         uint64  // pages of spans in stats mSpanInUse; updated atomically
	pagesSwept         uint64  // pages swept this cycle; updated atomically
	pagesSweptBasis    uint64  // pagesSwept to use as the origin of the sweep ratio; updated atomically
	sweepHeapLiveBasis uint64  // value of gcController.heapLive to use as the origin of sweep ratio; written with lock, read without
	sweepPagesPerByte  float64 // proportional sweep ratio; written with lock, read without
	// TODO(austin): pagesInUse should be a uintptr, but the 386
	// compiler can't 8-byte align fields.

	// scavengeGoal is the amount of total retained heap memory (measured by
	// heapRetained) that the runtime will try to maintain by returning memory
	// to the OS.
	scavengeGoal uint64

	// Page reclaimer state

	// reclaimIndex is the page index in allArenas of next page to
	// reclaim. Specifically, it refers to page (i %
	// pagesPerArena) of arena allArenas[i / pagesPerArena].
	//
	// If this is >= 1<<63, the page reclaimer is done scanning
	// the page marks.
	//
	// This is accessed atomically.
	reclaimIndex uint64

	// reclaimCredit is spare credit for extra pages swept. Since
	// the page reclaimer works in large chunks, it may reclaim
	// more than requested. Any spare pages released go to this
	// credit pool.
	//
	// This is accessed atomically.
	reclaimCredit uintptr

	// arenas is the heap arena map. It points to the metadata for
	// the heap for every arena frame of the entire usable virtual
	// address space.
	//
	// Use arenaIndex to compute indexes into this array.
	//
	// For regions of the address space that are not backed by the
	// Go heap, the arena map contains nil.
	//
	// Modifications are protected by mheap_.lock. Reads can be
	// performed without locking; however, a given entry can
	// transition from nil to non-nil at any time when the lock
	// isn't held. (Entries never transitions back to nil.)
	//
	// In general, this is a two-level mapping consisting of an L1
	// map and possibly many L2 maps. This saves space when there
	// are a huge number of arena frames. However, on many
	// platforms (even 64-bit), arenaL1Bits is 0, making this
	// effectively a single-level map. In this case, arenas[0]
	// will never be nil.
	arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

	// heapArenaAlloc is pre-reserved space for allocating heapArena
	// objects. This is only used on 32-bit, where we pre-reserve
	// this space to avoid interleaving it with the heap itself.
	heapArenaAlloc linearAlloc

	// arenaHints is a list of addresses at which to attempt to
	// add more heap arenas. This is initially populated with a
	// set of general hint addresses, and grown with the bounds of
	// actual heap arena ranges.
	arenaHints *arenaHint

	// arena is a pre-reserved space for allocating heap arenas
	// (the actual arenas). This is only used on 32-bit.
	arena linearAlloc

	// allArenas is the arenaIndex of every mapped arena. This can
	// be used to iterate through the address space.
	//
	// Access is protected by mheap_.lock. However, since this is
	// append-only and old backing arrays are never freed, it is
	// safe to acquire mheap_.lock, copy the slice header, and
	// then release mheap_.lock.
	allArenas []arenaIdx

	// sweepArenas is a snapshot of allArenas taken at the
	// beginning of the sweep cycle. This can be read safely by
	// simply blocking GC (by disabling preemption).
	sweepArenas []arenaIdx

	// markArenas is a snapshot of allArenas taken at the beginning
	// of the mark cycle. Because allArenas is append-only, neither
	// this slice nor its contents will change during the mark, so
	// it can be read safely.
	markArenas []arenaIdx

	// curArena is the arena that the heap is currently growing
	// into. This should always be physPageSize-aligned.
	curArena struct {
		base, end uintptr
	}

	_ uint32 // ensure 64-bit alignment of central

	// central free lists for small size classes.
	// the padding makes sure that the mcentrals are
	// spaced CacheLinePadSize bytes apart, so that each mcentral.lock
	// gets its own cache line.
	// central is indexed by spanClass.
	central [numSpanClasses]struct {
		mcentral mcentral
		pad      [cpu.CacheLinePadSize - unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize]byte
	}

	spanalloc             fixalloc // allocator for span*
	cachealloc            fixalloc // allocator for mcache*
	specialfinalizeralloc fixalloc // allocator for specialfinalizer*
	specialprofilealloc   fixalloc // allocator for specialprofile*
	specialReachableAlloc fixalloc // allocator for specialReachable
	speciallock           mutex    // lock for special record allocators.
	arenaHintAlloc        fixalloc // allocator for arenaHints

	unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
}
var mheap_ mheap
// A heapArena stores metadata for a heap arena. heapArenas are stored
// outside of the Go heap and accessed via the mheap_.arenas index.
//
//go:notinheap
type heapArena struct {
	// bitmap stores the pointer/scalar bitmap for the words in
	// this arena. See mbitmap.go for a description. Use the
	// heapBits type to access this.
	bitmap [heapArenaBitmapBytes]byte

	// spans maps from virtual address page ID within this arena to *mspan.
	// For allocated spans, their pages map to the span itself.
	// For free spans, only the lowest and highest pages map to the span itself.
	// Internal pages map to an arbitrary span.
	// For pages that have never been allocated, spans entries are nil.
	//
	// Modifications are protected by mheap.lock. Reads can be
	// performed without locking, but ONLY from indexes that are
	// known to contain in-use or stack spans. This means there
	// must not be a safe-point between establishing that an
	// address is live and looking it up in the spans array.
	spans [pagesPerArena]*mspan

	// pageInUse is a bitmap that indicates which spans are in
	// state mSpanInUse. This bitmap is indexed by page number,
	// but only the bit corresponding to the first page in each
	// span is used.
	//
	// Reads and writes are atomic.
	pageInUse [pagesPerArena / 8]uint8

	// pageMarks is a bitmap that indicates which spans have any
	// marked objects on them. Like pageInUse, only the bit
	// corresponding to the first page in each span is used.
	//
	// Writes are done atomically during marking. Reads are
	// non-atomic and lock-free since they only occur during
	// sweeping (and hence never race with writes).
	//
	// This is used to quickly find whole spans that can be freed.
	//
	// TODO(austin): It would be nice if this was uint64 for
	// faster scanning, but we don't have 64-bit atomic bit
	// operations.
	pageMarks [pagesPerArena / 8]uint8

	// pageSpecials is a bitmap that indicates which spans have
	// specials (finalizers or other). Like pageInUse, only the bit
	// corresponding to the first page in each span is used.
	//
	// Writes are done atomically whenever a special is added to
	// a span and whenever the last special is removed from a span.
	// Reads are done atomically to find spans containing specials
	// during marking.
	pageSpecials [pagesPerArena / 8]uint8

	// checkmarks stores the debug.gccheckmark state. It is only
	// used if debug.gccheckmark > 0.
	checkmarks *checkmarksMap

	// zeroedBase marks the first byte of the first page in this
	// arena which hasn't been used yet and is therefore already
	// zero. zeroedBase is relative to the arena base.
	// Increases monotonically until it hits heapArenaBytes.
	//
	// This field is sufficient to determine if an allocation
	// needs to be zeroed because the page allocator follows an
	// address-ordered first-fit policy.
	//
	// Read atomically and written with an atomic CAS.
	zeroedBase uintptr
}
References:
a-visual-guide-to-golang-memory-allocator-from-ground-up
a-visual-guide-to-golang-memory-allocator-from-ground-up (Chinese translation)