
Golang Memory Layout

Basics of Linux process memory allocation:

Go process virtual memory layout

Attention: heap profiling does not track memory allocated through cgo or system calls (e.g. malloc/mmap), so for programs that use cgo (including builds with the -race option enabled), RSS can be far larger than the size of mheap.

>> Pull in the cgosymbolizer library to trace cgo memory calls (runtime.SetCgoTraceback).
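
For illustration, a minimal sketch of that hint, assuming the commonly used github.com/ianlancetaylor/cgosymbolizer package: a blank import is enough, because the package's init function registers a cgo traceback handler through runtime.SetCgoTraceback, so C frames from cgo calls can be symbolized in profiles.

package main

// Side-effect import only: installs the cgo traceback hook so that
// pprof output for a cgo-using program includes C stack frames.
import (
    _ "github.com/ianlancetaylor/cgosymbolizer"
)

func main() {
    // ... rest of a program that calls into C via cgo.
}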

Object allocation flow

  • Large objects bigger than 32 KB are allocated directly from mheap (the page heap).
  • Objects smaller than 16 B are allocated by mcache's tiny allocator.
  • For objects between 16 B and 32 KB, the size class to use is computed first, and the allocation is then served from the block of that size class in mcache.
  • If mcache has no free block for that size class, a request is made to mcentral.
  • If mcentral has no free block either, the request goes to mheap, which finds the most suitable mspan with a best-fit search. If the mspan obtained is larger than requested, it is split as needed so that only the number of pages the caller needs is returned; the remaining pages form a new mspan that goes back onto mheap's free list.
  • If mheap has no available span, it requests a new run of pages from the operating system (at least 1 MB).
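
The flow above lives entirely inside the runtime, but its effect can be watched from user code through runtime.MemStats. The sketch below is illustrative only (exact byte counts depend on Go version and platform); it allocates one object in each of the three size ranges and prints a few heap counters after each step:

package main

import (
    "fmt"
    "runtime"
)

// stats prints a few MemStats fields so the heap growth is visible.
func stats(label string) {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("%-6s HeapAlloc=%7d B  HeapSys=%8d B  Mallocs=%d\n",
        label, m.HeapAlloc, m.HeapSys, m.Mallocs)
}

var keep [][]byte // package-level so the allocations escape to the heap

func main() {
    stats("start")

    keep = append(keep, make([]byte, 8)) // < 16 B: tiny allocator in mcache
    stats("tiny")

    keep = append(keep, make([]byte, 1024)) // 16 B to 32 KB: size-class mspan via mcache
    stats("small")

    keep = append(keep, make([]byte, 64<<10)) // > 32 KB: allocated straight from mheap
    stats("large")
}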

All memory allocated on the heap ultimately comes from an arena.


Go's memory allocation design is derived from TCMalloc; the concrete implementation may differ from version to version.

The core idea of TCMalloc is to split memory into multiple levels so as to shrink the granularity of locks.

Internally, TCMalloc's memory management is divided into two parts: thread memory and the page heap.

A few concepts:

mheap

Go manages the heap with the mheap object, of which there is a single global variable; it holds the virtual address space.

mcache

Like TCMalloc, Go gives each logical processor (P) a local thread cache called mcache, so a goroutine that needs memory can take it straight from its mcache. Since only one goroutine runs on a logical processor (P) at any moment, no locking is involved.

mcache holds mspans of every size class as a cache.

What does mcache do?

Objects of <= 32 KB are allocated through mcache, using the mspan of the matching size class.
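
Because allocation goes through size classes, a request is rounded up to the nearest class. The rough measurement sketch below assumes the GC is switched off so the counters stay stable; with go1.17-era size classes (runtime/sizeclasses.go) a 33-byte request typically lands in the 48-byte class, though the exact table can differ between versions.

package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

// Measures the average bytes the allocator hands out per 33-byte request.
// TotalAlloc counts bytes given out by the allocator, so the per-object
// delta reveals the rounded-up size class rather than the requested size.
func main() {
    debug.SetGCPercent(-1) // keep the GC out of the measurement

    const n = 100000
    sink := make([][]byte, n)

    var before, after runtime.MemStats
    runtime.ReadMemStats(&before)
    for i := 0; i < n; i++ {
        sink[i] = make([]byte, 33) // asks for 33 bytes
    }
    runtime.ReadMemStats(&after)

    fmt.Printf("bytes per object: ~%d\n", (after.TotalAlloc-before.TotalAlloc)/n)
    runtime.KeepAlive(sink)
}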

What happens when mcache has no free space?

A new mspan of the required size class is obtained from mcentral's mspan lists.

mcentral

The mcentral object collects all spans of a given size class. Every mcentral holds two lists of mspans:

  1. empty mspanList — spans with no free objects, or spans that are already cached in an mcache
  2. nonempty mspanList — spans that still have free objects

Arena

It turns out that Go's virtual memory layout contains a series of arenas. The initial heap mapping is a single arena, e.g. 64 MB (as of go 1.11.5).

On 64-bit Linux the arena allocated at the start is 64 MB.

These arenas are what we call the heap. In Go, every arena is managed in pages with a granularity of 8 KB.
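
A small sketch to make the arena granularity tangible: HeapSys is the address space the runtime has obtained for the heap (reserved arena by arena), while HeapInuse is what spans are actually using. The exact numbers depend on Go version and platform; the point is only that HeapSys moves in much coarser steps than individual allocations.

package main

import (
    "fmt"
    "runtime"
)

// dump prints the reserved vs. in-use heap sizes in MB.
func dump(label string) {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("%-7s HeapSys=%4d MB  HeapInuse=%4d MB  HeapIdle=%4d MB\n",
        label, m.HeapSys>>20, m.HeapInuse>>20, m.HeapIdle>>20)
}

var big []byte // package-level so the allocation stays live

func main() {
    dump("before")
    big = make([]byte, 100<<20) // force the heap to grow by roughly 100 MB
    dump("after")
}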

Code reference:

runtime/mheap.go (go1.17.7)

 // Main malloc heap.
// The heap itself is the "free" and "scav" treaps,
// but all the other global data is here too.
//
// mheap must not be heap-allocated because it contains mSpanLists,
// which must not be heap-allocated.
//
//go:notinheap
type mheap struct {
   // lock must only be acquired on the system stack, otherwise a g
   // could self-deadlock if its stack grows with the lock held.
   lock  mutex
   pages pageAlloc // page allocation data structure

   sweepgen     uint32 // sweep generation, see comment in mspan; written during STW
   sweepDrained uint32 // all spans are swept or are being swept
   sweepers     uint32 // number of active sweepone calls

   // allspans is a slice of all mspans ever created. Each mspan
   // appears exactly once.
   //
   // The memory for allspans is manually managed and can be
   // reallocated and move as the heap grows.
   //
   // In general, allspans is protected by mheap_.lock, which
   // prevents concurrent access as well as freeing the backing
   // store. Accesses during STW might not hold the lock, but
   // must ensure that allocation cannot happen around the
   // access (since that may free the backing store).
   allspans []*mspan // all spans out there

   _ uint32 // align uint64 fields on 32-bit for atomics

   // Proportional sweep
   //
   // These parameters represent a linear function from gcController.heapLive
   // to page sweep count. The proportional sweep system works to
   // stay in the black by keeping the current page sweep count
   // above this line at the current gcController.heapLive.
   //
   // The line has slope sweepPagesPerByte and passes through a
   // basis point at (sweepHeapLiveBasis, pagesSweptBasis). At
   // any given time, the system is at (gcController.heapLive,
   // pagesSwept) in this space.
   //
   // It's important that the line pass through a point we
   // control rather than simply starting at a (0,0) origin
   // because that lets us adjust sweep pacing at any time while
   // accounting for current progress. If we could only adjust
   // the slope, it would create a discontinuity in debt if any
   // progress has already been made.
   pagesInUse         uint64  // pages of spans in stats mSpanInUse; updated atomically
   pagesSwept         uint64  // pages swept this cycle; updated atomically
   pagesSweptBasis    uint64  // pagesSwept to use as the origin of the sweep ratio; updated atomically
   sweepHeapLiveBasis uint64  // value of gcController.heapLive to use as the origin of sweep ratio; written with lock, read without
   sweepPagesPerByte  float64 // proportional sweep ratio; written with lock, read without
   // TODO(austin): pagesInUse should be a uintptr, but the 386
   // compiler can't 8-byte align fields.

   // scavengeGoal is the amount of total retained heap memory (measured by
   // heapRetained) that the runtime will try to maintain by returning memory
   // to the OS.
   scavengeGoal uint64

   // Page reclaimer state

   // reclaimIndex is the page index in allArenas of next page to
   // reclaim. Specifically, it refers to page (i %
   // pagesPerArena) of arena allArenas[i / pagesPerArena].
   //
   // If this is >= 1<<63, the page reclaimer is done scanning
   // the page marks.
   //
   // This is accessed atomically.
   reclaimIndex uint64
   // reclaimCredit is spare credit for extra pages swept. Since
   // the page reclaimer works in large chunks, it may reclaim
   // more than requested. Any spare pages released go to this
   // credit pool.
   //
   // This is accessed atomically.
   reclaimCredit uintptr

   // arenas is the heap arena map. It points to the metadata for
   // the heap for every arena frame of the entire usable virtual
   // address space.
   //
   // Use arenaIndex to compute indexes into this array.
   //
   // For regions of the address space that are not backed by the
   // Go heap, the arena map contains nil.
   //
   // Modifications are protected by mheap_.lock. Reads can be
   // performed without locking; however, a given entry can
   // transition from nil to non-nil at any time when the lock
   // isn't held. (Entries never transitions back to nil.)
   //
   // In general, this is a two-level mapping consisting of an L1
   // map and possibly many L2 maps. This saves space when there
   // are a huge number of arena frames. However, on many
   // platforms (even 64-bit), arenaL1Bits is 0, making this
   // effectively a single-level map. In this case, arenas[0]
   // will never be nil.
   arenas [1 << arenaL1Bits]*[1 << arenaL2Bits]*heapArena

   // heapArenaAlloc is pre-reserved space for allocating heapArena
   // objects. This is only used on 32-bit, where we pre-reserve
   // this space to avoid interleaving it with the heap itself.
   heapArenaAlloc linearAlloc

   // arenaHints is a list of addresses at which to attempt to
   // add more heap arenas. This is initially populated with a
   // set of general hint addresses, and grown with the bounds of
   // actual heap arena ranges.
   arenaHints *arenaHint

   // arena is a pre-reserved space for allocating heap arenas
   // (the actual arenas). This is only used on 32-bit.
   arena linearAlloc

   // allArenas is the arenaIndex of every mapped arena. This can
   // be used to iterate through the address space.
   //
   // Access is protected by mheap_.lock. However, since this is
   // append-only and old backing arrays are never freed, it is
   // safe to acquire mheap_.lock, copy the slice header, and
   // then release mheap_.lock.
   allArenas []arenaIdx

   // sweepArenas is a snapshot of allArenas taken at the
   // beginning of the sweep cycle. This can be read safely by
   // simply blocking GC (by disabling preemption).
   sweepArenas []arenaIdx

   // markArenas is a snapshot of allArenas taken at the beginning
   // of the mark cycle. Because allArenas is append-only, neither
   // this slice nor its contents will change during the mark, so
   // it can be read safely.
   markArenas []arenaIdx

   // curArena is the arena that the heap is currently growing
   // into. This should always be physPageSize-aligned.
   curArena struct {
      base, end uintptr
   }

   _ uint32 // ensure 64-bit alignment of central

   // central free lists for small size classes.
   // the padding makes sure that the mcentrals are
   // spaced CacheLinePadSize bytes apart, so that each mcentral.lock
   // gets its own cache line.
   // central is indexed by spanClass.
   central [numSpanClasses]struct {
    mcentral mcentral
    pad [cpu.CacheLinePadSize-unsafe.Sizeof(mcentral{})%cpu.CacheLinePadSize]byte
   }

   spanalloc             fixalloc // allocator for span*
   cachealloc            fixalloc // allocator for mcache*
   specialfinalizeralloc fixalloc // allocator for specialfinalizer*
   specialprofilealloc   fixalloc // allocator for specialprofile*
   specialReachableAlloc fixalloc // allocator for specialReachable
   speciallock           mutex    // lock for special record allocators.
   arenaHintAlloc        fixalloc // allocator for arenaHints

   unused *specialfinalizer // never set, just here to force the specialfinalizer type into DWARF
}

var mheap_ mheap

// A heapArena stores metadata for a heap arena. heapArenas are stored
// outside of the Go heap and accessed via the mheap_.arenas index.
//
//go:notinheap
type heapArena struct {
   // bitmap stores the pointer/scalar bitmap for the words in
   // this arena. See mbitmap.go for a description. Use the
   // heapBits type to access this.
   bitmap [heapArenaBitmapBytes]byte

   // spans maps from virtual address page ID within this arena to *mspan.
   // For allocated spans, their pages map to the span itself.
   // For free spans, only the lowest and highest pages map to the span itself.
   // Internal pages map to an arbitrary span.
   // For pages that have never been allocated, spans entries are nil.
   //
   // Modifications are protected by mheap.lock. Reads can be
   // performed without locking, but ONLY from indexes that are
   // known to contain in-use or stack spans. This means there
   // must not be a safe-point between establishing that an
   // address is live and looking it up in the spans array.
   spans [pagesPerArena]*mspan

   // pageInUse is a bitmap that indicates which spans are in
   // state mSpanInUse. This bitmap is indexed by page number,
   // but only the bit corresponding to the first page in each
   // span is used.
   //
   // Reads and writes are atomic.
   pageInUse [pagesPerArena / 8]uint8

   // pageMarks is a bitmap that indicates which spans have any
   // marked objects on them. Like pageInUse, only the bit
   // corresponding to the first page in each span is used.
   //
   // Writes are done atomically during marking. Reads are
   // non-atomic and lock-free since they only occur during
   // sweeping (and hence never race with writes).
   //
   // This is used to quickly find whole spans that can be freed.
   //
   // TODO(austin): It would be nice if this was uint64 for
   // faster scanning, but we don't have 64-bit atomic bit
   // operations.
   pageMarks [pagesPerArena / 8]uint8

   // pageSpecials is a bitmap that indicates which spans have
   // specials (finalizers or other). Like pageInUse, only the bit
   // corresponding to the first page in each span is used.
   //
   // Writes are done atomically whenever a special is added to
   // a span and whenever the last special is removed from a span.
   // Reads are done atomically to find spans containing specials
   // during marking.
   pageSpecials [pagesPerArena / 8]uint8

   // checkmarks stores the debug.gccheckmark state. It is only
   // used if debug.gccheckmark > 0.
   checkmarks *checkmarksMap

   // zeroedBase marks the first byte of the first page in this
   // arena which hasn't been used yet and is therefore already
   // zero. zeroedBase is relative to the arena base.
   // Increases monotonically until it hits heapArenaBytes.
   //
   // This field is sufficient to determine if an allocation
   // needs to be zeroed because the page allocator follows an
   // address-ordered first-fit policy.
   //
   // Read atomically and written with an atomic CAS.
   zeroedBase uintptr
}
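
For orientation, the sizes of the per-arena tables above follow from a few constants. The figures below use go1.17-era values for 64-bit Linux (an assumption; runtime/malloc.go has the authoritative definitions for your version):

package main

import "fmt"

const (
    heapArenaBytes = 64 << 20                  // 64 MB of heap per arena
    pageSize       = 8 << 10                   // 8 KB heap pages
    pagesPerArena  = heapArenaBytes / pageSize // 8192 entries in spans [pagesPerArena]*mspan
    // The pointer/scalar bitmap uses 2 bits per 8-byte word, so one bitmap
    // byte covers 32 heap bytes.
    heapArenaBitmapBytes = heapArenaBytes / 32 // 2 MB of bitmap per arena
)

func main() {
    fmt.Println("pages per arena:", pagesPerArena)               // 8192
    fmt.Println("bitmap bytes per arena:", heapArenaBitmapBytes) // 2097152
}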
  

References:

a-visual-guide-to-golang-memory-allocator-from-ground-up

a-visual-guide-to-golang-memory-allocator-from-ground-up (Chinese translation)
