-
-
Notifications
You must be signed in to change notification settings - Fork 317
Open
Description
Description
A .NET application could experience theese symptoms after a short CPU spike, and then contributing to further process performance degradation that could last for minutes.
The process dump shows 100s of Kestrel threads congregating on a lock inside Prometheus.DotNetStats.UpdateMetrics()
prometheus-net/Prometheus/DotNetStats.cs
Line 78 in 60e9106
lock (_updateLock) |
Prometheus.DotNetStats.UpdateMetrics()
Prometheus.CollectorRegistry+<RunBeforeCollectCallbacksAsync>d__22.MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Prometheus.CollectorRegistry+<RunBeforeCollectCallbacksAsync>d__22, Prometheus.NetStandard]](<RunBeforeCollectCallbacksAsync>d__22 ByRef)
Prometheus.CollectorRegistry+<CollectAndSerializeAsync>d__21.MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Prometheus.CollectorRegistry+<CollectAndSerializeAsync>d__21, Prometheus.NetStandard]](<CollectAndSerializeAsync>d__21 ByRef)
Prometheus.MetricServerMiddleware+<Invoke>d__7.MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Prometheus.MetricServerMiddleware+<Invoke>d__7, Prometheus.AspNetCore]](<Invoke>d__7 ByRef)
Microsoft.AspNetCore.Builder.Extensions.MapMiddleware+<InvokeCore>d__4.MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Builder.Extensions.MapMiddleware+<InvokeCore>d__4, Microsoft.AspNetCore.Http.Abstractions]](<InvokeCore>d__4 ByRef)
Microsoft.AspNetCore.Builder.Extensions.MapMiddleware.Invoke(Microsoft.AspNetCore.Http.HttpContext)
Microsoft.AspNetCore.ResponseCompression.ResponseCompressionMiddleware+<InvokeCore>d__4.MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.ResponseCompression.ResponseCompressionMiddleware+<InvokeCore>d__4, Microsoft.AspNetCore.ResponseCompression]](<InvokeCore>d__4 ByRef)
Microsoft.AspNetCore.ResponseCompression.ResponseCompressionMiddleware.Invoke(Microsoft.AspNetCore.Http.HttpContext)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequests>d__238`1[[System.__Canon, System.Private.CoreLib]].MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequests>d__238`1[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ProcessRequests>d__238`1<System.__Canon> ByRef)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[[System.__Canon, System.Private.CoreLib]](Microsoft.AspNetCore.Hosting.Server.IHttpApplication`1<System.__Canon>)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequestsAsync>d__237`1[[System.__Canon, System.Private.CoreLib]].MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequestsAsync>d__237`1[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ProcessRequestsAsync>d__237`1<System.__Canon> ByRef)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync[[System.__Canon, System.Private.CoreLib]](Microsoft.AspNetCore.Hosting.Server.IHttpApplication`1<System.__Canon>)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection+<ProcessRequestsAsync>d__12`1[[System.__Canon, System.Private.CoreLib]].MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection+<ProcessRequestsAsync>d__12`1[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ProcessRequestsAsync>d__12`1<System.__Canon> ByRef)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[[System.__Canon, System.Private.CoreLib]](Microsoft.AspNetCore.Hosting.Server.IHttpApplication`1<System.__Canon>)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1+<ExecuteAsync>d__8[[System.__Canon, System.Private.CoreLib]].MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1+<ExecuteAsync>d__8[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ExecuteAsync>d__8<System.__Canon> ByRef)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1[[System.__Canon, System.Private.CoreLib]].ExecuteAsync()
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1[[System.__Canon, System.Private.CoreLib]].System.Thread
7700
ing.IThreadPoolWorkItem.Execute()
System.Threading.ThreadPoolWorkQueue.Dispatch()
System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
DebuggerU2MCatchHandlerFrame
A single thread is inside the locked section:
Interop+Sys.Open(System.String, OpenFlags, Int32)
Microsoft.Win32.SafeHandles.SafeFileHandle.Open(System.String, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, System.IO.FileOptions, Int64, System.IO.UnixFileMode, Int64 ByRef, System.IO.UnixFileMode ByRef, Boolean, Boolean ByRef, System.Func`4<ErrorInfo,OpenFlags,System.String,System.Exception>)
System.IO.Strategies.OSFileStreamStrategy..ctor(System.String, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, System.IO.FileOptions, Int64, System.Nullable`1<System.IO.UnixFileMode>)
Interop+procfs.TryReadFile(System.String, System.String ByRef)
Interop+procfs.TryParseStatFile(System.String, ParsedStat ByRef)
System.Diagnostics.ProcessManager.CreateProcessInfo(ParsedStat ByRef, ParsedStatus ByRef, System.String)
System.Diagnostics.ProcessManager.CreateProcessInfo(Int32)
System.Diagnostics.Process.EnsureState(State)
Prometheus.DotNetStats.UpdateMetrics()
...
Configuration
- .NET 8
- Ubuntu 20.04 x64
Analysis
The lock had been introduced in #173 as a solution to the Process
instance thread safety concern.
Suggestion
- If it is the responsibility of the caller to protect the Prometheus endpoint through appropriate Kestrel limits, then add an appropriate section similar to Protecting the metrics endpoint from unauthorized access .
- As precaution the library could use a caching approach to collecting the metrics by storing in memory for a short period of time to reuse by concurrent requests. After the cache expires it would refresh the metrics and repeat the cycle.Something like:
TimeSpan elapsed = cachedMetricsExpiryStopwatch.Elapsed;
var collectedMetrics = collectedMetricsCached;
// Check if the cache exists and not has expired. If either is false then collect the metrics into collectedMetricsCached struct.
// LastWriteRecheckInterval is a reasonably small value that limits time resolution from infinite a range of 100 milliseconds to a second,
if (collectedMetrics is null || elapsed > LastWriteRecheckInterval)
{
lock (collectProcessMetricStopwatch)
{
collectedMetrics = collectedMetricsCached;
elapsed = cachedMetricsExpiryStopwatch.Elapsed;
if (collectedMetrics is null || elapsed > LastWriteRecheckInterval)
{
// TBD: GetProcessMetrics. It refreshes Process instance, collects the metrics and stores as fields
// into collectedMetricsCached structure, resets cachedMetricsExpiryStopwatch.
collectedMetrics= GetProcessMetrics();
}
}
}
// Pass the cached metrics to the requestor.
for (var gen = 0; gen <= GC.MaxGeneration; gen++)
_collectionCounts[gen].IncTo(collectedMetrics.CollectionCount[gen]);
_totalMemory.Set(collectedMetrics.TotalMemory);
_virtualMemorySize.Set(collectedMetrics.VirtualMemorySize64);
_workingSet.Set(collectedMetrics.WorkingSet64);
_privateMemorySize.Set(collectedMetrics.PrivateMemorySize64);
_cpuTotal.IncTo(collectedMetrics.TotalProcessorTime.TotalSeconds);
_openHandles.Set(collectedMetrics.HandleCount);
_numThreads.Set(collectedMetrics.Threads.Count);
Metadata
Metadata
Assignees
Labels
No labels