Deprecated: Function get_magic_quotes_gpc() is deprecated in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 99

Deprecated: The each() function is deprecated. This message will be suppressed on further calls in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 619

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1169

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176

Warning: Cannot modify header information - headers already sent by (output started at /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php:99) in /hermes/walnacweb04/walnacweb04ab/b2791/pow.jasaeld/htdocs/De1337/nothing/index.php on line 1176
8000 Tags · kaltura/media-framework · GitHub
Nothing Special   »   [go: up one dir, main page]

Skip to content

Tags: kaltura/media-framework

Tags

v2.0.1

Toggle v2.0.1's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Merge pull request #226 from kaltura/master-transcoder-clock-estimato…

…r-wraparound-fix

transcoder may deliver kmp frame created time in far future due to wrong clock estimation in transcoded output

v2.0.0-master-estimator-fix

Toggle v2.0.0-master-estimator-fix's commit message
Executive Summary

  On September 23, 2025, the Kaltura Media Framework experienced a critical failure with "ngx_buf_queue_get:
  memory limit exceeded" at 21:51:45. Our investigation revealed a cascading failure triggered by a bug in
  the transcoder's time estimation algorithm during a 90kHz timestamp wraparound event.

  Timeline of Discovery

  Initial Investigation

  - Symptom: Memory buffer exhaustion causing stream processing failure
  - Channel: nvp1-1_yqgk4a0v
  - Error: "ngx_buf_queue_get: memory limit exceeded"

  Root Cause Chain Discovered

  1. Timestamp wraparound occurs every ~26.5 hours (90kHz clock cycle)
  2. Transcoder time estimator bug creates future timestamps
  3. Sync system chaos due to timeline divergence
  4. Segmentation failure prevents segment completion
  5. Buffer accumulation without release
  6. Memory exhaustion and system failure

  The Core Bug: clock_estimator_get_clock

  Location

  File: /Users/igorshevachmac/dev/media-framework/transcoder/utils/time_estimator.c
  Function: clock_estimator_get_clock() lines 32-58

  The Algorithm Flaw

  for (int64_t runner=fifo->framesFifoHead; runner>=fifo->framesFifoTail; runner--) {
      clock_estimator_sample_t* sample=&(fifo->samples[runner % TIME_ESTIMATOR_FIFO_SIZE]);
      int64_t runnerdistance=llabs(dts - sample->dts);

      if (runnerdistance < distance) {
          clock = dts - sample->dts + sample->clock;  // Line 48
          distance = runnerdistance;
          if (distance == 0) {
              break;
          }
      } else {
          break;  // ❌ EXITS EARLY - prevents finding true minimum
      }
  }

  The Problem: Early Exit Prevents Optimal Solution

  Normal operation: Algorithm works correctly with monotonic DTS progression
  During wraparound: Early exit prevents finding the best match

  The Failure Scenario

  FIFO State During Wraparound

  FIFO: [90000, 0, 8589467340, ...] (head to tail)
  Processing frame: dts = 8589467340 (26.5h frame delayed in pipeline)

  Algorithm Execution

  1. Iteration 1: sample->dts=90000, distance=8589377340
  2. Iteration 2: sample->dts=0, distance=8589467340
  3. Distance comparison: 8589467340 > 8589377340 → algorithm exits
  4. Never reaches: Perfect match at sample->dts=8589467340

  The Calculation That Created Future Timestamps

  clock = dts - sample->dts + sample->clock
  clock = 8589467340 - 90000 + sample->clock
  clock = 8589377340 + sample->clock = FUTURE TIMESTAMP

  Proof with Observed Values

  Observed future timestamp: 158287382912982
  Calculation verification:
  158287382912982 = 8589467340 + 158278793445642
  Where 158278793445642 was the clock value of the dts=90000 sample

  Why Only Some Tracks Failed

  Successful Tracks (Passthrough)

  - a32@b, v32@b: "passthrough": true
  - Low latency: FIFO stays synchronized with input
  - Perfect matches found: distance = 0

  Failed Tracks (Transcoded)

  - a33@b, a42@b, v33@b, v34@b, v35@b, v42@b, v43@b: Active transcoding
  - High latency: Encoding pipeline introduces delays
  - FIFO gets "ahead": Contains wrapped samples when processing pre-wrap frames
  - Future timestamps generated: Time estimator bug triggers

  The Cascade Effect

  1. Future Timestamps Generated (21:47:35)

  Transcoder sends frames with created timestamps in the future

  2. Sync System Divergence (21:47:34-21:47:35)

  a32@b: 158270203044791 (correct, old timeline)
  a33@b: 158278792980979 (incorrect, new timeline)

  3. Segmentation Chaos (21:47:35-21:51:45)

  - Syncer cannot reconcile 26.5-hour time differences
  - "large inter-frame pts jump" warnings
  - "track added/removed, forcing new period" messages

  4. Buffer Accumulation (21:47:29-21:51:45)

  - Each incomplete segment locks buffers for all tracks
  - ~40 incomplete segments × 6-7 tracks = 240-280 buffer locks
  - No segment completion = no buffer releases

  5. Memory Exhaustion (21:51:45)

  ngx_buf_queue_get: memory limit exceeded

  Evidence from Logs

  FFmpeg Warnings During Wraparound

  2025-09-23T18:47:35.385 FFMPEG:AVCodecContext WARN Queue input is backward in time
  2025-09-23T18:47:35.388 FFMPEG:AVCodecContext WARN Queue input is backward in time

  Transcoder Diagnostics

  21:47:30.226: "drift": -96, "lastDts": 8589467340    (before wrap)
  21:47:45.363: "drift": 95443698, "lastDts": 890130   (after wrap)

  Sync Timeline Divergence

  Sep 23 @ 21:47:34.031: a33@b using channel reference 158270203044791 ✅
  Sep 23 @ 21:47:35.406: a33@b using channel reference 158278792980979 ❌

  Comparison with Successful Wraparound (Sep 21)

  Sep 21, 2025 @ 21:59:41: Similar wraparound event succeeded
  - All tracks converged to single timeline reference
  - No future timestamps generated
  - System recovered normally

  Key difference: Transcoder pipeline timing and FIFO state during wraparound transition

  The Fix

  Immediate Solution: Remove Early Exit

  uint64_t clock_estimator_get_clock(clock_estimator_t *fifo, int64_t dts) {
      if (fifo->framesFifoTail == -1) {
          return 0;
      }

      int64_t min_distance = INT64_MAX;
      uint64_t best_clock = 0;

      // Always traverse entire FIFO to find true minimum
      for (int64_t runner = fifo->framesFifoHead; runner >= fifo->framesFifoTail; runner--) {
          clock_estimator_sample_t* sample = &(fifo->samples[runner % TIME_ESTIMATOR_FIFO_SIZE]);
          int64_t distance = llabs(dts - sample->dts);

          if (distance < min_distance) {
              min_distance = distance;
              best_clock = dts - sample->dts + sample->clock;
              if (distance == 0) {
                  break;  // Perfect match found
              }
          }
          // NO early exit on increasing distance
      }

      return best_clock;
  }

  Enhanced Solution: Wraparound Detection

  // Add wraparound detection threshold
  #define WRAPAROUND_THRESHOLD (8000000000LL)  // ~24 hours in 90kHz

  if (min_distance > WRAPAROUND_THRESHOLD) {
      return getClock90Khz();  // Use current time instead of extrapolation
  }

  Technical Implications

  Performance Impact

  - Minimal: FIFO typically contains 100-1000 samples
  - O(n) search remains fast for small n
  - Benefit far outweighs cost of preventing system failures

  Reliability Improvement

  - Eliminates future timestamp generation during wraparound
  - Prevents cascade failures in downstream systems
  - Maintains system stability during 26.5-hour boundaries

  Conclusion

  This investigation revealed a subtle but critical bug in time estimation logic that only manifests during
  timestamp wraparound events when transcoding latency is sufficient to create mixed FIFO states. The bug's
  impact cascades through multiple system layers, ultimately causing memory exhaustion and system failure.

  The fix is algorithmic: ensure the time estimator finds the truly optimal solution rather than settling for
   a local optimum that can be catastrophically wrong during wraparound transitions.

  ---Investigation completed: October 1, 2025Systems affected: All transcoded tracks in
  nginx-live-moduleSeverity: Critical system failure every ~26.5 hoursPriority: Immediate fix required

v2.0.0

Toggle v2.0.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Fix CI Test (#224)

* Fix CI test.

* Fix CI test.

* Fix CI test.

* Fix CI test.

* Fix CI test.

v1.30.1

Toggle v1.30.1's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
set redis version (#222)

v1.30.0

Toggle v1.30.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
ngx-kmp-cc bug fix (#219)

need to output if any of the windows becomes visible, should not be only
according to the last one toggled

v1.29.0-PSVAMB-69845-transcoder-13

Toggle v1.29.0-PSVAMB-69845-transcoder-13's commit message
- fix logs

- fix acks in testing

v1.29.0-PSVAMB-69845-transcoder-12

Toggle v1.29.0-PSVAMB-69845-transcoder-12's commit message
- fix logs

- fix acks in testing

v1.29.0-PSVAMB-69845-transcoder-11

Toggle v1.29.0-PSVAMB-69845-transcoder-11's commit message
- fix warnings

- workaround downstream sending multiple media info after transcoder reconnection

v1.29.0-PSVAMB-69845-transcoder-10

Toggle v1.29.0-PSVAMB-69845-transcoder-10's commit message
send ack for media info (?)

v1.29.0-PSVAMB-69845-transcoder-9

Toggle v1.29.0-PSVAMB-69845-transcoder-9's commit message
attempt to force rtmp not to send old media info

0