CN108737382A - SVC coding HTTP streaming media self-adaption method based on Q-Learning - Google Patents
SVC coding HTTP streaming media self-adaption method based on Q-Learning
- Publication number
- CN108737382A CN108737382A CN201810366841.2A CN201810366841A CN108737382A CN 108737382 A CN108737382 A CN 108737382A CN 201810366841 A CN201810366841 A CN 201810366841A CN 108737382 A CN108737382 A CN 108737382A
- Authority
- CN
- China
- Prior art keywords
- behavior
- state
- layer
- thr
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/65—Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Transfer Between Computers (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention relates to an adaptive method for SVC-encoded HTTP streaming media based on Q-Learning. The method first constructs a Q-Learning model, building a state set, an action set and a reward function for the interactive scenario of SVC-encoded streaming media and selecting an exploration strategy; it then trains the constructed Q-Learning adaptive algorithm offline in a real network environment until the knowledge learned by the algorithm converges; finally, the obtained model is deployed online to make adaptive decisions.
Description
Technical field
The invention belongs to the field of information technology, and in particular to dynamic adaptive streaming media methods.
Background technology
In recent years, online streaming media video services have been widely used, and online video traffic accounts for an increasing proportion of total Internet traffic. Scalable Video Coding (SVC) can overcome the redundancy problem of Advanced Video Coding (AVC): when providing a video service of the same quality as AVC, SVC encoding can save 200%-300% of server-side storage space compared with AVC. Therefore, research on adaptive streaming media technology based on SVC encoding, which saves server-side storage resources and provides higher-quality streaming video services, has very important practical significance.
In streaming video services, the most critical technology at the playback end is the adaptive decision method. Current research on SVC-based adaptive methods falls broadly into two classes: one class makes adaptive decisions per SVC-encoded segment, and the other makes layer-level adaptive decisions on SVC-encoded segments. Segment-based adaptive decision methods mainly predict the quality grade of the next video segment from throughput or buffer occupancy, and then serially download the base layer and enhancement layers of the segment according to that quality grade. Throughput-prediction-based methods can cause frequent quality switching of segments when the bandwidth changes, while buffer-based prediction methods always maintain a high buffer level and download segments of lower quality grades, so the overall QoE of the viewed video is relatively low. Existing segment-based SVC decision methods often fail to respond in time to sharp bandwidth variations, causing video stalls. The other class of methods makes decisions per layer; existing methods of this kind mainly follow two ideas. 1. Layer-by-layer downloading: first ensure that the base layer fills the buffer, then ensure that the first enhancement layer fills the buffer, and so on until all enhancement layers have been downloaded; each time a segment layer is downloaded, the lower-grade layers of the segments in the buffer must already be filled. This method effectively guarantees smooth playback, but the quality over the whole playback tends to be low. 2. After each base layer or enhancement layer has been downloaded, decide according to the bandwidth variation whether to raise the quality of the current segment or to download the base layer of a new segment to fill the buffer. Although this approach responds to bandwidth changes in time, when the buffer is already well filled it cannot flexibly upgrade the quality of segments that have already been filled, so its flexibility in improving video quality is limited.
In general, existing adaptive streaming methods for SVC encoding mainly suffer from the following two problems: 1. methods that make adaptive decisions per SVC-encoded segment cannot respond to bandwidth variations in time, which can cause video stalls; 2. methods that make decisions per SVC-encoded segment layer cannot upgrade the quality of segments that have already been filled, which degrades the overall QoE.
Invention content
The present invention overcomes the above disadvantages of the prior art and provides an SVC-encoded HTTP streaming media adaptive method based on Q-Learning.
The SVC-encoded HTTP streaming media adaptive method based on Q-Learning includes the following steps:
1) Build a Q-Learning model from the interactive scenario of SVC-encoded streaming media. This requires constructing a state set (States), an action set (Actions) and a reward function (Reward function), and selecting an exploration strategy. The main steps for building the reinforcement-learning Q-Learning model are as follows:
(1.1) Build the state set (States): the environment state is constructed from the bandwidth and the buffer occupancy state. The client needs to discretize both the bandwidth and the buffer occupancy state.
(1.1.1) Define the maximum bandwidth as BW_max. Each segment is divided into M layers, and the minimum bandwidth required for the i-th layer is thr_i (0 ≤ i ≤ M). The bandwidth is discretized into {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M}, giving M+1 states in total.
(1.1.2) The buffer occupancy state is discretized as follows: the buffer can hold 0~S_max segments, and the buffer occupancy state bs (bufferState) consists of S_max elements [s_1, s_2, s_3, ..., s_{S_max}], where s_k denotes the total number of base and enhancement layers stored for the segment at the k-th buffer position. For example, bs = [0,0,0,0,0,0,0] means that no segment has been filled, and bs = [1,1,1,1,1,1,1] means that every buffered segment has been filled with the base layer only.
The state is constructed as s = {bs, bw}. The discretization of these two elements is shown in Table 1 below:
Table 1. Environment state definition
Element | Range | Discretization |
---|---|---|
bs | [s_1, s_2, s_3, ..., s_{S_max}] | s_k ∈ {0, 1, ..., M}, k ∈ {1, 2, ..., S_max} |
bw | 0~BW_max | {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M} |
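To make the discretization concrete, the following Python sketch builds the state s = {bs, bw} under assumed values of M, S_max and the thresholds thr_i; all constants here are hypothetical and only illustrate the mapping described above, not values taken from the patent.

```python
# A minimal sketch of the state construction in (1.1); M, S_MAX and THR are
# illustrative assumptions.
M = 3                                  # each segment is split into layers 0..M
S_MAX = 7                              # number of buffer positions
THR = [300, 800, 1500, 2500]           # hypothetical thresholds thr_0..thr_M in kbps

def discretize_bandwidth(bw_kbps):
    """Map a measured bandwidth onto one of the M+1 intervals
    {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M}."""
    for i in range(len(THR) - 1):
        if bw_kbps < THR[i]:
            return i
    return len(THR) - 1                # top interval thr_{M-1}~thr_M

def build_state(bs, bw_kbps):
    """State s = {bs, bw}: buffer occupancy vector plus discretized bandwidth."""
    assert len(bs) == S_MAX
    return (tuple(bs), discretize_bandwidth(bw_kbps))

# Example: an empty buffer observed while the measured bandwidth is 1 Mbps.
state = build_state([0] * S_MAX, 1000)   # ((0,0,0,0,0,0,0), 2)
```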
(1.2) The action set is defined as a = (index, layer), where index is the buffer position subscript and layer is the layer grade to be downloaded for that position. For example, a = (3,0) means that the next file to download is the base layer of the 3rd segment in the buffer. Different states generally have different optional action sets. The discretization of the action-set elements is shown in Table 2 below:
Table 2. Action set definition
Element | Range | Discretization |
---|---|---|
index | 1~S_max | {1, 2, ..., S_max} |
layer | 0~M | {0, 1, 2, ..., M} |
(1.2.1) The decision action is selected from the optional action set of the current state; once the action is determined, the next segment layer is downloaded according to it. The optional action set is built as follows: given the current buffer occupancy state bs = [s_1, s_2, ..., s_k, ...], actions are added from left to right according to the buffer occupancy. If s_k is not 0, the action a = (k, s_k) is added; if s_k is 0, the action a = (k, 0) is added and the scan for new actions stops. If bs is completely full, the client enters a sleep state and waits until a video segment is removed from the buffer before making the next decision. For example, when bs = [1,0,0,0,0,0,0], the download actions selectable at the next moment are (1,1) and (2,0); when bs = [3,2,0,0,0,0,0], the next action can be one of the three actions (1,3), (2,2) and (3,0), as shown in the sketch below.
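The left-to-right rule can be sketched as follows; the helper name `optional_actions` and the guard that skips segments already filled with every layer are my own reading of the text, not verbatim from the patent.

```python
def optional_actions(bs, max_layers):
    """Build the optional action set for buffer occupancy vector bs (1-based slots),
    scanning left to right and stopping at the first empty slot, as in (1.2.1)."""
    actions = []
    for k, s_k in enumerate(bs, start=1):
        if s_k != 0:
            if s_k < max_layers:           # assumption: a fully filled segment yields no action
                actions.append((k, s_k))   # download the next layer of a started segment
        else:
            actions.append((k, 0))         # download the base layer of the first empty segment
            break                          # stop searching for new actions
    return actions                         # empty list => buffer full, agent sleeps

# Reproduces the examples above (assuming 4 downloadable layers per segment):
print(optional_actions([1, 0, 0, 0, 0, 0, 0], 4))   # [(1, 1), (2, 0)]
print(optional_actions([3, 2, 0, 0, 0, 0, 0], 4))   # [(1, 3), (2, 2), (3, 0)]
```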
(1.3) Reward function (Reward function): the reward function consists of three factors, r_freeze, r_action and r_switch, defined as follows:
(1.3.1) Define the return value r_freeze: if the selected action causes the video to pause, it is penalized by setting r_freeze = -10000; otherwise r_freeze = 0.
(1.3.2) Define the return value r_action = 100*(10 - index) + layer, where index is the segment position in the buffer and layer is the grade of the layer currently being downloaded, which also represents the quality of the current video segment. The selected action thus tends to obtain a higher return value when it fills a buffer position with a smaller subscript, i.e. closer to the playback point.
(1.3.3) Define the quality switch of a segment as r_switch, with the formula r_switch = -10*abs(leftlayer - layer) + (-10)*abs(rightlayer - layer), which computes the quality difference between the layer of the filled position and the layer grade of the segment on its left (leftlayer), and between the layer of the filled position and the layer grade of the segment on its right (rightlayer); a large gap is penalized.
(1.3.4) Define the linear overall return value r = r_freeze + r_action + r_switch.
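A minimal sketch of this linear reward is given below; treating a missing left or right neighbour as having the same layer as the filled position (so it contributes no penalty) is an assumption the patent does not spell out.

```python
def reward(action, caused_stall, left_layer=None, right_layer=None):
    index, layer = action

    # (1.3.1) heavy penalty when the chosen action makes playback pause
    r_freeze = -10000 if caused_stall else 0

    # (1.3.2) favour positions closer to the playback point, then higher layers
    r_action = 100 * (10 - index) + layer

    # (1.3.3) penalise the quality gap to the left and right neighbours
    left = left_layer if left_layer is not None else layer    # assumption for buffer edges
    right = right_layer if right_layer is not None else layer
    r_switch = -10 * abs(left - layer) + (-10) * abs(right - layer)

    # (1.3.4) linear overall return value
    return r_freeze + r_action + r_switch
```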
(1.4) Exploration strategy
Softmax is selected as the exploration strategy. A Boltzmann probability distribution is computed from the Q values of the optional actions of the current state; the probability of each action is given by the formula:
π(a|s) = e^(Q(s,a)/τ) / Σ_{a'∈A(s)} e^(Q(s,a')/τ)
where π(a|s) is the probability of selecting action a in state s, the denominator is the cumulative sum of the exponentials over the optional action set of state s, and the temperature parameter τ determines the weight of each action among all actions. This ensures that different actions have different probabilities of being selected.
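The Boltzmann selection can be sketched as below; the Q-table layout (a dict keyed by (state, action)) and the default temperature are assumptions made for illustration.

```python
import math
import random

def softmax_select(q_table, state, actions, tau=1.0):
    """Pick an action with probability pi(a|s) = e^(Q(s,a)/tau) / sum_a' e^(Q(s,a')/tau)."""
    prefs = [math.exp(q_table.get((state, a), 0.0) / tau) for a in actions]
    total = sum(prefs)
    probs = [p / total for p in prefs]
    return random.choices(actions, weights=probs, k=1)[0]
```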
2) Offline training of the Q-Learning algorithm;
(2.1.1) Determine the input parameters: learning rate α, discount factor γ, reward function r, current bandwidth bw, and current buffer occupancy state (bs);
(2.1.2) Determine the output: the converged Q table;
(2.1.3) Randomly initialize the Q table;
(2.1.4) Check whether the Q table has converged; if it has converged, terminate; otherwise start a new exploration;
(2.1.5) Play the video and carry out a new round of exploration;
(2.1.6) Determine the current state s from the current bandwidth and buffer occupancy state;
(2.1.7) Select action a in state s using the exploration strategy (softmax);
(2.1.8) Execute action a, compute the return value r, and enter the next state;
(2.1.9) Update the Q table with the formula Q(s, a) = (1-α)·Q(s, a) + α·(r + γ·max_{a'} Q(s', a'));
(2.1.10) Set the state s := s';
(2.1.11) Check whether video playback has ended; if it has ended, go to step (2.1.4); if it has not yet ended, go to step (2.1.5).
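The offline training loop (2.1.1)-(2.1.11) could look roughly like the following, reusing the helpers sketched above; the `env` object and the `converged` test are hypothetical stand-ins for the real DASH playback session and the Q-table convergence check, not an API defined by the patent.

```python
def train_q_table(env, converged, alpha=0.1, gamma=0.9, tau=1.0, max_layers=4):
    q_table = {}                                        # (state, action) -> Q value
    while not converged(q_table):                       # (2.1.4) keep exploring until Q converges
        env.start_playback()                            # (2.1.5) new round of exploration
        state = env.observe_state()                     # (2.1.6) s = (bs, bw)
        while not env.playback_finished():              # (2.1.11)
            actions = optional_actions(state[0], max_layers)
            if not actions:                             # buffer full: sleep until a segment plays out
                state = env.observe_state()
                continue
            a = softmax_select(q_table, state, actions, tau)        # (2.1.7)
            r, next_state = env.step(a)                             # (2.1.8) download layer, get reward
            next_actions = optional_actions(next_state[0], max_layers)
            best_next = max((q_table.get((next_state, na), 0.0) for na in next_actions),
                            default=0.0)
            old = q_table.get((state, a), 0.0)
            # (2.1.9) Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))
            q_table[(state, a)] = (1 - alpha) * old + alpha * (r + gamma * best_next)
            state = next_state                                       # (2.1.10)
    return q_table
```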
3) Online application of the model
According to the current buffer occupancy state (bs) and current bandwidth (bw), the current state is looked up in the Q table; by querying all executable actions in this state, the action with the largest Q value is determined and then executed. Once the decision action (a) is made, the segment layer corresponding to action (a) is downloaded.
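Online, the decision reduces to a Q-table lookup; this sketch reuses the helpers above and assumes the same hypothetical state and action encodings.

```python
def decide(q_table, bs, bw_kbps, max_layers=4):
    """Return the (index, layer) action with the largest Q value in the current state."""
    state = build_state(bs, bw_kbps)
    actions = optional_actions(bs, max_layers)
    if not actions:
        return None                      # buffer full: nothing to download
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))
```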
The present invention applies the Q-Learning algorithm of reinforcement learning to the cross-layer adaptive method of SVC encoding, overcomes the limitations of existing layer-level decision methods, and improves the user's viewing experience. Its advantages are as follows:
It solves the problem that existing segment-level SVC decision methods cannot respond in time to sharp bandwidth variations: according to the bandwidth of the playback client and the buffer occupancy state, inter-layer decisions are made dynamically, improving the user experience of client streaming playback.
By making inter-layer decisions with the Q-Learning algorithm, it improves on existing layer-level decision methods by performing more effective quality upgrades on buffer segments that have already been filled.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention.
Fig. 2 is the total action set when the reinforcement-learning action set of the present invention is constructed.
Fig. 3 is the optional action set of the present invention after the segment layer (1,0) has been filled.
Fig. 4 is the optional action set of the present invention after the segment layers (1,0), (1,1) and (2,0) have been filled in the buffer.
Fig. 5 is the algorithm flow chart of the present invention.
Specific implementation mode
The present invention is further described below with reference to the accompanying drawings.
The SVC-encoded HTTP streaming media adaptive method based on Q-Learning includes the following steps:
1) The interactive environment of the algorithm is shown in Fig. 1. The DASH server stores the base layer, multiple enhancement layers and the MPD file for each of the multiple segments of a video. The client first downloads the MPD file to obtain information about the video segments, and then makes adaptive decisions according to the bandwidth and buffer occupancy state. The algorithm builds a Q-Learning model from the buffer occupancy state and bandwidth of the interactive environment. This requires constructing a state set (States), an action set (Actions) and a reward function (Reward function), and selecting an exploration strategy. The main steps for building the reinforcement-learning Q-Learning model are as follows:
(1.1) Build the state set (States): the environment state is constructed from the bandwidth and the buffer occupancy state. The client needs to discretize both the bandwidth and the buffer occupancy state.
(1.1.1) Define the maximum bandwidth as BW_max. Each segment is divided into M layers, and the minimum bandwidth required for the i-th layer is thr_i (0 ≤ i ≤ M). The bandwidth is discretized into {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M}, giving M+1 states in total.
(1.1.2) The buffer occupancy state is discretized as follows: the buffer can hold 0~S_max segments, and the buffer occupancy state bs (bufferState) consists of S_max elements [s_1, s_2, s_3, ..., s_{S_max}], where s_k denotes the total number of base and enhancement layers stored for the segment at the k-th buffer position. For example, bs = [0,0,0,0,0,0,0] means that no segment has been filled, and bs = [1,1,1,1,1,1,1] means that every buffered segment has been filled with the base layer only.
The state is constructed as s = {bs, bw}. The discretization of these two elements is shown in Table 1 below:
Table 1. Environment state definition
Element | Range | Discretization |
---|---|---|
bs | [s_1, s_2, s_3, ..., s_{S_max}] | s_k ∈ {0, 1, ..., M}, k ∈ {1, 2, ..., S_max} |
bw | 0~BW_max | {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M} |
(1.2) The action set is defined as a = (index, layer), where index is the buffer position subscript and layer is the layer grade to be downloaded for that position; the total action set is shown in Fig. 2. For example, a = (3,0) means that the next file to download is the base layer of the 3rd segment in the buffer. Different states generally have different optional action sets. The discretization of the action-set elements is shown in Table 2 below:
Table 2. Action set definition
Element | Range | Discretization |
---|---|---|
index | 1~S_max | {1, 2, ..., S_max} |
layer | 0~M | {0, 1, 2, ..., M} |
(1.2.1) The decision action is selected from the optional action set of the current state; once the action is determined, the next segment layer is downloaded according to it. The optional action set is built as follows: given the current buffer occupancy state bs = [s_1, s_2, ..., s_k, ...], actions are added from left to right according to the buffer occupancy. If s_k is not 0, the action a = (k, s_k) is added; if s_k is 0, the action a = (k, 0) is added and the scan for new actions stops. If bs is completely full, the client enters a sleep state and waits until a video segment is removed from the buffer before making the next decision. When bs = [1,0,0,0,0,0,0], the download actions selectable at the next moment are (1,1) and (2,0), as shown in Fig. 3; when bs = [3,2,0,0,0,0,0], the next action can be one of the three actions (1,3), (2,2) and (3,0), as shown in Fig. 4.
(1.3) Reward function (Reward function): the reward function consists of three factors, r_freeze, r_action and r_switch, defined as follows:
(1.3.1) Define the return value r_freeze: if the selected action causes the video to pause, it is penalized by setting r_freeze = -10000; otherwise r_freeze = 0.
(1.3.2) Define the return value r_action = 100*(10 - index) + layer, where index is the segment position in the buffer and layer is the grade of the layer currently being downloaded, which also represents the quality of the current video segment. The selected action thus tends to obtain a higher return value when it fills a buffer position with a smaller subscript, i.e. closer to the playback point.
(1.3.3) Define the quality switch of a segment as r_switch, with the formula r_switch = -10*abs(leftlayer - layer) + (-10)*abs(rightlayer - layer), which computes the quality difference between the layer of the filled position and the layer grade of the segment on its left (leftlayer), and between the layer of the filled position and the layer grade of the segment on its right (rightlayer); a large gap is penalized.
(1.3.4) Define the linear overall return value r = r_freeze + r_action + r_switch.
(1.4) Exploration strategy;
(1.4.1) Softmax is selected as the exploration strategy. A Boltzmann probability distribution is computed from the Q values of the optional actions of the current state; the probability of each action is given by the formula:
π(a|s) = e^(Q(s,a)/τ) / Σ_{a'∈A(s)} e^(Q(s,a')/τ)
where π(a|s) is the probability of selecting action a in state s, the denominator is the cumulative sum of the exponentials over the optional action set of state s, and the temperature parameter τ determines the weight of each action among all actions. This ensures that different actions have different probabilities of being selected.
2) Offline training of the Q-Learning algorithm
(2.1.1) Determine the input parameters: learning rate α, discount factor γ, reward function r, current bandwidth bw, and current buffer occupancy state (bs);
(2.1.2) Determine the output: the converged Q table;
(2.1.3) Randomly initialize the Q table;
(2.1.4) Check whether the Q table has converged; if it has converged, terminate; otherwise start a new exploration;
(2.1.5) Play the video and carry out a new round of exploration;
(2.1.6) Determine the current state s from the current bandwidth and buffer occupancy state;
(2.1.7) Select action a in state s using the exploration strategy (softmax);
(2.1.8) Execute action a, compute the return value r, and enter the next state;
(2.1.9) Update the Q table with the formula Q(s, a) = (1-α)·Q(s, a) + α·(r + γ·max_{a'} Q(s', a'));
(2.1.10) Set the state s := s';
(2.1.11) Check whether video playback has ended; if it has ended, go to step (2.1.4); if it has not yet ended, go to step (2.1.5).
3) Online application of the model;
According to the current buffer occupancy state (bs) and current bandwidth (bw), the current state is looked up in the Q table; by querying all executable actions in this state, the action with the largest Q value is determined and then executed. Once the decision action (a) is made, the segment layer corresponding to action (a) is downloaded.
The content described in the embodiments of this specification is merely an enumeration of the implementation forms of the inventive concept. The protection scope of the present invention should not be construed as being limited to the specific forms stated in the embodiments; the protection scope of the present invention also extends to equivalent technical means conceivable by those skilled in the art according to the inventive concept.
Claims (1)
1. An SVC-encoded HTTP streaming media adaptive method based on Q-Learning, comprising the following steps:
1) Build a Q-Learning model from the interactive scenario of SVC-encoded streaming media; this requires constructing a state set (States), an action set (Actions) and a reward function (Reward function), and selecting an exploration strategy; the steps for building the reinforcement-learning Q-Learning model are as follows:
(1.1) Build the state set (States): the environment state is constructed from the bandwidth and the buffer occupancy state; the client needs to discretize the bandwidth and the buffer occupancy state;
(1.1.1) The bandwidth is discretized as follows: define the maximum bandwidth as BW_max; each segment is divided into M layers, and the minimum bandwidth required for the i-th layer is thr_i, 0 ≤ i ≤ M; the bandwidth is discretized into {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M}, giving M+1 states in total;
(1.1.2) The buffer occupancy state is discretized as follows: the buffer can hold 0~S_max segments, and the buffer occupancy state bs (bufferState) consists of S_max elements [s_1, s_2, s_3, ..., s_{S_max}], where s_k denotes the total number of base and enhancement layers stored for the segment at the k-th buffer position;
The state is constructed as s = {bs, bw}; the discretization of these two elements is shown in Table 1 below:
Table 1. Environment state definition
Element | Range | Discretization |
---|---|---|
bs | [s_1, s_2, s_3, ..., s_{S_max}] | s_k ∈ {0, 1, ..., M}, k ∈ {1, 2, ..., S_max} |
bw | 0~BW_max | {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M} |
(1.2) Build the action set (Actions): the action set is defined as a = (index, layer), where index is the buffer position subscript and layer is the layer grade to be downloaded for that position; different states generally have different optional action sets; the discretization of the action-set elements is shown in Table 2 below:
Table 2. Action set definition
Element | Range | Discretization |
---|---|---|
index | 1~S_max | {1, 2, ..., S_max} |
layer | 0~M | {0, 1, 2, ..., M} |
(1.2.1) The decision action is selected from the optional action set of the current state; once the action is determined, the next segment layer is downloaded according to it; the optional action set is built as follows: given the current buffer occupancy state bs = [s_1, s_2, ..., s_k, ...], actions are added from left to right according to the current buffer occupancy state; if s_k is not 0, the action a = (k, s_k) is added; if s_k is 0, the action a = (k, 0) is added and the search for new actions stops; if bs is completely full, the client enters a sleep state and waits until a video segment is removed from the buffer before making the next decision;
(1.3) Reward function (Reward function): the reward function consists of three factors, r_freeze, r_action and r_switch, defined as follows:
(1.3.1) Define the return value r_freeze: if the selected action causes the video to pause, it is penalized by setting r_freeze = -10000; otherwise r_freeze = 0;
(1.3.2) Define the return value r_action = 100*(10 - index) + layer, where index is the segment position in the buffer and layer is the grade of the layer currently being downloaded, which also represents the quality of the current video segment; the selected action thus tends to obtain a higher return value when it fills a buffer position with a smaller subscript;
(1.3.3) Define the quality switch of a segment as r_switch, with the formula r_switch = -10*abs(leftlayer - layer) + (-10)*abs(rightlayer - layer), which computes the quality difference between the layer of the filled position and the layer grade of the segment on its left (leftlayer), and between the layer of the filled position and the layer grade of the segment on its right (rightlayer);
(1.3.4) Define the linear overall return value r = r_freeze + r_action + r_switch;
(1.4) Exploration strategy;
Softmax is selected as the exploration strategy; a Boltzmann probability distribution is computed from the Q values of the optional actions of the current state, and the probability distribution of the actions is given by the following formula:
π(a|s) = e^(Q(s,a)/τ) / Σ_{a'∈A(s)} e^(Q(s,a')/τ)
where π(a|s) is the probability of selecting action a in state s, the denominator is the cumulative sum of the exponentials over the optional action set of state s, and the temperature parameter τ determines the weight of each action among all actions, ensuring that different actions have different probabilities of being selected;
2) Offline training of the Q-Learning algorithm;
(2.1.1) Determine the input parameters: learning rate α, discount factor γ, reward function r, current bandwidth bw, and current buffer occupancy state (bs);
(2.1.2) Determine the output: the converged Q table;
(2.1.3) Randomly initialize the Q table;
(2.1.4) Check whether the Q table has converged; if it has converged, terminate; otherwise start a new exploration;
(2.1.5) Play the video and carry out a new round of exploration;
(2.1.6) Determine the current state s from the current bandwidth and buffer occupancy state;
(2.1.7) Select action a in state s using the exploration strategy (softmax);
(2.1.8) Execute action a, compute the return value r, and enter the next state;
(2.1.9) Update the Q table with the formula Q(s, a) = (1-α)·Q(s, a) + α·(r + γ·max_{a'} Q(s', a'));
(2.1.10) Set the state s := s';
(2.1.11) Check whether video playback has ended; if it has ended, go to step (2.1.4); if it has not yet ended, go to step (2.1.5);
3) Online application of the model;
According to the current buffer occupancy state (bs) and current bandwidth (bw), the current state is looked up in the Q table; by querying all executable actions in this state, the action with the largest Q value is determined and executed; once the decision action (a) is made, the segment layer corresponding to action (a) is downloaded.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810366841.2A CN108737382B (en) | 2018-04-23 | 2018-04-23 | SVC coding HTTP streaming media self-adaption method based on Q-Learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810366841.2A CN108737382B (en) | 2018-04-23 | 2018-04-23 | SVC coding HTTP streaming media self-adaption method based on Q-Learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108737382A true CN108737382A (en) | 2018-11-02 |
CN108737382B CN108737382B (en) | 2020-10-09 |
Family
ID=63939733
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810366841.2A Active CN108737382B (en) | 2018-04-23 | 2018-04-23 | SVC coding HTTP streaming media self-adaption method based on Q-Learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108737382B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109802964A (en) * | 2019-01-23 | 2019-05-24 | 西北大学 | A kind of HTTP self adaptation stream control energy consumption optimization method based on DQN |
WO2021006972A1 (en) * | 2019-07-10 | 2021-01-14 | Microsoft Technology Licensing, Llc | Reinforcement learning in real-time communications |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521202A (en) * | 2011-11-18 | 2012-06-27 | 东南大学 | Automatic discovery method of complex system oriented MAXQ task graph structure |
CN103326946A (en) * | 2013-07-02 | 2013-09-25 | 中国(南京)未来网络产业创新中心 | SVC streaming media transmission optimization method based on OpenFlow |
CN104022850A (en) * | 2014-06-20 | 2014-09-03 | 太原科技大学 | Self-adaptive layered video transmission method based on channel characteristics |
CN104270646A (en) * | 2014-09-22 | 2015-01-07 | 何震宇 | Self-adaption transmission method and system based on mobile streaming media |
CN105072671A (en) * | 2015-06-30 | 2015-11-18 | 国网山东省电力公司潍坊供电公司 | Adaptive scheduling method for sensor nodes in advanced metering system network |
US20150373075A1 (en) * | 2014-06-23 | 2015-12-24 | Radia Perlman | Multiple network transport sessions to provide context adaptive video streaming |
US20160248835A1 (en) * | 2015-02-24 | 2016-08-25 | Koninklijke Kpn N.V. | Fair Adaptive Streaming |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521202A (en) * | 2011-11-18 | 2012-06-27 | 东南大学 | Automatic discovery method of complex system oriented MAXQ task graph structure |
CN103326946A (en) * | 2013-07-02 | 2013-09-25 | 中国(南京)未来网络产业创新中心 | SVC streaming media transmission optimization method based on OpenFlow |
CN104022850A (en) * | 2014-06-20 | 2014-09-03 | 太原科技大学 | Self-adaptive layered video transmission method based on channel characteristics |
US20150373075A1 (en) * | 2014-06-23 | 2015-12-24 | Radia Perlman | Multiple network transport sessions to provide context adaptive video streaming |
CN104270646A (en) * | 2014-09-22 | 2015-01-07 | 何震宇 | Self-adaption transmission method and system based on mobile streaming media |
US20160248835A1 (en) * | 2015-02-24 | 2016-08-25 | Koninklijke Kpn N.V. | Fair Adaptive Streaming |
CN105072671A (en) * | 2015-06-30 | 2015-11-18 | 国网山东省电力公司潍坊供电公司 | Adaptive scheduling method for sensor nodes in advanced metering system network |
Non-Patent Citations (2)
Title |
---|
Xiong Lirong, et al.: "A hybrid bitrate adaptation algorithm based on HTTP adaptive streaming", Computer Science * |
Xiong Lirong, et al.: "Research on a Q-learning-based bitrate control method for HTTP adaptive streaming", Journal on Communications * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109802964A (en) * | 2019-01-23 | 2019-05-24 | 西北大学 | A kind of HTTP self adaptation stream control energy consumption optimization method based on DQN |
CN109802964B (en) * | 2019-01-23 | 2021-09-28 | 西北大学 | DQN-based HTTP adaptive flow control energy consumption optimization method |
WO2021006972A1 (en) * | 2019-07-10 | 2021-01-14 | Microsoft Technology Licensing, Llc | Reinforcement learning in real-time communications |
US11373108B2 (en) | 2019-07-10 | 2022-06-28 | Microsoft Technology Licensing, Llc | Reinforcement learning in real-time communications |
Also Published As
Publication number | Publication date |
---|---|
CN108737382B (en) | 2020-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sengupta et al. | HotDASH: Hotspot aware adaptive video streaming using deep reinforcement learning | |
Zhang et al. | Video super-resolution and caching—An edge-assisted adaptive video streaming solution | |
CN113434212B (en) | Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning | |
Bokani et al. | Optimizing HTTP-based adaptive streaming in vehicular environment using markov decision process | |
CN103370709A (en) | A cache manager for segmented multimedia and corresponding method for cache management | |
CN113315978B (en) | Collaborative online video edge caching method based on federal learning | |
Li et al. | An apprenticeship learning approach for adaptive video streaming based on chunk quality and user preference | |
CN115022684B (en) | Video stream self-adaptive transmission method based on deep reinforcement learning under QUIC protocol | |
CN103338393A (en) | Video code rate selecting method driven by user experience under HSPA system | |
CN108737382A (en) | SVC coding HTTP streaming media self-adaption method based on Q-L earning | |
Li et al. | DAVS: Dynamic-chunk quality aware adaptive video streaming using apprenticeship learning | |
CN116962414A (en) | Self-adaptive video streaming transmission method and system based on server-free calculation | |
CN116249162A (en) | Collaborative caching method based on deep reinforcement learning in vehicle-mounted edge network | |
CN112055263A (en) | 360-degree video streaming transmission system based on significance detection | |
Shi et al. | CoLEAP: Cooperative learning-based edge scheme with caching and prefetching for DASH video delivery | |
Feng et al. | Timely and accurate bitrate switching in HTTP adaptive streaming with date-driven I-frame prediction | |
Zahran et al. | ARBITER: Adaptive rate-based intelligent HTTP streaming algorithm | |
CN109802964A (en) | A kind of HTTP self adaptation stream control energy consumption optimization method based on DQN | |
Cai et al. | A multi-objective optimization approach to resource allocation for edge-based digital twin | |
Lin et al. | KNN-Q learning algorithm of bitrate adaptation for video streaming over HTTP | |
CN112333456B (en) | Live video transmission method based on cloud edge protocol | |
Lu et al. | Deep-reinforcement-learning-based user-preference-aware rate adaptation for video streaming | |
CN118175356A (en) | Video transmission method, device, equipment and storage medium | |
CN103179441A (en) | Method and server for playing contents | |
CN116209015B (en) | Edge network cache scheduling method, system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |