
CN108737382A - SVC coding HTTP streaming media self-adaption method based on Q-Learning - Google Patents

SVC coding HTTP streaming media self-adaption method based on Q-Learning

Info

Publication number: CN108737382A
Authority: CN (China)
Prior art keywords: behavior, state, layer, thr, current
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN201810366841.2A
Other languages: Chinese (zh)
Other versions: CN108737382B (en)
Inventors: 熊丽荣, 尤日晶, 沈树茂
Current assignee (the listed assignees may be inaccurate): Zhejiang University of Technology ZJUT
Original assignee: Zhejiang University of Technology ZJUT
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201810366841.2A; granted as CN108737382B


Classifications

    • H04L 67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L 65/65: Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H04L 65/75: Media network packet handling
    • H04L 65/80: Responding to QoS
    • H04L 67/568: Storing data temporarily at an intermediate stage, e.g. caching

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to an adaptive method for SVC-coded HTTP streaming media based on Q-Learning. The method first constructs a Q-Learning model: for the SVC-coded streaming media interaction scenario, it builds a state set, an action set, and a return function, and selects an exploration strategy. Next, the constructed Q-Learning adaptive algorithm is trained offline in a real network environment until the knowledge learned by the algorithm converges. Finally, the resulting model is deployed online to make adaptive decisions.

Description

SVC-coded HTTP streaming media adaptive method based on Q-Learning
Technical field
The invention belongs to the field of information technology, and in particular to dynamic adaptive streaming media methods.
Background technology
In recent years, online streaming video services have been widely used, and online video traffic occupies an ever-increasing share of total Internet traffic. Scalable Video Coding (SVC) overcomes the redundancy problem of Advanced Video Coding (AVC): when providing a service of the same video quality as AVC, SVC coding can save 200%-300% of the server-side storage space required by AVC coding. Research on adaptive streaming technology based on SVC coding therefore has great practical significance for saving server-side storage resources and providing higher-quality streaming video services.
In streaming video services, the most critical technology at the playback end is the adaptive decision method. Current research on SVC-based adaptive methods falls broadly into two classes: one makes adaptive decisions per SVC-coded segment, the other makes layer-adaptive decisions per SVC-coded layer. Segment-based adaptive decisions mainly predict the quality grade of the next video segment from throughput or buffer state, and then serially download the base layer and enhancement layers of the segment according to that grade. Throughput-prediction methods suffer from frequent quality switching of segments when the bandwidth fluctuates. Buffer-based prediction methods keep the buffer full but always download segments of lower quality grades, so the overall QoE of watching the video is relatively low. Existing segment-based SVC decision methods often cannot respond in time to abrupt bandwidth changes, causing video stalls. The other class of methods makes decisions per layer; existing methods of this kind mainly follow two ideas. 1. Download layer by layer: first make sure the base layer fills the buffer, then make sure the first enhancement layer fills the buffer, and so on until all enhancement layers are downloaded; each time a slice layer is downloaded, the lower-grade layers of the buffered segments must already be filled. This method effectively guarantees smooth playback, but the quality over the whole playback tends to be low. 2. After each base or enhancement layer is downloaded, decide from the bandwidth variation whether to increase the quality of the current segment or to fill the buffer with the base layer of a new segment. Although this approach responds to bandwidth changes in time, when the buffer is well filled it cannot flexibly upgrade the quality of segments that have already been filled, so its flexibility in raising video quality is limited.
In general, existing adaptive streaming methods for SVC coding mainly have the following two problems: 1. methods that make adaptive decisions per SVC-coded segment cannot respond to bandwidth changes in time, which causes video stalls; 2. methods that make decisions per SVC-coded slice layer cannot upgrade the quality of segments that have already been filled, which lowers the overall QoE.
Invention content
The present invention overcomes the above shortcomings of the prior art by providing an SVC-coded HTTP streaming media adaptive method based on Q-Learning.
The SVC-coded HTTP streaming media adaptive method based on Q-Learning comprises the following steps:
1) Build a Q-Learning model from the SVC-coded streaming media interaction scenario: construct the state set (States), the action set (Actions), and the return function (Reward function), and select an exploration strategy. The main steps for building the reinforcement-learning Q-Learning model are as follows:
(1.1) Build the state set (States): the environment state is constructed from the bandwidth and the buffer occupancy state. The client needs to discretize both the bandwidth and the buffer occupancy state.
(1.1.1) Define the maximum bandwidth as BW_max. Each segment is split into M layers, and the lowest bandwidth required when at layer i is thr_i (0 ≤ i ≤ M). The bandwidth is discretized into {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M}, M+1 states in total.
(1.1.2) The buffer occupancy state is discretized as follows: define the buffer range as 0~S_max segments. The buffer occupancy state bs (bufferState) consists of S_max elements [s_1, s_2, s_3, ..., s_Smax], where s_k is the total number of base and enhancement layers stored for the k-th buffered segment. For example, bs = [0,0,0,0,0,0,0] means no segment is filled, and bs = [1,1,1,1,1,1,1] means every buffered segment is filled with only the base layer.
The state is constructed as s = {bs, bw}; the discretization of the two elements is shown in Table 1:

Table 1. Environment state definition

Element | Range | Discretization
bs | [s_1, s_2, s_3, ..., s_Smax] | s_k ∈ {0, 1, ..., M}, k ∈ {1, 2, ..., S_max}
bw | 0~BW_max | {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M}
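To make the discretization concrete, here is a minimal Python sketch of the state construction in (1.1). It is only an illustration: the threshold values in THR and the names (THR, S_MAX, discretize_bandwidth, make_state) are assumptions, not identifiers from the patent.

```python
THR = [500, 1000, 2000, 4000]   # thr_0..thr_M in kbit/s for M = 3 (assumed values)
S_MAX = 7                       # buffer holds S_max segments (assumed value)

def discretize_bandwidth(bw_kbps):
    """Map a measured bandwidth onto one of the M+1 bins {0~thr_0, ..., thr_M-1~thr_M}."""
    for i, thr in enumerate(THR):
        if bw_kbps < thr:
            return i
    return len(THR) - 1         # clamp at BW_max = thr_M

def make_state(bs, bw_kbps):
    """State s = {bs, bw}: buffer occupancy vector plus the discretized bandwidth bin."""
    assert len(bs) == S_MAX
    return (tuple(bs), discretize_bandwidth(bw_kbps))

# make_state([1, 0, 0, 0, 0, 0, 0], 1500) -> ((1, 0, 0, 0, 0, 0, 0), 2)
```

Storing bs as a tuple keeps the state hashable, so it can serve directly as part of the key of a Q table held in a dictionary.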
(1.2) The action set is defined as a = (index, layer): the buffer position subscript (index) and the layer grade to download at that position (layer). For example, a = (3, 0) means the next file to download is the base layer of the 3rd segment in the buffer. Different states generally have different selectable action sets. The discretization of the action-set elements is shown in Table 2:

Table 2. Action set definition

Element | Range | Discretization
index | 0~S_max | {1, 2, ..., S_max}
layer | 0~M | {0, 1, 2, ..., M}
(1.2.1) The decision behavior is selected from the selectable action set of the current state; once the action is determined, the corresponding next slice layer is downloaded. Actions are added to the set as follows: given the current buffer occupancy state bs = [s_1, s_2, ..., s_k], the selectable action set of the current state is built from the buffer occupancy state from left to right: if s_k is not 0, add the action a = (k, s_k); if s_k is 0, add a = (k, 0) and stop searching further positions. If bs is completely full, enter a sleep state and wait until a video segment is removed from the buffer before making a new decision. For bs = [1,0,0,0,0,0,0] the selectable download actions at the next moment are (1,1) and (2,0); for bs = [3,2,0,0,0,0,0] the next action can be one of the three actions (1,3), (2,2), (3,0).
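The left-to-right rule of (1.2.1) can be sketched in a few lines of Python; available_actions and max_layer are hypothetical names, and the two examples at the end reproduce the buffer states given in the text.

```python
def available_actions(bs, max_layer):
    """Selectable actions a = (index, layer) for buffer occupancy bs (rule 1.2.1).

    Scan the buffer from left to right: a partly filled segment offers its next
    layer; the first empty slot offers its base layer and ends the scan. An
    empty result means the buffer is full and the player should sleep.
    """
    actions = []
    for k, filled in enumerate(bs, start=1):   # buffer subscripts start at 1
        if filled == 0:
            actions.append((k, 0))             # base layer of the first empty slot
            break
        if filled <= max_layer:                # segment not yet at top quality
            actions.append((k, filled))        # next layer s_k to download
    return actions

# available_actions([1, 0, 0, 0, 0, 0, 0], 3) -> [(1, 1), (2, 0)]
# available_actions([3, 2, 0, 0, 0, 0, 0], 3) -> [(1, 3), (2, 2), (3, 0)]
```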
(1.3) Return function (Reward function): the return function comprises three factors, r_freeze, r_action, and r_switch, defined as follows:
(1.3.1) Define the action return value r_freeze: if the selected action causes video playback to pause, it is punished by setting r_freeze = -10000; otherwise r_freeze = 0.
(1.3.2) Define the action return value r_action = 100*(10 - index) + layer, where index denotes the segment position in the buffer and layer denotes the quality grade of the layer currently being downloaded, which also represents the quality of the current video segment. The selected action thus tends to obtain a higher return value when it fills a position with a smaller subscript nearer the front of the buffer.
(1.3.3) Define the quality switching of a segment as r_switch, with the formula r_switch = -10*abs(leftlayer - layer) - 10*abs(rightlayer - layer): compute the quality difference between the quality layer of the filled position and the slice-layer grade of the segment on its left (leftlayer), and between the quality layer of the filled position and the grade of the segment on its right (rightlayer); a large gap is punished.
(1.3.4) Define the linear overall return value r = r_freeze + r_action + r_switch.
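A minimal sketch of the return computation, using the weights from (1.3.1)-(1.3.4). The handling of positions at the buffer edges (no left or right neighbour) is an assumption, since the patent does not specify it; here a missing neighbour contributes no switching penalty.

```python
def reward(action, bs, caused_freeze):
    """Overall return r = r_freeze + r_action + r_switch for action (index, layer)."""
    index, layer = action
    r_freeze = -10000 if caused_freeze else 0      # (1.3.1) stall penalty
    r_action = 100 * (10 - index) + layer          # (1.3.2) favour front positions
    # (1.3.3) penalise quality gaps against both neighbours; a missing
    # neighbour is treated as having the same level as `layer` (assumption).
    left = bs[index - 2] if index > 1 else layer
    right = bs[index] if index < len(bs) else layer
    r_switch = -10 * abs(left - layer) - 10 * abs(right - layer)
    return r_freeze + r_action + r_switch          # (1.3.4) linear sum

# reward((2, 0), [3, 0, 0, 0, 0, 0, 0], False) -> 800 - 30 - 0 = 770
```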
(1.4) Exploration strategy
Softmax is selected as the exploration strategy. A Boltzmann probability distribution is computed from the Q values of the selectable actions of the current state; the probability distribution of the different actions is given by the formula:

π(a|s) = e^{Q(s,a)/τ} / Σ_{a'∈A(s)} e^{Q(s,a')/τ}

where π(a|s) is the probability of selecting action a in state s. The denominator accumulates the exponentials e^{Q(s,a')/τ} over the selectable action set of state s, and the parameter τ determines the weight of an action among all actions. This guarantees that different actions have different probabilities of being chosen.
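Under this formula, action selection is a weighted random draw; a small τ concentrates probability on high-Q actions, while a large τ explores more uniformly. A minimal sketch, assuming the Q table is a dictionary keyed by (state, action) pairs:

```python
import math
import random

def softmax_action(Q, s, actions, tau=1.0):
    """Pick an action with probability e^(Q(s,a)/tau) / sum over a' of e^(Q(s,a')/tau)."""
    weights = [math.exp(Q.get((s, a), 0.0) / tau) for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]
```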
2) Train the Q-Learning algorithm offline:
(2.1.1) Determine the input parameters: learning rate α, discount factor γ, return function r, current bandwidth bw, and current buffer occupancy state bs;
(2.1.2) Determine the output parameter: the converged Q table;
(2.1.3) Randomly initialize the Q table;
(2.1.4) Check whether the Q table has converged: terminate if it has converged; start a new exploration if it has not;
(2.1.5) Play the video and carry out a new round of exploration;
(2.1.6) Determine the current state s from the current bandwidth and buffer occupancy state;
(2.1.7) Select action a from state s using the exploration strategy (Softmax);
(2.1.8) Execute action a, compute the return value r, and enter the next state s`;
(2.1.9) Update the Q table by the formula Q(s,a) = (1-α)·Q(s,a) + α·(r + γ·max_{a`} Q(s`,a`));
(2.1.10) Set the state to s := s`;
(2.1.11) Check whether video playback has finished: go to step (2.1.4) if it has; otherwise go to step (2.1.5).
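Steps (2.1.1)-(2.1.11) amount to the standard tabular Q-Learning loop sketched below. The environment object env and its methods (reset, done, available_actions, step) are hypothetical stand-ins for the real player/network interaction, and running a fixed number of episodes stands in for the convergence check of (2.1.4).

```python
ALPHA, GAMMA = 0.1, 0.9    # learning rate α and discount factor γ (assumed values)

def train(Q, env, episodes=1000):
    """Offline Q-Learning (step 2): one episode is one full video playback."""
    for _ in range(episodes):                                    # (2.1.4)/(2.1.5)
        s = env.reset()                                          # (2.1.6) s from bw and bs
        while not env.done():                                    # (2.1.11)
            a = softmax_action(Q, s, env.available_actions(s))   # (2.1.7)
            r, s_next = env.step(a)                              # (2.1.8) download slice layer a
            best_next = max((Q.get((s_next, b), 0.0)
                             for b in env.available_actions(s_next)), default=0.0)
            # (2.1.9) Q(s,a) := (1-α)·Q(s,a) + α·(r + γ·max_a' Q(s',a'))
            Q[(s, a)] = (1 - ALPHA) * Q.get((s, a), 0.0) + ALPHA * (r + GAMMA * best_next)
            s = s_next                                           # (2.1.10) s := s'
    return Q
```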
3) Apply the model online
Given the current buffer occupancy state bs and current bandwidth bw, look up the current state in the Q table, query all actions executable in this state, determine which action has the maximum Q value, and execute that action: when the decided action is a, download the slice layer corresponding to a.
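Online, the converged table is used greedily rather than through Softmax exploration. A minimal sketch, reusing the hypothetical make_state and available_actions helpers from the earlier sketches:

```python
def decide(Q, bs, bw_kbps, max_layer):
    """Step 3): pick the executable action with the maximum Q value."""
    s = make_state(bs, bw_kbps)
    actions = available_actions(list(bs), max_layer)
    if not actions:            # buffer full: sleep until a segment is consumed
        return None
    return max(actions, key=lambda a: Q.get((s, a), 0.0))

# decide(Q, [1, 0, 0, 0, 0, 0, 0], 1500, 3) returns (1, 1) or (2, 0),
# whichever has the larger learned Q value.
```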
The present invention applies the Q-Learning algorithm of reinforcement learning to an SVC-coded adaptive cross-layer method, overcoming the limitations of existing layer-decision methods and improving the user's viewing experience. Its advantages are as follows:
It solves the problem that existing segment-based SVC decision methods cannot respond in time to abrupt bandwidth changes: according to the bandwidth and buffer occupancy state of the playback client, it makes inter-layer decisions dynamically, improving the user experience of streaming playback at the client.
By making inter-layer decisions with the Q-Learning algorithm, it improves on existing layer-based decision methods through more effective quality upgrades of buffer segments that have already been filled.
Description of the drawings
Fig. 1 is the flow chart of the method of the present invention.
Fig. 2 shows the total action set when the action set for the reinforcement learning of the present invention is constructed.
Fig. 3 shows the selectable action set after slice layer (1,0) has been filled according to the present invention.
Fig. 4 shows the selectable action set after the buffer has been filled with slice layers (1,0), (1,1), and (2,0) according to the present invention.
Fig. 5 is the algorithm flow chart of the present invention.
Specific implementation mode
The present invention is further illustrated with reference to the accompanying drawings:
The SVC-coded HTTP streaming media adaptive method based on Q-Learning comprises the following steps:
1) The interaction environment of the algorithm is shown in Fig. 1: the DASH server stores the base layer and multiple enhancement layers of each segment of a video, together with the MPD file. The client first downloads the MPD file to obtain the relevant information about the video segments, and then makes adaptive decisions according to the bandwidth and buffer occupancy factors. The algorithm builds the Q-Learning model from the buffer occupancy state and bandwidth of the interaction environment: construct the state set (States), the action set (Actions), and the return function (Reward function), and select an exploration strategy. The main steps for building the reinforcement-learning Q-Learning model are as follows:
(1.1) Build the state set (States): the environment state is constructed from the bandwidth and the buffer occupancy state. The client needs to discretize both the bandwidth and the buffer occupancy state.
(1.1.1) Define the maximum bandwidth as BW_max. Each segment is split into M layers, and the lowest bandwidth required when at layer i is thr_i (0 ≤ i ≤ M). The bandwidth is discretized into {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M}, M+1 states in total.
(1.1.2) The buffer occupancy state is discretized as follows: define the buffer range as 0~S_max segments. The buffer occupancy state bs (bufferState) consists of S_max elements [s_1, s_2, s_3, ..., s_Smax], where s_k is the total number of base and enhancement layers stored for the k-th buffered segment. For example, bs = [0,0,0,0,0,0,0] means no segment is filled, and bs = [1,1,1,1,1,1,1] means every buffered segment is filled with only the base layer.
The state is constructed as s = {bs, bw}; the discretization of the two elements is shown in Table 1:

Table 1. Environment state definition

Element | Range | Discretization
bs | [s_1, s_2, s_3, ..., s_Smax] | s_k ∈ {0, 1, ..., M}, k ∈ {1, 2, ..., S_max}
bw | 0~BW_max | {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M}
(1.2) The action set is defined as a = (index, layer): the buffer position subscript (index) and the layer grade to download at that position (layer); the full action set is shown in Fig. 2. For example, a = (3, 0) means the next file to download is the base layer of the 3rd segment in the buffer. Different states generally have different selectable action sets. The discretization of the action-set elements is shown in Table 2:

Table 2. Action set definition

Element | Range | Discretization
index | 0~S_max | {1, 2, ..., S_max}
layer | 0~M | {0, 1, 2, ..., M}

(1.2.1) The decision behavior is selected from the selectable action set of the current state; once the action is determined, the corresponding next slice layer is downloaded. Actions are added to the set as follows: given the current buffer occupancy state bs = [s_1, s_2, ..., s_k], the selectable action set of the current state is built from the buffer occupancy state from left to right: if s_k is not 0, add the action a = (k, s_k); if s_k is 0, add a = (k, 0) and stop searching further positions. If bs is completely full, enter a sleep state and wait until a video segment is removed from the buffer before making a new decision. For bs = [1,0,0,0,0,0,0] the selectable download actions at the next moment are (1,1) and (2,0), as shown in Fig. 3; for bs = [3,2,0,0,0,0,0] the next action can be one of the three actions (1,3), (2,2), (3,0), as shown in Fig. 4.
(1.3) Return function (Reward function): the return function comprises three factors, r_freeze, r_action, and r_switch, defined as follows:
(1.3.1) Define the action return value r_freeze: if the selected action causes video playback to pause, it is punished by setting r_freeze = -10000; otherwise r_freeze = 0.
(1.3.2) Define the action return value r_action = 100*(10 - index) + layer, where index denotes the segment position in the buffer and layer denotes the quality grade of the layer currently being downloaded, which also represents the quality of the current video segment. The selected action thus tends to obtain a higher return value when it fills a position with a smaller subscript nearer the front of the buffer.
(1.3.3) Define the quality switching of a segment as r_switch, with the formula r_switch = -10*abs(leftlayer - layer) - 10*abs(rightlayer - layer): compute the quality difference between the quality layer of the filled position and the slice-layer grade of the segment on its left (leftlayer), and between the quality layer of the filled position and the grade of the segment on its right (rightlayer); a large gap is punished.
(1.3.4) Define the linear overall return value r = r_freeze + r_action + r_switch.
(1.4) Exploration strategy
(1.4.1) Softmax is selected as the exploration strategy. A Boltzmann probability distribution is computed from the Q values of the selectable actions of the current state; the probability distribution of the different actions is given by the formula:

π(a|s) = e^{Q(s,a)/τ} / Σ_{a'∈A(s)} e^{Q(s,a')/τ}

where π(a|s) is the probability of selecting action a in state s. The denominator accumulates the exponentials e^{Q(s,a')/τ} over the selectable action set of state s, and the parameter τ determines the weight of an action among all actions. This guarantees that different actions have different probabilities of being chosen.
2) Train the Q-Learning algorithm offline:
(2.1.1) Determine the input parameters: learning rate α, discount factor γ, return function r, current bandwidth bw, and current buffer occupancy state bs;
(2.1.2) Determine the output parameter: the converged Q table;
(2.1.3) Randomly initialize the Q table;
(2.1.4) Check whether the Q table has converged: terminate if it has converged; start a new exploration if it has not;
(2.1.5) Play the video and carry out a new round of exploration;
(2.1.6) Determine the current state s from the current bandwidth and buffer occupancy state;
(2.1.7) Select action a from state s using the exploration strategy (Softmax);
(2.1.8) Execute action a, compute the return value r, and enter the next state s`;
(2.1.9) Update the Q table by the formula Q(s,a) = (1-α)·Q(s,a) + α·(r + γ·max_{a`} Q(s`,a`));
(2.1.10) Set the state to s := s`;
(2.1.11) Check whether video playback has finished: go to step (2.1.4) if it has; otherwise go to step (2.1.5).
3) Apply the model online
Given the current buffer occupancy state bs and current bandwidth bw, look up the current state in the Q table, query all actions executable in this state, determine which action has the maximum Q value, and execute that action: when the decided action is a, download the slice layer corresponding to a.
The content described in the embodiments of this specification is merely an enumeration of the forms in which the inventive concept may be realized. The protection scope of the present invention should not be construed as being limited to the specific forms stated in the embodiments; it also covers equivalent technical means that those skilled in the art can conceive according to the inventive concept.

Claims (1)

1. An SVC-coded HTTP streaming media adaptive method based on Q-Learning, comprising the following steps:
1) Build a Q-Learning model from the SVC-coded streaming media interaction scenario: construct the state set (States), the action set (Actions), and the return function (Reward function), and select an exploration strategy; the steps for building the reinforcement-learning Q-Learning model are as follows:
(1.1) Build the state set (States): the environment state is constructed from the bandwidth and the buffer occupancy state, and the client needs to discretize both the bandwidth and the buffer occupancy state;
(1.1.1) The bandwidth is discretized as follows: define the maximum bandwidth as BW_max; each segment is split into M layers, and the lowest bandwidth required when at layer i is thr_i, 0 ≤ i ≤ M; the bandwidth is discretized into {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M}, M+1 states in total;
(1.1.2) The buffer occupancy state is discretized as follows: define the buffer range as 0~S_max segments; the buffer occupancy state bs (bufferState) consists of S_max elements [s_1, s_2, s_3, ..., s_Smax], where s_k is the total number of base and enhancement layers stored for the k-th buffered segment;
The state is constructed as s = {bs, bw}; the discretization of the two elements is shown in Table 1:

Table 1. Environment state definition

Element | Range | Discretization
bs | [s_1, s_2, s_3, ..., s_Smax] | s_k ∈ {0, 1, ..., M}, k ∈ {1, 2, ..., S_max}
bw | 0~BW_max | {0~thr_0, thr_0~thr_1, ..., thr_{M-1}~thr_M}
(1.2) Build the action set (Actions): an action is defined as a = (index, layer), comprising the buffer position subscript (index) and the layer grade to download at that position (layer); different states generally have different selectable action sets; the discretization of the action-set elements is shown in Table 2:

Table 2. Action set definition

Element | Range | Discretization
index | 0~S_max | {1, 2, ..., S_max}
layer | 0~M | {0, 1, 2, ..., M}

(1.2.1) The decision behavior is selected from the selectable action set of the current state; once the action is determined, the corresponding next slice layer is downloaded; actions are added to the set as follows: given the current buffer occupancy state bs = [s_1, s_2, ..., s_k], the selectable action set of the current state is built from the buffer occupancy state from left to right: if s_k is not 0, add the action a = (k, s_k); if s_k is 0, add a = (k, 0) and stop searching for further actions; if bs is completely full, enter a sleep state and wait until a video segment is removed from the buffer before making a new decision;
(1.3) Return function (Reward function): the return function comprises three factors, r_freeze, r_action, and r_switch, defined as follows:
(1.3.1) Define the action return value r_freeze: if the selected action causes video playback to pause, it is punished by setting r_freeze = -10000; otherwise r_freeze = 0;
(1.3.2) Define the action return value r_action = 100*(10 - index) + layer, where index denotes the segment position in the buffer and layer denotes the quality grade of the layer currently being downloaded, which also represents the quality of the current video segment; the selected action thus tends to obtain a higher return value when it fills a position with a smaller subscript nearer the front of the buffer;
(1.3.3) Define the quality switching of a segment as r_switch, with the formula r_switch = -10*abs(leftlayer - layer) - 10*abs(rightlayer - layer): compute the quality difference between the quality layer of the filled position and the slice-layer grade of the segment on its left (leftlayer), and between the quality layer of the filled position and the grade of the segment on its right (rightlayer);
(1.3.4) Define the linear overall return value r = r_freeze + r_action + r_switch;
(1.4) Exploration strategy:
Softmax is selected as the exploration strategy; a Boltzmann probability distribution is computed from the Q values of the selectable actions of the current state, and the probability distribution formula of the different actions is as follows:

π(a|s) = e^{Q(s,a)/τ} / Σ_{a'∈A(s)} e^{Q(s,a')/τ}

where π(a|s) is the probability of selecting action a in state s; the denominator accumulates the exponentials e^{Q(s,a')/τ} over the selectable action set of state s, and the parameter τ determines the weight of an action among all actions, which guarantees that different actions have different probabilities of being chosen;
2) Train the constructed Q-Learning algorithm offline:
(2.1.1) Determine the input parameters: learning rate α, discount factor γ, return function r, current bandwidth bw, and current buffer occupancy state bs;
(2.1.2) Determine the output parameter: the converged Q table;
(2.1.3) Randomly initialize the Q table;
(2.1.4) Check whether the Q table has converged: terminate if it has converged; start a new exploration if it has not;
(2.1.5) Play the video and carry out a new round of exploration;
(2.1.6) Determine the current state s from the current bandwidth and buffer occupancy state;
(2.1.7) Select action a from state s using the exploration strategy (Softmax);
(2.1.8) Execute action a, compute the return value r, and enter the next state s`;
(2.1.9) Update the Q table by the formula Q(s,a) = (1-α)·Q(s,a) + α·(r + γ·max_{a`} Q(s`,a`));
(2.1.10) Set the state to s := s`;
(2.1.11) Check whether video playback has finished: go to step (2.1.4) if it has; otherwise go to step (2.1.5);
3) Apply the model online:
Given the current buffer occupancy state bs and current bandwidth bw, look up the current state in the Q table, query all actions executable in this state, determine which action has the maximum Q value, and execute that action; when the decided action is a, download the slice layer corresponding to a.
CN201810366841.2A 2018-04-23 2018-04-23 SVC coding HTTP streaming media self-adaption method based on Q-Learning Active CN108737382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810366841.2A CN108737382B (en) 2018-04-23 2018-04-23 SVC coding HTTP streaming media self-adaption method based on Q-Learning


Publications (2)

Publication Number Publication Date
CN108737382A true CN108737382A (en) 2018-11-02
CN108737382B CN108737382B (en) 2020-10-09

Family

ID=63939733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810366841.2A Active CN108737382B (en) 2018-04-23 2018-04-23 SVC coding HTTP streaming media self-adaption method based on Q-Learning

Country Status (1)

Country Link
CN (1) CN108737382B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521202A (en) * 2011-11-18 2012-06-27 东南大学 Automatic discovery method of complex system oriented MAXQ task graph structure
CN103326946A (en) * 2013-07-02 2013-09-25 中国(南京)未来网络产业创新中心 SVC streaming media transmission optimization method based on OpenFlow
CN104022850A (en) * 2014-06-20 2014-09-03 太原科技大学 Self-adaptive layered video transmission method based on channel characteristics
US20150373075A1 (en) * 2014-06-23 2015-12-24 Radia Perlman Multiple network transport sessions to provide context adaptive video streaming
CN104270646A (en) * 2014-09-22 2015-01-07 何震宇 Self-adaption transmission method and system based on mobile streaming media
US20160248835A1 (en) * 2015-02-24 2016-08-25 Koninklijke Kpn N.V. Fair Adaptive Streaming
CN105072671A (en) * 2015-06-30 2015-11-18 国网山东省电力公司潍坊供电公司 Adaptive scheduling method for sensor nodes in advanced metering system network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
熊丽荣, 等: "一种基于HTTP自适应流的混合码率自适应算法" [A hybrid bitrate adaptation algorithm based on HTTP adaptive streaming], 《计算机科学》 [Computer Science] *
熊丽荣, 等: "基于Q-learning的HTTP自适应流码率控制方法研究" [Research on a Q-learning-based bitrate control method for HTTP adaptive streaming], 《通信学报》 [Journal on Communications] *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109802964A (en) * 2019-01-23 2019-05-24 西北大学 A kind of HTTP self adaptation stream control energy consumption optimization method based on DQN
CN109802964B (en) * 2019-01-23 2021-09-28 西北大学 DQN-based HTTP adaptive flow control energy consumption optimization method
WO2021006972A1 (en) * 2019-07-10 2021-01-14 Microsoft Technology Licensing, Llc Reinforcement learning in real-time communications
US11373108B2 (en) 2019-07-10 2022-06-28 Microsoft Technology Licensing, Llc Reinforcement learning in real-time communications

Also Published As

Publication number Publication date
CN108737382B (en) 2020-10-09

Similar Documents

Publication Publication Date Title
Sengupta et al. HotDASH: Hotspot aware adaptive video streaming using deep reinforcement learning
Zhang et al. Video super-resolution and caching—An edge-assisted adaptive video streaming solution
CN113434212B (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
Bokani et al. Optimizing HTTP-based adaptive streaming in vehicular environment using markov decision process
CN103370709A (en) A cache manager for segmented multimedia and corresponding method for cache management
CN113315978B (en) Collaborative online video edge caching method based on federal learning
Li et al. An apprenticeship learning approach for adaptive video streaming based on chunk quality and user preference
CN115022684B (en) Video stream self-adaptive transmission method based on deep reinforcement learning under QUIC protocol
CN103338393A (en) Video code rate selecting method driven by user experience under HSPA system
CN108737382A (en) SVC coding HTTP streaming media self-adaption method based on Q-L earning
Li et al. DAVS: Dynamic-chunk quality aware adaptive video streaming using apprenticeship learning
CN116962414A (en) Self-adaptive video streaming transmission method and system based on server-free calculation
CN116249162A (en) Collaborative caching method based on deep reinforcement learning in vehicle-mounted edge network
CN112055263A (en) 360-degree video streaming transmission system based on significance detection
Shi et al. CoLEAP: Cooperative learning-based edge scheme with caching and prefetching for DASH video delivery
Feng et al. Timely and accurate bitrate switching in HTTP adaptive streaming with date-driven I-frame prediction
Zahran et al. ARBITER: Adaptive rate-based intelligent HTTP streaming algorithm
CN109802964A (en) A kind of HTTP self adaptation stream control energy consumption optimization method based on DQN
Cai et al. A multi-objective optimization approach to resource allocation for edge-based digital twin
Lin et al. KNN-Q learning algorithm of bitrate adaptation for video streaming over HTTP
CN112333456B (en) Live video transmission method based on cloud edge protocol
Lu et al. Deep-reinforcement-learning-based user-preference-aware rate adaptation for video streaming
CN118175356A (en) Video transmission method, device, equipment and storage medium
CN103179441A (en) Method and server for playing contents
CN116209015B (en) Edge network cache scheduling method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant