1 Introduction

Next-generation wireless communication systems, including 5G NR, use mMIMO beamforming to achieve higher SNRs and spatial multiplexing to increase data throughput at mmWave frequencies. The major challenge at mmWave frequencies is propagation loss; 5G NR networks are therefore deployed with mMIMO (large-scale antenna arrays) to overcome the path loss. An advantage of mmWave frequency bands is that the very short wavelengths allow more antenna elements to fit within given physical dimensions. To reduce system cost, antenna elements can be grouped into subarrays, with each T-R module dedicated to one antenna subarray, using an emerging technique for mmWave communications called “hybrid beamforming”. Hybrid beamforming designs can transmit data to multiple users using MU-MIMO with multiplexing gains, and such MU-MIMO systems have high potential in mmWave communication networks. Hybrid transceivers have fewer RF chains than transmit antenna elements because they use analog beamformers in the RF domain and digital beamformers in the baseband domain. A hybrid beamforming design balances beamforming gain (to overcome path loss) against power consumption and hardware cost in mmWave mMIMO communication systems. A common rule of thumb in mMIMO systems is M/K > 10 (where M is the number of antennas and K is the number of users), so that user channels are likely to be nearly orthogonal, which maximizes efficiency. The computational complexity of hybrid beamforming for mmWave communication systems can be reduced by using fewer analog RF chains (compared to the number of users), while the performance remains close to that of optimal (fully) digital beamformers [1]. As per the 3GPP standard, 5G wireless technologies use frequencies in the FR2 band, as shown in Table 1, for short-range communication at high data rates [2].

Table 1: 5G NR operating frequencies in FR2 frequency bands [3]

MU-mMIMO wireless communication networks use SDMA to obtain multiplexing gains by serving multiple UEs with the same T-R resources, which gives substantial improvements in system throughput. A MU-mMIMO system improves spectral efficiency because it allows the BS Tx to communicate with many UE Rxs simultaneously using the same T-R resources. However, the challenge in MU-mMIMO systems is designing the transmit vectors while accounting for co-channel interference from other users. In UE-dense scenarios, the MU-MIMO dimensions are increased to fully exploit the spatial multiplexing capabilities. In these scenarios, it is challenging to distinguish UEs in the spatial domain because the number of paired users is large. mMIMO increases spatial resolution with a larger number of narrow beams and gives a high degree of freedom for MU pairing. It allows the number of BS antenna elements to be on the order of tens to hundreds, thereby also greatly increasing the number of data streams in a cell.

The MU-mMIMO hybrid beamforming structure shown in Fig. 1 divides beamforming between the RF analog domain and the baseband digital domain, with tradeoffs in cost, complexity and flexibility [4]. In the RF analog domain, beamforming is achieved by applying a phase shift to each antenna element in the antenna subarray. In the baseband digital domain, beamforming is achieved by using the channel matrix to derive precoding and combining weights that help to transmit and recover multiple data streams independently over a single channel.

Fig. 1: MU-mMIMO hybrid beamforming system at the transmitter

2 Literature review

Hybrid beamforming designs were introduced to reduce the training overhead and hardware cost in mMIMO systems. Hybrid beamforming can be classified by CSI (average or instantaneous), carrier frequency (mmWave) and complexity (reduced, full or switched complexity). The choice of algorithm that gives the best tradeoff between these parameters depends on the channel characteristics and the application [5]. Acquiring precise and accurate CSI for MU-mMIMO systems is a challenge at mmWave frequencies because the number of BS antennas is high. Accurate channel estimation in MU-mMIMO can be achieved using a joint-iterative scheme based on step-length optimization [6]. A beamforming neural network (based on deep learning) for mmWave mMIMO systems can optimize the beamforming design, and it is robust, with higher spectral efficiency than traditional beamforming algorithms, in the presence of imperfect CSI and hardware constraints [7]. For a given number of BS antenna elements, there is an optimal number of simultaneously scheduled UEs that gives maximum spectral efficiency in mMIMO systems. The spectral efficiency can be the same for DL and UL (which allows joint network optimization), and it is independent of the instantaneous UE locations [8]. A multitask deep learning (MTDL)-based MU hybrid beamforming algorithm for mmWave mMIMO OFDM systems can give a better sum-rate and lower run-time than traditional algorithms [9]. A mmWave UL MU-mMIMO system can use a lens-type antenna array at the BS (two-dimensional, covering both elevation and azimuth angles) and a uniform planar array at the MS, based on the arrival/departure angles of the multi-path signals. A “path delay compensation” technique at the BS transforms the MU-MIMO frequency-selective channels into parallel, smaller, frequency-flat MIMO channels at lower hardware cost and with increased sum-rates [10]. An effective channel estimation scheme has been proposed for the time-varying DL channel of mmWave MU-MIMO systems, based on the angles of arrival/departure and using a minimum number of pilots [11].
A low-complexity single-cell DL MU-mMIMO hybrid beamforming system with perfect CSI gives a sum-rate that approaches the ideal channel capacity [12]. For a given number of RF chains, the performance gap between hybrid and digital beamforming can be reduced by minimizing the number of multiplexed symbols [13]. Clustering- and feedback-based hybrid beamforming for DL mmWave MU-MIMO NOMA systems gives higher sum-rates than OMA systems [14]. A blind MU detection algorithm that uses a “Markov random field” to model clustering sparsity and estimate the mMIMO channel performs better than systems that do not exploit the clustering sparsity of the channel [15]. Manifold optimization, eigenvalue decomposition and OMP algorithms used in designing hybrid beamforming for broadband mmWave MIMO systems offer BER and spectral efficiency close to those of fully digital beamforming designs [16]. As the SNR or the number of RF chains increases in a mmWave MU-mMIMO system, optimal hybrid precoding and combining schemes using the OMP algorithm give performance close to fully digital precoding in terms of total sum-rate [17]. Energy efficiency in MU-mMIMO systems is inversely proportional to the number of RF chains. Using optimal RF and baseband precoding matrices can improve energy and cost efficiency by 76.59% compared to the OMP algorithm [18]. A generalized block OMP algorithm for channel estimation in mmWave MU-MIMO systems uses different strategies for constructing pilot signals and beamforming weights, and this scheme outperforms existing channel estimation algorithms, including OMP [19]. A “distributed compressive sensing” method can decrease the feedback overhead and the training for CSI estimation at the UE, while a “joint OMP algorithm” performs CSI recovery at the BS of a MU-mMIMO system [20]. An optimal hybrid beamforming scheme for MU-mMIMO relay systems with mixed and fully connected structures, based on “successive interference cancelation”, maximizes the sum-rate [21].
An efficient hybrid beamforming technique for relay-assisted mmWave MU-mMIMO systems, based on the “Geometric Mean Decomposition Tomlinson Harashima Precoding” algorithm, can give performance close to fully digital beamforming [13]. A channel estimation scheme called “generalized-block compressed sampling matching pursuit” for mmWave MU-MIMO systems over frequency-selective fading channels can offer better performance than the OMP algorithm [22]. An optimal hybrid beamforming design has been proposed to minimize the Tx power under SINR constraints in a MU-mMIMO system, both when the number of UEs is less than and when it is greater than the number of RF chains. It gives Tx powers close to those of a fully digital beamforming design [23]. A low-complexity “hybrid regularized channel diagonalization” scheme that combines analog beamforming and digital precoding for mmWave MU-mMIMO systems performs better than conventional block-diagonalization-based hybrid beamforming designs, even in the presence of low-resolution RF phase shifters [24]. The “hybrid beamforming with selection” scheme decreases the computational and hardware cost of small-to-medium bandwidth MU-mMIMO systems with moderately frequency-selective channels [25]. Hybrid precoders and combiners for DL frequency-selective channels are configured in a mmWave MU-MIMO system using factorization and an iterative hybrid design. The BS simultaneously estimates the channels from all UEs on each subcarrier using a compressed sensing scheme to reduce the number of measurements [26]. The nonconvex hybrid precoding problem in mmWave MU-MIMO systems is addressed using the “penalty dual decomposition” method under the assumption of perfect CSI. It uses fewer RF chains, yet its performance is close to that of fully digital beamforming [27]. The “Atomic Norm Minimization” method, which is based on a continuous channel representation, is used for accurate, low-complexity channel estimation and improved spectral efficiency in mmWave MU-MIMO systems [13].
The spectral efficiency of a MU-mMIMO hybrid beamforming system can be improved by using a low-complexity manifold optimization algorithm, and its efficiency is close to that of fully digital beamforming designs [13]. For accurate channel estimation with minimum training overhead and RF chains (compared to the beamspace approach), a MU-mMIMO hybrid beamforming system can be viewed as non-orthogonal angle division multiple access that simultaneously serves multiple users in the same frequency band [28]. To improve the throughput of mmWave MU-mMIMO OFDM systems (to close to that of fully digital beamforming), a hybrid beamforming with user scheduling algorithm is defined for the DL, in which the BS allocates frequency resources to the members of an OFDM user group (users with identical strongest beams). Analog beamforming vectors are used to find the optimal beam of each user, and digital beamforming is used to obtain the best performance gain (by decreasing the residual inter-user interference) [29]. A low-computational-complexity hybrid beamforming scheme for the mmWave MU-MIMO UL channel reduces inter-user interference, and its performance is close to that of the corresponding fully digital beamforming design [30]. Optimal decoupled designs for the analog precoder and combiner of a DL MU-FDD mMIMO hybrid beamforming system are defined by selecting the strongest eigen-beams of the receive covariance matrix with limited instantaneous CSI. Simulation results showed the need for second-order channel statistics in designing the digital precoder to reduce intergroup interference [31]. A coordinated RF beamforming technique for mmWave mMIMO systems based on “Generalized Low Rank Approximation of Matrices” needs only composite CSI instead of the complete physical channel matrix. This technique provides a competitive solution by considering the coordination between the BS and the UEs to obtain maximal array gain with no dimensionality constraint in both TDD and FDD systems [13].
Hybrid beamforming has been designed for a mmWave MU-mMIMO relay system to enhance the sum-rate (by decreasing the sum MSE between the received signals of the digital and hybrid beamforming designs) using digital beamforming. The total sum-rate increases with the accuracy of the angles of arrival/departure and with the number of RF chains [32]. Precoder and combiner algorithms that approach the optimal unconstrained designs have been developed for mmWave mMIMO systems so that they are feasible to implement in low-cost analog RF hardware. Numerical results for the proposed algorithms [33] showed that the spectral efficiency of mmWave systems with transceiver hardware constraints approaches the unconstrained performance limits. Low-complexity phased-ZF hybrid precoding is applied in the RF domain (to obtain large power gains) and low-dimensional ZF precoding is used in the baseband domain (for multi-stream processing) in mmWave MU-mMIMO systems [34]. The tradeoff between energy efficiency (in bits/J) and spectral efficiency (in bits/channel/MS) is quantified for a small-scale fading channel of MU-mMIMO systems to achieve higher spectral and energy efficiencies using ZF or MRC and UL pilot signals at the BS [35].

3 Proposed methodology

In MU-mMIMO hybrid beamforming systems, the beamforming (precoding and the corresponding combining) is performed partly in the digital baseband domain and partly in the analog RF domain, as shown in Figs. 2 and 3. The precoding and combining weights are a combination of digital weights in the baseband domain and analog weights in the RF domain. The digital weights in the baseband domain convert the incoming data streams into the input signals of each RF chain. The analog weights (phase shift values) in the RF domain then convert the signal output of each RF chain into the signal radiated by each antenna element. The data streams are modulated using the precoding weights F at the Tx and transmitted through the channel, as shown in Fig. 2. At the Rx, the data streams are recovered using the combining weights W, as shown in Fig. 3. Distributing the weights between the digital baseband domain and the analog RF domain is always a challenge: there is a trade-off between the optimality of the weights and the computational load of calculating the matrices \(\mathbf{F}_{\text{BB}}\), \(\mathbf{F}_{\text{RF}}\), \(\mathbf{W}_{\text{RF}}\) and \(\mathbf{W}_{\text{BB}}\).

Fig. 2: Block diagram of precoding stages at the transmitter in a mmWave MU-MIMO hybrid beamforming system

Fig. 3: Block diagram of precoding stages at the receiver in a mmWave MU-MIMO hybrid beamforming system

In Figs. 2 and 3,

\(N_{\text{S}}\): number of signal streams;

\(N_{\text{RF}}^{\text{T}}\): number of transmit RF chains;

\(N_{\text{T}}\): number of transmit antennas;

\(N_{\text{RF}}^{\text{R}}\): number of receive RF chains;

\(N_{\text{R}}\): number of receive antennas;

\(\mathbf{H}\): MIMO scattering channel;

\(\mathbf{F}_{\text{BB}}\): digital precoder of size \(N_{\text{S}} \times N_{\text{RF}}^{\text{T}}\);

\(\mathbf{F}_{\text{RF}}\): analog precoder of size \(N_{\text{RF}}^{\text{T}} \times N_{\text{T}}\);

\(\mathbf{W}_{\text{RF}}\): analog combiner of size \(N_{\text{R}} \times N_{\text{RF}}^{\text{R}}\);

\(\mathbf{W}_{\text{BB}}\): digital combiner of size \(N_{\text{RF}}^{\text{R}} \times N_{\text{S}}\).

In hybrid beamforming, the number of T-R modules (\(N_{\text{RF}}^{\text{T}}\)) is less than the number of antenna elements (\(N_{\text{T}}\)), and each antenna element is connected to one or more T-R modules for higher flexibility.

The mathematical representation of hybrid beamforming is as follows:

$$ \text{Precoding weights matrix: } \mathbf{F} = \mathbf{F}_{\text{BB}}\,\mathbf{F}_{\text{RF}}, \quad \text{of size } N_{\text{S}} \times N_{\text{T}} $$
(1)
$$ \text{Combining weights matrix: } \mathbf{W} = \mathbf{W}_{\text{RF}}\,\mathbf{W}_{\text{BB}}, \quad \text{of size } N_{\text{R}} \times N_{\text{S}} $$
(2)

The matrices \(\mathbf{F}_{\text{RF}}\) and \(\mathbf{W}_{\text{RF}}\) define signal phase values only, so the search for optimal precoding and combining weights is subject to constraints during the optimization process. In the ideal case, the products \(\mathbf{F}_{\text{BB}}\mathbf{F}_{\text{RF}}\) and \(\mathbf{W}_{\text{RF}}\mathbf{W}_{\text{BB}}\) are equal to the matrices F and W calculated without any constraints.
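The dimensions in Eq. (1) and the phase-only nature of the analog stage can be checked with a small numerical sketch. The array sizes and the random weights below are illustrative assumptions, not values from this paper, and the code does not implement any particular weight-design algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): streams, transmit RF chains, transmit antennas.
N_S, N_RF_T, N_T = 4, 4, 64

# Digital baseband precoder: unconstrained complex weights, N_S x N_RF^T.
F_BB = (rng.standard_normal((N_S, N_RF_T))
        + 1j * rng.standard_normal((N_S, N_RF_T))) / np.sqrt(2)

# Analog RF precoder: phase shifters only, so every entry has unit modulus.
phases = rng.uniform(0.0, 2.0 * np.pi, size=(N_RF_T, N_T))
F_RF = np.exp(1j * phases)

# Overall precoding weights, Eq. (1): F = F_BB F_RF, of size N_S x N_T.
F = F_BB @ F_RF

print(F.shape)                         # (4, 64)
print(np.allclose(np.abs(F_RF), 1.0))  # True
```

The unit-modulus constraint on `F_RF` is what prevents the product from matching an arbitrary unconstrained F in general.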

3.1 Block diagram of MU-mMIMO system

Figure 4 shows a block diagram of the complete data processing steps in a MU-mMIMO system. At the Tx, the data of multiple users is channel encoded using convolutional codes. The channel-encoded bits of each user are mapped to the equivalent complex QAM symbols. The QAM data of each user is divided into multiple transmit data streams. Digital baseband precoding is used to assign weights to the subcarriers of the transmit data streams. In this paper, the precoding weights are computed using the “hybrid beamforming with peak search (HBPS)” algorithm, as it performs better for the larger arrays of mMIMO systems, and these precoding weights are used to obtain the corresponding combining weights at the Rx. HBPS computes all the digital weights and identifies \(N_{\text{RF}}^{\text{T}}\) and \(N_{\text{RF}}^{\text{R}}\) peaks to obtain the corresponding analog beamforming weights, instead of searching iteratively for the dominant mode in the channel matrix (data streams that use the most dominant mode of the MIMO channel have higher SNR). The resulting digital signal is modulated using OFDM with pilot mapping, and RF analog beamforming is then performed for all Tx antennas. The modulated signal is transmitted through a rich-scattering MU-mMIMO channel and is demodulated and decoded at the Rx side, as shown in Fig. 4. Channel sounding and estimation are performed at the Tx and Rx, respectively, using the “joint spatial division multiplexing (JSDM)” algorithm, as it allows a large number of BS antennas with minimum CSI feedback from the UEs in a MU-mMIMO downlink channel.

Fig. 4: Data transmission and reception in a MU-mMIMO communication system

4 Analysis of MU-mMIMO system design

To perform MU-mMIMO transmission in mmWave cellular communication systems, high-dimensional channels need to be estimated in order to design the MU precoder. Digital precoding gives high performance at the cost of hardware complexity and power consumption (a larger number of RF chains and ADCs). Analog precoding, on the other hand, has lower complexity but limited performance (it supports only one data stream). Hybrid precoding for MU-mMIMO systems, shown in Fig. 5, is a combination of digital precoding and analog precoding. A hybrid precoding design reduces the number of RF chains while maintaining the spatial multiplexing gain in a mmWave MU-mMIMO system. In MU-mMIMO systems, precoding and combining techniques are used to improve the signal energy in the direction and channel of interest with the help of the channel information available at the Tx. In single-user MIMO, the benefits of an antenna array are smaller because of the lower channel rank. A MU-MIMO system, on the other hand, creates rich effective channels through the spatial separation of users.

$$ \text{The MIMO channel can be characterized as } \mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n} $$
(3)
Fig. 5: Precoding vector calculation at the BS in mmWave MU-MIMO hybrid beamforming

where \(\mathbf{y}\) is the received vector, \(\mathbf{s}\) is the transmitted vector and \(\mathbf{n}\) is the noise vector.

$$ \text{The channel matrix, } \mathbf{H} = \begin{bmatrix} h_{11} & h_{21} & \cdots & h_{M1} \\ h_{12} & h_{22} & \cdots & h_{M2} \\ \vdots & \vdots & \ddots & \vdots \\ h_{1M} & h_{2M} & \cdots & h_{MM} \end{bmatrix} $$

where \(h_{ij}\) is a complex Gaussian random variable that models the fading gain between the ith transmit antenna and the jth receive antenna.

4.1 MU-MIMO DL

If the CSI is known, diagonalization of the channel matrix (H) gives the unconstrained optimal precoding weights by taking the first \(N_{\text{RF}}^{\text{T}}\) dominant modes. Assuming that the BS has CSI at the Tx (CSIT), we can perform MU precoding, in which signals are sent to all users at the same time and frequency (T-F) while still allowing the users to recover their signals with low complexity.

$$ {\text{The}}\,{\text{DL}}\,{\text{link}}\,{\text{signal}}\,\left( {{\text{or}}\,{\text{observation}}\,{\text{vector}}} \right){\text{ for}}\,{\text{user}}\,{\text{k}},{\text{y}}_{{\text{k}}} = {\text{H}}_{{\text{k}}} {\text{x}}_{{\text{k}}} + {\text{H}}_{{\text{k}}} \sum\limits_{{{\text{a}} \ne {\text{k}}}}^{{\text{K}}} {{\text{x}}_{{\text{a}}} + {\mathbf{n}}_{{\text{k}}} } $$
(4)

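where K is the number of users, \(\mathbf{x}_k\) is the signal intended for user k, \(\mathbf{H}_k\) is the channel from the BS to user k and \(\mathbf{n}_k\) is the noise.

The second term in Eq. (4) represents the signals intended for other users.

Instead of sending the user data stream \(\mathbf{x}_k\) directly, we perform precoding as \(\mathbf{x}_k = \mathbf{W}_k\mathbf{s}_k\).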

$$ {\text{Therefore, DL}}\,{\text{link}}\,{\text{signal}}\,{\text{for}}\,{\text{user}}\,{\text{k}},{\mathbf{y}}_{{\text{k}}} = {\mathbf{H}}_{{\text{k}}} {\mathbf{W}}_{{\text{k}}} {\mathbf{s}}_{{\text{k}}} + {\mathbf{H}}_{{\text{k}}} \sum\limits_{{{\text{a}} \ne {\text{k}}}}^{{\text{K}}} {{\mathbf{W}}_{{\text{a}}} {\mathbf{s}}_{{\text{a}}} + {\mathbf{n}}_{{\text{k}}} } $$
(5)

where \(\mathbf{W}_k\) is the precoding matrix and \(\mathbf{s}_k\) contains the transmitted QAM symbols for user k.

The second term in Eq. (5) represents the precoded data for other users.

For the case where each user has only one antenna (and therefore one data stream per user), the precoding matrix \(\mathbf{W}_k\) has size \(N_t \times 1\).

$$ {\text{The}}\,{\text{scalar}}\,{\text{observation}}\,{\text{of}}\,{\text{user}}\,{\text{k}},{\text{ y}}_{{\text{k}}} = {\mathbf{h}}_{{\text{k}}}^{{\text{T}}} {\mathbf{W}}_{{\text{k}}} {\mathbf{s}}_{{\text{k}}} + {\mathbf{h}}_{{\text{k}}}^{{\text{T}}} \sum\limits_{{{\text{a}} \ne {\text{k}}}}^{K} {{\mathbf{W}}_{{\text{a}}} {\mathbf{s}}_{{\text{a}}} + {\text{ n}}_{{\text{k}}} } $$
(6)

where \(\mathbf{h}_k^{T}\) is the DL channel for user k and \(n_k\) is scalar noise.

The first term in Eq. (6) represents the effective channel seen by the user and the second term represents the interference due to other users.

$$ {\text{After}}\,{\text{considering}}\,{\text{all}}\,{\text{vectors}}\,{\text{for}}\,{\text{all}}\,{\text{users}}\,{\text{of}}\,{\text{a}}\,{\text{MU}} - {\text{MIMO}}\,{\text{system}},{\mathbf{y}} = {\mathbf{HWs}} + {\mathbf{n}} $$
(7)
$$ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix} = \begin{bmatrix} \mathbf{h}_1^{T} \\ \mathbf{h}_2^{T} \\ \vdots \\ \mathbf{h}_K^{T} \end{bmatrix} \begin{bmatrix} \mathbf{w}_1 & \mathbf{w}_2 & \cdots & \mathbf{w}_K \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_K \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \\ \vdots \\ n_K \end{bmatrix} $$

where \([y_1, y_2, \ldots, y_K]^{T}\) collects the observations of all users, the rows \(\mathbf{h}_1^{T}, \mathbf{h}_2^{T}, \ldots, \mathbf{h}_K^{T}\) are the channels of all users, \([\mathbf{w}_1\ \mathbf{w}_2\ \cdots\ \mathbf{w}_K]\) contains the precoding vectors of all users and \([s_1, s_2, \ldots, s_K]^{T}\) contains the transmitted QAM symbols of all users.

It is essential to design the precoding matrix W at the BS using schemes such as ZF or MMSE so that the performance of the MU-MIMO system is optimum. The set of users to which the BS sends data changes over time, and “user scheduling” is used to select the best K users from a large set so as to obtain a good effective channel H. Precoding is also possible with statistical CSI (such as the covariance matrix of the channel) instead of instantaneous CSI.

4.2 MU-MIMO UL

$$ {\text{At}}\,{\text{Rx, all}}\,{\text{signals}}\,{\text{are}}\,{\text{added}}\,{\text{and}}\,{\text{the}}\,{\text{observation}}\,{\text{vector}},{\text{y}} = \sum\limits_{{\text{k = 1}}}^{{\text{K}}} {{\mathbf{H}}_{{\text{k}}}^{{\text{T}}} {\mathbf{x}}_{{\text{k}}} + {\mathbf{n}}} $$
(8)

$$ \text{The UL observation vector, } \mathbf{y} = \begin{bmatrix} \mathbf{H}_1^{T} & \mathbf{H}_2^{T} & \mathbf{H}_3^{T} & \cdots & \mathbf{H}_K^{T} \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_K \end{bmatrix} + \mathbf{n} $$

Here \([\mathbf{H}_1^{T}\ \mathbf{H}_2^{T}\ \mathbf{H}_3^{T}\ \cdots\ \mathbf{H}_K^{T}]\) is a channel matrix of size \(M_r \times (K M_t)\) formed by concatenating all UL channels. Its \(M_r\) rows correspond to the \(M_r\) receive antennas at the BS, and \(M_t\) is the number of Tx antennas per user. The stacked vector of \(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_K\) contains the transmitted signals of all users and \(\mathbf{n}\) is the noise vector.

4.3 MU-MIMO precoding

Consider K users with one antenna per user and one BS with M antennas. The DL channel matrix H is then of size K × M, where each row corresponds to the DL channel of a single user.

$$ {\text{We}}\,{\text{consider}}\,{\text{ZF}}\,{\text{precoding}}\,{\text{where}}\,{\text{precoding}}\,{\text{matrix}},\,{\text{W}} = {\text{H}}^{ + } = {\text{H}}^{{\text{H}}} \left( {{\text{HH}}^{{\text{H}}} } \right)^{ - 1} $$
(9)

where \(\mathbf{H}^{+}\) is the pseudo-inverse of the channel.
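The ZF precoder of Eq. (9) can be illustrated with a short sketch that builds W for a random channel and checks that the effective channel HW is the identity, so that each user sees no interference from the others. The numbers of users and antennas and the i.i.d. Rayleigh channel are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
K, M = 4, 32   # assumed: 4 single-antenna users, 32 BS antennas

# DL channel matrix H (K x M): row k is the channel of user k.
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

# ZF precoder, Eq. (9): W = H^H (H H^H)^{-1}, of size M x K.
W = H.conj().T @ np.linalg.inv(H @ H.conj().T)

# Effective channel seen by the users: H W should equal the K x K identity.
effective = H @ W
zf_ok = np.allclose(effective, np.eye(K))

# For a full-row-rank H, this W coincides with the Moore-Penrose pseudo-inverse.
matches_pinv = np.allclose(W, np.linalg.pinv(H))

print(zf_ok, matches_pinv)
```

In practice `np.linalg.pinv` (or a linear solve) is usually preferred over an explicit inverse for numerical robustness; the explicit form is kept here only to mirror Eq. (9).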

If the channel matrix is poorly conditioned, the entries of W may be very large, which leads to high signal transmission power. Therefore, we need to enforce a constraint on the total power, \(P_{\text{Total}} = \left\| \mathbf{W} \right\|^{2}\):

$$ {\text{P}}_{{{\text{Total}}}} = {\text{trace}}\left( {{\mathbf{W}}^{{\text{H}}} {\mathbf{W}}} \right) $$
$$ {\text{P}}_{{{\text{Total}}}} = \sum\limits_{{{\text{k}} = {1}}}^{K} {p_{k} } \left[ {\left( {{\mathbf{H}}^{ + } } \right)^{{\text{H}}} {\mathbf{H}}^{ + } } \right]_{{\text{k, k}}} $$
(10)

One way to achieve this is to scale the entire precoding matrix by a sufficiently small value, which guarantees equal SNR per user. Alternatively, each column can be scaled to a fixed power, which gives a different SNR per user.

The per-user power adjustment can be achieved by transmitting \(\mathbf{H}^{+}\mathbf{D}_{p}^{1/2}\mathbf{s}\), where \(\mathbf{D}_{p}^{1/2} = \text{diag}[p_{1}^{1/2}, p_{2}^{1/2}, \ldots, p_{K}^{1/2}]\), \(p_1\) is the power assigned to user 1 and \(p_k\) is the power assigned to user k.
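As a numerical check of Eq. (10), the sketch below applies assumed per-user powers to a ZF precoder and compares the total power obtained from the trace with the weighted sum over the diagonal of \((\mathbf{H}^{+})^{H}\mathbf{H}^{+}\). The channel, the sizes and the power values are illustrative assumptions, and unit-energy uncorrelated symbols are assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
K, M = 4, 32   # assumed numbers of users and BS antennas
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
H_pinv = np.linalg.pinv(H)               # ZF precoder H^+, size M x K

p = np.array([1.0, 0.5, 2.0, 0.25])      # assumed per-user powers p_k
D_p_sqrt = np.diag(np.sqrt(p))

# Power-adjusted precoder H^+ D_p^{1/2}.
W_scaled = H_pinv @ D_p_sqrt

# Total transmit power as trace(W^H W) of the scaled precoder.
P_trace = np.trace(W_scaled.conj().T @ W_scaled).real

# Eq. (10): sum over k of p_k [(H^+)^H H^+]_{k,k}.
gram_diag = np.diag(H_pinv.conj().T @ H_pinv).real
P_sum = float(np.sum(p * gram_diag))

# With ZF, user k sees the gain sqrt(p_k): H (H^+ D_p^{1/2}) = D_p^{1/2}.
gains = np.diag(H @ W_scaled).real

print(P_trace, P_sum)
```

The two power values agree, and the per-user gains equal \(\sqrt{p_k}\), which is why unequal \(p_k\) produce unequal SNRs.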

Channel capacity (bits/s) = available spectrum (Hz) × spectral efficiency (bits/s/Hz).

With no CSIT, the MIMO channel capacity (C) is bounded as

$$ B\log_{2}\left(1 + \rho M_r\right) \le C \le B\,\min\left(M_r, M_t\right)\log_{2}\left[1 + \frac{\rho\,\max\left(M_r, M_t\right)}{M_t}\right] $$
(11)

where ρ is the SNR, B is the channel bandwidth, \(M_t\) is the number of Tx antennas and \(M_r\) is the number of Rx antennas.

The lower bound is the channel capacity of a LOS channel and the upper bound is that of a rich-scattering channel.
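To give a feel for the gap between the two bounds in Eq. (11), the following sketch evaluates them for one set of parameters. The bandwidth, SNR and antenna counts are assumed values chosen only for illustration.

```python
import math

def capacity_bounds(B_hz, snr, M_r, M_t):
    """Return the (lower, upper) capacity bounds of Eq. (11) in bits/s.

    snr is the linear (not dB) SNR.
    """
    lower = B_hz * math.log2(1.0 + snr * M_r)
    upper = B_hz * min(M_r, M_t) * math.log2(1.0 + snr * max(M_r, M_t) / M_t)
    return lower, upper

# Assumed example: 100 MHz bandwidth, 10 dB SNR, 4 Rx and 64 Tx antennas.
B_hz = 100e6
snr = 10.0
lower, upper = capacity_bounds(B_hz, snr, M_r=4, M_t=64)

print(f"{lower / 1e6:.1f} Mbit/s <= C <= {upper / 1e6:.1f} Mbit/s")
```

For these values the rich-scattering bound is roughly 2.6 times the LOS bound, because it scales with min(M_r, M_t) outside the logarithm.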

Case 1: The number of Tx antennas is very large and the channel matrix H has almost orthogonal rows, where the number of rows equals the number of Rx antennas (small) and the number of columns equals the number of Tx antennas (large).

\(\mathbf{H}\mathbf{H}^{H} \approx M_t\,\mathbf{I}_{M_r}\), a scaled identity matrix.

$$ \text{The capacity of the MIMO channel, } C = B\log_{2}\det\left(\mathbf{I}_{M_r} + \frac{\rho}{M_t}\mathbf{H}\mathbf{H}^{H}\right) \approx B\log_{2}\det\left(\mathbf{I}_{M_r} + \frac{\rho}{M_t}M_t\,\mathbf{I}_{M_r}\right) = B M_r\log_{2}\left(1 + \rho\right) $$
(12)

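where B is the bandwidth and ρ is the SNR.

Case 2: The number of Rx antennas is very large and H has almost orthogonal columns.

\(\mathbf{H}^{H}\mathbf{H} \approx M_r\,\mathbf{I}_{M_t}\) and, using the identity \(\det(\mathbf{I} + \mathbf{A}\mathbf{A}^{H}) = \det(\mathbf{I} + \mathbf{A}^{H}\mathbf{A})\),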

$$ {\text{The}}\,{\text{capacity}}\,{\text{of}}\,{\text{the}}\,{\text{channel}},{\text{ C}} = {\text{BM}}_{{\text{t}}} {\text{log}}_{2} \left( {1 +\uprho {\text{M}}_{{\text{r}}} /{\text{M}}_{{\text{t}}} } \right) $$
(13)

mMIMO can reach the upper bound of Eq. (11) under orthogonal channels (“favorable propagation conditions”).
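The two asymptotic cases can be checked numerically. The sketch below draws i.i.d. Rayleigh channels (an assumption standing in for favorable propagation), evaluates \(\log_2\det(\mathbf{I} + (\rho/M_t)\mathbf{H}\mathbf{H}^{H})\) with B normalized to 1, and compares the result with the closed forms of Eqs. (12) and (13). The antenna counts and the SNR are illustrative.

```python
import numpy as np

def rayleigh_channel(M_r, M_t, rng):
    """Draw an M_r x M_t channel with i.i.d. CN(0, 1) entries."""
    real = rng.standard_normal((M_r, M_t))
    imag = rng.standard_normal((M_r, M_t))
    return (real + 1j * imag) / np.sqrt(2.0)

def spectral_efficiency(H, snr):
    """Return log2 det(I + (snr / M_t) H H^H) in bits/s/Hz."""
    M_r, M_t = H.shape
    # det(I + A A^H) = det(I + A^H A): work with the smaller Gram matrix.
    gram = H @ H.conj().T if M_r <= M_t else H.conj().T @ H
    _, logdet = np.linalg.slogdet(np.eye(gram.shape[0]) + (snr / M_t) * gram)
    return logdet / np.log(2.0)

rng = np.random.default_rng(3)
snr = 10.0  # assumed linear SNR (10 dB)

# Case 1: M_t >> M_r, compare with M_r * log2(1 + snr), Eq. (12) with B = 1.
se_1 = spectral_efficiency(rayleigh_channel(4, 4096, rng), snr)
ref_1 = 4 * np.log2(1.0 + snr)

# Case 2: M_r >> M_t, compare with M_t * log2(1 + snr * M_r / M_t), Eq. (13).
se_2 = spectral_efficiency(rayleigh_channel(4096, 4, rng), snr)
ref_2 = 4 * np.log2(1.0 + snr * 4096 / 4)

print(f"case 1: {se_1:.2f} vs {ref_1:.2f} bits/s/Hz")
print(f"case 2: {se_2:.2f} vs {ref_2:.2f} bits/s/Hz")
```

The agreement improves as the larger array dimension grows, since the Gram matrix then concentrates around a scaled identity.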

4.4 MU-mMIMO

A MU-MIMO system alone is not practical: multiple antennas at the UEs are not cost effective, the gains are modest because of the limited number of antennas (< 10), and the signal processing is highly complex. Alternatively, mMIMO systems can have one antenna per user and hundreds of antennas at the BS. Consider one BS with M antennas and K users with one antenna each (K < M). To have favorable propagation conditions, the users should be sufficiently separated. Assume MU-MIMO at the BS with channel reciprocity (using TDD) and nearly orthogonal channels (users sufficiently separated).

$$ {\text{For}}\;{\text{UL}}\;{\text{channel, the}}\;{\text{channel}}\;{\text{matrix}}\;{\text{can}}\;{\text{be}}\;{\text{decomposed}}\;{\text{as}}\;{\text{H}} = {\text{G}}_{{{\text{M}} \times {\text{K}}}} {\text{D}}^{1/2}_{{{\text{K}} \times {\text{K}}}} $$
(14)
$$ {\text{For DL Channel}},{\mathbf{H}}^{{\text{T}}} = {\mathbf{D}}^{1/2}_{{{\text{K}} \times {\text{K}}}} {\mathbf{G}}^{{\text{T}}}_{{{\text{K}} \times {\text{M}}}} $$
(15)

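\(\mathbf{G}^{T}\mathbf{G}^{*} \approx M\,\mathbf{I}_{K}\), where K is the number of orthogonal channels, and \(\mathbf{h}_i = d_{i}^{0.5}\,\mathbf{g}_i\),

where G is the multi-path fading matrix, D is a diagonal matrix that represents the path loss per user, M is the number of antennas and \(\mathbf{h}_i\) is the ith column vector of the H matrix.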

4.5 MU-mMIMO: UL

$$ {\text{The}}\;{\text{observation}}\;{\text{model}}\;{\text{for}}\;{\text{the}}\;{\text{UL}}\;{\text{is}}\;{\text{given}}\;{\text{as}}\;{\mathbf{y}} = {\mathbf{Hx}} + {\mathbf{n}} = {\mathbf{GD}}^{0.5} {\mathbf{x}} + {\mathbf{n}} $$
(16)

H has many rows and few columns, and x is a vector (of size K × 1) that contains the signals of each of the K users.

The BS can apply the following shaper (combiner), based on the property

$$ {\mathbf{H}}^{{\text{H}}} {\mathbf{H}} = {\mathbf{D}}^{1/2} {\mathbf{G}}^{{\text{H}}} {\mathbf{GD}}^{1/2} = {\mathbf{D}}^{1/2} {\text{M}}{\mathbf{I}}_{{\text{K}}} {\mathbf{D}}^{1/2} = {\text{ M}}{\mathbf{D}} $$
(17)
$$ \text{The combiner output, } \mathbf{z} = \mathbf{H}^{H}\mathbf{y} = \mathbf{H}^{H}\mathbf{H}\mathbf{x} + \mathbf{H}^{H}\mathbf{n} = M\mathbf{D}\mathbf{x} + \mathbf{w} $$
(18)

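where \(\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, N_0 M \mathbf{D})\), \(z_k = M d_k x_k + w_k\) and \(w_k \sim \mathcal{CN}(0, N_0 M d_k)\).

The SNR of the kth user is \(\text{SNR}_k = M d_k E_x / N_0\), where \(N_0\) is the noise power and \(N_0 M d_k\) is the noise variance after combining.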

$$ {\text{Therefore,}}\,{\text{Rate}}\,{\text{for}}\,{\text{the}}\,{\text{kth}}\;{\text{user}},{\text{R}}_{{\text{k}}} = {\text{Blog}}_{2} \left( {1 + \left( {{\text{Md}}_{{\text{k}}} {\text{E}}_{{\text{x}}} } \right)/{\text{N}}_{0} } \right) $$
(19)
$$ \text{Total sum-rate at the BS is the sum of the individual rates of the } K \text{ users, } R_{\text{sum}} = \sum\limits_{k = 1}^{K} B\log_{2}\left(1 + \frac{M d_k E_x}{N_0}\right) $$
(20)

Equation (20) indicates the capacity of the system. This means that, with an asymptotically large number of antennas, simple linear combining at the BS gives optimal results even in the presence of multi-user interference.
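The UL result can be illustrated with a small simulation sketch: for a large array, \(\mathbf{H}^{H}\mathbf{H}\) is close to \(M\mathbf{D}\) as in Eq. (17), and the per-user rates of Eq. (19) add up to the sum-rate. The array size, path-loss coefficients, bandwidth, symbol energy and noise power below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
M, K = 2048, 4                       # assumed BS antennas and users
B_hz, E_x, N0 = 100e6, 1.0, 1.0      # assumed bandwidth, symbol energy, noise power
d = np.array([1.0, 0.5, 0.25, 0.1])  # assumed path-loss coefficients d_k

# H = G D^{1/2}, Eq. (14): G has i.i.d. CN(0, 1) entries (M x K).
G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
H = G @ np.diag(np.sqrt(d))

# Eq. (17): H^H H is close to M D when M >> K.
gram = H.conj().T @ H
target = M * np.diag(d)
rel_err = np.linalg.norm(gram - target) / np.linalg.norm(target)

# Eq. (19): per-user rates; their sum is the total sum-rate at the BS.
snr_k = M * d * E_x / N0
R_k = B_hz * np.log2(1.0 + snr_k)
R_sum = R_k.sum()

print(f"relative deviation from M*D: {rel_err:.3f}")
print(f"sum-rate: {R_sum / 1e9:.2f} Gbit/s")
```

The residual deviation from \(M\mathbf{D}\) shrinks roughly as \(1/\sqrt{M}\), which is the sense in which matched-filter combining becomes interference-free for large arrays.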

4.6 MU-mMIMO: DL

The DL is a little more complicated because the BS needs to do additional processing (precoding) so that users do not see interference from other users.

$$ {\text{Observation}}\,{\text{model}}\;{\text{for}}\;{\text{DL}}\;{\text{with}}\;{\text{M}} \times {\text{K}}\;{\text{precoding}}\;{\text{matrix}}\;{\text{is}}\;{\text{given}}\;{\text{as}},{\mathbf{y}} = {\mathbf{H}}^{{\mathbf{T}}} {\mathbf{Ws}} \, + \, {\mathbf{n}} $$
(21)

where \(\mathbf{H}^{T}\) is the downlink channel, W is a suitable precoding matrix, n is the vector of noise seen by the users and s is a vector of size K × 1 that contains the data for each of the users.

From the MU-MIMO analysis, a good precoder is the pseudo-inverse of the channel matrix.

$$ {\text{Therefore, we}}\,{\text{choose}}\,{\text{precoding}}\,{\text{of}}\,{\text{the}}\,{\text{form}}\,{\mathbf{W}} = {\mathbf{H}}^{*} \surd {\mathbf{D}}_{{\text{p}}} /\surd {\text{M}} $$
(22)

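where \(\mathbf{H}^{*}\) is the complex conjugate of the channel, \(\sqrt{M}\) is a scaling factor and \(\mathbf{D}_p\) is a diagonal matrix of per-user powers, so that \(\sqrt{\mathbf{D}_p}\) contains their square roots.

To meet the power constraint, the power allocation \(\mathbf{D}_p\) must ensure \(\left\| \mathbf{W} \right\|^{2} = \text{trace}(\mathbf{W}^{H}\mathbf{W}) = P_{\text{Total}}\).

We find that \(\mathbf{W}^{H}\mathbf{W} = \sqrt{\mathbf{D}_p}\,\mathbf{H}^{T}\mathbf{H}^{*}\sqrt{\mathbf{D}_p}/M \approx \sqrt{\mathbf{D}_p}\,\mathbf{D}\,\sqrt{\mathbf{D}_p} = \mathbf{D}_p\mathbf{D}\),

where \(\mathbf{D}_p\) has the powers along its diagonal and D has the path loss values along its diagonal.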

$$ \begin{aligned} \text{The observations seen across different users, } \mathbf{y} & = \mathbf{H}^{T}\mathbf{W}\mathbf{s} + \mathbf{n} \\ & = \mathbf{H}^{T}\mathbf{H}^{*}\sqrt{\mathbf{D}_p}\,\mathbf{s}/\sqrt{M} + \mathbf{n} \\ & \approx \sqrt{M}\,\mathbf{D}\sqrt{\mathbf{D}_p}\,\mathbf{s} + \mathbf{n} \\ \end{aligned} $$
(23)
$$ \begin{aligned} \text{At user } k \text{, the observation is } y_k & = \sqrt{M}\,d_k\sqrt{\left[\mathbf{D}_p\right]_{k,k}}\,s_k + n_k \\ \text{SNR}_k & = M d_k^{2}\left[\mathbf{D}_p\right]_{k,k}E_{s,k}/N_0 \\ \end{aligned} $$
(24)

The channel in the DL for user i is \(\mathbf{h}_{i}^{T} = \sqrt{d_{i}}\,\mathbf{g}_{i}^{T}\), where \(\mathbf{h}_{i}^{T}\) is a row vector of length M.

We conclude that, for the MU-mMIMO DL channel, the matched filter is an asymptotically optimal linear precoder. This means that with simple linear processing at the BS, it is possible to remove all MU-interference in the DL among all users.
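
The asymptotic argument rests on the user channels becoming mutually orthogonal as M grows. The sketch below checks this numerically for the small-scale fading vectors, assuming i.i.d. CN(0, 1) entries (an assumption for illustration; the results section uses a scattering-based channel instead). It forms the normalized Gram matrix of the user channels and reports the largest off-diagonal magnitude, which is the residual inter-user interference seen by a matched-filter precoder.

```python
import random

def normalized_gram(num_antennas, num_users, rng):
    """Return the K x K matrix with entries g_i^T g_j^* / M for i.i.d. CN(0, 1) g_i."""
    sigma = 0.5 ** 0.5  # std per real dimension so that E|g|^2 = 1
    g = [[complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
          for _ in range(num_antennas)] for _ in range(num_users)]
    return [[sum(a * b.conjugate() for a, b in zip(g[i], g[j])) / num_antennas
             for j in range(num_users)] for i in range(num_users)]

def max_offdiag(gram):
    """Largest magnitude among the off-diagonal (interference) entries."""
    k = len(gram)
    return max(abs(gram[i][j]) for i in range(k) for j in range(k) if i != j)

rng = random.Random(0)  # fixed seed for repeatability
for m in (16, 256, 4096):
    gram = normalized_gram(m, 4, rng)
    print(f"M = {m:4d}: max off-diagonal = {max_offdiag(gram):.4f}, "
          f"first diagonal = {abs(gram[0][0]):.4f}")
```

The off-diagonal terms shrink roughly as 1/√M while the diagonal stays close to one, which is why no explicit matrix inversion is needed in the large-M regime.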

5 Results analysis

According to 3GPP, the 28 GHz frequency band is recommended for MU-mMIMO in future mmWave wireless communication systems [36]. The mmWave MU-mMIMO communication link between the BS and the UEs is validated using a scattering-based MIMO spatial channel model with a “single-bounce ray tracing” approximation. This model considers UEs at different T-R spatial locations and multiple randomly placed scatterers. A single channel is used for both sounding and data transmission, and the model allows path loss modelling in LOS and non-LOS scenarios, which are closer to real deployments. The channel matrix is updated periodically to mimic variations of the MIMO channel over time. The radiation patterns of the antenna arrays are isotropic, with rectangular or linear geometry. The simulations were performed for MU-mMIMO systems of up to 256 × 16 with four users and eight users, using the parameters shown in Tables 2 and 3. At the Tx end, a 256-element rectangular antenna array is used with 4 RF chains, and at the Rx, a 16-element square array is used with 4 RF chains. Therefore, each antenna element is connected to 4 phase shifters, one per RF chain. To maximize the spectral efficiency of MU-mMIMO, each user is assigned independent channels and each RF chain is used to send an independent data stream, so a maximum of 4 streams is supported. The MU-mMIMO Rx at the UE of each user is modeled by compensating for path loss and thermal noise.
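
The hardware bookkeeping implied by this configuration can be sketched as follows. The class `HybridArrayConfig` is hypothetical and assumes a fully connected analog network (every element has one phase shifter per RF chain), which is how the description above reads.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HybridArrayConfig:
    num_elements: int   # antenna elements in the array
    num_rf_chains: int  # RF chains feeding the analog phase-shifter network

    @property
    def num_phase_shifters(self) -> int:
        # Fully connected assumption: one phase shifter per (element, RF chain) pair.
        return self.num_elements * self.num_rf_chains

    @property
    def max_streams(self) -> int:
        # Each RF chain carries at most one independent data stream.
        return self.num_rf_chains

bs = HybridArrayConfig(num_elements=256, num_rf_chains=4)  # Tx array
ue = HybridArrayConfig(num_elements=16, num_rf_chains=4)   # Rx array

print("BS phase shifters:", bs.num_phase_shifters)
print("UE phase shifters:", ue.num_phase_shifters)
print("Max streams per link:", min(bs.max_streams, ue.max_streams))
```

A fully digital 256-element transmitter would need 256 RF chains; the hybrid design trades them for 4 RF chains plus a phase-shifter network.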

Table 2 Simulation parameters for MU-mMIMO hybrid beamforming system
Table 3 Configuration of OFDM parameters

Figures 6, 7, 8, 9, 10 and 11 show the RMS EVM values in four-user and eight-user MU-mMIMO systems for the 16-QAM, 64-QAM and 256-QAM modulation schemes with different numbers of BS antennas. It is observed that users with only one data stream have a much higher RMS EVM than users with multiple data streams. For a given modulation scheme, the RMS EVM of users with a single data stream decreases as the number of BS antennas is increased. For users with more data streams, the lowest RMS EVM values are achieved with 128 BS antennas, and the error values increase very slightly with 256 BS antennas.
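
For reference, the RMS EVM metric can be computed from the equalized symbols and the ideal constellation points as in the minimal sketch below. The normalization by average reference power is a common convention and is assumed here; the text does not state which normalization the simulations use, and the function name `rms_evm_percent` is hypothetical.

```python
import math

def rms_evm_percent(received, reference):
    """RMS EVM in percent, normalized to the average reference symbol power."""
    if len(received) != len(reference) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    error_power = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    reference_power = sum(abs(s) ** 2 for s in reference)
    return 100.0 * math.sqrt(error_power / reference_power)

# Toy example: four QPSK-like points, each shifted by 0.1 along the real axis.
reference = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
received = [s + 0.1 for s in reference]
print(f"RMS EVM = {rms_evm_percent(received, reference):.2f} %")  # about 7.07 %
```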

Fig. 6 Representation of RMS EVM values using 16-QAM scheme with multiple BS antennas for 4 users

Fig. 7 Representation of RMS EVM values using 64-QAM scheme with multiple BS antennas for 4 users

Fig. 8 Representation of RMS EVM values using 256-QAM scheme with multiple BS antennas for 4 users

Fig. 9 Representation of RMS EVM values using 16-QAM scheme with multiple BS antennas for 8 users

Fig. 10 Representation of RMS EVM values using 64-QAM scheme with multiple BS antennas for 8 users

Fig. 11 Representation of RMS EVM values using 256-QAM scheme with multiple BS antennas for 8 users

Figures 12, 13, 14, 15, 16 and 17 show the RMS EVM values in four-user and eight-user MU-mMIMO systems with 64, 128 and 256 BS antennas, respectively, for various modulation schemes. The following observations are made from these figures. For users with a single data stream, the RMS EVM decreases at higher modulation orders and with an increasing number of BS antennas. As the number of BS antennas is increased from 64 to 256, the RMS EVM falls much faster for users with a single data stream than for users with more data streams. For users with more data streams, the best performance, with the lowest RMS EVM values, is achieved with 128 BS antennas. RMS EVM values are very slightly higher for 256 BS antennas than for 64 BS antennas. Interestingly, for a given number of BS antennas there is almost no change in RMS EVM values when the modulation order is increased from 4 to 8.

Fig. 12 Representation of RMS EVM values using various modulation schemes with 64 BS antennas for 4 users

Fig. 13 Representation of RMS EVM values using various modulation schemes with 128 BS antennas for 4 users

Fig. 14 Representation of RMS EVM values using various modulation schemes with 256 BS antennas for 4 users

Fig. 15 Representation of RMS EVM values using various modulation schemes with 64 BS antennas for 8 users

Fig. 16 Representation of RMS EVM values using various modulation schemes with 128 BS antennas for 8 users

Fig. 17 Representation of RMS EVM values using various modulation schemes with 256 BS antennas for 8 users

Figures 18, 19 and 20 show the equalized symbol constellation per data stream in the MU-mMIMO system for different combinations of modulation scheme and number of BS antennas. In the constellation diagrams, the variance of the recovered streams is larger for users with fewer independent streams. This is due to the absence of dominant modes in the channel, which causes poor SNR. Across all combinations, the receive constellation diagrams show clearly that the variance of the recovered data streams is much lower for users with multiple data streams, as the symbol points are tightly clustered with little dispersion. The reason is that these data streams use the most dominant modes of the scattering MIMO channel and therefore have higher SNR. In contrast, the symbol points in the equalized constellations of users with a single data stream are highly dispersed, reflecting poor SNR values.
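
The link between constellation dispersion and SNR can be illustrated with a simple additive-noise experiment. The sketch below is not the scattering-channel simulation used for the figures: it assumes unit-power 16-QAM symbols corrupted only by complex Gaussian noise, in which case the RMS EVM (normalized to average reference power) is close to 1/√SNR. All function names are hypothetical.

```python
import math
import random

def qam16_symbols(count, rng):
    """Draw 16-QAM symbols with unit average power, uniformly at random."""
    levels = (-3, -1, 1, 3)
    scale = 1 / math.sqrt(10)  # the unscaled 16-QAM grid has average energy 10
    return [scale * complex(rng.choice(levels), rng.choice(levels))
            for _ in range(count)]

def add_noise(symbols, snr_db, rng):
    """Add complex Gaussian noise for the given SNR, assuming unit signal power."""
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)  # std per real dimension
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in symbols]

def rms_evm(received, reference):
    """RMS EVM as a ratio, normalized to the average reference power."""
    err = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    ref = sum(abs(s) ** 2 for s in reference)
    return math.sqrt(err / ref)

rng = random.Random(0)  # fixed seed for repeatability
tx = qam16_symbols(20000, rng)
for snr_db in (10, 20, 30):
    evm = rms_evm(add_noise(tx, snr_db, rng), tx)
    print(f"SNR {snr_db} dB: EVM {100 * evm:.1f} %, "
          f"1/sqrt(SNR) {100 * 10 ** (-snr_db / 20):.1f} %")
```

Under this simplified model, a stream carried on a weak channel mode (low SNR) shows proportionally larger symbol scatter, consistent with the qualitative behaviour described above.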

Fig. 18 Equalized symbol constellation per data stream of 16-QAM with 64 BS antennas for 8 users

Fig. 19 Equalized symbol constellation per data stream of 64-QAM with 64 BS antennas for 4 users

Fig. 20 Equalized symbol constellation per data stream of 256-QAM with 64 BS antennas for 8 users

Figures 21, 22 and 23 show the signal radiation patterns in MU-mMIMO wireless systems with different numbers of BS antennas. The stronger lobes of the 3D response pattern in mMIMO designs represent the distinct data streams of the users. These lobes indicate the spread achieved by hybrid beamforming. From these figures, it is clear that the radiation beams become sharper as the number of BS antennas increases, which increases the reliability of the signal and thereby the throughput.
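
The sharpening of the beams with array size can be reproduced with a textbook array-factor calculation. The sketch below is a simplification: it uses a uniform linear array with half-wavelength spacing steered to broadside rather than the rectangular arrays of the simulations, and the element counts 8, 16 and 32 are illustrative per-axis sizes. It scans the main lobe numerically to find the half-power (3 dB) beamwidth.

```python
import math

def array_factor(num_elements, theta_rad, spacing_wavelengths=0.5):
    """Normalized array factor magnitude of a broadside uniform linear array."""
    psi = 2 * math.pi * spacing_wavelengths * math.sin(theta_rad)
    den = num_elements * math.sin(psi / 2)
    if abs(den) < 1e-12:
        return 1.0  # limit value at the main-lobe peak
    return abs(math.sin(num_elements * psi / 2) / den)

def half_power_beamwidth_deg(num_elements, step_deg=0.001):
    """Scan away from broadside until the main lobe drops by 3 dB; return full width."""
    threshold = 1 / math.sqrt(2)
    theta = 0.0
    while array_factor(num_elements, math.radians(theta)) >= threshold:
        theta += step_deg
    return 2 * theta

for n in (8, 16, 32):
    print(f"N = {n:2d}: half-power beamwidth ~ {half_power_beamwidth_deg(n):.2f} deg")
```

Doubling the number of elements along an axis roughly halves the beamwidth along that axis, which matches the qualitative trend seen in the radiation patterns.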

Fig. 21 Signal radiation pattern of 16-QAM with 64 BS antennas for 8 users

Fig. 22 Signal radiation pattern of 16-QAM with 128 BS antennas for 8 users

Fig. 23 Signal radiation pattern of 16-QAM with 256 BS antennas for 8 users

6 Conclusions and future scope

A mmWave DL MU-mMIMO hybrid beamforming communication system is designed with multiple independent data streams per user. From the overall results, it is observed that RMS EVM values are higher for users with fewer independent data streams and lower for users with more independent data streams. Therefore, increasing the number of data streams per user decreases the RMS EVM. For a given modulation scheme, the RMS EVM decreases as the number of BS antennas increases. This gives a trade-off between the number of Tx/Rx antenna elements and the number of data streams per user. It is concluded that if the user data is divided into more parallel data streams, fewer active antenna elements are required to transmit the signals. From the simulation results, it is concluded that a 256 × 16 MIMO system is more suitable for eight users and a 128 × 16 MIMO system for four users with multiple data streams. It is strongly recommended to use a higher number of independent data streams per user in mmWave MU-mMIMO systems to achieve higher throughput. As a future research direction, MU-mMIMO hybrid beamforming designs that reduce the RMS EVM for users with a single (or few) independent data streams while achieving higher throughput are of great interest.