1. Introduction
Nowadays, there is still a need for statistical models capable of extracting as much information as possible from the data, so that it can be communicated and put to use. This is particularly the case in engineering, economics, biological studies and environmental sciences. For this reason, several generations of statisticians have concentrated their efforts on improving the desirable properties of the probability distributions at the basis of these models, through various kinds of extensions or generalizations. In this regard, sophisticated mathematical modifications have emerged, and modern developments in computing have encouraged their practical use. A classical strategy consists of adding scale or shape parameter(s), possibly through the use of special functions (beta, gamma, hypergeometric, etc.), with the aim of making the former distribution more flexible with respect to some important modeling aspects (mean, variance, tails of the distribution, skewness, kurtosis, etc.). Thus, new families of continuous distributions have been proposed, including those developed in the following short list of references: [1,2,3,4,5,6,7,8,9,10].
In this study, a hybrid family of continuous distributions is constructed on the basis of the so-called transmuted-G and odd Fréchet-G families. The main motivations behind this family are presented below. First of all, the transmuted-G (T-G) family by [9] is defined by the cumulative distribution function (cdf) and probability density function (pdf) given by

$$F_{T}(x;\lambda,\boldsymbol{\xi})=(1+\lambda)\,G(x;\boldsymbol{\xi})-\lambda\,G(x;\boldsymbol{\xi})^{2},\quad x\in\mathbb{R}, \tag{1}$$

and

$$f_{T}(x;\lambda,\boldsymbol{\xi})=g(x;\boldsymbol{\xi})\left[1+\lambda-2\lambda\,G(x;\boldsymbol{\xi})\right],\quad x\in\mathbb{R}, \tag{2}$$

respectively, where $\lambda\in[-1,1]$ (allowing negative values for $\lambda$), $G(x;\boldsymbol{\xi})$ and $g(x;\boldsymbol{\xi})$ are the cdf and pdf of a baseline continuous distribution, respectively, with $\boldsymbol{\xi}$ as parameter vector. The definition of $F_{T}(x;\lambda,\boldsymbol{\xi})$ is based on the concept of quadratic rank transmutation as described in [9]. As a first remark, one can notice that the cdf of the T-G family can be written as a two-component mixture: one component is the baseline cdf (obtained for $\lambda=0$) and the other is the exponentiated-G cdf (see [5]) with power parameter two (obtained for $\lambda=-1$). Numerous studies have shown that the simple polynomial structure behind the T-G family can improve the desirable characteristics of the baseline distribution and make the choice of the baseline distribution less determinant (see [11] (Introduction), and the references therein). In addition, the T-G family has been used to generalize or extend other existing families. For notable studies in this regard, we refer the reader to the transmuted exponentiated generalized-G family by [12], new transmuted-G family by [13], generalized transmuted-G family by [14], transmuted Weibull-G family by [15], transmuted odd Lindley-G family by [16], generalized transmuted-G family by [17], transmuted Gompertz-G family by [18], T transmuted-X family by [19], transmuted transmuted-G family by [20] and transmuted generalized odd generalized exponential-G family by [21], among others.
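The mixture remark can be checked numerically. The following Python sketch (an illustration, not the authors' code) implements Equations (1) and (2) for a baseline passed as a pair of functions; the unit-rate exponential baseline and the value λ = 0.5 used in the derivative check are arbitrary choices made only for this example.

```python
import numpy as np


def tg_cdf(x, lam, base_cdf):
    """Transmuted-G cdf, Equation (1): (1 + lam) G(x) - lam G(x)^2, lam in [-1, 1]."""
    g = base_cdf(x)
    return (1.0 + lam) * g - lam * g ** 2


def tg_pdf(x, lam, base_cdf, base_pdf):
    """Transmuted-G pdf, Equation (2): g(x) [1 + lam - 2 lam G(x)]."""
    return base_pdf(x) * (1.0 + lam - 2.0 * lam * base_cdf(x))


# Illustrative baseline (an assumption for this sketch): unit-rate exponential.
exp_cdf = lambda x: -np.expm1(-x)
exp_pdf = lambda x: np.exp(-x)

x = np.linspace(0.1, 5.0, 50)
# lam = 0 recovers the baseline cdf; lam = -1 gives the exponentiated-G cdf G^2.
same_as_baseline = np.allclose(tg_cdf(x, 0.0, exp_cdf), exp_cdf(x))
same_as_squared = np.allclose(tg_cdf(x, -1.0, exp_cdf), exp_cdf(x) ** 2)

# The pdf should match a central finite difference of the cdf.
h = 1e-6
fd = (tg_cdf(x + h, 0.5, exp_cdf) - tg_cdf(x - h, 0.5, exp_cdf)) / (2 * h)
pdf_matches = np.allclose(fd, tg_pdf(x, 0.5, exp_cdf, exp_pdf), atol=1e-6)
print(same_as_baseline, same_as_squared, pdf_matches)  # True True True
```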
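In parallel to these modern transmuted-G families, [7] proposed the odd Fréchet-G (OFr-G) family, a new and simple family using the Fréchet distribution as main generator. More precisely, it is based on the cdf and pdf given by

$$F_{OFr}(x;\theta,\boldsymbol{\xi})=\exp\left\{-\left[\frac{1-G(x;\boldsymbol{\xi})}{G(x;\boldsymbol{\xi})}\right]^{\theta}\right\},\quad x\in\mathbb{R}, \tag{3}$$

and

$$f_{OFr}(x;\theta,\boldsymbol{\xi})=\theta\,g(x;\boldsymbol{\xi})\,\frac{[1-G(x;\boldsymbol{\xi})]^{\theta-1}}{G(x;\boldsymbol{\xi})^{\theta+1}}\exp\left\{-\left[\frac{1-G(x;\boldsymbol{\xi})}{G(x;\boldsymbol{\xi})}\right]^{\theta}\right\},\quad x\in\mathbb{R}, \tag{4}$$

respectively, where $\theta>0$ (a shape parameter), $G(x;\boldsymbol{\xi})$ and $g(x;\boldsymbol{\xi})$ are the cdf and pdf of a baseline continuous distribution with $\boldsymbol{\xi}$ as parameter vector, respectively. It is shown in [7] that the OFr-G family is easily applicable for modeling purposes. See also [22], where a special member of the OFr-G family, called the odd Fréchet inverse exponential distribution, is applied with success. The OFr-G family has also motivated several notable extensions and generalizations, such as [23], introducing the extended odd Fréchet family, [24], developing the Fréchet Topp Leone-G family, [25], for the generalized odd inverted-exponential-G family, and [26], introducing the extended odd Fréchet-G family. However, all these families are based on thorough transformations of the odds function $G(x;\boldsymbol{\xi})/[1-G(x;\boldsymbol{\xi})]$; none of them investigates a simple and direct modification of $F_{OFr}(x;\theta,\boldsymbol{\xi})$ itself. As highlighted in the previous paragraph about the T-G family, a natural idea is to investigate the tunable quadratic rank transmutation. To the best of our knowledge, this direction of work remains new and promising in view of the respective qualities of the T-G and OFr-G families. We thus introduce the transmuted odd Fréchet-G (TOFr-G) family, defined by the cdf and pdf given by Equations (1) and (2) with $G(x;\boldsymbol{\xi})$ replaced by $F_{OFr}(x;\theta,\boldsymbol{\xi})$ in Equation (3) and $g(x;\boldsymbol{\xi})$ replaced by $f_{OFr}(x;\theta,\boldsymbol{\xi})$ in Equation (4), i.e.,

$$F(x;\lambda,\theta,\boldsymbol{\xi})=(1+\lambda)\exp\left\{-\left[\frac{1-G(x;\boldsymbol{\xi})}{G(x;\boldsymbol{\xi})}\right]^{\theta}\right\}-\lambda\exp\left\{-2\left[\frac{1-G(x;\boldsymbol{\xi})}{G(x;\boldsymbol{\xi})}\right]^{\theta}\right\},\quad x\in\mathbb{R}, \tag{5}$$

and

$$f(x;\lambda,\theta,\boldsymbol{\xi})=\theta\,g(x;\boldsymbol{\xi})\,\frac{[1-G(x;\boldsymbol{\xi})]^{\theta-1}}{G(x;\boldsymbol{\xi})^{\theta+1}}\exp\left\{-\left[\frac{1-G(x;\boldsymbol{\xi})}{G(x;\boldsymbol{\xi})}\right]^{\theta}\right\}\left[1+\lambda-2\lambda\exp\left\{-\left[\frac{1-G(x;\boldsymbol{\xi})}{G(x;\boldsymbol{\xi})}\right]^{\theta}\right\}\right],\quad x\in\mathbb{R}, \tag{6}$$

respectively, where the notations of the previous paragraphs have been used. The main motivation behind the TOFr-G family is to improve the overall adaptability of the former OFr-G family through the quadratic rank transmutation, and more specifically, through the tuning of the additional parameter $\lambda$ (the OFr-G family being obtained with $\lambda=0$). In addition, this modification makes the choice of the baseline distribution less crucial; globally, the joint action of $\lambda$ and $\theta$ in Equations (5) and (6) ensures a high level of flexibility for important distributional characteristics, such as the mode(s), skewness, kurtosis, mean and variance. We illustrate this aspect by discussing in detail a special three-parameter distribution of the family defined with the (standard one-parameter) exponential model as baseline, referred to as the TOFrE distribution. A graphical analysis reveals that the corresponding probability density and hazard rate functions possess a large panel of monotonic and nonmonotonic shapes, making the model attractive for data fitting, among other uses. Additionally, by considering real data sets of interest, we show that the corresponding model fits better than the transmuted linear exponential distribution developed by [27], the new generalized linear exponential distribution proposed by [28], the standard Fréchet model and the standard exponential model. The gain in terms of statistical modeling is significant.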
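To make Equations (5) and (6) concrete, here is a minimal Python sketch of the TOFr-G cdf and pdf with an exponential baseline $G(x;\beta)=1-e^{-\beta x}$, $x>0$. The rate symbol β and the parameter values are our assumptions for illustration. The pdf is evaluated on the log scale so that the factor $G^{-(\theta+1)}$ cannot overflow near zero; the script checks that the pdf integrates to one and matches a finite-difference derivative of the cdf.

```python
import numpy as np
from scipy.integrate import quad


def log_odds_ratio(x, beta):
    """log[(1 - G(x)) / G(x)] for the exponential baseline G(x) = 1 - exp(-beta x), x > 0."""
    return -beta * x - np.log(-np.expm1(-beta * x))


def tofre_cdf(x, lam, theta, beta):
    """TOFr-G cdf, Equation (5), with an exponential baseline."""
    a = np.exp(theta * log_odds_ratio(x, beta))  # [(1 - G)/G]^theta
    return (1.0 + lam) * np.exp(-a) - lam * np.exp(-2.0 * a)


def tofre_pdf(x, lam, theta, beta):
    """TOFr-G pdf, Equation (6), evaluated on the log scale for numerical stability."""
    lr = log_odds_ratio(x, beta)
    a = np.exp(theta * lr)
    log_g_cdf = np.log(-np.expm1(-beta * x))
    log_g_pdf = np.log(beta) - beta * x
    # theta g (1-G)^(theta-1) / G^(theta+1) exp(-a) = theta g [(1-G)/G]^(theta-1) / G^2 exp(-a)
    log_core = np.log(theta) + log_g_pdf + (theta - 1.0) * lr - 2.0 * log_g_cdf - a
    return np.exp(log_core) * (1.0 + lam - 2.0 * lam * np.exp(-a))


lam, theta, beta = 0.5, 1.5, 1.0  # illustrative values, not estimates
total, _ = quad(tofre_pdf, 0.0, np.inf, args=(lam, theta, beta))

x = np.linspace(0.2, 6.0, 30)
h = 1e-6
fd = (tofre_cdf(x + h, lam, theta, beta) - tofre_cdf(x - h, lam, theta, beta)) / (2 * h)
print(round(total, 6), np.allclose(fd, tofre_pdf(x, lam, theta, beta), atol=1e-6))
```

Setting `lam = 0.0` in the same functions gives the OFr-G cdf and pdf of Equations (3) and (4) with the same baseline.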
The rest of the paper is organized as follows. In Section 2, we complete the presentation of the TOFr-G family by giving other important functions of interest and some special members, including the one based on the standard exponential distribution. The mathematical properties of the TOFr-G family are investigated in Section 3, where a useful series representation and several measures and functions are derived. Turning to the TOFr-G models as statistical models, parameter estimation via the maximum likelihood method is discussed in Section 4, with a simulation study assessing its numerical performance. In Section 5, three practical data sets are analyzed, showing how useful the TOFr-G models can be. Some conclusions are provided in Section 6.
3. Some Results
In this section, some mathematical aspects of the TOFr-G family are discussed, specifically alternative expressions for the corresponding pdf and cdf, the quantile function, various moments and related functions (incomplete moments, Lorenz curve, etc.).
Henceforth, X denotes a random variable (rv) having the cdf of the TOFr-G family given by Equation (5).
3.1. Alternative Expression of the Pdf
Here, we establish a linear (series) representation for the pdf of the TOFr-G family in terms of pdfs of the exponentiated-G family. As developed in detail in [33], such a representation allows us to derive series expansions of important related measures and functions, such as the ordinary moments, moment generating function, incomplete moments and so on. In practice, precise approximations of them can be obtained by replacing the infinite upper limit of the sums by any large integer. This remains an acceptable analytical approach, basically less opaque than relying on tools already implemented in mathematical software. Moreover, as mentioned in [33], such series expansions can be more precise than numerical integration techniques.
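Based on Equation (6), and setting $A(x;\theta,\boldsymbol{\xi})=\{[1-G(x;\boldsymbol{\xi})]/G(x;\boldsymbol{\xi})\}^{\theta}$, $f(x;\lambda,\theta,\boldsymbol{\xi})$ can be written as

$$f(x;\lambda,\theta,\boldsymbol{\xi})=\theta\,g(x;\boldsymbol{\xi})\,\frac{[1-G(x;\boldsymbol{\xi})]^{\theta-1}}{G(x;\boldsymbol{\xi})^{\theta+1}}\left[(1+\lambda)\,e^{-A(x;\theta,\boldsymbol{\xi})}-2\lambda\,e^{-2A(x;\theta,\boldsymbol{\xi})}\right].$$

Now, the power series of the exponential function gives, for $c\in\{1,2\}$,

$$e^{-cA(x;\theta,\boldsymbol{\xi})}=\sum_{i=0}^{\infty}\frac{(-1)^{i}c^{i}}{i!}\left[\frac{1-G(x;\boldsymbol{\xi})}{G(x;\boldsymbol{\xi})}\right]^{\theta i},$$

so that

$$f(x;\lambda,\theta,\boldsymbol{\xi})=\theta\,g(x;\boldsymbol{\xi})\sum_{i=0}^{\infty}\frac{(-1)^{i}}{i!}\left[(1+\lambda)-\lambda\,2^{i+1}\right][1-G(x;\boldsymbol{\xi})]^{\theta(i+1)-1}\,G(x;\boldsymbol{\xi})^{-\theta(i+1)-1}.$$

Next, since $G(x;\boldsymbol{\xi})\in(0,1)$, the generalized binomial formula gives

$$G(x;\boldsymbol{\xi})^{-\theta(i+1)-1}=\{1-[1-G(x;\boldsymbol{\xi})]\}^{-\theta(i+1)-1}=\sum_{k=0}^{\infty}\binom{\theta(i+1)+k}{k}[1-G(x;\boldsymbol{\xi})]^{k}$$

and

$$[1-G(x;\boldsymbol{\xi})]^{\theta(i+1)+k-1}=\sum_{j=0}^{\infty}(-1)^{j}\binom{\theta(i+1)+k-1}{j}G(x;\boldsymbol{\xi})^{j}.$$

Hence,

$$f(x;\lambda,\theta,\boldsymbol{\xi})=\sum_{j=0}^{\infty}w_{j}\,h_{j+1}(x;\boldsymbol{\xi}), \tag{8}$$

where $h_{j+1}(x;\boldsymbol{\xi})=(j+1)\,g(x;\boldsymbol{\xi})\,G(x;\boldsymbol{\xi})^{j}$ denotes the pdf of the exponentiated-G family (with $j+1$ as power parameter) and

$$w_{j}=\frac{\theta}{j+1}\sum_{i,k=0}^{\infty}\frac{(-1)^{i+j}}{i!}\left[(1+\lambda)-\lambda\,2^{i+1}\right]\binom{\theta(i+1)+k}{k}\binom{\theta(i+1)+k-1}{j}.$$

Similarly, upon integration over $(-\infty,x)$, the cdf of the TOFr-G family can also be expressed as

$$F(x;\lambda,\theta,\boldsymbol{\xi})=\sum_{j=0}^{\infty}w_{j}\,H_{j+1}(x;\boldsymbol{\xi}),$$

where $H_{j+1}(x;\boldsymbol{\xi})=G(x;\boldsymbol{\xi})^{j+1}$ denotes the cdf of the exponentiated-G family (with $j+1$ as power parameter).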
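The first step of this derivation, the exponential power series, can be sanity-checked numerically. The Python sketch below compares Equation (6) with the series over $i$ truncated at 60 terms, assuming an exponential baseline with rate β and illustrative parameter values. It deliberately stops before the binomial expansions: the nested sums defining $w_j$ can converge slowly when $G(x;\boldsymbol{\xi})$ is close to 0 or 1, so truncating them calls for more care than a short example allows.

```python
import math


def tofre_pdf_closed(x, lam, theta, beta):
    """Equation (6) with the exponential baseline G(x) = 1 - exp(-beta x)."""
    g_cdf = -math.expm1(-beta * x)
    g_pdf = beta * math.exp(-beta * x)
    a = ((1.0 - g_cdf) / g_cdf) ** theta
    core = theta * g_pdf * (1.0 - g_cdf) ** (theta - 1.0) / g_cdf ** (theta + 1.0)
    return core * math.exp(-a) * (1.0 + lam - 2.0 * lam * math.exp(-a))


def tofre_pdf_series(x, lam, theta, beta, n_terms=60):
    """Series over i obtained from the exponential power series (first step of the derivation)."""
    g_cdf = -math.expm1(-beta * x)
    g_pdf = beta * math.exp(-beta * x)
    total = 0.0
    for i in range(n_terms):
        coef = (-1) ** i / math.factorial(i) * ((1.0 + lam) - lam * 2.0 ** (i + 1))
        total += (coef * (1.0 - g_cdf) ** (theta * (i + 1) - 1.0)
                  * g_cdf ** (-theta * (i + 1) - 1.0))
    return theta * g_pdf * total


lam, theta, beta = 0.5, 1.5, 1.0  # illustrative values only
for x in (0.5, 1.0, 2.0):
    print(x, math.isclose(tofre_pdf_closed(x, lam, theta, beta),
                          tofre_pdf_series(x, lam, theta, beta), rel_tol=1e-9))
```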
Some applications of the above results will be presented later.
3.2. Quantile Function
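Like the cdf, the quantile function characterizes the distribution, and it plays an essential role in many statistical applications. The quantile function of the TOFr-G family, say $Q(u;\lambda,\theta,\boldsymbol{\xi})$, $u\in(0,1)$, is defined as the inverse function of $F(x;\lambda,\theta,\boldsymbol{\xi})$. After some algebra (solving the quadratic equation $(1+\lambda)y-\lambda y^{2}=u$ for $y=F_{OFr}(x;\theta,\boldsymbol{\xi})\in(0,1)$), we establish that

$$Q(u;\lambda,\theta,\boldsymbol{\xi})=Q_{G}\left(\frac{1}{1+\left(-\log y_{u}\right)^{1/\theta}};\boldsymbol{\xi}\right),\qquad y_{u}=\begin{cases}\dfrac{1+\lambda-\sqrt{(1+\lambda)^{2}-4\lambda u}}{2\lambda}, & \lambda\neq 0,\\[2mm] u, & \lambda=0,\end{cases}$$

where $Q_{G}(u;\boldsymbol{\xi})$ denotes the quantile function corresponding to the baseline distribution.
In the context of the TOFrE distribution, that is, with the exponential baseline $G(x;\beta)=1-e^{-\beta x}$, $x>0$, where $\beta>0$ denotes the rate parameter, one has $Q_{G}(u;\beta)=-\log(1-u)/\beta$. Among the possible applications involving $Q(u;\lambda,\theta,\boldsymbol{\xi})$, we can easily generate values for X: for any realization u of a uniform rv over $(0,1)$, $Q(u;\lambda,\theta,\boldsymbol{\xi})$ is a realization of X.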
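Inverse-transform sampling from the TOFrE distribution then takes a few lines. The Python sketch below assumes the exponential baseline with rate β; the parameter values and the seed are arbitrary. It draws a sample and checks the round trip $F(Q(u))=u$ for a negative value of λ.

```python
import numpy as np


def tofre_quantile(u, lam, theta, beta):
    """Quantile function of the TOFr-G family with exponential baseline of rate beta."""
    u = np.asarray(u, dtype=float)
    if lam == 0.0:
        y = u  # OFr-G case: F(x) = exp(-A(x))
    else:
        # Root in [0, 1] of lam*y**2 - (1 + lam)*y + u = 0, where y = exp(-A(x)).
        y = ((1.0 + lam) - np.sqrt((1.0 + lam) ** 2 - 4.0 * lam * u)) / (2.0 * lam)
    g_value = 1.0 / (1.0 + (-np.log(y)) ** (1.0 / theta))  # value of G at the quantile
    return -np.log1p(-g_value) / beta  # Q_G(v) = -log(1 - v) / beta


def tofre_cdf(x, lam, theta, beta):
    """TOFr-G cdf, Equation (5), with exponential baseline."""
    g_cdf = -np.expm1(-beta * x)
    a = ((1.0 - g_cdf) / g_cdf) ** theta
    return (1.0 + lam) * np.exp(-a) - lam * np.exp(-2.0 * a)


# Inverse-transform sampling: Q applied to uniform draws gives TOFrE realizations.
rng = np.random.default_rng(2024)
sample = tofre_quantile(rng.uniform(size=1000), lam=0.5, theta=1.5, beta=1.0)

# Round-trip check F(Q(u)) = u on a grid, this time with a negative lambda.
u_grid = np.linspace(0.01, 0.99, 99)
x_grid = tofre_quantile(u_grid, -0.7, 2.0, 0.5)
print(np.allclose(tofre_cdf(x_grid, -0.7, 2.0, 0.5), u_grid))  # True
```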
In addition, the quantile function can be used to define several shape measures, such as the pioneering Bowley skewness and Moors kurtosis [34,35].
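Both measures only need a few quantiles. In the Python sketch below, the Bowley skewness is $[Q(3/4)+Q(1/4)-2Q(1/2)]/[Q(3/4)-Q(1/4)]$ and the Moors kurtosis is $[Q(7/8)-Q(5/8)+Q(3/8)-Q(1/8)]/[Q(6/8)-Q(2/8)]$. The functions accept any quantile function. They are first checked on the standard exponential distribution, where the values $\log(4/3)/\log 3$ and $\log(21/5)/\log 3$ follow in closed form, and then applied to the TOFrE quantile under the same exponential-baseline assumption as above, with illustrative parameter values.

```python
import math


def bowley_skewness(q):
    """Bowley skewness from a quantile function q."""
    q1, q2, q3 = q(0.25), q(0.5), q(0.75)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


def moors_kurtosis(q):
    """Moors kurtosis from a quantile function q (octile-based)."""
    e = [q(k / 8.0) for k in range(1, 8)]  # e[0] = q(1/8), ..., e[6] = q(7/8)
    return (e[6] - e[4] + e[2] - e[0]) / (e[5] - e[1])


def tofre_quantile(u, lam, theta, beta):
    """Quantile of the TOFr-G family with exponential baseline (rate beta)."""
    if lam == 0.0:
        y = u
    else:
        y = ((1.0 + lam) - math.sqrt((1.0 + lam) ** 2 - 4.0 * lam * u)) / (2.0 * lam)
    g_value = 1.0 / (1.0 + (-math.log(y)) ** (1.0 / theta))
    return -math.log1p(-g_value) / beta


# Sanity check on the standard exponential, whose values are known in closed form.
exp_q = lambda u: -math.log1p(-u)
print(math.isclose(bowley_skewness(exp_q), math.log(4 / 3) / math.log(3)))  # True
print(math.isclose(moors_kurtosis(exp_q), math.log(21 / 5) / math.log(3)))  # True

for lam in (-0.8, 0.0, 0.8):
    q = lambda u, lam=lam: tofre_quantile(u, lam, theta=1.5, beta=1.0)
    print(lam, round(bowley_skewness(q), 4), round(moors_kurtosis(q), 4))
```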
3.3. On the Moments
Here, the moments of the TOFr-G family are discussed, with natural extensions.
Henceforth, $Y_{j+1}$ denotes a rv having the cdf and pdf given by $H_{j+1}(x;\boldsymbol{\xi})$ and $h_{j+1}(x;\boldsymbol{\xi})$, respectively.
In addition, it is assumed that all the presented sums and integrals exist (in the convergence sense), which is not guaranteed a priori since most of them depend on the definition of the baseline distribution.
3.3.1. Ordinary Moments
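The ordinary moments of X are essential ingredients to define important measures of the TOFr-G family, such as the mean, variance and coefficient of variation of X, and the coefficients of skewness and kurtosis, among others. They are determined below. The r-th ordinary moment of the TOFr-G family can be obtained from Equation (8) as

$$\mu_{r}'=E\left(X^{r}\right)=\sum_{j=0}^{\infty}w_{j}\,E\left(Y_{j+1}^{r}\right). \tag{9}$$

In full generality, we have $E(Y_{j+1}^{r})=\int_{-\infty}^{+\infty}x^{r}h_{j+1}(x;\boldsymbol{\xi})\,dx$. For instance, in the setting of the TOFrE distribution (exponential baseline with rate $\beta>0$), the expression of $E(Y_{j+1}^{r})$ can be found in [5] (Equation (2.1)), i.e.,

$$E\left(Y_{j+1}^{r}\right)=\frac{(j+1)\,r!}{\beta^{r}}\sum_{k=0}^{j}\frac{(-1)^{k}}{(k+1)^{r+1}}\binom{j}{k}.$$

From the computational point of view, the following approximation remains acceptable:

$$\mu_{r}'\approx\sum_{j=0}^{40}w_{j}\,E\left(Y_{j+1}^{r}\right)$$

(the choice of "40" remains subjective; any large integer can be chosen).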
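The building block $E(Y_{j+1}^{r})$ of this truncated sum can be checked against numerical integration. In the Python sketch below, `power` plays the role of $j+1$ and the rate β is an assumed symbol for the exponential baseline. Because the alternating binomial sum loses precision in floating point for large $j$, it is accumulated exactly with `Fraction`, a design choice of this sketch and not something stated in the text.

```python
import math
from fractions import Fraction
from scipy.integrate import quad


def ee_moment(r, power, beta):
    """E(Y^r) for the exponentiated exponential cdf (1 - exp(-beta*y))**power, integer power >= 1.

    Uses power * r! / beta**r * sum_{k=0}^{power-1} (-1)^k C(power-1, k) / (k+1)**(r+1);
    the alternating sum is accumulated exactly with Fraction to avoid cancellation.
    """
    s = sum(Fraction((-1) ** k * math.comb(power - 1, k), (k + 1) ** (r + 1)) for k in range(power))
    return power * math.factorial(r) * float(s) / beta ** r


def ee_pdf(y, power, beta):
    """Exponentiated exponential pdf: power * beta * exp(-beta*y) * (1 - exp(-beta*y))**(power-1)."""
    return power * beta * math.exp(-beta * y) * (-math.expm1(-beta * y)) ** (power - 1)


for power in (1, 4, 40):
    closed_form = ee_moment(3, power, 0.8)
    numeric, _ = quad(lambda y: y ** 3 * ee_pdf(y, power, 0.8), 0.0, math.inf)
    print(power, math.isclose(closed_form, numeric, rel_tol=1e-6))
```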
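The mean and variance of X are, respectively, given by $\mu=\mu_{1}'$ and $\sigma^{2}=\mu_{2}'-(\mu_{1}')^{2}$. Additionally, the coefficients of skewness and kurtosis are defined by $\gamma_{1}=\left[\mu_{3}'-3\mu_{2}'\mu_{1}'+2(\mu_{1}')^{3}\right]/\sigma^{3}$ and $\gamma_{2}=\left[\mu_{4}'-4\mu_{3}'\mu_{1}'+6\mu_{2}'(\mu_{1}')^{2}-3(\mu_{1}')^{4}\right]/\sigma^{4}$.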
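As an independent cross-check of such tabulated values, the four measures can also be computed from raw moments obtained by adaptive quadrature. This swaps in SciPy's `quad` instead of the series of Equation (9). The Python sketch below assumes the exponential baseline with rate β and illustrative parameter values, and verifies $\gamma_1$ against the third central moment integrated directly.

```python
import numpy as np
from scipy.integrate import quad


def tofre_pdf(x, lam, theta, beta):
    """TOFr-G pdf, Equation (6), exponential baseline, evaluated on the log scale."""
    log_g_cdf = np.log(-np.expm1(-beta * x))
    log_ratio = -beta * x - log_g_cdf          # log[(1 - G)/G]
    a = np.exp(theta * log_ratio)              # [(1 - G)/G]^theta
    log_core = (np.log(theta) + np.log(beta) - beta * x
                + (theta - 1.0) * log_ratio - 2.0 * log_g_cdf - a)
    return np.exp(log_core) * (1.0 + lam - 2.0 * lam * np.exp(-a))


def raw_moment(r, lam, theta, beta):
    """mu'_r = E(X^r) by adaptive quadrature on (0, inf)."""
    value, _ = quad(lambda x: x ** r * tofre_pdf(x, lam, theta, beta), 0.0, np.inf)
    return value


def summary_measures(lam, theta, beta):
    """Mean, variance, skewness gamma_1 and kurtosis gamma_2 from the raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, lam, theta, beta) for r in range(1, 5))
    var = m2 - m1 ** 2
    skew = (m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4.0 * m3 * m1 + 6.0 * m2 * m1 ** 2 - 3.0 * m1 ** 4) / var ** 2
    return m1, var, skew, kurt


lam, theta, beta = 0.5, 1.5, 1.0  # illustrative values only
mean, var, skew, kurt = summary_measures(lam, theta, beta)

# Cross-check gamma_1 against the third central moment computed directly.
mu3, _ = quad(lambda x: (x - mean) ** 3 * tofre_pdf(x, lam, theta, beta), 0.0, np.inf)
print(mean, var, skew, kurt, np.isclose(skew, mu3 / var ** 1.5, rtol=1e-5))
```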
Table 2 presents some of the measures above when X follows the TOFrE distribution, for several sets of parameter values. Strong variations are mainly observed for the mean, variance and kurtosis. In particular, we see that
has an important effect on the kurtosis, as already suggested by
Figure 1. In line with what has been observed in Figure 1, the skewness remains positive (right-skewed), with only small variations.
3.3.2. Moment Generating Function
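The moment generating function of the TOFr-G family can be obtained from Equation (8) as

$$M_{X}(t)=E\left(e^{tX}\right)=\sum_{j=0}^{\infty}w_{j}\,M_{j+1}(t),$$

where $M_{j+1}(t)=E(e^{tY_{j+1}})$. Now, one can notice that $M_{j+1}(t)=\int_{-\infty}^{+\infty}e^{tx}h_{j+1}(x;\boldsymbol{\xi})\,dx$. If necessary, we can also express it as $M_{j+1}(t)=(j+1)\int_{0}^{1}e^{t\,Q_{G}(u;\boldsymbol{\xi})}u^{j}\,du$, where $Q_{G}(u;\boldsymbol{\xi})$ denotes the baseline quantile function.
In the setting of the TOFrE distribution (exponential baseline with rate $\beta>0$), the expression of $M_{j+1}(t)$ can be found in [5] (Equation (2.3)), i.e., for $t<\beta$,

$$M_{j+1}(t)=\frac{\Gamma(j+2)\,\Gamma(1-t/\beta)}{\Gamma(j+2-t/\beta)},$$

where $\Gamma(x)=\int_{0}^{\infty}u^{x-1}e^{-u}\,du$, $x>0$, denotes the gamma function.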
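The gamma-function expression for $M_{j+1}(t)$ can be verified numerically. In the Python sketch below, `power` stands for $j+1$, β is the assumed rate of the exponential baseline, and the values are illustrative. `gammaln` is used so that large arguments do not overflow, and the integrand combines $e^{tx}$ and the pdf on the log scale so that neither factor overflows on its own during quadrature.

```python
import math
from scipy.integrate import quad
from scipy.special import gammaln


def ee_mgf(t, power, beta):
    """MGF of the exponentiated exponential with given power and rate beta, valid for t < beta:
    Gamma(power + 1) Gamma(1 - t/beta) / Gamma(power + 1 - t/beta)."""
    if t >= beta:
        raise ValueError("the MGF is finite only for t < beta")
    return math.exp(gammaln(power + 1) + gammaln(1 - t / beta) - gammaln(power + 1 - t / beta))


def mgf_integrand(y, t, power, beta):
    """exp(t*y) * h(y), combined on the log scale so exp(t*y) never overflows on its own."""
    log_h = math.log(power * beta) - beta * y + (power - 1) * math.log(-math.expm1(-beta * y))
    return math.exp(t * y + log_h)


power, beta = 5, 2.0  # power = j + 1 in the text; values are illustrative
for t in (-1.0, 0.7, 1.5):
    numeric, _ = quad(mgf_integrand, 0.0, math.inf, args=(t, power, beta))
    print(t, math.isclose(ee_mgf(t, power, beta), numeric, rel_tol=1e-6))
```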
3.4. Incomplete Moments and Application
Some functions are useful for prediction purposes in lifetime models, finding numerous applications in demography, economics, econometrics, insurance, reliability and medicine. Several of them can be defined through the use of incomplete moments, as discussed below.
3.4.1. Incomplete Moments
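Thanks to Equation (8), the r-th incomplete moment of X evaluated at t can be expressed as

$$\varphi_{r}(t)=E\left(X^{r}\mathbf{1}_{\{X\le t\}}\right)=\int_{-\infty}^{t}x^{r}f(x;\lambda,\theta,\boldsymbol{\xi})\,dx=\sum_{j=0}^{\infty}w_{j}\,\varphi_{r,j+1}(t), \tag{10}$$

where, in full generality, $\varphi_{r,j+1}(t)=\int_{-\infty}^{t}x^{r}h_{j+1}(x;\boldsymbol{\xi})\,dx$. For instance, in the framework of the TOFrE distribution (exponential baseline with rate $\beta>0$), we can show that, for $t>0$,

$$\varphi_{r,j+1}(t)=\frac{j+1}{\beta^{r}}\sum_{k=0}^{j}\frac{(-1)^{k}}{(k+1)^{r+1}}\binom{j}{k}\,\gamma\big(r+1,(k+1)\beta t\big),$$

where $\gamma(s,x)=\int_{0}^{x}u^{s-1}e^{-u}\,du$, $s>0$, $x>0$, denotes the lower incomplete gamma function. Alternatively, we can use the following representation: $\varphi_{r,j+1}(t)=(j+1)\int_{0}^{G(t;\boldsymbol{\xi})}Q_{G}(u;\boldsymbol{\xi})^{r}u^{j}\,du$, where $Q_{G}(u;\boldsymbol{\xi})$ denotes the baseline quantile function.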
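The incomplete-gamma formula above can be checked against direct integration. The Python sketch below again uses `power` for $j+1$ and an assumed rate β. Note that SciPy's `gammainc` is the regularized lower incomplete gamma function, so it is multiplied by $\Gamma(r+1)$ to obtain $\gamma(r+1,\cdot)$. For $r=0$, the formula must reduce to the exponentiated exponential cdf, which gives an extra check.

```python
import math
from scipy.integrate import quad
from scipy.special import gammainc


def ee_incomplete_moment(r, t, power, beta):
    """int_0^t y^r h(y) dy for the exponentiated exponential (integer power, rate beta):
    power / beta^r * sum_k (-1)^k C(power-1, k) gamma(r+1, (k+1) beta t) / (k+1)^(r+1)."""
    total = 0.0
    for k in range(power):
        # scipy's gammainc is the *regularized* lower incomplete gamma, so rescale by Gamma(r+1).
        lower_gamma = gammainc(r + 1, (k + 1) * beta * t) * math.gamma(r + 1)
        total += (-1) ** k * math.comb(power - 1, k) * lower_gamma / (k + 1) ** (r + 1)
    return power * total / beta ** r


def ee_pdf(y, power, beta):
    """Exponentiated exponential pdf with integer power and rate beta."""
    return power * beta * math.exp(-beta * y) * (-math.expm1(-beta * y)) ** (power - 1)


power, beta, t = 6, 1.2, 1.5  # illustrative values
for r in (0, 1, 2):
    numeric, _ = quad(lambda y: y ** r * ee_pdf(y, power, beta), 0.0, t)
    print(r, math.isclose(ee_incomplete_moment(r, t, power, beta), numeric, rel_tol=1e-8))
# r = 0 recovers the exponentiated exponential cdf (1 - exp(-beta t))**power.
```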
Some functions defined with the incomplete moments are presented below.
3.4.2. Applications
On some residual life functions. The mean residual life and reversed residual life functions have many applications in the applied sciences. In addition, as a significant theoretical result, it has been proved that the mean residual life function characterizes the distribution (see [36]). See also [37], and the references therein.
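For the TOFr-G family, we can determine the r-th moment of the residual life. It corresponds to the function of t given by

$$m_{r}(t)=E\left[(X-t)^{r}\mid X>t\right]=\frac{1}{1-F(t;\lambda,\theta,\boldsymbol{\xi})}\sum_{k=0}^{r}\binom{r}{k}(-t)^{r-k}\left[\mu_{k}'-\varphi_{k}(t)\right],$$

where $\mu_{k}'$ and $\varphi_{k}(t)$ are given by Equations (9) and (10), respectively (with $\mu_{0}'=1$ and $\varphi_{0}(t)=F(t;\lambda,\theta,\boldsymbol{\xi})$).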
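In particular, the mean residual life function is given by $m_{1}(t)$. In addition, as a complementary function, the r-th moment of the reversed residual life is the function of t given by

$$M_{r}(t)=E\left[(t-X)^{r}\mid X\le t\right]=\frac{1}{F(t;\lambda,\theta,\boldsymbol{\xi})}\sum_{k=0}^{r}\binom{r}{k}(-1)^{k}t^{r-k}\varphi_{k}(t).$$

The mean reversed residual life function is defined by $M_{1}(t)$.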
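For $r=1$, these formulas reduce to $m_1(t)=[\mu_1'-\varphi_1(t)]/[1-F(t)]-t$ and $M_1(t)=t-\varphi_1(t)/F(t)$. The Python sketch below evaluates both for the TOFrE distribution, assuming an exponential baseline with rate β and illustrative parameter values. It obtains $\mu_1'$ and $\varphi_1(t)$ by quadrature instead of the series, and compares the results with the classical identities $m_1(t)=\int_t^\infty[1-F(x)]\,dx/[1-F(t)]$ and $M_1(t)=\int_0^t F(x)\,dx/F(t)$ for a positive rv. The survival function is written with `expm1` to keep precision in the right tail.

```python
import numpy as np
from scipy.integrate import quad


def tofre_parts(x, theta, beta):
    """Return log G(x), log[(1-G)/G] and A = [(1-G)/G]^theta for G(x) = 1 - exp(-beta x)."""
    log_g_cdf = np.log(-np.expm1(-beta * x))
    log_ratio = -beta * x - log_g_cdf
    return log_g_cdf, log_ratio, np.exp(theta * log_ratio)


def tofre_pdf(x, lam, theta, beta):
    """TOFr-G pdf, Equation (6), on the log scale."""
    log_g_cdf, log_ratio, a = tofre_parts(x, theta, beta)
    log_core = (np.log(theta) + np.log(beta) - beta * x
                + (theta - 1.0) * log_ratio - 2.0 * log_g_cdf - a)
    return np.exp(log_core) * (1.0 + lam - 2.0 * lam * np.exp(-a))


def tofre_cdf(x, lam, theta, beta):
    """TOFr-G cdf, Equation (5)."""
    a = tofre_parts(x, theta, beta)[2]
    return (1.0 + lam) * np.exp(-a) - lam * np.exp(-2.0 * a)


def tofre_sf(x, lam, theta, beta):
    """Survival function 1 - F, written with expm1 to keep precision in the right tail."""
    a = tofre_parts(x, theta, beta)[2]
    return -(1.0 + lam) * np.expm1(-a) + lam * np.expm1(-2.0 * a)


def first_moments(t, lam, theta, beta):
    """mu'_1 and phi_1(t) by quadrature."""
    integrand = lambda x: x * tofre_pdf(x, lam, theta, beta)
    mu, _ = quad(integrand, 0.0, np.inf)
    phi1, _ = quad(integrand, 0.0, t)
    return mu, phi1


def mean_residual_life(t, lam, theta, beta):
    """m_1(t) = [mu'_1 - phi_1(t)] / [1 - F(t)] - t."""
    mu, phi1 = first_moments(t, lam, theta, beta)
    return (mu - phi1) / tofre_sf(t, lam, theta, beta) - t


def mean_reversed_residual_life(t, lam, theta, beta):
    """M_1(t) = t - phi_1(t) / F(t)."""
    _, phi1 = first_moments(t, lam, theta, beta)
    return t - phi1 / tofre_cdf(t, lam, theta, beta)


lam, theta, beta = 0.5, 1.5, 1.0  # illustrative values only
for t in (0.5, 1.5, 2.5):
    direct_mrl = quad(tofre_sf, t, np.inf, args=(lam, theta, beta))[0] / tofre_sf(t, lam, theta, beta)
    direct_mrrl = quad(tofre_cdf, 0.0, t, args=(lam, theta, beta))[0] / tofre_cdf(t, lam, theta, beta)
    print(t, np.isclose(mean_residual_life(t, lam, theta, beta), direct_mrl, rtol=1e-4),
          np.isclose(mean_reversed_residual_life(t, lam, theta, beta), direct_mrrl, rtol=1e-4))
```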
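Mean deviations. The first incomplete moment allows us to define some mean deviations, which find applications in economics, for instance in the study of income and property (see [34]). In the context of the TOFr-G family, the mean deviation of X about the mean $\mu=E(X)$ and the mean deviation of X about the median $M$ (satisfying $F(M;\lambda,\theta,\boldsymbol{\xi})=1/2$) are given by

$$\delta_{1}=E\left(|X-\mu|\right)=2\mu F(\mu;\lambda,\theta,\boldsymbol{\xi})-2\varphi_{1}(\mu),\qquad \delta_{2}=E\left(|X-M|\right)=\mu-2\varphi_{1}(M),$$

respectively, where $\varphi_{1}(t)$ is the first incomplete moment given by Equation (10) with $r=1$.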
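A numerical check of these two identities is straightforward. The Python sketch below assumes the exponential baseline with rate β and illustrative parameter values (here with a negative λ). It computes $\mu$ and $\varphi_1$ by quadrature, takes the median from the closed-form quantile, and compares $\delta_1$ and $\delta_2$ with $E|X-c|$ integrated directly.

```python
import numpy as np
from scipy.integrate import quad


def tofre_pdf(x, lam, theta, beta):
    """TOFr-G pdf, Equation (6), exponential baseline, evaluated on the log scale."""
    log_g_cdf = np.log(-np.expm1(-beta * x))
    log_ratio = -beta * x - log_g_cdf
    a = np.exp(theta * log_ratio)
    log_core = (np.log(theta) + np.log(beta) - beta * x
                + (theta - 1.0) * log_ratio - 2.0 * log_g_cdf - a)
    return np.exp(log_core) * (1.0 + lam - 2.0 * lam * np.exp(-a))


def tofre_cdf(x, lam, theta, beta):
    """TOFr-G cdf, Equation (5)."""
    g_cdf = -np.expm1(-beta * x)
    a = ((1.0 - g_cdf) / g_cdf) ** theta
    return (1.0 + lam) * np.exp(-a) - lam * np.exp(-2.0 * a)


def tofre_median(lam, theta, beta):
    """Q(1/2) from the closed-form quantile function (exponential baseline)."""
    u = 0.5
    y = u if lam == 0.0 else ((1 + lam) - np.sqrt((1 + lam) ** 2 - 4 * lam * u)) / (2 * lam)
    return -np.log1p(-1.0 / (1.0 + (-np.log(y)) ** (1.0 / theta))) / beta


def mean_deviations(lam, theta, beta):
    """delta_1 = 2 mu F(mu) - 2 phi_1(mu) and delta_2 = mu - 2 phi_1(M)."""
    first = lambda x: x * tofre_pdf(x, lam, theta, beta)
    mu = quad(first, 0.0, np.inf)[0]
    med = tofre_median(lam, theta, beta)
    delta1 = 2.0 * mu * tofre_cdf(mu, lam, theta, beta) - 2.0 * quad(first, 0.0, mu)[0]
    delta2 = mu - 2.0 * quad(first, 0.0, med)[0]
    return mu, med, delta1, delta2


def abs_deviation(c, lam, theta, beta):
    """E|X - c| by direct quadrature, split at c where |x - c| has a kink."""
    left = quad(lambda x: (c - x) * tofre_pdf(x, lam, theta, beta), 0.0, c)[0]
    right = quad(lambda x: (x - c) * tofre_pdf(x, lam, theta, beta), c, np.inf)[0]
    return left + right


lam, theta, beta = -0.4, 2.0, 0.5  # illustrative values only
mu, med, delta1, delta2 = mean_deviations(lam, theta, beta)
print(np.isclose(delta1, abs_deviation(mu, lam, theta, beta), rtol=1e-5),
      np.isclose(delta2, abs_deviation(med, lam, theta, beta), rtol=1e-5))
```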
Bonferroni and Lorenz curves. Lorenz and Bonferroni curves are essential tools for determining inequality measures, with numerous applications in medicine, reliability and demography. See [38], and the references therein. In the setting of the TOFr-G family, they are defined by

$$B(p)=\frac{\varphi_{1}(q)}{p\,\mu},\qquad L(p)=\frac{\varphi_{1}(q)}{\mu},\qquad p\in(0,1),$$

respectively, where $\varphi_{1}(t)$ is the first incomplete moment given by Equation (10) with $r=1$, $\mu=E(X)$, and $q$ denotes the p-quantile of X.
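Both curves follow directly from $\varphi_1$ and the quantile function. The Python sketch below evaluates them for the TOFrE distribution, assuming an exponential baseline with rate β and illustrative parameter values, with $\mu$ and $\varphi_1$ obtained by quadrature. For a positive, non-degenerate rv one expects $0<L(p)<p$, hence $0<B(p)<1$.

```python
import numpy as np
from scipy.integrate import quad


def tofre_pdf(x, lam, theta, beta):
    """TOFr-G pdf, Equation (6), exponential baseline, evaluated on the log scale."""
    log_g_cdf = np.log(-np.expm1(-beta * x))
    log_ratio = -beta * x - log_g_cdf
    a = np.exp(theta * log_ratio)
    log_core = (np.log(theta) + np.log(beta) - beta * x
                + (theta - 1.0) * log_ratio - 2.0 * log_g_cdf - a)
    return np.exp(log_core) * (1.0 + lam - 2.0 * lam * np.exp(-a))


def tofre_quantile(u, lam, theta, beta):
    """Closed-form TOFr-G quantile with exponential baseline (rate beta)."""
    y = u if lam == 0.0 else ((1 + lam) - np.sqrt((1 + lam) ** 2 - 4 * lam * u)) / (2 * lam)
    return -np.log1p(-1.0 / (1.0 + (-np.log(y)) ** (1.0 / theta))) / beta


def lorenz_bonferroni(p, lam, theta, beta):
    """L(p) = phi_1(q) / mu and B(p) = L(p) / p, with q the p-quantile of X."""
    first = lambda x: x * tofre_pdf(x, lam, theta, beta)
    mu = quad(first, 0.0, np.inf)[0]
    q = tofre_quantile(p, lam, theta, beta)
    lorenz = quad(first, 0.0, q)[0] / mu
    return lorenz, lorenz / p


for p in (0.1, 0.5, 0.9):
    L, B = lorenz_bonferroni(p, lam=0.5, theta=1.5, beta=1.0)
    print(p, round(L, 4), round(B, 4))
```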
5. Applications
In this section, we use the TOFrE model for the statistical analysis of three well-known data sets: the first two data sets have exponential-like right tails and the third one has a heavy right tail. In particular, we aim to compare the fits of the TOFrE model with those of the transmuted linear exponential (TLE) distribution (see [27]), the new generalized linear exponential (NGLE) distribution (see [28]), and the Fréchet (Fr) and exponential (E) models.
The maximum likelihood method is used for all the models, allowing us to determine the following measures: the Akaike information criterion (AIC) and the Cramér–von Mises (CVM), Anderson–Darling (AD) and Kolmogorov–Smirnov (KS) statistics. In addition, the p-value of the corresponding KS test is provided. The best model is the one with the smallest AIC, CVM, AD and KS values and the largest p-value for the KS test. The calculations are performed using the maxLik package of the R software.
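The fitting workflow can be sketched in Python as follows. This is a swapped-in toolchain (SciPy's `minimize` with L-BFGS-B and `kstest`), not the R `maxLik` code used in the paper, and it runs on data simulated from the TOFrE model, not on the paper's data sets. The exponential rate β, the parameter bounds and the starting point are assumptions of this sketch. The CVM and AD values use the classical unmodified formulas for a fully specified cdf; whether the paper reports these or modified versions is not stated in this section.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import kstest


def tofre_logpdf(x, lam, theta, beta):
    """log of the TOFr-G pdf, Equation (6), with exponential baseline."""
    log_g_cdf = np.log(-np.expm1(-beta * x))
    log_ratio = -beta * x - log_g_cdf
    a = np.exp(theta * log_ratio)
    return (np.log(theta) + np.log(beta) - beta * x + (theta - 1.0) * log_ratio
            - 2.0 * log_g_cdf - a + np.log(1.0 + lam - 2.0 * lam * np.exp(-a)))


def tofre_cdf(x, lam, theta, beta):
    g_cdf = -np.expm1(-beta * x)
    a = ((1.0 - g_cdf) / g_cdf) ** theta
    return (1.0 + lam) * np.exp(-a) - lam * np.exp(-2.0 * a)


def tofre_quantile(u, lam, theta, beta):
    y = u if lam == 0.0 else ((1 + lam) - np.sqrt((1 + lam) ** 2 - 4 * lam * u)) / (2 * lam)
    return -np.log1p(-1.0 / (1.0 + (-np.log(y)) ** (1.0 / theta))) / beta


def negloglik(params, data):
    """Negative log-likelihood; non-finite values are replaced by a large penalty."""
    nll = -np.sum(tofre_logpdf(data, *params))
    return nll if np.isfinite(nll) else 1e300


def gof_statistics(data, cdf):
    """Classical Cramer-von Mises W^2 and Anderson-Darling A^2 for a fully specified cdf."""
    z = np.sort(cdf(data))
    n = z.size
    i = np.arange(1, n + 1)
    w2 = 1.0 / (12.0 * n) + np.sum((z - (2 * i - 1) / (2.0 * n)) ** 2)
    a2 = -n - np.mean((2 * i - 1) * (np.log(z) + np.log1p(-z[::-1])))
    return w2, a2


# Simulated data (NOT the paper's data sets), drawn by inverse transform.
rng = np.random.default_rng(1)
data = tofre_quantile(rng.uniform(size=200), 0.4, 1.5, 0.8)

start = np.array([0.0, 1.0, 1.0])
fit = minimize(negloglik, start, args=(data,), method="L-BFGS-B",
               bounds=[(-1.0, 1.0), (0.05, 20.0), (0.01, 20.0)])
lam_hat, theta_hat, beta_hat = fit.x
aic = 2 * len(start) + 2 * fit.fun
fitted_cdf = lambda x: tofre_cdf(x, lam_hat, theta_hat, beta_hat)
ks = kstest(data, fitted_cdf)
w2, a2 = gof_statistics(data, fitted_cdf)
print(fit.x, aic, ks.statistic, ks.pvalue, w2, a2)
```

Standard errors and confidence intervals would additionally require the (numerical) Hessian at the optimum, which this sketch leaves out.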
5.1. Data Sets I and II (Exponential Tail)
Let us now present our first two data sets of interest, both coming from real-life phenomena.
Data set II. The second data set, called Data set II, contains 72 measurements of exceedances of the Wheaton River in Canada, from 1958 to 1984. These data were also considered by [41], among others. The data are also available at the following electronic address:
https://chesneau.users.lmno.cnrs.fr/DatasetII.txt
The basic statistics of these data sets are given in Table 7, with the support of the corresponding boxplots in Figure 4.
The main observable differences between the two data sets are in their location and dispersion measures and in their degree of right skewness: Data set I is highly right-skewed, whereas Data set II is moderately right-skewed. We refine our descriptive analysis by showing the corresponding total time on test (TTT) plots, as introduced by [42], in Figure 5.
These TTT plots reveal that Data set I has a concave TTT line, suggesting an underlying increasing hrf, whereas Data set II has a convex-then-concave TTT line, suggesting an underlying bathtub-shaped hrf. These two cases are covered by the TOFrE model, motivating its use to analyze such data.
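The scaled TTT transform plotted in such figures can be computed directly from the ordered sample. Below is a minimal Python sketch (the function name is ours) of the empirical version $T(i/n)=\left[\sum_{k=1}^{i}x_{(k)}+(n-i)\,x_{(i)}\right]/\sum_{k=1}^{n}x_{(k)}$, $i=1,\ldots,n$, checked on a toy sample rather than on the paper's data.

```python
import numpy as np


def scaled_ttt(data):
    """Empirical scaled TTT transform at i/n, i = 1..n:
    T(i/n) = [sum_{k<=i} x_(k) + (n - i) x_(i)] / sum_k x_(k)."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return i / n, (np.cumsum(x) + (n - i) * x) / x.sum()


r, t = scaled_ttt([3.0, 1.0, 4.0, 2.0])
print(r, t)  # scaled TTT values: 0.4, 0.7, 0.9, 1.0 at i/n = 0.25, 0.5, 0.75, 1.0
```

Plotting `t` against `r` (for example with matplotlib) gives the TTT line. In the usual reading, a concave curve suggests an increasing hrf, a convex one a decreasing hrf, a convex-then-concave curve a bathtub-shaped hrf and a concave-then-convex curve an upside-down bathtub (unimodal) hrf.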
The MLEs of the models' parameters, along with their standard errors (SEs), are collected in Table 8 and Table 9 for Data sets I and II, respectively. Table 10 and Table 11 present the values of the goodness-of-fit criteria of the models for Data sets I and II, respectively.
From Table 10, we see that the TOFrE and TLE models are the best for Data set I. In particular, the TOFrE model has the lowest AIC and KS values and the largest p-value for the KS test; it is the best under these criteria. From Table 11, the TOFrE model is the best for Data set II: it has the lowest AIC, CVM, AD and KS values and the largest p-value for the KS test.
We now display the plots of the estimated pdfs and cdfs for Data sets I and II in Figure 6 and Figure 7, respectively. Visually, in comparison to the competitor models, the blue curves of the estimated fits of the TOFrE model are closer to the empirical pdfs and cdfs.
We complete this part by reporting the confidence intervals of the TOFrE model parameters in Table 12, computed as described in Section 4.
5.2. Data Set III (Heavy Tail)
We refer to [43] for all the necessary descriptive statistics of Data set III; here, we aim to apply our statistical methodology to this data set. As in [43], we also consider the following two criteria: the consistent Akaike information criterion (CAIC) and the Hannan–Quinn information criterion (HQIC), which are interpreted in the same way as the AIC (smaller values are better). The MLEs of the considered models are provided in Table 13. Table 14 gives the values of the considered goodness-of-fit criteria of the models for Data set III.
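For reference, the information criteria can be computed from the maximized log-likelihood $\hat{\ell}$, the number of parameters $k$ and the sample size $n$. The Python sketch below uses the standard definitions AIC $=-2\hat{\ell}+2k$, BIC $=-2\hat{\ell}+k\log n$ and HQIC $=-2\hat{\ell}+2k\log(\log n)$. For CAIC it follows Bozdogan's consistent AIC, $-2\hat{\ell}+k(\log n+1)$. Some papers in this literature instead report the corrected AIC, $-2\hat{\ell}+2kn/(n-k-1)$, under the CAIC label, so the sketch returns both. Which convention [43] uses is an open assumption here, and the inputs below are toy numbers.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, BIC, CAIC (Bozdogan), AICc and HQIC for k parameters fitted to n observations."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    caic = -2.0 * loglik + k * (math.log(n) + 1.0)       # consistent AIC (Bozdogan)
    aicc = -2.0 * loglik + 2.0 * k * n / (n - k - 1)     # corrected AIC, sometimes labeled CAIC
    hqic = -2.0 * loglik + 2.0 * k * math.log(math.log(n))
    return {"AIC": aic, "BIC": bic, "CAIC": caic, "AICc": aicc, "HQIC": hqic}


# Toy inputs only (not values from the paper).
print(information_criteria(loglik=-100.0, k=3, n=50))
```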
The numerical results in Table 14 show that the TOFrE model provides a better fit than the considered competitors. In addition, for the same data, it also outperforms the new heavy-tailed Weibull (NHTW) model developed by [43], according to the values of the AIC, BIC, CAIC and HQIC reported for that model in [43] (Table 6). The NHTW model was itself shown there to outperform other heavy-tailed models, such as the Weibull, Kumaraswamy Weibull, Lomax, Marshall–Olkin Weibull and Burr XII models.
The estimated pdfs and cdfs of the models for Data set III are displayed in Figure 8.
In light of this study and given its numerous qualities, we hope that the TOFr-G family will attract practitioners and find wider applications in the applied sciences. As perspectives, multivariate extensions of the TOFr-G family could be of interest for the construction of various regression models as well as clustering methods, opening new possibilities for the analysis of big data.