Essentials Of Business Statistics 2e by Jaggia, Kelly
Essentials Of Business Statistics by Jaggia and Kelly (2nd edition) is the second edition of the ''Essentials Of Business Statistics: Communicating with Numbers'' textbook authored by Sanjiv Jaggia (California Polytechnic State University) and Alison Kelly (Suffolk University), and published by McGraw-Hill Education, New York, NY, in 2020.
- Acceptance sampling. A statistical quality control technique in which a portion of the completed products is inspected.
- Addition rule. The probability that A or B occurs, or that at least one of these events occurs, is P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
- Adjusted R². A modification of the coefficient of determination that imposes a penalty for using additional explanatory variables in the linear regression model.
- Alpha. In the capital asset pricing model (CAPM), it measures whether abnormal returns exist.
- Alternative hypothesis (HA). In a hypothesis test, the alternative hypothesis contradicts the default state or status quo specified in the null hypothesis.
- Analysis of variance (ANOVA). A statistical technique used to determine if differences exist between three or more population means.
- Arithmetic mean. The average value of a data set; the most commonly used measure of central location, also referred to as the mean or the average.
- Assignable variation. In a production process, the variation that is caused by specific events or factors that can usually be identified and eliminated.
- Average. See Arithmetic mean.
- Bar chart. A graph that depicts the frequency or relative frequency of each category of qualitative data as a series of horizontal or vertical bars, the lengths of which are proportional to the values that are to be depicted.
- Bayes' theorem. The rule for updating probabilities is P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)], where P(B) is the prior probability and P(B|A) is the posterior probability.
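A minimal Python sketch of this updating rule; the probabilities below are assumed for illustration and are not from the textbook.

```python
def bayes_posterior(p_a_given_b, p_b, p_a_given_bc):
    """Posterior P(B|A) via Bayes' theorem; the denominator is
    the total probability rule applied to P(A)."""
    p_a = p_a_given_b * p_b + p_a_given_bc * (1 - p_b)  # P(A)
    return p_a_given_b * p_b / p_a

# Assumed illustrative numbers: prior P(B) = 0.02,
# P(A|B) = 0.90, P(A|Bc) = 0.05.
print(bayes_posterior(0.90, 0.02, 0.05))  # posterior ≈ 0.269
```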
- Bell curve. See Normal curve.
- Bell-shaped distribution. See Normal distribution.
- Bernoulli process. A series of n independent and identical trials of an experiment such that each trial has only two possible outcomes, and each time the trial is repeated, the probabilities of success and failure remain the same.
- Beta. In the capital asset pricing model (CAPM), it measures the sensitivity of the stock's return to changes in the level of the overall market.
- Between-treatments variance. In ANOVA, a measure of the variability between sample means.
- Bias. The tendency of a sample statistic to systematically overestimate or underestimate a population parameter.
- Big data. A massive volume of both structured and unstructured data that are often difficult to manage, process, and analyze using traditional data processing tools.
- Binomial distribution. A description of the probabilities associated with the possible values of a binomial random variable.
- Binomial random variable. The number of successes achieved in the n trials of a Bernoulli process.
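A minimal Python sketch of the binomial probability mass function; n = 10 trials and p = 0.3 are assumed for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable: k successes in n
    independent Bernoulli trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(3, 10, 0.3))                          # P(X = 3) ≈ 0.2668
print(sum(binom_pmf(k, 10, 0.3) for k in range(4)))   # P(X ≤ 3)
```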
- Box plot. A graphical display of the minimum value, quartiles, and the maximum value of a data set.
- c chart. A control chart that monitors the count of defects per item in statistical quality control.
- Capital asset pricing model (CAPM). A regression model used in finance to examine an investment return.
- Centerline. In a control chart, the centerline represents a variable's expected value when the production process is in control.
- Central limit theorem (CLT). The CLT states that the sum or mean of a large number of independent observations from the same underlying distribution has an approximate normal distribution.
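A minimal simulation sketch of the CLT, assuming a skewed exponential population with mean 1 and repeated samples of n = 50; the sample means come out approximately normal around the population mean.

```python
import random
import statistics

random.seed(42)
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(50))
    for _ in range(10_000)
]
print(statistics.fmean(sample_means))   # close to the population mean, 1
print(statistics.stdev(sample_means))   # close to 1 / sqrt(50) ≈ 0.141
```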
- Chance variation. In a production process, the variation that is caused by a number of randomly occurring events that are part of the production process.
- Changing variability. In regression analysis, a violation of the assumption that the variance of the error term is the same for all observations. It is also referred to as heteroskedasticity.
- Chebyshev's theorem. For any data set, the proportion of observations that lie within k standard deviations from the mean will be at least 1 − 1/k², where k is any number greater than 1.
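A short sketch of the bound; the values of k are illustrative.

```python
def chebyshev_bound(k):
    """Minimum proportion of observations within k standard
    deviations of the mean, for any data set (k > 1)."""
    return 1 - 1 / k**2

print(chebyshev_bound(2))  # at least 0.75 lie within 2 std devs
print(chebyshev_bound(3))  # at least 0.889 lie within 3 std devs
```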
- Chi-square test of a contingency table. See Test for independence.
- Chi-square distribution (χ² distribution). A family of distributions where each distribution depends on its particular degrees of freedom df. It is positively skewed, with values ranging from zero to infinity, but becomes increasingly symmetric as df increase.
- Classes. Intervals for a frequency distribution of quantitative data.
- Classical probability. A probability often used in games of chance. It is based on the assumption that all outcomes are equally likely.
- Cluster sampling. A population is first divided up into mutually exclusive and collectively exhaustive groups of observations, called clusters. A cluster sample includes observations from randomly selected clusters.
- Coefficient of determination (R²). The proportion of the sample variation in the response variable that is explained by the sample regression equation.
- Coefficient of variation (CV). The ratio of the standard deviation of a data set to its mean; a relative measure of dispersion.
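A short sketch computing the CV for an assumed sample; because the CV is unit-free, it can compare dispersion across data sets measured on different scales.

```python
import statistics

data = [12, 15, 11, 14, 13, 16]   # assumed sample values
cv = statistics.stdev(data) / statistics.mean(data)
print(round(cv, 3))               # CV ≈ 0.139
```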
- Complement. The complement of event A, denoted Aᶜ, is the event consisting of all outcomes in the sample space that are not in A.
- Complement rule. The probability of the complement of an event is P(Aᶜ) = 1 − P(A).
- Conditional probability. The probability of an event given that another event has already occurred.
- Confidence coefficient. The probability that the estimation procedure will generate an interval that contains the population parameter of interest.
- Confidence interval. A range of values that, with a certain level of confidence, contains the population parameter of interest.
- Consistency. An estimator is consistent if it approaches the unknown population parameter being estimated as the sample size grows larger.
- Contingency table. A table that shows frequencies for two qualitative (categorical) variables, x and y, where each cell represents a mutually exclusive combination of the pair of x and y values.
- Continuous variable (random variable). A variable that assumes uncountable values in an interval.
- Continuous uniform distribution. A distribution describing a continuous random variable that has an equally likely chance of assuming a value within a specified range.
- Control chart. A plot of statistics of a production process over time.
- Correlated observations. In regression analysis, a violation of the assumption that the observations are uncorrelated. It is also referred to as serial correlation.
- Correlation coefficient. A measure that describes the direction and strength of the linear relationship between two variables.
- Covariance. A measure that describes the direction of the linear relationship between two variables.
- Critical value. In a hypothesis test, the critical value is a point that separates the rejection region from the nonrejection region.
- Cross-sectional data. Values of a characteristic of many subjects at the same point in time or approximately the same point in time.
- Cubic regression model. In regression analysis, a model that allows two sign changes of the slope capturing the influence of the explanatory variable on the response variable.
- Cubic trend model. In time series analysis, a model that allows for two changes in the direction of the series.
- Cumulative distribution function. A probability that the value of a random variable X is less than or equal to a particular value x, P(X ≤ x).
- Cumulative frequency distribution. A distribution of quantitative data recording the number of observations that falls below the upper limit of each class.
- Cumulative relative frequency distribution. A distribution of quantitative data recording the fraction (proportion) of observations that falls below the upper limit of each class.
- Degrees of freedom. The number of independent pieces of information that go into the calculation of a given statistic. Many probability distributions are identified by the degrees of freedom.
- Dependent events. The occurrence of one event is related to the probability of the occurrence of the other event.
- Descriptive statistics. The summary of a data set in the form of tables, graphs, or numerical measures.
- Detection approach. A statistical quality control technique that determines at which point the production process does not conform to specifications.
- Deterministic relationship. A relationship in which the value of the response variable is uniquely determined by the values of the explanatory variables.
- Discrete uniform distribution. A symmetric distribution where the random variable assumes a finite number of values and each value is equally likely.
- Discrete variable (random variable). A variable that assumes a countable number of values.
- Dummy variable. A variable that takes on values of 0 or 1.
- Dummy variable trap. A regression model where the number of dummy variables equals the number of categories of a qualitative variable; the resulting model cannot be estimated.
- Efficiency. An unbiased estimator is efficient if its standard error is lower than that of other unbiased estimators.
- Empirical probability. A probability value based on observing the relative frequency with which an event occurs.
- Empirical rule. Given a sample mean x̄, a sample standard deviation s, and a relatively symmetric and bell-shaped distribution, approximately 68% of all observations fall in the interval x̄ ± s; approximately 95% of all observations fall in the interval x̄ ± 2s; and almost all observations fall in the interval x̄ ± 3s.
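A short simulation check of these percentages, assuming 100,000 standard normal draws.

```python
import random

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(100_000)]
for k in (1, 2, 3):
    share = sum(abs(x) <= k for x in xs) / len(xs)
    print(k, round(share, 3))   # roughly 0.683, 0.954, 0.997
```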
- Endogeneity. See Excluded variables.
- Error sum of squares (SSE). In ANOVA, a measure of the degree of variability that exists even if all population means are the same. In regression analysis, it measures the unexplained variation in the response variable.
- Estimate. A particular value of an estimator.
- Estimator. A statistic used to estimate a population parameter.
- Event. A subset of a sample space.
- Excluded variables. In regression analysis, a situation where important explanatory variables are excluded from the regression. It often leads to the violation of the assumption that the error term is uncorrelated with the (included) explanatory variables.
- Exhaustive events. When all possible outcomes of an experiment are included in the events.
- Expected value. A weighted average of all possible values of a random variable.
- Experiment. A process that leads to one of several possible outcomes.
- Explanatory variables. In regression analysis, the variables that influence the response variable. They are also called the independent variables, predictor variables, control variables, or regressors.
- Exponential distribution. A continuous, nonsymmetric probability distribution used to describe the time that has elapsed between occurrences of an event.
- Exponential regression model. A regression model in which only the response variable is transformed into natural logs.
- Exponential trend model. A regression model used for a time series that is expected to grow by an increasing amount each period.
- F distribution. A family of distributions where each distribution depends on two degrees of freedom: the numerator degrees of freedom df₁ and the denominator degrees of freedom df₂. It is positively skewed, with values ranging from zero to infinity, but becomes increasingly symmetric as df₁ and df₂ increase.
- Finite population correction factor. A correction factor that accounts for the added precision gained by sampling a larger percentage of the population. It is implemented when the sample constitutes at least 5% of the population.
- Frequency distribution. A table that groups qualitative data into categories, or quantitative data into intervals called classes, where the number of observations that fall into each category or class is recorded.
- Goodness-of-fit test. A chi-square test used to determine if the sample proportions resulting from a multinomial experiment differ from the hypothesized population proportions specified in the null hypothesis.
- Grand mean. In ANOVA, the sum of all observations in a data set divided by the total number of observations.
- Heteroskedasticity. See Changing variability.
- Histogram. A graphical depiction of a frequency or relative frequency distribution; it is a series of rectangles where the width and height of each rectangle represent the class width and frequency (or relative frequency) of the respective class.
- Hypergeometric distribution. A description of the probabilities associated with the possible values of a hypergeometric random variable.
- Hypergeometric random variable. The number of successes achieved in the n trials of a two-outcome experiment, where the trials are not assumed to be independent.
- Hypothesis test. A statistical procedure to resolve conflicts between two competing claims (hypotheses) on a particular population parameter of interest.
- Independent events. The occurrence of one event does not affect the probability of the occurrence of the other event.
- Independent random samples. Two (or more) random samples are considered independent if the process that generates one sample is completely separate from the process that generates the other sample.
- Indicator variable. See Dummy variable.
- Inferential statistics. The practice of extracting useful information from a sample to draw conclusions about a population.
- Interaction variable. In a regression model, a product of two explanatory variables. For example, xd captures the interaction between a quantitative variable x and a dummy variable d.
- Interquartile range (IQR). The difference between the third and first quartiles.
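A short sketch using Python's statistics.quantiles on an assumed sample; note that quartile values can differ slightly depending on the interpolation method used.

```python
import statistics

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]        # assumed sample
q1, q2, q3 = statistics.quantiles(data, n=4)   # the three quartiles
print(q3 - q1)                                 # the interquartile range
```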
- Intersection. The intersection of two events A and B, denoted A ∩ B, is the event consisting of all outcomes in A and B.
- Interval data (scale data). Values of a quantitative variable that can be categorized and ranked, and in which differences between values are meaningful.
- Interval estimate. See Confidence interval.
- Inverse transformation. A standard normal variable Z can be transformed to the normally distributed random variable X with mean μ and standard deviation σ as X = μ + Zσ.
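A short sketch of the inverse transformation, assuming μ = 100 and σ = 15 for illustration.

```python
from statistics import NormalDist

mu, sigma = 100, 15
z = NormalDist().inv_cdf(0.95)   # z such that P(Z <= z) = 0.95
x = mu + z * sigma               # transform back to X = mu + Z*sigma
print(round(x, 1))               # 95th percentile of X ≈ 124.7
```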
- Joint probabilities. The values in the interior of a joint probability table, representing the probabilities of the intersection of two events.
- Joint probability table. A contingency table whose frequencies have been converted to relative frequencies.
- Kurtosis coefficient. A measure of whether data are more or less peaked than a normal distribution.
- Law of large numbers. In probability theory, if an experiment is repeated a large number of times, its empirical probability approaches its classical probability.
- Left-tailed test. In hypothesis testing, when the null hypothesis is rejected on the left side of the hypothesized value of the population parameter.
- Linear trend model. A regression model used for a time series that is expected to grow by a fixed amount each time period.
- Logarithmic regression model. A regression model in which only the explanatory variable is transformed into natural logs.
- Log-log regression model. A regression model in which both the response variable and the explanatory variable(s) are transformed into natural logs.
- Lower control limit. In a control chart, the lower control limit indicates excessive deviation below the expected value of the variable of interest.
- Margin of error. A value that accounts for the standard error of the estimator and the desired confidence level of the interval.
- Marginal probabilities. The values in the margins of a joint probability table that represent unconditional probabilities.
- Matched-pairs sample. When a sample is matched or paired in some way.
- Mean. See Arithmetic mean.
- Mean absolute deviation (MAD). The average of the absolute differences between the observations and the mean.
- Mean square error (MSE). The average of the error (residual) sum of squares, where the residual is the difference between the observed and the predicted value of a variable.
- Mean square regression. The average of the sum of squares due to regression.
- Mean-variance analysis. The idea that the performance of an asset is measured by its rate of return, and this rate of return is evaluated in terms of its reward (mean) and risk (variance).
- Median. The middle value of a data set.
- Method of least squares. See Ordinary least squares (OLS).
- Mode. The most frequently occurring value in a data set.
- Multicollinearity. In regression analysis, a situation where two or more explanatory variables are correlated.
- Multinomial experiment. A series of n independent and identical trials, such that on each trial there are k possible outcomes, called categories; the probability pᵢ associated with the ith category remains the same; and the sum of the probabilities is one.
- Multiple linear regression model. In regression analysis, more than one explanatory variable is used to explain the variability in the response variable.
- Multiplication rule. The probability that A and B both occur is P(A ∩ B) = P(A|B)P(B).
- Mutually exclusive events. Events that do not share any common outcome of an experiment.
- Negatively skewed distribution (left-skewed distribution). A distribution in which extreme values are concentrated in the left tail of the distribution.
- Nominal data (scale data). Values of a qualitative variable that differ merely by name or label.
- Nonresponse bias. A systematic difference in preferences between respondents and nonrespondents of a survey or a poll.
- Normal curve. A graph depicting the normal probability density function; also referred to as the bell curve.
- Normal distribution (normal probability distribution). The most extensively used probability distribution in statistical work and the cornerstone of statistical inference. It is symmetric and bell-shaped and is completely described by the mean and the variance.
- Null hypothesis (H0). In a hypothesis test, the null hypothesis corresponds to a presumed default state of nature or status quo.
- Ogive. A graph of the cumulative frequency or cumulative relative frequency distribution in which lines connect a series of neighboring points, where each point represents the upper limit of each class and its corresponding cumulative frequency or cumulative relative frequency.
- One-tailed hypothesis test. A test in which the null hypothesis is rejected only on one side of the hypothesized value of the population parameter.
- One-way ANOVA. A statistical technique that analyzes the effect of one categorical variable (factor) on the mean.
- Ordinal (scale) data. Values of a qualitative variable that can be categorized and ranked.
- Ordinary least squares (OLS). A regression technique for fitting a straight line whereby the error (residual) sum of squares is minimized.
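A minimal sketch of OLS for a simple linear regression, using the closed-form slope and intercept formulas on assumed data; the slope is b₁ = Sxy/Sxx and the intercept is b₀ = ȳ − b₁x̄.

```python
from statistics import fmean

x = [1, 2, 3, 4, 5]                 # assumed explanatory values
y = [2.1, 3.9, 6.2, 7.8, 10.1]      # assumed response values
xbar, ybar = fmean(x), fmean(y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy / sxx           # slope that minimizes the residual sum of squares
b0 = ybar - b1 * xbar    # intercept
print(round(b0, 3), round(b1, 3))   # b0 = 0.05, b1 = 1.99
```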
- Outliers. Extreme small or large data values.
- p̄ chart. A control chart that monitors the proportion of defectives (or some other characteristic) of a production process.
- p-value. In a hypothesis test, the likelihood of observing a sample mean that is at least as extreme as the one derived from the given sample, under the assumption that the null hypothesis is true.
- Parameter. See Population parameter.
- Percentile. The pth percentile divides a data set into two parts: approximately p percent of the observations have values less than the pth percentile and approximately (100 − p) percent of the observations have values greater than the pth percentile.
- Pie chart. A segmented circle portraying the categories and relative sizes of some qualitative variable.
- Point estimate. The value of the point estimator derived from a given sample.
- Point estimator. A function of the random sample used to make inferences about the value of an unknown population parameter.
- Poisson distribution. A description of the probabilities associated with the possible values of a Poisson random variable.
- Poisson process. An experiment in which the number of successes within a specified time or space interval equals any integer between zero and infinity; the numbers of successes counted in nonoverlapping intervals are independent from one another; and the probability that success occurs in any interval is the same for all intervals of equal size and is proportional to the size of the interval.
- Poisson random variable. The number of successes over a given interval of time or space in a Poisson process.
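A minimal sketch of the Poisson probability mass function; an average of 4 successes per interval is assumed for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson random variable with mean mu
    successes per interval."""
    return exp(-mu) * mu**k / factorial(k)

# Assumed example: on average 4 arrivals per hour.
print(poisson_pmf(2, 4))   # P(exactly 2 arrivals) ≈ 0.1465
```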
- Polygon. A graph of a frequency or relative frequency distribution in which lines connect a series of neighboring points, where each point represents the midpoint of a particular class and its associated frequency or relative frequency.
- Polynomial regression model. In regression analysis, a model that allows sign changes of the slope capturing the influence of an explanatory variable on the response variable.
- Population. The complete collection of items of interest in a statistical problem.
- Population parameter. A characteristic of a population.
- Positively skewed distribution (right-skewed distribution). A distribution in which extreme values are concentrated in the right tail of the distribution.
- Posterior probability. The updated probability, conditional on the arrival of new information.
- Prior probability. The unconditional probability before the arrival of new information.
- Probability. A numerical value between 0 and 1 that measures the likelihood that an event occurs.
- Probability density function. The probability density function provides the probability that a continuous random variable falls within a particular range of values.
- Probability distribution. Every random variable is associated with a probability distribution that describes the variable completely. It is used to compute probabilities associated with the variable.
- Probability mass function. The probability mass function provides the probability that a discrete random variable takes on a particular value.
- Probability tree. A graphical representation of the various possible sequences of an experiment.
- Quadratic regression model. In regression analysis, a model that allows one sign change of the slope capturing the influence of the explanatory variable on the response variable.
- Quadratic trend model. In time series analysis, a model that captures either a U-shaped trend or an inverted U-shaped trend.
- Qualitative variable. A variable that uses labels or names to identify the distinguishing characteristics of observations.
- Quantitative variable. A variable that assumes meaningful numerical values for observations.
- Quartiles. Any of the three values that divide the ordered data into four equal parts, where the first, second, and third quartiles refer to the 25th, 50th, and 75th percentiles, respectively.
- R chart. A control chart that monitors the variability of a production process.
- Random error. In regression analysis, random error is due to the omission of factors that influence the response variable.
- Random variable. A function that assigns numerical values to the outcomes of an experiment.
- Range. The difference between the maximum and the minimum values in a data set.
- Ratio data (scale data). Values of a quantitative variable that can be categorized and ranked, and in which differences between values are meaningful; in addition, a true zero point (origin) exists.
- Regression analysis. A statistical method for analyzing the relationship between variables.
- Rejection region. In a hypothesis test, a range of values such that if the value of the test statistic falls into this range, then the decision is to reject the null hypothesis.
- Relative frequency distribution. A frequency distribution that shows the fraction (proportion) of observations in each category of qualitative data or class of quantitative data.
- Residual (e). In regression analysis, the difference between the observed value and the predicted value of the response variable; that is, e = y − ŷ.
- Residual plots. In regression analysis, the residuals are plotted sequentially or against an explanatory variable to identify model inadequacies. The model is adequate if the residuals are randomly dispersed around the zero value.
- Response variable. In regression analysis, the variable that is influenced by the explanatory variable(s). It is also called the dependent variable, the explained variable, the predicted variable, or the regressand.
- Right-tailed test. In hypothesis testing, when the null hypothesis is rejected on the right side of the hypothesized value of the population parameter.
- Risk-averse consumer. Someone who takes risk only if it entails a suitable compensation and may decline a risky prospect even if it offers a positive expected gain.
- Risk-loving consumer. Someone who may accept a risky prospect even if the expected gain is negative.
- Risk-neutral consumer. Someone who is indifferent to risk and makes his/her decisions solely on the basis of the expected gain.
- s chart. A control chart that monitors the variability of a production process.
- Sample. A subset of a population of interest.
- Sample correlation coefficient. A sample measure that describes both the direction and strength of the linear relationship between two variables.
- Sample covariance. A sample measure that describes the direction of the linear relationship between two variables.
- Sample space. A record of all possible outcomes of an experiment.
- Sample statistic. A random variable used to estimate the unknown population parameter of interest.
- Sampling distribution. The probability distribution of an estimator.
- Scatterplot. A graphical tool that helps in determining whether or not two variables are related in some systematic way. Each point in the diagram represents a pair of known or observed values of the two variables.
- Seasonal dummy variables. Dummy variables used to capture the seasonal component from a time series.
- Selection bias. A systematic underrepresentation of certain groups from consideration for a sample.
- Semi-log model. A regression model in which not all variables are transformed into logs.
- Serial correlation. See Correlated observations.
- Sharpe ratio. A ratio calculated by dividing the difference of the mean return from the risk-free rate by the asset's standard deviation.
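A short sketch with assumed monthly returns (in percent) and an assumed risk-free rate.

```python
from statistics import mean, stdev

returns = [1.2, -0.5, 2.1, 0.8, 1.5, -0.2]   # assumed returns (%)
rf = 0.3                                     # assumed risk-free rate (%)
sharpe = (mean(returns) - rf) / stdev(returns)
print(round(sharpe, 3))   # reward per unit of risk ≈ 0.515
```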
- Significance level. The allowed probability of making a Type I error.
- Simple linear regression model. In regression analysis, one explanatory variable is used to explain the variability in the response variable.
- Simple random sample. A sample of n observations that has the same probability of being selected from the population as any other sample of n observations.
- Skewness coefficient. A measure that determines if the data are symmetric about the mean. Symmetric data have a skewness coefficient of zero.
- Social-desirability bias. A systematic difference between a group's "socially acceptable" responses to a survey or poll and this group's ultimate choice.
- Standard deviation. The positive square root of the variance; a common measure of dispersion.
- Standard error. The standard deviation of an estimator.
- Standard error of the estimate. The standard deviation of the residual; used as a goodness-of-fit measure for regression analysis.
- Standard normal distribution. A special case of the normal distribution with a mean equal to zero and a standard deviation (or variance) equal to one.
- Standard normal table. See z table.
- Standard transformation. A normally distributed random variable X with mean μ and standard deviation σ can be transformed into the standard normal random variable Z as Z = (X − μ)∕σ.
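A short sketch combining the standard transformation with a standard normal CDF lookup (the role a z table plays); μ = 100, σ = 15, and x = 120 are assumed.

```python
from statistics import NormalDist

mu, sigma, x = 100, 15, 120
z = (x - mu) / sigma                   # standardize: Z = (X - mu) / sigma
print(round(z, 2))                     # z-score ≈ 1.33
print(round(NormalDist().cdf(z), 4))   # P(X <= 120) ≈ 0.9088
```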
- Standardize. A technique used to convert a value into its corresponding z-score.
- Statistic. See Sample statistic.
- Statistical quality control. Statistical techniques used to develop and maintain a firm's ability to produce high-quality goods and services.
- Stem-and-leaf diagram. A visual method of displaying quantitative data where each value of a data set is separated into two parts: a stem, which consists of the leftmost digits, and a leaf, which consists of the last digit.
- Stochastic relationship. A relationship in which the value of the response variable is not uniquely determined by the values of the explanatory variables.
- Stratified random sampling. A population is first divided up into mutually exclusive and collectively exhaustive groups, called strata. A stratified sample includes randomly selected observations from each stratum. The number of observations per stratum is proportional to the stratum's size in the population. The data for each stratum are eventually pooled.
- Structured data. Data that conform to a predefined row-column format.
- Student's t distribution. See t distribution.
- Subjective probability. A probability value based on personal and subjective judgment.
- Sum of squares due to regression (SSR). In regression analysis, it measures the explained variation in the response variable.
- Sum of squares due to treatments (SSTR). In ANOVA, a weighted sum of squared differences between the sample means and the overall mean of the data.
- Symmetry. When one side of a distribution is a mirror image of the other side.
- t distribution. A family of distributions that are similar to the z distribution except that they have broader tails. They are identified by their degrees of freedom df.
- Test for independence. A goodness-of-fit test analyzing the relationship between two qualitative variables. Also called a chi-square test of a contingency table.
- Test of individual significance. In regression analysis, a test that determines whether an explanatory variable has an individual statistical influence on the response variable.
- Test of joint significance. In regression analysis, a test to determine whether the explanatory variables have a joint statistical influence on the response variable.
- Test statistic. A sample-based measure used in hypothesis testing.
- Time series. A set of sequential observations of a variable over time.
- Total probability rule. A rule that expresses the unconditional probability of an event, P(A), in terms of probabilities conditional on various mutually exclusive and exhaustive events. The total probability rule conditional on two events B and Bᶜ is P(A) = P(A ∩ B) + P(A ∩ Bᶜ) = P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ).
- Total sum of squares (SST). In regression analysis, it measures the total variation in the response variable.
- Trend. A long-term upward or downward movement of a time series.
- Two-tailed hypothesis test. A test in which the null hypothesis can be rejected on either side of the hypothesized value of the population parameter.
- Type I error. In a hypothesis test, this error occurs when the decision is to reject the null hypothesis when the null hypothesis is actually true.
- Type II error. In a hypothesis test, this error occurs when the decision is to not reject the null hypothesis when the null hypothesis is actually false.
- Unbiased. An estimator is unbiased if its expected value equals the unknown population parameter being estimated.
- Unconditional probability. The probability of an event without any restriction.
- Union. The union of two events A and B, denoted A ∪ B, is the event consisting of all outcomes in A or B.
- Unstructured data. Data that do not conform to a predefined row-column format.
- Upper control limit. In a control chart, the upper control limit indicates excessive deviation above the expected value of the variable of interest.
- Variable. A general characteristic being observed on a set of people, objects, or events, where each observation varies in kind or degree.
- Variance. The average of the squared differences from the mean; a common measure of dispersion.
- Weighted mean. When some observations contribute more than others in the calculation of an average.
- Within-treatments variance. In ANOVA, a measure of the variability within each sample.
- x̄ chart. A control chart that monitors the central tendency of a production process.
- z-score. The relative position of a value within a data set; it is also used to detect outliers.
- z table. A table providing cumulative probabilities for positive or negative values of the standard normal random variable Z.