A Concise Guide to Market Research 3e by Sarstedt, Mooi
A Concise Guide to Market Research 3e by Sarstedt, Mooi is the third edition of A Concise Guide to Market Research: The Process, Data, and Methods Using IBM SPSS Statistics, authored by:
- Marko Sarstedt, Faculty of Economics and Management, Otto-von-Guericke University Magdeburg, Magdeburg, Germany,
- Erik Mooi, Department of Management and Marketing, The University of Melbourne, Parkville, Australia.
The book was published in 2019 by Springer-Verlag GmbH Germany, part of Springer Nature.
- α error. Occurs when erroneously rejecting a true null hypothesis. Also referred to as type I error.
- α-inflation. Results when multiple tests are conducted simultaneously on the same data. The result is that you are more likely to claim a significant result when this is not so (i.e., an increase or inflation in the type I error).
- Acquiescence. Describes the tendency of respondents from different cultures to agree with statements (e.g., as formulated in a Likert scale item) regardless of their content.
- Adjusted R2. Is a measure of goodness-of-fit that takes the number of independent variables and the sample size into account. The statistic is useful for comparing regression models with different numbers of independent variables, sample sizes, or both.
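The adjustment can be sketched in a few lines. The book works in IBM SPSS Statistics; this is an illustrative Python sketch of the standard formula, using R², the sample size n, and the number of independent variables k:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding predictors can only raise R2, but adjusted R2 penalizes complexity:
print(adjusted_r2(0.60, 50, 3))  # ≈ 0.574
print(adjusted_r2(0.61, 50, 8))  # ≈ 0.534 - higher R2, yet lower adjusted R2
```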
- Agglomerative clustering. Is a type of hierarchical clustering method in which clusters are consecutively formed from objects. It starts with each object representing an individual cluster. The objects are then sequentially merged to form clusters of multiple objects, starting with the two most similar.
- Aggregation. Is a type of scale transformation in which variables measured at a lower level are taken to a higher level.
- Akaike Information Criterion (AIC). Is a relative measure of goodness-of-fit, which can be used to compare regression models with different independent variables and/or numbers of observations. Compared to an alternative model setup, smaller AIC values indicate a better fit. The criterion is also used in the two-step cluster analysis to determine the number of clusters.
- Alternative hypothesis. Is the hypothesis against which the null hypothesis is tested.
- American Marketing Association (AMA). Is the world's leading association for marketing professionals.
- Analysis of Variance (ANOVA). Is a multivariate data analysis technique that allows testing whether the means of (typically) three or more groups differ significantly on a metric dependent variable. The groups are defined by one (one-way ANOVA) or two (two-way ANOVA) factor variables. There are numerous extensions to more dependent variables and to differently scaled independent variables.
- Anti-image. Is a measure used in principal component and factor analysis to determine whether the items correlate sufficiently. The anti-image describes the portion of an item's variance that is independent of another item in the analysis.
- Armstrong and Overton procedure. Is used to assess the degree of non-response bias. This procedure calls for comparing the first 50% of respondents with the last 50% with regard to key demographic variables. The concept behind this procedure is that later respondents more closely match the characteristics of non-respondents.
- Autocorrelation. Occurs when the residuals from a regression analysis are correlated.
- Average linkage. Is a linkage algorithm in hierarchical clustering methods in which the distance between two clusters is defined as the average distance between all pairs of objects in the two clusters.
- β error. Occurs when erroneously accepting a false null hypothesis. Also referred to as type II error.
- Back-translation. Is a translation method used in survey research in which a survey is translated into another language and then translated back into the original language by a different person.
- Balanced scale. Describes a scale with an equal number of positive and negative scale categories.
- Bar chart. Is a graphical representation of a single categorical variable indicating each category's frequency of occurrence. Bar charts are primarily useful for describing nominal and ordinal variables.
- Bartlett method. Is a procedure to generate factor scores in principal component analysis. The resulting factor scores have a zero mean and a standard deviation larger than one.
- Bartlett's test of sphericity. Is used in the context of principal component analysis and factor analysis to assess whether the variables are sufficiently correlated.
- Bayes Information Criterion (BIC). Is a relative measure of goodness-of-fit, which can be used to compare regression models with different independent variables and/or numbers of observations. Compared to an alternative model setup, smaller BIC values indicate a better fit. The criterion is also used in the two-step cluster analysis to determine the number of clusters.
- Big data. Refers to very large datasets, generally a mix of quantitative and qualitative data in very large volumes.
- Binary logistic regression. Is a type of regression method used when the dependent variable is binary, that is, takes only two values.
- Bivariate statistics. Describes statistics that express the empirical relationship between two variables. Covariance and correlation are key measures that indicate (linear) associations between two variables.
- Bonferroni correction. Is a post hoc test typically used in an ANOVA that maintains the familywise error rate by calculating a new pairwise alpha that divides the statistical significance level α by the number of comparisons made. See also familywise error rate and α-inflation.
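The arithmetic behind the correction is simple. As an illustrative Python sketch (not a book procedure), the familywise error rate for m independent tests, and the Bonferroni-corrected per-comparison level, are:

```python
def familywise_error(alpha, m):
    """Probability of at least one type I error across m independent tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-comparison level that keeps the familywise error rate at alpha."""
    return alpha / m

# With 10 comparisons at alpha = 0.05, the uncorrected familywise error
# rate is about 0.40; the Bonferroni-corrected level is 0.05 / 10 = 0.005.
print(familywise_error(0.05, 10))
print(bonferroni_alpha(0.05, 10))
```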
- Bootstrapping. Is a resampling technique that draws a large number of subsamples from the original data (with replacement) and estimates parameters for each subsample. It is used to determine standard errors of coefficients to assess their statistical significance without relying on distributional assumptions.
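A minimal Python sketch of the idea (the book itself works in SPSS): resample the data with replacement many times, compute the statistic of interest for each subsample, and take the standard deviation of those estimates as the bootstrap standard error.

```python
import random
import statistics

def bootstrap_se(data, n_boot=2000, seed=42):
    """Bootstrap standard error of the sample mean."""
    rng = random.Random(seed)
    boot_means = [statistics.mean(rng.choices(data, k=len(data)))
                  for _ in range(n_boot)]
    return statistics.stdev(boot_means)

data = [4, 8, 6, 5, 3, 7, 9, 5, 6, 4]
# The result is close to the analytical standard error s / sqrt(n):
print(bootstrap_se(data))
```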
- Box plot. Shows the distribution of a (typically continuous) variable. It consists of elements expressing the dispersion of the data. Also referred to as box-and-whisker plot.
- Case. Is an object such as a customer, a company, or a country in statistical analysis. Also referred to as observation.
- Causal research. Is used to understand the cause-and-effect relationships between two or more variables. Causal research explains how one variable affects another.
- Census. Is a procedure of systematically acquiring and recording information about all the members of a given population.
- Centroid linkage. Is a linkage algorithm in hierarchical clustering methods in which the distance between two clusters is defined as the distance between their geometric centers (centroids).
- Chaining effect. Is a solution pattern typically observed when using a single linkage algorithm in cluster analysis.
- Chebychev distance. Is a distance measure used in cluster analysis that uses the maximum of the absolute difference in the clustering variables' values.
- City-block distance. Is a distance measure used in cluster analysis that uses the sum of the variables' absolute differences. Also referred to as Manhattan metric.
- Closed-ended questions. Is a type of question format in which respondents have a certain number of response categories from which to choose.
- Cluster analysis. Is a class of methods that groups a set of objects with the goal of obtaining high similarity within the formed groups and high dissimilarity between groups.
- Clustering variables. Are variables used in cluster analysis.
- Clusters. Are groups of objects with similar characteristics.
- Codebook. Contains essential details of a data file, such as variable names and summary statistics.
- Coefficient of determination (R2). Is a measure used in regression analysis to express the dependent variable's amount of variance that the independent variables explain.
- Collinearity. Arises when two variables are highly correlated.
- Communality. Describes the amount of a variable's variance that the extracted factors in a principal component and factor analysis reproduce.
- Complete linkage. Is a linkage algorithm in hierarchical clustering methods in which the distance between two clusters corresponds to the longest distance between any two members in the two clusters.
- Components. Are extracted in the course of a principal component analysis. They are also commonly referred to as factors.
- Confidence interval. Provides the lower and upper limit of values within which a population parameter will fall with a certain probability (e.g., 95%).
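As an illustrative Python sketch (not from the book), an approximate 95% confidence interval for a population mean uses the sample mean plus or minus 1.96 standard errors:

```python
import math
import statistics

def ci_mean(data, z=1.96):
    """Approximate 95% confidence interval for a population mean (z = 1.96)."""
    m = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return (m - z * se, m + z * se)

lower, upper = ci_mean([12, 15, 11, 14, 13, 16, 12, 15])
print(lower, upper)  # interval around the sample mean of 13.5
```

For small samples, the z-value would be replaced by the appropriate t-value.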
- Confirmatory factor analysis. Is a special form of factor analysis used to test whether the measures of a construct are consistent with a researcher's understanding of that construct.
- Constant. Is a characteristic of an object whose value does not change.
- Constant sum scale. Is a type of scale that requires respondents to allocate a certain total number of points (typically 100) to a number of alternatives.
- Construct. Measures a concept that is abstract, complex, and cannot be directly observed. Also referred to as latent variable.
- Construct scores. Are composite scores that calculate a value for each construct of each observation. Construct scores are often computed by taking the mean of all the items associated with the construct.
- Construct validity. Is the degree of correspondence between a measure at the conceptual level and its empirical manifestation. Researchers often use this as an umbrella term for content, criterion, discriminant, face, and nomological validity.
- Content validity. Refers to the extent to which a measure represents all facets of a given construct.
- Control variables. Are included into (typically regression) analyses in order to rule these out as alternative explanations.
- Correlation. Is a measure of how strongly two variables relate to each other. Correlation is a scaled version of the covariance.
- Correlation residuals. Are the differences between the original item correlations and the reproduced item correlations in a principal component and factor analysis.
- Covariance. Is a measure of how strongly two variables relate to each other.
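The relationship between the two entries above can be made concrete: correlation is the covariance divided by the product of the two standard deviations. An illustrative Python sketch (the book uses SPSS):

```python
import math

def covariance(x, y):
    """Sample covariance between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance scaled by the standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]      # perfectly linear in x
print(correlation(x, y))  # 1.0
```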
- Covariance-based structural equation modeling (CB-SEM). Is an approach to structural equation modeling used to test relationships between multiple items and constructs.
- Criterion validity. Measures how well one measure predicts the outcome of another measure when both are measured at the same time.
- Cronbach's Alpha. Is a measure of internal consistency reliability. Cronbach's Alpha generally varies between 0 and 1 with greater values indicating higher degrees of reliability.
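The standard formula, alpha = k/(k-1) × (1 - sum of item variances / variance of the total score), can be sketched in Python (an illustration, not a book procedure):

```python
import statistics

def cronbach_alpha(items):
    """items: one list of respondent scores per scale item (equal lengths)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum score
    item_var_sum = sum(statistics.variance(it) for it in items)
    return k / (k - 1) * (1 - item_var_sum / statistics.variance(totals))

# Three items answered by five respondents (hypothetical data):
items = [[3, 4, 4, 5, 2],
         [3, 5, 4, 5, 2],
         [2, 4, 3, 5, 3]]
print(cronbach_alpha(items))  # ≈ 0.92, indicating high internal consistency
```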
- Crosstabs. Are tables in a matrix format that show the frequency distribution of nominal or ordinal variables.
- Cross validation. Entails comparing the results of an analysis with those obtained when using a new dataset.
- Customer relationship management (CRM). Refers to a system of databases and software used to track and predict customer behavior.
- Data entry errors. Are mistakes made when transcribing data during data entry. Erroneous values that fall outside a variable's standard range can easily be identified by means of descriptive statistics (minimum, maximum, and range).
- Degrees of freedom (df). Represents the amount of information available to estimate a test statistic. Generally, an estimate's degrees of freedom are equal to the amount of independent information used (i.e., the number of observations) minus the number of parameters estimated.
- Dendrogram. Visualizes the results of a cluster analysis. Horizontal lines in a dendrogram indicate the distances at which the objects have been merged.
- Dependence of observations. Is the degree to which observations are related.
- Dependent variables. Are the concepts a researcher wants to understand, explain, or predict.
- Descriptive research. Is used to detail certain phenomena, characteristics, or functions. Descriptive research often builds on previous exploratory research.
- Directional hypothesis. Looks for an increase or a decrease in a parameter (such as a population mean) relative to a specific standard. A directional hypothesis can be either right-tailed or left-tailed.
- Direct oblimin rotation. Is a popular oblique type of rotation, which allows specifying the maximum degree of obliqueness.
- Discriminant validity. Ensures that a measure is empirically unique and represents phenomena of interest that other measures in a model do not capture.
- Distance matrix. Expresses the distances between pairs of objects.
- Divisive clustering. Is a type of hierarchical clustering method in which all objects are initially merged into a single cluster, which the algorithm then gradually splits up.
- Double-barreled questions. Are survey questions to which respondents can agree with one part but not with the other. Also refers to survey questions that cannot be answered without accepting an assumption.
- Dummy variables. Are binary variables that indicate whether a certain trait is present or not.
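As an illustrative Python sketch (not from the book), a categorical variable can be turned into dummy variables by creating one 0/1 column per category and dropping one category as the reference:

```python
def to_dummies(values):
    """One binary (0/1) column per category, dropping the first as reference."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories[1:]}

colors = ["red", "blue", "green", "blue"]
print(to_dummies(colors))
# {'green': [0, 0, 1, 0], 'red': [1, 0, 0, 0]}  ('blue' is the reference)
```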
- Durbin-Watson test. Is a test for autocorrelation used in regression analysis.
- Effect size. Is a statistical measure to determine the strength of the effect (e.g., in an ANOVA).
- Eigenvalue. Indicates the amount of variance reproduced by a specific component or factor.
- Eigenvectors. Are the results of a principal component analysis and include the factor weights.
- Equidistance. Is indicated when the (psychological) distances between a scale's categories are identical.
- Equidistant scale. See Equidistance.
- Error. Is the difference between the regression line (which represents the regression prediction) and the actual observation.
- Error sum of squares. Quantifies the difference between the observations and the regression line.
- ESOMAR. Is the world organization for market, consumer, and societal research.
- Estimation sample. Is the sample used to run a statistical analysis.
- Eta-squared (η2). Is a statistic used in an ANOVA to describe the ratio of the between-group variation to the total variation, thereby indicating the variance accounted for by the factor variable(s). η2 is identical to R2, the coefficient of determination in regression analysis. It can take on values between 0 and 1, where a high value implies that the factor exerts a strong influence on the dependent variable.
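The ratio of between-group to total variation can be sketched directly (an illustrative Python example; the book works in SPSS):

```python
def eta_squared(groups):
    """SS_between / SS_total for a one-way ANOVA setup (one list per group)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    return ss_between / ss_total

print(eta_squared([[1, 2, 3], [1, 2, 3]]))  # 0.0 - identical group means
print(eta_squared([[1, 1], [5, 5]]))        # 1.0 - all variation between groups
```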
- Ethics. Are a system of morals and principles, which defines a research organization's obligations, for example, with regard to the findings they release being an accurate portrayal of the survey data.
- Ethnography. Is a type of qualitative research in which the researcher interacts with consumers over a period to observe and question them.
- Euclidean distance. Is a distance measure commonly used in cluster analysis. It is the square root of the sum of the squared differences in the variables' values. Also referred to as straight line distance.
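The three distance measures defined in this glossary (Chebychev, city-block, and Euclidean) differ only in how they aggregate the per-variable differences. An illustrative Python sketch (the book computes these in SPSS):

```python
import math

def chebychev(p, q):
    """Maximum absolute difference across the clustering variables."""
    return max(abs(a - b) for a, b in zip(p, q))

def city_block(p, q):
    """Sum of absolute differences (Manhattan metric)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Square root of the sum of squared differences (straight line distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

p, q = (1, 2), (4, 6)  # per-variable differences: 3 and 4
print(chebychev(p, q), city_block(p, q), euclidean(p, q))  # 4 7 5.0
```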
- Experimental design. Describes which treatment variables to administer and how these relate to dependent variables. Prominent experimental designs include the one-shot case study, the before-after design, the before-after design with a control group, and the Solomon four-group design.
- Experiments. Are study designs commonly used in causal research in which a researcher controls for a potential cause and observes corresponding changes in hypothesized effects via treatment variables.
- Explained variation. Is the degree of variation that a factor variable in an ANOVA explains. It is similar to the Coefficient of Determination used for regression analysis.
- Exploratory factor analysis. Is a type of factor analysis that derives factors from a set of correlated indicator variables without the researcher having to prespecify a factor structure.
- Exploratory research. Is conducted when the researcher has little or no information about a particular problem or opportunity. It is used to refine research questions, discover new relationships, patterns, themes, and ideas or to inform measurement development.
- External secondary data. Are compiled outside a company for a variety of purposes. Sources of secondary data include, for example, governments, trade associations, market research firms, consulting firms, (literature) databases, and social networks.
- External validity. Is the extent to which the study results can be generalized to real-world settings.
- Extreme response styles. Occur when respondents systematically select the endpoints of a response scale.
- Face-to-face interview. See Personal interview.
- Face validity. Is the extent to which a test is subjectively viewed as covering the concept it purports to measure.
- Factor analysis. Is a statistical procedure that uses the correlation patterns among a set of indicator variables to derive factors that represent most of the original variables' variance. Also referred to as Principal axis factoring.
- Factor-cluster segmentation. Is the process of running a cluster analysis on factor scores derived from a principal component or factor analysis to handle collinear variables.
- Factor level. Is a value of the factor variable that defines one of the groups in an ANOVA.
- Factor loading. Is the correlation between a (unit-scaled) factor and a variable.
- Factor rotation. Is a technique used to facilitate the interpretation of solutions in principal component and factor analysis.
- Factor scores. Are composite scores that calculate a value for each factor of each observation.
- Factor variable. Is a categorical variable used to define the groups (e.g., three types of promotion campaigns) in an ANOVA.
- Factor weights. Express the relationships between variables and factors.
- Factors. Are (1) independent variables in an ANOVA, and (2) the resulting variables of a principal component and factor analysis that summarize the information from a set of indicator variables.
- Familywise error rate. Is the probability of making one or more false discoveries or type I errors when performing multiple hypotheses tests. See also α-inflation.
- Field experiments. Are experiments in which the manipulation of a treatment variable occurs in a natural setting, thereby emphasizing the external validity, but potentially compromising internal validity.
- Field service firms. Are companies that focus on conducting surveys, determining samples, sample sizes, and collecting data. Some of these firms also translate surveys, or provide addresses and contact details.
- Focus groups. Is a method of data collection in which six to ten participants discuss a defined topic under the leadership of a moderator.
- Forced-choice scale. Is an answer scale that omits a neutral category, thereby forcing the respondents to make a positive or negative assessment.
- Formative construct. Is a type of measurement in which the indicators form the construct.
- Free-choice scale. Is an answer scale that includes a neutral choice category. Respondents are therefore not forced to make a positive or negative assessment.
- Frequency table. Is a table that displays the absolute, relative, and cumulative frequencies of one or more variables.
- F-test. A test statistic used in an ANOVA and regression analysis to test the overall model's significance.
- F-test of sample variance. See Levene's test.
- Full service providers. Are large market research companies, such as The Nielsen Company, Kantar, or GfK, that offer syndicated and customized services.
- Grand mean. Is the overall average across all levels of a factor variable or other variables that split the dataset into groups.
- Heteroskedasticity. Refers to a situation in regression analysis in which the variance of the residuals is not constant.
- Hierarchical clustering methods. Develop a tree-like structure of objects in the course of the clustering process, which can be top-down (divisive clustering) or bottom-up (agglomerative clustering).
- Histogram. Is a graph that shows how frequently categories derived from a continuous variable occur.
- Homoscedasticity. Refers to a situation in regression analysis in which the variance of the residuals is constant.
- Hypotheses. Are claims made about effects or relationships in a population.
- Inconsistent answers. Are a respondent's contradictory answer patterns.
- Independent samples. Occur if observations are sampled only once.
- Independent samples t-test. A test using the t-statistic that establishes whether two means collected from independent samples differ significantly.
- Independent variables. Are variables that explain or predict a dependent variable.
- In-depth interview. Is a qualitative conversation with participants on a specific topic. This interview type is typically used in exploratory research as it allows one-to-one probing to foster interaction between the interviewer and the respondent.
- Index. Consists of a set of variables that defines the meaning of the resulting composite.
- Index construction. Is the procedure of combining several items to form an index.
- Interaction effect. Refers to how the effect of one variable on another variable is influenced by a third variable.
- Interaction term. Is an auxiliary variable entered into the regression model to account for the interaction of the moderator variable and an independent variable.
- Intercept. Is the expected mean value of the dependent variable in a regression analysis, when the independent variables are zero. Also referred to as a constant.
- Internal consistency reliability. Is a form of reliability used to judge the consistency of results across items in the same test. It determines whether the items measuring a construct are highly correlated. The most prominent measure of internal consistency reliability is Cronbach's Alpha.
- Internal secondary data. Are data that companies compile for various reporting and analysis purposes.
- Internal validity. Is the extent to which causal claims can be made in respect of the study results.
- Interquartile range. Is the difference between the third and first quartile.
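As an illustrative Python sketch (note that quartile conventions differ slightly across software packages), the interquartile range is the third quartile minus the first:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]
# statistics.quantiles with n=4 returns the three quartile cut points;
# the default 'exclusive' method is one of several common conventions.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
print(q1, q3, iqr)  # 2.25 6.75 4.5
```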
- Inter-rater reliability. Is the degree of agreement between raters expressed by the amount of consensus in their judgment.
- Interviewer fraud. Is an issue in data collection resulting from interviewers making up data or even falsifying entire surveys.
- Item non-response. Occurs when people do not provide answers to certain questions, for example, because they refuse to answer, or forgot to answer.
- Items. Represent measurable characteristics in conceptual models and statistical analysis. Also referred to as indicators.
- Kaiser criterion. Is a statistic used in principal component and factor analysis to determine the number of factors to extract from the data. According to this criterion, researchers should extract all factors with an eigenvalue greater than one. Also referred to as latent root criterion.
- Kaiser–Meyer–Olkin criterion. Is an index used to assess the adequacy of the data for a principal component and factor analysis. High values indicate that the data are sufficiently correlated. Also referred to as measure of sampling adequacy (MSA).
- KISS principle. Is the abbreviation of "Keep it short and simple!" and implies that any research report should be as concise as possible.
- Kruskal-Wallis rank test. Is the non-parametric equivalent of the ANOVA. The null hypothesis of the test is that the distribution of the test variable across all groups is identical.
- k-means. Is a group of clustering methods that starts with an initial partitioning of all the objects into a prespecified number of clusters and then gradually re-allocates objects in order to minimize the overall within-cluster variation.
- k-means++. Is a variant of the k-means method that uses an improved initialization process.
- k-medians. Is a popular variant of k-means that aims at minimizing the absolute deviations from the cluster medians.
- k-medoids. Is a variant of k-means that uses other cluster centers rather than the mean or median.
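The basic k-means loop described above, assign each object to its nearest center and then recompute each center as the mean of its assigned objects, can be sketched in Python. This is an illustration only (the book runs cluster analyses in SPSS); the `init` parameter for fixing the starting centers is added here for reproducibility:

```python
import random

def kmeans(points, k, iters=20, init=None, seed=1):
    """Basic k-means on a list of numeric tuples."""
    centers = list(init) if init else random.Random(seed).sample(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        # Update step: each center becomes the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:  # keep the old center if a cluster ends up empty
                centers[i] = tuple(sum(dim) / len(cl) for dim in zip(*cl))
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2, init=[(0, 0), (10, 10)])
print(sorted(len(c) for c in clusters))  # [3, 3] - the two groups are recovered
```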
- Lab experiments. Are performed in controlled environments (usually in a company or academic lab) to isolate the effects of one or more treatment variables on an outcome.
- Label switching. A situation in which the labels of clusters change from one analysis to the other.
- Laddering. Is an interviewing technique in which the interviewer probes a seemingly simple response further in order to uncover subconscious motives. It is typically used in the means-end approach.
- Latent concepts. Represent broad ideas or thoughts about certain phenomena that researchers have established and want to measure in their research.
- Latent root criterion. See Kaiser criterion.
- Latent variable. Measures a concept that is abstract, complex, and cannot be directly observed by (multiple) items. Also referred to as construct.
- Left-tailed hypothesis. Is a directional hypothesis expressed in a direction (lower) relative to a standard.
- Levene's test. Tests the equality of the variances between two or more groups of data. Also referred to as F-test of sample variance.
- Likert scale. Is a type of answering scale in which respondents have to indicate their degree of agreement to a statement. The degree of agreement is usually set by the scale endpoints, which range from strongly disagree to strongly agree.
- Limited service providers. Are market research companies that specialize in one or more services.
- Line chart. Is a type of chart in which measurement points are ordered (typically according to their x-axis value) and joined with straight-line segments.
- Linkage algorithm. Defines the distance from a newly formed cluster to a certain object, or to other clusters in the solution.
- Listwise deletion. Entails deleting cases with one or more missing value(s) in any of the variables used in an analysis.
- Little's MCAR test. Is a test to analyze the patterns of missing data by comparing the observed data with the pattern expected if the data were missing completely at random.
- Local optimum. Is an optimal solution when compared with similar solutions, but not a global optimum.
- Log transformation. Is a type of scale transformation commonly used to handle skewed data.
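A small illustrative Python example (hypothetical data): taking logarithms compresses large values much more than small ones, which reduces right skew.

```python
import math

revenues = [1, 3, 10, 30, 1000]             # heavily right-skewed
logged = [math.log10(x) for x in revenues]  # requires strictly positive values
# On the log scale the spacing becomes far more even; with zeros present,
# a common variant is log(1 + x) via math.log1p.
print(logged)
```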
- Mail surveys. Are paper-based surveys sent to respondents via regular mail.
- Manhattan metric. See City-block distance.
- Manipulation checks. A type of analysis in experiments to check whether the experimental treatment was effective.
- Mann-Whitney U test. Is the non-parametric equivalent of the independent samples t-test used to assess whether two sample means are equal or not.
- Marginal mean. Represents the mean value of one category of a factor variable, averaged across the categories of the other factor variable(s).
- Market segmentation. Is the segmenting of markets into groups (segments) of objects (e.g., consumers) with similar characteristics (e.g., needs and wants).
- Market segments. Are groups of objects with similar characteristics.
- Matching coefficients. Are similarity measures that express the degree to which the clustering variables' values fall into the same category.
- Mean. Is the most common method of defining a typical value of a list of numbers. It is equal to the sum of a variable's values divided by the number of observations. Also referred to as arithmetic mean or simply average.
- Means-end approach. A method used to identify the ends consumers aim to satisfy and the means (consumption) they use to do so.
- Measurement scaling. Refers to (1) the level at which a variable is measured (nominal, ordinal, interval or ratio scale), and (2) the general act of using a set of variables to measure a construct.
- Measure of sampling adequacy (MSA). See Kaiser–Meyer–Olkin criterion.
- Measures of centrality. Are statistical indices of a typical or average value of a list of numbers. There are two main types of measures of centrality, the median and the mean.
- Measures of dispersion. Provide researchers with information about the variability of the data (i.e., how far the values are spread out). There are four main types of measures of dispersion: the range, interquartile range, variance, and standard deviation.
- Median. Is the value that separates the lowest 50% of values from the highest 50% of values.
- Middle response styles. A systematic way of responding to survey items that describes respondents' tendency to choose the midpoints of a response scale.
- Minto principle. A guideline for presentations that starts with the conclusion, raising questions in the audience's mind about the way this conclusion was reached. The presenter subsequently explains the steps involved in the analysis.
- Missing at random (MAR). Is a missing values pattern in which the probability that data points are missing depends on other observed variables, but not on the values of the variable with the missing data itself.
- Missing completely at random (MCAR). Is a missing values pattern in which the probability that data points are missing is unrelated to any other measured variable and to the variable with the missing values.
- Missing data. Occur when entire observations are missing (survey non-response), or respondents have not answered all the items (item non-response).
- Mixed mode. Is the act of combining different ways of administering surveys.
- Moderation analysis. Involves assessing whether the effect of an independent variable on a dependent variable depends on the values of a third variable, referred to as a moderator variable.
- (Multi)collinearity. Is a data issue that arises in regression analysis when two or more independent variables are highly correlated.
- Multi-item construct. Is a measurement of an abstract concept that uses several items.
- Multinomial logistic regression. Is a type of regression analysis used when the dependent variable is nominal and takes more than two values.
- Multiple imputation. Is a simulation-based statistical technique that replaces missing observations with a set of possible values (as opposed to a single value), representing the uncertainty about the missing data's true value.
- Multiple regression. Is a type of regression analysis that includes multiple independent variables.
- Mystery shopping. Is a type of observational study in which a trained researcher visits a store or restaurant and consumes their products/services.
- Nested models. Are simpler versions of a complex model.
- Net Promoter Score (NPS). Is a measure of customer loyalty that uses the single question: "How likely are you to recommend our company/product/service to a friend or colleague?"
- Noise. Is variation in the data that cannot be explained. Also referred to as random noise.
- Nomological validity. Is the degree to which a construct behaves as it should in a system of related constructs.
- Non-directional hypothesis. Tests for any difference in the parameter, whether positive or negative.
- Non-hierarchical clustering methods. See Partitioning methods.
- Nonparametric tests. Are statistical tests for hypothesis testing that do not assume a specific distribution of the data (typically a normal distribution).
- Non-probability sampling. Is a sampling technique that does not give every individual in the population an equal chance of being included in the sample. The resulting sample is typically not representative of the population.
- Non-random missing. Is a missing values pattern in which the probability that data points are missing depends on the values of the variable itself and on other unobserved factors. Also referred to as missing not at random (MNAR).
- Null and alternative hypothesis. The null hypothesis (indicated as H0) is a statement expecting no difference or no effect. The alternative hypothesis (indicated as H1) is the hypothesis against which the null hypothesis is tested.
- Oblique rotation. Is a technique used to facilitate the interpretation of the factor solution in which the independence of a factor to all other factors is not maintained.
- Observation. Is an object, such as a customer, a company, or a country, in statistical analysis. Also referred to as case.
- Observational studies. Are procedures for gathering data in which the researcher observes people's behavior in a certain context. Observational studies are normally used to understand what people are doing rather than why they are doing it.
- Omega-squared (ω2). Is a statistic used in an ANOVA to describe the ratio of the between-group variation to the total variation, thereby indicating the variance accounted for by the factor variable. It is commonly used for sample sizes of 50 or less and corresponds to the Adjusted R2 of regression analysis. Omega-squared is also used to indicate effect sizes of individual variables in regression analysis.
- One-sample t-test. Is a parametric test used to compare one mean with a given value.
- One-tailed tests. Are a class of statistical tests frequently used when the hypothesis is expressed directionally (i.e., < or >). The region of rejection is on one side of the sampling distribution.
- One-way ANOVA. Is a type of ANOVA that involves a single metric dependent variable and one factor variable with three (or more) levels.
- Open-ended questions. Are a type of question format that provides little or no structure for respondents' answers. Generally, the researcher asks a question and the respondent writes down his or her answer in a box. Also referred to as verbatim items.
- Operationalization. Is the process of defining a set of variables to measure a construct. The process defines latent concepts and allows them to be measured empirically.
- Ordinary least squares (OLS). Is the estimation approach commonly used in regression analysis and involves minimizing the squared deviations from the observations to the regression line (i.e., the residuals).
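For the simple-regression case, the OLS estimates have a closed form: the slope equals the covariance of x and y divided by the variance of x, and the intercept follows from the means. A small sketch with made-up data:

```python
# Simple-regression OLS: the slope b1 and intercept b0 minimize the
# sum of squared residuals (vertical deviations from the fitted line).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.1, 5.9, 8.2, 9.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form estimates: b1 = cov(x, y) / var(x); b0 = mean_y - b1 * mean_x
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
b0 = mean_y - b1 * mean_x

# Residuals: the quantities whose squares OLS minimizes
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
```

Any other line through these data would yield a larger sum of squared residuals than this one.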
- Orthogonal rotation. Is a technique used to facilitate the interpretation of a factor solution in which a factor's independence is maintained from all other factors. The correlation between the factors is determined as zero.
- Outliers. Are observations that differ substantially from other observations with respect to one or more characteristics.
- Paired samples. Are samples that include multiple observations from the same object (e.g., firm or individual).
- Paired samples t-test. Is a statistical procedure used to determine whether there is a significant mean difference between observations measured at two points in time.
- Parallel analysis. Is a statistic used in principal component and factor analysis to determine the number of factors to extract from the data. According to this criterion, researchers should extract all factors whose eigenvalues are larger than those derived from randomly generated data with the same sample size and number of variables.
- Parametric tests. Are statistical tests that assume a specific data distribution (typically a normal distribution).
- Partial least squares structural equation modeling (PLS-SEM). Is a method to estimate structural equation models. The goal is to maximize the explained variance of the dependent latent variables.
- Partitioning methods. Is a group of clustering procedures that does not establish a tree-like structure of objects and clusters, but exchanges objects between clusters to optimize a certain goal criterion. The most popular type of partitioning method is k-means.
- Path diagram. Is a visual representation of expected relationships tested in a structural equation modeling analysis.
- Personal interviews. Are interviews that involve face-to-face contact between the interviewer and the respondent. Also referred to as face-to-face interviews.
- Pie chart. Displays the relative frequencies of a variable's values.
- Population. Is a group of objects (e.g., consumers, companies, or products) that a researcher wants to assess.
- Post hoc tests. Are a group of tests used for paired comparisons in an ANOVA. Post hoc tests maintain the familywise error rate (i.e., they prevent excessive type I error).
- Power analysis. Is a procedure used to estimate the power of a statistical test.
- Power of a test (power of a statistical test). Represents the probability of rejecting a null hypothesis when it is in fact false. In other words, the power of a statistical test is the probability of finding an effect significant when it is indeed present. The power is defined by 1−β, where β is the probability of a type II error.
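As an illustrative sketch (assuming a one-sided z-test with known standard deviation; the function name and inputs are hypothetical), power can be computed from the standardized effect size using only the Python standard library:

```python
from math import sqrt
from statistics import NormalDist

def ztest_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test: the probability of rejecting H0
    when the true mean difference equals `effect`.

    Power = 1 - beta = Phi(delta - z_alpha), where
    delta = effect * sqrt(n) / sigma is the standardized effect.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value under H0
    delta = effect * sqrt(n) / sigma           # standardized (non-centrality) effect
    return NormalDist().cdf(delta - z_alpha)

power = ztest_power(effect=0.5, sigma=1.0, n=30, alpha=0.05)
```

The sketch also illustrates why power analysis is used to plan sample sizes: holding the effect and α fixed, power rises as n grows.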
- Practical significance. Refers to whether differences or effects are large enough to influence decision-making processes.
- Predictive validity. Is the extent to which an instrument predicts the outcome of another variable, measured at a later point in time.
- Primary data. Are data gathered for a specific research project.
- Principal axis factoring. See Factor analysis.
- Principal component analysis. Is a statistical procedure that uses correlation patterns among a set of indicator variables to derive factors that represent most of the original variables' variance. Unlike factor analysis, the procedure uses all the variance in the variables as input.
- Principal components. Are linear composites of original variables that reproduce the original variables' variance as well as possible.
- Principal factor analysis. See Factor analysis.
- Probability sampling. Is a sampling technique that gives every individual in the population an equal, non-zero chance of being included in the sample.
- Profiling. Is a step in market segmentation that identifies observable variables (e.g., demographics) that characterize the segments.
- Projective technique. Is a special type of testing procedure, usually used as part of in-depth interviews. This technique provides the participants with a stimulus (e.g., pictures, words) and then gauges their responses (e.g., through sentence completion).
- Promax rotation. Is a popular oblique rotation method used in principal component and factor analysis.
- P-value. Is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true.
- Pyramid structure for presentations. See Minto principle.
- Qualitative data. Are audio, pictorial, or textual information that researchers use to answer research questions.
- Qualitative research. Is primarily used to gain an understanding of why certain things happen. It can be used in an exploratory context by defining problems in more detail, or by developing hypotheses to be tested in subsequent research.
- Quantile plot. Is a graphical method for comparing two (typically normal) distributions by plotting their quantiles against each other.
- Quantitative data. Are data to which numbers are assigned to represent specific characteristics.
- Quartimax rotation. Is an orthogonal rotation method used in principal component and factor analysis.
- R2. See Coefficient of determination.
- Ramsey's RESET test. Is a test for linearity used in regression analysis.
- Random noise. Is a synonym for variation that an analysis, such as a regression or ANOVA, cannot explain.
- Range. Is the difference between the highest and the lowest value in a variable measured, at least, on an ordinal scale.
- Range standardization. Is a type of scale transformation in which the values of a scale are standardized to a specific range that the researcher has set.
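A minimal sketch of range standardization (the helper name and data are hypothetical): each value is rescaled linearly so that the smallest value maps to the lower bound and the largest to the upper bound of the researcher-chosen range.

```python
def range_standardize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they fall in [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    return [new_min + (v - old_min) * (new_max - new_min) / span
            for v in values]

# Rescale to the default [0, 1] range
scaled = range_standardize([10, 20, 40])
```

With the default range this is the familiar min–max scaling to [0, 1].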
- Rank order scale. Is an ordinal scale that asks respondents to rank a set of objects or characteristics in terms of, for example, importance, preference, or similarity.
- Reflective constructs. Are constructs whose indicators are considered manifestations of the underlying construct.
- Regression method. Is a procedure to generate factor scores in principal component analysis. The resulting factor scores have a zero mean and unit standard deviation.
- Regression sum of squares. Quantifies the difference between the regression line and the line indicating the average. It represents the variation in the data that the regression analysis explains.
- Reliability. Is the degree to which a measure is free from random error.
- Reliability analysis. Is an element of a confirmatory factor analysis and essential when working with measurement scales. See Reliability.
- Research design. Describes the general approach to answering a research question related to a marketing opportunity or problem. There are three broad types of research design: exploratory research, descriptive research, and causal research.
- Residual. Is the difference between an observed value and the value the regression model predicts; it represents the unexplained variance in a regression model. Also referred to as disturbance term.
- Reverse-scaled items. Are items whose statement (if a Likert scale is used), or word pair (if a semantic differential scale is used) is reversed when compared to the other items in the set.
- Right-tailed hypothesis. Is a directional hypothesis expressed in a direction (higher) relative to a standard.
- Russell and Rao coefficient. Is a similarity coefficient used in cluster analysis.
- Sample size. Is the number of observations drawn from a population.
- Sampling. Is the process through which objects are selected from a population.
- Sampling error. Occurs when the sample and population structure differ on relevant characteristics.
- Scale development. Is the process of defining a set of variables to measure a construct and which follows an iterative process with several steps and feedback loops. Also referred to as operationalization, or, in the case of an index, index construction.
- Scale transformation. Is the act of changing a variable's values to ensure comparability with other variables, or to make the data suitable for analysis.
- Scanner data. Are collected at the checkout of a supermarket where details about each product sold are entered into a database.
- Scatter plot. Is a graph that represents the relationship between two variables, thus portraying the joint values of each observation in a two-dimensional graph.
- Scree plot. Is a graph used in principal component and factor analysis that plots the number of factors against the eigenvalues, sometimes resulting in a distinct break (elbow) that indicates the number of factors to extract. Following the same principle, the scree plot is also used in hierarchical cluster analysis to plot the number of clusters against the distances at which objects were merged.
- Secondary data. Are data that have already been gathered, often for a different research purpose and some time ago. Secondary data comprise internal secondary data, external secondary data, or a mix of both.
- Segment specialists. Are companies that concentrate on specific market segments, such as a particular industry or type of customer.
- Self-contained figure. Is a graph in a market research report that should be numbered sequentially and have a meaningful title so that it can be understood without reading the text.
- Self-contained table. Is a table in a market research report that should be numbered sequentially and have a meaningful title so that it can be understood without reading the text.
- Semantic differential scales. Are answering scales that comprise opposing pairs of words, normally adjectives (e.g., young/old, masculine/feminine), constituting the endpoints of the scale. Respondents then indicate how well one of the words in each pair describes how they feel about the object to be rated (e.g., a company or brand).
- Sentence completion. Is a type of projective technique that provides respondents with beginnings of sentences that they have to complete in ways that are meaningful to them.
- Shapiro-Wilk test. Is a test for normality (i.e., whether the data are normally distributed).
- Significance level. Is the probability that an effect is incorrectly assumed when there is in fact none. The researcher sets the significance level prior to the analysis.
- Silhouette measure of cohesion and separation. Is an overall goodness-of-fit measure in two-step clustering.
- Simple matching coefficient. Is a similarity coefficient used in cluster analysis.
- Simple regression. Is the simplest type of regression analysis with one dependent and one independent variable.
- Single-item constructs. Are constructs measured with only one item.
- Single linkage. Is a linkage algorithm in hierarchical clustering methods in which the distance between two clusters corresponds to the shortest distance between any two members in the two clusters.
- Skewed data. Occur if a variable is asymmetrically distributed. A positive skew (also called right-skewed) occurs when many observations are concentrated on the left side of the distribution, producing a long right tail (the opposite is called negative skew or left-skewed).
- Social desirability bias. Occurs when respondents provide socially desirable answers (e.g., by reporting higher or lower incomes than are actually true), or take a position that they believe society favors (e.g., not smoking or drinking).
- Social media analytics. Are methods for analyzing social networking data and comprise text mining, social network analysis, and trend analysis.
- Social networking data. Reflect how people would like others to perceive them and, thus, indicate consumers' intentions. Product or company-related social networking data are of specific interest to market researchers.
- Specialized service firms. Are market research companies that focus on particular products, markets, or market research techniques.
- Split-half reliability. Is a type of reliability assessment in which scale items are divided into halves and the scores of the halves are correlated.
- Split-sample validation. Involves splitting the dataset into two samples, running the analysis on both samples, and comparing the results.
- SPSS. Is a computer package specializing in quantitative data analysis.
- Standard deviation. Describes the sample distribution values' variability from the mean. It is the square root of the variance.
- Standard error. Is the standard deviation of a statistic's sampling distribution, most commonly of the mean.
- Standardized effects. Express the relative effects of differently measured independent variables in a regression analysis in terms of standard deviation changes.
- Standardized variables. Have been rescaled (typically to a zero mean and unit standard deviation) to facilitate comparisons between differently scaled variables.
- Statistical significance. Occurs when an effect is so large that it is unlikely to have occurred by chance. Statistical significance depends on several factors, including the size of the effect, the variation in the sample data, and the number of observations.
- Straight line distance. See Euclidean distance.
- Straight-lining. Occurs when a respondent marks the same response in almost all the items.
- Structural equation modeling. Is a multivariate data analysis technique used to measure relationships between constructs, as well as between constructs and their associated indicators.
- Survey non-response. Occurs when entire responses are missing. Survey non-response rates are usually 75–95%.
- Surveys. Are often used for gathering primary data. Designing surveys involves a six-step process: 1. Determine the survey goal; 2. determine the type of questionnaire required and the administration method; 3. decide on the questions and 4. the scale; 5. design the questionnaire; and 6. pretest and administer the questionnaire.
- Suspicious response patterns. Are response-style issues, such as straight-lining and inconsistent answers, that a researcher needs to address in the analysis.
- Syndicated data. Are data sold to multiple clients, allowing them to compare key measures with those of the rest of the market.
- Telephone interviews. Allow researchers to collect data quickly and facilitate open-ended responses, although not as well as personal interviews.
- Test markets. Are a type of field experiment that evaluates a new product or promotional campaign under real market conditions.
- Test-retest reliability. Is a type of reliability assessment in which the researcher obtains repeated measurement of the same respondent or group of respondents, using the same instrument and under similar conditions. Also referred to as stability of the measurement.
- Test statistic. Is calculated from the sample data to assess the strength of the evidence against the null hypothesis.
- Tolerance. Is a measure to detect collinearity defined as 1/VIF, where VIF is the variance inflation factor.
- Total sum of squares. Quantifies the difference between the observations and the line indicating the average.
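For a fitted OLS regression, the total sum of squares decomposes exactly into the regression (explained) and residual (unexplained) sums of squares: SST = SSR + SSE. This can be verified numerically; the data below are made up:

```python
# Fit a simple OLS regression, then decompose the total sum of squares:
# SST (observations vs. mean) = SSR (explained) + SSE (residual).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 9.0]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Closed-form OLS estimates for the simple-regression case
b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
      / sum((x - mx) ** 2 for x in xs))
b0 = my - b1 * mx
fitted = [b0 + b1 * x for x in xs]

sst = sum((y - my) ** 2 for y in ys)                 # total variation
ssr = sum((f - my) ** 2 for f in fitted)             # explained by regression
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual variation
```

The decomposition holds only for the OLS fit itself (where residuals are uncorrelated with fitted values); the ratio SSR/SST is the coefficient of determination R2.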
- Transforming data. Is an optional step in the workflow of data, involving variable respecification and scale transformation.
- Treatments. Are elements in an experiment that are used to manipulate the participants by subjecting them to different situations. A simple form of treatment could be an advertisement with and without humor.
- T-test. Is the most popular type of parametric test for comparing a mean with a given standard and for comparing the means of independent samples (independent samples t-test), or the means of paired samples (paired samples t-test).
- Tukey's honestly significant difference test. Is a popular post hoc test used in an ANOVA that controls for type I errors, but is limited in terms of statistical power. Often simply referred to as Tukey's method.
- Two-sample t-test. Is the most popular type of parametric test for comparing the means of independent or paired samples.
- Two-tailed tests. Are a class of statistical tests frequently used when the hypothesis is not expressed directionally (i.e., ≠). The region of rejection is on two sides of the sampling distribution.
- Two-way ANOVA. Is a type of ANOVA that involves a single metric dependent variable and two factor variables with three (or more) levels.
- Type I error. Occurs when erroneously rejecting a true null hypothesis. Also referred to as α error.
- Type II error. Occurs when erroneously failing to reject a false null hypothesis. Also referred to as β error.
- Unbalanced scale. Describes a scale with an unequal number of positive and negative scale categories.
- Unexplained variation. Is the degree of variation that a factor variable in an ANOVA cannot explain. It can occur if extraneous factors, not accounted for in the analysis, cause variation instead of the factor variable.
- Unit of analysis. Is the level at which a variable is measured. Typical measurement levels include those of respondents, customers, stores, companies, or countries.
- Univariate statistics. Are statistics that describe the centrality and dispersion of a single variable.
- Unstandardized effects. Express the absolute effect that a one-unit increase in an independent variable has on the dependent variable in a regression analysis.
- Validation sample. Is a random subsample of the original dataset used for validation testing.
- Validity. Is the degree to which a researcher measures what (s)he wants to measure. It is the degree to which a measure is free from systematic error.
- Variable. Represents a measurable characteristic whose value can change.
- Variable respecification. Involves transforming data to create new variables, or to modify existing ones.
- Variance. Is a measure of dispersion computed as the sum of the squared differences between each value and the variable's mean, divided by the sample size minus 1.
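A minimal sketch of this computation with illustrative data; taking the square root of the variance yields the standard deviation:

```python
from math import sqrt

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

values = [2, 4, 4, 4, 5, 5, 7, 9]   # mean = 5
var = sample_variance(values)       # 32 / 7
sd = sqrt(var)                      # standard deviation = sqrt(variance)
```

Dividing by n − 1 rather than n (Bessel's correction) makes the sample variance an unbiased estimator of the population variance.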
- Variance inflation factor (VIF). Quantifies the degree of collinearity between the independent variables in a regression analysis.
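For the special case of two independent variables, the VIF reduces to 1/(1 − r²), where r is their bivariate correlation, and the tolerance is its reciprocal. A sketch with hypothetical, nearly collinear data:

```python
# Two-predictor case: VIF = 1 / (1 - r^2), where r is the
# Pearson correlation between the two independent variables.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.1, 5.9, 8.1, 10.2]  # nearly collinear with x1

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
var1 = sum((a - m1) ** 2 for a in x1)
var2 = sum((b - m2) ** 2 for b in x2)
r = cov / (var1 * var2) ** 0.5   # Pearson correlation

vif = 1.0 / (1.0 - r ** 2)       # large VIF signals collinearity
tolerance = 1.0 / vif            # equals 1 - r^2
```

With more than two predictors, each VIF is computed analogously from the R² of regressing that variable on all the others.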
- Variance ratio criterion. Is a statistic used in cluster analysis to determine the number of clusters. The criterion compares the within- and between-cluster variation of different numbers of clusters.
- Varimax rotation. Is the most popular orthogonal rotation method used in principal component and factor analysis.
- Verbatim items. See Open-ended questions.
- Visual aids. Include overhead transparencies, flip charts, or slides (e.g., PowerPoint or Prezi) that help emphasize important points and facilitate the communication of difficult ideas in a presentation of market research results.
- Visual analogue scale. Is a type of answering scale in which respondents use levers that allow scaling on a continuum. This scale does not provide response categories.
- Ward's linkage. Is a linkage algorithm in hierarchical clustering methods that combines those objects whose merger increases the overall within-cluster variance by the smallest possible degree.
- Web surveys. Are less expensive to administer and can be fast in terms of data collection, because they can be set up very quickly. Also referred to as computer-assisted web interviews (CAWI).
- Weighted average linkage. Is a variant of the average linkage algorithm used in cluster analysis that weights the distances according to the number of objects in the cluster.
- Weighted Least Squares. Is a variant of a standard regression analysis, which is used to account for violated regression assumptions such as heteroskedastic regression errors. It "weights" the regression line such that observations with a smaller variance are given greater weight in determining the regression coefficients.
- Welch correction. Is a statistical test used in an ANOVA to assess the significance of the overall model when the group variances differ significantly and the groups differ in size.
- Wilcoxon matched-pairs signed-rank test. Is the non-parametric equivalent of the paired samples t-test.
- Wilcoxon rank-sum test. Is the non-parametric equivalent of the independent samples t-test. Also referred to as the Mann–Whitney U test.
- Workflow. Is a strategy to keep track of the entering, cleaning, describing, and transforming of data.
- Z-standardization. Is a type of scale transformation in which the values of a scale are standardized to a zero mean and unit standard deviation.
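A minimal sketch (the helper name and data are hypothetical): each value has the mean subtracted and is then divided by the standard deviation, so the result has a zero mean and unit standard deviation.

```python
from math import sqrt

def z_standardize(values):
    """Rescale values to a zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

z = z_standardize([10.0, 20.0, 30.0])  # [-1.0, 0.0, 1.0]
```

Because the transformation removes the original units, z-standardized variables can be compared directly, which is why the transformation is common before, for example, cluster analysis.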
- Z-test. Is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution.