Choice Modeling/Conjoint Analysis - Displayr

Learn More about Conjoint in Displayr

Introduction

Conjoint Analysis: The Basics

Main Applications of Conjoint Analysis

Webinar: Introduction to Conjoint

 

Design

Experimental Design for Conjoint Analysis: Overview and Examples

Writing a Questionnaire for a Conjoint Analysis Study

Sample Size for Conjoint Analysis

Algorithms to Create your Choice Model Experimental Design

The Efficient Algorithm for Choice Model Experimental Designs

The Partial Profiles Algorithm for Experimental Designs

How to Create Alternative-Specific Choice Model Designs in Displayr

How to Set Up a Choice-Based Conjoint Analysis in Qualtrics

How Good is your Choice Model Experimental Design?

How to Check an Experimental Design (MaxDiff, Choice Modeling)

Webinar: How to Create Experimental Designs for Conjoint

 

Analysis

Formatting Data for Running Conjoint in Displayr

How to do Choice Modeling in Displayr

How to Use Hierarchical Bayes for Choice Modeling in Displayr

How to Use Simulated Data to Check Choice Model Experimental Designs Using Displayr

How to Analyze Dual-Response ‘None of These’ Conjoint Models in Displayr

How to Check a Choice-Based Conjoint Model

Testing Whether an Attribute Should be Numeric or Categorical in Conjoint Analysis

Numeric Attributes in Choice-Based Conjoint Analysis in Displayr

Numeric versus Categorical Price Attributes in Conjoint Analysis

Reordering Attribute Levels in Conjoint Analysis Models in Displayr

Comparing HB Root-likelihood (RLH) Between Displayr and Sawtooth

Checking Convergence When Using Hierarchical Bayes for Conjoint Analysis

Performing Conjoint Analysis Calculations with HB Draws (Iterations)

Comparing Choice Models and Creating Ensembles in Displayr

12 Techniques for Increasing the Accuracy of Forecasts from Conjoint Analysis

Understanding Logit Scaling

Computing Willingness-To-Pay (WTP) in Displayr

Webinar: Statistical Analysis for Conjoint

 

Visualizations

Data Visualization for Conjoint Analysis

Using Indifference Curves to Understand Tradeoffs in Conjoint Analysis

Using Substitution Maps to Understand Preferences in Conjoint Analysis

Creating Demand Curves Using Conjoint Studies

Webinar: Reporting for Conjoint

Webinar: Discover the Top Six Techniques of Pricing Research

 

Simulators

Creating Online Conjoint Analysis Choice Simulators Using Displayr

Adjusting Conjoint Analysis Simulators to Better Predict Market Share

Optimizing your Conjoint Analysis Simulator in Displayr

How to Create an Online Choice Simulator by Hand

Using Choice-Based Conjoint in Pricing Research Studies

Using the Value Equivalence Line (VEL) with Conjoint Simulators

Webinar: Reporting for Conjoint

Webinar: Discover the Top Six Techniques of Pricing Research

Case Study: Eggs Choice Simulator

Case Study: Fast Food Simulator

Optimizing your Conjoint Analysis Simulator in Displayr

The choice simulator is one of the main deliverables of choice-based conjoint analysis. It allows you to predict the effect of different scenarios on preference or market share. For this case study, we have used the cruise ship data set that Sawtooth supplied for its 2016 modeling competition. This post assumes you have already created your simulator from a conjoint model in Displayr.

Modifying your simulator

In our example, we have created a basic simulator with 3 alternatives:

[Image: simulator]

There are various ways of modifying your simulator, including weighting and making adjustments to reflect market share. Here, we intend to add a combo box for filtering our preference shares by a specific survey question. We will use likelihood to travel in the next 10 years.

Connecting your simulator to a combo box

The best location for this combo box is on the Page Master which is accessible via Appearance > Page Master. This will allow the same control to appear on every page using this template while retaining the user selections.

We can create our own title page by selecting Title Only and pressing Home > Duplicate. We will rename this 'Page with Combo box'.

[Image: page master]

To add the control item, go to Insert > Control > Combo Box. We then connect Items from to an existing table for likelihood to travel. Alternatively, you can just paste the label options separated by a semi-colon into Item list. In this case, we will also need to delete the default items there. Next, we change Selection Mode to Multiple selection. You can also optionally change the name under Properties > GENERAL.

We can now go back via Appearance > Normal and change the simulator page via Home > Layout > Page with Combo box.

With Displayr, you can easily filter data using a combo box with an R variable. As we are using a single-response question but wish to allow multiple selections, we need to first make it binary via Insert > Filter > Filters from Selected Data. You should then select the appropriate respondent data file under Data Sets and go to Insert > R > Numeric Variable. For a multiple-response combo box, the filter formula to use in the R CODE field is as follows:

rowSums(`Question_name`[, combo_box_name, drop = FALSE]) > 0

The Question name can simply be dragged over to this field from Data Sets to look like this:

rowSums(`Q3: How likely is it that you will take a cruise vacation sometime in the next 10 years? - Filters`[, Combo.box, drop = FALSE]) > 0

This code will filter Q3 to the items selected in 'Combo.box'. It will then only include the respondents who fall into these categories.

Next, tick Usable as a filter. We will name this 'combo.filter'. Now you can go back to your simulator page and apply 'combo.filter' to your 'preference.shares' output under FILTERS & WEIGHT > Filter(s).

[Image: combo box filter]

Below is the formatted version of our simulator:

[Image: formatted simulator]

Weighting your data by alternative-specific respondent preference shares

Displayr allows you to complement your simulator with further visualizations that help tell the story of your data. One way to make further use of our simulator is to weight our demographic questions by a selected alternative's preference share results.

We will begin by making a new page with the same default combo box via Home > New Page > Page with Combo box. We will now copy the 'preference.shares' output from the simulator page via Home > Duplicate and drag it over to the new page to get the respondent-level results.

First, we need to remove the combo box filter from the output. We then need to paste the below code at the bottom of Properties > R CODE:

# Reshape the vector of respondent-level preference shares into one column per alternative
preferences.by.respondents = data.frame(matrix(resp.shares, ncol = 3))
colnames(preferences.by.respondents) = c("Alternative 1", "Alternative 2", "Alternative 3")
preferences.by.respondents

You will need to change the 'ncol' reference and column names to match the number of alternatives in your simulator.

The next steps involve creating the combo box filter. In the menu ribbon, select Insert > Control > Combo Box and paste Alternative 1; Alternative 2; Alternative 3 in Item list. I have named this combo box 'cCruise'.

Next, create the filter variable via Insert > R > Numeric Variable and paste the below into the R CODE field:

preferences.by.respondents[, cCruise]

This code selects each respondent's preference share for the alternative chosen in 'cCruise'. Once you tick Usable as a weight, this can be applied to your outputs under Inputs > FILTERS & WEIGHT > Weight.

This allows you to add visualizations for various demographic questions with the combo box filter and weight applied to the source tables. Remember to drag the tables off the page and select Appearance > Hide. You can also use a variety of conjoint-specific visualizations, such as a demand curve for the price attribute.

[Image: preference share weighting]

Creating an optimizer

An alternative to creating an online simulator is to create what we call an 'Optimizer'. Unlike a simulator, an optimizer allows multiple selections per attribute and generates the preference shares for all of the selected combinations at the same time.

To create an optimizer, you can either select your conjoint analysis output and click Inputs > SIMULATION > Create optimizer or go to Insert > More > Conjoint/Choice Modeling > Optimizer from the ribbon. You will need to then specify the number of alternatives and whether you wish to include alternative-specific attributes. We will choose 3 here and disregard the alternative attribute. This will create a page called 'Optimizer'.

[Image: optimizer]

Similarly, we will also apply our combo box filter to the preference share output on this page.

Again, you can format the page objects as desired. In this case, an Autofit table provides more flexibility for the summary preference share table as you can easily drag the edges to align with the optimizer's columns. You can create this via Insert > Paste Table, ticking Autofit, and selecting this page's preference share output under DATA SOURCE.

Due to the varying size of the table, we can fix the height to ensure it adds a scroll bar. We will add row.height = "15px" to Properties > R CODE, alongside the other row-specific settings.

[Image: autofit table]

We can now select the original output, drag it off the page and press Appearance > Hide to ensure it remains hidden from the published version of the document.

Using your optimizer

One specific use case for the optimizer is fixing the options for the second and third alternatives while selecting multiple options for the first alternative. In the first column, we will select all the options under Room, Amenities, and Price to generate the 30 combinations for the multi-selected combo boxes.

[Image: formatted optimizer]

A benefit of autofit tables in this scenario is that we can automatically pre-sort the table from highest to lowest by the first column. Simply go to Inputs > ROW MANIPULATIONS, tick Sort rows, place '1' in Column used for sorting rows, and tick Sort in decreasing order.

You can see the finished document here.

Formatting Data for Running Conjoint in Displayr

Many survey platforms do not come with their own built-in choice-based conjoint question type. This poses the question of how to set up the data so we can analyze it. We will now take you from the experimental design stage to the analysis stage while outlining the correct data structure.

Setting up your experimental design

In this simple conjoint example, we wish to look at the meat, sauce, and bun preferences for burgers. We have used Displayr to generate a balanced design via Insert > More > Conjoint/Choice Modeling > Experimental Design. This is based on 10 questions (or tasks) per respondent, 3 alternatives and 3 attributes per question, and 100 versions in total.

The options for the attributes (Meat, Sauce, Bun) have been pasted into the Add attributes and levels spreadsheet as below:

[Image: conjoint attributes]

In the same menu, we can select Preview Choice Questionnaire to create a preview of your design. Below are the first 2 questions of version 1.

[Image: conjoint questionnaire]

In the choice model output under Design, you will see the first 4 key columns: Version, Task, Question, and Alternative. The other columns represent the labels for the various items shown in each of the 3 attributes, Meat, Sauce, and Bun.

[Image: conjoint design]

The 3 alternatives correspond to the 3 columns in the questionnaire preview above, i.e. Option 1, Option 2, and Option 3. Question will always remain within the fixed 1 to 10 range. Task, though, is cumulative, so the second version of the design will start from 11, the third from 21, and so on. The sketch below makes this numbering concrete.
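As a quick illustration (the variable names here are ours, not columns Displayr creates), Task can be derived from Version and Question for a 10-question design as follows:

# Minimal sketch: Task numbering across versions, assuming 10 questions per version.
n.questions <- 10
design.index <- expand.grid(Question = 1:n.questions, Version = 1:3)
design.index$Task <- (design.index$Version - 1) * n.questions + design.index$Question
head(design.index[design.index$Version == 2, ])  # Version 2 starts at Task 11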

By simply clicking this output and selecting Export > Excel > Current Selection, you can export the design for programming in your survey platform.

If needed, you can also generate a numeric version of the design via Inputs > DIAGNOSTICS > Numeric design.

[Image: conjoint numeric design]

If you are programming your survey in Qualtrics and have API access, you can export your design directly into your survey via the Export Design to Qualtrics option.

Setting up your respondent data

In preparation for data collection, you should pre-program the necessary variables into your survey. Below you will see the key conjoint fields for matching with our design. This is from the first 10 records of the burger survey. Here, we have recorded both the Version number and the task number for each Question. The Version number is sufficient, however, if the task order is unchanged from the design order.

[Image: key variables]

The alternatives selected in the survey are stored as one variable per task, with values corresponding to the 3 columns on display. These variables should be either Numeric or Categorical to ensure they are read correctly. You can convert text variables by changing INPUTS > Structure to Numeric.

[Image: choice variables]

If you have a 'None of these' option, you will need to code this response as 0 or set Missing Values to Exclude from analyses. When importing labeled designs from Excel as a data set, you should therefore check the variable values. If you additionally ask a dual-response 'None of these' question after each task, you will also have the same number of 'Yes/No' questions.
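As an illustration only (the labels and object names below are hypothetical, not from the burger study), a text choice variable could be recoded into this numeric format like so:

# Hypothetical example: recode a text choice variable to numbers,
# with 'None of these' coded as 0.
raw.choice <- c("Option 1", "None of these", "Option 3")
choice <- ifelse(raw.choice == "None of these", 0,
                 match(raw.choice, c("Option 1", "Option 2", "Option 3")))
choice  # 1 0 3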

Selecting the correct source options for your model

Once data collection is complete, you will need to import the appropriate data files via Home > New Data Set. We will now go to Insert > More > Conjoint/Choice Modeling > Hierarchical Bayes to start analyzing our choice data.

[Image: conjoint analysis]

Under RESPONDENT DATA > Data source, you will be able to select Choice and task variables in data set or Choice and version variables in data set. When using the first option, ensure the task variables are in the exact same order as the choice variables. The same applies when using the Dual-response 'none' choice field to select the 'Yes/No' questions. The second option allows you to simply reference the version variable if no task variables are supplied.

To input the matching design, under EXPERIMENTAL DESIGN > Data source select the appropriate option. In this case, we would select Experimental design R output as the design was created in Displayr. There are also options for Sawtooth and JMP files, and Displayr Experiment questions. The default Data set option is for all other externally created conjoint designs. When using a 'data set' option, the appropriate variables should be placed in their respective Version, Task, and Attributes fields.

Apart from when using Sawtooth CHO files (saved as .txt), Experiment questions, and Displayr designs, the other source options require one data set for the design and one for the responses. If your data comes from Alchemer (formerly SurveyGizmo), you will instead have a conjoint data set and a respondent data set. By going to Insert > More > Conjoint/Choice Modeling > Convert Alchemer (Survey Gizmo) Conjoint Data for Analysis, Displayr will append the choice and version variables to the respondent data set, provided 'ResponseID' appears in both files. For the design component, the 'ResponseID' from the conjoint data set is used as Version and 'Set Number' as Task.

For many of these options, there will also be an Enter attribute levels button at the bottom. This uses the exact same format as when we set up the design in a previous step.

 

Get in touch if you have any questions, and see our blog to keep up on all the latest Displayr features.

Using Choice-Based Conjoint in Pricing Research Studies

This technique is a bit more complicated than the first five we've talked about. The idea is to uncover people's preferences by presenting them with tradeoffs between a series of products, each described in terms of a set of attributes.

For example, below is a question that compares several different cell phone providers with different prices and features. Respondents are asked to pick a provider based on the offered package's price and features. Then they are asked another similar question, but with different prices and features. The magic that comes out of this approach is that once these questions are answered we can estimate, with some complicated maths, each person's stated willingness-to-pay for individual product features.

When working in pricing research, one of the key outputs from a conjoint approach is the median willingness-to-pay for each attribute level. In the example below, we have sample data on product features from the U.S. cellphone market. To start, we have to set a baseline: within each attribute, the lowest level of performance is assigned a willingness-to-pay of $0, so everything else is relative to that number. So, we can see that 50% of people would be willing to pay $2.19 or more for an increase in hotspot data from 10GB to 20GB, and 50% of people would pay $9.71 or more for unlimited hotspot data relative to 10GB. But beware.

While this analysis says that 50% of people would be willing to pay an extra $9.71 to upgrade from 10GB to unlimited hotspot data, they will only pay this if there is no competition. If a competitor is offering unlimited hotspot data at a much lower rate, then that's what the market will bear, and you won't be able to charge the maximum willingness-to-pay.

The next key output that people love to get from conjoint studies is the simulator, which predicts preference share. Or, if you spend a lot of time calibrating it, it can sometimes be used to predict market share. Looking at the example simulator below, you can see that AT&T has a market share of 51%. But what happens when we increase their price from $30 to $40? The preference share then drops to about 37%. From that, we can construct a demand curve to work out the profit-maximizing price, the same way we have in the previous techniques.

However, in practice, these optimizations are not always as useful as people envision. That's because people will typically assume that the models are completely accurate predictors of market share, and that's rarely the case because so many key factors are ignored.

But in my consulting work, I have found the concept of the Value Equivalence Line (VEL) to be much more useful. The idea of the VEL is that a company should have a portfolio of products at different price points that match different levels of benefits. With our cellphone company example, the idea is that the price points of the various phone plans should match the value of the benefits included. So, the plans with successively higher price points will deliver more benefits.

We can apply the VEL approach with a simulator to find the right price for each cellphone plan. In the simulator example below, we just have AT&T and four different price points, but everything else is the same. The model suggests that the majority of people will prefer the cheapest option, as one would expect. But when trying to optimize a portfolio, our goal is to come up with four products that have similar preference shares. So, you edit the options other than price until the plans are broadly similar in preference share, and thus roughly equal in value, which is shown in the second image below.

And while the first five approaches are still important in pricing research, I find that this approach is the most useful because you are looking at the many different factors that affect price in relation to preference share.

 

For more examples of the other pricing research techniques see: Price Salience, Price Knowledge/Awareness, Stated Willingness-To-Pay, Price Sensitivity Meter, and Random Assignment.

Using the Value Equivalence Line (VEL) with Conjoint Simulators

The value equivalence line is a useful concept for setting pricing strategies in markets where products vary in terms of their overall levels of benefits (e.g., quality levels). This post explains how the value equivalence line can be used in conjunction with a conjoint simulator to design product portfolios, illustrated using a study of the US cell phone market.

Value maps and the VEL

A value map plots products according to their price and their utility (i.e., a quantitative measure of the benefit that each product provides). In a stable market, you should expect to see products offering utility commensurate with their prices. We can draw a line on the value map which indicates where price and utility are in sync; this line is called the Value Equivalence Line (VEL), and products on that line are said to be of equivalent value (note that the concept of value here includes price, rather than being a synonym for it). Products that are expensive relative to their utility appear above the VEL and are at a value disadvantage. Products below the VEL are at a value advantage. Products at a value advantage should grow in share over time, while products at a value disadvantage should decline.
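As a minimal sketch (the products, prices, and utilities below are invented for illustration, not taken from the study), a value map with an approximate VEL can be drawn in R like this:

# Hypothetical data: price and average utility for five products.
value.map <- data.frame(
    product = c("A", "B", "C", "D", "E"),
    price   = c(30, 40, 50, 60, 80),
    utility = c(1.0, 1.8, 2.1, 3.4, 4.2))

# Price on the vertical axis and utility on the horizontal axis,
# matching the description above.
plot(value.map$utility, value.map$price, pch = 19,
     xlab = "Utility", ylab = "Price ($/month)")
text(value.map$utility, value.map$price, value.map$product, pos = 3)

# A simple stand-in for the VEL: the line of best fit through the products.
# Products above the line are expensive relative to their utility (value
# disadvantage); products below it are at a value advantage.
abline(lm(price ~ utility, data = value.map), lty = 2)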

The concept of the VEL aligns to the concept of price segments, with products focused on discount shoppers being at the bottom left of the value map and the premium products at the top right.

 

Using conjoint simulators to identify products of equal value: worked example

Consider the problem of trying to design cell phone plans. Let's say you want to offer a $30, $40, $60, and $80 plan. A good portfolio is one where each of these plans delivers benefits commensurate with its price point.

Using a simulator, we can create products of similar value by modifying their attributes so that they have similar shares. In the simulator output shown below, all the products are the same, and they only differ in price. Thus, the cheaper the plan, the greater its value advantage. For this reason the cheapest plan, at $30 a month, has the highest share (if you are wondering why the model predicts anybody would prefer the more expensive alternatives, this is because when people make choices in questionnaires, just as in real life, they are a bit irrational/inconsistent).

Below, I've modified the features of the different plans so that their preferences are very similar, and thus, by definition, they offer similar levels of value, and so the resulting four-tiered pricing plan offers choices of different price points with their commensurate levels of benefit.

Computing Willingness-To-Pay (WTP) in Displayr

This post explains the basics of computing willingness-to-pay (WTP) for product features in Displayr.

Step 1: Estimate a choice model with a numeric price attribute

The starting point is to estimate a choice model (Displayr: Insert > More > Conjoint/Choice Modeling > Hierarchical Bayes; Q: Automate > Browse Online Library > Conjoint/Choice Modeling > Hierarchical Bayes). When doing this, the price attribute needs to be set up as a numeric attribute. If you haven't done this before, please be aware that the scale of the price attribute is not readily comparable to the other attributes. In the example below, for example, note that the price attribute seems to have very little variability compared to the other attributes. This is because the distribution of a numeric variable is for its coefficient (don't be concerned if you don't understand this; the key bit to appreciate is that it is OK that its distribution appears much smaller).

Step 2: Save the utilities

Add new variables to the data set using Insert > More > Conjoint/Choice Modeling > Save Variable(s) > Individual-level Coefficients (in Q: Automate > Browse Online Library > Conjoint/Choice Modeling > Save Variable(s) > Individual-level Coefficients).

Step 3: Modify the R code of the utilities

When you click on one of the variables created in step 2, you can see the underlying R code, and it will look something like this (in Q, right-click on the variable and select Edit R Variable):

input.choicemodel = choice.model
if (!is.null(input.choicemodel$simulated.respondent.parameters)) stop()
flipChoice::RespondentParameters(input.choicemodel)

It can be changed to compute WTP with a simple modification of the last line and addition of a fourth line:

input.choicemodel = choice.model
if (!is.null(input.choicemodel$simulated.respondent.parameters)) stop()
x = flipChoice::RespondentParameters(input.choicemodel)
# Divide each respondent's utilities by the negative of their price coefficient,
# converting them into dollar-denominated willingness-to-pay.
sweep(x, 1, -x[, "Price"], "/")

Step 4: Creating tables or visualizations

To create a table showing the average WTP for each attribute level, drag the variable set onto a page, and then, using STATISTICS > Cells, select Median and remove Average (as the mean can be a bit misleading with WTP data). Then, hide the Price attribute by selecting the row and using Data Manipulation > Hide in the ribbon. An example is shown below. You can then plot this if you so wish.
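Equivalently, if you prefer to do this step in R, the median WTP per attribute level can be computed directly from the matrix created in step 3 (x below is the matrix of respondent parameters from that step; the object names x.wtp and median.wtp are ours, not Displayr's):

# Willingness-to-pay per respondent, as computed in step 3.
x.wtp <- sweep(x, 1, -x[, "Price"], "/")
# Median WTP for each attribute level (the median is less misleading than the mean).
median.wtp <- apply(x.wtp, 2, median, na.rm = TRUE)
# Drop the Price column itself, mirroring the 'hide the Price attribute' step above.
median.wtp <- median.wtp[names(median.wtp) != "Price"]
sort(median.wtp, decreasing = TRUE)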

Creating Demand Curves Using Conjoint Studies

A demand curve shows how likely people are to make purchases at different price points. There are lots of different ways of estimating demand curves. In this post, I explain the basics of doing so from a conjoint study using Displayr.

Example demand curve

Below is a demand curve from a choice-based conjoint study of the chocolate market. It shows preference share for a 2-ounce Hershey milk chocolate bar.

Preparation: Creating the model and simulator

Before computing the demand curve you need a simulator. The most straightforward way of doing this is to create a model using Insert > More > Conjoint/Choice Modeling > Hierarchical Bayes, followed by Insert > More > Conjoint/Choice Modeling > Simulator.

Manually creating the demand curve

The simplest way to create a demand curve is to manually run each scenario of interest in your simulator. Let's say we wanted to create the demand curve for Hershey. We would set each of the alternatives to the desired attribute levels, with Hershey at the lowest price point, and make a note of Hershey's market share. Then, we would increase Hershey's price to the next price point and make a note of that share, and so on. You can then use Home > Enter Table to create a table of these data points (with price in the first column and market share in the second) and hook it up to a visualization.
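As a minimal sketch (the price points and shares below are placeholders, not results from the chocolate study), the manually recorded shares can be assembled into a small table and plotted:

# Hypothetical price points and the shares noted from the simulator runs.
demand <- data.frame(price = c(0.99, 1.49, 1.99, 2.49),
                     share = c(0.42, 0.33, 0.26, 0.20))

# A basic demand curve; in Displayr you would instead hook the entered
# table up to a visualization (e.g., an area chart).
plot(demand$price, demand$share, type = "b",
     xlab = "Price ($)", ylab = "Preference share")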

Code based-creation of a demand curve

There are several situations where manually creating the demand curve is a poor solution, including:

  • When you want to create the demand curve in a dashboard so that it automatically updates when the user filters the data or changes the attribute levels of the alternatives.
  • Where there are a large number of alternatives to be simulated (e.g., models of SKUs).
  • Where there is a numeric price attribute, and you want to test lots of price points.

In such situations, it is often better to use code to create the demand curve.

Step 1: Duplicating the code used to create the simulator

When you create a simulator automatically in Displayr, it creates an R Output below the simulator that contains the underlying code that calculates the preference shares. In the screenshot below, I've selected it (hence the outline). Step 1 is to click on it and press Home > Duplicate to create a copy of the R Output.

Step 2: Modifying the code

Inspecting the code

You can inspect the underlying code in the copied R Output by viewing Properties > R CODE in the Object Inspector. It will have a structure like the code below. In this example:

  • Lines 1 to 4 describe the scenario that is being simulated, with one row for each alternative, and all four alternatives grouped as a list within a scenario list.
  • Looking at Alternative 1, we can see that the level for Brand is set to cBrand.1, with the blue shading telling us that this is the name of something else in the project. In this case, the something else is the control on the page where the user selects the level of the brand attribute.

If you hover your mouse over any of the references to the controls, a box will appear to the left telling you the current selection. In the example below, we can see that the first alternative's price has been set to "$0.99".

Modifying the code

We can modify the code to insert other attribute levels. For example, if we replaced cPrice.1 with "$0.99", we would get the same result as changing it in the price control. However, if we change the R code to "$0.99", the code will no longer use the price control and will instead always use $0.99 as the price for alternative 1.

The code below is a modification of the code above, but it computes the demand curve. The key aspects of the code are:

  • Lines 1 to 4 are identical to those automatically created by the simulator, apart from changing the alternative list parameters to c (a generic sketch of the resulting looping pattern appears after this list).
  • You can copy and modify Lines 5 to 13 as described in the remaining steps.
  • The prices for the simulator are in line 5.
  • In lines 10 and 11 replace "Alternative 3" with the name of the alternative that you are wanting to compute demand for. As shown in the screenshot below, in this case study, Hershey is Alternative 3.
  • Replace hershey in line 13 with the name of the brand you are interested in.
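The automatically generated code itself is not reproduced here (it sits in the R Output in the document), but the pattern it implements can be illustrated generically. In the sketch below, predict.shares() is a toy stand-in for the simulator's share calculation, not a Displayr function, and the utilities and prices are made up:

# Toy stand-in for the simulator's preference-share calculation: a logit
# (softmax) over made-up utilities in which a higher price lowers
# Alternative 3's utility. The real code uses the fitted choice model.
predict.shares <- function(price3) {
    utilities <- c("Alternative 1" = 0.2,
                   "Alternative 2" = 0.0,
                   "Alternative 3" = 1.5 - 0.8 * price3)
    exp(utilities) / sum(exp(utilities))
}

prices <- c(0.99, 1.49, 1.99, 2.49)  # the price points to test (line 5 in the real code)
shares <- unname(sapply(prices, function(p) predict.shares(p)["Alternative 3"]))
data.frame(Price = prices, Share = round(shares, 3))  # the demand curve table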

Step 3: Creating the Visualization

You can now hook up your new table to a visualization from the Insert > Visualization menu.  To create the area chart from my example above, click Insert > Visualization > Area and select your R table in the Inputs > DATA SOURCE > Output in 'Pages' drop-down in the Object Inspector.

 

Reordering Attribute Levels in Conjoint Analysis Models in Displayr

The order of attribute levels in a choice-based conjoint analysis is determined by their order in the experimental design file. This order is not always best from an interpretation perspective. In particular, it can be useful to reorder attribute levels so that the first category reflects the base case, and the remaining categories follow in some relevant order.

As an example, the levels of an attribute showing preferences for sugar levels in a chocolate study are shown below. A more useful ordering would set the standard sugar level as the first category, making it easy for people to assess the relative appeal of the reduced-sugar and sugar-free levels.

 

 

Step 1: Create a table showing the variable

Tables are created by dragging a variable set from the Data Sets tree.

Step 2: Reorder the categories in the table

The categories in a table can be reordered by dragging and dropping. First click on the row heading, which will show as a grey box with three black lines (a "hamburger"). Then, click on the hamburger and drag it to where you want it. Depending on where you release it, the category will be moved or merged with other categories.

Provided that the model is set to update automatically, it will instantly recompute to reflect the new order.

The reordering is mainly done for ease of interpretation. It should not be expected to have a substantial effect on any predictions or the correct interpretation of the results. The results from the reordered sugar attribute shown at the beginning of this post are below. Note that the differences between the mean utilities (shown in the column of numbers to the right) remain unchanged. However, this reordered analysis makes it easier to see the key conclusions:

  • Most people prefer the standard levels of sugar in chocolate
  • "50% reduced sugar" is, on average, halfway between standard and "Sugar free"
  • There's a lot of heterogeneity (variation) in terms of preferences for "Sugar free," with a clear segment of people with a positive (blue) preference for reduced and sugar-free chocolate

How to Analyze Dual-Response 'None of These' Conjoint Models in Displayr

In a choice-based conjoint study with a "None of these" option, the effective sample size is reduced in proportion to the frequency with which the "None of these" option is chosen. A way to prevent the sample size from being reduced is to show the "None of these" option in the choice questions, and then follow up each question with a Yes/No question asking something like "Given what you know about this market, would you really have purchased this option?"

In this post, I describe how to analyze such data in Displayr.

 

Data files

Typically, you will have two files. One file will contain the experimental design (this is identical to a typical choice-based conjoint study). The other file contains the raw data. This raw data file will contain:

  • The choices that the respondents made from all the alternatives excluding the "None of these" questions (tasks).
  • A set of binary variables (with Structure set to Binary - Multi) indicating whether or not the respondent said they would purchase the chosen alternative in each of the choice questions. This is the only difference between the setup for a dual-response 'none of these' study and a traditional choice-based conjoint study.
  • A single variable indicating the Version of the questionnaire seen by each respondent.

 

Setting up the experimental design

There are lots of different ways of setting up the experimental design. In this post, I assume you've got a spreadsheet or CSV file with a standard experimental design. The actual experimental design I've used is here if you wish to inspect it.

  • Insert the Hierarchical Bayes Choice Modeling analysis: Anything > Advanced Analysis > Choice Modeling > Hierarchical Bayes.
  • Ensure that the Design source is set correctly; in this example, to Data set
  • Set the Version and Task variables
  • Set the Attributes

In the example that I'm using, it looks like this:

 

Selecting the respondent data

The respondent data I've used in this post is here. In the RESPONDENT DATA section, select:

  • The variables that contain the respondents' choices as Choices
  • The variables that indicate the tasks shown in each choice question in Tasks; these correspond to the Task in the experimental design
  • The "None of these" variables in Dual-response 'none' choice

In the example used in this post, once selected, it should look like this:

 

Running and interpreting the model

To run the model, check the Automatic option at the top.

The resulting model has the same interpretation as the traditional conjoint model, which is to say:

  • The interpretation of the utilities for all the attributes other than the "None of these" alternative remains unchanged.
  • The interpretation of the "None of these" alternative remains problematic. With all conjoint models, regardless of whether dual-response or not, the utility of "None of these" is unlikely to be correct, which is why typically when choice models are analyzed this option is either left out of the simulations or calibrated in some way.

To see a Displayr document with this all set up, click here.

 

More information about this topic

Jeff D. Brazell, Christopher G. Diener, Ekaterina V. Karniouchina, and William L. Moore (2006), "The no-choice option and dual response choice designs," Marketing Letters, 17(4), 255-268.

Sawtooth software: https://www.sawtoothsoftware.com/help/lighthouse-studio/manual/hid_web_cbc_none.html

How to Check a Choice-Based Conjoint Model

Conjoint analysis, and choice modeling in general, is super-powerful. It allows us to make predictions about the future. However, if the models are poor, the resulting forecasts will be wrong. This post walks through the 7 stages involved in checking a choice model.

The 'hygiene test': checking for convergence

Most modern conjoint analyses are estimated using hierarchical Bayes (HB). HB works by starting with an initial guess, and then continually improving on that guess. Each attempt at improving is called an iteration. Most choice modeling software stops after a pre-specified number of iterations have been completed. Sometimes this is too early, and more iterations are needed before the model can be safely relied upon. The technical term for when enough iterations have been achieved is that the model has achieved convergence.

When a model has not converged, the fix is to increase the number of iterations, also known as draws, until convergence is achieved.

Just as with hygiene tests in food safety, a lack of convergence is no guarantee that the model is unsafe. Food prepared in the most unhygienic situations can be safe if you have the right bacteria in your gut. Similarly, conjoint models that have not converged can give you the right answer. For this reason, it is very common for people to conduct conjoint studies without ever checking hygiene. Of course, just like with cooking, poor hygiene can have devastating consequences, so why take the risk?

Modern choice software, such as Displayr, automatically performs such hygiene tests and provides warnings where the model has not converged. However, all hierarchical Bayes software can produce, or be used to compute, the relevant diagnostics for checking for convergence: the Gelman-Rubin statistic (Rhat), examining the effective sample size, and viewing trace plots. Please see "Checking Convergence When Using Hierarchical Bayes for Conjoint Analysis".

The 'smell test': checking the distributions of the coefficients

Once it is believed that the model has converged, the next step is to inspect the distribution of the estimated utilities (coefficients). This is the “smell test”. If something is rotten, there is a good chance of discovering it at this stage. Unlike the hygiene test, it is super powerful. 

The output below shows the distributions of utilities for the first two attributes of a study of the chocolate market. The experimental design showed four alternatives in each choice question. The first attribute is Alternative, where a 1 indicates that the alternative was the leftmost of the four shown to the respondents, the 2 is the second alternative, and so on.

 

Alternative 1 has a mean of 0 and a single blue column in its histogram, indicating no variation. As discussed in Choice Modeling: The Basics, choice models start with an assumption that one of the attribute levels has a utility of 0, and thus this is an assumption rather than a result. Looking at alternative 2, we can see its mean is the same as alternative 1's, which makes sense. There is a bit of variation between the respondents, which is consistent with there being a bit of noise in the data. For alternatives 3 and 4, however, we see that they have lower means, suggesting that people were consistently less likely to choose these options. This is disappointing, as it suggests that people were not carefully examining the alternatives. The good news is we can take this effect out of the model at simulation time (by removing this attribute from the simulations, or by using its average value if there is a 'none of these' alternative). If you are using software that does not produce utilities for alternatives, which are also known as alternative-specific constants, make sure that you include them manually as an extra attribute, as it is one of those rare things that is both easy to check and easy to fix.

Looking at the averages for the four brands, again the first brand has been pegged at 0 for all respondents, and the other results are relative to this. We can see that the means for Dove and Lindt are less than 0, so they are on average less preferred than Hershey (all else being equal), whereas Godiva has a higher average preference. For each of the brands there is quite a lot of variation (fortunately a lot more than for the alternative), suggesting that people differ a lot in their preferences.

The output below shows the utilities for the remaining attributes. For the price attribute, the first price level is set to 0 for everybody. We see that progressively higher prices have lower average levels of preference, but there is some variability about this. This is in line with common sense, and good news, as it shows that the data smells OK. 

 

The data on cocoa strength, sugar, origin, and nuts, are not so useful from a smell test perspective. As with brand, there is no obvious way of checking if it smells good. We can, nevertheless, draw some broad conclusions.

  • The 70% Cocoa is least preferred on average, but is divisive. On average people want a reduction in sugar, but most do not want sugar free.
  • Origin is relatively unimportant. There is little difference in the means and not much variation in the preference. 
  • On average people prefer no nuts, but there is variation in preference, particularly with regards to hazelnuts. 
  • We should expect all but contrarians to prefer free trade. We do see this, but we also see little variation, telling us that it is an unimportant attribute.

No random choosers (RLH statistic)

Choice modeling questions can be a bit boring. When bored, some respondents will select options without taking the time to evaluate them with care. Fortunately, this is easy to detect with a choice model: we look at how well the model predicts a person's actual choices. The simplest way to do this is to count up the number of correct choices. However, there is a better way, which is to:

  1. Fit a choice model without an attribute for alternative (i.e., without estimating alternative specific constants). The reason that this is important is that if a person has chosen option 3 every time, a model with an attribute for alternative may predict their choices very well. Where the model includes a 'none of these' alternative, the trick is to merge together the levels of the attributes other than 'none of these'. In a labeled choice experiment, you need to skip this step and proceed with the model with the attribute for alternatives.
  2. For each question, compute the probability that the person chooses the option that they choose.
  3. Multiply the probabilities together. For example, in a study involving four questions, if a person chooses an option that the model predicted they had a 0.4 probability of choosing, and their choices in the remaining three questions had probabilities of 0.2, 0.4, and 0.3 respectively, then the overall probability is 0.0096. This is technically known as the person's likelihood.
  4. Compute likelihood^(1/k), where k is the number of questions. In this example, the result is 0.31. This value is known as the root likelihood (RLH). It is better than just looking at the percentage of choices that the model predicts correctly, as it rewards situations where the model was close and penalizes situations where the model was massively wrong. Note that the RLH value of 0.31 is close to the mean of the values (technically, it is the geometric mean).
  5. Plot the RLH statistics for each person and determine a cutoff point, re-estimating the model using only people with RLH statistics above the cutoff value. In the chocolate study from earlier, we gave people a choice of four options, which tells us that the cutoff point needs to be at least 1/4 = 0.25. However, hierarchical Bayes models tend to overfit conjoint analysis data, so a higher cutoff is prudent. The histogram below suggests that there is a clump of people at around 0.33, which perhaps represents random choosers, but there is no easy way to be sure (if your data contains information on time taken to complete questions, this can also be taken into account). A minimal sketch of the RLH calculation in steps 2 to 4 follows this list.
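As that sketch (using the invented probabilities from step 3), the RLH for one respondent can be computed like this:

# Predicted probabilities of the options this respondent actually chose,
# one value per question (the four-question example from step 3).
chosen.probabilities <- c(0.4, 0.2, 0.4, 0.3)

likelihood <- prod(chosen.probabilities)                # 0.0096
rlh <- likelihood ^ (1 / length(chosen.probabilities))  # about 0.31
rlh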

Rational choosing

Random choosing is a pretty low threshold for quality of data. A higher threshold is to check that the data exhibits "rationality", where the term "rational" is a term of art from economics, which essentially means that the person is making choices in a way that is consistent with their self-interest. Working out whether choices in conjoint questionnaires are rational is an ongoing area of research, and lots of work shows that they are often not rational (e.g., irrelevant information shown earlier in a questionnaire can influence choices). Nevertheless, in most studies it is possible to draw some conclusions regarding what constitutes irrationality. Among the people who did not seem to be choosing randomly, 6% had data indicating they preferred paying $2.49 to paying $0.99.

While it may be tempting to exclude these 6%, it would be premature. Inevitably some people will find price relatively unimportant, and for these people there will be uncertainty about their preference for price. A more rigorous analysis is to only exclude people where it is highly likely they have ignored price. Such an analysis leads to only excluding 1 person (0.3% of the sample). In the post Performing Conjoint Analysis Calculations with HB Draws (Iterations), I describe the basic logic of such an analysis.

Cross-validation

A basic metric for any choice model is its predictive accuracy. The chocolate study has a predictive accuracy of 99.4% after removing the respondents with poor data. This sounds too good to be true, and it is too good. This predictive accuracy is computed by checking to see how well the model predicts the data used to fit the model.

A more informative approach to assessing predictive accuracy is to check the model's performance on data not used to fit the model (this is called cross-validation). Where the choice model is estimated using an experiment, say asking people 6 questions each, predictive accuracy is assessed by randomly selecting one question per respondent to leave out when fitting the model, and then seeing how well the model predicts the choices in these left-out questions. In the case of the chocolate study, when this is done the predictive accuracy drops markedly to 55.4%. As the study involved four alternatives, this is well above chance (chance being 25% predictive accuracy).

Checking that the utilities are appropriately correlated with other data

All else being equal, we should expect that the utilities computed for each person are correlated with other things that we know about them. In the chocolate study this occurs, with diabetics having higher utility for sugar-free chocolate, and people with higher incomes not having as low utility for higher prices as did those with lower incomes.

A word to the wise: avoid using gut feel and prejudice. For example, it is pretty common for marketers to have strong beliefs about the demographic profiles of different brands' buyers (e.g., Hershey's will be bought by poorer people than Godiva). It is often the case that such beliefs are not based on solid data. It is therefore a mistake to conclude a choice model is bad if it does not align with beliefs, without checking the quality of the beliefs.

Ability to predict historic market performance

While the goal of a choice model is to make predictions about the future, they can also be used to make predictions about the past. For example, if a choice model collects price data, it can be used to predict the historic impact of changes in price. If the predictions of history are poor, it is suggestive that the model will also not predict the future well.  You can find out more about this in 12 Techniques for Increasing the Accuracy of Forecasts from Choice Experiments.

All the calculations described in this post can be reviewed in this Displayr document.

Comparing HB Root-likelihood (RLH) Between Displayr and Sawtooth

Root-likelihood (RLH) is a way to measure how well a choice model fits a data set. The RLH is a value ranging between 0 and 1, where a higher value indicates a better fit. It is less susceptible to noise than prediction accuracy but is less commonly used, perhaps because it is harder to conceptualize and interpret. In this article, I will be comparing the performance of Hierarchical Bayes (HB) algorithms from Displayr and Sawtooth. I'll compare their RLH over four data sets, using both in-sample and holdout data.

How RLH is computed

In order to explain how RLH works, we need to start with the concept of a likelihood. In simple terms, the likelihood is a measurement of how plausible a model's parameters are, given the available data. In the context of choice modeling, the overall likelihood is the product of the likelihoods of the respondent's questions, where each question's likelihood is the logit probability for the respondent's choice.

For example, if there are three alternatives with utilities of 3.4, -0.9, and 1.2, then the logit probabilities are:

    \[ \begin{matrix} &\textrm{Utility}&\textrm{exp(Utility)}&\textrm{Probability}\\ \hline \textrm{Alternative 1} & 3.4 & 30.0 & 30.0/33.7=88.9\%\\ \textrm{Alternative 2} & -0.9 & 0.41 & 0.41/33.7=1.2\%\\ \textrm{Alternative 3} & 1.2 & 3.32 & 3.32/33.7=9.9\%\\ \hline \textrm{Sum}&&33.7&100\% \end{matrix} \]

So if the respondent chose Alternative 1, the likelihood for this question is 88.9%. The likelihood for the entire model is simply the product of the likelihoods for every question. The RLH is computed by taking the nth root of the likelihood, where n is the number of respondent questions. This is equivalent to the geometric mean of the likelihoods of the respondent questions (i.e. RLH is a kind of "average" likelihood of each question). An RLH can also be computed for a respondent as the geometric mean of the likelihoods of just their questions.
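As a minimal R sketch reproducing the table above from the three utilities:

# Logit (softmax) probabilities for the three alternatives.
utilities <- c(3.4, -0.9, 1.2)
probabilities <- exp(utilities) / sum(exp(utilities))
round(probabilities, 3)  # approximately 0.889, 0.012, 0.099

# If the respondent chose Alternative 1, this question's likelihood is:
probabilities[1]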

A null model where utilities are the same for all alternatives (i.e. a model that randomly chooses alternatives with equal probability) has an RLH of 1/k, where k is the number of alternatives. A model with a good fit to the data should have a larger RLH than this. However, when the fit is worse, such as when predicting holdout data, it is not unusual to have an RLH below 1/k. This is because of the nature of the geometric mean: small values have a large impact on the mean, dragging it lower than would be the case with the arithmetic mean.

For example, suppose there are three questions and the probabilities of the chosen alternatives are 0.99, 0.99, and 0.005. If the model correctly predicts the first two choices, then the predictive accuracy is 67%. However, the RLH is much lower at 0.17 -- (0.99 * 0.99 * 0.005)^(1/3) = 0.17 -- which is less than the null model RLH of 0.33. The null model has a higher RLH even though it only has a 33% accuracy.

Methodology

I ran Hierarchical Bayes (HB) using Displayr and Sawtooth on four choice model data sets, which contained data on cruise ship holiday, eggs, chocolate, and fast food preferences. The cruise ship data set was used in a modeling competition in 2016 run by Sawtooth Software, and the other data sets were collected by Displayr.

One question was left out per respondent to be used as the holdout, and the analysis was run with the default settings in both Displayr and Sawtooth. The RLH was computed from respondent parameters estimated by the model, and charts of the results were created with Displayr.

In-sample results

The grid of scatterplots below compares the in-sample RLH for the Displayr and Sawtooth HB models. Each data point represents the RLH for a single respondent, and the results are compared for all four data sets.

In all four charts, the RLH is concentrated around the top right quadrant. This indicates a good in-sample fit to the data. The Displayr and Sawtooth RLH are somewhat correlated, but the Sawtooth RLH are larger on average.

The next chart shows the overall in-sample RLH for the same data sets. The in-sample RLH from Sawtooth is larger than those from Displayr for all four data sets, which suggests that the Sawtooth HB model has a better in-sample fit than the Displayr model.

However, this could be a sign of overfitting to the data. We can test this by using the holdout data.

The final chart shows that the Displayr holdout RLH is higher than those of Sawtooth for all four data sets. This confirms that the Sawtooth models are overfitting more than the Displayr models.

The RLH for both models is lowest for the chocolate and fast food data sets since they have a high number of attributes relative to the number of questions, making them more prone to overfitting. There are four alternatives per question, which means that the null model for the chocolate data set has an RLH of 0.25. The Sawtooth model has an RLH of just 0.19.

Also, note that the difference in RLH between the Displayr and Sawtooth models is largest for chocolate and fast food. This is the case for both the in-sample and holdout data. From this, we can conclude that Sawtooth overfits more on data sets that are already prone to overfitting.

Conclusion

In this article, I described how RLH is computed and compared the performance of Displayr and Sawtooth HB using this metric. I found that Sawtooth HB overfitted to all four data sets more than Displayr, especially on data sets prone to overfitting. To mitigate the issue of overfitting, the default priors can be changed to "shrink" the model. But it requires a solid understanding of the underlying HB model, as well as a lot of trial and error, to correctly set the priors. Therefore it is important that default priors for HB choice modeling software are appropriately set to minimize this issue.

Creating Online Conjoint Analysis Choice Simulators Using Displayr

Creating the simulator
  1. Create a choice model of the conjoint using hierarchical Bayes (HB), latent class analysis, or multinomial logit in Displayr (Insert > More > Conjoint/Choice Modeling). You can also do this in Q and upload the QPack to Displayr.
  2. Click on the model and in the object inspector, click Inputs > SIMULATOR > Create simulator.
  3. Indicate the number of alternatives you want to have (excluding 'none of these' alternatives) and press OK.
  4. Indicate whether you want to include Alternative as an attribute in your model. Typically you won't want to do this, as this attribute is used for model checking purposes rather than simulation.

A new page will then appear beneath the page that contains your model, and this page will contain your simulator. For example, one created for the chocolate market looks like this:

Customizing the appearance of the simulator

You can customize all the various components of the simulator by clicking on them to modify them. For example, we've restyled the simulator above into the more attractive simulator below. For another example, click here.

Customizing the calculations of the simulator

If you scroll down, you will find a small table below the simulator showing the simulated shares. If you click on this table, additional options for customizing the simulator appear in the object inspector on the right of the screen. You can use these controls to adjust how the shares are calculated (the main adjustments are described in Adjusting Conjoint Analysis Simulators to Better Predict Market Share).

Numeric attributes

If attributes have been analyzed as numeric (see Numeric Attributes in Choice-Based Conjoint Analysis in Displayr), when the simulator is created the combo box for the attribute will show the minimum value of the attribute used in the research, the maximum value, and two equally-spaced points in-between. For example, if the research has tested prices of $0.99 and $2.49, the combo box will contain options of 0.99, 1.49, 1.99, and 2.49.
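
As a quick check of the spacing, the in-between values are just equally-spaced steps between the minimum and maximum, which you can reproduce in R:

```r
# Four equally-spaced price points between the minimum and maximum tested prices
round(seq(0.99, 2.49, length.out = 4), 2)   # 0.99 1.49 1.99 2.49
```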

This list of values can be edited by clicking on the combo box and changing the values in Properties > Item list in the object inspector; you can both modify the existing values and add new ones. The values used in the combo box are not constrained to match those used in the study. After changing the list, make sure you click on the calculations (see the image above) and press CALCULATE in the object inspector so that the simulator picks up the changes.

Providing access to the simulator for others

Once you have set up a simulator in Displayr, you can publish it as a web page (Export > Web page), choosing whether to allow access to anybody with the URL or to set up password access.

Adjusting Conjoint Analysis Simulators to Better Predict Market Share

This post describes four methods for adjusting choice simulators from conjoint studies so that they better fit market share: changing the choice rule, modifying availability, tuning the scale factor, and calibrating to market share. The post assumes that you have first created a simulator, as per the instructions in Creating Online Conjoint Analysis Choice Simulators Using Displayr, and have selected its calculations (see the image below).

 

Changing the choice rule

Choice simulators make assumptions about how to compute share given the estimated utilities. By default, Displayr computes preference share using utilities computed for each respondent. This is the typical way that preference is simulated in choice simulators. However, it is perhaps not the best method. When preferences are simulated using respondent utilities, we are implicitly assuming that the utilities are estimated without error. This assumption is not correct. As discussed in Performing Conjoint Analysis Calculations with HB Draws (Iterations), we can improve on our calculations by performing them using draws. This is theoretically better as it takes uncertainty into account. To do this we need to:

  1. Modify our choice model so that it saves the draws: set Inputs > SIMULATION > Iterations saved per individual to, say, 100. This will cause the model to be re-run.
  2. Click on the calculations (see the screen shot above).
  3. In the object inspector on the right of the screen, change Rule from Logit respondent to Logit draw.

There are other rules. Rather than using the inverse logit assumption when computing preference shares, we can instead assume that the respondent chooses the alternative with the highest utility (First choice respondent) or that for each draw, the alternative with the highest utility is chosen (First choice draw). And, if you click Properties > R CODE you can edit the underlying code to implement more exotic rules (e.g., assume that people will only choose from the three most preferred options).

While you can modify these rules, it is recommended that you only use Logit draw or Logit respondent. The other rules, such as First choice respondent, are only provided so that users who already use them in other programs have the ability to do so in Displayr. The Logit draw rule is the rule that is explicitly assumed when the utilities are estimated, so if you use another rule, you are doing something that is inconsistent with the data and the utilities. Logit respondent is the most widely used rule, largely because it is computationally easy; its widespread use suggests that it is acceptable.
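
To make the distinction between the rules concrete, here is a small sketch of how preference shares could be computed from a matrix of respondent utilities under the Logit respondent and First choice respondent rules. The utilities are hypothetical and this is not Displayr's internal code.

```r
# Hypothetical utilities: 3 respondents x 3 alternatives
utilities <- rbind(c(1.2, 0.3, -0.5),
                   c(0.1, 0.8,  0.4),
                   c(2.0, 1.5, -1.0))

# Logit respondent: inverse logit (softmax) within each respondent, then average
logit.shares <- exp(utilities) / rowSums(exp(utilities))
colMeans(logit.shares)

# First choice respondent: each respondent 'chooses' their highest-utility alternative
first.choice <- t(apply(utilities, 1, function(u) as.numeric(u == max(u))))
colMeans(first.choice)
```

The Logit draw rule applies the same softmax calculation to each saved draw rather than to the respondent means, and then averages across the draws.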

Availability

Conjoint simulators assume that all the brands are equally available to all the respondents. However, in reality this assumption is unlikely to hold. Some alternatives may not be distributed in some markets. Some may be available only in larger chains. Some may have really poor awareness levels. Some may have poor shelf placement.

The simplest way to deal with differences in availability is to create separate simulators for different markets and only include the alternatives in those markets in the simulators. A shortcut way of doing this for more technical users is to copy the calculations box multiple times, filtering each separately to each market, modifying the source code to remove alternatives, and then calculating the market share as the weighted sum of each of the separate market simulators.

Another approach is to factor in availability (or awareness) at the respondent level. We can incorporate this into a simulator by creating an R Output containing a matrix, where each row corresponds to a respondent and each column to one of the alternatives, with TRUE and FALSE indicating which alternatives are available to which respondents. This is then selected in Availability in the object inspector.

For example, consider the situation where there is a data set of 403 respondents, we have four alternatives, and we wish to make the second alternative unavailable in Texas. We would do this as follows:

  • Insert > R Output
  • Paste in the code below
  • Click on the calculations (i.e., see the top of this post)
  • Set Availability to availability
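
The exact code depends on how your data set is structured, but a minimal sketch of an availability matrix for this example might look like the following, assuming a respondent-level variable called state (a hypothetical name) identifies which respondents are in Texas:

```r
# Availability matrix: one row per respondent, one column per alternative.
# 'state' is a hypothetical respondent-level variable from your data set.
n <- 403                                    # number of respondents
availability <- matrix(TRUE, nrow = n, ncol = 4)
availability[state == "Texas", 2] <- FALSE  # second alternative unavailable in Texas
availability
```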

The same approach can also be used to specify distribution in terms of percentages. The code below will generate availabilities such that the first alternative has a 50% distribution, the second 95%, the third 30%, and the fourth 100%.
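
A sketch of what such code could look like is shown here; it uses a random draw per respondent and alternative, so the exact pattern of TRUEs and FALSEs depends on the seed:

```r
# Randomly generate availability so that, on average, the four alternatives have
# 50%, 95%, 30%, and 100% distribution respectively
set.seed(123)                   # change the seed to get a different random pattern
n <- 403
distribution <- c(0.5, 0.95, 0.3, 1.0)
availability <- sapply(distribution, function(p) runif(n) <= p)
colMeans(availability)          # check: close to the target distribution levels
```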

When using randomness to take distribution into account, two additional things to consider are:

  • Are there correlations between the distributions? For example, if small brands tend to not appear in the same stores, then simulating availability with a negative correlation between the availability of the smaller brands may be advisable.
  • Simulation noise. This can be addressed by:
    • Copying the calculations multiple times.
    • Having a separate availability matrix for each, modifying the set.seed code in each (e.g., set.seed(1224) will generate a completely different set of random TRUEs and FALSEs).
    • Calculating share as the average across the simulations.

More exotic modifications of distribution are possible by accessing the source code and using the offset parameter, as shown below.

Tuning the scale factor

Answering conjoint questions is boring, and people make mistakes. In the real world, we can also be lazy and careless shoppers. A basic conjoint simulator assumes that we make mistakes at the same rate in the real world as when we fill in conjoint questionnaires. However, we can tune conjoint simulators so that they assume a different level of noise. The jargon for this is that we "tune" the scale factor.

The scale factor can be manually modified by clicking on the calculation and entering a value into the Scale factor field in the object inspector. The default value is 1. The higher the value, the less noise you are assuming in the simulation. As the scale factor approaches infinity, we end up getting the same results as when using a first choice rule. When the scale factor is 0, we assume each alternative is equally popular (assuming there are no availability effects).
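
The effect of the scale factor is easiest to see with a small worked example (hypothetical utilities, not Displayr's internal code):

```r
# How the scale factor changes simulated shares for one respondent
utilities <- c(1.2, 0.4, 0.0, -0.6)
share <- function(u, scale = 1) exp(scale * u) / sum(exp(scale * u))
round(share(utilities, scale = 1), 2)    # default scale factor
round(share(utilities, scale = 10), 2)   # approaches the first choice rule (1, 0, 0, 0)
round(share(utilities, scale = 0), 2)    # equal shares of 0.25 for each alternative
```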

You can automatically calculate the scale factor that best predicts known market shares as follows:

  • Click on the calculations (i.e., see the top of this post).
  • Click Scale to shares in the object inspector.
  • Type in the shares of the alternatives in the Shares field as proportions (e.g., .26, .16, .12, .46). You will then see a message like the one shown below, showing the estimated scale factor.
  • Using your mouse, select the number, right-click and select Copy.
  • Uncheck Scale to shares.
  • Click into Scale factor, right-click and select Paste.

Calibrating to market share

Calibration to market share involves modifying the utilities of the alternatives in such a way that the share predictions exactly match market share. This is more controversial than the other adjustments discussed in this post.

The main argument against calibration is that if the simulator inaccurately predicts current market share, all its predictions are likely inaccurate, so calibration is just making something inaccurate appear to be accurate, and thus is deceptive.

There are two counter arguments to this:

  • The simulator is likely the best tool available, and by calibrating it you ensure that the base case matches the current market, making it easier for people to interpret the results.
  • Calibration can be theoretically justified in the situation where important attributes have not been included in the study. For example, let's say we were simulating preference for cola, and we had only included the attributes of brand and price in the study. If there were important differences in the packaging of the brands, then this is a limitation of the study that could be addressed by calibration (provided that the respondents would not have inferred the packaging from the brand).

To calibrate a simulator:

  • Click on the calculations (i.e., see the top of this post).
  • Click Calibrate to shares in the object inspector.
  • Type in the shares of the alternatives in the Shares field as proportions (e.g., .26, .16, .12, .46). You will then see a message like the one shown below, showing the estimated calibration factors.
  • Using your mouse, select the numbers, right-click and select Copy.
  • Uncheck Calibrate to shares.
  • Click into Calibration factor, right-click and select Paste.
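
For those curious about what the calibration factors amount to mathematically, the sketch below illustrates the basic idea: repeatedly nudge each alternative's utility by the log of the ratio of its target share to its predicted share until the simulated shares reproduce the targets. This is a conceptual illustration with made-up utilities, not necessarily the exact algorithm Displayr uses.

```r
# Conceptual sketch of calibrating simulated shares to known market shares
set.seed(42)
respondent.utilities <- matrix(rnorm(403 * 4), nrow = 403)   # hypothetical respondent utilities
target.shares <- c(0.26, 0.16, 0.12, 0.46)

predicted.shares <- function(utilities, calibration) {
    adjusted <- sweep(utilities, 2, calibration, "+")
    probabilities <- exp(adjusted) / rowSums(exp(adjusted))  # Logit respondent rule
    colMeans(probabilities)
}

calibration <- rep(0, 4)
for (i in 1:50)
    calibration <- calibration + log(target.shares / predicted.shares(respondent.utilities, calibration))

round(predicted.shares(respondent.utilities, calibration), 2)  # now matches target.shares
```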

Do these four methods in order

In general, it's advisable to apply these four adjustments in the order described in this post. In particular, availability should always be applied prior to scaling and calibration, and scaling prior to calibration.

Testing Whether an Attribute Should be Numeric or Categorical in Conjoint Analysis

Most choice-based conjoint (CBC) studies in marketing specify a fixed number of levels for each attribute. For example, a study of the fast food market could test a variety of prices -- $10, $12, $15, $20, and $30 -- and estimate the utility (or appeal) of each price point. However, in economics it is more common to treat price as a numeric variable when estimating the choice model, assuming that price has a linear relationship with utility (i.e., a dollar increase in price leads to a constant decrease in utility). As discussed in Numeric versus Categorical Price Attributes in Conjoint Analysis, it can be useful to treat attributes like price as numeric, even if the experimental design tested a set number of price points.

This post describes how to test whether to treat price as a categorical or numeric variable.

The old-school approach

Before explaining how to determine whether an attribute is better addressed as being numeric or categorical, I am going to revisit the approach described in introductory textbooks. For reasons I will later discuss, this approach is not appropriate for conjoint analysis, but it is useful to understand the old-school approach in order to recognize when people apply it inappropriately.

For the vast majority of statistical models -- linear regression, logistic regression, multinomial logit, and so on -- a statistical test is used to assess whether an attribute (variable) should be treated as being numeric or categorical. The basic process is as follows:

  1. Compute the first model using the numeric variable.
  2. Compute the second model using the categorical variable.
  3. Use an F-test or likelihood ratio test to check whether the improved fit of the categorical model is any greater than would be expected from sampling error alone (a sketch of this comparison is shown below).
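
As a sketch, the old-school comparison might look like this in R, where model.numeric and model.categorical are hypothetical fitted model objects (for example, standard multinomial logit models) that support logLik():

```r
# Likelihood ratio test: is the categorical model's better fit more than chance?
ll.numeric <- logLik(model.numeric)          # hypothetical fitted models
ll.categorical <- logLik(model.categorical)
lr.statistic <- 2 * (as.numeric(ll.categorical) - as.numeric(ll.numeric))
extra.df <- attr(ll.categorical, "df") - attr(ll.numeric, "df")
pchisq(lr.statistic, df = extra.df, lower.tail = FALSE)   # small p-value favors the categorical treatment
```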

For a more graphical explanation, consider the chart below. The basic idea is to run a significance test to check whether the deviations from the straight line are likely just noise, or whether they reflect some key insight into consumer behavior. The old-school approaches begin with the assumption that treating the data as categorical ensures that the model will provide a better fit to the data. They then compute statistical significance by quantifying how much deviation from a straight line can be expected by chance alone.

If the deviation that is observed is less than some plausible level of random deviation, then we conclude that the categorical attribute is not required and we should treat the variable as being numeric (aka linear).

The problem with the old-school approach

The old-school approach is tried and true and works in many contexts. But it does not work for the modern choice models used to analyze conjoint experiments.

The reason for this is that with a modern choice model, it is not even guaranteed that the categorical variable will have a better fit to the data than the numeric attribute. In the example plotted above, the model with the linear (numeric) price attribute has a better fit to the data than the model with the categorical attribute (where fit is quantified as the log-likelihood).

If you have studied some statistics, you will probably be thinking, "How can that be? The categorical model is more flexible so it must fit the data better." However, that is not necessarily true in the case of modern choice modeling methods, such as hierarchical Bayes (HB). It is the case that the categorical variable is more flexible. However, the plot above only shows the average effect.

Modern choice models also compute estimates of variation between people, and the pattern of variation implied by a numeric attribute cannot be approximated by a categorical attribute. With a numeric attribute, a modern choice model assumes that the relationship between price and utility for each person is its own straight line, with people differing in regard to the slope of the lines. By contrast, when price is treated as being categorical, the conclusion will be that each person's line is not straight, even if the model estimates a perfectly straight line for the average effect.

The underlying mathematics (for the hardcore only)

The modern choice modeling methods, like HB, include a stage where draws are made from a multivariate normal distribution. Even if the vector of means of this distribution is descending linearly, there is no chance of a random draw from this distribution having values in a straight line. That can only occur when the variance of each variable in the distribution is 0, and that never occurs in real-world studies.

The solution

Fortunately, there is a straightforward way to perform a viable test:

  1. Randomly divide the sample into two sets of observations:
    • The first set which is used to estimate the models
    • The second set which is held back and predicted by the model. This second sample is variously known as a holdout or validation sample.
  2. Compute the log-likelihood of the two different models on the holdout sample. The model with the higher log-likelihood is the better model, all else being equal (a sketch of this comparison is shown below).
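
Outside of Displayr, the comparison amounts to something like the sketch below, where predicted.probability() is a hypothetical function that returns the predicted probability of each observed holdout choice under a fitted model:

```r
# Compare two models on a holdout sample via the holdout log-likelihood
holdout.log.likelihood <- function(model, holdout.data) {
    p <- predicted.probability(model, holdout.data)   # probability of each observed choice
    sum(log(p))
}
holdout.log.likelihood(model.numeric, holdout.data)
holdout.log.likelihood(model.categorical, holdout.data)  # higher (closer to 0) is better
```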

The solution with Displayr

If you are using Displayr, this is a straightforward procedure.

Displayr allows you to automatically hold back and predict the data from a subset of the choices, by clicking on a model and selecting Inputs > MODEL > Questions left out for cross-validation.

The log-likelihoods are shown in the footer of the models. The results for the Categorical model are shown immediately below. The log-likelihood in the holdout data is shown first, and the value for the model fitted to the data is shown in brackets. Below that are the corresponding results for the Numeric model.

In both cases, the log-likelihood is higher (closer to 0) for the model with the numeric attribute. The more important comparison is between the values not in parentheses (the holdout log-likelihoods), and there the difference is much larger. This further emphasizes the point that the numeric variable is better suited to this data set.


Understanding Logit Scaling

Example: choice-based conjoint analysis utilities

Consider the utilities plot below, which quantifies the appeal of different aspects of home delivery. If you hover your mouse over the plot you will see the utilities. For example, you can see that Mexican has a utility of 4.6 and Indian of 0. These values are logit scaled.


Converting logit-scaled utilities into probabilities

When things are on a logit scale, it has a couple of profound implications. The first is that we can compute probabilities of preference from differences in utility. For example, we can see from the utilities that this person seems to prefer Mexican food to Indian food (i.e., 4.6 > 0). The difference is 4.6 - 0 = 4.6 (we are starting with an easy example!), and this means that given a choice between Indian and Mexican food, we compute that there is a 99% chance they would prefer Mexican food. The conversion from a difference in logit-scaled utilities to a probability is a simple formula, which is easy to compute in either Excel or R. Note that there is a minus sign in front of the value of 4.6 that we are evaluating.
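
The formula is the inverse logit, p = 1 / (1 + e^(-x)), applied to the difference in utilities. For the example above:

```r
# Inverse logit: convert a difference in logit-scaled utilities to a probability
1 / (1 + exp(-4.6))   # 0.99
plogis(4.6)           # the same calculation using the built-in function
```

In Excel the equivalent formula is =1/(1+EXP(-4.6)).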

Comparing Mexican food to Italian, we can see that Italian food is preferred and the difference is 0.5. As the difference is smaller, the probability is closer to 50%. A logit of 0.5 translates to a 62% probability of preferring Italian food to Mexican food.

Summing logit-scaled utilities

Things get even cooler when we add together utilities and then compute differences. Mexican food at $10 has a utility of 4.6 + 3.3 = 7.9, whereas Italian food at $20 has a utility of 5.0 + 1.0 = 6.0. This tells us that this person prefers Mexican food when it is $10 cheaper. Further, as the difference is on a logit scale, we can convert the difference of 7.9 - 6.0 = 1.9 into a probability of 87%.
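
The same inverse logit calculation covers this example:

```r
mexican <- 4.6 + 3.3                  # Mexican at $10
italian <- 5.0 + 1.0                  # Italian at $20
1 / (1 + exp(-(mexican - italian)))   # 0.87: probability of preferring Mexican at $10
```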

Percentages versus probabilities

Now for some of the ugliness. So far I have described the data as being for a single person, and interpreted the logit scales as representing probabilities. In many situations, the underlying data represents multiple people (or whatever else is being studied). For example, in a model of customer churn, we would interpret the logit in terms of the percentage of people rather than the probability of a person. Why is this ugly? There are two special cases:

  • In many fields, our data may contain repeated measurements. For example, in a typical choice-based conjoint study we would have multiple measurements for multiple people, and this means that the logit is some kind of blend of differences between people and uncertainty about a person's preference. It is usually hard to know which, so common practice is to use whichever interpretation feels most appropriate.
  • More modern models, such as hierarchical Bayes, compute logit-scaled values for each person. This is good in that it means that we can interpret the scaling as representing probabilities about individual people. But, a practical problem is that the underlying mathematics means we cannot interpret averages of coefficients as being on a logit scale, and instead need to perform the relevant calculations for each person, and compute the averages of these.

 

Numeric Attributes in Choice-Based Conjoint Analysis in Displayr

Step 1: Set up and estimate the choice model treating all the variables as categorical

Start by setting up the choice model keeping all the attributes as being categorical (see here for more info).

Step 2: Duplicate the model

In the Pages tree, click on the page that contains the choice model that was estimated in Step 1, and press Home > Duplicate. You do not need to wait for the calculations of this duplicated model to complete before proceeding to step 3.

Step 3: Set the values of the numeric attribute(s)

Select EXPERIMENTAL DESIGN > Code some categorical attributes as numeric (see below). Enter the names of any attributes, followed by commas, and the values that you want to use when treating the variables as numeric. When entering the numeric values, take care to enter them in the same order as they appear in the output of the model estimated in Step 1. (If the model in Step 1 is still estimating, you can change MODEL > ITERATIONS to a small number (e.g., 6) so that it computes faster, but make sure you reset it back to 100 afterwards.)

Step 4: Remember when interpreting the data that the numeric variable will be on a different scale

The output below shows the estimated distribution of a numeric attribute relating to the price of home delivered food. It would be very easy to look at this and conclude that, relative to Cuisine, Price per person is not very important and that there is very little variation in its importance across the population. However, this is not the right way to read this output.

To illustrate this point, I've re-estimated this model, but rather than use prices of 10, 12, 15, 20, and 30, I've divided them by 10 and used prices of 1, 1.2, 1.5, 2, and 3. You can see this in the outputs below. Price appears to be much more important below, but in reality it is equally important in both models. Keep in mind that the values shown for Cuisine are utilities, whereas the distribution shown for Price per person is instead a coefficient.

For categorical data the ideas of a utility and a coefficient are interchangeable, but with numeric attributes they are not. In order to compute a utility, we need to multiply the coefficient of the numeric attribute by the attribute's values. The output above shows that the coefficient for Price per person is -0.2; this is rounded, and with an extra decimal the value is -0.17. The utility for $10 is then 10 * -0.17 = -1.7 and the utility for $30 is 30 * -0.17 = -5.1.
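
A one-line calculation makes the conversion from coefficient to utility explicit:

```r
price.coefficient <- -0.17
price.points <- c(10, 12, 15, 20, 30)
price.coefficient * price.points   # utilities: approximately -1.7 -2.0 -2.6 -3.4 -5.1
```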

The easiest way to avoid any possibility of misunderstanding is to use Insert > Conjoint/Choice Modeling > Utilities Plot, which automatically computes the utility for the highest and lowest prices in the study.


To find out more about Conjoint Analysis in Displayr, head here!

Numeric versus Categorical Price Attributes in Conjoint Analysis

The difference between a numeric and categorical price attribute

The chart below illustrates the implications of treating price as being categorical versus numeric. When price is treated as a numeric attribute (variable), the model assumes that there is a linear relationship between price and utility, as shown by the orange line. When price is modeled as being categorical, non-linear relationships can be found. In this example, which is from the home delivery market, the categorical price attribute leads to the conclusion that the drop in utility that goes with changing price from $15 to $20 is particularly large, being much larger than the drop from $10 to $15 and even from $20 to $30. By contrast, when a numeric price attribute is used, the model assumes that the effect is constant across all price points; each extra dollar of price leads to a constant drop in utility.

The benefit of treating price as being categorical

The key benefit of treating price as being categorical is that it is more consistent with what we know about consumer behavior. Study after study has found non-linear relationships between price and consumer preferences, and in marketing it is the norm to view these as indicating interesting "psychological" pricing effects. For example, in most markets there are believed to be price thresholds (e.g., keep a phone under $1,000), and various other interesting pricing points (e.g., prices ending in 99). Identifying such psychological price points and using them as a basis for setting price seems like a good strategy.

The benefits of treating price as numeric

Economic theory suggests that the relationship between price and utility should be linear. To use the jargon: the price coefficient is seen as being the marginal utility of income. Although it is routine for studies to find statistical evidence that the price response is nonlinear, it is possible that psychological price points are just research artifacts. Perhaps surveys show non-linearity, but in the real world, with real money, people behave more rationally.

When we treat price as being a numeric attribute, it allows us to use interpolation and extrapolation to make more precise conclusions about price. In the example above, where we have treated price as being categorical, we can only safely draw conclusions at the specific price points tested in the research. In this case, these are $10, $12, $15, $20, and $30 (i.e., the end-points and the places where the lines join). By contrast, with the numeric attribute we can make predictions at any price along the line and, if we are brave, extrapolate beyond it. Further, with the numeric attribute the effect of sampling error is smaller, so we can have even more confidence in our conclusions.

Another benefit of treating price as being numeric is that it allows you to use choice modeling with data that has too many different price points to make it practical to estimate a separate utility for each price point. Such data is widespread in economics and transportation research, where the prices shown to each respondent are customized to their specific circumstances (e.g., if an attribute is the price of traveling by car, each person's fuel costs and depreciation will be slightly different).

There is a further benefit of treating price as being numeric: it makes it a lot simpler to draw price-related conclusions from the research. The next section provides a bit more detail on how to interpret price effect estimates when price is treated as being numeric. Then, I discuss how this simplifies other calculations.

How to interpret numeric price coefficients

The output below shows the estimated distribution of a numeric attribute relating to the price of home delivered food. It would be very easy to look at this and conclude that, relative to Cuisine, Price per person is not very important and that there is very little variation in its importance across the population. However, this is not the right way to read this output.

To illustrate this point, the output below shows a model that I have re-estimated, but rather than using prices of 10, 12, 15, 20, and 30, I've divided them by 10 and used prices of 1, 1.2, 1.5, 2, and 3. Price appears to be much more important below, but in reality it is equally important in both models. The thing to keep in mind is that the values shown for Cuisine are utilities, whereas the distribution shown for Price per person is instead a coefficient. For categorical data the ideas of a utility and a coefficient are interchangeable, but with numeric attributes they are not. In order to compute a utility, we need to multiply the coefficient of the numeric attribute by the attribute's values. The output above shows that the coefficient for Price per person is -0.2; this is rounded, and with an extra decimal the value is -0.17. The utility for $10 is then 10 * -0.17 = -1.7 and the utility for $30 is 30 * -0.17 = -5.1.

Computing average willingness-to-pay

When we have a price coefficient, we can convert all the other utilities into dollar terms by dividing them by the magnitude of this coefficient. For example, where the coefficient for price is -0.17, using the data shown above, we end up computing the average utility for the different cuisines in dollars as follows:

  • Chicken: $0.00
  • Chinese: $1.18
  • Hamburgers: $0.59
  • Indian: -$20.00
  • Italian: $1.18
  • Mexican: -$1.18
  • Pizza: $7.06
  • Thai: -$13.53

These are variously known as dollar-metric utilities and as willingness-to-pay (WTP). Comparing, for example, Chinese with Hamburgers, Chinese has a $0.59 higher WTP ($1.18 - $0.59), and an interpretation of this is that, on average, people are prepared to pay $0.59 more for a Chinese meal than a hamburger meal.
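
The calculation behind these numbers is a simple division; the sketch below uses mean cuisine utilities back-calculated (approximately) from the dollar figures above, so treat the inputs as illustrative:

```r
# Willingness-to-pay: divide utilities by the magnitude of the price coefficient
price.coefficient <- -0.17
cuisine.utilities <- c(Chicken = 0, Chinese = 0.2, Hamburgers = 0.1, Indian = -3.4,
                       Italian = 0.2, Mexican = -0.2, Pizza = 1.2, Thai = -2.3)
round(cuisine.utilities / abs(price.coefficient), 2)
```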

There are a series of posts that provide more details about different aspects of numeric attributes in choice models:

  • Testing Whether an Attribute Should be Numeric or Categorical in Conjoint Analysis
  • Numeric Attributes in Conjoint Analysis in Displayr
  • Using Utilities Plots to Facilitate Comparison of Numeric and Categorical Attributes in Conjoint Analysis.
  • Using Conjoint Analysis to Set Prices

You can view these calculations, hooked up to the raw data and models in Displayr, by clicking here.

The easiest way to avoid any possibility of misunderstanding is to use Insert > Conjoint/Choice Modeling > Utilities Plot, which automatically computes the utility for the highest and lowest prices in the study.

Checking Convergence When Using Hierarchical Bayes for Conjoint Analysis

Please read How to Use Hierarchical Bayes for Choice Modeling in Displayr prior to reading this post.

There are a number of diagnostic tools that you can use to check the convergence of the model. These include looking at the statistics of the parameter estimates themselves, checking trace plots, which describe the evolution of the estimates throughout the iterations of the algorithm, and examining posterior intervals, which show the range of the sampled parameters.

Technical overview

Hierarchical Bayes Choice Models represent individual respondent utilities as parameters (usually denoted beta) with a multivariate normal (prior) distribution. The mean and covariance matrix of this distribution are themselves parameters to be estimated (this is the source of the term hierarchical in the name). Hierarchical Bayes uses a technique called Markov Chain Monte Carlo (MCMC) to estimate the parameters, which involves running a number of iterations where estimates of the parameters are generated at each iteration. This iterative sampling of parameters forms what is known as a Markov Chain. In this post I shall use the term sample to refer to a set of estimates of the parameters from a single iteration.

Different software packages have different approaches. The R package Stan (used by Q and Displayr) uses a modern form of MCMC called Hamiltonian Monte Carlo. Sawtooth, on the other hand, uses the more old-school Gibbs sampling. My experience is that both approaches get the same answer, but the newer Hamiltonian Monte Carlo is faster for really big problems with lots of parameters. If you wish to use Gibbs sampling in R, you can do so using the bayesm package. However, in my experience, the resulting models have a worse fit than those from Stan and from Sawtooth.

The samples are generated from those in the previous iteration using a set of rules. The rules are such that once sufficient iterations have been run and the initial samples are discarded, the distribution of the samples matches the posterior distribution of the parameters given prior distributions and observations. In the case of choice-based conjoint, the observations are the respondent's choices to the alternatives presented to them. By default, Stan discards the first half of the samples. This is known as the warm-up. The latter half of the samples are used to estimate the means and standard errors of the parameters.

The difficult part is knowing how many iterations is sufficient to reach convergence, which I discuss in the next section.

Multiple chains

To take advantage of multi-core processors, multiple Markov chains are run on separate cores in parallel. This has the effect of multiplying the sample size, in return for a slightly longer computation time compared to running a single chain. The more samples, the less sampling error in the results. Here, I am referring to sampling error that results from the algorithm, which is in addition to sampling error from the selection of the respondents. In addition, multiple chains result in samples that are less auto-correlated and less likely to be concentrated around a local optimum. Having said that, hierarchical Bayes is, in general, less susceptible to local optima than traditional optimization methods due to the use of Monte Carlo methods.

To make full use of computational resources, I recommend that the number of chains is chosen to be equal to or a multiple of the number of logical cores of the machine on which the analysis is run. In Q and Displayr the default of 8 chains is ideal. If you are running your model by R code, you can use detectCores() from the parallel R package. It is important to run multiple chains for the diagnostics discussed in the next section.
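
If you are fitting the model yourself in R via Stan, the equivalent settings look something like the sketch below; choice.model and choice.data are hypothetical objects (a compiled Stan model and its data list), and the point is simply how chains and iterations are specified:

```r
library(rstan)
library(parallel)

n.chains <- detectCores()            # one chain per logical core
fit <- sampling(choice.model,        # a hypothetical compiled Stan model
                data = choice.data,  # a hypothetical list of data inputs
                chains = n.chains,
                cores = n.chains,    # run the chains in parallel
                iter = 100)          # starting point; double this if convergence checks fail
```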

Achieving convergence

Formally, a chain is considered to have converged when the sampler reaches stationarity, which is when all samples (excluding the initial samples) have the same distribution. In practice, heuristics and diagnostic plots are used to ascertain the number of iterations required for convergence. The heuristics are based upon two statistics, n_eff and Rhat, which are shown in the parameter statistics output below:

This table includes statistics for the class sizes, estimated means, and standard deviations. The off-diagonal covariance parameters are not shown due to lack of space, and because they are not as important. n_eff is an estimate of the effective sample size (samples of the parameters, not cases). The smaller the n_eff, the greater the uncertainty associated with the corresponding parameter. Thus, in the table above, we can see that all the sigma parameters (the standard deviations) tend to have more uncertainty associated with them than the means (this is typical).

The column se_mean shows the standard error of the parameter means, which is computed as sd/sqrt(n_eff), where sd is the standard deviation of the parameter.

Rhat refers to the potential scale reduction statistic, also known as the Gelman-Rubin statistic. This statistic is (roughly) the ratio of the variance of a parameter when the data is pooled across all of the chains to the within-chain variance. Thus, it measures the extent to which the chains are reaching different conclusions. The further the value of the statistic is from 1, the worse the convergence.

As a strategy to achieve convergence, I suggest starting with 100 iterations and setting the number of chains equal to the number of logical cores available. The four conditions to check for convergence are listed below (a programmatic check of conditions 2 and 3 is sketched after the list):

  1. No warning messages should appear. Most warning messages are due to insufficient iterations. However if a warning appears indicating that the maximum tree depth has been exceeded, you should increase the maximum tree depth setting from the default of 10 until the warning goes away.
  2. The estimate of the effective sample size, n_eff, is at least 50 for all values and ideally 100 or more for parameters of interest. A value of 100 is equivalent to specifying that the standard error of the mean is at least an order of magnitude (10 times) less than the standard deviation.
  3. The potential scale reduction statistic, Rhat, should be less than 1.05 and greater than 0.9 for the parameters of interest.
  4. Diagnostics plots should not have any unusual features (discussed below).
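
If you are working with a Stan fit directly in R, conditions 2 and 3 can be checked programmatically along the following lines, where fit is the hypothetical object from the earlier sketch:

```r
# Check effective sample sizes and potential scale reduction statistics
parameter.statistics <- summary(fit)$summary     # matrix with n_eff and Rhat columns
all(parameter.statistics[, "n_eff"] >= 50)       # condition 2 (ideally 100+ for key parameters)
all(parameter.statistics[, "Rhat"] < 1.05 & parameter.statistics[, "Rhat"] > 0.9)  # condition 3
```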

If any of these conditions are not met, you should re-run the analysis with double the number of iterations, until all conditions have been met. Increasing the iterations beyond this point will increase the precision of the estimates but not drastically change the results. I find that the standard deviation parameters take more iterations to reach convergence than mean parameters. Also, the effective sample size condition tends to be a stricter condition than the one on the potential scale reduction statistic Rhat, so it will be the last to be satisfied.

Refer to the Stan website, and in particular the Stan Modeling Language: User's Guide and Reference Manual for more information.

Trace plots

Trace plots show the parameter samples for each chain. The plots below are for means and standard deviations from an example with 500 iterations and 2 chains. The grey halves indicate the warm-up iterations, whereas the second halves of each plot contain the samples that are used to compute the final result. The following are features to look out for that would indicate an issue with the sampling:

  • A high degree of correlation between parameters. This would manifest as two or more traces moving in sync.
  • A parameter has not stabilized by the end of the warm-up iterations.
  • For a particular parameter, there is a chain which is consistently higher or lower than the others.

A practical challenge with these plots on modern machines with more than 2 chains is that often it can be hard to see patterns because all the lines overlap so much.

The above example shows traces which look to converge fairly well. Consider the traces which come from reducing the number of iterations drastically to 50:

Here we see that 50 iterations is not sufficient. Many of the lines have not stabilized after the warm-up stage.

Posterior Intervals

Posterior interval plots show the range of the sampled parameters. The black dot corresponds to the median, while the red line represents the 80% interval, and the thin black line is the 95% interval. There is nothing out of the ordinary in this plot, but I would be concerned if intervals were wide or if the medians were consistently off-center.

Summary

I have provided a brief summary of how hierarchical Bayes for choice modeling works, explained some key diagnostic outputs, and outlined a strategy to ensure convergence. The challenge with hierarchical Bayes is that, unlike methods such as latent class analysis, the user needs to verify that the number of iterations is sufficient before proceeding to make any inferences from the results. The convergence criteria listed in this blog post provide a way to ensure the iterations are sufficient, and give us confidence in the answers we obtain. To see the R code used to run hierarchical Bayes and generate the outputs in this blog post, see this Displayr document.

 
