R How To... - Displayr

Using R in Displayr Video Series

Liz Kucko — Mon, 08 Jun 2020 22:10:20 +0000

R is one of the most powerful coding languages for analyzing data. It's used by millions of people across the globe, and is free to boot. Here at Displayr, we've integrated R with our software so that anyone with custom requirements or analysis needs can implement them alongside our standard features. What you now have is a one-stop shop for point-and-click features as well as more advanced custom coding. For those who have never done coding before, or may not be familiar with R coding, getting up to speed may feel like a daunting task. For this reason, we've created a series of videos to introduce you to coding in R and walk through practical examples of how to use R to further customize your reporting and dashboards.

Links to the videos and the documents they review are below. If you're using our sister software Q, you can download the QPack version to follow along. The videos generally start with the basics and move on to more advanced topics.

Name	Content	Link to Displayr Document	Link to Video
Overview	How does R work with Displayr?; How do I get help with R?; Other tips	Displayr doc QPack
Primer	Referencing Data; Data Types; Data Structures; Functions	Same document as Overview above
Simple Tables	Table subsetting/indexing; Combining tables; Table calculations; Sorting/ordering; Renaming rows/columns; Blanking cells with small values; Removing rows/cols with small samples; Renaming things and formatting; Building a brand funnel	Displayr doc QPack
R Variables	Creating a combo box filter simple & advanced; Filtering and deleting observations; Banding and re-categorizing variables; Checking if "any of" some variables have a particular value; Splitting and combining text strings; Using apply() to apply an action to each row or column	Displayr doc QPack
Custom R Outputs	Exploring outputs; Error handling; Updating/customizing text; Logos and links	Displayr doc QPack
Advanced Tables	Working with nested banners; Merging tables that don't match; Customizing cell formatting; Adding spans; Adding statistical test results	Displayr doc QPack
Troubleshooting	Tips; Useful functions; Common errors/examples	Displayr doc QPack

How to use the Displayr Cloud Drive

Oliver Harrison — Fri, 08 May 2020 05:36:28 +0000

What can be saved to the Displayr Cloud Drive?

Displayr's Cloud Drive can be used for saving a variety of files. These files include images, company logos and client data sets. You can even share raw data files or R tables and charts between your company documents.

Once the Cloud Drive has been enabled on your account (contact support@displayr.com to enquire), you can access it from your document via your Profile icon > Displayr cloud drive.

You will be presented with a table that lists all the files that are stored within your company's cloud drive. This includes auditable information such as when it was last modified and what company document last updated or called it.

Saving to the Cloud Drive

To upload files to the Cloud Drive, simply click the Upload button and select the files you wish to upload. In this example, we have uploaded a logo and a data set that we want to use in our document.

Loading from the Cloud Drive

Open a document, then load the data by clicking New Data Set > Displayr Cloud Drive and choosing the data file we uploaded.

At the top of the Select a data file screen, there is a setting for automatically refreshing the data set. This will allow you to update the data file and, in turn, the document's data will update automatically. In this example, the automatic refresh interval has been set to 12 hours.

You will now be able to see the data file variables under Data Sets in the bottom left corner.

If you wish to manually update the data ahead of the automatic refresh, simply click the data set folder and then click Update > Displayr Cloud Drive > OK.

Next, we can add the saved logo to the first page via Insert > Image > Displayr Cloud Drive, choosing the image file we uploaded earlier.

Sharing R outputs between documents

If you have any tables or visualizations (created via Insert > Visualization) which you want to share with other documents, you can do this easily by selecting the output, clicking Export > Displayr Cloud Drive, naming your file and then pressing Export.

Tables are saved as R files (*.rds) and visualizations are saved as R-rendered HTML widgets without the underlying data included.

Here, we have saved the income table as an R output called test.

Connecting to the Cloud Drive using R code

An alternative method of exporting to the Cloud Drive is to use R code directly in an R output (via Insert > R Output). Here, we will use the QSaveData function from the flipAPI package:

library(flipAPI)
QSaveData(table.Income, "test.rds")

If you wish to then import this, or any other Cloud Drive file, into any document in your account, you can use the corresponding QLoadData function:

QLoadData("test.rds")

This process can also work with .csv files. You just need to specify the correct extension in the function parameter. For further information please see the Displayr Cloud Drive R API documentation.
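
If you want to see what a save-then-load round trip does to an object before exporting it, you can try the idea outside Displayr with base R. The sketch below is only a local analogue: it uses base R's saveRDS, readRDS, write.csv and read.csv with temporary files rather than the Cloud Drive functions, and table.income is made-up data standing in for a Displayr table.

```r
# Local analogue of a Cloud Drive round trip, using base R only.
# table.income is hypothetical data standing in for a Displayr table.
table.income <- c("Under $30,000" = 25, "$30,000 to $60,000" = 45, "Over $60,000" = 30)

# .rds keeps the R object exactly as it was, including its names
rds.path <- tempfile(fileext = ".rds")
saveRDS(table.income, rds.path)
restored <- readRDS(rds.path)

# .csv stores plain text, so the vector is written out as a data frame
csv.path <- tempfile(fileext = ".csv")
write.csv(data.frame(income = names(table.income), pct = unname(table.income)),
          csv.path, row.names = FALSE)
from.csv <- read.csv(csv.path, stringsAsFactors = FALSE)
```

The .rds copy comes back identical to the original, while the .csv copy comes back as a data frame of text and numbers, which is why the file extension matters.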

In order to create a workflow that automatically imports and exports updated files, you can additionally add a flipTime function such as UpdateEvery or UpdateAt to set a timer. Below we have set it to run every 3 hours:

library(flipTime)
UpdateEvery(3, "hours", options = "wakeup")

You can find further information on automatic updating here.

How to Customize the Sample Size Description Widget

Matt Munley — Tue, 18 Feb 2020 20:57:22 +0000

Displayr has a built-in Sample Size Description widget (under Insert > More > Data > Sample Size Description) that you can use to describe the data being displayed, as outlined in this post. But what if the default text isn't quite what you want? This post explains how to easily customize the text to your liking. You will see how to change, reorder, and remove elements of the description, as well as modify it to reference filters selected in a combo box.

Breaking down the fields

The text in the sample size description output has 5 parts: Initial text, Sample description, Sample size description, Sample size, and Final text.

You can use the Object Inspector (as seen below) to customize some of the aspects of the Sample Size Description output: Initial text, Sample size description, and Final text. If your data is not filtered, the text in the Total sample description field will be shown for Sample description.

Other bits of the output, Sample description and Sample size, are determined in the underlying R code of the output. The Sample size field displays the number of cases specified by the Complete data variable, including any filters applied. The Sample description field displays the name or names of any filters applied or the text in the Total sample description field.

Changing the basic fields in the Object Inspector allows for some customization. For deeper customization, you need to edit the R code.

Deeper Customization with R

Basic edits to the sample size description text output using R aren't as difficult as they sound. Changes such as reordering or removing the fields involve editing a single line of code.

To view the R code behind the sample size description widget, go to Properties > R CODE in the Object Inspector. The important line of code that controls the output text is the last line: paste0(formInitial, base, formN, n, formFinal). Each of the fields inside the parentheses in the code corresponds to one of the text fields of the sample size description.

Reordering the text: To swap the order of the fields, change the order of the text inside the paste0() function, such as: paste0(formN, n, formInitial, base, formFinal)
Removing text: To remove a field, delete it from inside the parentheses. The following example removes "Base: total sample;" from the output: paste0(formN, n, formFinal)
Adding custom text: To add custom text to the description, add it to the function in quotation marks, like so: paste0(formInitial, n, " respondents ", base).
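
To see what each of these edits produces, you can run the paste0() variants on their own. This is a minimal sketch: the values assigned to formInitial, base, formN, n and formFinal are hypothetical stand-ins for what the widget's code defines, chosen to resemble the default text.

```r
# Hypothetical stand-ins for the values the widget's own code defines
formInitial <- "Base: "
base <- "total sample"
formN <- "; n = "
n <- 800
formFinal <- "."

default.text <- paste0(formInitial, base, formN, n, formFinal)  # original order
reordered    <- paste0(formN, n, formInitial, base, formFinal)  # fields swapped
removed      <- paste0(formN, n, formFinal)                     # base text dropped
custom       <- paste0(formInitial, n, " respondents ", base)   # custom text added
```

Because paste0() joins its arguments with no separator, any spaces or punctuation you want between fields need to be part of the text itself.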

Advanced Customization: Dynamic updating with Combo or List Boxes

If I have an R variable filter that is connected to a combo box or list box, the Sample Size Description output updates as the filters change, but the underlying R code needs further editing in order to show the actual selections in the control. To learn how to connect a filter to a combo or list box, please see this blog post. Like other charts or visualizations, the Sample Size Description must first be connected to the filter used in the combo box. In the Object Inspector of the Sample Size Description, select the same filter variable used in the Combo Box in Inputs > FILTERS & WEIGHT > Filter(s).

After selecting the filter variable, the text updates to reflect the new Sample size. But, it only displays the name of the combo box filter - Gender - and not if Male or Female is selected. To enable that, we need to edit the R code.

To make the sample description field react to the combo box selection, change the first line of code from base <- attr(QFilter, "label") to base <- toString(Combo.box) where "Combo.box" is the name of your combo box or control used with your filter variable. That small change means the sample description field will update as the selections in the combo box change. The text will update to include all selections in the combo box, so if the combo box allows multiple selections, the text may become rather large.

If all the filters are selected, the output will list every included category rather than the text in the Total sample description field. To show the Total sample description text in that case, we need to make a few more edits to the R code. Replace base <- attr(QFilter, "label") with the code below, changing Combo.box to the name of your combo box control and d3 to the name of the variable set the combo box is based on.

available.items <- nlevels(d3)
selected.items <- length(Combo.box)
all.selected <- ifelse(selected.items == available.items, TRUE, FALSE)
base <- toString(Combo.box)

If you are using multiple response data in your filter, use the R code below. It counts the possible selections in the question and excludes the NET.

available.items <- ncol(subset(d3, select=-c(NET)))
selected.items <- length(Combo.box)
all.selected <- ifelse(selected.items == available.items, TRUE, FALSE)
base <- toString(Combo.box)

This code compares the number of possible selections from the underlying question to the number of selections in the combo box. If they're the same, it stores TRUE in all.selected. Then, replace base in the final paste0() line with ifelse(all.selected, formTotalSample, base).

Rather than showing just the selections in the combo box, if the number of selections matches the number of variables in the underlying question, the widget will display the text in the Total sample description field just as it does when not using a filter connected to a combo box.
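
The logic can be tried outside the widget with made-up inputs. In this sketch, d3, formTotalSample and the selections passed in as Combo.box are all hypothetical, and wrapping the lines in a function called describe.base is just a convenience for testing two different selections.

```r
# Hypothetical stand-ins: d3 is the filter's categorical variable and
# formTotalSample is the text from the Total sample description field
d3 <- factor(c("Male", "Female", "Female", "Male"))
formTotalSample <- "total sample"

# Combo.box holds the labels currently selected in the control
describe.base <- function(Combo.box) {
    all.selected <- length(Combo.box) == nlevels(d3)
    base <- toString(Combo.box)
    ifelse(all.selected, formTotalSample, base)
}

one.selected  <- describe.base("Female")             # a single selection
both.selected <- describe.base(c("Male", "Female"))  # every category selected
```

With one category selected the description is that category's label; with every category selected it falls back to the total sample text.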

Creating R Variables from Multiple Input Variables Using Code

Tim Bock — Tue, 30 Jul 2019 04:33:00 +0000

Numeric variables

All the traditional mathematical operators (i.e., +, -, /, (, ), and *) work in R in the way that you would expect when performing math on variables.

For example, to add two numeric variables called q2a_1 and q2b_1, select Insert > New R > Numeric Variable (top of the screen), paste in the code q2a_1 + q2b_1, and click CALCULATE. That will create a numeric variable that, for each observation, contains the sum of the values of the two variables. Similarly, the following code computes a proportion for each observation: q2a_1 / (q2a_1 + q2b_1).

To see the name of a variable, hover over it in the Variable Sets tree. Or, drag the variable into the R CODE box.

Vector arithmetic

One of the great strengths of using R is that you can use vector arithmetic. Consider the expression q2a_1 / sum(q2a_1). This tells R to divide the value of q2a_1 by the sum of all the values that all observations take for this variable. That is, when computing the denominator, R sums the values of every observation in the data set. Other programs, such as SPSS, would instead treat this expression as meaning to divide q2a_1 by itself.

Similarly, if we wished to standardize q2a_1 to have a mean of 0 and a standard deviation of 1, we can use (q2a_1 - mean(q2a_1)) / sd(q2a_1).

In these two examples, there are also specialist functions we can use: q2a_1 / sum(q2a_1) is equivalent to writing prop.table(q2a_1), and (q2a_1 - mean(q2a_1)) / sd(q2a_1) is equivalent to scale(q2a_1).
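
You can check both equivalences on a small vector. In this sketch, x is a made-up stand-in for q2a_1. Note that scale() returns a one-column matrix, so it is converted back to a plain vector before comparing.

```r
x <- c(2, 4, 6, 8)   # hypothetical values of q2a_1

share <- x / sum(x)                    # each value over the total of all values
standardized <- (x - mean(x)) / sd(x)  # mean 0, standard deviation 1

# Compare against the specialist functions
same.share <- isTRUE(all.equal(share, prop.table(x)))
same.z <- isTRUE(all.equal(standardized, as.numeric(scale(x))))
```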

`rowSums` and `rowMeans`

As shown in the previous section, sum will add up all the observations in a variable. If we want to calculate the average of a set of variables, resulting in a new variable, we do so as follows:

rowMeans(cbind(q2a, q2b, q2c, q2d, q2e, q2f))

Where:

cbind groups the variables together in a table with one row for each observation and one column for each variable
rowMeans computes the mean of each row in the table.
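
A small sketch with made-up data shows the shape of what cbind builds and what rowMeans and rowSums return. Only three variables and three respondents are used here to keep it short.

```r
# Hypothetical consumption counts for three respondents
q2a <- c(1, 0, 4)
q2b <- c(3, 2, 4)
q2c <- c(2, 1, 4)

consumption <- cbind(q2a, q2b, q2c)  # 3 rows (respondents) x 3 columns (variables)
average <- rowMeans(consumption)     # one mean per respondent
total <- rowSums(consumption)        # one sum per respondent
```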

Missing values in vector arithmetic

Most in-built R functions, such as sd, mean, sum, rowMeans, and rowSums, will return a missing value if any of the values passed to them (the variable, in this case) is missing. In most cases, the trick is to use na.rm = TRUE. For example:

(q2a_1 - mean(q2a_1, na.rm = TRUE)) / sd(q2a_1, na.rm = TRUE)

Sadly, there is no shortage of exotic exceptions to this rule. For example, prop.table cannot deal with missing values, and scale automatically removes them.
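
The following sketch, using a made-up vector with one missing value, shows the rule and both exceptions side by side.

```r
x <- c(1, NA, 3)  # hypothetical variable with one missing value

mean(x)                 # NA
mean(x, na.rm = TRUE)   # 2

# Standardizing by hand needs na.rm = TRUE in both places
z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# prop.table has no na.rm argument, so one NA makes every result NA
all.missing <- all(is.na(prop.table(x)))

# scale ignores the NA when computing the mean and standard deviation
same.as.scale <- isTRUE(all.equal(z, as.numeric(scale(x))))
```

The missing observation itself stays missing in z and in the output of scale(); only the mean and standard deviation are computed without it.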

Variable sets

The data file used in this post contains 12 variables showing the frequency of consumption for six different colas on two usage occasions. When Displayr imports this data, it automatically works out that these variables belong together (based on their having consistent metadata). The variables are then automatically grouped together as a variable set, which is represented in the Data Sets tree, as shown below.

When your mouse pointer is positioned over the variable set, it shows the raw data for the variables. In addition to showing the 12 variables, you can also see nine automatically constructed additional variables:

One variable which shows the sum of the variables, called SUM, SUM. This is the right-most of the variables.
Six showing the sum of each of the cola brands: Coca-Cola, SUM; Diet Coke, SUM; etc.
Two showing the sum of the variables pertaining to each occasion: Sum, 'out and about' and Sum, 'at home'.

These automatically constructed variables can considerably reduce the amount of code required to perform calculations. For example, to compute Coca-Cola's share of category requirements, we can use the expression:

(q2a_1 + q2a_2) / `Q2 - No. of colas consumed`[,"SUM, SUM"]

Note that the denominator has two aspects:

The Label of the variable set, which is surrounded by backticks (the key that looks a bit like an apostrophe but isn't; on my keyboard it's above the Tab key, but this can vary depending on your keyboard's region).
[,"SUM, SUM"] which means to take the column SUM, SUM.
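
To make the two parts concrete, here is a sketch that mimics a variable set with an ordinary R matrix. The data and the other.colas column are made up, and a real variable set would have a column for each of the variables and automatic totals described above.

```r
# Hypothetical stand-in for a variable set: a matrix with one named column per variable
q2a_1 <- c(2, 0, 1)        # Coca-Cola, 'out and about'
q2a_2 <- c(3, 1, 1)        # Coca-Cola, 'at home'
other.colas <- c(5, 3, 0)  # everything else, lumped together for brevity
`Q2 - No. of colas consumed` <- cbind(q2a_1, q2a_2, other.colas,
                                      "SUM, SUM" = q2a_1 + q2a_2 + other.colas)

# Backticks allow a name containing spaces; [, "SUM, SUM"] picks out that column
share <- (q2a_1 + q2a_2) / `Q2 - No. of colas consumed`[, "SUM, SUM"]
```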

At first glance, this may seem somewhat strange and unguessable. However, if you create a table with the variable set, you can get a better understanding of what is happening and why. The table below shows the variable set, and you can see that the SUM variables correspond to the totals. With categorical variable sets, NET appears instead of SUM. And, if you delete these categories from the table, it will also delete them from the data set itself.

The `apply` function

R has a super-cool function called apply. It is a little tricky to get your head around it if you're new to writing R code, so if your head is already swimming, skip this section!

Earlier we looked at rowMeans(cbind(q2a, q2b, q2c, q2d, q2e, q2f)). We can rewrite this as apply(cbind(q2a, q2b, q2c, q2d, q2e, q2f), 1, mean). This is doing exactly the same thing, except that:

We are telling R to compute the average with the mean argument
The 1 tells R to perform the calculation by rows. If we instead had a 2, we would instead compute the mean of the columns.

The useful thing about apply is that we can add in any function we want. For example, to compute the minimum, we replace mean with min:

apply(cbind(q2a, q2b, q2c, q2d, q2e, q2f), 1, min)

And, we can even write custom functions to apply for each row. The example below identifies flatliners (also known as straightliners), who are people with the same answer to each of a set of variables:

apply(cbind(q2a, q2b, q2c, q2d, q2e, q2f), 1, function(x) length(unique(x)) == 1)

The way it works is that:

The function(x) part is boilerplate, telling R that you are going to be creating a custom function, and to represent each row as x
unique identifies all the unique values in x (i.e., each row)
length(unique(x)) counts the number of unique values for each row
length(unique(x)) == 1 returns a TRUE for each row that contains only one unique value (i.e., flatlining) and a FALSE otherwise
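
Here is the same idea on a tiny made-up data set, so you can see what each call returns. Three variables stand in for the six used above.

```r
# Hypothetical answers from three respondents
q2a <- c(3, 1, 0)
q2b <- c(3, 2, 0)
q2c <- c(3, 5, 0)
answers <- cbind(q2a, q2b, q2c)

row.min <- apply(answers, 1, min)    # 1 = by row: one value per respondent
col.mean <- apply(answers, 2, mean)  # 2 = by column: one value per variable

# TRUE where a respondent gave the same answer to every variable
flatliner <- apply(answers, 1, function(x) length(unique(x)) == 1)
```

The first and third respondents are flagged as flatliners because each of their rows contains a single unique value.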

We can make the code simpler by referring to variable set labels rather than variable names, as done below. But, when doing this, keep in mind that any automatically constructed SUM or NET variables will be in the calculation. This is fine for working out flatlining (as in this example), but will lead to double-counting in other situations (e.g., if computing a sum or average).

apply(`Q2 - No. of colas consumed`, 1, function(x) length(unique(x)) == 1)

Categorical variables

This section returns to basics and looks at all the steps that go into recoding a numeric variable into a categorical variable. In this example, we will illustrate various aspects of how the program works by recoding age into a new variable with four categories. If all you are really wanting to do is recode, there is a much better way: see How to Recode into Existing or New Variables.

Create a table by dragging the variable onto the page. This shows us the labels that we need to reference in our code.
Insert > New R > Numeric Variable, which will cause a new variable to appear in the Data Sets tree on the left side of the screen.
Type or copy and paste the code shown below into INPUTS > R CODE (on the right of the screen) and click CALCULATE (at the top-right of the screen).
Check the new variable by cross-tabbing it with the original variable. That is, drag the new variable (probably called newvariable) over the original table, releasing it in the Columns slot. You will see the values that have been recoded to each of the categories, showing as averages.
Click back on the new variable in the Data Sets tree, and give it an appropriate Label and Name (top-right of the screen; e.g., Age groupings, and age, respectively).
Optional: change the structure of the data so that it is categorical, by setting INPUTS > Structure to Nominal: Mutually exclusive categories (at the bottom) and set the labels by clicking DATA VALUES > Labels.

Looking at the code above, note that:

For a single category, we use the == operator.
For multiple categories, we list them surrounded by c() and use the %in% operator.
The values are assigned at the end of the line, after a ~.
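
The three points can be seen in a minimal sketch of this pattern. It assumes the case_when function from the dplyr package, and the age labels and the four groupings are hypothetical; yours will depend on the labels shown in your table.

```r
library(dplyr)

# Hypothetical age labels; in Displayr these would match the labels in the table
age <- c("18 to 24", "25 to 29", "30 to 34", "40 to 44", "65 or more")

age.group <- case_when(
    age == "18 to 24"                  ~ 1,  # single category: ==
    age %in% c("25 to 29", "30 to 34") ~ 2,  # several categories: %in% with c()
    age == "40 to 44"                  ~ 3,
    age == "65 or more"                ~ 4)  # value assigned after the ~
```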

Automatic updating: benefits and gotchas

When your original data updates, the code is automatically re-run. This is mainly a good thing. However, if you merge the categories of the input age variable, it will cause problems to the variable. Here are two ways to avoid this:

Duplicate the original variable (Home > Duplicate) and merge its categories.
Modify the code to use the label of the merged categories.

Not (`!`)

In R, the way you write "not" (as in, "not under 40") is to use an exclamation mark (!). So, we can write:

Variable labels containing punctuation

Rather than typing variable labels, we can drag them from the data set into the R code. Where the variable label contains punctuation, it will be surrounded by backticks, which look a bit like an apostrophe. On my keyboard, the backtick key is above the Tab key.

Using variable names

When you hover over a variable in the Data Sets tree, you will see a preview which includes its name. In my data set, "living arrangement" has a variable name of d4, and we can refer to that in the code as well in place of the label.

Or (|)

You can also use the or operator, which is a pipe (i.e., a single vertical line). On my keyboard, I hold down the shift key and click the button above Enter to get the pipe.

In this example, note that I've used parentheses around the expression that is preceded by the not operator (!), as otherwise it would be read as "not living with partner and children or living with children only", rather than "not(living with partner and children or living with children only)."
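
The effect of the parentheses is easy to check on made-up data. In this sketch, d4 is a hypothetical character vector with one respondent in each living arrangement.

```r
# Hypothetical living-arrangement labels, one respondent per category
d4 <- c("Living with partner and children", "Living with children only", "Living alone")

# not(A or B): TRUE only for people in neither category
with.parens <- !(d4 == "Living with partner and children" | d4 == "Living with children only")

# (not A) or B: the ! applies only to the first comparison
without.parens <- !d4 == "Living with partner and children" | d4 == "Living with children only"
```

Without the parentheses, the respondent living with children only is wrongly counted as TRUE.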

Other (`TRUE`)

In the example above, line 3 is a very verbose way of writing "everybody else". We can instead use the code snippet below. The case_when function evaluates each expression in turn, so when it gets to line 3, R reads this as "everybody else" or "other".

Missing values (`NA`)

If our categories are not exhaustive, we will end up with missing values. For example, this code creates a variable with a 1 for people with children and missing values for others.

Recoding after creating the R variable

It might look like the missing values caused by the example above are a mistake. But it can be an efficient way to work because you can later recode the variable using Displayr's GUI. Simply click DATA VALUES > Values, change the Missing data in the Missing Values setting to Include in analyses, and set your desired value in the Value field.

And (&)

The example below uses the and operator, &, to compute a respondent's family life stage. The green bits, preceded by a #, are optional comments which help make the code easier to understand.

Temporary variables within the code used to create a variable

A much nicer way of computing a household structure variable is shown in the code below. This approach initially creates four variables as inputs to the main variable of interest, and these variables are not accessible anywhere else in Displayr. They exist for the sole purpose of computing household structure.

Line 1 computes a variable that contains TRUE and FALSE values for each row of data, as do lines 2 through 4. Then, case_when evaluates these using standard boolean logic for each row of data.

What makes this better code? It improves on the earlier example because:

Calculations are performed once. In the earlier example, the definition of younger appeared six times, but in this example, it only appears once.
It is simpler to read

`ifelse`

Earlier we looked at this example:

A much shorter way of writing it is to use ifelse:

You can nest these if you wish, as shown below. The use of two lines and the spacing is a matter of personal preference; they are not required.
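
As a minimal sketch of both forms, here is ifelse used once and then nested. The ages and the cut-offs of 40 and 65 are hypothetical and are not taken from the original example.

```r
age <- c(21, 35, 47, 68)  # hypothetical ages

# ifelse(test, value if TRUE, value if FALSE)
two.groups <- ifelse(age < 40, 1, 2)

# Nested: the "FALSE" value is itself another ifelse
three.groups <- ifelse(age < 40, 1,
                       ifelse(age < 65, 2, 3))
```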

Using the numeric values of variables in computations

It can be more convenient to refer to values rather than labels when doing computations. But there's a good way and a bad way to do this. I'm going to start with the bad way because it is an obvious (but not the smartest) approach for many people new to writing code using R (particularly those used to SPSS).

Bad approach

The example below uses as.numeric to convert the categorical data into numeric data. A value of 1 is automatically assigned to the first label, a value of 2 to the second, and so on. These values will not necessarily match the values that have been set in the raw data file. For example, if the data file contains values of 1 Male and 2 Female, but no respondent selected male, then the value of 1 would be assigned to Female.
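
The Male and Female problem can be reproduced in plain R with a factor. This is a sketch with made-up data; the second half shows one way to keep the positions stable in plain R, which is an assumption about your workflow and separate from the Displayr approach described next.

```r
# Data file codes: 1 = Male, 2 = Female, but nobody in this sample is male
gender <- factor(c("Female", "Female", "Female"))
as.numeric(gender)   # every value is 1, not the 2 stored in the data file

# Declaring both levels up front keeps Female in second position
gender.fixed <- factor(c("Female", "Female", "Female"), levels = c("Male", "Female"))
codes <- as.numeric(gender.fixed)
```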

Better approach

The safer way to work is to click on the variable set, and then select a numeric structure from Inputs > Structure (on the right side of the screen). For example, you would change the age variable to a structure of Numeric. Or, better yet, first duplicate the variable (Home > Duplicate), and then change the structure of the duplicate so that the original variable remains unchanged.

In my example, the age variable in the data has midpoints assigned to each category (e.g., 21 for 18 to 24, 27 for 25 to 29, etc.). You can see these by clicking on the variable and selecting DATA VALUES > Values on the right of the screen.

Subscripting

An alternative approach to recoding is to use subscripting, as done below. Why this works is actually a little complex -- but it does work!

Mathematical operations on categorical variables

This next approach is a wonderful time saver, but is a little harder on the brain.

Earlier we looked at recoding age into two categories in a few different ways, including via an ifelse:

The code below does the same thing. Let's unpack it:

`Age 2` is the numeric version of age, created in the way described in the previous section.
`Age 2` >= 40 creates a variable with a TRUE value for people with an age of 40 or more, and FALSE for people under 40.
+ 1 adds a 1 to the TRUE and FALSE values. This may seem odd, but it is a standard thing in computing: when you use a TRUE or a FALSE in calculations, the TRUE is treated as a 1 and the FALSE as a 0.
The parentheses tell us to first compute the TRUE and FALSE. Without them, the analysis would then be checking to see who is aged 41 or more.
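
A short sketch makes the role of the parentheses visible. Here age2 is a made-up numeric vector standing in for `Age 2`.

```r
age2 <- c(25, 40, 41, 70)  # hypothetical numeric ages (`Age 2` in the text)

with.parens <- (age2 >= 40) + 1    # FALSE/TRUE are treated as 0/1, then 1 is added
without.parens <- age2 >= 40 + 1   # read as age2 >= 41, so the result is TRUE/FALSE
```

With the parentheses the result is 1 for people under 40 and 2 for everyone else; without them the 40-year-old is FALSE because the comparison is against 41.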

This next example can be particularly useful. This code creates 18 categories representing all the combinations of age and gender, where:

as.numeric(Age) converts the categorical variable into numeric values, as described above in the "bad approach" sub-section. This means that the youngest category gets a value of 1, the second a 2, etc.
max(as.numeric(Age)) * (as.numeric(Gender) - 1) assigns a value of 0 to Males and 9 to Females, where the 9 is the number of age categories.
By adding the two together, we get values of 1 through 9 for the age categories of males, and 10 through 18 for females.
If your goal is to create a new variable to use in tables, a better approach is Insert > New Banner.
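
The arithmetic described in these points can be sketched as follows. The data is made up and uses 3 age categories rather than 9 to keep it short, so the combined values run from 1 to 6 instead of 1 to 18.

```r
# Hypothetical data with 3 age categories instead of 9
Age <- factor(c("18 to 24", "25 to 39", "40 or more", "18 to 24", "40 or more"))
Gender <- factor(c("Male", "Male", "Male", "Female", "Female"),
                 levels = c("Male", "Female"))

# Males keep the age position (1 to 3); females are shifted up by the number of age categories
age.gender <- as.numeric(Age) + max(as.numeric(Age)) * (as.numeric(Gender) - 1)
```

Note that max(as.numeric(Age)) only equals the number of age categories if someone in the data falls into the last one; nlevels(Age) would avoid relying on that.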

Returning to our household structure example, we can write it as:

Debugging

When you insert an R variable, you get a preview of the resulting values whenever you click CALCULATE. However, if doing anything remotely complicated, it is usually a good idea to:

First check the code by creating an R OUTPUT (Insert > R Output), as these are better for debugging.
Click on the R Output and check Inputs > OUTPUT > Show raw R Output, which will show all the steps in processing the code, line by line
Use R functions like summary and table to show the values of intermediate calculations, as shown in the example below.
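
As a minimal sketch of the last tip, here are summary and table applied to an input and to an intermediate calculation. The age values and the younger flag are hypothetical.

```r
age <- c(21, 35, 47, 68, NA)   # hypothetical input variable
younger <- age < 40            # intermediate calculation to inspect

summary(age)                   # minimum, quartiles, mean, maximum and the number of NAs
counts <- table(younger, useNA = "ifany")   # how many FALSE, TRUE and missing
```

Including useNA = "ifany" is worthwhile when debugging, because table() otherwise silently leaves missing values out of the counts.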

How to Band Numeric Variables in Displayr

Matt Steele — Tue, 25 Jun 2019 02:13:52 +0000

Let's say you are asking survey respondents for an absolute number (eg: how many colas have you consumed in the past week?) or a point on a set scale (eg: what proportion of staff are female? Please type in a number from 0 to 100). It’s not uncommon to want to band up the range of potential inputs into categories (eg: 0-5, 6-10, 11+) for analysis purposes.

The purpose of this article is to show you the options you have for creating a banded (categorical) version of a variable, using both drag-and-drop and code methods (R and JavaScript).

Checking the Variables: Structure and Values

These variables I just described are normally read into Displayr as numeric variables. That is what you would expect of a good data collection platform that only accepts a numeric input. For numeric data, the values and labels are one and the same. A variable’s structure is indicated by the icon next to the variable in the Data tree, but also in its Object Inspector under INPUTS > Structure.

Displayr reads these variables as nominal or ordinal if there is text involved with the value label (eg: 0 – Not at all satisfied and 10 – Extremely Satisfied are the endpoints of your scale). If that is the case, it is prudent to check the Values so that they align correctly with the labels. The Values button is just under the Structure dropdown in the Object Inspector as per the picture above. You don’t want a value of 1 ascribed to 0-Not at all satisfied and so forth (it should be a value of 0). You should change it so that the correct value aligns with the label.

Displayr will interpret these variables as text if there are spaces or other characters involved in the data. This is one key reason why Excel/CSV files are a poor file format for survey data. If you have it as a text variable, you can change the variable structure to numeric, but it can’t be guaranteed that all non-numeric information will be correctly converted into numeric values. So you may need to manually format and clean your text variable in the Excel/CSV file (ie: remove all non-numeric characters that could be ‘polluting’ the variable).

Banding by drag-and-drop

Banding via drag-and-drop is the easiest way to band your variable and makes the most sense if you are unlikely to update your data file with new data.

If your variable(s) is numeric, use Home > Duplicate to create a copy of the variable and then change the copied variable set structure to be nominal (or ordinal). As per the picture above, you change the structure in Object Inspector > INPUTS > Structure.

Drag your categorical variable on to the page to make a table. Then select all the categories you want to band together (using Ctrl or Shift) and use Data Manipulation > Merge to merge them into a category. At this point, it’s prudent to use Data Manipulation > Rename to give the banding a correct label.

And that’s it! The main drawback to using this method is that if you update your data file with fresh data (eg: more respondents) then you may end up with a category that is not in a band. For example, if you band up 1,2,3,4,6,7,9, and 10 into a band "1-10" using drag-and-drop and then update your data file, you may have new cases which provide a 5 or 8 score. These new values have not been included in the "1-10" band, and you’ll have to manually merge them in (by repeating the process above). To get around this, use one of the code-based systems below.

Banding via R variable

Banding with code is a sure-fire way to ensure that all potential values within a range end up in the correct bands. It’s very simple code to implement and doesn't require extensive R knowledge (just copy the template below). You can flexibly change the bands later by tweaking the code.

Suppose your variable has the label Q2. Number of Coca-Cola consumed? It will also have a variable name (Q2_a). You can use either the variable label or name in Displayr. The variable name is revealed in the Object Inspector > Properties > General and also by hovering over the variable in the Data Set tree.

Insert > R Variable

Over in the R CODE window in the Object Inspector you can write a simple IF and ELSE IF statement. What I like about R CODE is that you can drag a variable from the Data Sets tree directly into the R Code box, and give it the convenient label ‘x’ (or whatever). So the first line of code looks like: x = `Q2 - No. of Coca-Cola consumed`

Then you can easily set up your bands referring ‘x’, like the below:

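x = `Q2 - No. of Coca-Cola consumed`  # copy of the input variable
x[x == 0] = 0                         # band 0: none (== tests equality)
x[x >= 1 & x <= 5] = 1                # band 1: 1 to 5
x[x >= 6 & x <= 10] = 2               # band 2: 6 to 10
x[x > 10] = 3                         # band 3: 11 or more
x                                     # return the banded variable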

You can augment and adjust the code above to suit your banding needs, adding as many lines as you like. See here for a guide to IF and ELSE IF statements in R.

In the above, I've given the new bands values of 0,1,2, and 3 respectively. You could make these whatever you want. Once you've made the variable you will need to adjust the label for each Value, which you can do by going to the Values Attributes window using the Values button in the Object Inspector.
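
If you would rather define the bands and their labels in one step, base R's cut function is an alternative to the subscripting template above (it is a different technique, not the one used in this post). The sketch below uses made-up counts and the same band boundaries.

```r
x <- c(0, 1, 5, 6, 10, 11, 25)   # hypothetical weekly cola counts

banded <- cut(x,
              breaks = c(-Inf, 0, 5, 10, Inf),   # each band is (lower, upper]
              labels = c("0", "1-5", "6-10", "11+"))

# The same 0, 1, 2, 3 values as the template above
band.values <- as.numeric(banded) - 1
```

One difference to be aware of: cut assigns every number to a band, so a non-whole value such as 5.5 would land in "6-10", whereas the subscripting template would leave it unchanged.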

Banding via JavaScript

For those of you who prefer to use JavaScript, the process is very similar to the R variable, except of course you use JavaScript, so the code is a little different. It uses IF and ELSE IF statements. In the above, you can drag-drop and/or use the variable label in the code. With JavaScript, you need to use the variable name, as per the first line of the code below. The variable name is revealed in the Object Inspector of the variable (under Properties) and by hovering your mouse over the variable in the Data Set tree. So your code could look like this:

x = q2a_1
if (x == 0) 1;
else if (x >= 1 && x <= 5) 2;
else if (x >= 6 && x <= 10) 3;
else if (x > 10) 4;

Try for yourself

The examples in this post are in this Displayr document. The variables are at the top of the Data tree.

How to link images to a visualization in Displayr

Matt Steele — Sun, 23 Jun 2019 23:29:07 +0000

Consider the bar charts in the image below. When the data is set to automatically sort rows by decreasing order, brand orders can change when a filter is applied to the bar chart. The point of interest here is that the logos (which are six separate objects that sit adjacent to the visualization) also dynamically update to reflect the order. For example, the second field updates to switch from the Coke Zero logo to the Pepsi Max logo. I'll use this as a worked example in this post.

Preparation: Getting a URL for each image

We are not going to be working with images you insert via Insert > Images from the Ribbon because those are static images. Responsive images actually sit within an R Output (Insert > R Output) that can only read images from a URL location. So the first step is to get the URLs for all images.

There are several ways to do this. Some Displayr users might have a network or shared server that can generate URLs. You can use a cloud-based app like Google Photos or Dropbox to generate links.

For instance, I used the image hosting service imgur to make my URL, which generated a URL that looks like this: https://i.imgur.com/mrIpR63.png.

In Displayr, it's useful to have a pasted table of all your URLs alongside the brand name. We will later refer to that table when we write the code. As this table will be a reference page, I suggest putting the table on a new page and hiding it (from Viewers). The following steps will do this for you:

Insert a new page by going to Insert > New Page > Title Only from the Ribbon
Insert a table by going to Insert > Enter Table from the Ribbon
Select the blank table object on the page, and go to Object Inspector > Inputs > DATA SOURCE > Paste or Type Data
Enter your table in the spreadsheet with your URLs. The one I did in my example looks like the below:

Click OK (Note: the name of the table in this example is table.output).
Optional: Give the page a title like "Image Reference," and then select the page in the Pages Tree and hide it by going to Appearance > Hide from the Ribbon.

Create a merged table of data and image URLs

The next step is to line up the data that will feed into the visualization alongside the image. We ultimately want to achieve a merged table, like the one in the image below. The table has the data for the visualization in the first column (the %'s in this case) alongside the corresponding image URL. There are several ways to do this, and my example is just one method. You may like to write your own R code to make a custom table. What matters is that you have matched the item (brand), the data (% in this case), and the correct image URL.

First, I made a table of my source data, which was created by dragging the Preferred Cola variable onto the page to generate a standard summary table. The table is named table.Q3.Preferred.cola.

I created a merged table of my source data and images and matched it up by brand. To do this, I used Home > Tables > Merge Two Tables. Over in the Object Inspector, I nominated the tables to merge. In this case, it was table.Q3.Preferred.cola and table.output. I selected Side-by-Side and Matching Only as my options. Because I want the R Output to sort the data from highest to lowest, I added some lines of code to the merged object. I did this by selecting the merged table, going to Properties > R CODE in the Object Inspector, and adding this final line: merged[order(merged[,1], decreasing = TRUE),]. If you're not familiar with this process, I recommend reading the blog post How to sort your data with R in Displayr.

Finally, in my example, I extracted the first column of the merged table because the visualization needs it separate from the URL text. I did that with the following code within another R Output (Insert > R Output). The name of this output is by default sorteddata.


sorteddata = as.numeric(merged[,1])
sorteddata = as.data.frame(sorteddata)
rownames(sorteddata) = rownames(merged)
sorteddata
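
To see the sort and extract steps end to end, here is a minimal sketch on a made-up stand-in for the merged table (the brands, percentages and URLs are invented). It assumes the merged table is held as text, because a table that mixes numbers and URLs is often stored that way; if yours is numeric, the `as.numeric()` calls are harmless.

```r
# Made-up merged table: percentages and image URLs, both stored as text
merged = cbind("%" = c("9.5", "41.2", "27.3"),
               Src = c("https://example.com/zero.png",
                       "https://example.com/coke.png",
                       "https://example.com/max.png"))
rownames(merged) = c("Coke Zero", "Coca-Cola", "Pepsi Max")

# Order on the numeric value; sorting the text itself would put "9.5" first
merged = merged[order(as.numeric(merged[, "%"]), decreasing = TRUE), ]

# Keep only the numbers for the visualization, with the brands as row names
sorteddata = data.frame(Percent = as.numeric(merged[, "%"]),
                        row.names = rownames(merged))
sorteddata
```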

Create your visualization

Now insert your visualization and link it up to the data table. In my example, I selected Insert > Visualization > Bar Chart. I hooked it up to the R Output sorteddata. If you're not sure how to set up a visualization, I recommend reading How to Create a Bar Chart in Displayr (noting the Visualization section is the relevant section, not the Charts section).

Create holders for your images

Now, create another R Output, and then use the following code. You will need to change the references in the first three lines. The first line sets which position the item is in (in this example, it is the first). The second and third lines reference the merged R Output, including the column that has the URL. If you're not familiar with table subsetting and referencing with R Output, I recommend reading How to do Simple Table Manipulations with R in Displayr.

item = 1
src = merged[item,"Src"]
alt = rownames(merged)[item]
text = paste0(
    '





')

rhtmlMetro::Box(text,text.as.html = TRUE)
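
The `text` string is where the image markup goes. As a rough sketch of the kind of markup involved (an assumption, not the original code), an image tag that scales to fit its container could be built as below. The stand-in `merged` table and the second URL are made up; the `rhtmlMetro::Box()` call is the one from the code above and is left commented out so the sketch runs outside Displayr.

```r
# Hypothetical stand-in for the merged table (brand rows, an "Src" URL column)
merged = cbind("%" = c("41.2", "27.3"),
               Src = c("https://i.imgur.com/mrIpR63.png",
                       "https://example.com/logo2.png"))
rownames(merged) = c("Coca-Cola", "Pepsi Max")

item = 1
src = merged[item, "Src"]
alt = rownames(merged)[item]

# Assumed markup: an image that scales to fit whatever box holds it
text = paste0('<div style="width:100%; height:100%; text-align:center;">',
              '<img src="', src, '" alt="', alt, '" ',
              'style="max-width:100%; max-height:100%;">',
              '</div>')

# In Displayr the string is then rendered with:
# rhtmlMetro::Box(text, text.as.html = TRUE)
text
```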

Next, resize your R Output. The image is set to responsively adjust in size to fill the R Output container (that's why there is lots of HTML code cited within the R code above!) Then duplicate your R Output (Home > Duplicate) and adjust the first line as necessary (eg: item = 2, item = 3, etc) in each successive R Output. Finally, align them next to the visualization. To see the dynamic behavior in action, select the original data table, table.Q3.Preferred.cola, and apply a filter to see the visualization and linked images update accordingly.

Try for yourself

The worked example can be found in this Displayr document.

How to Dynamically Change a Question Based on a Control Box

Matt Steele — Wed, 19 Jun 2019 01:11:27 +0000

The two main types of control boxes are the combo and the list box. Typically they are used for changing how the data is filtered, as discussed in this post. But you can also use a control box to change the actual question in a table (or chart, visualization, etc.). You can also use control boxes to change the weighting you want to apply.

For example, the image below shows a question (Preferred Cola) that I've chosen to split by income brackets, using the selection in the control box.

If I change the control box option to Age, it becomes:

You can do this with an R variable. The R variable dynamically updates when the selection in the control box changes. The purpose of this post is to show, via example, how you can do this.

Set up your control box with your options

Use Insert > Control and then choose either a Combo or List box. Over in the Object Inspector, list your questions in CONTROL > Item List (which can be labeled however you like). In this example, I entered 4 possible options for a combo box:

I set the Selection mode to be "Single selection," and When item list changes to be "Select first."

Be sure to take note of the control box’s name under PROPERTIES > GENERAL > Name, because we’re about to use this in the R variable.

Changing single-variable questions via your control box

Next, you will need to create an R variable with conditional statements that link to the questions via Insert > R > Numeric variable. This will make a new numeric variable under Data Sets, creatively called “newvariable” by default. Displayr will reveal in the Object Inspector a blank box where you can put in the R CODE:

As per the picture above, you enter simple conditional statements with R. Basically, it references the control box (called Combo.box in this example) and then each of the 4 options. The four variable names -- d1, d2, d3, and d4 -- pertain to each of the single-variable questions to use in the table. The code consists of very straightforward "IF and ELSE IF" statements.
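
Based on the description above, the code has roughly the following shape. The item labels ("Age", "Gender", "Income", "Region") and the stand-in values for `Combo.box` and `d1` to `d4` are assumptions so that the sketch runs outside Displayr; in a document you would drop the stand-in lines and use your own control name, item labels and variable names.

```r
# Stand-ins so the sketch runs on its own (in Displayr these already exist)
Combo.box = "Income"
d1 = c(1, 2, 3)
d2 = c(1, 1, 2)
d3 = c(2, 3, 1)
d4 = c(3, 3, 2)

# Return the variable that matches the current control selection
newvariable = if (Combo.box == "Age") d1 else
    if (Combo.box == "Gender") d2 else
    if (Combo.box == "Income") d3 else
    if (Combo.box == "Region") d4
newvariable
```

The labels in the comparisons have to match the control's item list exactly, including capitalization.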

Be sure to change the variable Structure to be nominal or ordinal (if you intend for the question to be categorical). This is done under INPUTS > Structure in the Object Inspector for the R variable (in the picture above at the very bottom under the code).

And that’s it! From there you can use your R variable in a table, directly in a visualization, or in another analysis. It will change dynamically as you alter the selection in the control box.

Changing multiple-variable questions via your control box

When working with multiple-variable questions, it may be possible to use the same approach of using 'if/else' code for each variable in your variable set, but there are some provisos:

Your variables must be set together as either a Binary – Multi or Number – Multi, as applicable.
You should have the same number of variables for the questions that are to be substituted.
The variable labels should be applicable for all questions, as these can't dynamically change.

When the number of variables and/or variable labels are different between the questions you wish to dynamically change via a control box, it is better to substitute tables instead. The steps are as follows:

Create separate tables for each of the questions listed in your control box, drag them off your page and select Appearance > Hide from the ribbon.
Create an R output via Insert > R Output that selects which table to choose based on the table name (found under PROPERTIES > GENERAL > Name) and the control box selection:

if (Combo.box == "Awareness") table.D1.Age.by.Awareness else
if (Combo.box == "Preference") table.D1.Age.by.Preferred.cola

In the above example I have 2 control options that switch between 2 tables, one 'Age by Awareness', the other 'Age by Preferred Cola'. As the final output is a visualization, I've also hidden this R output and dragged it off the page.
Once you update the visualization's output reference under Inputs > DATA SOURCE > Outputs in 'Pages' to this R output, you will then be able to dynamically control the data shown:

Changing the weighting dynamically with an R variable

You can apply the same technique to dynamically change the weighting. You essentially reference different weighting variables in the R code based on your selection in the control. For example:

if (Combo.box == "USA") weight_us else
if (Combo.box == "France") weight_fr else
if (Combo.box == "UK") weight_uk

Then make sure the R variable has the Usable as weight box checked in the Object Inspector. You can then apply that to a table (or chart or whatever) as your weighting variable.

Try for yourself

The above example is captured in this Displayr document. The R variables are the first two variables in the Data Set.

How to Switch Logos and Images Based on User Selections

Chris Facer — Mon, 27 May 2019 01:30:28 +0000

Displayr's Conditional Image visualization allows you to add images to your document which change when the data changes in response to filters or other interactive components on your page. For instance, you could display a thumbs-up when your result is higher than expected, or a thumbs-down when your results are lower than expected. In this article, we are going to use this same tool to switch between different brand logos based on your viewer's selection.

For this to work, you need three components:

An interactive menu on your page which lets you choose the brand
Some R Code which translates the menu selection into a number
A Conditional Image which changes logos based on the numbers

It's important to remember that all of these elements need to be on the same page. Displayr's interactive features work on a page-by-page basis when your document is published as a web page.

Step 1 - Create your menu

If you've been designing a dashboard which can show results for one of several brands based on a menu selection, then you probably already have this. If that's the case, then jump to Step 2 below.

If you are starting out from scratch, you'll need to add a combo box or a list box to your page. These two types of menus both do the same job - the difference between them is how they look on your page.

Select Insert > Control (More) > Combo Box or List Box.
Click into Control > Item list in the Object Inspector on the right of the screen, and enter the list of brands that you want to be able to switch between. Each item should be separated by a semi-colon (;).
(Optional) Change the formatting options in the Control section (e.g. fonts, colors, etc).
Click into Properties > General > Name, and change it to "brand.switch". It doesn't matter what you call it, so long as you remember this name for the next step.

In this example I will use three brands: Coke, Diet Coke, and Pepsi.

Step 2 - Translate your brands into numbers

The conditional image visualization tool chooses which image to display based on a numerical value. This means that all we need to do is choose a number for each brand, so that our visualization knows which image to show.

On the same page, select Insert > R Output.
Paste in the code below.
Click Calculate.
Select Appearance > Hide. This prevents the number from showing up when the document is published.

The code you need to use is like this:

brands = c("Coke", "Diet Coke", "Pepsi")
current.brand = match(brand.switch, brands)

The first line of the code lists the brands in the same order as they appear in your menu from Step 1. The second line of code looks up the position of the selected brand in the list. So if the user selects "Coke" the value will be 1, if they select "Diet Coke" the value will be 2, and so on.

Now, if you change the menu selection, you should see the number in the output update itself.
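
To check the lookup outside Displayr, the sketch below replaces the control with a plain string (an assumption; in the document `brand.switch` is the control itself). It also shows what happens when a label is missing from the list.

```r
brands = c("Coke", "Diet Coke", "Pepsi")

# Stand-in for the control: in Displayr, brand.switch holds the selected label
brand.switch = "Diet Coke"
current.brand = match(brand.switch, brands)
current.brand                          # 2

# A label that is not in the list gives NA, so keep the two lists in sync
not.listed = match("Pepsi Max", brands)
not.listed                             # NA
```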

Step 3 - Create your image

The final stage is to create the conditional image visualization, and then connect it to the number above in Step 2. Importantly, your images must be specified by URLs. That is, they need to be hosted on the web somewhere, and you need to copy in the links.

Select Insert > Visualization > Conditional Image.
Change Inputs > DATA SOURCE > Data source to Use an Existing R Output.
Click into Inputs > DATA SOURCE > Input data and select the R Output that you created in Step 2 above (in this example it is called current.brand).
Change Inputs > OUTPUT > Image type to Custom Images.
Paste the URL of the image for the first brand into Default image.
Change Threshold 1 to the number 2
Paste the URL for the second brand logo into Image 1.
Change Threshold 2 to the number 3
Paste the URL for the third logo into Image 2.

Here is the appearance of my settings for this example:

To see a very basic example of the finished product, click this link. To get a copy of the original document so that you can see the code and other options that I have used in this post, click here.

How to Remove a Row or Column using R in Displayr

Matt Steele — Mon, 27 May 2019 00:30:34 +0000

In doing so, you end up with tables within R Outputs. These R tables cannot be manipulated with the Data Manipulation techniques in the Ribbon, as these buttons are designed for tables that you build from variables in your Data Sets. Tables you make with R need to be manipulated with R.

Consider the table below that is within an R Output. (It has been generated by subtracting the scores between two source tables: the scores for males in one table minus the scores for females in a similar table.) What's important here is that in the output the “None of these” row and the NET row/column have carried over. We may want to remove them in our final R Output:

One way to accomplish this is to go back to the source tables, and remove them there (without the need to fiddle with any R). But there are situations where you don’t want to change the variable set and/or perhaps your scenario is such that you can’t change it. The good news is that removing a row or column from your R outputs is very easy to do with just 1-2 lines of additional code. In this post, I’ll demonstrate how you can use some code to do this in two ways:

Specifying the rows/columns to remove by index
Specifying the rows/columns to remove by name

The second one is likely the more useful of the two because we often want to remove a particular row/column rather than the 1st, 8th or last row/column.

Note: If the terms subsetting and index are unfamiliar to you, I suggest reading this introductory post: How to do Simple Table Manipulations with R in Displayr. In all of the below, the name of the R Output we're referring to is "table".

Specifying the rows/columns to remove by index

Let’s say you wanted to remove the “None of these” and the “NET” row. A simple way to do it (provided the order of your rows isn’t likely to change) is to just specify the rows you want to keep:

 table[1:6,]

But you could also use a minus sign (-) and then specify the rows you don’t want to keep. So in this alternative, we’re saying we “don’t want the 7th and 8th rows”.

 table[-(7:8),]

This is all very well and good, but it becomes a bit problematic if the ordering of your rows changes. With an update to the data, the NET could suddenly become the 9th row. In that case, you’re better off specifying the labels for the rows, as per the next section.

But there is one more trick you can do when specifying by index, and that is to get it to remove the last couple of rows. In this case, the code is as simple as:

n = nrow(table)
table[-((n-1):n),]

Here we’re getting the code to first calculate the number of rows, storing that as n. Then, in the subset on the next line, we’re asking it NOT to return the second-last to the last row (i.e. remove the last 2 rows). I could have put the above all on one line, but I think it's easier to see what's going on with the n.
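
The three index-based variants can be compared side by side on a made-up table (the row and column labels below are invented for the sketch; only the subsetting lines come from the text above).

```r
# Made-up 8-row table: six brands, then "None of these" and "NET"
table = matrix(1:16, nrow = 8,
               dimnames = list(c(paste("Brand", 1:6), "None of these", "NET"),
                               c("Difference", "NET")))

keep_first_six = table[1:6, ]          # rows to keep
drop_by_position = table[-(7:8), ]     # rows to drop

n = nrow(table)
drop_last_two = table[-((n - 1):n), ]  # last two rows, whatever n is

identical(keep_first_six, drop_last_two)
```

The brackets around `7:8` matter: `-7:8` is the sequence from -7 up to 8, which mixes negative and positive indices and produces an error.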

Specifying the rows/columns to remove by name

If you change the source tables (e.g. by updating the data to add, subtract, or sort rows/columns), then the ordering of the R Output may be out-of-date, and so we could end up removing the wrong row. I want to be confident the updates to my R Outputs will be accurate and correct. For that reason, I prefer to specify the names of the rows or columns I'd like to remove. To do this, I use the function setdiff() which figures out what to retain (i.e. what remains after you specify what to drop).

x = setdiff(rownames(table),c("None of these","NET"))
y = setdiff(colnames(table),"NET")
table[x,y]

Let me break it down for you:

On the first line, the setdiff() function calculates the difference between all the row names in the original table and the array of labels I’ve specified using the combine function c(). So the remainder is just the six brands. I’ve stored this array of 6 brands as x.
Likewise, I’ve done the same for the columns, storing it as y. Because there’s only one ("NET") I didn’t need to use the combine function c() when inputting it into the setdiff() function.
And then on the third line, I’ve asked the R Output to subset the table by x and y respectively.
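
Here is the same approach on a made-up table, as a minimal sketch (the labels and numbers are invented). It also covers one case the three lines above do not: when only a single row or column remains, R drops the result to a plain vector unless `drop = FALSE` is added.

```r
# Made-up table with a "None of these" row and NET row/column
table = matrix(1:24, nrow = 8,
               dimnames = list(c(paste("Brand", 1:6), "None of these", "NET"),
                               c("Male", "Female", "NET")))

x = setdiff(rownames(table), c("None of these", "NET"))
y = setdiff(colnames(table), "NET")
trimmed = table[x, y]

# Keeping a single column: drop = FALSE keeps it as a one-column table
one_col = table[x, "Male", drop = FALSE]

trimmed
```

Because setdiff() only removes names that are present, a label that has disappeared from the source table does not cause an error.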

Try for yourself

The examples above can be found in this Displayr document.

How to Set the Initial Zoom and Position of Geographic Maps

Liz Kucko — Tue, 23 Apr 2019 03:12:05 +0000

Create a geographic map to your liking using the leaflet map package in Displayr

Follow instructions on How to Make a Geographic Map in Displayr to create a map and hook up your data to the map. The example I will walk through maps the percentage of food inspections that passed in each Chicago zip code between 2010 and June 2018 (original data found here).
Select a few more settings to make the map look nice. For my map I am also selecting the following in the Object Inspector:
- Use the leaflet map package using Chart > APPEARANCE > Map package > leaflet.
- Only show "Pass" rates by setting Inputs > COLUMN MANIPULATIONS > Number of columns from left to show as 1.
- Show background map for context by checking Chart > APPEARANCE > Background map.
- Set the color of missing regions to transparent by selecting Chart > APPEARANCE > Color of NA values > More colors > white box with the red X.
- Set the shading so that darker reds mean a lower pass rate using Chart > DATA SERIES > Color palette > Reds, dark to light.

The map below is created using the steps above.

You can't see any of the data for Chicago in the map initially. To help the viewer see where the data is, we will zoom the map into Chicago automatically for them.

How to customize the default area shown on the geographic map

You'll need four bits of information before we get started:

For the code	Description	Value in example
YourVisualizationName	The Name of your map in Displayr. You can find this by going to Properties > GENERAL > Name	chart.5
TheLongitude	The longitude for the center of the map	-87.6298
TheLatitude	The latitude for the center of the map	41.8781
TheZoomLevel	The zoom level from 0 (the world) to 18 (street-level)	9

We will use this information in our R code to customize the initial appearance of the map.

Access the R code underlying the map via Properties > R CODE.
At the bottom of the R code, load the leaflet R package with library(leaflet) to get access to the full set of functions for customizing the leaflet map.
Add another line after that, to set the initial position and zoom of the map using the setView function from the leaflet R package. The syntax is as follows: YourVisualizationName <- setView(map = YourVisualizationName$htmlwidget, lng = TheLongitude, lat = TheLatitude, zoom = TheZoomLevel). For this example, this will be:

library(leaflet)
chart.5 <- setView(chart.5$htmlwidget, lng = -87.6298, lat = 41.8781, zoom = 9)

After your map recalculates, it will now automatically be zoomed into your desired area. The final map from my example is shown below.

Tips for positioning the initial map

You can tweak your initial zoom level by using decimals, such as 1.5.
To find the latitude and longitude to center your map you can:
1. Click on a blank spot on a Google map. A small window at the bottom of the screen will display the coordinates.
2. Google "YourCity/State/Country lat long", and use coordinates from the results.
3. Use a website such as https://www.latlong.net/ to find the coordinates.

Try creating one yourself with our interactive Geographic Maps tutorial

Check it out here

How to Fit a Structural Equation Model in Displayr

Tim Bock — Thu, 21 Mar 2019 07:09:20 +0000

In this post I am going to walk through the steps of fitting a structural equation model (SEM) in Displayr. The post assumes that you already know what a SEM is and how to interpret it.

Case study

In the post I am going to analyze Bollen's famous Political Democracy data set (Kenneth Bollen (1989), Structural Equations with Latent Variables, Wiley.)

Step 1: Load the data

Typically data sets are loaded into Displayr from raw data files. But, in this case we will load some data that is stored in an R package.

Insert > New Data Set > R
Name: BollenPoliticalDemocracy
Paste in the code below into the R CODE box
Click OK
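
A minimal sketch of the code for this step, assuming the lavaan package is available (the PoliticalDemocracy data frame ships with it, so no file is needed). The last line returns the data frame, which is what becomes the data set.

```r
# Assumes lavaan is installed; PoliticalDemocracy is bundled with the package
data("PoliticalDemocracy", package = "lavaan")

# 75 countries: y1-y8 are democracy indicators, x1-x3 industrialization indicators
PoliticalDemocracy
```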

Step 2: Fit the model

The hard step is fitting the model, as this requires you to specify the measurement model, the relationships to be tested (i.e., the regressions), and the correlation structure of the model. For more information about this, please check out the lavaan website.

To do this:

Insert > R Output
Paste in the code below
Press Calculate
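
As a sketch of the code for this step, the model below is the standard specification for this data set from the lavaan tutorial; it may differ in detail from the one used in the original post. It reads the data from the copy bundled with lavaan so that it runs on its own, and the object name `fit` is an assumption.

```r
library(lavaan)

model <- '
  # measurement model
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'

# Uses the copy of the data bundled with lavaan
fit <- sem(model, data = PoliticalDemocracy)
fit
```

The three blocks of the model string map onto the three things the text asks you to specify: the measurement model (`=~`), the regressions (`~`) and the correlation structure (`~~`).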

Step 3: Review the path diagram

In order to check that the model has been correctly specified it's a good idea to review the path diagram.

Insert > R Output
Paste in the code below
Press Calculate

Step 4: Extract the summary statistics

Insert > R Output
Paste in the code below
Press Calculate
In the Object Inspector, on the right of your screen, click Properties > OUTPUT > Show as > Text
To align the text neatly, go to Properties > APPEARANCE and set the font to Courier New.
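
Steps 3 and 4 both work from a fitted lavaan object, and each would go in its own R Output. The sketch below is an assumption about what that code could look like: it refits a cut-down model so that it runs on its own, draws the path diagram with the semPlot package if that is installed, and prints the summary as text.

```r
library(lavaan)

# Cut-down refit (measurement model and regressions only) so this block runs
# on its own; in the document you would use the fitted object from Step 2
fit <- sem('
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
', data = PoliticalDemocracy)

# Step 3: path diagram with standardized estimates, if semPlot is available
if (requireNamespace("semPlot", quietly = TRUE)) {
    semPlot::semPaths(fit, what = "std")
}

# Step 4: parameter estimates and fit statistics as text
summary(fit, standardized = TRUE, fit.measures = TRUE)
```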

How to Chart Web Traffic using Google Analytics and Displayr

Oliver Harrison — Mon, 28 Jan 2019 23:34:10 +0000

The R package googleAnalyticsR has been built specifically for R users using the Google Analytics Reporting API v4. I have previously outlined the best authentication process between Displayr and the API (see How to connect Displayr to the Google Analytics API for more details), but here is a quick recap. Essentially, we need to log into Google Analytics, set up a Google project and service account, and then download a secret JSON key containing authentication credentials, which we pass to the API via R code.

Once authentication has been set up in your R Output via Insert > R Output (Analysis Group), the next thing we need is the View ID of the website you want to pull data from. To determine your View ID, ensure you are logged into your Google Analytics account under the specific website you want to view (if you have multiple sites monitored), click Admin on the bottom left, go to the View column and click View Settings. The View ID will be visible under Basic Settings.

Call the API

In the below example, I will call four different metrics - users, new users, sessions and page views – from my website for all records last quarter split by date:

library(googleAnalyticsR)

view_id = XXXXXX # replace this with your View ID

df = google_analytics(view_id,
            date_range = c("2018-07-01", "2018-09-30"),
            metrics = c("users", "newUsers", "sessions", "pageViews"),
            dimensions = c("date"),
            max = -1)

Here I have used max = -1 so that it will pull all the data, but you can also cap this at a specific number if you wish. Once you press Calculate, you will see a result with a structure like this:

If you also want to make this call take place daily at a specific time (e.g. 9am), we can add the below lines to the top of the R Output:

library(flipTime)
UpdateAt("01-11-2018 09:00", units = "days", frequency = 1, options = "wakeup")

Data Sampling

It's important to note that for standard Google Analytics accounts, data sampling occurs when you reach the limit of 500k sessions at the property level for the specified date range (see data sampling) in order to fetch results faster. This means that if the date range in your API call covers more than 500k sessions, some of the returned values will be estimates rather than measured values. Of course, the number of sessions covered by an API call will depend on the popularity of your site, the specified date range, and other factors.

If you are using a wide date range it may be prudent to split the date ranges into separate calls so as to avoid hitting the session sampling threshold and then combine them together later using a simple rbind command, for example. You can easily compare the outputs with those produced by Google Analytics to ascertain the correct split.
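
One way to do the split is sketched below. `fetch_range()` is a hypothetical stand-in that returns dummy data so the sketch runs without credentials; in practice its body would be a `google_analytics()` call like the one above with `date_range = c(start, end)`. Splitting by calendar month is only an example; choose whatever split keeps each call under the threshold.

```r
# Hypothetical stand-in for the API call (returns one dummy row per day).
# In practice the body would be something like:
#   google_analytics(view_id, date_range = c(start, end),
#                    metrics = "sessions", dimensions = "date", max = -1)
fetch_range = function(start, end) {
    days = seq(as.Date(start), as.Date(end), by = "day")
    data.frame(date = days, sessions = seq_along(days))
}

# Split the quarter into calendar months
starts = seq(as.Date("2018-07-01"), as.Date("2018-09-01"), by = "month")
ends = c(starts[-1] - 1, as.Date("2018-09-30"))

# One call per month, then stack the pieces into a single data frame
pieces = lapply(seq_along(starts), function(i) fetch_range(starts[i], ends[i]))
df = do.call(rbind, pieces)
nrow(df)
```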

Another option is to use the anti_sample = TRUE setting in your API call, but it won't work in every situation. If you click Show raw R output under OUTPUT on the Object Inspector when using this option, you can read logs outlining how much sampling is taking place. By default, anti-sampling already exports all records so you don't need to set a value of max. Not using anti-sampling will also allow you to use date shortcuts such as "90daysAgo" or "yesterday" for both start and end date. Otherwise, you will need to specify the exact dates. For a list of all the metrics and dimensions you can call via the API, see API names.

Visualize your data

Now that we have the data as a table, we can hook this up to one of Displayr's cool visualizations. I have chosen the area chart (Insert > Visualization (Analysis group) > Area Chart) which is essentially a line chart with the background colored in. I just need to select the R output under DATA SOURCE > Output in 'Pages' on the Inputs tab of the Object Inspector and change some settings.

First, I will tick Show as small multiples (panel chart) to split this into separate charts for each metric, then I will add a smoother line on the Chart tab under TREND LINES > Line of best fit. I've chosen Friedman's super smoother, changed Line type to dot and ticked Ignore last data point.

You can make an area chart for free using Displayr's area chart maker! Plus now that you know how to link up Google Analytics to Displayr, you can use your own website data!

How to Blank Cells with Small Sample Sizes using R in Displayr

Matt Steele — Fri, 18 Jan 2019 04:35:03 +0000

In this post, I explain how you can automatically modify the contents of tables using a secondary R Output. In doing so, we give you a template for some simple R code that you can flexibly use whatever your scenario.

Cell modification with R, a recap

In "How to Blank and Cap Cells of Tables Using R in Displayr", I explained how you can modify the cells of a table in an R Output by using a condition. The condition then becomes the subset of the table you are modifying. It works like this:

table[condition] = value

In English, the square brackets specify a subset of a table. When the condition evaluates to TRUE, then we're manipulating just that subset of the table. Using the equals sign, it sets that subset to be equal to a new value. In the case of blanking cells, that value is NA (which stands for a missing value).

Note: In either case, you need to put in an extra line of code, which is just ‘table’. This returns the final table with the substituted values (and not just the value). This line is included as the last line of code in the examples below.

How to blank cells with small sample sizes

Now, to get R to blank a table with small sample sizes, the code needs to reference the sample size for each figure. There are a couple of different ways to give this information to R. I cover one way below and describe an alternative at the end of the post.

I like to have a source table that has both the values and the sample size within each cell. In the grid summary table below, I’ve specified both % and Base n as statistics.

This table is named table.Q5. Putting the following code in an R Output (Insert > R Output) will blank all the cells with a base n less than 75.

x = table.Q5
y = 75
values_tab = x[,,"%"]
base_tab = x[,,"Base n"]
values_tab[base_tab < y] = NA
values_tab

The first line is specifying the source table. The second line is specifying our threshold for small sample size. The third line creates a table that only has the values (% in this case). The fourth line produces a table of just the base. This is the basis of the condition (next line). The fifth line is the key that pulls it all together. It basically says "if the base is less than the threshold of 75 in the table, then substitute with a missing value (NA)". The sixth line just returns the new table of values (freshly substituted). So the end result is the below:
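
To run the same six lines outside Displayr, the sketch below swaps table.Q5 for a small made-up array with the same shape: rows, columns, and a third dimension holding the two statistics. The labels and numbers are invented.

```r
# Made-up stand-in for table.Q5: rows x columns x statistics
x = array(
    c(40, 55, 30, 62,       # "%" layer, filled column by column
      120, 60, 74, 200),    # "Base n" layer
    dim = c(2, 2, 2),
    dimnames = list(c("Brand A", "Brand B"),
                    c("Like", "Dislike"),
                    c("%", "Base n"))
)
y = 75
values_tab = x[, , "%"]
base_tab = x[, , "Base n"]
values_tab[base_tab < y] = NA   # blank the cells whose base is under 75
values_tab
```

Here the cells for Brand B / Like (base 60) and Brand A / Dislike (base 74) are blanked, and the other two keep their percentages.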

Adapting the code - having a separate table of values and base size

If you’re borrowing the above code, be sure that you’ve got the correct statistics in the source table. For example, the base n in a cross-tab is different from the column n. The column n is what you use to derive column-%’s. Remember, in multi-variable questions (such as a Pick Any), the base n or column n could vary by row (or column). In the worked example above, each % in the cells of the source table was a separate binary variable (grouped into a Pick Any - Grid), so had its own base n.

You don’t have to use just one source table to house all your reference statistics. You could have the statistics in separate source tables, but you’d need to adjust the code accordingly, a bit like the below (where lines 1 and 2 refer to different tables in the document).

values = table.Q5
base = table.Q5.base
y = 75
values[base < y] = NA
values

Be aware that the tables need to overlap exactly in terms of the order of their rows and columns. That’s why I prefer to use just the one source table (and extract what you need from that) wherever possible.
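
If you do use two tables, one way to guard against a mismatch is to line them up by name before applying the condition. This is a sketch on made-up tables, assuming both carry the same row and column labels (just in a different order).

```r
# Made-up values table and a base table whose rows and columns are reordered
values = matrix(c(40, 55, 30, 62), nrow = 2,
                dimnames = list(c("Brand A", "Brand B"), c("Like", "Dislike")))
base = matrix(c(200, 74, 60, 120), nrow = 2,
              dimnames = list(c("Brand B", "Brand A"), c("Dislike", "Like")))

# Reorder base to the layout of values by name, then confirm they line up
base = base[rownames(values), colnames(values)]
stopifnot(identical(dimnames(values), dimnames(base)))

y = 75
values[base < y] = NA
values
```

The stopifnot() line makes the R Output show an error instead of silently blanking the wrong cells if the labels ever stop matching.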

And of course, you can fiddle with the code to produce a different outcome. For instance, you can set all the cells to 0 instead of NA if you prefer.

Try it yourself

The worked example is in this Displayr document, so you can see the code in action.

How to Sort your Data with R in Displayr

Matt Steele — Tue, 15 Jan 2019 06:07:53 +0000

There may be situations when custom automatic sorts require you to fiddle with the underlying R CODE. Below, we discuss a couple of examples showing how you can add a line of R code to your R Outputs to get them sorting automatically. We hope to shed light on the one line of code needed, so you can then adapt it to your needs. Make sure you check out "How to do Simple Table Manipulations with R in Displayr" if you haven't yet, as this post assumes some knowledge from that post.

How you can sort data in Displayr without touching any code

For many of the R-based features in the Insert menu (mainly Visualizations), we’ve actually got the option to sort rows within the Inputs panel of the Object Inspector. So, the R Output interprets the source table as though it’s being sorted before the output is actually drawn.

When you may like to sort data via R code

One scenario where you may need to get into the R CODE to do the sorting is when you’re making your own custom table in an R Output. Examples might include a table that’s a KPI summary, a brand index matrix or any calculation/compilation. You only need to add a line of code at the end to keep the table sorted automatically. For example, consider the table below, which is the brand funnel built by R Code (as explained in this post).

By including line 7 in the code used to build the table, it will sort automatically.

Another scenario is that you’ve used one of Displayr's built-in tools for joining tables (such as Home > Tables > Merge Two Tables), and you want to sort the final output. You can do that by going to Properties > R CODE in the Object Inspector of the output. For example, the table below was created using the menu item Insert > Tables > Merge Two or More Tables:

And then by going into Properties > R CODE in the Object Inspector, I added line 5 below. Notice what happens to the output:

Understanding the magic line of R Code

The R Code looks complicated, but once you break it down, the logic of it isn’t that hard to get your head around. It just looks convoluted. The basic example (which you can use as a template) for a crosstab looks like this:

table[order(table[,column], decreasing = TRUE),]

Note that “table” is the name of the table (data frame or matrix in R lingo) you wish to sort within the R Output and “column” is the column you’re referencing. These two are the key bits you need to adapt.

The first bit to understand is that you can give an array of indexes to R via the square brackets and it will sort the table for you. Let’s say, I had the following which is from a table with a reference name of tabQ3:

The order of indexes of the rows from highest to lowest is 7,1,3,6,2,4,5

We feed that as an array in a table subset (with square brackets). I use the c() combining function to put the numbers together.

table = tabQ3
table[c(7,1,3,6,2,4,5)]

So how then do we get that list of indexes without doing it manually as I did above? With the order() function. The combining function c(7,1,3,6,2,4,5) is the same as writing order(table, decreasing = TRUE). Putting that into the table subset, it then becomes table[order(table, decreasing = TRUE)]. Yes, I know there are brackets within brackets of different types. You need the decreasing = TRUE bit otherwise R will sort in ascending order (which you may want).
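Here is a small self-contained sketch of that equivalence. The named vector is made up (it is not the real tabQ3), with values chosen so that the ordering matches the 7,1,3,6,2,4,5 example.

```r
# Made-up single-column table: a named vector of seven values
table = c(A = 12, B = 7, C = 10, D = 3, E = 1, F = 9, G = 30)

idx = order(table, decreasing = TRUE)   # positions from highest to lowest
idx

by.hand = table[c(7, 1, 3, 6, 2, 4, 5)] # indices typed manually
by.order = table[idx]                   # indices computed by order()
by.order
```

Both by.hand and by.order hold the same values in the same order, which is the whole point: order() saves you from working out the indices yourself.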

The above example is with a single-column table, so it's one dimensional. If you have two dimensions, then you need an extra comma when you reference the table (if that doesn't make sense, then check out this introductory post). The below sorts a crosstab of Preferred Cola (rows) by Age (columns) on the first age category. The first line of the code is simply to store the reference as an object called 'table' within the R Output.

table = table.Q3.Preferred.cola.by.D1.Age
table[order(table[,"18 - 29"], decreasing = TRUE),]

As I mentioned earlier, to someone new at R, line 2 of the code seems convoluted. But hopefully, my step-by-step explanation of subsetting a table by means of an array of indices untangles this for you. Remember, you can source this line of code and adapt it to your context.
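If you want to experiment without a Displayr table to hand, the sketch below applies the same line to a made-up matrix. The brands, age bands and numbers are purely illustrative.

```r
# Made-up crosstab: brands in rows, age bands in columns
table = matrix(c(10, 25, 5, 30, 20, 40), nrow = 3,
               dimnames = list(c("Coca-Cola", "Pepsi", "Fanta"),
                               c("18 - 29", "30 - 49")))

# Reorder the rows by the first age band, highest first
sorted = table[order(table[, "18 - 29"], decreasing = TRUE), ]
sorted
```

The comma after the closing bracket of order() matters: the indices go in the row position, and the empty column position means "keep every column".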

Test yourself: how would you sort the same crosstab above by rows instead? Say by Coca-Cola?

(Answer = table[,order(table["Coca-Cola",], decreasing = TRUE)])
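To check the answer for yourself, here is a minimal sketch on a made-up crosstab with three age bands. The data is hypothetical; only the sorting line comes from the answer above.

```r
# Made-up crosstab: two brands by three age bands
table = matrix(c(10, 25, 30, 20, 15, 40), nrow = 2,
               dimnames = list(c("Coca-Cola", "Pepsi"),
                               c("18 - 29", "30 - 49", "50+")))

# Reorder the columns by the Coca-Cola row, highest first
sorted = table[, order(table["Coca-Cola", ], decreasing = TRUE)]
sorted
```

This time the indices go after the comma, because it is the columns that are being rearranged.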

Have a look for yourself

In this Displayr document, I’ve got the worked examples from above. So you can go in and have a look (and a play!)

How to Use Your Twitter Data in Displayr

Tim Ali — Tue, 08 Jan 2019 05:06:29 +0000

To access data from Twitter, you first need to set up a developer account in order to generate the access tokens and keys you'll need. Once you have these, you plug them into your API calls in Displayr using R code.

Setting up a Twitter developer account

Before you can use the Twitter API, you must first apply for a Twitter Developer account. Once set up, Twitter generates the necessary access tokens. To apply, go to the Twitter Application Management page and click Apply for a developer account. A form then asks you to provide some information describing your specific use case of the Twitter API and other related information. Complete all of the required forms and submit the application.

Once your application has been approved, log in to the Developer Platform, select the drop-down menu from your account name in the upper right-hand corner and select Apps. Next click the Details button for your App and then select the Keys and tokens menu. Here you'll find your consumer API key, consumer API secret, access token and access secret.

All of these are used to authenticate each of the API calls you make.

Setting up Authentication in Displayr

Each individual API call you make from Displayr requires authentication. To do this, we first create an R output by selecting Insert > R Output. We will use the twitteR and ROAuth packages for authentication. These packages have already been installed on the Displayr R server, so we only need to load these libraries by entering the following into the R CODE section of the Object Inspector.

library("twitteR")
library("ROAuth")

Next, we store each of the access tokens and keys in a separate object.

 
consumer_key = 'your_consumer_key' 
consumer_secret = 'your_consumer_secret' 
access_token = 'your_access_token' 
access_secret = 'your_access_secret'

We then use the setup_twitter_oauth() function to authenticate using the above stored token parameters.

 
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

Click the Calculate button and if the authentication is successful, the R output will display "Using direct authentication". This authentication script can now be used for each API call.

Example API calls

Now that we have built a functioning authentication script, we can execute the API calls. There is an extensive Twitter API library containing functions for retrieving user and account information, searching and curating tweets, direct messaging, engagement and media and ad tracking. A complete list of available APIs can be found on the Twitter API Reference page.

One of the most commonly used APIs is the searchTwitter API function, which executes a search of Twitter based on a supplied search string. Note that there are limits as to what can be searched, so this search may not return all possible results. First select Insert > R Output and then enter the following to search for Tweets about the new Bohemian Rhapsody movie.

library("twitteR")
library("ROAuth")

consumer_key = 'your_consumer_key' 
consumer_secret = 'your_consumer_secret' 
access_token = 'your_access_token' 
access_secret = 'your_access_secret'

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

search.string = 'Bohemian Rhapsody movie'
no.of.tweets = 100
tweets = searchTwitter(search.string, n=no.of.tweets, lang="en")
search.df = twListToDF(tweets)

The code starts with the authentication script. Then we supply the search parameters: the string to search for and the number of tweets we want. There are several other parameters that can be passed to the search function, such as date ranges, user location, language, etc. A complete list of available arguments can be found in the twitteR CRAN package documentation. These arguments are then passed to the searchTwitter function, which executes the search. The search results are returned as a list. The last line in the above example converts the list into a more readable data frame object.
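Once the results are in a data frame, ordinary R subsetting and sorting apply. The sketch below uses a tiny mock data frame in place of the real search.df, so it runs without any Twitter credentials. The rows are invented, and I am assuming the column names text, screenName, retweetCount and isRetweet, which is how I understand twListToDF() to label its output; check the names in your own data frame before relying on them.

```r
# Mock stand-in for search.df; real results come from twListToDF(tweets)
search.df = data.frame(
  text = c("Loved it", "RT Loved it", "Great soundtrack"),
  screenName = c("user_a", "user_b", "user_c"),
  retweetCount = c(4, 4, 11),
  isRetweet = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Drop retweets, then list the remaining tweets by retweet count
originals = search.df[!search.df$isRetweet, ]
top = originals[order(originals$retweetCount, decreasing = TRUE),
                c("screenName", "retweetCount")]
top
```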

The userTimeline function is another common API that is used to extract Tweets from a supplied user timeline.

library("twitteR")
library("ROAuth")

consumer_key = 'your_consumer_key' 
consumer_secret = 'your_consumer_secret' 
access_token = 'your_access_token' 
access_secret = 'your_access_secret'

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

tweets.user = userTimeline("displayrr", n=1000, maxID=NULL, sinceID=NULL, includeRts=TRUE)
tweets.user = twListToDF(tweets.user) #converts returned Tweets to data frame
tweets.user

This code extracts the Tweets from the Displayr Twitter timeline and converts the results to a data frame.

The available arguments which can be passed to the userTimeline function can be found in the twitteR CRAN package documentation.

Adding Twitter data to Displayr as an R data set

In the examples above, we created R outputs to display results of the API calls. We then converted the results into a data frame, making it easier to work with in Displayr. You may, alternatively, want to bring the data into Displayr as a data set, i.e. add the data to the Data Sets tree. To do this, select Home > New Data Set to add a new data set (or select Insert > New Data Set). Select the R icon to create an R data set. Enter the R code for your API call and add a name to your data set. In this example, I've used the same API call as above to extract the Displayr Twitter timeline and named the data set "displayr_timeline".

Click OK to run the R code and add the data to the Data Sets tree.

You can now create tables, charts and visualizations from the data set as you would with any other data set loaded into Displayr.

Rate Limiting

Twitter imposes API rate limiting on a per-user (per access token) basis. The rate limits are applied in 15-minute windows. For example, most API calls will be throttled if you exceed 15 calls in a 15-minute window with your access token. The search API is rate limited to 180 calls per 15-minute window. You can refer to the Twitter Developer Rate Limiting page for more details.
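If you plan to make repeated search calls from a loop, it can help to work out how far apart they need to be. This is a back-of-the-envelope sketch using only the figures quoted above; the variable names are mine, and spacing calls evenly is just one simple way to stay under the limit.

```r
window.seconds = 15 * 60   # length of one rate-limit window, in seconds
search.limit = 180         # search calls allowed per window

# Smallest even spacing between calls that stays within the limit
min.gap = window.seconds / search.limit
min.gap

# In a loop you could pause between calls with Sys.sleep(min.gap)
```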

To discover more about all the things you can do in R, check out our "R" guides.

How to Relabel Rows and Columns of Tables using R in Displayr

Matt Steele — Wed, 02 Jan 2019 22:56:13 +0000

The mixing process creates a new table as an R Output. Consider the example table below (an R Output created with the feature Home > Tables > Merge Two or More Tables). A user may want to relabel the column headers.

The thing with R Outputs is that they cannot be manipulated by tools in the Ribbon menu (such as the Rename button). The Data Manipulation tools in the Ribbon menu are designed for use with tables made with drag-and-drop. Outputs made with R code can only be modified with R code. The good news is that modifying R Output is very easy and only requires an additional line (or two) of code.

The purpose of this blog article is to show you how you can easily modify the labels of your tables within R Outputs. I’ll show you how you can do this in two ways:

Respecifying all the labels at once (manual)
Renaming individual single row/column labels (manual)

I’ll work through these two cases in the below. Key to this are the functions rownames() and colnames() respectively, which I explain via the worked example. There are other more advanced and automatic ways to do relabeling as well, which I allude to at the end of this post.

Setting the labels of all the rows and columns

Manually respecifying all the rows and column labels can be done easily with the template line of code below:

colnames(table) = c("label1", "label2", "label3")

In English, table is the name of the table you wish to change, and the quoted labels are the new column names, in order. They are being combined using the combine function c(). When doing it this way (manually setting them all at once) you will need to supply exactly as many labels as the table has columns, else the output will throw an error. Beyond that, you may have any number of labels.

We can use this code to quickly modify the column labels in the example R Output shown earlier. The current headers look a bit messy because they are actually the names of the three source tables that were merged. You may wish to tidy these to be: “Awareness”, “Affinity”, “Main”. You can do this easily by adding the following line of code within the R Output:

colnames(merged) = c("Awareness", "Affinity", "Main")

In this case, I’ve edited an existing R output that had been set up (from Home > Tables > Merge Two or More Tables). Lines 1 to 3 were already set up within the R Output (which you can access via Object Inspector > Properties > R CODE). On line 3, the code is storing the new table as an object called ‘merged’. So using that name, I’ve added in line 5 which sets the column names of merged to be the new (tidy) names.

Note: you need line 6 to return the final (modified) table. Line 5 on its own simply does the relabeling to the table (but doesn’t produce the table in the R Output). Line 6 makes the R Output show the final table (with the relabeling all done by line 5).
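For a version you can run anywhere, the sketch below relabels a made-up stand-in for the merged table and then shows what happens when the number of labels is wrong. The matrix contents and the messy starting labels are hypothetical, and the tryCatch() wrapper is only there so the example keeps running after the error.

```r
# Made-up stand-in for the merged table, with messy column names
merged = matrix(c(90, 80, 60, 40, 30, 10), nrow = 2,
                dimnames = list(c("Coca-Cola", "Pepsi"),
                                c("table.Awareness", "table.Affinity",
                                  "table.Main")))

# Respecify all three column labels at once
colnames(merged) = c("Awareness", "Affinity", "Main")
merged

# Two labels for three columns: R refuses and raises an error
result = tryCatch({
  colnames(merged) = c("Awareness", "Affinity")
  "no error"
}, error = function(e) "error")
result
```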

Renaming individual single row/column labels (manual)

You can also just change one of the row or column headers, without having to respecify the whole lot. This makes it handy for tweaking a table (e.g. for correcting a spelling error). The code is a little more convoluted, but it is again just a single line that you can easily adapt from the below.

rownames(table)[rownames(table) == "old label"] = "new label"

It’s essentially the same line of code in the first example, but it looks more complicated because of the bit in the square brackets. If you recall from "Simple Table Manipulations with R in Displayr", the square brackets subset the table. In other words, the subsetting is to specify which label is to be relabeled. All you need to do is borrow the above and swap in your own table name, old label and new label! I’ve done that by extending the previous example, adding lines 8 and 9 to the code:

rownames(merged)[rownames(merged) == "Coca-Cola"] = "Coke"
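Here is a self-contained sketch of renaming a single row label, again on a made-up stand-in for the merged table. Note that rownames() appears on both sides of the line: once to say which set of labels is being changed, and once inside the brackets to pick out the label to replace.

```r
# Made-up stand-in for the merged table
merged = matrix(c(90, 80, 60, 40), nrow = 2,
                dimnames = list(c("Coca-Cola", "Pepsi"),
                                c("Awareness", "Main")))

# Replace only the row label that currently reads "Coca-Cola"
rownames(merged)[rownames(merged) == "Coca-Cola"] = "Coke"

# Return the modified table
merged
```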

Further automatic ways to relabel your tables

Here is a preview of some other ways you can go about automatically replacing labels in R Outputs. I won’t explain these in detail here, as I’ll save this content for another blog post.

In the first example in this post, you had to specify all the new row and column headers at once. And you had to specify the exact number. But, there is a simple and neat way you can get the row/column names to refer to another table. This means you don’t have to write out all the new headers and it automatically updates: colnames(table) = colnames(reference_table)
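A minimal sketch of borrowing labels from another table, using two made-up matrices. As with setting labels by hand, this assumes both tables have the same number of columns.

```r
# Made-up reference table that already has tidy column labels
reference_table = matrix(0, nrow = 1, ncol = 3,
                         dimnames = list("n",
                                         c("Awareness", "Affinity", "Main")))

# Made-up table with no column labels yet
table = matrix(1:6, nrow = 2)

# Copy the labels across instead of typing them out
colnames(table) = colnames(reference_table)
table
```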

Here's another method. Have you ever used the Find/Replace feature in Excel? There are equivalent ways to do this in R as well. You can get the code to scan through the labels (or values) of a table made within an R Output to find and replace text. This can be useful to clean up messy text. Examples include the sub() and gsub() functions. Again, I won’t go into detail here how they work; I just wanted to point out it’s possible! Stay tuned for further posts where we show this via example.
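As a small taste in the meantime, the sketch below cleans up some made-up labels. The difference between the two functions is that sub() replaces only the first match in each label, while gsub() replaces every match. The fixed = TRUE argument tells R to treat the search text literally rather than as a regular expression.

```r
# Made-up table with messy row and column labels
merged = matrix(1:4, nrow = 2,
                dimnames = list(c("Coca_Cola_Zero", "Diet_Pepsi"),
                                c("table.Awareness", "table.Main")))

# sub() swaps only the first underscore in each label
first.only = sub("_", " ", rownames(merged))
first.only

# gsub() swaps every underscore
rownames(merged) = gsub("_", " ", rownames(merged))

# Strip the literal prefix "table." from the column labels
colnames(merged) = sub("table.", "", colnames(merged), fixed = TRUE)
merged
```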

Try it yourself

You can find the worked example above in this Displayr document.

How to Add Color-Changing Messages to your Page in Displayr

Chris Facer — Wed, 02 Jan 2019 22:23:26 +0000

Displayr documents are dynamic. This means that the data in your document can be updated automatically on a regular basis, and the people who view your published document can interact with it using filters and custom menus (like the ones here on this dashboard). In some cases, you might want to show people messages about the data when they are interacting with it. For instance, market researchers are usually conscious of the sample size that has been used to compute their statistics - in addition to showing it on their dashboard page (as done here) - they may also like to present their viewers with a warning message if the sample on the page drops below a certain level.

An R Output can be used to generate text on a page which updates with the data. However, the formatting remains fixed. With the rhtmlMetro package in R, you can generate text whose format also changes with the data. It takes a little more effort, but the results are much cooler.

Click here for an interactive tutorial on creating dynamic text

Example

Here, I will consider a really basic example. I have a small study on people's attitudes to technology brands. I have built a one-page dashboard showing the average satisfaction scores for each of the brands in my study. It looks like this:

I've set the visualization to sort automatically when the page is filtered. The sample size description at the bottom of the page will also update with any filters. This was created with Insert > More > Data > Sample Size Description. To read more about this feature, see "How to Display the Sample Size on an Online Dashboard."

Check out the published version of my document here.

What you don't see in the screenshot above is the hidden warning message that appears when the sample on the page drops below n=30. If you go into the interactive version and use the Filters menu in the top right to apply a filter for people who have an income Less than $15,000, you will see it change. It will look like this:

I now know that I need to be careful when thinking about or reporting on these figures, and the page won't let me forget it!

How does it work?

I set this up using R and HTML. I'm no wizard with either (especially not HTML!), but it's actually fairly straightforward.

To create the output:

Select Insert > R Output.
Enter the code below.
Modify the second line to incorporate variables which define the sample (more on this below).

The code for this is:

min.sample = 30 # Set the threshold for small sample
current.sample = length(which(QFilter & !is.na(Q3_01))) # Obtain the current filtered sample
contents = ifelse(current.sample < min.sample, "Warning! Low base size, exercise caution.", "") #Change the text of the message
bgcolor = ifelse(current.sample < min.sample, "#42afe3", "#FFFFFF") #Change the background color of the message
opacity = ifelse(current.sample < min.sample, "1", "0") #Change the opacity of the message
textcolor = "#FFFFFF" #Set the font color

#Build the HTML for the text
your.html = paste('', contents, '', sep="")

# Render the HTML
rhtmlMetro::Box(text = your.html, text.as.html = TRUE)

The Code walk-through:

In the code above, I have set the warning to appear when the sample size for the page is less than 30. To work out the sample size I check QFilter, which is a special property in Displayr which returns a vector of TRUE/FALSE values indicating which cases in the data set are included in the filter. I also check the values of the first satisfaction variable, called Q3_01. If this variable has a missing value (denoted NA or NaN in the R Code), then those cases are also not counted in the sample. I used the function is.na() to work this out. Counting is done by using the which() function to identify the cases which satisfy the two conditions, and using the length() function to count them.
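To see the counting step in isolation, here is a sketch with made-up stand-ins for QFilter and Q3_01 (in Displayr these come from your document, so you would not define them yourself). It also shows sum() as a shorter alternative, which works because TRUE counts as 1 and FALSE as 0.

```r
# Made-up stand-ins: five respondents
QFilter = c(TRUE, TRUE, FALSE, TRUE, TRUE)   # who passes the page filter
Q3_01 = c(5, NA, 4, 3, NaN)                  # satisfaction scores

# Cases that pass the filter AND have a non-missing score
current.sample = length(which(QFilter & !is.na(Q3_01)))
current.sample

# Same count, written more compactly
sum(QFilter & !is.na(Q3_01))
```

Note that is.na() returns TRUE for NaN as well as NA, so both kinds of missing value are excluded.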

Next, I use the ifelse() function to conditionally choose the color, contents, and opacity of the message box, based on the sample size I had counted previously. This function takes three arguments. The first is the logical condition you want to evaluate. In this case I am considering whether the number of valid cases on my page is less than the minimum sample size of 30. The second argument is the value to return when the condition is true, and the third argument is the value to use when the condition is false.
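The sketch below wraps the three ifelse() calls in a small helper so you can see both outcomes side by side. The function box.settings() is hypothetical and exists only for this illustration; the texts and colors are the ones from the code above.

```r
min.sample = 30   # threshold for a small sample

# Hypothetical helper: returns the message settings for a given sample size
box.settings = function(current.sample) {
  low = current.sample < min.sample
  c(contents = ifelse(low, "Warning! Low base size, exercise caution.", ""),
    bgcolor = ifelse(low, "#42afe3", "#FFFFFF"),
    opacity = ifelse(low, "1", "0"))
}

box.settings(22)    # below the threshold: warning text, visible box
box.settings(120)   # at or above the threshold: empty text, invisible box
```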

The paste() function combines all of the options I have determined based on the sample size into a string which defines the HTML code for my message. In this case, my HTML creates a CSS style called div.mystyle which contains all of the properties I have set, including background color, font color, padding, and opacity. You can do much more with CSS. Whenever I want to do something I just search Google and work out which property I need to set. Once the style is created, the HTML simply contains a body with a single div, and uses the class "mystyle" to apply the properties that I have specified. If you want to see how the HTML looks, just comment out the final line of the code above (put a # in front of it) and re-calculate your output.
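The paste() call in the listing above does not show any markup, so here is a minimal sketch of the kind of string this walkthrough describes: a div.mystyle rule carrying the four properties, and a body with a single div that uses it. The exact tags, the padding value and the way the CSS is laid out are my assumptions, not the original code. The variable values are fixed to the low-sample case so the block runs on its own.

```r
# Values as they would be when the sample is below the threshold
contents = "Warning! Low base size, exercise caution."
bgcolor = "#42afe3"
opacity = "1"
textcolor = "#FFFFFF"

# Assumed markup: a style block plus one div that uses the style
your.html = paste('<!DOCTYPE html><html><head><style>',
                  'div.mystyle { ',
                  'background-color: ', bgcolor, '; ',
                  'color: ', textcolor, '; ',
                  'opacity: ', opacity, '; ',
                  'padding: 10px; }',          # padding value is a guess
                  '</style></head><body>',
                  '<div class="mystyle">', contents, '</div>',
                  '</body></html>', sep = "")
your.html
```

A string built this way would then be handed to rhtmlMetro::Box() with text.as.html = TRUE, as in the final line of the original code.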

Finally, the Box() function from the rhtmlMetro package is used to render the HTML on the page.

Find out how to do more in Displayr by heading over to "Using Displayr".

How to Blank and Cap Cells of Tables Using R in Displayr

Matt Steele — Wed, 02 Jan 2019 00:20:46 +0000

There are various reasons why you might want to blank and cap cells of tables. You might want to make your table clearer to read by removing some of the small values (which you might consider ‘noise'). Perhaps, you want to ‘cap’ numbers over a certain amount. In this post, I explain how you can automatically modify the contents of tables in Displayr. In doing so, we give you a template for some simple R code that you can leverage.

I assume you are familiar with the content in this introductory post on R. This post serves as a part of the training program for those wanting to learn basic and practical R.

A general piece of code to modify cells of table

In Simple Table Manipulations with R Using Displayr, we covered the concept of table sub-setting. Within the square brackets [], you specify the parts of the table you want to extract (i.e. rows and columns).

Now, suppose that instead of specifying a list of row/column indices within the brackets, you specify a condition. A condition, for example, might be table < x which means “all the cells in the table which have a value less than x”. Wherever that evaluates to TRUE, the cell is selected, so we are now working with only a subset of the table. You then ‘set’ that subset to be equal to a new value (using the equals sign).

table[condition] = value

So in the above general piece of R code, the table is the name of the table you are specifying. It can either be:

another table in the document (in which case it will need to be highlighted blue)
a matrix or data frame earlier on within the same R Output (in which case it won’t be highlighted blue)

Note: In either case, you need to put in an extra line of code, which is just ‘table’. That returns the final table with the substituted values (and not just the value). This line is included as the last line of code in the examples below.

How to blank cells with small values

Consider the table below, which is a grid question with lots of numbers.

The table has the name (tab.Q5) in the document. With the following code in an R Output (Insert > R Output), all the cells with a value under 50 are blanked. In the language of R, NA means blank (or missing values). By the way, this is slightly different from JavaScript, which uses NaN. Also note, you don’t necessarily need the first line; I just include it to make line 2 look neater. I could equally have written: tab.Q5[tab.Q5 < 50] = NA

table = tab.Q5
table[table < 50] = NA
table

The result of the code is below. In a separate table (as an R Output) we now have the table from before with certain cells blanked. If you put 0 (zero) instead of NA in the code above, it would have made them all zero.
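The sketch below runs both variants, NA and zero, on a small made-up matrix so you can compare them. The labels and numbers are illustrative only.

```r
# Made-up stand-in for tab.Q5
table = matrix(c(12, 64, 50, 49, 88, 3), nrow = 2,
               dimnames = list(c("Brand A", "Brand B"),
                               c("Fun", "Modern", "Trusted")))

# Variant 1: blank every cell under 50
blanked = table
blanked[blanked < 50] = NA
blanked

# Variant 2: set every cell under 50 to zero instead
zeroed = table
zeroed[zeroed < 50] = 0
zeroed
```

The cell holding exactly 50 is left alone in both variants, because the condition is strictly "less than".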

How to cap cells in a table

Here’s another example. Say you have a calculation that’s come about and you need to cap the values in a table. In the example below, some cells are estimated to be over 100%, but you want to cap it at 100.

This table was created as a Multiway Table (i.e. via R) using Insert > More > Tables > Multiway Table. It's actually already an R Output, so therefore you don’t need to make a new R Output to modify it. In this case, you can add a couple of lines of code to the existing output.

Just go into the Properties > R CODE of the Object Inspector for the multiway table and tweak it, as I have below (on the right). I’ve just added two lines of code on lines 11 and 12. The key here is identifying that all the calculations from line 2 through 9 are being stored in an object called multiway on line 2.
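Capping uses the same table[condition] = value pattern with the comparison flipped. This is a minimal sketch on a made-up matrix named multiway. I am assuming the two added lines look like the last two lines here; the real multiway table is built by Displayr's own code, which is not reproduced.

```r
# Made-up stand-in for the multiway table, with some values over 100
multiway = matrix(c(85, 104, 100, 112), nrow = 2,
                  dimnames = list(c("Male", "Female"),
                                  c("Under 40", "40 or more")))

# Cap anything above 100 at exactly 100
multiway[multiway > 100] = 100

# Return the capped table
multiway
```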

Try for yourself

The above two examples are stored in this example Displayr document.


