The tests were carried out in Location1 on 29/05/2014. Participants from Location1 were placed in Group A. Of the 18 participants, 13 handed in their questionnaires and analyses (12 males, 1 female). All participants had previously played the game “MarketPlace” during the 2 months of the course.
In Location2, the study was conducted over two weeks as part of an undergraduate course in Industrial Management. In total, 15 participants took part (13 males, 2 females), divided into two groups (B and C).
A third study was conducted online, using a LimeSurvey questionnaire and a Google Docs implementation of the analysis templates for both LM-GM and ATMSG. Participants were asked to evaluate the same game (“Senior PM Game”) using both models. Group D used ATMSG first, while Group E used LM-GM. It was possible to stop the survey in the middle and return to it at a later time.
We recruited only participants who self-identified as very familiar with serious games or as serious games experts. They were given a 20-euro voucher as compensation for their time.
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 32 | 23.34 | 4.783 | 23 | 22.46 | 1.483 | 19 | 44 | 25 | 2.794 | 8.801 | 0.8455 |
Game familiarity | A | B | C | D | E | Sum |
---|---|---|---|---|---|---|
Non-Gamer | 4 | 3 | 7 | 0 | 0 | 14 |
Gamer | 9 | 3 | 2 | 2 | 2 | 18 |
Sum | 13 | 6 | 9 | 2 | 2 | 32 |
Serious games familiarity | A | B | C | D | E | Sum |
---|---|---|---|---|---|---|
Non-expert | 12 | 6 | 9 | 0 | 0 | 27 |
SGExpert | 1 | 0 | 0 | 2 | 2 | 5 |
Sum | 13 | 6 | 9 | 2 | 2 | 32 |
The tables below show how many participants used which model to evaluate which game and the number of participants in each group.
Note that 3 participants did not deliver their analysis of Senior PM Game (two in Group B and one in Group C).
Game evaluated with ATMSG | A | B | C | D | E | Sum |
---|---|---|---|---|---|---|
MarketPlace | 13 | 0 | 0 | 0 | 0 | 13 |
Senior PM Game | 0 | 4 | 0 | 2 | 2 | 8 |
Vikings | 0 | 0 | 9 | 0 | 0 | 9 |
Sum | 13 | 4 | 9 | 2 | 2 | 30 |
Game evaluated with LM-GM | A | B | C | D | E | Sum |
---|---|---|---|---|---|---|
MarketPlace | 0 | 0 | 0 | 0 | 0 | 0 |
Senior PM Game | 0 | 0 | 8 | 2 | 2 | 12 |
Vikings | 0 | 6 | 0 | 0 | 0 | 6 |
Sum | 0 | 6 | 8 | 2 | 2 | 18 |
This section evaluates the SUS score given to each model by the participants.
For details on the SUS scale and how the SUS score is calculated, see Annex I. For a visualization of responses to each of the 10 questions of the SUS questionnaire, see Annex II.
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 30 58.83 17.5 60 58.85 14.83 22.5 95 72.5 -0.1 -0.59 3.2
## group: Location1
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 13 58.85 12.15 62.5 59.77 11.12 35 72.5 37.5 -0.71 -0.81
## se
## 1 3.37
## --------------------------------------------------------
## group: Location2
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 13 55 22.31 52.5 54.32 25.95 22.5 95 72.5 0.35 -1.13
## se
## 1 6.19
## --------------------------------------------------------
## group: Online
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 4 71.25 10.51 72.5 71.25 9.27 57.5 82.5 25 -0.24 -1.93
## se
## 1 5.25
## group: Non-Gamer
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 13 47.5 15.1 47.5 47.73 18.53 22.5 70 47.5 0.01 -1.33 4.19
## --------------------------------------------------------
## group: Gamer
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 17 67.5 14.14 70 67.67 11.12 37.5 95 57.5 0.01 -0.33
## se
## 1 3.43
## group: Non-expert
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 25 56.7 18.04 55 56.19 22.24 22.5 95 72.5 0.1 -0.62
## se
## 1 3.61
## --------------------------------------------------------
## group: SGExpert
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 5 69.5 9.91 70 69.5 11.12 57.5 82.5 25 0.06 -1.91 4.43
Considering only the ATMSG SUS scores, we wanted to know whether there were any differences between the groups depending on three factors: the game played, the participant's familiarity with games, and the participant's familiarity with serious games.
For that purpose, we performed an ANOVA on the SUS scores for ATMSG, with these three factors (familiarity with games, familiarity with serious games, and the game played) as conditions.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.32 0.57
## 28
Effect | num Df | den Df | MSE | F | ges | Pr(>F) |
---|---|---|---|---|---|---|
gamefam | 1 | 25 | 230.9 | 10.1255 | 0.2883 | 0.0039 |
sgsfam | 1 | 25 | 230.9 | 0.0305 | 0.0012 | 0.8628 |
game | 2 | 25 | 230.9 | 0.2964 | 0.0232 | 0.7461 |
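As a rough consistency check, the `ges` column of the table above can be recomputed from the other columns. The sketch below assumes that, for this purely between-subjects table, `ges` equals SS_effect / (SS_effect + SS_error), with SS_effect = F × num Df × MSE and SS_error = den Df × MSE. That assumption reproduces the reported values here, but it is not taken from the original analysis script. The function name `effect_size_from_f` is hypothetical.

```python
# Rows copied from the ANOVA table above: (effect, num Df, den Df, F).
ANOVA_ROWS = [
    ("gamefam", 1, 25, 10.1255),
    ("sgsfam", 1, 25, 0.0305),
    ("game", 2, 25, 0.2964),
]


def effect_size_from_f(f_value, df_num, df_den):
    """Return SS_effect / (SS_effect + SS_error).

    SS_effect = F * df_num * MSE and SS_error = df_den * MSE,
    so the MSE cancels out of the ratio.
    """
    return (f_value * df_num) / (f_value * df_num + df_den)


# Recompute the effect size for every row, rounded like the table.
ges_check = {
    name: round(effect_size_from_f(f_value, df_num, df_den), 4)
    for name, df_num, df_den, f_value in ANOVA_ROWS
}

for name, value in ges_check.items():
    print(f"{name}: {value}")
```

Under this assumption the recomputed values agree with the table to four decimal places, which suggests the reported F, degrees of freedom and effect sizes are mutually consistent.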
## Tables of means
## Grand mean
##
## 58.83
##
## gamefam
## Non-Gamer Gamer
## 47.5 67.5
## rep 13.0 17.0
##
## sgsfam
## Non-expert SGExpert
## 58.43 60.83
## rep 25.00 5.00
##
## game
## MarketPlace Senior PM Game Vikings
## 56.69 59.6 61.24
## rep 13.00 8.0 9.00
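The tables of means above can be cross-checked against the grand mean: for each factor, the group means weighted by their `rep` counts should average back to roughly 58.83. The sketch below is only an arithmetic check on the printed (rounded) values; the names `GROUP_MEANS` and `weighted_mean` are illustrative.

```python
# (mean, rep) pairs copied from the "Tables of means" output above.
GROUP_MEANS = {
    "gamefam": [(47.5, 13), (67.5, 17)],
    "sgsfam": [(58.43, 25), (60.83, 5)],
    "game": [(56.69, 13), (59.6, 8), (61.24, 9)],
}


def weighted_mean(pairs):
    """Average the group means, weighting each by its number of participants."""
    total_n = sum(n for _, n in pairs)
    return sum(mean * n for mean, n in pairs) / total_n


grand_means = {factor: weighted_mean(pairs) for factor, pairs in GROUP_MEANS.items()}

for factor, value in grand_means.items():
    print(f"{factor}: {value:.2f}")
```

Each factor recovers the grand mean up to the rounding of the printed group means.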
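The ANOVA found a significant main effect of familiarity with games (F(1, 25) = 10.13, p = 0.0039); neither familiarity with serious games nor the game played had a significant effect. We performed a pairwise post-hoc Tukey test to identify which groups differ.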
## group: Non-Gamer
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 13 47.5 15.1 47.5 47.73 18.53 22.5 70 47.5 0.01 -1.33 4.19
## --------------------------------------------------------
## group: Gamer
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 17 67.5 14.14 70 67.67 11.12 37.5 95 57.5 0.01 -0.33
## se
## 1 3.43
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = ATMSG_data_sus_anova$lm)
##
## $gamefam
## diff lwr upr p adj
## Gamer-Non-Gamer 20 8.47 31.53 0.0015
Conclusion: We reject the null hypothesis that there is no difference between the SUS scores given by students who self-identified as having medium familiarity with digital games (“I’ve played digital games a few times”) and those who stated that they have high familiarity with digital games (“I play digital games frequently/I’m a gamer.”). On average, gamers rated ATMSG 20 points higher than non-gamers (95% CI 8.47 to 31.53, adjusted p = 0.0015).
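With only two levels, a Tukey interval reduces to an ordinary t interval built on the residual mean square of the ANOVA. The sketch below reproduces the `lwr` and `upr` bounds of the Tukey output above from the group means, the group sizes and the MSE of 230.9. The critical value 2.0595 (two-sided 95% t quantile for 25 degrees of freedom) is hard-coded from a standard t table and is an assumption of this example, not something printed in the report.

```python
import math

MSE = 230.9                       # residual mean square from the ANOVA table
N_NON_GAMER, N_GAMER = 13, 17     # group sizes
MEAN_NON_GAMER, MEAN_GAMER = 47.5, 67.5
T_CRIT_25_DF = 2.0595             # assumption: t quantile (0.975, 25 df) from a table

# Difference between the two group means.
diff = MEAN_GAMER - MEAN_NON_GAMER

# Standard error of the difference, using the pooled residual variance.
se_diff = math.sqrt(MSE * (1 / N_NON_GAMER + 1 / N_GAMER))

# 95% confidence interval for the difference.
lower = diff - T_CRIT_25_DF * se_diff
upper = diff + T_CRIT_25_DF * se_diff

print(f"diff = {diff:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

The interval excludes zero, which is the same information the adjusted p-value conveys.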
The same analysis of SUS scores was performed for LM-GM. Group A is not included in these scores, because it evaluated its game using ATMSG only.
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 18 60.69 14.57 62.5 60.31 9.27 35 92.5 57.5 -0.24 -0.04
## se
## 1 3.43
## group: Location2
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 14 59.82 11.87 62.5 60.62 5.56 35 75 40 -1.04 0.04
## se
## 1 3.17
## --------------------------------------------------------
## group: Online
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 4 63.75 24.02 63.75 63.75 25.95 35 92.5 57.5 0 -1.97
## se
## 1 12.01
## group: Non-Gamer
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 9 59.17 10.53 62.5 59.17 3.71 35 72.5 37.5 -1.09 0.39
## se
## 1 3.51
## --------------------------------------------------------
## group: Gamer
## vars n mean sd median trimmed mad min max range skew kurtosis se
## 1 1 9 62.22 18.3 65 62.22 11.12 35 92.5 57.5 -0.18 -1.05 6.1
## group: Non-expert
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 14 59.82 11.87 62.5 60.62 5.56 35 75 40 -1.04 0.04
## se
## 1 3.17
## --------------------------------------------------------
## group: SGExpert
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 4 63.75 24.02 63.75 63.75 25.95 35 92.5 57.5 0 -1.97
## se
## 1 12.01
We also performed an ANOVA on the LM-GM SUS scores to identify any differences due to the conditions (game familiarity, game played, serious games familiarity).
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 1.44 0.25
## 16
Effect | num Df | den Df | MSE | F | ges | Pr(>F) |
---|---|---|---|---|---|---|
gamefam | 1 | 14 | 252.6 | 0.0186 | 0.0013 | 0.8936 |
sgsfam | 1 | 14 | 252.6 | 0.1162 | 0.0082 | 0.7383 |
game | 1 | 14 | 252.6 | 0.0606 | 0.0043 | 0.8091 |
## Tables of means
## Grand mean
##
## 60.69
##
## gamefam
## Non-Gamer Gamer
## 59.17 62.22
## rep 9.00 9.00
##
## sgsfam
## Non-expert SGExpert
## 60.26 62.22
## rep 14.00 4.00
##
## game
## Senior PM Game Vikings
## 60.11 61.86
## rep 12.00 6.00
Conclusion: we cannot reject the null hypothesis. For LM-GM, none of the three factors shows a significant difference in SUS scores between the participant subgroups (all p > 0.7).
Are the SUS scores from ATMSG and LM-GM significantly different?
Here we use scores from participants who evaluated both models. We also check if there is any difference between gamers and non-gamers (game familiarity) and SG experts and non-experts.
Our null hypotheses:
H03 would also be of interest, but it cannot be tested with these data, since this is an observational study with unbalanced data (see the footnote at the end of this document).
First, we have a look at the box plots and the interaction plots of the data.
The interaction plot above shows the different scores that participants of different familiarity with games gave to each of the models.
We then performed an ANOVA to test for differences between the conditions, with participants split by familiarity with games: Non-Gamer (scores 1-3) and Gamer (scores 4-5).
Our variables:
## model gamefam sus_score.mean sus_score.length
## 1 ATMSG Non-Gamer 42.81 8
## 2 ATMSG Gamer 74.38 8
## 3 LMGM Non-Gamer 57.50 8
## 4 LMGM Gamer 65.62 8
## model sus_score.mean sus_score.length
## 1 ATMSG 58.59 16
## 2 LMGM 61.56 16
## gamefam sus_score.mean sus_score.length
## 1 Non-Gamer 50.16 16
## 2 Gamer 70.00 16
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0 0.96
## 30
Type II ANOVA
Effect | num Df | den Df | MSE | F | ges | Pr(>F) |
---|---|---|---|---|---|---|
gamefam | 1 | 14 | 211.8 | 14.8733 | 0.3183 | 0.0017 |
model | 1 | 14 | 191.7 | 0.3678 | 0.0071 | 0.5539 |
gamefam:model | 1 | 14 | 191.7 | 5.7306 | 0.1110 | 0.0312 |
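To make the `gamefam:model` row easier to read, the cell means printed earlier can be turned into marginal means and per-model gaps between gamers and non-gamers. This is a purely descriptive sketch based on the rounded cell means (8 participants per cell, so plain averages suffice); it does not add any inference, and the variable names are illustrative.

```python
# Cell means copied from the summary output above (8 participants per cell).
CELL_MEANS = {
    ("ATMSG", "Non-Gamer"): 42.81,
    ("ATMSG", "Gamer"): 74.38,
    ("LMGM", "Non-Gamer"): 57.50,
    ("LMGM", "Gamer"): 65.62,
}

MODELS = ("ATMSG", "LMGM")
GROUPS = ("Non-Gamer", "Gamer")

# Marginal means: cells are equally sized, so an unweighted average is enough.
model_means = {
    m: sum(CELL_MEANS[(m, g)] for g in GROUPS) / len(GROUPS) for m in MODELS
}
group_means = {
    g: sum(CELL_MEANS[(m, g)] for m in MODELS) / len(MODELS) for g in GROUPS
}

# Gap between gamers and non-gamers within each model.
gamer_gap = {
    m: CELL_MEANS[(m, "Gamer")] - CELL_MEANS[(m, "Non-Gamer")] for m in MODELS
}

print(model_means)
print(group_means)
print(gamer_gap)
```

The gap between gamers and non-gamers is about 31.6 points for ATMSG and about 8.1 points for LM-GM, which is the pattern the interaction term summarizes.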
Interpretations:
We collected the participants’ comments on the questionnaires and coded the answers from both groups. The table below shows the number of participants in each group, split by familiarity with games.
Group | Non-Gamer | Gamer | Sum |
---|---|---|---|
A | 1 | 6 | 7 |
B | 3 | 3 | 6 |
C | 6 | 2 | 8 |
D | 0 | 2 | 2 |
E | 0 | 2 | 2 |
Sum | 10 | 15 | 25 |
The table below shows how many participants evaluated each model.
Model | Location1 | Location2 | Online | Sum |
---|---|---|---|---|
ATMSG | 13 | 13 | 4 | 30 |
LMGM | 0 | 14 | 4 | 18 |
The table below shows how often each comment was made (across all three studies), split by familiarity with games. Repeated comments made by the same participant were counted only once.
Comment code | Non-Gamer | Gamer | Sum |
---|---|---|---|
ATMSGHelpful | 6 | 7 | 13 |
ATMSGMoreDetailed | 7 | 6 | 13 |
LMGMHelpful | 4 | 6 | 10 |
LMGMSimpler | 6 | 1 | 7 |
ATMSGNeedsSimplifying | 5 | 1 | 6 |
ATMSGNeedsExplanation | 2 | 2 | 4 |
ExplanationsNotGood | 1 | 3 | 4 |
ATMSGFillsObjective | 0 | 3 | 3 |
ATMSGHadSampleAnalysis | 1 | 2 | 3 |
ATMSGHard | 1 | 2 | 3 |
LMGMEasier | 1 | 2 | 3 |
LMGMMoreFocused | 1 | 2 | 3 |
LMGMNeedsExample | 3 | 0 | 3 |
ATMSGBetterDiagram | 0 | 2 | 2 |
ATMSGBetterUnderstanding | 1 | 1 | 2 |
ATMSGHigherLearningCurve | 1 | 1 | 2 |
LMGMHard | 0 | 2 | 2 |
LMGMHardGameMap | 0 | 2 | 2 |
LMGMInvitesMoreThinking | 1 | 1 | 2 |
LMGMLowerLearningCurve | 1 | 1 | 2 |
ATMSGBetterTaxonomy | 0 | 1 | 1 |
ATMSGClearer | 1 | 0 | 1 |
ATMSGEasy | 0 | 1 | 1 |
ATMSGHasTargetGroup | 0 | 1 | 1 |
ATMSGNotSoHelpful | 0 | 1 | 1 |
ATMSGRepetitive | 1 | 0 | 1 |
ATMSGTaxonomyNeedsRevision | 0 | 1 | 1 |
ATMSGTooDetailed | 0 | 1 | 1 |
BothModelsSimilar | 1 | 0 | 1 |
DifficultGame | 1 | 0 | 1 |
LMGMClearer | 1 | 0 | 1 |
LMGMMoreGraphical | 0 | 1 | 1 |
LMGMMoreSuperficial | 0 | 1 | 1 |
LMGMNeedsExplanation | 0 | 1 | 1 |
LMGMNeedsMoreDetails | 1 | 0 | 1 |
LMGMNotSoHelpful | 0 | 1 | 1 |
LMGMNotUnderstood | 0 | 1 | 1 |
LMGMSomewhatHelpful | 1 | 0 | 1 |
LMGMTaxonomyNeedsRevision | 0 | 1 | 1 |
SecondModelNoAddedInsights | 1 | 0 | 1 |
From the table above, and counting the number of participants who made comments, we reached the conclusions below. Multiple answers from the same participants were counted just once.
[1] 15
[1] 13
To generate the table below, we used only participants who evaluated both models. The table again shows how often each comment was made, in descending order of frequency and split by familiarity with games.
Comment code | Non-Gamer | Gamer | Sum |
---|---|---|---|
ATMSGMoreDetailed | 7 | 6 | 13 |
ATMSGHelpful | 5 | 4 | 9 |
LMGMHelpful | 3 | 6 | 9 |
LMGMSimpler | 6 | 1 | 7 |
ATMSGNeedsExplanation | 2 | 2 | 4 |
ATMSGNeedsSimplifying | 4 | 0 | 4 |
ExplanationsNotGood | 1 | 3 | 4 |
ATMSGHadSampleAnalysis | 1 | 2 | 3 |
ATMSGHard | 1 | 2 | 3 |
LMGMEasier | 1 | 2 | 3 |
LMGMMoreFocused | 1 | 2 | 3 |
LMGMNeedsExample | 3 | 0 | 3 |
ATMSGBetterDiagram | 0 | 2 | 2 |
ATMSGBetterUnderstanding | 1 | 1 | 2 |
ATMSGHigherLearningCurve | 1 | 1 | 2 |
LMGMHard | 0 | 2 | 2 |
LMGMHardGameMap | 0 | 2 | 2 |
LMGMInvitesMoreThinking | 1 | 1 | 2 |
LMGMLowerLearningCurve | 1 | 1 | 2 |
ATMSGBetterTaxonomy | 0 | 1 | 1 |
ATMSGClearer | 1 | 0 | 1 |
ATMSGEasy | 0 | 1 | 1 |
ATMSGFillsObjective | 0 | 1 | 1 |
ATMSGHasTargetGroup | 0 | 1 | 1 |
ATMSGRepetitive | 1 | 0 | 1 |
ATMSGTooDetailed | 0 | 1 | 1 |
BothModelsSimilar | 1 | 0 | 1 |
DifficultGame | 1 | 0 | 1 |
LMGMClearer | 1 | 0 | 1 |
LMGMMoreGraphical | 0 | 1 | 1 |
LMGMMoreSuperficial | 0 | 1 | 1 |
LMGMNotSoHelpful | 0 | 1 | 1 |
LMGMSomewhatHelpful | 1 | 0 | 1 |
LMGMTaxonomyNeedsRevision | 0 | 1 | 1 |
SecondModelNoAddedInsights | 1 | 0 | 1 |
The system usability scale (SUS) is a simple, ten-item attitude Likert scale giving a global view of subjective assessments of usability. It was developed by John Brooke at Digital Equipment Corporation in the UK in 1986 as a tool to be used in usability engineering of electronic office systems (Brooke, 1996).
The SUS yields a single score on a scale of 0-100, obtained by converting all the individual measurements to a scale from 0 to 4 (subtracting the user responses from 5 in the even-numbered items, and subtracting 1 from the user’s response for the odd-numbered items), adding them up and multiplying the total by 2.5.
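The scoring rule described above can be sketched as a small function. The example assumes responses are given as ten integers from 1 to 5, in questionnaire order; the function name `sus_score` is illustrative and is not taken from the original analysis.

```python
def sus_score(responses):
    """Compute the SUS score (0-100) from ten Likert responses (1-5).

    Odd-numbered items (1, 3, 5, 7, 9) contribute (response - 1).
    Even-numbered items (2, 4, 6, 8, 10) contribute (5 - response).
    The sum of the ten contributions (0-40) is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:      # odd item, positively worded
            total += response - 1
        else:                         # even item, negatively worded
            total += 5 - response
    return total * 2.5


# A fully neutral respondent (all 3s) lands in the middle of the scale.
neutral = sus_score([3] * 10)
# Strong agreement with positive items and strong disagreement with negative ones.
best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])
print(neutral, best)
```

Because every item contributes between 0 and 4 points, scores move in steps of 2.5, which matches values such as 22.5, 62.5 and 72.5 seen in the summaries of this report.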
In this analysis, the following questions were used. Green items are positive affirmations; red items are negative (same as in the original SUS).
Note: Questions 4 and 10 indicate the learnability of the system/product/model.
Responses to each question of the SUS questionnaire.
Observation: Here we do not try to infer anything about the interactions. This is an observational study, not an experimental one. Consequently, “there is no guarantee that treatments have been randomly assigned to subjects and rarely any balance causing some treatment combinations to be under-represented. All of this makes assessing interaction in observational studies dangerous. Main effects are hard enough to assess in such studies; interactions are truly pushing the envelope.” See: http://www.unc.edu/courses/2010fall/ecol/563/001/docs/lectures/lecture1.htm#interactions
What to do then? From the same author: “when I analyze observational data I start with main effects and maybe tentatively examine a few interactions that have a theoretical basis.”
Since our data do not support it, we do not try to identify any interaction between the factors (“game familiarity” and “model use”), only the main effects in each group. For this reason we use Type II ANOVA, which has more power than Type III SS analyses (see: https://stat.ethz.ch/pipermail/r-help/2010-March/230280.html and http://tolstoy.newcastle.edu.au/R/help/06/08/33607.html).↩