Please find an overview of my publications below. I publicly share the PDF of all my published work, as well as the source code/dataset if possible. Please contact me if you have any questions about my work.
Google Scholar profile.
Abstract: Clinical needs and technological advances have resulted in increased use of Artificial Intelligence (AI) in clinical decision support. However, such support can introduce new and amplify existing cognitive biases. Through contextual inquiry and interviews, we set out to understand the use of an existing AI support system by ophthalmologists. We identified concerns regarding anchoring bias and a misunderstanding of the AI’s capabilities. Following this, we evaluated clinicians’ perceptions of three bias mitigation strategies as integrated into a mockup of their existing decision support system. While clinicians recognised the danger of anchoring bias, we identified a concern around the negative effect of bias mitigation on procedure time. Our participants were divided in their expectations of any positive impact on diagnostic accuracy, stemming from differing levels of trust and reliance on the decision support. Our results provide insights into integrating bias mitigation in the clinical domain amidst a growing dependency on AI support systems.
Abstract: Bringing AI technology into clinical practice has proved challenging for system designers and medical professionals alike. The academic literature has, for example, highlighted the dangers of black-box decision-making and biased datasets. Further, end-users’ ability to validate a system’s performance often disappears following the introduction of AI decision-making. We present the MAP model to understand and describe the three stages through which medical observations are interpreted and handled by AI systems. These stages are Measurement, in which information is gathered and converted into data points that can be stored and processed; Algorithm, in which computational processes transform the collected data; and Presentation, where information is returned to the user for interpretation. For each stage, we highlight possible challenges that need to be overcome to develop Human-Centred AI systems.
We illuminate our MAP model through complementary case studies on colonoscopy practice and dementia diagnosis, providing examples of the challenges encountered in real-world settings. By defining the process of Human-AI interaction across these three stages, we untangle some of the inherent complexities in designing AI technology for clinical decision-making, and aim to overcome misalignment between medical end-users and AI researchers and developers.
Abstract: The topic of algorithmic fairness is of increasing importance to the Human-Computer Interaction research community following accumulating concerns regarding the use and deployment of Artificial Intelligence-based systems. How we conduct research on algorithmic fairness directly influences our inferences and conclusions regarding algorithmic fairness. To better understand the methodological decisions of studies focused on people’s perceptions of algorithmic fairness, we systematically analysed relevant papers from the CHI and FAccT conferences. We identified 200 relevant papers published between 1993 and 2022 and assessed their study design, participant sample, and geographical location of participants and authors. Our results highlight that studies are predominantly cross-sectional, cover a wide range of participant roles, and that both authors and participants are primarily from the United States. Based on these findings, we reflect on the potential pitfalls and shortcomings in how the community studies algorithmic fairness.
Abstract: We employ hierarchical clustering, strategic diagrams, and network core-periphery analysis to assess and visualise the intellectual progress of accessibility research within HCI in the past two decades. The study quantifies and explains the development of accessibility research and its thematic evolution based on 1,535 papers published at TACCESS, ASSETS, IJHCS, and CHI and their respective 3,470 author-assigned keywords.
The novelty of this work is based on employing a quantitative methodological approach to provide an overview of accessibility research progress and insights into its driving and trending themes through the period 2001–2021. In addition, we identify declining, emerging, and core backbone themes of accessibility research. Finally, we discuss the opportunities for research that arise from our findings. These contributions provide a roadmap for researchers working on accessibility.
Abstract: Unmanned aerial vehicles (UAVs), commonly known as drones, have been deployed across various applications. These applications range from entertainment to critical situations, such as search and rescue (SAR) operations. The use of single drones is most common: one pilot controls one individual drone. Research has begun to explore the benefits of deploying a group of drones as a coordinated swarm. It is, however, uncertain how a multi-drone system should be designed to facilitate interaction in real-world contexts. We report initial findings from three study sessions involving prototype evaluations and co-design sessions we conducted in collaboration with the emergency services of Denmark. The results of our study open new questions and provide input on the features and functions that impact the future adoption of multi-drone systems, including interactions with multiple video feeds, ecology of screens, team communication, and flight control methods.
Abstract: In contrast to the adoption of personal assistants, social robots have yet to break into the domestic market. Several manufacturers have introduced and quickly retracted their social robots for the home. We report on a survey study (N = 50) to understand potential users’ perceptions towards these social robots.
Participants were presented with videos of three domestic social robots and subsequently provided their perception of these in terms of perceived benefits, attraction, privacy risk, usage intention, and capabilities. While participants perceived hedonic and utilitarian benefits, we found a low intention of future adoption of these devices. Further, our findings showed that owners of personal assistants perceived significantly higher hedonic benefits, fewer privacy risks, and higher intention to use domestic social robots. Our work provides an initial step towards understanding perceptions towards social robots and how previous exposure to domestic AI shapes users’ perceptions.
Abstract: An increasing number of intelligent data-driven health systems seek to support patients and clinicians in decision-making tasks. However, the recommendations provided by such systems can negatively impact the reasoning abilities of their users, giving rise to cognitive biases. Such mental processes can subsequently harm the quality of the user’s decision. While decision support systems are typically designed to increase user efficiency, known approaches to mitigate such biases primarily rely on slowing down the decision-making process, offsetting any efficiency benefits. This position paper calls attention to the efficiency–quality trade-off in bias mitigation and outlines a future research direction for bias mitigation in AI decision support.
Abstract: Since OpenAI released ChatGPT, researchers, policy-makers, and laypersons have raised concerns regarding its false and incorrect statements, which are furthermore expressed in an overly confident manner. We identify this flaw as part of its functionality and describe why large language models (LLMs), such as ChatGPT, should be understood as social agents manifesting morality.
This manifestation happens as a consequence of human-like natural language capabilities, giving rise to humans interpreting the LLMs as potentially having moral intentions and abilities to act upon those intentions. We outline why appropriate communication between people and ChatGPT relies on moral manifestations by exemplifying ‘overly confident’ communication of knowledge. Moreover, we put forward future research directions of fully autonomous and semi-functional systems, such as ChatGPT, by calling attention to how engineers, developers, and designers can facilitate end-users’ sense-making of LLMs by increasing moral transparency.
Abstract: Artificial Intelligence (AI) systems have been increasingly used to make decision-making processes faster, more accurate, and more efficient. However, such systems are also at constant risk of being attacked. While the majority of attacks targeting AI-based applications aim to manipulate classifiers or training data and alter the output of an AI model, recently proposed Sponge Attacks against AI models aim to impede the classifier’s execution by consuming substantial resources. In this work, we propose Dual Denial of Decision (DDoD) attacks against collaborative Human-AI teams. We discuss how such attacks aim to deplete both computational and human resources, and significantly impair decision-making capabilities. We describe DDoD on human and computational resources and present potential risk scenarios in a series of exemplary domains.
Abstract: Good mental health is imperative for one’s wellbeing. While clinical mental disorder treatments exist, self-care is an essential aspect of mental health. This paper explores the use and perceived trust of conversational agents, chatbots, in the context of crowdsourced self-care through a between-subjects study (N = 80). One group used a standalone system with a conventional web interface to discover self-care methods.
The other group used the same system wrapped in a chatbot interface, facilitating utterances and turn-taking between the user and a chatbot. We identify the security and integrity of the systems as critical factors that affect users’ trust. The chatbot interface scored lower on both these factors, and we contemplate the potential underlying reasons for this. We complement the quantitative data with qualitative analysis and synthesize our findings to identify suggestions for using chatbots in mental health contexts.
Abstract: Non-invasive inspection and imaging techniques are used to acquire non-visible information embedded in samples. Typical applications include medical imaging, defect evaluation, and electronics testing. However, existing methods have specific limitations, including safety risks (e.g., X-ray), equipment costs (e.g., optical tomography), personnel training (e.g., ultrasonography) and material constraints (e.g., terahertz spectroscopy). Such constraints make these approaches impractical for everyday scenarios. In this paper, we present a method that is low-cost and practical for non-invasive inspection in everyday settings. Our prototype incorporates a miniaturized near-infrared spectroscopy scanner driven by a computer-controlled 2D-plotter. Our work presents a method to optimize content embedding, as well as a wavelength selection algorithm to extract content without human supervision. We show that our method can successfully extract occluded text through a paper stack of up to 16 pages. In addition, we present a deep-learning based image enhancement model that can further improve the image quality and simultaneously decompose overlapping content. Finally, we demonstrate how our method can be generalized to different inks and other layered materials beyond paper.
Our approach enables a wide range of content embedding applications, including chipless information embedding, physical secret sharing, 3D print evaluations, and steganography.
Abstract: Following the onset of the COVID-19 pandemic and subsequent lockdowns, the daily lives of software engineers were heavily disrupted as they were abruptly forced to work remotely from home. To better understand and contrast typical working days in this new reality with work in pre-pandemic times, we conducted one exploratory (N = 192) and one confirmatory study (N = 290) with software engineers recruited remotely. Specifically, we build on self-determination theory to evaluate whether and how specific activities are associated with software engineers’ satisfaction and productivity. To explore the subject domain, we first ran a two-wave longitudinal study. We found that the time software engineers spent on specific activities (e.g., coding, bugfixing, helping others) while working from home was similar to pre-pandemic times. Also, the amount of time developers spent on each activity was unrelated to their general well-being, perceived productivity, and other variables such as basic needs. Our confirmatory study found that activity satisfaction and productivity are predicted by activity-specific variables (e.g., how much autonomy software engineers had during coding) but not by activity-independent variables such as general resilience or a good work-life balance. Interestingly, we found that satisfaction and autonomy were significantly higher when software engineers were helping others and lower when they were bugfixing. Finally, we discuss implications for software engineers, management, and researchers. In particular, active company policies to support developers’ need for autonomy, relatedness, and competence appear particularly effective in a WFH context.
Abstract: Humans tend to perceive human qualities in interactive systems.
This particularly applies to social robots that utilise human attributes such as human body characteristics and natural language capabilities. Social robots with such characteristics are increasingly deployed in critical settings, such as health and well-being, where it is key to align robot behaviour with end-user expectations. Relatively little is known about how people perceive these social robots’ moral agency. In this position paper, we stress the difference between moral agency and perceived moral agency, and argue that the latter is a timely concern. We discuss the implications of perceived moral agency and outline research directions to explore how humans make sense of social robots in critical settings through perceived moral agency.
Abstract: Background: Accurate camera and hand-eye calibration are essential to ensure high-quality results in image-guided surgery applications. The process must also be able to be undertaken by a non-expert user in a surgical setting. Purpose: This work seeks to identify a suitable method for tracked stereo laparoscope calibration within theatre. Methods: A custom calibration rig, to enable rapid calibration in a surgical setting, was designed. The rig was compared against freehand calibration. Stereo reprojection, stereo reconstruction, tracked stereo reprojection and tracked stereo reconstruction error metrics were used to evaluate calibration quality. Results: Use of the calibration rig reduced mean errors: reprojection (1.47mm [SD 0.13] vs 3.14mm [SD 2.11], p-value 1e-8), reconstruction (1.37px [SD 0.10] vs 10.10px [SD 4.54], p-value 6e-7) and tracked reconstruction (1.38mm [SD 0.10] vs 12.64mm [SD 4.34], p-value 1e-6) compared with freehand calibration. The use of a ChArUco pattern yielded slightly lower reprojection errors, while a dot grid produced lower reconstruction errors and was more robust under strong global illumination.
Conclusion: The use of the calibration rig results in a statistically significant decrease in calibration error metrics, versus freehand calibration, and represents the preferred approach for use in the operating theatre.
Abstract: Crowdsourcing has been used to produce impactful and large-scale datasets for Machine Learning and Artificial Intelligence (AI), such as ImageNet, SuperGLUE, etc. Since the rise of crowdsourcing in the early 2000s, the AI community has been studying its computational, system design, and data-centric aspects from various angles. We welcome studies on developing and enhancing crowdworker-centric tools that offer task matching, requester assessment, and instruction validation, among other topics. We are also interested in exploring methods that leverage the integration of crowdworkers to improve the recognition and performance of machine learning models. Thus, we invite studies that focus on shipping active learning techniques, methods for joint learning from noisy data and from crowds, novel approaches for crowd-computer interaction, repetitive task automation, and role separation between humans and machines. Moreover, we invite works on designing and applying such techniques in various domains, including e-commerce and medicine.
Abstract: The use of Artificial Intelligence in clinical support systems is increasing. In this paper we focus on AI support for continuous interaction scenarios. A thorough understanding of end-user behaviour during these continuous Human-AI interactions, in which user input is sustained over time and during which AI suggestions can appear at any time, is still missing. We present a controlled lab study involving 21 endoscopists and an AI colonoscopy support system. Using a custom-developed application and an off-the-shelf videogame controller, we record participants’ navigation behaviour and clinical assessment across 14 endoscopic videos.
Each video is manually annotated to mimic an AI recommendation, being either true positive or false positive in nature. We find that time between AI recommendation and clinical assessment is significantly longer for incorrect assessments. Further, the type of medical content displayed significantly affects decision time. Finally, we discover that the participant’s clinical role plays a large part in the perception of clinical AI support systems. Our study presents a realistic assessment of the effects of imperfect and continuous AI support in a clinical scenario.
Abstract: The design and evaluation of accessibility technology is a core component of the Computer Science landscape, aiming to ensure that digital innovations are accessible to all. One of the most prominent and long-lasting areas of accessibility research focuses on motor impairments, deficiencies that affect the ability to move, manipulate objects, and interact with the physical world. In this survey paper, we present an extensive overview of the last two decades of research into accessibility for people with motor impairments. Following a structured selection process, we analysed the study details as reported in 177 relevant papers. Based on this analysis, we critically assess user representation, measurement instruments, and barriers that exist in accessibility research. Finally, we discuss future directions for accessibility research within the Computer Science domain.
Abstract: Sound zone technology allows multiple simultaneous sound experiences for multiple people in the same room without interference. However, given the inherent invisible and intangible nature of sound zones, it is unclear how to communicate the position and size of sound zones to users. This paper compares two visualisation techniques, absolute visualisation and relational visualisation, as well as a baseline condition without visualisations.
In a within-subject experiment (N = 33), we evaluated these techniques for effectiveness and efficiency across four representative tasks. Our findings show that the absolute and relational visualisation techniques increase effectiveness in multi-user tasks but not in single-user tasks. The efficiency for all tasks was improved using visualisations. We discuss the potential of visualisations for sound zones and highlight future research opportunities for sound zone interaction.
Abstract: The Implicit Association Test (IAT) has been widely used to assess people’s associations of target concepts with qualitative attributes, such as the likelihood of being hired or convicted depending on race, gender, or age. The condensed version, the Brief IAT (BIAT), elicits implicit biases by measuring the reaction time to concept classifications. In this paper, we introduce and evaluate a new method to appropriate the BIAT using crowdsourcing to measure people’s leanings on polarizing topics. We present a web-based tool to test participants’ bias on custom themes, where self-assessments often fail. We validated our approach with 14 domain experts and assessed the fit of crowdsourced test construction. Our method allows researchers of different domains to create and validate bias tests that can be geographically tailored and updated over time. We discuss how our method can be applied to surface implicit user biases and run studies where cognitive biases may impede reliable results.
Abstract: Technology plays an increasingly prominent role in emotional lives. Researchers have begun to study how people use devices to cope with and shape emotions: a phenomenon that has been called Digital Emotion Regulation. We report a study of the impact of the COVID-19 pandemic upon young people’s digital habits and emotion regulation behaviors. We conducted a two-wave longitudinal survey, collecting data from 154 university students both before and during the COVID-19 pandemic.
During the pandemic, participants were subject to increased emotional distress as well as restrictions on movement and social interaction. We present evidence that participants’ emotion regulation strategies changed and became more homogeneous during the pandemic, with participants resorting to digital tools when offline strategies were less available, while also becoming more emotionally dependent upon their devices. This study underscores the growing significance of the digital for contemporary emotional experience, and contributes to understanding the potential role for technology in supporting well-being during high-impact events.
Abstract: A new generation of wearable devices now enables end-users to keep track of their sleep patterns. This paper reports on a longitudinal study of 82 participants who used a state-of-the-art sleep tracking ring for an average of 65 days. We conducted interviews and questionnaires to understand changes to their lifestyle, their perceptions of the tracked information and sleep, and the overall experience of using an unobtrusive sleep tracking device. Our results indicate that such a device is suitable for long-term sleep tracking and helpful in identifying detrimental lifestyle elements that hinder sleep quality. However, tracking one’s sleep can also introduce stress or physical discomfort, potentially leading to adverse outcomes. We discuss these findings in light of related work and highlight the near-term research directions that the rapid commoditisation of sleep tracking technology enables.
Abstract: Tangible input has been explored as a means for participants to self-report experiences while minimising disruption and allowing for discreet data collection. However, the accuracy of these tangible devices has not been studied systematically.
We compared six input techniques, including slider, slider with resistance, capacitive touch slider, squeeze, rotary knob, and joystick, to understand their accuracy and resolution profile. Each of these wireless devices was designed in a similar form factor and intended to be operated discreetly with one hand. We assessed input accuracy and participant perceptions across devices through a controlled lab study (N = 20), highlighting diverging limits to the accuracy of the input technique and possible explanations for the differences in resolution. Our results indicate that participant accuracy was highest using a slider, and lowest using a squeeze-based input. We discuss the suitability and challenges of discreet tangible self-report techniques, and highlight open research questions for future work.
Abstract: ‘Soundscapes’ are an increasingly active topic in Human-Computer Interaction (HCI) and interaction design. From mapping acoustic environments through sound recordings to designing compositions as interventions, soundscapes appear as a recurring theme across a wide body of HCI research. Based on this growing interest, now is the time to explore the types of studies in which soundscapes provide a valuable lens to HCI research. In this paper, we review papers from conferences sponsored or co-sponsored by the ACM Special Interest Group on Computer-Human Interaction in which the term ‘soundscape’ occurs. We analyse a total of 235 papers to understand the role of soundscapes as a research focus and identify untapped opportunities for soundscape research within HCI. We identify two common soundscape conceptualisations: (1) Acoustic environments and (2) Compositions, and describe what characterises studies into each concept and the hybrid forms that also occur. On the basis of this, we carve out a foundation for future soundscape research in HCI as a methodological anchor to form a common ground and support this growing research interest.
Finally, we offer five recommendations for further research into soundscapes within HCI.
Abstract: Social conformity is the act of individuals adjusting their personal opinions to agree with an opposing majority. Previous work has identified multiple determinants of social conformity in controlled laboratory studies, but they remain largely untested in naturalistic online environments. For this study, we developed a realistic debating website, which 48 participants used for one week. We deployed four versions of the website using a 2 (high vs. low social presence) x 2 (high vs. low emphasis on majority–minority group composition) between-subjects factorial design. We found that participants were significantly more likely to conform when the platform promotes high social presence, despite its emphasis on group composition. Our qualitative findings further reveal how different aspects of social presence embedded in platform design (i.e., user representation, interactivity, and response visibility) contribute to heightened conformity behaviour. Our results provide evidence of the organic manifestation of conformity in online groups discussing subjective content and confirm the effect of platform design on online conformity behaviour. We conclude with a discussion on the implications of our findings on how future online platforms can be designed accounting for conformity influences.
Abstract: Due to their widespread adoption, frequent use, and diverse sensor capabilities, smartphones have become a powerful tool for academic studies focused on sampling human behaviour. Although smartphones pack many technological advances, the need for researchers to develop their own software packages in order to run smartphone-based studies has resulted in a clear barrier to entry for researchers without the financial means, time, or technical knowledge required to do so.
We present AWARE-Light, a new smartphone application for data collection from study participants, which is accompanied by a website that provides any researcher the possibility to easily configure their own study. To highlight the possibilities of our tool, we present a research scenario on digital phenotyping for mental health. Furthermore, we describe the methodological configuration possibilities offered by our tool, and complement the technical configuration possibilities with recommendations from the existing literature.
Abstract: Crowdworkers silently enable much of today’s AI-based products, with several online platforms offering a myriad of data labelling and content moderation tasks through convenient labour marketplaces. The HCI community has been increasingly interested in investigating the worker-centric issues inherent in the current model and seeking potential improvements that could be implemented in the future. This workshop explores how a reimagined perspective on crowdsourcing platforms could provide a more equitable, fair, and rewarding experience. This includes not only the workers but also the platforms, who could benefit, e.g., from better processes for worker onboarding, skills-development, and growth. We invite visionary takes in various formats on this topic to spread awareness of worker-centric research and developments to the CHI community. As a result of interactive ideation work in the workshop, we articulate a future direction roadmap for research centred around crowdsourcing platforms. Finally, as a specific interest area, the workshop seeks to study crowdwork from the context of the Global South, which has been emerging as an important but critically understudied crowdsourcing market in recent years.
Abstract: The increasing capabilities of Artificial Intelligence enable the support of users in a continuously growing number of applications.
Current systems typically dictate that interaction between user input and AI output unfolds in discrete steps, as is the case with, for example, conversational agents. Novel scenarios require AI systems to adapt and respond to continuous user input, e.g., image-guided surgery and AI-supported text entry. In and across these applications, AI systems need to support more varied and dynamic interactions in which users and AI interact continuously and in parallel. Current methods and guidelines are often inadequate and sometimes even detrimental to user needs when considering continuous usage scenarios. Realizing a continuous interaction between users and AI requires a substantial change in perspective when designing Human-AI systems. In this SIG, we support the exchange of cutting-edge research contributing to a better understanding and improved methods and tools to design continuous Human-AI interaction.
Abstract: This workshop paper outlines a conceptual browser plugin that enables crowdworkers to store and later rapidly provide personal information frequently requested in crowdsourcing tasks. Personal data, including demographic data such as age and ethnicity, as well as responses to commonly used personality-related survey instruments, is often critical to collect in crowdsourcing tasks but results in a repetitive experience for crowdworkers. From a requester’s perspective, this repetition can result in reduced data quality or the decision to abstain from collecting extensive information on the workers completing a given task. Moreover, given the extensive role of crowdworkers in labelling training data for artificial intelligence applications, ensuring awareness of the workers’ characteristics can help alleviate future biases.
In this work, we present the motivation and design requirements for this (hypothetical) plugin and seek input from the community towards its future development.
Abstract: Sound zones enable multiple simultaneous sound experiences in the same physical room without interference. In this paper, we present an interactive sound zone setup that can produce two sound zones within a confined space. Through a tangible remote controller, users can change the volume, size, and position of these sound zones. In addition, we have built a custom visualisation display that provides real-time feedback of the sound zones to support users’ understanding. Sound zone systems pose novel challenges for the HCI community, including how users may understand and interact with sound zones. Our setup offers a concrete means of investigating these challenges.
Abstract: Artificial Intelligence (AI) has the technological potential to transform healthcare by assisting medical personnel in their everyday workflow. For successful collaboration and adoption of AI technology, end-users need to trust the AI system. In this paper we outline the use of Relational Signalling Theory, an established theory on Human-Human trust building, as a conceptual lens for designing trust-building signals in Human-AI interaction. We argue that the use of a theoretical foundation in the design and evaluation of interactions supports the development of Human-Centered AI in healthcare.
Abstract: The COVID-19 pandemic has drastically impacted how work is conducted, shifting many jobs to a hybrid nature with an emphasis on work-from-home. This shift has, however, not affected all job types equally. In this position paper, we argue that the advancement of collaborative robots in the industrial setting presents a unique and underexplored opportunity for robot-supported hybrid work.
We present five challenges that act as contributing factors that stifle access to hybrid work opportunities in the industrial context. These point to (i) the need for spatial awareness of both the robot and its surroundings, (ii) the occasional need for physical presence for breakdown intervention and recovery, (iii) the need for contextual awareness, (iv) the need for additional employee training, and (v) the need for a clear interface to map the varying degrees of freedom to a remote interface. We present future research opportunities with the potential to address some of the presented challenges.
Abstract: Unmanned aerial vehicles (UAVs) have been shown to effectively support search and rescue (SAR) operations, usually through manual control of each UAV. Research has started to move towards multi-drone systems with increasing levels of autonomy to support faster and more challenging SAR operations. However, how such a system should be integrated into SAR procedures remains an open challenge. We seek to thoroughly understand the issues that arise with drone usage in SAR and how semi-autonomous multi-drone systems can help alleviate those. We conducted a pilot study with emergency services SAR pilots and our multi-drone control prototype. Initial findings have given insights into useful features and potential pitfalls in designing a multi-drone system. The results inform our continued work toward more refined prototypes that will serve as a platform for search and rescue operations.
Abstract: Large-scale multi-robot systems have numerous potential real-world applications. It is, however, still unclear how a human operator can effectively engage and control a system composed of multiple autonomous robots, especially in unstructured and outdoor environments.
This paper reports on ongoing work in the project HERD (Human-AI Collaboration: Engaging and Controlling Swarms of Robots and Drones), in which we focus on two concrete use cases from industrial partners, namely farming and search & rescue. One of the industrial partners, Agro Intelligence ApS, currently sells autonomous farming robots, while the other, Robotto ApS, develops autonomous drone-based monitoring solutions for emergency responders. Both partners aim to scale their technologies to multi-robot/multi-drone operations. In this paper, we present the two use cases, their differences and similarities, challenges and preliminary results.
Abstract: Theoretical work in morphological typology offers the possibility of measuring morphological diversity on a continuous scale. However, literature in NLP typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative. In this work, we propose to reduce the theoretical rigidity of such claims by quantifying the morphological typology at the word and segment level. We consider Payne’s (2017) approach to classify morphology using two indices: synthesis (from 1 for analytic to 3 or more for polysynthetic) and fusion (from 0 for agglutinative to 1 for fusional). For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study. Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at word (nouns and verbs for English-Turkish, and verbs in English-Spanish) and segment level (previous language pairs plus English-German in both directions).
We complement the word-level analysis with human evaluation, and overall, we observe a consistent impact of both indices on machine translation quality.
Abstract: This paper presents two studies investigating how physically remote telexistence users wish to see other users visualized as virtual avatars in a) augmented reality, and b) immersive virtual reality while conducting a collaborative task. To answer this research question, a telexistence system was designed and implemented with simple avatar designs. After that, visual examples of alternative avatar representations for both use cases were designed by thoughtfully altering the visual parameters of 36 virtual avatar examples. The avatar designs were first evaluated in a user study with 16 participants in conjunction with using an implemented telexistence system. As a follow-up, an online survey with 43 respondents was used to record their preferences regarding virtual avatar appearance. The results suggest that users prefer the other user to be represented in a photorealistic full-body human avatar in both augmented reality and virtual reality due to its humanlike representation and affordances for interaction. In augmented reality, the choice for a hologram full body avatar was also popular due to its see-through appearance, which prevents a mix-up with a real person in the physical space.
Abstract: The GDPR establishes a right for individuals to get access to information about automated decision-making based on their personal data. However, the application of this right comes with caveats. This paper investigates how European insurance companies have navigated these obstacles. By recruiting volunteering insurance customers, requests for information about how insurance premiums are set were sent to 26 insurance companies in Denmark, Finland, the Netherlands, Poland and Sweden.
Findings illustrate the practice of responding to GDPR information requests and the paper identifies possible explanations for shortcomings and omissions in the responses. The paper also adds to existing research by showing how the wordings in the different language versions of the GDPR could lead to different interpretations. Finally, the paper discusses what can reasonably be expected from explanations in consumer-oriented information.
Abstract: Open data is often contributed by various governments and public sector actors. An increasingly popular way to collect large bespoke datasets is crowdsourcing. In this work we explore crowdsourced open data as an enabler of future software solutions. We recruited participants from an online paid crowdsourcing platform to provide open mental health related data that was used to create an interactive data-driven decision support system for self-care. We then invited a sub-sample of 80 participants back to explore the tool that was created using their own data and to provide a rich account of perceptions on issues around such health data reuse in software. Our results unfold a range of different perceived threats and opportunities in using crowdsourced data to enable software solutions, and our work contributes a topical case study and discussion toward the use of crowdsourced data in an open fashion.
Abstract: The uptake of artificial intelligence-based applications raises concerns about the fairness and transparency of AI behaviour. Consequently, the Computer Science community calls for the involvement of the general public in the design and evaluation of AI systems. Assessing the fairness of individual predictors is an essential step in the development of equitable algorithms. In this study, we evaluate the effect of two common visualisation techniques (text-based and scatterplot) and the display of the outcome information (i.e., ground-truth) on the perceived fairness of predictors.
Our results from an online crowdsourcing study (N = 80) show that the chosen visualisation technique significantly alters people’s fairness perception and that the presented scenario, as well as the participant’s gender and past education, influence perceived fairness. Based on these results we draw recommendations for future work that seeks to involve non-experts in AI fairness evaluations.
Abstract: We investigate the use of a miniaturized Near-Infrared Spectroscopy (NIRS) device in an assisted decision-making task. We consider the real-world scenario of determining whether food contains gluten, and we investigate how end-users interact with our NIRS detection device to ultimately make this judgment. In particular, we explore the effects of different nutrition labels and representations of confidence on participants’ perception and trust. Our results show that participants tend to be conservative in their judgment and are willing to trust the device in the absence of understandable label information. We further identify strategies to increase user trust in the system. Our work contributes to the growing body of knowledge on how NIRS can be mass-appropriated for everyday sensing tasks, and how to enhance the trustworthiness of assisted decision-making systems.
Abstract: Public controversies around the unethical use of personal data are increasing, spotlighting data ethics as an increasingly important field of study. MyData is a related emerging vision that emphasizes individuals’ control of their personal data. In this paper, we investigated people’s perceptions of various data management scenarios by measuring the perceived ethicality and level of personal felt concern regarding the scenarios.
We deployed the set of 96 unique scenarios to a paid crowdsourcing platform for assessment and invited a representative sample of the participants to a second-stage questionnaire about the MyData vision as well as its potential in the field of healthcare. Our results provide a timely investigation into how topical data-related practices affect the perceived ethicality and the felt concern. The questionnaire analysis reveals great potential in the MyData vision. Through the combined quantitative and qualitative results, we contribute to the field of data ethics.
Abstract: With the rise in artificial intelligence (AI)-driven interactive systems, both academics and practitioners within human-computer interaction (HCI) have a growing focus on human-AI interaction. This has resulted in, for example, system-design guidelines and reflections on the differences and challenges when designing for AI-driven interaction as opposed to more-traditional applications. We argue that the current work on human-AI interaction is defined primarily by a focus on what we refer to as intermittent interaction scenarios, in which there is a clear line between the human initiator of an interaction and an almost immediate system response. However, user interaction with AI systems does not necessarily follow this rigid interaction pattern. Inspired by Kristina Höök and Yang et al., we define human-AI interaction as the completion of a user’s task with the help of AI support, which may manifest itself in non-intermittent scenarios. By overlooking these other interaction paradigms, we neglect the opportunity to define and support alternative human-AI scenarios. In this article, we present and outline three types of human-AI interaction paradigms, which we refer to as intermittent, continuous, and proactive, highlighting a diverse set of interaction scenarios and pointing to a need for HCI considerations across different types of human-AI interaction.
While a wide range of existing AI-powered systems operate continuously in the background of our lives (e.g., step counters, spam filters), these applications do not engage directly with their users. Here, we focus on AI applications that interact directly with their users.
Abstract: The Experience Sampling Method is used to collect participant self-reports over extended observation periods. These self-reports offer a rich insight into the individual lives of study participants by intermittently asking participants a set of questions. However, the longitudinal and repetitive nature of this sampling approach introduces a variety of concerns regarding the data contributed by participants. A decrease in participant interest and motivation may negatively affect study adherence, as well as potentially affecting the reliability of participant data. In this chapter we reflect on a number of studies that aim to better understand participant performance with Experience Sampling. We discuss the main issues relating to participant data for longitudinal studies, and provide hands-on recommendations for researchers to remedy these concerns in their own studies.
Abstract: Electric stand-up scooters (e-scooters) have been introduced in several cities worldwide, providing new means for people to travel around the city. While praised for their flexibility, e-scooters are also met with negative sentiments due to fatal accidents and chaotic parking. In this paper, we seek to understand the mobility of shared e-scooters and point to gaps in the user interaction between the digital and physical world. We carried out three data collections: interviews, in situ observation, and analysis of news media coverage. Our findings illustrate integration with alternate modes of transportation in an urban context, and how technologies facilitate or hinder (micro-) mobility.
We found that users of e-scooters primarily view these devices as an alternative to walking rather than other transportation forms. Additionally, we found that users’ and non-users’ needs, perspectives and experiences of e-scooters clash, in particular with regard to perceptions of sustainability. Based on these findings, we present three relevant perspectives of sustainability, extending the ongoing debate of sustainable HCI research. We contribute with an empirically supported understanding of the perception of mobility and sustainability for e-scooters in a Scandinavian urban context.
Abstract: Routines are an important element of day-to-day work life, allowing people to structure their day around required tasks. Effectively managing these routines is, however, experienced as challenging by many, an issue further amplified by the current work-from-home lockdown measures. In this paper we present Rainmaker, a tangible device that aims to support people in their working life in the context of their own homes. We evaluate and iterate on our prototype through two qualitative studies, spanning respectively three days (N = 11) and 15 days (N = 2). Our results highlight the perceived advantages of the use of a primarily physical rather than digital tool for work support, allowing users to stay focused on their tasks and reflect on their work achievements. We present lessons for future work in this area and publicly release the software and hardware used in the construction of Rainmaker.
Abstract: The Experience Sampling Method is a widely used methodology for the collection of self-report data. These self-reports are typically collected through bespoke mobile applications or text messages. Recently, an increasing number of social messaging applications have introduced chatbots – automated services that operate inside existing chat applications. In this paper, we present an initial study on the use of chatbots for self-report studies.
Furthermore, we outline three use cases in which the use of chatbots enables new research opportunities, namely personalised and empathic chatbots, psychometric instrument construction, and group-based assessments. We conclude with an overview of the opportunities and challenges that chatbots offer to researchers employing the Experience Sampling Method.
Abstract: Spatial experience is an important subject in various fields, and in HCI it has been mostly investigated at the urban scale. Research on human scale spaces has focused mostly on the personal meaning or aesthetic and embodied experiences in the space. Further, spatial experience is increasingly topical in envisioning how to build and interact with technologies in our everyday lived environments, particularly in so-called smart cities. This workshop brings researchers and practitioners from diverse fields to collaboratively discover new ways to understand and capture human scale spatial experience and envision its implications to future technological and creative developments in our habitats. Using a speculative design approach, we sketch concrete solutions that could help to better capture critical features of human scale spaces and allow for unique possibilities for aspects such as urban play. As a result, we hope to contribute a road map for future HCI research on human scale spatial experience and its application.
Abstract: We propose a workshop on methods and theories for dealing with complex dynamical systems, and their application in HCI. Such methods are increasingly relevant across a wide range of disciplines which focus on human behaviour, applied to understand the role of context and interactions in the behaviour of individuals and groups, and how they unfold over time. Traditional approaches to quantifying and modelling behaviour in HCI have tended to focus primarily on individuals and components.
Complexity methods shift the focus onto interactions between components, and the emergence of behaviour from complex networks of interactions, as for example in Enactivist approaches to cognitive science. While we believe that complexity methods can be highly informative to HCI researchers, uptake in the community remains low due to widespread unfamiliarity. This one-day workshop will introduce, support, and encourage the development and adoption of complexity methods within HCI. Reflecting the multidisciplinary mix within complexity science, we will draw on examples of complexity-oriented theories and methods from a range of disciplines, including Control Theory, Social Science, and Cognitive Science. Attendees will engage in group discussions and a Q&A with a panel, and a discussion group will be set up ahead of time to encourage exploratory conversations. In this way, diverse backgrounds can be brought together, matched, and inform one another.
Abstract: Colonoscopy, the visual inspection of the large bowel using an endoscope, offers protection against colorectal cancer by allowing for the detection and removal of pre-cancerous polyps. The literature on polyp detection shows widely varying miss rates among clinicians, with averages ranging around 22–27%. While recent work has considered the use of AI support systems for polyp detection, how to visualise and integrate these systems into clinical practice is an open question. In this work, we explore the design of visual markers as used in an AI support system for colonoscopy. Supported by the gastroenterologists in our team, we designed seven unique visual markers and rendered them on real-life patient video footage. Through an online survey targeting relevant clinical staff (N = 36), we evaluated these designs and obtained initial insights and understanding into the way in which clinical staff envision AI to integrate in their daily work-environment.
Our results provide concrete recommendations for the future deployment of AI support systems in continuous, adaptive scenarios.
Abstract: Researchers in Human-Computer Interaction typically rely on experiments to assess the causal effects of experimental conditions on variables of interest. Although this classic approach can be very useful, it offers little help in tackling questions of causality in the kind of data that are increasingly common in HCI – capturing user behavior ‘in the wild.’ To analyze such data, model-based regressions such as cross-lagged panel models or vector autoregressions can be used, but these require parametric assumptions about the structural form of effects among the variables. To overcome some of the limitations associated with experiments and model-based regressions, we adopt and extend ‘empirical dynamic modelling’ methods from ecology that lend themselves to conceptualizing multiple users’ behavior as complex nonlinear dynamical systems. Extending a method known as ‘convergent cross mapping’ or CCM, we show how to make causal inferences that do not rely on experimental manipulations or model-based regressions and, by virtue of being non-parametric, can accommodate data emanating from complex nonlinear dynamical systems. By using this approach for multiple users, which we call ‘multiple convergent cross mapping’ or MCCM, researchers can achieve a better understanding of the interactions between users and technology – by distinguishing causality from correlation – in real-world settings.
Abstract: The COVID-19 pandemic has forced governments worldwide to impose movement restrictions on their citizens. Although critical to reducing the virus’ reproduction rate, these restrictions come with far-reaching social and economic consequences. In this paper, we investigate the impact of these restrictions on an individual level among software engineers who were working from home.
Although software professionals are accustomed to working with digital tools in their day-to-day work, though not all of them remotely, the abrupt and enforced work-from-home context has resulted in an unprecedented scenario for the software engineering community. In a two-wave longitudinal study (N = 192), we covered over 50 psychological, social, situational, and physiological factors that have previously been associated with well-being or productivity. Examples include anxiety, distractions, coping strategies, psychological and physical needs, office set-up, stress, and work motivation. This design allowed us to identify the variables that explained unique variance in well-being and productivity. Results include (1) the quality of social contacts predicted an individual’s well-being positively, and stress predicted it negatively, when controlling for other variables, consistently across both waves; (2) boredom and distractions predicted productivity negatively; (3) productivity was less strongly associated with all predictor variables at time two compared to time one, suggesting that software engineers adapted to the lockdown situation over time; and (4) longitudinal analyses did not provide evidence that any predictor variable causally explained variance in well-being and productivity. Overall, we conclude that working from home was per se not a significant challenge for software engineers. Finally, our study can assess the effectiveness of current work-from-home and general well-being and productivity support guidelines and provides tailored insights for software professionals.
Abstract: We investigate the impact of anthropomorphism on embodied AI through a study of personal assistants (PA). The effects of physical embodiment remain underexplored while the consumer market for PAs shows an increase in the diversity of physical appearances of these products. We designed three fictional personal assistants with varying levels of embodied anthropomorphism.
We validated that our prototypes differed significantly in levels of anthropomorphism (N = 26). We developed a set of identical videos for each device, demonstrating realistic end-user interaction across six scenarios. Using a between-subject video survey study (N = 150), we evaluate the impact of different levels of embodied anthropomorphism on the perception of personal assistants. Our results show that while anthropomorphism did not significantly affect the perception of Overall Goodness, it affected perceptions of Perceived Intelligence, Likeability, and the device’s Pragmatic Qualities. Finally, we discuss the implications of the identified relationships between anthropomorphism and user confidence in embodied AI systems.
Abstract: In the context of learning systems, identifying causal relationships among information presented to the user, their behavior and cognitive effort required/exerted to understand and perform a task is key to building effective learning experiences, and to maintain engagement in learning processes. An unexplored question is whether our interaction with presented information affects our cognitive effort (and behaviour), or vice-versa. We investigate the causal relationship between information presented and cognitive effort (and behaviour) in the context of two separate studies (N = 40, N = 98), and study the effect of instruction (active/passive task). We utilize screen recordings and eye-tracking data to investigate the relationship among these variables. To investigate the causal relationships among the different measurements, we use Granger causality. Further, we propose a new method to combine two time-series from multiple participants for detecting causal relationships. Our results indicate that information presentation drives user focus size (behaviour), and that cognitive load (a measure of cognitive effort exerted) drives information presentation.
This relationship is also moderated by instruction type and performance-level (high/low). We draw implications for design of educational material and learning technologies.
Abstract: We present CARE, a context-aware tool for nurses in nursing homes. The system utilises a sensor infrastructure to quantify the behaviour and wellbeing (e.g., activity, mood, social and nurse interactions) of elderly residents. The sensor data is offloaded, processed and analysed in the cloud, to generate daily and long-term summaries of residents’ health. These insights are then presented to nurses via an Android tablet application. We aim to create a tool that can assist nurses and increase their awareness of residents’ needs. We deployed CARE in a local nursing home for two months and evaluated the system through a post-hoc exploratory analysis and interviews with the nurses. The results indicate that CARE can reveal essential insights on the wellbeing of elderly residents and improve the care service. In the discussion, we reflect on our understanding and potential impact of future integrated technology in elderly care environments.
Abstract: Despite large investments in smartwatch development, the market growth remains smaller than forecasted. The purpose of smartwatch use remains unclear, indicated by the lack of large-scale adoption. Thus, we aim to better understand the early adoption and everyday smartwatch use. We investigate diverse usage data of smartwatches logged over a period of up to 14 months from 79 individuals between December 2015 and March 2017, one of the largest wearable datasets collected. First, we identify both explorative and accepted behaviours that users exhibit and further investigate how the individual usage traits and features differ between the two categories. Our analysis offers an insightful perspective on how smartwatch use evolves organically.
Our results improve our shared understanding of smartwatch use, and of how users adapt their use of smartwatches over time to match the capabilities of the technology, by validating numerous findings from previous literature.
Abstract: To prepare for a future in which robots are more commonplace, it is important to know what robot behaviors people find socially normative. Previous work suggests that for robots to be accepted by people, the robot should adhere to the prevalent social norms, such as those related to approaching people. However, we do not expect that socially normative approach behaviors for robots can be translated on a one-to-one basis from people to robots, because robots currently have unique features that differ from those of humans, including (but not limited to) wheels, sounds, and shapes. The two studies presented in this article go beyond the state-of-the-art and focus on socially normative approach behaviors for robots. In the first study, we compared people’s responses to violations of personal space by robots with their responses to violations by people. In the second study, we explored what features (sound, size, speed) of a robot approaching people have an effect on acceptance. Findings indicate that people are more lenient toward violations of a social norm by a robot as compared to a person. Also, we found that robots can use their unique features to mitigate the negative effects of norm violations by communicating intent.
Abstract: The development, evaluation, and eventual deployment of novel medical devices is a complex process involving various areas of expertise. Although the need for a User Centred Design approach to the development of both hardware and software has long been established, both current regulatory guidelines and widespread evaluation approaches fail to reflect the challenges encountered during day-to-day clinical practice.
As such, the results from these evaluations may not provide a realistic account of the problems encountered by users when introduced to clinical practice. In this paper, we present a case study on designing the evaluation of a novel device to support laparoscopic liver surgery. Through a reflective account of the design of our usability evaluation, we identify and describe seven primary dimensions of ecological validity encountered in clinical usability evaluations. These dimensions are: ‘user roles’, ‘environment’, ‘training’, ‘scenario’, ‘patient involvement’, ‘software’, and ‘hardware’. We analyse three recently published clinical usability evaluation articles to assess (and illustrate) the applicability and completeness of these dimensions. Finally, we discuss the compromises encountered during clinical usability evaluations and how to best report on these considerations. The framework presented here aims to further the agenda of ecologically valid evaluation practice, reflecting the constraints of medical practice.
Abstract: The emergence of big data combined with the technical developments in Artificial Intelligence has enabled novel opportunities for autonomous and continuous decision support. While initial work has begun to explore how human morality can inform the decision making of future Artificial Intelligence applications, these approaches typically consider human morals as static and immutable. In this work, we present an initial exploration of the effect of context on human morality from a Utilitarian perspective. Through an online narrative transportation study, in which participants are primed with either a positive story, a negative story or a control condition (N = 82), we collect participants’ perceptions on technology that has to deal with moral judgment in changing contexts.
Based on an in-depth qualitative analysis of participant responses, we contrast participant perceptions to related work on Fairness, Accountability and Transparency. Our work highlights the importance of contextual morality for Artificial Intelligence and identifies opportunities for future work through a FACT-based (Fairness, Accountability, Context and Transparency) perspective.
Abstract: Echoing the evolving interest and impact of artificial intelligence on society, governments are increasingly looking for ways to strategically position themselves as both innovators and regulators in this new domain. One of the most explicit and accessible ways in which governments outline these plans is through national strategy and policy documents. We follow a systematic search strategy to identify national AI policy documents across twenty-five countries. Through an analysis of these documents, including topic modelling, clustering, and reverse topic-search, we provide an overview of the topics discussed in national AI policies and contrast the differences between countries. Furthermore, we analyse the frequency of eleven ethical principles across our corpus. Our paper outlines implications of the differences between geographical and cultural clusters in relation to the future development of artificial intelligence applications.
Abstract: While crowd workers typically complete a variety of tasks in crowdsourcing platforms, there is no widely accepted method to successfully match workers to different types of tasks. Researchers have considered using worker demographics, behavioural traces, and prior task completion records to optimise task assignment. However, optimum task assignment remains a challenging research problem due to limitations of proposed approaches, which in turn can have a significant impact on the future of crowdsourcing.
We present ‘CrowdCog’, an online dynamic system that performs both task assignment and task recommendations, by relying on fast-paced online cognitive tests to estimate worker performance across a variety of tasks. Our work extends prior work that highlights the effect of workers’ cognitive ability on crowdsourcing task performance. Our study, deployed on Amazon Mechanical Turk, involved 574 workers and 983 HITs that span across four typical crowd tasks (Classification, Counting, Transcription, and Sentiment Analysis). Our results show that both our assignment method and recommendation method result in a significant performance increase (5% to 20%) as compared to a generic or random task assignment. Our findings pave the way for the use of quick cognitive tests to provide robust recommendations and assignments to crowd workers.
Abstract: Social conformity occurs when individuals in group settings change their personal opinion to be in agreement with the majority’s position. While recent literature frequently reports on conformity in online group settings, the causes for online conformity are yet to be fully understood. This study aims to understand how social presence, i.e., the sense of being connected to others via mediated communication, influences conformity among individuals placed in online groups while answering subjective and objective questions. Acknowledging its multifaceted nature, we investigate three aspects of online social presence: user representation (generic vs. user-specific avatars), interactivity (discussion vs. no discussion), and response visibility (public vs. private). Our results show an overall conformity rate of 30% and main effects from task objectivity, group size difference between the majority and the minority, and self-confidence on personal answer.
Furthermore, we observe an interaction effect between interactivity and response visibility, such that conformity is highest in the presence of peer discussion and public responses, and lowest when these two elements are absent. We conclude with a discussion on the implications of our findings in designing online group settings, accounting for the effects of social presence on conformity.
Abstract: Crowd-powered innovation platforms act to a large degree as silos: they cater the cognitive surplus of a bespoke, self-selected audience to a limited number of high-value clients in a model where both stakeholders typically have to jump through a series of hoops to enrol in the services. We propose a fundamentally disruptive way for discovery with distributed crowds, by orchestrating already established online audiences for serendipitous crowdsourcing. Two recent developments make our proposition, The Planetary Cognition Delivery Network, compelling right now. First, online properties are struggling due to declining advertising revenues, caused by the proliferation of ad-blockers and a few key Internet giants taking an increasingly larger cut of the available revenue. Second, and perhaps somewhat counter-intuitively to the immediate perception in the Western world, the Internet is just now becoming widely available in many corners of the world, which provides an opportunity for a truly worldwide reach during the next decade or two. COGNET is designed to offer a frictionless participation mechanism for all three key stakeholders: requesters, providers, and contributors. It essentially orchestrates a distributed network of human cognition pools for arbitrary discovery tasks that could benefit from the vast cognitive surplus available through the internet.
Abstract: Despite decades of research concerning social conformity and its effects on face-to-face groups, it is yet to be comprehensively investigated in online contexts.
In our work, we investigate the impact of contextual determinants (such as majority group size, the number of opposing minorities and their sizes, and the nature of the task) and personal determinants (such as self-confidence, personality and gender) on online social conformity. In order to achieve this, we deployed an online quiz with subjective and objective multiple-choice questions. For each question, participants provided their answer and self-reported confidence. Following this, they were shown a fabricated bar chart that positioned the participant either in the majority or minority, presenting the distribution of group answers across different answer options. Each question tested a unique group distribution in terms of the number of minorities against the majority and their corresponding group sizes. Subsequently, participants were given the opportunity to change their answer and reported confidence. Upon completing the quiz, participants undertook a personality test and participated in a semi-structured interview. Our results show that 78% of the participants conformed to the majority’s answers at least once during the quiz. Further analysis reveals that the tendency to conform was significantly higher for objective questions, especially when a participant was unsure of their answer and faced an opposing majority with a significant size. While we saw no significant gender differences in conformity, participants with higher conscientiousness and neuroticism tended to conform more frequently than others. We conclude that online social conformity is a function of majority size, nature of the task, self-confidence and certain personality traits.
Abstract: The collection of participant data ‘in the wild’ is widely employed by Human-Computer Interaction researchers. A variety of methods, including experience sampling, mobile crowdsourcing, and citizen science, rely on repeated participant contributions for data collection.
Given this strong reliance on participant data, ensuring that the data is complete, reliable, timely, and accurate is key. Although previous work has made significant progress on ensuring that a sufficient amount of data is collected, the accuracy of human contributions has remained underexposed. In this article we argue for an emerging need for an increased focus on this aspect of human-labelled data. The articles published in this special issue demonstrate how a focus on the accuracy of the collected data has implications for all aspects of a study – ranging from study design to the analysis and reporting of results. We put forward a five-point research agenda in which we outline future opportunities in assessing and improving human accuracy in mobile data collection.
Abstract: In this paper, we demonstrate the existence of a bidirectional causal relationship between smartphone application use and user emotions. In a two-week-long in-the-wild study with 30 participants we captured 502,851 instances of smartphone application use in tandem with corresponding emotional data from facial expressions. Our analysis shows that while in most cases application use drives user emotions, multiple application categories exist for which the causal effect is in the opposite direction. Our findings shed light on the relationship between smartphone use and emotional states. We furthermore discuss the opportunities for research and practice that arise from our findings and their potential to support emotional well-being.
Abstract: Inspired by the increasing prevalence of digital voice assistants, we demonstrate the feasibility of using voice interfaces to deploy and complete crowd tasks. We have developed Crowd Tasker, a novel system that delivers crowd tasks through a digital voice assistant.
In a lab study, we validate our proof-of-concept and show that crowd task performance through a voice assistant is comparable to that of a web interface for voice-compatible and voice-based crowd tasks for native English speakers. We also report on a field study where participants used our system in their homes. We find that crowdsourcing through voice can provide greater flexibility to crowd workers by allowing them to work in brief sessions, enabling multitasking, and reducing the time and effort required to initiate tasks. We conclude by proposing a set of design guidelines for the creation of crowd tasks for voice and the development of future voice-based crowdsourcing systems.
Abstract: Experimenter-induced influences can trigger biased responses from research participants. Based on existing literature, we evaluate how digital bots can be used as an alternative research tool to mitigate these biases. We note that the conversational interactivity provided by bots can significantly reduce biased responses and satisficing behaviour, while simultaneously enhancing disclosure and facilitating scalability. Bots can also build rapport with participants and explain tasks at hand as well as a human experimenter, with the added benefit of anonymity. However, bots often follow a predetermined script when conversing and therefore may not be able to handle complex and unstructured conversations, which could frustrate users. Studies also imply that bots with human-like features may induce experimenter effects similar to those induced by humans. We conclude with a discussion on how bots could be designed for optimal utilisation in research.
Abstract: Kinship verification is the problem whereby a third party determines whether two people are related. Despite previous research in Psychology and Machine Vision, the factors affecting a person’s verification ability are poorly understood.
Through an online crowdsourcing study, we investigate the impact of gender, race and medium type (image vs video) on kinship verification – taking into account the demographics of both raters and ratees. A total of 325 workers completed over 50,000 kinship verification tasks consisting of pairs of faces shown in images and videos from three widely used datasets. Our results identify an own-race bias and a higher verification accuracy for same-gender image pairs than opposite-gender image pairs. Our results demonstrate that humans can still outperform current state-of-the-art automated unsupervised approaches. Furthermore, we show that humans perform better when presented with videos instead of still images. Our findings contribute to the design of future human-in-the-loop kinship verification tasks, including time-critical use cases such as identifying missing persons.
Abstract: Virtual Reality (VR) environments offer new ways and formats to consume and process information. Despite multimedia offerings, most information remains to be presented via text. VR has the potential to deliver immersive reading experiences while compensating for some of the drawbacks of rather static e-books. To allow readers to step into virtual books, we developed a 3D reading environment with three page-turning techniques for VR. Readers either move the camera position from page to page or control the page flow as positioned in a sequential or radial arrangement. Results from a user study with 18 participants show that moving pages is perceived as more comfortable than moving the camera position while allowing for higher fluency and reading speeds. Linear page movements support readers’ focus on a single page whereas the radial arrangement enables readers to jump between pages quickly.
Our findings inform the design of immersive reading experiences in VR.
Abstract: Interactive public displays are versatile two-way interfaces between the digital world and passersby. They can convey information and harvest purposeful data from their users. Surprisingly little work has exploited public displays for collecting tagged data that might be useful beyond a single application. In this work, we set out to fill this gap and present two studies: (1) a field study where we investigated collecting biometrically tagged video-selfies using public kiosk-sized screens, and (2) an online narrative transportation study that further elicited rich qualitative insights on key emerging aspects from the first study. In the first study, a 61-day deployment resulted in 199 video-selfies with consent to leverage the videos in any non-profit research. The field study indicates that people are willing to donate even highly sensitive data about themselves in public. The subsequent online narrative transportation study provides a deeper understanding of a variety of issues arising from the first study that can be leveraged in the future design of such systems. The two studies combined in this article pave the way forward towards a vision where volunteers can, should they so choose, ethically and serendipitously help unleash advances in data-driven areas such as computer vision and machine learning in health care.
Abstract: A popular methodology used for in situ observations is the Experience Sampling Method (ESM), in which participants intermittently answer short questionnaires. We analyse a set of recent ESM studies and find substantial differences in the number of collected responses across participants. These differences amount to ‘compliance bias’, as the experiences of responsive participants skew the results. Our work develops ways for researchers to ensure the collection of an adequate number of responses across participants.
Through a cross-study analysis of ESM studies, we construct a model that describes the effect of contextual, routine, and study-specific factors on participants’ response rate. In addition to previous work, which aims to maximise the number of total responses, this work also aims to achieve a more equal distribution of responses between participants. In order to achieve this goal, we analyse which contextual cues can be personalised to achieve a higher response rate. Our results highlight a number of factors that have a strong effect on participants’ response rate and can guide the design of future experiments.
Abstract: NIRS is a spectroscopic method that propagates near-infrared waves through objects and measures the absorbance by diffuse reflection; users can analyze the composition of objects based on these measurements. The technology offers fast, non-destructive analysis with relatively simple requirements for operators, making it very friendly to non-expert users. Traditional NIRS scanners used in research laboratories are large and expensive, but smaller and more affordable NIRS scanners have recently appeared, attracting more end-users. In addition, pairing the technology with mobile devices (smartphones, tablets, etc.) could remove other obstacles that otherwise require professional operation, and open up many more possibilities for non-expert users in realistic scenarios. In this paper, we explore one such use case, extending the work of Klakegg et al. (2018): a Smart Pillbox for elderly care. We develop a prototype solution consisting of combined hardware and software assistance to support non-expert users.
Abstract: Users are increasingly confronted with a tremendous amount of information proactively and without explicit consent through notifications from a variety of applications and services. This information load is increased due to the ubiquity of end-user (mobile) devices.
Novel computing paradigms such as IoT and smart cities may further overload end-users, despite the clear indication from literature that human attention is limited. To counter this challenge, “attention management”, including attention representation, sensing, prediction, analysis, personalization, and adaptive behavior, is needed in our computing systems. Following the successful UbiTtention workshop series organised from 2016 onwards, the UbiTtention 2020 workshop brings together researchers and practitioners from academia and industry to explore the management of human attention and notifications across versatile devices and contexts. UbiTtention presents and elicits research to overcome information overload and overchoice – tailoring device or application behavior to user needs.
Abstract: The Diet Explorer is a lightweight system that relies on aggregated human insights for assessing and recommending suitable weight loss diets. We compared its performance against Google and suggest that the system, bootstrapped using a public crowdsourcing platform, provides comparable results in terms of overall satisfaction, relevance, and trustworthiness.
Abstract: The widespread availability of technologically-advanced mobile devices has brought researchers the opportunity to observe human life in day-to-day circumstances. Rather than studying human behaviour through extensive surveys or in artificial laboratory situations, this research instrument allows us to systematically capture human life in naturalistic settings. Mobile devices can capture two distinct data streams. First, the data from sensors embedded within these devices can be appropriated to construct the context of study participants. Second, participants can be asked to actively and repeatedly provide data on phenomena which cannot be reliably collected using the aforementioned sensor streams. This method is known as Experience Sampling.
Researchers employing this method ask participants to provide observations multiple times per day, across a range of contexts, and to reflect on current rather than past experiences. This approach brings a number of advantages over existing methods, such as the ability to observe shifts in participant experiences over time and context, and reducing reliance on the participant’s ability to accurately recall past events. As the onus of data collection lies with participants rather than researchers, there is a firm reliance on the reliability of participant contributions. While previous work has focused on increasing the number of participant contributions, the quality of these contributions has remained relatively unexplored. This thesis focuses on improving the quality and quantity of participant data collected through mobile Experience Sampling. Assessing and subsequently improving the quality of participant responses is a crucial step towards increasing the reliability of this increasingly popular data collection method. Previous recommendations for researchers are based primarily on anecdotal evidence or personal experience in running Experience Sampling studies. While such insights are valuable, it is challenging to replicate these recommendations and quantify their effect. Furthermore, we evaluate the application of this method in light of recent developments in mobile devices. The opportunities and challenges introduced by smartphone-based Experience Sampling studies remain underexplored in the current literature. Such devices can be utilised to infer participants’ context and optimise questionnaire scheduling and presentation to increase data quality and quantity. By deploying our studies on these devices, we explore the opportunities of mobile sensing and interaction in the context of mobile Experience Sampling studies.
Our findings illustrate the feasibility of assessing and quantifying participant accuracy through the use of peer assessment, ground truth questions, and the assessment of cognitive skills. We empirically evaluate these approaches across a variety of study goals. Furthermore, our results provide recommendations on study design, motivation and data collection practices, and appropriate analysis techniques of participant data concerning response accuracy. Researchers can use our findings to increase the reliability of their data, to collect participant responses more evenly across different contexts in order to reduce the potential for bias, and to increase the total number of collected responses. The goal of this thesis is to improve the collection of human-labelled data in ESM studies, thereby strengthening the role of smartphones as valuable scientific instruments. Our work reveals a clear opportunity in the combination of human and sensor data sensing techniques for researchers interested in studying human behaviour in situ.
Abstract: The increased reliance on algorithmic decision-making in socially impactful processes has intensified the calls for algorithms that are unbiased and procedurally fair. Identifying fair predictors is an essential step in the construction of equitable algorithms, but the lack of ground-truth in fair predictor selection makes this a challenging task. In our study, we recruit 90 crowdworkers to judge the inclusion of various predictors for recidivism. We divide participants across three conditions with varying group composition. Our results show that participants were able to make informed decisions on predictor selection. We find that agreement with the majority vote is higher when participants are part of a more diverse group.
The presented workflow, which provides a scalable and practical approach to reach a diverse audience, allows researchers to capture participants’ perceptions of fairness in private while simultaneously allowing for structured participant discussion.
Abstract: Social conformity occurs when an individual changes their behaviour in line with the majority’s expectations. Although social conformity has been investigated in small group settings, the effect of gender – of both the individual and the majority/minority – is not well understood in online settings. Here we systematically investigate the impact of groups’ gender composition on social conformity in online settings. We use an online quiz in which participants submit their answers and confidence scores, both prior to and following the presentation of peer answers that are dynamically fabricated. Our results show an overall conformity rate of 39%, and a significant effect of gender that manifests in a number of ways: gender composition of the majority, the perceived nature of the question, participant gender, visual cues of the system, and final answer correctness. We conclude with a discussion on the implications of our findings in designing online group settings, accounting for the effects of gender on conformity.
Abstract: Near-Infrared Spectroscopy (NIRS) is a non-invasive sensing technique which can be used to acquire information on an object’s chemical composition. Although NIRS is conventionally used in dedicated laboratories, the recent introduction of miniaturized NIRS scanners has greatly expanded the use cases of this technology. Previous work from the UbiComp community shows that miniaturized NIRS can be successfully adapted to identify medical pills and alcohol concentration. In this paper, we further extend this technology to identify sugar (sucrose) contents in everyday drinks.
We developed a standalone mobile device which includes inter alia a NIRS scanner and a 3D printed clamp. The clamp can be attached to a straw-like tube to sense a liquid’s sucrose content. Through a series of studies, we show that our technique can accurately measure sucrose levels in both lab-made samples and commercially available drinks, as well as classify commercial drinks. Furthermore, we show that our method is robust to variations in the ambient temperature and lighting conditions. Overall, our system can estimate the concentration of sugar with ±0.29 g/100ml error in lab-made samples and < 2.0 g/100ml error in 18 commercial drinks, and can identify everyday drinks with > 99% accuracy. Furthermore, in our analysis, we are able to discern three characteristic wavelengths in the near-infrared region (1055 nm, 1235 nm and 1545 nm) with acute responses to sugar (sucrose). Our proposed protocol contributes to the development of everyday “food scanners” for consumers.
Abstract: In order to build fairer Artificial Intelligence applications, a thorough understanding of human morality is required. Given the variable nature of human moral values, AI algorithms will have to adjust their behaviour based on the moral values of their users in order to align with end user expectations. Quantifying human moral values is, however, a challenging task which cannot easily be completed using e.g. surveys. In order to address this problem, we propose the use of game theory in longitudinal mobile sensing deployments. Game theory has long been used in disciplines such as Economics to quantify human preferences by asking participants to choose between a set of hypothetical options and outcomes.
The behaviour observed in these games, combined with the use of mobile sensors, enables researchers to obtain unique insights into the effect of context on participant convictions.
Abstract: The measurement of participant attention is a frequent by-product of mobile sensing-based studies, which typically focus on user interruptibility or the effectiveness of notification deliveries. We note that, despite the popularity of interruptibility research within our discipline, research focused on attention is surprisingly scarce. This omission may be due to (a combination of) methodological, technological, or disciplinary constraints. In this paper, we argue how attention levels can be effectively measured with existing technologies and methodologies by adapting continuous measurements of attention fluctuations. Many clinically researched technologies, as well as sensing-based analysis methods, could be leveraged for this purpose. This paper invites co-researchers to assess the use of novel ways to measure attention in their future endeavours.
Abstract: Mobile self-reports are a popular technique to collect participant-labelled data in the wild. While literature has focused on increasing participant compliance to self-report questionnaires, relatively little work has assessed response accuracy. In this paper, we investigate how participant context can affect response accuracy and help identify strategies to improve the accuracy of mobile self-report data. In a 3-week study we collect over 2,500 questionnaires containing both verifiable and non-verifiable questions. We find that response accuracy is higher for questionnaires that arrive when the phone is not in ongoing or very recent use. Furthermore, our results show that long completion times are an indicator of lower accuracy. Using contextual mechanisms readily available on smartphones, we are able to explain up to 13% of the variance in participant accuracy.
We offer actionable recommendations to assist researchers in their future deployments of mobile self-report studies.
Abstract: We present the results of an experiment that aimed to comprehensively understand 1) how smartphone users interact with their notifications, 2) what notification content is considered important, and 3) the complex relationship between the interaction choices and content importance, and lastly 4) to establish an intelligent method to predict a user’s preference for seeing an incoming notification. We use a dataset of notifications received by 40 anonymous users in-the-wild, which consists of 1) qualitative user-labelled information about their preferences on notification contents, 2) notification source, and 3) the context in which the notification was received. We assess the effectiveness of personalised prediction models generated using a combination of self-reported content importance and contextual information. We uncover four distinct user types, based on the number of daily notifications and interaction choices. We showcase how usage traits of these groups highlight the requirement for notification filtering approaches, e.g., when specific users habitually neglect to manually filter out unimportant notifications. Our machine learning-based predictor, based on both contextual sensing and notification contents, can predict the user’s preference for successfully acknowledging an incoming notification with 91.1% mean accuracy, crucial for time-critical user engagement and interventions.
Abstract: Situated crowdsourcing has been growing in popularity as an alternative way to collect complex and often creative crowd work. However, previous situated crowdsourcing deployments have not successfully leveraged cooperation possibilities with their audiences, which can improve the data quality of deployed macrotasks.
In this chapter, we present three situated crowdsourcing case studies that used different situated technologies and identify the reasons behind their missteps regarding promoting cooperation between workers. Then, based on the identified issues, we propose the design of a novel situated crowdsourcing platform that aims to effectively support cooperation without alienating solo workers. In order to gather insights on our proposed design, we built a prototype platform and evaluated it using a laboratory study with 24 participants. In general, participants were positive about the idea as it provided an easy way to cooperate with friends when completing tasks, while also allowing them to adjust the working environment to their liking. Finally, we conclude by offering insights towards improving cooperation in future situated crowdsourcing deployments and how this can assist in completing macrotasks.
Abstract: Research shows that environmental factors such as ambient noise and cold ambience can render users situationally impaired, adversely affecting interaction with mobile devices. However, an internal factor which is known to negatively impact cognitive abilities – stress – has not been systematically investigated in terms of its impact on mobile interaction. In this paper, we report a study where we use the Trier Social Stress Test to induce stress on participants, and investigate its effect on three aspects of mobile interaction: target acquisition, visual search, and text entry. We find that stress reduces completion time and accuracy during target acquisition tasks, as well as completion time during visual search tasks. Finally, we are able to directly contrast the magnitude of these effects to previously published effects of environmentally-caused impairments. Our work contributes to the growing body of literature on situational impairments.
Abstract: Interactive public displays have matured into highly capable two-way interfaces.
They can be used for efficiently delivering information to people as well as for collecting insights from their users. While displays have been used for harvesting opinions and other content from users, surprisingly little work has looked into exploiting such screens for the consensual collection of tagged data that might be useful beyond one application. We present a field study where we collected biometrically tagged data using public kiosk-sized interactive screens. During 61 days of deployment time, we collected 199 selfie videos, cost-efficiently and with consent to leverage the videos in any non-profit research. 78 of the videos also had metadata attached to them. Overall, our studies indicate that people are willing to donate even highly sensitive data about themselves in public but that, at the same time, the participants had specific ethical and privacy concerns over the future of their data. Our study paves the way forward toward a future where volunteers can ethically help advance innovations in computer vision research across a variety of exciting application domains, such as health monitoring and care.
Abstract: Matching crowd workers to suitable tasks is highly desirable as it can enhance task performance, reduce the cost for requesters, and increase worker satisfaction. In this paper, we propose a method that considers workers’ cognitive ability to predict their suitability for a wide range of crowdsourcing tasks. We measure cognitive ability via fast-paced online cognitive tests with a combined average duration of 6.2 minutes. We then demonstrate that our proposed method can effectively assign or recommend workers to five different popular crowd tasks: Classification, Counting, Proofreading, Sentiment Analysis, and Transcription. Using our approach we demonstrate a significant improvement in the expected overall task accuracy.
While previous methods require access to worker history or demographics, our work offers a quick and accurate way to determine which workers are more suitable for which tasks.
Abstract: The adverse effect of ambient noise on humans has been extensively studied in fields like cognitive science, indicating a significant impact on cognitive performance, behaviour, and emotional state. Surprisingly, the effect of ambient noise has not been studied in the context of mobile interaction. As smartphones are ubiquitous by design, smartphone users are exposed to a wide variety of ambient noises while interacting with their devices. In this paper, we present a structured analysis of the effect of six distinct ambient noise types on typical smartphone usage tasks. The evaluated ambient noise types include variants of music, urban noise and speech. We analyse task completion time and errors, and find that different ambient noises affect users differently. For example, while speech and urban noise slow down text entry, being exposed to music reduces completion time in target acquisition tasks. Our study contributes to the growing research area on situational impairments, and we compare our results to previous work on the effect of cold-induced situational impairments. Our results can be used to support smartphone users through adaptive interfaces which respond to the ongoing context of the user.
Abstract: The Experience Sampling Method is widely used to collect human labelled data in the wild. Using this methodology, study participants repeatedly answer a set of questions, constructing a rich overview of the studied phenomena. One of the methodological decisions faced by researchers is deciding on the question scheduling. The literature defines three distinct schedule types: randomised, interval-based, or event-based (in our case, smartphone unlock).
However, little evidence exists regarding the side-effects of these schedules on response rate and recall accuracy, and how they may bias study findings. We evaluate the effect of these three contingency configurations in a 3-week within-subjects study (N = 20). Participants answered various objective questions regarding their phone usage, while we simultaneously established a ground truth through smartphone instrumentation. We find that scheduling questions on phone unlock yields a higher response rate and accuracy. Our study provides empirical evidence for the effects of notification scheduling on participant responses, and informs researchers who conduct experience sampling studies on smartphones.
Abstract: Big data and the increased use of Artificial Intelligence (AI) and Machine Learning (ML) have opened many new opportunities for continuous decision-support by autonomous systems. While initial work has begun to explore how human morality can inform the decision-making of future AIs, these approaches consider human morality as a static concept. We note that human morality and decision-making are affected not only by cultures and personalities but also, to a large degree, by an individual’s context. In order to align with the moral judgements of their users, future ML applications should adjust their decision-making accordingly based on user context. In this work, we discuss our critical take on the importance of contextual morality for AI and identify opportunities for future work.
Abstract: Digital voice assistants or smart speakers have rapidly changed the landscape of voice user interfaces over the past few years. In this paper we discuss how we could utilise the affordances of these devices to create a novel crowdsourcing platform that delivers crowd tasks through voice, with a particular focus on creative tasks.
We describe the potential benefits and challenges of using this technology for these purposes, and outline our future work in this research area.
Abstract: We investigate the predictability of the next unlock event on smartphones, using machine learning and smartphone contextual data. In a 2-week field study with 27 participants, we demonstrate that it is possible to predict when the next unlock event will occur. Additionally, we show how our approach can improve accuracy and energy efficiency by solely relying on software-related contextual data. Based on our findings, smartphone applications and operating systems can improve their energy efficiency by utilising short-term predictions to minimise unnecessary executions, or launch computation-intensive tasks, such as OS updates, in the locked state. For instance, by inferring the next unlock event, smartphones can pre-emptively collect sensor data or prepare timely content to improve the user experience during the subsequent phone usage session.
Abstract: This paper presents our findings on knowledge work environment usage behaviour through a combined automated mobile indoor positioning system and self-reports collected from the environment’s inhabitants. Contemporary work environments are increasingly flexible multi-occupant environments as opposed to cellular offices. Understanding persons’ task-related and situation-related environmental needs is critical to improve the design of future knowledge work environments. This study is conducted in a team office environment prior to and following an intervention in which the office layout was re-organized. The combined methodological approach described in this paper provides a new tool for architecture researchers aiming to understand the use of workspaces.
Importantly, combining self-reports with context-aware location data collection provides researchers with an efficient in situ tool to access participants’ experiences and decision-making processes when choosing their workstation or workspace.
Abstract: A fundamental challenge in real-time labelling of activity data is user burden. The Experience Sampling Method (ESM) is widely used to obtain such labels for sensor data. However, in an in-situ deployment, it is not feasible to expect users to precisely label the start and end time of each event or activity. For this reason, time-point based experience sampling (without an actual start and end time) is prevalent. We present a framework that applies multi-instance and semi-supervised learning techniques to predict user annotations from multiple mobile sensor data streams. Our proposed framework estimates users’ annotations in ESM-based studies progressively, via an interactive pipeline of co-training and active learning. We evaluate our work using data collected from an in-the-wild data collection.
Abstract: The authors present Back Pain Buddy, a mobile application offering decision support and coaching for people with low back pain (LBP). The application takes advantage of smartphones’ powerful capabilities and provides a crowd-sourced decision support system for discovering treatments and a mobile sensing solution for collecting data about users’ activities that are crucial in LBP research.
Abstract: The adverse effect of ambient noise on humans has been extensively studied in fields like cognitive science, indicating a significant impact on cognitive performance, behaviour, and emotional state. Surprisingly, the effect of ambient noise has not been studied in the context of mobile interaction. As smartphones are ubiquitous by design, smartphone users are exposed to a wide variety of ambient noises while interacting with their devices.
In this paper, we present a structured analysis of the effect of six distinct ambient noise types on typical smartphone usage tasks. The evaluated ambient noise types include variants of music, urban noise and speech. We analyse task completion time and errors, and find that different ambient noises affect users differently. For example, while speech and urban noise slow down text entry, being exposed to music reduces completion time in target acquisition tasks. Our study contributes to the growing research area on situational impairments, and we compare our results to previous work on the effect of cold-induced situational impairments. Our results can be used to support smartphone users through adaptive interfaces which respond to the ongoing context of the user.
Abstract: The collection of human contributions through mobile devices is increasingly common across a range of methodologies. However, possible quality issues of these contributions are often overlooked. As the quality of human data has a direct impact on study reliability, more should be done to improve the accuracy of these contributions. We identify and categorise solutions aimed at increasing the accuracy of contributions prior to, during, and following data collection. Our categorisation assists in the positioning of future work in this area and fosters the usage of cross-methodological practices.
Abstract: Our work investigates the use of a Near InfraRed Spectroscopy scanner for the identification of liquids. While previous work has shown promising results for the identification of solid objects, identifying liquids poses additional challenges. These challenges include light scattering and low reflectance caused by the transparency of liquids, which interfere with the infrared measurement. We develop a prototype solution consisting of a 3D printed clamp that attaches to a tube, such that it blocks ambient light from interfering.
Our preliminary results indicate that our prototype works, and we demonstrate this by measuring sugar levels in a liquid solution.
Abstract: Our world is increasingly interconnected via a wide variety of computers, IoT, wearable and mobile devices. The information provided collectively through these devices offers insightful information on our everyday lives, daily patterns, mood, behaviour, and surrounding environment. Our workshop brings together researchers interested in collecting and augmenting context to understand device specific behaviour and routines, human behaviour and mood, and changes in the environment. The outcomes of this workshop are new tools, methodologies, and potential collaborations for sensing the outlying world as well as ourselves.
Abstract: Ubicomp/HCI researchers are increasingly using smartphones to collect human-labelled data ‘in the wild’. While this allows for the collection of a wide range of interesting data in authentic settings and surroundings, humans are notoriously inconsistent in the quality of their contributions. Improving the quality of data collected with mobile devices is a largely unexplored, but highly relevant field. The primary objective of this workshop is to share insights, ideas, and discoveries on the quality of mobile human contributions. The work presented in the International Workshop on Mobile Human Contributions (MHC ’18) explores methods, tools, and novel approaches towards increasing the reliability of human data submissions with mobile devices.
Abstract: The 2018 ACM CHI Conference on Human Factors in Computing Systems took place 21-27 April in Montreal, Canada, and attracted over 3,000 participants.
This year’s CHI marked the 50th anniversary of the Mother of all Demos and showcased many exciting new technologies and techniques that will help shape the future of pervasive computing.
Abstract: Can Internet search data be used as a proxy to predict refugee mobility? The soaring refugee death toll in Europe creates an urgent need for novel tools that monitor and forecast refugee flows. This study investigates the correlation between refugee mobility data and Internet search data from Google Trends. Google Trends is a freely accessible tool that provides access to Internet search data by analyzing a sample of all web queries. In our study, we surveyed refugees in Greece (entry point) and in Finland (destination point) to identify what search queries they had used during their travel. Next, we conducted time series analysis on Google search data to investigate whether interest in user-defined search queries correlated with the levels of refugee arrival data recorded by the United Nations High Commissioner for Refugees (UNHCR). Results indicate that the reuse of internet search data considerably improves the predictive power of the models.
Abstract: Near-infrared spectroscopy (NIRS) measures the light reflected from objects to infer highly detailed information about their molecular composition. Traditionally, NIRS has been an instrument reserved for laboratory usage, but recently affordable and smaller devices for NIRS have proliferated. Pairing this technology with the ubiquitous smartphone opens up a plethora of new use cases. In this paper, we explore one such use case, namely medication management in a nursing home/elderly care centre. First, we conducted a qualitative user study with nurses working in an elderly care centre to examine the protocols and workflows involved in administering medication, and the nurses’ perceptions on using this technology.
Based on our findings, we identify the main impact areas that would benefit from introducing miniaturised NIRS. Finally, we demonstrate via a user study in a realistic scenario that miniaturised NIRS can be effectively used for medication management when leveraging appropriate machine learning techniques. Specifically, we assess the performance of multiple pre-processing and classification algorithms for a selected set of pharmaceuticals. In addition, we compare our solution with currently used methods for pharmaceutical identification in a local care centre. We hope that our reflection on the multiple aspects associated with the introduction of this device in an elderly care setting can help both academics and practitioners working on related problems.
Abstract: With the rapid proliferation of social media sites, researchers have increasingly turned to data generated from these platforms to investigate human behaviour. In this paper we report the design and implementation of the RDV (Reddit Data Visualisation) platform, a visualisation tool aimed at facilitating the analysis of a publicly available Reddit dataset, which contains 1.7 billion JSON objects collected from October 2007 to October 2015. RDV allows researchers without advanced coding skills to easily analyse this dataset, while also providing a tailor-made platform to account for the intricacies of any dataset originating from Reddit. We showcase the features of the platform through an example of data analysis using the Reddit dataset: the 2015 United Kingdom general elections. Finally, we conclude by discussing the need for better and simpler visualisation tools for non-technical researchers to analyse Big Online Behavioural Datasets, and report our ongoing work in this area.
Abstract: Low back pain (LBP) is a globally common condition with no silver bullet solutions. Further, the lack of therapeutic consensus causes challenges in choosing suitable solutions to try.
In this work, we crowdsourced knowledge bases on LBP treatments. The knowledge bases were used to rank and offer best-matching LBP treatments to end users. We collected two knowledge bases: one from clinical professionals and one from non-professionals. Our quantitative analysis revealed that non-professional end users perceived the best treatments by both groups as equally good. However, the worst treatments by non-professionals were clearly seen as inferior to the lowest ranking treatments by professionals. Certain treatments by professionals were also perceived significantly differently by non-professionals and professionals themselves. Professionals found our system handy for self-reflection and for educating new patients, while non-professionals appreciated the reliable decision support that also respected the non-professional opinion.
Abstract: The Internet offers plenty of options for those who want to lose weight. Choosing among the practically unlimited number of weight loss diets, exercises, and pills, advertised as borderline magical, is however challenging. We present The Diet Explorer, a crowd-powered knowledge base that can be queried in real-time to discover weight loss diets that best match personal needs. Our long-term goals are to help people make better-informed dieting decisions and ultimately reach more satisfactory diet outcomes.
Abstract: Online crowdsourcing enables the distribution of work to a global labor force as small and often repetitive tasks. Recently, situated crowdsourcing has emerged as a complementary enabler to elicit labor in specific locations and from specific crowds. Teamwork in online crowdsourcing has been recently shown to increase the quality of output, but teamwork in situated crowdsourcing remains unexplored. We set out to fill this gap.
We present a generic crowdsourcing platform that supports situated teamwork and provide experiences from a laboratory study that focused on comparing traditional online crowdsourcing to situated team-based crowdsourcing. We built a crowdsourcing desk that hosts three networked terminal displays. The displays run our custom team-driven crowdsourcing platform that was used to investigate collocated crowdsourcing in small teams. In addition to analyzing quantitative data, we provide findings based on questionnaires, interviews, and observations. We highlight 1) emerging differences between traditional and collocated crowdsourcing, 2) the collaboration strategies that teams exhibited in collocated crowdsourcing, and 3) that a priori team familiarity does not significantly affect collocated interaction in crowdsourcing. The approach we introduce is a novel multi-display crowdsourcing setup that supports collocated labor teams and along with the reported study makes specific contributions to situated crowdsourcing research.
Abstract: The Experience Sampling Method (ESM) is used by scientists from various disciplines to gather insights into the intra-psychic elements of human life. Researchers have used the ESM in a wide variety of studies, with the method seeing increased popularity. Mobile technologies have enabled new possibilities for the use of the ESM, while simultaneously leading to new conceptual, methodological, and technological challenges. In this survey, we provide an overview of the history of the ESM, usage of this methodology in the computer science discipline, as well as its evolution over time. Next, we identify and discuss important considerations for ESM studies on mobile devices, and analyse the particular methodological parameters scientists should consider in their study design. We reflect on the existing tools that support the ESM methodology and discuss the future development of such tools.
Finally, we discuss the effect of future technological developments on the use of the ESM and identify areas requiring further investigation.
Abstract: User interaction with mobile devices can be negatively affected by contextual factors, known as situationally-induced impairments. In this paper, we provide a systematic overview of established situational impairments and their impact on interaction with mobile devices, as well as existing methods for their detection and design guidelines to overcome them. We also propose a research roadmap for this topic where we argue that more experiments are required regarding the less investigated situational impairments. Furthermore, we argue that successful detection of the presence of a specific situational impairment is paramount before solutions can be proposed to adapt mobile interfaces to accommodate potential situational impairments.
Abstract: The Experience Sampling Method is used to capture high-quality in situ data from study participants. This method has become popular in studies involving smartphones, where it is often adapted to motivate participation through the use of gamification techniques. However, no work to date has evaluated whether gamification actually affects the quality and quantity of data collected through Experience Sampling. Our study systematically investigates the effect of gamification on the quantity and quality of experience sampling responses on smartphones. In a field study, we combine event contingent and interval contingent triggers to ask participants to describe their location. Subsequently, participants rate the quality of these entries by playing a game with a purpose. Our results indicate that participants using the gamified version of our ESM software provided significantly higher quality responses, slightly increased their response rate, and provided significantly more data of their own accord.
Our findings suggest that gamifying experience sampling can improve data collection and quality in mobile settings.
Abstract: Previous work has highlighted the detrimental effect of cold ambience on fine-motor skills during interaction with mobile devices. In this work, we develop a method to infer changes in finger temperature of smartphone users without the need for specialised hardware. Specifically, we demonstrate that smartphone battery temperature is a reliable gauge for determining changes to finger temperature. In addition, we show that the behaviour of smartphone battery temperature in cold settings is consistent across different smartphone models and battery configurations. Our method can be used to determine cold-induced situational impairments, and trigger interface adaptations during mobile interaction.
Abstract: We develop and evaluate a new ubiquitous crowdsourcing platform called CrowdPickUp, that combines the advantages of mobile and situated crowdsourcing to overcome their respective limitations. In a 19-day long field study with 70 participants, we evaluate the quality of work that CrowdPickUp produces. In particular, we measure quality in terms of worker performance in a variety of tasks (requiring local knowledge, location-based, general) while using a number of different quality control mechanisms, and also capture workers’ perceptions of the platform. Our findings show that workers of CrowdPickUp contributed data of comparable quality to previously presented crowdsourcing deployments while at the same time allowing for a wide breadth of tasks to be deployed. Finally, we offer insights towards the continued exploration of this research agenda.
Abstract: We present an assistive healthcare platform, CARE, which aims to provide daily support for elderly caregivers with context-aware, unobtrusive, and actionable information.
This information is collected through a plethora of IoT sensors installed strategically at an elderly care centre and is accessed through an Android tablet application. The application’s goal is to empower nurses with a better understanding of elderly needs and ultimately, improve the care service. We investigate how IoT devices and sensors can enable a pervasive healthcare system, and discuss a wide range of important parameters for integration of elderly care practices.
Abstract: To humanize interaction between users and computers, one needs the ability to infer the users’ mood. One option is a vision-based approach. We quantify the ‘preview effect’ bias in visual mood assessment. We demonstrate that automated tools which infer user mood from photographs or video may be affected by the presentation methodology used while performing image capture. Specifically, we demonstrate that showing a “preview” of oneself, i.e., a mirror, increases the accuracy of the visual mood inference algorithms present in Google’s Mobile Vision API. Our findings show that studies that incorporate visual mood assessment should include “preview” images to reduce bias and increase the reliability of vision-based happiness inference.
Abstract: Previous work suggests that Quantified-Self applications can retain long-term usage with motivational methods. These methods often require intermittent attention requests with manual data input. This may cause unnecessary burden to the user, leading to annoyance, frustration and possible application abandonment. We designed a novel method that uses on-screen alert dialogs to transform recurrent smartphone usage sessions into moments of data contributions and evaluate how accurately machine learning can reduce unintended interruptions. We collected sensor data from 48 participants during a 4-week long deployment and analysed how personal device usage can be considered in scheduling data inputs.
We show that up to 81.7% of user interactions with the alert dialogs can be accurately predicted using user clusters, and up to 75.5% of unintended interruptions can be prevented and rescheduled. Our approach can be leveraged by applications that require self-reports on a frequent basis and may provide a better longitudinal QS experience.
Abstract: We argue that improved data entry can motivate Quantified-Self (QS) users to better engage with QS applications. To improve data entry, we investigate the notion of transforming active smartphone usage into data logging contributions through alert dialogs. We evaluate this assertion in a 4-week long deployment with 48 participants. We collect 17,906 data entries, where 68.3% of the entries are reported using the alert dialogs. We demonstrate that QS applications can benefit from alert dialogs: to increase data precision, frequency, and reduce the probability of forgetfulness in data logging. We investigate the impact of usage session type (e.g., sessions with different goals or durations) and the assigned reminder delay on frequency of data contributions. We conclude with insights gathered from our investigation, and the implications they have on future designs.
Abstract: Elderly care is a pressing societal challenge: governments’ financial burden is expected to increase exponentially over the next 20 years as the population ages rapidly. Solutions to mitigate this challenge include the use of IoT and software solutions to minimise the effort of elderly care, in care centres and at home. To accomplish this, we set out to quantify the most important elderly care metrics (i.e., what is important to support caregivers’ work) through field observations and interviews at a local care centre housing 14 older adults.
We iteratively designed a mobile application and evaluated its usefulness with 8 caregivers. The application summarises and communicates the care metrics, juxtaposed with wellbeing data (e.g., social interaction, mobility and others), as part of a larger elderly care support platform, CARE. The goal of the mobile application is to enable a better care service by raising awareness of the daily needs and routines of the elderly and to provide quick access to their wellbeing information. Our findings suggest that our design could benefit the care personnel and assist them in carrying out their daily duties at the care centre.
Abstract: Near Infrared Spectroscopy (NIRS) is a sensing technique in which near infrared light is transmitted into a sample, followed by light absorbance measurements at various wavelengths. This technique enables the inference of the inner chemical composition of the scanned sample, and therefore can be used to identify or classify objects. In this paper, we describe how to facilitate the use of NIRS by non-expert users in everyday settings. Our work highlights the key challenges of placing NIRS devices in the hands of non-experts. We develop a system to mitigate these challenges, and evaluate it in a user study. We show how NIRS technology can be successfully utilised by untrained users in an unsupervised manner through a special enclosure and an accompanying smartphone app. Finally, we discuss potential future developments of commoditised NIRS.
Abstract: We seek to quantify smartwatch use, and establish differences and similarities to smartphone use. Our analysis considers use traces from 307 users that include over 2.8 million notifications and 800,000 screen usage events, and we compare our findings to previous work that quantifies smartphone use. The results show that smartwatches are used more briefly and more frequently throughout the day, with half the sessions lasting less than 5 seconds.
Interaction with notifications is similar across both types of devices, both in terms of response times and preferred application types. We also analyse the differences between our smartwatch dataset and a dataset aggregated from four previously conducted smartphone studies. The similarities and differences between smartwatch and smartphone use suggest effects on usage that go beyond differences in form factor.
Abstract: We present a study that quantifies the effect of cold temperature on smartphone input performance, particularly on tapping tasks. Our results show that smartphone input performance decreases when completing tapping tasks in cold temperatures. We show that colder temperature is associated with lower throughput and less accurate performance when using the phone in both one-handed and two-handed operations. We also demonstrate that colder temperature is related to higher error rate when using the phone in one-handed operation only, but not two-handed. Finally, we identify a number of design recommendations from the literature that can be considered as a countermeasure to poorer smartphone input performance in completing tapping tasks in cold temperature.
Abstract: Mobile instrumentation provides researchers and professionals the opportunity to collect data on several aspects of human life. In this paper we discuss our initial experiences on collecting data via mobile instrumentation in an elementary school. We augmented a classroom with mobile phones and Bluetooth beacons to capture student experiences as well as their relative distance to each other during a collaborative group project. We describe the study, and present lessons learned when instrumenting such a unique school setting with young participants.
Abstract: In this paper we present a study on crowdsourcing subjective knowledge. We introduce a mobile app that was built for this purpose, and compare results from two datasets collected using the app.
One dataset was collected during a workshop and the other one during a one-week long field trial. We present interview findings on mobile knowledge collection. Further, we discuss the types of information that should optimally be collected on the go, and show how our data analysis supports the qualitative findings. This work directly continues our earlier efforts on creating a platform that encapsulates wisdom of the crowd for decision support.
Abstract: Researchers who analyse smartphone usage logs often make the assumption that users who lock and unlock their phone for brief periods of time (e.g., less than a minute) are continuing the same “session” of interaction. However, this assumption is not empirically validated, and in fact different studies apply different arbitrary thresholds in their analysis. To validate this assumption, we conducted a field study where we collected user-labelled activity data through ESM and sensor logging. Our results indicate that for the majority of instances where users return to their smartphone, i.e., unlock their device, they in fact begin a new session as opposed to continuing a previous one. Our findings suggest that the commonly used approach of ignoring brief standby periods is not reliable, but optimisation is possible. We therefore propose various metrics related to usage sessions and evaluate various machine learning approaches to classify gaps in usage.
Abstract: Research claims that users value the battery life of their smartphones, but no study to date has attempted to quantify battery value and how this value changes according to users’ current context and needs. Previous work has quantified the monetary value that smartphone users place on their data (e.g., location), but not on battery life. Here we present a field study and methodology for systematically measuring the monetary value of smartphone battery life, using a reverse second-price sealed-bid auction protocol.
Our results show that the prices for the first and last 10% battery segments differ substantially. Our findings also quantify the tradeoffs that users consider in relation to battery, and provide a monetary model that can be used to measure the value of apps and enable fair ad-hoc sharing of smartphone resources.
Abstract: The quantified-self movement entails self-tracking of physical activity, often using wearable devices and mobile applications. In parallel, mobile applications focusing on mental health are increasingly popular, and they often rely on active user input to track the user's progress and to deliver feedback and motivation. In this paper we discuss the potential benefits of bridging these two distinct yet highly relevant application domains. We argue for the benefits of combining explicit (user-provided) and implicit (device-collected) data sources in the context of mental health care. We argue that this combination allows for improved methods to observe patients’ lives, and thus provide a more in-depth overview of their progress. This may enable mental health professionals to establish more personalised and adaptive care plans.
Abstract: This study outlines initial steps to define a new framework to measure relatedness, originality and creativity of student projects in a FabLab environment. A default project topic provided to students in a 3D fabrication and prototyping class served as a basis to investigate originality on the functional component level. The added components with their input and output methods, along with the control logic, were used to judge the relatedness to a given solution of the generated design ideas. An example set of ideas given by the students was evaluated with the proposed framework. 
The framework can complement existing measures of originality and creativity in general.
Abstract: Quantified Selfers are individuals that take a proactive stance to collect and act upon their personal data. However, these endeavours towards a better insight into one’s life often do not last long. An important challenge for QS is sustaining data collection over a long period of time (i.e., months, years, decades). In this paper we discuss the drivers, needs and concerns of longitudinal QS-data collection. We argue that to support longitudinal QS various obstacles have to be overcome, including i) integration and sharing of data between a variety of (new) devices, ii) incorporating human input for psychological data collection and iii) providing answers to the questions people really have.