---
authors:
  - givenNames:
      - James
      - J
    familyNames:
      - Gilchrist
    type: Person
    emails:
      - james.gilchrist@paediatrics.ox.ac.uk
    affiliations:
      - name: Department of Paediatrics, University of Oxford
        address:
          addressLocality: Oxford
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
      - name: MRC–Weatherall Institute of Molecular Medicine, University of Oxford
        address:
          addressLocality: Oxford
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
      - name: Wellcome Centre for Human Genetics, University of Oxford
        address:
          addressLocality: Oxford
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
  - givenNames:
      - Silvia
      - 'N'
    familyNames:
      - Kariuki
    type: Person
    affiliations:
      - name: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast
        address:
          addressLocality: Kilifi
          addressCountry: Kenya
          type: PostalAddress
        type: Organization
  - givenNames:
      - James
      - A
    familyNames:
      - Watson
    type: Person
    affiliations:
      - name: Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford
        address:
          addressLocality: Oxford
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
      - name: Mahidol Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University
        address:
          addressCountry: Bangkok
          type: PostalAddress
        type: Organization
  - givenNames:
      - Gavin
    familyNames:
      - Band
    type: Person
    affiliations:
      - name: Wellcome Centre for Human Genetics, University of Oxford
        address:
          addressLocality: Oxford
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
  - givenNames:
      - Sophie
    familyNames:
      - Uyoga
    type: Person
    affiliations:
      - name: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast
        address:
          addressLocality: Kilifi
          addressCountry: Kenya
          type: PostalAddress
        type: Organization
  - givenNames:
      - Carolyne
      - M
    familyNames:
      - Ndila
    type: Person
    affiliations:
      - name: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast
        address:
          addressLocality: Kilifi
          addressCountry: Kenya
          type: PostalAddress
        type: Organization
  - givenNames:
      - Neema
    familyNames:
      - Mturi
    type: Person
    affiliations:
      - name: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast
        address:
          addressLocality: Kilifi
          addressCountry: Kenya
          type: PostalAddress
        type: Organization
  - givenNames:
      - Salim
    familyNames:
      - Mwarumba
    type: Person
    affiliations:
      - name: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast
        address:
          addressLocality: Kilifi
          addressCountry: Kenya
          type: PostalAddress
        type: Organization
  - givenNames:
      - Shebe
    familyNames:
      - Mohammed
    type: Person
    affiliations:
      - name: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast
        address:
          addressLocality: Kilifi
          addressCountry: Kenya
          type: PostalAddress
        type: Organization
  - givenNames:
      - Moses
    familyNames:
      - Mosobo
    type: Person
    affiliations:
      - name: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast
        address:
          addressLocality: Kilifi
          addressCountry: Kenya
          type: PostalAddress
        type: Organization
  - givenNames:
      - Kaur
    familyNames:
      - Alasoo
    type: Person
    affiliations:
      - name: Institute of Computer Science, University of Tartu
        address:
          addressLocality: Tartu
          addressCountry: Estonia
          type: PostalAddress
        type: Organization
  - givenNames:
      - Kirk
      - A
    familyNames:
      - Rockett
    type: Person
    affiliations:
      - name: Wellcome Centre for Human Genetics, University of Oxford
        address:
          addressLocality: Oxford
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
  - givenNames:
      - Alexander
      - J
    familyNames:
      - Mentzer
    type: Person
    affiliations:
      - name: Wellcome Centre for Human Genetics, University of Oxford
        address:
          addressLocality: Oxford
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
  - givenNames:
      - Dominic
      - P
    familyNames:
      - Kwiatkowski
    type: Person
    affiliations:
      - name: Wellcome Centre for Human Genetics, University of Oxford
        address:
          addressLocality: Oxford
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
      - name: Wellcome Sanger Institute
        address:
          addressLocality: Cambridge
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
  - givenNames:
      - Adrian
      - VS
    familyNames:
      - Hill
    type: Person
    affiliations:
      - name: Wellcome Centre for Human Genetics, University of Oxford
        address:
          addressLocality: Oxford
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
      - name: The Jenner Institute, University of Oxford
        address:
          addressLocality: Oxford
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
  - givenNames:
      - Kathryn
    familyNames:
      - Maitland
    type: Person
    affiliations:
      - name: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast
        address:
          addressLocality: Kilifi
          addressCountry: Kenya
          type: PostalAddress
        type: Organization
      - name: Division of Medicine, Imperial College
        address:
          addressLocality: London
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
  - givenNames:
      - J
      - Anthony
      - G
    familyNames:
      - Scott
    type: Person
    affiliations:
      - name: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast
        address:
          addressLocality: Kilifi
          addressCountry: Kenya
          type: PostalAddress
        type: Organization
      - name: Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine
        address:
          addressLocality: London
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
  - givenNames:
      - Thomas
      - 'N'
    familyNames:
      - Williams
    type: Person
    emails:
      - TWilliams@kemri-wellcome.org
    affiliations:
      - name: KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast
        address:
          addressLocality: Kilifi
          addressCountry: Kenya
          type: PostalAddress
        type: Organization
      - name: Institute for Global Health Innovation, Department of Surgery and Cancer, Imperial College
        address:
          addressLocality: London
          addressCountry: United Kingdom
          type: PostalAddress
        type: Organization
editors:
  - givenNames:
      - Alexander
    familyNames:
      - Young
    type: Person
    affiliations:
      - name: United States
        type: Organization
datePublished:
  value: '2022-07-22'
  type: Date
dateReceived:
  value: '2022-01-31'
  type: Date
dateAccepted:
  value: '2022-07-22'
  type: Date
title:
  - type: Emphasis
    content:
      - BIRC6
  - ' modifies risk of invasive bacterial infection in Kenyan children'
description:
  - >-
    Invasive bacterial disease is a major cause of morbidity and mortality in
    African children. Despite being caused by diverse pathogens, children with
    sepsis are clinically indistinguishable from one another. In spite of this,
    most genetic susceptibility loci for invasive infection that have been
    discovered to date are pathogen specific and are not therefore suggestive of
    a shared genetic architecture of bacterial sepsis. Here, we utilise
    probabilistic diagnostic models to identify children with a high probability
    of invasive bacterial disease among critically unwell Kenyan children with 
  - type: Emphasis
    content:
      - Plasmodium falciparum
  - ' parasitaemia. We construct a joint dataset including 1445 bacteraemia cases and 1143 severe malaria cases, and population controls, among critically unwell Kenyan children that have previously been genotyped for human genetic variation. Using these data, we perform a cross-trait genome-wide association study of invasive bacterial infection, weighting cases according to their probability of bacterial disease. In doing so, we identify and validate a novel risk locus for invasive infection secondary to multiple bacterial pathogens, that has no apparent effect on malaria risk. The locus identified modifies splicing of '
  - type: Emphasis
    content:
      - BIRC6
  - ' in stimulated monocytes, implicating regulation of apoptosis and autophagy in the pathogenesis of sepsis in Kenyan children.'
isPartOf:
  volumeNumber: 11
  isPartOf:
    title: eLife
    issns:
      - 2050-084X
    identifiers:
      - name: nlm-ta
        propertyID: https://registry.identifiers.org/registry/nlm-ta
        value: elife
        type: PropertyValue
      - name: publisher-id
        propertyID: https://registry.identifiers.org/registry/publisher-id
        value: eLife
        type: PropertyValue
    publisher:
      name: eLife Sciences Publications, Ltd
      type: Organization
    type: Periodical
  type: PublicationVolume
licenses:
  - url: http://creativecommons.org/licenses/by/4.0/
    content:
      - content:
          - 'This article is distributed under the terms of the '
          - content:
              - Creative Commons Attribution License
            target: http://creativecommons.org/licenses/by/4.0/
            type: Link
          - >-
            , which permits unrestricted use and redistribution provided that
            the original author and source are credited.
        type: Paragraph
    type: CreativeWork
keywords:
  - bacteraemia
  - GWAS
  - malaria
  - Human
identifiers:
  - name: publisher-id
    propertyID: https://registry.identifiers.org/registry/publisher-id
    value: 77461
    type: PropertyValue
  - name: doi
    propertyID: https://registry.identifiers.org/registry/doi
    value: 10.7554/eLife.77461
    type: PropertyValue
  - name: elocation-id
    propertyID: https://registry.identifiers.org/registry/elocation-id
    value: e77461
    type: PropertyValue
fundedBy:
  - identifiers:
      - value: 202800
        type: PropertyValue
    funders:
      - name: Wellcome Trust
        type: Organization
    type: MonetaryGrant
  - identifiers:
      - value: 98532
        type: PropertyValue
    funders:
      - name: Wellcome Trust
        type: Organization
    type: MonetaryGrant
  - identifiers: []
    funders:
      - name: National Institute for Health Research
        type: Organization
    type: MonetaryGrant
  - identifiers:
      - value: 223253/Z/21/Z
        type: PropertyValue
    funders:
      - name: Wellcome Trust
        type: Organization
    type: MonetaryGrant
  - identifiers:
      - value: 209265/Z/17/Z
        type: PropertyValue
    funders:
      - name: Wellcome Trust
        type: Organization
    type: MonetaryGrant
  - identifiers:
      - value: HCUZZ0
        type: PropertyValue
    funders:
      - name: Wellcome Trust
        type: Organization
    type: MonetaryGrant
  - identifiers:
      - value: 294557
        type: PropertyValue
    funders:
      - name: European Research Council
        type: Organization
    type: MonetaryGrant
about:
  - name: Genetics and Genomics
    type: DefinedTerm
  - name: Microbiology and Infectious Disease
    type: DefinedTerm
genre:
  - Research Article
bibliography: elife-77461.references.bib
---

# Introduction

Invasive bacterial diseases are a major cause of child morbidity and mortality in Africa [@bib8]. Although improved control measures, including the rollout of anti-bacterial conjugate vaccines [@bib13; @bib52], have led to recent declines, the burden of conditions such as pneumonia, meningitis, and sepsis secondary to bacterial pathogens remains significant [@bib55]. A better understanding of the biology of invasive bacterial infections in African populations might help the development of novel interventions.

Susceptibility to invasive bacterial infections varies widely between individuals. In African children, some of this variability is determined by acquired comorbidities such as HIV, malnutrition, and malaria [@bib8; @bib11; @bib51]. However, the identification of common genetic variants as determinants of bacterial infection suggests that a significant portion of this variability is heritable. Many of these genetic susceptibility loci have pathogen-specific effects [@bib15; @bib24; @bib50], which is consistent with our understanding of infection susceptibility derived from primary immunodeficiencies. Key examples of pathogen specificity among primary immunodeficiencies include Mendelian susceptibility to mycobacterial disease and susceptibility to non-tuberculous mycobacteria and nontyphoidal _Salmonella_ [@bib9], terminal complement deficiencies and meningococcal disease [@bib21], and IRAK4 deficiency and pneumococcal disease [@bib48]. A major exception to this is the rs334 A>T mutation in _HBB_ (sickle haemoglobin), which is associated with raised and lowered risks of infection secondary to a broad range of pathogens among homozygotes [@bib59] and heterozygotes [@bib51], respectively. However, the effect sizes associated with sickle haemoglobin are extreme, being maintained in populations by balancing selection, and larger sample sizes will probably be needed for the discovery of new variants with multi-pathogen effects.

Because the clinical features of invasive bacterial infections and severe malaria are broadly similar [@bib6], it can be difficult to distinguish between severe illness caused by extensive sequestration of malaria parasites in the microvasculature (‘true’ severe malaria) and bacterial sepsis in the presence of incidental parasitaemia on the basis of clinical features alone. This is made harder still by the fact that antibiotic pre-treatment and inadequate blood culture volumes mean that, even in settings with excellent diagnostic facilities, true invasive bacterial infections can often not be confirmed [@bib18]. Recently, we illustrated this clinical complexity through a study in which we used probabilistic models based on malaria-specific biomarkers to show that approximately one-third of children recruited to studies in Africa with a clinical diagnosis of severe malaria were actually suffering from other conditions [@bib57; @bib58].

In the current study, we extend this work to show that invasive bacterial infections are common in children admitted to hospital with a clinical diagnosis of severe malaria, but in whom biomarkers subsequently suggest that malaria was probably not the primary cause for their severe illness. We then construct a dataset of genome-wide genotyped samples from 5400 Kenyan children, comprising critically unwell Kenyan children with bacteraemia [@bib50] and severe malaria [@bib4], and population controls. Using this dataset, we perform a genome-wide association study (GWAS) of invasive bacterial infection in Kenyan children, weighting cases according to the probability that their disease was mediated by a bacterial pathogen. In doing so, we increase our study power and identify _BIRC6_ as a novel genetic determinant of invasive bacterial disease in Kenyan children.

# Results

## Severe malaria probability and risk of bacteraemia

Children admitted to the high dependency ward of Kilifi County Hospital with a clinical diagnosis of severe malaria, defined as a severe febrile illness in the presence of _Plasmodium falciparum_ parasitaemia (n=2200), between 11 June 1995 and 12 June 2008 were included in the study. While this definition is sensitive it is not specific, meaning that our study will have included some children with sepsis accompanied by incidental parasitaemia [@bib57]. We therefore used two probabilistic models, which included either platelet counts and plasma _Pf_HRP2 concentrations (Model 1, n=1400) or white blood cell and platelet counts (Model 2, n=2200), to determine the likelihood of ‘true’ severe malaria among these children. The estimated probabilities of ‘true’ severe malaria using each model were well correlated (${\displaystyle \mathrm{r}=0.64}$). Of 1400 children with a clinical diagnosis of severe malaria with measured plasma _Pf_HRP2 concentrations, 425 (30.4%, [Figure 1A and B](#fig1)) had a low probability (P(SM|Data)<0.5) of having ‘true’ severe malaria (941 of 2220 children using WBC and platelet count data, [Figure 1—figure supplement 1A and B](#fig1s1)). That is, while they presented with febrile illness and concomitant malaria parasitaemia, it is unlikely that their illnesses were directly attributable to malaria.

chunk: Figure 1.
:::
### _Pf_HRP2 and platelet count as predictors of severe malaria.

(**A**) Distribution of _Pf_HRP2 concentrations and platelet count among Kenyan children (n=1400) with a clinical diagnosis of severe malaria. Points are coloured according to the probability of ‘true’ severe malaria given the data. (**B**) Distribution of ‘true’ severe malaria probabilities estimated from platelet count and plasma _Pf_HRP2 concentrations. (**C**) Platelets counts in children with a clinical diagnosis of severe malaria with and without concomitant bacteraemia. (**D**) _Pf_HRP2 concentrations in children with a clinical diagnosis of severe malaria with and without concomitant bacteraemia. Box and whisker plots; boxes depict the upper and lower quartiles of the data, and whiskers depict the range of the data excluding outliers (outliers are defined as data-points > 1.5 × the inter-quartile range from the upper or lower quartiles). Comparisons of platelet count and plasma _Pf_HRP2 concentrations in children with and without bacteraemia were performed with logistic regression.

```{r message=FALSE, warning=FALSE}
#' @width 25
#' @height 15

library(dplyr)
library(ggplot2)
library(patchwork)
library(RColorBrewer)
library(ggrepel)
library(ggplotify)
library(nnet)
library(coloc)

cols <- brewer.pal(8,"Set2")
cols2 <- brewer.pal(8,"Dark2")
cols3 <- brewer.pal(11,"RdYlBu")

ps <- read.table("Fig1_data.txt", header = T)

#Plot PfHRP2 concentration vs platelet count - colour by probability 'true' severe malaria, i.e. Model 1
prob_hrp2 = ggplot(ps, aes(log10(platelet), log10(hrp2))) + 
  geom_point(aes(colour = factor(p2_fifths)), size = 2) + scale_colour_manual(values =cols3[c(5,4,3,2,1)], na.translate=FALSE, name = "P(SM | Data)", labels = c("\u2264 0.2", "0.2-0.4", "0.4-0.6", "0.6-0.8", "> 0.8")) +
  theme_bw() +
  scale_y_continuous(breaks=c(0,1,2,3,4,5),
                   labels=c(1,10,expression(10^2), expression(10^3), expression(10^4), expression(10^5))) +
  scale_x_continuous(breaks=c(log10(10),log10(20), log10(50), log10(200), log10(1000)),
                     labels=c(10,20,50,200,1000)) +
  labs(x = expression(paste("Platelet count (x", 10^9,"/L)")),
        y = "PfHRP2 (ng/ml)") +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=15),
        legend.text=element_text(size=12),
        legend.title=element_text(size=15),
        plot.title=element_text(size=50, face = "bold"))

#Plot white cell count vs platelet count - colour by probability 'true' severe malaria, i.e. Model 2

prob_wcc = ggplot(ps, aes(log10(platelet), log10(wbc))) + 
  geom_point(aes(colour = factor(p1_fifths)), size = 2) + scale_colour_manual(values =cols3[c(5,4,3,2,1)], na.translate=FALSE, name = "P(SM | Data)", labels = c("\u2264 0.2", "0.2-0.4", "0.4-0.6", "0.6-0.8", "> 0.8")) +
  theme_bw() +
  scale_y_continuous(breaks=c(log10(1),log10(2),log10(5), log10(10), log10(20), log10(50), log10(100)),
                     labels=c(1,2,5,10, 20,50,100)) +
  scale_x_continuous(breaks=c(log10(10),log10(20), log10(50), log10(200), log10(1000)),
                     labels=c(10,20,50,200,1000)) +
  labs(x = expression(paste("Platelet count (x", 10^9,"/L)")),
       y = expression(paste("White cell count (x", 10^9,"/L)"))) +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=15),
        legend.text=element_text(size=12),
        legend.title=element_text(size=15),
        plot.title=element_text(size=50, face = "bold"))

cols <- brewer.pal(11,"RdYlBu")

#Distribution of 'true' malaria probabilities: Model 1

hist_hrp2<-ggplot(ps, aes(x=P_SM2, fill=factor(p2_fifths), colour=factor(p2_fifths))) + 
  geom_histogram(binwidth=0.012, alpha=0.72) +
  scale_fill_manual(values = cols[c(5,4,3,2,1)]) +
  scale_colour_manual(values = cols[c(5,4,3,2,1)]) +
  theme_bw() + xlab("P(SM | Data)") + theme(legend.position = "none") +
  geom_hline(yintercept=0, color=cols[8]) +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=15))

#Distribution of 'true' malaria probabilities: Model 2

hist_wcc<-ggplot(ps, aes(x=P_SM1, fill=factor(p1_fifths), colour=factor(p1_fifths))) + 
  geom_histogram(binwidth=0.012, alpha=0.72) +
  scale_fill_manual(values = cols[c(5,4,3,2,1)]) +
  scale_colour_manual(values = cols[c(5,4,3,2,1)]) +
  theme_bw() + xlab("P(SM | Data)") + theme(legend.position = "none") +
  geom_hline(yintercept=0, color=cols[8]) +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=15))

cols <- brewer.pal(8,"Set2")
cols2 <- brewer.pal(8,"Dark2")
cols3 <- brewer.pal(11,"RdYlBu")

#PfHRP2 concentration as a predictor of bacteraemia

p_labels = data.frame(expt = c("bloodculture_pos"), label = c("italic(P)==9.62 %*% 10^-6"))
hrp2.box = ggplot(ps, aes(x=factor(bloodculture_pos), y=log10(hrp2))) +
geom_dotplot(binaxis="y", binwidth=0.02, stackdir="center", alpha = 0.75) + geom_boxplot(alpha = 0.5) +
scale_x_discrete(breaks=c(0,1), labels = c("No", "Yes")) + aes(fill = factor(bloodculture_pos), col=factor(bloodculture_pos)) + scale_fill_manual(values = cols[c(8,3)]) + scale_colour_manual(values = cols[c(8,3)]) +
ylab("PfHRP2 (ng/ml)") + xlab("Bacteraemia") +
scale_y_continuous(breaks=c(0,1,2,3,4,5),
                     labels=c(1,10,expression(10^2), expression(10^3), expression(10^4), expression(10^5))) +
theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                             axis.title=element_text(size=15)) + 
  geom_text(x = 1.5, y = Inf, vjust=1.5, aes(label=label), data=p_labels, parse=TRUE, inherit.aes=F, size = 5)


#Platelet count as a predictor of bacteraemia

p_labels = data.frame(expt = c("bloodculture_pos"), label = c("italic(P)==0.0144"))
plt.box = ggplot(ps, aes(x=factor(bloodculture_pos), y=log10(platelet))) +
  geom_dotplot(binaxis="y", binwidth=0.01, stackdir="center", alpha = 0.75) + geom_boxplot(alpha = 0.5) +
  scale_x_discrete(breaks=c(0,1), labels = c("No", "Yes")) + aes(fill = factor(bloodculture_pos), col=factor(bloodculture_pos)) + scale_fill_manual(values = cols[c(8,3)]) + scale_colour_manual(values = cols[c(8,3)]) +
  ylab(expression(paste("Platelet count (x", 10^9,"/L)"))) + xlab("Bacteraemia") +
  scale_y_continuous(breaks=c(log10(10),log10(20), log10(50), log10(200), log10(1000)),
                     labels=c(10,20,50,200,1000)) +
  theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                     axis.title=element_text(size=15)) + 
  geom_text(x = 1.5, y = Inf, vjust=2, aes(label=label), data=p_labels, parse=TRUE, inherit.aes=F, size = 5)

#White cell count as a predictor of bacteraemia

p_labels = data.frame(expt = c("bloodculture_pos"), label = c("NS"))
wcc.box = ggplot(ps, aes(x=factor(bloodculture_pos), y=log10(wbc))) +
  geom_dotplot(binaxis="y", binwidth=0.01, stackdir="center", alpha = 0.75) + geom_boxplot(alpha = 0.5) +
  scale_x_discrete(breaks=c(0,1), labels = c("No", "Yes")) + aes(fill = factor(bloodculture_pos), col=factor(bloodculture_pos)) + scale_fill_manual(values = cols[c(8,3)]) + scale_colour_manual(values = cols[c(8,3)]) +
  ylab(expression(paste("White cell count (x", 10^9,"/L)"))) + xlab("Bacteraemia") +
  scale_y_continuous(breaks=c(log10(1),log10(2),log10(5), log10(10), log10(20), log10(50), log10(100)),
                     labels=c(1,2,5,10, 20,50,100)) +
  theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                     axis.title=element_text(size=15)) + 
  geom_text(x = 1.5, y = Inf, vjust=2, aes(label=label), data=p_labels, parse=TRUE, inherit.aes=F, size = 5)


Figure1 <- ((prob_hrp2|hist_hrp2)/(plt.box|hrp2.box)) + plot_layout(heights = c(1, 1)) +
  plot_annotation(tag_levels = 'A')

Figure1
```
:::
{#fig1}

chunk: Figure 1—figure supplement 1.
:::
### White blood cell and platelet count as predictors of severe malaria.

(**A**) Distribution of white blood cell and platelet count among Kenyan children (n=2200) with a clinical diagnosis of severe malaria. Points are coloured according to the probability of ‘true’ severe malaria given the data. (**B**) Distribution of ‘true’ severe malaria probabilities estimated from platelet count and white blood cell count. (**C**) Platelets counts in children with a clinical diagnosis of severe malaria with and without concomitant bacteraemia. (**D**) White blood cell counts in children with a clinical diagnosis of severe malaria with and without concomitant bacteraemia. Box and whisker plots; boxes depict the upper and lower quartiles of the data, and whiskers depict the range of the data excluding outliers (outliers are defined as data-points > 1.5 × the inter-quartile range from the upper or lower quartiles). Comparisons of platelet and white cell count in children with and without bacteraemia were performed with logistic regression.

```{r message=FALSE, warning=FALSE}
#' @width 25
#' @height 15

Figure1.supp1 <- ((prob_wcc|hist_wcc)/(plt.box|wcc.box))+ plot_layout(heights = c(1, 1)) +
  plot_annotation(tag_levels = 'A')

Figure1.supp1
```
:::
{#fig1s1}

In keeping with the hypothesis that a significant proportion of these critically unwell children represented culture-negative invasive bacterial disease ([Figure 1](#fig1)), in-patient mortality was higher among children with a low than a high probability of ‘true’ severe malaria ([Table 1](#table1); ${\displaystyle \mathrm{O}{\mathrm{R}}_{\mathrm{m}\mathrm{o}\mathrm{d}\mathrm{e}\mathrm{l}1}=1.57}$, 95% CI $1.11-2.21$, ${\displaystyle p=0.01}$, 95% CI ${\displaystyle 1.60-2.72}$, ${\displaystyle p=4.91\times {10}^{-8}}$). This was also reflected in the rates of concurrent bacteraemia ([Table 1](#table1); ${\displaystyle \mathrm{O}{\mathrm{R}}_{\mathrm{m}\mathrm{o}\mathrm{d}\mathrm{e}\mathrm{l}1}=2.92}$, 95% CI $1.66-5.13$, ${\displaystyle p=1.07\times {10}^{-4}}$, 95% CI $1.27-3.17$, $p=0.003$). Similarly, the constituents of Model 1 were each associated with blood culture positivity, both higher platelet counts (OR = 2.36, 95% CI 1.19–4.70, $p=0.014$) and lower _Pf_HRP2 levels (${\displaystyle \mathrm{O}\mathrm{R}=0.52,95\mathrm{\%}\text{}\mathrm{C}\mathrm{I}\text{}0.39–0.70}$, ${\displaystyle p=9.62\times {10}^{-6}}$) both being associated with the risk of coincident bacteraemia ([Figure 1C and D](#fig1)). Conversely, white blood counts in isolation were not associated with risk of concurrent bacteraemia ([Figure 1—figure supplement 1](#fig1s1)). Plasma _Pf_HRP2 is the single best biomarker for severe malaria [@bib28]. In light of this, and given the greater enrichment for concurrent bacteraemia among children with a low probability of ‘true’ severe malaria as calculated by Model 1 than Model 2, we used Model 1 probabilities in downstream analyses where available (n=1400) and used Model 2 probabilities for all other cases (n=800).

table: Table 1.
:::
### Demographics and clinical characteristics of Kenyan children with severe malaria.

| Model                      | Numbers        | Sex (female)  | Age (months) | Bacteraemia | Mortality   |
| -------------------------- | -------------- | ------------- | ------------ | ----------- | ----------- |
| _Pf_HRP2/Plt               | Total (n=1400) | 695 (49.6%)   | 29 (17–44)   | 51 (3.6%)   | 155 (11.1%) |
| P(SM\|Data)>0.5 (n=975)    | 497 (51.0%)    | 29 (18–45)    | 23 (2.4%)    | 94 (7.4%)   |             |
| P(SM\|Data)&lt;0.5 (n=425) | 198 (46.6%)    | 28 (16–43)    | 28 (6.6%)    | 61 (14.4%)  |             |
| WBC/Plt                    | Total (n=2220) | 1,074 (48.4%) | 28 (15–43)   | 78 (3.5%)   | 256 (11.6%) |
| P(SM\|Data)>0.5 (n=1279)   | 623 (48.7%)    | 29 (17–44)    | 32 (2.5%)    | 106 (8.4%)  |             |
| P(SM\|Data)&lt;0.5 (n=941) | 451 (47.9%)    | 25 (13–40)    | 46 (4.9%)    | 150 (15.9%) |             |
:::
{#table1}

## GWAS of invasive bacterial disease in Kenyan children

Children with a clinical diagnosis of severe malaria but a low probability of having ‘true’ severe malaria are thus enriched for invasive bacterial disease. Using genome-wide genotyping data from cases of culture-confirmed bacteraemia and healthy controls, we estimated SNP heritability of bacteraemia in this population to be 19% (95% CI 3–35%, $p=0.0084$). Despite this, our GWAS of bacteraemia failed to identify genetic correlates of bacteraemia risk beyond the sickle cell locus [@bib50]. Motivated by these observations, we performed a GWAS of invasive bacterial infection in Kenyan children in which we included both children with culture-confirmed bacteraemia and children with a clinical diagnosis of severe malaria. Children admitted to Kilifi County Hospital between 1 August 1998 and 30 October 2010 with community-acquired bacteraemia were recruited to the study as well as children from the severe malaria study described above. Control children were recruited from the same population between 1 August 2006 and 30 December 2010 as described in detail previously [@bib51].

Following quality control measures (see Materials and methods), we included 1445 cases of culture-confirmed bacteraemia, 1143 cases of severe malaria, and 2812 control children in our current analysis ([Table 2](#table2), [Figure 2](#fig2)). To account for the varying proportion of invasive bacterial disease among severe malaria cases, we applied weights to our regression analysis to reflect the greater likelihood of invasive bacterial disease among children with a low probability of ‘true’ severe malaria (sample weight, ${\displaystyle \mathrm{w}=1-\mathrm{P}(\mathrm{S}\mathrm{M}|\mathrm{D}\mathrm{a}\mathrm{t}\mathrm{a})}$). Where _Pf_HRP2 concentrations were available (n=909) we used _Pf_HRP2 and platelet count to determine ${\displaystyle \mathrm{P}(\mathrm{S}\mathrm{M}|\mathrm{D}\mathrm{a}\mathrm{t}\mathrm{a})}$ while we used white cell and platelet counts (n=234) in cases where they were not available. Cases with culture-proven bacteraemia and control samples were assigned a sample weight of $w=1$. Inclusion of the six major principal components (PCs) of genotyping data and genotyping platform as covariates in the model adequately controlled for confounding variation ($\lambda =1.0208$, [Figure 3—figure supplement 1](#fig3s1)). In that analysis we found evidence supporting an association between risk of invasive bacterial disease in Kenyan children and seven SNPs at a single locus on chromosome 2 (peak SNP: rs183868412:T, OR = 2.13, 95% CI 1.65–2.74, $p=4.64\times {10}^{-9}$) ([Figure 3](#fig3), [Table 3](#table3)). Fine mapping of this association identified a credible set of seven SNPs with a 95% probability of containing the causal variant ([Table 3](#table3)), spanning a 212 kb region: chr2:32,402,640–32,614,746.

figure: Figure 2.
:::
![](elife-77461.xml.media/fig2.jpg)

### Study sample recruitment.

Recruitment of severe malaria cases (n=1143), discovery analysis bacteraemia cases (n=1445), and replication analysis bacteraemia cases (n=434) between 1998 and 2010. The time period during which control samples (n=2812) were recruited is also highlighted (black bar). Sample numbers represent children with genome-wide genotype data, who passed quality control filters and were included in this study.
:::
{#fig2}

chunk: Figure 3.
:::
### Manhattan plot of invasive bacterial infection in Kenyan children.

Evidence for association with invasive bacterial disease at genotyped and imputed autosomal SNPs and indels (n=14,010,600) in Kenyan children (bacteraemia cases =1445, severe malaria cases =1143, controls = 2812). Association statistics represent additive association. The red, dashed line denotes $p=5\times {10}^{-8}$.

```{r message=FALSE, warning=FALSE}
#' @width 25
#' @height 15
#thinned example summary pvalues 1:100 SNPs

disc.add <- read.table("example.pval.gz", header = F)
colnames(disc.add) <- c("chr", "bp", "rsid", "p")
#str(disc.add)

disc.add$chr <- factor(disc.add$chr, levels = c(1:22))
levels(disc.add$chr) <- c(1:22)

#highlight chr2 hit
disc.add$label <- NA
disc.add$label[which(disc.add$rsid=="rs183868412:T")] <- "BIRC6"

 
disc.add$anno <- NA
disc.add$anno[which(disc.add$rsid=="rs183868412:T")] <- 1

#data for manahattan
don.1 <- disc.add %>% 
  group_by(chr) %>% 
  summarise(chr_len=max(bp)) %>% 
  mutate(tot=cumsum(as.numeric(chr_len))-chr_len) %>%
  dplyr::select(-chr_len) %>%
  left_join(disc.add, ., by=c("chr"="chr")) %>%
  arrange(chr, bp) %>%
  mutate( BPcum=bp+tot)

axisdf = don.1 %>% group_by(chr) %>% summarize(center=( max(BPcum) + min(BPcum) ) / 2 )

cols <- brewer.pal(8,"Set2")
cols2 <- brewer.pal(8,"Dark2")
cols3 <- brewer.pal(8,"Paired")

#Manhattan plot
ggplot(don.1, aes(x=BPcum, y=-log10(p))) +
  geom_point( aes(color=as.factor(chr)), size=2) +
  scale_color_manual(values = rep(c(cols3[2], cols3[1]), 11 )) +
  geom_text_repel( data=subset(don.1, anno==1), aes(label=label), size=5, min.segment.length = unit(0, 'lines'),
                   nudge_y = 1) +
  #gwas sig line
  geom_segment(aes(x = 10177, y = 7.30103, xend = 2879943885, yend = 7.30103), data = don.1, linetype="dashed", color = "red") +
  scale_x_continuous( label = axisdf$chr[c(1:18,20,22)], breaks= axisdf$center[c(1:18,20,22)] ) +
  scale_y_continuous( labels = c("0","2","4","6","8","10"), breaks = c(0,2,4,6,8,10), expand = c(0, 0), limits= c(0,12)) +     # remove space between plot area and x axis
  xlab("chromosome") +
  theme_bw() +
  theme( 
    legend.position="none",
    panel.border = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),axis.text=element_text(size=15),
    axis.title=element_text(size=20)
  )


```
:::
{#fig3}

figure: Figure 3—figure supplement 1A-B.
:::
![](fig3-figsupp1.jpg)

### Genome-wide association analysis quality control.

(**A**) Principal components of genome-wide genotyping data in discovery samples. Individuals are colour-coded according to genotyping platform; Affymetrix SNP 6.0 in purple, Illumina Omni 2.5M in orange. (**B**) Principal components of genome-wide genotyping data in discovery samples. Individuals are colour-coded according to self-reported ethnicity; Chonyi in red, Giriama in blue, Kauma in green, and other in grey.
:::
{#fig3s1}

chunk: Figure 3—figure supplement 1C.
:::

```{r message=FALSE, warning=FALSE}
#' @width 10
#' @height 10

df <- data.frame(observed = -log10(sort(na.omit(disc.add$p))),
    expected = -log10(ppoints(length(na.omit(disc.add$p)))))
  
  log10Pe <- expression(paste("Expected -log"[10], plain(P)))
  log10Po <- expression(paste("Observed -log"[10], plain(P)))
  
  chisq <- qchisq(1-disc.add$p,1)
  #median(chisq)/qchisq(0.5,1)
  
  lambda <- paste("lambda == ", round(median(chisq)/qchisq(0.5,1),3))

#QQ plot
Figure3.supp1c <- ggplot(df) +
  geom_abline(intercept = 0, slope = 1, size = 1, colour = cols3[6]) +
  geom_point(aes(expected, observed), size = 3, colour = cols3[2]) +
  xlab(log10Pe) +
  ylab(log10Po) +
  annotate("text", x = 1, y = 9, label = lambda, parse = TRUE, size = 10) +
  theme_bw() + ylim(NA, 10) + 
  theme( 
    legend.position="none",
    panel.border = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),axis.text=element_text(size=15),
    axis.title=element_text(size=20), plot.title=element_text(size=30, face = "bold", hjust = 0.5))
  
Figure3.supp1c
```

### Quantile-quantile plot of invasive bacterial infection in Kenyan children. 

QQ plot of weighted logistic regression genome-wide association study (GWAS) of invasive bacterial disease in Kenyan children (2588 cases, 2812 controls).
:::
{#fig3s1c}

figure: Figure 3—figure supplement 2.
:::
![](elife-77461.xml.media/fig3-figsupp2.jpg)

### Principal components of genome-wide genotyping data in discovery samples.

Outlier samples identified by ABERRANT (n=22) are highlighted (orange).
:::
{#fig3s2}

figure: Figure 3—figure supplement 3.
:::
![](elife-77461.xml.media/fig3-figsupp3.jpg)

### Sensitivity analysis of case weights.

Comparison of effect sizes (odds ratios and 95% confidence intervals) of the rs183868412:T association with invasive bacterial infection in the combined analysis (bacteraemia cases = 1445, severe malaria cases = 1143, controls = 2812), and restricted to cases with weights calculated with _Pf_HRP2 plasma concentration and platelet count (Model 1; bacteraemia cases = 1445, severe malaria cases = 909, controls = 2812) or white cell count and platelet count (Model 2; bacteraemia cases = 1445, severe malaria cases = 909, controls = 2812). Effect estimates were calculated with weighted logistic regression.
:::
{#fig3s3}

chunk: Figure 3—figure supplement 4.
:::
### Genotyping concordance between Illumina and Affymetrix platforms.

Pairwise genotyping concordance between samples genotyped on both Affymetrix SNP 6.0 and Illumina Omni 2.5M platforms.

```{r}
#' @width 15
#' @height 10
#Genotyping concordance data
concord <- read.table("genotyping.concord.txt", header = TRUE)

cols <- brewer.pal(8,"Set2")
cols2 <- brewer.pal(8,"Dark2")
cols3 <- brewer.pal(8,"Paired")

ggplot(data=concord, aes(pairwise)) + 
  geom_histogram(breaks=seq(0.8, 1, by = 0.001), col = cols3[2]) + 
  labs(x="pairwise concordance", y="count") + 
  xlim(c(0.8,1.01)) +
  theme_bw() +
  theme( 
    legend.position="none",
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),axis.text=element_text(size=15),
    axis.title=element_text(size=20))
```
:::
{#fig3s4}

table: Table 2.
:::
### Demographics and clinical characteristics of genome-wide association study (GWAS) study samples.

|                            |         |              |              | Severe malaria subtypes | Concurrent infection |          |             |           |
| -------------------------- | ------- | ------------ | ------------ | ----------------------- | -------------------- | -------- | ----------- | --------- |
|                            | Numbers | Sex (female) | Age (months) | SMA                     | CM                   | Malaria  | Bacteraemia | Mortality |
| Bacteraemia (overall)      | 1445    | 614 (43%)    | 14 (5–34)    |                         |                      | 94 (12%) |             | 358 (26%) |
| _Acinetobacter_            | 118     | 45 (38%)     | 13 (3–28)    |                         |                      | 11 (13%) |             | 12 (10%)  |
| β-Haemolytic streptococci  | 130     | 60 (46%)     | 5 (1–20)     |                         |                      | 6 (8%)   |             | 37 (30%)  |
| _Escherichia coli_         | 141     | 58 (41%)     | 11 (6–25)    |                         |                      | 12 (15%) |             | 45 (34%)  |
| Hib                        | 113     | 53 (47%)     | 12 (5–25)    |                         |                      | 3 (8%)   |             | 29 (26%)  |
| NTS                        | 159     | 75 (47%)     | 15 (9–26)    |                         |                      | 15 (25%) |             | 31 (20%)  |
| _Streptococcus pneumoniae_ | 390     | 151 (39%)    | 23 (9–61)    |                         |                      | 20 (9%)  |             | 86 (23%)  |
| _Staphylococcus aureus_    | 152     | 64 (42.1%)   | 26 (9–88)    |                         |                      | 15 (15%) |             | 22 (15%)  |
| Other                      | 242     | 110 (46%)    | 10 (1–28)    |                         |                      | 10 (8%)  |             | 96 (41%)  |
| Malaria (overall)          | 1143    | 559 (49%)    | 27 (16–41)   | 298 (26%)               | 697 (61%)            |          | 40 (4%)     | 140 (12%) |
| P(SM\|Data)&lt;0.5         | 375     | 176 (47%)    | 28 (17–42)   | 62 (17%)                | 262 (70%)            |          | 23 (6%)     | 62 (17%)  |
| P(SM\|Data)>0.5            | 768     | 383 (50%)    | 26 (15–40)   | 236 (31%)               | 435 (57%)            |          | 17 (2%)     | 17 (2%)   |
| Controls (SNP 6.0)         | 1895    | 955 (50%)    | \*           |                         |                      |          |             |           |
| Controls (Omni 2.5M)       | 917     | 451 (49.2%)  |              |                         |                      |          |             |           |
:::
{#table2}

table: Table 3.
:::
### 95% credible SNP set of invasive bacterial disease association.

| SNP         | Effect allele | Chr | BP         | MAF   | Info score | OR (95% CI)      | p-Value                |
| ----------- | ------------- | --- | ---------- | ----- | ---------- | ---------------- | ---------------------- |
| rs183868412 | T             | 2   | 32,478,169 | 0.021 | 0.956      | 2.13 (1.65–2.74) | $4.64\times {10}^{-9}$ |
| rs139827594 | G             | 2   | 32,402,640 | 0.02  | 0.966      | 2.12 (1.65–2.73) | $4.96\times {10}^{-9}$ |
| rs144257579 | G             | 2   | 32,507,619 | 0.021 | 0.954      | 2.11 (1.64–2.72) | $6.82\times {10}^{-9}$ |
| rs145056232 | C             | 2   | 32,503,024 | 0.021 | 0.955      | 2.11 (1.64–2.72) | $6.86\times {10}^{-9}$ |
| rs145315025 | G             | 2   | 32,502,654 | 0.021 | 0.955      | 2.11 (1.64–2.72) | $6.87\times {10}^{-9}$ |
| rs143909151 | T             | 2   | 32,531,452 | 0.021 | 0.962      | 2.11 (1.64–2.71) | $8.01\times {10}^{-9}$ |
| rs150430979 | T             | 2   | 32,614,746 | 0.021 | 0.955      | 2.11 (1.64–2.72) | $8.18\times {10}^{-9}$ |
:::
{#table3}

To address the possibility that the observed association at this locus is driven by confounding secondary to population structure, we used ABERRANT [@bib7] to define a set of outlier samples on the first two PCs of genotyping data (n=22, [Figure 3—figure supplement 2](#fig3s2)). These individuals are all genotyped on the Illumina Omni 2.5M array and are both cases and controls (15 and 7, respectively). While they are predominantly individuals with less common self-reported ethnicities in our study population (19 of 22 are not Giriama, Chonyi, or Kauma), they are not representative of a single self-reported ethnicity (the most common single ethnicity in this group is Digo, n=7). Excluding these samples from the association analysis at rs183868412:T did not significantly alter the association with invasive bacterial infection ($p=2.38\times {10}^{-8}$, OR = 2.05, 95% CI 1.59–2.64). We further estimated the effect of rs183868412:T on invasive bacterial disease risk in four subpopulations defined by self-reported ethnicity (Giriama, ${\displaystyle \mathrm{n}=2,501}$; Chonyi, ${\displaystyle \mathrm{n}=1,560}$; Kauma, ${\displaystyle \mathrm{n}=472}$; Other, ${\displaystyle \mathrm{n}=384}$). Within each subpopulation, we tested for association between genotype and case status in a weighted logistic regression model, including platform as a categorial covariate ([Table 4](#table4)). The minor allele frequency (MAF) at rs183868412 ranged from 0.016 (Giriama) to 0.037 (Kauma), with no evidence of differentiation between subpopulations (${F}_{ST}=0.001$). We observed consistent effect sizes in both of the major study subpopulations; Giriama (OR =1.97, 95% CI 1.30–3.01, $p=0.0015$) and Chonyi (OR = 2.18, 95% CI 1.34–3.54, $p=0.0017$) samples ([Table 4](#table4)), which together make up 83% of the study samples. Genotype at rs183868412 was also associated with invasive bacterial disease among the less common self-reported ethnicities grouped together (OR = 2.46, 95% CI 1.01–5.96, $p=0.047$). Genotype was not associated with invasive bacterial disease risk in the Kauma subpopulation, however the sample size in the stratum is very limited (154 cases, 318 controls) and may simply reflect insufficient power to detect an association.

table: Table 4.
:::
### Effect of rs183868412 genotype on risk of invasive bacterial disease stratifies by self-reported ethnicity.

| Discovery population    |       |            | Numbers    | Genotypes  | MAF   | OR (95% CI)      | p-Value                  |
| ----------------------- | ----- | ---------- | ---------- | ---------- | ----- | ---------------- | ------------------------ |
| Giriama                 | Cases | Overall    | 1232       | 0/56/1176  | 0.023 | 1.97 (1.30–3.01) | $p=0.0015$               |
| Bacteraemia             | 558   | 0/38/520   | 0.034      |            |       |                  |                          |
| SM − P(SM\|Data)&lt;0.5 | 199   | 0/12/187   | 0.03       |            |       |                  |                          |
| SM − P(SM\|Data)>0.5    | 475   | 0/6/469    | 0.006      |            |       |                  |                          |
| Controls                |       | 1269       | 0/41/1228  | 0.016      |       |                  |                          |
| Chonyi                  | Cases | Overall    | 503        | 0/38/465   | 0.038 | 2.18 (1.34–3.54) | $p=0.0017$               |
| Bacteraemia             | 238   | 0/27/211   | 0.057      |            |       |                  |                          |
| SM − P(SM\|Data)&lt;0.5 | 105   | 0/4/101    | 0.019      |            |       |                  |                          |
| SM − P(SM\|Data)>0.5    | 160   | 0/7/153    | 0.022      |            |       |                  |                          |
| Controls                |       | 1057       | 0/43/1014  | 0.02       |       |                  |                          |
| Kauma                   | Cases | Overall    | 154        | 0/8/146    | 0.026 | 1.20 (0.50–2.85) | $p=0.686$                |
| Bacteraemia             | 70    | 0/6/64     | 0.043      |            |       |                  |                          |
| SM − P(SM\|Data)&lt;0.5 | 25    | 0/1/24     | 0.02       |            |       |                  |                          |
| SM − P(SM\|Data)>0.5    | 59    | 0/1/58     | 0.008      |            |       |                  |                          |
| Controls                |       | 318        | 0/20/298   | 0.031      |       |                  |                          |
| Other                   | Cases | Overall    | 219        | 1/16/202   | 0.041 | 2.46 (1.01–5.96) | $p=0.047$                |
| Bacteraemia             | 101   | 1/8/92     | 0.05       |            |       |                  |                          |
| SM − P(SM\|Data)&lt;0.5 | 38    | 0/5/33     | 0.066      |            |       |                  |                          |
| SM − P(SM\|Data)>0.5    | 80    | 0/3/77     | 0.019      |            |       |                  |                          |
| Controls                |       | 165        | 0/7/158    | 0.021      |       |                  |                          |
| Total                   | Cases | Overall    | 2588       | 3/164/2421 | 0.033 | 2.13 (1.65–2.74) | $p=4.64\times {10}^{-9}$ |
| Bacteraemia             | 1445  | 3/125/1317 | 0.045      |            |       |                  |                          |
| SM − P(SM\|Data)&lt;0.5 | 375   | 0/20/355   | 0.027      |            |       |                  |                          |
| SM − P(SM\|Data)>0.5    | 768   | 0/19/749   | 0.012      |            |       |                  |                          |
| Controls                |       | 2812       | 0/111/2701 | 0.02       |       |                  |                          |
:::
{#table4}

To assess whether our analysis could be affected by our choice of model to define severe malaria case weights, we restricted our analysis to samples with data available to calculate estimates for $P(SM|Data)$ using both Model 1 and Model 2 (n=909 severe malaria cases). We recalculated effect estimates for the rs183868412 association with invasive bacterial disease using each model alone. The association with invasive bacterial disease at rs183868412:T is robust to the choice of the model for case weights, with effect estimates derived using Model 1 alone (OR = 2.13, 95% CI 1.65–2.75, $p=6.92\times {10}^{-9}$) and Model 2 alone (OR = 2.05, 95% CI 1.59–2.65, $p=2.63\times {10}^{-8}$) being entirely consistent with those seen in the main analysis ([Figure 3—figure supplement 3](#fig3s3)). Moreover, restricting our analysis to cases of culture-confirmed bacteraemia, the effect estimate for bacteraemia risk observed in the discovery analysis (1445 cases, 2812 controls; OR = 2.12, 95% CI 1.60–2.82, $p=1.97\times {10}^{-7}$) is consistent with that seen in the main model.

We sought to replicate evidence of association in our discovery analysis through use of an independent case-control collection of Kenyan children with bacteraemia (n=434) and healthy controls (n=1258) conducted in the same population. The peak trait-associated variants in the discovery analysis were well imputed in the replication data (rs183868412 imputation info score $=0.84$). In that analysis, we found evidence supporting the association at chromosome 2 with invasive bacterial disease ([Figure 4](#fig4), [Table 5](#table5): rs183868412:T, OR = 2.85, 95% CI 1.54–5.28, $p=8.90\times {10}^{-4}$). In a fixed effects meta-analysis, rs183868412:T was strongly associated with risk of invasive bacterial disease in Kenyan children: OR = 2.22, 95% CI 1.76–2.80, $p=2.35\times {10}^{-11}$. That association was driven by children with culture-confirmed bacteraemia and critically unwell children with malaria parasites, but a low probability of ‘true’ severe malaria. In a stratified analysis ([Figure 4](#fig4), [Table 5](#table5)), rs183868412 was associated with culture-confirmed bacteraemia (OR = 2.12, 95% CI 1.60–2.82, $p=1.94\times {10}^{-7}$) and critical illness with parasitaemia and with a low probability of ‘true’ severe malaria (P(SM|Data)&lt;0.5: OR = 2.37, 95% CI 1.27–4.43, $p=6.82\times {10}^{-3}$), but was not associated with risk of critical illness with a high probability of ‘true’ severe malaria (P(SM|Data)>0.5: $p=0.823$).

chunk: Figure 4.
:::
### Association with invasive bacterial disease at the _BIRC6_ locus.

(**A**) Regional association plot of invasive bacterial disease association at the _BIRC6_ locus in the discovery and replication analyses. SNPs are coloured according to linkage disequilibrium to rs183868412. Genomic coordinates are GRCh38. (**B**) Distribution of ‘true’ severe malaria probabilities among malaria cases estimated from plasma _Pf_HRP2 concentration and platelet count (n=909) and white blood cell count and platelet count (n=234). (**C**) Odds ratios and 95% confidence intervals of rs183868412 association with disease stratified by malaria cases with high (p>0.5, red) and low (p&lt;0.5, orange) probabilities of ‘true’ severe malaria and culture-proven invasive bacterial disease (blue). P(SM|Data) represents the probability of ‘true’ severe malaria estimated from plasma _P_fHRP2 concentration and platelet count (n=909) or white blood cell count and platelet count (n=234). Effect estimates were calculated with multinomial logisitic regression (SM - P.high, SM - P.low, Bacteraemia - disc) and logistic regression (Bacteraemia - rep).

```{r fig.height=15, fig.width=8, message=FALSE, warning=FALSE}
#' @width 40
#' @height 30
for_plot <- read.table("birc6_for_RA_b38.txt", header = T)

birc6.region <- subset(for_plot, bp>31277134 & bp<33274933)

birc6.region$bin_r2 <- 1
birc6.region$bin_r2[which(birc6.region$r2>0.1 & birc6.region$r2 <= 0.3)] <- 2
birc6.region$bin_r2[which(birc6.region$r2>0.3 & birc6.region$r2 <= 0.5)] <- 3
birc6.region$bin_r2[which(birc6.region$r2>0.5 & birc6.region$r2 <= 0.7)] <- 4
birc6.region$bin_r2[which(birc6.region$r2>0.7)] <- 5

birc6.region$bin_dp <- 1
birc6.region$bin_dp[which(birc6.region$dp>0.6 & birc6.region$dp <= 0.7)] <- 2
birc6.region$bin_dp[which(birc6.region$dp>0.7 & birc6.region$dp <= 0.8)] <- 3
birc6.region$bin_dp[which(birc6.region$dp>0.8 & birc6.region$dp <= 0.9)] <- 4
birc6.region$bin_dp[which(birc6.region$dp>0.9)] <- 5

birc6.region$annotate <- 0
birc6.region$annotate[c(which(birc6.region$rsid=="rs183868412:T"))] <- 1
cols3 <- brewer.pal(8,"Paired")
cols2 <- brewer.pal(11,"Spectral")
cols <- brewer.pal(8,"Set2")

#BIRC6 - regional assocaition plot (discovery data)

birc6_total <- ggplot(birc6.region, aes(x=bp, y=-log10(p))) + 
  xlim(31874931, 32874933) +
  geom_point(data=subset(birc6.region, bin_r2==1), color=cols[8], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==2), color=cols2[5], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==3), color=cols2[3], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==4), color=cols2[2], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==5), color=cols2[1], size=3) +
  scale_y_continuous(name="-log P-value", breaks=c(0,2,4,6,8,10)) +
  xlab(NULL) + 
  theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                     axis.title=element_text(size=12)) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())+
  geom_text_repel( data=subset(birc6.region, annotate==1), aes(label=rsid), size=5, col = c("black"), nudge_y = 1) +
  annotate("text", x = (32180000-225069), y = 11, label = quote(r^2), hjust = 0.5) +
  annotate("point", x = (32120000-225069), y = 10.3, size = 3, colour = cols2[1]) +
  annotate("text", x = (32180000-225069), y = 10.3, label = c("0.7-1.0"), hjust = 0.5) +
  annotate("point", x = (32120000-225069), y = 9.6, size = 3, colour = cols2[2]) +
  annotate("text", x = (32180000-225069), y = 9.6, label = c("0.5-0.7"), hjust = 0.5) +
  annotate("point", x = (32120000-225069), y = 8.9, size = 3, colour = cols2[3]) +
  annotate("text", x = (32180000-225069), y = 8.9, label = c("0.3-0.5"), hjust = 0.5) +
  annotate("point", x = (32120000-225069), y = 8.2, size = 3, colour = cols2[5]) +
  annotate("text", x = (32180000-225069), y = 8.2, label = c("0.1-0.3"), hjust = 0.5) +
  annotate("point", x = (32120000-225069), y = 7.5, size = 3, colour = cols[8]) +
  annotate("text", x = (32180000-225069), y = 7.5, label = c("<0.1"), hjust = 0.5) +
  annotate("text", x = (33100000-225069), y = 11, label = c("Discovery"), hjust = 1, size=10)

#BIRC6 - regional assocaition plot (replication data)

birc6_rep <- ggplot(birc6.region, aes(x=bp, y=-log10(p_rep))) + 
  xlim(31874931, 32874933) +
  geom_point(data=subset(birc6.region, bin_r2==1), color=cols[8], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==2), color=cols2[5], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==3), color=cols2[3], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==4), color=cols2[2], size=3) + 
  geom_point(data=subset(birc6.region, bin_r2==5), color=cols2[1], size=3) +
  scale_y_continuous(name="-log10(p)", breaks=c(0,2,4), limits=c(NA,4)) + 
  xlab(NULL) + 
  theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                     axis.title=element_text(size=12)) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())+
  geom_text_repel( data=subset(birc6.region, annotate==1), aes(label=rsid), size=5, col = c("black"), nudge_y = 0.5) +
  annotate("text", x = 32874933, y = 3.8, label = c("Replication"), hjust = 1, size=10)


#plot genes
genes <- read.table("genes_b38.txt", header = T)

plot.range <- c(31874931, 32874933)
genes$order <- rep(seq(1:1),100)[c(1:length(genes$end_position))]
genes.plot <- ggplot(genes, aes(x=start, y=order+1)) + 
  geom_point(size=0) +
  xlim(31874931, 32874933) +
  ylim(c(1.9,2.1)) +
  geom_segment(data = genes[seq(9,15,2),],
               aes(x=start, xend=end, y=order+1, yend=order+1), size = 1, colour = cols3[2],
               arrow = arrow(length = unit(0.25, "cm"))) +
  geom_segment(data = genes[seq(10,16,2),],
               aes(x=start, xend=end, y=order+1.1, yend=order+1.1), size = 1, colour = cols3[2],
               arrow = arrow(length = unit(0.25, "cm"))) +
  geom_text_repel( data = genes[seq(9,15,2),], aes(x=mid, label=external_gene_name), size=4, col = c("black"),
                   nudge_y =-0.05, segment.color = NA) +
  geom_text_repel( data = genes[seq(10,16,2),], aes(x=mid, label=external_gene_name), size=4, col = c("black"),
                   nudge_y =0.05, segment.color = NA) +
  theme_bw() +
  theme(axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        panel.border = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

#plot recombination rate

recomb <- read.table("genetic_map_chr2_b38.txt", header = F)
colnames(recomb) <- c("position", "rate", "map")
recomb_birc6 <- subset(recomb, position>31874931 & position<32874933)

recomb_rate <- ggplot(recomb_birc6, aes(x=position, y=rate)) + 
  geom_line() +
  theme_bw() +
  ylab("cM/Mb") +
  xlab("chromosome 2") +
  scale_x_continuous(breaks=c(32000000,32200000,32400000,32600000,32800000),
                     labels=c("32.0Mb", "32.2Mb", "32.4Mb", "32.6Mb", "32.8Mb")) +
  theme(panel.border = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"))



cols <- brewer.pal(11,"RdYlBu")

data <- read.table("prob_plot.txt", header = T)

#Plot distribution of 'true' severe malaria probabilities

data$prob_strata_half <- "SM: P(SM|Data)<0.5"
data$prob_strata_half[which(data$p_bact<0.5)] <- "SM: P(SM|Data)>0.5"
data$prob_strata_half[which(data$prob_strata=="BACT")] <- "BACT"
data$prob_strata_half[which(data$prob_strata=="CONTROL")] <- "CONTROL"
data$prob_strata_half <- factor(data$prob_strata_half)
data$prob_strata_half <- relevel(data$prob_strata_half, ref = "CONTROL")

data1 <- data[-c(which(data$prob_strata_half=="CONTROL"), which(data$prob_strata_half=="BACT")),]

prob_SM_hist<-ggplot(data1, aes(x=(1-p_bact), fill=factor(prob_strata_half), colour=factor(prob_strata_half))) + 
  geom_histogram(binwidth=0.012, alpha=0.72) +
  scale_fill_manual(values = cols[c(4,2)]) +
  scale_colour_manual(values = cols[c(4,2)]) +
  theme_bw() + xlab("P(Severe Malaria | Data)") + theme(legend.position = "none") +
  geom_hline(yintercept=0, color=cols[8]) +
  theme(axis.text=element_text(size=12),
        axis.title=element_text(size=14))

cols <- brewer.pal(8, "Set2")
cols2 <- brewer.pal(8, "Dark2")

#Plot effect of rs86868412 genotype across case strata

label <- rep(c("SM - P.high", "SM - P.low", "Bacteraemia - disc", "Bacteraemia - rep"),3)
mean  <- c(
  0.07180831, 
  0.8637545, 
  0.7536469, 
  1.06025)
se  <- c(
  0.3202724, 
  0.3192534, 
  0.1447982, 
  0.313471)
lower <- mean-1.96*se
upper <- mean+1.96*se

df <- data.frame(label, mean, lower, upper)
df$facet <- "rs183868412:T"

#levels(df$label)

cols3 <- brewer.pal(8, "Set2")
cols <- brewer.pal(11,"RdYlBu")

# reverses the factor level ordering for labels after coord_flip()
df$label <- factor(df$label, levels=rev(df$label)[c(1:4)])
birc6.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = cols[c(10,10,4,2)]) + scale_colour_manual(values = cols[c(10,10,4,2)]) +
  geom_hline(yintercept=0, lty=2) +  # add a dotted line at x=1 after flip
  coord_flip() +  # flip coordinates (puts labels on y axis)
  ylab("OR") + 
  scale_y_continuous(breaks=c(-0.693147181, 0, 0.693147181, 1.386294361),
                     labels=c("0.5",	"1.0",	"2.0",	"4.0"), limits=c(-1,2)) +
  scale_x_discrete(expand = expand_scale(add = 1)) +
  theme_bw() + theme(legend.position = "none", axis.title.y = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15))+
  facet_wrap(~facet, ncol=1)+
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 12, color = "white", face = "bold.italic"))

((birc6_total/birc6_rep/genes.plot/recomb_rate)/(prob_SM_hist|birc6.fp)) + plot_layout(heights = c(3, 1.5, 1, 1,3))
```
:::
{#fig4}

figure: Figure 4—figure supplement 1.
:::
![](elife-77461.xml.media/fig4-figsupp1.jpg)

### Principal components of genome-wide genotyping data in replication samples.

Individuals are colour-coded according to self-reported ethnicity; Chonyi in red, Giriama in blue, Kauma in green, and other in grey.

:::
{#fig4s1}

table: Table 5.
:::
### Effect of rs183868412 genotype on risk of invasive bacterial disease in Kenyan children.

|                                         |       |            | Numbers    | Genotypes        | MAF                      | OR (95% CI)      | p-Value                   |
| --------------------------------------- | ----- | ---------- | ---------- | ---------------- | ------------------------ | ---------------- | ------------------------- |
| Discovery                               | Cases | Overall    | 2588       | 3/164/2421       | 0.033                    | 2.13 (1.65–2.74) | $p=4.64\times {10}^{-9}$  |
| Bacteraemia[\*](#table5fn1)             | 1445  | 3/125/1317 | 0.045      | 2.12 (1.60–2.82) | $p=1.94\times {10}^{-7}$ |                  |                           |
| SM − P(SM\|Data)&lt;0.5[\*](#table5fn1) | 375   | 0/20/355   | 0.027      | 2.37 (1.27–4.43) | $p=6.82\times {10}^{-3}$ |                  |                           |
| SM − P(SM\|Data)>0.5[\*](#table5fn1)    | 768   | 0/19/749   | 0.012      | 1.07 (0.57–2.01) | $p=0.823$                |                  |                           |
| Controls                                |       | 2812       | 0/111/2701 | 0.02             |                          |                  |                           |
| Replication                             | Cases |            | 434        | 0/24/410         | 0.028                    | 2.85 (1.54–5.28) | $p=8.90\times {10}^{-4}$  |
| Controls                                |       | 1258       | 0/28/1230  | 0.011            |                          |                  |                           |
| Meta-analysis                           | Cases |            | 3022       | 3/188/2831       | 0.032                    | 2.23 (1.76–2.80) | $p=2.35\times {10}^{-11}$ |
| Controls                                |       | 4070       | 0/139/3931 | 0.017            |                          |                  |                           |
:::
{#table5}

## rs183868412 is associated with risk of invasive bacterial disease secondary to diverse pathogens and is independent of malaria

Previous data describing the genetic risk of invasive bacterial disease in this population have identified pathogen-specific effects. To better-understand the range of pathogens to which genetic variation at _BIRC6_ modifies risk, we estimated the effect of rs183868412 on the risk of bacteraemia caused by the seven most common causative pathogens within this population ([Figure 5A](#fig5)). In that analysis, the data best-supported a model in which genotype increases risk of bacteraemia caused by a broad range of pathogens, including bacteraemia secondary to pneumococcus, nontyphoidal _Salmonellae_, _Escherichia coli_, β-haemolytic streptococci, _Staphylococcus aureus,_ and other less common pathogens grouped as a single stratum (log10 Bayes factor = 4.72). Genotype at rs183868412 similarly modified risk of bacteraemia in the neonatal period and in older children (log10 Bayes factor = 2.70, [Figure 5B](#fig5)).

chunk: Figure 5.
:::
### Genetic variation at _BIRC6_ confers broad susceptibility to invasive bacterial disease.

Odds ratios and 95% confidence intervals of rs183868412 association with invasive bacterial disease stratified by pathogen (**A**), neonatal and non-neonatal sepsis (**B**), and bacteraemia with and without malaria parasitaemia (**C**). Odds ratios and 95% confidence intervals of rs183868412 association with invasive bacterial disease stratified by year (**D**), compared to age-standardised, annual malaria parasite prevalence in Kilifi, Kenya, as estimated from parasite prevalence among trauma admissions (**E**).We calculated effect estimates using multinomial logistic regression. We compared models of association across strata using a Bayesian approach (see Materials and methods). Strata associated with rs183868412 genotype in the most likely model in each analysis are highlighted in blue.

```{r fig.height=10, message=FALSE, warning=FALSE}
#' @width 40
#' @height 30
#Plot pathogen-specific effects of rs183868412 genotype

cols <- brewer.pal(11, "RdYlBu")
cols2 <- brewer.pal(8, "Dark2")
cols3 <- brewer.pal(8, "Set2")

label <- rep(c("Aci (n=118)", "BHS (n=130)", "E. coli (n=141)", "Hib (n=113)", "NTS (n=159)", "Other (n=242)","Pneumo (n=391)", "S. aureus (n=151)"),3)
mean  <- c(0.1294529, 0.97544283, 0.94372723, 0.02255999, 0.99779713, 0.80957666, 0.80415958, 0.53290879)
se  <- c(0.4374372, 0.3009951, 0.2873546, 0.4725781, 0.2694641, 0.2438325, 0.2045006, 0.3296354)
lower <- mean-1.96*se
upper <- mean+1.96*se
  
df <- data.frame(label, mean, lower, upper)
df$facet <- "rs183868412:T"

# reverses the factor level ordering for labels after coord_flip()
df$label <- factor(df$label, levels=rev(df$label)[c(1:8)])
path.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = c(cols[c(10,10,10,10)],cols3[8],cols[c(10,10)],cols3[8])) + scale_colour_manual(values = c(cols[c(10,10,10,10)],cols3[8],cols[c(10,10)],cols3[8])) +
  geom_hline(yintercept=0, lty=2) +  # add a dotted line at x=1 after flip
  coord_flip() +  # flip coordinates (puts labels on y axis)
  ylab("OR") + scale_x_discrete(expand = expand_scale(add = 1)) +
  scale_y_continuous(breaks=c(-0.693147181, 0, 0.693147181, 1.386294361),
                     labels=c("0.5",	"1.0",	"2.0",	"4.0"), limits = c(-1,2)) +
  theme_bw() + theme(legend.position = "none", axis.title.y = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15),plot.title=element_text(size=20))+
  facet_wrap(~facet, ncol=1)+
  ggtitle("Pathogens") +
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 15, color = "white", face = "bold.italic"))

#Plot stratum-specific effects of rs183868412 genotype on neonatal on non-neonatal disease

label <- rep(c("1st 29 days (n=195)", ">28 days (n=1,245)"),3)
mean  <- c(0.9121288, 0.5966586)
se  <- c(0.2547673, 0.1514635)
lower <- mean-1.96*se
upper <- mean+1.96*se

df <- data.frame(label, mean, lower, upper)
df$facet <- "rs183868412:T"

# reverses the factor level ordering for labels after coord_flip()
df$label <- factor(df$label, levels=rev(df$label)[c(1:2)])
neo.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  ylim(-1,2) +
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = c(cols[c(10,10)])) + scale_colour_manual(values = c(cols[c(10,10)])) +
  geom_hline(yintercept=0, lty=2) +  # add a dotted line at x=1 after flip
  coord_flip() +  # flip coordinates (puts labels on y axis)
  ylab("OR") + scale_x_discrete(expand = expand_scale(add = 1)) +
  scale_y_continuous(breaks=c(-0.693147181, 0, 0.693147181, 1.386294361),
                     labels=c("0.5",	"1.0",	"2.0",	"4.0"), limits=c(-1,2)) +
  theme_bw() + theme(legend.position = "none", axis.title.y = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15),plot.title=element_text(size=20))+
  facet_wrap(~facet, ncol=1)+
  ggtitle("Neonatal sepsis") +
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 15, color = "white", face = "bold.italic"))


#Plot stratum-specific effects of rs183868412 genotype on neonatal on non-neonatal disease

label <- rep(c("Bact. no Pf (n=1,236)", "Bact. with Pf (n=204)"),3)
mean  <- c(0.5811352, 0.9540062)
se  <- c(0.1525740, 0.2424632)
lower <- mean-1.96*se
upper <- mean+1.96*se

df <- data.frame(label, mean, lower, upper)
df$facet <- "rs183868412:T"

#levels(df$label)

cols3 <- brewer.pal(8, "Set2")


# reverses the factor level ordering for labels after coord_flip()
df$label <- factor(df$label, levels=rev(df$label)[c(1:2)])
para.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = c(cols[c(10,10)])) + scale_colour_manual(values = c(cols[c(10,10)])) +
  geom_hline(yintercept=0, lty=2) +  # add a dotted line at x=1 after flip
  coord_flip() +  # flip coordinates (puts labels on y axis)
  ylab("OR") + scale_x_discrete(expand = expand_scale(add = 1)) +
  scale_y_continuous(breaks=c(-0.693147181, 0, 0.693147181, 1.386294361),
                     labels=c("0.5",	"1.0",	"2.0",	"4.0"), limits=c(-1,2)) +
  theme_bw() + theme(legend.position = "none", axis.title.y = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15),plot.title=element_text(size=20))+
  facet_wrap(~facet, ncol=1)+
  ggtitle("Parasitaemia") +
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 15, color = "white", face = "bold.italic"))

#Plot stratum-specific effects of rs183868412 genotype on risk of bacteraemia over study period

cols <- brewer.pal(11, "RdYlBu")
cols2 <- brewer.pal(8, "Dark2")


label <- c("95", "96", "97", "98", "98/99/00", "00","01/02","02","03/04","04","05/06","06","07/08","08", "09/10", "10","11" , "12", "13","14", "15")
mean  <- c(NA,NA,NA,NA,0.826615, NA, 0.796132,NA,0.546421,NA,0.510647,NA,0.567562,NA,1.03668,NA,NA,NA,NA,NA,NA)
se  <- c(NA,NA,NA,NA,0.174712, NA,0.203987,NA, 0.200618, NA, 0.242699,NA, 0.268059,NA, 0.509352,NA,NA,NA,NA,NA,NA)
lower <- mean-1.96*se
upper <- mean+1.96*se

df <- data.frame(label, mean, lower, upper)
df$facet <- "Bacteraemia risk & rs183868412:T"

#levels(df$label)

cols3 <- brewer.pal(8, "Set2")


# reverses the factor level ordering for labels after coord_flip()
df$label <- factor(df$label, levels=c("95", "96", "97", "98", "98/99/00", "00","01/02","02","03/04","04","05/06","06","07/08","08", "09/10", "10","11" , "12", "13","14", "15"))
yr.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  scale_x_discrete(breaks=c("95","00","05/06","10"),
                                 labels=c("1995", "2000", "2005", "2010"),expand = expand_scale(add = 1))+
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = cols[rep(10,21)]) + scale_colour_manual(values = c(cols[rep(10,21)])) +
  geom_hline(yintercept=0, lty=2) +  # add a dotted line at x=1 after flip
  ylab("OR") +
  scale_y_continuous(breaks=c(-0.693147181, 0, 0.693147181, 1.386294361, 2.079441542),
                     labels=c("0.5",	"1.0",	"2.0",	"4.0", "8.0"), limits=c(-1,2.5)) +
  theme_bw() + theme(legend.position = "none", axis.title.x = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15),panel.grid.major = element_blank(), 
                     panel.grid.minor = element_blank(),axis.text.x=element_blank(),
                     axis.ticks.x=element_blank())+
  facet_wrap(~facet, ncol=1)+
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 15, color = "white", face = "bold.italic"))

pp = read.table("para_prev.txt", header = TRUE)

#Plot falling parasite prevalence among trauma admission over the study period

para_prev<-ggplot(data=pp, aes(x=year, y=malinc)) + geom_line(lwd=1.5) +
geom_ribbon(aes(ymin=low, ymax=high), linetype=2, alpha=0.3, fill = "grey70", color = "red", 
            outline.type = "both") +
  theme_bw() + xlab("Year") + ylab("Prevalence (%)") + theme(panel.grid.major = element_blank(), 
                                                             panel.grid.minor = element_blank(), 
                                                             axis.title=element_text(size=15), 
                                                             axis.text=element_text(size=12),
                                                             plot.title=element_text(size=18))

(path.fp|(neo.fp/para.fp))/(yr.fp/para_prev) + plot_annotation(tag_levels = 'A')
```
:::
{#fig5}

Malaria infection results in increased risk of invasive bacterial disease secondary to a broad range of pathogens [@bib51], and genetic determinants of malaria risk (e.g. sickle cell trait) modify risk of bacterial infection in malaria-endemic settings [@bib51]. The observation that, among children with a clinical diagnosis of severe malaria, risk of disease is only modified by rs183868412 in children with a low probability that their disease represents ‘true’ severe malaria ([Figure 5](#fig5)) suggests that the effect of genetic variation at _BIRC6_ on invasive bacterial disease risk operates independently of malaria. In keeping with this, the data best-supports an effect of rs183868412 genotype on bacteraemia risk in children both with and without concomitant parasitaemia (log10 Bayes factor = 2.73, [Figure 5C](#fig5)). In addition, unlike sickle cell trait [@bib51], the increased risk of invasive bacterial infection conferred by rs183868412:T carriage in the study setting is stable across a period of declining malaria prevalence (Bayes factor = 8.7, [Figure 5D](#fig5)).

## Evidence of selection pressure and pleiotropy at rs183868412

Common genetic variation associated with a twofold increased risk of invasive bacterial infection in children, in particular across a broad range of pathogens, will be subject to considerable negative selection pressure. The derived allele rs183868412:T, associated with increased risk of invasive bacterial disease in Kenyan children, is absent in non-African populations (<https://gnomad.broadinstitute.org>). Within Africa, rs183868412:T is present in all nine African populations included in the MalariaGEN consortium project ([@bib4]; [Table 6](#table6)), MAFs ranging from 0.011 in The Gambia to 0.034 in Malawi. There is no evidence for within-Africa differentiation at rs183868412 ($p=0.601$) providing no support for a selective sweep at the locus. We further evaluated evidence for recent directional selection pressure, examining integrated haplotype scores (iHS) [@bib54] in seven African populations included in the 1000 Genomes Project. In those data, there is no evidence to support recent selection at the locus (minimum rank $p=0.07$, maximum iHS = 1.3). Finally, to understand whether variation at this locus may be maintained in populations through beneficial effects on other traits, we examined evidence for pleiotropy at rs183868412 in a meta-analysis of GWAS data [@bib25] of 33 cardio-metabolic traits from Uganda, Ghana, Nigeria, South Africa, and Kenya (sample size ranging from 2671 to 14,126 individuals). In keeping with the MalariaGEN consortium project data, MAFs at rs183868412 range from 0.015 to 0.028. In these data there is no evidence for pleiotropy at the locus, with no evidence for an effect of rs183868412 on any of the 33 traits tested (minimum meta-analysis $p=0.078$). The explanation for the persistence of this polymorphism, therefore, remains an open question.

table: Table 6.
:::
### rs183868412 frequencies in Africa.

| Population   | Number | MAF   |
| ------------ | ------ | ----- |
| Gambia       | 2605   | 0.011 |
| Mali         | 183    | 0.021 |
| Burkina Faso | 596    | 0.009 |
| Ghana        | 320    | 0.014 |
| Nigeria      | 22     | 0.024 |
| Cameroon     | 685    | 0.031 |
| Malawi       | 1317   | 0.034 |
| Tanzania     | 402    | 0.028 |
| Kenya        | 1681   | 0.017 |
:::
{#table6}

## rs183868412 is associated with alternative splicing of _BIRC6_ in stimulated monocytes

Trait-associated genetic variation identified by GWAS is highly enriched for regulatory variation. The African specificity of the trait-associated variation identified here makes annotation with publicly available regulatory mapping data challenging. To investigate the regulatory function of rs183868412 in immune cells in African populations, we used eQTL Catalogue (<https://www.ebi.ac.uk/eqtl>) mapping data [@bib36] from 100 African ancestry individuals in primary monocytes with and without innate stimulation [@bib49]; influenza A virus, LPS, Pam3CSK4 (synthetic lipoprotein and TLR1/2 agonist) and R848 (a TLR7/8 agonist). In this dataset, rs183868412 is well imputed (${\displaystyle {\mathrm{r}}^{2}=0.998}$), with an MAF of 0.05 (10 heterozygous individuals). We found no evidence for a regulatory effect of rs183868412 at the gene level in monocytes regardless of stimulation. We did, however, observe an effect of rs183868412 genotype on expression of a 12 bp _BIRC6_ exonic sequence (chr2:32,453,943–32,453,954, $p=1.18\times {10}^{-5}$), with evidence for colocalisation of this eQTL with our GWAS signal (posterior probability of colocalisation, PP4 = 0.951, [Figure 6](#fig6)). This effect was only observed following stimulation with Pam3CSK4 ([Figure 6](#fig6)), with the bacteraemia risk allele, rs183868412:T, being associated with reduced expression of this sequence. That 12 bp sequence is associated with an alternative splicing event that results in extension of a _BIRC6_ exon. The 23rd exon (ENSE00001189810, chr2:32,453,808–32,453,942) of the canonical _BIRC6_ transcript, ENST00000421745.6, is 135 bp long and terminates immediately before the 12 bp sequence associated with rs183868412:T genotype. The 22nd exon (ENSE00003835010, chr2:32,453,808–32,453,942) of an alternative _BIRC6_ transcript, ENST00000648282.1, is 147 bp long, having the same start site but including the 12 bp sequence at its 3’ end. Thus, increased risk of invasive bacterial disease may be associated with decreased expression of an alternative _BIRC6_ transcript in TLR1/2-stimulated monocytes.

chunk: Figure 6.
:::
### Regulatory function of rs183868412 monocytes.

(**A**) Betas and 95% confidence intervals of rs183868412 association with expression of a 12 bp _BIRC6_ exonic sequence (chr2:32,453,943–32,453,954) in monocytes. Monocytes are naive or stimulated with LPS (lipopolysaccharide), IAV (influenza A virus), Pam3CSK4, and R848. (**B**) Colocalisation of the _BIRC6_ exon eQTL in Pam3CSK4-stimulated monocytes colocalises with the risk locus for invasive bacterial disease (${\displaystyle \mathrm{P}\mathrm{P}4=0.951}$). SNPs are coloured according to linkage disequilibrium to rs183868412. Effect estimates are calculated by linear regression.

```{r message=FALSE, warning=FALSE}
#' @width 20
#' @height 14
library(dplyr)
library(ggplot2)
library(patchwork)
library(RColorBrewer)
library(ggrepel)
library(ggplotify)
library(nnet)
library(coloc)

#Colocalisation of BIRC6 exon eQTL with invasive bacterial disease association at rs183868412
d1 <- read.table("exon_birc6.txt", header = T)
ld <- as.matrix(read.table("ld_for_coloc.txt", row.names = 1, header = T))

eqtl.list <- list(beta = d1$BETA, varbeta = (d1$SE)^2, type = "quant", N = 200, MAF = d1$maf, snp = as.character(d1$SNP), LD=ld)
gwas.list <- list(beta = d1$gwas.beta, varbeta = (d1$gwas.se)^2, type = "cc", N = 5400, s = 0.479, snp = as.character(d1$SNP), LD=ld)
E=runsusie(eqtl.list,coverage=0.01)
G=runsusie(gwas.list, coverage=0.01)
susie.res=coloc.susie(E,G)
h4 <- round(max(susie.res$summary[,8]),2)
hit1 <- as.character(susie.res$summary[which(susie.res$summary[,8]==max(susie.res$summary[,8])),2]$hit1)
hit2 <- as.character(susie.res$summary[which(susie.res$summary[,8]==max(susie.res$summary[,8])),3]$hit2)

#plot colocalsation
exon <- d1

cols <- brewer.pal(8,"Set2")
cols2 <- brewer.pal(11,"Spectral")

exon$bin_r2 <- 1
exon$bin_r2[which(exon$r2>0.1 & exon$r2 <= 0.3)] <- 2
exon$bin_r2[which(exon$r2>0.3 & exon$r2 <= 0.5)] <- 3
exon$bin_r2[which(exon$r2>0.5 & exon$r2 <= 0.7)] <- 4
exon$bin_r2[which(exon$r2>0.7)] <- 5

exon.plot <- ggplot(exon, aes(x=-log10(gwas.p), y=-log10(PVAL))) + 
  geom_point(data=subset(exon, bin_r2==1), color=cols[8], size=3) + 
  geom_point(data=subset(exon, bin_r2==2), color=cols2[5], size=3) + 
  geom_point(data=subset(exon, bin_r2==3), color=cols2[3], size=3) + 
  geom_point(data=subset(exon, bin_r2==4), color=cols2[2], size=3) + 
  geom_point(data=subset(exon, bin_r2==5), color=cols2[1], size=3) +
  ylim(0,5) +
  xlim(0,10) +
  ylab("BIRC6 exon eQTL -log10(p)") + 
  xlab("Bacterial disease GWAS -log10(p)") + 
  annotate("text", x = 0, y = 5, label = paste0("PP4=",h4), size=5, hjust=0) +
  theme_bw() + theme(legend.position = "none", axis.text=element_text(size=12),
                     axis.title=element_text(size=15), plot.title=element_text(size=20))


#Plot effects of rs184868412 genotype on exon expression in monocytes across stimulation contexts

label <- c("Naive", "IAV", "LPS", "Pam3CSK4", "R848")
mean  <- c(-0.155186,-0.0536451, 0.227218, -0.654215, -0.386583)
se  <- c(0.138402, 0.248336, 0.152519, 0.156258, 0.198497)
lower <- mean-1.96*se
upper <- mean+1.96*se

df <- data.frame(label, mean, lower, upper)
df$facet <- "rs183868412:T"

cols3 <- brewer.pal(8, "Set2")
df$label <- factor(df$label, levels=rev(df$label)[c(3,1,2,4,5)])

eqtl.fp <- ggplot(data=df, aes(x=label, y=mean, ymin=lower, ymax=upper)) +
  geom_pointrange(fatten=4, size = 1.5) + aes(fill = label, col=label) + scale_fill_manual(values = c(cols3[c(8,8,3,8,8)])) + scale_colour_manual(values = c(cols3[c(8,8,3,8,8)])) +
  geom_hline(yintercept=0, lty=2) +  
  coord_flip() +  
  ylab("beta") + scale_x_discrete(expand = expand_scale(add = 1)) +
  theme_bw() + theme(legend.position = "none", axis.title.y = element_blank(), axis.text=element_text(size=15),
                     axis.title=element_text(size=15),plot.title=element_text(size=20))+
  facet_wrap(~facet, ncol=1)+
  theme(strip.background = element_rect(color="black", fill=cols3[8], size=1.5, linetype="solid"), strip.text.x = element_text(
    size = 15, color = "white", face = "bold.italic"))

(eqtl.fp|exon.plot) + plot_annotation(tag_levels = 'A')
```
:::
{#fig6}

# Discussion

In this study, we have leveraged the close relationship between _P. falciparum_ infection and bacteraemia in African children [@bib51] to perform a GWAS of invasive bacterial infection in 5400 Kenyan children. We approached this by defining the probability with which each critically unwell child with a clinical diagnosis of severe malaria has a disease process directly mediated by malaria, that is ‘true’ severe malaria. We hypothesised that critically unwell children, with a low probability of having ‘true’ severe malaria, are enriched for invasive bacterial infections. We explored the validity of this approach, demonstrating that children with a low probability of ‘true’ severe malaria were indeed enriched for culture-proven bacteraemia and were at a higher risk of death than children with a higher probability. We therefore performed a GWAS weighting cases according to their likelihood of invasive bacterial disease. In doing so, we have identified and validated _BIRC6_ as a novel genetic susceptibility locus for all-cause invasive bacterial disease in Kenyan children.

The disease-associated locus modifies risk of invasive bacterial disease caused by a broad range of pathogens, including β-haemolytic streptococci, _E. coli_, nontyphoidal _Salmonella_, _Streptococcus pneumoniae,_ and _S. aureus_. Moreover it modifies risk of invasive infection in both the neonatal period and in older children. Furthermore, in contrast to the rs334 _HBB_ A>T mutation [@bib51], rs183868412 modifies risk of invasive bacterial disease in a manner independent of malaria, with rs183868412:T carriage increasing risk of disease across a period of falling malaria transmission and in children with and without concurrent parasitaemia.

We further demonstrate that rs183868412 mediates risk of invasive bacterial disease through the modification of _BIRC6_ splicing in Pam3CSK4-stimulated monocytes. _BIRC6_ (Baculovirus inhibitor of apoptosis protein repeat containing 6), also known as _BRUCE_ (BIR repeat containing ubiquitin-conjugating enzyme), encodes a large member of the inhibitor of apoptosis protein (IAP) family [@bib27]. Members of the IAP family bind to cognate caspases, inhibiting their activity, and thereby cell death, through occlusion of their active site [@bib53]. A proportion of IAPs also contain an E3 ubiquitin ligase allowing both direct inhibition of caspases and targeting of caspases for proteasomal degradation [@bib53]. _BIRC6_ contains both inhibitor of apoptosis domains and an E2/E3 ubiquitin ligase, which function to inhibit apoptosis in response to a variety of stimuli, both by interaction with and degradation of caspase-9, but also through the ubiquitination and degradation of SMAC, an IAP antagonist [@bib26; @bib5]. _BIRC6_ also regulates autophagosome-lysosome fusion [@bib19], and ubiquitinates (and targets for degradation) LC3, a key effector of autophagosome formation. Thus, _BIRC6_ also acts as a negative regulator of autophagy [@bib35].

It is highly plausible that _BIRC6_ could determine susceptibility to invasive infection through either its regulation of apoptosis or autophagy (or both). Sepsis induces marked changes in apoptosis across a range of immune cells [@bib32]. There is markedly enhanced apoptosis in both dendritic cells [@bib31] and in lymphocytes. Enhanced lymphocyte apoptosis is most striking in B cells and CD4+ T cells [@bib30] which, at least in part, is mediated by caspase-9. The consequent lymphopoenia is correlated with both severity of sepsis and outcome [@bib39]. In addition to the direct effects of immune cell loss on the innate and adaptive immune responses to invasive infection, sepsis-induced apoptosis induces immune cell dysfunction, phagocytosis of apoptotic cells resulting in reduction in pro-inflammatory cytokine production, and cross-presentation of antigen from apoptotic cells to adaptive immune cells [@bib2]. In keeping with a role for regulators of apoptosis in the pathogenesis of sepsis, members of the IAP family, including _NAIP/BIRC1_ and _BIRC3_, are downregulated in immune cells in patients with sepsis, as is the _BIRC6_ ubiquitination target _SMAC_ [@bib29]. Autophagy contributes to the direct removal of intracellular pathogens and, through the degradation of invading organisms in autophagosomes, directs antigen presentation and pro-inflammatory cytokine secretion [@bib17]. As above, _BIRC6_ regulates autophagy through its interaction with LC3, and overexpression of _LC3B_ limits inflammation and tissue injury in a mouse model of sepsis [@bib41].

In keeping with a role for _BIRC6_ in autophagy and apoptosis in sepsis, our data identify a role for genetic variation at _BIRC6_ in determining risk of invasive infection secondary to a broad range of bacteria. This is in contrast to previously published data describing susceptibility to invasive bacterial infection, which has highlighted a prominent role for genetic risk factors that are specific to single pathogens [@bib15; @bib24; @bib50]. In this study, the derived allele (T) at rs183868412 was associated with increased risk of bacteraemia secondary to gram-positive (β-haemolytic streptococci, _S. pneumoniae,_ and _S. aureus_) and gram-negative (_E. coli_, nontyphoidal _Salmonella_) pathogens, including intracellular and extracellular bacteria, and enteric and respiratory pathogens. Moreover, rs183868412 modified risk of bacteraemia in both the neonatal period, when infection is likely to be maternally derived, and in older children, when sources of community acquired infection will be more diverse. This modulation of invasive bacterial disease risk, despite diverse sources and routes of infection and diverse mechanisms of invasion, suggests a mechanism in which genetic variation at _BIRC6_ modifies risk of invasive infection downstream of initial mechanisms of infection and invasion. In an interesting parallel to this, common genetic variation at another ubiquitin-conjugating enzyme, _UBE2U_, has been shown to modify outcome in meningitis secondary to diverse pathogens in individuals of European ancestry [@bib40].

Our study has some important limitations. The African-specific nature of the trait-associated variation identified here limits our ability to comprehensively interrogate the effect of that variation in immune cells. The eQTL mapping data that we utilise here [@bib49] is limited in that it allows us only to consider a single immune cell type. It is also important to note that there are relatively few African-ancestry individuals in the eQTL mapping data we utilise here, and the splicing signal at _BIRC6_ is based on only 10 individuals heterozygous for rs183868412:T. A more complete understanding of the role played by genetic variation at _BIRC6_ plays in the pathogenesis of sepsis in African children will require more detailed expression and functional studies in African populations. In addition, there is a paucity of large-scale genetic association studies performed in African populations. The African-specific nature of the variation identified in our study therefore limits our ability to explore pleiotropic effects at this locus. Allied to this, the low MAF of rs183868412 will result in very limited power to detect selection events at the locus. Larger and richer datasets detailing genetic variation in African populations will be required to explore the wider phenotypic consequences of variation at _BIRC6_. Our study uses plasma _Pf_HRP2 concentrations to help identify children at low risk of ‘true’ severe malaria. Given the increasing frequency of _pfhrp2_ and _pfhrp3_ deletions in many settings, including in Africa [@bib1; @bib20; @bib22], an understanding of the local prevalence of _pfhrp2/3_ deletions will be important in considering how to translate this model to other settings.

Taken together, our data identify a role for _BIRC6_ in the pathogenesis of invasive bacterial infections in Kenyan children. By maximising our available sample size to include children with a high likelihood of invasive bacterial infection, but without culture-confirmed infection, we facilitate novel variant discovery and reveal a common genetic architecture of invasive bacterial disease secondary to diverse pathogens. In doing so, we expand our understanding of the biology of invasive infection in African children. In particular, these data inform our understanding of the biology shared by diverse bacterial infections causing a common clinical syndrome: sepsis.

# Materials and methods

## Study samples

Recruitment of the severe malaria cases, bacteraemia cases, and healthy controls have been described in detail elsewhere [@bib45; @bib50]. In brief, children under 14 years admitted to the high dependency ward of Kilifi County Hospital with a clinical diagnosis of severe malaria, defined as _P. falciparum_ parasites on blood film and at least one of reduced Blantyre Coma Score, severe anaemia (Hb &lt; 50 g/L), evidence of respiratory distress, hypoglycaemia, or hyperparasitaemia were eligible for recruitment as cases of severe malaria. During the study period, all children admitted to hospital, with the exception of elective surgical admissions and minor trauma, had a blood sample taken for bacterial culture (BACTEC 9050 instrument, Becton Dickinson, Franklin Lakes, NJ). Children under 14 years in whom a pathogenic organism was identified in blood were eligible for study inclusion (_Bacillus_ species, coryneform bacteria, coagulase-negative staphylococci, _Staphylococcus saprophyticus,_ and Viridans group streptococci were considered contaminants). Control children were recruited between 3 and 12 months of age from consecutive live births from the population which Kilifi County Hospital serves. Control children have been subject to longitudinal follow-up. Following explanation of the study, written informed consent was obtained from the parent or guardian of each child included in the study. Ethical approval was obtained from the Kenya Medical Research Institute (KEMRI) National Scientific Steering and Research Committees (approval numbers: SCC1192 and SCC967) and the Oxford Tropical Research Ethics Committee (OxTREC, approval numbers: 020–06 and 014–01).

## Models to define the probability of ‘true’ severe malaria

Among cases of severe malaria recruited to the study, we used probabilistic models to assign a probability of that child’s clinical presentation being mediated by parasite sequestration, that is ‘true’ severe malaria, as described previously [@bib57; @bib58]. Where available (n=1400), we used platelet counts and _Pf_HRP2 concentrations to derive the probabilistic model. In cases where _Pf_HRP2 concentration was not measured (n=800), we used white blood cell counts and platelet counts as input data to the model.

For both models (Model 1: _Pf_HRP2/platelet counts; Model 2: platelet counts/white blood cell counts), the probabilities of ‘true’ severe malaria were derived by fitting parametric latent class models. These assumed that each patient had a binary latent state (true severe malaria versus not severe malaria). For Model 1, we assumed that in each latent state the data were distributed as a single bivariate normal distribution. For Model 2, the data did not fit well to a two-component bivariate normal (white blood cell counts have much weaker diagnostic value) so we assumed that the data had bivariate Student’s t-distribution for the severe malaria state, and a flexible mixture of bivariates for the not severe malaria state.

## Genotyping and imputation

For the discovery analysis, we utilised genotypes generated as part of GWASs of severe malaria [@bib4] and bacteraemia [@bib50] previously reported in this population. Bacteraemia cases and controls were genotyped using the Affymetrix SNP 6.0 array and the severe malaria samples using the Illumina Omni 2.5M platform. SNP and sample quality control for both datasets are highly analogous, and have been described previously [@bib4; @bib50]. In brief, MalariaGEN SNP QC excluded poorly genotyped SNPs using the following metrics; SNP missingness >2.5%, MAF &lt;1%, Hardy-Weinberg equilibrium (HWE) ${\displaystyle p<1\times {10}^{-20}}$, plate effect ${\displaystyle p<1\times {10}^{-3}}$, and a recall test quantifying changes in genotype following a re-clustering process ${\displaystyle p<1\times {10}^{-6}}$. For Affymetrix-genotyped samples, SNP QC excluded poorly performing SNPs using the following metrics: SNP missingness > 2%, MAF &lt; 1%, genotype probability (info) &lt; 0.975, plate effect ${\displaystyle p<1\times {10}^{-6}}$, and HWE ${\displaystyle p<1\times {10}^{-20}}$. Sample QC on both platforms excluded sample outliers with respect to channel intensity, missingness, heterozygosity, population outliers, and duplicated samples (relatedness coefficient >0.75). In addition, for Affymetrix-genotyped samples, samples were excluded in cases of discordant sex as recorded in the clinical record and imputed from mean intensities from X and Y chromosome markers.

To facilitate combining datasets, we applied an additional set of cross-platform QC procedures. We defined a shared subset of SNPs genotyped and passing SNP QC on both platforms (n=167,108), observing high levels of genotype concordance (median concordance 0.993, [Figure 3—figure supplement 4](#fig3s4)) among the subset of samples genotyped on both platforms (n=1365). We used this shared SNP set to compute relatedness estimates and PCs in PLINK v1.90 [@bib10]. The major six PCs of shared genotypes differentiate self-reported ethnicity ([Figure 3—figure supplement 1](#fig3s1)) and are non-differential with respect to genotyping platform ([Figure 3—figure supplement 1](#fig3s1)). To harmonise QC across both platforms, we excluded MalariaGEN samples with discordant clinical and genetic sex (n=136). We further excluded one of duplicate or related sample pairs (relatedness coefficient >0.2) across platforms, retaining case samples where possible and excluding equal numbers of control samples genotyped on each platform (n=1973). Following QC, genotypes were phased using SHAPEIT2 [@bib16], and untyped genotypes imputed genome-wide using IMPUTE2 (v2.3.2) [@bib34; @bib33] with 1000 Genomes Phase III as a reference panel. Following imputation, we excluded SNPs with imputation info scores &lt; 0.5, MAF &lt; 1% and HWE ${\displaystyle p<1\times {10}^{-5}}$, applying each threshold for each platform and overall. Following SNP and sample QC, 14,010,600 autosomal SNPs and indels from 5400 samples (1445 bacteraemia cases, 1143 severe malaria cases, and 2812 control samples: 917 Illumina genotyped and 1895 Affymetrix genotyped) were taken forward as a combined discovery dataset for association analysis. Following QC and association analysis, we identified a further set of population outliers using ABERRANT [@bib7] for downstream sensitivity analysis.

## Estimation of SNP heritability

To estimate SNP heritability of bacteraemia in this population, we used genome-wide genotyping data from culture-confirmed bacteraemia cases and healthy controls genotyped with the Affymetrix SNP 6.0 array. For this analysis we used directly typed markers passing quality control as described above (n = 783,094). In addition to the sample quality control applied above, we additionally excluded one individual from each sample pair with relatedness coefficient >0.05, leaving a final sample size of 2559 (1042 cases, 1517 controls). We estimated SNP heritability using GCTA-GREML [@bib60]. For transformation of the heritability estimate to the liability scale, we assumed a population prevalence for bacteraemia of 2% in this setting.

## Association analysis and fine mapping

In the discovery analysis, we tested for association between genotype at each variant passing QC and invasive bacterial disease by logistic regression in an additive linear model. We used weighted logistic regression to reflect the probability of each case sample being a ‘true’ case of invasive bacterial infection. Cases of culture-confirmed bacteraemia were assigned a weight of 1, whereas cases of severe malaria were assigned weights of 1 − P(SM|Data), re-weighting the contribution of a case to the log-likelihood according to its probability of representing invasive bacterial infection. Control samples were assigned a weight of 1. Our regression thus assumes that the lower the probability of ‘true’ severe malaria, the greater the probability that a case represents culture-negative invasive bacterial disease. To control for confounding variation, we included the six major PCs of genotyping data and genotyping platform as covariates in the model. Weighted logistic regression was implemented using the _glm_ function in R. As described previously [@bib57], standard errors were transformed to reflect the reduced effective sample size resulting from inclusion of sample weights in the model. We considered ${\displaystyle p<5\times {10}^{-8}}$ to be significant.

We used a Bayesian approach to identify a set of SNPs with 95% probability of containing the causal variant at the trait-associated locus. Approximate Bayes factors [@bib56] were calculated for each SNP in the region (a 200 kb surrounding rs183868412) with a prior distribution of ${\displaystyle \mathrm{N}(0,{0.2}^{2})}$. All SNPs were considered equally likely to be the causal variant a priori. A set of SNPs with 95% probability of containing the causal SNP was defined as the smallest number of SNPs for which the summed posterior probabilities exceed 0.95.

## Replication samples and analysis

To replicate our findings from the discovery analysis, we used a second sample set, recruited from the same population as the discovery samples. Replication case samples were cases of bacteraemia only, and did not include cases of severe malaria without culture-confirmed bacterial infection. Case samples were recruited between 1 August 1998 and 30 October 2010. As for the discovery case samples, children under 14 years with a bacterial pathogen isolated from blood on admission to hospital were eligible for recruitment to the study. As above, control samples were recruited as part of a birth cohort from the same population, with children recruited between the ages of 3 and 12 months. Genotyping and QC procedures for these samples have been described previously. In brief, study samples were genotyped using the Immunochip Consortium [@bib12] array (Illumina). Sample QC was performed as for the discovery samples (above), with duplicate control samples (samples common to MalariaGEN and Immunochip controls, n=78) being removed from the replication set. As above, relatedness estimates and PCs were computed in PLINK v1.90 ([@bib10]; [Figure 4—figure supplement 1](#fig4s1)). SNP QC excluded the following variants: SNP missingness > 1%, MAF &lt; 1%, and HWE ${\displaystyle p<1\times {10}^{-10}}$. Following QC, 143,000 genotyped variants in 434 cases and 1258 control samples were taken forward for imputation. As above, imputation was performed with SHAPEIT2 [@bib16] and IMPUTE2 (v2.3.2) [@bib34; @bib33] with 1000 Genomes Phase III as a reference panel.

Following imputation, we further excluded poorly imputed (imputation info score &lt; 0.5) and rare (MAF &lt; 1%) variants and variants with HWE ${\displaystyle p<1\times {10}^{-10}}$. At variants associated with invasive bacterial disease (${\displaystyle p<5\times {10}^{-8}}$) in the discovery analysis, we tested for association with bacteraemia case-control status using logistic regression in an additive model in SNPTEST v2.5.6 [@bib44]. To exclude confounding variation, we included the major six PCs of genotyping data in the model. We considered evidence of association with bacteraemia in the replication samples with ${\displaystyle p<0.05}$ with the same direction of effect as in the discovery analysis to be significant.

## Bayesian comparison of models of association

At the locus of interest, we used multinomial logistic regression, implemented in SNPTEST v2.5.6 [@bib44] to estimate the additive effect of genotype on risk of bacteraemia stratified by pathogen, bacteraemia in the neonatal and non-neonatal periods, bacteraemia with and without _P. falciparum_ parasitaemia, and bacteraemia presenting at different time periods across a period of declining malaria transmission intensity. For these analyses we used only samples with culture-confirmed bacteraemia. In each case we used control status as the baseline stratum, and included the six major PCs of genotyping data to control for confounding variation as above.

For the pathogen-stratified analysis, we defined eight case strata among the discovery cases, one for each of the seven most commonly isolated organisms (_Acinetobacter_, n=118; β-haemolytic streptococci, n=130; _E. coli_, n=141; _Haemophilus influenzae_ type b, n=113; nontyphoidal _Salmonella_, n=159; pneumococci, n=390; _S. aureus_, n=152) and one stratum for the remaining other organisms (n=242). For the neonatal/non-neonatal disease analysis, we stratified cases as presenting in the first 28 days of life (n=195) or beyond that (n=1245). For the analysis comparing bacteraemia with and without malaria, we stratified cases with (n=204) and without (n=1236) _P. falciparum_ on their admission blood film. For each of these analysis, case strata were compared to Affymetrix-genotyped discovery control samples (n=1895) as a baseline stratum.

For the analysis stratified across year of admission, we defined case strata by grouping into six time periods according to their date of admission: 1998–2000 (n=498), 2001–2002 (n=349), 2003–2004 (n=467), 2005–2006 (n=310), 2007–2008 (n=237), 2009–2010 (n=111). For this analysis we used both discovery (Affymetrix) and replication (Immunochip) case and control samples. This allowed better coverage of the years later in the study, which were underrepresented in the discovery samples (the discovery median admission year is 2003, cf 2005 for the replication samples). In that analysis we used multinomial logistic regression in each cohort to estimate stratum-specific effects, combining these results in a fixed effects meta-analysis using BINGWA [@bib43].

We then compared models of association using a Bayesian approach [@bib50], considering the following models:

-   "Null": effect size = i.e. no association with bacteraemia.
-   "Same": effect size $N(0,{0.2}^{2})$ and fixed across strata.

Additional models consider each possible combination of a fixed effect size for associated strata and no association at other strata. For each model, we calculated approximate Bayes factors [@bib56] and posterior probabilities, assuming each model to be equally likely a priori. Statistical analysis was performed in R.

## eQTL mapping and colocalisation analysis

We used the eQTL Catalogue [@bib36] mapping pipeline (<https://github.com/eQTL-Catalogue/>, [@bib37]) to map eQTL in naive and stimulated monocytes [@bib49]. These data include bulk RNA-seq and genome-wide genotyping data from naive and stimulated primary monocytes isolated from individuals of European (n=100) and African (n=100) ancestry [@bib49]. Given the African-specific nature of variation at rs183868412, we performed eQTL mapping in this dataset restricting our analysis to samples of African ancestry. The eQTL Catalogue mapping pipeline has been described previously [@bib36]. In brief, sample genotypes (Illumina HumanOmni5-Quad genotyped) were pre-phased with Eagle v.2.4.1 [@bib42] and genotypes imputed with Minimac4 v.1.0.2 [@bib14] using 1000 Genomes Phase III as a reference panel. Gene expression, exon expression, transcript usage, and transcriptional event usage were quantified from RNA-seq data using HISAT2 v.2.1.0 [@bib38], DEXSeq v.1.18.4 [@bib3], and Salmon v.0.13.1 [@bib47]. Nominal mapping in cis was performed for each phenotype for variants within a 1 Mb window of the start of each gene using FastQTL [@bib46], including six PCs of genotyping and phenotype data as covariates in the model.

We then used the R package coloc v5.1.0 [@bib23] to identify evidence of causal variants shared by our bacterial disease-associated locus of interest and regulatory genetic variation identified in our eQTL mapping data. Coloc adopts a Bayesian approach to compare evidence for independent or shared association signals for two traits at a given genetic locus. We used the coloc.susie() command to allow colocalisation of multiple independent signals at a single locus for each trait. We considered evidence for colocalisation for each gene and exon within a 250 kb window of the peak association (rs183868412). We considered a posterior probability >0.8 supporting a shared causal locus to be significant.