An introduction to the isajsonr package

The isajsonr package

The isajsonr package is developed as an easy-to-use package for reading, modifying and writing files in the Investigation/Study/Assay (ISA) Abstract Model of the metadata framework using the JSON format.

ISA is a metadata framework to manage an increasingly diverse set of life science, environmental and biomedical experiments that employ one or a combination of technologies. Built around the Investigation (the project context), Study (a unit of research) and Assay (analytical measurements) concepts, ISA helps you to provide rich descriptions of experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable.

The ISA-JSON structure

The ISA-JSON format is described in full detail on the ISA-JSON website.

All ISA-JSON content for the Studies and Assays of one Investigation should fall under a single Investigation JSON structure and should therefore be recorded in a single JSON file. The JSON file SHOULD have a .json extension.
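To make the nesting concrete, the sketch below builds a toy investigation as a plain R list. The identifiers and file names are made up for illustration. The field names (identifier, title, studies, assays, filename) follow the ISA-JSON schema, but the structure is heavily abbreviated and is not necessarily how isajsonr stores the content internally.

```r
## Toy investigation: one investigation holding all studies and their assays.
## All identifiers and file names below are made up for illustration.
investigation <- list(
  identifier = "INV-DEMO",
  title = "Hypothetical investigation",
  studies = list(
    list(identifier = "STUDY-A",
         filename = "s_study_a.txt",
         assays = list(list(filename = "a_assay_1.txt"),
                       list(filename = "a_assay_2.txt"))),
    list(identifier = "STUDY-B",
         filename = "s_study_b.txt",
         assays = list(list(filename = "a_assay_3.txt")))
  )
)

## Count the assays nested under each study.
assayCounts <- vapply(investigation$studies,
                      function(study) length(study$assays),
                      integer(1))
names(assayCounts) <- vapply(investigation$studies,
                             function(study) study$identifier,
                             character(1))
assayCounts
```

Because every study and assay lives inside the one investigation list, serializing that list yields a single JSON document, which is why one .json file is sufficient.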

Example data

As an example for working with the isajsonr package we will use the data set that accompanies Castrillo et al. (2007). The associated file is included in the package.

Reading files in the ISA-JSON format

ISA-JSON files can be stored in two different ways: either as a stand-alone file in a directory, or inside a .zip file. The example data is included as a stand-alone file in the package. Both formats can be read into R using the readISAJSON function.

When reading an ISA-JSON file from a directory, the full location of the file has to be specified. The package will recognize a zipped file and extract it into a temporary folder before reading it.

## Read ISA-JSON file from directory.
isaObject1 <- readISAJSON(file = system.file("extdata/Castrillo/Castrillo.json",
                                             package = "isajsonr"))
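The handling of zipped input described above can be pictured roughly as follows. This is a minimal sketch using only base R, not the code used inside isajsonr. The helper name resolveJsonPath is hypothetical, and the rule that an archive must contain exactly one .json file is an assumption made for the example.

```r
## Hypothetical helper, not part of isajsonr: return the path of the .json
## file to read, unpacking a .zip archive into a temporary folder if needed.
resolveJsonPath <- function(file) {
  if (!file.exists(file)) {
    stop("File not found: ", file)
  }
  if (tolower(tools::file_ext(file)) != "zip") {
    ## Stand-alone file: nothing to unpack.
    return(file)
  }
  ## Zipped file: extract into a fresh temporary folder.
  exDir <- tempfile(pattern = "isajson_")
  dir.create(exDir)
  extracted <- utils::unzip(file, exdir = exDir)
  jsonFiles <- extracted[tolower(tools::file_ext(extracted)) == "json"]
  ## Assumption for this sketch: exactly one .json file per archive.
  if (length(jsonFiles) != 1) {
    stop("Expected exactly one .json file in ", file)
  }
  jsonFiles
}

## Usage with a stand-alone file created on the fly.
demoFile <- tempfile(fileext = ".json")
writeLines("{}", demoFile)
resolvedFile <- resolveJsonPath(demoFile)
```

For a stand-alone file the path is returned unchanged; for an archive the returned path points into the temporary folder.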

In both cases readISAJSON will validate the content of the .json file against the ISA-JSON model schema. The imported ISA-JSON file is stored in an object of the S4 class ISAjson. Since reading from an unzipped file and from a zipped file gives almost identical results, the following sections show the examples for the unzipped file only.

Accessing and updating ISA objects

All information from the ISA-JSON file is stored within the content slot in the ISAjson object. The path where the file was originally read from is stored in the path slot.

All sections in the ISA structure have corresponding functions for accessing and modifying information. The names of these access functions correspond to the section they refer to, e.g. the Ontology Source Reference (OSR) section in an ISAjson object can be accessed using the oSR() function. There is one notable exception. To prevent conflicts with the path() function, which already exists in quite a few other packages, the path slot in an ISA object should be accessed using the isaPath() function.

## Access path for isaObjects
isaPath(isaObject1)
#> [1] "/home/runner/work/_temp/Library/isajsonr/extdata/Castrillo/Castrillo.json"

The path for isaObject1 shows the full path to the file that was read using readISAJSON.

The other sections are accessible in a similar way. Some more examples are shown below.

## Access studies.
isaStudies <- study(isaObject1)

## Print study names.
names(isaStudies)
#> [1] "filename"                      "identifier"                    "title"                        
#> [4] "description"                   "submissionDate"                "publicReleaseDate"            
#> [7] "Comment[Study Funding Agency]" "Comment[Study Grant Number]"

## Access study descriptors.
isaSDD <- sDD(isaObject1)

## Shows study descriptor for study s_BII-S-1.
isaSDD$`s_BII-S-1.txt`
#>       annotationValue termSource                              termAccession
#> 1 intervention design        OBI http://purl.obolibrary.org/obo/OBI_0000115

The information in the different sections of an ISAjson object can not only be accessed, it can also be updated. Like the access functions, the update functions have the same names as the sections they refer to. As an example, let’s assume an error has slipped into the OSR section and we want to update one of the source versions.

First have a look at the current content of the OSR section.

(isaOSR <- oSR(isaObject1))
#>                                 description                                             file  name
#> 1    Ontology for Biomedical Investigations http://bioportal.bioontology.org/ontologies/1123   OBI
#> 2             BRENDA tissue / enzyme source        ArrayExpress Experimental Factor Ontology   BTO
#> 3            NEWT UniProt Taxonomy Database                                                   NEWT
#> 4                             Unit Ontology                                                     UO
#> 5  Chemical Entities of Biological Interest                                                  CHEBI
#> 6         Phenotypic qualities (properties)                                                   PATO
#> 7 ArrayExpress Experimental Factor Ontology                                                    EFO
#>   version
#> 1   47893
#> 2  v 1.26
#> 3  v 1.26
#> 4  v 1.26
#> 5  v 1.26
#> 6  v 1.26
#> 7  v 1.26

Now we update the version of the Ontology for Biomedical Investigations from 47893 to 47894. Then we store the modified ontology source data.frame back in the ISAjson object.

## Update version number.
isaOSR[1, "version"] <- 47894

## Update oSR in ISAjson object.
oSR(isaObject1) <- isaOSR

## Check the updated oSR.
oSR(isaObject1)
#>                                 description                                             file  name
#> 1    Ontology for Biomedical Investigations http://bioportal.bioontology.org/ontologies/1123   OBI
#> 2             BRENDA tissue / enzyme source        ArrayExpress Experimental Factor Ontology   BTO
#> 3            NEWT UniProt Taxonomy Database                                                   NEWT
#> 4                             Unit Ontology                                                     UO
#> 5  Chemical Entities of Biological Interest                                                  CHEBI
#> 6         Phenotypic qualities (properties)                                                   PATO
#> 7 ArrayExpress Experimental Factor Ontology                                                    EFO
#>   version
#> 1   47894
#> 2  v 1.26
#> 3  v 1.26
#> 4  v 1.26
#> 5  v 1.26
#> 6  v 1.26
#> 7  v 1.26

In a similar way all sections in an ISAjson object can be accessed and updated.
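The pattern of an access function and an update function sharing one name is the usual S4 accessor and replacement pair. The sketch below shows it on a toy class. The class DemoISA and the generic demoOSR are hypothetical names chosen so that they do not mask anything from isajsonr, and the content layout is an assumption made for the example.

```r
library(methods)

## Toy class with a path slot and a content slot (hypothetical name).
setClass("DemoISA", representation(path = "character", content = "list"))

## Accessor and replacement function sharing one name.
setGeneric("demoOSR", function(x) standardGeneric("demoOSR"))
setGeneric("demoOSR<-", function(x, value) standardGeneric("demoOSR<-"))

setMethod("demoOSR", "DemoISA",
          function(x) x@content$ontologySourceReferences)
setMethod("demoOSR<-", "DemoISA", function(x, value) {
  ## Only accept a data.frame as replacement value.
  stopifnot(is.data.frame(value))
  x@content$ontologySourceReferences <- value
  x
})

demoObj <- new("DemoISA",
               path = "demo.json",
               content = list(ontologySourceReferences =
                                data.frame(name = c("OBI", "BTO"),
                                           version = c("47893", "v 1.26"),
                                           stringsAsFactors = FALSE)))

## Same three steps as above: extract, modify, store back.
demoTab <- demoOSR(demoObj)
demoTab[1, "version"] <- "47894"
demoOSR(demoObj) <- demoTab
demoOSR(demoObj)$version
```

The replacement method returns the modified object, which is what makes the assignment form demoOSR(demoObj) <- demoTab work.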

Processing assay section

The assay section may contain information about the files used to store the actual data for the assay. Per assay section two types of data files may be referred to: 1) the file(s) containing the raw data, and 2) the file(s) containing derived data.

Looking at the assay section in our example data, we see that the a_proteome assay in study s_BII-S-1 contains both raw and derived data files, as shown in the dataFiletype column in the output.

## Inspect assay tab for study s_BII-S-1.
isaAFile <- aFiles(isaObject1)
head(isaAFile$`s_BII-S-1.txt`$a_proteome.txt)
#>                                                    dataFile@id                   dataFilename
#> 1                     #data/proteinassignmentfile-proteins.csv                   proteins.csv
#> 2 #data/derivedspectraldatafile-PRIDE_Exp_Complete_Ac_8763.xml PRIDE_Exp_Complete_Ac_8763.xml
#> 3                    #data/rawspectraldatafile-spectrum.mzdata                spectrum.mzdata
#> 4   #data/posttranslationalmodificationassignmentfile-ptms.csv                       ptms.csv
#> 5 #data/derivedspectraldatafile-PRIDE_Exp_Complete_Ac_8762.xml PRIDE_Exp_Complete_Ac_8762.xml
#> 6 #data/derivedspectraldatafile-PRIDE_Exp_Complete_Ac_8761.xml PRIDE_Exp_Complete_Ac_8761.xml
#>                                      dataFiletype dataFileComment[PRIDE Accession]
#> 1                         Protein Assignment File                             8761
#> 2                      Derived Spectral Data File                             8763
#> 3                          Raw Spectral Data File                             8761
#> 4 Post Translational Modification Assignment File                             8761
#> 5                      Derived Spectral Data File                             8762
#> 6                      Derived Spectral Data File                             8761
#>   dataFileComment[PRIDE Processed Data Accession]
#> 1                                            8761
#> 2                                            8763
#> 3                                            8761
#> 4                                            8761
#> 5                                            8762
#> 6                                            8761
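To separate the two kinds of files in such a table, the type column can be used directly. The sketch below works on three rows copied from the output above. Treating every type label that starts with "Raw" as raw data and everything else as derived data is an assumption made for this example and not necessarily the rule isajsonr applies.

```r
## Three rows copied from the assay file table shown above.
aFileDemo <- data.frame(
  dataFilename = c("proteins.csv",
                   "PRIDE_Exp_Complete_Ac_8763.xml",
                   "spectrum.mzdata"),
  dataFiletype = c("Protein Assignment File",
                   "Derived Spectral Data File",
                   "Raw Spectral Data File"),
  stringsAsFactors = FALSE
)

## Assumption: labels starting with "Raw" mark raw data, all others derived data.
aFileDemo$category <- ifelse(startsWith(aFileDemo$dataFiletype, "Raw"),
                             "raw", "derived")

rawFiles <- aFileDemo$dataFilename[aFileDemo$category == "raw"]
derivedFiles <- aFileDemo$dataFilename[aFileDemo$category == "derived"]
```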

To read the contents of the data files, either raw or derived, in the assay tab file, we can use the processAssay() function. The exact working of this function depends on the technology type of the assay. For most technology types the data files are read as plain .txt files assuming a tab-delimited format. Only for mass spectrometry and microarray data the files are read differently (see the sections below). The protein assignment file in the example above is a regular data file and will be read as such.
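Reading a regular data file as tab-delimited text corresponds to what base R does with read.delim(). The sketch below writes a small file with made-up values and reads it back. It illustrates the file format only; the options processAssay() uses internally are not known here.

```r
## Write a small tab-delimited file to a temporary location (made-up values).
demoTxt <- tempfile(fileext = ".txt")
writeLines(c("protein\tsample_1\tsample_2",
             "P001\t1.5\t2.5",
             "P002\t3.0\t4.0"),
           demoTxt)

## Read it back as a data.frame, keeping column names exactly as in the file.
demoDat <- utils::read.delim(demoTxt, check.names = FALSE,
                             stringsAsFactors = FALSE)
dim(demoDat)
```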

Before being able to process the assay file, i.e. read the data, we first have to extract the assay tabs using the getAssayTabs() function. This function extracts all the assay files from an ISAjson object and stores them as assayTab objects. These assayTab objects contain not only the content of the assay tab file, but also extra information, e.g. technology type.

## Get assay tabs for isaObject1.
aTabObjects <- getAssayTabs(isaObject1)
## Keep only the data files whose name does not start with "/".
aTab <- aTabObjects$`s_BII-S-1.txt`[[2]]
aTab@aFile <- aTab@aFile[!startsWith(aTab@aFile$dataFilename, prefix = "/"), ]
aTabObjects$`s_BII-S-1.txt`[[2]] <- aTab

## Process assay data.
isaDat <- processAssay(isaObject = isaObject1,
                       aTabObject = aTabObjects$`s_BII-S-1.txt`[[2]],
                       type = "derived")
 
## Display first rows and columns.
#head(isaDat[, 1:10])

The data is now stored in isaDat and can be used for further analysis within R.

Mass spectrometry assay files

Mass spectrometry data is often stored in Network Common Data Form (NetCDF) files, i.e. in .CDF files. Assays that refer to such files are processed differently from regular assay data. This requires the xcms package, which is available from Bioconductor.

After reading the ISA-JSON file, we can process the mass spectrometry assay data. In this example the raw data is available, so when processing the assay we specify type = "raw". The rest of the code is similar to the previous section.

## Get assay tabs for isaObject3.
# aTabObjects3 <- getAssayTabs(isaObject3)
# 
# ## Process assay data.
# isaDat3 <- processAssay(isaObject = isaObject3,
#                         aTabObject = aTabObjects3$s_Proteomic_profiling_of_yeast.txt$a_metabolite.txt,
#                         type = "raw")
# 
# ## Display output.
# isaDat3

Processing the mass spectrometry data gives an object of class xcmsSet from the xcms package. This object contains all available information from the .CDF file that was read and can be used for further analysis.

Microarray assay files

Microarray data is often stored in Affymetrix Probe Results files. These .CEL files contain the intensity values of the probe sets, where each probe set represents a gene. Assays that refer to such files are processed differently from regular assay data. This requires the affy package, which is available from Bioconductor.

Processing microarray data is done in much the same way as processing mass spectrometry data, as described in the previous section. The main difference is that the resulting object will in this case be an object of class ExpressionSet, which is used as input in many Bioconductor packages.
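The three cases described in these sections (regular files, mass spectrometry files and microarray files) amount to a choice based on the technology type stored with each assay. The sketch below shows that choice as a simple lookup. The function chooseReader is hypothetical, and the technology type labels are assumptions based on common ISA terms; the labels isajsonr matches on may differ.

```r
## Hypothetical dispatcher, not the isajsonr implementation: map a technology
## type to the kind of reader used for its data files.
chooseReader <- function(technologyType) {
  switch(tolower(technologyType),
         "mass spectrometry" = "xcms",   # .CDF files, gives an xcmsSet
         "dna microarray" = "affy",      # .CEL files, gives an ExpressionSet
         "tab-delimited")                # default: plain tab-delimited text
}

readers <- vapply(c("mass spectrometry", "DNA microarray",
                    "nucleotide sequencing"),
                  chooseReader, character(1))
readers
```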

Writing files in the ISA-JSON format

After updating an ISAjson object, it can be written back to a directory using the writeISAjson() function. All content of the ISAjson object will be written to a json file following the ISA-JSON standard specifications. The location of the output file is specified in the file argument.

## Write content of ISA object to a temporary file.
writeISAjson(isaObject = isaObject1, 
             file = tempfile(fileext = ".json"))
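A simple way to check a written file is a round trip: write the object, read the file back in and compare a section. The sketch below uses only the functions introduced in this vignette and runs only when isajsonr is installed. Whether the comparison yields TRUE depends on the package preserving the section exactly, which is an expectation here rather than something verified.

```r
## Runs only when isajsonr is installed; otherwise nothing happens.
pkgAvailable <- requireNamespace("isajsonr", quietly = TRUE)

if (pkgAvailable) {
  library(isajsonr)

  ## Read the example file that ships with the package.
  isaObj <- readISAJSON(file = system.file("extdata/Castrillo/Castrillo.json",
                                           package = "isajsonr"))

  ## Write to a temporary .json file and read that file back in.
  outFile <- tempfile(fileext = ".json")
  writeISAjson(isaObject = isaObj, file = outFile)
  isaObjCopy <- readISAJSON(file = outFile)

  ## Compare the ontology source references before and after the round trip.
  isTRUE(all.equal(oSR(isaObj), oSR(isaObjCopy)))
}
```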

References

Castrillo, Juan I, Leo A Zeef, David C Hoyle, Nianshu Zhang, Andrew Hayes, David CJ Gardner, Michael J Cornell, et al. 2007. “Growth Control of the Eukaryote Cell: A Systems Biology Study in Yeast.” Journal of Biology 6 (2). https://doi.org/10.1186/jbiol54.

Bart-Jan van Rossum

2024-11-26