
Visualizing event metadata

Event metadata
Sub-level metadata
Plotting and visualization

The examples in this tutorial were done using Nanolyzer™, Northern Nanopore's data analysis software.

The analysis run in the previous section generated an enormous amount of information about all of the events. There are two main types of metadata: those that pertain to a translocation event as a whole, and those that pertain to a sub-level within an event. Unless otherwise noted, the metadata below are available for use immediately upon completion of an analysis run. A few entries listed below are calculated during post-processing, and are explicitly labelled as such. Successfully fitted events have the following properties available for plotting.

Event metadata

Event Number

Every event that is detected is assigned an integer id. Events that are rejected will appear in the metadata as gaps in the id column, which would otherwise simply count up in integer steps from 1. This parameter can be useful to identify locations in the data where rejected events cluster together, which can be indicative of clogged pores or electrical interference in your signals. It is also used extensively when performing filtering operations on your event database, which is discussed in detail in a dedicated upcoming section of this tutorial.
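As a minimal sketch of the idea above, the gaps in the id column can be recovered outside Nanolyzer from an exported list of accepted event ids. The function name and the sample ids are hypothetical, and the code assumes only what is stated above: ids count up from 1 and rejected events leave gaps.

```python
def rejected_event_ids(accepted_ids):
    """Return the ids missing from a sequence that should count up from 1.

    Rejected events that come after the last accepted id cannot be
    detected this way, since they leave no visible gap.
    """
    accepted = set(accepted_ids)
    last = max(accepted) if accepted else 0
    return [i for i in range(1, last + 1) if i not in accepted]


# Hypothetical id column from an event database
ids = [1, 2, 3, 7, 8, 12]
missing = rejected_event_ids(ids)
print(missing)  # [4, 5, 6, 9, 10, 11]
```

Runs of consecutive missing ids, such as 4 to 6 and 9 to 11 here, are the clusters of rejected events worth inspecting in the raw trace.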

Type

Indicates which algorithm was used to generate the event. A 0 means that the CUSUM+ algorithm was used, while a 1 indicates that nonlinear fitting was performed. [1,2]

Start Time (s)

The time in seconds when the event begins. This is useful for finding the event in the raw data trace, and for calculating inter-event times for the purposes of capture rate analysis, which we cover in more detail later.
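For illustration, inter-event times are the differences between consecutive start times. The sketch below uses NumPy on a hypothetical list of start times; the reciprocal of the mean gap is only a crude rate estimate and is not necessarily how Nanolyzer performs capture rate analysis.

```python
import numpy as np


def inter_event_times(start_times_s):
    """Return the time gaps in seconds between consecutive event starts."""
    t = np.sort(np.asarray(start_times_s, dtype=float))
    return np.diff(t)


# Hypothetical Start Time (s) values for four events
start_times = [0.10, 0.35, 0.40, 1.10]
gaps = inter_event_times(start_times)

# Crude estimate: events per second as the reciprocal of the mean gap
mean_rate_hz = 1.0 / gaps.mean()
print(gaps, mean_rate_hz)
```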

Baseline (pA)

The baseline current local to the event in picoamps.

Dwell Time (us)

The full duration of the translocation event in microseconds.

Equivalent Charge Deficit (pC)

The integrated current difference between the translocation event and the baseline current, which can be interpreted as the total amount of charge that would have passed through the pore in the form of ionic current during the event if the molecule had not been blocking it. This parameter is useful for detecting DNA fragments in what should be a monodisperse sample, since to first order one expects ECD to be roughly conserved and independent of molecular conformation during translocation.
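The definition above can be sketched numerically as a sum of the blockage over the event multiplied by the sampling interval. This is an illustrative calculation, not Nanolyzer's implementation: the function name, the uniform sampling interval, and the positive baseline are all assumptions. Note the unit conversion, since 1 pA × 1 µs = 1e-18 C = 1e-6 pC.

```python
import numpy as np


def equivalent_charge_deficit_pc(current_pa, baseline_pa, dt_us):
    """Integrate (baseline - current) over the event and return it in pC.

    Assumes uniformly sampled data with spacing dt_us (microseconds)
    and a positive baseline current.
    """
    blockage_pa = baseline_pa - np.asarray(current_pa, dtype=float)
    # pA * us = 1e-18 C = 1e-6 pC
    return float(blockage_pa.sum() * dt_us * 1e-6)


# Hypothetical molecule passing unfolded: 200 pA deep for 200 us
unfolded = np.full(100, 800.0)
# Same molecule folded in half: twice as deep, half as long
folded = np.full(50, 600.0)

ecd_unfolded = equivalent_charge_deficit_pc(unfolded, 1000.0, 2.0)
ecd_folded = equivalent_charge_deficit_pc(folded, 1000.0, 2.0)
print(ecd_unfolded, ecd_folded)
```

Both idealized events give the same ECD even though their depth and duration differ, which is the sense in which ECD is roughly independent of conformation.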

Maximum Blockage (pA)

The deepest fitted blockage level in picoamps, as measured by the difference between the fitted current level and the fitted baseline mean current for that event.

Maximum Blockage Duration (us)

The duration in microseconds of the fitted sub-level with the deepest blockage.

Minimum Blockage (pA)

The shallowest fitted blockage level in picoamps, as measured by the difference between the fitted current level and the fitted baseline mean current for that event.

Minimum Blockage Duration (us)

The duration in microseconds of the fitted sub-level with the shallowest blockage.

Average Blockage (pA)

The average difference between the current trace and the baseline during the event, weighted by time, as calculated using the current data after low-pass Bessel filtering rather than the fit values.

Maximum Deviation (pA)

The difference between the fitted baseline current for that event and the single filtered data point that is furthest from the baseline. This is analogous to Maximum Blockage (pA), but uses the current data after the low-pass Bessel event filter has been applied rather than the fitted value, and therefore includes the influence of the local noise.
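To make the distinction between data-derived and fit-derived quantities concrete, here is a small sketch that computes Average Blockage and Maximum Deviation directly from filtered samples. The function names and sample values are hypothetical. It assumes uniform sampling, in which case the time-weighted average reduces to a plain mean, and it reports the maximum deviation as a magnitude; the sign convention Nanolyzer uses is not stated here.

```python
import numpy as np


def average_blockage_pa(filtered_pa, baseline_pa):
    """Mean of (baseline - filtered current) over the event samples."""
    data = np.asarray(filtered_pa, dtype=float)
    return float(np.mean(baseline_pa - data))


def maximum_deviation_pa(filtered_pa, baseline_pa):
    """Distance from the baseline to the single furthest filtered sample."""
    data = np.asarray(filtered_pa, dtype=float)
    return float(np.max(np.abs(baseline_pa - data)))


# Hypothetical filtered samples within one event, baseline at 1000 pA
filtered = [800.0, 790.0, 810.0, 600.0, 800.0]
avg = average_blockage_pa(filtered, 1000.0)
max_dev = maximum_deviation_pa(filtered, 1000.0)
print(avg, max_dev)  # 240.0 400.0
```

The single sample at 600 pA dominates the maximum deviation, which illustrates why this quantity is sensitive to local noise in a way that the fitted Maximum Blockage is not.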

Number of Levels

The total number of fitted sub-levels that appear in the event, including the baseline levels before and after the event.

Residuals (pA)

The root-mean-square difference between the fitted current and the current value during the event after the low-pass Bessel event filter has been applied. This can be useful for quickly identifying bad fits that still pass quality control.
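A root-mean-square residual of this kind can be sketched as follows. The function name and sample values are hypothetical, and the code assumes the fitted and filtered currents are sampled at the same time points.

```python
import numpy as np


def rms_residual_pa(filtered_pa, fitted_pa):
    """Root-mean-square difference between filtered data and the fit."""
    diff = np.asarray(filtered_pa, dtype=float) - np.asarray(fitted_pa, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical event: a flat fitted level at 800 pA and noisy filtered data
fitted = [800.0, 800.0, 800.0, 800.0]
filtered = [802.0, 798.0, 803.0, 797.0]
residual = rms_residual_pa(filtered, fitted)
print(residual)
```

An event whose residual is much larger than the typical noise level on the filtered trace is a good candidate for a closer look, even if the fit passed quality control.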

Intra Event Threshold Crossings

Counts the total number of times your internal event thresholds and internal event hysteresis values are crossed by the current during a translocation event. This is useful for classifying events into categories for filtering your event database.

Cluster ID (optionally calculated in post-processing)

The integer label of the cluster to which the event has been assigned during clustering operations. This property will only be accessible after event clustering has been performed.

Cluster Confidence (optionally calculated in post-processing)

The probability that the event belongs to the cluster to which the event has been assigned during clustering operations. Note that every clustering algorithm assigns this probability via different metrics. This property will only be accessible after event clustering has been performed.

Sub-level metadata

Sublevel Current (pA)

The fitted current value of every sub-level within an event in picoamps.

Sublevel Duration (us)

The duration of every fitted sub-level in microseconds.

Sublevel Blockage (pA)

The difference between the fitted current value of every sub-level within an event and the local baseline, in picoamps.

Sublevel Standard Deviation (pA)

The root mean square difference between the fitted current during each sub-level and the current data during that level after low-pass Bessel filtering, in picoamps.

Sublevel Label

The integer label to which the sub-level has been assigned during clustering operations. This property will only be accessible after sub-level clustering has been performed.

Sublevel Confidence

The probability that the sub-level belongs to the cluster to which it has been assigned during clustering operations. Note that every clustering algorithm assigns this probability via different metrics. This property will only be accessible after sub-level clustering has been performed.

Miscellaneous metadata

Intra-Event Threshold Crossing Times (us)

The time points during the event, in microseconds, at which the current crosses your internal event thresholds and internal event hysteresis values.

A future release of Nanolyzer will include the ability to define and name new metadata columns as mathematical functions of existing ones.

Plotting and visualization

You will likely spend most of your time in Nanolyzer in the Statistics tab, which is the main hub for data visualization. Here you can visualize your experiment in a variety of ways, plotting histograms of metadata in 1 and 2 dimensions and scatterplots of metadata in 2 or 3 dimensions. A few examples of commonly used plot types are given below.

If you've been paying attention so far, you probably noticed a problem: metadata that pertains to the event as a whole is a single number, whereas metadata relating to sub-levels might represent many numbers for a single event. In cases where two or more metadata columns of different types contribute to a plot, the event metadata will be duplicated as many times as needed to provide a matching point for every sub-level. It is critically important to note that, when plotting anything relating to sub-levels, an event with N sub-levels will contribute N points to the plot or histogram. If your data contains an overzealous fit with hundreds of sub-levels, that single event will be heavily over-weighted on your plots. For this reason, it is important to carefully check the distribution of fits and ensure that outliers are appropriately excluded when plotting sub-level metadata.
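The duplication described above can be reproduced on exported data with NumPy's `np.repeat`. This is a sketch on made-up values, not a description of Nanolyzer's internals; the array names and the per-event sub-level counts are hypothetical.

```python
import numpy as np

# One value per event (hypothetical Dwell Time (us) column)
dwell_time_us = np.array([120.0, 80.0, 300.0])

# Number of sub-levels plotted for each event (hypothetical)
sublevels_per_event = np.array([1, 2, 3])

# One value per sub-level, concatenated in event order (hypothetical)
sublevel_blockage_pa = np.array([200.0, 210.0, 400.0, 190.0, 380.0, 205.0])

# Duplicate each event-level value once per sub-level so both columns align
dwell_per_sublevel = np.repeat(dwell_time_us, sublevels_per_event)
print(dwell_per_sublevel)  # [120.  80.  80. 300. 300. 300.]

# Fraction of plotted points contributed by each event
weight = sublevels_per_event / sublevels_per_event.sum()
print(weight)
```

Here the third event supplies half of all plotted points despite being one of three events, which is the weighting effect to watch for when a fit produces an unusually large number of sub-levels.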

Once you are happy with your plot, you can export the data to a .csv file for use in your favorite plotting software, or you can save the resulting image directly. Future versions of Nanolyzer will include additional plot customization options to obviate the need for any secondary plotting software.

We will revisit this topic in further detail after we discuss database filtering.

References

[1] J. H. Forstater et al., “MOSAIC: A modular single-molecule analysis interface for decoding multistate nanopore data,” Anal. Chem., vol. 88, no. 23, 2016, doi: https://www.doi.org/10.1021/acs.analchem.6b03725.

[2] K. Briggs, “Solid-State Nanopores: Fabrication, Application, and Analysis,” uOttawa, 2018.

Last edited: 2021-08-02