4 min read

Event database filtering

Home >> Nanopore training course >> Analyzing your data >> Event database filtering

Subsets
Constructing filters
Managing filters
Getting help

The examples in this tutorial were done using Nanolyzer™, Northern Nanopore's data analysis software.

1. Subsets

Analysis within Nanolyzer is primarily driven by the notion of a subset: a selection of translocation events that match some user-defined criteria. This could be almost anything: all events that show molecules that are folded, all events that exhibit a particular sequence of sub-levels, all events that are deeper than some limiting current blockage, etc.

Being able to separate your experimental datasets into interesting, physically meaningful subsets for analysis is the most powerful and most complex feature set of Nanolyzer. In this section we provide a basic introduction to event filtering using metadata, and a few canonical examples of common filtering tasks.

Sub-setting in Nanolyzer is based on SQL database queries. We have graphical tools to help you build these queries in what we hope is an intuitive way. This is the closest to programming you will ever get while using Nanolyzer, and as we work with you to understand how you use this tool, you can expect common analysis filtering tasks to make their way into future versions of Nanolyzer as single-click operations.

2. Constructing filters

2.a. Simple filters

To get started, once you have a data set analyzed and are happy with the fits, open the Data Manager through the Data->Data Manager menu and click on New Filter. This will open the basic filtering panel from which we will build our data filters. Optionally, rename it using the control indicated in the figure below.

Say for example we want to look only at events that have a maximum blockage greater than 1000 pA. We will label out operation deep_events, choose the Maximum Blockage (pA) parameter for the first operand, an operator (in this case ">") and enter 2000 in the second operand box. Finally, we click "Add Operation", and see that our operation has appeared in the list below. If you made a mistake, you can edit it using the indicated controls.

2.b. Custom filters

If you're familiar with SQL you probably recognize it already in the structure of the filters generated. The "CU" box, when ticked, allows you to enter custom SQL queries, which enable almost arbitrarily complex filters to be created on the fly. This is particularly useful for filtering events by sub-level properties. For example, to select all events that contain at least one sub-level with a blockage greater than 1000 pA, you might enter

id IN (SELECT id FROM sublevels WHERE (blockages_pA > 500))

though there are many possible ways to achieve this same outcome. Note that internally, all queries will have "SELECT * from <table> WHERE" prepended to them.

In the Help section, you will find a list of all variables and associated data tables on which to filter and select data.

2.c. Combining filters

Filter operations, once defined, can be chained or nested to produce more complex and powerful filters. For example, suppose we want to not only consider only events with a deep max blockage, but we want to only consider events that have at least two internal sub-levels (or more precisely, at least 4 sub-levels, since the baseline before and after count). As before, we can build that operation. To combine them, note that both of our operations now appear in the list of operands, and we can choose <deep events> AND <multilevel events> to create a third filter that combines the two.

2.d. Applying filters

To apply our combined operation to our dataset, we select the database of event metadata (to start, this will be called "eventsdb" in the dropdown indicated below), type a name for our new subset, and press "Filter and make subset" to the right. Optionally, click the "Auto-update plot" box to see the new subset appear immediately on any Statistics tab plots you have active.

We can now use our new subset for visualization, clustering, and capture rate calculations. By default, 1D histograms and scatterplots will plot all actively defined subsets, while the 2D histogram will only plot the selected subset. To disable subsets in the plot, simple go to the Data Manager again, select the offending subset, and uncheck "Enable Plot", as shown below. You can also view the filters applied or save, delete, or reset the subset easily through the buttons indicated below.

3. Managing Filters

3.a. Saving filters

Once you are satisfied with your filters, you can save them for future analysis. While the process of first building the required queries can be a bit tedious, you only need to do it once, and thereafter can simply save, reload, and edit as needed for your next analysis task.

3.b. Loading filters

3.c. Editing filters

Filters can be edited after they have been made using the edit button to the right of the filter. Filters built using the "OP" mode will propagate edits through all queries that they form part of, while custom filters built in "CU" mode will not propagate edits. A common use for the editor would be to load a previously used filter and tweak the numbers to suit he new data set.

4. Getting help

Nanolyzer has a help page for filtering that covers the basics of building SQL queries, and contains a reference book for metadata that is available for filter queries. Accessing this page is done through the ? button indicated below.

This page contains basic usage guide, full reference to all columns accessible for plotting, visualization, and filtering, a reference for common SQL queries, as well as a saveable notepad for testing ideas.

4.c. Ask the experts

Building subsets can be a daunting task to start out. Don't worry, we're here to help. If you have a complex filtering task for which you need help, reach out and we will be happy to assist in constructing the filters you need. Our sincere hope is that you share with us your common filtering tasks so that we can package them into future versions of Nanolyzer to make analysis easier for everyone.

Analysis Table of Contents

Previous Topic: Clustering and substructure labeling

Next Topic: Capture rate analysis

Last edited: 2021-08-02