
About

Just as attrs and dataclasses use type hints to simplify data type definition, scinexus uses them to simplify writing best-practice scientific algorithms.

scinexus (pronounced 'sigh-nexus') is a Python framework for rapid development of data processing applications. It enables interoperability between apps through defined data types, which allows ecosystems of apps to develop within a scientific domain (for examples, see cogent3 and piqtree).

Many scientific problems require repeating calculations across many files or database records. Such tasks suit data-level parallelism on multi-core CPUs, but writing robust, maintainable code for them is often tedious and quickly becomes complex.
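To make the pattern concrete, here is a minimal sketch of data-level parallelism using only the Python standard library. It is not the scinexus API; `gc_fraction`, `map_records` and the `use_threads` flag are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C characters in a sequence."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def map_records(
    func: Callable[[T], R],
    records: Iterable[T],
    max_workers: int | None = None,
    use_threads: bool = False,
) -> list[R]:
    """Apply func to each record independently, preserving input order.

    Processes suit CPU-bound work; threads suit I/O-bound work.
    """
    executor_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, records))
```

When using the default process pool, call `map_records(gc_fraction, sequences)` from under an `if __name__ == "__main__":` guard, since some platforms start workers by re-importing the main module. Even this small version leaves error handling, logging and checkpointing unaddressed, which is where the complexity usually grows.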

With scinexus apps, you can write your application in a functional programming style. Combined with scinexus app composition, this greatly simplifies your programming logic, making it easier to understand and thus easier to explain. And as we know:

Quote

If the implementation is easy to explain, it may be a good idea.

-- Tim Peters, "Zen of Python"
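As a rough illustration of why a functional style composes well, the sketch below chains small single-purpose functions into one callable using plain Python. The `compose` helper and the three step functions are hypothetical and do not represent how scinexus composes apps.

```python
from functools import reduce
from typing import Any, Callable


def compose(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Chain single-argument functions, left to right, into one callable."""

    def pipeline(value: Any) -> Any:
        # Feed the output of each step into the next one.
        return reduce(lambda acc, step: step(acc), steps, value)

    return pipeline


def strip_gaps(seq: str) -> str:
    return seq.replace("-", "")


def to_upper(seq: str) -> str:
    return seq.upper()


def length(seq: str) -> int:
    return len(seq)


ungapped_length = compose(strip_gaps, to_upper, length)
print(ungapped_length("ac-gt--a"))  # 5
```

Each step has no side effects, so the whole pipeline can be described by listing its steps in order.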

What you get

  • Type checking at composition time
  • Durable computing [1]
  • Greatly simplified data-level parallel execution
  • Automated logging
  • Automated citation tracking
  • Checkpointing via data stores
  • Customisable experience (progress bars [2], parallelisation [3], data store representations, etc.)
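To show what checking types at composition time can mean in practice, here is a minimal sketch that compares the type hints of two functions before either is ever called. It is an assumption-laden illustration, not the scinexus implementation: `check_composable` is a hypothetical helper, and it only handles exact type matches (no unions, subclasses or generics).

```python
from typing import Callable, get_type_hints


def check_composable(first: Callable, second: Callable) -> None:
    """Raise TypeError if first's return type is not second's input type."""
    produced = get_type_hints(first).get("return")
    # Hints are ordered like the signature, so the first non-return
    # entry is the type of second's first annotated parameter.
    accepted = next(
        (hint for name, hint in get_type_hints(second).items() if name != "return"),
        None,
    )
    if produced is None or accepted is None:
        raise TypeError("both functions need complete type hints")
    if produced != accepted:
        raise TypeError(
            f"{first.__name__} returns {produced!r}, "
            f"but {second.__name__} expects {accepted!r}"
        )


def parse_count(text: str) -> int:
    return int(text)


def double(n: int) -> int:
    return n * 2


def shout(text: str) -> str:
    return text.upper()


check_composable(parse_count, double)  # compatible, passes silently

try:
    check_composable(parse_count, shout)  # int output, str input
except TypeError as err:
    print(f"rejected before running any data: {err}")
```

The mismatch is caught when the pipeline is assembled, rather than partway through a long run.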

Standalone utilities

scinexus also provides generally useful utilities for developers of data analysis applications. Utilities for file IO, parallel execution, and progress tracking are usable independently of the app framework.

The scinexus origin story

The app infrastructure code was originally developed within cogent3, where it accumulated over seven years of development, testing, and real-world use in computational genomics before being extracted into scinexus. The design is mature and has underpinned analyses in published studies.

We acknowledge the many members of the cogent3 community who contributed to the code that now lives here, including @GavinHuttley, @rmcar17, @Nick-Foto, @KatherineCaley, @fredjaya, and @khiron.


  1. Failures are automatically recorded as NotCompleted records, which are propagated and stored in data stores. These records capture the salient details that help you identify the cause of the failure.

  2. tqdm is the default because of its robustness in notebooks, but you can choose rich instead.

  3. The default is Python’s standard library multiprocessing module. If you're using Jupyter Notebooks, however, we recommend loky. It is available as an installation option and is easy to configure.
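The failure handling described in note 1 can be pictured with the following sketch. It is a simplified stand-in written for illustration: the `NotCompleted` dataclass here is hypothetical and is not the scinexus class of the same name, `guarded` is an invented decorator, and a plain dict stands in for a data store.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable


@dataclass(frozen=True)
class NotCompleted:
    """Hypothetical failure record (not the scinexus class)."""

    origin: str  # name of the step that failed
    message: str  # what went wrong
    source: str  # identifier of the input being processed


def guarded(step: Callable[[Any], Any]) -> Callable[..., Any]:
    """Wrap a step so failures become records instead of exceptions."""

    @wraps(step)
    def wrapper(value: Any, source: str = "unknown") -> Any:
        if isinstance(value, NotCompleted):
            return value  # an earlier step failed, pass the record along
        try:
            return step(value)
        except Exception as err:
            return NotCompleted(step.__name__, f"{type(err).__name__}: {err}", source)

    return wrapper


@guarded
def parse_int(text: str) -> int:
    return int(text)


@guarded
def reciprocal(n: int) -> float:
    return 1 / n


inputs = [("a.txt", "4"), ("b.txt", "0"), ("c.txt", "x")]
# The dict stands in for a data store holding results and failures alike.
results = {src: reciprocal(parse_int(text, src), src) for src, text in inputs}

for src, outcome in results.items():
    print(src, outcome)
```

One bad input does not halt the run, and each stored record names the step and the input that failed.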