About
Just as attrs and dataclasses use type hints to simplify data type definition, scinexus uses them to simplify writing best-practice scientific algorithms.
scinexus (pronounced 'sigh-nexus') is a Python framework for rapid development of data processing applications. It enables interoperability between apps through defined data types, allowing development of scientific domain app ecosystems (for examples, see cogent3 and piqtree).
Many scientific problems require repeating calculations across many files or database records. Such tasks suit data-level parallelism on multi-core CPUs, but writing robust, maintainable code for them is often tedious and quickly becomes complex.
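To make the pattern concrete, here is a minimal standard-library sketch of data-level parallelism: one function applied independently to many inputs, with each failure captured per item instead of aborting the whole run. It does not use scinexus, and the names `gc_fraction`, `safe_call`, and `run_all` are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def gc_fraction(seq: str) -> float:
    """Fraction of G and C characters in a sequence (hypothetical example task)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def safe_call(seq: str):
    """Run the task on one input, returning the exception instead of raising it."""
    try:
        return gc_fraction(seq)
    except Exception as err:  # keep the failure so the other inputs still run
        return err


def run_all(seqs, processes: bool = True):
    """Apply the task to every input; results come back in input order."""
    # Processes use multiple CPU cores; threads share the same interface
    # and are convenient for quick checks.
    pool_cls = ProcessPoolExecutor if processes else ThreadPoolExecutor
    with pool_cls() as pool:
        return list(pool.map(safe_call, seqs))
```

Even this small version has to decide how to catch errors, how to keep results aligned with inputs, and which executor to use. When using processes from a script, the call to `run_all` should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.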
With scinexus apps, you can use a functional programming style when developing your application. Combined with scinexus app composition, this greatly simplifies your programming logic, making it easier to understand and thus easier to explain. And as we know:
Quote
If the implementation is easy to explain, it may be a good idea.
-- Tim Peters, "Zen of Python"
What you get
- Type checking at composition time
- Durable computing [1]
- Greatly simplified data-level parallel execution
- Automated logging
- Automated citation tracking
- Checkpointing via data stores
- Customisable experience (progress bars [2], parallelisation [3], data store representations, etc.)
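Two of the items above, type checking at composition time and durable computing, can be illustrated with a plain-Python sketch. This is not the scinexus API: `Step`, `step`, and `Failure` are hypothetical names, and `Failure` is only a simplified stand-in for the NotCompleted records described in the footnotes. The sketch assumes each function takes exactly one type-hinted argument.

```python
from dataclasses import dataclass
from typing import Callable, get_type_hints


@dataclass
class Failure:
    """Stand-in for a failure record: which step failed and why."""

    origin: str
    message: str


class Step:
    """A single-argument function plus its declared input and output types."""

    def __init__(self, func: Callable, accepts: type, returns: type, name: str):
        self.func = func
        self.accepts = accepts
        self.returns = returns
        self.name = name

    def __call__(self, value):
        if isinstance(value, Failure):
            return value  # pass earlier failures through untouched
        try:
            return self.func(value)
        except Exception as err:
            return Failure(origin=self.name, message=str(err))

    def __add__(self, other: "Step") -> "Step":
        # Compatibility is checked here, before any data is processed.
        if self.returns is not other.accepts:
            raise TypeError(
                f"{self.name} returns {self.returns.__name__}, "
                f"but {other.name} accepts {other.accepts.__name__}"
            )
        return Step(
            lambda value: other(self(value)),
            self.accepts,
            other.returns,
            f"{self.name} + {other.name}",
        )


def step(func: Callable) -> Step:
    """Build a Step by reading the function's type hints."""
    hints = get_type_hints(func)
    returns = hints.pop("return")
    (accepts,) = hints.values()  # exactly one annotated argument is assumed
    return Step(func, accepts, returns, func.__name__)


@step
def parse_number(text: str) -> float:
    return float(text)


@step
def halve(x: float) -> float:
    return x / 2


@step
def shout(text: str) -> str:
    return text.upper()


pipeline = parse_number + halve  # str -> float -> float, so this composes
```

Calling `pipeline("8")` runs both steps, while `pipeline("abc")` returns a `Failure` whose `origin` names the step that raised. Attempting `halve + shout` raises `TypeError` immediately, because a `float` output cannot feed a `str` input.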
Standalone utilities
scinexus also provides generally useful utilities for developers of data analysis applications. Utilities for file IO, parallel execution, and progress tracking are usable independently of the app framework.
Get started
- Install scinexus -- see Installing from PyPI
- Build algorithms -- see How to write apps
- Build applications for others -- see Why composable apps?
- Use existing apps -- see Composing apps
The scinexus origin story
The app infrastructure code was originally developed within cogent3, where it accumulated over seven years of development, testing, and real-world use in computational genomics before being extracted into scinexus. The design is mature and has underpinned analyses in published studies.
We acknowledge here that many members of the cogent3 community contributed to the code that now lives here, including @GavinHuttley, @rmcar17, @Nick-Foto, @KatherineCaley, @fredjaya, and @khiron.
1. Failures are automatically recorded as NotCompleted records, which are propagated and stored in data stores. These records capture salient details that help you identify the cause of the failure.
2. tqdm is the default because of its robustness in notebooks, but you can choose rich.
3. The default is Python’s standard library multiprocessing module. If you're using Jupyter notebooks, however, it's recommended that you use loky. This is an installation option and configuration is easy.