Log and cite
How to use scitrack logging in apps, control logging in apply_to, and access citation records from composed pipelines.
Leveraging scitrack for reproducibility
We reproduce here one of the examples from scitrack.
Using scitrack in a click app
A single statement and you have captured all the input arguments and their values, including defaults!
- This captures the version numbers of the packages our application depends on.
- This logs the path to infile and its md5sum.
- Until you assign the path where you want the file written, this content has been cached.
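The caching behaviour in the last point can be sketched with the standard library alone. This is a hypothetical illustration, not scitrack's implementation: BufferedLogger, its path attribute, log_message, input_file and md5_of_file are names chosen here for the sketch. It assumes that messages are held in memory until a destination is assigned and are then flushed to disk.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as infile:
        for chunk in iter(lambda: infile.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


class BufferedLogger:
    """Hypothetical stand-in: holds messages in memory until a path is set."""

    def __init__(self):
        self._cache = []
        self._path = None

    @property
    def path(self):
        return self._path

    @path.setter
    def path(self, value):
        # Assigning the path flushes everything cached so far to disk.
        self._path = Path(value)
        self._path.write_text("".join(f"{line}\n" for line in self._cache))
        self._cache.clear()

    def log_message(self, key, value):
        line = f"{key} : {value}"
        if self._path is None:
            self._cache.append(line)  # no destination yet, so cache it
        else:
            with self._path.open("a") as out:
                out.write(line + "\n")

    def input_file(self, path):
        # Record both the location and the checksum of an input file.
        self.log_message("input_file_path", str(path))
        self.log_message("input_file_md5", md5_of_file(path))


def demo():
    """Log an input file before any log path exists, then assign the path."""
    with tempfile.TemporaryDirectory() as tmp:
        infile = Path(tmp) / "data.txt"
        infile.write_bytes(b"hello")
        logger = BufferedLogger()
        logger.input_file(infile)  # cached, nothing written yet
        cached = len(logger._cache)
        logger.path = Path(tmp) / "run.log"  # cache is flushed here
        logger.log_message("status", "done")  # appended directly
        return cached, logger.path.read_text().splitlines()
```

Calling demo() returns the number of messages that were cached before the path was assigned, together with the lines that ended up in the log file.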
Controlling logging in apply_to
By default, apply_to creates a CachingLogger that records the composable function, package versions, output paths, MD5 checksums of every result, and total elapsed time. The log is then written into the output data store. This is the recommended setting for production analyses because it gives you a complete, self-contained record of what ran and what it produced.
result = process.apply_to(dstore) # logger=True by default
You can also pass your own CachingLogger instance if you want to configure it beforehand or reuse one across multiple calls.
from scitrack import CachingLogger
LOGGER = CachingLogger()
LOGGER.log_args()
result = process.apply_to(dstore, logger=LOGGER)
Disabling logging
Set logger=False to skip logging entirely.
result = process.apply_to(dstore, logger=False)
This is useful when:
- Your project is small and a full provenance log is unnecessary.
- Logging is handled externally, for example by a workflow manager or your own CachingLogger that wraps several apply_to calls.
- You want to avoid the overhead of computing an MD5 checksum for every result object, which can be noticeable for large or numerous outputs.
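To make the three logger settings concrete, here is a toy batch runner written with the standard library only. It is not how apply_to is implemented: run_all and ListLogger are hypothetical names, and hashing repr(result) is an assumption made purely to show where the per-result checksum cost arises.

```python
import hashlib


class ListLogger:
    """Hypothetical minimal logger that keeps messages in a list."""

    def __init__(self):
        self.messages = []

    def log_message(self, key, value):
        self.messages.append((key, value))


def run_all(func, items, logger=True):
    """Toy analogue of a batch runner with an optional logger.

    logger=True  -> create a fresh logger for this call
    logger=False -> no logging, so no checksums are computed
    logger=<obj> -> reuse the supplied logger
    """
    if logger is True:
        logger = ListLogger()
    elif logger is False:
        logger = None

    results = []
    for item in items:
        result = func(item)
        results.append(result)
        if logger is not None:
            # The checksum is the per-result cost that logger=False avoids.
            md5 = hashlib.md5(repr(result).encode("utf8")).hexdigest()
            logger.log_message("output_md5", md5)
    return results, logger


# One logger shared across two calls, as in the "logging handled externally" case.
shared = ListLogger()
run_all(str.upper, ["a", "b"], logger=shared)
run_all(str.lower, ["C"], logger=shared)
```

After the two calls, shared.messages holds one checksum entry per result from both runs, which is the kind of single combined record you get by wrapping several calls with one logger.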
Make it easy for your work to be cited
Correctly attributing the authors of algorithms and software is a requirement of good scientific practice. scinexus makes this easy by letting app authors declare citations that are automatically tracked through composed pipelines.
Use the cite parameter of define_app (or the base classes) to attach a citation. The citeable library provides several classes for this purpose.
Adding a citation to your app
- Use the cite parameter of define_app to attach a citation.
- The .citations property returns citations as a tuple. When apps are composed into a pipeline, .citations collects unique citations from all apps in the chain.
- The .bib gives the BibTeX string.
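The "unique citations from all apps in the chain" behaviour can be sketched as an order-preserving de-duplication. Everything below is hypothetical and independent of scinexus and citeable: the Citation dataclass, its fields, the hand-rolled bib property and collect_citations are illustration only, and the entries are placeholders rather than real references.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record; frozen so it is hashable."""

    key: str
    author: str
    title: str
    year: int

    @property
    def bib(self):
        # Hand-rolled BibTeX, for illustration only.
        return (
            f"@misc{{{self.key},\n"
            f"  author = {{{self.author}}},\n"
            f"  title = {{{self.title}}},\n"
            f"  year = {{{self.year}}},\n"
            f"}}"
        )


def collect_citations(*apps):
    """Return unique citations from each app, in first-seen order, as a tuple."""
    seen = {}
    for app_citations in apps:
        for citation in app_citations:
            seen.setdefault(citation, None)  # dicts preserve insertion order
    return tuple(seen)


# Placeholder entries: two "apps" in a chain that share one citation.
first = Citation("example2020", "Author, A.", "An example method", 2020)
second = Citation("example2021", "Author, B.", "Another example method", 2021)
pipeline_citations = collect_citations((first,), (first, second))
```

Here pipeline_citations contains first and second exactly once each, even though first is declared by both apps.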
Extracting citations from a data store
When a composed pipeline is run via apply_to(), citations are automatically saved in the output data store.
Citations in data stores
- Because we are using cogent3, the property returns a cogent3 Table of all citations stored in the data store.
- You can export to a BibTeX file.
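If you already have BibTeX strings in hand, writing them to a .bib file needs nothing beyond the standard library. The helper below is a hypothetical sketch, not the data store's export method: write_bibtex and the placeholder entries are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def write_bibtex(entries, path):
    """Write BibTeX entry strings to path, one blank line between entries."""
    path = Path(path)
    text = "\n\n".join(entry.strip() for entry in entries)
    path.write_text(text + "\n", encoding="utf8")
    return path


def demo():
    """Write two placeholder entries and return the file content."""
    entries = ["@misc{example_a,\n}", "@misc{example_b,\n}"]
    with tempfile.TemporaryDirectory() as tmp:
        out = write_bibtex(entries, Path(tmp) / "citations.bib")
        return out.read_text(encoding="utf8")
```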
Note
ReadOnlyDataStoreZipped supports reading stored citations but not writing them.