Use data stores
How to use open_data_store in read, write, and append modes with directory, zip, and SQLite backends, iterate over members, and inspect .completed, .not_completed, and .summary_<methods>.
How do I use a data store?
A data store is just a "container". To open a data store you use the open_data_store() function. To load the data for a member of a data store you need a loader app appropriate to the type of data stored.
Supported operations on a data store
All data store classes can be iterated over, indexed, and checked for membership. Iterating and indexing return DataMember objects. In addition to providing access to members, the data store classes have convenience methods for describing their contents and for summarising both the log files they include and the NotCompleted members (see not completed).
Opening a data store
Use the open_data_store() function, illustrated below. Use the mode argument to specify whether to open as read only (mode="r"), write (mode="w") or append (mode="a").
Opening a read only data store
We open the zipped directory described above, defining the filenames ending in .fa as the data store members. All such files within the directory become members of the data store (unless we use the limit argument to restrict how many are included).
- Open a data store.
- The .describe property summarises the contents.
- You can index like any Python sequence.
- Or loop over members.
- And read data from a member.
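The idea of selecting members by suffix, capping them with a limit, and then indexing, looping and reading can be sketched with the Python standard library alone. This is only an analogy built on zipfile, not the scinexus API: the helper iter_members and the archive contents are hypothetical and exist purely for illustration.

```python
import io
import zipfile


def iter_members(archive, suffix, limit=None):
    """Yield (name, text) for archive entries ending in .<suffix>, up to limit.

    Hypothetical helper: it mimics the suffix/limit idea, not the scinexus API.
    """
    with zipfile.ZipFile(archive) as zf:
        names = sorted(n for n in zf.namelist() if n.endswith(f".{suffix}"))
        for name in names[:limit]:  # limit=None keeps every matching name
            yield name, zf.read(name).decode("utf8")


# Build a small in-memory archive so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as zf:
    zf.writestr("raw/seq1.fa", ">s1\nACGT\n")
    zf.writestr("raw/seq2.fa", ">s2\nGGCC\n")
    zf.writestr("raw/notes.txt", "not a member")

# Only the .fa entries count as members; notes.txt is skipped.
members = list(iter_members(buffer, suffix="fa"))
member_names = [name for name, _ in members]

# With a limit, only the first matching entry is returned.
limited = list(iter_members(buffer, suffix="fa", limit=1))
```

A real data store returns DataMember objects rather than tuples, so treat the tuple form here as a stand-in for "a member you can read".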
Note
For a DataStoreSqlite member, the default data storage format is bytes. So reading the content of an individual record is best done using the load_db app.
Making a writeable data store
The creation of a writeable data store is specified with mode="w", or (to append) mode="a". In the former case, any existing records are overwritten. In the latter case, existing records are ignored.
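The difference between the two writeable modes can be illustrated with a toy, dictionary-backed store. This is a minimal sketch and not the library's implementation: write_record is a hypothetical function, and it assumes that "ignored" in append mode means an existing record is left in place rather than rewritten.

```python
def write_record(store, mode, unique_id, data):
    """Write data under unique_id following toy 'w' / 'a' semantics."""
    if mode == "r":
        raise IOError("data store is read only")
    if mode == "a" and unique_id in store:
        # Assumption: append mode leaves an existing record untouched.
        return store[unique_id]
    # Write mode, or a new record in append mode: store the new data.
    store[unique_id] = data
    return data


existing = {"seq1.fa": "old"}

# mode="w": the existing record is replaced.
overwritten = dict(existing)
write_record(overwritten, "w", "seq1.fa", "new")
write_record(overwritten, "w", "seq2.fa", "new")

# mode="a": the existing record is kept, only the new one is added.
appended = dict(existing)
write_record(appended, "a", "seq1.fa", "new")
write_record(appended, "a", "seq2.fa", "new")
```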
DataStoreSqlite stores serialised data
When you specify a Sqlitedb data store as your output (by using open_data_store()) you write multiple records into a single file making distribution easier.
One important issue to note is that the process which creates a Sqlitedb "locks" the file. If that process exits abnormally (e.g. the run that was producing it was interrupted) then the file may remain in a locked state. If the db is in this state, scinexus will not modify it unless you explicitly unlock it.
The locked state is reported when the data store is displayed.
To unlock, you execute the following:
dstore.unlock(force=True)
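To make the locking behaviour concrete, here is a minimal sketch of a process-id lock kept inside a SQLite file, using only the standard sqlite3 module. The state table, its columns, and the lock, locked_by and unlock functions are all hypothetical; how scinexus actually records its lock is not described here, so read this as one plausible design and nothing more.

```python
import os
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a single-row table recording the locking process id.
conn.execute("CREATE TABLE state (state_id INTEGER PRIMARY KEY, lock_pid INTEGER)")
conn.execute("INSERT INTO state (state_id, lock_pid) VALUES (1, NULL)")


def lock(conn, pid):
    """Record pid as the owner of the lock."""
    conn.execute("UPDATE state SET lock_pid = ? WHERE state_id = 1", (pid,))


def locked_by(conn):
    """Return the pid holding the lock, or None if unlocked."""
    return conn.execute("SELECT lock_pid FROM state WHERE state_id = 1").fetchone()[0]


def unlock(conn, force=False):
    """Clear the lock if this process owns it, or unconditionally when force=True."""
    owner = locked_by(conn)
    if owner is None:
        return
    if force or owner == os.getpid():
        conn.execute("UPDATE state SET lock_pid = NULL WHERE state_id = 1")


# Simulate a lock left behind by another process that was interrupted.
stale_pid = os.getpid() + 1
lock(conn, stale_pid)

unlock(conn)  # not the owner, so the lock stays in place
still_locked = locked_by(conn)

unlock(conn, force=True)  # explicit override clears the stale lock
after_force = locked_by(conn)
```

Requiring force=True for a lock owned by a different process is what keeps one run from silently modifying a file that another run may still be writing.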
Interrogating run logs
If you use the apply_to() method, a scitrack logfile will be stored in the data store. This includes useful information regarding the run conditions that produced the contents of the data store.
Log files can be accessed via a special attribute, logs, which is a list.
Each element in that list is a DataMember which you can use to get the data contents. The following prints the first 225 characters of the first log file:
print(dstore.logs[0].read()[:225])
Citations – giving credit to package developers
When apps declare citations, those citations are automatically saved alongside your results when you use apply_to().
The summary_citations property returns a table of all citations stored in the data store. Export them to BibTeX with write_bib().
Note
ReadOnlyDataStoreZipped supports reading stored citations but not writing them.