IO utilities

API for file IO functions that work independently of the app framework.

`atomic_write`

performs atomic write operations, cleaning up if the write fails

`__init__(path, tmpdir=None, in_zip=None, mode='w', encoding=None)`

Parameters:

Name	Type	Description	Default
`path`	`PathType`	path to the file, or a path relative to the archive specified by in_zip	required
`tmpdir`	`PathType \| None`	directory where temporary file will be created	`None`
`in_zip`	`PathType \| bool \| None`	path to the zip archive containing path. For example, if in_zip="path/to/data.zip" and path="data/seqs.tsv", decompressing the archive produces "data/seqs.tsv".	`None`
`mode`	`str`	file writing mode	`'w'`
`encoding`	`str \| None`	text encoding	`None`

`close()`

closes file

`write(text)`

writes text to file
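The usual pattern behind an atomic write is to write to a temporary file and then move it over the destination in one step. The sketch below shows that pattern with the standard library only; `atomic_write_text` is a hypothetical name, and this is not the `atomic_write` implementation itself (it ignores `in_zip` and assumes UTF-8 rather than `encoding=None`).

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path, text, tmpdir=None, encoding="utf-8"):
    """Hypothetical helper: write text to path atomically."""
    path = Path(path)
    # default to the destination's directory so os.replace stays on one filesystem
    directory = Path(tmpdir) if tmpdir is not None else path.parent
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(text)
        # os.replace is atomic when source and destination share a filesystem
        os.replace(tmp_name, path)
    except BaseException:
        # clean up the partial temporary file if anything went wrong
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

If the write fails, the destination is never touched and the temporary file is removed, which is the "cleans up" behaviour described above.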

`get_format_suffixes(filename)`

returns the file format and compression suffixes
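As an illustration of what splitting a filename into format and compression suffixes involves, here is a hypothetical `format_suffixes` sketch. The set of recognised compression suffixes and the lowercase, dot-free return values are assumptions, not taken from the real function.

```python
from pathlib import Path

# assumed set of compression suffixes; the real function's list may differ
COMPRESSION_SUFFIXES = {"gz", "bz2", "xz", "lzma", "zip"}


def format_suffixes(filename):
    """Hypothetical sketch: return (format_suffix, compression_suffix).

    Either element is None when absent.
    """
    suffixes = [s.lstrip(".").lower() for s in Path(str(filename)).suffixes]
    compression = None
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        compression = suffixes.pop()
    file_format = suffixes[-1] if suffixes else None
    return file_format, compression
```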

`is_url(path)`

whether a path is a url
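One plausible way to make this check is to parse the string and look for a network scheme and host. The sketch below is hypothetical (`looks_like_url` and the accepted schemes are assumptions) and uses only `urllib.parse.urlparse`.

```python
from urllib.parse import urlparse


def looks_like_url(path):
    """Hypothetical sketch: True if path has an assumed network scheme and a host."""
    parsed = urlparse(str(path))
    return parsed.scheme in {"http", "https", "ftp"} and bool(parsed.netloc)
```

Requiring a non-empty `netloc` avoids treating a Windows drive letter such as `C:` as a URL scheme.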

`iter_line_blocks(path, num_lines=1000, chunk_size=5000000)`

yields lists of num_lines lines (as str) from path

Parameters:

Name	Type	Description	Default
`path`	`PathType`	data file	required
`num_lines`	`int \| None`	number of lines per block. If None, all lines are returned.	`1000`
`chunk_size`	`int \| None`	number of bytes to load in one go from path	`5000000`
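The blocking behaviour can be sketched with `itertools.islice`. This hypothetical `line_blocks` reads line by line instead of in `chunk_size` byte chunks, and stripping the trailing newline is an assumption about the output format.

```python
from itertools import islice


def line_blocks(path, num_lines=1000, encoding="utf-8"):
    """Hypothetical sketch: yield lists of up to num_lines lines from path.

    If num_lines is None, all lines are yielded as a single list.
    """
    with open(path, encoding=encoding) as infile:
        lines = (line.rstrip("\n") for line in infile)
        while True:
            # list(lines) drains everything; islice takes the next num_lines
            block = list(lines) if num_lines is None else list(islice(lines, num_lines))
            if not block:
                break
            yield block
```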

`iter_record_chunks(*, path, delimiter, chunk_size=5000000)`

yield bytes between successive occurrences of delimiter

Parameters:

Name	Type	Description	Default
`path`	`PathType`	data file. Accepts a path, URL, or any `PathType` and opens it via `open_(path, mode="rb")` so compressed formats are handled transparently. If `path` is a URL the stream is read in full (`chunk_size` is forced to `None`).	required
`delimiter`	`bytes`	bytes delimiter on which records are split. Must be non-empty.	required
`chunk_size`	`int \| None`	bytes read per iteration. If `None`, or if the on-disk file is smaller than `chunk_size`, the file is read in a single call.	`5000000`

Yields:

Type	Description
`bytes`	each item is the content between two successive delimiters. The first item is whatever precedes the first delimiter (often empty for files that start with a delimiter). The final item is whatever follows the last delimiter; callers filter as needed for their format.

Raises:

Type	Description
`ValueError`	if `delimiter` is empty.

Notes

Reads path in chunks of chunk_size bytes and splits on delimiter, holding any trailing partial record across chunk boundaries so that delimiters spanning a boundary are detected correctly. Peak memory is bounded by chunk_size plus the size of the largest record, rather than the full file size.

Operates on raw bytes only; callers that need text decoding should do so per yielded record.

Examples:

>>> import tempfile, pathlib
>>> with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
...     _ = f.write(b">a\nAAA>b\nBBB>c\nCCC")
...     tmp = pathlib.Path(f.name)
>>> list(iter_record_chunks(path=tmp, delimiter=b">", chunk_size=8))
[b'', b'a\nAAA', b'b\nBBB', b'c\nCCC']
>>> tmp.unlink()
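The chunk-boundary handling described in the Notes can be sketched as follows: prepend the carried-over tail to each new chunk, split, and keep the last piece for the next round. `record_chunks` is a hypothetical name; unlike the documented function it reads plain local files only, with no decompression or URL support.

```python
def record_chunks(*, path, delimiter, chunk_size=5_000_000):
    """Hypothetical sketch of chunked splitting on a bytes delimiter."""
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    leftover = b""
    with open(path, "rb") as infile:
        while True:
            # a falsy chunk_size means read the whole file at once
            chunk = infile.read(chunk_size if chunk_size else -1)
            if not chunk:
                break
            parts = (leftover + chunk).split(delimiter)
            # the last part may be an incomplete record (or a partial delimiter)
            leftover = parts.pop()
            yield from parts
    yield leftover
```

Because the tail is re-joined before splitting, a delimiter that straddles two chunks is still found, and memory is bounded by one chunk plus the longest record.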

`iter_splitlines(path, chunk_size=1000000)`

yields lines from the file, one at a time

Parameters:

Name	Type	Description	Default
`path`	`PathType`	data file	required
`chunk_size`	`int \| None`	number of bytes to load in one go from path	`1000000`

Notes

Loads the file in chunks of chunk_size bytes and yields one line at a time.
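A minimal sketch of this chunked line reader, assuming lines are returned without their line endings and decoded as UTF-8 (both assumptions); `split_lines` is a hypothetical name. Working in bytes means splitting on `b"\n"` can never cut a multi-byte UTF-8 character in half.

```python
def split_lines(path, chunk_size=1_000_000, encoding="utf-8"):
    """Hypothetical sketch: yield lines from path, reading chunk_size bytes at a time."""
    leftover = b""
    with open(path, "rb") as infile:
        # chunk_size of None means read everything in one call
        while chunk := infile.read(chunk_size or -1):
            lines = (leftover + chunk).split(b"\n")
            # the last element is an incomplete line, or b"" after a trailing newline
            leftover = lines.pop()
            for line in lines:
                # strip the \r left behind by \r\n line endings
                yield line.rstrip(b"\r").decode(encoding)
    if leftover:
        yield leftover.rstrip(b"\r").decode(encoding)
```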

`open_(filename, mode='rt', **kwargs)`

opens a file, transparently handling different compression formats

Parameters:

Name	Type	Description	Default
`filename`	`PathType`	path or URL; if a URL, processing is delegated to open_url	required
`mode`	`str`	standard file opening mode	`'rt'`
`kwargs`	`Any`	passed to open functions	`{}`

Returns:

Type	Description
`an object compatible with the file protocol`
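Suffix-based dispatch is one straightforward way to handle compression transparently. The hypothetical `open_compressed` below picks `gzip.open`, `bz2.open` or `lzma.open` from the final suffix and falls back to the built-in `open`; the suffix mapping is an assumption, and zip archives and URLs are deliberately left out.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# assumed mapping from suffix to opener; each accepts (filename, mode, **kwargs)
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_compressed(filename, mode="rt", **kwargs):
    """Hypothetical sketch: choose an opener from the final suffix."""
    opener = _OPENERS.get(Path(filename).suffix.lower(), open)
    return opener(filename, mode, **kwargs)
```

All four openers accept text modes such as `"rt"` and `"wt"` plus an `encoding` keyword, so callers can treat the result as an ordinary file object.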

`open_url(url, mode='rt', **kwargs)`

open a url

Parameters:

Name	Type	Description	Default
`url`	`str \| ParseResult`	URL of a file at an http or https address	required
`mode`	`str`	mode of reading file, 'rb', 'rt', 'r'	`'rt'`

Raises:

Type	Description
`IOError`	if mode is a write mode or url is not a URL.

Returns:

Type	Description
`file object which reads binary if "b" in mode, else text.`
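The documented behaviour (read-only, http(s) only, binary or text depending on mode) can be sketched with `urllib.request.urlopen` and `io.TextIOWrapper`. `open_url_sketch` is a hypothetical name, and the UTF-8 default and the exact set of rejected mode flags are assumptions.

```python
import io
import urllib.request
from urllib.parse import urlparse


def open_url_sketch(url, mode="rt", encoding="utf-8"):
    """Hypothetical sketch: open an http(s) URL for reading."""
    if any(flag in mode for flag in "wax+"):
        raise IOError("URLs can only be opened for reading")
    # accept either a str or an already-parsed ParseResult
    parsed = urlparse(url) if isinstance(url, str) else url
    if parsed.scheme not in {"http", "https"}:
        raise IOError(f"not an http(s) URL: {parsed.geturl()}")
    response = urllib.request.urlopen(parsed.geturl())
    if "b" in mode:
        return response
    return io.TextIOWrapper(response, encoding=encoding)
```

Both checks run before any network access, so invalid arguments fail fast.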

`open_zip(filename, mode='r', **kwargs)`

open a single member zip-compressed file

Note

If mode="r", raises ValueError if the zip archive has more than one record. The returned object is wrapped in a TextIOWrapper with latin encoding, so reads return str rather than bytes.

If mode="w", returns an atomic_write() instance.
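The read path described in the note can be sketched with the standard `zipfile` module. `read_single_member_zip` is a hypothetical name, and `"latin-1"` is assumed as the concrete spelling of "latin encoding".

```python
import io
import zipfile


def read_single_member_zip(filename, encoding="latin-1"):
    """Hypothetical sketch: open the only member of a zip archive as text."""
    archive = zipfile.ZipFile(filename)
    names = archive.namelist()
    if len(names) != 1:
        archive.close()
        raise ValueError(f"expected 1 member in {filename}, found {len(names)}")
    # archive.open returns a binary file-like object; wrap it to get str
    return io.TextIOWrapper(archive.open(names[0]), encoding=encoding)
```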

`path_exists(path)`

whether path is a valid path and it exists
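A minimal sketch of "valid path and exists", assuming "valid" means a str, bytes or `os.PathLike` object; `path_exists_sketch` is a hypothetical name. The type check matters because `os.path.exists` treats an int as a file descriptor and raises `TypeError` for `None`.

```python
import os


def path_exists_sketch(path):
    """Hypothetical sketch: True only for a path-like object that exists."""
    # reject non-path objects; os.path.exists would treat an int as a file descriptor
    if not isinstance(path, (str, bytes, os.PathLike)):
        return False
    return os.path.exists(path)
```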
