References
API references
cli
Command Line Interface for py-file-attributes.
main()
Run the CLI application.
Source code in src/file_attributes/cli.py
str2bool(v)
Convert string to boolean.
Parameters
v : str The string to convert.
Returns
bool The converted boolean value.
Raises
argparse.ArgumentTypeError If the string cannot be converted to a boolean.
Source code in src/file_attributes/cli.py
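A converter of this kind is typically passed to argparse as a `type` callable. The following is a minimal sketch of the documented contract (string in, bool out, `argparse.ArgumentTypeError` on failure), not the library's source: the accepted spellings and the `--parallel` flag are assumptions chosen for illustration.

```python
import argparse


def str2bool(v: str) -> bool:
    """Convert a command-line string to a boolean (illustrative sketch)."""
    # Assumption: these accepted spellings are illustrative, not the library's list.
    truthy = {"yes", "true", "t", "y", "1"}
    falsy = {"no", "false", "f", "n", "0"}
    value = v.strip().lower()
    if value in truthy:
        return True
    if value in falsy:
        return False
    # argparse turns this exception into a clean usage error for the user.
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}.")


# Hypothetical flag name, used only to show how the converter plugs into argparse.
parser = argparse.ArgumentParser()
parser.add_argument("--parallel", type=str2bool, default=False)
args = parser.parse_args(["--parallel", "yes"])
print(args.parallel)
```

Raising `argparse.ArgumentTypeError` rather than `ValueError` lets the message text reach the user unchanged.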
utils
Utils using FileAttributes to retrieve data from cloud storage.
FileRecallManager
Context Manager to recall a cloud-stored file to local storage.
A context manager that checks the attributes of the given file path and, if the file is not available on the local hard drive, performs a data access request to trigger its download.
If the data access request fails with an OSError, it is retried according to the policy set by RETRY_MAX and RETRY_DELAY.
The object returned by the context manager is simply the file path, so it can be passed on to other classes unchanged.
See Also
FileAttributes
Attributes
filename : Path The file being accessed.
fileattributes : FileAttributes Class that retrieves all file attributes from the OS. Only works on Windows.
Examples
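import pandas as pd
from file_attributes.utils import FileRecallManager  # import path assumed from the source location

with FileRecallManager("test.xlsx") as f:
    display(pd.read_excel(f, engine="calamine"))  # display() is available in IPython/Jupyter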
Source code in src/file_attributes/utils.py
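The behaviour described for FileRecallManager (request data access, retry on OSError, hand back the path) can be pictured with a small stand-in class. This is a sketch under assumptions, not the library's implementation: `RecallSketch` and its lowercase parameters are hypothetical names, and reading one byte is assumed to be enough of a data access request.

```python
import tempfile
import time
from pathlib import Path


class RecallSketch:
    """Hypothetical stand-in for a recall-style context manager."""

    def __init__(self, filename, retry_max=5, retry_delay=10):
        self.filename = Path(filename)
        self.retry_max = retry_max
        self.retry_delay = retry_delay

    def __enter__(self) -> Path:
        for attempt in range(1, self.retry_max + 1):
            try:
                # Reading one byte is a data access request; on a cloud-backed
                # file system this is assumed to trigger the download.
                with open(self.filename, "rb") as handle:
                    handle.read(1)
                return self.filename  # callers receive the plain path
            except OSError:
                if attempt == self.retry_max:
                    raise  # out of tries: surface the last error
                time.sleep(self.retry_delay)
        raise OSError("retry_max must be at least 1")

    def __exit__(self, exc_type, exc, tb) -> bool:
        return False  # never swallow exceptions raised in the with-body


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"data")
    with RecallSketch(sample, retry_delay=0) as f:
        print(f.read_bytes())
```

Returning the path instead of an open handle keeps the manager compatible with any reader that expects a filename, such as `pd.read_excel`.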
download_offline_file(file, RETRY_MAX=5, RETRY_DELAY=10, READ_MODE='r+b')
Trigger download from cloud storage for a single file.
Source code in src/file_attributes/utils.py
download_offline_files_parallel(to_download, RETRY_MAX=5, RETRY_DELAY=10, READ_MODE='r+b', max_workers=4)
Trigger download from cloud storage for all provided files in parallel.
Parameters
to_download : list[str | Path] List of files to ensure are available on HDD.
RETRY_MAX : int, optional Number of times to try to trigger the download, by default 5.
RETRY_DELAY : int, optional Time in seconds to wait between two tries, by default 10.
READ_MODE : str, optional Read mode passed to open(file, READ_MODE) to trigger the data access, by default "r+b".
max_workers : int, optional Maximum number of threads to use for parallel processing, by default 4.
Raises
OSError If FileAttributes.in_cloud has not changed to False (= available on HDD) after RETRY_MAX tries.
Source code in src/file_attributes/utils.py
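The parallel variant can be thought of as fanning a per-file worker out over a thread pool. The sketch below shows that pattern with the standard library only; `touch_file` and `recall_parallel_sketch` are hypothetical names, and the worker here merely reads one byte rather than reproducing the library's retry logic.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def touch_file(path, read_mode="rb"):
    """Hypothetical worker: read one byte to request data access."""
    with open(path, read_mode) as handle:
        handle.read(1)
    return Path(path)


def recall_parallel_sketch(to_download, worker=touch_file, max_workers=4):
    """Run the worker for every file on a thread pool and collect the results."""
    done = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, item) for item in to_download]
        for future in as_completed(futures):
            # result() re-raises any exception (e.g. OSError) from the worker.
            done.append(future.result())
    return done


with tempfile.TemporaryDirectory() as tmp:
    files = [Path(tmp) / f"part{i}.bin" for i in range(3)]
    for f in files:
        f.write_bytes(b"x")
    finished = recall_parallel_sketch(files)
    print(sorted(p.name for p in finished))
```

Threads are a reasonable fit because the work is dominated by waiting on I/O, not by computation.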
download_offline_files_sequential(to_download, RETRY_MAX=5, RETRY_DELAY=10, READ_MODE='r+b')
Trigger download from cloud storage for all provided files.
Parameters
to_download : list[str | Path] List of files to ensure are available on HDD.
RETRY_MAX : int, optional Number of times to try to trigger the download, by default 5.
RETRY_DELAY : int, optional Time in seconds to wait between two tries, by default 10.
READ_MODE : str, optional Read mode passed to open(file, READ_MODE) to trigger the data access, by default "r+b". The choice of mode should not matter; it is configurable just in case.
See Also
FileAttributes
Raises
OSError If FileAttributes.in_cloud has not changed to False (= available on HDD) after RETRY_MAX tries.
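The failure condition above (the file is still reported as in the cloud once the tries are used up) can be sketched as a sequential loop around an in-cloud predicate. Everything here is an assumption made for illustration: `looks_cloud_only` and `ensure_local_sequential` are hypothetical names, and treating the Windows offline and recall attribute bits as "in cloud" is one plausible interpretation, not necessarily how FileAttributes.in_cloud is computed.

```python
import tempfile
import time
from pathlib import Path

# Windows file attribute bits (values from the Win32 file attribute constants).
FILE_ATTRIBUTE_OFFLINE = 0x00001000
FILE_ATTRIBUTE_RECALL_ON_OPEN = 0x00040000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000
CLOUD_ONLY_MASK = (
    FILE_ATTRIBUTE_OFFLINE
    | FILE_ATTRIBUTE_RECALL_ON_OPEN
    | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS
)


def looks_cloud_only(attributes: int) -> bool:
    """Assumption: any offline/recall bit means 'not fully on local disk'."""
    return bool(attributes & CLOUD_ONLY_MASK)


def ensure_local_sequential(to_download, in_cloud, retry_max=5, retry_delay=10,
                            read_mode="r+b"):
    """Request each file in turn; raise OSError if one stays in the cloud."""
    # On Windows the predicate could be, for example:
    #   lambda p: looks_cloud_only(os.stat(p).st_file_attributes)
    for file in to_download:
        path = Path(file)
        for _ in range(retry_max):
            if not in_cloud(path):
                break  # already local, move on to the next file
            try:
                with open(path, read_mode) as handle:
                    handle.read(1)  # data access request
            except OSError:
                pass  # a failed access still counts as one try
            time.sleep(retry_delay)
        else:
            # All tries used without seeing the file become local.
            if in_cloud(path):
                raise OSError(f"{path} is still in the cloud after {retry_max} tries")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "report.bin"
    sample.write_bytes(b"payload")
    pending = {"checks": 0}

    def fake_in_cloud(path):
        # Simulated predicate: reports "cloud only" for the first two checks.
        pending["checks"] += 1
        return pending["checks"] <= 2

    ensure_local_sequential([sample], fake_in_cloud, retry_delay=0)
    print(pending["checks"])
```

Passing the predicate in as a callable keeps the loop testable on any platform, since the real attribute lookup is Windows-only.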