qufe.filehandler module
- class qufe.filehandler.FileHandler
Bases: object
A comprehensive file handling utility class providing various file operations, directory management, and data persistence functionality.
- static batch_copy_files(copy_tasks: list, verbose: bool = True) → dict
Execute multiple file copy tasks with different extensions or directories.
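- Parameters:
copy_tasks (list) – List of task dictionaries; the example uses the keys 'source_dir', 'dest_dir', 'extension', and 'flatten'
verbose (bool) – Defaults to True
- Returns:
Results for each task with statistics
- Return type:
dict
Example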
    tasks = [
        {
            'source_dir': '/source_folder_a',
            'dest_dir': '/dest_folder/data_a',
            'extension': '.db',
            'flatten': True
        },
        {
            'source_dir': '/source_folder_b',
            'dest_dir': '/dest_folder/data_b',
            'extension': '.csv',
            'flatten': True
        }
    ]
    results = FileHandler.batch_copy_files(tasks)
    for i, task in enumerate(tasks):
        print(f"Task {i+1}: Copied {results[i]['copied_count']} {task['extension']} files")
- static copy_files_by_extension(source_dir: str, dest_dir: str, extension: str, flatten: bool = True, preserve_structure: bool = False, verbose: bool = True) → tuple
Copy all files with specific extension from source directory to destination.
- Parameters:
source_dir (str) – Source directory path to search files
dest_dir (str) – Destination directory path to copy files
extension (str) – File extension to search (e.g., ‘.db’, ‘db’, ‘*.db’)
flatten (bool) – If True, copy all files to dest_dir root without subdirectories
preserve_structure (bool) – If True, preserve source directory structure in destination
verbose (bool) – If True, print copy progress
- Returns:
- (copied_count, failed_files, copied_files)
copied_count (int): Number of successfully copied files
failed_files (list): List of tuples (source_path, error_message) for failed copies
copied_files (list): List of tuples (source_path, dest_path) for successful copies
- Return type:
tuple
Example
    from qufe import filehandler as qfh

    fh = qfh.FileHandler()

    # Copy all files with the specified extension from the source folder to the destination folder
    source = '/source_folder'
    dest = '/dest_folder/data'

    copied, failed, files = fh.copy_files_by_extension(
        source_dir=source, dest_dir=dest, extension='.db', flatten=True
    )
    print(f"Successfully copied: {copied} files")
    if failed:
        print(f"Failed to copy: {len(failed)} files")
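The extension parameter is documented as accepting '.db', 'db', and '*.db'. A minimal sketch of how such inputs could be reduced to one canonical form follows. normalize_extension is a hypothetical helper written for illustration, not part of qufe, and the library's own handling may differ.

```python
def normalize_extension(extension: str) -> str:
    """Reduce 'db', '.db', or '*.db' to the single form '.db' (illustrative only)."""
    ext = extension.strip()
    # Drop a leading glob wildcard such as the '*' in '*.db'.
    if ext.startswith("*"):
        ext = ext[1:]
    # Make sure the result starts with exactly one leading dot.
    if not ext.startswith("."):
        ext = "." + ext
    return ext


for raw in ("db", ".db", "*.db"):
    print(raw, "->", normalize_extension(raw))
```

With a single canonical form, a filename check can be a plain str.endswith comparison.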
- extract_iterable(itrb: Iterable, depth=0) → list
Flatten nested dictionaries or iterables with proper indentation.
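The exact output format of extract_iterable is not documented here. As a rough illustration of flattening nested data into indented lines, the sketch below uses a hypothetical function, flatten_with_indent, with an assumed four-space indent per nesting level. It is not the library's implementation.

```python
from collections.abc import Iterable, Mapping


def flatten_with_indent(item, depth=0, indent="    "):
    """Return a flat list of strings, indented by nesting depth (illustrative only)."""
    lines = []
    if isinstance(item, Mapping):
        # Emit each key at the current depth and its value one level deeper.
        for key, value in item.items():
            lines.append(f"{indent * depth}{key}")
            lines.extend(flatten_with_indent(value, depth + 1, indent))
    elif isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
        # Lists, tuples, and sets: emit each element at the current depth.
        for element in item:
            lines.extend(flatten_with_indent(element, depth, indent))
    else:
        # Scalars and strings become a single line.
        lines.append(f"{indent * depth}{item}")
    return lines


for line in flatten_with_indent({"a": [1, 2], "b": {"c": 3}}):
    print(line)
```

Strings are treated as scalars on purpose; otherwise they would be iterated character by character.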
- get_contents(base_path: str, print_tree: bool = False) → dict
Extract text file contents from directory structure.
- static get_datetime_from_date_pattern(pattern: str, filename: str) → datetime
Extract datetime from filename using a regex pattern.
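A minimal sketch of the general technique, assuming the pattern has three capture groups for year, month, and day. datetime_from_filename is a hypothetical stand-in, and returning None on a non-match is an assumption rather than documented behavior of the library method.

```python
import re
from datetime import datetime


def datetime_from_filename(pattern, filename):
    """Build a datetime from three captured groups: year, month, day (illustrative only)."""
    match = re.search(pattern, filename)
    if match is None:
        # Assumption: signal "no date found" with None.
        return None
    year, month, day = (int(group) for group in match.groups())
    return datetime(year, month, day)


pattern = r"Receipt_(\d{4})_(\d{2})_(\d{2})\.pickle"
print(datetime_from_filename(pattern, "Receipt_2025_03_25.pickle"))
```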
- static get_int_from_timestamp_pattern(pattern: str, filename: str) → int
Extract integer timestamp from filename using a regex pattern.
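The same idea for integer timestamps can be sketched as follows, assuming the first capture group holds the digits. int_from_filename is a hypothetical name, and the None return for a non-match is an assumption, not the library's documented behavior.

```python
import re


def int_from_filename(pattern, filename):
    """Return the first captured group as an int, or None if nothing matches (illustrative only)."""
    match = re.search(pattern, filename)
    return int(match.group(1)) if match else None


print(int_from_filename(r"page_data_(\d{10})\.pickle", "page_data_1700000000.pickle"))
```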
- static get_latest_by_pattern(directory, pattern)
Deprecated method. Use get_latest_file instead.
- static get_latest_file(directory, extract_fn, pattern, analysis: bool = False)
Find the latest file in a directory based on a datetime/timestamp pattern.
- Parameters:
- Returns:
(latest_file_path, timestamp_latest, files)
- Return type:
- Example 1.:
    from qufe import filehandler as qfh

    f_path = './temp/data/'
    pattern = r'page_data_(\d{10}).pickle'
    extract_fn = qfh.FileHandler.get_int_from_timestamp_pattern

    (latest_file, timestamp_latest, files) = qfh.FileHandler.get_latest_file(
        f_path, extract_fn, pattern)
    print(latest_file)
- Example 2.:
If
    pattern = r"Receipt_(\d{4})_(\d{2})_(\d{2}).pickle"
and the candidate files are
    Receipt_2024_10_15.pickle
    Receipt_2025_01_20.pickle
    Receipt_2025_03_25.pickle
then 2025_03_25 is the latest.
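The selection logic in Example 2 can be sketched without touching the file system. pick_latest and date_from_name are hypothetical helpers that operate on a plain list of names; the (name, timestamp, sorted names) result shape mirrors the documented return value only loosely and is an assumption.

```python
import re
from datetime import datetime


def date_from_name(pattern, filename):
    """Hypothetical extract function: year, month, day groups to a datetime."""
    match = re.search(pattern, filename)
    if match is None:
        return None
    return datetime(*(int(group) for group in match.groups()))


def pick_latest(filenames, extract_fn, pattern):
    """Return (latest_name, latest_stamp, names sorted oldest to newest)."""
    dated = []
    for name in filenames:
        stamp = extract_fn(pattern, name)
        if stamp is not None:  # skip names that do not match the pattern
            dated.append((stamp, name))
    if not dated:
        return None, None, []
    dated.sort()
    latest_stamp, latest_name = dated[-1]
    return latest_name, latest_stamp, [name for _, name in dated]


names = [
    "Receipt_2025_01_20.pickle",
    "Receipt_2024_10_15.pickle",
    "Receipt_2025_03_25.pickle",
    "README.txt",
]
pattern = r"Receipt_(\d{4})_(\d{2})_(\d{2})\.pickle"
latest, stamp, ordered = pick_latest(names, date_from_name, pattern)
print(latest)
```

Passing the extract function as an argument keeps the ordering logic independent of whether the filename encodes a date or an integer timestamp.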
- static get_tree(folder_path: str, normalize: bool = False)
Get all file paths from a directory recursively.
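Recursive listing of this kind is commonly built on os.walk. The sketch below uses a hypothetical function, list_all_files. What normalize does in the library is not documented here, so converting separators to forward slashes is purely an assumption.

```python
import os


def list_all_files(folder_path, normalize=False):
    """Collect every file path under folder_path, recursively (illustrative only)."""
    paths = []
    for root, _dirs, files in os.walk(folder_path):
        for name in files:
            full_path = os.path.join(root, name)
            if normalize:
                # Assumption: "normalize" means using forward slashes on every platform.
                full_path = full_path.replace(os.sep, "/")
            paths.append(full_path)
    return sorted(paths)


if __name__ == "__main__":
    for path in list_all_files("."):
        print(path)
```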
- static get_unique_filename(base_dir: Path, base_name: str, extension: str = '.csv') → Path
Generate unique filename in given directory to avoid conflicts.
- Parameters:
- Returns:
Unique file path
- Return type:
Path
Example
    from pathlib import Path

    output_dir = Path("output")
    output_dir.mkdir(exist_ok=True)

    # container is assumed to map names to pandas DataFrames
    for (key, df) in container.items():
        base_name = FileHandler.sanitize_filename(key)
        file_path = FileHandler.get_unique_filename(output_dir, base_name)
        df.to_csv(file_path, index=False, encoding='utf-8-sig')
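One common way to avoid filename conflicts is to append a counter until the path is free. The sketch below shows that approach with a hypothetical function, unique_path. The "_1", "_2" suffix scheme is an assumption, and the library may name files differently.

```python
from pathlib import Path


def unique_path(base_dir: Path, base_name: str, extension: str = ".csv") -> Path:
    """Return a path in base_dir that does not exist yet (illustrative only)."""
    candidate = base_dir / f"{base_name}{extension}"
    counter = 1
    # Assumption: conflicts are resolved with a numeric suffix, e.g. report_1.csv.
    while candidate.exists():
        candidate = base_dir / f"{base_name}_{counter}{extension}"
        counter += 1
    return candidate


if __name__ == "__main__":
    print(unique_path(Path("."), "report"))
```

The check and the later write are separate steps, so this sketch is not safe against another process creating the same file in between.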
- iterable_to_txt_file(itrb: Iterable, file_name: str, path: str = '') → None
Save iterable data to a text file.
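A minimal sketch of writing an iterable to a text file, one item per line. write_lines is a hypothetical function; the one-item-per-line layout, the UTF-8 encoding, and the returned path are all assumptions rather than documented behavior of iterable_to_txt_file.

```python
import os
from typing import Iterable


def write_lines(items: Iterable, file_name: str, path: str = "") -> str:
    """Write each item on its own line and return the file path (illustrative only)."""
    # os.path.join with an empty path simply returns file_name.
    full_path = os.path.join(path, file_name)
    with open(full_path, "w", encoding="utf-8") as handle:
        for item in items:
            handle.write(f"{item}\n")
    return full_path
```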
- static list_to_txt_file(lines: list, file_name: str) → None
Deprecated method. Use iterable_to_txt_file instead.
- make_file_path(path: str, file_name: str) → str
Create full file path by joining directory and filename.
- static pickle_to_txt(input_pickle_name: str, output_txt_name: str)
Deprecated method. Use iterable_to_txt_file instead.
- class qufe.filehandler.PathFinder(start_path='.')
Bases: object
Interactive directory exploration utility for step-by-step folder traversal. Useful when you don’t know the folder structure and want to explore gradually without overwhelming output from os.walk.
- get_one_depth(input_path: str = '') → tuple
Get directories and files at one depth level using os.scandir.
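A single-level listing with os.scandir can be sketched as below. scan_one_level is a hypothetical function, and the (directories, files) ordering of the returned tuple is an assumption; the documentation above only states that a tuple is returned.

```python
import os


def scan_one_level(path="."):
    """Return (directory names, file names) directly under path, without recursing."""
    dirs, files = [], []
    # os.scandir yields DirEntry objects; the context manager closes the iterator.
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(dirs), sorted(files)


if __name__ == "__main__":
    folders, filenames = scan_one_level(".")
    print(folders)
    print(filenames)
```

Unlike os.walk, this never descends into subdirectories, which keeps the output small when exploring an unfamiliar tree one step at a time.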