qufe.filehandler module

class qufe.filehandler.FileHandler[source]

Bases: object

A comprehensive file handling utility class providing various file operations, directory management, and data persistence functionality.

__init__()[source]
static batch_copy_files(copy_tasks: list, verbose: bool = True) → dict[source]

Execute multiple file copy tasks with different extensions or directories.

Parameters:
  • copy_tasks (list) – List of dictionaries with copy task parameters. Each dict should have source_dir, dest_dir, and extension, plus the optional keys flatten and preserve_structure

  • verbose (bool) – If True, print progress for each task

Returns:

Results for each task with statistics

Return type:

dict

Example

tasks = [
    {
        'source_dir': '/source_folder_a',
        'dest_dir': '/dest_folder/data_a',
        'extension': '.db',
        'flatten': True
    },
    {
        'source_dir': '/source_folder_b',
        'dest_dir': '/dest_folder/data_b',
        'extension': '.csv',
        'flatten': True
    }
]

results = FileHandler.batch_copy_files(tasks)

for i, task in enumerate(tasks):
    print(f"Task {i+1}: Copied {results[i]['copied_count']} {task['extension']} files")

build_tree(path)[source]

Build a nested dictionary representation of directory structure.

Parameters:

path (str) – Directory path to build tree from

Returns:

Nested structure representation

Return type:

list

static copy_files_by_extension(source_dir: str, dest_dir: str, extension: str, flatten: bool = True, preserve_structure: bool = False, verbose: bool = True) → tuple[source]

Copy all files with specific extension from source directory to destination.

Parameters:
  • source_dir (str) – Source directory path to search files

  • dest_dir (str) – Destination directory path to copy files

  • extension (str) – File extension to search (e.g., ‘.db’, ‘db’, ‘*.db’)

  • flatten (bool) – If True, copy all files to dest_dir root without subdirectories

  • preserve_structure (bool) – If True, preserve source directory structure in destination

  • verbose (bool) – If True, print copy progress

Returns:

(copied_count, failed_files, copied_files)
  • copied_count (int): Number of successfully copied files

  • failed_files (list): List of tuples (source_path, error_message) for failed copies

  • copied_files (list): List of tuples (source_path, dest_path) for successful copies

Return type:

tuple

Example

from qufe import filehandler as qfh

fh = qfh.FileHandler()

# Copy all files with the specified extension from the source folder
# to the destination folder
source = '/source_folder'
dest = '/dest_folder/data'

copied, failed, files = fh.copy_files_by_extension(
    source_dir=source,
    dest_dir=dest,
    extension='.db',
    flatten=True
)

print(f"Successfully copied: {copied} files")
if failed:
    print(f"Failed to copy: {len(failed)} files")
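The extension parameter is documented as accepting several spellings ('.db', 'db', '*.db'). One way such inputs could be reduced to a single canonical form is sketched below. normalize_extension is a hypothetical helper, not part of qufe, and the normalization actually performed inside copy_files_by_extension may differ.

```python
def normalize_extension(extension: str) -> str:
    """Hypothetical helper: map '.db', 'db', or '*.db' to '.db'."""
    # Drop surrounding whitespace and any leading glob star.
    ext = extension.strip().lstrip("*")
    # Guarantee exactly one leading dot.
    return ext if ext.startswith(".") else f".{ext}"


normalized = [normalize_extension(e) for e in (".db", "db", "*.db")]
```

All three spellings collapse to the same suffix, which can then be compared against each file name with str.endswith.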

extract_iterable(itrb: Iterable, depth=0) → list[source]

Flatten nested dictionaries or iterables with proper indentation.

Parameters:
  • itrb (Iterable) – The iterable to flatten

  • depth (int) – Current indentation depth

Returns:

Flattened representation with indentation

Return type:

list
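As a rough illustration of the flattening described for extract_iterable, the sketch below walks a nested dict or list and emits one indented string per item. flatten_with_indent is a hypothetical stand-in, not the library's implementation; the two-space indent and the line format are assumptions.

```python
from collections.abc import Iterable, Mapping


def flatten_with_indent(itrb, depth=0, indent="  "):
    """Hypothetical stand-in for extract_iterable: one indented string per item."""
    lines = []
    if isinstance(itrb, Mapping):
        for key, value in itrb.items():
            # Emit the key, then recurse into its value one level deeper.
            lines.append(f"{indent * depth}{key}")
            lines.extend(flatten_with_indent(value, depth + 1, indent))
    elif isinstance(itrb, Iterable) and not isinstance(itrb, (str, bytes)):
        # Lists, tuples, and sets keep the current depth for each element.
        for item in itrb:
            lines.extend(flatten_with_indent(item, depth, indent))
    else:
        # Leaf value (strings included): emit it at the current depth.
        lines.append(f"{indent * depth}{itrb}")
    return lines


tree = {"docs": {"img": ["b.png", "c.png"]}, "src": ["main.py"]}
flat = flatten_with_indent(tree)
```

Joining flat with newlines gives a printable outline of the nested structure.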

get_contents(base_path: str, print_tree: bool = False) → dict[source]

Extract text file contents from directory structure.

Parameters:
  • base_path (str) – Base directory path

  • print_tree (bool) – Whether to print the directory tree

Returns:

Dictionary containing file contents

Return type:

dict

static get_datetime_from_date_pattern(pattern: str, filename: str) → datetime[source]

Extract datetime from filename using a regex pattern.

Parameters:
  • pattern (str) – Regex pattern to match datetime parts

  • filename (str) – Filename to extract datetime from

Returns:

Extracted datetime object or None if no match

Return type:

datetime
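The general technique of turning regex capture groups into a datetime can be sketched with the standard library alone. datetime_from_pattern below is a hypothetical stand-in, not qufe's code, and it assumes the groups are captured in year, month, day, hour, minute order; the log_ file name is likewise invented for illustration.

```python
import re
from datetime import datetime


def datetime_from_pattern(pattern, filename):
    """Hypothetical stand-in: build a datetime from the regex's captured groups."""
    match = re.search(pattern, filename)
    if match is None:
        return None
    # Assumes groups appear in (year, month, day[, hour, minute, second]) order.
    return datetime(*(int(part) for part in match.groups()))


pattern = r"log_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})\.txt"
found = datetime_from_pattern(pattern, "log_20250325_1430.txt")
missing = datetime_from_pattern(pattern, "notes.txt")
```

A file name that does not match yields None, mirroring the documented return value.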

static get_file_name(folder_path)[source]

Get all file names from a directory recursively.

Parameters:

folder_path (str) – Path to the directory to search

Returns:

List of file names found in the directory

Return type:

list

static get_int_from_timestamp_pattern(pattern: str, filename: str) → int[source]

Extract integer timestamp from filename using a regex pattern.

Parameters:
  • pattern (str) – Regex pattern to match timestamp

  • filename (str) – Filename to extract timestamp from

Returns:

Extracted timestamp or None if no match

Return type:

int
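A minimal sketch of the same idea for integer timestamps follows. int_from_pattern is a hypothetical stand-in rather than the library's implementation, and it assumes the timestamp is the first capture group of the pattern.

```python
import re


def int_from_pattern(pattern, filename):
    """Hypothetical stand-in: return the first captured group as an int, or None."""
    match = re.search(pattern, filename)
    # Assumes the timestamp digits are captured by group 1.
    return int(match.group(1)) if match else None


pattern = r"page_data_(\d{10})\.pickle"
stamp = int_from_pattern(pattern, "page_data_1700000000.pickle")
no_stamp = int_from_pattern(pattern, "page_data_latest.pickle")
```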

static get_latest_by_pattern(directory, pattern)[source]

Deprecated method. Use get_latest_file instead.

static get_latest_file(directory, extract_fn, pattern, analysis: bool = False)[source]

Find the latest file in a directory based on a datetime/timestamp pattern.

Parameters:
  • directory (str) – Directory path to search

  • extract_fn (Callable) – Function to extract datetime/timestamp from filename

  • pattern (str) – Regex pattern for filename matching

  • analysis (bool) – Whether to print analysis information

Returns:

(latest_file_path, timestamp_latest, files)

Return type:

tuple

Example 1:

from qufe import filehandler as qfh

f_path = './temp/data/'
pattern = r'page_data_(\d{10})\.pickle'
extract_fn = qfh.FileHandler.get_int_from_timestamp_pattern

(latest_file, timestamp_latest, files) = qfh.FileHandler.get_latest_file(
    f_path, extract_fn, pattern)

print(latest_file)

Example 2:

Given the pattern

pattern = r"Receipt_(\d{4})_(\d{2})_(\d{2})\.pickle"

and a directory containing

Receipt_2024_10_15.pickle
Receipt_2025_01_20.pickle
Receipt_2025_03_25.pickle

the latest file is Receipt_2025_03_25.pickle (date 2025_03_25).
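The selection logic of the Receipt example can be reproduced on a plain list of names without touching the file system. This is a sketch under assumptions: extract_date is a hypothetical extract function, and get_latest_file itself may order or filter files differently.

```python
import re
from datetime import datetime

pattern = r"Receipt_(\d{4})_(\d{2})_(\d{2})\.pickle"
names = [
    "Receipt_2024_10_15.pickle",
    "Receipt_2025_03_25.pickle",
    "Receipt_2025_01_20.pickle",
    "README.txt",  # does not match the pattern and is skipped
]


def extract_date(pattern, filename):
    """Hypothetical extract function: captured groups -> datetime, or None."""
    match = re.fullmatch(pattern, filename)
    return datetime(*map(int, match.groups())) if match else None


# Pair each matching name with its extracted date, then take the maximum.
dated = [(extract_date(pattern, n), n) for n in names]
dated = [(ts, n) for ts, n in dated if ts is not None]
latest_ts, latest_name = max(dated)
```

Comparing parsed dates rather than raw strings keeps the ordering correct even when the date parts are not zero-padded or not in year-first order.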

static get_tree(folder_path: str, normalize: bool = False)[source]

Get all file paths from a directory recursively.

Parameters:
  • folder_path (str) – Path to the directory to search

  • normalize (bool) – Whether to apply Unicode normalization

Returns:

List of full file paths found in the directory

Return type:

list
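A recursive listing with optional Unicode normalization could look roughly like the sketch below. list_file_paths is a hypothetical stand-in for get_tree, and the choice of NFC is an assumption, since the normalization form used by the library is not stated here.

```python
import os
import tempfile
import unicodedata


def list_file_paths(folder_path, normalize=False):
    """Hypothetical stand-in for get_tree: every file path under folder_path."""
    paths = []
    for root, _dirs, files in os.walk(folder_path):
        for name in files:
            full = os.path.join(root, name)
            # NFC is an assumption; the library's normalization form may differ.
            paths.append(unicodedata.normalize("NFC", full) if normalize else full)
    return sorted(paths)


# Build a small throwaway directory to run the listing against.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as fh:
            fh.write("x")
    found = [os.path.relpath(p, tmp) for p in list_file_paths(tmp, normalize=True)]
```

Normalization matters mostly on file systems that store names in decomposed form, where visually identical names can otherwise fail string comparison.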

static get_unique_filename(base_dir: Path, base_name: str, extension: str = '.csv') → Path[source]

Generate unique filename in given directory to avoid conflicts.

Parameters:
  • base_dir (Path) – Base directory path

  • base_name (str) – Base filename without extension

  • extension (str) – File extension

Returns:

Unique file path

Return type:

Path

Example

from pathlib import Path
from qufe.filehandler import FileHandler

output_dir = Path("output")
output_dir.mkdir(exist_ok=True)

# container is assumed to map names to pandas DataFrames
for (key, df) in container.items():
    base_name = FileHandler.sanitize_filename(key)
    file_path = FileHandler.get_unique_filename(output_dir, base_name)
    df.to_csv(file_path, index=False, encoding='utf-8-sig')
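One common way to avoid name conflicts is to append a counter until the path is unused. The sketch below illustrates that approach only; unique_path is a hypothetical helper, and the _1, _2 suffix scheme is an assumption rather than the documented behavior of get_unique_filename.

```python
import tempfile
from pathlib import Path


def unique_path(base_dir, base_name, extension=".csv"):
    """Hypothetical stand-in: add a numeric suffix until the path is unused."""
    candidate = base_dir / f"{base_name}{extension}"
    counter = 1
    while candidate.exists():
        # Assumed suffix scheme: report.csv, report_1.csv, report_2.csv, ...
        candidate = base_dir / f"{base_name}_{counter}{extension}"
        counter += 1
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    first = unique_path(out, "report")
    first.touch()
    second = unique_path(out, "report")
    second.touch()
    third = unique_path(out, "report")
    names = [first.name, second.name, third.name]
```

Note that checking for existence and then creating the file is not atomic, so concurrent writers would need an additional safeguard.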

iterable_to_txt_file(itrb: Iterable, file_name: str, path: str = '') → None[source]

Save iterable data to a text file.

Parameters:
  • itrb (Iterable) – Data to save

  • file_name (str) – Output file name

  • path (str) – Output directory path

static list_to_txt_file(lines: list, file_name: str) → None[source]

Deprecated method. Use iterable_to_txt_file instead.

static load_pickle(pkl, rb: bool = True)[source]

Load data from a pickle file.

Parameters:
  • pkl (str) – Path to pickle file

  • rb (bool) – Whether to open in binary mode

Returns:

Loaded data from pickle file

Return type:

object

make_file_path(path: str, file_name: str) → str[source]

Create full file path by joining directory and filename.

Parameters:
  • path (str) – Directory path

  • file_name (str) – File name

Returns:

Full file path

Return type:

str

make_path(path: str) → str[source]

Create directory if it doesn’t exist.

Parameters:

path (str) – Path to create

Returns:

Created path

Return type:

str

pickle_temp_data(data, file_name: str, path: str = '') → None[source]

Save data to a pickle file.

Parameters:
  • data – Data to save

  • file_name (str) – Output file name

  • path (str) – Output directory path
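For readers unfamiliar with pickle persistence, the save and load steps that pickle_temp_data and load_pickle wrap can be sketched with the standard library. The file name sample.pickle and the sample data are invented for illustration, and the sketch does not reproduce how qufe builds the output path.

```python
import os
import pickle
import tempfile

data = {"ids": [1, 2, 3], "label": "sample"}

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "sample.pickle")  # hypothetical file name
    # Pickle files must be written and read in binary mode.
    with open(pkl_path, "wb") as fh:
        pickle.dump(data, fh)
    with open(pkl_path, "rb") as fh:
        restored = pickle.load(fh)
```

Only unpickle files from trusted sources, because loading a pickle can execute arbitrary code.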

static pickle_to_txt(input_pickle_name: str, output_txt_name: str)[source]

Deprecated method. Use iterable_to_txt_file instead.

static sanitize_filename(name: str, replacement: str = '_') → str[source]

Sanitize filename by removing invalid characters.

Parameters:
  • name (str) – Original filename

  • replacement (str) – Character to replace invalid characters with

Returns:

Sanitized filename

Return type:

str
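Which characters count as invalid is not listed here. The sketch below assumes the set reserved on Windows plus ASCII control characters; sanitize and INVALID_CHARS are hypothetical names, and the library may use a different set.

```python
import re

# Assumed invalid set: Windows-reserved characters and ASCII control characters.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize(name, replacement="_"):
    """Hypothetical stand-in: replace each invalid character with `replacement`."""
    return INVALID_CHARS.sub(replacement, name)


cleaned = sanitize('sales: 2024/Q1 "final"?')
dashed = sanitize("a|b", replacement="-")
```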

tree_to_dict(start_path)[source]

Convert directory tree to dictionary format.

Parameters:

start_path (str) – Starting directory path

Returns:

Dictionary representation of directory tree

Return type:

dict
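The shape of the returned dictionary is not specified above. One plausible layout, shown purely as an assumption, maps each directory name to a nested dict and each file name to None; dir_to_dict is a hypothetical stand-in for tree_to_dict.

```python
import os
import tempfile


def dir_to_dict(start_path):
    """Hypothetical stand-in: directories become nested dicts, files map to None."""
    node = {}
    with os.scandir(start_path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            node[entry.name] = dir_to_dict(entry.path) if entry.is_dir() else None
    return node


# Build a small throwaway directory to convert.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(tmp, rel), "w").close()
    tree = dir_to_dict(tmp)
```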

class qufe.filehandler.PathFinder(start_path='.')[source]

Bases: object

Interactive directory exploration utility for step-by-step folder traversal. Useful when you don’t know the folder structure and want to explore gradually without overwhelming output from os.walk.

__init__(start_path='.')[source]
get_one_depth(input_path: str = '') → tuple[source]

Get directories and files at one depth level using os.scandir.

Parameters:

input_path (str) – Path to scan (uses current_path if empty)

Returns:

(path, directories, files)

Return type:

tuple
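A single-level scan with os.scandir, as named in the description, might be written as follows. scan_one_depth is a hypothetical stand-in for get_one_depth; sorting the names is an added assumption to make the output deterministic.

```python
import os
import tempfile


def scan_one_depth(path):
    """Hypothetical stand-in: return (path, directories, files) for one level only."""
    dirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            # Each entry is classified without descending into subdirectories.
            (dirs if entry.is_dir() else files).append(entry.name)
    return path, sorted(dirs), sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub", "deeper"))
    open(os.path.join(tmp, "a.txt"), "w").close()
    _, dirs, files = scan_one_depth(tmp)
```

Unlike os.walk, the nested directory deeper is never visited, which keeps the output small when exploring an unfamiliar tree.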

go_up_n_level(n_level: int = 1, set_current: bool = True)[source]

Navigate up directory levels.

Parameters:
  • n_level (int) – Number of levels to go up

  • set_current (bool) – Whether to update current_path or just return new path

Returns:

New path if set_current is False

Return type:

str
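Moving up a fixed number of levels is a pure path computation, sketched here with pathlib. go_up is a hypothetical helper that only returns the new path; it does not model the current_path state that PathFinder keeps.

```python
from pathlib import PurePosixPath


def go_up(path, n_level=1):
    """Hypothetical helper: return the ancestor n_level directories above path."""
    p = PurePosixPath(path)
    for _ in range(n_level):
        # The parent of the root is the root itself, so this never overshoots.
        p = p.parent
    return str(p)


two_up = go_up("/data/project/logs/2025", 2)
at_root = go_up("/data", 5)
```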

static print_each(label: str, items: list) → None[source]

Print list items with numbering and formatting.

Parameters:
  • label (str) – Label for the items

  • items (list) – Items to print

print_result(result: tuple) → None[source]

Print formatted result from get_one_depth.

Parameters:

result (tuple) – Result tuple from get_one_depth