qufe.dbhandler module

Database handlers for PostgreSQL and SQLite with unified interface.

This module provides handlers for both PostgreSQL and SQLite databases with a common base class for consistent usage patterns.

Required dependencies:

For PostgreSQL: pip install qufe[database]
For SQLite: No additional dependencies (uses standard library)

class qufe.dbhandler.BaseDBHandler[source]

Bases: ABC

Abstract base class for database handlers.

Provides common interface and shared functionality for different database implementations.

__enter__()[source]: Context manager entry.

__exit__(exc_type, exc_val, exc_tb)[source]: Context manager exit.

__init__()[source]: Initialize base handler.

abstractmethod connect() → None[source]: Establish database connection.

abstractmethod disconnect() → None[source]: Close database connection.

execute_many(query: str, params_list: List[tuple]) → None[source]

Execute same query multiple times with different parameters.

Parameters:

query – SQL query string with placeholders
params_list – List of parameter tuples

abstractmethod execute_query(query: str, params: tuple | None = None) → List[source]

Execute a query and return results.

Parameters:

query – SQL query string
params – Query parameters for safe execution

Returns:

List of query results

table_exists(table_name: str) → bool[source]

Check if a table exists in the database.

Parameters:: table_name – Name of the table to check
Returns:: True if table exists, False otherwise

transaction()[source]

Context manager for database transactions.

Automatically commits on success, rolls back on error.

Bases: BaseDBHandler

PostgreSQL connection handler with automatic environment variable support.

Extends BaseDBHandler for PostgreSQL-specific functionality.

Initialize PostgreSQL connection handler.

Parameters:

db_name – Database name (defaults to POSTGRES_DB env var)
user – Username (defaults to POSTGRES_USER env var)
password – Password (defaults to POSTGRES_PASSWORD env var)
host – Host address (defaults to POSTGRES_HOST or ‘localhost’)
port – Port number (defaults to POSTGRES_PORT or 5432)

connect() → None[source]: Establish PostgreSQL connection.

disconnect() → None[source]: Close PostgreSQL connection.

execute_query(query: str, params: tuple | None = None) → List[source]

Execute a SQL query and return results.

Parameters:

query – SQL query string
params – Query parameters (not used in current implementation)

Returns:

List of query results for SELECT queries, empty list for DDL/DML

get_database_list(print_result: bool = False) → List[str][source]

Get list of all databases in the PostgreSQL server.

Parameters:: print_result – Whether to print the database list
Returns:: List of database names

get_table_list(print_result: bool = True) → List[Dict[str, str]][source]

Get list of all tables and views in the current database.

Parameters:: print_result – Whether to print the table list
Returns:: List of dictionaries containing table information

class qufe.dbhandler.SQLiteHandler(db_path: str | Path, create_dir: bool = False)[source]

Bases: BaseDBHandler

SQLite database handler for local file-based databases.

Provides convenient methods for common SQLite operations in Jupyter notebook environments.

__init__(db_path: str | Path, create_dir: bool = False)[source]

Initialize SQLite handler.

Parameters:

db_path – Path to SQLite database file
create_dir – Whether to create parent directory if missing

classmethod analyze_scan_results(results: Dict[str, Any], top_n: int = 10, sort_by: str = 'rows', show_unmatched: bool = False) → None[source]

Analyze and display detailed scan results.

Parameters:

results – Results from scan_databases
top_n – Number of top files to show
sort_by – Sort criterion (‘rows’, ‘tables’, ‘size’, ‘match_rate’)
show_unmatched – Whether to show unmatched tables (default: False)

connect() → None[source]: Establish SQLite connection.

create_table(table_name: str, columns: Dict[str, str]) → None[source]

Create a table with specified columns.

Parameters:

table_name – Name of the table to create
columns – Dictionary of column_name: data_type

classmethod describe(db_path: str) → None[source]

Show complete database structure.

Parameters:: db_path – Path to SQLite database

disconnect() → None[source]: Close SQLite connection.

execute_query(query: str, params: tuple | None = None) → List[source]

Execute a SQL query and return results.

Parameters:

query – SQL query string
params – Query parameters for safe execution

Returns:

List of query results for SELECT queries, empty list for DDL/DML

classmethod export_scan_results(results: Dict[str, Any], output_path: str, format: str = 'csv') → None[source]

Export scan results to file.

Parameters:

results – Results from scan_databases
output_path – Output file path
format – Export format (‘csv’, ‘json’, ‘pickle’)

get_table_info(table_name: str) → List[Dict[str, Any]][source]

Get column information for a table with multiple fallback methods.

Tries multiple approaches in order: 1. PRAGMA table_info (standard SQLite metadata) 2. Parse CREATE TABLE statement from sqlite_master 3. Infer from sample data using cursor.description

Parameters:

table_name – Name of the table

Returns:

cid: Column ID (position)
name: Column name
type: Data type (may be ‘UNKNOWN’ if inferred)
notnull: Whether NOT NULL constraint exists (0 or 1)
dflt_value: Default value
pk: Whether primary key (0 or 1)

Return type:

List of column information dictionaries with keys

get_table_schema(table_name: str, verbose: bool = False) → Dict[str, Any][source]

Get comprehensive table schema information.

Provides detailed information about table structure including the method used to obtain the information.

Parameters:

table_name – Name of the table
verbose – Whether to include additional diagnostic info

Returns:

columns: List of column information
method: Method used to obtain info (‘pragma’, ‘schema’, ‘inferred’)
accuracy: Confidence level (‘high’, ‘medium’, ‘low’)
warnings: List of any warnings

Return type:

Dictionary containing

get_tables() → List[str][source]

Get list of all tables in the database.

Returns:: List of table names

insert_data(table_name: str, data: Dict[str, Any]) → None[source]

Insert a single row of data into a table.

Parameters:

table_name – Name of the table
data – Dictionary of column_name: value

classmethod quick_peek(db_path: str, max_rows: int = 5) → None[source]

Quick preview of database contents without creating instance.

Parameters:

db_path – Path to SQLite database
max_rows – Maximum rows to show per table

classmethod quick_scan(folder_path: str, **kwargs) → Dict[str, Any][source]

Convenience method for quick database scanning.

Automatically detects Jupyter environment and enables appropriate display mode for the best user experience.

Parameters:

folder_path – Directory path to scan
**kwargs – Additional arguments for scan_databases

Returns:

Scan results dictionary

Example

>>> # Quick scan with automatic environment detection
>>> results = SQLiteHandler.quick_scan('/data/databases')

>>> # Quick scan with specific columns
>>> results = SQLiteHandler.quick_scan(
...     '/data',
...     required_columns=['column_a', 'column_b']
... )

classmethod read_all_dbs(folder_path: str, table_name: str, pattern: str = '*.db') → Dict[str, List][source]

Read same table from multiple database files in a folder.

Parameters:

folder_path – Path to folder containing .db files
table_name – Name of table to read from each database
pattern – File pattern to match (default: “*.db”)

Returns:

Dictionary mapping database filename to data

classmethod read_by_date(db_path: str, table_name: str, date_column: str = 'timestamp', start: str | None = None, end: str | None = None) → List[source]

Read data filtered by date range.

Parameters:

db_path – Path to SQLite database
table_name – Name of the table
date_column – Name of the date column
start – Start date (inclusive)
end – End date (inclusive)

Returns:

List of query results

read_table(table_name: str, columns: str = '*', where: str | None = None, limit: int | None = None) → List[source]

Read data from a table with optional filtering.

Parameters:

table_name – Name of the table
columns – Columns to select (default: ‘*’)
where – WHERE clause (without ‘WHERE’ keyword)
limit – Maximum number of rows to return

Returns:

List of query results

classmethod scan_databases(folder_path: str, required_columns: List[str] | None = None, pattern: str = '**/*.db', verbose: bool = True, jupyter_mode: bool = False) → Dict[str, Any][source]

Scan all SQLite databases in a folder for tables with specific columns.

Efficiently scans multiple database files to find tables containing specified columns and counts their rows. Useful for data exploration and migration planning.

Parameters:

folder_path – Directory path to scan for database files
required_columns – List of column names that tables must have (None means count all tables)
pattern – Glob pattern for finding database files (default: ‘**/*.db’)
verbose – Display progress and statistics (default: True)
jupyter_mode – Enable Jupyter notebook display mode (default: False)

Returns:

total_rows: Total number of rows across all matching tables
matching_tables: Count of tables with required columns
total_tables_scanned: Total number of tables examined
processed_files: Number of successfully processed files
error_files: Number of files with errors
details: List of detailed information per file
errors: List of error messages
summary: Processing summary statistics

Return type:

Dictionary containing

Example

>>> # Basic usage - find tables with specific columns
>>> results = SQLiteHandler.scan_databases(
...     folder_path='/data/databases',
...     required_columns=['Column_A', 'Column_B'],
...     verbose=True
... )

>>> # Scan all tables in all databases
>>> results = SQLiteHandler.scan_databases(
...     folder_path='/data',
...     pattern='**/*.sqlite',
...     required_columns=None
... )

>>> # Jupyter notebook with progress tracking
>>> results = SQLiteHandler.scan_databases(
...     folder_path='/data',
...     required_columns=['user_id', 'created_at'],
...     jupyter_mode=True
... )

table_exists(table_name: str) → bool[source]

Check if a table exists.

Parameters:: table_name – Name of the table
Returns:: True if table exists

classmethod to_dataframe(db_path: str, table_name: str, where: str | None = None) → Any | None[source]

Read table directly into pandas DataFrame.

Parameters:

db_path – Path to SQLite database
table_name – Name of the table
where – Optional WHERE clause

Returns:

pandas DataFrame or None if pandas not available

qufe.dbhandler.help()[source]

Display help information for database handler functionality.

Shows usage examples and available handlers.