qufe.dbhandler module
Database handlers for PostgreSQL and SQLite with unified interface.
This module provides handlers for both PostgreSQL and SQLite databases with a common base class for consistent usage patterns.
- Required dependencies:
For PostgreSQL: pip install qufe[database]
For SQLite: No additional dependencies (uses standard library)
- class qufe.dbhandler.BaseDBHandler[source]
Bases:
ABCAbstract base class for database handlers.
Provides common interface and shared functionality for different database implementations.
- execute_many(query: str, params_list: List[tuple]) None[source]
Execute same query multiple times with different parameters.
- Parameters:
query – SQL query string with placeholders
params_list – List of parameter tuples
- abstractmethod execute_query(query: str, params: tuple | None = None) List[source]
Execute a query and return results.
- Parameters:
query – SQL query string
params – Query parameters for safe execution
- Returns:
List of query results
- class qufe.dbhandler.PostgreSQLHandler(db_name: str | None = None, user: str | None = None, password: str | None = None, host: str | None = None, port: int | None = None)[source]
Bases:
BaseDBHandlerPostgreSQL connection handler with automatic environment variable support.
Extends BaseDBHandler for PostgreSQL-specific functionality.
- __init__(db_name: str | None = None, user: str | None = None, password: str | None = None, host: str | None = None, port: int | None = None)[source]
Initialize PostgreSQL connection handler.
- Parameters:
db_name – Database name (defaults to POSTGRES_DB env var)
user – Username (defaults to POSTGRES_USER env var)
password – Password (defaults to POSTGRES_PASSWORD env var)
host – Host address (defaults to POSTGRES_HOST or ‘localhost’)
port – Port number (defaults to POSTGRES_PORT or 5432)
- execute_query(query: str, params: tuple | None = None) List[source]
Execute a SQL query and return results.
- Parameters:
query – SQL query string
params – Query parameters (not used in current implementation)
- Returns:
List of query results for SELECT queries, empty list for DDL/DML
- class qufe.dbhandler.SQLiteHandler(db_path: str | Path, create_dir: bool = False)[source]
Bases:
BaseDBHandlerSQLite database handler for local file-based databases.
Provides convenient methods for common SQLite operations in Jupyter notebook environments.
- __init__(db_path: str | Path, create_dir: bool = False)[source]
Initialize SQLite handler.
- Parameters:
db_path – Path to SQLite database file
create_dir – Whether to create parent directory if missing
- classmethod analyze_scan_results(results: Dict[str, Any], top_n: int = 10, sort_by: str = 'rows', show_unmatched: bool = False) None[source]
Analyze and display detailed scan results.
- Parameters:
results – Results from scan_databases
top_n – Number of top files to show
sort_by – Sort criterion (‘rows’, ‘tables’, ‘size’, ‘match_rate’)
show_unmatched – Whether to show unmatched tables (default: False)
- create_table(table_name: str, columns: Dict[str, str]) None[source]
Create a table with specified columns.
- Parameters:
table_name – Name of the table to create
columns – Dictionary of column_name: data_type
- classmethod describe(db_path: str) None[source]
Show complete database structure.
- Parameters:
db_path – Path to SQLite database
- execute_query(query: str, params: tuple | None = None) List[source]
Execute a SQL query and return results.
- Parameters:
query – SQL query string
params – Query parameters for safe execution
- Returns:
List of query results for SELECT queries, empty list for DDL/DML
- classmethod export_scan_results(results: Dict[str, Any], output_path: str, format: str = 'csv') None[source]
Export scan results to file.
- Parameters:
results – Results from scan_databases
output_path – Output file path
format – Export format (‘csv’, ‘json’, ‘pickle’)
- get_table_info(table_name: str) List[Dict[str, Any]][source]
Get column information for a table with multiple fallback methods.
Tries multiple approaches in order: 1. PRAGMA table_info (standard SQLite metadata) 2. Parse CREATE TABLE statement from sqlite_master 3. Infer from sample data using cursor.description
- Parameters:
table_name – Name of the table
- Returns:
cid: Column ID (position)
name: Column name
type: Data type (may be ‘UNKNOWN’ if inferred)
notnull: Whether NOT NULL constraint exists (0 or 1)
dflt_value: Default value
pk: Whether primary key (0 or 1)
- Return type:
List of column information dictionaries with keys
- get_table_schema(table_name: str, verbose: bool = False) Dict[str, Any][source]
Get comprehensive table schema information.
Provides detailed information about table structure including the method used to obtain the information.
- Parameters:
table_name – Name of the table
verbose – Whether to include additional diagnostic info
- Returns:
columns: List of column information
method: Method used to obtain info (‘pragma’, ‘schema’, ‘inferred’)
accuracy: Confidence level (‘high’, ‘medium’, ‘low’)
warnings: List of any warnings
- Return type:
Dictionary containing
- get_tables() List[str][source]
Get list of all tables in the database.
- Returns:
List of table names
- insert_data(table_name: str, data: Dict[str, Any]) None[source]
Insert a single row of data into a table.
- Parameters:
table_name – Name of the table
data – Dictionary of column_name: value
- classmethod quick_peek(db_path: str, max_rows: int = 5) None[source]
Quick preview of database contents without creating instance.
- Parameters:
db_path – Path to SQLite database
max_rows – Maximum rows to show per table
- classmethod quick_scan(folder_path: str, **kwargs) Dict[str, Any][source]
Convenience method for quick database scanning.
Automatically detects Jupyter environment and enables appropriate display mode for the best user experience.
- Parameters:
folder_path – Directory path to scan
**kwargs – Additional arguments for scan_databases
- Returns:
Scan results dictionary
Example
>>> # Quick scan with automatic environment detection >>> results = SQLiteHandler.quick_scan('/data/databases')
>>> # Quick scan with specific columns >>> results = SQLiteHandler.quick_scan( ... '/data', ... required_columns=['column_a', 'column_b'] ... )
- classmethod read_all_dbs(folder_path: str, table_name: str, pattern: str = '*.db') Dict[str, List][source]
Read same table from multiple database files in a folder.
- Parameters:
folder_path – Path to folder containing .db files
table_name – Name of table to read from each database
pattern – File pattern to match (default: “*.db”)
- Returns:
Dictionary mapping database filename to data
- classmethod read_by_date(db_path: str, table_name: str, date_column: str = 'timestamp', start: str | None = None, end: str | None = None) List[source]
Read data filtered by date range.
- Parameters:
db_path – Path to SQLite database
table_name – Name of the table
date_column – Name of the date column
start – Start date (inclusive)
end – End date (inclusive)
- Returns:
List of query results
- read_table(table_name: str, columns: str = '*', where: str | None = None, limit: int | None = None) List[source]
Read data from a table with optional filtering.
- Parameters:
table_name – Name of the table
columns – Columns to select (default: ‘*’)
where – WHERE clause (without ‘WHERE’ keyword)
limit – Maximum number of rows to return
- Returns:
List of query results
- classmethod scan_databases(folder_path: str, required_columns: List[str] | None = None, pattern: str = '**/*.db', verbose: bool = True, jupyter_mode: bool = False) Dict[str, Any][source]
Scan all SQLite databases in a folder for tables with specific columns.
Efficiently scans multiple database files to find tables containing specified columns and counts their rows. Useful for data exploration and migration planning.
- Parameters:
folder_path – Directory path to scan for database files
required_columns – List of column names that tables must have (None means count all tables)
pattern – Glob pattern for finding database files (default: ‘**/*.db’)
verbose – Display progress and statistics (default: True)
jupyter_mode – Enable Jupyter notebook display mode (default: False)
- Returns:
total_rows: Total number of rows across all matching tables
matching_tables: Count of tables with required columns
total_tables_scanned: Total number of tables examined
processed_files: Number of successfully processed files
error_files: Number of files with errors
details: List of detailed information per file
errors: List of error messages
summary: Processing summary statistics
- Return type:
Dictionary containing
Example
>>> # Basic usage - find tables with specific columns >>> results = SQLiteHandler.scan_databases( ... folder_path='/data/databases', ... required_columns=['Column_A', 'Column_B'], ... verbose=True ... )
>>> # Scan all tables in all databases >>> results = SQLiteHandler.scan_databases( ... folder_path='/data', ... pattern='**/*.sqlite', ... required_columns=None ... )
>>> # Jupyter notebook with progress tracking >>> results = SQLiteHandler.scan_databases( ... folder_path='/data', ... required_columns=['user_id', 'created_at'], ... jupyter_mode=True ... )
- table_exists(table_name: str) bool[source]
Check if a table exists.
- Parameters:
table_name – Name of the table
- Returns:
True if table exists
- classmethod to_dataframe(db_path: str, table_name: str, where: str | None = None) Any | None[source]
Read table directly into pandas DataFrame.
- Parameters:
db_path – Path to SQLite database
table_name – Name of the table
where – Optional WHERE clause
- Returns:
pandas DataFrame or None if pandas not available