Advanced features: a deeper dive
This section details the architecture and workflow of the key operational features of the codebase, focusing on the graphical user interface (GUI), logging, session management, and the profiling/timing system.
1. High-level workflow
The application runs scientific studies, which can be time-consuming. To provide user feedback and manage complexity, the system employs a multi-process architecture.
- Main Process: A lightweight PySide6 GUI (ProgressGUI) is launched. This GUI is responsible for displaying progress, logs, and timing information.
- Study Process: The actual study (NearFieldStudy or FarFieldStudy) is executed in a separate process using Python's multiprocessing module. This prevents the GUI from freezing during intensive calculations.
- Communication: The study process communicates with the GUI process through a multiprocessing.Queue. It sends messages containing status updates, progress information, and timing data.
The entry point for the study process is the study_process_wrapper function, which sets up a special QueueGUI object. This object mimics the real GUI's interface but directs all its output to the shared queue.
graph TD
A[Main Process: ProgressGUI] -- Spawns --> B[Study Process: study_process_wrapper];
B -- Instantiates --> Study[NearFieldStudy/FarFieldStudy];
Study -- Uses --> QueueGUI[QueueGUI object];
QueueGUI -- Puts messages --> C{multiprocessing.Queue};
C -- Polled by QTimer every 100ms --> E[QueueHandler];
E -- Updates GUI --> A;
E -- Forwards messages --> F[WebGUIBridge];
F -- HTTP POST throttled --> G[Monitoring Dashboard API];
A -- Updates UI --> D[User];
G -- Displays status --> H[Web Dashboard];
style C fill:#FFE082
style E fill:#C5E1A5
style F fill:#BBDEFB
style G fill:#E1BEE7
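The queue hand-off shown in the diagram can be sketched in a few lines. This is a minimal illustration rather than GOLIAT's code: the QueueGUI method names (log, update_overall_progress), the message dictionary layout, and drain_queue are assumptions. It uses queue.Queue so that it runs in a single process; multiprocessing.Queue offers the same put and get_nowait calls.

```python
import queue


class QueueGUI:
    """Stand-in for the GUI inside the study process: every call becomes a queue message."""

    def __init__(self, q):
        self.q = q

    def log(self, text):
        # Hypothetical message layout: a dict with a "type" key.
        self.q.put({"type": "status", "message": text})

    def update_overall_progress(self, done, total):
        self.q.put({"type": "overall_progress", "current": done, "total": total})


def drain_queue(q):
    """What a timer callback could do on each tick: empty the queue without blocking."""
    messages = []
    while True:
        try:
            messages.append(q.get_nowait())
        except queue.Empty:
            return messages


q = queue.Queue()
gui = QueueGUI(q)
gui.log("Starting simulation 5 of 108")
gui.update_overall_progress(5, 108)
received = drain_queue(q)
```

Because the study code only ever talks to an object with the GUI's interface, it does not need to know whether it is running next to a real window or behind a queue.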
2. GUI and profiling system
The user interface, progress estimation, and timing systems are tightly integrated to provide a responsive and informative experience. The GUI architecture is modular, with specialized components handling different aspects of the interface. The core components are the ProgressGUI and the Profiler.
The ProgressGUI
The GUI runs in the main process. It uses a QTimer to poll a multiprocessing.Queue for messages sent from the study process. This design keeps the UI responsive. The GUI is responsible for two primary progress indicators:
- Overall Progress: Tracks the progress of the entire study (e.g., 5 out of 108 simulations complete).
- Stage Progress: Tracks the progress of the current major phase (setup, run, or extract) for the current simulation.
The GUI is built using modular components located in goliat/gui/components/:
- StatusManager: Manages status messages and log display with color-coding.
- DataManager: Handles CSV data files for progress tracking, time series visualization, and system utilization data.
- TimingsTable: Displays execution statistics in a table format.
- PieChartsManager: Generates pie charts showing time breakdown by phase and subtask.
- ProgressAnimation: Manages smooth animations for progress bars during long-running phases.
- TrayManager: Provides system tray integration for background operation.
- QueueHandler: Processes messages from the study process queue and forwards them to the web bridge.
- WebBridgeManager: Manages connection to the web monitoring dashboard, forwards GUI messages, and handles screenshot capture.
- UtilizationManager: Updates CPU, RAM, and GPU utilization displays from system monitoring data.
- SystemMonitor: Provides system resource monitoring (CPU, RAM, GPU) via psutil and nvidia-smi.
- ScreenshotCapture: Captures GUI tab screenshots for remote monitoring via web dashboard.
- UIBuilder: Constructs the window layout and manages UI components.
Plot components
Plotting functionality is organized into separate classes in goliat/gui/components/plots/:
- TimeRemainingPlot: Displays time remaining estimates over the course of the study.
- OverallProgressPlot: Shows overall study progress percentage over time.
- SystemUtilizationPlot: Time-series plots for CPU, RAM, GPU utilization, and GPU VRAM.
- PieChartsManager: Pie charts showing time breakdown by phase and subtask.
Each plot class manages its own matplotlib figure and canvas, updating independently based on data from the DataManager. Common utilities (like timezone conversion) are centralized in plots/utils.py.
The Profiler
The Profiler class is the engine for all timing and estimation.
- Session-Based Timing: The profiler maintains a session-specific timing configuration file in the data/ folder (e.g., profiling_config_31-10_14-15-30_a1b2c3d4.json). The filename includes a timestamp (DD-MM_HH-MM-SS) followed by a unique hash. This file stores the average time taken for each major phase (avg_setup_time, avg_run_time, etc.) and for granular subtasks. Because each study run tracks its own timing data, sessions stay cleanly separated and concurrent runs do not conflict.
- ETA Calculation: The get_time_remaining method provides the core ETA logic. It calculates the total estimated time for all simulations based on the current session's timing averages and subtracts the time that has already elapsed. This elapsed time is a combination of the total time for already completed simulations and the real-time duration of the current, in-progress simulation.
- Weighted Progress: The Profiler calculates the progress within a single simulation by using phase weights. These weights are derived from the average time of each phase in the current session, normalized to sum to 1. This makes a longer phase, like run, contribute more to the intra-simulation progress than a shorter one, like extract.
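The weighting and ETA arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not the Profiler's implementation: the function names, their parameters, and the sample averages are hypothetical.

```python
def phase_weights(avg_times):
    """Normalize average phase durations so the weights sum to 1."""
    total = sum(avg_times.values())
    return {phase: t / total for phase, t in avg_times.items()}


def simulation_progress(avg_times, completed_phases, current_phase=None, phase_fraction=0.0):
    """Weighted progress (0..1) within one simulation."""
    weights = phase_weights(avg_times)
    progress = sum(weights[p] for p in completed_phases)
    if current_phase is not None:
        progress += weights[current_phase] * phase_fraction
    return progress


def time_remaining(avg_times, total_sims, completed_elapsed, current_sim_elapsed):
    """Estimated total time minus the time already spent, floored at zero."""
    estimated_total = total_sims * sum(avg_times.values())
    return max(0.0, estimated_total - (completed_elapsed + current_sim_elapsed))


avg = {"setup": 60.0, "run": 300.0, "extract": 40.0}  # illustrative averages in seconds
weights = phase_weights(avg)
progress = simulation_progress(avg, ["setup"], "run", 0.5)
eta = time_remaining(avg, total_sims=10, completed_elapsed=1650.0, current_sim_elapsed=100.0)
```

With these sample numbers the run phase carries a weight of 0.75, so being halfway through run after a finished setup gives 0.15 + 0.375 = 0.525, and the ETA is 4000 - 1750 = 2250 seconds.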
The animation system
For long-running phases where the underlying process provides no feedback (like iSolve.exe), the GUI employs a smooth animation for the Stage Progress bar.
How it works:
- Initiation: When a major phase (e.g., setup) begins, the profile context manager in the study process retrieves the estimated duration for that entire phase from the Profiler (e.g., avg_setup_time). It sends a start_animation message to the GUI with this duration.
- Animation Execution: The ProgressGUI receives the message. It resets the stage progress bar to 0% and starts a QTimer that fires every 50ms.
- Frame-by-Frame Update: With each tick of the timer, the update_animation method calculates the percentage of the estimated duration that has elapsed and updates the stage progress bar to that value. This creates a smooth animation from 0% to 100% over the expected duration of the phase.
- Synchronization: The update_animation method is also responsible for updating the Overall Progress bar. On each tick, it asks the Profiler for the current weighted progress of the entire study and updates the overall bar accordingly. This keeps both bars synchronized.
- Termination: When the actual phase completes in the study process, an end_animation message is sent. The GUI stops the timer and sets the stage progress bar to its final value of 100%, correcting for any deviation between the estimate and the actual time taken.
This system presents the user with a constantly updating and reasonably accurate view of the system's progress, even without direct feedback from the core simulation.
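The per-tick calculation can be written as a single function. The sketch below is an assumption about how such an update might look: the name animation_percent, the clamp at 100% when a phase overruns its estimate, and the handling of a missing estimate are not taken from the codebase.

```python
def animation_percent(elapsed_s, estimated_duration_s):
    """Stage-bar value (0-100) for a phase that has run for elapsed_s seconds."""
    if estimated_duration_s <= 0:
        return 100.0  # assumption: with no usable estimate, show the bar as full
    return min(100.0, 100.0 * elapsed_s / estimated_duration_s)


# Simulate 50 ms timer ticks over a phase estimated at 2 seconds, running 1 second over.
ticks = [animation_percent(i * 0.05, 2.0) for i in range(61)]
```

The bar reaches 50% after one second and then holds at 100% once the estimate is exceeded, until the end_animation message arrives.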
3. Logging (logging_manager.py)
The system uses Python's standard logging module, configured to provide two distinct streams of information.
Loggers:
- progress logger: For high-level, user-facing messages. These are shown in the GUI and saved to *.progress.log.
- verbose logger: For detailed, internal messages. These are saved to the main *.log file.
Implementation details:
- Log Rotation: The setup_loggers function checks the number of log files in the logs directory. If it exceeds a limit (15 pairs), it deletes the oldest pair (.log and .progress.log) to prevent the directory from growing indefinitely.
- Data File Cleanup: Similarly, the system automatically manages CSV and JSON files in the data/ directory (progress tracking and profiling files). When more than 50 such files exist, the oldest files are automatically deleted to prevent excessive disk usage. These files follow the naming patterns time_remaining_DD-MM_HH-MM-SS_hash.csv, overall_progress_DD-MM_HH-MM-SS_hash.csv, and profiling_config_DD-MM_HH-MM-SS_hash.json, where the timestamp allows easy identification of when each session was run.
- Handler Configuration: The function creates file handlers and stream (console) handlers for each logger, routing messages to the right places. propagate = False is used to prevent messages from being handled by parent loggers, avoiding duplicate output.
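The rotation and handler setup can be sketched with the standard library alone. The helper names (prune_old_log_pairs, make_logger), the file names, and the choice to prune down to the limit in one pass are assumptions made for illustration; only the 15-pair limit, the two file suffixes, and propagate = False come from the description above.

```python
import logging
import os
import tempfile
from pathlib import Path


def prune_old_log_pairs(log_dir, max_pairs=15):
    """Delete the oldest .log/.progress.log pairs until at most max_pairs remain."""
    main_logs = sorted(
        (p for p in Path(log_dir).glob("*.log") if not p.name.endswith(".progress.log")),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    for old in main_logs[: max(0, len(main_logs) - max_pairs)]:
        old.with_name(old.stem + ".progress.log").unlink(missing_ok=True)
        old.unlink()


def make_logger(name, log_file):
    """Create a logger with a file handler and a console handler, isolated from its parents."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # parent loggers never see these records
    logger.addHandler(logging.FileHandler(log_file))
    logger.addHandler(logging.StreamHandler())
    return logger


with tempfile.TemporaryDirectory() as tmp:
    for i in range(17):
        for suffix in (".log", ".progress.log"):
            f = Path(tmp, f"session_{i:02d}{suffix}")
            f.touch()
            os.utime(f, (i, i))  # fake increasing modification times
    prune_old_log_pairs(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

After pruning, the two oldest sessions are gone and 15 pairs (30 files) remain.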
4. Configuration (config.py)
The Config class uses an inheritance mechanism to avoid duplicating settings.
- Inheritance: A config can "extend" a base config. The _load_config_with_inheritance method recursively loads the base config and merges it with the child config. The child's values override the parent's. For example, near_field_config.json might only specify the settings that differ from the main base_config.json.
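A recursive loader of this kind might look like the sketch below. The "extends" key name, the deep-merge behavior for nested dictionaries, and the sample settings are assumptions; the text only states that the base is loaded recursively and that child values win.

```python
import json
import tempfile
from pathlib import Path


def deep_merge(base, child):
    """Return base updated with child; nested dicts merge, other values are replaced."""
    merged = dict(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config_with_inheritance(path):
    """Load a JSON config, first resolving its (assumed) "extends" key recursively."""
    path = Path(path)
    config = json.loads(path.read_text())
    parent_name = config.pop("extends", None)
    if parent_name is None:
        return config
    parent = load_config_with_inheritance(path.parent / parent_name)
    return deep_merge(parent, config)


with tempfile.TemporaryDirectory() as tmp:
    # Illustrative settings only; these keys are not GOLIAT's real schema.
    Path(tmp, "base_config.json").write_text(
        json.dumps({"solver": {"kernel": "cpu", "timeout": 60}, "verbose": False})
    )
    Path(tmp, "near_field_config.json").write_text(
        json.dumps({"extends": "base_config.json", "solver": {"kernel": "gpu"}})
    )
    resolved = load_config_with_inheritance(Path(tmp, "near_field_config.json"))
```

The child file overrides a single nested value while every other setting is inherited from the base.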
5. Project management¶
- project_manager.py: This class is critical for reliability. The underlying .smash project files can become corrupted or locked. The _is_valid_smash_file method is a key defensive measure. It first attempts to rename the file to itself (a trick to check for file locks on Windows) and then uses h5py to ensure the file is a valid HDF5 container before attempting to open it in the simulation software. This prevents the application from crashing on a corrupted file.
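The two-step check can be illustrated without extra dependencies. Note the substitution: instead of opening the file with h5py, this sketch only compares the first eight bytes with the HDF5 format signature, which is a weaker test than a full h5py open. The function name and file names are hypothetical.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # first 8 bytes of an HDF5 file without a user block


def is_valid_smash_file(path):
    """Return True if the file exists, can be renamed, and starts with the HDF5 signature."""
    if not os.path.isfile(path):
        return False
    try:
        os.rename(path, path)  # the lock probe described above (rename onto itself)
        with open(path, "rb") as handle:
            return handle.read(8) == HDF5_SIGNATURE
    except OSError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.smash")
    bad = os.path.join(tmp, "bad.smash")
    with open(good, "wb") as f:
        f.write(HDF5_SIGNATURE + b"\x00" * 16)
    with open(bad, "wb") as f:
        f.write(b"not an HDF5 file")
    results = (
        is_valid_smash_file(good),
        is_valid_smash_file(bad),
        is_valid_smash_file(os.path.join(tmp, "missing.smash")),
    )
```

Returning False instead of raising lets the caller treat a locked, missing, or corrupt file the same way: the setup phase has to be redone.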
6. Phantom rotation for by_cheek placement¶
A specialized feature for the by_cheek placement scenario is the ability to rotate the phantom to meet the phone, rather than the other way around. This is controlled by a specific dictionary format in the configuration and uses an automatic angle detection algorithm to ensure precise placement.
Configuration
To enable this feature, the orientation in placement_scenarios is defined as a dictionary with the following keys:
- rotate_phantom_to_cheek: A boolean that enables or disables the phantom rotation.
- angle_offset_deg: An integer that specifies an additional rotation away from the cheek (0 being the default).
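As a rough illustration, the dictionary and a reader for it could look like this. Only the two key names and the default offset of 0 come from the text; the helper read_rotation_settings, the sample values, and the assumption that a missing boolean means "disabled" are hypothetical, and the surrounding placement_scenarios structure is deliberately not shown.

```python
import json

# Only the two keys below come from the text; the values are illustrative.
orientation = {
    "rotate_phantom_to_cheek": True,
    "angle_offset_deg": 2,
}


def read_rotation_settings(orientation):
    """Return (enabled, offset_deg), applying the documented default offset of 0."""
    enabled = bool(orientation.get("rotate_phantom_to_cheek", False))
    offset = int(orientation.get("angle_offset_deg", 0))
    return enabled, offset


settings = read_rotation_settings(orientation)
as_json = json.dumps(orientation, indent=2)  # the same dictionary as it would appear in a JSON config
```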
Automatic angle detection
The system uses a binary search algorithm to find the exact angle at which the phantom's "Skin" entity touches the phone's ground plane. This is handled by the _find_touching_angle method in goliat/setups/near_field_setup.py. The search is performed between 0 and 30 degrees with a precision of 0.5 degrees.
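The search itself can be sketched independently of the geometry engine. In this illustration the contact test is passed in as a callable named touches, which is a hypothetical stand-in for the real Skin-versus-ground-plane query, and the sketch assumes contact is monotonic in the angle (no contact below some threshold, contact above it).

```python
def find_touching_angle(touches, low=0.0, high=30.0, precision=0.5):
    """Binary-search the smallest angle in [low, high] at which touches(angle) is True.

    Assumes touches is monotonic: False below some threshold angle and True above it.
    Returns None if there is no contact even at the upper bound.
    """
    if not touches(high):
        return None
    if touches(low):
        return low
    while high - low > precision:
        mid = (low + high) / 2.0
        if touches(mid):
            high = mid  # contact: the threshold is at or below mid
        else:
            low = mid  # no contact yet: the threshold is above mid
    return high


# Stand-in for the geometry query: contact starts at 12.3 degrees.
angle = find_touching_angle(lambda a: a >= 12.3)
```

Halving a 30-degree interval down to 0.5 degrees takes six contact tests inside the loop, plus the two boundary checks, which matters when each test is an expensive geometry operation.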
Workflow integration
The phantom rotation is handled in the NearFieldSetup.run_full_setup method, occurring after the antenna is placed but before the final scene alignment. This keeps the phone positioned correctly relative to the un-rotated phantom, after which the phantom is rotated into the final position. When phantom rotation is enabled, the rotation instruction is removed from the antenna's orientation list to prevent the antenna from being rotated along with the phantom.
6.5. Scene alignment for by_cheek placements¶
For by_cheek placements, GOLIAT automatically aligns the entire simulation scene with the phone's upright orientation. This optimization aligns the computational grid with the phone's orientation, which can reduce simulation time.
How it works
The alignment process occurs after antenna placement and phantom rotation (if enabled). It identifies reference entities on the phone that define its orientation:
- For PIFA antennas: Uses component1:Substrate and component1:Battery as reference points.
- For IFA antennas: Uses Ground and Battery as reference points.
The system calculates a transformation matrix that makes the phone upright and applies this transformation to all scene entities:
- Phantom group
- Antenna group
- Simulation bounding box
- Antenna bounding box
- Head and trunk bounding boxes
- Point sensors
Only parent groups and bounding boxes are transformed, not individual tissue entities, to avoid double transformation. This keeps the entire scene's relative geometry correct while optimizing grid alignment.
Configuration
No configuration is required. The alignment is automatically applied for by_cheek placements.
7. The Verify and Resume caching system¶
GOLIAT integrates a Verify and Resume feature that caches simulation results to prevent redundant computations. Before each simulation, the system checks whether a run with an identical configuration has already completed successfully and, if so, skips the phases that are already done.
Verification workflow
The verification logic is multi-tiered, prioritizing the integrity of the final result files ("deliverables") over simple metadata flags. This maintains robustness against interrupted runs or manual file deletions.
- Configuration hashing: Before verification, a "surgical" configuration is created. This is a snapshot containing only the parameters relevant to a single, specific simulation run (e.g., one phantom, one frequency, one placement). This configuration is then serialized and hashed (SHA256), producing a unique fingerprint that represents the exact setup.
- Metadata and deliverable validation: The core logic resides in ProjectManager.create_or_open_project, which is called at the start of each simulation. It performs a sequence of checks:
  - Hash comparison: The hash of the current surgical configuration is compared against the config_hash stored in the config.json metadata file within the simulation's results directory. A mismatch signifies that the configuration has changed, rendering the cached results invalid and triggering a full re-run.
  - .smash file integrity: If the hashes match, the system validates the .smash project file itself. This is a critical step for stability, as these files can become locked or corrupted. The validation involves checking for .s4l_lock files and verifying the HDF5 structure with h5py. A missing or corrupt .smash file indicates that the setup phase is incomplete.
  - Deliverable verification: This is the definitive check. The system looks for the actual output files generated by the run and extract phases. It verifies their existence and that their modification timestamps are newer than the setup_timestamp recorded in the metadata.
    - Run phase deliverables: A valid *_Output.h5 file.
    - Extract phase deliverables: sar_results.json, sar_stats_all_tissues.pkl, and sar_stats_all_tissues.html.
- Status reporting and phase skipping: The verification process returns a detailed status dictionary, such as {'setup_done': True, 'run_done': True, 'extract_done': False}. The study orchestrator (NearFieldStudy or FarFieldStudy) uses this status to dynamically skip phases that are already complete. For instance, if run_done is True, the do_run flag for that specific simulation is internally set to False, and the run phase is skipped.
- Metadata update: Upon the successful completion of the run and extract phases, the BaseStudy._verify_and_update_metadata method is triggered. It re-confirms that the deliverables exist on the file system and then updates the run_done or extract_done flags in the config.json file to true. This keeps the metadata accurately reflecting the state of the deliverables for future runs.
This deliverable-first approach is a key design choice: even if the metadata file claims a phase is complete, the absence of the actual result files forces the system to re-run the necessary steps.
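The hash comparison and deliverable checks can be sketched as follows. This is a simplified illustration, not ProjectManager's code: the function names, the JSON serialization with sorted keys, the epoch-seconds format of setup_timestamp, and the sample configuration are assumptions, and the .smash integrity step is left out.

```python
import hashlib
import json
import os
import tempfile


def config_hash(surgical_config):
    """SHA256 of a canonical JSON serialization (sorted keys keep the hash stable)."""
    payload = json.dumps(surgical_config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def newer_than(path, timestamp):
    """True if path exists and was modified after timestamp (seconds since the epoch)."""
    return os.path.isfile(path) and os.path.getmtime(path) > timestamp


def verify(results_dir, surgical_config, extract_files):
    """Return a status dict based on the metadata hash and the files actually on disk."""
    status = {"setup_done": False, "run_done": False, "extract_done": False}
    meta_path = os.path.join(results_dir, "config.json")
    if not os.path.isfile(meta_path):
        return status
    with open(meta_path) as handle:
        meta = json.load(handle)
    if meta.get("config_hash") != config_hash(surgical_config):
        return status  # configuration changed: cached results are invalid
    status["setup_done"] = True  # the .smash integrity check is omitted in this sketch
    stamp = meta["setup_timestamp"]
    status["run_done"] = any(
        name.endswith("_Output.h5") and newer_than(os.path.join(results_dir, name), stamp)
        for name in os.listdir(results_dir)
    )
    status["extract_done"] = all(
        newer_than(os.path.join(results_dir, name), stamp) for name in extract_files
    )
    return status


EXTRACT_FILES = ["sar_results.json", "sar_stats_all_tissues.pkl", "sar_stats_all_tissues.html"]
cfg = {"phantom": "example_phantom", "frequency_mhz": 700, "placement": "by_cheek"}

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "config.json"), "w") as handle:
        json.dump({"config_hash": config_hash(cfg), "setup_timestamp": 1000.0}, handle)
    open(os.path.join(tmp, "sim_Output.h5"), "w").close()  # run deliverable only
    status_same = verify(tmp, cfg, EXTRACT_FILES)
    status_changed = verify(tmp, {**cfg, "frequency_mhz": 835}, EXTRACT_FILES)
```

With only the run deliverable present, the unchanged configuration reports setup and run as done and extract as pending, while any change to the configuration invalidates all three phases.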
Overriding the cache
The entire caching and verification mechanism can be bypassed using the --no-cache command-line flag.
When this flag is active, GOLIAT will ignore any existing project files or metadata. It skips the verification process, deletes any existing .smash file for the target simulation, and executes all phases (setup, run, extract) from a clean state. This is useful for debugging configuration issues, validating code changes, or when a fresh run is explicitly required.
The --no-cache flag can also be used when you need to ensure that cached results from a previous configuration are not reused, even if the current configuration appears identical.
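How such a flag might feed into phase selection is sketched below. Only the flag name --no-cache and the three phase names come from the text; the parser, the phases_to_run helper, and the status dictionary handling are hypothetical.

```python
import argparse


def build_parser():
    """Illustrative command-line parser exposing only the --no-cache flag."""
    parser = argparse.ArgumentParser(description="Run a study (illustrative parser)")
    parser.add_argument(
        "--no-cache",
        action="store_true",
        help="ignore existing project files and metadata; run all phases from scratch",
    )
    return parser


def phases_to_run(no_cache, status):
    """With --no-cache every phase runs; otherwise skip phases already marked done."""
    if no_cache:
        return ["setup", "run", "extract"]
    return [p for p in ("setup", "run", "extract") if not status.get(f"{p}_done", False)]


cached_status = {"setup_done": True, "run_done": True, "extract_done": False}
args = build_parser().parse_args(["--no-cache"])
forced = phases_to_run(args.no_cache, cached_status)
cached = phases_to_run(False, cached_status)
```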
8. Web monitoring dashboard integration¶
GOLIAT supports remote monitoring through a web dashboard. Monitor multiple worker machines from one interface, track progress across distributed studies, and view real-time logs and system information.
Architecture overview
The web monitoring system uses a bridge pattern to forward GUI messages to a remote dashboard API without interfering with local GUI operation. The architecture consists of four components:
- QueueHandler: Processes messages from the study process queue. After updating the local GUI, forwards a copy of each message to the web bridge (if enabled).
- WebBridgeManager: Manages the web bridge connection lifecycle. Initializes the bridge, collects system information (GPU, CPU, RAM, hostname), and handles connection status updates.
- WebGUIBridge: Core bridge component that forwards messages to the dashboard API. Uses an internal queue to decouple from the multiprocessing queue and implements message throttling to prevent API overload.
- HTTPClient: Handles HTTP requests to the dashboard API endpoints (/api/gui-update and /api/heartbeat).
Message flow
Messages flow from the study process to the web dashboard through this path:
sequenceDiagram
participant Study as Study Process
participant Queue as multiprocessing.Queue
participant Handler as QueueHandler
participant GUI as ProgressGUI
participant Bridge as WebGUIBridge
participant API as Dashboard API
participant Dashboard as Web Dashboard
Study->>Queue: Put message (status, progress, etc.)
Queue->>Handler: Poll (every 100ms)
Handler->>GUI: Update local UI
Handler->>Bridge: Enqueue message copy
Bridge->>Bridge: Throttle & batch
Bridge->>API: HTTP POST /api/gui-update
API->>Dashboard: Update worker state
Bridge->>Bridge: Send heartbeat (every 30s)
Bridge->>API: HTTP POST /api/heartbeat
Message types and handling
The QueueHandler processes several message types, each forwarded to the web bridge (with appropriate sanitization):
- status: Log messages with color coding. Batched together (up to 20 messages per batch, sent every 300ms) for efficiency.
- overall_progress: Overall study progress (e.g., 5 out of 108 simulations). Sent immediately with throttling (up to 50 Hz).
- stage_progress: Progress within the current phase (setup/run/extract). Sent immediately with throttling.
- profiler_update: ETA and timing information. The profiler object is sanitized to extract only serializable data (e.g., eta_seconds).
- finished: Study completion notification.
- fatal_error: Fatal error notification.
Throttling and batching
The WebGUIBridge throttles messages to prevent API overload:
- Progress updates (overall_progress, stage_progress, profiler_update): Sent immediately but throttled to 50 Hz (20ms minimum interval).
- Log messages (status): Batched together and sent every 300ms, or immediately if the batch reaches 20 messages.
- Heartbeats: Sent every 30 seconds to maintain worker registration and update connection status.
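The two policies can be sketched as a small helper with the clock passed in explicitly. This is an assumption-laden illustration, not WebGUIBridge's code: the class and method names are hypothetical, throttling is tracked per message type, and what happens to a throttled progress message (here the caller would simply skip it) is not specified by the text.

```python
class Throttler:
    """Decide when progress messages and log batches may be sent (times in seconds)."""

    def __init__(self, min_interval=0.02, batch_interval=0.3, batch_size=20):
        self.min_interval = min_interval
        self.batch_interval = batch_interval
        self.batch_size = batch_size
        self.last_sent = {}  # message type -> time of last send
        self.batch = []  # pending log messages
        self.batch_started = None

    def allow_progress(self, msg_type, now):
        """True if at least min_interval has passed since the last message of this type."""
        last = self.last_sent.get(msg_type)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_sent[msg_type] = now
        return True

    def add_log(self, message, now):
        """Queue a log message; return a full batch to send immediately, else None."""
        if not self.batch:
            self.batch_started = now
        self.batch.append(message)
        if len(self.batch) >= self.batch_size:
            return self.flush()
        return None

    def poll(self, now):
        """Return the pending batch once batch_interval has elapsed, else None."""
        if self.batch and now - self.batch_started >= self.batch_interval:
            return self.flush()
        return None

    def flush(self):
        """Hand over the pending batch and start a new empty one."""
        batch, self.batch = self.batch, []
        return batch


t = Throttler()
sent = [t.allow_progress("stage_progress", now) for now in (0.0, 0.01, 0.02, 0.03)]
for i in range(19):
    t.add_log(f"line {i}", now=0.0)
early = t.poll(now=0.1)
timed_batch = t.poll(now=0.3)
full_batch = [t.add_log(f"line {i}", now=1.0) for i in range(20)][-1]
```

Passing the time in as an argument keeps the logic deterministic and easy to test; a real bridge would supply a monotonic clock.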
Connection management
The web bridge maintains connection state and provides feedback to the GUI:
- Connection callback: The bridge calls ProgressGUI._update_web_status whenever the connection status changes. This updates a visual indicator (green dot for connected, red dot for disconnected) in the GUI.
- Graceful degradation: If the dashboard is unavailable or the requests library is not installed, the GUI continues to function normally. Messages are silently dropped (not queued) to prevent memory buildup.
- System information: On initialization, the bridge collects and sends system information (GPU model, CPU cores, RAM capacity, hostname) with the initial heartbeat. This information is displayed on the web dashboard.
Worker identification
Workers are identified by their public IP address (or local IP if no public IP is available). The dashboard handles IP changes (e.g., VPN reconnections) by matching workers by hostname and transferring running assignments to the new worker session.
API endpoints
The web bridge sends state updates to two API endpoints (screenshots use a third endpoint, /api/gui-screenshots, described in section 10):
- POST /api/gui-update: Sends GUI state updates (progress, logs, status). Payload includes machineId, message, and timestamp.
- POST /api/heartbeat: Registers or updates worker status and sends system information. Called automatically every 30 seconds.
Initialization
The web bridge initializes automatically when:
- The requests library is installed (pip install requests).
- A machine ID can be detected (public IP or local IP).
- The dashboard URL is accessible (default: https://goliat.waves-ugent.be).
No configuration is required. The GUI shows a connection status indicator to inform users whether web monitoring is active.
Error handling
The web bridge handles errors gracefully:
- Network errors: Connection timeouts and errors are logged but do not affect GUI operation.
- Message serialization: Non-serializable objects (like the Profiler instance) are sanitized before sending.
- Thread safety: HTTP requests are executed in a thread pool to avoid blocking the GUI thread.
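The sanitization step can be illustrated with a small function. The names sanitize_message and FakeProfiler, the "profiler" message key, and the repr fallback for other non-serializable values are assumptions; only get_time_remaining and eta_seconds appear in this document.

```python
import json


class FakeProfiler:
    """Stand-in for the real Profiler: holds state that JSON cannot serialize."""

    def __init__(self):
        self.lock = object()

    def get_time_remaining(self):
        return 1234.5


def sanitize_message(message):
    """Return a JSON-safe copy of a queue message."""
    clean = {}
    for key, value in message.items():
        if key == "profiler":
            # Keep only the serializable ETA instead of the whole object.
            clean["eta_seconds"] = value.get_time_remaining()
            continue
        try:
            json.dumps(value)
            clean[key] = value
        except (TypeError, ValueError):
            clean[key] = repr(value)  # fall back to a readable string
    return clean


raw = {"type": "profiler_update", "profiler": FakeProfiler()}
safe = sanitize_message(raw)
encoded = json.dumps(safe)
```

Sanitizing a copy leaves the original message untouched, so the local GUI can still use the full Profiler object.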
For more information about using the monitoring dashboard, see the Monitoring Dashboard documentation.
9. System utilization monitoring¶
The GUI includes real-time system resource monitoring to track CPU, RAM, GPU utilization, and GPU VRAM usage during simulations. This helps identify bottlenecks and optimize performance.
Architecture
System monitoring uses two components:
- SystemMonitor: Provides low-level system resource queries using psutil for CPU/RAM and nvidia-smi for GPU metrics. Handles missing dependencies gracefully (returns 0.0 or None if unavailable).
- UtilizationManager: Updates GUI progress bars and labels with current utilization values. Called every second by a Qt timer.
Metrics tracked
- CPU utilization: Percentage (0-100) using non-blocking psutil.cpu_percent() calls
- RAM utilization: Used and total GB, plus percentage with/without cacheable memory
- GPU utilization: Percentage (0-100) via nvidia-smi queries
- GPU VRAM: Used and total GB, plus percentage utilization
Data collection and export
Utilization data is written to CSV files (system_utilization_DD-MM_HH-MM-SS_hash.csv) for analysis and plotting. The GUI includes a dedicated "System Utilization" tab with time-series plots showing all metrics over the simulation duration.
Update frequency
- Progress bars: Updated every 1 second via UtilizationManager.update()
- CSV data: Written every 2 seconds (via GraphManager timer)
- Plot updates: Refreshed every 5 seconds
The monitoring system gracefully handles missing GPU drivers or unavailable hardware, continuing to track CPU and RAM even when GPU data isn't available.
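A monitor with this kind of fallback behavior might be structured as in the sketch below. The function names and the returned dictionary keys are hypothetical, and only the first GPU is queried. The psutil.cpu_percent(interval=None) call and the nvidia-smi --query-gpu and --format options are existing interfaces; how SystemMonitor actually invokes them is not shown in this document.

```python
import shutil
import subprocess

try:
    import psutil  # optional dependency
except ImportError:
    psutil = None


def cpu_percent():
    """CPU utilization since the previous call (non-blocking); 0.0 if psutil is missing."""
    return psutil.cpu_percent(interval=None) if psutil else 0.0


def parse_gpu_line(line):
    """Parse 'util, used_mib, total_mib' as printed by nvidia-smi in CSV mode."""
    util, used, total = (float(field) for field in line.split(","))
    # nvidia-smi reports memory in MiB; divide by 1024 for the GB-style figures shown in a GUI.
    return {"gpu_percent": util, "vram_used_gb": used / 1024, "vram_total_gb": total / 1024}


def gpu_stats():
    """Query the first GPU via nvidia-smi; None if the tool or the GPU is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=5, check=True,
        ).stdout
        return parse_gpu_line(out.strip().splitlines()[0])
    except (subprocess.SubprocessError, OSError, ValueError, IndexError):
        return None


sample = parse_gpu_line("45, 2048, 8192")
```

Keeping the parser separate from the subprocess call makes the GPU path testable on machines without a GPU.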
10. GUI screenshot streaming¶
For remote monitoring scenarios (cloud deployments, distributed workers), GOLIAT streams GUI screenshots to the web dashboard, enabling visual monitoring of simulation progress without direct access to the worker machine.
Architecture
Screenshot capture is handled by two components:
- ScreenshotCapture: Captures every GUI tab except the Progress tab as a JPEG image using Qt's render() method. The Progress tab is excluded because its data is sent separately via the web bridge.
- WebBridgeManager: Manages the screenshot capture timer (1 FPS) and forwards screenshots to the web bridge.
Capture process
- Timer initialization: A Qt timer fires every 1 second (1 FPS) to capture screenshots
- Tab rendering: Each visible tab is rendered to a QPixmap using render() without switching tabs (avoids GUI jumping)
- Compression: Screenshots are compressed to JPEG format (95% quality) to reduce bandwidth
- Asynchronous upload: Screenshots are enqueued to the web bridge and sent via HTTP POST to /api/gui-screenshots
Screenshot format
Screenshots are sent as multipart/form-data with:
- Each tab as a separate file field (tab name sanitized for form field names)
- machineId included as form data
- JPEG format with 95% quality as a balance between quality and file size
Error handling
Screenshot capture failures don't affect GUI operation. Errors are logged but don't interrupt the simulation or GUI updates. If screenshot capture isn't available (missing dependencies, initialization failures), the GUI continues normally without screenshots.
Dashboard integration
The web dashboard displays screenshots for each worker, allowing remote monitoring of:
- Progress tab (via data, not screenshot)
- System Utilization tab
- Timings tab
- Logs tab
- Plots tab
- Settings tab
Screenshots are stored on the dashboard server and served via API endpoints (/api/gui-screenshots/[workerId]/[tabName]).
For a complete reference of all features mentioned here and more, see the Full List of Features.