
Commit b8fbfc2

6640 fixes data analyzer device fallback (#6658)
Fixes #6640

### Description

The issue: when `device_count == 0`, the fallback to CPU did not work.

```
[2023-06-26T12:57:56.270Z] 2023-06-26 12:57:56,044 - INFO - test_fl compute data statistics on train...
[2023-06-26T12:57:56.270Z] 2023-06-26 12:57:56,044 - INFO - Found 0 GPUs for data analyzing!
[2023-06-26T12:57:56.270Z] Finished test: test_shape_5 (tests.test_varautoencoder.TestVarAutoEncoder) (0.007s)
[2023-06-26T12:57:56.270Z] Starting test: test_get_data_stats_0 (tests.test_fl_monai_algo_stats.TestFLMonaiAlgo)...
[2023-06-26T12:57:56.526Z]   0%|          | 0/2 [00:00<?, ?it/s]2023-06-26 12:57:56,326 - INFO - Unable to process data /home/jenkins/agent/workspace/Monai-pytorch-versions/tests/testing_data/anatomical.nii on cuda:0. No CUDA GPUs are available
[2023-06-26T12:57:56.526Z] 2023-06-26 12:57:56,326 - INFO - DataAnalyzer `device` set to GPU execution hit an exception. Falling back to `cpu`.
[2023-06-26T13:19:27.823Z]  50%|█████     | 1/2 [00:00<00:00,  3.17it/s]Sending interrupt signal to process
[2023-06-26T13:19:27.824Z] Killing processes
```

### Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.

---------

Signed-off-by: Wenqi Li <[email protected]>
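The fix adds an up-front availability check instead of relying on the exception path. A torch-free sketch of the corrected condition (`resolve_device` is a hypothetical helper for illustration, not part of the MONAI API):

```python
def resolve_device(requested: str, cuda_available: bool, device_count: int) -> str:
    """Fall back to "cpu" when CUDA is requested but unusable.

    Mirrors the fixed check: CUDA must both be available *and* expose at
    least one device (device_count == 0 was the case the old code missed).
    """
    if requested.startswith("cuda") and not (cuda_available and device_count > 0):
        return "cpu"
    return requested
```

For example, `resolve_device("cuda:0", cuda_available=True, device_count=0)` returns `"cpu"`, the exact scenario from the CI log above.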
1 parent 70e2151 commit b8fbfc2

File tree

1 file changed (+5 −2 lines)


monai/apps/auto3dseg/data_analyzer.py

Lines changed: 5 additions & 2 deletions
```diff
@@ -320,6 +320,9 @@ def _get_all_case_stats(
         )
         result_bycase: dict[DataStatsKeys, Any] = {DataStatsKeys.SUMMARY: {}, DataStatsKeys.BY_CASE: []}
         device = self.device if self.device.type == "cpu" else torch.device("cuda", rank)
+        if device.type == "cuda" and not (torch.cuda.is_available() and torch.cuda.device_count() > 0):
+            logger.info(f"device={device} but CUDA device is not available, using CPU instead.")
+            device = torch.device("cpu")
         if not has_tqdm:
             warnings.warn("tqdm is not installed. not displaying the caching progress.")

@@ -332,12 +335,12 @@ def _get_all_case_stats(
                     label = torch.argmax(label, dim=0) if label.shape[0] > 1 else label[0]
                     batch_data[self.label_key] = label.to(device)
                 d = summarizer(batch_data)
-            except BaseException:
+            except BaseException as err:
                 if "image_meta_dict" in batch_data.keys():
                     filename = batch_data["image_meta_dict"]["filename_or_obj"]
                 else:
                     filename = batch_data[self.image_key].meta["filename_or_obj"]
-                logger.info(f"Unable to process data {filename} on {device}.")
+                logger.info(f"Unable to process data {filename} on {device}. {err}")
                 if self.device.type == "cuda":
                     logger.info("DataAnalyzer `device` set to GPU execution hit an exception. Falling back to `cpu`.")
                     batch_data[self.image_key] = batch_data[self.image_key].to("cpu")
```