{"id":2231,"date":"2025-05-13T11:47:14","date_gmt":"2025-05-13T02:47:14","guid":{"rendered":"https:\/\/www.kwonline.org\/memo2\/?p=2231"},"modified":"2025-06-11T11:18:37","modified_gmt":"2025-06-11T02:18:37","slug":"enable-nvidia-h100-gpu-mig","status":"publish","type":"post","link":"https:\/\/www.kwonline.org\/memo2\/2025\/05\/13\/enable-nvidia-h100-gpu-mig\/","title":{"rendered":"Using MIG on an Nvidia H100 GPU"},"content":{"rendered":"<p>&nbsp;<br \/>\nNotes on enabling MIG on a RHEL8 machine with an Nvidia H100 GPU.<\/p>\n<p>First, install the Nvidia driver.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\nsudo dnf update -y\r\nsudo dnf install kernel-devel kernel-headers gcc make -y\r\nsudo dnf config-manager --add-repo https:\/\/developer.download.nvidia.com\/compute\/cuda\/repos\/rhel8\/x86_64\/cuda-rhel8.repo\r\nsudo dnf -y install nvidia-driver-latest-dkms\r\n<\/pre>\n<p>Add the following to .bash_profile.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\nexport PATH=\/usr\/local\/bin:$PATH\r\nexport PATH=\/usr\/local\/cuda\/bin:$PATH\r\nexport LD_LIBRARY_PATH=\/usr\/local\/cuda\/lib64:$LD_LIBRARY_PATH\r\nexport TF_FORCE_GPU_ALLOW_GROWTH=true\r\n<\/pre>\n<p>Reboot once the installation finishes.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\nsudo reboot\r\n<\/pre>\n<p>Next, install CUDA.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\nsudo dnf install -y https:\/\/developer.download.nvidia.com\/compute\/cuda\/repos\/rhel8\/x86_64\/cuda-keyring-1.0-1.el8.noarch.rpm\r\nsudo dnf config-manager --add-repo 
https:\/\/developer.download.nvidia.com\/compute\/cuda\/repos\/rhel8\/x86_64\/cuda-rhel8.repo\r\nsudo dnf install -y cuda-drivers\r\n<\/pre>\n<p>Then enable MIG.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\n# Enable persistence mode\r\nsudo nvidia-smi -pm 1\r\nsudo nvidia-persistenced --persistence-mode\r\n\r\n# Enable MIG mode\r\nsudo nvidia-smi -mig 1\r\n<\/pre>\n<p>Next, list the available MIG GPU instance profiles.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\n$ sudo nvidia-smi mig -lgip\r\n+-----------------------------------------------------------------------------+\r\n| GPU instance profiles:                                                      |\r\n| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |\r\n|                              Free\/Total   GiB              CE    JPEG  OFA  |\r\n|=============================================================================|\r\n|   0  MIG 1g.12gb       19     7\/7        10.75      No     16     1     0   |\r\n|                                                             1     1     0   |\r\n+-----------------------------------------------------------------------------+\r\n|   0  MIG 1g.12gb+me    20     1\/1        10.75      No     16     1     0   |\r\n|                                                             1     1     1   |\r\n+-----------------------------------------------------------------------------+\r\n|   0  MIG 1g.24gb       15     4\/4        21.62      No     26     1     0   |\r\n|                                                             1     1     0   |\r\n+-----------------------------------------------------------------------------+\r\n|   0  MIG 2g.24gb       14     3\/3        21.62      No     32     2     0   |\r\n|                                                             2     2     0   
|\r\n+-----------------------------------------------------------------------------+\r\n|   0  MIG 3g.47gb        9     2\/2        46.38      No     60     3     0   |\r\n|                                                             3     3     0   |\r\n+-----------------------------------------------------------------------------+\r\n|   0  MIG 4g.47gb        5     1\/1        46.38      No     64     4     0   |\r\n|                                                             4     4     0   |\r\n+-----------------------------------------------------------------------------+\r\n|   0  MIG 7g.94gb        0     1\/1        93.12      No     132    7     0   |\r\n|                                                             8     7     1   |\r\n+-----------------------------------------------------------------------------+\r\n<\/pre>\n<p>For now, create seven 1g.12gb GPU instances, each with its own compute instance (the -C option).<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\n$ sudo nvidia-smi mig -cgi 1g.12gb,1g.12gb,1g.12gb,1g.12gb,1g.12gb,1g.12gb,1g.12gb -C\r\nSuccessfully created GPU instance ID 13 on GPU  0 using profile MIG 1g.12gb (ID 19)\r\nSuccessfully created compute instance ID  0 on GPU  0 GPU instance ID 13 using profile MIG 1g.12gb (ID  0)\r\nSuccessfully created GPU instance ID 11 on GPU  0 using profile MIG 1g.12gb (ID 19)\r\nSuccessfully created compute instance ID  0 on GPU  0 GPU instance ID 11 using profile MIG 1g.12gb (ID  0)\r\nSuccessfully created GPU instance ID 12 on GPU  0 using profile MIG 1g.12gb (ID 19)\r\nSuccessfully created compute instance ID  0 on GPU  0 GPU instance ID 12 using profile MIG 1g.12gb (ID  0)\r\nSuccessfully created GPU instance ID  7 on GPU  0 using profile MIG 1g.12gb (ID 19)\r\nSuccessfully created compute instance ID  0 on GPU  0 GPU instance ID  7 using profile MIG 1g.12gb (ID  0)\r\nSuccessfully created GPU instance ID  8 on GPU  0 using profile MIG 1g.12gb (ID 19)\r\nSuccessfully 
created compute instance ID  0 on GPU  0 GPU instance ID  8 using profile MIG 1g.12gb (ID  0)\r\nSuccessfully created GPU instance ID  9 on GPU  0 using profile MIG 1g.12gb (ID 19)\r\nSuccessfully created compute instance ID  0 on GPU  0 GPU instance ID  9 using profile MIG 1g.12gb (ID  0)\r\nSuccessfully created GPU instance ID 10 on GPU  0 using profile MIG 1g.12gb (ID 19)\r\nSuccessfully created compute instance ID  0 on GPU  0 GPU instance ID 10 using profile MIG 1g.12gb (ID  0)\r\n<\/pre>\n<p>Check that the instances were created. All seven are there.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\n$ sudo nvidia-smi mig -lgi\r\n+-------------------------------------------------------+\r\n| GPU instances:                                        |\r\n| GPU   Name             Profile  Instance   Placement  |\r\n|                          ID       ID       Start:Size |\r\n|=======================================================|\r\n|   0  MIG 1g.12gb         19        7          0:1     |\r\n+-------------------------------------------------------+\r\n|   0  MIG 1g.12gb         19        8          1:1     |\r\n+-------------------------------------------------------+\r\n|   0  MIG 1g.12gb         19        9          2:1     |\r\n+-------------------------------------------------------+\r\n|   0  MIG 1g.12gb         19       10          3:1     |\r\n+-------------------------------------------------------+\r\n|   0  MIG 1g.12gb         19       11          4:1     |\r\n+-------------------------------------------------------+\r\n|   0  MIG 1g.12gb         19       12          5:1     |\r\n+-------------------------------------------------------+\r\n|   0  MIG 1g.12gb         19       13          6:1     |\r\n+-------------------------------------------------------+\r\n<\/pre>\n<p>Setup is now complete.<br \/>\nNext, verify it from 
Python.<\/p>\n<p>Create a venv.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\npython3.11 -m venv ~\/python311_tf_env\r\n\r\n# Activate the virtual environment\r\nsource ~\/python311_tf_env\/bin\/activate\r\n\r\n# Upgrade pip first and install packages\r\npip install --upgrade pip\r\npip install tensorflow&#x5B;and-cuda]\r\npip install matplotlib\r\npip install scikit-learn\r\n<\/pre>\n<p>First, a Python script to check whether the MIG instances are recognized.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nimport os\r\nimport tensorflow as tf\r\n\r\n# Enable memory growth to avoid allocating all GPU memory at once\r\nphysical_devices = tf.config.list_physical_devices('GPU')\r\nprint(&quot;Physical devices:&quot;, physical_devices)\r\n\r\nif physical_devices:\r\n    try:\r\n        for gpu in physical_devices:\r\n            tf.config.experimental.set_memory_growth(gpu, True)\r\n        print(&quot;Memory growth enabled&quot;)\r\n    except Exception as e:\r\n        print(f&quot;Error setting memory growth: {e}&quot;)\r\n\r\n# Print detailed GPU info\r\nprint(&quot;TensorFlow version:&quot;, tf.__version__)\r\nprint(&quot;CUDA visible devices:&quot;, os.environ.get('CUDA_VISIBLE_DEVICES', 'Not set'))\r\n\r\n# Try a simple GPU operation\r\ntry:\r\n    with tf.device('\/GPU:0'):\r\n        a = tf.constant(&#x5B;&#x5B;1.0, 2.0], &#x5B;3.0, 4.0]])\r\n        b = tf.constant(&#x5B;&#x5B;5.0, 6.0], &#x5B;7.0, 8.0]])\r\n        c = tf.matmul(a, b)\r\n        print(&quot;Matrix multiplication result:&quot;, c.numpy())\r\n        print(&quot;GPU operation successful!&quot;)\r\nexcept Exception as e:\r\n    print(f&quot;GPU operation failed: {e}&quot;)\r\n<\/pre>\n<p>Output:<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\n$ python checkmig.py\r\nPhysical devices: 
&#x5B;PhysicalDevice(name='\/physical_device:GPU:0', device_type='GPU')]\r\nMemory growth enabled\r\nTensorFlow version: 2.19.0\r\nCUDA visible devices: MIG-a9bb5f16-9a29-5172-9fe5-7af5908664bf\r\nMIG-5bb83066-7217-5c42-b99e-fe75efe6d7e5\r\nMIG-1f6a0e31-cc52-5909-b14e-3a742942bd35\r\nMIG-56f1b757-9dfd-58cc-b886-8ebe2c442094\r\nMIG-0da82664-471e-5a6b-9738-4a6acf986175\r\nMIG-2e7ddef9-c2f3-5e0b-965f-f4024769c795\r\nMIG-d3ec2ae4-8d3d-5317-920b-3b8219c3176d\r\nMatrix multiplication result: &#x5B;&#x5B;19. 22.]\r\n &#x5B;43. 50.]]\r\nGPU operation successful!\r\n<\/pre>\n<p>The MIG instances are recognized correctly.<br \/>\nNext, run a Python script on the first MIG instance.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# TensorFlow and keras\r\nimport tensorflow as tf\r\nimport keras\r\nimport numpy as np\r\nimport matplotlib.pyplot as plt\r\nfrom tensorflow.python.client import device_lib\r\nprint(tf.__version__)\r\n\r\n# GPU available?\r\nprint(device_lib.list_local_devices())\r\n\r\n# load the dataset \r\nfashion_mnist = keras.datasets.fashion_mnist\r\n(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()\r\n\r\nclass_names = &#x5B;'T-shirt\/top', 'Trouser', 'Pullover', 'Dress', 'Coat', \r\n               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']\r\n\r\n# scale pixel values from 0-255 to 0-1\r\ntrain_images = train_images \/ 255.0\r\ntest_images = test_images \/ 255.0\r\n\r\n# define and train the model\r\n# Fashion-MNIST has 10 classes, so only outputs 0-9 of the 4096-wide softmax are ever targets\r\nmodel = keras.Sequential(&#x5B;\r\n    keras.layers.Flatten(input_shape=(28, 28)),\r\n    keras.layers.Dense(8192, activation=tf.nn.relu),\r\n    keras.layers.Dense(4096, activation=tf.nn.softmax)\r\n])\r\n\r\nmodel.compile(optimizer='adam',\r\n              loss='sparse_categorical_crossentropy',\r\n              metrics=&#x5B;'accuracy'])\r\n\r\nmodel.fit(train_images, train_labels, 
epochs=5)\r\n<\/pre>\n<p>Output. Training is fast because the GPU is actually being used.<\/p>\n<pre class=\"brush: bash; title: ; notranslate\" title=\"\">\r\n$ CUDA_VISIBLE_DEVICES=MIG-a9bb5f16-9a29-5172-9fe5-7af5908664bf python gputest.py\r\n2.19.0\r\n&#x5B;name: &quot;\/device:CPU:0&quot;\r\ndevice_type: &quot;CPU&quot;\r\nmemory_limit: 268435456\r\nlocality {\r\n}\r\nincarnation: 7443360534658311321\r\nxla_global_id: -1\r\n, name: &quot;\/device:GPU:0&quot;\r\ndevice_type: &quot;GPU&quot;\r\nmemory_limit: 9541976064\r\nlocality {\r\n  bus_id: 1\r\n  links {\r\n  }\r\n}\r\nincarnation: 7883565882498998567\r\nphysical_device_desc: &quot;device: 0, name: NVIDIA H100 NVL MIG 1g.12gb, pci bus id: 0001:00:00.0, compute capability: 9.0&quot;\r\nxla_global_id: 416903419\r\n]\r\nEpoch 1\/5\r\n1875\/1875 \u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501 10s 5ms\/step - accuracy: 0.7798 - loss: 0.6827\r\nEpoch 2\/5\r\n1875\/1875 \u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501 9s 5ms\/step - accuracy: 0.8595 - loss: 0.3803\r\nEpoch 3\/5\r\n1875\/1875 \u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501 9s 5ms\/step - accuracy: 0.8797 - loss: 0.3253\r\nEpoch 4\/5\r\n1875\/1875 \u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501 9s 5ms\/step - accuracy: 0.8930 - loss: 0.2903\r\nEpoch 5\/5\r\n1875\/1875 \u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501 9s 5ms\/step - accuracy: 0.8960 - loss: 0.2788\r\n&lt;keras.src.callbacks.history.History at 0x7f9cdc376350&gt;\r\n<\/pre>\n<p>That's all.<br \/>\n&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Notes on enabling MIG on a RHEL8 machine with an 
Nvidia H100 GPU. First, install the Nvidia driver. sudo dnf update -y sudo dnf insta [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21,8,29],"tags":[],"class_list":["post-2231","post","type-post","status-publish","format-standard","hentry","category-data-engineering","category-linux","category-python"],"_links":{"self":[{"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/posts\/2231","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/comments?post=2231"}],"version-history":[{"count":2,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/posts\/2231\/revisions"}],"predecessor-version":[{"id":2269,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/posts\/2231\/revisions\/2269"}],"wp:attachment":[{"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/media?parent=2231"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/categories?post=2231"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kwonline.org\/memo2\/wp-json\/wp\/v2\/tags?post=2231"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}