How to use ndmpcopy to copy data between the same or different modes of Data ONTAP


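The customer's NetApp volume is not large, about 2 TB, but it holds a huge number of small files, probably more than ten million.
This time the data has to move from an old Data ONTAP 8.1 7-Mode system to a new ONTAP 9.6 cluster:
8.1 7-mode /vol/qtree => ndmpcopy => 9.6 ONTAP cluster /vol/qtree
The version gap is too large for SnapMirror to be used,
so this is my first try at ndmpcopy in cluster mode.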

Official documentation:
How to run ndmpcopy in Clustered Data ONTAP

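There are two modes: vserver scope mode and node scope mode.
As the names suggest, one goes through a vserver LIF and the other through a node management LIF.
Sure enough, there is a catch: ndmpcopy cannot be run from the cluster shell and has to be invoked through node run, so even in vserver scope mode the connection still goes through the node management LIF.
When the node management LIF and the vserver LIF are on different subnets, the connection fails, as described in the following bug: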

BugID:467842 Local NDMPCopy fails when connection is attempted between routing groups

n2750::> node run -node n2750-01 ndmpcopy -sa ndmp:XXXXX -da ndmpuser:XXXXXXX 172.25.1.3:/vol/nas01/images 172.25.1.13:/svm_nfs/nas02/images

Ndmpcopy: Starting copy [ 1 ] ...
Ndmpcopy: Socket bind or connect for IP 172.25.1.13 failed
Ndmpcopy: Ensure that  node management or cluster management or intercluster IP of family type inet is present on the source filer
Ndmpcopy: Issue 'ndmpd on' on the source filer to enable NDMP request then retry the connection
Ndmpcopy: Done

n2750::> net int show
n2750
            inter_lif_1  up/up    172.20.1.11/24     n2750-01      a0a-50  true
            inter_lif_2  up/up    172.20.1.13/24     n2750-02      a0a-50  true
            n2750-01     up/up    10.97.11.173/24    n2750-01      e0M     true
            n2750-02     up/up    10.97.11.174/24    n2750-02      e0M     true
            n2750-clus_mgmt
                         up/up    10.97.11.171/24    n2750-01      e0M     true
svm_nfs
            NFS-25-1     up/up    172.25.1.13/24     n2750-01      a0a-100 true
            NFS-25-2     up/up    172.25.1.14/24     n2750-02      a0a-100 true
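
The mismatch can be seen directly from the addresses above. The following is a minimal Python sketch, not part of any NetApp tooling: the function name lifs_on_same_subnet and the ADMIN_LIFS list are made up for illustration, and "reachable" is simplified to "on the same subnet", which ignores routing groups and gateways. It checks which node management, cluster management or intercluster LIFs share a subnet with the SVM data LIF 172.25.1.13.

```python
import ipaddress

# LIFs copied from the `net int show` output above: (name, address/prefix).
# Only the node-mgmt, cluster-mgmt and intercluster LIFs are listed, because
# those are the ones the ndmpcopy error message refers to.
ADMIN_LIFS = [
    ("inter_lif_1", "172.20.1.11/24"),
    ("inter_lif_2", "172.20.1.13/24"),
    ("n2750-01", "10.97.11.173/24"),
    ("n2750-02", "10.97.11.174/24"),
    ("n2750-clus_mgmt", "10.97.11.171/24"),
]


def lifs_on_same_subnet(target_ip, lifs):
    """Return the names of the LIFs whose subnet contains target_ip.

    Hypothetical helper: same-subnet membership is used as a stand-in
    for reachability, which is a simplification.
    """
    target = ipaddress.ip_address(target_ip)
    matches = []
    for name, cidr in lifs:
        network = ipaddress.ip_interface(cidr).network
        if target in network:
            matches.append(name)
    return matches


if __name__ == "__main__":
    # 172.25.1.13 is the SVM data LIF used as the ndmpcopy destination.
    print(lifs_on_same_subnet("172.25.1.13", ADMIN_LIFS))
```

With the addresses listed above this prints an empty list: none of the LIFs that ndmpcopy can use sits on 172.25.1.0/24, which matches the "Socket bind or connect" failure.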

There are two ways to fix this:
* Add a LIF on the 10.97 network to the SVM
* Add a node management LIF on the 172.25 network

The 172.25 network runs on 10 GbE while 10.97 is only 1 GbE, so I chose the second option. Note that the GUI cannot create a node management LIF; it has to be done from the cluster shell:
n2750::> net int create -vserver n2750 -lif 172_mgmt -role node-mgmt -address 172.25.1.136 -netmask 255.255.255.0 -home-node n2750-01 -home-port a0a-100 -status-admin up

n2750::> net int show
n2750
            172_mgmt     up/up    172.25.1.136/24    n2750-01      a0a-100 true
            inter_lif_1  up/up    172.20.1.11/24     n2750-01      a0a-50  true
            inter_lif_2  up/up    172.20.1.13/24     n2750-02      a0a-50  true
            n2750-01     up/up    10.97.11.173/24    n2750-01      e0M     true
            n2750-02     up/up    10.97.11.174/24    n2750-02      e0M     true
            n2750-clus_mgmt
                         up/up    10.97.11.171/24    n2750-01      e0M     true
svm_nfs
            NFS-25-1     up/up    172.25.1.13/24     n2750-01      a0a-100 true
            NFS-25-2     up/up    172.25.1.14/24     n2750-02      a0a-100 true


After that, ndmpcopy runs without problems:

n2750::> node run -node n2750-01 ndmpcopy -sa ndmp:XXXXXXX -da ndmpuser:XXXXXXX 172.25.1.2:/vol/nas01/event 172.25.1.13:/svm_nfs/NFS1_n1/event

Ndmpcopy: Starting copy [ 13 ] ...
Ndmpcopy: 172.25.1.2: Notify: Connection established
Ndmpcopy: 172.25.1.13: Notify: Connection established
Ndmpcopy: 172.25.1.2: Connect: Authentication successful
Ndmpcopy: 172.25.1.13: Connect: Authentication successful
Ndmpcopy: 172.25.1.13: Log: Session identifier: 53249
Ndmpcopy: 172.25.1.13: Log: Session identifier for Restore : 53249
Ndmpcopy: 172.25.1.2: Log: DUMP: creating "/vol/nas01/../snapshot_for_backup.1686" snapshot.
Ndmpcopy: 172.25.1.2: Log: DUMP: Using Full Quota Tree Dump
Ndmpcopy: 172.25.1.2: Log: DUMP: Date of this level 0 dump: Thu Mar 19 22:48:19 2020.
Ndmpcopy: 172.25.1.2: Log: DUMP: Date of last level 0 dump: the epoch.
Ndmpcopy: 172.25.1.2: Log: DUMP: Dumping /vol/nas01/event to NDMP connection
Ndmpcopy: 172.25.1.2: Log: DUMP: mapping (Pass I)[regular files]
Ndmpcopy: 172.25.1.2: Log: DUMP: mapping (Pass II)[directories]
Ndmpcopy: 172.25.1.2: Log: DUMP: estimated 24853097 KB.
Ndmpcopy: 172.25.1.2: Log: DUMP: dumping (Pass III) [directories]
Ndmpcopy: 172.25.1.13: Log: RESTORE: Thu Mar 19 22:51:22 2020: Begin level 0 restore
Ndmpcopy: 172.25.1.13: Log: RESTORE: Thu Mar 19 22:51:23 2020: Reading directories from the backup
Ndmpcopy: 172.25.1.2: Log: DUMP: dumping (Pass IV) [regular files]
Ndmpcopy: 172.25.1.13: Log: RESTORE: Thu Mar 19 22:51:30 2020: Creating files and directories.
Ndmpcopy: 172.25.1.13: Log: RESTORE: Thu Mar 19 22:52:15 2020: Writing data to files.
Ndmpcopy: 172.25.1.2: Log: DUMP: Thu Mar 19 22:56:11 2020 : We have written 18944314 KB.
Ndmpcopy: 172.25.1.13: Log: RESTORE: Thu Mar 19 22:56:11 2020 : We have read 18944434 KB from the backup.
Ndmpcopy: 172.25.1.2: Log: ACL_START is '25464141824'
Ndmpcopy: 172.25.1.2: Log: DUMP: dumping (Pass V) [ACLs]
Ndmpcopy: 172.25.1.2: Log: DUMP: 24871424 KB
Ndmpcopy: 172.25.1.2: Log: DUMP: DUMP IS DONE
Ndmpcopy: 172.25.1.2: Log: DUMP: Deleting "/vol/nas01/../snapshot_for_backup.1686" snapshot.
Ndmpcopy: 172.25.1.2: Log: DUMP_DATE is '5879596595'
Ndmpcopy: 172.25.1.2: Notify: dump successful
Ndmpcopy: 172.25.1.13: Log: RESTORE: RESTORE IS DONE
Ndmpcopy: 172.25.1.13: Notify: restore successful
Ndmpcopy: Transfer successful [ 0 hours, 15 minutes, 34 seconds ]
Ndmpcopy: Done
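
To get a rough feel for the transfer rate, the final DUMP size and the elapsed time in the log above can be divided. The following is a minimal Python sketch under stated assumptions: the function name average_throughput_mib_s is made up for illustration, "KB" in the dump log is taken to mean 1024 bytes, and the elapsed time covers the whole session (snapshot creation and the mapping passes included), so the result is an end-to-end average rather than a wire speed.

```python
import re

# Two lines copied verbatim from the ndmpcopy output above.
SAMPLE_LOG = """\
Ndmpcopy: 172.25.1.2: Log: DUMP: 24871424 KB
Ndmpcopy: Transfer successful [ 0 hours, 15 minutes, 34 seconds ]
"""


def average_throughput_mib_s(log_text):
    """Return average MiB/s: final DUMP size divided by elapsed time.

    Hypothetical helper; assumes the log formats shown in this post.
    """
    size = re.search(r"DUMP: (\d+) KB\s*$", log_text, re.MULTILINE)
    elapsed = re.search(
        r"Transfer successful \[ (\d+) hours, (\d+) minutes, (\d+) seconds \]",
        log_text,
    )
    if size is None or elapsed is None:
        raise ValueError("log lacks a final DUMP size or a transfer time")
    hours, minutes, seconds = (int(g) for g in elapsed.groups())
    total_seconds = hours * 3600 + minutes * 60 + seconds
    if total_seconds == 0:
        raise ValueError("elapsed time is zero")
    # Assumption: 1 KB in the dump log is 1024 bytes, so KB / 1024 = MiB.
    return int(size.group(1)) / 1024 / total_seconds


if __name__ == "__main__":
    print(f"{average_throughput_mib_s(SAMPLE_LOG):.1f} MiB/s")
```

For this qtree the result is about 26 MiB/s. Other qtrees may behave quite differently, since the rate depends heavily on how many small files each one contains.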


NDMP configuration reference:
How to use NDMP-based copy utilities (such as ndmpcopy) to copy data between the same or different modes of Data ONTAP