How to use ndmpcopy to copy data between the same or different modes of Data ONTAP


The customer's NetApp volume is not actually that large, only about 2 TB, but it holds a huge number of small files, probably more than ten million of them.
This time the data has to be migrated from an old 8.1 7-Mode system to a new 9.6 ONTAP cluster:
8.1 7-mode /vol/qtree => ndmpcopy => 9.6 ONTAP cluster /vol/qtree
Because the version gap is too large, SnapMirror cannot be used,
so this is my first try of ndmpcopy in Cluster Mode.

Official documentation:
How to run ndmpcopy in Clustered Data ONTAP

There are two modes: vserver scope mode and node scope mode.
As the names suggest, one goes through a vserver LIF and the other through the node management LIF.
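Before ndmpcopy can connect at all, NDMP has to be enabled on both sides. A minimal sketch of the commands involved, reusing the SVM and node names from this environment; the exact syntax may differ slightly between ONTAP releases, so treat it as a reminder rather than a recipe:

On the 9.6 cluster, vserver-scoped NDMP:
n2750::> vserver services ndmp on -vserver svm_nfs
n2750::> vserver services ndmp generate-password -vserver svm_nfs -user ndmpuser

Or node-scoped NDMP:
n2750::> system services ndmp node-scope-mode on
n2750::> system services ndmp on -node n2750-01

On the 8.1 7-Mode source:
filer> ndmpd on
filer> ndmpd status
filer> ndmpd password ndmp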
Sure enough, there is a trap: ndmpcopy cannot be run from the cluster shell, it has to be run through node run, so even in vserver scope mode the transfer still ends up going through the node management LIF.
And when the node management LIF and the vserver LIF are on different subnets, the connection fails, as described in the BugID below.

BugID:467842 Local NDMPCopy fails when connection is attempted between routing groups

n2750::> node run -node n2750-01 ndmpcopy -sa ndmp:XXXXX -da ndmpuser:XXXXXXX 172.25.1.3:/vol/nas01/images 172.25.1.13:/svm_nfs/nas02/images

Ndmpcopy: Starting copy [ 1 ] ...
Ndmpcopy: Socket bind or connect for IP 172.25.1.13 failed
Ndmpcopy: Ensure that  node management or cluster management or intercluster IP of family type inet is present on the source filer
Ndmpcopy: Issue 'ndmpd on' on the source filer to enable NDMP request then retry the connection
Ndmpcopy: Done

n2750::> net int show
n2750
            inter_lif_1  up/up    172.20.1.11/24     n2750-01      a0a-50  true
            inter_lif_2  up/up    172.20.1.13/24     n2750-02      a0a-50  true
            n2750-01     up/up    10.97.11.173/24    n2750-01      e0M     true
            n2750-02     up/up    10.97.11.174/24    n2750-02      e0M     true
            n2750-clus_mgmt
                         up/up    10.97.11.171/24    n2750-01      e0M     true
svm_nfs
            NFS-25-1     up/up    172.25.1.13/24     n2750-01      a0a-100 true
            NFS-25-2     up/up    172.25.1.14/24     n2750-02      a0a-100 true

There are two ways to fix this:
* Add a 10.97.x.x LIF to the SVM
* Add a 172.25.x.x LIF with the node-mgmt role

Since the 172.25 network is 10 GbE while 10.97 is only 1 GbE, I chose the second option. Also, the GUI cannot create a node-mgmt LIF; it can only be done from the cluster shell:
n2750::> net int create -vserver n2750 -lif 172_mgmt -role node-mgmt -address 172.25.1.136 -netmask 255.255.255.0 -home-node n2750-01 -home-port a0a-100 -status-admin up

n2750::> net int show
n2750
            172_mgmt     up/up    172.25.1.136/24    n2750-01      a0a-100 true
            inter_lif_1  up/up    172.20.1.11/24     n2750-01      a0a-50  true
            inter_lif_2  up/up    172.20.1.13/24     n2750-02      a0a-50  true
            n2750-01     up/up    10.97.11.173/24    n2750-01      e0M     true
            n2750-02     up/up    10.97.11.174/24    n2750-02      e0M     true
            n2750-clus_mgmt
                         up/up    10.97.11.171/24    n2750-01      e0M     true
svm_nfs
            NFS-25-1     up/up    172.25.1.13/24     n2750-01      a0a-100 true
            NFS-25-2     up/up    172.25.1.14/24     n2750-02      a0a-100 true


After that, ndmpcopy ran without any problems:

n2750::> node run -node n2750-01 ndmpcopy -sa ndmp:XXXXXXX -da ndmpuser:XXXXXXX 172.25.1.2:/vol/nas01/event 172.25.1.13:/svm_nfs/NFS1_n1/event

Ndmpcopy: Starting copy [ 13 ] ...
Ndmpcopy: 172.25.1.2: Notify: Connection established
Ndmpcopy: 172.25.1.13: Notify: Connection established
Ndmpcopy: 172.25.1.2: Connect: Authentication successful
Ndmpcopy: 172.25.1.13: Connect: Authentication successful
Ndmpcopy: 172.25.1.13: Log: Session identifier: 53249
Ndmpcopy: 172.25.1.13: Log: Session identifier for Restore : 53249
Ndmpcopy: 172.25.1.2: Log: DUMP: creating "/vol/nas01/../snapshot_for_backup.1686" snapshot.
Ndmpcopy: 172.25.1.2: Log: DUMP: Using Full Quota Tree Dump
Ndmpcopy: 172.25.1.2: Log: DUMP: Date of this level 0 dump: Thu Mar 19 22:48:19 2020.
Ndmpcopy: 172.25.1.2: Log: DUMP: Date of last level 0 dump: the epoch.
Ndmpcopy: 172.25.1.2: Log: DUMP: Dumping /vol/nas01/event to NDMP connection
Ndmpcopy: 172.25.1.2: Log: DUMP: mapping (Pass I)[regular files]
Ndmpcopy: 172.25.1.2: Log: DUMP: mapping (Pass II)[directories]
Ndmpcopy: 172.25.1.2: Log: DUMP: estimated 24853097 KB.
Ndmpcopy: 172.25.1.2: Log: DUMP: dumping (Pass III) [directories]
Ndmpcopy: 172.25.1.13: Log: RESTORE: Thu Mar 19 22:51:22 2020: Begin level 0 restore
Ndmpcopy: 172.25.1.13: Log: RESTORE: Thu Mar 19 22:51:23 2020: Reading directories from the backup
Ndmpcopy: 172.25.1.2: Log: DUMP: dumping (Pass IV) [regular files]
Ndmpcopy: 172.25.1.13: Log: RESTORE: Thu Mar 19 22:51:30 2020: Creating files and directories.
Ndmpcopy: 172.25.1.13: Log: RESTORE: Thu Mar 19 22:52:15 2020: Writing data to files.
Ndmpcopy: 172.25.1.2: Log: DUMP: Thu Mar 19 22:56:11 2020 : We have written 18944314 KB.
Ndmpcopy: 172.25.1.13: Log: RESTORE: Thu Mar 19 22:56:11 2020 : We have read 18944434 KB from the backup.
Ndmpcopy: 172.25.1.2: Log: ACL_START is '25464141824'
Ndmpcopy: 172.25.1.2: Log: DUMP: dumping (Pass V) [ACLs]
Ndmpcopy: 172.25.1.2: Log: DUMP: 24871424 KB
Ndmpcopy: 172.25.1.2: Log: DUMP: DUMP IS DONE
Ndmpcopy: 172.25.1.2: Log: DUMP: Deleting "/vol/nas01/../snapshot_for_backup.1686" snapshot.
Ndmpcopy: 172.25.1.2: Log: DUMP_DATE is '5879596595'
Ndmpcopy: 172.25.1.2: Notify: dump successful
Ndmpcopy: 172.25.1.13: Log: RESTORE: RESTORE IS DONE
Ndmpcopy: 172.25.1.13: Notify: restore successful
Ndmpcopy: Transfer successful [ 0 hours, 15 minutes, 34 seconds ]
Ndmpcopy: Done
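One more note for volumes with this many small files: ndmpcopy also supports incremental transfers through the -l flag (dump levels 0, 1 and 2 in 7-Mode; I assume the nodeshell version accepts the same flag). That allows a long level-0 baseline copy first, and then a short level-1 run right before cutover to pick up only the changes. A sketch, not something I ran here:

n2750::> node run -node n2750-01 ndmpcopy -sa ndmp:XXXXXXX -da ndmpuser:XXXXXXX -l 0 172.25.1.2:/vol/nas01/event 172.25.1.13:/svm_nfs/NFS1_n1/event
(later, just before cutover)
n2750::> node run -node n2750-01 ndmpcopy -sa ndmp:XXXXXXX -da ndmpuser:XXXXXXX -l 1 172.25.1.2:/vol/nas01/event 172.25.1.13:/svm_nfs/NFS1_n1/event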


NDMP configuration reference:
How to use NDMP-based copy utilities (such as ndmpcopy) to copy data between the same or different modes of Data ONTAP



VCS MySQL Agent Configuration

It has been so long since I last had a customer using this that I have mostly forgotten it, so this time I am writing it down.


MySQL Agent Function
The operations or functions that the Symantec High Availability agent for MySQL
can perform are as follows:
online
$ BaseDir/bin/mysqld_safe --defaults-file=MyCnf --datadir=DataDir --user=MySQLUser
monitor
$ BaseDir/bin/mysqladmin --user=MySQLAdmin --password=MySQLAdminPasswd status
offline
$ BaseDir/bin/mysqladmin --user=MySQLAdmin --password=MySQLAdminPasswd shutdown
clean
$ BaseDir/bin/mysqladmin --user=MySQLAdmin --password=MySQLAdminPasswd shutdown


Create a MySQLAdmin user with shutdown privileges only:
# mysql -uroot -p{PASSWORD}
mysql> use mysql;
mysql> select user,host,password from user;
mysql> create user 'MySQLAdmin'@'localhost' identified by 'passw0rd' ;
mysql> create user 'MySQLAdmin'@'127.0.0.1' identified by 'passw0rd' ;
mysql> grant shutdown on *.* to 'MySQLAdmin'@'localhost' ;
mysql> grant shutdown on *.* to 'MySQLAdmin'@'127.0.0.1' ;
mysql> select user,host,password from user;
mysql> flush privileges;

Test that the user can shut down the MySQL database:
# $BaseDir/bin/mysqladmin --user=MySQLAdmin --password=passw0rd status
# $BaseDir/bin/mysqladmin --user=MySQLAdmin --password=passw0rd shutdown


Installation
# rpm -Uvh VRTSacclib-6.2.0.0-GENERIC.noarch.rpm
# rpm -Uvh VRTSmysql-6.2.0.0-GENERIC.noarch.rpm


Import MySQL resource type: 
/etc/VRTSagents/ha/conf/MySQL/MySQLTypes.cmd
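As far as I remember, the types file is imported with the cluster configuration writable; a sketch of the usual sequence (check the agent install guide for your version):
# haconf -makerw
# sh /etc/VRTSagents/ha/conf/MySQL/MySQLTypes.cmd
# haconf -dump -makero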


Encrypt the MySQLAdmin password
# vcsencrypt -agent
Enter Password:
Enter Again:
GUGsHUjUJuNSvUIsK


VCS MySQL Resource Attribute
group MySQL (
        SystemList = { node1 = 0, node2 = 1 }
        )

        MySQL MySQLdb (
                Critical = 1
                MySQLAdmin = MySQLAdmin
                MySQLAdminPasswd = GUGsHUjUJuNSvUIsK   <=== encrypted password
                BaseDir = "/usr"
                DataDir = "/var/lib/mysql"
                MyCnf = "/etc/my.cnf"
                )
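The same group and resource can also be created from the command line instead of editing main.cf directly; a sketch using the standard ha commands, with the names and values from the snippet above:

# haconf -makerw
# hagrp -add MySQL
# hagrp -modify MySQL SystemList node1 0 node2 1
# hares -add MySQLdb MySQL MySQL
# hares -modify MySQLdb BaseDir "/usr"
# hares -modify MySQLdb DataDir "/var/lib/mysql"
# hares -modify MySQLdb MyCnf "/etc/my.cnf"
# hares -modify MySQLdb MySQLAdmin MySQLAdmin
# hares -modify MySQLdb MySQLAdminPasswd GUGsHUjUJuNSvUIsK
# hares -modify MySQLdb Critical 1
# hares -modify MySQLdb Enabled 1
# haconf -dump -makero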


Running multiple instances of MySQL on a single node
VCS ENV file
#!/bin/ksh
MYSQL_UNIX_PORT=/tmp/mysql.sock; export MYSQL_UNIX_PORT
MYSQL_TCP_PORT=3307; export MYSQL_TCP_PORT
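If I remember correctly, the agent picks this script up through its EnvFile attribute, and each instance then gets its own MyCnf, DataDir and port. A hypothetical second-instance resource (all paths below are made up for illustration):

        MySQL MySQLdb_3307 (
                MySQLAdmin = MySQLAdmin
                MySQLAdminPasswd = GUGsHUjUJuNSvUIsK
                BaseDir = "/usr"
                DataDir = "/var/lib/mysql_3307"
                MyCnf = "/etc/my_3307.cnf"
                EnvFile = "/etc/VRTSagents/ha/conf/MySQL/mysql_3307_env.ksh"
                )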


VxVM DCPA issue (Data Corruption Protection Activated)



http://www.symantec.com/docs/TECH128862
http://www.symantec.com/docs/TECH76355
http://www.symantec.com/docs/TECH167983
http://www.symantec.com/docs/HOWTO55903

Netbackup Replication Director, Netapp Plugin for Netbackup

Using NetBackup together with NetApp snapshots for backup

NetBackup 7.5 Replication Director configuration demo
http://www.symantec.com/connect/videos/netbackup-75-replication-director-configuration-demo

Configuring NetApp for Replication Director
http://www.symantec.com/connect/videos/configuring-netapp-replication-director

NetBackup Replication Director Unifies End-to-End Management of Snapshots and Backup
http://www.symantec.com/connect/connect-view-protected-content/2579911

VxVM Serial Split Brain - Detection & Resolution

Serial Split Brain - Detection & Resolution
http://www.symantec.com/docs/TECH33020

The document explains the root cause and the recovery procedure in detail.

This was the first time I ran into this kind of vxdg import failure.
It happened while trying to import the disks of an HDS ShadowImage (SI) copy; the import failed.
(Strange, isn't the S-VOL supposed to be identical to the P-VOL?)

VxVM vxdg ERROR V-5-1-10978 Disk group ctidbdg: import failed:
Serial Split Brain detected. Run vxsplitlines to import the diskgroup

The cause is that the DG configuration copies stored on the disks are inconsistent,
so VxVM does not know which copy it should use to import the disk group.

So we just tell it which disk's DG config copy to use,
and the import then goes through.

But how do we know which disk holds a good DG config copy?

The document has detailed steps on how to use vxsplitlines,
but strangely it seemed to hang when I ran it and produced no output,
so I had to do it by hand.
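For the record, the normal invocation from the tech note is simply this (dg name from this case):
# vxsplitlines -g ctidbdg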

Use vxdisk list <disk> to find a disk whose config copy is enabled,
then use /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/cxtxd0s2
to confirm that the DG config can actually be read from it.

OK, this disk has a usable DG config copy,
and vxdisk list <disk> also shows its disk ID.

# /usr/sbin/vxdg (-s) -o selectcp= import newdg
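Put together with a made-up device name and disk ID for illustration (as I read the tech note, the selectcp value is the disk ID shown by vxdisk list):

# vxdisk list c2t1d0s2 | grep '^disk:'
disk:      name=  id=1153167045.27.dbserver
# /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/c2t1d0s2 | more
# /usr/sbin/vxdg -o selectcp=1153167045.27.dbserver import ctidbdg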

And there it is, the DG imported successfully.
