Tuesday, May 28, 2013

Fixing CIFS ACLs On a DataDomain

On my current project, our customer makes use of DataDomain storage to provide nearline backups of system data. Backups to these devices are primarily done through tools like Veeam (for ESX-hosted virtual machines and associated application data) and NetBackup.

However, a small percentage of the tenants my customer hosts are "self backup" tenants. In this case, "self backup" means that, rather than leveraging the enterprise backup frameworks, the tenants are given an NFS or CIFS share directly off of the DataDomain that is: A) closest to their system(s); and, B) has the most free space to accommodate their data.

"Self backup" tenants that use NFS shares are minimally problematic. Most of the backup problems come from the fact that DataDomains weren't really designed for multi-tenancy. Things like quota controls are fairly lacking. So, it's possible for tenants of a shared DataDomain to screw each other over by either soaking up all of the device's bandwidth or soaking up all the space.

Still, those problems aside, providing NFS service to tenants is fairly straight-forward. You create a directory, you go into the NFS share-export interface, create the share and access controls and call it a day. CIFS shares, on the other hand...

We'd assumed that providing CIFS service would be on a par with providing NFS service, but it's proven to be otherwise. While the DataDomains provide an NTFS-style ACL capability in their filesystem, it hasn't proven to work quite as one might expect.

The interface for creating shares allows you to set share-level access controls based on both calling-host as well as assigned users and/or groups. One would reasonably assume that this would mean that the correct way to set up a share is to export it with appropriate client-allow lists and user/group-allow lists and that the shares would be set with appropriate filesystem permissions automagically. This isn't exactly how it's turned out to work.

What we've discovered is that you pretty much have to set the shares up as being universally accessible from all CIFS clients and grant global "full control" access to the top-level share-folder. Normally, this would be a nightmare, but, once created, you can lock the shares down. You just have to manage the NTFS attributes from a Windows-based host. Basically, you create the share, present it to a Windows-based administrative host, then use the Windows folder security tools to modify the permissions on the share (e.g., remove all the "Everyone" rights, then manually assign appropriate ownerships and POSIX groups to the folder and set up the correct DACLs).

From an engineering perspective, it means that you have to document the hell out of things and try your best to train the ops folks on how to do things The Right Way™. Then, with frequent turnovers in Operations and other "shit happens" kind of things, you have to go back and periodically audit configurations for correctness and repair the brokenness that has crept in.

Unfortunately, one of the biggest sources of brokenness that creeps in is broken permissions structures. When doing the initial folder-setup, it's absolutely critical that the person setting up the folder remembers to click the "Replace all child object permissions with inheritable permissions from this object" checkbox (accessed by clicking on the "Change Permissions" button within the "Advanced Security Settings" section for the folder). Failure to do so means that each folder, subfolder and file created (by tenants) in the share has its own, tenant-created permissions structure. What this results in is a share whose permissions are not easily maintainable by the array-operators. Ultimately, it results in trouble tickets opened by tenants whose applications and/or operational folks eventually break access for themselves.

Once those tickets come in, there's not much that can be easily done if the person who "owns" the share has left the organization. If you find yourself needing to fix such a situation, you need to either involve DataDomain's support staff to fix it (assuming your environment is reachable via a WebEx-type of support session) or get someone to slip you instructions on how to access the array's "Engineering Mode".

There are actually two engineering modes: there's the regular SE shell and the BASH shell. The SE shell is basically a super-set of the regular system administration CLI. The BASH shell is basically a Linux BASH shell with DataDomain-specific management commands enabled. For the most part, the two modes are interchangeable. However, if you need the ability to do mass modifications or script on your array, you'll need to access the DataDomain's BASH shell mode to do it. See my prior article on accessing the DataDomain's BASH shell mode.

Once you've gotten the engineering BASH shell, you have pretty much unfettered access to the guts of the DataDomain. The BASH shell is pretty much the same as you'd encounter on a stock Linux system. Most of the GNU utilities you're used to using will be there and will work the same way they do on Linux. You won't have man pages, so, if you forget flags to a given shell command, look them up on a Linux host that has the man pages installed. In addition to the standard Linux commands will be some DataDomain-specific commands. For the purposes of fixing your NTFS ACL mess, you'll be wanting to use the "dd_xcacls" command:

  • Use "dd_xcacls -O '[DomainObject]' [Object]" to set the Ownership of an object. For example, to set the ownership attribute to your AD domain account, issue the command "dd_xcacls -O 'MDOMAIN\MYUSER' /data/col1/backup/ShareName".
  • Use "dd_xcacls -G '[DomainObject]' [Object]" to set the POSIX group of an object. For example, to set the POSIX group attribute to your AD domain group, issue the command "dd_xcacls -G 'MDOMAIN\MYGROUP' /data/col1/backup/ShareName".
  • Use "dd_xcacls -D '[ActiveDirectorySID]:[Setting]/[ScopeMask]/[RightsList]' [Object]" to set the DACL on an object. For example, to give "Full Control" rights to your domain account, issue the command "dd_xcacls -D 'MDOMAIN\MYUSER:ALLOW/4/FullControl' /data/col1/backup/ShareName".
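
If you'd rather not type the three commands by hand for every share, they can be wrapped in a small shell function. What follows is only a sketch under my own assumptions: `fix_share`, `run` and `DRY_RUN` are names made up for illustration, `MDOMAIN\MYGROUP` is a placeholder group, and the only "dd_xcacls" flags used are the three described above. It defaults to printing the commands so they can be reviewed before anything gets touched.

```shell
# Sketch only: fix_share, run and DRY_RUN are illustrative names.
# With DRY_RUN=1 (the default) commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# fix_share OWNER GROUP DACL PATH
# Applies owner, POSIX group and DACL, in that order, stopping at the first failure.
fix_share() {
    run dd_xcacls -O "$1" "$4" &&
    run dd_xcacls -G "$2" "$4" &&
    run dd_xcacls -D "$3" "$4"
}

fix_share 'MDOMAIN\MYUSER' 'MDOMAIN\MYGROUP' \
    'MDOMAIN\MYUSER:ALLOW/4/FullControl' /data/col1/backup/ShareName
```

The dry-run printout drops the quoting around each argument, so treat it as a review aid rather than something to paste back into the shell. Setting DRY_RUN=0 makes the function execute the commands instead.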

A couple of notes apply to the immediately preceding:

  1. While the "dd_xcacls" command can notionally set rights-inheritance, I've discovered that this isn't 100% reliable in the DDOS 5.1.x family. It will likely be necessary that once you've placed the desired DACLs on the filesystem objects, you'll need to use a Windows system to set/force inheritance onto objects lower in the filesystem hierarchy.
  2. When you set a DACL with "dd_xcacls -D", it replaces whatever DACLs are in place. Any permissions previously on the filesystem object will be removed. If you want more than one user/group DACL applied to the filesystem-object, you need to apply them all at once. Use the ";" token to separate DACLs within the quoted text-argument to the "-D" flag.
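
Since every DACL has to go into one "-D" argument, it can help to build that argument from a list rather than hand-editing a long quoted string. This is a minimal sketch: `join_dacls` is a helper name I've made up, `MDOMAIN\MYGROUP` is a placeholder group, and the entries just reuse the "ALLOW/4/FullControl" form from the example above.

```shell
# Join any number of DACL entries with ';' for use as a single -D argument.
# join_dacls is an illustrative helper, not a DataDomain command.
join_dacls() {
    local IFS=';'
    printf '%s\n' "$*"
}

dacl="$(join_dacls 'MDOMAIN\MYUSER:ALLOW/4/FullControl' \
                   'MDOMAIN\MYGROUP:ALLOW/4/FullControl')"

# Printed for review here rather than executed.
printf 'dd_xcacls -D %s %s\n' "'$dacl'" /data/col1/backup/ShareName
```

Keeping the entries as separate arguments makes it harder to forget one and accidentally wipe out a permission that was already on the object.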

Because you'll need to fix all of your permissions, one at a time, from this mode, you'll want to use the Linux `find` command to power your use of the "dd_xcacls" command. On a normal Linux system, when dealing with filesystems that have spaces in directory or file object-names, you'd do something like `find [DIRECTORY] -print0 | xargs -0 [ACTION]` to more efficiently handle this. However, that doesn't seem to work the way it does on a generic Linux system, at least not on the DDOS 5.x systems I've used. Instead, you'll need to use a `find [Directory] -exec [dd_xcacls command-string] {} \;`. This is very slow and resource-intensive. On a directory structure with thousands of files, this can take hours to run. Further, because of how resource-intensive this method is, you won't be able to run more than one such job at a time. Attempting to do so will result in SEGFAULTs - and the more you attempt to run concurrently, the more frequent the SEGFAULTs will be. These SEGFAULTs will cause individual "dd_xcacls" iterations to fail, potentially leaving random filesystem objects' permissions unmodified.
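
Because individual iterations can fail, it's worth having `find` report which objects the command failed on so they can be re-run afterwards. The sketch below assumes the array's `find` behaves like GNU find for `-exec ... \;` and `-o`, which I haven't verified on every DDOS release; `apply_acl_tree` is a name I've made up for illustration.

```shell
# apply_acl_tree DIR CMD [ARGS...]
# Runs "CMD ARGS <object>" once per object under DIR, one at a time.
# -exec ... \; evaluates true only when the command exits 0, so the
# -o -print branch prints just the objects where the command failed.
apply_acl_tree() {
    local dir="$1"
    shift
    find "$dir" \( -exec "$@" {} \; -o -print \)
}

# Example (not executed here):
#   apply_acl_tree /data/col1/backup/ShareName \
#       dd_xcacls -D 'MDOMAIN\MYUSER:ALLOW/4/FullControl' > failed.txt
```

Anything the command itself writes to standard output ends up mixed in with the list of failed paths, so check the output before feeding it back in for a second pass.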

1 comment:

  1. I have a DD2500 (5.4.0.8-404909) and I logged in over SSH with PuTTY. When I type the command dd_xcacls, this message appears: "That doesn't look like a valid command, displaying help..."

    I want to create a CIFS share and give full rights to the group "domain administrators" and read-only rights to "domain users" on the top level of the share, so that each new folder inherits the ACL from this point. I tried to edit the ACL from my Windows client (right click on the share, "Properties", "Security"). Then I added a group and wanted to apply the changes, but Windows answered "Access denied". If I create a folder, I can edit the ACL of that folder. If I create a subfolder in that folder, the ACLs are inherited. But I want to have around 100 folders at the top level of the share.

    With my Qnap I was able to select AD users and groups inside the Qnap itself, with DD that's not possible.
