    What's the Best Way to Deduplicate & Organize Files/Folders on a 200 TB NAS?

    • scottalanmiller @anthonyh

      @anthonyh said in What's the Best Way to Deduplicate & Organize Files/Folders on a 200 TB NAS?:

      Though it looks like ZFS on BSD (which IIRC FreeNAS is based on FreeBSD) might support reflinks...

      Yes, FreeNAS is built on FreeBSD, just a slightly older release (not very old, just a little).

      • scottalanmiller

        More on reflinks.

        • scottalanmiller

          ZFS does not currently have reflinks, and doesn't plan to. It's a Btrfs feature backported to XFS on Linux.

          • anthonyh @scottalanmiller

            @scottalanmiller said in What's the Best Way to Deduplicate & Organize Files/Folders on a 200 TB NAS?:

            ZFS does not currently have reflinks, and doesn't plan to. It's a Btrfs feature backported to XFS on Linux.

            That's what I thought, but I didn't have the data to back it up.

            • StrongBad

              ZFS has a lot of similar stuff built in, so I don't think they want to do it two ways. It's not often that people want the extra reflink functionality.

              • anthonyh @StrongBad

                @strongbad said in What's the Best Way to Deduplicate & Organize Files/Folders on a 200 TB NAS?:

                ZFS has a lot of similar stuff built in, so I don't think they want to do it two ways. It's not often that people want the extra reflink functionality.

                Yeah. ZFS's deduplication functionality is good...just resource intensive. I've talked to guys who build out large storage arrays using ZFS and deduplication and it gets complicated (at least from my ZFS novice point of view) if you want it to perform well.

                • scottalanmiller @anthonyh

                  @anthonyh said in What's the Best Way to Deduplicate & Organize Files/Folders on a 200 TB NAS?:

                  @strongbad said in What's the Best Way to Deduplicate & Organize Files/Folders on a 200 TB NAS?:

                  ZFS has a lot of similar stuff built in, so I don't think they want to do it two ways. It's not often that people want the extra reflink functionality.

                  Yeah. ZFS's deduplication functionality is good...just resource intensive. I've talked to guys who build out large storage arrays using ZFS and deduplication and it gets complicated (at least from my ZFS novice point of view) if you want it to perform well.

                  ZFS was never built for performance (Sun said this directly). It was built for low cost and giant scale with good reliability and durability. So it's not at all surprising that it doesn't perform well while running a feature like dedupe.

                  It's also 13 years old and the granddaddy of its type of product.

                  • matteo nunziati @scottalanmiller

                    @scottalanmiller said in What's the Best Way to Deduplicate & Organize Files/Folders on a 200 TB NAS?:

                    @tim_g said in What's the Best Way to Deduplicate & Organize Files/Folders on a 200 TB NAS?:

                    @scottalanmiller said in What's the Best Way to Deduplicate & Organize Files/Folders on a 200 TB NAS?:

                    @dbeato would need 256GB of RAM to attempt that with ZFS. That's a lot of RAM on a NAS.

                    How did you get 256GB of RAM needed?

                    That FreeNAS article recommends 5GB RAM per 1 TB of deduped data...
                    Considering he has 200TB of data he'd want to dedup, that's at least 1TB of RAM to start.

                    This is because dedup on ZFS/FreeNAS is much more RAM intensive than all other file systems. (and also because 200TB is a ton of data)

                    What caused it to balloon so much recently? Traditionally it has been 1GB per 1TB.

                    https://serverfault.com/questions/569354/freenas-do-i-need-1gb-per-tb-of-usable-storage-or-1gb-of-memory-per-tb-of-phys

                    The FreeBSD ZFS page stated up to 5GB per 1TB last time I checked.
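For anyone following the numbers being traded above, here is a rough sketch of the arithmetic. It only multiplies the data size by the per-TB ratios quoted in this thread (1GB per TB and up to 5GB per TB); the function name is made up for illustration, and real dedup table size depends on block size and how much of the data is actually unique, so treat it as a ballpark and not a sizing tool.

```python
def dedup_ram_estimate_gb(data_tb: float, gb_per_tb: float) -> float:
    """Rule-of-thumb RAM (in GB) for dedup: data size times a per-TB ratio."""
    return data_tb * gb_per_tb


DATA_TB = 200  # size of the NAS discussed in this thread

# Ratios as quoted by posters in this thread, not measured values.
RATIOS = [
    ("traditional 1GB per TB", 1.0),
    ("up to 5GB per TB", 5.0),
]

for label, ratio in RATIOS:
    gb = dedup_ram_estimate_gb(DATA_TB, ratio)
    print(f"{label}: {gb:.0f} GB (~{gb / 1000:.1f} TB)")
```

With those two ratios the estimate for 200TB comes out at 200 GB and 1000 GB respectively, so the 256GB figure sits near the low end and the "at least 1TB of RAM" figure matches the high end.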

                    • matteo nunziati

                      Could the StarWind dedup estimator be of use here?

                      • stacksofplates @scottalanmiller

                        @scottalanmiller said in What's the Best Way to Deduplicate & Organize Files/Folders on a 200 TB NAS?:

                        More on reflinks.

                        It's already in Fedora. You don't need the sources any longer.

                        [Screenshot: reflink.png]
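Since the screenshot doesn't survive as text, here is a minimal sketch of what a reflink copy looks like from a script. It assumes Linux and a filesystem with reflink support (Btrfs, or XFS created with reflinks enabled); it is roughly what `cp --reflink=auto` does. The helper name `reflink_or_copy` is hypothetical, and `FICLONE` is the ioctl request number defined in the Linux headers.

```python
import fcntl
import shutil

# FICLONE ioctl request number from Linux's <linux/fs.h>: _IOW(0x94, 9, int)
FICLONE = 0x40049409


def reflink_or_copy(src: str, dst: str) -> bool:
    """Clone src to dst so both share the same extents.

    Returns True if a reflink was made, False if we fell back to an
    ordinary byte-for-byte copy (unsupported filesystem or platform).
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # The destination fd receives the ioctl; the source fd is the argument.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            pass  # no reflink support here, fall through to a normal copy
    shutil.copyfile(src, dst)
    return False
```

A reflinked copy takes no extra space until one of the files is modified, which is why it is interesting for a NAS full of duplicate data.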

                        • stacksofplates

                          Here's an example using duperemove. It's annoyingly verbose, so I can't get the output and the command in the same screenshot.

                          [Screenshot: cp noref.png]

                          I ran

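                          # -h: human-readable sizes, -d: dedupe matching extents, -r: recurse into directories; --hashfile keeps the hashes on disk
                          /tmp/duperemove/duperemove -hdr --hashfile=tmp/stuff.hash /mnt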
                          

                          And got

                          [Screenshot: afterdedup.png]
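If you want a feel for how much duplication is on a share before pointing a tool like duperemove at it, a simple report-only pass can help. The sketch below is not how duperemove works internally (it hashes at the block/extent level and asks the kernel to dedupe); this is a simplified whole-file version that groups files by size and then by SHA-256, and it changes nothing on disk. The function name is made up for illustration.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicate_files(root: str, chunk_size: int = 1 << 20) -> list[list[str]]:
    """Return groups of paths under root whose contents are byte-identical."""
    # Pass 1: group by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            by_hash[(size, digest.hexdigest())].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

On 200TB this would take a long time because every candidate file is read in full, so it makes more sense on a sample directory than on the whole array.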
