
    How do you find duplicates from Windows SMB shares using Linux

    IT Discussion
    linux duplication reporting
    • DustinB3403

      I'm just looking for a way to tally the number of duplicate files on any given share; it doesn't need to be anything fancy. Ideally it would check the files' hashes and then write a summary to a log file.

      I'm looking at fdupes (dnf install fdupes), as it might do what I want, but I'm open to suggestions.
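For comparison, the hash-then-summarize idea can be sketched in a few lines of Python. This is a minimal sketch, not how fdupes works internally: it assumes the share is already mounted at a local path (for example with mount.cifs), hashes every regular file with SHA-256, and treats equal hashes as duplicates without the byte-by-byte confirmation fdupes performs before reporting a match. The script name dupe_tally.py and the mount point /mnt/share are hypothetical.

```python
import hashlib
import os
import sys
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files never have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_sets(root):
    """Group regular files under root by SHA-256; return groups with 2+ members."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            try:
                by_hash[sha256_of(path)].append(path)
            except OSError:
                continue  # unreadable, or vanished mid-scan
    return [paths for paths in by_hash.values() if len(paths) > 1]


def summarize(sets):
    """One-line summary: extra copies (all but one per set) and the bytes they occupy."""
    extra = sum(len(s) - 1 for s in sets)
    reclaimable = sum(os.path.getsize(s[0]) * (len(s) - 1) for s in sets)
    return f"{extra} duplicate files in {len(sets)} set(s), {reclaimable} bytes reclaimable"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python dupe_tally.py /mnt/share >> dupes.log
    print(summarize(find_duplicate_sets(sys.argv[1])))
```

Appending (>>) rather than overwriting keeps a running history in the log if the tally is rerun later.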

      • scottalanmiller @DustinB3403

        @DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:

        I'm looking at fdupes (dnf install fdupes), as it might do what I want, but I'm open to suggestions.

        I looked it up and that's what I found as likely the best option, too.

        • IRJ @DustinB3403

          @DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:

          I'm just looking for a way to tally the number of duplicate files on any given share; it doesn't need to be anything fancy. Ideally it would check the files' hashes and then write a summary to a log file.

          I'm looking at fdupes (dnf install fdupes), as it might do what I want, but I'm open to suggestions.

          I would assume you can just redirect the command's output to a file; that should accomplish what you want in the simplest way.

          • DustinB3403 @IRJ

            @IRJ Yeah, the output part is really simple, and fdupes seems really simple too.

            I'm running fdupes -rmsHA --sameline /target > output.log now.

            I just wasn't sure if there were any better options out there.

            • DustinB3403

              @DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:

              @IRJ Yeah, the output part is really simple, and fdupes seems really simple too.

              I'm running fdupes -rmsHA --sameline /target > output.log now.

              I just wasn't sure if there were any better options out there.

              I just realized that the --sameline option can be replaced with -1 (the number one). The manual isn't clear about that, and as printed it's hard to tell whether the option is a one or a lowercase L.

              • IRJ @DustinB3403

                @DustinB3403 said in How do you find duplicates from Windows SMB shares using Linux:

                @IRJ Yeah, the output part is really simple, and fdupes seems really simple too.

                I'm running fdupes -rmsHA --sameline /target > output.log now.

                I just wasn't sure if there were any better options out there.

                You may also want to grep for specific data if the full output is too noisy.
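grep filters line by line, so on a duplicate listing it can split a set apart. A minimal Python sketch that keeps whole sets instead, assuming fdupes' default listing format (run without -m, -1/--sameline, or -S): one path per line, with a blank line between sets. The script name filter_sets.py and the .mov pattern are hypothetical.

```python
import re
import sys


def parse_sets(text):
    """Split a default fdupes listing into sets (paths one per line, blank line between sets)."""
    sets, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sets.append(current)
            current = []
    if current:  # last set may not be followed by a blank line
        sets.append(current)
    return sets


def filter_sets(sets, pattern):
    """Keep whole sets in which at least one path matches the regex."""
    rx = re.compile(pattern)
    return [s for s in sets if any(rx.search(p) for p in s)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: fdupes -r /target | python filter_sets.py '\.mov$'
    for s in filter_sets(parse_sets(sys.stdin.read()), sys.argv[1]):
        print("\n".join(s), end="\n\n")
```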

                • DustinB3403 @IRJ

                  @IRJ said in How do you find duplicates from Windows SMB shares using Linux:

                  You may also want to grep for specific data if the full output is too noisy.

                  Normally I would filter it down, but since I'm just trying to get a sense of how much potential duplication there is, filtering at this point would only skew that number.

                  • pattonb @DustinB3403

                    @DustinB3403 Some folks claim jdupes is faster. I have used both and did not see much of a difference.
                    Both work well.

                    • pattonb @pattonb

                      @pattonb To get an idea of how many dupes there are, use the following:

                      fdupes -r -m /directory (the share to scan)

                      • Dashrender

                        I wonder if this would run faster directly on the server in PowerShell instead. I'm assuming that doing this over SMB means every file has to be downloaded in order to be hashed; run locally, you get to skip the download time.
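Over SMB, every byte that gets hashed has to cross the network, so how much a tool reads matters more than CPU time. A common mitigation (which, as far as I know, fdupes and jdupes both apply by comparing file sizes first) is to hash only files whose size is shared by at least one other file, since a file with a unique size cannot have a duplicate. The sketch below only estimates how many bytes that rule would leave to read, using stat() metadata alone; the function name bytes_needing_hash is hypothetical.

```python
import os
import sys
from collections import defaultdict


def bytes_needing_hash(root):
    """Return (total bytes, bytes in files whose size is not unique).

    Only file metadata is touched, which over SMB is far cheaper than
    reading contents. Files with a unique size never need to be read.
    """
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # vanished or inaccessible mid-scan
    total = sum(size * len(paths) for size, paths in by_size.items())
    candidates = sum(size * len(paths) for size, paths in by_size.items() if len(paths) > 1)
    return total, candidates


if __name__ == "__main__" and len(sys.argv) > 1:
    total, candidates = bytes_needing_hash(sys.argv[1])
    print(f"{total} bytes on share, at most {candidates} bytes need hashing")
```

The second number is an upper bound: tools that compare partial hashes before full ones can read even less.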

                        • IRJ @Dashrender

                          @Dashrender said in How do you find duplicates from Windows SMB shares using Linux:

                          I wonder if this would run faster directly on the server in PowerShell instead. I'm assuming that doing this over SMB means every file has to be downloaded in order to be hashed; run locally, you get to skip the download time.

                          I gathered that the SMB shares are hosted on Linux, but I could be wrong.

                          If they are hosted on Windows like you are assuming, then I would agree that PowerShell would probably be most performant for this.

                          • Dashrender @IRJ

                            @IRJ said in How do you find duplicates from Windows SMB shares using Linux:

                            I gathered that the SMB shares are hosted on Linux, but I could be wrong.

                            If they are hosted on Windows like you are assuming, then I would agree that PowerShell would probably be most performant for this.

                            The title says "Windows SMB shares."

                            My guess is that Dustin is a lone wolf running a 'nix OS on his machine, and the rest of the company is using Windows. Nothing wrong with that, just my guess.

                            • JaredBusch @Dashrender

                              @Dashrender said in How do you find duplicates from Windows SMB shares using Linux:

                              The title says "Windows SMB shares."

                              My guess is that Dustin is a lone wolf running a 'nix OS on his machine, and the rest of the company is using Windows. Nothing wrong with that, just my guess.

                              His company is significantly Mac.

                              • Dashrender @JaredBusch

                                @JaredBusch said in How do you find duplicates from Windows SMB shares using Linux:

                                His company is significantly Mac.

                                Aww, that's right, he has been asking a lot of Mac questions lately.

                                • DustinB3403 @Dashrender

                                  @Dashrender said in How do you find duplicates from Windows SMB shares using Linux:

                                  Aww, that's right, he has been asking a lot of Mac questions lately.

                                  Unix questions, to be more precise, but yeah, we are a heavy Mac shop.
