Group files based on similar names within various sub folders

zelarra

Board Regular
Joined
Jan 2, 2021
Messages
70
Office Version
  1. 365
Platform
  1. Windows
Hi guys.

I'm looking for a way to improve a code I made to classify files with similar names into folders.

The code I'm commenting on is as follows:

VBA Code:
Option Explicit

Public Sub IniciarClasificar()

    Clasificar

    Borrar
    
    MsgBox "Se ha completado correctamente la tarea de clasificar los archivos en carpetas.", vbInformation, "Eventos masivos"
    
    Application.GoTo Hoja1.Range("A1"), True

End Sub

Private Sub Clasificar()

    Dim Celda               As Range
    Dim NombreArchivo       As String
    Dim NombreCarpeta       As String
    Dim fso                 As Object
    Dim RutaOrigen          As String
    Dim RutaDestino         As String
    
    On Error Resume Next
    
    For Each Celda In Hoja1.Range("A1").CurrentRegion.Cells

        RutaOrigen = Celda.Value
        
        NombreArchivo = Right(Celda.Value, Len(Celda.Value) - InStrRev(Celda.Value, "\")) 'Nombre del archivo
        
        If Len(NombreArchivo) - Len(Replace(NombreArchivo, "_", "")) = 0 Then
        
            NombreCarpeta = "Varios"
              
        Else
        
            If Left(NombreArchivo, 1) = "_" Then
            
                NombreCarpeta = Split(NombreArchivo, "_")(1)

            ElseIf Left(NombreArchivo, 3) = "IMG" Or Left(NombreArchivo, 3) = "VID" Then
                
                NombreCarpeta = Split(NombreArchivo, "_")(1)

            ElseIf Left(NombreArchivo, 1) <> "_" Then
            
                NombreCarpeta = Split(NombreArchivo, "_")(0)
                                
            End If
            
        End If

        RutaDestino = Replace(RutaOrigen, NombreArchivo, "") & NombreCarpeta
    
        If Dir(RutaDestino, vbDirectory) = "" Then
    
            Set fso = CreateObject("Scripting.FileSystemObject")

            fso.CreateFolder RutaDestino

            Set fso = Nothing
    
        End If
    
        RutaDestino = RutaDestino & "\" & NombreArchivo
        
        FileCopy RutaOrigen, RutaDestino
        Kill RutaOrigen

    Next Celda
               
End Sub

Private Sub Borrar()
    
    Hoja1.Range("A:A").ClearContents
    
End Sub

Broadly speaking, I distinguish four types:

- If it is a video (because VID is in the name)

- If it is an image (because IMG is in the name)

- If it has "_".

- If it is none of the above, it is put in a junk drawer.

My problem comes with "_" and, by extension, with the others (if you notice, VID and IMG also have "_").

I would like to get it to organize the files by similar names, without having to take any reference (VID, IMG or _).

Here I copy a list of file names for you to use as an example.

accionpoeticafrases_20240126_p_3288885058160057668_1_3288885058160057668.jpg
accionpoeticafrases_20240129_p_3291197434721271638_1_3291197434721271638.jpg
adelapordiosxd_20231101_p_3226479731074067045_1_3226479720361666220.jpg
adictosdelhumor_20240309_p_3319580724477181957_1_3319580724477181957.jpg
buitrago.monologos_20240430_p_3357800092432231595_1_3357800092432231595.jpg
club_deletras_20231001_p_3203563422477877505_1_3203563422477877505.jpg
club_deletras_20231006_p_3207871459997585273_1_3207871459997585273.jpg
club_deletras_20231014_p_3213024703165492340_1_3213024703165492340.jpg
IMG_20240629_134052.jpg
PANO_20240629_134109.jpg
PANO_20240629_134117.jpg
VID_20220629_105434.mp4
VID_20230629_134058.mp4
VID_20240629_105434.mp4
VID_20240329_134058.mp4
VID_20240629_134058.mp4

Thank you very much.
 
Hello.

I'll send you the Excel again.

In column B, the result appears with the code you suggested.

In column C, you have the value that should be displayed and that your code doesn't show.

Note that with the code you suggested, what I'm getting is the name of the folder where I will save those files. Thus, you will see that in many files the word "Varios" (Miscellaneous) is repeated, which is a folder where the files do not have a common pattern, so to speak.

 
Upvote 0

Excel Facts

Which lookup functions find a value equal or greater than the lookup value?
MATCH uses -1 to find larger value (lookup table must be sorted ZA). XLOOKUP uses 1 to find values greater and does not need to be sorted.
Few lines like row 50

10kveces_20230927_p_3201265874214447789_1_3201265874214447789.jpg

I need an explanation why is should be "Varios".
Code:
Sub test()
    Dim r As Range
    With CreateObject("VBScript.RegExp")
        For Each r In Range("a1", Range("a" & Rows.Count).End(xlUp))
            .Pattern = "^\d+?_.*$"
            If .test(r) Then
                r(, 2) = "Varios"
            Else
                .Pattern = "^([a-z]+)(_[a-z]+)+\d*\..*"
                If .test(r) Then
                    r(, 2) = .Replace(r, "$1")
                Else
                    .Pattern = ".*\-(\d{4})(\d{2})(\d{2})\-.*"
                    If .test(r) Then
                        r(, 2) = .Replace(r, "$1 $2 $3")
                    Else
                        .Pattern = "(([A-Za-z\d].+?)_+\d{8,}[_.].*|[A-Z].*?_(\d{4})(\d{2})(\d{2})\d*_?.*)$"
                        If .test(r) Then
                            r(, 2) = Trim$(.Replace(r, "$2$3 $4 $5"))
                        Else
                            r(, 2) = "Varios"
                        End If
                    End If
                End If
            End If
        Next
    End With
End Sub
 
Upvote 1
Thank you very much.

You almost have it. I'm attaching the Excel and I'll tell you the four details that are missing:

1. On the one hand, I don't know why it doesn't take f.elconfidencial.com_original_.

2. The name IMG_20240309_WA0006.jpg should take 2024 03 09.

3. Both the file Screenshot_2023-09-07-20-23-16-154_com.google.android.youtube.jpg and Screenshot_2024-03-18-12-35-18-690_com.instagram.android.jpg should put the date. The crux here is that after Screenshot there is _ and not -.

4. Finally, there are two files with strange characters: chousa_53🖕_1719245389225.mp4 and La Vanguardia 🗞_1717222768119.mp4. You should ignore and remove those strange characters.

Again: thank you very much.

 
Upvote 0
Try this one
Code:
Sub test()
    Dim a, e, i&, x, flg As Boolean
    x = Array(Array("^\d+?_.*$", "Varios"), Array("^([a-z]+)(_[a-z]+)+\d*\..*", "$1"), _
          Array(".*[_-](\d{4})[_-](\d{2})[_-](\d{2})[_-].*", "$1 $2 $3"), _
          Array("^[A-Z]+[_-].*?_?(\d{4})(\d{2})(\d{2}).*", "$1 $2 $3"), _
          Array("^[a-z\d]+?(_\d{1,2}\D*)+\.[a-z]{3,5}$", "Varios"), _
          Array("(([A-Za-z\d].+?)_+\d{8,}[_.].*|[A-Z].*?_(\d{4})(\d{2})(\d{2})\d*_?.*)$", "$2$3 $4 $5"), _
          Array("^(.+?)_\d.*", "$1"))
    With Range("a1", Range("a" & Rows.Count).End(xlUp))
        a = .Value2
        With CreateObject("VBScript.RegExp")
            .Global = True
            For i = 1 To UBound(a, 1)
                .Pattern = "[^\u0000-\u0fff]"
                a(i, 1) = .Replace(a(i, 1), "")
                For Each e In x
                    .Pattern = e(0)
                    If .test(a(i, 1)) Then
                        If e(1) Like "*$*" Then
                            a(i, 1) = Trim$(.Replace(a(i, 1), e(1)))
                        Else
                            a(i, 1) = e(1)
                        End If
                        flg = True: Exit For
                    End If
                Next
                If Not flg Then a(i, 1) = "Varios"
                flg = False
            Next
        End With
        .Columns(2) = a
    End With
End Sub
 
Upvote 1
Solution
Hi, @Fuji

If I move the column containing the name to be extracted from the folder, for example, from column A to column C, do I need to modify anything in the code?

Thank you very much.
 
Upvote 0
Rich (BB code):
    With Range("a1", Range("a" & Rows.Count).End(xlUp))
This is the column in which the raw data resides. i.e. Col.A (base column)
Rich (BB code):
        .Columns(2) = a
This is the output column, i.e Col.B.
If you change it to 1, it will overwrite the base column as 1 is the column itself.
As you increase/decrease it, it moves to left/right.
e.g
3 = 3rd column from base column, means Col.C.
4 =4th column from base column, mease Col.D
If the base column is E;
0 = 1 left from base column, means Col.D
-1 = Col.C
etc.
HTH.
 
Upvote 0
I get an error if there is only one register:

1720241051619.png


1720241062474.png


1720241074333.png
 
Upvote 0
When the data is in only one cell, variable a is not an array, so change
Rich (BB code):
    With Range("c1", Range("c" & Rows.Count).End(xlUp))
        a = .Value2
to
Rich (BB code):
    With Range("c1", Range("c" & Rows.Count).End(xlUp))
        a = .Resize(, 2).Value2
So that variable a is always an array.
 
Upvote 0

Forum statistics

Threads
1,221,283
Messages
6,159,018
Members
451,531
Latest member
Ian Denney

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top