How to scan streaming data in multiple paired CSV files

Ed Harris

Board Regular
Joined
Dec 9, 2017
Messages
63
Office Version
  1. 2010
Platform
  1. Windows
Hello,

I am trying to scan some streaming data and compare the results for anomalies.

The data in the pairs of files corresponds by file number and the two file sets are in separate folders.

I currently copy the data from the two CSV files and paste into Excel with the text import wizard. Then run two VBA macros to highlight any anomalies in columns 2 to 4 and 13 to 15 with yellow cell fill. I then use find in excel to locate cells with the yellow fill.

This is not sustainable as I am producing a pair of files every 2 hours 24/7.

An additional prolem is that some of the files have no anomalies in them so are currently of no interest and creating an excel file to look at them is going to be a waste of data storage space.

Any highlighed cells are potentially interesting but the result I am most interested in is if there are anomalies in any two columns that are in the same row. But near rows also, in case the synchronisation of the files becomes slightly off.



Any suggestions how to streamline this process would be very much appreciated. A first step might be to fix the VBA code so that it doesnt highlight the longer rows and fail at the last row



Code to highlight anomalies in the data that are 0.004 greater or less than the cells either side.

VBA Code:
Sub TestyellowColumn1314() ' searches for cells that are higher or lower than their column neighbours and
'colours them yellow (works ok except for last row)Change "j" for relevant columns

   Dim i As Long, j As Long
   Dim lastRow As Long
   Dim InArr As Variant
   Dim DeltaLeft As Double
   Dim DeltaRight As Double
   
   Const Tolerance = 0.004
   
   lastRow = Cells(Rows.Count, "A").End(xlUp).Row
   InArr = Range(Cells(1, 1), Cells(lastRow, 80))
   
   
   For i = 4 To lastRow
      For j = 2 To 4
      'For j = 13 To 15
         DeltaLeft = Abs(InArr(i, j) - InArr(i - 1, j))
         
         If Not IsEmpty(InArr(i - 1, j)) Then
            DeltaRight = Abs(InArr(i, j) - InArr(i + 1, j))
         Else
            DeltaRight = 0
         End If
         
         If DeltaRight >= Tolerance And DeltaLeft >= Tolerance Then
         With Range(Cells(i, j), Cells(i, j)).Interior
            .Pattern = xlSolid
            .PatternColorIndex = xlAutomatic
            .Color = RGB(255, 255, 0) 'yellow
            .TintAndShade = 0
            .PatternTintAndShade = 0
         End With
         
         End If
      Next j
   Next i

End Sub





This is an example of the data but each block between may be 6000 rows.

0110016.1
0.52142171.0969972.6075773.46169112.556469.5516269.5517769.55188112.4397
0.5189571.0970572.6071673.46151112.656569.5518569.5516669.55194112.5405
0.5189571.0969772.6065873.46164112.756469.5519669.5516669.55171112.6407
0.51153671.0967772.6067273.46151112.85669.5518269.5518769.55154112.7422
0.51647971.0966572.6068973.46166112.956269.5515569.5520169.55164112.8422
45715.9545715.9545715.9545715.950.0000845715.9545715.9545715.9545715.950601.244245715.9545715.95601.1686
0.5189571.0966672.6066973.46155113.057869.5515669.5517569.55193112.9431
0.51400871.0966472.6064273.46169113.157569.551869.5515869.55158113.0424
0.52142171.0970172.6063273.46166113.258669.5574969.5506569.55287113.1432
0.52389271.0969872.6064973.46162113.359869.5530469.5529669.55114113.2435
0.51647971.0969972.6066473.46152113.459269.5513869.5527969.55188113.3435
45715.9545715.9545715.9545715.950.0001145715.9645715.9645715.9645715.960601.264845715.9545715.96602.2369
 
Please try,
VBA Code:
Sub testAnomalies()
    Dim wsSource As Worksheet
    Dim wsSummary As Worksheet
    Dim i As Long, j As Long
    Dim lastRow As Long
    Dim InArr As Variant
    Dim DeltaLeft As Double, DeltaRight As Double
    Const Tolerance As Double = 0.004
    Dim anomalyCount As Integer
    Dim summaryRow As Long
    Dim flatRow As Boolean

    Set wsSource = ActiveSheet
    lastRow = wsSource.Cells(wsSource.Rows.Count, "B").End(xlUp).Row
    InArr = wsSource.Range(wsSource.Cells(1, 1), wsSource.Cells(lastRow, 80)).Value

    On Error Resume Next
    Set wsSummary = Sheets("Anomalies")
    If wsSummary Is Nothing Then
        Set wsSummary = Sheets.Add
        wsSummary.Name = "Anomalies"
    Else
        wsSummary.Cells.Clear
    End If
    On Error GoTo 0

    summaryRow = 1

    For i = 2 To lastRow - 1
        anomalyCount = 0
        flatRow = True

        For j = 2 To 4
            If InArr(i, j) <> InArr(i, j + 1) Then flatRow = False
        Next j
        For j = 13 To 14
            If InArr(i, j) <> InArr(i, j + 1) Then flatRow = False
        Next j
        If flatRow Then GoTo SkipRow

        For j = 2 To 4
            If IsNumeric(InArr(i, j)) And IsNumeric(InArr(i - 1, j)) And IsNumeric(InArr(i + 1, j)) Then
                DeltaLeft = Abs(InArr(i, j) - InArr(i - 1, j))
                DeltaRight = Abs(InArr(i, j) - InArr(i + 1, j))
                If DeltaLeft >= Tolerance And DeltaRight >= Tolerance Then
                    anomalyCount = anomalyCount + 1
                    wsSource.Cells(i, j).Interior.Color = RGB(255, 255, 0)
                End If
            End If
        Next j

        For j = 13 To 15
            If IsNumeric(InArr(i, j)) And IsNumeric(InArr(i - 1, j)) And IsNumeric(InArr(i + 1, j)) Then
                DeltaLeft = Abs(InArr(i, j) - InArr(i - 1, j))
                DeltaRight = Abs(InArr(i, j) - InArr(i + 1, j))
                If DeltaLeft >= Tolerance And DeltaRight >= Tolerance Then
                    anomalyCount = anomalyCount + 1
                    wsSource.Cells(i, j).Interior.Color = RGB(255, 255, 0)
                End If
            End If
        Next j

        If anomalyCount >= 2 Then
            wsSource.Rows(i).Copy Destination:=wsSummary.Rows(summaryRow)
            summaryRow = summaryRow + 1
        End If

SkipRow:
    Next i
   
End Sub
 
Upvote 1
@ Sam_D_Ben,
That looks impressive, I am not quite sure how to use it. Do I have to import the csv files to a spreadsheet beforehand? I failed to mention clearly that the data example is shown how it appears already put in a worksheet from the two CSV source files. The 11 leftmost columns from one file and the four rightmost columns being from the other file.
 
Upvote 0
Yes, starting at rows 31194 and 33951.
I am wondering if I should be looking to create a list of file names and row numbers so that I can then create excel files of just the interesing csv files?
 
Upvote 0
According to the files uploaded.
When dialog appears, select File A then File B, otherwise the code will not work since the files have different formats.
Rich (BB code):
Sub test()
    Dim fn$(1), a, i&, ii&, iii&, n&, x, y(2), s$, t&, flg As Boolean, ws As Worksheet
    For i = 0 To 1
        fn(i) = Application.GetOpenFilename("TextFiles,*.txt")
        If fn(i) = "False" Then Exit Sub
    Next
    Set ws = Sheets("sheet1")   '<--- output sheet
    ws.Cells.Delete
    ReDim a(1 To Rows.Count, 1 To 1)
    For i = 0 To 1
        s = Space(FileLen(fn(i)))
        Open fn(i) For Binary As #1
            Get #1, , s
        Close #1
        x = Split(s, vbCrLf): ReDim a(1 To UBound(x) * 3, 1 To 1)
        t = IIf(i = 0, 0, 1): s = "": a(1, 1) = fn(i): n = 1
        For ii = 2 To UBound(x) - 1
            y(0) = Split(x(ii - 1), ",")
            y(1) = Split(x(ii), ",")
            y(2) = Split(x(ii + 1), ",")
            If UBound(y(2)) < 3 Then Exit For
            flg = False
            For iii = 0 To 2
                If Trim$(y(iii)(0)) Like "*:*" Then flg = True: Exit For
            Next
            If Not flg Then
                For iii = 1 To 3
                    If (Abs(Val(y(1)(iii - t)) - Val(y(0)(iii - t))) >= 0.004) * _
                        (Abs(Val(y(2)(iii - t)) - Val(y(1)(iii - t))) >= 0.004) Then
                        n = n + 4
                        a(n - 2, 1) = "Row " & ii & "," & x(ii - 1)
                        a(n - 1, 1) = "Row " & ii + 1 & "," & x(ii)
                        a(n, 1) = "Row " & ii + 2 & "," & x(ii + 1)
                        ii = ii + 2: Exit For
                    End If
                Next
            End If
        Next
        ii = ws.Evaluate(Replace("max(if(#<>"""",column(#)))", "#", ws.UsedRange.Address))
        With ws.Cells(1, ii + t + 1).Resize(n)
            .Value = a: .Cells(1) = fn(i)
            .TextToColumns .Cells(1), 1, comma:=True
            If n > 1 Then
                With .Offset(1, 2 - t).Resize(, 3)
                    .FormatConditions.Delete
                    .FormatConditions.Add 2, Formula1:= _
                    "=and(count(" & .Cells(0, 1).Resize(3).Address(0, 0) & ")=3," & _
                    "abs(" & .Cells(1).Address(0, 0) & "-" & .Cells(0, 1).Address(0, 0) & ")>=0.004" & _
                    ",abs(" & .Cells(2, 1).Address(0, 0) & "-" & .Cells(1).Address(0, 0) & ")>=0.004)"
                    .FormatConditions(.FormatConditions.Count).Interior.Color = vbYellow
                End With
            End If
        End With
    Next
End Sub
 
Upvote 1
Solution
@Fuji, Thats a great help. It will save me hours. I am often stunned by the generosity of people on this forum.
Ed
 
Upvote 0

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top