Search, concatenate, move, then delete

dguenther · Jun 18, 2011

Hi,

I am trouble doing the following in a macro:

OUTCOME: The court reversed the judgment of the district court. The court
remanded the question of the amount of costs to the district court for a further
remand to the arbitrators to make an appropriate award.
JUDGES:
*** [...***1] Joseph F. Baca, Justice. Dan Sosa, Jr., Chief Justice, Richard E.
Ransom, Justice, Kenneth B. Wilson, Justice, concur. Seth D. Montgomery,
Justice, Concurring in part, Dissenting in part. ...
2 of 27 DOCUMENTS
State v. Ybarra
No. 18,506
SUPREME COURT OF NEW MEXICO
111 N.M. 234; 804 P.2d 1053; 1990 N.M. LEXIS 415
November 28, 1990, Filed

I need excel to:

(1) search for judges in the first row
(2) concatenate everything in the row(s) after judges and before the next line that starts with a number.
(3) move concatenated rows to G1
(4) delete "judges" and everything before it EXCEPT row 1

Thanks if you can help.

Rick Rothstein · Jun 18, 2011

Does this macro do what you want (obviously run it against a copy of your file, not the original file)?

Code:

Sub ParseCases()
  Dim LastRow As Long, JUDGES As Long, DOCs As Long, NextDOCs As Long, Cell As Range
  LastRow = Cells(Rows.Count, "A").End(xlUp).Row
  Cells(LastRow + 1, "A").Value = "Ending DOCUMENTS"
  DOCs = 1
  Do
    Cells(DOCs, "A").Resize(, 5) = WorksheetFunction.Transpose(Cells(DOCs + 1, "A").Resize(5))
    JUDGES = Cells(DOCs, "A").Resize(LastRow).Find("JUDGES:", LookAt:=xlWhole, SearchDirection:=xlNext, MatchCase:=True).Row
    NextDOCs = Cells(DOCs, "A").Resize(LastRow).Find(" DOCUMENTS", LookAt:=xlPart, SearchDirection:=xlNext, MatchCase:=True).Row
    Cells(DOCs, "F").Value = Join(WorksheetFunction.Transpose(Range(Cells(JUDGES + 1, "A"), Cells(NextDOCs - 1, "A"))))
    If NextDOCs > LastRow Then Exit Do
    DOCs = NextDOCs
  Loop
  Range(Cells(2, "B"), Cells(LastRow, "B")).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
  LastRow = Cells(Rows.Count, "A").End(xlUp).Row
  Cells(LastRow, "A").Clear
  For Each Cell In Range("A1:G" & (LastRow - 1))
    Cell.Value = Trim(Cell.Value)
  Next
End Sub

hiker95 · Jun 18, 2011

Rick Rothstein,

Nicely done. Very nice approach. This is one for the archives.

The following will depend on Danny's requirements/results/expected outcome:

Code:

  Cells(DOCs, "A").Resize(, 5) = WorksheetFunction.Transpose(Cells(DOCs + 1, "A").Resize(5))

The above line of code assumes that all DOCs are five lines, when in reality, most are 5, but some are 6 or 7 lines, like the following screenshot:

Excel Workbook

A

20

Stinbrink v. Farmers Ins. Co.

21

No. 18253

22

SUPREME COURT OF NEW MEXICO

23

111 N.M. 179; 803 P.2d 664; 1990 N.M. LEXIS 412

24

November 28, 1990, Filed

25

26

K.L. Conwell Corp. v. Albuquerque

27

No. 18984

28

SUPREME COURT OF NEW MEXICO

29

111 N.M. 125; 802 P.2d 634; 1990 N.M. LEXIS 394; 29 N.M. St.

30

B. Bull. 1108

31

November 26, 1990, Decided

32

November 26, 1990, Filed

33

34

Grantland v. Lea Regional Hosp.

35

No. 19230

36

SUPREME COURT OF NEW MEXICO

37

110 N.M. 378; 796 P.2d 599; 1990 N.M. LEXIS 273

38

August 23, 1990, Decided

39

August 20, 1990, Filed

Sheet1

Rick Rothstein · Jun 18, 2011

Thank you for your nice comment.

I completely missed that there were longer sections (a usual problem when only one example output is given by the OP... we, well at least I, always assume the rest of the file is identically structured). In this case, Danny will have to tell us how to handle the varying length data areas. I assumed he wanted aligned columns, but that would mean additional (and I'm thinking possibly very messy) parsing. For example, the 4 of 27 and 6 of 27 documents each have two dates in a row in their data sections... do they get concatenated and placed in Column E (likely) or spread out in the parsed row resulting in a staggered alignment of the information (doubtful)? And what about the 3 of 27 document... how is the "leaked over" names handled?

Danny... you need to clarify this for us.

dguenther · Jun 19, 2011

Rick Rothstein,

You are fantastic. No, that's an understatement- you are truly awesome! Thanks so much!!!

It worked like a dream for the one sheet!!
-

But, perhaps like the problem hiker95 was referring to, it ran into an error in what I thought was a similar sheet.

See, the utility of this code will be that it will help me run these parsers on hundreds of .xlsm files. Which in turn will have anywhere from 10 to 100 cases in them. So, we are talking about 1,000s of cases (and probably weeks of time you are saving me!!!!!) Almost all of them should be the same.

So, I pulled up another "testable" file. It's called il cases.xlsm
I ran the macro and it ran into an error.

This is the link to the sheet called ilcases.xlsm:
https://my.syncplicity.com/share/1i2hdihrzl/il_cases.XLSM

Pics of the error message:

-
Now as to your question about cases with irregular structure, the answer is what ever can most consistently be applied. Actually what is easiest. The only things that absolutely HAVE TO be included in full is the docket number which is always after the case name, and all rows of "judges" which always end before the next occurrence of X of XX documents.

Everything related to one case must be in 1 row.

The first cell must be the case name, the second cell the docket number, etc.
For the possible weird fields like "date" and the "citation" that could spill over....that's tough...for any field that is not the case name, the docket number, or the listing of the judges, it can just be the first line OR both lines concatenated into the right cell.

I hope that helps a little.
Again thanks for everything.
-Danny

Rick Rothstein · Jun 19, 2011

Well, thank you for your nice comments, that was nice of you to post them. Give me a little time... I'm working on modification to my code which will handle multipli-lined case names, citations, dates and judges. I'll handle them by concatenating the multiple lines into a single line. I'm using a semi-colon as the delimiter for the dates and a single space for all the other delimiter (hopefully, that will be okay). Oh, and I am pretty sure I know why the error was generated in that other file, so I'll fix that in the process.

dguenther · Jun 19, 2011

Rick,

You are amazing. I've PM'ed you.

Sincerely,
Danny

Rick Rothstein · Jun 19, 2011

Okay, I think I finally got all the kinks worked out. There were a few anomolies in your file structure that took some work to figure out. First, there were several non-breaking spaces (ASCII 160) in your file which got in the way of my attempts to trim the leading/trailing spaces out of the rows. Second, there were some citation that looked like dates to VB's IsDate function (Dec. 176 for example) and that screwed things up royally for awhile. Then the one that gave me the biggest problem to work around was that not all your docket numbers started with "No."... some started with "Nos.". Once I spotted this anomolies in your structure, I was able to figure out a work-around to handle them. I think the following code works correctly, but you should check it against several known good conversions just to make sure.

Code:

Sub ParseCases()
  Dim X As Long, LastRow As Long, NO As Long, NOS As Long, FIRSTDATE As Long, LASTDATE As Long, SUMMARY As Long, JUDGES As Long, DOCs As Long, NextDOCs As Long, Cell As Range
  '  Get last row of file
  LastRow = Cells(Rows.Count, "A").End(xlUp).Row
  '  I noted some non-breaking spaces in one of the files (ASCII-160), so let's convert them to real spaces
  Range("A1:A" & LastRow).Replace Chr$(160), " ", xlPart
  '  Get rid of leading/trailing spaces
  For Each Cell In Range("A1:A" & LastRow)
    If Left(Cell.Value, 1) = " " Then Cell.Value = Trim(Cell.Value)
  Next
  '  Put an ending marker for the Do..Loop to find (the Nos. marker is needed in order for the docket numbers check code to be able to work correctly)
  Cells(LastRow + 1, "A").Value = "Nos. Ending DOCUMENTS"
  '  Combine multiple "to be saved" lines into single lines
  DOCs = 1
  Do
    '  Get docket number (there are two ways that cell can start.... No. or Nos.)
    NO = Cells(DOCs, "A").Resize(LastRow).Find("No. *", LookAt:=xlWhole, SearchDirection:=xlNext, MatchCase:=True).Row
    On Error Resume Next
    NOS = Cells(DOCs, "A").Resize(LastRow).Find("Nos. *", LookAt:=xlWhole, SearchDirection:=xlNext, MatchCase:=True).Row
    On Error GoTo 0
    '  Use the minimum one
    If NOS < NO Then NO = NOS
    '  Get JUDGES row number
    JUDGES = Cells(DOCs, "A").Resize(LastRow).Find("JUDGES:", LookAt:=xlWhole, SearchDirection:=xlNext, MatchCase:=True).Row
    '  Get first date line
    For X = NO + 3 To JUDGES
      If IsDate(Left(Cells(X, "A").Value, InStr(Cells(X, "A").Value, ",")) & " 2000") Then
        FIRSTDATE = X
        Exit For
      End If
    Next
    '  Get SUMMARY row number
    SUMMARY = Cells(DOCs, "A").Resize(LastRow).Find("CASE SUMMARY:", LookAt:=xlWhole, SearchDirection:=xlNext, MatchCase:=True).Row
    '  Get last date line
    For LASTDATE = SUMMARY To NO Step -1
      If IsDate(Left(Cells(LASTDATE, "A").Value, InStr(Cells(LASTDATE, "A").Value, ",")) & " 2000") Then Exit For
    Next
    '  Combine multiple date lines
    If LASTDATE > FIRSTDATE Then
      Cells(FIRSTDATE, "A").Value = Join(WorksheetFunction.Transpose(Range(Cells(FIRSTDATE, "A"), Cells(LASTDATE, "A"))), "; ")
      Range("A" & (FIRSTDATE + 1) & ":A" & LASTDATE).Clear
    End If
    '  Combine multiple lines for the "case names"
    If NO - DOCs > 2 Then
      Cells(DOCs + 1, "A").Value = Join(WorksheetFunction.Transpose(Range(Cells(DOCs + 1, "A"), Cells(NO - 1, "A"))))
      Range("A" & (DOCs + 2) & ":A" & (NO - 1)).Clear
    End If
    '  Combine multiple lines for the "citations"
    If FIRSTDATE - NO > 3 Then
      Cells(NO + 2, "A").Value = Join(WorksheetFunction.Transpose(Range(Cells(NO + 2, "A"), Cells(FIRSTDATE - 1, "A"))))
      Range("A" & (NO + 3) & ":A" & (FIRSTDATE - 1)).Clear
    End If
    '  Get next DOCUMENT row number
    NextDOCs = Cells(DOCs, "A").Resize(LastRow).Find(" DOCUMENTS", LookAt:=xlPart, SearchDirection:=xlNext, MatchCase:=True).Row
    '  Combine multiple lines for the JUDGES
    If NextDOCs - JUDGES > 2 Then
      Cells(JUDGES + 1, "A").Value = Join(WorksheetFunction.Transpose(Range(Cells(JUDGES + 1, "A"), Cells(NextDOCs - 1, "A"))))
      Range("A" & (JUDGES + 2) & ":A" & (NextDOCs - 1)).Clear
    End If
    '  Check to see if we are at the end of file
    NextDOCs = Cells(DOCs, "A").Resize(LastRow).Find(" DOCUMENTS", LookAt:=xlPart, SearchDirection:=xlNext, MatchCase:=True).Row
    If NextDOCs > LastRow Then Exit Do
    '  Update the DOCs variable and run next loop
    DOCs = NextDOCs
  Loop
  Columns("A").SpecialCells(xlCellTypeBlanks).EntireRow.Delete
  '  The file has been normalized as to its structure, so let's run the code I originally posted to parse it
  '  ========================================================================================================
  '  Get updated LastRow (remember, we might have deleted some rows earlier)... the minus one excludes the "Ending DOCUMENTS" flag we placed after the last row
  LastRow = Cells(Rows.Count, "A").End(xlUp).Row - 1
  '  Now that each DOCUMENTS page is structured the same, lets move the data into the correct columns
  DOCs = 1
  Do
    '  Fill Columns A thru E with data
    Cells(DOCs, "A").Resize(, 5) = WorksheetFunction.Transpose(Cells(DOCs + 1, "A").Resize(5))
    '  Fill Column F with the judges
    JUDGES = Cells(DOCs, "A").Resize(LastRow).Find("JUDGES:", LookAt:=xlWhole, SearchDirection:=xlNext, MatchCase:=True).Row
    Cells(DOCs, "F").Value = Cells(JUDGES + 1, "A").Value
    '  Get the next DOCUMENTS cells
    NextDOCs = Cells(DOCs, "A").Resize(LastRow).Find(" DOCUMENTS", LookAt:=xlPart, SearchDirection:=xlNext, MatchCase:=True).Row
    '  Exit the loop if it is greater than the last row (if it is, then it is the "Nos. Ending DOCUMENTS" text we put in the blank cell under the last data row to mark the end of the file.
    If NextDOCs > LastRow Then Exit Do
    '  If we didn't exit the loop, then update the DOCs variable for the next loop
    DOCs = NextDOCs
  Loop
  '  The only thing that will be in Column B is text we placed there as part of the parsing process; all other cells in the column are blanks and we use SpecialCells to locate them in order to delete the rows we already parsed
  Range(Cells(2, "B"), Cells(LastRow, "B")).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
  '  Find out the current last row (now that all the deletions are done
  LastRow = Cells(Rows.Count, "A").End(xlUp).Row
  '  And delete the end of file flag we put there
  Cells(LastRow, "A").Clear
End Sub

NOTE: The code looks longer, well, it actually is longer, but it looks longer because I added comments to the code to help explain how it works. I hope that help you out in case you have to account for any other anomolies in the file structure you may encounter in the future.

dguenther · Jun 19, 2011

Rick,

First of all, you deserve some kind of Mr. Excel of the Year award. Second, thank you! Third, I will try this code out on a couple of samples to see how it works.

Thanks again,

Danny

Rick Rothstein · Jun 19, 2011

dguenther said:
First of all, you deserve some kind of Mr. Excel of the Year award.

Aww.... shucks.... nah... I, uh, don't really think so.

dguenther said:
Second, thank you!

You are quite welcome.

dguenther said:
Third, I will try this code out on a couple of samples to see how it works.

Please, let me (and the rest of the people following this thread) know how it works out.

dguenther · Jun 19, 2011

Rick,

The code had some amazing successes, but also a couple of failures.

First, sometimes a state would list the docket as Record No. or Record Nos. I took care of that by eliminating all instances of "record" in the document.

Then there were some critical failures where the code stopped working entirely. In almost every case it happened, the debugger went to this line of your code: (specifically the second line starting with "Cells"

Code:

 If LASTDATE > FIRSTDATE Then
      Cells(FIRSTDATE, "A").Value = Join(WorksheetFunction.Transpose(Range(Cells(FIRSTDATE, "A"), Cells(LASTDATE, "A"))), "; ")
      Range("A" & (FIRSTDATE + 1) & ":A" & LASTDATE).Clear

These documents all failed to run because of that line of code:

https://my.syncplicity.com/share/u02osutza0/missourifailatdateline.XLSM
https://my.syncplicity.com/share/8ktsblgixm/washingotn_fail_at_date_line.XLSM

---
These two examples worked like a dream with only one or two final rows being slightly messed up. But in every case where the code did run, the judges were always PERFECTLY where they should be which is really the heart of this project.

https://my.syncplicity.com/share/igrfklrj1u/wi_98percentsuccess.XLSM
https://my.syncplicity.com/share/yaimihrhau/ID_99percentawesome.XLSM

--

So, I don't know if you want to keep on this, or if you are ready to pass this on to someone else. You have done soooo much, and I would completely understand if you have other things to do.

But, if you or any other forum member wants to keep going, there is one development I think is worth mentioning.

When I downloaded the files originally, I used an option to eliminate all of the blank lines.

So, files could look like
https://my.syncplicity.com/share/ecku7q1h5w/al_state_cases,_combined2011-06-19_13-46.XLSM

or

https://my.syncplicity.com/share/k8cwxqbcqi/in_state_cases,_combined2011-06-19_13-39.XLSM

with blank lines acting as sort of delimiters.

-Danny

Search, concatenate, move, then delete

dguenther

Board Regular

Rick Rothstein

MrExcel MVP

Excel Facts

hiker95

Well-known Member

Rick Rothstein

MrExcel MVP

dguenther

Board Regular

Rick Rothstein

MrExcel MVP

dguenther

Board Regular

Rick Rothstein

MrExcel MVP

dguenther

Board Regular

Rick Rothstein

MrExcel MVP

dguenther

Board Regular

Similar threads

Forum statistics

Share this page

Search, concatenate, move, then delete

Board Regular

MrExcel MVP

Excel Facts

Well-known Member

MrExcel MVP

Board Regular

MrExcel MVP

Board Regular

MrExcel MVP

Board Regular

MrExcel MVP

Board Regular

Similar threads

Forum statistics

Share this page

We've detected that you are using an adblocker.

Which adblocker are you using?

Disable AdBlock

Disable AdBlock Plus

Disable uBlock Origin

Disable uBlock