Extracting and Manipulating Text from Cells.

Living

New Member
Joined
Sep 16, 2013
Messages
12
Hi,

This could be a bit complicated, but hopefully also useful for others. We have the following sample of 10 cells from a large file with similar data:

N: 1; B: 162; M: 278, 278; T: 24; A: 103, 105; I: 393; Ḥ: 7240, 7390, 7465, 7546, 7617, 7756, 27399, 8380, 8741, 8894, 8985, 9559, 9671, 9741, 10119, 10211; W: 40; D: 766.
N: 2; B: 246, 889, 1136; M: 255, 255; A: 55; I: 286; Ḥ: 22731, 22802, 22857, 22906, 22948; D: 685.
N: 3; B: 244; M: 254; A: 49; Ḥ: 19238.
N: 4; B: 2261, 6923; M: 1733, 1824; A: 2930, 3579, 4354; Ḥ: 19167.
N: 5; Ḥ: 23683, 23811, 24404, 24609, 25483; D: 684.
N: 6; B: 888; Ḥ: 12050; D: 681.
N: 7; B: 887, 7240; M: 252; T: 22; A: 46; I: 287; Ḥ: 7294, 7364, 7457, 7794, 8928, 8941, 9264, 9308, 9612, 10240, 10318, 10487; W: 147, 148; D: 683, 1484.
N: 8; M: 253, 253; A: 51; I: 290; Ḥ: 27601, 24274, 24958, 25020, 25064, 25466.
N: 9; B: 5889, 5891, 6297; M: 257; T: 2756; A: 4198; I: 292; Ḥ: 7092, 7220, 7754, 9066, 9965; W: 1709.
N: 10; B: 5889, 5891, 6297; M: 257; T: 2756; A: 4198; I: 292; Ḥ: 7092, 7220, 7754, 9066, 9965; W: 1709.

We want to extract the same letters with its numbers and put the same letters with numbers in one column and do this for all letters, but - this is important - while keeping them in their original row. The purpose of this is addition of data and easy rearrangement of the letter plus numbers order, e.g., to: B M N A T I W D Ḥ. As you can see not all the rows have all letters, nor are the seizes the same. There are thousands of such cells. Some parts are partly in bold, we want to maintain that layout after sorting. Such a thing should be easy for computers, but how could this extraction and manipulation be done (I have Excel 2010)?

:warning: P.S. There are two spaces after every ; that were eaten up here.
 
If the cells to process are in column A, put this in column B1 then pull down and right the filling plus (or click twice):

=TRIM(MID(SUBSTITUTE($A1,";",REPT(" ",999)),FIND(INDEX({"B","M","N","A","T","I","W","D","H"},COLUMN(A$1)),SUBSTITUTE($A1&"NBMTAIHWD",";",REPT(" ",999))),999))

I had to delete your character "H" (code 63?) and enter a new "H" character (code 72).[TABLE="width: 291"]
<tbody>[TR]
[TD][/TD]
[/TR]
</tbody><colgroup><col></colgroup>[/TABLE]
[TABLE="width: 291"]
<tbody>[TR]
[TD][/TD]
[/TR]
</tbody><colgroup><col></colgroup>[/TABLE]
 
Upvote 0

Excel Facts

Highlight Duplicates
Home, Conditional Formatting, Highlight Cells, Duplicate records, OK to add pink formatting to any duplicates in selected range.
Dear István, Thank you for your reply. Your formula works (also with Ḥ), but does not preserve the bold and punctuation (; ).
I also wanted to check the order of the numbers after the letters and sort them per (letter)cell in increasing order when that's not the case, for example:

Ḥ: 7240, 7390, 7465, 7546, 7617, 7756, 27399, 8380, 8741, 8894, 8985, 9559, 9671, 9741, 10119, 10211;
to:
Ḥ: 7240, 7390, 7465, 7546, 7617, 7756, 8380, 8741, 8894, 8985, 9559, 9671, 9741, 10119, 10211, 27399;

This could be done in a separate step, as I may add data to some columns in between.

Would it be possible to preserve the (partial) bold text layout format (e.g., N: 1; ) when putting (merging) the columns back together, and how?
 
Upvote 0
Okay, I'm back with new code for you that adds the sort functionality. Delete the code I gave you earlier and replace it will all of the following...

Code:
Sub DistributeOrderedData()
  Dim X As Long, Z As Long, LastRow As Long, Position As Long, Chars As String
  Dim Order(1 To 9) As String, Parts() As String, temp() As String
  Chars = "BMNATIWD"
  LastRow = Cells(Rows.Count, "A").End(xlUp).Row
  For X = 1 To LastRow
    Erase Parts
    Parts = Split(Cells(X, "A").Value, ":")
    Erase Order
    For Z = 1 To UBound(Parts)
      Position = InStr(Chars, Right(Parts(Z - 1), 1))
      If Position = 0 Then Position = -9 * (AscW(Right(Parts(Z - 1), 1)) = 7716)
      If Position Then
        temp = Split(Trim(Split(Parts(Z), ";")(0)), ", ")
        If UBound(temp) > 0 Then
          ShellSortNumbers temp
          Order(Position) = Right(Parts(Z - 1), 1) & ": " & Join(temp, ", ")
        Else
          Order(Position) = Right(Parts(Z - 1), 1) & ": " & Trim(Split(Parts(Z), ";")(0))
        End If
        If Right(Order(Position), 1) <> "." Then Order(Position) = Order(Position) & ";  "
        Order(Position) = Replace(Order(Position), ".", ";  ")
      End If
    Next
    Columns("D").Font.Bold = True
    Parts(0) = ""
    Parts(1) = ""
    Cells(X, "B").Resize(, 9) = Order
  Next
End Sub

Sub ShellSortNumbers(vArray As Variant)
  Dim lLoop1 As Long, lHold As Long, lHValue As Long, lTemp As Long
  lHValue = LBound(vArray)
  Do
    lHValue = 3 * lHValue + 1
  Loop Until lHValue > UBound(vArray)
  Do
    lHValue = lHValue / 3
    For lLoop1 = lHValue + LBound(vArray) To UBound(vArray)
      lTemp = vArray(lLoop1)
      lHold = lLoop1
      Do While vArray(lHold - lHValue) > lTemp
        vArray(lHold) = vArray(lHold - lHValue)
        lHold = lHold - lHValue
        If lHold < lHValue Then Exit Do
      Loop
      vArray(lHold) = lTemp
    Next lLoop1
  Loop Until lHValue = LBound(vArray)
End Sub

You will still run the macro named DistributeOrderedData as you did before, the extra procedure named ShellSortNumbers (not a macro) is a subroutine that the DistributeOrderedData macro calls repeatedly in order to resort each letter's numbers (when such letter contains two or more numbers). Let me know how it worked out for you.
 
Upvote 0
Excellent, thank you very much. It works, this is truly amazing; a surprising demonstration of the to me previously unknown power of Excel.
There is a small error with the N: there are two spaces after the :, but there should be one, and the fattening should be only of the N and the number, not of the : and ; .
Do you know of a way to maintain the (partially) bold format when merging the columns back together? Else it should be possible to redo that easily in Word, I think.
 
Upvote 0
I am a little unclear about two things (see below)...


There is a small error with the N: there are two spaces after the :, but there should be one, and the fattening should be only of the N and the number, not of the : and ; .
Are you saying my code is giving you two spaces after the colon for the N column? What about after the colon in the other columns. When I run my code, I only get one space after every colon (no matter how many spaces are after the colon, whether one, two or more) which is what it sounds like you are asking for. Please clarify.



Do you know of a way to maintain the (partially) bold format when merging the columns back together? Else it should be possible to redo that easily in Word, I think.
Are you asking me to change the output in the N column so only the N and number are bold while the colon and semi-colon remain unbolded (because I can do that)? Or are you seeking advice how to get that result after "merging the columns back together" in what I assume would be a later action by you unrelated to my code? It the latter, how are you "merging the columns back together"? If the former, let me know and I'll modify my code to do that.
 
Upvote 0
As far as I can see, in the meantime Rick solved the sorting problem - this refers to your questions related with my formula only:<o:p></o:p>
<o:p></o:p>
Entering surplus spaces does not match with the formula I suggested, but there are some workarounds:<o:p></o:p>
If you attach this to the end of the former formula: &”; „ then the cells end as you need, unfortunately the later expression will appear in the blank cells, too (?). Or:<o:p></o:p>
If you change this part: ,Rept(„ „,999)to ,”;”&Rept(„ „,999) (at both places in the formula) then the cells will end with a semicolon and the needed two spaces can be inserted later if in the working area you replace ; with ;__(semicolon followed by two spaces). I wonder what the role of that end is: if you want to keep that end to be able to restore the original layout in a column after merging, you do not need to keep it, just insert this expression during merging, like this: B1&”; „&;C1… etc.<o:p></o:p>
<o:p></o:p>
When processing a formula, the original formatting is lost; if the text in column D should be bold, select that column and press Ctrl + B. Later merging will not preserve this formatting. If you plan to merge the columns, a suitable regex or VBA can reproduce the bold format where you need.
 
Last edited:
Upvote 0
Yes, strangely it gives two spaces only for the N column, the other columns are OK.

By the way, could you please separate the number sorting step to another macro, because I will add data to some columns (intermediate step) before merging them, messing up the order again.

Yes, in the end result, after merging by me. I use &, or the CONCATENATE function, but the layout is not picked up. Word 2010 has some advanced functions that will enable redoing it, although copy-pasting large files is not ideal.
 
Upvote 0
Yes, strangely it gives two spaces only for the N column, the other columns are OK.
It is not doing that for me. Perhaps one of those two spaces is not a "real" space. I am thinking it may be a non-breaking space (ASCII code 160). Try this experiment to see. Select the column with the N in it and then press CTRL+H to bring up the replace dialog box. Click the "Options>>" button and make sure no checkboxes are checked, the click into "Find what" field (clearing any text that might already be there) and hold down the ALT key while you type 0160 on the Number Pad only, leave the "Replace with" field empty and click the OK button. Did the extra space disappear? If so, I can build a filter for it into the code.


By the way, could you please separate the number sorting step to another macro, because I will add data to some columns (intermediate step) before merging them, messing up the order again.
That will be very difficult and, I think maybe, time consuming (for me to write, not the code execution time) to do... the code is set up to process each letter's number entries individually as it comes to them while looping the rows (this is done by splitting the row apart, row-by-row on the fly). To do what you want would required processing every cell individually... I'll try to look at this later to see if it really is as much of a problem as it seem right now to my mind's eye.


Yes, in the end result, after merging by me. I use &, or the CONCATENATE function, but the layout is not picked up. Word 2010 has some advanced functions that will enable redoing it, although copy-pasting large files is not ideal.
You cannot display text created by a formula with multiple character formatting like you can do with a text string constant... you will need code to create the concatenated text as a text string and then have that code do the individual characters formatting.
 
Upvote 0
You are right, it was not a normal space (also looked smaller); they have disappeared.

OK, no problem, I'll just merge the columns after adding more data and run the macro again, as Excel has problems with copying the exact format anyway; you could leave the sorting macro if it requires reprogramming.
Then I'll use Word's advanced replacing features afterwards to easily redo the bold formatting.

This is a separate issue, but would you know how to remove duplicate numbers from a (letter)cell/column (not in this particular example, but in a similar (letter)column)?
 
Upvote 0
Er, sorry, it seems my last question (about duplicates) can be ignored, but would it instead be possible to retain the bold (the first number) in the sorted list of the following example? Or perhaps to redo the bold format within the cell afterwards by some kind of comparison with another column for example? So:

B: 7001, 2788, 2799, 2877, 2894, 6282;
B: 7002, 2789, 2800, 2878, 2895, 2924, 6283;
B: 7003, 1243, 2687, 3929, 7004, 7018;
B: 7004, 1243, 2687, 3929, 7003, 7018;
B: 7005, 3292, 5747, 6984, 6986, 6995, 6996, 7044;
B: 7006, 82, 3681, 7007, 7027, 7032;
B: 7007, 82, 3681, 7006, 7027, 7032;
B: 7008, 23, 3691, 7009;
B: 7009, 23, 3691, 7008;
B: 7010, 3813, 7014;

to:

B: 2788, 2799, 2877, 2894, 6282, 7001;
B: 2789, 2800, 2878, 2895, 2924, 6283, 7002;
B: 1243, 2687, 3929, 7003, 7004, 7018;
B: 1243, 2687, 3929, 7003, 7004, 7018;
B: 3292, 5747, 6984, 6986, 6995, 6996, 7005, 7044;
B: 82, 3681, 7006, 7007, 7027, 7032;
B: 82, 3681, 7006, 7007, 7027, 7032;
B: 23, 3691, 7008, 7009;
B: 23, 3691, 7008, 7009;
B: 3813, 7010, 7014;
 
Upvote 0

Forum statistics

Threads
1,223,164
Messages
6,170,444
Members
452,326
Latest member
johnshaji

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top