Removing weird/garbage text from Excel

desibouy

Board Regular
Joined
Nov 20, 2014
Messages
98
Office Version
  1. 365
Platform
  1. Windows
Hi,
So before posting I have tried searching online and searching here, whilst I did get it resolved I've found another issue.

So my Excel files have these –, ¿½, â„¢ & ÔÇô etc. So as I said checking online it's to do with encoding, I re-saved the file as ANSI in notepad which got rid of these weird characters.
Now I'm getting "???", sometimes with a space in front " ???" or "??" Now I don't know how to get rid of these.

So my question is

1. Is there any way to quick fix these? I'll stick to the weird symbols rather than question marks. I have over 5000 Product Description. Can't tell how many I need to fix here and what to use i.e. Quote Marks, Single Apostrophe, Commas, Hyphens or something else.

2. How can I prevent this from happening? I work with a LOT of Products and I had this a lot but I've just fixed when I came across it. Same thing with UPC Codes where they auto change to like "759393994672" to "750000000002".

Any help appreciated.

Thanks
 

Excel Facts

Copy PDF to Excel
Select data in PDF. Paste to Microsoft Word. Copy from Word and paste to Excel.
I think I can help with request 1.

This algorithm uses Excel 365's dynamic array functions that 'spill' and uses TEXTJOIN, which I hope you have received in your most recent update. If you haven't, I don't know how else to do this.

It fragments the cell in question into each character, ascertains the UNICODE for each character, checks each against a list of acceptable characters, and re-constructs the components into a single cell that now lacks the unacceptable characters.

I only show here a portion of the sheet. The formulas in Rows 6 through 9 spill all the way over to Column BF. And the formulas in D18 and E18 spill all the way down to Row 116. You can edit the list of acceptable characters to include or exclude whatever range you like, but this one (i.e., between character code 28 and 126) is the common set.

The single-cell formula in C12 does it all, the others are there to show the logic trail.

MrExcel posts19.xlsx
BCDEFGHI
4topic cell:So my Excel files have these –, ¿½, â„¢ & ÔÇô etc. So as
5
6cell fragmentedSo my
7unicode for each fragment831113210912132
8is it acceptable?111111
9cleanedSo my
10recomposed cleaned cell:So my Excel files have these , , & etc. So as
11
12single-cell; recomposed cleaned cell:So my Excel files have these , , & etc. So as
13
1428
15126
16acceptable characters
17unicodeunichar
1828 
1929
2030
Sheet23
Cell Formulas
RangeFormula
D6:BI6D6=MID(C4,TRANSPOSE(SEQUENCE(LEN(C4))),1)
D7:BI7D7=UNICODE(D6#)
D8:BI8D8=(D7#>=D14)*(D7#<=D15)
D9:BI9D9=IF(D8#,D6#,"")
C10C10=TEXTJOIN("",0,D9#)
C12C12=TEXTJOIN("",0,IF((UNICODE(MID(C4,TRANSPOSE(SEQUENCE(LEN(C4))),1))>=D14)*(UNICODE(MID(C4,TRANSPOSE(SEQUENCE(LEN(C4))),1))<=D15),MID(C4,TRANSPOSE(SEQUENCE(LEN(C4))),1),""))
D18:D116D18=SEQUENCE(D15-D14+1,,D14)
E18:E116E18=UNICHAR(D18#)
Dynamic array formulas.
 
Last edited:
Upvote 0
Could you use something like this?

VBA Code:
Sub Remove_Unwanted_Characters()
  Dim RX As Object
  Dim a As Variant
  Dim i As Long
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = "[^A-Za-z0-9 ~`!@#$%^&*()_+|\-=\/\\\{\}\[\]:"";'<>,\.?]"
  With Range("C4", Range("C" & Rows.Count).End(xlUp))
    a = .Value
    For i = 1 To UBound(a)
      a(i, 1) = RX.Replace(a(i, 1), "")
    Next i
    .Offset(, 1).Value = a
  End With
End Sub

My sample data in column C with code results in column D

desibouy.xlsm
CD
41. So my Excel files have these –, ¿½, â„¢ & ÔÇô etc.1. So my Excel files have these , , & etc.
5
62. Any (very) unusual_characters here?2. Any (very) unusual_characters here?
Sheet1
 
Upvote 0
If the characters suggested by @DRSteele are the acceptable ones (code 28 to 126) then another formula approach might be this.

desibouy.xlsm
CD
41. So my Excel files have these –, ¿½, â„¢ & ÔÇô etc.1. So my Excel files have these , , & etc.
5 
62. Any (very) unusual_characters here?2. Any (very) unusual_characters here?
Sheet3
Cell Formulas
RangeFormula
D4:D6D4=LET(t,MID(C4,SEQUENCE(LEN(C4)),1),IF(C4="","",CONCAT(IF(ABS(77-CODE(t))<50,t,""))))
 
Upvote 0
I think I can help with request 1.

This algorithm uses Excel 365's dynamic array functions that 'spill' and uses TEXTJOIN, which I hope you have received in your most recent update. If you haven't, I don't know how else to do this.

It fragments the cell in question into each character, ascertains the UNICODE for each character, checks each against a list of acceptable characters, and re-constructs the components into a single cell that now lacks the unacceptable characters.

I only show here a portion of the sheet. The formulas in Rows 6 through 9 spill all the way over to Column BF. And the formulas in D18 and E18 spill all the way down to Row 116. You can edit the list of acceptable characters to include or exclude whatever range you like, but this one (i.e., between character code 28 and 126) is the common set.

The single-cell formula in C12 does it all, the others are there to show the logic trail.

MrExcel posts19.xlsx
BCDEFGHI
4topic cell:So my Excel files have these –, ¿½, â„¢ & ÔÇô etc. So as
5
6cell fragmentedSo my
7unicode for each fragment831113210912132
8is it acceptable?111111
9cleanedSo my
10recomposed cleaned cell:So my Excel files have these , , & etc. So as
11
12single-cell; recomposed cleaned cell:So my Excel files have these , , & etc. So as
13
1428
15126
16acceptable characters
17unicodeunichar
1828 
1929
2030
Sheet23
Cell Formulas
RangeFormula
D6:BI6D6=MID(C4,TRANSPOSE(SEQUENCE(LEN(C4))),1)
D7:BI7D7=UNICODE(D6#)
D8:BI8D8=(D7#>=D14)*(D7#<=D15)
D9:BI9D9=IF(D8#,D6#,"")
C10C10=TEXTJOIN("",0,D9#)
C12C12=TEXTJOIN("",0,IF((UNICODE(MID(C4,TRANSPOSE(SEQUENCE(LEN(C4))),1))>=D14)*(UNICODE(MID(C4,TRANSPOSE(SEQUENCE(LEN(C4))),1))<=D15),MID(C4,TRANSPOSE(SEQUENCE(LEN(C4))),1),""))
D18:D116D18=SEQUENCE(D15-D14+1,,D14)
E18:E116E18=UNICHAR(D18#)
Dynamic array formulas.
Hi @DRSteele,

Sorry got busy yesterday with another issue that I had to sort out ASAP, I had a read through however (and some more today) and not going to lie but I couldn't wrap my small head around it lol.

I can only guess what is happening in the logic trail and it does look amazing so thank you, I'll probably just try and stick to C12.
My question is D14, like do I need to create this cell somewhere in my file?


=TEXTJOIN("",0,IF((UNICODE(MID(C4,TRANSPOSE(SEQUENCE(LEN(C4))),1))>=D14)*(UNICODE(MID(C4,TRANSPOSE(SEQUENCE(LEN(C4))),1))<=D15),MID(C4,TRANSPOSE(SEQUENCE(LEN(C4))),1),""))

when I created the threat I was pretty tired and panicking so my fault for leaving out an example of my workbook. No excuse really but yeah here is a snippet of my file.

Export_2022-04-09_161827.xlsx
DEFG
1Body HTMLRow #Top RowVariant SKU
2FEATURES <li>Box of 50</li><li>3-Ply</li><li>Bacterial Filtration Efficiency (BFE) %: ≥ 95%</li><li>FDA Registered</li> <li>Color: Blue</li> <li>Size: 17.5×9.5 cm</li> <li>Weight: around 3 g</li> === DESCRIPTION A adjustable nose piece, Elastic earloop <span style="box-sizing: border-box; font-family:"></span>1TRUEJTPMEL50
32
43
54
65
7FEATURES <li>Soft, Cloth-like Backing - For Added Comfort </li> <li>Breathable Side Panels</li> <li>Refastenable Tabs </li> <li>Super-Absorbent Polymer</li> <li>High Performance Acquisition Layer - Keeps Skin Dry</li> <li>Blue Dryness Strip - For Added Protection and Odor Control</li> === DESCRIPTION The Covidien Wings HL 3D Adult Brief designed for heavy incontinence with plastic-free sides and higher leg cut-outs for comfortable fit and anatomically contoured mat for maximum absorbency, dryness and odor control. It also features with a soft, durable cloth-like backing for added dignity and comfort. Its hook and loop fastening allows easy on and off application and ensures a correct fit.6TRUE004-63063/CS
87004-63064/CS
9<p>DESCRIPTION Relieve the pain from your abdomen with this 3-Panel Abdominal Binder. This abdominal Binder is made of a soft, strechable materials which provides you a firm support even when moving around. Its featured velcro closure can be easily adjusted to fit your desired size and allows you to easily access your wound quickly. The binder can also be cut down to accommodate drainage tube when needed.</p>8TRUEZRB410/EA
109ZRB411/EA
1110ZRB418/EA
12FEATURES <li> Three Positions: </li> <li> Upright </li> <li> Deep Recline </li> <li> Elevated Leg rest/Footrest </li> <li> Retractable Lock Bar - Locks Chair in Desired Position </li> <li> Self-adjusting Suspension - Prevents Bottoming Out </li> <li> Large, Blow Molded Tray </li> <li> 5" Casters </li> <li> Comfortable, Built in Headrest </li> <li> Includes Warranty On: </li> <li> Frame - Limited Lifetime </li> <li> Upholstery and Moving Parts - 1 Year </li> === DESCRIPTION Gain an extra support and relief from your back pain with this 3-Position Bariatric Geri Recliner Chair. This Heavy duty recliner is constructed with three comfortable positions; retractable lock bar, self-adjusting suspension, and a convenient tray. Its strong and durable metal frames can handle weight up to 500 lbs. This recliner can be positioned Upright, in deep recline, and elevated leg rest/footrest. It can also be easily adjusted to your desired position. The self-adjusting suspension can prevent the users from bottoming-up from the recliner and has a moisture barrier which can prevent seepage. It is equipped with a built-in headrest and padded armrests which allows you to rest comfortably. Proposition 65 Warning: This product contains a chemical known to the State of California to cause cancer, birth defects, or other reproductive harm.11TRUE002-D574EW-BR/EA
1312002-D574EW-J/EA
1413002-D574EW-R/EA
15FEATURES <li>Padded Seat for added comfort when sitting</li> <li>Removable, Hinged, Padded Backrest allows back stability</li> <li>Removable Basket perfect for carrying your personal belongings</li> <li>Durable Steel Frame that can carry weight up to 300 lbs..</li> <li>Easy-to-Use Deluxe Loop Lock to ensure users safety</li> <li>Adjustable Handle Height to allow user to choose desired height</li> <li>Includes Limited Lifetime Warranty</li> === DESCRIPTION The 4 Wheel Rollator With Fold Up Removable Back is an extremely lightweight mobility device weighing only 20 lbs.. This makes it very easy to transport in and out of cars and public transport and perfect for indoor or outdoor use. The backseat enables the user to rest comfortably when tired of walking. The Large 7.5inch castors offers easy maneuverability even in rough terrain making them great for your active, busy lifestyle. The Easy-to-Use Deluxe Loop Lock made of aluminum will ensure that the rollator will not move when user is on idle mode.<br><br>Proposition 65 Warning: This product contains a chemical known to the State of California to cause cancer, birth defects, or other reproductive harm.14TRUEDRV10257BL-1
1615DRV10257RD-1
1716
1817
1918
2019
2120
2221
2322
2423
2524
Products


Another thing I'm noticing on the end result from yourself and @Peter_SSs is that there are blanks now?
VBA Code:
So my Excel files have these , ,  &  etc.

I actually need to replace the weird symbols to the correct ones, for example Googling this ’s is actually an apostrophe, ≥ is actually ≥ else it's going to look really weird.
 
Last edited:
Upvote 0
Another thing I'm noticing on the end result from yourself and @Peter_SSs is that there are blanks now?

I actually need to replace the weird symbols to the correct ones
Well, that is a very different question to ..
"Removing weird/garbage text from Excel" per your thread title or
I re-saved the file as ANSI in notepad which got rid of these weird characters.
Now I'm getting "???", sometimes with a space in front " ???" or "??" Now I don't know how to get rid of these.
 
Upvote 0
It seems you've moved the goalposts mid match. Tinkering with text in Excel is not easy, not reliable and not transferable between requirements.

Maybe PowerQuery might be something to try here. If I come up with anything I'll get back to the thread.
 
Upvote 0
Code:
Sub Is_This_A_Start()
Dim arr, i As Long
arr = Range("A1:A" & Cells(Rows.Count, 1).End(xlUp).Row).Value
    For i = LBound(arr) To UBound(arr)
        arr(i, 1) = Replace(Replace(arr(i, 1), "’s", ","), "≥", ChrW(8804))
    Next i
Sheets("Sheet2").Range("A1").Resize(UBound(arr)) = arr
End Sub
 
Upvote 0
Well, that is a very different question to ..
"Removing weird/garbage text from Excel" per your thread title or

Yes I'm sorry. Stupid of me to not to write the full thing in detail. I do apologize.
 
Upvote 0
It seems you've moved the goalposts mid match. Tinkering with text in Excel is not easy, not reliable and not transferable between requirements.

Maybe PowerQuery might be something to try here. If I come up with anything I'll get back to the thread.

Hi,

I do apologize, I had no intention to do so. It's my fault for not explaining it properly and also not attaching a mini sheet.
 
Upvote 0

Forum statistics

Threads
1,224,823
Messages
6,181,181
Members
453,021
Latest member
Justyna P

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top