Extract allways changing (and unknown) number at end of string

strooman

Active Member
Joined
Oct 29, 2013
Messages
333
Office Version
  1. 2016
Platform
  1. Windows
I'm cycling through multiple webpages and have to find the number at the end of a string like: Page 1 of 356 . Result would be 356. Not to difficult HOWEVER . . .
Because there are hundreds of webpages this last number changes all the time ( and I do not know it). I know how to find "Page 1 of " in the webpage for there is only one instance of this string.
But how can I extract this always changing number at the right in these multiple webpages? Some examples and desired results:
Page 1 of 356 result should be 356
Page 1 of 1284 result should be 1284
Page 1 of 13 result should be 13

I cannot use the MID-function because this requires the whole string to search but at the moment of executing I do not now what that last number while be.
MyNumber = Mid("Page 1 of " (i do not know the number here) , 1, 1)

To be complete, this is the complete string within te webpage.
<span><a href="javascript://" class="popupctrl">Page 1 of 99</a></span>
In summery, find the instance "Page 1 of " in the webpage and extract the 99.
 

Excel Facts

Best way to learn Power Query?
Read M is for (Data) Monkey book by Ken Puls and Miguel Escobar. It is the complete guide to Power Query.
Perhaps this:

Excel Formula:
=TRIM(RIGHT(SUBSTITUTE(B1," ",REPT(" ",100)),99))

There may also be more elegant solutions to be had, we shall see. Edit: Of course, this requires "Page 1 of #" is the very last part of the string already... if it is not, then the formula must change.
 
Upvote 0
And perhaps this for strings where "Page 1 of #" is not at the end. Again, there is likely something much shorter, but this should get the job done.

Excel Formula:
=LEFT(MID(REPLACE(B2,FIND("Page",B2),10,"@"),FIND("@",REPLACE(B2,FIND("Page",B2),10,"@"))+1,5),FIND(" ",MID(REPLACE(B2,FIND("Page",B2),10,"@"),FIND("@",REPLACE(B2,FIND("Page",B2),10,"@"))+1,5))-1)

Also, if the max page count can be greater than 99,999, the formula may need to be adjusted.
 
Upvote 0
And if I had finished reading your post, I would have gotten to the end about the complete string... oops. So, if that is the exact string every time, here is a much simpler solution that should work:

Excel Formula:
=REPLACE(REPLACE(B4,1,FIND("Page",B4)+9,""),FIND("<",REPLACE(B4,1,FIND("Page",B4)+9,"")),11,"")

All the options I provided above, but A4 is by far the best.
Book1
AB
120Page 1 of 20
2255ABC DEF GHI Page 1 of 255 JKL MNO
3<span><a href="javascript://" class="popupctrl">Page 1 of 99</a></span>
499<span><a href="javascript://" class="popupctrl">Page 1 of 99</a></span>
Sheet3
Cell Formulas
RangeFormula
A1A1=TRIM(RIGHT(SUBSTITUTE(B1," ",REPT(" ",100)),99))
A2A2=LEFT(MID(REPLACE(B2,FIND("Page",B2),10,"@"),FIND("@",REPLACE(B2,FIND("Page",B2),10,"@"))+1,5),FIND(" ",MID(REPLACE(B2,FIND("Page",B2),10,"@"),FIND("@",REPLACE(B2,FIND("Page",B2),10,"@"))+1,5))-1)
A4A4=REPLACE(REPLACE(B4,1,FIND("Page",B4)+9,""),FIND("<",REPLACE(B4,1,FIND("Page",B4)+9,"")),11,"")
 
Upvote 1
You could also try

Excel Formula:
=FILTERXML("<x><y>"&SUBSTITUTE(SUBSTITUTE(A1,"<",">"),">","</y><y>")&"</y></x>","//y[contains(.,'Page')]")
 
Upvote 0
For just the number

Excel Formula:
=FILTERXML("<x><y>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"<"," "),">"," ")," ","</y><y>")&"</y></x>","//y[preceding::*[3]='Page']")
 
Upvote 1
Both, solutions will do the job.
First I couldn't find the FILTERXML function and I thought it wouldn't be available in my excel version (which is Dutch) but I found that the function is called XML.FILTEREN. And more important, it worked. Thanks for putting time in this one.
 
Upvote 0

Forum statistics

Threads
1,223,894
Messages
6,175,250
Members
452,623
Latest member
Techenthusiast

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top