What I am trying to build
In Sheet1 column A row2 down to x amount of rows, there will be a list of website URLs. I need the code to go through the urls and find the phone numbers and emails and place them in column B + C next to the urls, if nothing is found place a hyphen in the cell.
I have almost got this working. The code loops through a list of URLS in Sheet1 column A and pulls the phone numbers and emails, places them into column B and C. I just have 3 problems with the current code i wrote, these are stated below Problem 3 might be a simple fix.
THE NEW CODE
Private Sub CommandButton1_Click()
' Run main code
Dim wb As Workbook
Dim wsSheet As Worksheet, links As Variant, IE As Object, link As Variant
Dim rw As Long
Dim html As New HTMLDocument
Dim regxp As New RegExp, post As Object, phone_list As Object, email_list As Object
'SHEET1 as sheet with URL
Set wb = ThisWorkbook
Set wsSheet = wb.Sheets("Sheet1")
'Set IE = InternetExplorer
Set IE = CreateObject("InternetExplorer.Application")
rw = wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).Row
links = wsSheet.Range("A2:A" & rw)
'IE Open Time per page 4sec and check links on Sheet2 Column A
With IE
.Visible = True
Application.Wait (Now + TimeValue("00:00:04"))
For Each link In links
.navigate (link)
While .Busy Or .readyState <> 4: DoEvents: Wend
Set html = .document
'Application.Wait (Now + TimeValue("00:00:04"))
With regxp
.Pattern = "(?:+1)?(?:+[0-9])?(?([0-9]{3}))?[-. ]?([0-9]{3})[-. ]?([0-9]{4})"
Set phone_list = .Execute(html.body.innerHTML)
.Pattern = "[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+"
Set email_list = .Execute(html.body.innerHTML)
End With
Sheet1.Cells(Sheet1.Cells(Sheet1.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = phone_list(0)
Sheet1.Cells(Sheet1.Cells(Sheet1.Rows.Count, "C").End(xlUp).Row + 1, "C").Value = email_list(0)
''''################################ I TRIED TO FIX THE PROBLEM WITH THIS #########################
'''' ############################### TO PLACE A HYPHEN IF NOTHING IS FOUND #########################
'''' If regxp Is Nothing Then
'''' wsSheet.Cells(Sheet1.Cells(Sheet1.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = phone_list()
'''' Else
'''' wsSheet.Cells(Sheet1.Cells(Sheet1.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = phone_list(0)
'''' End If
''''
'''' If regxp Is Nothing Then
'''' wsSheet.Cells(Sheet1.Cells(Sheet1.Rows.Count, "C").End(xlUp).Row + 1, "C").Value = email_list()
'''' Else
'''' wsSheet.Cells(Sheet1.Cells(Sheet1.Rows.Count, "C").End(xlUp).Row + 1, "C").Value = email_list(0)
'''' End If
''''################################ I TRIED TO FIX THE PROBLEM WITH THIS #########################
''''################################################################################################
'navigate links
Next link
'Close IE Browser
.Quit
End With
Set IE = Nothing
End Sub
Problem 1
If there is no item to extract then the code does not go to the next url, for some reason it just stay on that page, or I get an error message. e.g website has phone number but no email the page will not navigate to the next url. I tried to fix this with an IF statement but could not get it to work.. What it should do If there is nothing to extract go to the next urls in column A
Problem 2
If the website has an invalid security certificate or the url is DEAD then the code does not navigate to the next url, it waits for a user input. If I click "NO" to state I do not wish to to navigate to this site the code crashes. If the certificate is invalid or url is DEAD then it should move to the next url, so if site has not loaded in X amount of time move to the next url. Not sure if this could also be used for problem1
I think I need something like this, but can't work it out with my code Mr Excel
Problem 3
This might just be an excel column formating issue unless I have have got the phone number expression wrong in the code. As you can see the phone numbers are not showing correct. I am not sure if excel is clearing an "0" and that is why the numbers are wrong or the phone number expression is wrong.
Thanks for having a look Please could sombody help me out on anyone of the three issues. As aways THANKS in advance.
UPDATED TODAY 24/7/2020 AT 12:56 UK TIME
I have added a better Regxp for finding phone numbers, since posting PROBLEM 3, it has improved a bit .Pattern = "(?:+1)?(?:+[0-9])?(?([0-9]{3}))?[-. ]?([0-9]{3})[-. ]?([0-9]{4})"
it is still however clipping some digits of the numbers, see image below, numbers in green where found and last digit is missing
Also Posted on Mr Excel Mr Excel.
####### Added Today Thursday 30th July 2020 4:00pm Uk time ########
I am trying an If statement so If Nothing is found then place a hyphen, see below
If email_list Is Nothing Then
'On Error Resume Next
wsSheet.Cells(Sheet1.Cells(Sheet1.Rows.Count, "c").End(xlUp).Row + 1, "c").Value = "-"
Else
On Error Resume Next
wsSheet.Cells(Sheet1.Cells(Sheet1.Rows.Count, "c").End(xlUp).Row + 1, "c").Value = email_list(0)
End If
End With
However I can Not get it to work, the On Error Resume Next allows me to move to the next url and gets rid of the first error message.
The emails NOW pull off as such, I have colour coded them for easy viewing. As you can see from the colours they are NOT next to the correct urls, This is why I was trying to place an hyphen in the cell at least then that cell would be poplulated an the date would go into the next blank cell, thus keeping everything in line.
############## Updated Today FRIDAY 31st JULY 2020 1.26PM UK time
I have fixed the problem with data NOT going into the right place, By uking the IF statement code. So now Problems 1 and 2 seem fine. Only problem 3 remains, which I though would be a simple fix LOL.
The problem was this
If regxp Is Nothing Then
It should have been
If Phone_List (0) Is Nothing Then
And
If Email_List (0) Is Nothing Then
########### UPDATED TODAY Monday 3rd August 11.45 Uk time #############
This is my workaround to overcome Problem 3 for phone numbers not pulling of correct.
I have changed the Pattern part of the code, so now it pulls the REGXP pattern from the Sheet, Sheet1.Range D1. This way I can change the regxp pattern in the cell to pull off different phone number types.
''' ########## Phone Numbers Pattern ###########
.Pattern = ThisWorkbook.Sheets("Sheet1").Range("D1")
.Global = False
.IgnoreCase = True
Set phone_list = .Execute(html.body.innerHtml)
This is the Regxp pattern I am using for now, for uk. It is in placed in Sheet1 CELL D1
(?:+1)?(?:+[0-9])?(?([0-9]{4}))?[-. ]?([0-9]{4})[-. ]?([0-9]{3}?)
If anyone has a better pattern please post.
########## Updated today Tuesday 5th August 2020 1:35 uk time ##########
I have MSXML2.ServerXMLHTTP code which works much faster, but misses a few emails and numbers. Where as the IE version I wrote and the code ANSWER written by SMTH pick up the extra emails and phone numbers. I changed the regxp patters in SMTH ANSWER to mine for better results.
If anyone knows why then please advise, otherwise SMTH code is the answer as it does the same job as mine, but is written much better.
Private Sub CommandButton2_Click()
'''######### NO IE THIS CODE IS FASTER ######
Dim wb As Workbook
Dim wsSheet As Worksheet, links As Variant, link As Variant
Dim rw As Long
Dim regxp As New RegExp, post As Object, phone_list As Object, email_list As Object
Dim Html As New HTMLDocument
''''SHEET1 as sheet with URL
Set wb = ThisWorkbook
Set wsSheet = wb.Sheets("Sheet1")
rw = wsSheet.Cells(wsSheet.Rows.Count, "A").End(xlUp).Row
links = wsSheet.Range("A2:A" & rw)
For Each link In links
'Set doc = NewHTMLDocument(CStr(link))
Set Html = NewHTMLDocument(CStr(link))
With regxp
''' ########## Phone Numbers Pattern ###########
.Pattern = "(?:+1)?(?:+[0-9])?(?([0-9]{4}))?[-. ]?([0-9]{4})[-. ]?([0-9]{3}?)" '"(?:+1)?(?:+[0-9])?(?([0-9]{3}))?[-. ]?([0-9]{3})[-. ]?([0-9]{3}?)"
.Global = False
.IgnoreCase = True
Set phone_list = .Execute(Html.body.innerHtml)
''' ########## Email Pattern ###########
.Pattern = "([a-zA-Z0-9_-.]+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([a-zA-Z0-9-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(]?)"
.Global = False
.IgnoreCase = True
Set email_list = .Execute(Html.body.innerHtml)
'''########## PHONE LIST ############# ADD TO SHEET
On Error Resume Next
If phone_list(0) Is Nothing Then
On Error Resume Next
Sheet1.Cells(Sheet1.Cells(Sheet1.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = "-"
Else
On Error Resume Next
Sheet1.Cells(Sheet1.Cells(Sheet1.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = phone_list(0)
End If
'''########## EMAIL LIST ############# ADD TO SHEET
On Error Resume Next
If email_list(0) Is Nothing Then
On Error Resume Next
Sheet1.Cells(Sheet1.Cells(Sheet1.Rows.Count, "C").End(xlUp).Row + 1, "C").Value = "-"
Else
On Error Resume Next
Sheet1.Cells(Sheet1.Cells(Sheet1.Rows.Count, "C").End(xlUp).Row + 1, "C").Value = email_list(0)
End If
E