I have a problem with my Python parser. This is part of my file:
<tr>
<td class="zeit"><div>03.12. 10:45:00</div></td>
<td class="system"><div><a target="_blank" href="detail.php?host=CG&factor=2&delay=1&Y=15">CG</div></a></td>
<td class="fehlertext"><div>System steht nicht zur Verfügung!</div></td>
</tr>
<tr>
<td class="zeit"><div>03.12. 10:10:01</div></td>
<td class="system"><div><a target="_blank" href="detail.php?host=DEXProd&factor=2&delay=5&Y=15">DEX</div></a></td>
<td class="fehlertext"><div>ssh: Connection refused Couldn't read packet: Connection reset by peer</div></td>
</tr>
<tr>
<td class="zeit"><div>03.12. 06:23:06</div></td>
<td class="system"><div><a target="_blank" href="detail.php?host=FRAUD&factor=2&delay=1&Y=15">Boni</div></a></td>
<td class="fehlertext"><div>ID Fehler</div></td>
</tr>
Now I want to extract a few pieces of information from each row:
1) DATE 2) NAME 3) ERROR
So for the first row the output should be:
03.12. 10:45:00 CG System steht nicht zur Verfügung!
I have been reading about BS4, but I have no idea how to make the Python script below do this.
-bash-3.2$ cat out2.py
from bs4 import BeautifulSoup

# Read the whole file and drop the newlines.
with open("file.txt", "r") as myfile:
    html = myfile.read().replace('\n', '')

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, "html.parser")
tag = soup.find_all('a')  # all "a" tags in a list
count = 0
passx = 0
for i in tag:
    if count > 3:
        print("-------------------------------")
        #FILE.write("-------------------------------" + "\n")
        count = 0
        passx = 0
    if passx == 0:
        print(i['href'])
        #FILE.write(i['href'] + "\n")
        passx = 1
    print(i.text)
    count = count + 1
#FILE.close()
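
For reference, here is a minimal sketch of one way the three fields could be pulled out per row with BS4. It assumes every record is a <tr> whose cells carry the classes "zeit", "system" and "fehlertext", as in the sample above. The names extract_rows and SAMPLE_HTML are hypothetical and only used for illustration, and Python's built-in "html.parser" is assumed as the parser (other parsers may treat <tr> elements outside a <table> differently).

```python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# Two rows copied from the sample file above (hypothetical constant name).
SAMPLE_HTML = """
<tr>
<td class="zeit"><div>03.12. 10:45:00</div></td>
<td class="system"><div><a target="_blank" href="detail.php?host=CG&factor=2&delay=1&Y=15">CG</div></a></td>
<td class="fehlertext"><div>System steht nicht zur Verfügung!</div></td>
</tr>
<tr>
<td class="zeit"><div>03.12. 06:23:06</div></td>
<td class="system"><div><a target="_blank" href="detail.php?host=FRAUD&factor=2&delay=1&Y=15">Boni</div></a></td>
<td class="fehlertext"><div>ID Fehler</div></td>
</tr>
"""


def extract_rows(html):
    """Return one (date, name, error) tuple per <tr> in the given HTML."""
    # Assumption: the built-in parser, which keeps <tr> without a <table>.
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        fields = []
        for css_class in ("zeit", "system", "fehlertext"):
            td = tr.find("td", class_=css_class)
            # A missing cell becomes an empty string instead of raising.
            fields.append(td.get_text(strip=True) if td else "")
        rows.append(tuple(fields))
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(" ".join(row))
```

Looking cells up by class rather than by position means the result does not depend on column order, and the mismatched </div></a> closing tags in the sample do not matter because only the cell text is read.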