I have an HTML file from which I want to extract the bounding box (BBox) of each line together with its text. After extracting each BBox and its text, I append them to a list. However, the output looks as if the first list (holding only the first line) is being appended into a second list (holding the second line as well). To better illustrate the problem, I attached a snippet of the output.
However, I want everything in one single list. The following snippet illustrates the output that I want.
Below is the simple code that I wrote:
import bs4

# Parse the HTML file
with open("1.html", "r", encoding="utf-8") as xml_input:
    soup = bs4.BeautifulSoup(xml_input, "lxml")
ocr_lines = soup.find_all("span", {"class": "ocr_line"})
# We will save the coordinates of each line and the text it contains in the lines_structure list
lines_structure = []
for line in ocr_lines:
    line_text = line.text.replace("\n", " ").strip()
    title = line["title"]
    # The coordinates of the bounding box
    x1, y1, x2, y2 = map(int, title[5:title.find(";")].split())
    lines_structure.append({"x1": x1, "y1": y1, "x2": x2, "y2": y2, "text": line_text})
    print(lines_structure)
I would really appreciate your help regarding this problem.
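The output described above (a list with one line, then a list with two lines, and so on) is what you would see if print(lines_structure) runs inside the for loop, so a likely fix is to print once after the loop has finished. Below is a minimal sketch of that idea, not a definitive implementation. It assumes each title attribute has the form "bbox x1 y1 x2 y2; ...", and it works on plain (title, text) pairs so it runs without the HTML file. In the real script those pairs would come from line["title"] and line.text. The names parse_bbox, build_lines_structure and the sample data are hypothetical.

```python
import re

# Matches the "bbox x1 y1 x2 y2" part of a title attribute (assumed layout).
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_bbox(title):
    """Return (x1, y1, x2, y2) as integers from a title string."""
    match = BBOX_RE.search(title)
    if match is None:
        raise ValueError(f"no bbox found in title: {title!r}")
    return tuple(int(value) for value in match.groups())


def build_lines_structure(items):
    """Collect one dict per (title, text) pair into a single list."""
    lines_structure = []
    for title, text in items:
        x1, y1, x2, y2 = parse_bbox(title)
        line_text = text.replace("\n", " ").strip()
        # Append inside the loop ...
        lines_structure.append(
            {"x1": x1, "y1": y1, "x2": x2, "y2": y2, "text": line_text}
        )
    # ... but hand back (and print) the list only once, after the loop.
    return lines_structure


# Hypothetical sample data standing in for the ocr_line spans.
sample = [
    ("bbox 36 92 580 122; baseline 0 -6", "First line\nof text"),
    ("bbox 36 130 410 160; baseline 0 -5", "Second line"),
]

lines_structure = build_lines_structure(sample)
print(lines_structure)
```

With the original script, the equivalent change would be to dedent the print call so that it sits at the same level as the for statement.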
question from:
https://stackoverflow.com/questions/65836667/how-to-convert-appended-list-of-strings-into-one-single-list