I'm using the Python ebook library ebooklib to modify a batch of EPUB files. A simplified version of the code is as follows.
from ebooklib import epub

# Load the existing book (input_path and output_path are defined elsewhere).
book = epub.read_epub(input_path)

# Build the page to insert.
page_add = epub.EpubHtml(title='index_add', file_name='index_add.html', lang='en')
page_add.content = u'''
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<body>
<div>
I'm a new added page
</div>
</body>
</html>
'''

# Register the page and place it second in the reading order.
book.add_item(page_add)
book.spine.insert(1, page_add)
epub.write_epub(output_path, book, {})
After running the code, a new EPUB file is generated with the new page added to it. The problem is that all of the original content in the book has lost its styling.
As we know, an EPUB file is composed of HTML files. I changed the file extension from .epub to .zip and unzipped it to get at all the HTML files. After digging into them for a while, I found the cause of the lost styles: in the original HTML files, the stylesheets are linked inside the <head> tag, but in the newly generated file everything inside <head> is gone. The original <head> looks like this:
<head>
<link href="../stylesheet.css" rel="stylesheet" type="text/css"/>
<link href="../page_styles.css" rel="stylesheet" type="text/css"/>
</head>
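The manual rename-and-unzip check described above can also be scripted, which helps when comparing many books before and after the rewrite. The following is only a minimal sketch using the standard library; `HeadLinkCollector` and `head_links_in_epub` are hypothetical names introduced here, and it assumes the HTML members are UTF-8 encoded and end in .html, .xhtml or .htm.

```python
import zipfile
from html.parser import HTMLParser


class HeadLinkCollector(HTMLParser):
    """Collect the href of every <link> tag that appears inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def head_links_in_epub(epub_file):
    """Map each HTML member of the archive to the link hrefs found in its <head>."""
    result = {}
    # An EPUB is a ZIP archive, so it can be opened without renaming it.
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                parser = HeadLinkCollector()
                # Assumption: members are UTF-8; undecodable bytes are replaced.
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                result[name] = parser.hrefs
    return result
```

Running `head_links_in_epub` on the input and output files and comparing the two dictionaries should show which pages lost their stylesheet links.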
In the ebooklib documentation, I found the following description:
When defining content you can define it as valid HTML file or just parts of HTML elements you have as a content. It will ignore whatever you have in <head> element.
I think this may be why everything inside the <head> tag was lost. I don't know why ebooklib does this. Does anyone know a way to fix it? I think my requirement is quite common: just adding a page to a lot of existing EPUB files.
Any help will be highly appreciated.
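One direction that might be worth trying, sketched here under assumptions rather than as a confirmed fix: ebooklib's `EpubHtml` exposes an `add_link` method for attaching `<link>` entries to a document, so the stylesheet links could be re-attached to every page before calling `write_epub`. The helper name `relink_stylesheets` is hypothetical, and the sketch assumes the stylesheet paths are given relative to the same root as each document's `file_name`.

```python
import posixpath


def relink_stylesheets(documents, css_paths):
    """Re-attach stylesheet links to each document.

    Assumption: every document has a `file_name` attribute and an
    `add_link(**kwargs)` method, as ebooklib's EpubHtml does.
    """
    for doc in documents:
        doc_dir = posixpath.dirname(doc.file_name) or "."
        for css in css_paths:
            # Compute the href relative to the document's own folder,
            # e.g. "../stylesheet.css" for a page stored under "text/".
            href = posixpath.relpath(css, doc_dir)
            doc.add_link(href=href, rel="stylesheet", type="text/css")


# Hypothetical usage with ebooklib (third-party, not imported here):
#   import ebooklib
#   docs = book.get_items_of_type(ebooklib.ITEM_DOCUMENT)
#   relink_stylesheets(docs, ["stylesheet.css", "page_styles.css"])
#   epub.write_epub(output_path, book, {})
```

The helper only relies on `file_name` and `add_link`, so it can be tried on a single book first to see whether the styles survive the rewrite.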
question from:
https://stackoverflow.com/questions/66061399/modify-epub-file-by-pythons-ebooklib-but-all-the-contents-inside-head-was-lo