Modifying an EPUB file with Python's ebooklib: all the contents inside <head> were lost

Question

posted Oct 6, 2021 in Technique by 深蓝 (71.8m points)

I'm using the Python ebook library ebooklib to modify a batch of EPUB files. A simplified version of the code is as follows:

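from ebooklib import epub

# Placeholder paths: point these at a real input EPUB and the desired output file.
input_path = 'input.epub'
output_path = 'output.epub'

book = epub.read_epub(input_path)

# New chapter. ebooklib rebuilds <head> when writing, so only the <body> content matters here.
# The XML declaration is omitted: ebooklib writes its own when it regenerates the page.
page_add = epub.EpubHtml(title='index_add', file_name='index_add.html', lang='en')
page_add.content = '''
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <body>
        <div>
            I'm a newly added page
        </div>
    </body>
</html>
'''
book.add_item(page_add)

# Insert the new page as the second entry of the reading order (spine).
book.spine.insert(1, page_add)

epub.write_epub(output_path, book, {})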

After running the code, a new EPUB file is generated and the new page is added to it. The problem is that all the pages from the original EPUB lose their styles.

An EPUB file is essentially a ZIP archive of (X)HTML files. I changed the file extension from .epub to .zip and unzipped it to get all the HTML files. After digging into them for a while, I found why the styles were lost: every original HTML file links its stylesheets from inside the <head> tag, but in the new file everything inside <head> is gone. The original <head> looks like this:

<head>
    <link href="../stylesheet.css" rel="stylesheet" type="text/css"/>
    <link href="../page_styles.css" rel="stylesheet" type="text/css"/>
</head>
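The inspection described above does not actually need the rename-and-unzip step: an .epub is an ordinary ZIP container, so Python's standard zipfile module can open it as-is. Below is a minimal sketch for dumping each page's <head>; head_sections is a hypothetical helper name, it assumes UTF-8 content, and it uses a rough regex rather than a real HTML parser, which is fine for a quick look.

```python
import re
import zipfile

HTML_SUFFIXES = (".xhtml", ".html", ".htm")


def head_sections(epub_file):
    """Map each (X)HTML member of an EPUB to the raw text of its <head> element.

    epub_file may be a path or a binary file object. zipfile does not care about
    the .epub extension, so no renaming or unzipping to disk is needed.
    """
    heads = {}
    with zipfile.ZipFile(epub_file) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(HTML_SUFFIXES):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            # Rough regex grab for quick inspection, not a real HTML parser.
            match = re.search(r"<head\b.*?</head>", text, re.DOTALL | re.IGNORECASE)
            # None means the page has no <head> element at all.
            heads[name] = match.group(0) if match else None
    return heads
```

Running it on the original file and on the rewritten one (for example `head_sections("book.epub")`, with a placeholder path) makes the missing <link> lines easy to compare side by side.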

In ebooklib's documentation, I found the following description:

When defining content you can define it as valid HTML file or just parts of HTML elements you have as a content. It will ignore whatever you have in <head> element.

I think this is why everything inside the <head> tag was lost, though I don't know why ebooklib does this. Does anyone know how to fix it? I think my requirement is quite common: just add a page to many existing EPUB files.

Any help will be highly appreciated.

Question from: https://stackoverflow.com/questions/66061399/modify-epub-file-by-pythons-ebooklib-but-all-the-contents-inside-head-was-lo

1 Reply

深蓝 · Answer 1 · 2021-10-06T03:07:17+0000

Author of Ebooklib here. The only proper way to do this with Ebooklib is to read the EPUB file and construct a new EPUB from scratch, cherry-picking what you need from the original file. You were never supposed to read a file, modify it, and write it back, because we always wanted to end up with a valid EPUB3, and our approach was "I will ignore all the garbage metadata and extra files, just take what I need, and keep my own folder layout".

That being said, that was for the online publishing system we worked on. Outside of that system, it does make a lot of sense to be able to do something like this. I am not sure at the moment how many changes that would require. Will take a look.

Aleksandar
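One possible workaround, not from the ebooklib author and not tested across ebooklib versions: judging from ebooklib's source, EpubHtml.get_content() rebuilds <head> from the item's title plus whatever was registered with add_link(), and read_epub() does not fill that link list from the original files. Re-registering each chapter's original <link> elements before writing should therefore bring the stylesheets back. This is a minimal sketch that uses the standard library's html.parser for the extraction; HeadLinkCollector, head_links, and restore_head_links are hypothetical names, and UTF-8 chapter encoding is assumed.

```python
from html.parser import HTMLParser


class HeadLinkCollector(HTMLParser):
    """Collect the attributes of every <link> element inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, and its default handle_startendtag()
        # forwards self-closing tags such as <link ... /> to this method.
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            # Skip valueless attributes (None), which cannot be written back as XML.
            self.links.append({name: value for name, value in attrs if value is not None})

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def head_links(markup):
    """Return a list of attribute dicts, one per <link> in the document's <head>."""
    if isinstance(markup, bytes):
        # Assumption: the chapter is UTF-8 encoded, as is usual for EPUB.
        markup = markup.decode("utf-8")
    collector = HeadLinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


def restore_head_links(book):
    """Re-register each chapter's original <head> links on its EpubHtml item.

    Call once, right after epub.read_epub(); calling it twice duplicates links.
    """
    # Imported here so head_links() can be used even where ebooklib is not installed.
    from ebooklib import epub

    for item in book.get_items():
        # The navigation document (EpubNav) is written separately by ebooklib; skip it.
        if isinstance(item, epub.EpubHtml) and not isinstance(item, epub.EpubNav):
            for attrs in head_links(item.content):
                item.add_link(**attrs)
```

Call restore_head_links(book) once, right after epub.read_epub(input_path) and before epub.write_epub(...). Only <link> elements are restored; other <head> content such as inline <style> or <meta> is still dropped. The hrefs (e.g. ../stylesheet.css) are copied verbatim, so they resolve only if the written EPUB keeps the original relative file layout.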
