Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
208 views
in Technique[技术] by (71.8m points)

python - Programmatically convert pandas dataframe to markdown table

I have a Pandas Dataframe generated from a database, which has data with mixed encodings. For example:

+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 31 | data/Nor/H?ylandet.txt  | Nor      | 2015-07-22 | H?gskolen i ?stfold er et eksempel...          | Som skuespiller har jeg b?de...                        | 253          | 15th and 16th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+

As seen there is a mix of English and Norwegian (encoded as ISO-8859-1 in the database I think). I need to get the contents of this Dataframe output as a Markdown table, but without getting problems with encoding. I followed this answer (from the question Generate Markdown tables?) and got the following:

import sys, sqlite3

db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()

rows = []
for index, row in df.iterrows():
    items = (row['date'], 
             row['path'], 
             row['language'], 
             row['shortest_sentence'],
             row['longest_sentence'], 
             row['number_words'], 
             row['readability_consensus'])
    rows.append(items)

headings = ['Date', 
            'Path', 
            'Language',
            'Shortest Sentence', 
            'Longest Sentence since', 
            'Words',
            'Grade level']

fields = [0, 1, 2, 3, 4, 5, 6]
align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'),
         ('^','^'), ('^','^')]

table(sys.stdout, rows, fields, headings, align)

However, this yields an UnicodeEncodeError: 'ascii' codec can't encode character u'xe5' in position 72: ordinal not in range(128) error. How can I output the Dataframe as a Markdown table? That is, for the purpose of storing this code in a file for use in writing a Markdown document. I need the output to look like this:

| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
|----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------|
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
| 31 | data/Nor/H?ylandet.txt  | Nor      | 2015-07-22 | H?gskolen i ?stfold er et eksempel...          | Som skuespiller har jeg b?de...                        | 253          | 15th and 16th grade   |
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Improving the answer further, for use in IPython Notebook:

def pandas_df_to_markdown_table(df):
    from IPython.display import Markdown, display
    fmt = ['---' for i in range(len(df.columns))]
    df_fmt = pd.DataFrame([fmt], columns=df.columns)
    df_formatted = pd.concat([df_fmt, df])
    display(Markdown(df_formatted.to_csv(sep="|", index=False)))

pandas_df_to_markdown_table(infodf)

Or use tabulate:

pip install tabulate

Examples of use are in the documentation.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...