I have a large dataframe that stores a lot of redundant values, which is making my data hard to handle. It has the form:
import pandas as pd
df = pd.DataFrame([["a","g","n1","y1"], ["a","g","n2","y2"], ["b","h","n1","y3"], ["b","h","n2","y4"]], columns=["meta1", "meta2", "name", "data"])
>>> df
  meta1 meta2 name data
0     a     g   n1   y1
1     a     g   n2   y2
2     b     h   n1   y3
3     b     h   n2   y4
where the name column holds the names of the new columns I would like, and the data column holds the corresponding values.
I would like to produce a dataframe of the form:
df = pd.DataFrame([["a","g","y1","y2"], ["b","h","y3","y4"]], columns=["meta1", "meta2", "n1", "n2"])
>>> df
  meta1 meta2 n1 n2
0     a     g y1 y2
1     b     h y3 y4
In my real data there are 15+ meta columns, which contain most of the data and which I don't think are particularly well suited for indexing. The point is that a lot of repeated/redundant data is currently stored in these meta columns, and I would like to produce the more compact dataframe shown above.
I have found some similar questions but can't pinpoint which operation(s) I need: pivot, reindex, stack, unstack, etc.?
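For what it's worth, here is a rough sketch of the kind of thing I imagine might work (a set_index on the meta columns plus name, followed by unstack); it gives the right shape on this toy example, but I don't know whether it is a sensible approach, or whether it will behave well once all 15+ meta columns end up in the index:

import pandas as pd

df = pd.DataFrame(
    [["a", "g", "n1", "y1"], ["a", "g", "n2", "y2"],
     ["b", "h", "n1", "y3"], ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)

# Columns that describe each group; in my real data this is 15+ columns.
meta_cols = ["meta1", "meta2"]

# Move the meta columns and "name" into the index, spread "name" out into
# columns with unstack, then flatten everything back into ordinary columns.
wide = (
    df.set_index(meta_cols + ["name"])["data"]
      .unstack("name")
      .reset_index()
      .rename_axis(None, axis=1)  # drop the leftover "name" label on the column axis
)

print(wide)
#   meta1 meta2 n1 n2
# 0     a     g y1 y2
# 1     b     h y3 y4

Is this the idiomatic way to do it, or is pivot / pivot_table better suited here?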
PS - the original index values are unimportant for my purposes.
Any help would be much appreciated.
Possibly related question: I think the following Q is related to what I am trying to do, but I can't see how to apply it, as I don't want to produce more indexes.