Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - colSums - shifted results

I wanted to calculate the sum of each column, but the results come out shifted. I have no idea how to handle it.

My data:

> head(df)

K    2  5  4  2
L    2  1  4  1
M    1  3  4  3
N    3  2  1  1
Sum  7  8  11 13

As you can see, the results are wrong: the sum of the first column appears in the second column, and the first column holds the sum of the last one. How can I fix this?

I used that code to calculate the sum:

df <- suppressWarnings(rbind(data, Sum=colSums(data[, -1])))

This is what my data looks like:

> dput(head(data,4))
structure(list(Name = structure(c(95L, 331L, 161L, 156L
), .Label = c(" 1-deoxy-D-xylulose 5-phosphate reductoisomerase ", 
" 2-cysteine peroxiredoxin B ", " 2-oxoacid dehydrogenases acyltransferase family protein ", 
" 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein ", 
" 26S proteasome, regulatory subunit Rpn7;Proteasome component (PCI) domain ", 
" 3-dehydroquinate synthase, putative ", " 3-ketoacyl-acyl carrier protein synthase I ", 
" 3-ketoacyl-acyl carrier protein synthase III ", " 4-hydroxy-3-methylbut-2-enyl diphosphate reductase ", 
" 40s ribosomal protein SA ", " 5'adenylylphosphosulfate reductase 2 ", 
" AAA-type ATPase family protein ", " ACC oxidase 2 ", " acetoacetyl-CoA thiolase 2 ", 
" ACT domain-containing small subunit of acetolactate synthase protein ", 
" actin 7 ", " actin 8 ", " adenosine kinase 1 ", " adenosine kinase 2 ", 
" adenylosuccinate synthase ", " ADP-glucose pyrophosphorylase family protein ", 
" ADP glucose pyrophosphorylase  1 ", " ADP/ATP carrier 1 ", 
" Aha1 domain-containing protein ", " alanine:glyoxylate aminotransferase ", 
" alanine:glyoxylate aminotransferase 2 ", " Alba DNA/RNA-binding protein ", 
" Aldolase-type TIM barrel family protein ", " Aldolase superfamily protein ", 
" alkenal reductase ", " allene oxide synthase ", " Alpha-S1 casein precursor - Bos taurus (Bovine).", 
" alpha/beta-Hydrolases superfamily protein ", " amidase 1 ", 
" Amino acid dehydrogenase family protein ", " ankyrin repeat-containing 2B ", 
" ankyrin repeat-containing protein 2 ", " Ankyrin repeat family protein ", 
" annexin 1 ", " annexin 7 ", " APS reductase 3 ", " Arginase/deacetylase superfamily protein ", 
" aspartate aminotransferase 1 ", " aspartate aminotransferase 2 ", 
" aspartate aminotransferase 3 ", " aspartate aminotransferase 5 ", 
" ATP-dependent caseinolytic (Clp) protease/crotonase family protein ", 
" ATP citrate lyase (ACL) family protein ", " ATP phosphoribosyl transferase 1 ", 
" ATP phosphoribosyl transferase 2 ", " ATP sulfurylase 1 ", 
" ATP synthase alpha/beta family protein ", " ATP synthase protein I -related ", 
" ATP synthase subunit alpha ", " ATP synthase subunit beta ", 
" ATPase, F0 complex, subunit B/B', bacterial/chloroplast ", 
" ATPase, F1 complex, alpha subunit protein ", " ATPase, F1 complex, gamma subunit protein ", 
" ATPase, V0/A0 complex, subunit C/D ", " ATPase, V1 complex, subunit B protein ", 
" basic transcription factor 3 ", " beta-1,3-glucanase_putative ", 
" Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily protein ", 
" binding to TOMV RNA 1L (long form) ", " branched-chain amino acid aminotransferase 5 / branched-chain amino acid transaminase 5 (BCAT5) ", 
" branched-chain aminotransferase 3 ", " branched-chain aminotransferase4 ", 
" calcium sensing receptor ", " carbonic anhydrase 1 ", " carbonic anhydrase 2 ", 
" catalase 2 ", " chaperonin-60alpha ", " chaperonin 60 beta ", 
" chlorophyll A/B binding protein 3 ", " chloroplast heat shock protein 70-1 ", 
" chloroplast heat shock protein 70-2 ", " chloroplast RNA binding ", 
" chloroplast stem-loop binding protein of  41 kDa ", " chloroplastic NIFS-like cysteine desulfurase ", 
" chorismate synthase, putative / 5-enolpyruvylshikimate-3-phosphate phospholyase, putative ", 
" cinnamyl alcohol dehydrogenase 9 ", " Citrate synthase family protein ", 
" Class I glutamine amidotransferase-like superfamily protein ", 
" Clp ATPase ", " Coatomer, alpha subunit ", " Cobalamin-independent synthase family protein ", 
" cold, circadian rhythm, and RNA binding 1 ", " cold, circadian rhythm, and rna binding 2 ", 
" Coproporphyrinogen III oxidase ", " Cupredoxin superfamily protein ", 
" Cyclophilin-like peptidyl-prolyl cis-trans isomerase family protein ", 
" cyclophilin 38 ", " cysteine synthase 26 ", " cysteine synthase C1 ", 
" Cytosol aminopeptidase family protein ", " cytosolic NADP+-dependent isocitrate dehydrogenase ", 
" D-3-phosphoglycerate dehydrogenase ", " D-cysteine desulfhydrase ", 
" D-ribulose-5-phosphate-3-epimerase ", " DegP protease 1 ", 
" Dehydrin family protein ", " delta tonoplast integral protein ", 
" desulfo-glucosinolate sulfotransferase 18 ", " Di-glucose binding protein with Kinesin motor domain ", 
" dicarboxylate diiron protein, putative (Crd1) ", " dicarboxylate transport 2.1 ", 
" dicarboxylate transporter 1 ", " Dihydrolipoamide succinyltransferase ", 
" Disease resistance protein (TIR-NBS-LRR class) family ", " DNA repair ATPase-related ", 
" DNAJ heat shock N-terminal domain-containing protein ", " Domain of unknown function (DUF3598) ", 
" dual specificity protein phosphatase (DsPTP1) family protein ", 
" edited nad9/rpl16 transcript found in intergenic region. From Philippe Giege (CNRS)", 
" edited PSBE", " eif4a-2 ", " elicitor-activated gene 3-1 ", 
" Enolase ", " epithiospecifier modifier 1 ", " epithiospecifier protein ", 
" ethylene-dependent gravitropism-deficient and yellow-green-like 2 ", 
" ethylene-forming enzyme ", " Eukaryotic aspartyl protease family protein ", 
" eukaryotic initiation factor 4A-III ", " eukaryotic translation initiation factor 2 alpha subunit ", 
" Eukaryotic translation initiation factor 2 subunit 1 ", " eukaryotic translation initiation factor 4A1 ", 
" FASCICLIN-like arabinogalactan protein 13 precursor ", " FASCICLIN-like arabinoogalactan 9 ", 
" ferredoxin-NADP(+)-oxidoreductase 1 ", " ferredoxin-NADP(+)-oxidoreductase 2 ", 
" flavanone 3-hydroxylase ", " formate dehydrogenase ", " fructose-bisphosphate aldolase 1 ", 
" fructose-bisphosphate aldolase 2 ", " FTSH protease 1 ", " FUNCTIONS IN: molecular_function unknown; INVOLVED IN: biological_process unknown; LOCATED IN: cyt", 
" GDP-D-mannose 3',5'-epimerase ", " GDSL-like Lipase/Acylhydrolase superfamily protein ", 
" germin 3 ", " Glucose-1-phosphate adenylyltransferase family protein ", 
" glutamate-1-semialdehyde-2,1-aminomutase ", " glutamate-1-semialdehyde 2,1-aminomutase 2 ", 
" glutamate-cysteine ligase ", " glutamate synthase 1 ", " glutamate:glyoxylate aminotransferase ", 
" glutamine synthase clone F11 ", " glutamine synthetase 2 ", 
" glyceraldehyde-3-phosphate dehydrogenase B subunit ", " glyceraldehyde-3-phosphate dehydrogenase C subunit 1 ", 
" glyceraldehyde-3-phosphate dehydrogenase C2 ", " glyceraldehyde-3-phosphate dehydrogenase of plastid 2 ", 
" glyceraldehyde 3-phosphate dehydrogenase A subunit ", " glyceraldehyde 3-phosphate dehydrogenase A subunit 2 ", 
" Glycine cleavage T-protein family ", " glycine decarboxylase P-protein 1 ", 
" GroES-like zinc-binding alcohol dehydrogenase family protein ", 
" GroES-like zinc-binding dehydrogenase family protein ", " GTP-binding protein-related ", 
" GTP binding ", " GTP binding Elongation factor Tu family protein ", 
" heat shock protein 70 (Hsp 70) family protein ", " Heat shock protein 70 (Hsp 70) family protein ", 
" heat shock protein 90.1 ", " high chlorophyll fluorescent 109 ", 
" high cyclic electron flow 1 ", " histidinol dehydrogenase ", 
" Histone superfamily protein ", " homolog of bacterial cytokinesis Z-ring protein FTSZ 1-1 ", 
" homoserine kinase ", " HOPW1-1-interacting 1 ", " HSP20-like chaperones superfamily protein ", 
" HXXXD-type acyl-transferase family protein ", " Hyaluronan / mRNA binding family ", 
" hydroxymethylbilane synthase ", " hydroxyproline-rich glycoprotein family protein ", 
" hydroxypyruvate reductase ", " Inositol monophosphatase family protein ", 
" Insulinase (Peptidase family M16) protein ", " Involved in response to salt stress.  Knockout mutants are hypersensitive to salt stress. ", 
" Iron-sulphur cluster biosynthesis family protein ", " isocitrate dehydrogenase subunit 2 ", 
" isocitrate dehydrogenase V ", " isocitrate dehydrogenase VI ", 
" isopropyl malate isomerase large subunit 1 ", " isopropylmalate dehydrogenase 1 ", 
" isopropylmalate dehydrogenase 2 ", " isopropylmalate dehydrogenase 3 ", 
" Keratin 1 - Homo sapiens (Human).", " Keratin 10 - Homo sapiens (Human).", 
" Keratin 14 (Epidermolysis bullosa simplex, Dowling-Meara, Koebner) - Homo sapiens (Human).", 
" Keratin 2a - Homo sapiens (Human).", " Keratin 5 - Homo sapiens (Human).", 
" Keratin, type I cytoskeletal 9 (Cytokeratin 9) (K9) (CK 9) - Homo sapiens (Human).", 
" Keratin, type II cytoskeletal 2 epidermal (Cytokeratin 2e) (K2e) (CK 2e) - Homo sapiens (Human).", 
" ketol-acid reductoisomerase ", " lactate/malate dehydrogenase family protein ", 
" Lactate/malate dehydrogenase family protein ", " Late embryogenesis abundant protein, group 2 ", 
" Leucine-rich repeat (LRR) family protein ", " Leucine-rich repeat protein kinase family protein ", 
" light-harvesting chlorophyll-protein complex I subunit A4 ", 
" light harvesting complex of photosystem II 5 ", " light harvesting complex photosystem II ", 
" light harvesting complex photosystem II subunit 6 ", " lipoxygenase 2 ", 
" magnesium chelatase i2 ", " malate dehydrogenase ", " Mannose-binding lectin superfamily protein ", 
" MAP kinase 11 ", " metallopeptidase M24 family protein ", " methionine adenosyltransferase 3 ", 
" Molecular chaperone Hsp40/DnaJ family protein ", " Molybdenum cofactor sulfurase family protein ", 
" monodehydroascorbate reductase 1 ", " monodehydroascorbate reductase 6 ", 
" Mov34/MPN/PAD-1 family protein ", " myosin heavy chain-related ", 
" NAD(P)-binding Rossmann-fold superfamily protein ", " NAD(P)-linked oxidoreductase superfamily protein ", 
" NAD(P)H dehydrogenase C1 ", " NAD(P)H dehydrogenase subunit H ", 
" NADH dehydrogenase subunit 7 ", " NagB/RpiA/CoA transferase-like superfamily protein ", 
" NDH-dependent cyclic electron flow 1 ", " nitrilase 1 ", " Nitrilase/cyanide hydratase and apolipoprotein N-acyltransferase family protein ", 
" nitrile specifier protein 3 ", " nitrite reductase 1 ", " non-ATPase subunit 9 ", 
" non-intrinsic ABC protein 6 ", " non-photochemical quenching 1 ", 
" nuclear factor Y, subunit A2 ", " Nucleic acid-binding proteins superfamily ", 
" Nucleic acid-binding, OB-fold-like protein ", " nucleobase-ascorbate transporter 7 ", 
" Nucleotidylyl transferase superfamily protein ", " O-acetylserine (thiol) lyase B ", 
" O-acetylserine (thiol) lyase isoform C ", " O-methyltransferase 1 ", 
" O-methyltransferase family protein ", " ornithine carbamoyltransferase ", 
" Oxidoreductase family protein ", " Oxidoreductase, zinc-binding dehydrogenase family protein ", 
" oxidoreductases, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor;copper ion", 
" oxophytodienoate-reductase 3 ", " P-loop containing nucleoside triphosphate hydrolases superfamily protein ", 
" Pectinacetylesterase family protein ", " Peptidase M20/M25/M40 family protein ", 
" Peptide chain release factor 1 ", " peroxidase CB ", " Peroxidase superfamily protein ", 
" peroxisomal 3-keto-acyl-CoA thiolase 2 ", " peroxisomal 3-ketoacyl-CoA thiolase 3 ", 
" peroxisomal NAD-malate dehydrogenase 2 ", " pfkB-like carbohydrate kinase family protein ", 
" phenylalanyl-tRNA synthetase class IIc family protein ", " phosphatase-related ", 
" phosphoglycerate kinase ", " phosphoglycerate kinase 1 ", " Phosphoglycerate kinase family pro


1 Reply


Noting your actual attempted solution posted in the comment to @ChristopherLouden's answer, which looks suspiciously like the solution offered by @Jilber to a question from earlier today, I can finally reproduce your problem and offer a solution.

For the sake of simplicity, here's a much smaller data.frame to start our work with. Note that the data.frame has two non-numeric columns (one character and one factor). Something as small as this is sufficient to demonstrate your problem and is much easier for others to follow.

data <- structure(list(Name = c("a", "b", "c", "d"), 
    time1 = c(6692.50136510743, 41682.9111356503, 405946.374877924, 
    4640.34876265179), time2 = c(14404.8414547167, 40466.9047986558, 
    638019.540242027, 2397.71968447607), time3 = c(10146.3608040476, 
    34148.4389867747, 459639.431186888, 10490.8359468475), 
    New = structure(1:4, .Label = c("A", "B", "C", "D"), 
    class = "factor")), .Names = c("Name", "time1", "time2", "time3", 
    "New"), class = "data.frame", row.names = c(NA, 4L))
data
#   Name      time1     time2     time3 New
# 1    a   6692.501  14404.84  10146.36   A
# 2    b  41682.911  40466.90  34148.44   B
# 3    c 405946.375 638019.54 459639.43   C
# 4    d   4640.349   2397.72  10490.84   D

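Here is your current solution, complete with the strange "shifting" of column means. The shift happens because rbind fills the new row from a plain vector by position and ignores the vector's names: the three means are recycled across all five columns, so Name receives the mean of time1, time1 receives the mean of time2, time2 the mean of time3, and time3 the mean of time1 again. The factor column New cannot take the recycled number as a level, so it becomes NA; that is the warning suppressWarnings hides.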

df <- suppressWarnings(
  rbind(data, colMeans=colMeans(data[, sapply(data, is.numeric)])))
df
#                      Name      time1     time2     time3  New
# 1                       a   6692.501  14404.84  10146.36    A
# 2                       b  41682.911  40466.90  34148.44    B
# 3                       c 405946.375 638019.54 459639.43    C
# 4                       d   4640.349   2397.72  10490.84    D
# colMeans 114740.534035333 173822.252 128606.27 114740.53 <NA>
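
The positional recycling behind that last row can be reproduced on a much smaller case. This is only a sketch on a hypothetical toy data frame; the names toy, means and shifted are illustrative and do not come from the question.

```r
# Hypothetical toy data: one character column followed by two numeric columns.
toy <- data.frame(Name = c("a", "b"),
                  x = c(1, 3),
                  y = c(10, 30),
                  stringsAsFactors = FALSE)

# Two means (x = 2, y = 20) for a data frame that has three columns.
means <- colMeans(toy[sapply(toy, is.numeric)])

# This is the vector rbind() effectively uses for the new row:
# the two means recycled to three values, taken in order.
rep_len(means, ncol(toy))

# rbind() ignores the names on `means` and fills the row left to right,
# so Name gets the mean of x, x gets the mean of y, and y gets the mean of x.
shifted <- rbind(toy, colMeans = means)
shifted
```

Because the first value lands in the character column Name, it is stored there as text, which is why the Name cell in the output above shows more digits than the numeric columns.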

The solution I'm offering makes use of rbind.fill from "plyr" to bind the results to your original data.frame. The results are calculated only on the numeric columns of your original data.frame.

library(plyr) ## For `rbind.fill`
useme <- sapply(data, is.numeric)
rbind.fill(data, data.frame(t(colMeans(data[useme]))))
#   Name      time1     time2     time3  New
# 1    a   6692.501  14404.84  10146.36    A
# 2    b  41682.911  40466.90  34148.44    B
# 3    c 405946.375 638019.54 459639.43    C
# 4    d   4640.349   2397.72  10490.84    D
# 5 <NA> 114740.534 173822.25 128606.27 <NA>

mean(data$time1) ## Just for verification...
# [1] 114740.5
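
If you would rather not load "plyr", the same idea can be sketched in base R, shown here with colSums as in the original question. This is an alternative I am adding for illustration, not part of the approach above; the toy values and the names sums, total and result are assumptions made for the example.

```r
# Hypothetical toy data with a character column, two numeric columns and a factor.
data <- data.frame(Name  = c("a", "b", "c", "d"),
                   time1 = c(2, 2, 1, 3),
                   time2 = c(5, 1, 3, 2),
                   New   = factor(c("A", "B", "C", "D")),
                   stringsAsFactors = FALSE)

useme <- sapply(data, is.numeric)

# Turn the named vector of sums into a one-row data frame, so that the
# values carry column names instead of being matched by position.
sums  <- colSums(data[useme])
total <- as.data.frame(t(sums))
rownames(total) <- "Sum"

# Add the non-numeric columns as NA, then put the columns in the original order.
total[names(data)[!useme]] <- NA
total <- total[names(data)]

# rbind() on two data frames matches columns by name, so nothing shifts.
result <- rbind(data, total)
result
```

The key difference from the failing call is that rbind receives a data frame rather than a bare numeric vector, so each sum stays attached to its own column.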
