Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
1.1k views
in Technique[技术] by (71.8m points)

postgresql - Counting chars in sequences via SQL

I have a database with a sequence table. Each (amino acid) sequence in this table is composed of 20 different characters (A, V, ...), for instance "MQSHAMQCASQALDLYD...".

I would like to count the number of occurrences of each character, so that I get something like "2xM, 3xQ, ...".

Furthermore, I would like to do this over all sequences in my DB, so that I get the overall count of each character ("248xM, 71xW, ...").

How can I do this in PostgreSQL? At the moment, I am doing it in Ruby, but I have 25,000 sequences with a length of about 400 characters each. This takes a while, and I hope it will be faster in SQL.


1 Reply

0 votes
by (71.8m points)

This is how to count all A's in a string:

select length(regexp_replace('AAADDD', '[^A]', '', 'g'));

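This is how to count all A's in a table (field and your_table are placeholders for your own column and table names; the bare word table is a reserved keyword in PostgreSQL and cannot be used unquoted):

select sum(length(regexp_replace(field, '[^A]', '', 'g'))) from your_table;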
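The queries above count one letter at a time. To get the count of every character in a single pass, as the question asks, one possible approach is to split each sequence into single characters and group them. This is only a sketch: the table name sequences, the columns id and seq, and the sample rows are hypothetical and are there just to make the example self-contained.

```sql
-- Hypothetical table, column names, and sample rows (illustration only).
CREATE TEMP TABLE sequences (id serial PRIMARY KEY, seq text);
INSERT INTO sequences (seq) VALUES ('MQSHAMQCASQALDLYD'), ('AAADDD');

-- Overall count of each character across all sequences.
-- string_to_array with a NULL delimiter splits a string into single characters;
-- unnest turns that array into one row per character.
SELECT t.ch, count(*) AS occurrences
FROM sequences
CROSS JOIN LATERAL unnest(string_to_array(seq, NULL)) AS t(ch)
GROUP BY t.ch
ORDER BY occurrences DESC, t.ch;

-- Per-sequence counts: group by the row id as well.
SELECT id, t.ch, count(*) AS occurrences
FROM sequences
CROSS JOIN LATERAL unnest(string_to_array(seq, NULL)) AS t(ch)
GROUP BY id, t.ch
ORDER BY id, t.ch;
```

Whether this is faster than running the regexp_replace query once per letter is worth measuring on the real data; it expands every sequence into one row per character before aggregating.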
