Is there an easy way to avoid dealing with text encoding problems?
If you are starting with a String, you can also do the following, naming the charset explicitly so the result does not depend on the platform default:
new ByteArrayInputStream(inputString.getBytes(StandardCharsets.UTF_8))
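As a rough illustration of the one-liner above, here is a minimal round-trip sketch. The class name `StringStreamDemo` and the helpers `toStream` and `readBack` are made up for this example, and it assumes Java 9 or later for `InputStream.readAllBytes()`. It uses the `StandardCharsets.UTF_8` constant (Java 7+), which does not throw the checked `UnsupportedEncodingException` that the `"UTF-8"` string overload of `getBytes` declares.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative only.
class StringStreamDemo {

    // Encode the string as UTF-8 and expose the bytes as a stream.
    static InputStream toStream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decode with the same charset that was used for encoding.
    // Requires Java 9+ for readAllBytes().
    static String readBack(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "na\u00efve caf\u00e9"; // contains non-ASCII characters
        String roundTripped = readBack(toStream(original));
        System.out.println(original.equals(roundTripped)); // prints "true"
    }
}
```

The point of the sketch is that the same charset is named on both the encoding and the decoding side; mismatches between the two are a common source of garbled text.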