SOLR is basically an Apache Tomcat container that implements a REST interface to query an Apache Lucene index. Yes, you need to be able to run a Java application on your web server. This is an issue for you to work out with your hosting provider.
Clients using your web app don't need to run Java. Your PHP app could make a REST query to the SOLR service and format the results in HTML. A client sees only the HTML output; it never needs to know that the data came from a service implemented in Java.
Zend_Search_Lucene
is a pure-PHP implementation that is supposed to work identically to Apache Lucene. The Zend solution even uses an identical index file format. So storage-wise they should be equal.
I used Java Lucene to index the StackOverflow data dump (October 2009). I indexed 1.5 million rows, including about 1 gig of text data. The Lucene index was 1323 MB, whereas the MySQL FULLTEXT index of the same data was only 466 MB.
Using SQL LIKE
predicates in lieu of any fulltext indexing solution requires no space of course, because it cannot make use of a conventional index anyway. But in my tests using LIKE
was about 200 times slower than Java Lucene, which was in turn about 40% slower than a MySQL FULLTEXT index on the same data.
See my recent presentation about fulltext indexing solutions with MySQL:
http://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql
It's not surprising that it can't match the performance and scalability of the Java Lucene technology. PHP's advantage as a language is increasing development efficiency, not runtime efficiency.
update: I just tried creating an index using Zend_Search_Lucene
. Creating an index is far slower with PHP than with the Java Lucene technology, so I only indexed 10,000 documents. This took almost 15 minutes, which would make it take about 36 hours to index the whole collection. Compare this to Java Lucene, which in my test indexed the full collection of 1.5 million documents in under 7 minutes.
The size of the index I created with Zend_Search_Lucene
is 8.75 MB. Extrapolating this 150x, I estimate the full index would be 1312.5 MB. So I conclude that Zend_Search_Lucene
creates an index of about the same size as the index produced by Java Lucene. This is as expected.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…