Apache Tika - How to boost a Solr document when indexing with /solr/update


To index my website, I have a Ruby script that in turn generates a shell script, which uploads every file in the document root to Solr. The shell script has many lines like this:

    curl -s \
      "http://localhost:8983/solr/update/extract?literal.id=/about/core-team/&commit=false" \
      -F "myfile=@/extra/www/docroot/about/core-team/index.html"

...and ends with:

    curl -s http://localhost:8983/solr/update --data-binary \
      '<commit/>' -H 'Content-Type: text/xml; charset=utf-8'

This uploads every document in the document root to Solr. I use Tika and the ExtractingRequestHandler to upload documents in various formats (primarily PDF and HTML) to Solr.
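
A minimal Ruby sketch of the generator step described above, assuming the docroot is /extra/www/docroot and that a document's id is its path relative to the docroot with a trailing index.html dropped (inferred from the example command, not stated in the question). doc_id and extract_command are hypothetical helper names. URI.encode_www_form and Shellwords.join are used so that ids containing spaces or & survive both the URL and the shell; the output looks different from the hand-written command but passes the same values to curl.

```ruby
require "uri"
require "shellwords"

# Assumed Solr endpoint, taken from the commands above.
SOLR_EXTRACT = "http://localhost:8983/solr/update/extract"

# Hypothetical helper: map a file path to its id (URL path). Assumes an
# index.html file is served at its directory URL, e.g.
# /extra/www/docroot/about/core-team/index.html -> /about/core-team/
def doc_id(docroot, file)
  file.delete_prefix(docroot).sub(/index\.html\z/, "")
end

# Hypothetical helper: build one curl line for the generated shell script.
# encode_www_form percent-escapes the id for the URL ("/" becomes %2F,
# which Solr decodes back); Shellwords.join escapes each argument for sh.
def extract_command(docroot, file)
  query = URI.encode_www_form("literal.id" => doc_id(docroot, file), "commit" => "false")
  url = "#{SOLR_EXTRACT}?#{query}"
  Shellwords.join(["curl", "-s", url, "-F", "myfile=@#{file}"])
end

puts extract_command("/extra/www/docroot", "/extra/www/docroot/about/core-team/index.html")
```

The generator would emit one such line per file and finish with the commit command shown above.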

In the script that generates the shell script, I would like to boost certain documents based on whether their id field (a.k.a. URL) matches certain regular expressions.

Let's say these are the boosting rules (pseudocode):

    boost = 2 if url =~ /cool/
    boost = 3 if url =~ /verycool/
    # otherwise, do not specify a boost
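
The same rules as a single Ruby method, a sketch using the hypothetical name boost_for. /verycool/ is checked first because every URL that matches /verycool/ also matches /cool/; a nil result means "do not specify a boost".

```ruby
# Hypothetical helper implementing the boosting rules above.
# case/when tests each regex against the URL in order; the first match wins,
# and no match falls through to nil.
def boost_for(url)
  case url
  when /verycool/ then 3.0
  when /cool/     then 2.0
  end
end

boost_for("/verycool/core-team/") # => 3.0
boost_for("/cool/stuff/")         # => 2.0
boost_for("/about/core-team/")    # => nil
```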

What's the simplest way to add an index-time boost to the HTTP request?

I tried:

    curl -s \
      "http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
      -F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
      -F boost=3

and:

    curl -s \
      "http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
      -F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
      -F boost.id=3

Neither made a difference in the ordering of the search results. I want the boosted results to come first, regardless of what the user searched for (provided, of course, that the document contains the query).

I understand that if I POST in XML format, I can specify a boost value for either the entire document or a specific field. But if I do that, it isn't clear how to supply a file as the document contents. The Tika page actually provides a partial example:

    curl "http://localhost:8983/solr/update/extract?literal.id=doc5&defaultField=text" \
      --data-binary @tutorial.html -H 'Content-Type: text/html'
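
For reference, the same raw-body request built with Ruby's standard library instead of curl, a sketch assuming Solr runs at localhost:8983 and that defaultField=text suits the schema. extract_request is a hypothetical helper; it only builds the request, and the commented-out lines show how it could be sent.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build (but do not send) a POST equivalent to the
# curl --data-binary example. The raw HTML is the request body and the
# Content-Type header tells the extract handler how to parse it.
def extract_request(id, html, base: "http://localhost:8983/solr/update/extract")
  uri = URI(base)
  uri.query = URI.encode_www_form("literal.id" => id, "defaultField" => "text")
  req = Net::HTTP::Post.new(uri)
  req["Content-Type"] = "text/html"
  req.body = html
  [uri, req]
end

uri, req = extract_request("doc5", "<html><body>Solr tutorial</body></html>")
# With a real file and a running Solr:
#   uri, req = extract_request("doc5", File.read("tutorial.html"))
#   Net::HTTP.start(uri.host, uri.port) { |http| puts http.request(req).code }
```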

But again, it isn't clear where or how to specify the boost. I tried:

    curl \
      "http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost=3" \
      --data-binary @mydoc.html -H 'Content-Type: text/html'

and

    curl \
      "http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost.id=3" \
      --data-binary @mydoc.html -H 'Content-Type: text/html'

Neither of these altered the search results.

Is there a way to update the boost attribute of a document (not a specific field) without altering the document contents? If so, I could accomplish my goal in two steps: 1) upload/index the documents as I have been doing, and 2) specify the boost for certain documents.

To index a document in Solr, you have to POST it to the /update handler. The documents to index go in the body of the POST request. In general, you have to use Solr's XML format. Using XML, you can add a boost value to a specific field or to the whole document.
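
Following that approach, a minimal Ruby sketch that builds an add message with a document-level boost attribute. The field names id and text are assumptions about the schema, and the text must already be extracted, since posting XML to /update does not run Tika. add_xml and xml_escape are hypothetical helper names. Index-time boosts like this apply to the older Solr releases this question concerns; they were removed in Solr 7.

```ruby
# Hypothetical helper: minimal XML escaping for text and attribute values.
# "&" is replaced first so the entities added afterwards are not re-escaped.
def xml_escape(s)
  s.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
end

# Hypothetical helper: build a Solr XML <add> message. A boost of nil omits
# the attribute, matching "otherwise do not specify a boost".
def add_xml(id, text, boost: nil)
  boost_attr = boost ? %( boost="#{boost}") : ""
  [
    "<add><doc#{boost_attr}>",
    %(<field name="id">#{xml_escape(id)}</field>),
    %(<field name="text">#{xml_escape(text)}</field>),
    "</doc></add>"
  ].join
end

puts add_xml("/verycool/core-team/", "Core team page text", boost: 3.0)
```

The resulting string can be posted with the same --data-binary and Content-Type: text/xml form used for the commit command in the question.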

