Apache Tika - How to boost a Solr document when indexing with /solr/update

September 15, 2015

To index a website, I have a Ruby script that in turn generates a shell script, which uploads every file in the document root to Solr. The shell script has many lines like this:

  curl -s \
    "http://localhost:8983/solr/update/extract?literal.id=/about/core-team/&commit=false" \
    -F "myfile=@/extra/www/docroot/about/core-team/index.html"

...and ends with:

  curl -s http://localhost:8983/solr/update --data-binary \
    '<commit/>' -H 'Content-Type: text/xml; charset=utf-8'

This uploads all documents in the document root to Solr. I use Tika and the ExtractingRequestHandler to upload documents in various formats (primarily PDF and HTML) to Solr.
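A minimal sketch of what the generator produces, written in bash instead of Ruby purely for illustration. The helper names `upload_cmd` and `generate_script`, and the assumption that each id is the file's path relative to the docroot with a trailing `index.html` stripped, are hypothetical and only inferred from the example line above.

```shell
#!/usr/bin/env bash
# upload_cmd DOCROOT FILE (hypothetical helper): print one curl line that
# sends FILE to Solr Cell, using its docroot-relative path as literal.id.
upload_cmd() {
  local docroot="$1" file="$2" id
  id="${file#"$docroot"}"   # strip the docroot prefix: /about/core-team/index.html
  id="${id%index.html}"     # strip a trailing index.html: /about/core-team/
  printf 'curl -s "http://localhost:8983/solr/update/extract?literal.id=%s&commit=false" -F "myfile=@%s"\n' \
    "$id" "$file"
}

# generate_script DOCROOT (hypothetical helper): one upload line per HTML/PDF
# file under DOCROOT, followed by a single commit.
generate_script() {
  local docroot="$1" f
  find "$docroot" -type f \( -name '*.html' -o -name '*.pdf' \) | sort |
    while IFS= read -r f; do
      upload_cmd "$docroot" "$f"
    done
  echo "curl -s http://localhost:8983/solr/update --data-binary '<commit/>' -H 'Content-Type: text/xml; charset=utf-8'"
}
```

For example, `generate_script /extra/www/docroot > upload.sh` writes a script in the same shape as the one described above. Committing once at the end, rather than per file, keeps the upload fast.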

In the script that generates the shell script, I would like to boost documents based on whether their id field (a.k.a. URL) matches certain regular expressions.

Let's say these are the boosting rules (pseudocode):

  boost = 2 if url =~ /cool/
  boost = 3 if url =~ /verycool/
  # otherwise do not specify a boost
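The rules above map directly onto a small bash function. Order matters: any URL matching `verycool` also matches `cool`, so the more specific pattern must win, just as the second assignment overrides the first in the pseudocode. The name `boost_for` is hypothetical.

```shell
#!/usr/bin/env bash
# boost_for URL (hypothetical helper): print the boost for a document id,
# or print nothing when no boost should be specified.
boost_for() {
  local url="$1"
  if [[ "$url" =~ verycool ]]; then   # checked first: /verycool/ also matches /cool/
    echo 3
  elif [[ "$url" =~ cool ]]; then
    echo 2
  fi
}
```

A caller can then test for an empty result, e.g. `b=$(boost_for "$id")` followed by `[ -n "$b" ]`, and only add a boost when one applies.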

What's the simplest way to add an index-time boost to the HTTP request?

I tried:

  curl -s \
    "http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
    -F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
    -F boost=3

and:

  curl -s \
    "http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
    -F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
    -F boost.id=3

Neither made any difference in the ordering of the search results. I want the boosted results to come first in the search results, regardless of what the user searched for (provided, of course, that the document contains the query).

I understand that if I POST in XML format, I can specify a boost value for either the entire document or a specific field. But if I do that, it isn't clear how to specify a file as the document contents. Actually, the Tika page provides a partial example:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc5&defaultField=text" \
    --data-binary @tutorial.html -H 'Content-Type: text/html'

But again, it isn't clear where or how to specify the boost. I tried:

  curl \
    "http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost=3" \
    --data-binary @mydoc.html -H 'Content-Type: text/html'

and

  curl \
    "http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost.id=3" \
    --data-binary @mydoc.html -H 'Content-Type: text/html'

Neither of these altered the search results.

Is there a way to update the boost attribute of a document (not a specific field) without altering the document contents? If so, I could accomplish my goal in two steps: 1) upload/index documents as I have been doing, and 2) specify the boost for the relevant documents.

To index a document in Solr, you have to POST it to the /update handler. The documents to index are put in the body of the POST request. In general, you have to use Solr's XML format. Using XML, you can add a boost value to a specific field or to the whole document.
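A minimal sketch of that approach, for Solr releases that still supported index-time boosts (they were removed in Solr 7, which expects such scoring factors in a separate field combined via a function query). It assumes the schema defines `id` and `text` fields and that you already have the document's plain text; the helper names `xml_escape` and `build_add_xml` are hypothetical.

```shell
#!/usr/bin/env bash
# xml_escape (hypothetical helper): read text on stdin, write it with XML
# special characters escaped. '&' is replaced first so the entities produced
# by the later substitutions are not escaped a second time.
xml_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# build_add_xml ID BOOST TEXT (hypothetical helper): print a Solr XML <add>
# message whose <doc> element carries a document-level boost.
# Assumes the schema has "id" and "text" fields.
build_add_xml() {
  local id boost="$2" text
  id=$(printf '%s' "$1" | xml_escape)
  text=$(printf '%s' "$3" | xml_escape)
  printf '<add><doc boost="%s"><field name="id">%s</field><field name="text">%s</field></doc></add>\n' \
    "$boost" "$id" "$text"
}
```

The message can be piped straight to the handler, e.g. `build_add_xml /verycool/core-team/ 3 "$text" | curl -s http://localhost:8983/solr/update --data-binary @- -H 'Content-Type: text/xml; charset=utf-8'`, where curl's `@-` reads the body from stdin. The plain text itself could come from the ExtractingRequestHandler with `extractOnly=true`, which returns the extracted content instead of indexing it.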
