file - How to organize a large number of objects
We have a large number of documents, and metadata (XML files) associated with these documents. What is the best way to organize them?
Currently we have created a directory hierarchy:
/repository/category/date(when loaded into our DB)/document_number.pdf and .xml
We use the path as a unique identifier for the document in our system. Having a flat structure doesn't seem to be an option. Using the path as an ID helps keep our data independent of our database/application logic, so we can reload the documents in case of failure and they all keep their old IDs. Yet it introduces some limitations. For example, we can't move files once they've been placed in this structure, and it takes work to put them there in the first place. What is the best practice? How do websites such as Scribd deal with this problem?
Your approach does not seem unreasonable, but it might suffer if more than a few thousand documents are added within a single day (file systems tend not to cope well with very large numbers of files in one directory).
Storing the .xml document beside the .pdf seems a bit odd - if it's metadata about the document, should it not be in the database (which it sounds like you have), where it can be queried, indexed, etc.?
When storing very large numbers of files, I've taken the file's key (say, a URL), hashed it, and stored the file X levels deep in directories based on the first characters of the hash...
Say you started with the key 'How to organize a large number of objects'. If its MD5 hash is 0a74d5fb3da8648126ec106623761ac5, you might store it at...
base_dir/0/a/7/4/http___stackoverflow.com_questions_2734454_how-to-organize-a-large-number-of-objects
...so that you can find it again given the key you started with.
This kind of approach has one advantage over the date-based one: it can be scaled to suit very large numbers of documents (even per day) without any one directory becoming too large. On the other hand, it's less intuitive for someone who has to find a particular file manually.
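As a rough illustration of the hashed-directory idea described above, here is a minimal Python sketch. The function names `sharded_path` and `store`, the default of four levels, and the rule for turning a key into a file name (anything other than letters, digits, `-` and `.` becomes `_`) are assumptions made for this example, not something the layout above prescribes.

```python
import hashlib
from pathlib import Path


def sharded_path(base_dir, key, levels=4):
    """Return the path under base_dir where the file for `key` would live.

    The first `levels` hex characters of the key's MD5 digest each become
    one directory level, so files spread evenly across 16**levels leaves.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Hypothetical naming rule: keep letters, digits, '-' and '.', and
    # replace everything else (':', '/', spaces, ...) with '_'.
    safe_name = "".join(c if c.isalnum() or c in "-." else "_" for c in key)
    return Path(base_dir, *digest[:levels], safe_name)


def store(base_dir, key, data, levels=4):
    """Write `data` (bytes) to the sharded location for `key`."""
    path = sharded_path(base_dir, key, levels)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

Because the path is derived only from the key, the same call that stored a file also locates it later: `sharded_path("base_dir", url)` returns the same location every time, with no database lookup. MD5 is used here purely to spread files across directories, not for security.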