Author Topic: Increase speed by using RAM instead of HDD for processing  (Read 5716 times)

Offline mano3m

  • Contributor
  • ***
  • Posts: 10
Increase speed by using RAM instead of HDD for processing
« on: November 20, 2010, 09:16:25 am »
Increasing speed and reducing HDD traffic would help people with one slow HDD and a fast (e.g. 120Mbit) internet connection; I have seen the processing take 3 times as long as the download itself. I have two ideas for doing this for a standard rarred release (many rar files in which the release is stored, not compressed):
- Load all parts of each downloaded file (up to a configurable size limit) into memory instead of onto the HDD, and decode them in memory. (I guess the memory required is about 3 times the file size, because while decoding file 1 we need to hold the parts of file 1, the decoded file, and the incoming parts of file 2. See the sketch below.)
- The second idea is to unrar directly from memory and not store the rar file at all. When repairing is required, repair the extracted file directly, using a scheme similar to the one the ReScene Data Recovery Archive tool uses: when loading the par2 blocks, add the rar file headers on the fly, and write the repaired blocks directly into the unrarred file (without the rar headers, of course).
These ideas would reduce the data written to and read from the hard disk by a factor of 3, and thus increase speed by a factor of 3 if the HDD is the bottleneck. I suggest making it an option with a "maximum memory to use" parameter.
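
To make idea 1 a bit more concrete, here is a rough Python sketch of what I have in mind. Everything is pseudocode against a hypothetical nzb_file object; fetch_segment() and yenc_decode() stand in for the client's own NNTP and decoder routines, not real Alt.Binz APIs:

```python
import io

MAX_IN_RAM = 512 * 1024 * 1024  # the proposed "maximum memory to use" option

def assemble_in_ram(nzb_file):
    """Download and decode one rar file entirely into RAM.

    fetch_segment() and yenc_decode() are placeholders for the client's
    own NNTP download and yEnc decoder routines.
    """
    if nzb_file.size > MAX_IN_RAM:
        return None                    # too big: fall back to the disk path
    buf = io.BytesIO()
    for segment in nzb_file.segments:  # the article parts, in order
        raw = fetch_segment(segment)   # placeholder: NNTP article body
        buf.write(yenc_decode(raw))    # placeholder: decode into the buffer
    return buf.getvalue()              # the decoded rar file, no disk writes
```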


Offline Hecks

  • Contributor
  • ***
  • Posts: 2011
  • naughty cop
Re: Increase speed by using RAM instead of HDD for processing
« Reply #1 on: November 20, 2010, 10:49:29 am »
On the first: Setup > Download #2 > Article Caching.

On the second: I guess this is a limitation of par2.exe and unrar.dll, but installing a RAM disk for your d/l directory would probably give you this functionality.
« Last Edit: November 20, 2010, 10:53:27 am by Hecks »

Offline mano3m

  • Contributor
  • ***
  • Posts: 10
Re: Increase speed by using RAM instead of HDD for processing
« Reply #2 on: November 20, 2010, 11:21:19 am »
Thanks for the article caching option. Wasn't aware of that one :)

About idea 2, you are right: the par2 and unrar code would need to be integrated into the Alt.Binz code itself. I can imagine this is a lot of work, but for me it would be worth it :)

A RAM disk would have its limitations. With the current way of storing, one would need a RAM disk of at least the size of the download, plus some more for repaired files. For downloading a nice 1080p .mkv this means a RAM disk of 10GB ;) The advantage of my proposed method is that you don't need nearly that much RAM.

Offline Hecks

  • Contributor
  • ***
  • Posts: 2011
  • naughty cop
Re: Increase speed by using RAM instead of HDD for processing
« Reply #3 on: November 20, 2010, 11:35:05 am »
My understanding of how parity checking works is very basic, so maybe I'm missing it: doesn't the process require all data in the set to be present (minus the missing parts)? In which case, you'd still need to cache all the rars in RAM?

Offline mano3m

  • Contributor
  • ***
  • Posts: 10
Re: Increase speed by using RAM instead of HDD for processing
« Reply #4 on: November 20, 2010, 04:10:17 pm »
For checking you don't need all the files (Alt.Binz tells you which files are corrupted before the download is finished, so checking can happen during downloading). For repairing it is indeed different: you need the download to be complete (or at least to have enough blocks to repair the incomplete files). Alt.Binz currently actually checks the files twice before repairing: first after each file is stored, and again when it is clear repairs are needed. So instead of storing the file first, then reading it back to check it, then reading it a second time to check it again, this can all be done at once in memory. Then, after all files are downloaded, checked and extracted (with errors), repair can start, based on which blocks the check found missing, the data in the extracted file, and the downloaded extra blocks. The repaired data can then be inserted directly into the extracted file. During the read of correct / complete blocks and the write of repaired data you need to take the .rar file headers etc. into account.

You still use the disk as storage, but you read each file from disk only once instead of three times, and write it only once (plus the repaired data) instead of twice (plus repaired files). This of course only works if you are extracting one big file that is stored in the rars, not compressed. As this is the case for all scene releases, I don't see an issue here (you just have to check for it before starting to process everything in this manner). A sketch of the in-memory block check follows below.
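
To illustrate the single in-memory check: a .par2 set stores an MD5 (and a CRC32) for every block of every source file, so a rar file that is still sitting in RAM can be verified there before anything is written out. A minimal sketch, assuming the expected hashes have already been parsed out of the par2 IFSC packets (parsing not shown):

```python
import hashlib

def verify_blocks(data, block_size, expected_md5s):
    """Return the indices of blocks whose MD5 doesn't match the par2 set.

    data: one decoded rar file, still in RAM.
    expected_md5s: per-block MD5 digests from the .par2 IFSC packets.
    """
    bad = []
    for i, expected in enumerate(expected_md5s):
        block = data[i * block_size:(i + 1) * block_size]
        block = block.ljust(block_size, b"\x00")  # par2 zero-pads the last block
        if hashlib.md5(block).digest() != expected:
            bad.append(i)
    return bad
```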

Offline Hecks

  • Contributor
  • ***
  • Posts: 2011
  • naughty cop
Re: Increase speed by using RAM instead of HDD for processing
« Reply #5 on: November 20, 2010, 06:06:32 pm »
Er ... I think I'm following.  You mean the rars shouldn't be repaired, but the archive content unrarred with errors, and the repaired blocks then written directly to this file? What if you want to keep the rars?  And how do you repair without keeping the whole archive in RAM?

It might be easier if you broke it down into a step 1, 2, 3 for us ...



Offline Chuckle123

  • Contributor
  • ***
  • Posts: 100
Re: Increase speed by using RAM instead of HDD for processing
« Reply #6 on: November 20, 2010, 06:30:23 pm »
I think there are reasons for each one of the checks. Let me take a stab at it. What you see AB denoting as incomplete queued files is probably a guess based on expected vs. actual file size, which gives a good estimate of the number of missing blocks. The first actual check is probably used to determine the actual d/led size, which AB needs in order to queue up the par files required for repairs. The second one is a final check before repairs begin; more of a sanity check, but good to have I would say.

I would recommend you use the Article Caching Hecks mentioned; works like a champ.  If you have a slow drive you may want to pause d/ling while repairing and unrarring; there are options for both.  You may also want to unrar to a different drive if you are not already doing so; suggested to me by Hecks when I first joined the party.

This is not to say that what you propose would not improve performance; naturally YMMV.  It would entail a larger memory footprint, though, and that is before we start adding Article Caching and other goodies.

ce

Offline mano3m

  • Contributor
  • ***
  • Posts: 10
Re: Increase speed by using RAM instead of HDD for processing
« Reply #7 on: November 20, 2010, 06:59:02 pm »
Hecks, that is indeed what I mean. If you want to keep the rars it indeed doesn't add any value; you could use the ReScene Data Recovery Archive tool to reconstruct them, though... Note that I'm only suggesting this as a selectable option, and maybe not even the default one ;)

So, step by step, what I suggest:

0. NZB download -> sort the files in the right order -> download the .par2 file and create the file list

For each archive file:
1. Download the parts to RAM
2. Decode the parts into the archive file in RAM (note that AB can continue downloading parts for the next file meanwhile)
3. Check the file's integrity -> store which blocks are OK and which are corrupted / missing
4. Unrar to HDD (or continue unrarring into this file) and don't stop when errors are encountered
5. Store the archive headers (just like the ReScene Data Recovery Archive tool does) in RAM or in a separate file on HDD
6. Remove the archive from RAM
Do this until all archive files are downloaded and extracted (a skeleton of this loop is sketched below the notes)

@4 With scene releases the movie/iso is only stored, so instead of unrarring you can also just strip the headers and write everything straight to disk.
@4 If an entire archive file is missing, make sure space is left for it in the output file (just as for missing blocks)
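
A skeleton of the per-archive loop (steps 1-6), reusing the assemble_in_ram() and verify_blocks() sketches from earlier in the thread; par2_info is a hypothetical parsed .par2 structure, and extract_headers() and unrar_append() are placeholders for the ReScene-style header grab and the write-payload-to-disk step:

```python
def process_release(archive_files, par2_info):
    """Steps 1-6: hold one rar volume in RAM at a time."""
    damage, headers = {}, {}
    for f in archive_files:
        rar = assemble_in_ram(f)                # steps 1-2 (sketch above)
        damage[f.name] = verify_blocks(         # step 3: note the bad blocks
            rar, par2_info.block_size, par2_info.md5s[f.name])
        headers[f.name] = extract_headers(rar)  # step 5 (placeholder)
        unrar_append(rar, headers[f.name])      # step 4 (placeholder):
                                                # payload goes to the HDD
        del rar                                 # step 6: free the RAM
    return damage, headers                      # input for the repair phase
```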

If blocks were found missing or corrupted, start the repair of the extracted archive:
11. Unpause / download the required number of extra blocks to HDD (as determined in step 3)
12. Construct the RS matrix with the exponents from step 11 (using data from step 3)
13. Reconstruct the missing blocks with the RS matrix and all blocks from the extracted file, using the stored archive headers to reconstruct the original archive files / original blocks (on the fly, one archive file at a time; no need to keep the archive files themselves in RAM)
14. Write the missing blocks directly into the extracted file (taking the archive file headers into account; see the sketch below)

And voilà: we only write to the HDD in steps 4 and 14, and we only read from disk in step 13 (assuming the ReScene data is kept in RAM). So in summary, this method needs far less HDD activity. The difficulty is that all the processes have to run at the same time and wait for each other: decoding, then verifying, then unrarring. That makes it impossible to use the stand-alone par2.exe and unrar.dll; the code would have to be integrated.
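
For steps 13 and 14, the write-back might look something like this minimal sketch. It assumes the simplest case: the payload is stored uncompressed, so every non-header byte of a rar volume maps 1:1 onto the extracted file. header_spans is the (offset, length) list saved in step 5, and block_data is the reconstructed block produced by the RS step:

```python
def write_repaired_block(extracted, block_start, block_data, header_spans):
    """Copy one reconstructed rar block into the extracted file.

    block_start:  offset of the block inside the original rar volume.
    header_spans: sorted (offset, length) pairs of the rar headers
                  saved in step 5 (ReScene-style).
    Header bytes are dropped; each payload byte lands at its rar offset
    minus the header bytes that precede it.
    """
    for i, value in enumerate(block_data):
        off = block_start + i
        if any(h <= off < h + n for h, n in header_spans):
            continue                                # header byte, not payload
        skipped = sum(n for h, n in header_spans if h + n <= off)
        extracted.seek(off - skipped)
        extracted.write(bytes([value]))
```

Byte-at-a-time is only for clarity; a real implementation would write the whole spans between headers in one go.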

Note that coping with exiting the program mid-download and continuing on restart might be a bit trickier than normal, and a crash / unexpected shutdown could mean losing the entire download.

Hope this clarifies what I was thinking... :)
« Last Edit: November 20, 2010, 11:01:39 pm by mano3m »

Offline mano3m

  • Contributor
  • ***
  • Posts: 10
Re: Increase speed by using RAM instead of HDD for processing
« Reply #8 on: November 20, 2010, 07:05:53 pm »
Quote from: Chuckle123 on November 20, 2010, 06:30:23 pm


To comment on your items:
The guess you mentioned might also just be a one-file par check, which does the same thing you suggest but with 100% accuracy. That is exactly the same as the final check (so the check is done twice). What I propose would remove your first check and keep only the second one (which is now done internally by par2.exe before it starts to repair).

Just turned article caching on, and it works like a charm :D

The problems I mentioned are actually happening on someone else's machine; I myself use an SSD for temporary files and a large HDD for the end result. In my case caching extends the SSD's life, but has no effect on performance ;)

Concerning the memory footprint: if you already use caching, the only extra memory required is for one archive file (the largest ones I've seen are 200MB), so that should not be a big issue.

Offline Hecks

  • Contributor
  • ***
  • Posts: 2011
  • naughty cop
Re: Increase speed by using RAM instead of HDD for processing
« Reply #9 on: November 20, 2010, 09:14:14 pm »
I see.  Well, that's certainly very clear now, thanks for taking the time to explain. :)  Only Rdl can say if this is feasible for Alt.Binz ofc, and as you point out it would need a complete rewrite to include code that works with the equivalent of a ReScene matrix in RAM rather than with files.  A potential problem here, I guess, would be if the missing parts included the actual header bytes of the RAR file blocks themselves (although this would be relatively unlikely for a single file in the archive).

@Chuckle123: Alt.Binz doesn't need to guess whether files are incomplete when displaying the queue, as this info is given by the NZB itself.  If the NZB has segments missing, or the articles can't be found on the server, the file is marked as incomplete.
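
To illustrate: an NZB is just XML, with one <segment> element per article, so spotting a file with gaps in its segment numbering takes only a few lines. A minimal sketch using Python's standard library (error handling omitted):

```python
import xml.etree.ElementTree as ET

NS = "{http://www.newzbin.com/DTD/2003/nzb}"  # the standard NZB namespace

def incomplete_files(nzb_path):
    """Yield the subject of every file whose segment numbers have gaps."""
    root = ET.parse(nzb_path).getroot()
    for f in root.iter(NS + "file"):
        numbers = sorted(int(s.get("number")) for s in f.iter(NS + "segment"))
        if numbers != list(range(1, len(numbers) + 1)):
            yield f.get("subject")
```

(Articles that the server can't find are of course only discovered at download time.)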

Offline DipDancer

  • Contributor
  • ***
  • Posts: 16
Re: Increase speed by using RAM instead of HDD for processing
« Reply #10 on: November 22, 2010, 09:48:39 pm »
Would be nice to see such an implementation. It should be possible in general; I've heard the new Newsbin beta will be able to do that.

The article caching thing is not a real solution, as par checking and par repairing, for example, are not affected by it. If all the processing (par checking, repairing, article download destination + building) could be shifted over to RAM, slow computers, or those with only one HDD, could work MUCH better.
« Last Edit: November 22, 2010, 09:50:24 pm by DipDancer »

Offline Hecks

  • Contributor
  • ***
  • Posts: 2011
  • naughty cop
Re: Increase speed by using RAM instead of HDD for processing
« Reply #11 on: November 23, 2010, 02:13:02 am »
I'd like to see the link to the discussion of that Newsbin beta, as all I see for the current beta pre-release is AutoPAR which uses files (including a database file).

Article caching has a big impact, since yEnc decoding is expensive, and this is where the disk thrashing really happens if it's not enabled. Otherwise I would guess that a major rewrite of Alt.Binz to accommodate a small minority of slow computers would not be a high priority for most users.  I would also guess that the disk reads for par2 checking & repair are not the true bottleneck in the process.

Offline opentoe

  • Contributor
  • ***
  • Posts: 103
Re: Increase speed by using RAM instead of HDD for processing
« Reply #12 on: November 25, 2010, 01:32:26 am »
Could someone give me a good cache size to use? Is there a sweet spot that is usually the norm?

Offline necrocowboy

  • Contributor
  • ***
  • Posts: 62
Re: Increase speed by using RAM instead of HDD for processing
« Reply #13 on: November 26, 2010, 10:21:51 am »
You would need a large amount of available RAM for this to work.  If you D/L a 4GB movie, you would need at least 10GB of available RAM (4 to download into, 4 to unpack into, 2 working), and any break in the download / power cycle would lose all the data; not something most people would be happy with.

Would it be possible to use more RAM just for unpacking (i.e. use all that's available, read a 500MB / 1GB RAR into RAM, unpack / decode, and write to HDD)?

Offline mano3m

  • Contributor
  • ***
  • Posts: 10
Re: Increase speed by using RAM instead of HDD for processing
« Reply #14 on: December 13, 2010, 04:01:08 pm »
Quote from: necrocowboy on November 26, 2010, 10:21:51 am

Please reread my explanation. You only need to hold one rar file in memory for all stages (in addition to the parts that are already cached now). As you extract to disk while you download, you have the backup right there if the program / system crashes. (In addition you need the stored rar file headers, of course.)