Last night I was mirroring all the tmbo.org daily pics for faster viewing. Their site must be hosted on an ADSL link (like ush.it: hey, this site is on a 200kbs/300kbs link, very unprofessional, but no one can raid [stupid wordpress plug-in, this is not RAID in the sense of Redundant Disk Array but raid the verb] our server without our knowledge; think about the autistici/inventati aruba raid, for example).
I don't want to explain how to grab something over HTTP; go to tmbo.org and figure out the structure yourself. I want to focus on how to implement pseudo-threading in bash. The first task I had to accomplish in order to grab the images was to find out which daily archives were available. Easy task!
For each year in the interval I had chosen, for each month of the year, for each day of the month, make an HTTP request like site.com/path/$year/$month/$day/ and see whether the HTTP status is 200 or 404. Or so I thought.
But the problem is that their site is so slow: there is an initial delay of 10 to 15 seconds after each request before any data arrives. The other problem was that it was really late at night, like 06:30 AM or so.
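To make the naive approach concrete, here is a minimal sketch of the one-request-at-a-time loop. The base URL is the one used in the script in this post; the function names `list_urls` and `status_of` are made up for illustration, and the sketch assumes `seq` and `curl` are installed. Seven years of 12 x 31 candidate days is 2604 requests, so at 10 to 15 seconds each a sequential run would take roughly 7 to 11 hours.

```shell
#!/bin/bash
# Base URL taken from the script in this post.
BASE="http://tmbo.org/offensive"

# Hypothetical helper: print one candidate archive URL per line.
# Impossible dates (like 02/31) are included on purpose; they simply
# come back as "not found" when probed.
list_urls() {
    local first_year=$1 last_year=$2 y m d
    for y in $(seq "$first_year" "$last_year"); do
        for m in $(seq -w 1 12); do        # -w pads to equal width: 01..12
            for d in $(seq -w 1 31); do    # 01..31
                echo "$BASE/$y/$m/$d/"
            done
        done
    done
}

# Hypothetical helper: print only the HTTP status code of a HEAD request.
status_of() {
    curl -s -I -o /dev/null -w '%{http_code}' "$1"
}

# Sequential probing (left commented out because it hits the network):
# list_urls 2000 2006 | while read -r url; do
#     echo "$url $(status_of "$url")"
# done
```

Every iteration of that commented loop blocks until the server answers, which is exactly where the time goes.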
What I needed was to parallelize these HTTP requests to minimize the impact of this initial delay. Okay, I said to myself, no problem, I'll use C and pthreads.
Holy shit, it's 06:30 AM and I'm tired. And drunk. And lazy. I waited 2 minutes in front of an empty nano screen with just "GNU nano X.X.X File: tmbo.org.c" and a blinking cursor before realizing that I can't use C and pthreads in these conditions.
I'll do it in Bash or PHP, I said. The problem is that neither of the two has real thread support. What I needed was a hack, a clever solution to the problem. Hack for hack, and considering that the solution would be inelegant/unoptimized in both languages, I chose Bash scripting. So this is MY solution:
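    #!/bin/bash
    # Maximum number of curl processes allowed to run at once.
    TH_MAX=10
    for j in `seq 2000 2006`; do
      for m in `seq -w 1 12`; do      # 01..12 (-w pads to equal width)
        for d in `seq -w 1 31`; do    # 01..31
          while true; do
            # Count the curl processes currently owned by user "ascii".
            TH_NUM=`ps aux | grep "^ascii" | grep -v "grep" | grep "curl" | wc -l`
            if [ "$TH_NUM" -lt "$TH_MAX" ]; then
              # Free slot: run the request in a background subshell.
              (
                RES=`curl -Is "http://tmbo.org/offensive/$j/$m/$d/" | head -n1`
                echo "http://tmbo.org/offensive/$j/$m/$d/ $RES"
              ) >> results &
              echo -en "."
              break
            else
              # All slots busy: wait a second and check again.
              echo -en "W"
              sleep 1
            fi
          done
        done
      done
    done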
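For comparison, here is a sketch of a variation on the same idea that asks the shell for its own background jobs instead of grepping the output of `ps`. This is not what I ran that night. The names `MAX_JOBS`, `work` and `run_limited` are made up, and `work` is only a stand-in (a short `sleep`) for the real `curl` call, so the sketch does not touch the network.

```shell
#!/bin/bash
# Same role as TH_MAX in the script above.
MAX_JOBS=10

# Hypothetical stand-in for the real request; swap the body for the curl call.
work() {
    sleep 0.1
    echo "done $1"
}

# Run work() once per argument, keeping at most MAX_JOBS running at a time.
run_limited() {
    local item
    for item in "$@"; do
        # "jobs -rp" prints the PIDs of this shell's running background jobs.
        while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
            sleep 0.1    # all slots busy: poll again shortly
        done
        work "$item" &
    done
    wait    # block until the last background jobs have finished
}
```

Calling `run_limited a b c` prints one `done ...` line per argument, in whatever order the jobs finish. The upside over the `ps` pipeline is that it does not depend on the username or on other curl processes that happen to be running on the box; the downside is that it only sees jobs started by this very shell.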
Pretty simple, uh? It worked 100% as expected. At 07:00 I was sleeping and the program was running. As an extra, here are some screenshots I took (because ush.it has almost no images and I know you like images).
Fixed number of "child" processes.