1. Installation

1.1 Prerequisites

 * HTTP server with mod_rewrite support
 * Apache mod_python
 * Python 2.x (tested on 2.7)
 * BeautifulSoup4
 * Python modules: hashlib, json, os, re, time, urllib, urllib2, urlparse
   (all part of the Python 2 standard library)

1.2 Installation

 * http/ folder is the publicly visible document root for the project
 * cache/ folder holds the cached (HTML) data; by default it sits outside the
   document root (if you can't keep it outside the document root, you can
   change the path)

1.3 Configuration

The main application file, http/index.py, defines some configuration
parameters:
 * BASE_PATH: absolute path for the application within host/domain
              (change if you're planning on running the app from a subfolder)
 * CACHE_PATH: path to the cache data folder (must be writable by web-server),
               relative to http/index.py
 * CEZAR_URL: remote URL for fetched site (main page)
 * CACHE_EXPIRY_LIMIT: cutoff timestamp; cache entries older than this are
                       refreshed from remote
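
As a rough illustration, the parameters above might look like the sketch below.
Every value is an assumption made for the example (the URL is a placeholder,
and the 24-hour cutoff is taken from the Usage section), and cache_dir() is a
hypothetical helper, not a function from http/index.py.

```python
import os
import time

# All values below are illustrative assumptions, not the project's defaults.
BASE_PATH = "/"                          # app served from the root of its own host
CACHE_PATH = "../cache/"                 # relative to http/index.py
CEZAR_URL = "http://example.invalid/"    # placeholder for the remote main page
# Cutoff timestamp: entries written before this moment count as stale
# (a 24-hour window is assumed here).
CACHE_EXPIRY_LIMIT = time.time() - 24 * 60 * 60


def cache_dir(script_file):
    """Resolve CACHE_PATH against the directory that contains script_file."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.normpath(os.path.join(script_dir, CACHE_PATH))
```

With these values, cache_dir("/srv/app/http/index.py") resolves to
/srv/app/cache, i.e. a sibling of the http/ document root.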

2. Usage

The following assumes BASE_PATH is set to default "/" (app runs on separate
host/domain).

Every request of the form /[ANYTHING] is forwarded to the search form
of the remote site (?pid_search=[ANYTHING]&p=21).
Fetched data is cached for 24 hours (the cache key is [ANYTHING]). The cache
can be force-refreshed by requesting the /[ANYTHING]/refresh URL.
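
The request handling described above can be sketched roughly as follows. This
is not the code from http/index.py: the function names are hypothetical, and
using the cache file's modification time to judge its age is an assumption.

```python
import os
import time

try:
    from urllib import urlencode          # Python 2, as the project targets
except ImportError:
    from urllib.parse import urlencode    # Python 3


def parse_request(path, base_path="/"):
    """Split a request path into (cache key, force-refresh flag)."""
    rest = path[len(base_path):] if path.startswith(base_path) else path
    rest = rest.strip("/")
    if rest.endswith("/refresh"):
        return rest[:-len("/refresh")], True
    return rest, False


def remote_search_url(cezar_url, key):
    """Build the remote search URL for a key (?pid_search=[ANYTHING]&p=21)."""
    return cezar_url + "?" + urlencode([("pid_search", key), ("p", "21")])


def needs_fetch(cache_file, force, max_age=24 * 60 * 60, now=None):
    """True if the entry is missing, force-refreshed, or older than max_age seconds."""
    if force or not os.path.exists(cache_file):
        return True
    now = time.time() if now is None else now
    return now - os.path.getmtime(cache_file) > max_age
```

For example, parse_request("/12345/refresh") yields the key "12345" with the
refresh flag set, so the cached copy would be skipped and the remote search
URL fetched again.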

Every request to /pic/[RESOURCE_FILE] for a file which is not present locally
is forwarded to the remote server. The file is fetched and from then on
served from the local server (so once we become mature enough not to steal
content, we can remake these files).
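
A minimal sketch of this fetch-once behaviour, assuming the remote download is
wrapped in a callable passed in as fetch_remote (a hypothetical name; the real
application may split this work between mod_rewrite and index.py differently):

```python
import os


def serve_pic(name, pic_dir, fetch_remote):
    """Return the bytes of pic_dir/name, downloading the file on first request."""
    name = os.path.basename(name)      # drop any directory components
    local = os.path.join(pic_dir, name)
    if not os.path.exists(local):
        data = fetch_remote(name)      # hypothetical callable: name -> bytes
        with open(local, "wb") as handle:
            handle.write(data)
    with open(local, "rb") as handle:
        return handle.read()
```

Only the first request for a given file reaches the remote server; later
requests are answered from the copy stored in pic_dir.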

Fetched data is then diced and sliced to provide a compact, business card-like
view of the profile. All images other than the /pic/[RESOURCE_FILE] sources
are fetched from the remote server and served locally from the /foto directory.
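
One way the image sources could be mapped to local URLs is sketched below.
The naming scheme (MD5 of the absolute remote URL plus the original extension)
is purely an assumption for illustration, as is the function name; the README
does not say how files under /foto are actually named.

```python
import hashlib
import os

try:
    from urlparse import urljoin, urlparse        # Python 2
except ImportError:
    from urllib.parse import urljoin, urlparse    # Python 3


def local_image_src(src, page_url):
    """Map an image source from the fetched page to the local URL serving it."""
    absolute = urljoin(page_url, src)
    path = urlparse(absolute).path
    if path.startswith("/pic/"):
        return path                    # covered by the /pic/ rule
    ext = os.path.splitext(path)[1]
    digest = hashlib.md5(absolute.encode("utf-8")).hexdigest()
    return "/foto/" + digest + ext     # hypothetical naming scheme
```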

For added evilness, all requests to remote servers spoof the User-Agent string
(UAS), reusing the original UAS from the end-user's browser request.
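
The header pass-through might look roughly like this. The sketch assumes the
incoming request headers are available as a plain dict (under mod_python they
would come from the request object), and build_remote_request is a
hypothetical name.

```python
try:
    from urllib2 import Request            # Python 2, as the project targets
except ImportError:
    from urllib.request import Request     # Python 3


def build_remote_request(url, incoming_headers):
    """Create a request for url that reuses the visitor's User-Agent, if present."""
    headers = {}
    user_agent = incoming_headers.get("User-Agent")
    if user_agent:
        headers["User-Agent"] = user_agent
    return Request(url, headers=headers)
```

The returned object can be handed to urlopen(); if the visitor sent no
User-Agent header, the library's default one is used instead.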

Good thing the remote site has a metric shitton of inline CSS, so very little
style alteration was necessary.