We will talk about the following

an example of a workload not well-suited for RDBMS
an overview of Redis
a procedure to convert PostgreSQL schema to Redis schema
performance improvements achieved
when Redis would not cut it
Redis as a message bus
Redis as a message queue

DHCP leases history application

DHCP leases history database table

Column	Type
ip	varchar(15)
circuit_number	varchar(64)
start	integer (UNIX time)
stop	integer (UNIX time)
service_type	varchar(32)
access_type	varchar(32)
mac	varchar(20)

DHCPD syslog entries

Jun 17 06:25:18 \
  111.222.33.44 radiusd[12345]: \
  DHCPUSER 11.22.33.44 on CN-123456 \
  duration 1800 \
  service-type Internet \
  access-type DSL \
  mac 01:02:03:04:05:06

About a hundred such lines a second.

API requests

substantially less frequent than updates/inserts
about 2 per second
request types:
- lease information about an IP now
- leases information about a circuit now
- leases information about an IP from start_time to end_time
- leases information about a circuit from start_time to end_time
the majority of the requests is about an IP now
- requirement: no significant processing delay (= no old data)

Processing costs for the indexer

Minimum 2, maximum 3 database operations per line.
Typically 2, there are many more lease extensions than expirations or new allocations
on a modestly priced server, it is a problem
especially when DB vacuuming kicks in
- significant delays are introduced

Possible solutions

database optimization (make your schema smarter)
database optimization (buffers, query planner etc)
clustering
~~more expensive~~ faster hardware
big ^%%@$*^!# Oracle cluster
MySQL
one of the NoSQL solutions
- Cassandra
- CouchDB
- HBase
- MongoDB
- Redis
- Riak

Redis - an in-memory key-value store

http://redis.io/
supports complex datatypes
- strings
- hashes
- lists
- sets
- sorted sets
simple text-based protocol
- command-line client is available
- telnet or nc can be a client as well
CPAN modules are available

Redis - what it is used for

From http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis, a comparison of various NoSQL databases:

Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).

Our case:

rapidly changing? check
foreseeable database size? check

Redis - operations on strings

use Redis;

my $rd = Redis->new(encoding => undef);
for my $k (1..100000) {
    $rd->set("key_$k", rand);
}
say $rd->get("key_4242");
say join ", ", $rd->keys("*_4242*");

0.634714091076177
key_42420, key_42421, key_42422, key_42423,
key_42424, key_42425, key_42426, key_42427,
key_42428, key_42429, key_4242

Takes 12 seconds = handles more than 8k sets/sec on my desktop.

Redis - operations on hashes

use Redis;

my $rd = Redis->new;
$rd->hset("hashname", field => "value");
$rd->hmset("hashname", first_name => "John",
    last_name => "Doe", age => 20);
if ($rd->hexists("hashname", "first_name")) {
    say "Hi, ", $rd->hget("hashname", "first_name");
}
$rd->hdel("hashname", "field");
$rd->hincrby("hashname", "age", 22);
say "keys: ", join ", ", $rd->hkeys("hashname");
say "vals: ", join ", ", $rd->hvals("hashname");
say "name: ", join " ", $rd->hmget("hashname",
    "first_name", "last_name");

Hi, John
keys: first_name, last_name, age
vals: John, Doe, 42
name: John Doe

Redis - operations on lists

my $rd = Redis->new;
$rd->rpush("mylist", "value");
$rd->lpush("mylist", "Hello");
$rd->linsert("mylist", "before", "value", "World");
$rd->lset("mylist", 1, "Sweden");
say $rd->lindex("mylist", 2);
say $rd->rpop("mylist");
say join ", ", $rd->lrange("mylist", 0, -1);

value
value
Hello, Sweden

Redis - operations on sets

$rd->sadd("myset", "red");
$rd->sadd("myset", "green");
$rd->sadd("myset", "green");
$rd->sadd("myset", "green");
$rd->sadd("myset", "blue");
$rd->sadd("myset", "supernova");
say $rd->sismember("myset", "supernova");
$rd->srem("myset", "supernova");
say $rd->scard("myset");
say join ", ", $rd->smembers("myset");
say $rd->srandmember("myset");
# unions, intersections, differences

1
3
red, blue, green
red

Redis - atomicity concerns and transactions

all standard single fetch/store operations are atomic
what about reading a value and modifying it?
there are special commands provided by Redis
- increment/decrement
- set a value if the key is not there
- multi/watch/exec combo

Redis - atomic increment

Used for sequence emulation.

my $rd = Redis->new;
say $rd->incr("myseq");
sleep 5;
say $rd->incr("myseq");

1
5

(Obviously, something else was doing INCR at the time)

Redis - set if not there (poor man's locking)

if ($rd->setnx("mylock", "can lock")) {
    say "We can lock!";
    # do stuff
    sleep 5;
    $rd->del("mylock");
} else {
    say "Sorry, already locked"
}

There is a problem if the process dies in between;
this can be worked around with more code.

Redis - multi/watch/exec

In the latest versions of Redis there is a support for check-and-set transactions.

Unfortunately, it seems that CPAN modules have not caught up yet.

Redis - reliability

what if the server crashes?
it's a main-memory store, is everything lost?

Well, thankfully not quite.

Redis provides two mechanisms to keep your data safe.

it can save the complete database on disk periodically
it can write a kind of a journal file

Redis - saving database on disk

forks, then saves the full copy
- the main instance continues serving requests
on startup, load the latest copy
if the server crashes, all changes after the latest save are lost

save 900 1
save 300 10
save 60 10000

this method is generally not recommended
- can loose too much

Redis - journaling

constantly appends update commands to a journal file
calls fsync, configurable (never, on every write, every second)
- "every second" is often a good option
obviously, the journal file can grow pretty large

set x 42
set x 666
set x 42

you get the idea

Redis - journaling (continued)

truncate with BGREWRITEAOF command
- BackGround REWRITE Append-Only-File
forks, writes minimal changes file with a new name, syncs the diff, renames the file
if crashes, you loose 1 second's worth of updates
- it is tolerable for most tasks
- if it is not tolerable, Redis is probably not for your task
the BGREWRITEAOF is typically run from cron
- */5 * * * * echo BGREWRITEAOF | redis-cli >/dev/null
- how often depends on your update rate; experiment!

Conversion steps

identify operations needed
design a Redis schema supporting all operations
estimate memory requirements
decide whether to do a complete or a gradual switch over
write code
test
put into production

Identify operations needed

actual SQL statements from production code serve as a good approximation
what operations are the bottleneck?
- which ones are executed most often?
- which ones must be very fast (not necessarily the same)?
are there any batch reports? how those are going to be handled?
are there any periodic batch jobs?

Indexer: SQL for "fetch latest"

SELECT * FROM leases
WHERE ip = '11.22.33.44'
ORDER BY stop DESC
LIMIT 1

Indexer: SQL for "update"

UPDATE leases
SET
  stop = 1308344787,
  service_type = 'Internet',
  access_type = 'DSL',
  mac = '01:02:03:04:05:06'
WHERE
  ip = '11.22.33.44' AND
  circuit_number = 'CN-123456' AND
  start = 1308344185

The "expire latest" op is implemented in terms of "update" as well.

Indexer: SQL for "insert"

INSERT into leases
  (ip, circuit_number,
  start, stop,
  service_type, access_type
  mac) VALUES (
  '11.22.33.44',
  'CN-123456',
  1308344185,
  1308344787,
  'Internet',
  'DSL',
  '01:02:03:04:05:06')

API: SQL for "IP now"

SELECT * FROM leases
WHERE
  ip = '11.22.33.44' AND
  stop >= 1308344185 AND
  1308344185 >= start

API: SQL for "Circuit now"

SELECT * FROM leases
WHERE
  circuit_number = 'CN-123456' AND
  stop >= 1308344185 AND
  1308344185 >= start

API: SQL for "IP history"

SELECT * FROM leases
WHERE
  ip = '11.22.33.44' AND
  stop >= 1308344185 AND
  1308399185 >= start

API: SQL for "Circuit history"

SELECT * FROM leases
WHERE
  circuit_number = 'CN-123456' AND
  stop >= 1308344185 AND
  1308399185 >= start

Periodic cleanup: SQL

DELETE FROM leases
WHERE stop < 1276863185

"1 year ago"

Design a Redis schema: observations

updates are much more frequent than inserts
- expirations are always paired with inserts
most queries are for "IP now" and "Circuit now"
the history for a given IP or circuit is limited in size
the current state is changed and queried much more often than the history
constraints on the values - no spaces
a circuit might have multiple IPs at a given time
an IP cannot have multiple circuits at a given time

Redis schema: four kinds of keys

"now" state for an IP
"now" state for a circuit
IP history
Circuit history

Redis schema: "now" state for an IP

key:   ip:11.22.33.44
type:  string
value: CN-193688 1307003895 1308427546 Internet DSL 01:02:de:ad:be:ef

Redis schema: "now" state for a circuit

key:   cn:CN-193688
type:  set
values:
 1. 11.22.33.44
 2. 10.235.75.143
 3. 10.13.94.142

Redis schema: IP history

key:   hip:11.22.33.44
type:  string
value:
CN-180676 1303078615 1303338407 IPTV DSL 01:02:03:04:fe:ef\n
CN-186104 1304247778 1305890933 IPTV DSL 01:02:03:04:f0:0f\n
etc.

Need to clean old values somehow.

Redis schema: Circuit history

key:   hcn:CN-180676
type:  string
value:
11.22.33.44 1303078615 1303338407 IPTV DSL 01:02:03:04:fe:ef\n
22.33.44.11 1300572400 1300577082 Internet DSL 01:08:22:33:ee:dd\n
etc.

Again, need to clean old values somehow.

Indexer: redis ops for "fetch latest"

my $ip = "11.22.33.44";
my $s = $rd->get("ip:$ip");
return undef unless $s;
my ($cn, $start, $end, $st, $at, $mac) = split / /, $s;
return {
    ip           => $ip,
    cn           => $cn,
    start        => $start,
    end          => $end,
    service_type => $st,
    access_type  => $at,
    mac          => $mac,
};

Indexer: redis ops for "update"

$rd->set("ip:$ip", "$cn $start $end $st $at $mac");
$rd->sadd("cn:$ip", "$ip $start $end $st $at $mac");

Two Redis operations. No need for locking, single updater. Can be extended to multiple updaters if we guarantee disjoint IP set.

Note: "insert" is identical to "update".

Indexer: redis ops for "expire latest"

my $hip = $rd->get("hip:$ip");
my $ip_info = "$cn $start $end $st $at $mac";
$hip = $hip ? "$hip\n$ip_info" : $ip_info;
$rd->set("hip:$ip", $hip);

my $hcn = $rd->get("hcn:$cn");
my $cn_info = "$ip $start $end $st $at $mac";
$hcn = $hcn ? "$hcn\n$cn_info" : $cn_info;
$rd->set("hcn:$cn", $hcn);

$rd->srem("cn:$cn", $ip);
$rd->del("ip:$ip");

API: redis ops for "IP now"

we do "fetch latest" first
we return "nothing is active" if fetch latest returns undef
we return "nothing is active" if now is not between start and stop
otherwise, we return the info

API: redis ops for "Circuit now"

my @ips = $rd->smembers("cn:$cn");
my $now = time;
my @r;
for my $ip (@ips) {
    my $info = $rd->get("ip:$ip");
    next unless $info;
    my ($cn2, $start, $stop, $st, $at, $mac) = split / /, $info;
    next unless $cn eq $cn2;
    next unless $start <= $now && $stop >= $now;
    push @c, {
        ip           => $ip,
        service_type => $st,
        access_type  => $at,
        mac          => $mac,
    };
}

API: redis ops for "IP history"

we get "ip:$ip"
we get "hip:$ip", split by "\n"
go through all lines
reject if stored start is greater than requested stop
reject if stored stop is less than requested start
return all matches, sorted

API: redis ops for "Circuit history"

we go through all IPs in "cn:$cn", fetching their info
we get "hcn:$cn", split by "\n"
go through all lines
reject if stored start is greater than requested stop
reject if stored stop is less than requested start
return all matches, sorted

Estimate memory requirements

make absolutely sure that expected dataset will fit in main memory
note that 32-bit Redis will generally use less memory than 64-bit one using the same dataset, due to pointer size difference
dump your RDBMS to a test machine
write a script that converts RDBMS dataset into Redis dataset
see how much memory it uses
this should give you a good idea about what to expect
in the DHCP leases example, I got around 700 MB, which is very adequate and leaves room for growth
memory is cheap!
- actually, one of our sysadmins claims that memory is free

Decide whether to do a complete or a gradual switch over

can we "pull the plug"?
we can on the indexer
but we cannot on the API side
- keep the API code for DB access
- pull the plug once no DB data left

Testing the solution

For a while, run two indexers in parallel and do the checks on the data generated by the Redis one.

Do the queries against both DB and Redis.

But don't return results of the Redis queries to the user yet.

Put into production

In our case, this is an easy step:

we already kept two indexers running
we already have a version of API code that queries both DB and Redis
switch over to this version of API - start returning Redis results
then kill the DB indexer
make a mark in your calendar to switch to the Redis-only query API whenever there is nothing interesting in the DB

Performance improvements achieved

BWA-HA-HA-HA

Redis: publish/subscribe mechanism

A simple message sender:

$rd->publish("ch1", "message to ch1");
$rd->publish("ch2", "something to ch2");
$rd->publish("ch1", "quit");

Redis: subscribe to a channel

my $quit = 0;
$rd->subscribe("ch1", sub {
    my $msg = shift;
    print "$msg\n";
    $quit = 1 if $msg eq "quit";
});
$rd->wait_for_messages(1) until $quit;

message to ch1
quit

Redis: subscribe to many channels

my $quit = 0;
$rd->psubscribe("ch*", sub {
    my $msg = shift;
    print "$msg\n";
    $quit = 1 if $msg eq "quit";
});
$rd->wait_for_messages(1) until $quit;

message to ch1
something to ch2
quit

What about persistent queues?

Emulate them!

There is a feature in Redis which I did not talk about yet.

It can be made to expire keys.

my $rd = Redis->new;
$rd->setex("mykey", 3, 42);  # 3 seconds
say $rd->get("mykey");
sleep(1);
say $rd->ttl("mykey");
sleep(3);
say $rd->get("mykey");

42
2
(undef)

RT transactions queuing

Request tracker is instrumented to put JSON-encoded transactions into Redis:

The code for populating $val is too large to fit on this ~~margin~~ screen.

my $val = {...};
my $seq = $rd->incr("seq-tr");
my $key = "tr:$seq";
$rd->set($key, encode_json($val));
$rd->expire($key, 3600);

RT transactions querying

we create a class, TART, to do the heavy lifting
the users of this class register a unique name to identify themselves
the TART instances provide an iterator-like interface:

my $tart = TART->new("interested-party");
while (my $transaction = $tart->next) {
    # do something with %$transaction hash
    # which is the same as %$val on the previous slide
}
# register last processed transaction for "interested-party"
$tart->commit;

TART constructor

sub new
{
    my ($class, $id) = @_;
    # $id == "interested-party" in our example
    return bless { id => $id }, $class;
}

TART iterator

sub next
{
   my $me = shift;
   $me->_fetch unless $me->{rd};
   while (@{$me->{keys}}) {
      my $k = shift @{$me->{keys}};
      my $v = $me->{rd}->get("tr:$k");
      if ($v) {
         $me->{last_seen} = $k;
         return decode_json(Encode::encode("utf-8", $v));
      }
   }
   return undef;
}

TART fetcher

sub _fetch
{
   my $me = shift;
   $me->{rd} = Redis->new;
   $me->{last_seen} =
     $me->{rd}->get("tart:daemon:last:$me->{id}") || 0;
   $me->{keys} = [
      sort { $a <=> $b }
      grep { $_ > $me->{last_seen} }
      map { /^tr:(\d+)$/ ? ($1) : () }
      $me->{rd}->keys("tr:*")
   ];
}

TART commit

sub commit
{
    my $me = shift;
    $me->_fetch unless $me->{rd};
    $me->{rd}->set("tart:daemon:last:$me->{id}",
      $me->{last_seen});
}

Malmö, June 2011

Replacing Relational DB with Redis: a Case Study