There are two types of proxies:

  • reverse proxies (hide the server from the client)
  • forward proxies (hide the client from the server)

Reverse proxies are easy to implement with vegur and cowboyku from Heroku. cowboyku is Heroku’s fork of the cowboy web server, created specifically for building reverse proxies in Erlang. Heroku uses it for its HTTP routing.

Forward proxies are a little trickier. None of the popular Erlang web servers supports the CONNECT method, so let’s look at how to build one from scratch.

First of all, we need to start listening on a TCP port:

{ok, ListenSocket} = gen_tcp:listen(3128, [
    binary,
    {packet, http},
    {active, false},
    {reuseaddr, true}
]).

A few words about the options:

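  • {packet, http} makes the socket parse HTTP for us, so we can rely on the Erlang runtime instead of writing our own parser.
  • {active, false} puts the socket in passive mode: data is delivered only when we call gen_tcp:recv. A simple rule: don’t use {active, true}, because a fast sender can then flood the process mailbox with messages.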
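
To get a feel for what {packet, http} delivers, the same kind of parsing can be tried directly through erlang:decode_packet/3, without opening a socket. This is only a small sketch: the module name, the function name and the sample request line are made up for illustration.

```erlang
-module(packet_demo).
-export([parse_request_line/1]).

%% Feeds a raw request line to the HTTP packet parser.
%% Returns {Method, Uri, Version}, 'incomplete' when the line is not
%% terminated yet, or {unexpected, Result} for anything else.
parse_request_line(Line) when is_binary(Line) ->
    case erlang:decode_packet(http, Line, []) of
        {ok, {http_request, Method, Uri, Version}, _Rest} ->
            {Method, Uri, Version};
        {more, _} ->
            incomplete;
        Other ->
            {unexpected, Other}
    end.
```

Calling packet_demo:parse_request_line(<<"CONNECT example.com:443 HTTP/1.1\r\n">>) gives {"CONNECT", {scheme, "example.com", "443"}, {1, 1}}. CONNECT is not one of the methods reported as atoms, so it comes back as a string, and the host:port target is split at the colon.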

Once we have a listen socket, we can accept a connection from a client:

{ok, ClientSocket} = gen_tcp:accept(ListenSocket).

gen_tcp:accept/1 is a blocking call: it waits for a client for as long as needed.
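
Because accept blocks, a proxy that serves more than one client usually accepts in a loop and hands each connection to its own process. The following is a rough sketch of that idea, not the layout of the full implementation; proxy_acceptor, wait_for_socket/0 and handle_client/1 are hypothetical names, and handle_client/1 is only a placeholder that closes the socket.

```erlang
-module(proxy_acceptor).
-export([start/1, accept_loop/1]).

%% Opens the listen socket and accepts clients until it is closed.
start(Port) ->
    {ok, ListenSocket} = gen_tcp:listen(Port, [
        binary, {packet, http}, {active, false}, {reuseaddr, true}
    ]),
    accept_loop(ListenSocket).

accept_loop(ListenSocket) ->
    case gen_tcp:accept(ListenSocket) of
        {ok, ClientSocket} ->
            %% One process per connection; it becomes the socket owner,
            %% so the socket is closed if that process dies.
            Pid = spawn(fun wait_for_socket/0),
            ok = gen_tcp:controlling_process(ClientSocket, Pid),
            Pid ! {socket, ClientSocket},
            accept_loop(ListenSocket);
        {error, closed} ->
            ok
    end.

%% Waits until ownership of the socket has been transferred.
wait_for_socket() ->
    receive
        {socket, ClientSocket} -> handle_client(ClientSocket)
    end.

%% Placeholder (assumption): a real handler would read the CONNECT
%% request and start relaying data here.
handle_client(ClientSocket) ->
    gen_tcp:close(ClientSocket).
```

The acceptor goes straight back to gen_tcp:accept/1 after handing the socket over, so a slow client does not hold up new connections.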

Once we have a client socket, we can start exchanging bytes with the client over TCP. For this, we need a data structure to accumulate everything related to the request. The most natural choice in Erlang is a record:

-record(request, {
    uri,
    headers = []
}).

Besides the URI, we also want to keep the headers. If we later add authentication to the proxy, we will need them, because the credentials are passed in the Proxy-Authorization header.
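
As an illustration of why the headers are worth keeping, here is a minimal sketch that extracts Basic credentials from the collected header list. It assumes the headers were gathered in {packet, http} mode, where Proxy-Authorization is reported as the atom 'Proxy-Authorization' and its value as a string; the module and function names are hypothetical.

```erlang
-module(proxy_auth).
-export([credentials/1]).

%% Headers is the [{Name, Value}] list built from http_header messages.
%% Returns {ok, User, Password} or {error, Reason}.
credentials(Headers) ->
    case lists:keyfind('Proxy-Authorization', 1, Headers) of
        {_, "Basic " ++ Encoded} ->
            decode(Encoded);
        {_, _OtherScheme} ->
            {error, unsupported_scheme};
        false ->
            {error, missing}
    end.

%% Basic credentials are base64("user:password"); split at the first colon.
decode(Encoded) ->
    try string:split(base64:decode_to_string(Encoded), ":") of
        [User, Password] -> {ok, User, Password};
        _ -> {error, malformed}
    catch
        error:_ -> {error, malformed}
    end.
```

A proxy that requires credentials would typically answer a request without them with 407 Proxy Authentication Required and a Proxy-Authenticate header, instead of opening the tunnel.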

Thanks to {packet, http}, we don’t have to parse any HTTP traffic ourselves: each call to recv returns an already parsed HTTP message:

recv_loop(ClientSocket, #request{headers = Headers} = Request) ->
    case gen_tcp:recv(ClientSocket, 0) of
        %% Request line; CONNECT arrives as a string, not as an atom
        {ok, {http_request, "CONNECT", Uri, _Version}} ->
            recv_loop(ClientSocket, Request#request{uri = Uri});
        %% One header per recv call
        {ok, {http_header, _, Name, _, Value}} ->
            recv_loop(ClientSocket, Request#request{headers = [{Name, Value} | Headers]});
        %% End of headers: restore the original header order
        {ok, http_eoh} ->
            {ok, Request#request{headers = lists:reverse(Headers)}}
    end.

We deliberately do not match other methods, to keep only the essence of the process. We are interested only in the CONNECT method because we are building a proxy, not a regular web server.

How does the CONNECT method work? A CONNECT request asks the proxy to establish a tunnel to a target host and to forward all the data it receives in both directions.

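%% For a CONNECT target of the form host:port, {packet, http} reports the URI as {scheme, Host, Port}
{scheme, Host, Port} = Request#request.uri,
{ok, ServerSocket} = gen_tcp:connect(Host, list_to_integer(Port), [{packet, raw}, {active, false}]).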

This time we use {packet, raw}, because it is none of our business what traffic goes to the target host. Most likely it will be encrypted HTTPS traffic, so for us it is just bytes.

Once the tunnel is established, we should tell the client that the CONNECT request has succeeded and that it can start sending the actual request. We also need to switch the client socket from http to raw mode:

ok = inet:setopts(ClientSocket, [{packet, raw}]),
ok = gen_tcp:send(ClientSocket, <<"HTTP/1.1 200 OK\r\n\r\n">>).
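
Seen from the other side, the handshake looks roughly like this. The sketch below is a hypothetical test client, not part of the proxy: it sends a CONNECT request and treats any 2xx status as "tunnel established". The names proxy_client, open_tunnel/3, connect_request/1 and tunnel_established/1 are assumptions, and the code assumes the whole status line arrives in the first recv.

```erlang
-module(proxy_client).
-export([open_tunnel/3, connect_request/1, tunnel_established/1]).

%% Target is a "host:port" string such as "example.com:443".
open_tunnel(ProxyHost, ProxyPort, Target) ->
    {ok, Socket} = gen_tcp:connect(ProxyHost, ProxyPort,
                                   [binary, {packet, raw}, {active, false}]),
    ok = gen_tcp:send(Socket, connect_request(Target)),
    {ok, Reply} = gen_tcp:recv(Socket, 0, 5000),
    case tunnel_established(Reply) of
        true ->
            {ok, Socket};
        false ->
            gen_tcp:close(Socket),
            {error, refused}
    end.

%% Builds the request as iodata; gen_tcp:send/2 accepts it as is.
connect_request(Target) ->
    ["CONNECT ", Target, " HTTP/1.1\r\n", "Host: ", Target, "\r\n\r\n"].

%% Parses the status line of the proxy reply and checks for a 2xx code.
tunnel_established(Reply) when is_binary(Reply) ->
    case erlang:decode_packet(http, Reply, []) of
        {ok, {http_response, _Version, Status, _Phrase}, _Rest} ->
            Status >= 200 andalso Status < 300;
        _ ->
            false
    end.
```

With the proxy from this article listening on port 3128, proxy_client:open_tunnel("localhost", 3128, "example.com:443") should return {ok, Socket}, and everything sent on that socket afterwards goes to the target host.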

Now everything is ready for proxying, but there is one small problem. Because we don’t know the nature of the traffic or the protocol spoken between the client and the server, we cannot know when to stop receiving data from the client and start sending it to the server, and vice versa. The client and the server do know, though. So we run the client-to-server transfer and the server-to-client transfer at the same time, until the sockets are closed.

proxy(ClientSocket, ServerSocket) ->
    spawn_link(fun() -> transfer(ClientSocket, ServerSocket) end),
    spawn_link(fun() -> transfer(ServerSocket, ClientSocket) end).

transfer(From, To) ->
    case gen_tcp:recv(From, 0) of
        {ok, Data} ->
            ok = gen_tcp:send(To, Data),
            transfer(From, To);
        {error, _Error} ->
            ok
    end.
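
One thing the short version leaves open is cleanup: when one side hangs up, the socket on the other side stays open until its peer closes it too. A possible variant, shown here only as a sketch under the assumption that tearing down the whole tunnel on the first close or error is acceptable, closes both sockets as soon as either direction ends. The module name proxy_relay and the helper close_both/2 are hypothetical.

```erlang
-module(proxy_relay).
-export([proxy/2, transfer/2]).

%% Starts one relay process per direction.
proxy(ClientSocket, ServerSocket) ->
    spawn_link(fun() -> transfer(ClientSocket, ServerSocket) end),
    spawn_link(fun() -> transfer(ServerSocket, ClientSocket) end),
    ok.

%% Copies bytes from From to To until either socket fails or closes.
transfer(From, To) ->
    case gen_tcp:recv(From, 0) of
        {ok, Data} ->
            case gen_tcp:send(To, Data) of
                ok -> transfer(From, To);
                {error, _Reason} -> close_both(From, To)
            end;
        {error, _Reason} ->
            close_both(From, To)
    end.

%% Closing the other socket also unblocks the relay process for the
%% opposite direction, whose recv then returns {error, closed}.
close_both(A, B) ->
    _ = gen_tcp:close(A),
    _ = gen_tcp:close(B),
    ok.
```

The trade-off is that half-closed connections are not preserved: if one peer closes its end while the other is still sending, the remaining data is dropped along with the tunnel.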

That is all. Erlang has everything you need to build a forward proxy in a straightforward way. You can see a full implementation here.