Resolving DNS in-image


Resolving DNS in-image

Holger Freyther
Norbert and I looked at using DNS for service discovery and ran into some of the limitations of NetNameResolver[1]. In the end I created an initial DNS implementation in Pharo called Paleo-DNS[2] to overcome them.

DNS is a protocol we use every day but rarely think of. There is an active IETF community that is evolving the protocol and finding new usages (service discovery is one of them).

In DNS there are different types of resource records (RRs). The ones most commonly used by a client ("stub") are "A" for IPv4 addresses, "AAAA" for IPv6 addresses, "CNAME" for aliases and "SRV" for service locations. So far only "A" records have been implemented.
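On the wire, each record type is identified by a 16-bit TYPE code, and a question carries the queried name as a sequence of length-prefixed labels. The sketch below is written in Python rather than Pharo for brevity. `encode_name` and `encode_question` are hypothetical helpers, not Paleo-DNS API. The TYPE values come from the IANA DNS parameters registry (A = 1, CNAME = 5, AAAA = 28, SRV = 33).

```python
import struct

# TYPE codes from the IANA "DNS Parameters" registry for the records named above.
RR_TYPES = {"A": 1, "CNAME": 5, "AAAA": 28, "SRV": 33}
CLASS_IN = 1  # the Internet class


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels terminated by a zero byte."""
    stripped = name.rstrip(".")
    if not stripped:
        return b"\x00"  # the root name
    out = bytearray()
    for label in stripped.split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label in {name!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # every name ends with the empty root label
    return bytes(out)


def encode_question(name: str, rr_type: str) -> bytes:
    """A question entry: QNAME followed by QTYPE and QCLASS as big-endian 16-bit integers."""
    return encode_name(name) + struct.pack("!HH", RR_TYPES[rr_type], CLASS_IN)
```

On the query side, supporting a new record type therefore amounts to little more than a new TYPE code. Most of the work in an RR implementation lies in parsing and serializing its RDATA.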

So if you are curious about DNS then this is a great opportunity to add your favorite RR implementation to it and send a PR. There are probably 20+ of them to go. ;)


Query example using DNS-over-TLS (DoT) to Google Public DNS

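PaleoDNSTLSTransport new
        destAddress: #[8 8 4 4] port: 853;  "Google Public DNS, DNS-over-TLS port"
        timeout: 2 seconds;
        query: (PaleoDNSQuery new
                transactionId: (SharedRandom globalGenerator nextInt: 65535);
                addQuestion: (PaleoRRA new rr_name: 'pharo.org.');  "ask for the A record"
                addAdditional: (PaleoRROpt new udpPayloadSize: 4096);  "EDNS(0) OPT record"
                yourself)  "pass the query itself, not the result of the last cascaded message"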
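The expression above produces a message roughly like the one built by the following Python sketch. The message has a header with a random transaction id, one question, and an EDNS(0) OPT pseudo-record that advertises a 4096-byte UDP payload size. Before sending, it is prefixed with the 2-byte length field that TCP and DNS-over-TLS use (RFC 1035 section 4.2.2, RFC 7858). `build_query` and `frame_for_stream` are hypothetical names. The code illustrates the wire format and is not Paleo-DNS's implementation.

```python
import secrets
import struct


def encode_name(name: str) -> bytes:
    """Length-prefixed labels plus the terminating root label (no validation here)."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        if label:
            out.append(len(label))
            out += label.encode("ascii")
    return bytes(out) + b"\x00"


def build_query(name: str, qtype: int = 1, udp_payload_size: int = 4096, txid=None) -> bytes:
    """Header + one question + an EDNS(0) OPT pseudo-record in the additional section."""
    if txid is None:
        txid = secrets.randbelow(1 << 16)  # any of the 65536 possible ids
    flags = 0x0100  # only RD (recursion desired) is set
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record)
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    # OPT: root owner name, TYPE 41, CLASS carries the UDP payload size,
    # TTL carries extended RCODE/version/flags (all zero), RDLENGTH 0.
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_payload_size, 0, 0)
    return header + question + opt


def frame_for_stream(message: bytes) -> bytes:
    """Over TCP, and therefore DNS-over-TLS, each message is preceded by its 2-byte length."""
    return struct.pack("!H", len(message)) + message
```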


[1] NetNameResolver blocks on Unix. On macOS only one lookup can be in progress at a time. On all platforms it returns exactly one address, and there is no IPv6 support.

[2] https://github.com/zecke/paleo-dns

Re: Resolving DNS in-image

Sven Van Caekenberghe-2
Hi Holger & Norbert,

Great, but ...

This already existed in various forms, and a couple of years ago I made a newer version. They can all be found at http://www.smalltalkhub.com/#!/~BenComan/DNS/, including unit tests (although some of the older code there is a bit stale).

It covers most record types, but most of them are not used a lot.

Simple usage:

 NeoSimplifiedDNSClient default addressForName: 'pharo.org'. "104.28.27.35"

One of my goals was to use it as a more reliable, non-blocking 'do we have internet access' test:

 NeoNetworkState default hasInternetConnection. "true"


From the class comments:

=======================
I am NeoSimplifiedDNSClient.

I resolve fully qualified hostnames into low level IP addresses.

  NeoSimplifiedDNSClient default addressForName: 'stfx.eu'.

I use the UDP DNS protocol.
I handle localhost and dot-decimal notation.

I can be used to resolve Multicast DNS addresses too.

  NeoSimplifiedDNSClient new useMulticastDNS; addressForName: 'zappy.local'.

Implementation

I execute requests sequentially and do not cache results.
This means that only one request can be active at any single moment.
It is technically not really necessary to use my default instance as I do not hold state.
====================================
I am NeoDNSClient.
I am a NeoSimplifiedDNSClient.

  NeoDNSClient default addressForName: 'stfx.eu'.

I add caching respecting ttl to DNS requests.
I allow for multiple outstanding requests to be handled concurrently.

Implementation

UDP requests are asynchronous and unreliable by definition. Since DNS requests can take some time, it should be possible to have several of them in flight at the same time, i.e. concurrently. Replies can arrive out of order and must be matched to their outstanding request by id.

If a request has been seen before and its response is not expired, it will be answered from the cache.

Each incoming request is handled by creating a NeoDNSRequest object and adding that to the request queue. This triggers the start up of the backend process, if necessary. The client then waits on the semaphore inside the request object, limited by the timeout.

The backend process loops while there are still outstanding requests that have not expired. It sends all unsent requests at once, and then listens briefly for incoming replies. It cleans up expired requests. When a reply comes in, it is connected to its request by id. The semaphore in the request object is then signalled so that the waiting client can continue and the request is removed from the queue. The process then loops. If the queue is empty, the backend process stops.
================

But that last one is less reliable, since it was my most recent addition.
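Here is a Python sketch of the request queue described in the NeoDNSClient class comment above. It is not NeoDNS code. `MultiplexingClient`, the `send`/`receive` callables and the use of `threading.Event` in place of a Pharo semaphore are all assumptions made for illustration. The sketch also assumes transaction ids are unique among outstanding requests.

```python
import queue
import threading


class PendingRequest:
    def __init__(self, txid, payload):
        self.txid = txid
        self.payload = payload
        self.done = threading.Event()  # stands in for the semaphore in the request object
        self.reply = None


class MultiplexingClient:
    """Many callers share one backend thread; replies are matched to requests by id."""

    def __init__(self, send, receive):
        self._send = send        # callable(payload): transmit one request
        self._receive = receive  # callable(timeout) -> (txid, reply) or None
        self._unsent = queue.Queue()
        self._pending = {}       # txid -> PendingRequest
        self._lock = threading.Lock()
        self._backend = None

    def request(self, txid, payload, timeout=2.0):
        req = PendingRequest(txid, payload)
        with self._lock:
            self._pending[txid] = req
            if self._backend is None:  # start the backend only when needed
                self._backend = threading.Thread(target=self._run, daemon=True)
                self._backend.start()
        self._unsent.put(req)
        if not req.done.wait(timeout):
            with self._lock:
                self._pending.pop(txid, None)  # give up: expire the request
                if not req.done.is_set():
                    raise TimeoutError(f"no reply for id {txid}")
        return req.reply

    def _run(self):
        while True:
            while not self._unsent.empty():  # send everything queued so far
                self._send(self._unsent.get().payload)
            item = self._receive(0.05)       # listen briefly for replies
            with self._lock:
                if item is not None:
                    req = self._pending.pop(item[0], None)
                    if req is not None:      # unknown or expired ids are dropped
                        req.reply = item[1]
                        req.done.set()
                if not self._pending:        # queue empty: the backend stops
                    self._backend = None
                    return
```

Each caller blocks only on its own event, bounded by its timeout. The single backend thread owns the transport, which is what lets several lookups be in flight at once.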


The main problems are concurrent and asynchronous requests, as well as error handling.

It would be great if we could do this well in-image (but getting the OS DNS settings is hard too).

We should talk ;-)

Sven

> On 28 Mar 2019, at 01:32, Holger Freyther <[hidden email]> wrote:
>
> Norbert and me looked at using DNS for service discovery and ran into some of the limitations of the NetNameResolver[1]. In the end I created an initial DNS implementation in Pharo called Paleo-DNS[2] to overcome these.
>
> DNS is a protocol we use every day but rarely think of. There is an active IETF community that is evolving the protocol and finding new usages (service discovery is one of them).
>
> In DNS there are different types of resource records (RR). The most commonly used ones in a client ("stub") are "A" for IPv4 addresses, "AAAA" for IPv6 addresses, "CNAME" for aliases, "SRV" records. So far only support for "A" records was implemented.
>
> So if you are curious about DNS then this is a great opportunity to add your favorite RR implementation to it and send a PR. There are probably 20+ of them to go. ;)
>
>
> Query example using DNS-over-TLS (DoT) to Google Public DNS
>
> PaleoDNSTLSTransport new
> destAddress: #[8 8 4 4] port: 853;
> timeout: 2 seconds;
> query: (PaleoDNSQuery new
>   transactionId: (SharedRandom globalGenerator nextInt: 65535);
> addQuestion: (PaleoRRA new rr_name: 'pharo.org.');
> addAdditional: (PaleoRROpt new udpPayloadSize: 4096))
>
>
> [1] It's blocking on Unix, on Mac only one look-up may occur at a time and it returns exactly one address. There is also no IPv6 support.
>
> [2] https://github.com/zecke/paleo-dns



Re: Resolving DNS in-image

Holger Freyther


> On 28. Mar 2019, at 08:02, Sven Van Caekenberghe <[hidden email]> wrote:
>
> Hi Holger & Norbert,
>

Great. Regardless of how many versions exist, we should get one into the image with proper platform integration.

I wasn't aware of your code, but I assumed it was something you would write, hence the Paleo prefix. Now that the Paleo code turns out to be newer than the Neo one, it is even funnier...



> NeoSimplifiedDNSClient default addressForName: 'pharo.org'. "104.28.27.35"
>
> One of my goals was to use it as a more reliable, non-blocking 'do we have internet access' test:
>
> NeoNetworkState default hasInternetConnection. "true"


What is internet access, and how would this be used? Is this about captive portals? With a local network policy, the big anycast services might be blocked while the user can still reach other services. Or, with deployed microservices, they might reach each other but not the outside?


... snip ...


> The main problems are concurrent and asynchronous requests, as well as error handling.
>
> I would be great if we could do this well in-image. (But getting OS DNS settings is hard too).
>
> We should talk ;-)

Agreed that getting the OS DNS settings is hard, but not impossible (Go seems to get away with its own implementation on Unix). We seem to manage to honor platform settings for HTTP proxies, and I am confident we can do the same for DNS.


I planned to solve concurrency by creating one stub resolver per request and having a shared cache. The internet can be a hostile place, and DNS is an easy victim. Cache poisoning does exist. Random source ports, 0x20 randomization, random transaction ids and ignoring Path MTU (PMTU) ICMP messages are the few mitigations we have.
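0x20 randomization relies on servers copying the question name back unchanged. Randomizing the case of each letter therefore adds bits that an off-path attacker must guess, on top of the 16-bit transaction id and the source port. Below is a minimal Python sketch with hypothetical helper names. A real resolver also needs a fallback for servers that do not preserve case.

```python
import secrets
import string


def randomize_case(name: str) -> str:
    """DNS 0x20: give each ASCII letter a random case; servers echo the question name back."""
    return "".join(
        (c.upper() if secrets.randbits(1) else c.lower()) if c in string.ascii_letters else c
        for c in name
    )


def reply_matches(sent_txid: int, sent_name: str, reply_txid: int, reply_name: str) -> bool:
    """Accept a reply only if the id and the exact-case question name both match."""
    return sent_txid == reply_txid and sent_name == reply_name
```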


Let's definitely talk. I hang out in the Pharo Discord group. :)


holger

Re: Resolving DNS in-image

Sven Van Caekenberghe-2
Holger,

I'll write most of my comments inline.

Yesterday I moved my code to https://github.com/svenvc/NeoDNS using Tonel format, to make it a bit easier to consume. I also did a couple of minor updates while going over the functionality. Note that 'my code' is in the package Net-Protocols-DNS-Experimental, which uses the object model in 'Net-Protocols-DNS-MessageFormat' (that already existed before; I just ported and cleaned it up a little to my taste).

Are you coming to the Pharo Days? We could talk there.

(More below)

> On 28 Mar 2019, at 09:29, Holger Freyther <[hidden email]> wrote:
>
>
>
>> On 28. Mar 2019, at 08:02, Sven Van Caekenberghe <[hidden email]> wrote:
>>
>> Hi Holger & Norbert,
>>
>
> great. Regardless of how many versions exist. We should get one into the image with proper platform integration.  
>
> I wasn't aware of your code but I assumed it is something you could write, hence the Paleo prefix. Now that the Paleo code is the Neo one is more funny...

Yeah, naming is always fun, isn't it?

>> NeoSimplifiedDNSClient default addressForName: 'pharo.org'. "104.28.27.35"
>>
>> One of my goals was to use it as a more reliable, non-blocking 'do we have internet access' test:
>>
>> NeoNetworkState default hasInternetConnection. "true"
>
>
> What is internet access and how would this be used? Is this about captive portals? With local network policy the big anycast services might be blocked but the user can still reach services. Or with deployed microservices they might reach other but not the outside?

For years there has been this issue in Pharo. Whenever we build features that require internet access, people say "don't do this, because it won't work when I have slow or no internet (like on a train)". One example is automatically loading the Catalog when you start using Spotter, and there are many more places where such features could add lots of value.

The core cause of the problems is that the current NameResolver is totally blocking, especially in failure cases, which gives a terrible experience.

One way to fix this would be with the concept of NetworkState, a cheap, reliable, totally non-blocking way to test if the image has a working internet connection. Related is the option of 'Airplane Mode', so that you can manually say: "consider the internet unreachable".
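One possible reading of "cheap, reliable, totally non-blocking" is a check that never waits on the network itself. It answers from the last known result and refreshes that result in the background. Here is a Python sketch under that assumption. The `probe` callable (for example, a DNS query with a short timeout) and every other name here are hypothetical, not the NeoNetworkState API.

```python
import threading
import time


class NetworkState:
    """Answers immediately from the last probe result and refreshes it in the background."""

    def __init__(self, probe, max_age=30.0):
        self._probe = probe          # callable() -> bool; may block (e.g. a DNS query)
        self._max_age = max_age      # seconds before a result is considered stale
        self._lock = threading.Lock()
        self._result = False         # pessimistic until the first probe completes
        self._checked_at = None
        self._refreshing = False
        self.airplane_mode = False   # manual override: "consider the internet unreachable"

    def has_internet_connection(self) -> bool:
        if self.airplane_mode:
            return False
        with self._lock:
            stale = (self._checked_at is None
                     or time.monotonic() - self._checked_at > self._max_age)
            if stale and not self._refreshing:
                self._refreshing = True
                threading.Thread(target=self._refresh, daemon=True).start()
            return self._result      # never waits for the network

    def _refresh(self):
        try:
            result = bool(self._probe())
        except Exception:            # any probe failure counts as "offline"
            result = False
        with self._lock:
            self._result = result
            self._checked_at = time.monotonic()
            self._refreshing = False
```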

The main scope was real internet access, not (possibly) limited internal network access. But that is a good point to think about.

> ... snip ...
>
>
>> The main problems are concurrent and asynchronous requests, as well as error handling.
>>
>> I would be great if we could do this well in-image. (But getting OS DNS settings is hard too).
>>
>> We should talk ;-)
>
> Agreed that getting the OS DNS settings is hard but not impossible (go seems to get away with its implementation on unix). It seems we manage to honor platform settings for http proxies and I am confident we can do it for DNS as well.

I hacked a #getSystemDNS class method that goes via /etc/resolv.conf and it works (but of course not on Windows). In the 'initialise' protocol, there are #useCloudflareDNS, #useGoogleDNS, #useSystemDNS as well as #useMulticastDNS methods.
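For comparison, here is a Python sketch of the /etc/resolv.conf route. `parse_resolv_conf` is a hypothetical helper, not #getSystemDNS. It only handles the `nameserver`, `search`/`domain` and `options` keywords from resolv.conf(5) and ignores others such as `sortlist`.

```python
def parse_resolv_conf(text: str) -> dict:
    """Extract nameserver, search/domain and options settings from resolv.conf text."""
    config = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # comment lines start with '#' or ';'
            continue
        keyword, *args = line.split()
        if keyword == "nameserver" and args:
            config["nameservers"].append(args[0])
        elif keyword in ("search", "domain"):
            config["search"] = args      # the last search/domain line wins
        elif keyword == "options":
            for option in args:
                name, _, value = option.partition(":")
                config["options"][name] = int(value) if value.isdigit() else True
    return config


def system_resolv_conf(path: str = "/etc/resolv.conf") -> dict:
    with open(path, encoding="ascii", errors="replace") as f:
        return parse_resolv_conf(f.read())
```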

I would *very* much prefer not to depend on any obscure, hard to maintain VM code (FFI would just be acceptable).

> I planned to solve concurrency by creating one stub resolver per request and having a shared cache. The internet can be a hostile place and DNS is an easy victim. Cache poisoning does exist and random source port, 0x20 randomization, random transaction ids, disrespecting PTMU ICMP messages are the few mitigations we have.

What I tried/implemented in NeoDNSClient (which inherits from the one-shot NeoSimplifiedDNSClient) is a requestQueue and a cache (respecting ttl), where clients put a request on the requestQueue and wait on a semaphore inside the request (respecting their call's timeout). A single process (that starts/stops as needed) handles the sending & receiving of the actual protocol, signalling the matching request's semaphore. (The #beThreadSafe option needs a bit more work though).
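The cache half of that design is simple: keep each answer until its TTL runs out, keyed by the case-insensitive name and the record type. Below is a Python sketch with an injectable clock so that expiry can be tested. `TtlCache` is a hypothetical name. On its own it is not thread-safe, so it would need to sit behind whatever lock or process serializes access to the request queue.

```python
import time


class TtlCache:
    """Cache answers per (name, record type) until their TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (lower-cased name, rr_type) -> (expires_at, answers)

    def put(self, name, rr_type, answers, ttl):
        self._entries[(name.lower(), rr_type)] = (self._clock() + ttl, list(answers))

    def get(self, name, rr_type):
        """Return cached answers, or None if absent or expired."""
        key = (name.lower(), rr_type)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, answers = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: the caller must send a fresh query
            return None
        return answers
```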

But yes, ultimately, things might be more complex. There is of course a problem in doing in-image DNS: it creates a (possibly) huge maintenance burden with a lot of responsibility. Now, since we already do socket streams, TLS/SSL and HTTP, the risk/challenge is probably acceptable.

In any case, any solution would first need to prove itself in lots of real world situations.

I actually added an #install class method to NeoDNSClient that hacks into the existing NetNameResolver and takes over responsibility for #addressForName:[timeout:] for the whole image. Of course, this is still pure alpha.

I am curious, though: what was your initial motivation for starting PaleoDNS? Which concrete issues did you encounter that you wanted to fix?

> Let's definitely talk. I hang out in the pharo discord group. :)
>
>
> holger

I am only an occasional IM user; I am more of an email guy.

Sven




Re: Resolving DNS in-image

cedreek

> The core cause of the problems is that the current NameResolver is totally blocking, especially in failure cases, which gives a terrible experience.
>
> One way to fix this would be with the concept of NetworkState, a cheap, reliable, totally non-blocking way to test if the image has a working internet connection. Related is the option of 'Airplane Mode', so that you can manually say: "consider the internet unreachable".
>
> The main scope was real internet access, not (possibly) limited internal network access. But that is a good point to think about.

I totally agree on this point (and thanks again for pushing such reliable networking development).

For my project, I consider the Internet an optional connection (and the least trusted one, by the way). Having a « non-blocking way to test if the image has a working internet connection » is therefore my number one priority.

As I see it (conceptually), to get Internet access I use either:
1)  a LAN in between with a gateway (which provides a WAN service)
2)  an internal modem (GSM for instance)

I focus on 1). If I (re)connect to a (known) LAN, I know whether it provides a gateway. In fact, the access point/switch is just another node in the system.

So I kind of get around the problem.

Still, having nice OS integration is important, especially with all the network interfaces we have (and I think that was a limitation of SSDP, if I remember correctly what Henrik said).

So +100 to your initiative, guys :)

>> Let's definitely talk. I hang out in the pharo discord group. :)
>>
>>
>> holger
>
> I am only an occasional IM users, I more of an email guy.


I am not as much of an expert as you, but I am definitely interested in such discussions. Pharo Days?

This could include pushing SSDP from Henrik and maybe integrating Noury’s (and others') work on NetworkExtra?


My 2 cents,

Cédrick









Re: Resolving DNS in-image

Ben Coman
In reply to this post by Sven Van Caekenberghe-2


On Fri, 29 Mar 2019 at 18:08, Sven Van Caekenberghe <[hidden email]> wrote:
> Holger,
>
> ... snip ...
>
> The core cause of the problems is that the current NameResolver is totally blocking, especially in failure cases, which gives a terrible experience.

Not that we want everything including the kitchen sink in Pharo, but DNS is PERVASIVE!!! :)
A lot of experimentation with new services happens through new RR types, so having a "live" DNS implementation would make Pharo a platform for playing with leading-edge services.
Can we discuss putting an in-image DNS on the Pharo 8 roadmap?
    First consideration: what are the arguments against this?
    Second consideration: who would volunteer for a working party to make this happen?

btw, years ago I found the BIND book really entertaining reading...

> ... snip ...
>
> But yes, ultimately, things might be more complex. There is of course a problem in doing in-image DNS: it creates a (possibly) huge maintenance burden with a lot of responsibility. Now, since we already do socket streams, TLS/SSL and HTTP, the risk/challenge is probably acceptable.

The maintenance burden has been the argument against it in the past.
But the opportunity of "liveness" may be worth it. My presumption is that the core of DNS changes slowly,
so the big question for me is how feasible it is to fall back to the C library for things the in-image implementation doesn't handle.
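Such a fallback could look like this Python sketch. It tries an in-image resolver first and hands anything that resolver cannot answer to the platform resolver through `socket.getaddrinfo`, which calls into the C library. In Pharo the fallback would presumably be the existing NetNameResolver primitives. `resolve` and `in_image_resolver` are hypothetical names.

```python
import socket


def resolve(name, in_image_resolver=None, timeout=2.0):
    """Try the in-image resolver first; fall back to the platform (C library) resolver."""
    if in_image_resolver is not None:
        try:
            addresses = in_image_resolver(name, timeout)
            if addresses:
                return addresses
        except (NotImplementedError, OSError):
            pass  # unsupported feature or network failure: let the C library try
    # getaddrinfo has no timeout parameter, so this path can still block.
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:  # keep order, drop duplicates
            addresses.append(sockaddr[0])
    return addresses
```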
 
cheers -ben

Re: Resolving DNS in-image

Holger Freyther
In reply to this post by Sven Van Caekenberghe-2


> On 29. Mar 2019, at 10:07, Sven Van Caekenberghe <[hidden email]> wrote:
>
> Holger,

Sven, All!

Thanks for moving it to GitHub!

Pharo Days:

I am in APAC right now and not sure I will make it; I am hesitating. Maybe we can have a Google Hangout to discuss this (if that is not too inconvenient for those present)?


Unix system resolver config discovery:

The FreeBSD man pages are quite good. I think we need to parse resolv.conf, hosts and nsswitch.conf (Linux, FreeBSD). It's probably okay not to support everything initially (e.g. I have never seen sortlist used in my Unix career). Also, the timeouts for re-reading these files are interesting (inotify, stat, or lazily re-reading might be preferable).


https://www.freebsd.org/cgi/man.cgi?resolv.conf
https://www.freebsd.org/cgi/man.cgi?hosts
https://www.freebsd.org/cgi/man.cgi?query=nsswitch.conf
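Below is a Python sketch of the `hosts` and `nsswitch.conf` parts, following the file formats described in the man pages above. Both functions are hypothetical helpers. The hosts lookup keeps every address listed for a name, rather than exactly one as NetNameResolver does.

```python
def parse_hosts(text: str) -> dict:
    """Map each host name or alias (lower-cased) to all of its addresses, in file order."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # '#' starts a comment
        if len(fields) < 2:
            continue
        address, *names = fields
        for name in names:
            addresses = table.setdefault(name.lower(), [])
            if address not in addresses:
                addresses.append(address)
    return table


def hosts_sources(nsswitch_text: str):
    """Return the lookup sources of the 'hosts:' line, e.g. ['files', 'dns'], or None."""
    for line in nsswitch_text.splitlines():
        key, sep, rest = line.split("#", 1)[0].partition(":")
        if sep and key.strip() == "hosts":
            sources, in_action = [], False
            for word in rest.split():
                if word.startswith("["):  # skip action items such as [NOTFOUND=return]
                    in_action = True
                if not in_action:
                    sources.append(word)
                if word.endswith("]"):
                    in_action = False
            return sources
    return None
```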



Windows resolver config discovery:

It seems GetNetworkParams (https://docs.microsoft.com/en-us/windows/desktop/api/iphlpapi/nf-iphlpapi-getnetworkparams) populates a FIXED_INFO structure that includes a list of resolver addresses.


MacOS config discovery:

Starting with the Unix implementation might not be terrible.


My interest:

I would like Pharo to improve on the networking side, and I worked with recursive resolvers and authoritative servers in my last job. Combining the two seemed obvious when Norbert tried NetNameResolver and got only one IPv4 address, and I then looked at the C implementation.

My other interest is that I follow IETF DNS development (the dnsop, dprive and doh working groups have interesting topics). I think having a manageable DNS toolkit will help me play with specs and standards in the future.


More responses inline.



>> What is internet access and how would this be used? Is this about captive portals? With local network policy the big anycast services might be blocked but the user can still reach services. Or with deployed microservices they might reach other but not the outside?
>
> For years there is this issue in Pharo that if we build features that require internet access (say for example automatic loading of the Catalog when you start using Spotter, but there are many more places where this could add lots of value), that people say "don't do this, because it won't work when I have slow or no internet (like on a train)".

This sounds like "bearer management"? It seems like consulting the OS for the network status might be better/more consistent?



> The core cause of the problems is that the current NameResolver is totally blocking, especially in failure cases, which gives a terrible experience.

Yes. That's horrible. The MacOS implementation is actually asynchronous but has a level of concurrency of one. :(



> One way to fix this would be with the concept of NetworkState, a cheap, reliable, totally non-blocking way to test if the image has a working internet connection. Related is the option of 'Airplane Mode', so that you can manually say: "consider the internet unreachable".

Makes sense but is difficult as well. Just because we can't resolve one name doesn't mean that NetNameResolver won't lock-up soon after. :(

I think we have to come up with ways to deal with this: just because all I/O is blocking within a Pharo Process doesn't mean there is no concurrency. Is this only true for files and DNS?

In the bigger context I would like to have something like CSP in Pharo.


> I would *very* much prefer not to depend on any obscure, hard to maintain VM code (FFI would just be acceptable).

ack.



> What I tried/implemented in NeoDNSClient (which inherits from the one-shot NeoSimplifiedDNSClient) is a requestQueue and a cache (respecting ttl), where clients put a request on the requestQueue and wait on a semaphore inside the request (respecting their call's timeout). A single process (that starts/stops as needed) handles the sending & receiving of the actual protocol, signalling the matching request's semaphore. (The #beThreadSafe option needs a bit more work though).

In my implementation I have separated the transports into their own classes. For UDP we always want a fresh socket so that a new source port is assigned. For TCP, TLS and DoH it might make sense to keep the connection open for a while.
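Here is a Python sketch of a UDP transport along those lines. `UdpTransport` is a hypothetical name. Binding to port 0 asks the operating system for a fresh ephemeral source port on every query. A reply is only accepted if it comes from the queried server and carries the same transaction id (the first two bytes of the message).

```python
import socket
import time


class UdpTransport:
    """One fresh socket per query, so each query gets its own ephemeral source port."""

    def __init__(self, server, port=53, timeout=2.0):
        self.server = server
        self.port = port
        self.timeout = timeout

    def query(self, message: bytes) -> bytes:
        txid = message[:2]
        deadline = time.monotonic() + self.timeout
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.bind(("", 0))  # port 0: the OS picks an ephemeral source port
            sock.sendto(message, (self.server, self.port))
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError("no matching reply")
                sock.settimeout(remaining)
                try:
                    data, peer = sock.recvfrom(65535)
                except socket.timeout:
                    raise TimeoutError("no matching reply") from None
                # ignore datagrams from other hosts/ports or with the wrong id
                if peer[:2] == (self.server, self.port) and data[:2] == txid:
                    return data
```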

In some ways, if I open 15 DB connections with Voyage, I'm not concerned about 15 DNS queries. The implementation would be a lot simpler (no synchronization, no need to reason about concurrency). On the other hand, coordination is what we have today.

I think we can achieve coordination in an easier way, e.g. by registering pending requests and allowing other clients to subscribe to the result.
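The idea of registering pending requests and letting others subscribe is essentially request coalescing (sometimes called single-flight). In the Python sketch below, `Coalescer` and the `lookup` callable are hypothetical. The first caller for a key performs the query. Concurrent callers for the same key wait for that result and share it, without any global request queue.

```python
import threading


class Coalescer:
    """Concurrent lookups of the same key share one in-flight query and its result."""

    def __init__(self, lookup):
        self._lookup = lookup        # callable(key) -> result; performs the real query
        self._lock = threading.Lock()
        self._in_flight = {}         # key -> (Event, result holder dict)

    def get(self, key, timeout=None):
        with self._lock:
            entry = self._in_flight.get(key)
            is_leader = entry is None
            if is_leader:            # first caller registers the pending request
                entry = (threading.Event(), {})
                self._in_flight[key] = entry
        done, holder = entry
        if is_leader:
            try:
                holder["value"] = self._lookup(key)
            except Exception as exc:  # subscribers see the same failure
                holder["error"] = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                done.set()
        elif not done.wait(timeout):  # later callers subscribe to the result
            raise TimeoutError(f"lookup of {key!r} still pending")
        if "error" in holder:
            raise holder["error"]
        return holder["value"]
```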



> I am curious though, what was your initial motivation for starting PaleoDNS ? Which concrete issues did you encounter that you wanted to fix ?


What I like and found with my implementation:

* It would be nice if ZdcAbstractSocketStream understood uintX/uintX:
* My record classes can be parsed and serialized. In general I want to be both "sender" and "receiver".
* Separating the transports by class, as in my implementation, might make sense. I have TCP, UDP and TLS.
* My name decompression is susceptible to infinite loops, but I think the other code is too?
* We should aim to enable EDNS(0) support by default.
* We should have 0x20 randomization, random transaction ids, and comparison of query and response in a final stub resolver.
* Timeout and RTT handling should be adaptive. Chromium's stub resolver might be a good example.
* We should have a minimal implementation in the image that is extensible.
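On the decompression point, requiring each pointer to point backwards is not enough, because labels read after the jump can lead back to the same pointer. One way to guarantee termination is to require each pointer to land strictly before the previous jump target (or the start of the name), so that the targets keep decreasing. Here is a Python sketch with a hypothetical `read_name`:

```python
def read_name(message: bytes, offset: int):
    """Decode a possibly compressed name at `offset`.

    Returns (name, next_offset), where next_offset is where parsing of the
    enclosing record continues.
    """
    labels = []
    limit = offset        # every pointer must land strictly before this position
    resume = None         # set by the first pointer: parsing continues after it
    while True:
        if offset >= len(message):
            raise ValueError("name runs past the end of the message")
        length = message[offset]
        if length & 0xC0 == 0xC0:                 # 2-byte compression pointer
            if offset + 1 >= len(message):
                raise ValueError("truncated compression pointer")
            target = ((length & 0x3F) << 8) | message[offset + 1]
            if target >= limit:                   # not strictly decreasing: could loop
                raise ValueError("compression pointer loop")
            if resume is None:
                resume = offset + 2
            limit = offset = target
        elif length & 0xC0:
            raise ValueError("reserved label type")
        elif length == 0:                         # root label ends the name
            name = ".".join(labels) + "."
            return name, (resume if resume is not None else offset + 1)
        else:
            start, offset = offset + 1, offset + 1 + length
            if offset > len(message):
                raise ValueError("label runs past the end of the message")
            labels.append(message[start:offset].decode("ascii"))
```

The second malformed input in the tests shows why the simple "points backwards" check fails. A 7-byte label read after the jump leads straight back to the same pointer, which still points backwards.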


have a great weekend!

        holger