Network Automation Foundations

In my previous post, I covered the infrastructure required to get you off the ground with network automation. Even though this is the last post (and a very long one) on the network automation journey, it does not mean your journey finishes here. In fact, from this point onwards, you will be looking at evolving your network automation into something scalable and reliable. If you want to push even further, you will be looking into bringing in the elements required to turn your network into an autonomous network. Watch this space for future blogs on autonomous networks.

In this post, I will cover the foundation elements of network automation. I call them foundation elements because they are the fundamental building blocks for any type of network automation you are going to design and implement. The building blocks are organised by function, and each function has a few examples of tools and frameworks.

A common problem with these tools and frameworks is that there are far too many options to choose from. Quite often, the tool is selected based on familiarity rather than technical merit. There is nothing wrong with selecting by familiarity as long as the tool delivers the function it is supposed to deliver. My advice in this situation is always to select the right tool for the job.

Every network automation architecture will have some common elements. The common elements are essential components required to build a production-ready network automation architecture. These common elements and their examples are presented in the table below.

Orchestration   | User Interfaces       | Communications
Data Structures | Programming Languages | Software Packaging

Table: Fundamental building blocks for network automation

Let’s analyse the function and examples of each of these common network automation elements.

Note: this is not an exhaustive list of examples for each of the elements. Specific network automation use cases will require different elements and examples than the ones listed here.

Orchestration

An orchestration system is responsible for executing your distributed, isolated and idempotent automated workflows in a particular sequence and manner to achieve a specific outcome. Sounds complicated, huh? In very few words, the orchestration system is the glue that puts together your disparate network automation elements.
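To make "idempotent" a bit more concrete, here is a minimal Python sketch. The `ensure_vlan` function and the in-memory `device_state` dictionary are hypothetical stand-ins for a real device API; the only point is that running the same task twice leaves the same end state, and the second run changes nothing.

```python
def ensure_vlan(device_state: dict, vlan_id: int, name: str) -> bool:
    """Make sure the VLAN exists with the given name.

    Returns True if a change was made, False if the state was already correct.
    """
    vlans = device_state.setdefault("vlans", {})
    if vlans.get(vlan_id) == name:
        return False  # already in the desired state: do nothing
    vlans[vlan_id] = name
    return True


device_state = {}  # hypothetical in-memory stand-in for a device
first_run = ensure_vlan(device_state, 100, "users")
second_run = ensure_vlan(device_state, 100, "users")
print(first_run, second_run, device_state)
```

Because the task describes a desired state rather than an action, an orchestrator can safely retry it after a partial failure.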

Quite often, the orchestration system doesn’t get the attention it deserves until things either get out of control or grow too big. Sometimes this happens too late in the process and generates a considerable amount of rework. Fortunately, the appearance of Ansible and SaltStack in the networking space has provided a simple orchestration system to get things off the ground quickly. For many, this is more than enough. So, it shouldn’t come as a surprise that Ansible and SaltStack are listed as examples of orchestration systems.

Many network engineers won’t see Jenkins as an orchestration tool. In fact, Jenkins won’t do what Ansible and SaltStack do. However, Jenkins provides you with a different type of orchestration: pipeline orchestration. Often, Jenkins is used to trigger Ansible playbooks. For instance, if you have a workflow to build a service, each stage of the workflow could become a stage in Jenkins: collect service information, configuration rendering, configuration validation, configuration push, service validation, and close service order. Jenkins keeps track of the success of each stage of the pipeline. If a failure occurs in a particular stage, actions can be programmed in Jenkins to either attempt a fix or roll back the rollout of the service.
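The idea of staged execution with rollback can be sketched in a few lines of Python. This is not how Jenkins is configured; it is only an illustration of the control flow, and every stage function below is a hypothetical placeholder named after the stages in the example above (the service validation stage fails on purpose).

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[dict], None]]


def run_pipeline(stages: List[Stage], context: dict,
                 rollback: Callable[[dict], None]) -> List[str]:
    """Run stages in order; on the first failure, call rollback and stop.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, stage in stages:
        try:
            stage(context)
        except Exception as exc:
            context["failed_stage"] = name
            context["error"] = str(exc)
            rollback(context)
            break
        completed.append(name)
    return completed


# Hypothetical stage implementations, for illustration only.
def collect_service_information(ctx):
    ctx["service"] = {"name": "demo-service", "vlan": 100}


def render_configuration(ctx):
    ctx["config"] = f"vlan {ctx['service']['vlan']}"


def validate_configuration(ctx):
    if not ctx["config"].startswith("vlan "):
        raise ValueError("unexpected configuration")


def push_configuration(ctx):
    ctx["pushed"] = True


def validate_service(ctx):
    raise RuntimeError("service check failed")  # simulated failure


def close_service_order(ctx):
    ctx["closed"] = True


def rollback(ctx):
    ctx["pushed"] = False
    ctx["rolled_back"] = True


stages = [
    ("collect service information", collect_service_information),
    ("configuration rendering", render_configuration),
    ("configuration validation", validate_configuration),
    ("configuration push", push_configuration),
    ("service validation", validate_service),
    ("close service order", close_service_order),
]

context = {}
completed = run_pipeline(stages, context, rollback)
print(completed)
print(context["failed_stage"], context["rolled_back"])
```

In this run the first four stages complete, the service validation stage fails, the rollback is called, and the service order is never closed.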

It is worth noting that there are other commercially available tools not mentioned in this post. Some are quite good and others are convoluted. Depending on your business case, they may be a perfect fit. However, these solutions come with a price tag. It’s a trade-off that ultimately comes down to a conversation about risk management and time to market.

User Interfaces

Developing network automation usually relies heavily on making API calls. Hence, many of us forget that, sometimes, a UI is required as part of the network automation. A UI could be a web page to present reports, to push buttons, to visualise workflows, etc. In this section, I have listed a few frameworks that will help you build nice UIs with little effort. There are many other frameworks out there, but the ones listed here are a good way to start.

Django and Flask will provide you with the infrastructure for the UI, as they are rapid application development frameworks. React.js can be used with both Django and Flask to add a nice touch to the look and feel of your UI.

If your goal is to present data in graphs, Grafana is the way to go. Grafana is a powerful tool to quickly generate graphs and dashboards with various types of data.

Communications

Communications play an important role in your automation architecture. Without them, no orchestration can be implemented. Even the most simplistic automation architecture (scripts) will, at the very least, require southbound communications, i.e., communication with the network devices.

The communications elements of your network automation architecture can, essentially, be broken down into two types: protocols and frameworks. Among the protocols, you will find NETCONF, REST, RESTCONF, and gRPC. Among the frameworks, you will find Kafka, ZeroMQ, RabbitMQ, and others. As with the other elements, this is neither an exhaustive list nor a definitive answer to every communication problem you will have to deal with. These elements will generally cover your communications requirements. When you analyse your automation use case in detail, you may find the need to incorporate other communication tools into your architecture.

NETCONF is the IETF standard protocol for network device configuration (RFC 6241). With very few exceptions, most networking vendors support NETCONF today.

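REST is the most widely used approach for system integrations. Strictly speaking it is an architectural style, usually carried over HTTP, rather than a protocol. System-to-system communications will often rely on REST.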

Similar to NETCONF but built on REST principles, RESTCONF is a protocol for dealing with network configuration using HTTP/HTTPS methods. Like NETCONF, RESTCONF provides CRUD (Create, Read, Update, Delete) operations.
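As a rough illustration of how CRUD maps onto HTTP in RESTCONF (RFC 8040), the Python sketch below builds a request without sending it. The host name is hypothetical, the `/restconf` root is an assumption (RFC 8040 lets a device advertise a different root), and `build_restconf_request` is a helper invented for this example.

```python
import json
import urllib.request

# Mapping of CRUD operations to RESTCONF HTTP methods (RFC 8040).
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PATCH",   # PUT would replace the target resource instead of merging
    "delete": "DELETE",
}


def build_restconf_request(host, operation, path, payload=None):
    """Build (but do not send) an HTTP request for a RESTCONF data resource."""
    # Assumption: the RESTCONF root is /restconf on this device.
    url = f"https://{host}/restconf/data/{path}"
    body = json.dumps(payload).encode() if payload is not None else None
    request = urllib.request.Request(url, data=body, method=CRUD_TO_HTTP[operation])
    request.add_header("Accept", "application/yang-data+json")
    if body is not None:
        request.add_header("Content-Type", "application/yang-data+json")
    return request


request = build_restconf_request(
    "router1.example.net",  # hypothetical device name
    "update",
    "ietf-interfaces:interfaces/interface=eth0",
    {"ietf-interfaces:interface": {"name": "eth0", "description": "uplink"}},
)
print(request.get_method(), request.full_url)
```

Sending the request would additionally need authentication and TLS settings, which depend entirely on your environment and are left out here.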

gRPC is a universal and feature-rich RPC framework. It can be used for both device communication and system-to-system communication.

Ultimately, what will dictate the protocols and frameworks you choose is what is available in the network devices and systems you have to deal with. In a greenfield environment, I would try to use gRPC as much as possible.

Data Structures

Data structures are an important item in any network automation. Whether you are automating service provisioning, operational tasks or build tasks, you are always dealing with data. Therefore, it is important to choose a robust data structure. More importantly, choose the right data structure for the job.

There are many options for data structures out there: XML, JSON, YAML, GPB (Google Protocol Buffers), and OpenConfig are just a few examples. From my observations of how our industry uses them, the pattern is generally this:

  • JSON is used on northbound interfaces, as most northbound interfaces rely on REST.
  • XML is primarily used for southbound communication, especially when pushing configurations.
  • GPB is primarily used for streaming telemetry (OpenConfig and general gRPC).
  • YAML is primarily used for infrastructure configuration.
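To show what "the same data in different structures" looks like in practice, here is a small Python sketch that serialises one record as JSON and as XML using only the standard library. The interface record and its field names are made up for illustration and do not follow any particular vendor or YANG model.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical interface record, for illustration only.
interface = {"name": "ge-0/0/0", "description": "uplink", "mtu": 9000}

# JSON: typical for northbound, REST-style interfaces.
as_json = json.dumps({"interface": interface}, sort_keys=True)

# XML: typical for southbound configuration pushes.
root = ET.Element("interface")
for key, value in interface.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)
```

Note that the JSON form keeps `mtu` as a number, while the plain XML form carries it as text; recovering the type on the XML side requires a schema or model, which is one reason the choice of structure matters.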

There are many other types of data not mentioned here, though they usually fit very specific use cases. For example, my preference for representing a topology is to use graph theory. I generally try to use a library from the programming language of choice that lets me export the graph into something other libraries can read (e.g., text). A good example of a graph library is Python’s networkx.
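As a minimal sketch of the graph idea, the Python below stores a hypothetical four-node topology as an adjacency mapping, finds a fewest-hops path with a breadth-first search, and exports the links as plain text. It uses only the standard library to stay self-contained; a graph library such as networkx offers these operations ready-made.

```python
from collections import deque

# Hypothetical four-node topology, stored as an adjacency mapping.
topology = {
    "pe1": {"p1", "p2"},
    "p1": {"pe1", "pe2"},
    "p2": {"pe1", "pe2"},
    "pe2": {"p1", "p2"},
}


def shortest_path(graph, source, target):
    """Breadth-first search: return a fewest-hops path, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def to_edge_list(graph):
    """Export the graph as sorted 'a b' text lines, one per undirected link."""
    edges = {tuple(sorted((a, b))) for a, neighbours in graph.items() for b in neighbours}
    return "\n".join(f"{a} {b}" for a, b in sorted(edges))


path = shortest_path(topology, "pe1", "pe2")
edge_text = to_edge_list(topology)
print(path)
print(edge_text)
```

The relations between nodes are explicit in this structure, so questions such as "which path does traffic take if p1 fails?" become simple graph operations rather than text processing.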

The most important thing when choosing the data structures for your automation solution is to keep in mind that you are developing automation for machine-to-machine interaction. Hence, the data structure must be easily readable by machines rather than by humans. More often than you might think, we choose a data structure that is very easy for us to read but difficult for machines to read. Being difficult for machines to read means that it is difficult (or not computationally cost-effective) to represent the relations among data entries. Short story: choose wisely!

Programming Language

Choosing a programming language for your automation can be tricky because it involves multiple professional and personal aspects. For instance, if you are solely responsible for the coding (very unlikely), you are free to choose the language you are most familiar with. On the other hand, if there is a team, you need to consider the team’s preference. The important thing to remember here is that most network engineers are not like software developers, who can move from language to language in the blink of an eye.

Whatever language you choose, it needs to meet the technical requirements. My recommendation is to choose a language that you are comfortable with, that meets the technical requirements and that has a rich set of libraries. Additionally, if you are developing your automation using containers, you can always hide the intricacies of the language inside a container. Sometimes, this is key to shortening the development time of your project.

With that, I am not going to suggest that you pick language A or B. However, what I have seen out in the field is a lot of network automation being developed in Python. Additionally, Go is getting a lot of attention and traction these days.

Software Packaging

While you are doing ad-hoc automation, software packaging isn’t really a problem as most of the time you will be pulling things from git. However, when you start to do really serious automation, you need to consider how you are going to deploy your automation. More importantly, how it is going to scale (preferably, horizontally) and how resiliency will be achieved. Most of the scale and resiliency comes out of the software/automation architecture. However, its packaging plays a crucial role in enabling that scale and resiliency.

There are many options for packaging. However, the go-to option is containers deployed on Kubernetes. If you are using controllers and orchestrators, each of them may have its own packaging solution.

When choosing your packaging option, consider where you are on your automation journey and what your next steps are. If scale and resilience for your automation are on the horizon, then you should consider containers and Kubernetes. But if scale and resilience are still far away for you, you can use a much simpler packaging solution for your automation.

Putting all elements together

Once you have identified all the foundation elements of your network automation, it is time to put them together. As mentioned previously, this depends on how far along you are on your journey. If you are doing simple ad-hoc automation (early stages of the journey), then a Linux cron job on an automation server will do the job.

If you have gone past the ad-hoc automation point, you will be looking for a simple orchestration system that helps you with workflows, scheduled execution and event-driven automation. Ansible Tower, SaltStack and StackStorm are a few examples in this area. There are many others, and all of these have an open source version as well as a commercial version that often has extra features to add value to your network automation.

The next step on the journey is evolving your network automation to leverage network controllers and orchestrators. When you get to that stage, it means you are probably looking to achieve higher levels of automation in your network. Now, just because you use an orchestrator or network controller, it doesn’t mean you have a magic button. Usually, there is a lot of work involved in getting these things well oiled before you start to see the benefits.

Whichever of the stages explained above you are at, consider what you have developed so far, what your next stage is, and what approach you want to take. In this analysis, you need to consider the lifecycle of the things you have developed: who created them, who maintains them today and tomorrow, and how you are going to evolve them. Is it worth throwing everything out of the window and replacing it with something else? Sometimes, the answer is yes (unfortunately). That is why it is important to incorporate elements of DevOps (agile) into your network automation development (e.g. CI/CD, faster release cycles, faster iterations, etc). This will not only enable you to deliver robust and resilient network automation but also enable you to experiment much faster and in a safe way.

This post ends the network automation journey sequence of blog posts. I hope you have enjoyed a bit of not-so-technical conversation. Stay tuned for future posts and feel free to suggest topics through the contact form of the blog.

Network Automation Infrastructure

The further you progress on your automation journey, the more you will understand that environmental factors play an important role in it. For a recap on the environmental factors, have a look at my previous post. And as promised, in this post we will be covering the infrastructure elements required to help you throughout your automation journey.

In order to achieve success with network automation, you need to have the right infrastructure in place. However, the right infrastructure for you may not be the right infrastructure for me or somebody else. Why? Because our goals for the automation journey may be very different. Consider the following example:

  • Organisation A:
    • Small organisation with a very simple network.
    • Only needs to perform configuration backups and a few other network reports.
  • Organisation B:
    • Medium-sized organisation with a network that is growing rapidly in size and complexity.
    • Needs to leverage automation because of the growing number of elements, redundant paths and systems.

While the same automation infrastructure that solves the problem for Organisation B can also solve the problem for Organisation A, the opposite is not true. Additionally, the cost of the infrastructure implemented to solve the problem of Organisation B may be prohibitive for Organisation A. In this post, I will cover a common infrastructure that all organisations will need to start with.

In this common infrastructure, there are five fundamentals: Linux servers, programming language toolkits, disk space, version control, and connectivity to the elements. Without these, you can’t even start.

  • Linux Servers

Regardless of which automation you are developing and regardless of its size, you are going to need a Linux server to run the tools required by your automation. Linux comes in all forms, shapes and flavours. Pick your favourite one. My only suggestion on this front is: pick one that has a good package management system.

Any Linux distribution based on either Debian or Red Hat will do the job very well. When choosing your Linux distribution, make sure you also choose a Linux distribution for containers. I will cover containers in future posts.

Another important point is whether these servers will be bare metal servers (BMS) or virtual machines. From the point of view of Linux and the automation you are developing, this usually makes no difference. However, it makes a huge difference in that, if you are using virtual machines, you now need to deal with the virtualisation infrastructure that provides them. Just to mention a few of the options you have for this: KVM, OpenStack, public clouds, and VMware. The virtualisation infrastructure itself is a Pandora’s box.

  • Programming Language Toolkits

I will always argue that the best programming language is the one you know best. If you can develop network automation using your favourite language, then there is no reason to learn a new language. When you start to develop your automation to run in containers, it becomes much easier to choose the language you want because the container is a way to hide the internals of a system component.

These days, there is a lot of network automation being developed in Python. In the past, many (including me) used Perl. Go is a language that is gaining a lot of traction these days too. And I can’t forget to mention Bash. Bash always seems to be there, regardless of the programming language you choose.

Sooner or later, you will find that managing dependencies is a big problem. That’s why containers have gained a lot of traction in the last few years. They are a nice way to package your software with its dependencies. We will talk more about containers in future posts.

  • Disk Space

Whether you are generating reports, versioning device configurations or collecting telemetry, you will need space to store all this information. The important thing here is to understand how much you need. It’s not about finding the precise amount in GB or TB. It is about understanding whether you need a local array in the server, a small cluster of servers, a big data environment, or to offload to a storage-as-a-service offering.
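A back-of-the-envelope calculation is usually enough to tell which of these categories you are in. The Python sketch below estimates the space needed for configuration backups; all the numbers are made-up assumptions, and the formula deliberately ignores compression and deduplication, so it gives an upper bound.

```python
def estimate_storage_gb(devices, avg_config_kb, versions_per_year, years):
    """Rough upper bound, assuming every version is stored in full."""
    total_kb = devices * avg_config_kb * versions_per_year * years
    return total_kb / (1024 * 1024)


# Hypothetical inputs: 500 devices, 200 KB per configuration,
# one backup per day, kept for three years.
backup_gb = estimate_storage_gb(
    devices=500, avg_config_kb=200, versions_per_year=365, years=3
)
print(f"{backup_gb:.1f} GB")
```

With these assumptions the result is on the order of a hundred gigabytes, which a local disk array handles comfortably. Streaming telemetry from the same devices would need its own estimate and could easily land in a different category.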

  • Version Control

Do you still write code and use the “Save As” button to do your version control? If you answered yes, consider learning and starting to use a version control system today. Please, don’t wait for tomorrow.

The most common version control system these days is git. git itself deserves an entire post just to cover the basics. So, do yourself a favour (in case you have never heard of git or don’t use a version control system yet): learn git!

Version control is important for tracking changes and defects. A month later, it is very hard to remember why you wrote the code one way or another. It is even harder to identify who made a change in the code and why, and harder still when you have geographically distributed development teams. git helps you with these and much more.

  • Connectivity

Connectivity is a very important element of the automation infrastructure. It follows three principles: it has to be reliable, it has to be scalable and it has to be secure. Connectivity is all about how you connect to your automation servers, how the servers connect to your network devices and how the servers are updated and maintained.

The main problem with connectivity in most of the infrastructures I have seen is that reliability and scalability are usually designed and managed by one group while security is managed by another. These two groups often have different objectives. When their objectives get in the way of delivering a reliable, secure and scalable communication infrastructure for automation, that’s where the problems start.

There is no doubt that security is very important. However, if you have to jump through two or three different servers in order to get to your automation server, something is wrong. By the same token, if you need to go through a tedious path in order to get your server’s package updates, it is another sign that things won’t work well. When your developers and users start to do ssh tunnelling in order to have the connectivity they require, it has already gone too far.

Authentication, encryption, RBAC, Single Sign-On (SSO), firewalls, proxies, and many others are extremely important elements in delivering a world-class and secure environment. However, all these things must be almost transparent to the users and developers. If they are not, security is actually a big roadblock. Sooner or later, developers and users will start to look for ways around it so that they can get their job done.

Final Comments

In this post, I covered the most fundamental elements of any automation infrastructure. Without these elements in place, it will be hard to get things right. As mentioned in the beginning, the important thing is to identify your target goals. Doing that exercise before you start will certainly point you in the right direction. So, before you start to write your automation plan, make sure you first write down your automation goals. In the next post, I will cover the automation building blocks. Until then, happy reading!

Network Automation Journey – Where Next ?

In my previous post, I covered a few tips to help you start planning your network automation journey. While some of you may see those points as philosophical, they actually set the ground and the right direction for you. Others may think they are not important because they are not technical points. Either way, when embarking on this journey, make sure you clearly identify the business benefits and business implications behind it. Try to do this even if you are only on a self-learning journey. A business mindset is as important as your technical knowledge.

  • So, what’s next?

Now that we have set the scene, let’s have a look at the next steps of the journey.

From this point onward, there are many things to look at. I am going to select a few things to cover over a few blog posts. By no means is this an exhaustive list, nor are the items in order of importance. The items that I will cover are: environmental factors, infrastructure and foundations. These are the items you will need at the beginning of the journey. As you and the business mature in network automation, other things will come into play. We will cover those later. So, let’s start with environmental factors.

Environmental factors

  • DevOps/DevNetOps Culture

As painful as this sounds to many of us, the working culture of a company really affects its ability to deliver. Throughout my career as a network engineer, I have seen many types of environments. In all of them, the ability to deliver products and services with quality and speed was closely associated with the quality of the environment. If you want to read further on this topic, I highly recommend The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford. Even though the book is a novel, it presents various environmental factors and their impact on the ability to deliver very well. It is also a very good introductory book on the benefits of adopting DevOps methodologies.

While I am a big supporter of the DevOps approach and, more recently, its sibling in the networking area, DevNetOps, I am not suggesting that every organisation must implement DevOps or DevNetOps. Sometimes, implementing an agile culture in an organisation requires a complete shift in what the company has been doing for many years (in some cases, many decades). However, if the business is at risk of being disrupted or becoming extinct, then I believe DevOps and DevNetOps must be strongly considered as a matter of survival. These agile cultures are essential in implementing environments that can react quickly to market changes.

  • Blameless Culture

Another important environmental factor is a blameless culture. When moving to an agile environment, this is a must. Many companies that succeeded in transitioning to or implementing agile environments have also implemented a blameless culture, which goes hand in hand with a continuous improvement culture. A blameless culture doesn’t mean engineers are free to turn the network upside down at any time. It means that every engineer has taken all necessary steps to make sure things will work in the expected way, i.e., without causing problems. However, there is a series of corner-case scenarios (black swans) that are almost impossible to predict. It is better to acknowledge that they exist and that they will occur eventually. When they do occur, the blameless culture focuses on understanding why it happened, what we could have done to prevent it, and what we are going to do to prevent or mitigate the impact in the future. In that way, the work becomes more productive, our systems and networks become more reliable and everyone has the space to explore their creativity.

Even though DevOps/DevNetOps and a blameless culture are big steps in the right direction, many companies still rely on the traditional siloed organisational structure. Many that failed to implement agile cultures will say that this type of culture doesn’t work for them. In fact, it does. The problem is that the hurdle involved in breaking down the silos and keeping everyone happy is usually bigger than the hurdle of dealing with their existing issues and challenges. Raise your hand if you have never been in a battle between your Development and Operations departments.

  • Finger Pointing Culture

Last but not least, the finger pointing culture. Often this culture is embedded in all layers of the organisation. It creates an environment in which people look for someone to blame rather than for how they could improve the situation. Whether you are pointing the finger at a colleague, somebody in another team or department of your company, a partner or a supplier doesn’t matter. All you are doing is finding an easy way to say: it wasn’t my fault. There are a few problems with this culture. First, it only gets worse. The more you do it, the more you want to continue doing it. Second, you are inhibiting people’s creativity. Third, you are creating a toxic environment. And last but not least, you are putting your efforts into finding an excuse that exempts you from accountability for the problem. The finger pointing culture is, in my opinion, one of the hardest problems to solve in an organisation because it generally means moving people out of their comfort zone in a very abrupt way.

Final Thoughts on Environmental Factors

You may be wondering why environmental factors matter. I can assure you that they really do. For a sole network engineer trying to improve their work through automation, they will dictate how aggressively you can go about implementing these things in production. They will also shape how the business supports you in implementing network automation. For an organisation, they will shape how long it takes, and how quickly, the benefits of embarking on an automation journey are seen. Even if you and the business are not willing to shift gears with regard to the environmental factors, it is important for both of you to understand where you stand on this topic. It will help both of you to understand how much risk the business is willing to take.

I’m sure you are craving something more technical? No worries, it will come soon 🙂 In my next post, I will cover some important elements of the infrastructure required to implement a successful network automation environment. Until then, happy reading!

Network Automation Journey

Over the past few years, network automation has gained a lot of attention. The networks have grown in size and complexity. The simplest answer for this growth is network automation. Network automation is required to make these large and complex networks simple to operate.

Many organisations recognise the importance of automation these days. But in many, network automation is still treated as a second-class citizen. The good news for these organisations is that it may not be too late yet to elevate the network automation to its deserved place.

The start of an automation journey can have far too many ramifications to be described in a single blog post. Some may target only the low-hanging fruit while others will prefer to go full steam ahead in pursuit of the fully automated network. Whatever journey you decide to start, here are a few tips for you and for your organisation.

  • Network automation is more than just automating the device’s CLI

If you think automation is just issuing commands against a device, then your network automation will only be as reliable as the device’s CLI. Don’t get me wrong, some vendors do have powerful CLIs out there. But the point here is: you are not thinking about the big picture of automation. Your automation universe in this case is just about executing a task against a router. You are not thinking about creating workflows, extracting information from the router and sending it to other systems for manipulation and validation, or implementing event-driven automation. If you are still doing “CLI automation”, please visit this blog regularly to learn more about network automation. Perhaps you will find that the land of network automation unicorns does exist.

  • CLI screen-scraping must die (as someone once said of SNMP)

There are various reasons why you should not do CLI screen-scraping.

First, it doesn’t scale. While you may see a lot of benefits in the beginning (remember, you were doing those tasks manually previously), over time, you will find that it requires a lot of time to maintain the code of this automation.

Second, it breaks often. Vendors change their CLI and its outputs from time to time. If you are doing CLI screen-scraping, the chances that your automation will break after an upgrade are quite high.

Last but not least, using the CLI means dealing with its input and output buffers. Pacing the input and output of commands can be a nightmare. Remember this: CLI screen-scraping is just the automation of the keyboard, i.e., no intelligence is added here.
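The fragility argument can be illustrated with a short Python sketch. Both CLI output strings and the structured reply below are invented for this example and do not come from any real device; the comparison is between parsing text layout and reading a named field.

```python
import json
import re

# Hypothetical CLI output before and after a software upgrade.
cli_old = "Interface ge-0/0/0 is up, MTU 9000"
cli_new = "Interface ge-0/0/0: state up (mtu=9000)"

SCRAPE = re.compile(r"Interface (\S+) is (\w+), MTU (\d+)")


def scrape_mtu(cli_output):
    """Screen-scraping: depends on the exact wording of the CLI output."""
    match = SCRAPE.search(cli_output)
    return int(match.group(3)) if match else None


# Hypothetical structured reply, as a device API might return it.
structured = '{"interface": {"name": "ge-0/0/0", "oper-status": "up", "mtu": 9000}}'


def structured_mtu(reply):
    """Structured data: depends on field names, not on text layout."""
    return json.loads(reply)["interface"]["mtu"]


old_result = scrape_mtu(cli_old)
new_result = scrape_mtu(cli_new)
api_result = structured_mtu(structured)
print(old_result, new_result, api_result)
```

The regular expression works against the old output and silently returns nothing against the new one, even though the information on screen is the same. The structured reply can gain extra fields or change ordering without breaking the consumer.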

  • Network automation is a never-ending journey

A common mistake for beginners is to think that network automation is something you do once and that is it (after all, that is what automation is all about: do once, repeat many, right?). That is where most of the noise about eliminating job positions comes from. The fact is that network automation is a journey. You start by automating the basic tasks. Then, you move on to automating more complex tasks. Then, you get into creating automation workflows. Then, you add closed-loop feedback into your automation. Getting to this point involves a lot of work. Getting to this point with no workflows left to automate is a very, very long journey. Moreover, by the time a company gets there, the technology may have evolved and changed. Hence, new things need to be developed and implemented. In short, there will always be things to develop.

  • Continuous learning is a must

Quite often I am asked how I find time to learn these new things. I usually answer like this: I wake up at 4.30am every day, prepare my mate (chimarrao), and dedicate at least 1h to 1.5h of my day to learning new things. While I do not recommend that everyone wake up at 4.30am (I have had this habit since I was a kid, so stop thinking I am a crazy person), I do recommend that everyone find a time slot in their day to dedicate to learning new things. That’s what continuous learning is all about.

More than just learning something new every day, continuous learning is the establishment of a new habit: you go after the information (research, read, listen, watch, etc) instead of waiting for someone to transfer the information to you (traditional training class). More importantly, this new habit will help you to forge a new skill: the ability to adapt to change.

And last but not least, these next two items are extremely important while taking off on an automation journey.

  • Start with the low-hanging fruit

Not doing this is a very common and recurring mistake. Quite often, engineers don’t start to automate because they over-engineer things and, consequently, automating these things becomes more of a problem than a solution. My advice: don’t try to boil the ocean, i.e., start by automating simple and repetitive things. Once you have mastered that, move on to the next step of your automation journey.

  • Do not automate a broken process

Companies with well-defined processes are a good environment for automation because the interfaces of each process are usually well defined. However, things go wrong in these places when the processes are broken. If you automate those broken processes without fixing them first, what do you think you are going to get as an output? Answer: a broken automated process. Even worse, you may end up with something executing at speed and at scale in a broken way (the worst-case scenario). So, fix the broken process first and automate it afterwards.

 

Final words

Whatever automation journey you are pursuing, make sure you have a plan. The items covered in this blog post are important things that any organisation should consider when embarking on an automation journey. The automation journey is more than just selecting tools. It is about building the automation ecosystem that fits into the company’s business ecosystem.