Computer Network


A network is a collection of different devices that are connected and capable of communicating. For example, a company's local network connects employees' computers and devices like printers and scanners. Employees can share information over the network and also use the common printer or scanner via the network. Data to be transferred from one device to another comes in various formats like text, audio, video, etc. This tutorial explains how different data types are represented in a computer and transferred in a network.
Data in text format is represented using bit patterns (combinations of the two binary digits, 0 and 1). Textual data is a string, and a string is a sequence of characters. Each character is assigned a specific number according to an international standard called Unicode. The process of assigning numbers to characters is called "coding," and these numbers are called "codes". These codes are then converted into binary to represent the textual data as a pattern of bits, and the bits are transferred as a stream over the network to other devices.
Unicode is the universal standard of character encoding. It gives a unique code to almost every character in every written language in the world. It defines more than 140,000 characters, including codes for emojis. The first 128 Unicode codes correspond to the ASCII characters. ASCII is another character encoding, but it has only 128 codes for 128 characters. Hence, ASCII is a subset of Unicode.
File extensions: .doc, .docx, .pdf, .txt, etc.
Example: Word: H; Unicode representation: U+0048

Numbers are converted directly into binary patterns by repeatedly dividing by 2, without any encoding. The numbers we want to transfer are generally in the decimal number system (base 10), so we need to convert them from base 10 to the binary number system (base 2).
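The two conversions described here can be sketched in a few lines of Python. This is only an illustration: the helper name `to_binary` is made up for this example, and padding the character code to 8 bits is an assumption chosen for readability.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        n, remainder = divmod(n, 2)  # each remainder is the next bit, least significant first
        bits.append(str(remainder))
    return "".join(reversed(bits))


code_point = ord("H")                       # the Unicode code assigned to "H"
code_point_label = f"U+{code_point:04X}"    # conventional U+XXXX notation
char_bits = to_binary(code_point).zfill(8)  # padded to 8 bits (assumed width)
number_bits = to_binary(780)                # a plain number needs no character code

print(code_point_label, char_bits, number_bits)
```

The character goes through a lookup (character to code) before it becomes bits, while the number is converted directly.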
Integers, Date, Boolean, Decimal, Fixed point, Floating point
Example: Number: 780; Binary representation: 1100001100

Image data is also transferred as a stream of bits, like textual data. An image, also called a picture, is a collection of little elements called "pixels". A single pixel is the smallest addressable element of a picture, and it is like a dot with a size of 1/96 inch (about 0.26 mm). The dimensions of an image are given by the
A black-and-white (grayscale) image consists of white, black, and all the shades in between. It can be considered as just . The intensity of the white color in a pixel is given by numbers called "pixel values". The pixel value in a grayscale image can be in the range 0 to 255, where 0 represents black and 255 represents white, and all the numbers in between represent different shades. A matrix is created for the image with the pixel values of all the pixels in the image. This matrix is called a "channel".
representing three standard colors: Red, Green, and Blue. Any color can be generated by combining these three colors. Based on the intensity of each color in a pixel, three matrices (channels), one for each color, are generated. Suppose there is a colored image, and three matrices are created for the Red, Green, and Blue colors in each pixel in the image: , and this bit stream is transferred to any other device in the network to communicate the image. An N-bit pattern can represent 2^N possible colors. With values from 0 to 255, we can represent 256 shades of a color using different 8-bit patterns. If an image consists of only black or white, one bit is enough to represent each pixel: White - 1, Black - 0
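As a rough illustration of how pixel values become a bit stream, the sketch below flattens a tiny image into bits. The 2x2 pixel values, the row-by-row ordering, and the helper name `pixels_to_bits` are all assumptions made for this example; real image formats add headers and compression on top of this idea.

```python
def pixels_to_bits(matrix, bits_per_pixel=8):
    """Concatenate the pixel values of a matrix, row by row, as fixed-width bit patterns."""
    return "".join(format(value, f"0{bits_per_pixel}b") for row in matrix for value in row)


# A hypothetical 2x2 grayscale image: one value in 0..255 per pixel.
gray = [
    [0, 255],
    [128, 64],
]
gray_stream = pixels_to_bits(gray)  # 8 bits per pixel

# A hypothetical 2x2 black-or-white image: White = 1, Black = 0, one bit per pixel.
binary = [
    [1, 0],
    [0, 1],
]
binary_stream = pixels_to_bits(binary, bits_per_pixel=1)

# A single colored pixel: one 8-bit value for each of Red, Green, and Blue.
rgb_pixel = (255, 128, 0)
rgb_stream = "".join(format(channel, "08b") for channel in rgb_pixel)

print(gray_stream, binary_stream, rgb_stream)
```

With 8 bits per value there are 2^8 = 256 possible shades per channel, which matches the 0 to 255 range.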
File extensions: .jpg, .jpeg, .png, etc.

Transferring an audio signal is different from other formats. Audio is the broadcasting of recorded sound or music. An audio signal is stored in a computer by representing the wave amplitude at successive moments in bits. Another parameter is the sample rate. It represents the number of samples or, in other words, samples saved. The audio quality depends and the . If more bits are used to represent the amplitudes at each moment, and more moments are captured, we can save the audio with every detail accurately.
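The idea of storing amplitudes at sampled moments can be sketched as follows. This is a simplified illustration, not how any particular audio format works: the 440 Hz tone, the 8000 samples-per-second rate, the 8 bits per sample, and the function name `sample_sine` are all assumptions chosen for the example.

```python
import math


def sample_sine(freq_hz, sample_rate, duration_s, bits_per_sample=8):
    """Sample a sine wave and quantize each amplitude to an integer level."""
    levels = 2 ** bits_per_sample
    n_samples = round(sample_rate * duration_s)
    samples = []
    for k in range(n_samples):
        t = k / sample_rate                                 # the moment this sample is taken
        amplitude = math.sin(2 * math.pi * freq_hz * t)     # value in [-1, 1]
        level = round((amplitude + 1) / 2 * (levels - 1))   # map to 0..levels-1
        samples.append(level)
    return samples


samples = sample_sine(freq_hz=440, sample_rate=8000, duration_s=0.01)
bitstream = "".join(format(s, "08b") for s in samples)  # 8 bits per stored amplitude
print(len(samples), len(bitstream))
```

Raising the sample rate captures more moments, and raising the number of bits per sample captures each amplitude more finely; both increase the size of the bit stream.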
File extensions: .mp3, .m4a, .WAV, .AAC, etc.

A video is a collection of frames (images) with the same or different dimensions. These frames are represented as matrices, as discussed above. All the frames are displayed continuously, one after the other, to show a video in movement. To represent a video, the computer will analyze data about the video like FPS (frames per second). A video is mostly combined with an audio component, like a film or a video game.
File extensions: .mp4, .MOV, .AVI, etc.


## Hands-on Network Machine Learning with Scikit-Learn and Graspologic


## 3.3. Representations of Networks

Now that you know how to represent networks with matrices and have some ideas of properties of networks, let’s take a step back and take a look at what network representation is in general, and the different ways you might think about representing networks to understand different aspects of the network.

We already know that the topological structure of networks is just a collection of nodes, with pairs of nodes potentially linked together by edges. Mathematically, this means that a network is defined by two objects: the set of nodes, and the set of edges, with each edge just being defined as a pair of nodes for undirected networks. Networks can have additional structure: you might have extra information about each node (“features” or “covariates”), which we’ll talk about when we cover Joint Representation Learning in Section 5.5. Edges might also have weights, which usually measure the connection strength in some way. We learned in the previous section that network topology can be represented with matrices in a number of ways – with adjacency matrices, Laplacians, or (less commonly) with incidence matrices.

One major challenge in working with networks is that a lot of standard mathematical operations and metrics remain undefined. What does it mean to add a network to another network, for instance? How would network multiplication work? How do you divide a network by the number 6? Without these kinds of basic operations and metrics, you are left in the dark when you try to find analogies to non-network data analysis.

Another major challenge is that the number of possible networks can get obscene fairly quickly. See the figure below, for instance. When you allow for only 50 nodes, there are already more than \(10^{350}\) possible networks. Just for reference, if you took all hundred-thousand quadrillion vigintillion atoms in the universe, and then made a new entire universe for each of those atoms… you’d still be nowhere near \(10^{350}\) atoms.
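The count above can be checked with a short calculation. The sketch assumes simple undirected networks without self-loops, so each of the \(\binom{50}{2}\) node pairs either has an edge or does not; that assumption is ours, chosen because it is consistent with the figure quoted in the text.

```python
from math import comb

n_nodes = 50
n_pairs = comb(n_nodes, 2)        # number of node pairs that could hold an edge
n_networks = 2 ** n_pairs         # each pair is either connected or not
n_digits = len(str(n_networks))   # number of decimal digits in the count

print(n_pairs, n_digits)
```

Under this assumption there are 1225 possible edges and therefore \(2^{1225}\) possible networks, a number with several hundred decimal digits.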

To address these challenges, you can generally group analysis into four approaches, each of which addresses these challenges in some way: the bag of features, the bag of edges, the bag of nodes, and the bag of networks, each so-called because you’re essentially throwing data into a bag and treating each thing in it as its own object. Let’s get into some details!

## 3.3.1. Bag of Features

The first approach is called the bag of features. The idea is that you take networks and you compute statistics from them, either for each node or for the entire network. These statistics could be simple things like the edge count or the average path length between two nodes, like you learned about in the last section, or more complicated metrics like the modularity [1], which measures how well a network can be separated into communities. Unfortunately, network statistics like these tend to be correlated; the value of one network statistic usually tells you something about the value of another. This means that it can be difficult to interpret analysis that works by comparing network statistics. It’s also hard to figure out which statistics to compute, since there are an infinite number of them.

## 3.3.1.1. You Lose A Lot of Information with the Bag of Features Approach

Let’s take a look at a collection of networks in Fig. 3.2:

Fig. 3.2 This figure contains four networks, all of which have the exact same values for the four network statistics we show here. They each have ten nodes and 15 edges. They also all contain the same number of closed triangles, and the same global clustering coefficient from Section 3.2.3.2.

Each of these networks, however, is completely different from the others. The first network, for instance, has two connected components, while the others are all connected. The second network has a community of nodes that are only connected along a path, and a different community whose nodes are tightly connected – and so on. Modeling these networks by computing these features from them would lose a great deal of useful information about the topology of the network.

## 3.3.1.1.1. Network Features Tend to be Correlated

As we mentioned in the last paragraph, if you consider all possible networks, knowing the value of any of the network features gives you information about what the value of other network features might be.

Let’s play around with this. We’ll make 100 random networks, each with 50 nodes, and then we’ll compute some of the most common network statistics that people use on them (we’ll explain what each network feature is along the way). Then, you’ll look at how correlated these features are. For now, just think of a random network as being a network with each node being connected to each other node with some set probability. Each network will have a different connection probability. These networks will also have communities – groups of nodes which are more connected with each other than with other nodes – the strength of which will also be determined randomly. When you generate data later on in this book, you’ll get into different types of random network models you can use.

Now, for each of these networks, you’ll calculate a set of network features, using some of the various properties you learned about in the previous Section 3.2.

The Modularity measures the fraction of edges in your network that belong to the same estimated community, subtracting out the probability of an edge existing at random. It effectively measures how much better a particular assignment of community labels is at defining communities than a completely random assignment.

The Network Density is the fraction of all possible edges that a network can have which actually exist. If every node were connected to every other node, the network density would be 1; and if no node is connected to anything, the network density would be 0.

The Clustering Coefficient indicates how much nodes tend to cluster together. If you pick out three nodes, and two of them are connected, a high clustering coefficient would mean that the third is probably connected as well.

The Path Length indicates how far apart two nodes in your network are on average. If two nodes are directly connected, their path length is one. If two nodes are connected through an intermediate node, their path length is two.

The code below defines functions to calculate each of these network features, and then calculates them for each of the networks you created above. Since most of these metrics already exist in networkx, we’ll just pull from there. You can check the networkx documentation for details.

We’ll also define a preprocessing decorator, which just converts the network from a numpy array into the format networkx uses.

Now, we’ll calculate all of these features for each network, and finally we’ll create a heatmap of their correlation.
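The steps described above could look roughly like the sketch below. It is not the book's original code: the two-community sampler, the probability ranges, the decorator name `as_networkx`, the use of greedy modularity maximization to estimate communities, and the use of transitivity as the clustering coefficient are all assumptions made for illustration. Path length is computed on the largest connected component so that it is defined even if a sampled network happens to be disconnected.

```python
import functools

import networkx as nx
import numpy as np
from networkx.algorithms import community as nx_comm

rng = np.random.default_rng(0)


def sample_two_block_network(n=50):
    """Sample an undirected network with two communities of random strength (assumed model)."""
    p = rng.uniform(0.1, 0.6)            # baseline connection probability
    strength = rng.uniform(0.0, 0.3)     # extra probability within a community
    labels = np.repeat([0, 1], n // 2)
    same = labels[:, None] == labels[None, :]
    P = np.where(same, p + strength, p)
    upper = np.triu(rng.random((n, n)) < P, k=1)   # sample each node pair once
    return (upper | upper.T).astype(int)           # symmetric, no self-loops


def as_networkx(func):
    """Preprocessing decorator: convert a numpy adjacency matrix to a networkx graph."""
    @functools.wraps(func)
    def wrapper(A, *args, **kwargs):
        return func(nx.from_numpy_array(A), *args, **kwargs)
    return wrapper


@as_networkx
def modularity(G):
    communities = nx_comm.greedy_modularity_communities(G)  # estimated communities
    return nx_comm.modularity(G, communities)


@as_networkx
def density(G):
    return nx.density(G)


@as_networkx
def clustering(G):
    return nx.transitivity(G)  # global clustering coefficient


@as_networkx
def path_length(G):
    largest = max(nx.connected_components(G), key=len)
    return nx.average_shortest_path_length(G.subgraph(largest))


features = {"modularity": modularity, "density": density,
            "clustering": clustering, "path length": path_length}

networks = [sample_two_block_network() for _ in range(100)]
X = np.array([[f(A) for f in features.values()] for A in networks])  # 100 x 4
corr = np.corrcoef(X, rowvar=False)  # 4 x 4 feature correlation matrix
print(np.round(corr, 2))
```

The matrix `corr` is what a heatmap would display; any plotting library can render it, with rows and columns labeled by the keys of `features`.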

Below is the heatmap. Numbers close to 1 mean that when the first feature is large, the second tends to be large, numbers close to 0 mean that the features are not very correlated, and numbers close to -1 mean that when the first feature is large, the second feature tends to be small.

If you’re familiar with correlation, the fact that many of these numbers aren’t close to zero means that many of the features carry varying degrees of information about other features. For some pairs, one feature being large suggests that the other also tends to be large (positive correlation). For others, one feature being large suggests that the other tends to be small (negative correlation). Some features don’t tell you much about another particular feature (for instance, clustering and modularity only have a correlation with a magnitude around \(0.3\)), and some features are nearly perfectly informative about other features (network density and average path length are nearly perfectly negatively correlated, with a value of \(-0.91\); that is, a high average path length implies a low network density, and vice versa).

## 3.3.1.1.2. Why Network Feature Correlatedness Can Lead To Problems

Let’s take a step back to the implications of using the bag of features approach to analyze networks, now that you can see how correlated they usually are. Say you have a bunch of brain networks of mice, where the nodes are neurons and the edges are connections between neurons. You have a group of mice who were raised in total darkness, and another group who were raised normally: let’s call the ones who were raised in the darkness the batman mice. You’re interested in how the visual parts of the brain are affected in the batman mice. You find the networks for only the visual parts of their brain, and then you calculate some network feature; maybe the density. It turns out that the network density is much lower for batman mice than it is for normal mice, so you conclude that raising mice in the darkness causes lower network density. Seems reasonable.

The problem is that network density is correlated with pretty much every other network feature you could have used. A primary focus of science tends to be establishing causality. While we won’t go much into causality here (there is an entire field dedicated to just this topic, called causal inference, for which there are many stellar reference texts), the gist is this. When you try to establish causality, you want to show a cause-and-effect relationship: the presence, or lack thereof, of some item \(X\) causes an outcome \(Y\) to happen. Whenever you do science, you want to find the root causes of why something is what it is: a mental illness is caused by a misfiring neuron, a misfolded protein is caused by the absence of a particular amino acid, a major fight between a pair of friends led the students in a school to split, friendship-wise, into two distinct groups who support one student or the other. What you don’t want to do is find something that just happens to be correlated with the thing that is impacted. When you look at network features as things that potentially “cause” a particular effect, you can’t really “untangle” the correlatedness of these network features, and it becomes extremely difficult, if not impossible, to establish which aspects of the network topology are actually impacted by the factor you are studying. For this reason, we think that network features are useful for ascertaining whether something has an effect on the network, and for suggesting directions in which you might look to establish causality. However, network features alone are insufficient, in our opinion, for defining the causal impact on the network topology itself.

## 3.3.2. Bag of Edges

The second approach to studying networks is called the bag of edges. Here, you just take all of the edges in your network and treat them all as independent entities. You study each edge individually, ignoring any interactions between edges. This can work in some situations, but you still run into dependence: if two people within a friend group are friends, that can change the dynamic of the friend group and so change the chance that a different set of two people within the group are friends.

More specifically, in the bag of edges approach, you generally assume that every edge in your network will exist with some particular probability, which can be different depending on the edge that you’re looking at. For example, there might be a 60% chance that the first and second nodes in your network are connected, but only a 20% chance that the third and fourth nodes are. What often will happen here is that you have multiple networks describing the same (or similar) systems. For example, let’s use the mouse example again from above. You have your batman mice (who were raised in the dark) and your normal mice. You’ll have a network for each batman mouse and a network for each normal mouse, and you assume that, even though there’s a bit of variation in what you actually see, the probability of an edge existing between the same two nodes is the same for all batman mice. Your goal would be to figure out which edges have a different probability of existing with the batman mice compared to the normal mice.

Let’s make some example networks to explore this. We’ll have two groups of networks, and all of the networks will have only three nodes for simplicity’s sake. Each group will contain 20 networks, for a total of 40 networks. In the first group, every edge between every pair of nodes simply has a 50% chance of existing. In the second group, the edge between nodes 0 and 1 will instead have a 90% chance of existing, but every other edge will still just be 50%. We’ll generate twenty networks from the first group, and twenty networks from the second group.

## 3.3.2.1. Figuring out which edge is the signal edge

By design, you know that the edge between nodes 0 and 1 has signal - the probability that it’s there changes depending on whether your network is in the first or the second group. One common goal when using the bag of edges approach is finding signal edges: edges whose probability of existing changes depending on which type of network you’re looking at. In your case, we’re trying to figure out (without using your prior knowledge) that the edge between nodes 0 and 1 is a signal edge.

To find the outlier edge, you’ll first get the set of all edges, along with their indices. Since all of your networks are undirected, you’ll get the edges and their indices by finding all of the values in the upper-triangular portion of the adjacency matrices.

Now, you’ll use a hypothesis test called Fisher’s exact test to find the outlier edge. You don’t need to worry too much about what Fisher’s exact test is; it’s essentially a hypothesis test that is useful with networks, which makes limited assumptions about your data. We’ll talk more about it in Section 8.2.

In the code below, we:

- Loop through the edge indices
- Get a list of all instances of that edge in the first group, and all instances of that edge in the second group
- Feed those lists into Fisher’s exact test and obtain \(p\)-values (small \(p\)-value = more signal)

You can see below that the p-value for the first edge, the one that connects nodes 0 and 1, is extremely small, whereas the p-values for the other two edges are relatively large.
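The steps listed above could be sketched as follows. This is an illustration rather than the book's original code: the sampler `sample_network`, the random seed, and the choice of 20 networks per group are assumptions, and each edge's presence or absence across the two groups is summarized as a 2x2 contingency table before being passed to `scipy.stats.fisher_exact`.

```python
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(0)
n_nodes, n_per_group = 3, 20


def sample_network(P):
    """Sample an undirected, loopless network from a matrix of edge probabilities."""
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper | upper.T).astype(int)


P1 = np.full((n_nodes, n_nodes), 0.5)   # first group: every edge has a 50% chance
P2 = P1.copy()
P2[0, 1] = P2[1, 0] = 0.9               # second group: edge (0, 1) has a 90% chance

group1 = np.array([sample_network(P1) for _ in range(n_per_group)])
group2 = np.array([sample_network(P2) for _ in range(n_per_group)])

# Undirected networks: each edge appears once in the upper triangle.
rows, cols = np.triu_indices(n_nodes, k=1)

pvalues = {}
for i, j in zip(rows, cols):
    present1 = group1[:, i, j].sum()    # networks in group 1 containing this edge
    present2 = group2[:, i, j].sum()    # networks in group 2 containing this edge
    table = [[present1, n_per_group - present1],
             [present2, n_per_group - present2]]
    _, p = fisher_exact(table)
    pvalues[(int(i), int(j))] = float(p)

for edge, p in pvalues.items():
    print(edge, round(p, 4))
```

Because the networks are sampled at random, the exact \(p\)-values depend on the seed; with only 20 networks per group, the signal edge will usually, but not always, have the smallest one.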

## 3.3.2.1.1. Correcting for Multiple Comparisons

Because you are doing multiple tests, we’re running into a multiple comparisons problem here. If you’re not familiar with the idea of multiple comparisons in statistics, it is as follows. Suppose you have a test that estimates the probability of making a discovery (or, to be more rigorous, tells you whether you should reject the idea that you didn’t make a discovery). You run that test multiple times. If you run this test enough times, even if there’s no discovery to be made, eventually random chance will make it seem like you’ve made a discovery. So, the chance that you make a false discovery increases with the number of tests that you run. For example, say your test has a 5% false-positive rate, and you run this test 100 times. On average, there will be 5 false positives. If there was only one true positive in all of your data, and your test finds it, then you’d on average end up with 6 positives total, 5 of which were false.
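The arithmetic in this example is easy to check. The sketch below reproduces the expected number of false positives and adds the probability of seeing at least one false positive; that second quantity assumes the 100 tests are independent, which is our assumption and not something the text states.

```python
alpha = 0.05     # false-positive rate of a single test
n_tests = 100    # number of tests run when there is nothing to discover

expected_false_positives = alpha * n_tests
# Probability that at least one of the tests is a false positive (assuming independence).
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(expected_false_positives, round(prob_at_least_one, 4))
```

So with 100 tests, some false positive is close to certain, which is why a correction is needed.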

We need to correct for this here because we’re doing a new test for each edge. There are a few standard ways to do this, but we’ll use something called the Holm-Bonferroni correction [2]. Don’t worry about the details of this; all you need to know for now is that it corrects for the multiple comparisons problem by being a bit more conservative with what you classify as a positive result. This correction is implemented in the statsmodels library, a popular library for statistical tests and data exploration.

You can see below that the corrected p-value for the edge connecting nodes 0 and 1 is still extremely small. In statistics, we somewhat arbitrarily choose a cutoff \(p\)-value (called the \(\alpha\) of the test) for deciding when we have enough evidence against the idea that there is no difference between the two groups, so we’ll call edges with a \(p\)-value below \(\alpha = 0.05\) our signal edges. We’ve used the bag-of-edges approach to find an edge whose probability of existing changed depending on which group a network belongs to!

## 3.3.3. Bag of Nodes
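To make the correction less of a black box, here is a hand-written sketch of the Holm step-down adjustment in plain Python. In practice you would call the statsmodels implementation the text mentions; the function name `holm_adjust` and the three uncorrected \(p\)-values are hypothetical, made up purely to show the mechanics.

```python
def holm_adjust(pvalues):
    """Return Holm-adjusted p-values, in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices from smallest to largest p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])  # smallest p is multiplied by m, next by m-1, ...
        running_max = max(running_max, candidate)        # keep adjusted values non-decreasing
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p-values for the three edges (0,1), (0,2), (1,2).
raw = [0.001, 0.4, 0.6]
corrected = holm_adjust(raw)

alpha = 0.05
signal_edges = [i for i, p in enumerate(corrected) if p < alpha]
print(corrected, signal_edges)
```

The smallest \(p\)-value is penalized the most (multiplied by the number of tests), which is what makes the procedure more conservative than using the raw values.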

Similarly to the bag of edges, you can treat all of the nodes as their own entity and do analysis on a bag of nodes. Much of this book will focus on the bag of nodes approach, because we’ll often use edge count, covariate information, and other things when we work with bags of nodes – and, although there’s still dependence between nodes, it generally isn’t as big of an issue. Most of the single-network methods you’ll use in this book will take the bag of nodes approach. What we’ll see repeatedly is that we take the nodes of a network and embed them so each node is associated with a point on a plot (this is called the node latent space). Then, we can use other methods from mainstream machine learning to learn about our network. We’ll get into this heavily in future chapters.

We’ll also often associate node representation with community investigation. The idea is that sometimes you have groups of nodes which behave similarly – maybe they have a higher chance of being connected to each other, or maybe they’re all connected to certain other groups of nodes. Regardless of how you define communities, a community investigation motif will pop up: you get your node representation, then you associate nearby nodes to the same community. We can then look at the properties of the nodes belonging to a particular community, or look at relationships between communities of nodes.

Since you’ll use the bag of nodes approach heavily throughout this book, you’ll be getting a much better sense for what you can do with it later. As a sneak preview right now, let’s generate a few networks and embed their nodes to get a feel for what bag-of-nodes type analysis might look like. We’ll discuss these procedures more in the coming chapters in Section 5.3 and Section 6.1.

Don’t worry about the specifics, but below you generate a simple network with two communities. Nodes in the same community have an 80% chance of being connected, whereas nodes in separate communities have a 20% chance of being connected. There are 20 nodes per community.

Now, you’ll use graspologic to find the points in 2D space that each node is associated with. Again, don’t worry about the specifics: this will be heavily explained later in the book. All you have to know right now is that we’re mapping (transforming) the nodes of your network from network space, where each node is associated with a set of edges with other nodes, to the 2D node latent space, where each node is associated with an x-coordinate and a y-coordinate.

Below you can see the result, colored by community. Each of the dots in this plot is one of the nodes of your network. You can see that the nodes cluster into two groups: one group for the first community, and another group for the second community. Using this representation for the nodes of your network, you can open the door to later downstream machine learning tasks.
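For readers who want to see the mechanics without installing graspologic, the sketch below generates the two-community network described above and embeds its nodes in 2D using a truncated singular value decomposition in NumPy. This is a stand-in for graspologic's embedding, not its implementation, and the random seed is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

n_per_community = 20
labels = np.repeat([0, 1], n_per_community)      # community of each node
same = labels[:, None] == labels[None, :]
P = np.where(same, 0.8, 0.2)                     # 80% within, 20% between communities

upper = np.triu(rng.random(P.shape) < P, k=1)    # sample each node pair once
A = (upper | upper.T).astype(float)              # symmetric adjacency matrix, no self-loops

# Spectral embedding: keep the top two singular vectors, scaled by the
# square roots of their singular values.
U, S, _ = np.linalg.svd(A)
X = U[:, :2] * np.sqrt(S[:2])                    # one (x, y) point per node

for community in (0, 1):
    print(community, X[labels == community].mean(axis=0))
```

Each row of `X` is the latent position of one node; a scatter plot of `X` colored by `labels` gives the kind of picture described in the text.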

## 3.3.4. Bag of Networks

The last approach is the bag of networks, which you’d use when you have more than one network that you’re working with. Here, you’d study the networks as a whole and you’d want to test for differences across different networks or classify entire networks into one category or another. You might want to figure out if two networks were drawn from the same probability distribution, or whether you can find a smaller group of nodes that can represent each network, preserving its important properties. This can be useful if you have extremely large networks, with millions of nodes.

To showcase the bag of networks approach, let’s create a few networks. We’ll have one group of networks distributed the same way, and another group distributed differently. What you want to do is plot each network as a point in space, so that you can see the communities of networks directly.

Both sets of networks will have two communities of nodes, but the first set will have slightly stronger within-community connections than the second. Each network will have 200 nodes in it, 100 for each community.

Again, don’t worry too much yet about the process with which these networks were generated - that will all be explained in the next few chapters. All you need to get out of this code is that you have six networks from the first group, and another twelve networks from the second. You can see these networks in heatmap form below.

Now, you have to figure out some way of plotting each network as a point in space. Here’s a rough overview for how you’ll do it.

First, you’ll take all of your networks and find a common space that you can orient all of the nodes into. The nodes for all of the networks will exist in the same space, meaning their locations can be compared with each other. We’ll have a bunch of matrices, one for each network. Since each network has 200 nodes, and we’re embedding into 2-dimensional space, we’ll have six 200 by 2 matrices for the first group of networks, and another twelve 200 by 2 matrices for the second group. This process is called finding a network latent space with a homogeneous node latent space, and will be extremely valuable for embedding collections of networks later on.

Now, we’ll look at pairwise dissimilarity for each matrix. The dissimilarity between the \(i^{th}\) and \(j^{th}\) network is defined as the norm of the difference between the \(i^{th}\) matrix and the \(j^{th}\) matrix that we’ve created this way. You can think of this pairwise dissimilarity as just a number that tells you how different the representations for the nodes are between the two networks: a low number means the networks have very similar nodes, and a high number means the networks have very different nodes.

Then, we’ll organize all of these pairwise dissimilarities into a dissimilarity matrix, where the \((i, j)^{th}\) entry is the dissimilarity between the \(i^{th}\) and \(j^{th}\) network.

Again, don’t worry if you don’t understand the details: embedding and how it works will be explained in future chapters.

You can see the dissimilarity matrix below, with values between 0 and 1 since you normalized it. As you can see, there are two groups, with the values for two networks in different groups having high dissimilarity, and the values for two networks in the same group having low dissimilarity. The diagonals of the matrix are all 0, because the dissimilarity between a network and itself is 0.
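The procedure above can be sketched in NumPy as follows. Several things here are assumptions rather than the book's code: the joint embedding is done with an omnibus-style construction (one large matrix whose blocks average pairs of adjacency matrices), the edge probabilities are made up, the Frobenius norm is used as the norm, and each network has 40 nodes instead of 200 to keep the sketch fast.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 40                                        # nodes per network (the text uses 200)
labels = np.repeat([0, 1], n // 2)
same = labels[:, None] == labels[None, :]


def sample_network(p_within, p_between):
    """Sample an undirected two-community network (assumed model)."""
    P = np.where(same, p_within, p_between)
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(float)


# Six networks with slightly stronger within-community connections, then twelve weaker ones.
networks = [sample_network(0.5, 0.2) for _ in range(6)]
networks += [sample_network(0.35, 0.2) for _ in range(12)]
m = len(networks)

# Omnibus-style matrix: block (i, j) is the average of networks i and j.
omni = np.block([[(networks[i] + networks[j]) / 2 for j in range(m)] for i in range(m)])

# Embed all nodes of all networks into one shared 2D space.
U, S, _ = np.linalg.svd(omni)
latent = (U[:, :2] * np.sqrt(S[:2])).reshape(m, n, 2)   # one n-by-2 matrix per network

# Pairwise dissimilarity: norm of the difference between two networks' matrices.
D = np.array([[np.linalg.norm(latent[i] - latent[j]) for j in range(m)] for i in range(m)])
D /= D.max()                                            # normalize to the range [0, 1]

print(D.shape, np.round(D[0, :3], 3))
```

`D` is symmetric with zeros on the diagonal; rendering it as a heatmap is one way to look for the block structure described in the text.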

Embedding this new matrix will give us a point in space for each network. Once you find the embedding for this dissimilarity matrix, you can plot each of the networks in space. In the plot below, each point represents a single network. We can easily see the two clusters, which represent the two types of networks you created. Since there are six networks of the first type and twelve of the second type, one of the clusters has six dots, and the other has twelve dots.

## 3.3.5. References

[1] M. E. J. Newman. Modularity and community structure in networks. Proc. Natl. Acad. Sci. U.S.A., 103(23):8577–8582, June 2006. doi:10.1073/pnas.0601602103.

[2] Sture Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70, 1979. URL: http://www.jstor.org/stable/4615733 (visited on 2023-02-03).

## What is Network Visualization?

Last Updated: August 19, 2024

Network visualization is the practice of creating and displaying graphical representations of network devices, network metrics, and data flows. In plain speak, it’s the visual side of network monitoring and analysis. There are a variety of different subcategories of network visualization, including network maps, graphs, charts, and matrices. In the world of IT networks, network management software will usually have some type of network visualization features built in.

## Benefits of network visualization

If you’ve ever been on a Zoom or Teams call and realized it’d just be easier to get your point across if you start sharing your screen, the benefits of network visualization should be easy to understand. Network visualization is our industry’s embodiment of the phrase “A picture is worth a thousand words.”

Network visualization allows administrators to quickly and intuitively understand a lot of information about the state of the network at a glance. For example, a network map can provide a network-wide topology overview that details physical and logical data flows. Charts and graphs of network data over time can help identify trends and aid capacity planning. Matrices and network topology maps can also help identify dependencies that can be vital to troubleshooting and resolving outages or bottlenecks.

Here’s a detailed look at some key benefits of network visualization:

- Improved network visibility. Network topology maps can give you total visibility into your entire network. As a result, you can understand the network architecture and device dependencies in detail. Dynamic, as opposed to static, network maps go a step further and automatically update when changes occur and often enable you to drill down and view granular data on traffic flows and resource utilization.
- Faster troubleshooting. You can’t fix what you can’t see. The visibility provided by network visualization like maps and performance graphs allows you to quickly understand where to focus your troubleshooting and root cause analysis efforts. Identifying why a user’s VoIP phone isn’t working can be as simple as finding it in your network map, seeing which switch port it is connected to, and checking that the port is configured for the correct VLAN.
- More informed network planning. Planning for network upgrades and refreshes gets easier when you understand your current network dependencies, inventory, and resource utilization trends. A tool that does real-time network asset management and topology mapping, and that visualizes historical network traffic data, can give you the information you need to make informed choices.
- Streamlined onboarding. There are two important aspects to onboarding in network roles. The first is speed. If you’re onboarding a network engineer to your team, the fastest way to make them productive is to make sure they understand the lay of the land. A quality network topology map does just that. They can see, down to the device and cable level, both logical and physical connections and quickly get up to speed on the environment. The other is new scenarios. Let’s say it’s you, and you’re learning a new network. Maybe you’re a network admin starting a new job, a seasoned network admin whose company has just completed an acquisition, or an MSP onboarding a new customer. Even if the previous engineers didn’t map the infrastructure, a network visualization tool with topology mapping and network discovery functionality can jumpstart your efforts to learn about and document the network.
- Better communication. The ability to quickly, clearly, and unambiguously communicate a lot of information in a way customers, managers, and engineers can understand and act on is no simple task. The right visuals can really help explain a situation clearly, so you can address specific problems.

## Network visualization examples

Now that you know what network visualization is and understand its benefits, let’s take a look at some real-world examples.

## Network maps

Network maps are one of the most popular types of network visualization, and for good reason. We can’t overstate the importance of network topology. Maps can help visualize your entire network topology as well as the specific devices, physical connections, logical connections, rack locations, and even floor plans that comprise your network infrastructure. At a high level, network maps fall into two categories: static network maps, which provide an unchanging view of your network topology, and dynamic network maps, which reflect changes in near real-time.

## Static network maps

Although they’re more frequently called network diagrams, static maps can be created manually using tools like Visio or LucidChart.

Static network maps can also be created by tools that perform network discovery like the appropriately named Nmap (Network Mapper). The benefit of coupling network discovery with network mapping is that you can use the discovery functions to create snapshots of the actual state of a network.

We’ve put together a round-up of the 7 best network mapping tools to help you find a solution.

## Dynamic network maps

Dynamic network maps add real-time data on network changes to the mix. While a static map is useful for understanding the intended network design, or even the state of the network at a specific point in the past, a real-time network map provides visibility into the current state of the network.

Generally, real-time dynamic network maps are interactive, allowing you to drill down and view specific metrics on connections, devices, and traffic flows.

Dynamic network topology maps can also have the benefit of being more detailed. Since the discovery and visualization processes are automated, there is no time spent drawing and detailing the network diagram. Where manual maps might stop at the switch level (since it takes more time to draw specific endpoints, and anything below it is changing regularly anyway), dynamic network maps will regularly depict right down to the endpoint level.

Network visibility solutions that provide network discovery and dynamic network mapping are significantly better network troubleshooting, change management, and analysis resources than static maps that must be manually updated.

## Charts & graphs

Charts and graphs are an excellent way to visualize trends and compare sets of data. In the context of network visualization, there are a wide variety of charts and graphs that can help you better understand your network’s health. For example, metrics like interface utilization, bandwidth, and packet loss can all be graphed over a period of time. Similarly, charts can compare metrics between devices, interfaces, or time periods.

For useful information to be actionable, it needs to be easily accessible and digestible. Dashboards aren’t a specific type of network visualization per se, but rather an aggregation of important data on a single page that helps make important information easy to consume.

A well-designed dashboard should be information-rich but also intuitive enough that you can quickly understand the most important information. For example, Auvik’s main default dashboard provides you with information on open alerts, detected misconfigurations, and a graph of top device utilization among other metrics.

## What is Network Topology Mapping?

Network topology mapping is the process of creating maps that visualize your network layout. Network topology mapping tools enable the automatic creation of these maps. Since there’s a lot of value in dynamic network visualization, it’s worth taking a closer look at how network topology mapping works, so let’s jump in.

## How are topology maps created?

Of course, you could create a topology map by hand (and we’ve got a helpful guide for that!), but other than for very small networks, doing so would be inefficient and error-prone. Your map would also become stale after the first change. Because of that, let’s focus on how dynamic network topology maps are automatically created by network topology software tools.

While the specifics will vary from tool to tool, the general process for creating a topology map is:

- The user defines network range(s) and credentials (e.g. for SNMP, SSH, and/or WMI) and begins a scan.
- The network topology software scans the networks to discover devices and the connections between them.
- A visual representation of the logical and physical connections is displayed to the user.
- When a change occurs (e.g. a link goes down) the map automatically updates.
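The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `discovered_links` data are hypothetical, the scan itself (SNMP, SSH, or WMI polling) is not implemented, and real topology software does far more.

```python
import ipaddress
from collections import defaultdict

def expand_range(cidr):
    """Step 1: turn a user-defined network range into candidate host addresses."""
    return [str(host) for host in ipaddress.ip_network(cidr).hosts()]

def build_topology(links):
    """Step 3: turn discovered (device, device) links into an adjacency map."""
    topology = defaultdict(set)
    for a, b in links:
        topology[a].add(b)
        topology[b].add(a)
    return topology

def apply_link_down(topology, a, b):
    """Step 4: update the map when a link between two devices goes down."""
    topology[a].discard(b)
    topology[b].discard(a)

candidates = expand_range("192.168.1.0/29")

# Step 2 is assumed: a hypothetical scan result listing connected device pairs.
discovered_links = [
    ("192.168.1.1", "192.168.1.2"),
    ("192.168.1.1", "192.168.1.3"),
]
topology = build_topology(discovered_links)
apply_link_down(topology, "192.168.1.1", "192.168.1.3")
```

Keeping the map as an adjacency structure, rather than a drawing, is what lets a change event update it in place.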

More sophisticated network topology software will allow the user to drill down and capture specific data on devices within the network. For example, suppose you see traffic isn’t being routed to a specific VLAN. Using network topology software, you can often identify the root cause (like a misconfigured trunk port) by tracing the logical connections.

## What are the benefits of network topology mapping?

Network topology software like Auvik can enable you to drill down from an overall view of the network to devices, to ports, in seconds. Along the way, you can see active alerts and detailed network metrics that enable faster troubleshooting and better decision-making. Additionally, you’re able to greatly reduce tech debt by automating your network documentation processes.

## Getting started with network visualization

In most cases, network discovery and the creation of a dynamic network map is a good starting point for anyone new to network mapping. With a dynamically updating map that allows you to drill down to the port level, you’ll quickly become a more effective troubleshooter. Additionally, if you’ve never had a full network map before, you’ll likely learn a thing or two about your actual network topology. From there, you can begin to focus on specific business and performance goals and the KPIs (key performance indicators) that matter most to you.



Researchers in network science have traditionally relied on user-defined heuristics to extract features from complex networks (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode network structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. These network representation learning (NRL) approaches remove the need for painstaking feature engineering and have led to state-of-the-art results in network-based tasks, such as node classification, node clustering, and link prediction.

In this tutorial, we will cover key advancements in NRL over the last decade, with an emphasis on fundamental advancements made in the last two years. We will discuss classic matrix factorization-based methods, random-walk based algorithms (e.g., DeepWalk and node2vec), as well as very recent advancements in graph neural networks. We will cover methods to embed individual nodes as well as approaches to embed entire (sub)graphs, and in doing so, we will present a unified framework for NRL.

The tutorial will be held at The Web Conference, 2018 (WWW) in Lyon, France, April 24th, 2018.

## Tutorial outline

- Part 1: Introduction (Slides)
- Part 2: Snap.py: SNAP for Python (Slides)
- Part 3: Network Analytics with SNAP (Slides)
- Part 4: SNAP Network Datasets (Slides)
- Part 5: SNAP C++ (Slides)
- Part 6: Hands-on Exercise (Slides)

## Tutorial materials and outline

- What is network representation learning and why is it important?
- Learning low-dimensional embeddings of nodes in complex networks (e.g., DeepWalk and node2vec).
- Techniques for deep learning on network/graph-structured data (e.g., graph convolutional networks and GraphSAGE).
- Applications of network representation learning for recommender systems and computational biology.

All the organizers are members of the SNAP group under Prof. Jure Leskovec at Stanford University. The group is one of the leading centers of research on new network analytics methods. In recent years, the SNAP group has performed extensive research in the area of network representation learning (NRL) by publishing new methods, releasing open source code and datasets, and writing a review paper on the topic.

William L. Hamilton is a PhD Candidate in Computer Science at Stanford University. His research focuses on NRL as well as large-scale computational social science applications. He is the recipient of the SAP Stanford Graduate Fellowship, an Alexander Graham Bell Graduate Scholarship, and his work has been covered in the New York Times, Wired, and the BBC. He is the co-lead developer of GraphSAGE, a state-of-the-art open-source framework for NRL.

Rex Ying is a PhD Candidate in Computer Science at Stanford University. His research focuses on deep learning algorithms for network-structured data, and applying these methods in domains including recommender systems, knowledge graph reasoning, social networks, and biology. He is the co-lead developer of the GraphSAGE framework, and he has undertaken industry collaborations to apply this framework to real-world web-scale recommender systems.

Jure Leskovec is an associate professor of Computer Science at Stanford University. His research focuses on the analysis and modeling of large real-world social and information networks as the study of phenomena across the social, technological, and natural worlds. Problems he investigates are motivated by large scale data, the Web and Social Media. Jure received his PhD in Machine Learning from Carnegie Mellon University in 2008 and spent a year at Cornell University. His work received five best paper awards, won the ACM KDD cup and topped the Battle of the Sensor Networks competition.

Rok Sosic is a senior researcher in Prof. Leskovec's group at Stanford University, working on SNAP tools for large scale network analytics. He has published over 50 papers, including the best paper at Supercomputing'95 and a top 10 paper in the field of high-performance distributed computing. Rok received his PhD in Computer Science from University of Utah. He joined Stanford University in 2012.


## Network representation learning: models, methods and applications

- Review Paper
- Published: 09 August 2019
- Volume 1, article number 1014 (2019)
- Authors: Anuraj Mohan (ORCID: orcid.org/0000-0002-1044-9368) and K. V. Pramod


With the rise of large-scale social networks, network mining has become an important sub-domain of data mining. Generating an efficient network representation is one important challenge in applying machine learning to network data. Recently, representation learning methods have been widely used in various domains to generate low dimensional latent features from complex high dimensional data. A significant amount of research effort has been made in the past few years to generate node representations from graph-structured data using representation learning methods. Here, we provide a detailed study of the latest advancements in the field of network representation learning (also called network embedding). We first discuss the basic concepts and models of network embedding. Further, we build a taxonomy of network embedding methods based on the type of networks and review the major research works that come under each category. We then cover the major datasets used in network embedding research and describe the major applications of network embedding with respect to various network mining tasks. Finally, we provide various directions for future work to encourage further research.


## 1 Introduction

Networks provide a fundamental model for defining relationships between various entities. Networks have applications in diverse domains like social computing [ 82 ], systems biology [ 76 , 141 ], cyber-physical systems [ 148 ], recommender systems [ 74 ], language modeling [ 125 ], and network medicine [ 4 ]. A social network reflects the relationships between people, a citation network relates research papers, a biological network can define protein–protein interactions, and a word co-occurrence network defines linguistic relationships. Analysis and mining of these complex networks can generate various insights, which can be very useful for both the scientific and business communities. Friend recommendation in social networks, protein function prediction from protein interaction networks, terrorist group identification from communication networks, and influential paper detection from citation networks are some typical examples. We usually define these tasks formally as link prediction [ 71 ], node classification [ 8 ], graph clustering [ 105 ], and influential node detection [ 79 ]. Performing these tasks on large real-world networks poses various challenges.

Traditional methods for network embedding use graph algorithm based approaches, which use the adjacency matrix as the network representation. These methods also adopt iterative processing, which results in high computational cost when applied to large networks. For example, for node classification, most approaches, such as the iterative classification algorithm (ICA) [ 91 ] and label propagation [ 149 ], are iterative. Machine learning methods cannot be directly applied to networks because such methods assume that the data are independent and identically distributed (i.i.d.), which is not true in the case of graph-structured data. Using a sparse adjacency matrix representation is also not practical for machine learning. An alternative way to perform machine learning on network data is to use hand-engineered features generated from network statistics and other measures [ 34 ], which is a time-consuming process. Traditional distributed computing platforms [ 26 ] are not well suited for parallel processing of graph-structured data. Many specialized distributed graph analytic platforms like Pregel [ 78 ], Giraph [ 81 ] and GraphX [ 137 ] have been developed, but their efficiency is limited by complex properties of real-world networks, such as the scale-free property and power law distributions.

An interesting direction towards applying machine learning on network data is to map the data to a low dimensional latent space, and then to apply traditional machine learning algorithms. This process of mapping the network data to vector space is known as network embedding. Many linear and non-linear dimensionality reduction methods [ 126 ] were initially used to generate network embeddings. Most of these methods were based on matrix factorization, and hence suffered from scalability issues. More recently, the machine learning community has come up with new theories and architectures to learn complex features from high dimensional data. These approaches are referred to as representation learning, which aims at finding a set of transformations that can map the high dimensional data to a low dimensional manifold. With the success of representation learning on image [ 44 , 60 , 128 , 134 ], speech [ 23 , 40 , 48 ], and natural language processing [ 19 , 21 , 108 ] tasks, researchers attempted to use these methods on network data and obtained fruitful results.

Fig. 1: Subset of the GitHub user interaction network

Given an input network, we can generate embeddings in different output formats, which include node, edge, subgraph and whole-graph embedding. Edge embedding aims to map the edges of a network to a latent space, and subgraph embedding attempts to map the graph components (subgraph structures) to a vector space. Whole-graph embedding aims to generate the representation of a complete graph in vector space, and many works used graph kernel methods [ 3 , 127 ] for generating whole-graph representations. Node embedding, which represents vertices of a graph in vector space, is the most widely studied problem and is covered throughout this survey. Figure 1a shows a Gephi [ 5 ] visualization of a small subset of the GitHub user interaction network, and Fig. 1b shows its 2-D representation in vector space, generated by DeepWalk [ 96 ] and plotted using t-SNE [ 75 ]. Generating low dimensional vectors as node embeddings from a large real-world network is not straightforward. The vector representation should preserve the structural properties of the network, which include the first order, second order and higher order proximities between nodes. The network data is highly sparse and usually non-linear, and the embedding algorithm should generate the embedding from sparse and non-linear data. Many real-world networks contain millions of nodes and edges, and the embedding algorithm should be scalable. In reality, many networks may be heterogeneous, attributed, scale-free and dynamic, and the embedding method should adapt to all such situations.

A few efforts have already been made to survey [ 22 , 38 , 46 , 89 ] the various approaches for network embedding. In this survey, we focus on the recent methods for node embedding which are inspired by the recent advancements in representation learning. We provide a taxonomy of node embedding methods based on the type of the networks. Networks are classified into broader categories such as homogeneous networks, heterogeneous networks, attributed networks, signed networks, and dynamic networks. We discuss the common models of network representation learning and review the major works which come under each model with respect to each type of network. Further, we discuss the applications of network embedding along with the data sets used in network embedding research.

## 2 Terminologies and problem definition

## Definition 1

A Network is a graph \(G =(V,E)\) , where \(V=\{v_1,v_2\ldots v_n\}\) , is the set of vertices and \(e \in E\) is an edge between any two vertices. An adjacency matrix A defines the connectivity of G , \(A_{ij}=1\) if \(v_i\) and \(v_j\) are connected, else \(A_{ij}=0\) .
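As a small illustration of Definition 1, the sketch below builds the adjacency matrix of a four-node graph from an edge list. It assumes an undirected, unweighted graph with vertices indexed from 0; the function name `adjacency_matrix` and the example edges are made up for this illustration.

```python
def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix A, with A[i][j] = 1 if v_i and v_j are connected."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # assumption: the graph is undirected, so A is symmetric
    return A

# Hypothetical example graph with 4 vertices and 4 edges.
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
A = adjacency_matrix(4, edges)

for row in A:
    print(row)
```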

## Definition 2

A homogeneous network is a network \(G =(V,E)\), where each node \(v_i \in V\) belongs to the same type and each edge \(e_i \in E\) also belongs to the same type.

## Definition 3

An attributed network can be defined as \(G_A=(V,E,A,F)\), where V is the set of vertices, E is the set of edges, A is the adjacency matrix and \(F \in R^{n \times k}\); the i-th row of F denotes the k-dimensional attribute vector of node i.

## Definition 4

A heterogeneous network is a network \(G =(V,E)\), where each node \(v_i \in V\) and each edge \(e_i \in E\) are associated with mapping functions \(F(v):V \rightarrow T_v\) and \(f(e):E \rightarrow T_e\), where \(T_v\) and \(T_e\) denote the entity and relationship types respectively.

## Definition 5

A signed network is a network \(G =(V,E)\) , \(v \in V\) , \(e \in E\) and for each edge, \(e_{ij}= +1\) or \(e_{ij}=-1\) , denoting a positive link or a negative link between \(v_i\) and \(v_j\) .

## Definition 6

A dynamic network can be defined as a series of snapshots \(G=\{G_1,G_2\ldots G_n\}\) where \(G_i=(V_i,E_i)\) and n is the number of snapshots.

## Definition 7

First order proximity describes the pair wise proximity between the vertices which is defined using the edge weight \(e_{ij}\) between node \(v_i\) and node \(v_j\) .

## Definition 8

Second order proximity for a pair of nodes \(v_i\) and \(v_j\) is the proximity of neighborhood structures of the nodes \(v_i\) and \(v_j\) .
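Definitions 7 and 8 can be made concrete with a short sketch. First order proximity is read directly from the adjacency matrix. For second order proximity, the definition does not fix a particular similarity measure, so the sketch assumes cosine similarity between adjacency rows as one possible choice; the function names and the example matrix are hypothetical.

```python
import math

def first_order(A, i, j):
    """First order proximity: the edge weight between v_i and v_j (0 if not connected)."""
    return A[i][j]

def second_order(A, i, j):
    """Second order proximity, here assumed to be the cosine similarity
    between the neighborhood (adjacency row) vectors of v_i and v_j."""
    dot = sum(a * b for a, b in zip(A[i], A[j]))
    norm_i = math.sqrt(sum(a * a for a in A[i]))
    norm_j = math.sqrt(sum(b * b for b in A[j]))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)

# Hypothetical 4-node undirected graph with edges 0-1, 0-2, 1-2, 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

direct = first_order(A, 1, 3)   # nodes 1 and 3 share no edge
shared = second_order(A, 1, 3)  # but both are neighbors of node 2
```

In this example nodes 1 and 3 have zero first order proximity yet a positive second order proximity, because their neighborhoods overlap.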

Network embedding: Given a network \(G =(V,E)\), the task is to learn a transformation function \(f:V_i \rightarrow K_i \in R^d\), where \(d \ll \vert V \vert\), such that f preserves the first order, second order and higher order proximities of the network. Here d is the number of dimensions of the real-valued vector.

## 3 Models of network embedding

Researchers have used various models for network embedding, including both linear and nonlinear dimensionality reduction techniques. Models based on matrix factorization, models that combine random walk sampling with shallow neural networks, and deep neural architectures are the most commonly used. Other approaches focus on modeling an optimization function based on the structure and the properties to be preserved and solving it using gradient based methods.

## 3.1 Matrix factorization

Network embedding can be considered as a structure-preserving dimensionality reduction process, which assumes that the input data lie in a low dimensional latent space. Network data can be represented in matrix form, such as the adjacency matrix, the Laplacian matrix, the node transition probability matrix and many more. Matrix factorization can be applied to any of these matrices to generate node embeddings. Locally linear embedding [ 103 ], Laplacian eigenmaps [ 6 ], Structure preserving embedding [ 107 ], Graph factorization [ 2 ], GraRep [ 14 ] and HOPE [ 94 ] are among the matrix factorization based methods for network embedding. A detailed survey on these methods can be found in [ 13 , 22 , 33 ]. In this paper, we focus on the methods based on recent advancements in representation learning.
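To make the matrix forms mentioned above concrete, the sketch below derives the Laplacian matrix \(L = D - A\) and the node transition probability matrix \(P = D^{-1}A\) from an adjacency matrix, where D is the diagonal degree matrix. These are the standard textbook definitions rather than anything specific to the surveyed methods, the function names are hypothetical, and the factorization step itself is deliberately left out.

```python
def degrees(A):
    """Degree of each node: the row sums of the adjacency matrix."""
    return [sum(row) for row in A]

def laplacian(A):
    """Unnormalized graph Laplacian L = D - A."""
    deg = degrees(A)
    n = len(A)
    return [[(deg[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]

def transition_matrix(A):
    """Random-walk transition matrix P = D^{-1} A; rows of isolated nodes stay zero."""
    deg = degrees(A)
    n = len(A)
    return [[A[i][j] / deg[i] if deg[i] else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical 4-node undirected graph with edges 0-1, 0-2, 1-2, 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
L = laplacian(A)
P = transition_matrix(A)
```

A factorization routine (for example an eigendecomposition or SVD from a numerical library) would then take one of these matrices as input and keep only the leading d components as node embeddings.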

Fig. 2: A flow diagram of random walk based approaches for network embedding

## 3.2 Random walk

Capturing the network structure is a primary concern while generating node embeddings. A random walk is a well-known method which can capture the local structure of the graph. Although each row in an adjacency matrix corresponds to a node vector which defines the connectivity structure and is analogous to a one-hot vector representation, it is very sparse and high dimensional. The word2vec model [ 85 ] succeeded in developing a word representation by generating dense low dimensional vectors from sparse high dimensional one-hot vectors, using a shallow neural network architecture. Word2vec defines two neural architectures, namely the continuous bag-of-words model and the skip-gram model [ 86 ]. The training is done using stochastic gradient descent (SGD) [ 9 ]. Word2vec uses two optimization strategies called hierarchical softmax and negative sampling to speed up the training process. Many network embedding methods [ 29 , 42 , 92 , 96 , 101 ] are inspired by word2vec: they first apply a random walk on the network to generate node sequences that are analogous to sentences in word2vec, and then use the skip-gram architecture to generate node embeddings. Random walk based approaches try to maximize the probability that nodes that tend to co-occur on truncated walks lie closer in the embedding space. A general architecture of the random walk based procedure for network embedding is shown in Fig. 2.
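The first stage of this procedure, generating node sequences with truncated random walks, can be sketched as follows. This is a minimal illustration, not the implementation of any particular method: the function names, the parameters `walks_per_node` and `length`, and the example graph are all assumptions, and the skip-gram training stage is omitted.

```python
import random

def random_walk(graph, start, length, rng):
    """Uniform truncated random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

def generate_walks(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node; each walk plays the role of a sentence."""
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order on each pass
        for node in nodes:
            walks.append(random_walk(graph, node, length, rng))
    return walks

# Hypothetical undirected graph stored as adjacency lists.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = generate_walks(graph, walks_per_node=2, length=5)
```

The resulting `walks` list would then be handed to a skip-gram model in the same way a corpus of sentences is handed to word2vec.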

## 3.3 Deep architecture

Fig. 3: A flow diagram of deep architecture based approaches for network embedding

The aim of network embedding is to map the nodes from a high dimensional network space to a low dimensional feature space. Some works used specialized neural network models [ 84 , 104 ], while many others used generalized models over graph-structured data to represent a graph in a Euclidean space. Network data is inherently non-linear, and using shallow neural network architectures for generating node embeddings may result in sub-optimal solutions. Deep neural networks [ 7 , 63 , 106 ] have been successfully used in various domains to learn multiple levels of feature representations from complex and non-linear data. To train large neural networks with more than one hidden layer, many theories and architectures were proposed recently, including deep belief networks (DBN) with greedy layer-wise pre-training [ 49 ], deep convolutional neural networks (CNN) [ 60 ], long short-term memory networks (LSTM) [ 41 ], and generative adversarial networks (GAN) [ 37 ]. An autoencoder is a neural architecture which acts as a building block in training deep belief networks. It is a three-layer neural network which reconstructs the input vectors at its output layer through a number of non-linear transformations on the input. As an unsupervised feature learning technique, an autoencoder can generate a deep latent representation for the input data. Multiple layers of autoencoders are stacked together to form a stacked autoencoder, which is used as the deep neural architecture for generating node embeddings in many works [ 15 , 39 , 129 ]. Convolutional neural networks, which are very popular in image processing tasks, are not directly applicable to graphs, but some works use convolutional architectures [ 18 , 27 , 56 , 57 ] that rely on spectral graph theory [ 47 ] to generate node embeddings. Generative adversarial networks (GANs) are deep neural network architectures comprised of two components, a generator and a discriminator, competing against each other. A few works [ 24 , 130 ] on network embedding are inspired by GANs. A general architecture of using deep architectures for network embedding is shown in Fig. 3.

## 4 Network representation learning methods

In this section, we review the major works which come under each model with respect to each type of network. The classification of network embedding methods based on different types of networks is depicted in Fig. 4.

Fig. 4: Network embedding methods based on the types of networks

## 4.1 Homogeneous network

Most of the works on network embedding focus on non-attributed, static, unsigned homogeneous networks. Preserving the structural property of the network is the primary objective of homogeneous network embedding methods. Figure 5 shows the major works on homogeneous network embedding which are grouped under major models of network embedding. Table 1 shows a summary of the input, objective function, model used, and properties preserved by some of these methods.

Fig. 5: Homogeneous network embedding methods

Proximity preservation during network embedding is the main aim of most random walk based methods. Among those, DeepWalk [ 96 ] gained a lot of attention, as it is inspired by the well-studied word2vec algorithm. The DeepWalk algorithm involves a two-step process: (1) a truncated random walk on the network generates a sequence of vertices, which is analogous to a sentence in word2vec; (2) a skip-gram model, which uses a shallow neural network architecture, generates node embeddings. The skip-gram is a generative model whose objective is to maximize the probability of neighbors in the walk, given the representation of a vertex. For each node \(v_i\), skip-gram assigns a current d-dimensional representation \(\phi (v_i )\in R^d\) and maximizes the co-occurrence probability of its neighbors in the walk to update this representation. The optimization becomes

\[\min_{\phi } \; -\log \Pr \left( \{v_{i-w},\ldots ,v_{i+w}\} \setminus v_i \mid \phi (v_i)\right)\]

where \(v_{i-w},\ldots ,v_{i+w}\) denotes the neighbors of \(v_i\) in the node sequence, and w is the context size. Computing the softmax at the output layer of skip-gram is computationally expensive, and DeepWalk approximates the softmax using two strategies, hierarchical softmax and negative sampling. These strategies reduce the time complexity of the skip-gram model and speed up the training process. Since the random walk is a sampling strategy, the time complexity of performing it is linear w.r.t. the number of edges. The complexity of the skip-gram architecture is proportional to \(O(C (D + D \log_{2}(V )))\), where C is the context window size, D is the number of dimensions and \(\log_{2}(V)\) is the time to build the hierarchical softmax over V vertices. DeepWalk is parallelizable and can be implemented without knowledge of the entire graph, which makes it suitable for large-scale machine learning. DeepWalk motivated many subsequent works [ 17 , 24 , 29 , 42 , 92 , 95 , 101 ] and also acted as a baseline for various works in the area of network representation learning. Walklet [ 97 ] modified the random walk used in DeepWalk by explicitly preserving the proximities between vertices during the random walk, and showed that the multi-scale representations thus generated can improve the performance of the multi-label classification task. Max-Margin DeepWalk [ 121 ] and Discriminative Deep Random Walk [ 69 ] extended DeepWalk by associating a classification objective with the embedding objective and thereby demonstrated improved performance on the multi-label classification task.
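The role of the context size w can be illustrated by listing the (center, neighbor) training pairs that a skip-gram model would see for one walk. The sketch below is only an illustration: the function name `context_pairs` and the example walk are hypothetical, and no softmax or embedding update is performed.

```python
def context_pairs(walk, w):
    """For each position i, pair walk[i] with every node at most w steps
    before or after it in the sequence (excluding position i itself)."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - w)
        hi = min(len(walk), i + w + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

# Hypothetical node sequence produced by a truncated random walk.
walk = ["a", "b", "c", "d"]
pairs = context_pairs(walk, w=1)
print(pairs)
```

Each pair contributes one term to the co-occurrence probability being maximized, so a larger w means more pairs per walk and a wider notion of neighborhood.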

The quality of the network embedding can be further improved by preserving the structural equivalence of the nodes along with the proximity information. Node2vec [ 42 ] works towards this goal by performing a biased random walk which provides more flexibility in exploring neighborhoods compared to DeepWalk. During the random walk, node2vec uses two sampling strategies, breadth-first search (BFS) and depth-first search (DFS), which traverse the search space by exploring both community structures and structurally equivalent nodes in the network. Based on the random walk sequence, node2vec extends the skip-gram architecture to optimize the objective function,

where \(v_t\) is the node taken from the random walk sequence \(W_v\), \(v_{t'}\) is the neighbor node of node \(v_t\) within the window w, and \(\phi (v_t)\in \mathbb {R}^d\) is the feature representation of the node \(v_t\). Node2vec incurs additional space and time complexity over DeepWalk as it involves BFS and DFS search during the random walk. Node2vec can preserve the structural equivalence of nodes in the network but is limited by the size of the context window. Struc2vec [ 101 ] aims at preserving structural equivalence to a better extent by computing the structural similarity between each pair of vertices in the network. Struc2vec constructs a multilayer network, where each layer denotes a hierarchy in measuring the structural similarity, and then applies random walk sampling followed by skip-gram learning on the multilayer graph to generate the embedding of each vertex.
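A biased walk of the kind node2vec uses can be sketched as below. The sketch follows the return parameter p and in-out parameter q described in the node2vec paper [ 42 ]; those parameter names do not appear in the text above, and the function names and example graph are hypothetical. A small q favors DFS-like outward moves and a large q favors BFS-like moves that stay close to the previous node. The alias sampling and precomputed transition tables of the reference implementation are not reproduced here.

```python
import random

def biased_step(graph, prev, curr, p, q, rng):
    """Pick the next node from curr, given the walk arrived from prev."""
    neighbors = graph[curr]
    weights = []
    for nxt in neighbors:
        if nxt == prev:            # going back to the previous node
            weights.append(1.0 / p)
        elif nxt in graph[prev]:   # staying at distance 1 from prev (BFS-like)
            weights.append(1.0)
        else:                      # moving farther away from prev (DFS-like)
            weights.append(1.0 / q)
    return rng.choices(neighbors, weights=weights, k=1)[0]

def node2vec_walk(graph, start, length, p, q, rng):
    """Second-order biased random walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        curr = walk[-1]
        if not graph[curr]:
            break
        if len(walk) == 1:         # first step has no previous node: choose uniformly
            walk.append(rng.choice(graph[curr]))
        else:
            walk.append(biased_step(graph, walk[-2], curr, p, q, rng))
    return walk

# Hypothetical undirected graph stored as adjacency lists.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
rng = random.Random(42)
walk = node2vec_walk(graph, start=0, length=8, p=1.0, q=0.5, rng=rng)
```

With p = q = 1 every weight is 1, so the walk reduces to the uniform random walk used by DeepWalk.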

Preserving higher order structural patterns in large-scale networks is a challenging issue. HARP [ 17 ] is a meta-strategy that can achieve this goal. HARP can be used to improve state-of-the-art NRL algorithms [ 42 , 96 , 115 ] by keeping them from getting stuck in local optima, since these models rely on non-convex optimization solved by SGD. HARP progresses through three steps: graph coarsening, embedding and representation refinement. In coarsening, a large network is reduced to smaller networks that preserve the original structure, using two strategies, edge collapsing and star collapsing. The embedding algorithm is applied to the coarsest graph and the embedding is generated. The last step is to prolong and refine the embedding from the coarsest graph to the finest. To perform refinement, HARP uses two strategies, multilevel hierarchical softmax and multilevel negative sampling. The overall time complexity of HARP (with DeepWalk) is \(O(\gamma |V |)\), where \(\gamma\) is the number of walks and V is the number of vertices. Experiments show that the HARP extension can improve the performance of DeepWalk, LINE, and Node2vec on multi-label classification.
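The coarsening step can be illustrated with a simplified version of edge collapsing: pick edges that share no endpoints and merge each chosen pair of nodes into one supernode. The sketch below is an assumption-laden simplification, not HARP's actual procedure: it selects edges greedily in input order, the function name `edge_collapse` is hypothetical, and star collapsing and the refinement step are not shown.

```python
def edge_collapse(edges):
    """One coarsening pass: greedily match edges with disjoint endpoints and
    merge each matched pair. Returns (node -> supernode map, coarse edge list)."""
    merged = {}
    for u, v in edges:
        if u != v and u not in merged and v not in merged:
            merged[u] = u  # u acts as the representative of the merged pair
            merged[v] = u
    for u, v in edges:
        merged.setdefault(u, u)  # unmatched nodes remain their own supernode
        merged.setdefault(v, v)
    coarse = set()
    for u, v in edges:
        a, b = merged[u], merged[v]
        if a != b:  # edges inside a supernode disappear
            coarse.add((min(a, b), max(a, b)))
    return merged, sorted(coarse)

# Hypothetical path graph 0-1-2-3-4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
mapping, coarse_edges = edge_collapse(edges)
print(mapping)
print(coarse_edges)
```

Applying the function repeatedly to its own output yields a sequence of progressively smaller graphs; an embedding learned on the smallest one can then be copied back to the nodes it represents as a starting point for the next finer level.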

Network structure is inherently non-linear, and using a shallow neural network for network embedding may lead to suboptimal solutions. SDNE [ 129 ] addresses this challenge by using a deep architecture, built with stacked autoencoders, to generate network embeddings. SDNE deploys a deep belief network, implements multiple layers of non-linear functions, and maps the data into a non-linear feature space. In order to maintain the structure-preserving property and to address sparsity, SDNE trains a joint optimization function (shown as equation 3) which preserves the first order and second order proximities. This function preserves the second order proximity using stacked autoencoders and the first order proximity using Laplacian Eigenmaps.

Here X and Y are the input and reconstructed data. The model minimizes the reconstruction error to capture the global information. B is used to handle the sparsity of the adjacency matrix with entries \(a_{ij}\) . \(\phi _i\) and \(\phi _j\) are the feature representations of nodes i and j , and \(W^{(k)}\) is the hidden layer weight matrix of the autoencoder. The time complexity of SDNE is \(O(ncdi)\) , where n is the number of vertices, d is the embedding dimension, c is the average degree of the network and i is the number of iterations.

In language modeling, an alternative method for generating word vectors is to find low dimensional linear projections of the positive point-wise mutual information (PPMI) matrix [ 12 , 68 ] of words and their contexts. DNGR [ 15 ] is inspired by [ 68 ]; it first uses a random surfing model to generate a probabilistic co-occurrence matrix that captures the graph structure information. The PPMI matrix is then calculated from the co-occurrence matrix. Instead of applying singular value decomposition (SVD) as in [ 14 ], DNGR applies a stacked denoising autoencoder to the PPMI matrix, which learns a non-linear function to map high dimensional vertex vectors into low dimensional node embeddings. The authors of DNGR claim that using the probabilistic co-occurrence matrix is well suited for weighted networks and is less computationally expensive than sampling-based methods [ 42 , 96 ]. The time complexity of DNGR is linear w.r.t. the number of vertices in the graph. The objective function of DNGR is defined as

where \(x_{i}\) is the i th instance, \(y_{i}\) is the corrupted input data of \(x_{i}\) , and \(f_{\theta _{1}}\) and \(g_{\theta _{2}}\) are the encoding and decoding functions of the autoencoder respectively.
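The PPMI step in this pipeline can be sketched in a few lines. The example below assumes the usual definition, PPMI(i, j) = max(0, log(c_ij · total / (row_i · col_j))), applied to a small dense co-occurrence matrix; the function name `ppmi` and the toy matrix are illustrative only, and the random surfing model that would produce the co-occurrence counts is not shown.

```python
import math


def ppmi(cooc):
    """Positive pointwise mutual information of a dense co-occurrence matrix."""
    total = sum(sum(row) for row in cooc)
    row_sums = [sum(row) for row in cooc]
    col_sums = [sum(col) for col in zip(*cooc)]
    result = []
    for i, row in enumerate(cooc):
        out_row = []
        for j, value in enumerate(row):
            if value <= 0 or row_sums[i] == 0 or col_sums[j] == 0:
                out_row.append(0.0)  # no co-occurrence: PPMI is clipped to zero
                continue
            pmi = math.log(value * total / (row_sums[i] * col_sums[j]))
            out_row.append(max(pmi, 0.0))
        result.append(out_row)
    return result


# Toy co-occurrence counts for three vertices (assumed values).
cooc = [
    [0.0, 2.0, 0.0],
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
ppmi_matrix = ppmi(cooc)
```

Each row of the resulting matrix would serve as the high dimensional input vector of one vertex for the denoising autoencoder.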

A few efforts have been made to apply variants of CNN [ 27 , 47 ] to representation learning on networks. GCN [ 56 ] is one such approach, whose goal is to learn a function on the network that takes as input (1) an \(N \times K\) feature matrix, where N is the number of nodes and K is the number of input features, and (2) an adjacency matrix A . The GCN produces an output Z which is an \(N \times D\) matrix, where D is the number of dimensions per node. GCN uses the layer-wise propagation rule

where \(W_l\) denotes the weight matrix of the l th network layer, \(\hat{A}=A+I\) , D is the diagonal node degree matrix and \(H_{(0)}=X\) , the matrix of node attributes. The authors interpreted GCN as a generalized version of the Weisfeiler-Lehman algorithm on graphs. The complexity of the convolution operation is \(O(efc)\) , where e is the number of edges, f is the number of filters and c is the node dimension. FastGCN [ 18 ] is an enhancement over GCN in which the authors interpret graph convolutions as integral transforms of embedding functions under probability measures, and use Monte Carlo approaches to consistently estimate the integrals. Parametric graph convolution [ 119 ] is another enhancement over GCN, which generalizes a convolutional filter by adding a hyper-parameter that influences the filter size, and thereby improves the performance of GCN. Variational graph autoencoder (VGAE) [ 57 ] is another work, which uses a variational autoencoder to learn latent representations from undirected graphs. The authors demonstrated this model using a graph convolutional network (GCN) encoder and a simple inner product decoder.
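A single propagation step can be sketched as follows. The code assumes the form commonly given for GCN [ 56 ], \(H_{(l+1)} = \sigma (D^{-1/2}\hat{A}D^{-1/2}H_{(l)}W_l)\) , with D taken as the degree matrix of \(\hat{A}\) and ReLU as the non-linearity \(\sigma\) ; the helper names and the toy graph are illustrative assumptions.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, hidden, weights):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops: A_hat = A + I.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degrees of A_hat (never zero because of the self-loops).
    degree = [sum(row) for row in a_hat]
    # Symmetric normalisation of A_hat.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    pre_activation = matmul(matmul(norm, hidden), weights)
    return [[max(0.0, value) for value in row] for row in pre_activation]


# Path graph 0-1-2 with two input features per node (assumed toy data).
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
attributes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity_weights = [[1.0, 0.0], [0.0, 1.0]]
z = gcn_layer(adjacency, attributes, identity_weights)
```

With identity weights the output simply shows the normalised neighborhood averaging; in a trained model the weight matrices are learned and several such layers are stacked.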

GraphGAN [ 130 ] directly follows the GAN architecture, which tries to learn two models: (1) a generator, which approximates the underlying connectivity distribution and generates fake vertex pairs to fool the discriminator, and (2) a discriminator, which tries to distinguish the vertex pairs generated by the generator from the real ones. The objective of the discriminator is to maximize the logarithmic probability of assigning correct labels to real and generated samples. The objective of the generator is to minimize the logarithmic probability that the discriminator correctly identifies the samples generated by the generator. A sigmoid function and a softmax function are used as the discriminator and generator functions respectively. The work also proposes an alternative method called graph softmax, which can improve the performance of softmax. The objective function of GraphGAN is modeled as a two-player minimax game with cost function

Here the generator G tries to generate vertices which resemble vertex \(v_{c}\) ’s neighbors by approximating the underlying true connectivity distribution \(p_{true}(v|v_{c})\) , and the discriminator D tries to discriminate the true neighbors of \(v_{c}\) from those generated by G by estimating the probability that an edge exists between v and \(v_{c}\) , which is represented as \(D(v,v_c;\theta _D)\) . By alternately minimizing and maximizing the cost function, the optimal parameters for D and G can be learned; GraphGAN uses a policy gradient procedure to learn the generator parameters. The time complexity of each iteration of GraphGAN is \(O(V\log V)\) , where V is the number of vertices.
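The two functions can be illustrated with a small sketch. It assumes, following the description above, that the discriminator is a sigmoid of the inner product of two vertex embeddings and that the generator is a plain softmax over all other vertices; the embedding values and function names are hypothetical, and the graph softmax variant is not shown.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def discriminator(d_emb, v, v_c):
    """D(v, v_c): sigmoid of the inner product of discriminator embeddings."""
    return 1.0 / (1.0 + math.exp(-dot(d_emb[v], d_emb[v_c])))


def generator(g_emb, v_c):
    """G(. | v_c): softmax over all other vertices of generator inner products."""
    scores = {v: dot(vec, g_emb[v_c]) for v, vec in g_emb.items() if v != v_c}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {v: math.exp(s - top) for v, s in scores.items()}
    total = sum(exps.values())
    return {v: e / total for v, e in exps.items()}


# Assumed toy embeddings, shared here by both players for brevity.
toy_emb = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [-1.0, 0.0]}
neighbor_probs = generator(toy_emb, "a")
edge_prob = discriminator(toy_emb, "a", "b")
```

The plain softmax touches every vertex for each \(v_c\) , which is the cost that motivates a cheaper alternative in practice.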

ANE [ 24 ] proposes a different approach which uses adversarial learning as a regularizer to learn more robust network representations. ANE employs a structure-preserving component and an adversarial learning component. For structure preservation, ANE uses a method called inductive DeepWalk (IDW). IDW performs random walks using the PPMI matrix to explore the neighborhood, and optimizes a parameterized generator function to generate embeddings. The adversarial learning component consists of a generator and a discriminator. It shares the generator function with the structure-preserving component. Initially, the discriminator is trained to separate the prior samples from the embedding vectors. The parameters of the generator are then updated so as to fool the discriminator, thereby regularizing the embedding generated by the structure-preserving component.

## 4.1.1 Other works

LINE: The objective of LINE [ 115 ] is to preserve first order and second order proximity during embedding. LINE first calculates the joint probability between two vertices in two ways, one using edge weights and the other using node vectors. To preserve first order proximity, LINE defines an objective function that minimizes the distance between these two probability distributions. The objective function to preserve second order proximity is defined in a similar way. LINE uses an edge sampling strategy to speed up the computations.
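A minimal sketch of the first order objective follows. It assumes the formulation usually given for LINE, in which the model probability of an edge is the sigmoid of the inner product of the two node vectors and minimizing the KL divergence to the empirical, edge-weight-based distribution reduces (up to constants) to a weighted negative log-likelihood; the function name and the toy embeddings are illustrative.

```python
import math


def first_order_loss(weighted_edges, emb):
    """Weighted negative log-likelihood: -sum_ij w_ij * log sigmoid(u_i . u_j)."""
    loss = 0.0
    for (i, j), weight in weighted_edges.items():
        inner = sum(a * b for a, b in zip(emb[i], emb[j]))
        p_model = 1.0 / (1.0 + math.exp(-inner))
        loss -= weight * math.log(p_model)
    return loss


# One edge of weight 2 between nodes 0 and 1 (assumed toy data).
toy_edges = {(0, 1): 2.0}
loss_close = first_order_loss(toy_edges, {0: [1.0, 0.0], 1: [1.0, 0.0]})
loss_far = first_order_loss(toy_edges, {0: [1.0, 0.0], 1: [-1.0, 0.0]})
```

The loss is lower when the two connected nodes have similar vectors, which is the behavior the first order objective is meant to reward.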

NETMF: The works [ 42 , 96 , 115 ] lacked thorough theoretical analysis, which is provided by [ 100 ]. The work reveals that all these methods essentially perform implicit matrix factorization. By analysing the closed form matrices of all the methods, [ 100 ] discusses the relationship between these methods and their connection with the graph Laplacian. The authors also propose a method called NETMF, which explicitly factorizes the closed form implicit matrix of DeepWalk using singular value decomposition (SVD) and generates the node embeddings.

GraphSAGE: GraphSAGE [ 45 ] is an inductive representation learning method which is suitable for large graphs. Instead of training an individual embedding for each node, GraphSAGE learns a function that generates node embeddings by sampling and aggregating features from the node's local neighborhood.
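The sample-and-aggregate idea can be sketched as below. The example assumes a mean aggregator and concatenation of a node's own features with the aggregated neighbor features; the learned weight matrix and non-linearity that would follow are omitted, and the function name and toy graph are hypothetical.

```python
import random


def sage_mean_step(graph, features, node, num_samples, rng):
    """Sample up to num_samples neighbors, average their features, and
    concatenate the result with the node's own feature vector."""
    neighbors = graph[node]
    if len(neighbors) > num_samples:
        neighbors = rng.sample(neighbors, num_samples)
    dim = len(features[node])
    if neighbors:
        mean = [sum(features[n][d] for n in neighbors) / len(neighbors)
                for d in range(dim)]
    else:
        mean = [0.0] * dim  # isolated node: nothing to aggregate
    return features[node] + mean


# Star graph centred on node 0 (assumed toy data).
toy_graph = {0: [1, 2], 1: [0], 2: [0]}
toy_features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [2.0, 0.0]}
combined = sage_mean_step(toy_graph, toy_features, 0, 2, random.Random(0))
```

Because the function depends only on features and local structure, it can be applied to nodes that were not seen during training, which is what makes the approach inductive.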

EP: Embedding propagation (EP) [ 35 ] is a network representation method inspired by label propagation. EP sends forward and backward messages between neighboring nodes. Forward messages contain label representations and backward messages contain gradients that result from aggregating the label representations and applying a reconstruction loss. Node representations are computed from label representations.

## 4.2 Attributed network embedding

In most of the real-world networks, nodes or edges are associated with single or multiple attributes which provide some semantic information. In this section, we will cover some methods which perform network embedding on such attributed networks [ 98 , 99 ]. Figure 6 shows the major works under attributed network embedding. Table 2 shows a summary of the input, objective function, model used, and properties preserved by some of these methods.

Fig. 6 Attributed network embedding methods

Nodes of the network may have text associated with them. TADW [ 139 ] aims to embed networks by using the structural information and the text information associated with the nodes. The work proves the equivalence of DeepWalk and closed form matrix factorization, and creates a PPMI matrix using vertex–context pairs for further processing. TADW performs inductive matrix completion [ 90 ] to incorporate text features into the PPMI matrix, and low-rank matrix factorization on the resultant matrix to generate the network embedding. The objective function of TADW is stated as

where M and T are the word-context matrix and text feature matrix respectively, and \(\min _{W,H} {\vert \vert M - W^THT \vert \vert }_F^2\) represents the low rank matrix decomposition of matrix M . The complexity of each iteration of minimizing W and H is \(O(n_{0}(M)k+|V |f_{t}k+|V |k^2)\) where \(n_{0}(M)\) indicates the number of non-zero entries of M , and k denotes the low rank of M .

Accelerated attributed network embedding (AANE) [ 51 ] is another approach which uses connectivity information and attribute information to perform network embedding. AANE models a joint optimization function with two components: a strategy based on spectral clustering to preserve node proximities, and a matrix factorization framework to approximate the attribute affinity matrix. Further, the authors provide a distributed algorithm to solve the optimization objective efficiently. The loss function of AANE is modeled as
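Written out, the two components combine into the following objective. This is assembled from the terms quoted in this section and keeps their notation; the exact norm on the second term should be checked against the original AANE paper [ 51 ].

\[
\min _{H} \; {\vert \vert S - HH^T \vert \vert }_F^2 \; + \; \lambda \sum _{(i,j) \in \epsilon } w_{ij} \, {\vert \vert h_{i}-h_{j} \vert \vert }_F^2
\]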

Here S represents the attribute affinity matrix, H the embedding matrix, and \(h_{i}\) and \(h_{j}\) are the vector representations of node i and node j . The \(\min _{H} {\vert \vert S - HH^T \vert \vert }_F^2\) component preserves the node attribute proximity and the \(\lambda \sum _{(i,j) \in \epsilon }w_{ij}({\vert \vert h_{i}-h_{j} \vert \vert }_F^2)\) component preserves the network structure proximity. The time complexity of AANE is \(O(nN_{A}+n^2 )\) , where \(N_{A}\) is the number of nonzero entries in attribute affinity matrix A and n is the number of nodes in the network.

In real-world networks like citation networks, each paper may have associated text, a category to which it belongs, and reference links to other papers. Such networks can be modeled as graphs with node structure, content, and labels. TriDNR [ 95 ] aims at generating embeddings by exploiting all three of these levels. It uses ideas from DeepWalk and the paragraph vector algorithm [ 62 ] to embed node, text and label information. TriDNR models a joint optimization function which learns inter-node, node-content and label-content correlations, and the training is done using SGD. It also uses hierarchical softmax to speed up the computations. The objective function of TriDNR is to maximize the log-likelihood

The first component of the equation maximizes the likelihood of the neighboring nodes given the current node \(v_{i}\) , the second component maximizes the probability of observing contextual words given the current node \(v_{i}\) , and the third component maximizes the likelihood of observing the words given a class label \(c_i\) . \(\alpha\) is a balance parameter that controls the proportion of network structure, text, and label information.

DeepGL [ 102 ] is a deep architecture which performs hierarchical representation learning on attributed networks. DeepGL first generates a set of base features by performing graphlet decomposition on higher order network motifs (graphlets). DeepGL learns a set of relational feature operations which, when applied to the base features, generate a set of higher level features. At each layer of the deep architecture of DeepGL, features from lower order subgraphs are combined using compositions of relational feature operations to generate higher order subgraph patterns. DeepGL is designed to be effective for network-based transfer learning tasks. The optimization function of DeepGL is stated as

which aims to find a set of features \(x_{i}\) that maximizes their similarity to the label y and minimizes the similarity between the features \(x_{i}\) and \(x_{j}\) in the collection. The complexity of generating node embeddings with DeepGL is \(O(F(M + NF))\) , where N , M , and F are the number of nodes, edges, and node features respectively.

The GCN methods presented in the previous section can also deal with attributed networks.

## 4.2.1 Other works

DANE: DANE [ 70 ] aims at generating representations from a network with structure and attribute information. DANE provides an online learning model, which extends the basic design to a distributed environment. DANE takes as input an adjacency matrix and a user feature matrix, and generates two embeddings, Ea and Ex , using spectral clustering based methods. Further, DANE generates a single embedding representation by maximizing the correlation between Ea and Ex . DANE uses matrix perturbation theory to update Ea and Ex , and to generate the updated embedding E .

LANE: Label informed attributed network embedding (LANE) [ 52 ] affiliates labels with the attributed network and maps the network into a low-dimensional representation by modeling their structural proximities and correlations.

CANE: Context-aware network embedding (CANE) [ 122 ] is another attributed network embedding method, which learns various context-aware embeddings for a vertex according to the neighbors it interacts with.

NEEC: NEEC [ 53 ] aims at improving attributed network embedding by learning expert cognition and incorporating it into the embedding objective.

IIRL: IIRL [ 138 ] uses two notions, structure-close links and content-close links, to define the topological and attribute similarity between nodes. A joint optimization function is defined to preserve the proximity of structure-close and content-close links in the embedding space, and the training is done using a gradient based algorithm.

## 4.3 Heterogeneous network embedding

Some network mining tasks require the data to be modeled as heterogeneous networks [ 111 ] that involve nodes and edges of different types. For example, a citation network can be modeled as a heterogeneous network with authors, papers, and venues as nodes and the relationships between these types as edges. In this section, we will cover some methods which perform network embedding on heterogeneous networks. Figure 7 shows the major works on heterogeneous network embedding and Table 3 shows a summary of these methods.

Fig. 7 Heterogeneous network embedding methods

Metapath2vec [ 29 ] is an extension of random walk and skip-gram based methods which can be applied to heterogeneous networks. A meta path [ 112 ] is a path that can be represented in the form \(V_1 \xrightarrow {R_1} V_2 \xrightarrow {R_2}\cdots V_t \xrightarrow {R_t} V_{t+1}\cdots \xrightarrow {R_{k-1}}V_k\) where \(R= R_1 \circ R_2\circ R_3\circ \cdots \circ R_{k-1}\) defines the composite relation between node types \(V_1\) and \(V_k\) . Metapath2vec performs a meta path based random walk through the heterogeneous network, and generates paths which can capture both the structural and semantic relationships between different types of nodes. The resulting paths are fed to a heterogeneous skip-gram model, which learns the representation of nodes by maximizing the probability of heterogeneous context nodes given the input node. The objective function of heterogeneous skip-gram is stated as

where t denotes the node type, \(N_{t(v)}\) denotes the neighborhood of node v , and \(P(c_{t} \vert v; \theta )\) is a softmax function which calculates the probability of co-occurrence of context-input pairs. The time complexity of metapath2vec is the same as that of DeepWalk, as both use the skip-gram architecture for learning node representations. The work also discusses an algorithm called metapath2vec++, which provides heterogeneity in negative sampling by maintaining separate multinomial distributions for each node type in the output layer of the skip-gram model, and thereby provides more efficiency and accuracy in representation.
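The meta path based random walk can be sketched as follows. The code assumes a symmetric meta path such as author-paper-author that is repeated cyclically, with the next node chosen uniformly among neighbors of the required type; the function name, the single-letter type codes, and the toy bibliographic graph are all hypothetical.

```python
import random


def metapath_walk(graph, node_type, start, metapath, length, rng):
    """Random walk whose node types follow `metapath`, repeated cyclically.

    metapath must start and end with the same type, e.g. ["A", "P", "A"].
    """
    if node_type[start] != metapath[0] or metapath[0] != metapath[-1]:
        raise ValueError("start node and meta path end points must share a type")
    cycle = len(metapath) - 1
    walk = [start]
    step = 0
    while len(walk) < length:
        wanted = metapath[(step % cycle) + 1]  # type required at the next position
        candidates = [n for n in graph[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
        step += 1
    return walk


# Toy network with authors (A), papers (P), and one venue (V).
het_graph = {
    "a1": ["p1"], "a2": ["p1", "p2"],
    "p1": ["a1", "a2", "v1"], "p2": ["a2", "v1"],
    "v1": ["p1", "p2"],
}
het_types = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
apa_walk = metapath_walk(het_graph, het_types, "a1", ["A", "P", "A"], 5, random.Random(1))
```

Walks produced this way would then be fed to the skip-gram model in place of the untyped walks used by DeepWalk.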

The main aim of HNE [ 16 ] is to map the multimodal objects in the heterogeneous network to a common space such that the similarity between the objects can be computed. HNE considers a heterogeneous network with text–text, text–image, and image–image interactions as input. Text and image data are transformed into d-dimensional vectors and are mapped to a latent space using linear transformations. An objective function is modeled to minimize the distance between the objects if they are topologically connected. The loss function of HNE is stated as

where the first component represents the loss w.r.t. text to text similarity, second component represents the loss w.r.t. image to image similarity and the third component represents the loss w.r.t. text to image similarity. \(N_{II}\) , \(N_{TT}\) and \(N_{IT}\) are the numbers of the three types of links, and \(\lambda _1\) , \(\lambda _2\) and \(\lambda _3\) are the three balancing parameters. HNE further proposes a deep architecture which can map different modalities into a common subspace, and can construct the feature representation. A CNN is used to learn the image features and a fully connected layer is used to learn the text features. A linear embedding layer is used to map the input to a common subspace. A prediction layer is used to calculate the loss function and the training is done using SGD.

The authors of LINE [ 115 ] extended their network embedding approach from homogeneous networks to heterogeneous networks with PTE [ 114 ]. PTE constructs a heterogeneous text network by combining a word-word network, a word-document network, and a word-label network. PTE then applies LINE to embed the three bipartite networks. PTE further models a joint optimization function which can collectively embed the three bipartite networks to generate a single heterogeneous text network embedding. The loss function of PTE is stated as

The first, second and third terms of the equation minimize the negative log-likelihood of co-occurrence of word-word pairs, word-document pairs and word-label pairs respectively. The authors provide two approaches to perform the learning process: (1) joint training and (2) pre-training followed by fine-tuning.

HIN2Vec [ 32 ] is another work which uses a meta path based approach for representation learning in heterogeneous information networks. Initially, HIN2Vec proposes a conceptual neural network architecture which is trained to learn the relationship between the nodes, by placing the possible meta paths at the output layer. The objective function of HIN2Vec is

The function takes as input a pair of nodes x and y and a relationship r , and tries to maximize the probability of correctly predicting whether the relationship r exists between x and y . Further, the authors provide an enhanced neural network architecture which can learn the node embeddings and meta path embeddings during the training process.

## 4.4 Signed networks

Signed networks [ 67 , 116 ] are part of real social systems where the relationship between entities can be either positive or negative. In this section, we will cover some methods which perform network embedding on signed networks. Various works on signed network embedding are listed in Fig. 8 and a summary of these methods is shown in Table 4 .

Fig. 8 Signed network embedding methods

SIDE [ 55 ] is a network embedding method for signed directed networks. SIDE follows the random sampling strategy and hierarchical optimization, which are well exploited by language models. SIDE performs truncated random walks on a signed directed network, and generates positive and negative node pairs based on structural balance theory. Then, SIDE models an optimization function which can be stated as

The function tries to find the parameters that maximize the likelihood \(p(u,v)\) between two nodes u and v such that the likelihood value is high for positively connected nodes and low for negatively connected nodes. The latter part of the objective function regularizes the bias terms in the function. The time complexity of SIDE is linear w.r.t. the number of nodes in the network.

SiNE [ 133 ] is a deep learning architecture for signed network embedding. It uses structural balance theory, which assumes that a node is more similar to a node with which it has a positive link than to a node with which it has a negative link. SiNE first defines a similarity function between the d-dimensional representations of nodes, and models an optimization function to learn the parameters of the similarity function, which is stated as

where P defines the set of triplets \((v_i, v_j, v_k)\) where \(v_i\) and \(v_j\) have a positive link while \(v_i\) and \(v_k\) have a negative link, and \(P_0\) defines the set of triplets \((v_i, v_j, v_0)\) where \(v_i\) and \(v_j\) have a positive link while \(v_i\) and \(v_0\) have a negative link. C is the size of the training data and \(\theta\) is the set of parameters to learn. SiNE uses a deep neural architecture with two hidden layers to optimize the objective function.
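Constructing the triplet set P from a signed network can be sketched as follows. The example assumes an undirected signed graph stored as a dictionary of edge signs; the function name `build_triplets` and the toy edges are hypothetical, and the set \(P_0\) is not constructed here.

```python
def build_triplets(signed_edges):
    """Return triplets (v_i, v_j, v_k) with a positive link v_i-v_j
    and a negative link v_i-v_k, for an undirected signed graph."""
    positive, negative = {}, {}
    for (u, v), sign in signed_edges.items():
        target = positive if sign > 0 else negative
        target.setdefault(u, set()).add(v)
        target.setdefault(v, set()).add(u)
    triplets = []
    for v_i in sorted(positive):
        for v_j in sorted(positive[v_i]):
            # Nodes without negative links contribute nothing to P.
            for v_k in sorted(negative.get(v_i, ())):
                triplets.append((v_i, v_j, v_k))
    return triplets


# a-b and b-c are positive links, a-c is a negative link (assumed toy data).
toy_signed = {("a", "b"): 1, ("a", "c"): -1, ("b", "c"): 1}
toy_triplets = build_triplets(toy_signed)
```

Each triplet encodes one constraint of the form "\(v_i\) should be more similar to \(v_j\) than to \(v_k\)", which is what the similarity function is trained to satisfy.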

SNE [ 140 ] is a log-bilinear model [ 88 ] for generating embeddings from a signed network. Given a path, SNE tries to predict the embedding of node v by linearly combining the feature vectors of the nodes in the path with the corresponding signed weight vectors. A scoring function is used to measure the similarity between the actual and predicted representations. The optimization objective is

The objective function maximizes the logarithmic likelihood of a target node v , generated by a path of nodes h and their edge types, using a softmax function. Attributed signed network embedding (SNEA) [ 132 ] is another work, which addresses network embedding on signed networks whose nodes have associated attributes. Initially, SNEA defines two optimization functions to generate embeddings, one for modeling user attributes and the other for modeling signed networks using structural balance theory. Then it models a joint optimization function by combining these components, and the training is done using gradient descent.

## 4.5 Dynamic network embedding

Many real-world networks are dynamic and will evolve over time [ 66 , 109 ]. Between adjacent snapshots, new nodes and edges may be added and existing ones may be lost. In this section, we will cover some methods which perform network embedding on dynamic networks. Figure 9 shows the major works on dynamic network embedding and Table 5 describes a summary of these methods.

Fig. 9 Dynamic network embedding methods

The research done by [ 147 ] aims at developing a temporal latent space model that can predict links over time based on a sequence of previous graph snapshots. The authors first model a quadratic loss function to learn the temporal latent space from a dynamic social network which is stated as

where the first term denotes the matrix factorization of the adjacency matrix representations of the network snapshots \(G_\varGamma\) and the second term \(1-Z_\varGamma (u)Z_{\varGamma -1}(u)^T\) penalizes node u for a sudden change in its latent position. The loss function maintains temporal smoothness across the embeddings of consecutive snapshots by incorporating a temporal regularizer into a matrix factorization framework. A standard block-coordinate gradient descent (BCGD) algorithm is provided as a solution to the optimization problem. The authors also present two lemmas which prove the correctness of the method, followed by a thorough theoretical analysis of the solution. The time complexity of the BCGD algorithm is \(O(rk\sum _T(n+m_{T}))\) , where n is the number of nodes, \(m_{T}\) is the number of edges in the graph \(G_{T}\) , k is the number of dimensions, and T is the number of timestamps. In a later section, the authors describe two variants of the proposed algorithm, namely the local BCGD algorithm and the incremental BCGD algorithm, with local and incremental updates respectively. They then compare the proposed methods with other latent space inference approaches in terms of inference time and memory consumption, and evaluate the quality of the learned latent spaces in terms of their link prediction power.

Another perspective on a dynamic network is a temporal network [ 50 , 93 ], whose edges are active only when an interaction happens between the nodes. These interactions may lead to a flow of information between the nodes. Continuous-time dynamic network embedding (CTDNE) [ 92 ] aims at developing embeddings for temporal networks by incorporating temporal dependencies into the state-of-the-art methods. In a temporal network, each edge is labeled with a timestamp which denotes the time of activation of the edge. CTDNE first performs a temporal random walk, in which edges are traversed in increasing order of timestamps, and generates time-aware node sequences. Further, CTDNE uses the skip-gram architecture to learn time preserving node embeddings from the node sequences. The optimization objective of CTDNE is defined as

where \(v_{i-w},\ldots ,v_{i+w}\) are the neighboring vertices of vertex \(v_i\) and w is the context window size. The objective is to learn a function f which generates node embeddings such that vertices that co-occur in a temporal random walk lie close to each other in the latent space. The time complexity of CTDNE is the same as that of DeepWalk, as both use the skip-gram architecture for learning node embeddings.
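A temporal random walk of this kind can be sketched as below. The example assumes directed, timestamped edges, a uniform choice among valid next edges, and a strict increase in timestamps along the walk; the function name and the toy edge list are hypothetical.

```python
import random


def temporal_walk(edges, start, max_len, rng):
    """Random walk that only follows edges with strictly increasing timestamps.

    edges: list of (source, target, timestamp) triples.
    Returns (nodes, times), where times[i] is the timestamp of the i-th step.
    """
    outgoing = {}
    for u, v, t in edges:
        outgoing.setdefault(u, []).append((v, t))
    nodes, times = [start], []
    last_time = float("-inf")
    while len(nodes) < max_len:
        options = [(v, t) for v, t in outgoing.get(nodes[-1], []) if t > last_time]
        if not options:
            break  # no later edge leaves the current node
        next_node, last_time = rng.choice(options)
        nodes.append(next_node)
        times.append(last_time)
    return nodes, times


# Assumed toy interaction log.
toy_temporal_edges = [("a", "b", 1), ("b", "c", 2), ("b", "a", 0),
                      ("c", "a", 3), ("a", "c", 5)]
walk_nodes, walk_times = temporal_walk(toy_temporal_edges, "a", 6, random.Random(7))
```

The resulting node sequences respect the order in which interactions occurred, so a walk never uses an edge that was active before the one it arrived on.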

DynGEM [ 39 ] is a deep autoencoder based architecture for embedding a dynamic network, and is inspired by SDNE. Given n snapshots of a dynamic network, DynGEM incrementally builds the embedding of the snapshot at time \(t_n\) from the embedding of the snapshot at time \(t_{n-1}\) . At each time step, DynGEM initializes the embedding from the previous time step and performs incremental learning, thereby reducing the time to convergence from the second time step onward. Incremental learning can be viewed as a transfer learning task where the model only needs to learn the changes between two graph snapshots. The dynamic network may grow in size. DynGEM uses a heuristic, prop size, to dynamically determine the number of hidden units required for each snapshot. The authors also provide various stability metrics for generating stable dynamic network embeddings. Unlike SDNE, which uses a sigmoid function for activations and SGD for training, DynGEM uses ReLU in all autoencoder layers to support weighted graphs, and Nesterov momentum with properly tuned hyperparameters for training. The loss function of DynGEM is stated as

where the first and second terms represent the second order and first order proximities respectively. DynGEM uses an optimization objective similar to that of SDNE [ 129 ], but unlike SDNE, which operates on a static network, DynGEM optimizes the parameters of the objective function at each time step, thereby learning the parameters across a series of snapshots. The time complexity of DynGEM is \(O(ncdit)\) , where n is the number of vertices, d is the embedding dimension, c is the average degree of the network, i is the number of iterations, and t is the number of snapshots.

DynamicTriad [ 146 ] is another dynamic network embedding method, which tries to preserve both the structural and the evolution patterns of the network. The aim of the work is to capture the network dynamics and to learn low dimensional vectors for each node at different time steps. The work considers triadic closure to be an important phenomenon driving network evolution, and uses it to preserve the temporal dynamics while generating embeddings. DynamicTriad models an optimization function with three components, which is stated as

where \(L^t_{sh,1}\) is a loss function to preserve the structural connectivity, \(L^t_{tr,2}\) is a loss function to preserve the triadic closure process and \(\sum _{t=1}^{T-1}\sum _{i=1}^N\vert \vert u^{t+1}_{i} -u^{t}_{i}\vert \vert ^2_2\) is a loss function to impose temporal smoothness by minimizing the Euclidean distance between embedding vectors in adjacent time steps. DyRep [ 120 ] is another work, which considers both topological evolution and temporal interactions, and aims to develop embeddings which encode both structural and temporal information.
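The temporal smoothness term is simple enough to compute directly. The sketch below evaluates \(\sum _{t=1}^{T-1}\sum _{i=1}^N\vert \vert u^{t+1}_{i} -u^{t}_{i}\vert \vert ^2_2\) for embeddings stored as nested lists; the function name and the toy snapshots are assumptions made for this example.

```python
def temporal_smoothness(snapshots):
    """Sum of squared Euclidean distances between each node's embeddings
    in adjacent time steps.

    snapshots: list over time steps; each entry is a list of embedding
    vectors, one per node, in a fixed node order.
    """
    total = 0.0
    for previous, current in zip(snapshots, snapshots[1:]):
        for u_prev, u_curr in zip(previous, current):
            total += sum((a - b) ** 2 for a, b in zip(u_curr, u_prev))
    return total


# Two nodes over three time steps; only node 0 moves (assumed toy data).
toy_snapshots = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
    [[1.0, 2.0], [1.0, 1.0]],
]
smoothness = temporal_smoothness(toy_snapshots)
```

A node whose embedding is unchanged between snapshots contributes nothing, so the term penalizes only abrupt movement in the latent space.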

## 5 Datasets for network representation learning

In this section, we discuss the major network datasets used in network representation learning research.

## 5.1 Social networks

BlogCatalog [ 117 ] This is a dataset used in most network representation learning research. BlogCatalog is a social network denoting the relationships between the bloggers listed on the BlogCatalog website. The topic category of each author can act as the label of each node. To model BlogCatalog as an attributed network, we may use the tags and short descriptions of blogs as user attributes. Bloggers and groups can be considered as heterogeneous node types, and can form a heterogeneous network with user-user and user-group edges.

Yelp Footnote 2 This network represents the user friendship relationship in the Yelp social networking service. User reviews can be considered as the attribute information. A heterogeneous network can be modeled using users (U), businesses (B), cities(C) and categories (T) as nodes, and friendships (U–U), user reviews (B–U), business cities (B–C), and business categories (B–T) as edges.

Flickr [ 117 ] The Flickr network denotes contacts between users of photo sharing website Flickr. The interest group of each user can be used as the label of each node. To model Flickr as an attributed network, we may use aggregate tags on the user photos as user attributes.

Youtube [ 118 ] It is a social network where users are linked if they share a common video. The users can be grouped based on their tastes and can form the label of each user.

Facebook and Twitter [ 65 ] These are social networks showing friend and following relationships between users. They are usually used in works which use networks with the scale-free property.

## 5.2 Citation networks

DBLP [ 113 ] Three datasets using DBLP data (author citation network, paper citation network and co-authorship network) are used in NRL research. Author citation network links authors when one author cites the other, paper citation network links papers when one paper cites the other and co-authorship network links authors if they co-author at least a single paper. Paper title or paper abstract can be used as the attribute associated with each node.

ArXiV [ 66 ] Two datasets using ArXiV, ArXiV-GR-QC and ArXiv Astro-PH are used in network representation learning research. Both are co-author collaboration networks where authors are linked if they co-author at least a single paper.

Citeseer Footnote 3 and Cora [ 83 ] Data from both Citeseer and Cora is used as paper citation network with paper text denoting the attributes of nodes.

Aminer computer science data [ 113 ] and database information system data [ 112 ] These datasets are commonly used to model heterogeneous networks with the author, paper, and venue as node types and with author–author, author–paper, and paper–venue edges.

## 5.3 Other networks

Wikipedia [ 77 ] A language network using word co-occurrences can be constructed from Wikipedia data with POS tags as node labels. Wikieditor [ 140 ] is a signed network extracted from Wikipedia dataset [ 61 ]. Positive or negative edges are given based on the co-edit relationship between the users.

PPI [ 11 ] Protein–Protein Interaction Network (PPI) is a biological network showing the interaction between proteins. Protein functions or post-translational modifications can be considered as node labels.

Epinions Footnote 4 and Slashdot Footnote 5 Epinions is a user-user signed network constructed from product review site Epinions. Positive and negative links between users indicate the trust and distrust between them. The product review written by the users can be considered as the attributes. Slashdot is a technology news site which allows users to annotate other users as friends.

Dynamic network datasets Dynamic network snapshots used in experiments on representation learning for dynamic networks include collaboration network snapshots from the HEP-TH dataset [ 36 ], autonomous system communication network snapshots [ 64 ] from BGP (Border Gateway Protocol) logs, email communication network snapshots from the ENRON dataset [ 58 ], user collaboration network snapshots from Github data, Footnote 6 timestamped communication networks from Chinese Telecom and PPDai [ 146 ], and academic network snapshots from the Aminer dataset. Footnote 7
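Dynamic network datasets of the kind listed above are typically consumed as a sequence of snapshots. The following is a minimal sketch of one way to cut a timestamped edge list into fixed-width snapshots; the function name `snapshots`, the `(u, v, t)` edge format, and the fixed window length are assumptions made for illustration, not the preprocessing used by any particular method.

```python
from collections import defaultdict

def snapshots(timestamped_edges, window):
    """Group (u, v, t) edges into consecutive time windows of length `window`.

    Returns a list of edge sets, one per window, starting at the earliest
    timestamp. Windows with no activity are kept as empty sets so that the
    list index still corresponds to elapsed time.
    """
    if not timestamped_edges:
        return []
    t0 = min(t for _, _, t in timestamped_edges)
    buckets = defaultdict(set)
    for u, v, t in timestamped_edges:
        buckets[(t - t0) // window].add((u, v))
    last = max(buckets)
    return [buckets.get(i, set()) for i in range(last + 1)]

# Toy usage with integer timestamps and a window of 10 time units.
edges = [("a", "b", 0), ("b", "c", 5), ("a", "c", 12), ("c", "d", 31)]
snaps = snapshots(edges, window=10)
```

Whether each snapshot holds only the edges of its own window (as here) or accumulates all earlier edges is a modeling choice that varies across dynamic embedding methods.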

## 6 Applications of network representation learning

Researchers have applied network representation learning to various network mining applications and demonstrated performance improvements over the state-of-the-art methods for those tasks. A pipeline of network embedding based network mining is shown in Fig. 10. In this section, we discuss the major applications of network representation learning.

Fig. 10 A pipeline of network embedding based network mining

## 6.1 Node classification

Node classification [ 8 ] is the process of assigning labels to the unlabeled nodes in a network by considering the labels assigned to the labeled nodes and the topological structure of the network. The task is classified into single-label and multi-label node classification [ 59 ] depending upon the number of labels to be assigned to each node. A network embedding approach for node classification can be explained in three steps. (1) Embed the network into a low-dimensional space. (2) Associate the known labels with the nodes, which form the training set. (3) Train a LIBLINEAR [ 30 ] classifier to build the model, which can then be used to predict the labels of unlabeled nodes. The performance of the task can be measured using several evaluation measures such as micro-F1, macro-F1 and accuracy. Node classification has been widely used as a benchmark for testing the quality of network representation methods. The effect of network embedding on node classification was tested on different datasets by various methods discussed in section 4.1 and the results presented by the authors are summarized below.
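The three steps above, together with the micro-F1 and macro-F1 measures, can be sketched as follows. This is an illustrative sketch only: the embeddings are hand-written toy vectors standing in for the output of an embedding method, and a nearest-centroid classifier is swapped in for the LIBLINEAR classifier mentioned in the text so that the example needs nothing beyond the standard library. All function names are hypothetical.

```python
from collections import defaultdict

def nearest_centroid_fit(embeddings, labels):
    """Step 3 (stand-in classifier): one mean vector per label from labeled nodes."""
    sums, counts = {}, defaultdict(int)
    for node, label in labels.items():
        vec = embeddings[node]
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}

def predict(centroids, vec):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(sorted(centroids), key=lambda lab: sqdist(centroids[lab]))

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro

# Step 1 (assumed already done): toy 2-D embeddings for six nodes.
embeddings = {"n1": [0.0, 0.0], "n2": [0.0, 1.0], "n3": [5.0, 5.0],
              "n4": [5.0, 6.0], "n5": [0.2, 0.3], "n6": [4.8, 5.5]}
# Step 2: known labels form the training set; n5 and n6 are unlabeled.
train_labels = {"n1": "a", "n2": "a", "n3": "b", "n4": "b"}
centroids = nearest_centroid_fit(embeddings, train_labels)
predicted = {n: predict(centroids, embeddings[n]) for n in ("n5", "n6")}
```

Micro-F1 pools the counts over all labels before computing F1, so frequent labels dominate it, whereas macro-F1 averages the per-label F1 values and weights rare labels equally. This difference is why surveys usually report both.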

On homogeneous networks, node classification experiments were conducted by DeepWalk using social networks (BlogCatalog, Flickr, Youtube), Node2vec using social, biological, and language networks (BlogCatalog, PPI, Wikipedia), LINE using social, citation, and language networks (Flickr, Youtube, DBLP, Wikipedia), SDNE using social networks (BlogCatalog, Flickr, Youtube), GCN using citation networks (Citeseer, Cora, PubMed), HARP using social and citation networks (BlogCatalog, DBLP, Citeseer), ANE using citation and language networks (Cora, Citeseer, Wikipedia), GraphGAN using social and language networks (BlogCatalog, Wikipedia), and NETMF using social, biological, and language networks (BlogCatalog, Flickr, PPI, Wikipedia). On attributed networks, the effect of network embedding on node classification was tested by TADW using citation and language networks (Cora, Citeseer, Wikipedia), AANE using social networks (BlogCatalog, Flickr, Youtube), DANE using social and citation networks (BlogCatalog, Flickr, DBLP, Epinions), IIRL using social and citation networks (BlogCatalog, Flickr, DBLP), and TriDNR using citation networks (Citeseer, DBLP). On heterogeneous networks, node classification was tested by Metapath2vec using citation networks (Aminer) and HIN2Vec using social and citation networks (BlogCatalog, Yelp, DBLP). On signed networks, it was tested by SNE using a language network (Wikieditor), SiNE using social networks (Epinions, Slashdot), and SIDE using social and language networks (Epinions, Slashdot, Wikipedia), and on dynamic networks by DynamicTriad using communication and citation networks (Mobile, Loan, Aminer).

## 6.2 Link prediction

Link prediction [ 71 , 73 ] is one of the most well-studied network mining tasks and has received growing attention in recent years due to its wide range of applications. The link prediction problem can be defined as follows: given a social network at time \(t_1\), the model needs to predict the edges that will be added to the network during the interval from the current time \(t_1\) to a given future time \(t_2\). More generally, it is related to the problem of inferring missing links from an observed network. Link prediction is useful in a variety of domains, such as social networks, where it recommends real-world friends, and genomics, where it discovers novel interactions between genes. The traditional approach to link prediction is to define a similarity score between nodes based on similarity measures [ 71 ] such as common neighbors, Adamic-Adar, and preferential attachment. In a network embedding approach to link prediction, the nodes are first mapped into a low-dimensional space. Vector similarity measures such as cosine similarity and nearest neighbor approximation can then be used to score the predicted links. The performance of the link prediction task can be measured using evaluation measures such as precision and area under the receiver operating characteristic curve (AUC) [ 73 ].
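The scoring functions named above are short enough to write out. The sketch below, on a hypothetical five-edge toy graph, implements the three neighborhood-based scores, the cosine similarity used with embeddings, and a pairwise estimate of AUC. The adjacency format (a dict of neighbor sets) and all function names are assumptions made for illustration.

```python
import math

def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])

def adamic_adar(adj, u, v):
    """Shared neighbors weighted by 1 / log(degree); rare neighbors count more."""
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])

def preferential_attachment(adj, u, v):
    """Product of the two node degrees."""
    return len(adj[u]) * len(adj[v])

def cosine(x, y):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def auc(pos_scores, neg_scores):
    """Probability that a true link outscores a non-link (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

# Toy undirected graph with edges a-b, a-c, b-c, b-d, c-d.
adj = {"a": {"b", "c"}, "b": {"a", "c", "d"},
       "c": {"a", "b", "d"}, "d": {"b", "c"}}
```

For the unlinked pair (a, d), the two shared neighbors b and c each have degree 3, so the Adamic-Adar score is 2 / ln 3. In an embedding-based pipeline the same `auc` function would be applied to `cosine` scores of held-out true edges versus sampled non-edges.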

Node2vec used social, biological, and citation networks (Facebook, PPI, ArXiV), and SDNE and GraphGAN used citation networks (ArXiV) for conducting link prediction experiments on homogeneous networks. Link prediction experiments were conducted on attributed networks by DeepGL using various network datasets available at network repository, Footnote 8 and on heterogeneous networks by HIN2Vec using social and citation networks (BlogCatalog, Yelp, DBLP). The effect of network embedding on link prediction has been studied in signed networks by SNE using social and language networks (Slashdot, Wikieditor), SiNE using social networks (Epinions, Slashdot), SIDE using social and language networks (Epinions, Slashdot, Wiki), and SNEA using social networks (Epinions, Slashdot). Link prediction is an important challenge in dynamic networks, and the significance of using node representations for link prediction in dynamic networks was tested by TNE using various network datasets from Koblenz Large Network Collection, Footnote 9 DynamicTriad using communication and citation networks (Mobile, Loan, Academic), DynGem using communication and citation networks (HEP-TH, ENRON, AS), CTDNE using various temporal network datasets, and DyRep using Github social network snapshots.

## 6.3 Network visualization

A network can be meaningfully visualized by creating a layout in 2-D space. In a network embedding approach for visualization, the learned node embeddings generated by the embedding algorithm are passed to a visualization tool (t-SNE [ 75 ], TensorFlow embedding projector [ 1 ], PCA plot) and visualized in a two-dimensional vector space. The visualization of the same dataset may differ across embedding algorithms due to the differences in the properties preserved by each method.
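Of the tools mentioned above, the PCA plot is the simplest to sketch without external libraries. The following is a minimal illustration that projects embeddings onto their top two principal axes using power iteration with Gram-Schmidt orthogonalization; it is an assumption-laden teaching sketch (fixed start vector, fixed iteration count), not how t-SNE or the TensorFlow projector work, and in practice a numerical library would be used instead.

```python
import math

def _normalize(w):
    norm = math.sqrt(sum(x * x for x in w))
    if norm < 1e-12:
        return None  # no direction left (zero remaining variance)
    return [x / norm for x in w]

def _principal_axis(C, orthogonal_to=(), iters=200):
    """Power iteration on covariance C, kept orthogonal to earlier axes."""
    d = len(C)
    w = _normalize([float(i + 1) for i in range(d)])  # arbitrary fixed start
    for _ in range(iters):
        w = [sum(C[i][j] * w[j] for j in range(d)) for i in range(d)]
        for v in orthogonal_to:
            dot = sum(a * b for a, b in zip(w, v))
            w = [a - dot * b for a, b in zip(w, v)]
        w = _normalize(w)
        if w is None:
            return [0.0] * d
    return w

def pca_2d(vectors):
    """Return one (x, y) pair per input vector: coordinates on the top two axes."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    X = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    C = [[sum(r[i] * r[j] for r in X) / n for j in range(d)] for i in range(d)]
    v1 = _principal_axis(C)
    v2 = _principal_axis(C, orthogonal_to=[v1])
    return [(sum(a * b for a, b in zip(r, v1)),
             sum(a * b for a, b in zip(r, v2))) for r in X]

# Toy 3-D "embeddings" lying on a line: all variance is on the first axis.
coords = pca_2d([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0],
                 [2.0, 2.0, 0.0], [3.0, 3.0, 0.0]])
```

The resulting pairs can be handed to any plotting tool as scatter-plot coordinates. Because PCA is linear, it preserves global variance structure, whereas t-SNE emphasizes local neighborhoods, which is one reason plots of the same embedding can look different across tools.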

Homogeneous networks are visualized by DNGR using t-SNE visualization of the Wine dataset Footnote 10 and by ANE using t-SNE visualization of a paper citation network (DBLP). TriDNR gives a t-SNE visualization of an attributed citation network (Citeseer). Metapath2vec provides a TensorFlow embedding projector visualization of a heterogeneous network (Aminer). SNE provides a t-SNE visualization of a signed network (Wikieditor). Dynamic network snapshots are visualized by DynGem using a synthetic network (SYS), and by DyRep using a user collaboration network (Github).

## 6.4 Node clustering

Node clustering [ 105 ] is the process of grouping the nodes in a network into different clusters such that densely connected subgraphs, which are only sparsely connected to each other, are separated. Functional module identification [ 28 ] in PPI networks is a typical application of node clustering. Traditional approaches for graph clustering [ 105 ] include methods based on k-spanning tree, betweenness centrality, shared nearest neighbor and clique enumeration. In a network embedding based approach for node clustering, the nodes are first mapped to a low-dimensional space, and vector space based clustering methods (e.g., K-means clustering) are then applied to generate the node clusters. Accuracy (AC) and normalized mutual information (NMI) [ 110 ] are the commonly used measures for evaluating the performance of the node clustering task.
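The embedding-based clustering route described above can be sketched as K-means on the node vectors followed by an NMI check against reference groups. In this illustrative sketch the initial centroids are simply the first `k` points, an assumption made to keep the example deterministic (practical implementations use random or k-means++ initialization), and NMI is normalized by the geometric mean of the two entropies, which is one of several conventions in use.

```python
import math
from collections import Counter

def kmeans(points, k, iters=20):
    """Lloyd's algorithm; returns a cluster index for every point."""
    centroids = [list(p) for p in points[:k]]  # assumption: deterministic init
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def nmi(labels_a, labels_b):
    """Normalized mutual information: I(A;B) / sqrt(H(A) * H(B))."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    ha, hb = entropy(ca), entropy(cb)
    if ha == 0.0 or hb == 0.0:
        return 1.0 if ha == hb else 0.0
    mi = sum((c / n) * math.log(c * n / (ca[a] * cb[b]))
             for (a, b), c in joint.items())
    return mi / math.sqrt(ha * hb)

# Toy 2-D embeddings with two obvious groups and their reference labels.
points = [[0.0, 0.0], [5.0, 5.0], [0.0, 1.0], [5.0, 6.0]]
truth = ["x", "y", "x", "y"]
clusters = kmeans(points, k=2)
score = nmi(clusters, truth)
```

NMI is invariant to how cluster indices are numbered, so it can compare the K-means output with ground-truth groups directly, whereas accuracy (AC) first requires finding the best one-to-one mapping between cluster indices and labels.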

Some works used node clustering as the benchmark for evaluating the quality of node embeddings. DNGR performed node clustering on a homogeneous language network (20-newsgroup network), DANE performed node clustering on attributed social and citation networks (BlogCatalog, Flickr, Epinions, DBLP), Metapath2vec and HNE performed node clustering on heterogeneous citation and social networks (Aminer, BlogCatalog), and SNEA performed node clustering on signed social networks (Epinions, Slashdot).

## 6.5 Other applications

Network representation learning is also applied in other areas of data mining and information retrieval. SDNE, HNE, and DynGem used network embedding for network reconstruction. GraphGAN used network embedding to build a recommender system using the Movielens dataset: a user-movie bipartite graph is constructed, and the learned representations of users and movies are used to recommend unwatched movies to the user. CUNE [ 143 ] aimed at enhancing the recommender system by incorporating the social information from the user-item bipartite network with rating information. CUNE constructs a user-interaction network from the user-item bipartite network, extracts implicit social information by embedding nodes of the user interaction network, and finally learns an objective function that incorporates top-k social links with the matrix factorization framework. Researchers [ 20 ] used network embedding (DeepWalk) to analyze Wikipedia pages for identifying historical analogies. The work [ 54 ] aimed at predicting users' multi-interests from user interactions on health-related datasets. Other applications of network embedding include anomaly detection [ 39 ], multimodal search [ 16 ], information diffusion [ 10 , 145 ], community detection [ 145 ], anchor-link prediction [ 80 ], emerging relation detection [ 144 ], sentiment link prediction [ 131 ], author identification [ 19 ], social relation extraction [ 123 ], and name disambiguation [ 142 ].

## 7 Conclusion and future works

As revolutionary advances in representation learning have achieved tremendous success in several application domains, the area of network mining has also been influenced by representation learning techniques, owing to their high-quality results and state-of-the-art performance. Various approaches based on representation learning were developed to learn node representations from large and complex networks. In this paper, we build a taxonomy of network representation learning methods based on the type of networks and review the major research works that come under each category. We further discuss the various network datasets used in network representation learning research. Finally, we review the major applications of network embedding.

Network representation learning is a young and promising field with many unsolved challenges, which provide various directions for future work.

Preserving complex structure and properties of real-world networks: Most real-world networks are very complex and may contain higher order structures like network motifs [ 87 ]. They also exhibit complex properties, which include the scale-free property, hyper edges, and nodes with high betweenness centrality. Although some efforts have been made to work with the scale-free property [ 31 ] and hyper networks [ 43 , 124 ], significant improvement is still needed in these directions.

Complex network types: The taxonomy of the types of networks that we provided in this review is not mutually exclusive. More complex network types can be modeled by combining these basic types. For example, a citation network can be modeled as a dynamic heterogeneous attributed network, which demands novel efforts in generating node embeddings.

Addressing the big graph challenge: Many real-world networks are very large, with millions of nodes and edges. Although most of the embedding methods are designed to be highly scalable, a significant amount of work remains to adapt them to such huge networks. As network embedding is basically an optimization problem, large-scale optimization methods can be used to improve its scalability. Another interesting direction towards enhancing scalability is to develop new embedding strategies that can make use of large-scale graph processing platforms like Giraph and GraphX, or to parallelize the existing methods so that they work with these distributed computing platforms.

More applications: Most of the research on network embedding has focused on node classification, node clustering, and link prediction. Network mining is a fast growing field with many applications in various domains. An exciting direction for further work is therefore to extend the existing methods, or develop novel embedding methods, to solve more network mining tasks such as network evolution detection [ 135 , 136 ], influential node detection [ 79 ], and network summarization [ 72 ].

A few efforts have already been made to learn hyperbolic embeddings [ 25 ] and to use deep reinforcement learning [ 100 ] for network embedding, and more work remains to be done in these significant directions.

Footnote 1: The terms network representation learning and network embedding are used interchangeably in this paper.

Footnote 2: https://www.yelp.com/dataset_challenge

Footnote 3: http://konect.uni-koblenz.de/networks/citeseer

Footnote 4: http://www.epinions.com/

Footnote 5: https://slashdot.org/

Footnote 6: https://www.gharchive.org/

Footnote 7: http://www.aminer.org

Footnote 8: http://networkrepository.com/

Footnote 9: http://konect.uni-koblenz.de/

Footnote 10: https://archive.ics.uci.edu/ml/datasets/wine

Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) Tensorflow: a system for large-scale machine learning. OSDI 16:265–283

Ahmed A, Shervashidze N, Narayanamurthy S, Josifovski V, Smola AJ (2013) Distributed large-scale natural graph factorization. In: Proceedings of the 22nd international conference on World Wide Web. ACM, pp 37–48

Aiolli F, Donini M, Navarin N, Sperduti A (2015) Multiple graph-kernel learning. In: 2015 IEEE symposium series on computational intelligence. IEEE, pp 1607–1614

Barabási AL, Gulbahce N, Loscalzo J (2011) Network medicine: a network-based approach to human disease. Nat Rev Genet 12(1):56

Bastian M, Heymann S, Jacomy M et al (2009) Gephi: an open source software for exploring and manipulating networks. ICWSM 8:361–362

Belkin M, Niyogi P (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Advances in neural information processing systems. pp 585–591

Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828

Bhagat S, Cormode G, Muthukrishnan S (2011) Node classification in social networks. In: Aggarwal C (ed) Social network data analytics. Springer, pp 115–148

Bottou L (2010) Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT’2010. Springer, pp 177–186

Bourigault S, Lagnier C, Lamprier S, Denoyer L, Gallinari P (2014) Learning social network embeddings for predicting information diffusion. In: Proceedings of the 7th ACM international conference on Web search and data mining. ACM, pp 393–402

Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, Oughtred R, Lackner DH, Bähler J, Wood V et al (2007) The biogrid interaction database: 2008 update. Nucleic Acids Res suppl 1(36):D637–D640

Bullinaria JA, Levy JP (2007) Extracting semantic representations from word co-occurrence statistics: a computational study. Behav Res Methods 39(3):510–526

Cai H, Zheng VW, Chang K (2018) A comprehensive survey of graph embedding: problems, techniques and applications. IEEE Trans Knowl Data Eng 30(9):1616–1637

Cao S, Lu W, Xu Q (2015) Grarep: Learning graph representations with global structural information. In: Proceedings of the 24th ACM international on conference on information and knowledge management. ACM, pp 891–900

Cao S, Lu W, Xu Q (2016) Deep neural networks for learning graph representations. In: AAAI. pp 1145–1152

Chang S, Han W, Tang J, Qi GJ, Aggarwal CC, Huang TS (2015) Heterogeneous network embedding via deep architectures. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 119–128

Chen H, Perozzi B, Hu Y, Skiena S (2017a) Harp: Hierarchical representation learning for networks. arXiv preprint arXiv:1706.07845

Chen J, Ma T, Xiao C (2018) Fastgcn: fast learning with graph convolutional networks via importance sampling. arXiv preprint arXiv:1801.10247

Chen T, Sun Y (2017) Task-guided and path-augmented heterogeneous network embedding for author identification. In: Proceedings of the Tenth ACM international conference on web search and data mining. ACM, pp 295–304

Chen Y, Perozzi B, Skiena S (2017b) Vector-based similarity measurements for historical figures. Inf Syst 64:163–174

Collobert R, Weston J (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th international conference on machine learning. ACM, pp 160–167

Cui P, Wang X, Pei J, Zhu W (2017) A survey on network embedding. arXiv preprint arXiv:1711.08752

Dahl GE, Yu D, Deng L, Acero A (2012) Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans Audio Speech Lang Process 20(1):30–42

Dai Q, Li Q, Tang J, Wang D (2017) Adversarial network embedding. arXiv preprint arXiv:1711.07838

De Sa C, Gu A, Ré C, Sala F (2018) Representation tradeoffs for hyperbolic embeddings. arXiv preprint arXiv:1804.03329

Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113

Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in neural information processing systems. pp 3844–3852

Dittrich MT, Klau GW, Rosenwald A, Dandekar T, Müller T (2008) Identifying functional modules in protein-protein interaction networks: an integrated exact approach. Bioinformatics 24(13):i223–i231

Dong Y, Chawla NV, Swami A (2017) metapath2vec: scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 135–144

Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9(Aug):1871–1874

Feng R, Yang Y, Hu W, Wu F, Zhuang Y (2017) Representation learning for scale-free networks. arXiv preprint arXiv:1711.10755

Fu Ty, Lee WC, Lei Z (2017) Hin2vec: explore meta-paths in heterogeneous information networks for representation learning. In: Proceedings of the 2017 ACM on conference on information and knowledge management. ACM, pp 1797–1806

Fu Y, Ma Y (2012) Graph embedding for pattern analysis. Springer, Berlin

Gallagher B, Eliassi-Rad T (2010) Leveraging label-independent features for classification in sparsely labeled networks: an empirical study. In: Advances in social network mining and analysis. Springer, pp 1–19

García-Durán A, Niepert M (2017) Learning graph representations with embedding propagation. arXiv preprint arXiv:1710.03059

Gehrke J, Ginsparg P, Kleinberg J (2003) Overview of the 2003 kdd cup. ACM SIGKDD Explor Newsl 5(2):149–151

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Advances in neural information processing systems. pp 2672–2680

Goyal P, Ferrara E (2017) Graph embedding techniques, applications, and performance: a survey. arXiv preprint arXiv:1705.02801

Goyal P, Kamra N, He X, Liu Y (2018) Dyngem: deep embedding method for dynamic graphs. arXiv preprint arXiv:1805.11273

Graves A, Mohamed Ar, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6645–6649

Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J (2017) Lstm: a search space odyssey. IEEE Trans Neural Netw Learn Syst 28(10):2222–2232

Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 855–864

Gui H, Liu J, Tao F, Jiang M, Norick B, Han J (2016) Large-scale embedding learning in heterogeneous event data. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE, pp 907–912

Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS (2016) Deep learning for visual understanding: a review. Neurocomputing 187:27–48

Hamilton W, Ying Z, Leskovec J (2017a) Inductive representation learning on large graphs. In: Advances in neural information processing systems. pp 1025–1035

Hamilton WL, Ying R, Leskovec J (2017b) Representation learning on graphs: methods and applications. arXiv preprint arXiv:1709.05584

Hammond DK, Vandergheynst P, Gribonval R (2011) Wavelets on graphs via spectral graph theory. Appl Comput Harmonic Anal 30(2):129–150

Hinton G, Deng L, Yu D, Dahl GE, Mohamed Ar, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath TN et al (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag 29(6):82–97

Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural comput 18(7):1527–1554

Holme P, Saramäki J (2012) Temporal networks. Phys Rep 519(3):97–125

Huang X, Li J, Hu X (2017a) Accelerated attributed network embedding. In: Proceedings of the 2017 SIAM international conference on data mining. SIAM, pp 633–641

Huang X, Li J, Hu X (2017b) Label informed attributed network embedding. In: Proceedings of the tenth ACM international conference on web search and data mining. ACM, pp 731–739

Huang X, Song Q, Li J, Hu X (2018) Exploring expert cognition for attributed network embedding. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, pp 270–278

Jin Z, Liu R, Li Q, Zeng DD, Zhan Y, Wang L (2016) Predicting user’s multi-interests with network embedding in health-related topics. In: 2016 International joint conference on neural networks (IJCNN). IEEE, pp 2568–2575

Kim J, Park H, Lee JE, Kang U (2018) Side: Representation learning in signed directed networks. In: Proceedings of the 2018 World Wide Web conference on World Wide Web, international World Wide Web conferences steering committee. pp 509–518

Kipf TN, Welling M (2016a) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907

Kipf TN, Welling M (2016b) Variational graph auto-encoders. arXiv preprint arXiv:1611.07308

Klimt B, Yang Y (2004) The enron corpus: a new dataset for email classification research. In: European conference on machine learning. Springer, pp 217–226

Kong X, Shi X, Yu PS (2011) Multi-label collective classification. In: Proceedings of the 2011 SIAM international conference on data mining. SIAM, pp 618–629

Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. pp 1097–1105

Kumar S, Spezzano F, Subrahmanian V (2015) Vews: A wikipedia vandal early warning system. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 607–616

Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: International conference on machine learning. pp 1188–1196

LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436

Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data

Leskovec J, Mcauley JJ (2012) Learning to discover social circles in ego networks. In: Advances in neural information processing systems. pp 539–547

Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data (TKDD) 1(1):2

Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 1361–1370

Levy O, Goldberg Y, Dagan I (2015) Improving distributional similarity with lessons learned from word embeddings. Trans Assoc Comput Linguist 3:211–225

Li J, Zhu J, Zhang B (2016) Discriminative deep random walk for network classification. In: Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 1: Long Papers). vol 1, pp 1004–1013

Li J, Dani H, Hu X, Tang J, Chang Y, Liu H (2017) Attributed network embedding for learning in a dynamic environment. In: Proceedings of the 2017 ACM on conference on information and knowledge management. ACM, pp 387–396

Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Assoc Inf Sci Technol 58(7):1019–1031

Liu Y, Safavi T, Dighe A, Koutra D (2016) Graph summarization methods and applications: a survey. arXiv preprint arXiv:1612.04883

Lü L, Zhou T (2011) Link prediction in complex networks: a survey. Physica A Stat Mech Appl 390(6):1150–1170

Lü L, Medo M, Yeung CH, Zhang YC, Zhang ZK, Zhou T (2012) Recommender systems. Phys Rep 519(1):1–49

van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(Nov):2579–2605

Maayan A (2011) Introduction to network analysis in systems biology. Sci Signal 4(190):tr5

Mahoney M (2011) Large text compression benchmark. http://www.mattmahoney.net/text/text.html

Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, pp 135–146

Malliaros FD, Rossi MEG, Vazirgiannis M (2016) Locating influential nodes in complex networks. Sci Rep 6:19307

Man T, Shen H, Liu S, Jin X, Cheng X (2016) Predict anchor links across social networks via an embedding approach. IJCAI 16:1823–1829

Martella C, Shaposhnik R, Logothetis D, Harenberg S (2015) Practical graph analytics with apache giraph. Springer, Berlin

Mason W, Vaughan JW, Wallach H (2014) Computational social science and social computing. Mach learn 95:257–260. https://doi.org/10.1007/s10994-013-5426-8

McCallum AK, Nigam K, Rennie J, Seymore K (2000) Automating the construction of internet portals with machine learning. Inf Retr 3(2):127–163

Micheli A (2009) Neural network for graphs: a contextual constructive approach. IEEE Trans Neural Netw 20(3):498–511

Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013b) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems. pp 3111–3119

Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network motifs: simple building blocks of complex networks. Science 298(5594):824–827

Mnih A, Kavukcuoglu K (2013) Learning word embeddings efficiently with noise-contrastive estimation. In: Advances in neural information processing systems. pp 2265–2273

Moyano LG (2017) Learning network representations. Eur Phys J Spec Top 226(3):499–518

Natarajan N, Dhillon IS (2014) Inductive matrix completion for predicting gene-disease associations. Bioinformatics 30(12):i60–i68

Neville J, Jensen D (2000) Iterative classification in relational data. In: Proc. AAAI-2000 workshop on learning statistical models from relational data. pp 13–20

Nguyen GH, Lee JB, Rossi RA, Ahmed NK, Koh E, Kim S (2018) Continuous-time dynamic network embeddings. In: 3rd International workshop on learning representations for big networks (WWW BigNet)

Nicosia V, Tang J, Mascolo C, Musolesi M, Russo G, Latora V (2013) Graph metrics for temporal networks. In: Holme P, Saramäki J (eds) Temporal networks. Springer, Berlin, pp 15–40. https://doi.org/10.1007/978-3-642-36461-7_2

Ou M, Cui P, Pei J, Zhang Z, Zhu W (2016) Asymmetric transitivity preserving graph embedding. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 1105–1114

Pan S, Wu J, Zhu X, Zhang C, Wang Y (2016) Tri-party deep network representation. Network 11(9):12

Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 701–710

Perozzi B, Kulkarni V, Skiena S (2016) Walklets: Multiscale graph embeddings for interpretable network classification. arXiv preprint arXiv:1605.02115

Pfeiffer III JJ, Moreno S, La Fond T, Neville J, Gallagher B (2014) Attributed graph models: modeling network structure with correlated attributes. In: Proceedings of the 23rd international conference on World Wide Web. ACM, pp 831–842

Qi GJ, Aggarwal C, Tian Q, Ji H, Huang T (2012) Exploring context and content links in social media: a latent space method. IEEE Trans Pattern Anal Mach Intell 34(5):850–862

Qu M, Tang J, Han J (2018) Curriculum learning for heterogeneous star network embedding via deep reinforcement learning. In: Proceedings of the eleventh ACM international conference on web search and data mining. ACM, pp 468–476

Ribeiro LF, Saverese PH, Figueiredo DR (2017) struc2vec: Learning node representations from structural identity. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 385–394

Rossi RA, Zhou R, Ahmed NK (2017) Deep feature learning for graphs. arXiv preprint arXiv:1704.08829

Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326

Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008) The graph neural network model. IEEE Trans Neural Netw 20(1):61–80

Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64

Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117

Shaw B, Jebara T (2009) Structure preserving embedding. In: Proceedings of the 26th annual international conference on machine learning. ACM, pp 937–944

Socher R, Bengio Y, Manning CD (2012) Deep learning for nlp (without magic). In: Tutorial abstracts of ACL 2012, association for computational linguistics. pp 5–5

Steglich C, Snijders TA, Pearson M (2010) Dynamic networks and behavior: separating selection from influence. Sociol Methodol 40(1):329–393

Strehl A, Ghosh J, Mooney R (2000) Impact of similarity measures on web-page clustering. In: Workshop on artificial intelligence for web search (AAAI 2000). vol 58, p 64

Sun Y, Han J (2012) Mining heterogeneous information networks: principles and methodologies. Synth Lect Data Min Knowl Discov 3(2):1–159

Sun Y, Han J, Yan X, Yu PS, Wu T (2011) Pathsim: meta path-based top-k similarity search in heterogeneous information networks. Proc VLDB Endow 4(11):992–1003

Tang J, Zhang J, Yao L, Li J, Zhang L, Su Z (2008) Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 990–998

Tang J, Qu M, Mei Q (2015a) Pte: Predictive text embedding through large-scale heterogeneous text networks. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1165–1174

Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015b) Line: large-scale information network embedding. In: Proceedings of the 24th international conference on World Wide Web, international World Wide Web conferences steering committee. pp 1067–1077

Tang J, Chang Y, Aggarwal C, Liu H (2016) A survey of signed network mining in social media. ACM Comput Surv (CSUR) 49(3):42

Tang L, Liu H (2009a) Relational learning via latent social dimensions. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 817–826

Tang L, Liu H (2009b) Scalable learning of collective behavior based on sparse social dimensions. In: Proceedings of the 18th ACM conference on Information and knowledge management. ACM, pp 1107–1116

Tran DV, Navarin N, Sperduti A (2018) On filter size in graph convolutional networks. In: 2018 IEEE symposium series on computational intelligence (SSCI). IEEE, pp 1534–1541

Trivedi R, Farajtabar M, Biswal P, Zha H (2018) Representation learning over dynamic graphs. arXiv preprint arXiv:180304051

Tu C, Zhang W, Liu Z, Sun M (2016) Max-margin deepwalk: discriminative learning of network representation. In: IJCAI. pp 3889–3895

Tu C, Liu H, Liu Z, Sun M (2017a) Cane: context-aware network embedding for relation modeling. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long Papers). vol 1, pp 1722–1731

Tu C, Zhang Z, Liu Z, Sun M (2017b) Transnet: translation-based network representation learning for social relation extraction. In: Proceedings of international joint conference on artificial intelligence (IJCAI). Melbourne

Tu K, Cui P, Wang X, Wang F, Zhu W (2017c) Structural deep embedding for hyper-networks. arXiv preprint arXiv:171110146

Utsumi A (2015) A complex network approach to distributional semantic models. PLoS ONE 10(8):e0136277

Van Der Maaten L, Postma E, Van den Herik J (2009) Dimensionality reduction: a comparative. J Mach Learn Res 10:66–71

Vishwanathan SVN, Schraudolph NN, Kondor R, Borgwardt KM (2010) Graph kernels. J Mach Learn Res 11(Apr):1201–1242

Wan J, Wang D, Hoi SCH, Wu P, Zhu J, Zhang Y, Li J (2014) Deep learning for content-based image retrieval: a comprehensive study. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 157–166

Wang D, Cui P, Zhu W (2016) Structural deep network embedding. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 1225–1234

Wang H, Wang J, Wang J, Zhao M, Zhang W, Zhang F, Xie X, Guo M (2017a) Graphgan: Graph representation learning with generative adversarial nets. arXiv preprint arXiv:171108267

Wang H, Zhang F, Hou M, Xie X, Guo M, Liu Q (2018) Shine: signed heterogeneous information network embedding for sentiment link prediction. In: Proceedings of the eleventh ACM international conference on web search and data mining. ACM, pp 592–600

Wang S, Aggarwal C, Tang J, Liu H (2017b) Attributed signed network embedding. In: Proceedings of the 2017 ACM on conference on information and knowledge management. ACM, pp 137–146

Wang S, Tang J, Aggarwal C, Chang Y, Liu H (2017c) Signed network embedding in social media. In: Proceedings of the 2017 SIAM international conference on data mining. SIAM, pp 327–335

Wright J, Ma Y, Mairal J, Sapiro G, Huang TS, Yan S (2010) Sparse representation for computer vision and pattern recognition. Proc IEEE 98(6):1031–1044

Wu T, Chen L, Zhong L, Xian X (2017) Predicting the evolution of complex networks via similarity dynamics. Physica A Stat Mech Appl 465:662–672

Wu T, Chang CS, Liao W (2018) Tracking network evolution and their applications in structural network analysis. IEEE Transactions on Network Science and Engineering

Xin RS, Gonzalez JE, Franklin MJ, Stoica I (2013) Graphx: a resilient distributed graph system on spark. In: First international workshop on graph data management experiences and systems. ACM, p 2

Xu L, Wei X, Cao J, Philip SY (2018) On exploring semantic meanings of links for embedding social networks. DEF 2:6

Yang C, Liu Z, Zhao D, Sun M, Chang EY (2015) Network representation learning with rich text information. In: IJCAI. pp 2111–2117

Yuan S, Wu X, Xiang Y (2017) SNE: signed network embedding. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 183–195

Zampieri G, Van Tran D, Donini M, Navarin N, Aiolli F, Sperduti A, Valle G (2018) Scuba: scalable kernel-based gene prioritization. BMC Bioinform 19(1):23

Zhang B, Al Hasan M (2017) Name disambiguation in anonymized graphs using network embedding. In: Proceedings of the 2017 ACM on conference on information and knowledge management. ACM, pp 1239–1248

Zhang C, Yu L, Wang Y, Shah C, Zhang X (2017a) Collaborative user network embedding for social recommender systems. In: Proceedings of the 2017 SIAM international conference on data mining. SIAM, pp 381–389

Zhang J, Lu CT, Zhou M, Xie S, Chang Y, Philip SY (2016) Heer: Heterogeneous graph embedding for emerging relation detection from news. In: 2016 IEEE international conference on big data (big data). IEEE, pp 803–812

Zhang J, Cui L, Fu Y (2017b) Latte: application oriented network embedding. arXiv preprint arXiv:171111466

Zhou L, Yang Y, Ren X, Wu F, Zhuang Y (2018) Dynamic network embedding by modeling triadic closure process. In: Thirty-Second AAAI Conference on Artificial Intelligence

Zhu L, Guo D, Yin J, Ver Steeg G, Galstyan A (2016) Scalable temporal latent space inference for link prediction in dynamic social networks. IEEE Trans Knowl Data Eng 28(10):2765–2777

Zhu W, Milanović JV (2017) Interdepedency modeling of cyber-physical systems using a weighted complex network approach. In: PowerTech, 2017 IEEE Manchester. IEEE, pp 1–6

Zhu X, Ghahramani Z, Lafferty JD (2003) Semi-supervised learning using gaussian fields and harmonic functions. In: Proceedings of the 20th international conference on machine learning (ICML-03). pp 912–919


## Acknowledgements

The authors would like to thank the management and staff of the Department of Computer Applications, CUSAT, India and NSS College of Engineering, Palakkad, India for providing the materials needed to conduct this study.

## Author information

Authors and affiliations.

Artificial Intelligence Lab, Department of Computer Applications, Cochin University of Science and Technology, Kerala, 682022, India

Anuraj Mohan

Department of Computer Applications, Cochin University of Science and Technology, Kerala, 682022, India

K. V. Pramod


## Corresponding author

Correspondence to Anuraj Mohan.

## Ethics declarations

Conflict of interest.

The authors declare that they have no conflict of interest.

## Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## About this article

Mohan, A., Pramod, K.V. Network representation learning: models, methods and applications. SN Appl. Sci. 1, 1014 (2019). https://doi.org/10.1007/s42452-019-1044-9

Received: 18 May 2019

Accepted: 02 August 2019

Published: 09 August 2019

DOI: https://doi.org/10.1007/s42452-019-1044-9

- Network embedding
- Representation learning
- Deep learning
- Social networks
- Neural networks

## JEL Classification

- 06 Computer Science

## Representation Learning on Networks: Theories, Algorithms, and Applications

## Index Terms

Computing methodologies

Machine learning

Machine learning approaches

Neural networks

Information systems

Information systems applications

Data mining

## Information

Published in.

Georgia Tech, USA

Microsoft Research, USA

## In-Cooperation

- IW3C2: International World Wide Web Conference Committee

Association for Computing Machinery

New York, NY, United States

- Feature Learning
- Graph Mining
- Graph Neural Networks
- Network Embedding
- Network Science
- Representation Learning
- Research-article
- Refereed limited

- Open access
- Published: 08 September 2015

## A unified data representation theory for network visualization, ordering and coarse-graining

- István A. Kovács
- Réka Mizsei
- Péter Csermely

Scientific Reports volume 5, Article number: 13786 (2015)

- Complex networks
- Information theory and computation
- Network topology

Representation of large data sets became a key question of many scientific disciplines in the last decade. Several approaches for network visualization, data ordering and coarse-graining accomplished this goal. However, there was no underlying theoretical framework linking these problems. Here we show an elegant, information theoretic data representation approach as a unified solution of network visualization, data ordering and coarse-graining. The optimal representation is the hardest to distinguish from the original data matrix, measured by the relative entropy. The representation of network nodes as probability distributions provides an efficient visualization method and, in one dimension, an ordering of network nodes and edges. Coarse-grained representations of the input network enable both efficient data compression and hierarchical visualization to achieve high quality representations of larger data sets. Our unified data representation theory will help the analysis of extensive data sets, by revealing the large-scale structure of complex networks in a comprehensible form.

## Introduction

Complex network [1, 2] representations are widely used in physical, biological and social systems and are usually given by huge data matrices. Network data have grown to a size that is too large for direct comprehension and requires carefully chosen representations. One option to gain insight into the structure of complex systems is to order the matrix elements to reveal concealed patterns, such as degree correlations [3, 4] or community structure [5–11]. Currently, there is a diversity of matrix ordering schemes of different backgrounds, such as graph theoretic methods [12], sparse matrix techniques [13] and spectral decomposition algorithms [14]. Coarse-graining or renormalization of networks [15–20] has also gained significant attention recently as an efficient tool to zoom out from the network, averaging out short-scale details to reduce the size of the network to a tolerable extent and reveal the large-scale patterns. A variety of heuristic coarse-graining techniques, also known as multi-scale approaches, has emerged, leading to significant advances in network-related optimization problems [21, 22] and in the understanding of network structure [19, 20, 23]. As we discuss in more detail in the Supplementary Information, coarse-graining is also closely related to some block models useful for clustering and benchmark graph generation [24–26].

The most essential tool of network comprehension is a faithful visualization of the network [27]. Preceding more elaborate quantitative studies, it can yield an intuitive, direct qualitative understanding of complex systems. Although visualization is of primary importance, there is no general theory of network layout, which has led to a multitude of graph drawing techniques. Among these, force-directed methods [28], which rely on physical metaphors, are probably the most popular visualization tools. Graph layout aims to produce aesthetically appealing outputs, with many subjective aims that need to be quantified, such as minimal overlap between unrelated parts (e.g. minimal edge crossings in d = 2), while preserving the symmetries of the network. Altogether, the field of graph drawing has become a meeting point of art, physics and computer science [29].

Since the known approaches to the above problems generally lead to computationally expensive NP-hard problems [30], practical implementations have necessarily been restricted to advanced approximate heuristic algorithms. Moreover, there has been no successful attempt to incorporate network visualization, data ordering and coarse-graining into a common theoretical framework. Since information theory provides ideal tools to quantify the hidden structure in probabilistic data [31, 32], its application to complex networks [25, 26, 33–37] is a highly promising field. In this paper, our primary goal is to show an elegant, information theoretic representation theory for the unified solution of network visualization, data ordering and coarse-graining, establishing for the first time a common ground for these separate fields.

Usually, in graph theory, the complex system is described at a level of abstraction where each node is a dimensionless object and the nodes are connected by lines representing their relations, as given by the input data. Instead, we study the case in which both the input matrix and the approximate representation are given in the form of a probability distribution. This is the routinely considered case of edge weights reflecting the existence, frequency or strength of the interaction, such as in social and technological networks of communication, collaboration and traveling, or in biological networks of interacting molecules or species. As discussed in detail in the Supplementary Information, the probabilistic framework has a long tradition in the theory of complex networks, including general random graph models, all Bayesian methods, community detection benchmarks [24], block models [25, 26] and graphons [38].

The major tenet of our unified framework is that the best representation is selected by the criterion that it is the hardest to distinguish from the input data. In information theory this is readily obtained by minimizing the relative entropy, also known as the Kullback-Leibler divergence [39], as a quality function. In the following we show that the visualization, ordering and coarse-graining of networks are intimately related to each other, being organic parts of a straightforward, unified representation theory. We also show that in some special cases our unified framework becomes identical to some of the known state-of-the-art solutions for both visualization [40–42] and coarse-graining [25, 26], obtained independently in the literature.

## General network representation theory

Figure 1: Illustration of our data representation framework.

Here η is the ratio of the extra description length needed to the optimal description length of the system. In the following applications we use η to compare the optimality of the representations found. As an important property, the optimization of relative entropy is local in the sense that the global optimum of a network comprising independent subnetworks is also locally optimal for each subnetwork. The finiteness of $D_0$ also ensures that if $i$ and $j$ are connected in the original network ($a_{ij} > 0$), then they are guaranteed to be connected in a meaningful representation as well, enforcing $b_{ij} > 0$; otherwise $D$ would diverge. In the opposite case, when there is a connection in the representation without a corresponding edge in the original graph ($b_{ij} > 0$ while $a_{ij} = 0$), $b_{ij}$ does not appear directly in $D$, only globally, as a part of the $b_{**}$ normalization. This density-preserving property leads to a significant computational improvement for sparse networks, since there is no need to build a representation matrix denser than the input matrix if we keep track of the $b_{**}$ normalization. Nevertheless, the $B$ matrix of the optimal representation (where $D$ is small) is close to $A$, since by Pinsker's inequality the total variation of the normalized distributions is bounded in terms of $D$ [44].

Thus, in the optimal representation of a network all the connected network elements are connected, while only a strongly suppressed amount of false positive connections is present. Here we note that our representation theory can be straightforwardly extended to input networks given by an incidence matrix $H$ instead of an adjacency matrix; for details of this case see the Methods section.
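The properties described above can be checked numerically with a minimal sketch. It assumes that $D$ is the Kullback-Leibler divergence (in nats) between the normalized matrices $A/a_{**}$ and $B/b_{**}$, and that η is $D$ divided by the Shannon entropy of $A/a_{**}$; the latter is an interpretation of "extra description length over optimal description length", not a formula quoted from the text. The function names are hypothetical.

```python
import numpy as np

def relative_entropy(A, B):
    """D(A||B) in nats between the normalized matrices A/a** and B/b**."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    p = A / A.sum()
    q = B / B.sum()
    mask = p > 0                      # entries with a_ij = 0 contribute nothing directly
    if np.any(q[mask] == 0):          # a_ij > 0 but b_ij = 0: D diverges
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def eta(A, B):
    """Assumed quality measure: D(A||B) divided by the Shannon entropy of A/a**."""
    A = np.asarray(A, dtype=float)
    p = A / A.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log(p))
    return relative_entropy(A, B) / entropy

def total_variation(A, B):
    """Total variation distance between the two normalized matrices."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    return 0.5 * np.abs(A / A.sum() - B / B.sum()).sum()

# Toy example: B adds two weak false-positive entries on the diagonal.
A = np.array([[0.0, 2.0], [2.0, 0.0]])
B = np.array([[0.1, 2.0], [2.0, 0.1]])
D = relative_entropy(A, B)
# Pinsker's inequality for D measured in nats: TV <= sqrt(D / 2).
pinsker_holds = total_variation(A, B) <= np.sqrt(D / 2)
```

In this toy case the extra entries of `B` affect `D` only through the normalization, while setting any `b_ij` to zero where `a_ij > 0` makes `relative_entropy` return infinity.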

## Network visualization and data ordering

Since force-directed layout schemes [28] have an energy or quality function, optimized by efficient techniques borrowed from many-body physics [45] and computer science [46], graph layout could in principle serve as a quantitative tool. However, some of these popular approaches inherently struggle with an information shortage problem, since the edge weights provide only half of the data needed to initialize these techniques. For instance, to initialize the widely applied Fruchterman-Reingold [47] (or the Kamada-Kawai [48]) method we need to set both the strength of an attractive force (optimal distance) and a repulsive force (spring constant) between the nodes in order to have a balanced system. Due to the lack of sufficient information, such graph layout techniques become somewhat ill-defined, and additional subjective considerations are needed to double the information encoded in the input data, traditionally by a nonlinear transformation of the attractive force parameters onto the parameters of the repulsive force [47]. Global optimization techniques, such as information theoretic methods [40–42, 49], can in principle solve this problem by deriving the needed forces from a single information theoretic quality function.

Independently of the chosen optimization protocol, the finiteness of $D_0$ ensures that connected nodes overlap in the layout as well, even for distributions having a finite support. Moreover, independent parts of the network (nodes or sets of nodes without connections between them) tend to lie apart from each other in the layout. Because of the density-preserving property of the representation, even if all the nodes overlap with all other nodes in the layout, the $B$ matrix can be kept exactly as sparse as the $A$ matrix, provided that we keep track of the $b_{**}$ normalization, which includes the sum of the remaining potential $b_{ij}$ matrix elements. Additionally, if two rows (and columns) of the input matrix are proportional to each other, then it is optimal to represent them with the same distribution function in the layout, as though the two rows were merged together.
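As a rough illustration of how a layout can define a representation matrix, the sketch below assumes that each node $i$ is an isotropic Gaussian with position $x_i$, width $\sigma_i$ and weight $h_i$, and that $b_{ij}$ is the overlap integral of the two weighted densities. The overlap form is an assumption made for this example; only the use of Gaussian node distributions with widths and normalizations comes from the text. The closed form used is the standard integral of a product of two Gaussians.

```python
import numpy as np

def gaussian_overlap_matrix(x, sigma, h):
    """Overlap integrals b_ij = h_i h_j * integral of N(x_i, sigma_i^2) N(x_j, sigma_j^2).

    x     : (N, d) array of node positions (hypothetical layout coordinates)
    sigma : (N,) array of Gaussian widths
    h     : (N,) array of node weights (normalizations)
    """
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    h = np.asarray(h, dtype=float)
    d = x.shape[1]
    var = sigma[:, None] ** 2 + sigma[None, :] ** 2             # sigma_i^2 + sigma_j^2
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    # Product of two isotropic Gaussians integrates to a Gaussian in the distance.
    overlap = np.exp(-sq_dist / (2.0 * var)) / (2.0 * np.pi * var) ** (d / 2.0)
    return np.outer(h, h) * overlap

# Two coincident nodes and one distant node in d = 2.
x = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 0.0]])
B = gaussian_overlap_matrix(x, sigma=np.ones(3), h=np.ones(3))
```

Nodes placed close together receive a large `b_ij`, while distant nodes receive an exponentially small one, which matches the statement that unconnected parts tend to lie apart in the layout. The diagonal of `B` is nonzero here; whether it should be kept is a modeling choice left open by this sketch.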

Our method is illustrated in Fig. 2 on the Zachary karate club network [52], which has become a cornerstone of graph algorithm testing. It is a weighted social network of friendships between $N_0 = 34$ members of a karate club at a US university, which fell apart into two communities after a debate. While usually the size of the nodes can be chosen arbitrarily, e.g. to illustrate their degree or other relevant characteristics, here the size of the nodes is part of the visualization optimization: it reflects the width of the distribution and thus carries relevant information about the layout itself. In fact, the size of a node represents the uncertainty of its position, serving also as a readily available local quality indicator. For an illustration of the applicability of our network visualization method to larger collaboration [53] and information sharing [54] networks, having more than 10,000 nodes, see the Supplementary Information.

Figure 2: Illustration of the power of our unified representation theory on the Zachary karate club network [52].

The optimal layout (η = 2.1%, see Eq. (3)) in terms of d = 2 dimensional Gaussians is shown by a density plot in (a) and by circles of radii $\sigma_i$ in (b). (c) The best layout is obtained in d = 3 (η = 1.7%), where the radii of the spheres are chosen to be proportional to $\sigma_i$. (d) The original data matrix of the network with an arbitrary ordering. (e) The d = 1 layout (η = 4.5%) yields an ordering of the original data matrix of the network. (f) The optimal coarse-graining of the data matrix yields a tool to zoom out from the network in accordance with the underlying community structure. The colors indicate our results at the level of two clusters, which are equivalent to the ones given by popular community detection techniques, such as modularity optimization [5] or the degree-corrected stochastic block model [25]. We note that the coarse-graining itself does not yield a unique ordering of the nodes; therefore an arbitrarily chosen compatible ordering is shown in this panel.

Our network layout technique works in any number of dimensions, as illustrated for d = 1, 2 and 3 in Fig. 2. In each case the communities are clearly recovered and, as expected, the quality of the layout improves (indicated by a decreasing η value) as the dimensionality of the embedding space increases. Nevertheless, the one-dimensional case deserves special attention, since it also serves as an ordering of the elements (after resolving possible degeneracies with small perturbations), as illustrated in Fig. 2e.
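Turning a one-dimensional layout into a matrix ordering amounts to sorting the nodes by their coordinate and permuting rows and columns accordingly. The sketch below assumes the d = 1 positions are already available from some layout optimization; the function name and the toy positions are hypothetical.

```python
import numpy as np

def order_by_layout(A, positions):
    """Reorder a square data matrix by the nodes' one-dimensional layout positions."""
    A = np.asarray(A)
    # A stable sort keeps the input order for nodes with identical positions,
    # a simple stand-in for resolving degeneracies by small perturbations.
    order = np.argsort(np.asarray(positions), kind="stable")
    return order, A[np.ix_(order, order)]

# Toy example: node 1 sits leftmost, node 2 rightmost.
A = np.array([[0, 1, 5],
              [1, 0, 0],
              [5, 0, 0]])
order, A_ordered = order_by_layout(A, positions=[0.3, -1.0, 2.0])
```

After reordering, strongly connected nodes that lie close together on the line end up in neighboring rows and columns.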

When applying a local scheme for the optimization of the representations, we generally run into local minima, in which the layout cannot be improved by single-node updates, since whole parts of the network should be updated (rescaled, rotated or moved over each other) instead. Being a general difficulty in many optimization problems, this was expected to be insurmountable in our approach as well. In the following we show that the relative entropy based coarse-graining scheme, given in the next section, can in practice efficiently help us through these difficulties in polynomial time.

## Coarse-graining of networks

In the process of coarse-graining we identify groups or clusters of nodes and try to find the best representation while averaging out the details inside the groups. Inside a group, the nodes are replaced by their normalized average, while their degrees are kept fixed. As the simplest example, the coarse-graining of two rows means that instead of the original rows $k$ and $l$ we use two new rows that are proportional to each other, while the probabilities $b_{k*} = a_{k*}$ and $b_{l*} = a_{l*}$ are kept fixed.

In other words, we first simply sum up the corresponding rows to obtain a smaller matrix, and then inflate this fused matrix back to the original size while keeping the statistical weights of the nodes (degrees) fixed. For an illustration of the smaller, fused data matrices see the lower panels of Fig. 3a–d. For a symmetric adjacency matrix, the coarse-graining step can also be carried out simultaneously and identically for the rows and columns, which is known as bi-clustering. The optimal bi-clustering is illustrated in Fig. 2f for the Zachary karate club network. The heights in the dendrogram shown indicate the $D$ values of the representations at which the fusion steps happen.
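The sum-then-inflate step for rows can be sketched as follows. The explicit formula used here, $b_{kj} = a_{k*}\,(\sum_{m \in g} a_{mj}) / (\sum_{m \in g} a_{m*})$ for every row $k$ in group $g$, is reconstructed from the verbal description (rows in a group become proportional and row sums are preserved) and should be read as an assumption. The function name is hypothetical.

```python
import numpy as np

def coarse_grain_rows(A, groups):
    """Fuse the rows of each group, then inflate back with the row sums kept fixed."""
    A = np.asarray(A, dtype=float)
    B = np.zeros_like(A)
    row_sums = A.sum(axis=1)                  # a_{k*}
    for members in groups:
        members = list(members)
        fused = A[members].sum(axis=0)        # summed row of the whole group
        total = row_sums[members].sum()       # total weight of the group
        if total > 0:
            # Each member row is the fused row scaled by its share of the group weight.
            B[members] = np.outer(row_sums[members] / total, fused)
    return B

# Toy example: rows 0 and 1 are fused, row 2 stays on its own.
A = np.array([[0.0, 1.0, 3.0],
              [2.0, 0.0, 2.0],
              [1.0, 1.0, 0.0]])
B = coarse_grain_rows(A, groups=[[0, 1], [2]])
```

In this toy case rows 0 and 1 of `B` become identical because they carry equal weight, and both the row sums and the column sums of `A` are preserved.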

Figure 3: Illustration of our hierarchical visualization technique on the Zachary karate club network [52].

In our hierarchical visualization technique the coarse-graining procedure guides the optimization of the layout in a top-down way. As the number of nodes $N$ increases, the relative entropy of both the coarse-grained description (red, ○) and the layout (blue, ●) decreases. The panels (a–d) show snapshots of the optimal layout and the corresponding coarse-grained input matrix at the level of N = 5, 15, 25 and 34 nodes, respectively. For simplicity, here the normalization $h_i$ of each distribution is kept fixed to be $\propto a_{i*}$ during the process, leading finally to η = 4.4%.

As illustrated in Fig. 2f, the coarse-graining process creates a hierarchical dendrogram in a bottom-up way, representing the structure of the network at all scales. Here we note that a direct optimization of our quality function is also possible at a fixed number of groups, creating a clustering. As described in detail in the Supplementary Information, our coarse-graining scheme also covers the case of overlapping clustering, since it is straightforward to assign a given node to multiple groups as well. As noted there, when considering non-overlapping partitionings with a given number of clusters, our method gives back the degree-corrected stochastic block model of Karrer and Newman [25], due to the degree preservation. Consequently, our coarse-graining approach can be viewed as an overlapping and hierarchical reformulation and generalization of this successful state-of-the-art technique.
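One way to build such a bottom-up dendrogram is a greedy agglomeration that, at each step, fuses the pair of groups whose bi-clustered representation has the smallest relative entropy. The greedy pair selection is an assumption made for this sketch, as is the degree-preserving block form $b_{ij} = a_{i*} a_{*j}\, e_{rs} / (\kappa_r \kappa'_s)$, where $e_{rs}$ is the total weight between groups $r$ and $s$ and $\kappa_r$, $\kappa'_s$ are the group row and column sums. All function names are hypothetical, and the exhaustive pair search is only practical for small matrices.

```python
import itertools
import numpy as np

def kl(A, B):
    """Relative entropy (nats) between the normalized matrices A/a** and B/b**."""
    p, q = A / A.sum(), B / B.sum()
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def block_representation(A, groups):
    """Bi-clustered representation that keeps all row and column sums of A fixed."""
    n = A.shape[0]
    label = np.empty(n, dtype=int)
    for g, members in enumerate(groups):
        label[list(members)] = g
    M = np.zeros((n, len(groups)))
    M[np.arange(n), label] = 1.0              # node-to-group membership matrix
    row, col = A.sum(axis=1), A.sum(axis=0)
    E = M.T @ A @ M                           # block sums e_rs
    denom = np.outer(M.T @ row, M.T @ col)    # kappa_r * kappa'_s
    W = np.divide(E, denom, out=np.zeros_like(E), where=denom > 0)
    return np.outer(row, col) * W[label][:, label]

def greedy_dendrogram(A):
    """Repeatedly fuse the pair of groups giving the smallest D; return the merge list."""
    A = np.asarray(A, dtype=float)
    groups = [[i] for i in range(A.shape[0])]
    merges = []
    while len(groups) > 1:
        best = None
        for r, s in itertools.combinations(range(len(groups)), 2):
            rest = [g for t, g in enumerate(groups) if t not in (r, s)]
            trial = rest + [groups[r] + groups[s]]
            d = kl(A, block_representation(A, trial))
            if best is None or d < best[0]:
                best = (d, trial, groups[r], groups[s])
        d, groups, left, right = best
        merges.append((left, right, d))       # d is the dendrogram height of this fusion
    return merges

# Toy example: two disconnected pairs of nodes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
merges = greedy_dendrogram(A)
```

Each entry of `merges` records which two groups were fused and the resulting $D$ value, which can be used as the height of that fusion in a dendrogram.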

## Hierarchical layout

Although the introduced coarse-graining scheme may be of significant interest whenever probabilistic matrices appear, here we focus on its application to network layout, to obtain a hierarchical visualization [58–63]. Our bottom-up coarse-graining results can be readily incorporated into the network layout scheme in a top-down way, by starting with one node (comprising the whole system) and successively undoing the fusion steps until the original system is recovered. Between such extension steps the layout can be optimized as usual.

We have found that this hierarchical layout scheme produces significantly improved layouts, in terms of the final $D$ value, compared to a local optimization such as a simple simulated annealing or Newton-Raphson iteration. By incorporating the coarse-graining in a top-down approach, we first arrange the positions of the large-scale parts of the network and refine the picture only in later steps. The refinement steps happen when the position and extension of the large-scale parts have already been sufficiently optimized. After such a refinement step, the nodes that have moved together so far are treated separately. At a given scale (having $N \le N_0$ nodes), the $D$ value of the coarse-graining provides a lower bound for the $D$ value of the obtainable layout. Our hierarchical visualization approach is illustrated in Fig. 3 with snapshots of the layout and the coarse-grained representation matrices of the Zachary karate club network [52] at N = 5, 15, 25 and 34. As an illustration on a larger and more challenging network, in Fig. 4 we show the result of the hierarchical visualization on the giant component of the weighted human diseasome network [64]. In this network we have $N_0 = 516$ nodes, representing diseases, connected by mutually associated genes. The colors indicate the known disease groups, which are found to be well colocalized in the visualization.

Figure 4: Visualization of the human diseasome.

The best layout obtained for the human diseasome by our hierarchical visualization technique (η = 3.1%) is shown by circles of radii $\sigma_i$ in (a) and by a traditional graph in (b). The nodes represent diseases, colored according to known disease categories [64], while the width $\sigma_i$ of the distributions in (a) indicates the uncertainty of the positions. In the numerical optimization for this network we primarily focused on the positioning of the nodes; thus the optimization of the widths and normalizations was only turned on as fine-tuning after an initial layout was obtained.

In this paper, we have introduced a unified, information theoretic solution for the long-standing problems of matrix ordering, network visualization and data coarse-graining. While establishing a connection between these separate fields for the first time, our unified framework also incorporates some of the known state-of-the-art efficient techniques as special cases. In our framework, the steps of the applied algorithms were derived ab initio from the same first principles, in strong contrast to the large variety of existing algorithms that lack such an underlying theory. This also provides a clear interpretation of the obtained results.

After establishing the general representation theory, in our paper we first demonstrated that the minimization of relative information yields a novel visualization technique when the input matrix A is represented by the co-occurrence matrix B of extended distributions, embedded in a d-dimensional space. As another application of the same approach, we obtained a hierarchical coarse-graining scheme, in which the input matrix is represented by its successively coarse-grained versions. Since these applications are two sides of the same representation theory, they turned out to be superbly compatible, leading to an even more powerful hierarchical visualization technique, illustrated on the real-world example of the human diseasome network. Although we have focused on the visualization in d-dimensional flat, continuous space, the representation theory can be applied more generally, incorporating also the case of curved or discrete embedding spaces. As a possible future application, we mention the optimal embedding of a (sub)graph into another graph.

We have also shown that our relative entropy-based visualization with, e.g., Gaussian node distributions can be naturally interpreted as a force-directed method. Traditional force-directed methods prompted huge efforts on the computational side to achieve scalable algorithms applicable to the large data sets of real life. Here we cannot and do not wish to compete with such advanced techniques, but we believe that our approach can be a good starting point for further scalable implementations. As a first step towards this goal, we have outlined the possible future directions of computational improvement. Moreover, in the Supplementary Information we illustrated the applicability of our approach on larger scale networks as well. We have also demonstrated that network visualization is already interesting in one dimension, where it yields an ordering of the elements of the network. Our efficient coarse-graining scheme can also serve as an unbiased, resolution-limit-free starting point for the infamously challenging problem of community detection, by selecting the best cut of the dendrogram based on appropriately chosen criteria.

Our data representation framework has broad applicability, starting from either the node-node or edge-edge adjacency matrices or the edge-node incidence matrix of weighted networks, and also covering bipartite graphs and hypergraphs. We believe that our unified representation theory is a powerful tool to gain a deeper understanding of the huge data matrices in science, beyond the limits of existing heuristic algorithms. Since our primary intention in this paper was merely to present a proof-of-concept study of our theoretical framework, more detailed analyses of interesting complex networks will be the subject of forthcoming articles.

The parametrization of the Gaussian distributions used in the visualization is the following in d dimensions

For details of the numerical optimization for visualization and coarse-graining see the Supplementary Information. The codes, written in C++ using OpenGL, are freely available as command-line programs upon request.

## Additional Information

How to cite this article: Kovács, I. A. et al. A unified data representation theory for network visualization, ordering and coarse-graining. Sci. Rep. 5, 13786; doi: 10.1038/srep13786 (2015).

Newman, M. E. J. Networks: An Introduction. (Oxford Univ. Press, 2010).

Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Reviews of Modern Physics 74, 47–97 (2002).

Newman, M. E. J. Assortative mixing in networks. Phys. Rev. Lett. 89, 208701 (2002).

Reshef, D. N. et al. Detecting novel associations in large data sets. Science 334, 1518–1524 (2011).

Girvan, M. & Newman, M. E. J. Community structure in social and biological networks. Proc. Natl Acad. Sci. USA 99, 7821–7826 (2002).

Newman, M. E. J. Communities, modules and large-scale structure in networks. Nature Physics 8, 25–31 (2012).

Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75–174 (2010).

Kovács, I. A., Palotai, R., Szalay, M. S. & Csermely, P. Community landscapes: an integrative approach to determine overlapping network module hierarchy, identify key nodes and predict network dynamics. PLoS ONE 5, e12528 (2010).

Olhede, S. C. & Wolfe, P. J. Network histograms and universality of blockmodel approximation. Proc. Natl Acad. Sci. USA 111, 14722–14727 (2014).

Bickel, P. J. & Chen, A. A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106, 21068–21073 (2009).

Bickel, P. J. & Sarkar, P. Hypothesis testing for automated community detection in networks. arXiv: 1311.2694 (2013) (Date of access: 15/02/2015).

King, I. P. An automatic reordering scheme for simultaneous equations derived from network analysis. Int. J. Numer. Methods 2, 523–533 (1970).

George, A. & Liu, J. W.-H. Computer solution of large sparse positive definite systems. (Prentice-Hall Inc, 1981).

West, D. B. Introduction to graph theory 2nd edn. (Prentice-Hall Inc, 2001).

Song, C., Havlin, S. & Makse, H. A. Self-similarity of complex networks. Nature 433, 392–395 (2005).

Gfeller, D. & De Los Rios, P. Spectral coarse graining of complex networks. Phys. Rev. Lett. 99, 038701 (2007).

Sales-Pardo, M., Guimera, R., Moreira, A. A. & Amaral, L. A. N. Extracting the hierarchical organization of complex systems. Proc. Natl. Acad. Sci. USA 104, 15224–15229 (2007).

Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Barabási, A.-L. Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002).

Radicchi, F., Ramasco, J. J., Barrat, A. & Fortunato, S. Complex networks renormalization: flows and fixed points. Phys. Rev. Lett. 101, 148701 (2008).

Rozenfeld, H. D., Song, C. & Makse, H. A. Small-world to fractal transition in complex networks: a renormalization group approach. Phys. Rev. Lett. 104, 025701 (2010).

Walshaw, C. A multilevel approach to the travelling salesman problem. Oper. Res. 50, 862–877 (2002).

Walshaw, C. Multilevel refinement for combinatorial optimisation problems. Annals of Operations Research 131, 325–372 (2004).

Ahn, Y.-Y., Bagrow, J. P. & Lehmann, S. Link communities reveal multiscale complexity in networks. Nature 1038, 1–5 (2010).

Lancichinetti, A., Fortunato, S. & Radicchi, F. Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78, 046110 (2008).

Karrer, B. & Newman, M. E. J. Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107 (2011).

Larremore, D. B., Clauset, A. & Jacobs, A. Z. Efficiently inferring community structure in bipartite networks. Phys. Rev. E 90, 012805 (2014).

Di Battista, G., Eades, P., Tamassia, R. & Tollis, I. G. Graph Drawing: Algorithms for the Visualization of Graphs. (Prentice-Hall Inc, 1998).

Kobourov, S. G. Spring embedders and force-directed graph drawing algorithms. arXiv: 1201.3011 (2012) (Date of access: 15/02/2015).

Graph Drawing, Symposium on Graph Drawing GD'96 (ed North, S.), (Springer-Verlag, Berlin, 1997).

Garey, M. R. & Johnson, D. S. Computers and Intractability: A Guide to the Theory of NP-Completeness. (W.H. Freeman and Co., 1979).

Kinney, J. B. & Atwal, G. S. Equitability, mutual information and the maximal information coefficient. Proc. Natl. Acad. Sci. USA 111, 3354–3359 (2014).

Lee, D. D. & Seung, H. S. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999).

Slonim, N., Atwal, G. S., Tkačik, G. & Bialek, W. Information-based clustering. Proc. Natl. Acad. Sci. USA 102, 18297–18302 (2005).

Rosvall, M. & Bergstrom, C. T. An information-theoretic framework for resolving community structure in complex networks. Proc. Natl. Acad. Sci. USA 104, 7327–7331 (2007).

Rosvall, M., Axelsson, D. & Bergstrom, C. T. The map equation. Eur. Phys. J. Special Topics 178, 13–23 (2009).

Zanin, M., Sousa, P. A. & Menasalvas, E. Information content: assessing meso-scale structures in complex networks. Europhys. Lett. 106, 30001 (2014).

Allen, B., Stacey, B. C. & Bar-Yam, Y. An information-theoretic formalism for multiscale structure in complex systems. arXiv: 1409.4708 (2014) (Date of access: 15/02/2015).

Lovász, L. Large networks and graph limits, volume 60 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, (2012).

Kullback, S. & Leibler, R. A. On information and sufficiency. Annals of Mathematical Statistics 22, 79–86 (1951).

Hinton, G. & Roweis, S. Stochastic Neighbor Embedding, in Advances in Neural Information Processing Systems, Vol. 15, 833-840 (The MIT Press, Cambridge, 2002).

van der Maaten, L. & Hinton, G. Visualizing Data using t-SNE, Journal of Machine Learning Research 9, 2579–2605 (2008).

Yamada, T., Saito, K. & Ueda, N. Cross-entropy directed embedding of network data, Proceedings of the 20th International Conference on Machine Learning (ICML2003), 832–839 (2003).

Grünwald, P. D. The Minimum Description Length Principle, (MIT Press, 2007).

Cover, Th. M. & Thomas, J. A. Elements of Information Theory 1st edn, Lemma 12.6.1, 300–301 (John Wiley & Sons, 1991).

Barnes, J. & Hut, P. A hierarchical O(NlogN) force-calculation algorithm. Nature 324, 446–449 (1986).

Gansner, E. R., Koren, Y. & North, S. in Graph drawing by stress majorization, Vol. 3383 (ed Pach, J.), 239–250 (Springer-Verlag, 2004).

Fruchterman, T. M. & Reingold, E. M. Graph Drawing by Force-Directed Placement, Software: Practice & Experience 21, 1129–1164 (1991).

Kamada, T. & Kawai, S. An algorithm for drawing general undirected graphs. Information Processing Letters (Elsevier) 31, 7–15 (1989).

Estévez, P. A., Figueroa, C. J. & Saito, K. Cross-entropy embedding of high-dimensional data using the neural gas model. Neural Networks 18, 727–737 (2005).

van der Maaten, L. J. P. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15, 3221–3245 (2014).

Hopcroft, J. & Tarjan, R. E. Efficient planarity testing. Journal of the Association for Computing Machinery 21, 549–568 (1974).

Zachary, W. W. An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 452–473 (1977).

Leskovec, J., Kleinberg, J. & Faloutsos, C. Graph Evolution: Densification and Shrinking Diameters. ACM Transactions on Knowledge Discovery from Data (ACM TKDD), 1(1), (2007). Data is available at: http://snap.stanford.edu/data/ca-HepPh.html .

Boguña, M., Pastor-Satorras, R., Diaz-Guilera, A. & Arenas, A. Models of social networks based on social distance attachment. Phys. Rev. E 70, 056122 (2004). Data is available at: http://deim.urv.cat/alexandre.arenas/data/welcome.htm .

Kullback, S. Information Theory and Statistics, (John Wiley: New York, NY, USA, 1959).

Kapur, J. N. & Kesavan, H. K. The inverse MaxEnt and MinxEnt principles and their applications, in Maximum Entropy and Bayesian Methods, Fundamental Theories in Physics, Springer: Netherlands, 39, 433–450 (1990).

Rubinstein, R. Y. The cross-entropy method for combinatorial and continuous optimization. Method. Comput. Appl. Probab. 1, 127–190 (1999).

Gajer, P., Goodrich, M. T. & Kobourov, S. G. A multi-dimensional approach to force-directed layouts of large graphs, Computational Geometry: Theory and Applications 29, 3–18 (2004).

Harel, D. & Koren, Y. A fast multi-scale method for drawing large graphs. J. Graph Algorithms and Applications 6, 179–202 (2002).

Walshaw, C. A multilevel algorithm for force-directed graph drawing. J. Graph Algorithms Appl. 7, 253–285 (2003).

Hu, Y. F. Efficient and high quality force-directed graph drawing. The Mathematica Journal 10, 37–71 (2006).

Szalay-Bekö, M., Palotai, R., Szappanos, B., Kovács, I. A., Papp, B. & Csermely, P. ModuLand plug-in for Cytoscape: determination of hierarchical layers of overlapping network modules and community centrality. Bioinformatics 28, 2202–2204 (2012).

Six, J. M. & Tollis, I. G. in Software Visualization, Vol. 734, (ed Zhang, K.) Ch. 14, 413–437 (Springer: US, 2003).

Goh, K.-I. et al. The human disease network. Proc. Natl. Acad. Sci. USA 104, 8685–8690 (2007).

## Acknowledgements

We are grateful to the members of the LINK-group ( www.linkgroup.hu ) and E. Güney for useful discussions. This work was supported by the Hungarian National Research Fund under grant Nos. OTKA K109577, K115378 and K83314. The research of IAK was supported by the European Union and the State of Hungary, co-financed by the European Social Fund in the framework of TÁMOP 4.2.4. A/2-11-1-2012-0001 'National Excellence Program'.

## Author information

Authors and affiliations.

Wigner Research Centre, Institute for Solid State Physics and Optics, P.O.Box 49, Budapest, H-1525, Hungary

István A. Kovács

Institute of Theoretical Physics, Szeged University, Szeged, H-6720, Hungary

Center for Complex Networks Research and Department of Physics, Northeastern University, 177 Huntington Avenue, Boston, 02115, MA, USA

Institute of Organic Chemistry, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Pusztaszeri út 59-67, Budapest, H-1025, Hungary

Réka Mizsei

Department of Medical Chemistry, Semmelweis University, P.O.Box 266, Budapest, H-1444, Hungary

Péter Csermely

## Contributions

I.A.K. and R.M. conceived the research and ran the numerical simulations. I.A.K. devised and implemented the applied algorithms. I.A.K. and P.Cs. wrote the main manuscript text. All authors reviewed the manuscript.

## Ethics declarations

Competing interests.

The authors declare no competing financial interests.

## Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

## About this article

Cite this article.

Kovács, I., Mizsei, R. & Csermely, P. A unified data representation theory for network visualization, ordering and coarse-graining. Sci Rep 5, 13786 (2015). https://doi.org/10.1038/srep13786

Received: 27 February 2015

Accepted: 05 August 2015

Published: 08 September 2015

DOI: https://doi.org/10.1038/srep13786

Network data models are a way of representing data that can be used for organizing and managing complex relationships between various entities. In some cases, these relationships are as or more important than the data itself—traditional data models fall short when these more expressive use cases arise, but network data models can provide better handling of varying data relationship types. Understanding the principles and mechanics of the network data model as well as its design parameters, benefits, and limitations can give users a powerful tool in their data toolbox.

## How Does a Network Data Model Work?

A network data model is a representation of data that emphasizes the connections and interactions among different entities, providing a dynamic framework for understanding the intricate web of relationships within a system. At its core, a network data model organizes data in a way that reflects the inherent relationships between entities. Unlike the more traditional hierarchical and relational data models, where data is structured in tables or trees, the network model allows for a more flexible and interconnected representation.

## Nodes and Edges

The fundamental components of a network data model are nodes and edges. Nodes represent entities, while edges define the relationships or connections between these entities. Each node can have attributes that provide additional information about the entity it represents, and each edge can have properties describing the nature of the relationship.

For example, a social network data model represents individuals as nodes, and connections between them (e.g., friendship, family, professional relationships) are depicted as edges. Each individual node might have attributes like “Name,” “Age,” and “Location,” providing additional details about the person.
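The node-and-edge structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (`Node`, `Edge`, `Graph`), the sample people, and the `since` property are hypothetical and do not come from any particular database product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, with free-form attributes."""
    node_id: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed relationship between two nodes, with its own properties."""
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}   # node_id -> Node
        self.edges = []   # list of Edge

    def add_node(self, node_id, **attributes):
        self.nodes[node_id] = Node(node_id, attributes)

    def add_edge(self, source, target, label, **properties):
        # Refuse edges that point at entities the graph does not know about.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, label, properties))

    def neighbors(self, node_id, label=None):
        """Return ids reachable from node_id over one outgoing edge."""
        return [e.target for e in self.edges
                if e.source == node_id and (label is None or e.label == label)]


# Hypothetical sample data mirroring the social network example.
social = Graph()
social.add_node("alice", Name="Alice", Age=34, Location="Lisbon")
social.add_node("bob", Name="Bob", Age=29, Location="Oslo")
social.add_node("carol", Name="Carol", Age=41, Location="Lima")
social.add_edge("alice", "bob", "friend", since=2019)
social.add_edge("alice", "carol", "colleague")

print(social.neighbors("alice"))            # ['bob', 'carol']
print(social.neighbors("alice", "friend"))  # ['bob']
```

Keeping attributes on nodes and properties on edges as plain dictionaries is a simplification; a real graph database would typically add indexing and typed schemas on top of the same idea.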

## Graph Theory Principles

The network data model draws heavily from graph theory, the mathematical study of structures made of vertices (nodes) and the edges that connect them. Graph theory concepts such as vertices, edges, paths, and connectivity form the foundation for constructing and querying data in a network model.

## Query Languages

Specialized query languages are often required when interacting with databases that use the network data modeling paradigm. These languages allow users to traverse the graph, retrieve specific information, and perform complex queries that capture the relationships between entities. Common graph query languages include Cypher, Gremlin, and SPARQL, to name a few.

## Benefits of a Network Data Model

Network data modeling was created to address the increasing complexities of modern data relationships, providing a more intuitive representation of interconnected information. To this end, organizations that adopt network data modeling stand to realize several benefits.

## Flexibility in Data Relationships

A key benefit of the network data model is its flexibility in representing complex relationships. Unlike traditional models that may struggle with many-to-many relationships, the network model excels at capturing intricate connections between entities. This flexibility is particularly valuable in scenarios where relationships are dynamic and evolve over time.

## Intuitive Representation

The visual representation of a network data model is often more intuitive for users to understand. With nodes representing entities and edges denoting relationships, the model mirrors real-world connections. This makes it easier for stakeholders, including business analysts and developers, to grasp the structure of the data and how different elements relate to each other.

## Efficient Query Performance

The network data model provides more efficient query performance in scenarios where relationships play a crucial role. Traversing relationships in a graph is a natural operation and (in most cases) a trivial affair; the same cannot be said about other types of data model relationships. Graph databases optimized for network models can deliver fast query responses for complex relationship-based queries.
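To make the idea of relationship traversal concrete, here is a minimal breadth-first search over an adjacency list. It is a sketch only: the `friends` data and the `within_hops` function are hypothetical names, and a production graph database would use its own storage and query engine rather than an in-memory dictionary.

```python
from collections import deque

# Hypothetical undirected friendship graph stored as an adjacency list.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    del distance[start]  # the start node is not its own contact
    return distance


print(within_hops(friends, "alice", 2))  # {'bob': 1, 'carol': 1, 'dave': 2}
```

Each step follows stored links directly, so the cost depends on the part of the graph actually visited. A "friends of friends" query in a relational schema would typically need one self-join per hop.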

## Better Support for Evolving Data Structures

As data requirements evolve, the network data model provides better support for changing structures. Adding or modifying relationships is often more straightforward in a network model than in traditional models, allowing organizations to adapt to new business requirements more seamlessly.

## Challenges of Network Data Models

While the network data model offers substantial benefits, it also presents several key challenges that organizations should address prior to adoption:

## Complexity in Modeling

Constructing a network data model can be complex, especially when dealing with a large number of entities and intricate relationships. Effective network data model design requires careful consideration of the relationships between nodes, potential redundancy, and the overall structure of the graph.

## Performance Scaling

While network data models excel in handling relationships, their performance can degrade when dealing with large datasets or complex graphs. Scaling a network model to accommodate growing amounts of data requires thoughtful database design and optimization strategies to maintain query performance.

## Lack of Standardization

Unlike relational databases with their standardized, structured query language (SQL), network databases lack a universally accepted query language. This lack of standardization can pose challenges when working with different graph database systems, as users may need to adapt to the specific query language supported by the chosen database.

## Examples of Network Data Models

The following examples illustrate how a network data model can capture and represent relationships in diverse domains, from social interactions to complex IT ecosystems.

- Social Network— In social networks, a network data model could represent users as nodes and friendships as edges. Each user node might have attributes like “Username,” “Date of Birth,” and “Location.” The edges would represent the connections between users, indicating the nature of the relationship (e.g., “Friend,” “Follower”).
- IT Infrastructure— In the context of IT infrastructure, a network data model could represent devices (e.g., servers, routers, switches) as nodes and connections between devices as edges. Each device node might have attributes like “IP Address,” “Manufacturer,” and “Operating System.” The edges would represent the physical or logical connections between devices, such as “Connected to” or “Communicates with.”

## Bottom Line: Organizing Data With the Network Data Model

The network data model emerged alongside the development of database management systems (DBMS) as an effective way to organize and represent complex data relationships. An improvement over the hierarchical model, it allowed for more flexible relationships between data entities, using a graph-like structure where records could have multiple parent and child nodes. First implemented in pioneering database systems like the Integrated Data Store (IDS), this model facilitated more intricate and interconnected data relationships, offering increased versatility in representing real-world scenarios.

Network data models remain invaluable for organizations grappling with the intricacies of their data relationships and the shortcomings of traditional data models. By emphasizing nodes and edges and drawing inspiration from graph theory, the network data model provides a dynamic and flexible framework for representing interconnected information.

Figure 179. Presentation formatting involves encoding and decoding application data.

You might ask what makes this problem challenging. One reason is that computers represent data in different ways. For example, some computers represent floating-point numbers in IEEE standard 754 format, while some older machines still use their own nonstandard format. Even for something as simple as integers, different architectures use different sizes (e.g., 16-bit, 32-bit, 64-bit). To make matters worse, on some machines integers are represented in big-endian form (the most significant bit of a word, the “big end,” is in the byte with the lowest address), while on other machines integers are represented in little-endian form (the least significant bit, the “little end,” is in the byte with the lowest address). For example, PowerPC processors are big-endian machines, and the Intel x86 family is a little-endian architecture. Today, many architectures (e.g., ARM) support both representations (and so are called bi-endian), but the point is that you can never be sure how the host you are communicating with stores integers. The big-endian and little-endian representations of the integer 34,677,374 are given in Figure 180.

Figure 180. Big-endian and little-endian byte order for the integer 34,677,374
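The two byte orders for 34,677,374 can be reproduced with Python's built-in integer conversions. This is a small illustrative sketch assuming a 4-byte integer; the variable names are arbitrary.

```python
value = 34_677_374

# Serialize the same integer as 4 bytes in each byte order.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print(hex(value))    # 0x211227e
print(list(big))     # [2, 17, 34, 126]
print(list(little))  # [126, 34, 17, 2]

# Interpreting big-endian bytes as little-endian yields a different number,
# which is why the two hosts must agree on (or signal) the byte order.
misread = int.from_bytes(big, byteorder="little")
print(misread == value)  # False
```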

Another reason that marshalling is difficult is that application programs are written in different languages, and even when you are using a single language there may be more than one compiler. For example, compilers have a fair amount of latitude in how they lay out structures (records) in memory, such as how much padding they put between the fields that make up the structure. Thus, you could not simply transmit a structure from one machine to another, even if both machines were of the same architecture and the program was written in the same language, because the compiler on the destination machine might align the fields in the structure differently.
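The effect of compiler padding can be observed from Python through the standard `struct` module, which can compute the size of a record either with the platform's native alignment or with no padding at all. The record layout used here (one `char` followed by one `int`) is a hypothetical example chosen for illustration.

```python
import struct

# A record with a 1-byte char followed by a 4-byte int.
native = struct.calcsize("@ci")  # native sizes and alignment, like a C struct
packed = struct.calcsize("=ci")  # standard sizes, no padding between fields

# On many common platforms native is 8 (3 padding bytes) while packed is 5,
# but the native figure depends on the machine and compiler conventions.
print(native, packed)

# Packing without padding gives a layout that does not depend on alignment.
record = struct.pack("=ci", b"A", 1000)
print(len(record))  # 5
```

Note that the `=` prefix removes padding but still uses the host's byte order, so it addresses only the alignment problem described above, not the byte-order one.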

## 7.1.1 Taxonomy

Although argument marshalling is not rocket science—it is a small matter of bit twiddling—there are a surprising number of design choices that you must address. We begin by giving a simple taxonomy for argument marshalling systems. The following is by no means the only viable taxonomy, but it is sufficient to cover most of the interesting alternatives.

## Data Types

The first question is what data types the system is going to support. In general, we can classify the types supported by an argument marshalling mechanism at three levels. Each level complicates the task faced by the marshalling system.

At the lowest level, a marshalling system operates on some set of base types. Typically, the base types include integers, floating-point numbers, and characters. The system might also support ordinal types and Booleans. As described above, the implication of the set of base types is that the encoding process must be able to convert each base type from one representation to another; for example, it must be able to convert an integer from big-endian to little-endian.

At the next level are flat types: structures and arrays. While flat types might at first not appear to complicate argument marshalling, the reality is that they do. The problem is that the compilers used to compile application programs sometimes insert padding between the fields that make up the structure so as to align these fields on word boundaries. The marshalling system typically packs structures so that they contain no padding.

At the highest level, the marshalling system might have to deal with complex types, those types that are built using pointers. That is, the data structure that one program wants to send to another might not be contained in a single structure, but might instead involve pointers from one structure to another. A tree is a good example of a complex type that involves pointers. Clearly, the data encoder must prepare the data structure for transmission over the network because pointers are implemented by memory addresses, and just because a structure lives at a certain memory address on one machine does not mean it will live at the same address on another machine. In other words, the marshalling system must serialize (flatten) complex data structures.
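As an illustration of flattening a pointer-based structure, the sketch below serializes a binary tree into a flat list and rebuilds it on the other side. The pre-order encoding with `None` as an "absent child" marker is one possible choice made for this example, not a scheme prescribed by the text, and the names `TreeNode`, `flatten`, and `rebuild` are hypothetical.

```python
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def flatten(node, out=None):
    """Pre-order walk; None marks an absent child, replacing the pointer."""
    if out is None:
        out = []
    if node is None:
        out.append(None)
    else:
        out.append(node.value)
        flatten(node.left, out)
        flatten(node.right, out)
    return out


def rebuild(items):
    """Inverse of flatten: consume the list and allocate fresh nodes."""
    iterator = iter(items)

    def build():
        value = next(iterator)
        if value is None:
            return None
        node = TreeNode(value)
        node.left = build()
        node.right = build()
        return node

    return build()


tree = TreeNode(8, TreeNode(3, None, TreeNode(6)), TreeNode(10))
wire = flatten(tree)
print(wire)  # [8, 3, None, 6, None, None, 10, None, None]

copy = rebuild(wire)  # new objects at new addresses, same shape and values
```

The flat list contains no memory addresses, so the receiver is free to place the rebuilt nodes wherever its own allocator chooses.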

In summary, depending on how complicated the type system is, the task of argument marshalling usually involves converting the base types, packing the structures, and linearizing the complex data structures, all to form a contiguous message that can be transmitted over the network. Figure 181 illustrates this task.

Figure 181. Argument marshalling: converting, packing, and linearizing

## Conversion Strategy

Once the type system is established, the next issue is what conversion strategy the argument marshaller will use. There are two general options: canonical intermediate form and receiver-makes-right. We consider each in turn.

The idea of canonical intermediate form is to settle on an external representation for each type; the sending host translates from its internal representation to this external representation before sending data, and the receiver translates from this external representation into its local representation when receiving data. To illustrate the idea, consider integer data; other types are treated in a similar manner. You might declare that the big-endian format will be used as the external representation for integers. The sending host must translate each integer it sends into big-endian form, and the receiving host must translate big-endian integers into whatever representation it uses. (This is what is done in the Internet for protocol headers.) Of course, a given host might already use big-endian form, in which case no conversion is necessary.
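A minimal sketch of canonical intermediate form for 32-bit integers follows, using big-endian as the external representation as in the paragraph above. The function names `encode_int32` and `decode_int32` are hypothetical; the `!` format prefix of Python's `struct` module selects network (big-endian) byte order.

```python
import struct
import sys


def encode_int32(value):
    """Host representation -> external (big-endian) representation."""
    return struct.pack("!i", value)


def decode_int32(data):
    """External (big-endian) representation -> host representation."""
    return struct.unpack("!i", data)[0]


wire = encode_int32(34_677_374)

# The bytes on the wire are the same whatever the host's own byte order is.
print(sys.byteorder, wire.hex())
print(decode_int32(wire))  # 34677374
```

Both ends call the same pair of functions regardless of their own architecture, which is the property that lets each host know only one foreign format.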

The alternative, receiver-makes-right, has the sender transmit data in its own internal format; the sender does not convert the base types, but usually has to pack and flatten more complex data structures. The receiver is then responsible for translating the data from the sender’s format into its own local format. The problem with this strategy is that every host must be prepared to convert data from all other machine architectures. In networking, this is known as an N-by-N solution: each of N machine architectures must be able to handle all N architectures. In contrast, in a system that uses a canonical intermediate form, each host needs to know only how to convert between its own representation and a single other representation, the external one.
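Receiver-makes-right can be sketched for the narrow case of integer byte order. The example assumes the sender labels each message with a one-byte marker naming its byte order; that marker, the codes `b"L"` and `b"B"`, and the function names are all hypothetical choices for this illustration, and a real system would need to describe far more than byte order.

```python
import struct
import sys

# Hypothetical one-byte markers; a real system would define its own.
ORDER_CODES = {"little": b"L", "big": b"B"}
ORDER_FORMATS = {b"L": "<", b"B": ">"}


def send_native(value, sender_order=sys.byteorder):
    """Sender: no conversion, just label the bytes with the sender's order."""
    marker = ORDER_CODES[sender_order]
    return marker + struct.pack(ORDER_FORMATS[marker] + "i", value)


def receive(message):
    """Receiver: inspect the marker and convert into the local form."""
    marker, payload = message[:1], message[1:]
    return struct.unpack(ORDER_FORMATS[marker] + "i", payload)[0]


# Simulate senders of both kinds; the receiver handles either one.
print(receive(send_native(34_677_374, "big")))     # 34677374
print(receive(send_native(34_677_374, "little")))  # 34677374
```

The `ORDER_FORMATS` table is where the N-by-N cost shows up: every receiver must carry a decoding rule for every sender format it might encounter.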

Using a common external format is clearly the correct thing to do, right? This has certainly been the conventional wisdom in the networking community for over 30 years. The answer is not cut and dried, however. It turns out that there are not that many different representations for the various base types, or, said another way, N is not that large. In addition, the most common case is for two machines of the same type to be communicating with each other. In this situation, it seems silly to translate data from that architecture’s representation into some foreign external representation, only to have to translate the data back into the same architecture’s representation on the receiver.

A third option, although we know of no existing system that exploits it, is to use receiver-makes-right if the sender knows that the destination has the same architecture; the sender would use some canonical intermediate form if the two machines use different architectures. How would a sender learn the receiver’s architecture? It could learn this information either from a name server or by first using a simple test case to see if the appropriate result occurs.

The third issue in argument marshalling is how the receiver knows what kind of data is contained in the message it receives. There are two common approaches: tagged and untagged data. The tagged approach is more intuitive, so we describe it first.

A tag is any additional information included in a message—beyond the concrete representation of the base types—that helps the receiver decode the message. There are several possible tags that might be included in a message. For example, each data item might be augmented with a type tag. A type tag indicates that the value that follows is an integer, a floating-point number, or whatever. Another example is a length tag. Such a tag is used to indicate the number of elements in an array or the size of an integer. A third example is an architecture tag, which might be used in conjunction with the receiver-makes-right strategy to specify the architecture on which the data contained in the message was generated. Figure 182 depicts how a simple 32-bit integer might be encoded in a tagged message.

Figure 182. A 32-bit integer encoded in a tagged message.
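To make the tagged approach concrete, here is a small sketch that prefixes a 32-bit integer with a one-byte type tag and a one-byte length tag. The tag values (`TYPE_INT`, `TYPE_FLOAT`) and the one-byte field widths are assumptions chosen for illustration; they are not taken from the figure or from any particular standard.

```python
import struct

TYPE_INT = 0x01    # hypothetical type-tag value meaning "integer"
TYPE_FLOAT = 0x02  # hypothetical type-tag value meaning "floating point"

def encode_tagged_int(value: int) -> bytes:
    # type tag (1 byte), length tag (1 byte), then the 4-byte big-endian value
    return struct.pack(">BBi", TYPE_INT, 4, value)

def decode_tagged(message: bytes) -> int:
    # The receiver inspects the tags to learn what follows.
    type_tag, length = struct.unpack_from(">BB", message, 0)
    payload = message[2:2 + length]
    if type_tag == TYPE_INT and length == 4:
        return struct.unpack(">i", payload)[0]
    raise ValueError(f"unsupported tag {type_tag} with length {length}")

message = encode_tagged_int(7)
print(message.hex())           # 010400000007
print(decode_tagged(message))  # 7
```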

The alternative, of course, is not to use tags. How does the receiver know how to decode the data in this case? It knows because it was programmed to know. In other words, if you call a remote procedure that takes two integers and a floating-point number as arguments, then there is no reason for the remote procedure to inspect tags to know what it has just received. It simply assumes that the message contains two integers and a float and decodes it accordingly. Note that, while this works for most cases, the one place it breaks down is when sending variable-length arrays. In such a case, a length tag is commonly used to indicate how long the array is.
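The untagged case can be sketched in the same way. Both sides below are assumed to agree in advance that a call carries two integers and a floating-point number (an 8-byte double here, by assumption), followed by a variable-length integer array whose only tag is its length. The names `encode_call` and `decode_call` are hypothetical.

```python
import struct

FIXED = ">iid"  # two 32-bit integers and one 64-bit float, big-endian, no tags

def encode_call(a: int, b: int, x: float, values: list) -> bytes:
    fixed = struct.pack(FIXED, a, b, x)
    # The array is the one place a length tag is needed.
    array = struct.pack(">I", len(values)) + struct.pack(f">{len(values)}i", *values)
    return fixed + array

def decode_call(message: bytes):
    # No tags to inspect: the layout is "programmed in" on the receiver.
    a, b, x = struct.unpack_from(FIXED, message, 0)
    offset = struct.calcsize(FIXED)
    (count,) = struct.unpack_from(">I", message, offset)
    values = list(struct.unpack_from(f">{count}i", message, offset + 4))
    return a, b, x, values

message = encode_call(1, -2, 0.5, [10, 20, 30])
print(len(message))          # 32
print(decode_call(message))  # (1, -2, 0.5, [10, 20, 30])
```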

It is also worth noting that the untagged approach means that the presentation formatting is truly end to end. It is not possible for some intermediate agent to interpret the message unless the data is tagged. Why would an intermediate agent need to interpret a message, you might ask? Stranger things have happened, mostly resulting from ad hoc solutions to unexpected problems that the system was not engineered to handle. Poor network design is beyond the scope of this book.

A stub is the piece of code that implements argument marshalling. Stubs are typically used to support RPC. On the client side, the stub marshals the procedure arguments into a message that can be transmitted by means of the RPC protocol. On the server side, the stub converts the message back into a set of variables that can be used as arguments to call the remote procedure. Stubs can either be interpreted or compiled.

In a compilation-based approach, each procedure has a customized client and server stub. While it is possible to write stubs by hand, they are typically generated by a stub compiler, based on a description of the procedure’s interface. This situation is illustrated in Figure 183. Since the stub is compiled, it is usually very efficient. In an interpretation-based approach, the system provides generic client and server stubs that have their parameters set by a description of the procedure’s interface. Because it is easy to change this description, interpreted stubs have the advantage of being flexible. Compiled stubs are more common in practice.

Figure 183. Stub compiler takes interface description as input and outputs client and server stubs.
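The interpretation-based approach can be illustrated with a generic stub whose behavior is set entirely by an interface description. This is only a sketch: the `INTERFACE` table, the type names, and the procedures `add` and `scale` are made up for the example, and a compiled stub would instead hard-code the equivalent of each loop body per procedure.

```python
import struct

# Hypothetical interface description: procedure name -> argument types.
INTERFACE = {"add": ["int", "int"], "scale": ["int", "double"]}
FORMATS = {"int": ">i", "double": ">d"}

def generic_marshal(procedure: str, args: list) -> bytes:
    """Client-side generic stub: walk the description and pack each argument."""
    types = INTERFACE[procedure]
    if len(types) != len(args):
        raise TypeError(f"{procedure} expects {len(types)} arguments")
    return b"".join(struct.pack(FORMATS[t], a) for t, a in zip(types, args))

def generic_unmarshal(procedure: str, message: bytes) -> list:
    """Server-side generic stub: walk the same description to unpack."""
    offset, args = 0, []
    for t in INTERFACE[procedure]:
        fmt = FORMATS[t]
        args.append(struct.unpack_from(fmt, message, offset)[0])
        offset += struct.calcsize(fmt)
    return args

message = generic_marshal("scale", [3, 1.5])
print(generic_unmarshal("scale", message))  # [3, 1.5]
```

Changing a procedure's signature here only requires editing the `INTERFACE` table, which is the flexibility the text attributes to interpreted stubs.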

## 7.1.2 Examples (XDR, ASN.1, NDR, ProtoBufs)

We now briefly describe four popular network data representations in terms of this taxonomy. We use the integer base type to illustrate how each system works.

External Data Representation (XDR) is the network format used with SunRPC. In the taxonomy just introduced, XDR:

- Supports the entire C-type system with the exception of function pointers
- Defines a canonical intermediate form
- Does not use tags (except to indicate array lengths)
- Uses compiled stubs

An XDR integer is a 32-bit data item that encodes a C integer. It is represented in two’s complement notation, with the most significant byte of the C integer in the first byte of the XDR integer and the least significant byte of the C integer in the fourth byte of the XDR integer. That is, XDR uses big-endian format for integers. XDR supports both signed and unsigned integers, just as C does.

XDR represents variable-length arrays by first specifying an unsigned integer (4 bytes) that gives the number of elements in the array, followed by that many elements of the appropriate type. XDR encodes the components of a structure in the order of their declaration in the structure. For both arrays and structures, the size of each element/component is represented in a multiple of 4 bytes. Smaller data types are padded out to 4 bytes with 0s. The exception to this “pad to 4 bytes” rule is made for characters, which are encoded one per byte.

Figure 184. Example encoding of a structure in XDR.

The following code fragment gives an example C structure (item) and the XDR routine that encodes/decodes this structure (xdr_item). Figure 184 schematically depicts XDR’s on-the-wire representation of this structure when the field name is seven characters long and the array list has three values in it.

In this example, xdr_array, xdr_int, and xdr_string are three primitive functions provided by XDR to encode and decode arrays, integers, and character strings, respectively. Argument xdrs is a context variable that XDR uses to keep track of where it is in the message being processed; it includes a flag that indicates whether this routine is being used to encode or decode the message. In other words, routines like xdr_item are used on both the client and the server. Note that the application programmer can either write the routine xdr_item by hand or use a stub compiler called rpcgen (not shown) to generate this encoding/decoding routine. In the latter case, rpcgen takes the remote procedure that defines the data structure item as input and outputs the corresponding stub.
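As a rough illustration of the resulting wire layout, the following Python sketch encodes a structure with an integer, a seven-character string, and a three-element integer array using the XDR rules described above. The helper functions borrow the names of the C primitives but are local stand-ins written for this example, not the XDR library; they only encode, whereas the C routines both encode and decode. The field `count` and all sample values are assumptions.

```python
import struct

def xdr_int(value: int) -> bytes:
    """32-bit signed integer, big-endian."""
    return struct.pack(">i", value)

def xdr_string(text: str) -> bytes:
    """4-byte length, then one byte per character, zero-padded to a multiple of 4."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

def xdr_array(values: list) -> bytes:
    """4-byte element count, then each element as an XDR integer."""
    return struct.pack(">I", len(values)) + b"".join(xdr_int(v) for v in values)

def xdr_item(count: int, name: str, values: list) -> bytes:
    """Components are encoded in declaration order."""
    return xdr_int(count) + xdr_string(name) + xdr_array(values)

wire = xdr_item(3, "JOHNSON", [497, 8321, 265])
print(len(wire))   # 32
print(wire.hex())
```

The 32 bytes break down as 4 for the integer, 4 + 7 + 1 for the string (length, characters, one byte of padding), and 4 + 12 for the array.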

Exactly how XDR performs depends, of course, on the complexity of the data. In a simple case of an array of integers, where each integer has to be converted from one byte order to another, an average of three instructions are required for each byte, meaning that converting the whole array is likely to be limited by the memory bandwidth of the machine. More complex conversions that require significantly more instructions per byte will be CPU limited and thus perform at a data rate less than the memory bandwidth.

Abstract Syntax Notation One (ASN.1) is an ISO standard that defines, among other things, a representation for data sent over a network. The representation-specific part of ASN.1 is called the Basic Encoding Rules (BER). ASN.1 supports the C-type system without function pointers, defines a canonical intermediate form, and uses type tags. Its stubs can be either interpreted or compiled. One of the claims to fame of ASN.1 BER is that it is used by the Internet standard Simple Network Management Protocol (SNMP).

ASN.1 represents each data item with a triple of the form `<tag, length, value>`.

The tag is typically an 8-bit field, although ASN.1 allows for the definition of multibyte tags. The length field specifies how many bytes make up the value; we discuss length more below. Compound data types, such as structures, can be constructed by nesting primitive types, as illustrated in Figure 185.

Figure 185. Compound types created by means of nesting in ASN.1 BER.

Figure 186. ASN.1 BER representation for a 4-byte integer.

If the value is 127 or fewer bytes long, then the length is specified in a single byte. Thus, for example, a 32-bit integer is encoded as a 1-byte type, a 1-byte length, and the 4 bytes that encode the integer, as illustrated in Figure 186. The value itself, in the case of an integer, is represented in two’s complement notation and big-endian form, just as in XDR. Keep in mind that, even though the value of the integer is represented in exactly the same way in both XDR and ASN.1, the XDR representation has neither the type nor the length tags associated with that integer. These two tags both take up space in the message and, more importantly, require processing during marshalling and unmarshalling. This is one reason why ASN.1 is not as efficient as XDR. Another is that the very fact that each data value is preceded by a length field means that the data value is unlikely to fall on a natural byte boundary (e.g., an integer beginning on a word boundary). This complicates the encoding/decoding process.

If the value is 128 or more bytes long, then multiple bytes are used to specify its length. At this point you may be asking why a byte can specify a length of up to 127 bytes rather than 255. The reason is that 1 bit of the length field is used to denote how long the length field is. A 0 in the eighth bit indicates a 1-byte length field. To specify a longer length, the eighth bit is set to 1, and the other 7 bits indicate how many additional bytes make up the length. Figure 187 illustrates a simple 1-byte length and a multibyte length.

Figure 187. ASN.1 BER representation for length: (a) 1 byte; (b) multibyte.
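The length rule can be sketched in a few lines. The functions below are illustrative helpers, not part of an ASN.1 library, and they cover only the two definite-length forms described above; other BER features such as the indefinite-length form are deliberately left out.

```python
def ber_length(n: int) -> bytes:
    """Encode a value length using the 1-byte or multibyte form."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n <= 127:
        return bytes([n])                      # eighth bit is 0
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body    # eighth bit is 1, low 7 bits = byte count

def ber_decode_length(data: bytes):
    """Return (length, number of bytes the length field occupied)."""
    first = data[0]
    if first & 0x80 == 0:
        return first, 1
    extra = first & 0x7F
    return int.from_bytes(data[1:1 + extra], "big"), 1 + extra

print(ber_length(4).hex())                     # 04
print(ber_length(300).hex())                   # 82012c
print(ber_decode_length(bytes.fromhex("82012c")))  # (300, 3)
```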

Network Data Representation (NDR) is the data-encoding standard used in the Distributed Computing Environment (DCE). Unlike XDR and ASN.1, NDR uses receiver-makes-right. It does this by inserting an architecture tag at the front of each message; individual data items are untagged. NDR uses a compiler to generate stubs. This compiler takes a description of a program written in the Interface Definition Language (IDL) and generates the necessary stubs. IDL looks pretty much like C, and so essentially supports the C-type system.

Figure 188. NDR’s architecture tag.

Figure 188 illustrates the 4-byte architecture definition tag that is included at the front of each NDR-encoded message. The first byte contains two 4-bit fields. The first field, IntegrRep, defines the format for all integers contained in the message. A 0 in this field indicates big-endian integers, and a 1 indicates little-endian integers. The CharRep field indicates what character format is used: 0 means ASCII (American Standard Code for Information Interchange) and 1 means EBCDIC (an older, IBM-defined alternative to ASCII). Next, the FloatRep byte defines which floating-point representation is being used: 0 means IEEE 754, 1 means VAX, 2 means Cray, and 3 means IBM. The final 2 bytes are reserved for future use. Note that, in simple cases such as arrays of integers, NDR does the same amount of work as XDR, and so it is able to achieve the same performance.
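A receiver-makes-right decoder driven by this tag might be sketched as follows. The field values come from the description above; the assumption that the first 4-bit field (the integer format) occupies the high-order half of the first byte is ours, and the function names are hypothetical.

```python
import struct

INT_REP = {0: "big-endian", 1: "little-endian"}
CHAR_REP = {0: "ASCII", 1: "EBCDIC"}
FLOAT_REP = {0: "IEEE 754", 1: "VAX", 2: "Cray", 3: "IBM"}

def parse_arch_tag(tag: bytes):
    """Interpret the 4-byte architecture tag at the front of a message."""
    if len(tag) != 4:
        raise ValueError("architecture tag must be 4 bytes")
    int_rep = tag[0] >> 4       # assumed: integer format in the high-order 4 bits
    char_rep = tag[0] & 0x0F    # assumed: character format in the low-order 4 bits
    float_rep = tag[1]          # bytes 2 and 3 are reserved
    return INT_REP[int_rep], CHAR_REP[char_rep], FLOAT_REP[float_rep]

def read_int32(tag: bytes, payload: bytes) -> int:
    """Receiver-makes-right: decode using the sender's byte order."""
    fmt = ">i" if tag[0] >> 4 == 0 else "<i"
    return struct.unpack(fmt, payload[:4])[0]

little = bytes([0x10, 0x00, 0x00, 0x00])
print(parse_arch_tag(little))                   # ('little-endian', 'ASCII', 'IEEE 754')
print(read_int32(little, bytes([1, 0, 0, 0])))  # 1
```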

## ProtoBufs

Protocol Buffers (Protobufs, for short) provide a language-neutral and platform-neutral way of serializing structured data, commonly used with gRPC. They use a tagged strategy with a canonical intermediate form, where the stub on both sides is generated from a shared .proto file. This specification uses a simple C-like syntax, as the following example illustrates:

where message could roughly be interpreted as equivalent to typedef struct in C. The rest of the example is fairly intuitive, except that every field is given a numeric identifier to ensure uniqueness should the specification change over time, and each field can be annotated as being either required or optional.

The way Protobufs encode integers is novel. They use a technique called varints (variable length integers) in which each 8-bit byte uses the most significant bit to indicate whether there are more bytes in the integer, and the lower seven bits to encode the two’s complement representation of the next group of seven bits in the value. The least significant group is first in the serialization.

This means a small integer (less than 128) can be encoded in a single byte (e.g., the integer 2 is encoded as 0000 0010), while an integer of 128 or more needs more bytes. For example, 365 would be encoded as the two bytes 1110 1101 0000 0010.

To see this, first drop the most significant bit from each byte, as it is there only to tell us whether we’ve reached the end of the integer. In this example, the 1 in the most significant bit of the first byte indicates there is more than one byte in the varint. Dropping those bits leaves the two 7-bit groups 110 1101 and 000 0010.

Since varints store numbers with the least significant group first, you next reverse the two groups of seven bits, giving 000 0010 followed by 110 1101. Then you concatenate them to get your final value: 1 0110 1101 in binary, which is 256 + 64 + 32 + 8 + 4 + 1 = 365.
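The varint procedure for non-negative integers can be sketched as follows. The function names are our own and this is not the Protocol Buffers library; negative values are rejected to keep the example short.

```python
def encode_varint(value: int) -> bytes:
    """Emit 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)   # more bytes follow
        else:
            out.append(group)          # last byte: most significant bit is 0
            return bytes(out)

def decode_varint(data: bytes) -> int:
    """Drop each continuation bit and reassemble the 7-bit groups."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")

print(encode_varint(2).hex())                  # 02
print(encode_varint(365).hex())                # ed02
print(decode_varint(bytes.fromhex("ed02")))    # 365
```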

For the larger message specification, you can think of the serialized byte stream as a collection of key/value pairs, where the key (i.e., tag) has two sub-parts: the unique identifier for the field (i.e., those extra numbers in the example .proto file) and the wire type of the value (e.g., Varint is the one example wire type we have seen so far). Other supported wire types include 32-bit and 64-bit (for fixed-length integers), and length-delimited (for strings and embedded messages). The latter tells you how many bytes long the embedded message (structure) is, but it’s another message specification in the .proto file that tells you how to interpret those bytes.
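A sketch of the key/value layout follows. In the Protocol Buffers wire format the key is itself a varint holding the field identifier shifted left by three bits, combined with the wire type in the low three bits. The field numbers and values used below are made up for illustration, and the helper names are ours.

```python
WIRE_VARINT = 0
WIRE_64BIT = 1
WIRE_LENGTH_DELIMITED = 2
WIRE_32BIT = 5

def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)

def encode_key(field_number: int, wire_type: int) -> bytes:
    """Key = field identifier in the upper bits, wire type in the low 3 bits."""
    return encode_varint((field_number << 3) | wire_type)

def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_key(field_number, WIRE_VARINT) + encode_varint(value)

def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    # Length-delimited: key, then byte count as a varint, then the bytes.
    return encode_key(field_number, WIRE_LENGTH_DELIMITED) + encode_varint(len(data)) + data

# Hypothetical message: field 1 is a string, field 2 is an integer.
message = encode_string_field(1, "hi") + encode_int_field(2, 365)
print(message.hex())  # 0a02686910ed02
```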

## 7.1.3 Markup Languages (XML)

Although we have been discussing the presentation formatting problem from the perspective of RPC—that is, how does one encode primitive data types and compound data structures so they can be sent from a client program to a server program—the same basic problem occurs in other settings. For example, how does a web server describe a Web page so that any number of different browsers know what to display on the screen? In this specific case, the answer is the HyperText Markup Language (HTML), which indicates that certain character strings should be displayed in bold or italics, what font type and size should be used, and where images should be positioned.

The availability of all sorts of Web applications and data has also created a situation in which different Web applications need to communicate with each other and understand each other’s data. For example, an e-commerce website might need to talk to a shipping company’s website to allow a customer to track a package without ever leaving the e-commerce website. This quickly starts to look a lot like RPC, and the approach taken in the Web today to enable such communication among web servers is based on the Extensible Markup Language (XML), a way to describe the data being exchanged between Web apps.

Markup languages, of which HTML and XML are both examples, take the tagged data approach to the extreme. Data is represented as text, and text tags known as markup are intermingled with the data text to express information about the data. In the case of HTML, markup indicates how the text should be displayed; other markup languages like XML can express the type and structure of the data.

XML is actually a framework for defining different markup languages for different kinds of data. For example, XML has been used to define a markup language that is roughly equivalent to HTML called Extensible HyperText Markup Language (XHTML). XML defines a basic syntax for mixing markup with data text, but the designer of a specific markup language has to name and define its markup. It is common practice to refer to individual XML-based languages simply as XML, but we will emphasize the distinction in this introductory material.

XML syntax looks much like HTML. For example, an employee record in a hypothetical XML-based language might look like the following XML document, which might be stored in a file named employee.xml. The first line indicates the version of XML being used, and the remaining lines represent four fields that make up the employee record, the last of which (hiredate) contains three subfields. In other words, XML syntax provides for a nested structure of tag/value pairs, which is equivalent to a tree structure for the represented data (with employee as the root). This is similar to XDR, ASN.1, and NDR’s ability to represent compound types, but in a format that can be both processed by programs and read by humans. More importantly, programs such as parsers can be used across different XML-based languages, because the definitions of those languages are themselves expressed as machine-readable data that can be input to the programs.

Although the markup and the data in this document are highly suggestive to the human reader, it is the definition of the employee record language that actually determines what tags are legal, what they mean, and what data types they imply. Without some formal definition of the tags, a human reader (or a computer) can’t tell whether 1986 in the year field, for example, is a string, an integer, an unsigned integer, or a floating point number.
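The point about types can be seen by parsing such a record with a generic XML parser. The document below is a made-up stand-in with the shape described in the text (four fields under employee, with title second and hiredate holding three subfields); the element names other than employee, title, hiredate, and year, and all of the values, are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical employee record; names and values are illustrative only.
DOCUMENT = """<?xml version="1.0"?>
<employee>
  <name>Jane Roe</name>
  <title>Network Engineer</title>
  <id>123456789</id>
  <hiredate>
    <day>5</day>
    <month>June</month>
    <year>1986</year>
  </hiredate>
</employee>"""

root = ET.fromstring(DOCUMENT)
fields = [child.tag for child in root]
year = root.find("hiredate/year").text

print(root.tag)                  # employee
print(fields)                    # ['name', 'title', 'id', 'hiredate']
print(repr(year), type(year))    # '1986' <class 'str'>
```

Without a schema, the parser can recover the tree structure but hands back every value as text; deciding that the year is an integer is left to whoever defines the language.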

The definition of a specific XML-based language is given by a schema, which is a database term for a specification of how to interpret a collection of data. Several schema languages have been defined for XML; we will focus here on the leading standard, known by the none-too-surprising name XML Schema. An individual schema defined using XML Schema is known as an XML Schema Document (XSD). The following is an XSD specification for the example; in other words, it defines the language to which the example document conforms. It might be stored in a file named employee.xsd.

This XSD looks superficially similar to our example document employee.xml, and for good reason: XML Schema is itself an XML-based language. There is an obvious relationship between this XSD and the document defined above. For example, the line of the XSD that declares the title element with type string indicates that the value bracketed by the markup title is to be interpreted as a string. The sequence and nesting of that line in the XSD indicate that a title field must be the second item in an employee record.

Unlike some schema languages, XML Schema provides datatypes such as string, integer, decimal, and Boolean. It allows the datatypes to be combined in sequences or nested, as in employee.xsd , to create compound data types. So an XSD defines more than a syntax; it defines its own abstract data model. A document that conforms to the XSD represents a collection of data that conforms to the data model.

The significance of an XSD defining an abstract data model and not just a syntax is that there can be other ways besides XML of representing data that conforms to the model. And XML does, after all, have some shortcomings as an on-the-wire representation: it is not as compact as other data representations, and it is relatively slow to parse. A number of alternative representations described as binary are in use. The International Organization for Standardization (ISO) has published one called Fast Infoset, while the World Wide Web Consortium (W3C) has produced the Efficient XML Interchange (EXI) proposal. Binary representations sacrifice human readability for greater compactness and faster parsing.

## XML Namespaces

XML has to solve a common problem, that of name clashes. The problem arises because schema languages such as XML Schema support modularity in the sense that a schema can be reused as part of another schema. Suppose two XSDs are defined independently, and both happen to define the markup name idNumber. Perhaps one XSD uses that name to identify employees of a company, and the other XSD uses it to identify laptop computers owned by the company. We might like to reuse those two XSDs in a third XSD for describing which assets are associated with which employees, but to do that we need some mechanism for distinguishing employees’ idNumbers from laptop idNumbers.

XML’s solution to this problem is XML namespaces. A namespace is a collection of names. Each XML namespace is identified by a Uniform Resource Identifier (URI). URIs will be described in some detail in a later chapter; for now, all you really need to know is that URIs are a form of globally unique identifier. (An HTTP URL is a particular type of URI.) A simple markup name like idNumber can be added to a namespace as long as it is unique within that namespace. Since the namespace is globally unique and the simple name is unique within the namespace, the combination of the two is a globally unique qualified name that cannot clash.

An XSD usually specifies a target namespace with a line like the following:

The value assigned to the target namespace is a Uniform Resource Identifier, identifying a made-up namespace. All the new markup defined in that XSD will belong to that namespace.

Now, if an XSD wants to reference names that have been defined in other XSDs, it can do so by qualifying those names with a namespace prefix. This prefix is a short abbreviation for the full URI that actually identifies the namespace. For example, the following line assigns emp as the namespace prefix for the employee namespace:

Any markup from that namespace would be qualified by prefixing it with emp:, as is title in the following line:

In other words, emp:title is a qualified name, which will not clash with the name title from some other namespace.
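To see qualified names at work, the sketch below parses a small document that mixes two namespaces and shows that the two idNumber elements stay distinct. It demonstrates prefixes in an instance document rather than in an XSD, and both namespace URIs, the prefix `asset`, and the element `assignment` are made up for the example.

```python
import xml.etree.ElementTree as ET

EMP_NS = "http://www.example.com/employee"   # made-up namespace URI
ASSET_NS = "http://www.example.com/asset"    # made-up namespace URI

DOCUMENT = f"""<assignment xmlns:emp="{EMP_NS}" xmlns:asset="{ASSET_NS}">
  <emp:idNumber>1001</emp:idNumber>
  <asset:idNumber>L-42</asset:idNumber>
  <emp:title>Engineer</emp:title>
</assignment>"""

root = ET.fromstring(DOCUMENT)

# The parser expands each prefix into the full URI, so the names cannot clash.
tags = [child.tag for child in root]
print(tags[0])  # {http://www.example.com/employee}idNumber
print(tags[1])  # {http://www.example.com/asset}idNumber

# Lookups use a prefix-to-URI mapping, mirroring the prefix declaration.
prefixes = {"emp": EMP_NS, "asset": ASSET_NS}
employee_id = root.find("emp:idNumber", prefixes).text
laptop_id = root.find("asset:idNumber", prefixes).text
print(employee_id, laptop_id)  # 1001 L-42
```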

It is remarkable how widely XML is now used in applications that range from RPC-style communication among Web-based services to office productivity tools to instant messaging. It is certainly one of the core protocols on which the upper layers of the Internet now depend.


## Everything You Need to Know about Network Diagrams: from Network Diagram Symbols to Best Practices

We all prefer graphics, images or any other type of visual representation over plain text.

Plain text is no fun and cannot retain our attention for a long span of time. Sometimes, it is difficult to understand as well. So, it is obvious that it is beneficial to use diagrams to showcase complex relationships or structures.

And, one of them is a network diagram .

It not only helps everyone on the team understand the structures, networks, and processes; it also comes in handy in project management, maintenance of network structures, debugging, etc.

Network diagrams demonstrate how a network works. This network diagram guide will teach you everything you need to know, from what is a network diagram to its symbols and how to make it.

Creately offers simple tools to draw network diagrams or one can simply select an existing template.

- What are network diagrams?
- What are the symbols used?
- Universally accepted terms
- What are the uses of a network diagram?
- Types of network diagram
- How to draw network diagrams?
- Network diagram templates
- Common errors to avoid
- Best practices of drawing a network diagram

## What are Network Diagrams?

As the name suggests, it is a visual representation of a cluster or a small structure of networking devices. It not only shows the components of this network but also depicts how they are interconnected.

While network diagrams were initially used to depict devices, they are now widely used for project management as well.

Network diagrams can be of two types:

Physical : This type of network diagram showcases the actual physical relationship between devices/components which make the network.

Logical : This type of diagram shows how the devices communicate with each other and information flows through the network. It is mostly used to depict subnets, network devices, and routing protocols.

## What are the Network Diagram Symbols Used?

These are the symbols commonly used in a network diagram. However, there are many other symbols which can make your network diagram precise and clear.

Once you select a network diagram template , Creately automatically loads the relevant symbols for you along with the names below it to make it simple and quick.

Isn’t that easy?

Below is the screenshot of the Creately dashboard and the symbols are marked with a red circle for reference. All you have to do is to drag and drop the symbol and create your own network diagram.

## Universally Accepted Terms

There are a few definitions used in network diagrams which you should be aware of.

Activity : It is an operation which is commonly represented by an arrow (to show directions mostly) with an end as well as a starting point.

It can be of 4 types:

Predecessor activity is to be completed before the start of another activity.

Successor activity cannot be initiated until the activities before it are completed. It should follow in immediate succession.

Concurrent activity is to be started at the same time.

Dummy activity does not use any resource but depicts dependence.

An Event is depicted by a circle (also known as a node) and denotes the completion of one or more activities and the start of new ones. Events can be classified into three types:

Merge event is where one or more activities connect with the event and merge.

Burst event is where one or more activities leave an event.

Merge and Burst event is where one or more activities merge and burst simultaneously.

Sequencing refers to the precedence relationships between devices or activities. The following questions can help you figure out the sequence:

- What job will follow or precede?
- What jobs can run (or will run) concurrently?
- What controls the start and finish?

## What are the Uses of Network Diagrams?

You can use network diagrams for multiple activities, including:

- Structuring a home or office network
- Understanding and troubleshooting bugs or errors
- Upgrading or updating an existing network
- Documentation for onboarding, communication, planning, etc.
- Tracking components, devices, or jobs
- Depicting the process and steps to be taken while implementing a project

## Types of Network Diagrams

Bus topology

These are the easiest to configure and require less cable than any other topology. The computers or network devices are connected to a single line (with two endpoints), or backbone. Hence, it is also popularly known as line topology.

While most bus topologies are linear, there is another form of bus network called a “distributed bus”. This topology connects different nodes to a common transmission point, and this point has two or more endpoints for adding further branches.

Bus topology is generally used when you have a small network and need to connect devices in a linear fashion. However, if the bus (or the line) breaks down or has a fault, it is difficult to identify the problem and troubleshoot.

Ring topology

As the name suggests, the network is in the form of a ring. Each device/node connects with exactly two others until it becomes a circle. Information is sent from node to node (in a circular fashion) until it reaches its destination.

It is easy to add or remove a node from ring topology unlike in bus topology. However, if any of the cables break or nodes fail then the entire network fails.

Star topology

Each node is separately and individually connected to a hub, thereby forming a star. All the information passes through the hub before it is sent to the destination.

While star topology takes up a lot more cable than other topologies, failure of any node will not affect the network. In addition, each node can be taken down easily in case of any breakage or failure. However, if the hub fails, the network will be stalled.

Mesh topology

In this type of network diagram, each node relays data for the network. It can be of two types: full mesh and partially connected mesh.

In a full mesh, each node is connected to every other node; in a partially connected mesh, nodes are connected to each other based on their interaction patterns.

Tree topology

It is a combination of bus and star topology.
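The cabling trade-offs between these topologies can be made concrete by listing the links each one needs. The sketch below is a simplified model in which a link is just a pair of node names; the function names and the four-node example are assumptions for illustration.

```python
from itertools import combinations

def ring(nodes):
    """Each node links to the next, and the last links back to the first."""
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]

def star(hub, nodes):
    """Every node links only to the hub."""
    return [(hub, node) for node in nodes]

def full_mesh(nodes):
    """Every pair of nodes is linked directly."""
    return list(combinations(nodes, 2))

nodes = ["A", "B", "C", "D"]
print(len(ring(nodes)))         # 4
print(len(star("hub", nodes)))  # 4
print(len(full_mesh(nodes)))    # 6
```

For n nodes a full mesh needs n(n-1)/2 links, which is why partially connected meshes are used as networks grow.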

## How to Draw a Network Diagram

Create a network diagram easily by following the steps below:

- First, plan and draft the network diagram on paper
- Choose the network topology
- Log in to Creately and choose a suitable network diagram template
- Drag and drop relevant network symbols and shapes from the Shape Library
- Label the symbols or devices in the network diagram
- Draw connecting lines to connect each component of the network
- Once you finish, export in the format you want

It is best to start mapping out the diagram with pen and paper. Once you have a draft, you can move to any diagramming tool (like Creately) designed for this purpose.

As mentioned earlier, all you need to do is to drag and drop symbols, lines, shapes etc. to depict connections. You can also select one of the thousand templates we have on Creately to save time and effort.

Choose the network topology : Depending on your end goal, the topology would differ. Network diagrams for a personal home network are much simpler (and, mostly linear) as compared to a rack network or VLAN network for office.

Once you have all the details of the connections, devices, etc. that you want to show, you can begin with the diagramming tool.

With Creately, you can use one of the multiple network diagram templates available.

Once you select a diagram template:

- Add relevant equipment (by inserting symbols): As shown above, Creately loads the relevant shapes, tools, arrows etc. You can begin by inserting computers, servers, routers, firewalls etc on the page.
- Label the symbols/devices: Add component names for clarity for anyone who wants to refer to the diagram. If you do not want to add the names (perhaps because the diagram would look cluttered), you can number them and attach a legend that describes each element.
- Draw Connecting Lines: Use lines and directional arrows to depict how each component is connected. Please see the best practices section to understand how lines and arrows should be drawn.

## Network Diagram Templates

Office Network Diagram Template

VLAN Network Diagram Template

Basic Network Diagram Template

## Common Network Diagram Errors

Looping : As the name suggests, this is a situation in which you end up creating an endless loop in the network diagram.

Dangling : This is a situation where an event is disconnected from other activities. While an activity merges into the event, no activity starts or emerges from it. Hence, that event is detached from the network.

Dummy : A dummy activity does not exist and is imaginary. It is used in the network diagram (usually represented by a dotted arrow) to show dependency or connectivity between two or more activities.

For example, A and B are concurrent. C is dependent on A; D is dependent on A and B. This relationship is shown with the help of the dotted arrow.
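Two of these errors can be checked mechanically if the diagram is written down as a list of activities, each an arrow from one event to another. The sketch below is one possible way to do that, assuming events are numbered and the end event is known; the function names are hypothetical.

```python
def has_loop(activities):
    """Return True if following the arrows can lead back to an earlier event."""
    successors = {}
    for start, end in activities:
        successors.setdefault(start, []).append(end)
        successors.setdefault(end, [])
    state = {}  # 1 = currently being explored, 2 = fully explored

    def visit(event):
        state[event] = 1
        for nxt in successors[event]:
            if state.get(nxt) == 1:
                return True          # arrow back into the current path
            if nxt not in state and visit(nxt):
                return True
        state[event] = 2
        return False

    return any(event not in state and visit(event) for event in successors)

def find_dangling(activities, end_event):
    """Events that activities merge into but none leave, other than the end event."""
    starts = {start for start, _ in activities}
    ends = {end for _, end in activities}
    return sorted(e for e in ends if e not in starts and e != end_event)

print(has_loop([(1, 2), (2, 3), (3, 1)]))         # True
print(has_loop([(1, 2), (2, 3)]))                 # False
print(find_dangling([(1, 2), (2, 4), (1, 3)], 4)) # [3]
```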

## Network Diagram Best Practices

As is the case with other diagrams, network diagrams have a few universally accepted symbols. You may also want to do a few other things to make the diagram more appealing.

However, if you are planning to use the diagram for official purposes, a presentation, a display, etc., it is always better to go with standard symbols.

But don't be upset. You can always use the symbols you want, as long as you present the information in a manner that is easy to understand and find.

A few other pointers:

- Avoid using arrows that cross each other
- Use straight arrows
- Do not represent time with the length of arrows
- Always draw arrows from left to right.
- Use as few dummies as possible (use them in your draft if need be).
- The network should have only one entry point, known as the start event, and one point of emergence, known as the end event.

## What’s Your Feedback on this Network Diagram Guide?

I hope this post (rather, guide!) will help you create awesome network diagrams. They are brilliant if you want to show complicated networks or processes in a simpler manner.

If you have any questions about drawing network diagrams or any suggestions to improve this guide, go ahead and leave a comment.


## FAQs About Network Diagrams

## What are the benefits of using network diagrams?

Here are some of the benefits of using network diagrams:

Helps in planning: Network diagrams provide a clear understanding of the project, its timeline and the relationship between various activities, making it easy to plan and allocate resources.

Facilitates communication: Network diagrams facilitate effective communication by providing a visual representation of the project timeline, and the interdependency of activities between teams, stakeholders and clients.

Improves efficiency: Network diagrams can help identify critical paths, bottlenecks and areas of inefficiency, allowing project managers to prioritize tasks and optimize processes.

Enhances risk management: By identifying potential risks and their impact on the project, network diagrams help in formulating contingency plans to minimize the risks.

## How to choose the right network diagram?

## What are the common mistakes to avoid when creating network diagrams?

Inaccurate information: Using inaccurate information such as incorrect durations or dependencies can lead to incorrect network diagrams. Ensure that all information used in the diagram is accurate and up-to-date.

Overcomplicating the diagram: Adding too much information or using too many symbols and lines can make the diagram hard to read and understand. Keep the diagram simple and easy to follow.

Not labeling activities and nodes: It is essential to label activities and nodes in the diagram clearly. Failing to label activities and nodes can cause confusion and make it challenging to understand the diagram.

Incorrect symbol usage: Each network diagram symbol has a specific meaning. Incorrectly using symbols can lead to misinterpretation of the diagram. Ensure that the symbols are used correctly and consistently throughout the diagram.

Ignoring the critical path: The critical path is the sequence of tasks that must be completed on time to ensure the project is completed on schedule. Ignoring the critical path can lead to delays in the project. Ensure that the critical path is identified and given priority in the diagram.
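To make the idea of a critical path concrete, here is a minimal sketch that finds the longest chain of dependent activities. The task names, durations, and the `critical_path` helper are hypothetical, and the sketch assumes the dependencies contain no loops.

```python
# Hypothetical activities: name -> (duration in days, list of predecessors).
tasks = {
    "A": (3, []),
    "B": (2, []),
    "C": (4, ["A"]),
    "D": (1, ["A", "B"]),
    "E": (2, ["C", "D"]),
}


def critical_path(tasks):
    """Return (project duration, activities on the critical path)."""
    finish = {}  # earliest finish time of each task
    via = {}     # predecessor that fixes each task's earliest start

    def earliest_finish(name):
        if name not in finish:
            duration, preds = tasks[name]
            start, best = 0, None
            for p in preds:
                if earliest_finish(p) > start:
                    start, best = earliest_finish(p), p
            finish[name] = start + duration
            via[name] = best
        return finish[name]

    # The project ends when the latest-finishing task ends.
    last = max(tasks, key=earliest_finish)

    # Walk back through the predecessors that determined each start time.
    path = [last]
    while via[path[-1]] is not None:
        path.append(via[path[-1]])
    return finish[last], path[::-1]


print(critical_path(tasks))  # (9, ['A', 'C', 'E'])
```

In this made-up project, delaying A, C, or E delays the whole project, while B and D have some slack.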

## What tools are available in Creately to successfully create a network diagram?


## Network Model in DBMS

The Network Model in a Database Management System (DBMS) is a data model that allows the representation of many-to-many relationships in a more flexible and complex structure compared to the Hierarchical Model. It uses a graph structure consisting of nodes (entities) and edges (relationships) to organize data, enabling more efficient and direct access paths.

## What is Network Model?

This model was formalized by the Data Base Task Group (DBTG) of CODASYL in the 1960s. It is a generalization of the hierarchical model. It can consist of multiple parent segments, and these segments are grouped into levels, but there is a logical association between the segments belonging to any level. Mostly, there is a many-to-many logical association between any two segments. These logical associations between segments form a graph. The model therefore replaces the hierarchical tree with a graph-like structure, which allows more general connections among different nodes. It can have M:N (many-to-many) relations, which allows a record to have more than one parent segment. Here, a relationship is called a set, and each set is made up of at least two types of record:

- An owner record, which is equivalent to the parent in the hierarchical model.
- A member record, which is equivalent to the child in the hierarchical model.

## Structure of a Network Model

A Network data model

In the above figure, member TWO has only one owner, ONE, whereas member FIVE has two owners, TWO and THREE. Each link between two record types represents a 1:M relationship between them. The model has both lateral and top-down connections between nodes. It therefore allows 1:1, 1:M, and M:N relationships among the given entities, which helps avoid data redundancy because multiple paths can lead to the same record. Examples of network DBMSs include TOTAL by Cincom Systems Inc. and EDMS by Xerox Corp.
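The owner and member links described for the figure can be written down as a small lookup structure. This is a sketch only: it lists just the links named in the text (the rest of the figure is omitted), and the `links`, `owners`, and `members` names are illustrative.

```python
from collections import defaultdict

# Links taken from the description of the figure, as (owner, member) pairs.
# Only the links named in the text are listed here.
links = [("ONE", "TWO"), ("TWO", "FIVE"), ("THREE", "FIVE")]

owners = defaultdict(list)   # member -> its owners
members = defaultdict(list)  # owner  -> its members
for owner, member in links:
    owners[member].append(owner)
    members[owner].append(member)

print(owners["TWO"])   # ['ONE']
print(owners["FIVE"])  # ['TWO', 'THREE']
```

A hierarchical model could not hold the second result, because each child there has exactly one parent.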

Example: Network model for a Finance Department.

Below we have designed the network model for a Finance Department:

Network model of Finance Department.

In a network model, a one-to-many (1:N) relationship is represented by a link between two record types. In the above figure, SALES-MAN, CUSTOMER, PRODUCT, INVOICE, PAYMENT, and INVOICE-LINE are the record types for the sales of a company. As you can see in the figure, INVOICE-LINE is owned by PRODUCT and INVOICE. INVOICE also has two owners, SALES-MAN and CUSTOMER.
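As a rough illustration of how an application navigates such a model, the sketch below builds a few record occurrences and walks from a customer to the products on that customer's invoices. The `Record` class, the `connect` and `products_bought` helpers, and all sample data are hypothetical; only the ownership links stated in the text are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    kind: str  # record type, e.g. "INVOICE"
    key: str   # hypothetical identifier
    # Links are excluded from repr/compare because they are cyclic.
    owners: list = field(default_factory=list, repr=False, compare=False)
    members: list = field(default_factory=list, repr=False, compare=False)


def connect(owner, member):
    """Add `member` to a set owned by `owner` (one 1:N link)."""
    owner.members.append(member)
    member.owners.append(owner)


# Hypothetical occurrences of the record types named in the text.
alice = Record("CUSTOMER", "Alice")
sam = Record("SALES-MAN", "Sam")
pen = Record("PRODUCT", "Pen")
inv = Record("INVOICE", "INV-1")
line = Record("INVOICE-LINE", "INV-1/1")

connect(alice, inv)  # INVOICE is owned by CUSTOMER ...
connect(sam, inv)    # ... and by SALES-MAN
connect(inv, line)   # INVOICE-LINE is owned by INVOICE ...
connect(pen, line)   # ... and by PRODUCT


def products_bought(customer):
    """Navigate CUSTOMER -> INVOICE -> INVOICE-LINE -> PRODUCT."""
    found = []
    for invoice in customer.members:
        for inv_line in invoice.members:
            found += [o.key for o in inv_line.owners if o.kind == "PRODUCT"]
    return found


print(products_bought(alice))  # ['Pen']
```

The same invoice line can also be reached by starting from the product, which is the "multiple paths to the same record" property mentioned earlier.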

Let's see another example, in which we have two segments, Faculty and Student. Say that student John takes courses in both the CS and EE departments. How many parent instances will there be?

In this example, a student instance can have at least two parent instances, so there are relations between the instances of the Student and Faculty segments. The model can become very complex if we add other segments, say Courses, and logical associations such as Student-Enroll and Faculty-Course. In this model, a student can be logically associated with various instances of Faculties and Courses.

## Advantages of Network Model

- Conceptually, this model is simple and easy to design, like the hierarchical data model.
- This model can handle multiple types of relationships (1:1, 1:M, and M:N), which helps in modeling real-life applications.
- Data is easy to access in this model: an application can reach both the owner and the member records within a set.
- This model does not allow a member to exist without an owner, which helps enforce data integrity.
- Unlike the hierarchical model, this model is backed by a database standard (the CODASYL DBTG specification).
- This model can represent multi-parent relationships.

## Disadvantages of Network Model

- The schema or structure of this database is very complex because all records are maintained using pointers.
- Operational anomalies can occur because navigation relies on pointers, which also makes the implementation complex.
- The design or the structure of this model is not user-friendly.
- This model offers no scope for automated query optimization.
- This model fails to achieve structural independence, even though it is capable of achieving data independence.

## Features of Network Model in DBMS

- Data Relationship Representation: The network model uses a graph structure to represent data relationships. It allows many-to-many relationships, providing greater flexibility in how data is connected.
- Records and Sets: Data in a network model is organized into records and sets. Records are similar to rows in a relational table, and sets are used to define relationships between records, akin to links in a graph.
- Owner-Member Relationships: The network model defines data relationships using owner-member pairs. An owner record can be linked to multiple member records, and a member record can belong to multiple owner records, facilitating complex relationships.
- Navigational Access: The network model supports navigational data access, where records are accessed through predefined paths. This is different from relational models, which use declarative query languages like SQL.
- Hierarchical and Non-Hierarchical Structures: The network model can represent both hierarchical (tree-like) and non-hierarchical (graph-like) structures, providing flexibility in data modeling.

## Operations on Network Model in DBMS

- Insertion: Adding new records and establishing owner-member relationships.
- Deletion: Removing records and maintaining data integrity by handling related records and relationships.
- Update: Modifying existing records and relationships between records.
- Traversal: Navigating through the network structure to access related records using predefined paths.
- Search: Retrieving specific records based on criteria by navigating the network structure.
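The five operations can be sketched with a toy in-memory store. Everything here is an assumption made for illustration: the `NetworkStore` class, its method names, and the sample data are hypothetical and are not the interface of any real network DBMS. The sample reuses the student John who belongs to both the CS and EE departments.

```python
class NetworkStore:
    """Toy in-memory store; class and method names are illustrative only."""

    def __init__(self):
        self.records = {}  # key -> dict of field values
        self.members = {}  # owner key -> set of member keys

    def insert(self, key, owner_keys=(), **fields):
        """Add a record and link it to each of its owners."""
        self.records[key] = fields
        self.members[key] = set()
        for owner in owner_keys:
            self.members[owner].add(key)

    def update(self, key, **fields):
        self.records[key].update(fields)

    def delete(self, key):
        """Remove a record and every link that mentions it.

        Members left without an owner are kept in this sketch; a real
        system would apply its own retention rule.
        """
        del self.records[key]
        del self.members[key]
        for linked in self.members.values():
            linked.discard(key)

    def traverse(self, start):
        """Return all keys reachable from `start` via owner -> member links."""
        seen, stack = set(), [start]
        while stack:
            for member in self.members[stack.pop()]:
                if member not in seen:
                    seen.add(member)
                    stack.append(member)
        return sorted(seen)

    def search(self, start, **criteria):
        """Navigate from `start` and keep the records matching all criteria."""
        return [
            key for key in self.traverse(start)
            if all(self.records[key].get(f) == v for f, v in criteria.items())
        ]


db = NetworkStore()
db.insert("CS", name="Computer Science")
db.insert("EE", name="Electrical Engineering")
db.insert("john", owner_keys=["CS", "EE"], name="John")
db.insert("mary", owner_keys=["CS"], name="Mary")

print(db.traverse("CS"))             # ['john', 'mary']
print(db.search("EE", name="John"))  # ['john']
db.delete("john")
print(db.traverse("EE"))             # []
```

Note that every lookup starts from a known record and follows links, which is what distinguishes navigational access from a declarative query.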

## Difference Between the Network Model and the Hierarchical Model

| Feature | Hierarchical Model | Network Model |
|---|---|---|
| Structure | Tree-like structure | Graph structure |
| Relationships | One-to-many (single parent, multiple children) | Many-to-many (multiple parents and children) |
| Flexibility | Less flexible | More flexible |
| Access paths | Single access path | Multiple access paths |
| Redundancy | Higher redundancy due to rigid hierarchy | Lower redundancy due to shared relationships |
| Complexity | Simpler to design and implement | More complex to design and manage |
| Suitability | Suitable for simple, hierarchical data structures | Suitable for complex, interconnected data structures |
| Efficiency | Efficient for hierarchical traversal | Efficient for complex queries and data retrieval |
| Example | Organizational chart | Telecommunications network |

The network model in DBMS offers a flexible way to represent complex data relationships through its graph-based structure. While it allows for many-to-many relationships and more intricate data connections compared to the hierarchical model, it also requires more sophisticated navigational access methods. Understanding its features and operations helps in leveraging its capabilities for scenarios that involve complex data interactions.

## Frequently Asked Questions on Network Model in DBMS – FAQs

## What are the main advantages of the network model?

The network model’s primary advantages include its ability to handle many-to-many relationships and represent complex data structures. It also provides efficient navigational access to data through predefined paths.

## How does the network model handle data integrity?

The network model maintains data integrity through its owner-member relationships, ensuring that changes in one part of the database appropriately cascade to related records. This helps preserve consistency across the database.

## In what scenarios is the network model particularly useful?

The network model is particularly useful in applications that require complex relationships between data entities, such as telecommunications, transportation networks, and inventory management systems. It is also beneficial in scenarios where efficient navigational access is crucial.


3.3. Representations of Networks. Now that you know how to represent networks with matrices and have some ideas of properties of networks, let's take a step back and take a look at what network representation is in general, and the different ways you might think about representing networks to understand different aspects of the network.

Network visualization is the practice of creating and displaying graphical representations of network devices, network metrics, and data flows. In plain speak, it's the visual side of network monitoring and analysis. There are a variety of different subcategories of network visualization, including network maps, graphs, charts, and matrices.

Graphs: Network Representation. A network G (also called a graph) is a set of nodes N = {1, ..., n} joined by edges (or links). We will mostly focus on simple graphs: graphs with no self-edges or multi-edges. A network is typically represented by its adjacency matrix, which is an n × n matrix A = [A_ij], i, j ∈ N, where ...
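The excerpt is cut off before it defines the matrix entries. The sketch below assumes the usual convention for an undirected simple graph, where A_ij is 1 if nodes i and j are joined by an edge and 0 otherwise; the `adjacency_matrix` helper and the sample edges are hypothetical.

```python
def adjacency_matrix(n, edges):
    """Build the n x n matrix of an undirected simple graph on nodes 1..n.

    Assumed convention: entry [i-1][j-1] is 1 if nodes i and j share an edge.
    """
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError("simple graphs have no self-edges")
        A[i - 1][j - 1] = 1
        A[j - 1][i - 1] = 1  # undirected, so the matrix is symmetric
    return A


# A hypothetical 4-node ring: 1-2, 2-3, 3-4, 4-1.
A = adjacency_matrix(4, [(1, 2), (2, 3), (3, 4), (4, 1)])
for row in A:
    print(row)

# Each row sum is the degree of the corresponding node.
degrees = [sum(row) for row in A]
print(degrees)  # [2, 2, 2, 2]
```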

What is network representation learning and why is it important? Part 1: Node embeddings. Learning low-dimensional embeddings of nodes in complex networks (e.g., DeepWalk and node2vec). Part 2: Graph neural networks. Techniques for deep learning on network/graph structured data (e.g., graph convolutional networks and GraphSAGE).

Omnipresent network/graph data generally have the characteristics of nonlinearity, sparseness, dynamicity and heterogeneity, which bring numerous challenges to network related analysis problem. Recently, influenced by the excellent ability of deep learning to learn representation from data, representation learning for network data has gradually become a new research hotspot. Network ...

With the rise of large-scale social networks, network mining has become an important sub-domain of data mining. Generating an efficient network representation is one important challenge in applying machine learning to network data. Recently, representation learning methods are widely used in various domains to generate low dimensional latent features from complex high dimensional data. A ...

Network representation learning aims to learn a projection from given network data in the original topological space to a low-dimensional vector space, while encoding a variety of structural and semantic information. The vector representation obtained could effectively support extensive tasks such as node classification, node clustering, link ...

Network representation learning offers a revolutionary paradigm for mining and learning with network data. In this tutorial, we will give a systematic introduction for representation learning on networks. We will start the tutorial with industry examples from Alibaba, AMiner, Microsoft Academic, WeChat, and XueTangX to explain how network ...

Representation of large data sets became a key question of many scientific disciplines in the last decade. Several approaches for network visualization, data ordering and coarse-graining ...

In this survey, we perform a comprehensive review of the current literature on network representation learning in the data mining and machine learning field. We propose new taxonomies to categorize and summarize the state-of-the-art network representation learning techniques according to the underlying learning mechanisms, the network ...


Network Data Representation (NDR) is a data encoding and decoding method used in distributed systems, specifically in Remote Procedure Call (RPC) systems. NDR is part of the Distributed Computing Environment (DCE) RPC, allowing different computers and systems to communicate by exchanging data in a standardized format. NDR facilitates data ...

Signed-Magnitude Representation of Negative Numbers. Add an extra bit on the left to represent the sign. Use 0 for the "+" sign and 1 for the "-" sign. Example (3 bits allocated for the magnitude, 1 bit for the sign): 0101 = 5 (base 10), 1101 = -5 (base 10). Problems with the signed-magnitude representation: two representations of 0: 0000 and 1000;
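The encoding rule above can be tried out with a short sketch. The function names and the default of 3 magnitude bits are choices made for this illustration, matching the example in the text.

```python
def to_signed_magnitude(value, magnitude_bits=3):
    """Encode an integer as a sign bit followed by its magnitude bits."""
    if abs(value) >= 2 ** magnitude_bits:
        raise ValueError("magnitude does not fit in the allotted bits")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{magnitude_bits}b")


def from_signed_magnitude(bits):
    """Decode a signed-magnitude bit string back to an integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


print(to_signed_magnitude(5))         # 0101
print(to_signed_magnitude(-5))        # 1101
print(from_signed_magnitude("0000"))  # 0
print(from_signed_magnitude("1000"))  # 0
```

The last two lines show the first problem listed above: both 0000 and 1000 decode to zero.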

7.1 Presentation Formatting. One of the most common transformations of network data is from the representation used by the application program into a form that is suitable for transmission over a network and vice versa. This transformation is typically called presentation formatting. As illustrated in Figure 179, the sending program translates the data it wants to transmit from the ...

Network representation learning (NRL) is an effective graph analytics technique and promotes users to deeply understand the hidden characteristics of graph data. It has been successfully applied in many real-world tasks related to network science, such as social network data processing, biological information processing, and recommender systems. Deep Learning is a powerful tool to learn data ...

A network diagram is a visual representation of a cluster or a small structure of networking devices. Learn more about network diagrams, covering everything from the basics of network diagram symbols and types to best practices for creating and using network diagrams. ... In this type of network diagram, each node relays data for the network ...


This information is called static data. Each time you call a method, Java allocates a new block of memory called a stack frame to hold its local variables. These stack frames come from a region of memory called the stack. Whenever you create a new object, Java allocates space from a pool of memory called the ...
