Analysis of Storm-0558 techniques for unauthorized email access

July 14, 2023

Executive summary

On July 11, 2023, Microsoft published two blogs, on the Microsoft Security Response Center and Microsoft on the Issues, detailing a malicious campaign by a threat actor tracked as Storm-0558 that targeted customer email and that we have detected and mitigated. As we continue our investigation into this incident and deploy defense in depth measures to harden all systems involved, we are providing this deeper analysis of the actor’s observed techniques for obtaining unauthorized access to email data, along with its tools and unique infrastructure characteristics.

As described in more detail in our July 11 blogs, Storm-0558 is a China-based threat actor with espionage objectives. Beginning May 15, 2023, Storm-0558 used forged authentication tokens to access user email from approximately 25 organizations, including government agencies and related consumer accounts in the public cloud. No other environment was impacted. Microsoft has successfully blocked this campaign from Storm-0558. As with any observed nation-state actor activity, Microsoft has directly notified targeted or compromised customers, providing them with important information needed to secure their environments.

Since identification of this malicious campaign on June 16, 2023, Microsoft has identified the root cause, established durable tracking of the campaign, disrupted malicious activities, hardened the environment, notified every impacted customer, and coordinated with multiple government entities. We continue to investigate and monitor the situation and will take additional steps to protect customers.

Actor overview

Microsoft Threat Intelligence assesses with moderate confidence that Storm-0558 is a China-based threat actor with activities and methods consistent with espionage objectives. While we have discovered some minimal overlaps with other Chinese groups such as Violet Typhoon (ZIRCONIUM, APT31), we maintain high confidence that Storm-0558 operates as its own distinct group.

Figure 1 shows Storm-0558 working patterns from April to July 2023; the actor’s core working hours are consistent with working hours in China, Monday through Friday from 12:00 AM UTC (8:00 AM China Standard Time) through 09:00 AM UTC (5:00 PM China Standard Time).

Heatmap showing observed Storm-0558 activity by day of the week (x-axis) and hour (y-axis).
Figure 1. Heatmap of observed Storm-0558 activity by day of week and hour (UTC).

In past activity observed by Microsoft, Storm-0558 has primarily targeted US and European diplomatic, economic, and legislative governing bodies, and individuals connected to Taiwan and Uyghur geopolitical interests. 

Historically, this threat actor has displayed an interest in targeting media companies, think tanks, and telecommunications equipment and service providers. The objective of most Storm-0558 campaigns is to obtain unauthorized access to email accounts belonging to employees of targeted organizations. Storm-0558 pursues this objective through credential harvesting, phishing campaigns, and OAuth token attacks. This threat actor has displayed an interest in OAuth applications, token theft, and token replay against Microsoft accounts since at least August 2021. Storm-0558 operates with a high degree of technical tradecraft and operational security. The actors are keenly aware of the target’s environment, logging policies, authentication requirements, policies, and procedures. Storm-0558’s tooling and reconnaissance activity suggests the actor is technically adept, well resourced, and has an in-depth understanding of many authentication techniques and applications.

In the past, Microsoft has observed Storm-0558 obtain credentials for initial access through phishing campaigns. The actor has also exploited vulnerabilities in public-facing applications to gain initial access to victim networks. These exploits typically result in web shells, including China Chopper, being deployed on compromised servers. One of the most prevalent malware families used by Storm-0558 is a shared tool tracked by Microsoft as Cigril. This family exists in several variants and is launched using dynamic-link library (DLL) search order hijacking.

After gaining access to a compromised system, Storm-0558 accesses credentials from a variety of sources, including the LSASS process memory and Security Account Manager (SAM) registry hive. Microsoft assesses that once Storm-0558 has access to the desired user credentials, the actor signs into the compromised user’s cloud email account with the valid account credentials. The actor then collects information from the email account over the web service.

Initial discovery and analysis of current activity

On June 16, 2023, Microsoft was notified by a customer of anomalous Exchange Online data access. Microsoft analysis attributed the activity to Storm-0558 based on established prior TTPs. We determined that Storm-0558 was accessing the customer’s Exchange Online data using Outlook Web Access (OWA). Microsoft’s investigative workflow initially assumed the actor was stealing correctly issued Azure Active Directory (Azure AD) tokens, most probably using malware on infected customer devices. Microsoft analysts later determined that the actor’s access was utilizing Exchange Online authentication artifacts, which are typically derived from Azure AD authentication tokens (Azure AD tokens). Further in-depth analysis over the next several days led Microsoft analysts to assess that the internal Exchange Online authentication artifacts did not correspond to Azure AD tokens in Microsoft logs.

Microsoft analysts began investigating the possibility that the actor was forging authentication tokens using an acquired Azure AD enterprise signing key. In-depth analysis of the Exchange Online activity discovered that in fact the actor was forging Azure AD tokens using an acquired Microsoft account (MSA) consumer signing key. This was made possible by a validation error in Microsoft code. The use of an incorrect key to sign the requests allowed our investigation teams to see all actor access requests which followed this pattern across both our enterprise and consumer systems. Use of the incorrect key to sign this scope of assertions was an obvious indicator of the actor activity as no Microsoft system signs tokens in this way. Use of acquired signing material to forge authentication tokens to access customer Exchange Online data differs from previously observed Storm-0558 activity. Microsoft’s investigations have not detected any other use of this pattern by other actors and Microsoft has taken steps to block related abuse.

Actor techniques

Token forgery

Authentication tokens are used to validate the identity of entities requesting access to resources – in this case, email. These tokens are issued to the requesting entity (such as a user’s browser) by identity providers like Azure AD. To prove authenticity, the identity provider signs the token using a private signing key. The relying party validates the token presented by the requesting entity by using a public validation key. Any request whose signature is correctly validated by the published public validation key will be trusted by the relying party. An actor that can acquire a private signing key can then create falsified tokens with valid signatures that will be accepted by relying parties. This is called token forgery.
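
To make the forgery mechanics concrete, the following minimal PowerShell sketch (our illustration only, not Microsoft’s token format or signing infrastructure; it assumes PowerShell 7 or .NET Framework 4.7.2+) shows why possession of the private signing key is sufficient: any claims signed with it pass validation against the published public key.

# Illustration only: an identity provider's private key signs claims; a relying
# party that trusts the matching public key accepts any such signature.
$sigAlg  = [System.Security.Cryptography.HashAlgorithmName]::SHA256
$padding = [System.Security.Cryptography.RSASignaturePadding]::Pkcs1

$issuerKey = [System.Security.Cryptography.RSA]::Create(2048)     # private signing key
$claims    = [Text.Encoding]::UTF8.GetBytes('{"sub":"user@example.com","aud":"email"}')
$signature = $issuerKey.SignData($claims, $sigAlg, $padding)

# The relying party holds only the public half of the key pair...
$relyingParty = [System.Security.Cryptography.RSA]::Create()
$relyingParty.ImportParameters($issuerKey.ExportParameters($false))

# ...yet it accepts any claims signed with the private key, which is why a stolen
# signing key lets an actor mint tokens for arbitrary identities.
$relyingParty.VerifyData($claims, $signature, $sigAlg, $padding)   # outputs True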

Storm-0558 acquired an inactive MSA consumer signing key and used it to forge authentication tokens for Azure AD enterprise and MSA consumer to access OWA and Outlook.com. All MSA keys active prior to the incident – including the actor-acquired MSA signing key – have been invalidated. Azure AD keys were not impacted. The method by which the actor acquired the key is a matter of ongoing investigation. Though the key was intended only for MSA accounts, a validation issue allowed this key to be trusted for signing Azure AD tokens. This issue has been corrected.

As part of defense in depth, we continuously update our systems. We have substantially hardened key issuance systems since the acquired MSA key was initially issued. This includes increased isolation of the systems, refined monitoring of system activity, and moving to the hardened key store used for our enterprise systems. We have revoked all previously active keys and issued new keys using these updated systems. Our active investigation indicates these hardening and isolation improvements disrupt the mechanisms we believe the actor could have used to acquire MSA signing keys. No key-related actor activity has been observed since Microsoft invalidated the actor-acquired MSA signing key. Further, we have seen Storm-0558 transition to other techniques, which indicates that the actor is not able to utilize or access any signing keys. We continue to explore other ways the key may have been acquired and add additional defense in depth measures.

Identity techniques for access

Once authenticated through a legitimate client flow leveraging the forged token, the threat actor accessed the OWA API to retrieve a token for Exchange Online from the GetAccessTokenForResource API used by OWA. Due to a design flaw, the actor was able to obtain new access tokens by presenting one previously issued from this API. This flaw in the GetAccessTokenForResource API has since been fixed so that it only accepts tokens issued from Azure AD or MSA, respectively. The actor used these tokens to retrieve mail messages from the OWA API.

Actor tooling

Microsoft Threat Intelligence routinely identifies threat actor capabilities and leverages file intelligence to facilitate our protection of Microsoft customers. During this investigation, we identified several distinct Storm-0558 capabilities that facilitate the threat actor’s intrusion techniques. The capabilities described in this section are not expected to be present in the victim environment.

Storm-0558 uses a collection of PowerShell and Python scripts to perform REST API calls against the OWA Exchange Store service. For example, Storm-0558 has the capability to use minted access tokens to extract email data such as:

  • Download emails
  • Download attachments
  • Locate and download conversations
  • Get email folder information

The generated web requests can be routed through a Tor proxy or several hardcoded SOCKS5 proxy servers. The threat actor was observed using several User-Agents when issuing web requests, for example:

  • Client=REST;Client=RESTSystem;;
  • Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36
  • Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/106.0.1370.52
  • "Microsoft Edge";v="113", "Chromium";v="113", "Not-A.Brand";v="24"

The scripts contain highly sensitive hardcoded information such as bearer access tokens and email data, which the threat actor uses to perform the OWA API calls. The threat actor has the capability to refresh the access token for use in subsequent OWA commands.
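
For illustration only, the request pattern described above boils down to ordinary authenticated REST calls. A hypothetical PowerShell equivalent is sketched below; the endpoint, token, and proxy values are placeholders, not the actor’s actual tooling or an exact OWA API path.

# Placeholder values; shown only to illustrate a bearer-token REST call routed
# through a proxy with a spoofed User-Agent, as described above.
$accessToken = '<bearer token obtained elsewhere>'
$headers = @{ Authorization = "Bearer $accessToken" }

Invoke-RestMethod -Uri 'https://outlook.example/owa/api/GetConversationItems' `
    -Method Post `
    -Headers $headers `
    -UserAgent 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36' `
    -Proxy 'http://127.0.0.1:1080' `
    -ContentType 'application/json' `
    -Body '{}'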

Screenshot of Python code snippet of the token refresh functionality
Figure 2. Python code snippet of the token refresh functionality used by the threat actor.
Screenshot of PowerShell code snippet of OWA REST API
Figure 3. PowerShell code snippet of OWA REST API call to GetConversationItems.

Actor infrastructure

During significant portions of Storm-0558’s malicious activities, the threat actor leveraged dedicated infrastructure running the SoftEther proxy software. Proxy infrastructure complicates detection and attribution of Storm-0558 activities. During our response, Microsoft Threat Intelligence identified a unique method of profiling this proxy infrastructure and correlated it with behavioral characteristics of the actor’s intrusion techniques. Our profile was based on the following facets (a simple certificate check along these lines is sketched after the list):

  1. Hosts operating as part of this network present a JARM fingerprint consistent with SoftEther VPN: 06d06d07d06d06d06c42d42d000000cdb95e27fd8f9fee4a2bec829b889b8b.
  2. Presented x509 certificate has expiration date of December 31, 2037.
  3. Subject information within the x509 certificate does not contain “softether”.
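
As an illustration of how the x509 facets above can be checked, the hypothetical PowerShell helper below (our own sketch, not Microsoft’s internal tooling; JARM fingerprinting itself requires a dedicated scanner) pulls the presented certificate from a host and tests its expiration date and subject.

# Hypothetical helper: report the certificate facets described above for one host.
function Test-SoftEtherLikeCert {
    param([string]$TargetHost, [int]$Port = 443)
    $client = [System.Net.Sockets.TcpClient]::new($TargetHost, $Port)
    # Accept any chain; we only want to inspect the certificate that is presented.
    $ssl = [System.Net.Security.SslStream]::new($client.GetStream(), $false, { param($s, $cert, $chain, $errors) $true })
    try {
        $ssl.AuthenticateAsClient($TargetHost)
        $cert = [System.Security.Cryptography.X509Certificates.X509Certificate2]::new($ssl.RemoteCertificate)
        [pscustomobject]@{
            TargetHost     = $TargetHost
            Subject        = $cert.Subject
            NotAfter       = $cert.NotAfter
            MatchesProfile = (($cert.NotAfter.Date -eq [datetime]'2037-12-31') -and
                              ($cert.Subject -notmatch 'softether'))
        }
    } finally {
        $ssl.Dispose()
        $client.Dispose()
    }
}

Running this against candidate hosts and combining the result with a JARM scan approximates the profiling facets listed above.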

Over the course of the campaign, the IPs listed in the table below were used during the corresponding timeframes.

IP address           First seen   Last seen    Description
51.89.156[.]153      3/9/2023     7/10/2023    SoftEther proxy
176.31.90[.]129      3/28/2023    6/29/2023    SoftEther proxy
137.74.181[.]100     3/31/2023    7/11/2023    SoftEther proxy
193.36.119[.]45      4/19/2023    7/7/2023     SoftEther proxy
185.158.248[.]159    4/24/2023    7/6/2023     SoftEther proxy
131.153.78[.]188     5/6/2023     6/29/2023    SoftEther proxy
37.143.130[.]146     5/12/2023    5/19/2023    SoftEther proxy
146.70.157[.]45      5/12/2023    6/8/2023     SoftEther proxy
185.195.200[.]39     5/15/2023    6/29/2023    SoftEther proxy
185.38.142[.]229     5/15/2023    7/12/2023    SoftEther proxy
146.70.121[.]44      5/17/2023    6/29/2023    SoftEther proxy
31.42.177[.]181      5/22/2023    5/23/2023    SoftEther proxy
185.51.134[.]52      6/7/2023     7/11/2023    SoftEther proxy
173.44.226[.]70      6/9/2023     7/11/2023    SoftEther proxy
45.14.227[.]233      6/12/2023    6/26/2023    SoftEther proxy
185.236.231[.]109    6/12/2023    7/3/2023     SoftEther proxy
178.73.220[.]149     6/16/2023    7/12/2023    SoftEther proxy
45.14.227[.]212      6/19/2023    6/29/2023    SoftEther proxy
91.222.173[.]225     6/20/2023    7/1/2023     SoftEther proxy
146.70.35[.]168      6/22/2023    6/29/2023    SoftEther proxy
146.70.157[.]213     6/26/2023    6/30/2023    SoftEther proxy
31.42.177[.]201      6/27/2023    6/29/2023    SoftEther proxy
5.252.176[.]8        7/1/2023     7/1/2023     SoftEther proxy
80.85.158[.]215      7/1/2023     7/9/2023     SoftEther proxy
193.149.129[.]88     7/2/2023     7/12/2023    SoftEther proxy
5.252.178[.]68       7/3/2023     7/11/2023    SoftEther proxy
116.202.251[.]8      7/4/2023     7/7/2023     SoftEther proxy
185.158.248[.]93     6/25/2023    6/26/2023    SoftEther proxy
20.108.240[.]252     6/25/2023    7/5/2023     SoftEther proxy
146.70.135[.]182     5/18/2023    6/22/2023    SoftEther proxy

As early as May 15, 2023, Storm-0558 shifted to using a separate series of dedicated infrastructure servers specifically for token replay and interaction with Microsoft services. It is likely that the dedicated infrastructure and supporting services configured on this infrastructure offered a more efficient manner of facilitating the actor’s activities. The dedicated infrastructure would host an actor-developed web panel that presented an authentication page at URI /#/login. The observed sign-in pages had one of two SHA-1 hashes: 80d315c21fc13365bba5b4d56357136e84ecb2d4 and 931e27b6f1a99edb96860f840eb7ef201f6c68ec.

Screenshot of the token web panel sign-in page
Figure 4. Token web panel sign-in page with SHA-1 hashes.

As part of the intelligence-driven response to this campaign, and in support of tracking, analyzing, and disrupting actor activity, analytics were developed to proactively track the dedicated infrastructure. Through this tracking, we identified the following dedicated infrastructure.

IP address           First seen   Last seen    Description
195.26.87[.]219      5/15/2023    6/25/2023    Token web panel
185.236.228[.]183    5/24/2023    6/11/2023    Token web panel
85.239.63[.]160      6/7/2023     6/11/2023    Token web panel
193.105.134[.]58     6/24/2023    6/25/2023    Token web panel
146.0.74[.]16        6/28/2023    7/4/2023     Token web panel
91.231.186[.]226     6/29/2023    7/4/2023     Token web panel
91.222.174[.]41      6/29/2023    7/3/2023     Token web panel
185.38.142[.]249     6/29/2023    7/2/2023     Token web panel

The last observed dedicated token replay infrastructure associated with this activity was stood down on July 4, 2023, roughly one day following the coordinated mitigation conducted by Microsoft. 

Post-compromise activity

Our telemetry and investigations indicate that post-compromise activity was limited to email access and exfiltration for targeted users.

Mitigation and hardening

No customer action is required to mitigate the token forgery technique or validation error in OWA or Outlook.com. Microsoft has mitigated this issue on customers’ behalf as follows:

  • On June 26, OWA stopped accepting tokens issued from GetAccessTokensForResource for renewal, which mitigated the token renewal being abused.
  • On June 27, Microsoft blocked the usage of tokens signed with the acquired MSA key in OWA preventing further threat actor enterprise mail activity.
  • On June 29, Microsoft completed replacement of the key to prevent the threat actor from using it to forge tokens. Microsoft revoked all MSA signing keys which were valid at the time of the incident, including the actor-acquired MSA key. The new MSA signing keys are issued in substantially updated systems which benefit from hardening not present at issuance of the actor-acquired MSA key:
    • Microsoft has increased the isolation of these systems from corporate environments, applications, and users.
    • Microsoft has refined monitoring of all systems related to key activity, and increased automated alerting related to this monitoring.
    • Microsoft has moved the MSA signing keys to the key store used for our enterprise systems.
  • On July 3, Microsoft blocked usage of the key for all impacted consumer customers to prevent use of previously-issued tokens.

Ongoing monitoring indicates that all actor activity related to this incident has been blocked. Microsoft will continue to monitor Storm-0558 activity and implement protections for our customers.

Recommendations

Microsoft has mitigated this activity on our customers’ behalf for Microsoft services. No customer action is required to prevent threat actors from using the techniques described above to access Exchange Online and Outlook.com.

Indicators of compromise

Indicator                                    Type          First seen   Last seen    Description
d4b4cccda9228624656bff33d8110955779632aa     Thumbprint                              Thumbprint of acquired signing key
195.26.87[.]219                              IPv4          5/15/2023    6/25/2023    Token web panel
185.236.228[.]183                            IPv4          5/24/2023    6/11/2023    Token web panel
85.239.63[.]160                              IPv4          6/7/2023     6/11/2023    Token web panel
193.105.134[.]58                             IPv4          6/24/2023    6/25/2023    Token web panel
146.0.74[.]16                                IPv4          6/28/2023    7/4/2023     Token web panel
91.231.186[.]226                             IPv4          6/29/2023    7/4/2023     Token web panel
91.222.174[.]41                              IPv4          6/29/2023    7/3/2023     Token web panel
185.38.142[.]249                             IPv4          6/29/2023    7/2/2023     Token web panel
51.89.156[.]153                              IPv4          3/9/2023     7/10/2023    SoftEther proxy
176.31.90[.]129                              IPv4          3/28/2023    6/29/2023    SoftEther proxy
137.74.181[.]100                             IPv4          3/31/2023    7/11/2023    SoftEther proxy
193.36.119[.]45                              IPv4          4/19/2023    7/7/2023     SoftEther proxy
185.158.248[.]159                            IPv4          4/24/2023    7/6/2023     SoftEther proxy
131.153.78[.]188                             IPv4          5/6/2023     6/29/2023    SoftEther proxy
37.143.130[.]146                             IPv4          5/12/2023    5/19/2023    SoftEther proxy
146.70.157[.]45                              IPv4          5/12/2023    6/8/2023     SoftEther proxy
185.195.200[.]39                             IPv4          5/15/2023    6/29/2023    SoftEther proxy
185.38.142[.]229                             IPv4          5/15/2023    7/12/2023    SoftEther proxy
146.70.121[.]44                              IPv4          5/17/2023    6/29/2023    SoftEther proxy
31.42.177[.]181                              IPv4          5/22/2023    5/23/2023    SoftEther proxy
185.51.134[.]52                              IPv4          6/7/2023     7/11/2023    SoftEther proxy
173.44.226[.]70                              IPv4          6/9/2023     7/11/2023    SoftEther proxy
45.14.227[.]233                              IPv4          6/12/2023    6/26/2023    SoftEther proxy
185.236.231[.]109                            IPv4          6/12/2023    7/3/2023     SoftEther proxy
178.73.220[.]149                             IPv4          6/16/2023    7/12/2023    SoftEther proxy
45.14.227[.]212                              IPv4          6/19/2023    6/29/2023    SoftEther proxy
91.222.173[.]225                             IPv4          6/20/2023    7/1/2023     SoftEther proxy
146.70.35[.]168                              IPv4          6/22/2023    6/29/2023    SoftEther proxy
146.70.157[.]213                             IPv4          6/26/2023    6/30/2023    SoftEther proxy
31.42.177[.]201                              IPv4          6/27/2023    6/29/2023    SoftEther proxy
5.252.176[.]8                                IPv4          7/1/2023     7/1/2023     SoftEther proxy
80.85.158[.]215                              IPv4          7/1/2023     7/9/2023     SoftEther proxy
193.149.129[.]88                             IPv4          7/2/2023     7/12/2023    SoftEther proxy
5.252.178[.]68                               IPv4          7/3/2023     7/11/2023    SoftEther proxy
116.202.251[.]8                              IPv4          7/4/2023     7/7/2023     SoftEther proxy

Further reading

For the latest security research from the Microsoft Threat Intelligence community, check out the Microsoft Threat Intelligence Blog: https://aka.ms/threatintelblog.

To get notified about new publications and to join discussions on social media, follow us on Twitter at https://twitter.com/MsftSecIntel.

Source :
https://www.microsoft.com/en-us/security/blog/2023/07/14/analysis-of-storm-0558-techniques-for-unauthorized-email-access/

How to Disable NTLM Authentication in Windows Domain

February 28, 2023

NTLM (NT LAN Manager) is a legacy Microsoft authentication protocol that dates back to Windows NT. Although Microsoft introduced the more secure Kerberos authentication protocol back in Windows 2000, NTLM (mostly NTLMv2) is still widely used for authentication on Windows domain networks. In this article, we will look at how to disable the NTLMv1 and NTLMv2 protocols, and switch to Kerberos in an Active Directory domain.

The key NTLMv1 problems:

  • weak encryption;
  • storing the password hash in the memory of the LSA service, from which it can be extracted using various tools (such as Mimikatz) and reused in pass-the-hash attacks;
  • the lack of mutual authentication between a server and a client, which enables traffic interception and unauthorized access to resources (tools such as Responder can capture NTLM data sent over the network and use it to access network resources);
  • and other vulnerabilities.

Some of these issues were addressed in the next version, NTLMv2, which uses more secure encryption algorithms and helps prevent common NTLM attacks. The NTLMv1 and LM authentication protocols are disabled by default starting with Windows 7 and Windows Server 2008 R2.

How to Enable NTLM Authentication Audit Logging?

Before completely disabling NTLM in a domain and switching to Kerberos, it is a good idea to ensure that there are no applications in the domain that require and use NTLM auth. There may be legacy devices or services on your network that still use NTLMv1 authentication instead of NTLMv2 (or Kerberos).

To track accounts or apps that use NTLM authentication, you can enable audit logging policies on all computers using GPO. Open the Default Domain Controller Policy, navigate to the Computer Configuration -> Windows Settings -> Security Settings -> Local Policies -> Security Options section, find and enable the Network Security: Restrict NTLM: Audit NTLM authentication in this domain policy and set its value to Enable all.

Network Security: Restrict NTLM: Audit NTLM authentication in this domain

In the same way, enable the following policies in the Default Domain Policy:

  • Network Security: Restrict NTLM: Audit Incoming NTLM Traffic – set its value to Enable auditing for domain accounts
  • Network security: Restrict NTLM: Outgoing NTLM traffic to remote servers – set its value to Audit all
Network Security: Restrict NTLM: Audit Incoming NTLM Traffic

Once these policies are enabled, events related to the use of NTLM authentication will appear in the Application and Services Logs -> Microsoft -> Windows -> NTLM section of the Event Viewer.

You can analyze the events on each server or collect them to the central Windows Event Log Collector.
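
For example, you can query this log with PowerShell (the log name below is the default operational log used for Restrict NTLM auditing; adjust it if your environment differs):

# List the most recent NTLM audit events recorded in the local NTLM operational log
Get-WinEvent -LogName 'Microsoft-Windows-NTLM/Operational' -MaxEvents 50 |
    Select-Object TimeCreated, Id, Message |
    Format-Table -AutoSize -Wrap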

You need to search for the events from the source Microsoft-Windows-Security-Auditing with the Event ID 4624 – “An Account was successfully logged on”. Note the information in the “Detailed Authentication Information” section. If there is NTLM in the Authentication Package value, then the NTLM protocol was used to authenticate this user.

Look at the value of Package Name (NTLM only). This line shows which protocol (LM, NTLMv1, or NTLMv2) was used for authentication. So you need to identify any servers/applications that are using the legacy protocol.

eventid 4624 source Microsoft-Windows-Security-Auditing ntlm usage

Also, if NTLM is used for authentication instead of Kerberos, Event ID 4776 will appear in the log:

The computer attempted to validate the credentials for an account
Authentication Package: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0

For example, to search for all NTLMv1 authentication events on all domain controllers, you can use the following PowerShell script:

# Get all domain controllers in the domain
$ADDCs = Get-ADDomainController -Filter *
$Now = Get-Date
$Yesterday = $Now.AddDays(-1)
$NewOutputFile = "c:\Events\$($Yesterday.ToString('yyyyddMM'))_AD_NTLMv1_events.log"
function GetEvents($DC) {
    Write-Host "Searching log on" $DC.HostName
    # Event ID 4624 logon events that mention "NTLM V1" indicate legacy NTLMv1 authentication
    $Events = Get-EventLog "Security" -After $Yesterday.Date -Before $Now.Date -ComputerName $DC.HostName -Message "*NTLM V1*" -InstanceId 4624
    foreach ($Event in $Events) {
        Write-Host $DC.HostName $Event.EventID $Event.TimeGenerated
        Out-File -FilePath $NewOutputFile -InputObject "$($Event.EventID), $($Event.MachineName), $($Event.TimeGenerated), $($Event.ReplacementStrings), $($Event.Message)" -Append
    }
}
foreach ($DC in $ADDCs) { GetEvents $DC }

Once you have identified the users and applications that use NTLM in your domain, try switching them to Kerberos (possibly using SPNs). Some applications need to be slightly reconfigured to use Kerberos authentication (Kerberos Authentication in IIS, Configure different browsers for Kerberos authentication, Create a Keytab File Using Kerberos Auth). From my own experience, even large commercial products still use NTLM instead of Kerberos, and some of them require updates or configuration changes. The goal is to identify which applications use NTLM authentication, and you now have a way to find that software and those devices.

Small open-source products, old models of various network scanners (which store scans in shared network folders), some NAS devices and other old hardware, legacy software and operating systems are likely to have authentication problems when NTLMv1 is disabled.

Those apps that cannot use Kerberos can be added to the exceptions. This allows them to use NTLM authentication even if it is disabled at the domain level. To do it, the Network security: Restrict NTLM: Add server exceptions for NTLM authentication in this domain policy is used. Add the names of the servers (NetBIOS names, IP addresses, or FQDN), on which NTLM authentication can be used, to the list of exceptions as well. Ideally, this exception list should be empty. You can use the wildcard character *.

GPO: Network security: Restrict NTLM: Add server exceptions for NTLM authentication in this domain

To use Kerberos authentication in an application, you must specify the DNS name of the server, instead of its IP address. If you specify an IP address when connecting to your resources, NTLM authentication will be used.

Configuring Active Directory to Force NTLMv2 via GPO

Before completely disabling NTLM in an AD domain, it is recommended that you first disable its more vulnerable version, NTLMv1. The domain administrator needs to make sure that the network does not allow the use of LM or NTLMv1 for authentication, because in some cases an attacker can use specially crafted requests to obtain an NTLM/LM response.

You can set the preferred authentication type using the domain GPO. Open the Group Policy Management Editor (gpmc.msc) and edit the Default Domain Controllers Policy. Go to the GPO section Computer Configurations -> Policies -> Windows Settings -> Security Settings -> Local Policies -> Security Options and find the policy Network Security: LAN Manager authentication level.

Network Security: LAN Manager authentication level - disable ntlm v1 and lm

There are 6 options to choose from in the policy settings:

  • Send LM & NTLM responses;
  • Send LM & NTLM responses – use NTLMv2 session security if negotiated;
  • Send NTLM response only;
  • Send NTLMv2 response only;
  • Send NTLMv2 response only. Refuse LM;
  • Send NTLMv2 response only. Refuse LM & NTLM.

The NTLM authentication options are listed in the order of their security improvement. By default, Windows 7 and later operating systems use the option Send NTLMv2 response only. If this option is enabled, client computers use NTLMv2 authentication, but AD domain controllers accept LM, NTLM, and NTLMv2 requests.

NTLMv2 can be used where the Kerberos protocol has failed and for some operations (for example, managing local groups and accounts on the domain-joined computers) or in workgroups.

You can change the policy value to the most secure option 6: “Send NTLMv2 response only. Refuse LM & NTLM”. This setting causes domain controllers to reject LM and NTLM requests as well.

You can also disable NTLMv1 through the registry. To do this, create a DWORD value named LmCompatibilityLevel (with a value between 0 and 5) under the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa. The value 5 corresponds to the policy option “Send NTLMv2 response only. Refuse LM & NTLM”.
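
For example, the following PowerShell command is an illustrative equivalent of the registry change described above and sets the most restrictive level:

# 5 = "Send NTLMv2 response only. Refuse LM & NTLM"
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Lsa' -Name 'LmCompatibilityLevel' -Type DWord -Value 5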

Make sure that the Network security: Do not store LAN Manager hash value on next password change policy is enabled in the same GPO section. It is enabled by default starting with Windows Vista / Windows Server 2008 and prevents the creation of an LM hash.

Network security: Do not store LAN Manager hash value on next password change

Once you have ensured that you are not using NTLMv1, you can go further and try to disable NTLMv2. NTLMv2 is a more secure authentication protocol, but it still falls well short of Kerberos: although NTLMv2 has fewer vulnerabilities than NTLMv1, captured authentication data can still be replayed, and the protocol does not support mutual authentication.

The main risk of disabling NTLM is the potential use of legacy or misconfigured applications that may still be using NTLM authentication. If this is the case, they will need to be updated or specially configured to switch to Kerberos.

If you have a Remote Desktop Gateway server on your network, you will need to make an additional configuration to prevent clients from connecting using NTLMv1. Create a registry entry:

REG add "HKLM\Software\Microsoft\Windows NT\CurrentVersion\TerminalServerGateway\Config\Core" /v EnforceChannelBinding /t REG_DWORD /d 1 /f

Restrict NTLM Completely and Use Kerberos Authentication in an AD

To check how authentication works in different applications in a domain without using NTLM, you can add the accounts of the required users to the Protected Users domain group (it is available since the Windows Server 2012 R2 release). Members of this security group can only authenticate using Kerberos (NTLM, Digest Authentication, or CredSSP are not allowed). This allows you to verify that Kerberos user authentication is working correctly in different apps.
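
For example, with the ActiveDirectory RSAT module you can add a test account to the group (the account name below is a placeholder):

# Members of Protected Users can authenticate only with Kerberos
Add-ADGroupMember -Identity 'Protected Users' -Members 'jdoe'
# Verify the membership
Get-ADGroupMember -Identity 'Protected Users' | Select-Object SamAccountName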

Then you can completely disable NTLM on the Active Directory domain using the Network Security: Restrict NTLM: NTLM authentication in this domain policy.

The policy has 5 options:

  • Disable: the policy is disabled (NTLM authentication is allowed in the domain);
  • Deny for domain accounts to domain servers: domain controllers reject NTLM authentication attempts to all servers in the domain when domain accounts are used, and the “NTLM is blocked” error message is displayed;
  • Deny for domain accounts: domain controllers reject NTLM authentication attempts for all domain accounts, and the “NTLM is blocked” error appears;
  • Deny for domain servers: NTLM authentication requests are denied for all servers unless the server name is on the exception list in the “Network security: Restrict NTLM: Add server exceptions for NTLM authentication in this domain” policy;
  • Deny all: the domain controllers block all NTLM requests for all domain servers and accounts.
GPO: Network Security: Restrict NTLM: NTLM authentication in this domain

Although NTLM is now disabled on the domain, it is still used to process local logins to computers (NTLM is always used for local user logons).

You can also disable incoming and outgoing NTLM traffic on domain computers using separate Default Domain Policy options:

  • Network security: Restrict NTLM: Incoming NTLM traffic = Deny all accounts
  • Network security: Restrict NTLM: Outgoing NTLM traffic to remote servers = Deny all

After enabling auditing, Event Viewer will also display EventID 6038 from the LsaSRV source when using NTLM for authentication:

Microsoft Windows Server has detected that NTLM authentication is presently being used between clients and this server. This event occurs once per boot of the server on the first time a client uses NTLM with this server.
NTLM is a weaker authentication mechanism. Please check:
Which applications are using NTLM authentication?
Are there configuration issues preventing the use of stronger authentication such as Kerberos authentication?
If NTLM must be supported, is Extended Protection configured?
eventid 6038 from lsasrv source: NTLM authentication is presently being used between clients and this server

You can check that Kerberos is used for user authentication with the command:

klist sessions

klist session - check authentication protocol used

This command shows that all users are Kerberos-authenticated (except the built-in local Administrator, who is always authenticated using NTLM).

If you are experiencing a lot of user account lockout events after disabling NTLM, take a close look at events with ID 4771 (Kerberos pre-authentication failed). Check the Failure Code in the event description; it indicates the reason and the source of the lockout.
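
For example, you can pull the most recent of these events from a domain controller’s Security log with PowerShell:

# Recent Kerberos pre-authentication failures; the Failure Code is in the message body
Get-WinEvent -FilterHashtable @{ LogName = 'Security'; Id = 4771 } -MaxEvents 50 |
    Select-Object TimeCreated, Message |
    Format-List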

Source :
https://woshub.com/disable-ntlm-authentication-windows/

8 Essential Tips for Data Protection and Cybersecurity in Small Businesses

Michelle Quill — June 6, 2023

Small businesses are often targeted by cybercriminals due to their lack of resources and security measures. Protecting your business from cyber threats is crucial to avoid data breaches and financial losses.

Why is cyber security so important for small businesses?

Small businesses are particularly vulnerable to cyberattacks, which can result in financial loss, data breaches, and damage to IT equipment. To protect your business, it’s important to implement strong cybersecurity measures.

Here are some tips to help you get started:

One important aspect of data protection and cybersecurity for small businesses is controlling access to customer lists. It’s important to limit access to this sensitive information to only those employees who need it to perform their job duties. Additionally, implementing strong password policies and regularly updating software and security measures can help prevent unauthorized access and protect against cyber attacks. Regular employee training on cybersecurity best practices can also help ensure that everyone in the organization is aware of potential threats and knows how to respond in the event of a breach.

When it comes to protecting customer credit card information in small businesses, there are a few key tips to keep in mind. First and foremost, it’s important to use secure payment processing systems that encrypt sensitive data. Additionally, it’s crucial to regularly update software and security measures to stay ahead of potential threats. Employee training and education on cybersecurity best practices can also go a long way in preventing data breaches. Finally, having a plan in place for responding to a breach can help minimize the damage and protect both your business and your customers.

Small businesses are often exposed to cyber attacks, making data protection and cybersecurity crucial. One area of particular concern is your company’s banking details. To protect this sensitive information, consider implementing strong passwords, two-factor authentication, and regular monitoring of your accounts. Additionally, educate your employees on safe online practices and limit access to financial information to only those who need it. Regularly backing up your data and investing in cybersecurity software can also help prevent data breaches.

Small businesses are often at high risk of cyber attacks due to their limited resources and lack of expertise in cybersecurity. To protect sensitive data, it is important to implement strong passwords, regularly update software and antivirus programs, and limit access to confidential information.

It is also important to have a plan in place in case of a security breach, including steps to contain the breach and notify affected parties. By taking these steps, small businesses can better protect themselves from cyber threats and ensure the safety of their data.

Tips for protecting your small business from cyber threats and data breaches are crucial in today’s digital age. One of the most important steps is to educate your employees on cybersecurity best practices, such as using strong passwords and avoiding suspicious emails or links.

It’s also important to regularly update your software and systems to ensure they are secure and protected against the latest threats. Additionally, implementing multi-factor authentication and encrypting sensitive data can add an extra layer of protection. Finally, having a plan in place for responding to a cyber-attack or data breach can help minimize the damage and get your business back on track as quickly as possible.

Small businesses are vulnerable to cyber-attacks and data breaches, which can have devastating consequences. To protect your business, it’s important to implement strong cybersecurity measures. This includes using strong passwords, regularly updating software and systems, and training employees on how to identify and avoid phishing scams.

It’s also important to have a data backup plan in place and to regularly test your security measures to ensure they are effective. By taking these steps, you can help protect your business from cyber threats and safeguard your valuable data.

To protect against cyber threats, it’s important to implement strong data protection and cybersecurity measures. This can include regularly updating software and passwords, using firewalls and antivirus software, and providing employee training on safe online practices. Additionally, it’s important to have a plan in place for responding to a cyber attack, including backing up data and having a designated point person for handling the situation.

In today’s digital age, small businesses must prioritize data protection and cybersecurity to safeguard their operations and reputation. With the rise of remote work and cloud-based technology, businesses are more vulnerable to cyber attacks than ever before. To mitigate these risks, it’s crucial to implement strong security measures for online meetings, advertising, transactions, and communication with customers and suppliers. By prioritizing cybersecurity, small businesses can protect their data and prevent unauthorized access or breaches.

Here are 8 essential tips for data protection and cybersecurity in small businesses.

8 Essential Tips for Data Protection and Cybersecurity in Small Businesses

1. Train Your Employees on Cybersecurity Best Practices

Your employees are the first line of defense against cyber threats. It’s important to train them on cybersecurity best practices to ensure they understand the risks and how to prevent them. This includes creating strong passwords, avoiding suspicious emails and links, and regularly updating software and security systems. Consider providing regular training sessions and resources to keep your employees informed and prepared.

2. Use Strong Passwords and Two-Factor Authentication

One of the most basic yet effective ways to protect your business from cyber threats is to use strong passwords and two-factor authentication. Encourage your employees to use complex passwords that include a mix of letters, numbers, and symbols, and to avoid using the same password for multiple accounts. Two-factor authentication adds an extra layer of security by requiring a second form of verification, such as a code sent to a mobile device, before granting access to an account. This can help prevent unauthorized access even if a password is compromised.

3. Keep Your Software and Systems Up to Date

One of the easiest ways for cybercriminals to gain access to your business’s data is through outdated software and systems. Hackers are constantly looking for vulnerabilities in software and operating systems, and if they find one, they can exploit it to gain access to your data. To prevent this, make sure all software and systems are kept up-to-date with the latest security patches and updates. This includes not only your computers and servers but also any mobile devices and other connected devices used in your business. Set up automatic updates whenever possible to ensure that you don’t miss any critical security updates.

4. Use Antivirus and Anti-Malware Software

Antivirus and anti-malware software are essential tools for protecting your small business from cyber threats. These programs can detect and remove malicious software, such as viruses, spyware, and ransomware before they can cause damage to your systems or steal your data. Make sure to install reputable antivirus and anti-malware software on all devices used in your business, including computers, servers, and mobile devices. Keep the software up-to-date and run regular scans to ensure that your systems are free from malware.

5. Backup Your Data Regularly

One of the most important steps you can take to protect your small business from data loss is to back up your data regularly. This means creating copies of your important files and storing them in a secure location, such as an external hard drive or cloud storage service. In the event of a cyber-attack or other disaster, having a backup of your data can help you quickly recover and minimize the impact on your business. Make sure to test your backups regularly to ensure that they are working properly and that you can restore your data if needed.

6. Carry out a risk assessment

Small businesses are especially at risk of cyber attacks, making it crucial to prioritize data protection and cybersecurity. One important step is to assess potential risks that could compromise your company’s networks, systems, and information. By identifying and analyzing possible threats, you can develop a plan to address security gaps and protect your business from harm.

For small businesses, data protection and cybersecurity are a crucial part of operations. To start, conduct a thorough risk assessment to identify where and how your data is stored, who has access to it, and potential threats. If you use cloud storage, consult with your provider to assess risks. Determine the potential impact of breaches and establish risk levels for different events. By taking these steps, you can better protect your business from cyber threats.

7. Limit access to sensitive data

One effective strategy is to limit access to critical data to only those who need it. This reduces the risk of a data breach and makes it harder for malicious insiders to gain unauthorized access. To ensure accountability and clarity, create a plan that outlines who has access to what information and what their roles and responsibilities are. By taking these steps, you can help safeguard your business against cyber threats.

8. Use a firewall

For small businesses, it’s important to protect systems from cyber attacks by strengthening data protection and reducing cybersecurity risk. One effective measure is implementing a firewall, which protects both hardware and software. By blocking or deterring viruses from entering the network, a firewall provides an added layer of security. It’s important to note that a firewall differs from an antivirus, which targets software affected by a virus that has already infiltrated the system.

Small businesses can take steps to protect their data and ensure cybersecurity. One important step is to install a firewall and keep it updated with the latest software or firmware. Regularly checking for updates can help prevent potential security breaches.

Conclusion

Small businesses are particularly vulnerable to cyber attacks, so it’s important to take steps to protect your data. One key tip is to be cautious when granting access to your systems, especially to partners or suppliers. Before granting access, make sure they have similar cybersecurity practices in place. Don’t hesitate to ask for proof or to conduct a security audit to ensure your data is safe.

Source :
https://onlinecomputertips.com/support-categories/networking/tips-for-cybersecurity-in-small-businesses/

Change Your Windows Folder Colors and Icons

Cindy Thomas — June 19, 2023

As you probably know, Windows keeps your files in folders that can also contain subfolders. By using folders, you can keep your computer organized by placing files of certain types in their own folders, such as files for a school project or a sales meeting. And of course, you can create these folders and subfolders as needed and copy or move your files in and out of them.

You have probably also noticed that all of these folders look the same with the exception of the Windows user folders for Documents, Downloads, Pictures and so on as seen below.

Windows User Folders

Another thing that will affect how your folders look is the view that you have applied to them. You can set your folder views to show them as a list or as icons of various sizes. When you use one of the icon views, you might see a file preview icon on the folder based on what types of files are in the folder itself. This icon can also change when you add or remove files from the folder. Empty folders will not have any file preview icons on them.

Windows Folders

If you are looking for some extra customization, then you can try out the free Folder Marker software which will allow you to apply colors to specific folders as well as custom icons. Once you download and install the software, you can apply color and icon changes by either adding folders to the main interface or by using the new right click context menu item that you will now have on your computer.

Folder Marker Software
Change Your Windows Folder Colors and Icons

If you use the first method where you add or drag folders into the app itself, any changes you make will be applied to all folders in the list so you might want to use the right click method to apply changes to single folders.

If you would rather apply a custom icon to your folder rather than change its color, then you can do so from the Main tab in the app or simply by clicking the icon you like from the right click menu. The User Icons section is used to add your own custom icons if you happen to know how to create those.

Change Your Windows Folder Colors and Icons

The image below shows the same folders with some colors and icons applied to them. As you can see, they stand out much better than they did before the changes were made. If you were to move or copy a folder to a new location, its color or custom icon will stay with it so you don’t need to worry about having to change its appearance again.

Change Your Windows Folder Colors and Icons

If you change your mind and want to revert a folder back to its original look, you can do so by right-clicking on it and choosing the Restore Default option. To revert all of your changes, you can open the app itself, go to the Action menu, and click on Rollback All Changes.

As you can see, Folder Marker is easy to use and is a quick way to customize your Windows folders and can really help with your file management tasks. You can download the program from their website here.

Source :
https://onlinecomputertips.com/support-categories/windows/change-your-windows-folder-colors-and-icons/

Export Your Google Sites Website to Your Computer

Preston Mason — January 12, 2023

Everyone who uses a computer knows that Google has a service or an app for just about anything you can think of. And for those who want an easy way to create websites, they also have their free online website creation tool called Google Sites, which doesn’t require any coding skills or HTML knowledge to use. But if you want to back up your website, you will notice that there is no option to do so. In this article, we will show you how to export your Google Sites website to your computer.

If you are using traditional website development tools or software, you would generally be working on your individual HTML files and backing them up as needed. But with Google Sites, you are working within the Sites interface online, and your website is stored in your Google Drive account.

The Backup Process

When you go to your Google Drive account and find your website, you will see that it appears as a single file just like a document or spreadsheet would appear. Then if you right click on it, there is no download option like you would see for other files.

Google Drive right click options

To back up your Sites website, you will first need to make a copy of it within your Google Drive account and move it to a different folder. Within Drive, click on the New button and then choose New folder. Name this folder something like Website Backup or whatever you would like.

Next, go to your website file in Drive, right click it and then choose the Make a copy option. This will make a copy of your website in the same location and will add “Copy of” in front of the existing name.

Google Drive file right click options make a copy

You can then drag the copy to your new folder or right click on the copy and choose the Move to option and then choose your new folder. You can also move your original website file to this new folder if you don’t want to make the copy for the backup.

Using Google Takeout

After this copy is created, you can go to the Google Takeout website and just make sure you are logged in with the same Google account that you use for your website. Then the first thing you want to do is click on Deselect all since all of the checkboxes will be selected by default.

Google Takeout website

Then scroll down the list and look for Drive (not Classic Sites) and check the box and then click on All Drive data included.

Google Takeout website

Next you will uncheck the box that says Include all files and folders in Drive and check only the box next to the folder name that contains your website file and click OK.

Export Your Google Sites Website to Your Computer

Then you will need to scroll to the bottom of the page and click on the button that says Next step. Now you will be able to configure the export to either send a download link to your email address or add your backup to your Drive, OneDrive, Dropbox or Box account. I prefer the email option.

You can also set up this backup as a one-time export or have it exported every 2 months for the next year. For the export file type, you can choose between the zip and tgz file formats. If you have a really large website, the export will be split into 2GB files, but your website is most likely much smaller than 2GB.

Once you have everything set correctly, you can click on the Create export button.

Export Your Google Sites Website to Your Computer

You will then be shown an export progress screen telling you that you will be sent an email when the export is complete. Even though it says it can possibly take days to complete, it’s usually fairly quick and might only take a few minutes.

Export Your Google Sites Website to Your Computer

Once you receive the email, you can click on the Download your files button or the Manage exports button to be taken back to the Google Takeout website to download your files.

Export Your Google Sites Website to Your Computer
Google Takeout Manage Exports

After you download the zip file, you can then extract it and navigate to the location of your website files.

Export Your Google Sites Website to Your Computer

Source :
https://onlinecomputertips.com/support-categories/software/export-your-google-sites-website-to-your-computer/

Tailing Big Head Ransomware’s Variants, Tactics, and Impact

By: Ieriz Nicolle Gonzalez, Katherine Casona, Sarah Pearl Camiling
July 07, 2023

We analyze the technical details of a new ransomware family named Big Head. In this entry, we discuss the Big Head ransomware’s similarities and distinct markers that add more technical details to initial reports on the ransomware.

Reports of a new ransomware family and its variant named Big Head emerged in May, with at least two variants of this family being documented. Upon closer examination, we discovered that both strains shared a common contact email in their ransom notes, leading us to suspect that the two variants originated from the same malware developer. Looking into these variants further, we uncovered a significant number of versions of this malware. In this entry, we go deeper into the routines of these variants, their similarities and differences, and the potential impact of these infections when abused for attacks.

Analysis

In this section, we expound on the three samples of Big Head we found, as well as their distinct functions and routines. While we continue to investigate and track this threat, we also highly suspect that all three samples of the Big Head ransomware are distributed via malvertisement as fake Windows updates and fake Word installers.

First sample

fig1-big-head-ransomware-variants-tactics-impact-worldwind-stealer-neshta
Figure 1. The infection routine of the first Big Head ransomware sample

The first sample of Big Head ransomware (SHA256: 6d27c1b457a34ce9edfb4060d9e04eb44d021a7b03223ee72ca569c8c4215438, detected by Trend Micro as Ransom.MSIL.EGOGEN.THEBBBC) featured a .NET compiled binary file. This binary checks the mutex name 8bikfjjD4JpkkAqrz using CreateMutex and terminates itself if the mutex name is found.

fig2-big-head-ransomware-variants-tactics-impact-worldwind-stealer-neshta
Figure 2. Calling CreateMutex function
fig3-big-head-ransomware-variants-tactics-impact-worldwind-stealer-neshta
Figure 3. MTX value “8bikfjjD4JpkkAqrz”
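
As a defender-side illustration (our own sketch, not code from the sample), the same mutex primitive can be used to check whether a process holding this infection marker is already running on a host:

# Returns a note if a mutex with the Big Head marker name already exists on this system
$existing = $null
if ([System.Threading.Mutex]::TryOpenExisting('8bikfjjD4JpkkAqrz', [ref]$existing)) {
    Write-Output 'Mutex found - a process using this infection marker may be running'
    $existing.Dispose()
} else {
    Write-Output 'Mutex not present'
}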

The sample also has a list of configurations containing details related to the installation process. It specifies various actions such as creating a registry key, checking the existence of a file and overwriting it if necessary, setting system file attributes, and creating an autorun registry entry. These configuration settings are separated by the pipe symbol “|” and are accompanied by corresponding strings that define the specific behavior associated with each action.

fig4-big-head-ransomware-variants-tactics-impact-worldwind-stealer-neshta
Figure 4. List of configurations

The format that the malware adheres to in terms of its behavior upon installation is as follows:

[String ExeName] [bool StartProcess] [bool CheckFileExists] [bool SetSystemAttribute] [String FilePath] [bool SetRegistryKey] [None]

Additionally, we noted the presence of three resources that contained data resembling executable files with the “*.exe” extension:

  • 1.exe drops a copy of itself for propagation. This is a piece of ransomware that checks for the extension “.r3d” before encrypting and appending the “.poop” extension.
  • Archive.exe drops a file named teleratserver.exe, a Telegram bot responsible for establishing communication with the threat actor’s chatbot ID.
  • Xarch.exe drops a file named BXIuSsB.exe, a piece of ransomware that encrypts files and encodes file names to Base64. It also displays a fake Windows update to deceive the victim into thinking that the malicious activity is a legitimate process.

These binaries are encrypted, rendering their contents inaccessible without the appropriate decryption mechanism.

fig5-big-head-ransomware-variants-tactics-impact-worldwind-stealer-neshta
Figure 5. Three resources found in the main sample
fig6-big-head-ransomware-variants-tactics-impact-worldwind-stealer-neshta
Figure 6. The encrypted content of one of the files located within the resource section (“1.exe”)

To extract the three binaries from the resources, the malware employs AES decryption in electronic codebook (ECB) mode. The decryption routine also supplies an initialization vector (IV), although ECB mode does not actually use one.

It is also noteworthy that the decryption key is derived from the MD5 hash of the mutex 8bikfjjD4JpkkAqrz, a hard-coded string value whose MD5 hash is used to decrypt the three binaries 1.exe, archive.exe, and Xarch.exe. It is important to note that the MTX value and the encrypted resources are different per sample.

We manually decrypted the content of each binary using only the MD5 hash of the mutex name; once this key was derived, we proceeded with the AES decryption of the encrypted resource files.

fig7-big-head-ransomware-variants-tactics-impact-worldwind-stealer-neshta
Figure 7. Code for decrypting the three binaries (top) and the decrypted binary file that came from the parent file (bottom)
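
The following PowerShell sketch reproduces the scheme as we read it. It is an analyst-side aid, not the malware’s own code shown in Figure 7, and it rests on our assumptions: the raw 16-byte MD5 digest of the mutex string is used directly as an AES-128 key, PKCS7 padding is in effect, and the file paths are placeholders for a dumped resource.

# Derive the AES key from the MD5 hash of the hard-coded mutex string
$mutexName = '8bikfjjD4JpkkAqrz'
$md5 = [System.Security.Cryptography.MD5]::Create()
$key = $md5.ComputeHash([System.Text.Encoding]::ASCII.GetBytes($mutexName))   # 16 bytes

# Decrypt a dumped resource blob in AES-ECB mode (ECB ignores the IV)
$aes = [System.Security.Cryptography.Aes]::Create()
$aes.Mode    = [System.Security.Cryptography.CipherMode]::ECB
$aes.Padding = [System.Security.Cryptography.PaddingMode]::PKCS7
$aes.Key     = $key

$cipherBytes = [System.IO.File]::ReadAllBytes('C:\samples\resource_1.bin')    # placeholder path
$plainBytes  = $aes.CreateDecryptor().TransformFinalBlock($cipherBytes, 0, $cipherBytes.Length)
[System.IO.File]::WriteAllBytes('C:\samples\resource_1_decrypted.bin', $plainBytes)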

The following table shows the details of the binaries dropped by the decrypted malware using the MTX value 8bikfjjD4JpkkAqrz. These three binaries exhibit similarities with the parent sample in terms of code structure and binary extraction:

File name      Bytes       Dropped file
1.exe          233488      1.exe
archive.exe    12843536    teleratserver.exe
Xarch.exe      65552       BXIuSsB.exe
Figure 8. 1.exe (left), teleratserver.exe (middle), and BXIuSsB.exe (right)

Binaries

This section details the binaries identified in the previous table, starting with the first binary, 1.exe, which was dropped by the parent sample.

            1.      Binary: 1.exe
                    Bytes: 222224
                    MTX value that was used to decrypt this file: 2AESRvXK5jbtN9Rvh

Initially, the file hides its console window by calling the ShowWindow API with SW_HIDE (0). The malware then creates an autorun registry key, which allows it to execute automatically upon system startup. Additionally, it makes a copy of itself, which it saves as discord.exe in the <%localappdata%> folder on the local machine.

Figure 9. ShowWindow API code hides the window of the current process (top) and the creation of the registry key and drops a copy of itself as “discord.exe” (bottom)

The Big Head ransomware checks for the victim’s ID in %appdata%\ID. If the ID exists, the ransomware verifies the ID and reads the content. Otherwise, it creates a randomly generated 40-character string and writes it to the file %appdata%\ID as a type of infection marker to identify its victims.

Figure 10. Randomly generating the 40-character string ID (top) and file named ID saved in the “<%appdata%>” folder (bottom)

The observed behavior indicates that files with the extension “.r3d” are specifically targeted for encryption using AES, with the key derived from the SHA256 hash of “123” in cipher block chaining (CBC) mode. As a result, the encrypted files end up having the “.poop” extension appended to them.

Figure 11. The malware checks for the extension that contains “.r3d” before encrypting and appending the ”.poop” extension (top) and the file encryption process when the file extension “.r3d” exists (bottom).
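Because the key material described above is hard coded (the SHA256 hash of “123”), recovery of “.poop” files is plausible in principle. The following is a hedged decryption sketch based only on that description; how the sample stores or derives the IV is not confirmed here, so the assumption that a 16-byte IV is prepended to each encrypted file may not match the real implementation.

import hashlib
from Crypto.Cipher import AES              # pycryptodome
from Crypto.Util.Padding import unpad

KEY = hashlib.sha256(b"123").digest()      # 32-byte key, as described in the analysis

def decrypt_poop_file(path: str) -> bytes:
    with open(path, "rb") as f:
        blob = f.read()
    iv, ciphertext = blob[:16], blob[16:]  # assumption: IV stored at the start of the file
    return unpad(AES.new(KEY, AES.MODE_CBC, iv).decrypt(ciphertext), AES.block_size)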

In this file, we also observed how the ransomware deletes its shadow copies. The command used to delete shadow copies and backups, and to disable the recovery option, is as follows:

/c vssadmin delete shadows /all /quiet & wmic shadowcopy delete & bcdedit /set {default} bootstatuspolicy ignoreallfailures & bcdedit /set {default} recoveryenabled no & wbadmin delete catalog -quiet

It drops the ransom note on the desktop, subdirectories, and the %appdata% folder. The Big Head ransomware also changes the wallpaper of the victim’s machine. 

Figure 12. Ransom note of the “1.exe” binary
Figure 13. The wallpaper that appears on the victim’s machine

Lastly, it will execute the command to open a browser and access the malware developer’s Telegram account at hxxps[:]//t[.]me/[REDACTED]_69. Our analysis showed no particular action or communication being exchanged with this account in addition to the redirection.

        2.     Binary: teleratserver.exe
                Bytes: 12832480
                MTX value that was used to decrypt this file: OJ4nwj2KO3bCeJoJ1

Teleratserver is a 64-bit Python-compiled binary that acts as a communication channel between the threat actor and the victim via Telegram. It accepts the commands “start”, “help”, “screenshot”, and “message”.

Figure 14. Decompiled Python script from the binary
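To illustrate what such a command channel looks like on the wire, the following is a benign, minimal sketch of a Telegram Bot API polling loop handling the four commands listed above. It uses only the public getUpdates and sendMessage endpoints, the token is a placeholder, and no victim-side functionality is implemented; this is not decompiled code from teleratserver.exe.

import time
import requests

TOKEN = "<bot-token-placeholder>"
API = f"https://api.telegram.org/bot{TOKEN}"

def send(chat_id: int, text: str) -> None:
    requests.post(f"{API}/sendMessage", json={"chat_id": chat_id, "text": text}, timeout=10)

def poll() -> None:
    offset = None
    while True:
        updates = requests.get(f"{API}/getUpdates",
                               params={"timeout": 30, "offset": offset}, timeout=40).json()
        for update in updates.get("result", []):
            offset = update["update_id"] + 1
            message = update.get("message", {})
            chat_id = message.get("chat", {}).get("id")
            text = message.get("text", "")
            if text == "start":
                send(chat_id, "session started")
            elif text == "help":
                send(chat_id, "commands: start, help, screenshot, message")
            elif text == "screenshot":
                send(chat_id, "screenshot requested")   # placeholder: the sample's handling is not reproduced here
            elif text.startswith("message"):
                send(chat_id, "message received")       # placeholder: the sample's handling is not reproduced here
        time.sleep(1)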

    3.      Binary: BXIuSsB.exe
             Bytes: 54288
             MTX value that was used to decrypt this file: gdmJp5RKIvzZTepRJ

The malware displays a fake Windows Update UI to deceive the victim into thinking that the malicious activity is a legitimate software update process, with the percentage of progress in increments of 100 seconds.

Figure 15. The code responsible for fake update (left) and the fake update shown to the user (right)

The malware terminates itself if the user’s system language matches one of the following country codes: Russian, Belarusian, Ukrainian, Kazakh, Kyrgyz, Armenian, Georgian, Tatar, or Uzbek. The malware also disables the Task Manager to prevent users from terminating or investigating its process.

Figure 16. The “KillCtrlAltDelete” command responsible for disabling the Task Manager

The malware drops a copy of itself in the hidden folder <%temp%\Adobe> that it created, then creates an entry in the RunOnce registry key, ensuring that it will only run once at the next system startup.

Figure 17. Creation of AutoRun registry

The malware also randomly generates a 32-character key that will later be used to encrypt files. This key will then be encrypted using RSA-2048 with a hard-coded public key.

The ransomware then drops the ransom note that includes the encrypted key.

Figure 18. The ransom note

The malware avoids the directories that contain the following substrings:

  • WINDOWS or Windows
  • RECYCLER or Recycler
  • Program Files
  • Program Files (x86)
  • Recycle.Bin or RECYCLE.BIN
  • TEMP or Temp
  • APPDATA or AppData
  • ProgramData
  • Microsoft
  • Burn

By excluding these directories from its malicious activities, the malware reduces the likelihood of being detected by security solutions installed in the system and increases its chances of remaining undetected and operational for a longer duration. The following are the extensions that the Big Head ransomware encrypts:

“.mdf”, “.db”, “.mdb”, “.sql”, “.pdb”, “.pdb”, “.pdb”, “.dsk”, “.fp3”, “.fdb”, “.accdb”, “.dbf”, “.crd”, “.db3”, “.dbk”, “.nsf”, “.gdb”, “.abs”, “.sdb”, “.sdb”, “.sdb”, “.sqlitedb”, “.edb”, “.sdf”, “.sqlite”, “.dbs”, “.cdb”, “.cdb”, “.cdb”, “.bib”, “.dbc”, “.usr”, “.dbt”, “.rsd”, “.myd”, “.pdm”, “.ndf”, “.ask”, “.udb”, “.ns2”, “.kdb”, “.ddl”, “.sqlite3”, “.odb”, “.ib”, “.db2”, “.rdb”, “.wdb”, “.tcx”, “.emd”, “.sbf”, “.accdr”, “.dta”, “.rpd”, “.btr”, “.vdb”, “.daf”, “.dbv”, “.fcd”, “.accde”, “.mrg”, “.nv2”, “.pan”, “.dnc”, “.dxl”, “.tdt”, “.accdc”, “.eco”, “.fmp”, “.vpd”, “.his”, “.fid”

The malware also terminates the following processes:

“taskmgr”, “sqlagent”, “winword”, “sqlbrowser”, “sqlservr”, “sqlwriter”, “oracle”, “ocssd”, “dbsnmp”, “synctime”, “mydesktopqos”, “agntsvc.exeisqlplussvc”, “xfssvccon”, “mydesktopservice”, “ocautoupds”, “agntsvc.exeagntsvc”, “agntsvc.exeencsvc”, “firefoxconfig”, “tbirdconfig”, “ocomm”, “mysqld”, “sql”, “mysqld-nt”, “mysqld-opt”, “dbeng50”, “sqbcoreservice”

The malware renames the encrypted files using Base64. We observed the malware using a LockFile function, which encrypts each file, renames it, and adds a marker. This marker serves as an indicator of whether a file has already been encrypted. Through further examination, we saw the function checking for the marker inside the encrypted file; when the file is decrypted, the marker can be matched at the end of its content.

Figure 19. The LockFile function
Figure 20. Checking for the marker “###” (top) and finding the marker at the end of the encrypted file (bottom)
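For responders triaging an affected host, the behavior described above suggests two simple checks, sketched below: Base64-decoding an encrypted file name to recover the original name, and looking for the reported “###” marker at the end of decrypted content to verify a decryption attempt. The exact details of the sample’s name encoding are assumptions.

import base64

MARKER = b"###"   # marker string reported in the analysis above

def recover_original_name(encrypted_name: str) -> str | None:
    """Attempt to Base64-decode an encrypted file name back to the original."""
    try:
        return base64.b64decode(encrypted_name).decode("utf-8", errors="replace")
    except ValueError:
        return None

def looks_successfully_decrypted(decrypted_content: bytes) -> bool:
    """The marker should appear at the end of correctly decrypted content."""
    return decrypted_content.endswith(MARKER)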

The malware targets systems whose language and region (locale) settings, as configured for the current user’s operating system, match any of the following:

“ar-SA”, “ar-AE”, “nl-BE”, “nl-NL”, “en-GB”, “en-US”, “en-CA”, “en-AU”, “en-NZ”, “fr-BE”, “fr-CH”, “fr-FR”, “fr-CA”, “fr-LU”, “de-AT”, “de-DE”, “de-CH”, “it-CH”, “it-IT”, “ko-KR”, “pt-PT”, “es-ES”, “sv-FI”, “sv-SE”, “bg-BG”, “ca-ES”, “cs-CZ”, “da-DK”, “el-GR”, “en-IE”, “et-EE”, “eu-ES”, “fi-FI”, “hu-HU”, “ja-JP”, “lt-LT”, “nn-NO”, “pl-PL”, “ro-RO”, “se-FI”, “se-NO”, “se-SE”, “sk-SK”, “sl-SI”, “sv-FI”, “sv-SE”, “tr-TR”

The ransomware checks for strings like VBOX, Virtual, or VMware in the disk enumeration registry to determine whether the system is operating within a virtual environment. It also scans for processes that contain the following substrings: VBox, prl_ (Parallels Desktop), srvc.exe, and vmtoolsd.

Figure 21. Checking for virtual machine identifiers (top) and processes (bottom)
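These checks are straightforward to reproduce when hardening an analysis VM against this kind of evasion. The sketch below queries a commonly abused disk-enumeration registry location and the process list for the strings named above; the exact registry path the sample reads is an assumption, and the script is Windows-only.

import subprocess
import winreg

DISK_ENUM_KEY = r"SYSTEM\CurrentControlSet\Services\Disk\Enum"      # assumed location
VM_DISK_STRINGS = ("VBOX", "VIRTUAL", "VMWARE")
VM_PROCESS_STRINGS = ("vbox", "prl_", "srvc.exe", "vmtoolsd")

def disk_enum_looks_virtual() -> bool:
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISK_ENUM_KEY) as key:
        value_count = winreg.QueryInfoKey(key)[1]                   # number of values under the key
        for i in range(value_count):
            _, value, _ = winreg.EnumValue(key, i)
            if isinstance(value, str) and any(s in value.upper() for s in VM_DISK_STRINGS):
                return True
    return False

def processes_look_virtual() -> bool:
    output = subprocess.run(["tasklist"], capture_output=True, text=True).stdout.lower()
    return any(s in output for s in VM_PROCESS_STRINGS)

if __name__ == "__main__":
    print("VM strings in disk enumeration:", disk_enum_looks_virtual())
    print("VM strings in process list:", processes_look_virtual())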

The malware identifies specific process names associated with virtualization software to determine whether the system is running in a virtualized environment, allowing it to adjust its actions accordingly for better success or evasion. It can also delete any available recovery backup by using the following command line:

vssadmin delete shadows /all /quiet & bcdedit.exe /set {default} recoveryenabled no & bcdedit.exe /set {default} bootstatuspolicy ignoreallfailures

After deleting the backups, regardless of how many are available, the malware deletes itself using the SelfDelete() function. This function executes a batch file that deletes the malware executable and then the batch file itself.

Figure 22. SelfDelete function

Second sample

The second sample of the Big Head ransomware we observed (SHA256: 2a36d1be9330a77f0bc0f7fdc0e903ddd99fcee0b9c93cb69d2f0773f0afd254, detected by Trend as Ransom.MSIL.EGOGEN.THEABBC) exhibits both ransomware and stealer behaviors.

Figure 23. The infection routine of the second sample of the Big Head ransomware

The main file drops and executes the following files:

  • %TEMP%\runyes.Crypter.bat
  • %AppData%\Roaming\azz1.exe
  • %AppData%\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\Server.exe

The ransomware activities are carried out by runyes.Crypter.bat and azz1.exe, while Server.exe is responsible for collecting information for stealing.

The file runyes.Crypter.bat drops a copy of itself and Cipher.psm1 and then executes the following command to begin encryption:

cmd  /c powershell -executionpolicy bypass -win hidden -noexit -file cry.ps1

The malware employs the AES algorithm to encrypt files and adds the suffix “.poop69news@[REDACTED]” to the encrypted files. It specifically targets files with the following extensions:

*.aif ,*.cda ,*.mid ,*.midi ,*.mp3 ,*.mpa ,*.ogg ,*.wav ,*.wma ,*.wpl ,*.7z ,*.arj ,*.deb ,*.pkg ,*.rar ,*.rpm ,*.tar ,*.gz ,*.z ,*.zip ,*.bin ,*.dmg ,*.iso ,*.toas ,*.vcd ,*.csv  ,*.dat ,*.db ,*.dbf ,*.log ,*.mdb ,*.sav ,*.sql ,*.tar ,*.xml ,*.email ,*.eml ,*.emlx ,*.msg ,*.oft ,*.ost ,*.pst ,*.vcf ,*.apk ,*.bat ,*.bin ,*.cgi ,*.pl ,*.com ,*.exe ,*.gadget ,*.jar ,*.msi ,*.py ,*.wsf ,*.fnt ,*.fon ,*.otf ,*.ttf ,*.ai ,*.bmp ,*.gif ,*.ico ,*.jpeg ,*.jpg ,*.png ,*.ps ,*.psd ,*.svg ,*.tif ,*.tiff ,*.asp ,*.aspx ,*.cer ,*.cfm ,*.cgi ,*.pl ,*.css ,*.htm ,*.html ,*.js ,*.jsp ,*.part ,*.php ,*.py ,*.rss ,*.xhtml ,*.key ,*.odp ,*.pps ,*.ppt ,*.pptx ,*.c ,*.class ,*.cpp ,*.cs ,*.h ,*.java ,*.pl ,*.sh ,*.swift ,*.vb ,*.ods ,*.xls ,*.xlsm ,*.xlsx ,*.bak ,*.cab ,*.cfg ,*.cpl ,*.cur ,*.dll ,*.dmp ,*.drv ,*.icns ,*.icoini ,*.lnk ,*.msi ,*.sys ,*.tmp ,*.3g2 ,*.3gp ,*.avi ,*.flv ,*.h264 ,*.m4v ,*.mkv ,*.mov ,*.mp4 ,*.mpg ,*.mpeg ,*.rm ,*.swf ,*.vob ,*.wmv ,*.doc ,*.docx ,*.odt ,*.pdf ,*.rtf ,*.tex ,*.txt ,*.wpd ,*.ps1 ,*.cmd ,*.vbs ,*.vmxf ,*.vmx ,*.vmsd ,*.vmdk ,*.nvram ,*.vbox

The file azz1.exe, which is also involved in other ransomware activities, establishes a registry entry at <HKCU\Software\Microsoft\Windows\CurrentVersion\Run>. This entry ensures the persistence of a copy of itself. It also drops a file containing the victim’s ID and a ransom note:

Figure 24. The ransom note for the second sample of the Big Head ransomware

Like the first sample, the second sample also changes the victim’s desktop wallpaper. Afterward, it will open the URL hxxps[:]//github[.]com/[REDACTED]_69 using the system’s default web browser. As of this writing, the URL is no longer available.

Other variants of this ransomware used the dropper azz1.exe as well, although the specific file might differ in each binary. Meanwhile, Server.exe, which we have identified as the WorldWind stealer, collects the following data:

  • Browsing history of all available browsers
  • List of directories
  • Replica of drivers
  • List of running processes
  • Product key
  • Networks
  • Screenshot of the screen after running the file

Third sample

The third sample (SHA256: 25294727f7fa59c49ef0181c2c8929474ae38a47b350f7417513f1bacf8939ff, detected by Trend as Ransom.MSIL.EGOGEN.YXDEL) includes a file infector we identified as Neshta in its chain.

Figure 25. The infection routine of the third sample of the Big Head ransomware

Neshta is a virus designed to infect and insert its malicious code into executable files. This malware also has a characteristic behavior of dropping a file called directx.sys, which contains the full path name of the infected file that was last executed. This behavior is not commonly observed in most types of malware, as they typically do not store such specific information in their dropped files.

Incorporating Neshta into the ransomware deployment can also serve as a camouflage technique for the final Big Head ransomware payload. This technique can make the piece of malware appear as a different type of threat, such as a virus, which can divert the prioritization of security solutions that primarily focus on detecting ransomware.

Notably, the ransom note and wallpaper associated with this binary are different from the ones previously mentioned.

Figure 26. Wallpaper (top) and ransom note (bottom) used in the victim’s machine post infection

The Big Head ransomware exhibits unique behaviors during the encryption process: it displays a fake Windows Update screen while it encrypts files to deceive users and effectively lock them out of their machines, and it renames the encrypted files using Base64 encoding, adding a layer of obfuscation that makes it more challenging for users to identify the original names and types of the encrypted files. We also noted the following significant distinctions among the three versions of the Big Head ransomware:

  • The first sample incorporates a backdoor in its infection chain.
  • The second sample employs a trojan spy and/or info stealer.
  • The third sample utilizes a file infector. 

Threat actor

The ransom note clearly indicates that the malware developer utilizes both email and Telegram for communication with their victims. Upon further investigation with the given Telegram username, we were directed to a YouTube account.

The account on the platform is relatively new, having joined on April 19, 2023, with a total of 12 published videos as of this writing. The YouTube channel showcases demonstrations of the malware that the cybercriminals have developed. We also noted that in a pinned comment on each of their videos, they explicitly state their Telegram username.

Figure 27. A new YouTube account with a number of videos featuring pieces of malware (top) and a Telegram username pinned in the comments section for all videos (bottom)

While we suspect that this actor engages in transactions on Telegram, it is worth noting that the YouTube name “aplikasi premium cuma cuma” is an Indonesian (Bahasa Indonesia) phrase that translates to “premium application for free.” While a connection is possible, we can only speculate on any link between the ransomware and the countries where the language is spoken.

Insights

Aside from the specific email address that ties all the Big Head ransomware samples together, the ransom notes from the samples reference the same Bitcoin wallet and the samples drop the same files. Looking at the samples altogether, we can see that their routines follow the same infection structure once the ransomware compromises a system.

The malware developers mention in the comment section of their YouTube videos that they have a “new” Telegram account, indicating that an old one was previously used. We also checked their Bitcoin wallet history and found transactions made in 2022. While we’re unaware of what those transactions were for, the history implies that these cybercriminals are not new to this type of threat and attack, although they might not be sophisticated actors as a whole.

The discovery of the Big Head ransomware as a developing piece of malware prior to the occurrence of any actual attacks or infections can be seen as a huge advantage for security researchers and analysts. Analysis and reporting of the variants provide an opportunity to analyze the codes, behaviors, and potential vulnerabilities. This information can then be used to develop countermeasures, patch vulnerabilities, and enhance security systems to mitigate future risks.

Moreover, advertising on YouTube without any evidence of “successful penetrations or infections” might seem like premature promotional activity from a non-technical perspective. From a technical point of view, these malware developers left recognizable strings, used predictable encryption methods, and implemented weak or easily detectable evasion techniques, among other “mistakes.”

However, security teams should remain prepared given the malware’s diverse functionalities, encompassing stealers, infectors, and ransomware samples. This multifaceted nature gives the malware the potential to cause significant harm once fully operational, making it more challenging to defend systems against, as each attack vector requires separate attention.

Indicators of Compromise (IOCs)

Filename				SHA256									Detection			Description
Read Me First!.txt			Ransom note
1.exe 					6d27c1b457a34ce9edfb4060d9e04eb44d021a7b03223ee72ca569c8c4215438	Ransom.MSIL.EGOGEN.THEBBBC 	First sample
1.exe 					226bec8acd653ea9f4b7ea4eaa75703696863841853f488b0b7d892a6be3832a	Ransom.MSIL.EGOGEN.YXDFE	
123yes.exe 				ff900b9224fde97889d37b81855a976cddf64be50af280e04ce53c587d978840	Ransom.MSIL.EGOGEN.YXDEO	
archive.exe 				cf9410565f8a06af92d65e118bd2dbaeb146d7e51de2c35ba84b47cfa8e4f53b	Ransom.MSIL.EGOGEN.YXDFZ	
azz1.exe, discord.exe 			1c8bc3890f3f202e459fb87acec4602955697eef3b08c93c15ebb0facb019845	Ransom.MSIL.EGOGEN.YXDEW	
BXIuSsB.exe 				64246b9455d76a094376b04a2584d16771cd6164db72287492078719a0c749ab	Ransom.MSIL.EGOGEN.YXDEL	
ConsoleApp2.exe 			0dbfd3479cfaf0856eb8a75f0ad4fccb5fd6bd17164bcfa6a5a386ed7378958d	Ransom.MSIL.EGOGEN.YXDEW	
cry.ps1 				6698f8ffb7ba04c2496634ff69b0a3de9537716cfc8f76d1cfea419dbd880c94	Ransom.PS1.EGOGEN.YXDFV	
Cipher.psm1, 													Ransom.PS1.EGOGEN.YXDFZ	
discord.exe 				b8e456861a5fb452bcf08d7b37277972a4a06b0a928d57c5ec30afa101d77ead	Ransom.MSIL.EGOGEN.YXDEL	
discord.exe 				6b3bf710cf4a0806b2c5eaa26d2d91ca57575248ff0298f6dee7180456f37d2e	Ransom.MSIL.EGOGEN.YXDEL	
docx.Crypter.bat, runyes.Crypter.bat 	6b771983142c7fa72ce209df8423460189c14ec635d6235bf60386317357428a	Ransom.BAT.EGOGEN.YXDFZ 	
event-stream.exe 			627b920845683bd7303d33946ff52fb2ea595208452285457aa5ccd9c01c3b0a	HackTool.Win32.EventStream.A	
l.bat 					40d11a20bd5ca039a15a0de0b1cb83814fa9b1d102585db114bba4c5895a8a44	Ransom.BAT.EGOGEN.YXDFZ	
Locker.ps1 				159fbb0d04c1a77d434ce3810d1e2c659fda0a5703c9d06f89ee8dc556783614	Ransom.PS1.EGOGEN.YXDEL	
locker.ps1 				9aa38796e0ce4866cff8763b026272eb568fa79d8a147f7d61824752ad6d8f09	Ransom.PS1.EGOGEN.YXDFZ	
program.exe 				39caec2f2e9fda6e6a7ce8f22e29e1c77c8f1b4bde80c91f6f78cc819f031756	Ransom.MSIL.EGOGEN.YXDEP	
Prynts.exe 				1ada91cb860cd3318adbb4b6fd097d31ad39c2718b16c136c16407762251c5db	TrojanSpy.MSIL.STORMKITTY.D	
r.pyw 					be6416218e2b1a879e33e0517bcacaefccab6ad2f511de07eebd88821027f92d	Ransom.Python.EGOGEN.YXDFZ 	
Server.exe 				9a7889147fa53311ba7ec8166c785f7a935c35eba4a877c1313a8d2e80e3230d	TrojanSpy.MSIL.WORLDWIND.A	Dropped WorldWind Stealer
Server.exe  				f6a2ec226c84762458d53f5536f0a19e34b2a9b03d574ae78e89098af20bcaa3	PE_NESHTA.A	
sfchost.exe, 12.exe 			1942aac761bc2e21cf303e987ef2a7740a33c388af28ba57787f10b1804ea38e	Ransom.MSIL.EGOGEN.YXDEL	
slam.exe 				f354148b5f0eab5af22e8152438468ae8976db84c65415d3f4a469b35e31710f	Ransom.MSIL.EGOGEN.YXDE4	
ssissa.Crypter.bat  			037f9434e83919506544aa04fecd7f56446a7cc65ee03ac0a11570cf4f607853	Ransom.BAT.EGOGEN.YXDFZ	
svchost.com 				980bac6c9afe8efc9c6fe459a5f77213b0d8524eb00de82437288eb96138b9a2	PE_NESHTA.A-O	
teleratserver.exe 			603fcc53fd7848cd300dad85bef9a6b80acaa7984aa9cb9217cdd012ff1ce5f0	Backdoor.Win64.TELERAT.A	
Xarch.exe     				bcf8464d042171d7ecaada848b5403b6a810a91f7fd8f298b611e94fa7250463	Ransom.MSIL.EGOGEN.YXDEV	
XarchiveOutput.exe			64aac04ffb290a23ab9f537b1143a4556e6893d9ff7685a11c2c0931d978a931	Ransom.MSIL.EGOGEN.YXDEV	
Xatput.exe 				f59c45b71eb62326d74e83a87f821603bf277465863bfc9c1dcb38a97b0b359d	Ransom.MSIL.EGOGEN.YXDEV	
Xserver.exe 				2a36d1be9330a77f0bc0f7fdc0e903ddd99fcee0b9c93cb69d2f0773f0afd254	Ransom.MSIL.EGOGEN.THEABBC	Second sample
Xsput.exe 				66bb57338bec9110839dc9a83f85b05362ab53686ff7b864d302a217cafb7531	Ransom.MSIL.EGOGEN.YXDEV	
Xsuut.exe 				806f64fda529d92c16fac02e9ddaf468a8cc6cbc710dc0f3be55aec01ed65235	Ransom.MSIL.EGOGEN.YXDEV	
Xxut.exe 				9c1c527a826d16419009a1b7797ed20990b9a04344da9c32deea00378a6eeee2	Ransom.MSIL.EGOGEN.YXDEO 	
iXZAF					40e5050b894cb70c93260645bf9804f50580050eb131e24f30cb91eec9ad1a6e	Ransom.MSIL.EGOGEN.THFBIBC	
XBtput.exe 				25294727f7fa59c49ef0181c2c8929474ae38a47b350f7417513f1bacf8939ff	Ransom.MSIL.EGOGEN.YXDEL	Third sample
XBtput2.exe 				dcfa0fca8c1dd710b4f40784d286c39e5d07b87700bdc87a48659c0426ec6cb6	Ransom.MSIL.EGOGEN.YXDEO	

Source :
https://www.trendmicro.com/it_it/research/23/g/tailing-big-head-ransomware-variants-tactics-and-impact.html

Part 2: Rethinking cache purge with a new architecture

21/06/2023

In Part 1: Rethinking Cache Purge, Fast and Scalable Global Cache Invalidation, we outlined the importance of cache invalidation and the difficulties of purging caches, how our existing purge system was designed and performed, and we gave a high level overview of what we wanted our new Cache Purge system to look like.

It’s been a while since we published the first blog post and it’s time for an update on what we’ve been working on. In this post we’ll be talking about some of the architecture improvements we’ve made so far and what we’re working on now.

Cache Purge end to end

We touched on the high level design of what we called the “coreless” purge system in part 1, but let’s dive deeper into what that design encompasses by following a purge request from end to end:

Step 1: Request received locally

An API request to Cloudflare is routed to the nearest Cloudflare data center and passed to an API Gateway worker. This worker looks at the request URL to see which service it should be sent to and forwards the request to the appropriate upstream backend. Most endpoints of the Cloudflare API are currently handled by centralized services, so the API Gateway worker is often just proxying requests to the nearest “core” data center, which has its own gateway services to handle authentication, authorization, and further routing. But for endpoints which aren’t handled centrally, the API Gateway worker must handle authentication and route authorization itself, and then proxy to an appropriate upstream. For cache purge requests, that upstream is a Purge Ingest worker in the same data center.

Step 2: Purges tested locally

The Purge Ingest worker evaluates the purge request to make sure it is processable. It scans the URLs in the body of the request to see if they’re valid, then attempts to purge the URLs from the local data center’s cache. This concept of local purging was a new step introduced with the coreless purge system, allowing us to capitalize on existing logic already used in every data center.

By leveraging the same ownership checks our data centers use to serve a zone’s normal traffic on the URLs being purged, we can determine if those URLs are even cacheable by the zone. Currently more than 50% of the URLs we’re asked to purge can’t be cached by the requesting zones, either because they don’t own the URLs (e.g. a customer asking us to purge https://cloudflare.com) or because the zone’s settings for the URL prevent caching (e.g. the zone has a “bypass” cache rule that matches the URL). All such purges are superfluous and shouldn’t be processed further, so we filter them out and avoid broadcasting them to other data centers freeing up resources to process more legitimate purges.

On top of that, generating the cache key for a file isn’t free; we need to load zone configuration options that might affect the cache key, apply various transformations, et cetera. The cache key for a given file is the same in every data center though, so when we purge the file locally we now return the generated cache key to the Purge Ingest worker and broadcast that key to other data centers instead of making each data center generate it themselves.

Step 3: Purges queued for broadcasting

Diagram: A purge request reaches a small data center; the Purge Ingest worker forwards it to a Purge Queue worker in a T1 data center.

Once the local purge is done, the Purge Ingest worker forwards the purge request, along with the cache key obtained from the local cache, to a Purge Queue worker. The queue worker is a Durable Object worker that uses its persistent state to hold a queue of purges it receives, plus pointers that track how far along in that queue each data center in our network is.

The queue is important because it allows us to automatically recover from a number of scenarios such as connectivity issues or data centers coming back online after maintenance. Having a record of all purges since an issue arose lets us replay those purges to a data center and “catch up”.

But Durable Objects are globally unique, so having one manage all global purges would have just moved our centrality problem from a core data center to wherever that Durable Object was provisioned. Instead we have dozens of Durable Objects in each region, and the Purge Ingest worker looks at the load balancing pool of Durable Objects for its region and picks one (often in the same data center) to forward the request to. The Durable Object will write the purge request to its queue and immediately loop through all the data center pointers and attempt to push any outstanding purges to each.
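As a mental model of the queue-and-pointer bookkeeping described above (sketched here in plain Python rather than Workers code, so names and structure are illustrative only), every purge is appended to an ordered log and each data center advances a pointer as it acknowledges work, which is what makes replay after an outage straightforward.

from dataclasses import dataclass, field

@dataclass
class PurgeQueue:
    log: list[str] = field(default_factory=list)            # ordered purge requests (cache keys)
    pointers: dict[str, int] = field(default_factory=dict)  # data center -> next index to send

    def enqueue(self, cache_key: str) -> None:
        self.log.append(cache_key)

    def outstanding(self, data_center: str) -> list[str]:
        """Everything this data center has not yet acknowledged."""
        return self.log[self.pointers.get(data_center, 0):]

    def acknowledge(self, data_center: str, count: int) -> None:
        """Advance the pointer once `count` purges have been applied."""
        self.pointers[data_center] = self.pointers.get(data_center, 0) + count

if __name__ == "__main__":
    queue = PurgeQueue()
    queue.enqueue("cache-key-1")
    queue.enqueue("cache-key-2")
    print(queue.outstanding("colo-a"))   # a data center that was offline sees both purges
    queue.acknowledge("colo-a", 2)
    print(queue.outstanding("colo-a"))   # and is now caught up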

While benchmarking our performance we found our particular workload exhibited a “goldilocks zone” of throughput to a given Durable Object. On script startup we have to load all sorts of data like network topology and data center health–then refresh it continuously in the background–and as long as the Durable Object sees steady traffic it stays active and we amortize those startup costs. But if you ask a single Durable Object to do too much at once like send or receive too many requests, the single-threaded runtime won’t keep up. Regional purge traffic fluctuates a lot depending on local time of day, so there wasn’t a static quantity of Durable Objects per region that would let us stay within the goldilocks zone of enough requests to each to keep them active but not too many to keep them efficient. So we built load monitoring into our Durable Objects, and a Regional Autoscaler worker to aggregate that data and adjust load balancing pools when we start approaching the upper or lower edges of our efficiency goldilocks zone.

Step 4: Purges broadcast globally

Diagram: Across multiple regions, a Durable Object sends purges to fanout workers in other regions, and each fanout worker distributes them to the small data centers in its region.

Once a purge request is queued by a Purge Queue worker it needs to be broadcast to the rest of Cloudflare’s data centers to be carried out by their caches. The Durable Objects will broadcast purges directly to all data centers in their region, but when broadcasting to other regions they pick a Purge Fanout worker per region to take care of their region’s distribution. The fanout workers manage queues of their own as well as pointers for all of their region’s data centers, and in fact they share a lot of the same logic as the Purge Queue workers in order to do so. One key difference is fanout workers aren’t Durable Objects; they’re normal worker scripts, and their queues are purely in memory as opposed to being backed by Durable Object state. This means not all queue worker Durable Objects are talking to the same fanout worker in each region. Fanout workers can be dropped and spun up again quickly by any metal in the data center because they aren’t canonical sources of state. They maintain queues and pointers for their region but all of that info is also sent back downstream to the Durable Objects who persist that data themselves, reliably.

But what does the fanout worker get us? Cloudflare has hundreds of data centers all over the world, and as we mentioned above we benefit from keeping the number of incoming and outgoing requests for a Durable Object fairly low. Sending purges to a fanout worker per region means each Durable Object only has to make a fraction of the requests it would if it were broadcasting to every data center directly, which means it can process purges faster.

On top of that, occasionally a request will fail to get where it was going and require retransmission. When this happens between data centers in the same region it’s largely unnoticeable, but when a Durable Object in Canada has to retry a request to a data center in rural South Africa the cost of traversing that whole distance again is steep. The data centers elected to host fanout workers have the most reliable connections in their regions to the rest of our network. This minimizes the chance of inter-regional retries and limits the latency imposed by retries to regional timescales.

The introduction of the Purge Fanout worker was a massive improvement to our distribution system, reducing our end-to-end purge latency by 50% on its own and increasing our throughput threefold.

Current status of coreless purge

We are proud to say our new purge system has been in production serving purge by URL requests since July 2022, and the results in terms of latency improvements are dramatic. In addition, flexible purge requests (purge by tag/prefix/host and purge everything) share and benefit from the new coreless purge system’s entrypoint workers before heading to a core data center for fulfillment.

The reason flexible purge isn’t also fully coreless yet is because it’s a more complex task than “purge this object”; flexible purge requests can end up purging multiple objects–or even entire zones–from cache. They do this through an entirely different process that isn’t coreless compatible, so to make flexible purge fully coreless we would have needed to come up with an entirely new multi-purge mechanism on top of redesigning distribution. We chose instead to start with just purge by URL so we could focus purely on the most impactful improvements, revamping distribution, without reworking the logic a data center uses to actually remove an object from cache.

This is not to say that the flexible purges haven’t benefited from the coreless purge project. Our cache purge API lets users bundle single file and flexible purges in one request, so the API Gateway worker and Purge Ingest worker handle authorization, authentication and payload validation for flexible purges too. Those flexible purges get forwarded directly to our services in core data centers pre-authorized and validated which reduces load on those core data center auth services. As an added benefit, because authorization and validity checks all happen at the edge for all purge types users get much faster feedback when their requests are malformed.

Next steps

While coreless cache purge has come a long way since the part 1 blog post, we’re not done. We continue to work on reducing end-to-end latency even more for purge by URL because we can do better. Alongside improvements to our new distribution system, we’ve also been working on the redesign of flexible purge to make it fully coreless, and we’re really excited to share the results we’re seeing soon. Flexible cache purge is an incredibly popular API and we’re giving its refresh the care and attention it deserves.

We protect entire corporate networks, help customers build Internet-scale applications efficiently, accelerate any website or Internet application, ward off DDoS attacks, keep hackers at bay, and can help you on your journey to Zero Trust.

Visit 1.1.1.1 from any device to get started with our free app that makes your Internet faster and safer.

To learn more about our mission to help build a better Internet, start here. If you’re looking for a new career direction, check out our open positions.

Source :
https://blog.cloudflare.com/rethinking-cache-purge-architecture/

Part 1: Rethinking Cache Purge, Fast and Scalable Global Cache Invalidation

14/05/2022

There is a famous quote attributed to a Netscape engineer: “There are only two difficult problems in computer science: cache invalidation and naming things.” While naming things does oddly take up an inordinate amount of time, cache invalidation shouldn’t.

In the past we’ve written about Cloudflare’s incredibly fast response times, whether content is cached on our global network or not. If content is cached, it can be served from a Cloudflare cache server, which are distributed across the globe and are generally a lot closer in physical proximity to the visitor. This saves the visitor’s request from needing to go all the way back to an origin server for a response. But what happens when a webmaster updates something on their origin and would like these caches to be updated as well? This is where cache “purging” (also known as “invalidation”) comes in.

Customers thinking about setting up a CDN and caching infrastructure consider questions like:

  • How do different caching invalidation/purge mechanisms compare?
  • How many times a day/hour/minute do I expect to purge content?
  • How quickly can the cache be purged when needed?

This blog will discuss why invalidating cached assets is hard, what Cloudflare has done to make it easy (because we care about your experience as a developer), and the engineering work we’re putting in this year to make the performance and scalability of our purge services the best in the industry.

What makes purging difficult also makes it useful

(i) Scale
The first thing that complicates cache invalidation is doing it at scale. With data centers in over 270 cities around the globe, our most popular users’ assets can be replicated at every corner of our network. This also means that a purge request needs to be distributed to all data centers where that content is cached. When a data center receives a purge request, it needs to locate the cached content to ensure that subsequent visitor requests for that content are not served stale/outdated data. Requests for the purged content should be forwarded to the origin for a fresh copy, which is then re-cached on its way back to the user.

This process repeats for every data center in Cloudflare’s fleet. And due to Cloudflare’s massive network, maintaining this consistency when certain data centers may be unreachable or go offline, is what makes purging at scale difficult.

Making sure that every data center gets the purge command and remains up-to-date with its content logs is only part of the problem. Getting the purge request to data centers quickly so that content is updated uniformly is the next reason why cache invalidation is hard.  

(ii) Speed
When purging an asset, race conditions abound. Requests for an asset can happen at any time, and may not follow a pattern of predictability. Content can also change unpredictably. Therefore, when content changes and a purge request is sent, it must be distributed across the globe quickly. If purging an individual asset, say an image, takes too long, some visitors will be served the new version, while others are served outdated content. This data inconsistency degrades user experience, and can lead to confusion as to which version is the “right” version. Websites can sometimes even break in their entirety due to this purge latency (e.g. by upgrading versions of a non-backwards compatible JavaScript library).

Purging at speed is also difficult when combined with Cloudflare’s massive global footprint. For example, if a purge request is traveling at the speed of light between Tokyo and Cape Town (both cities where Cloudflare has data centers), just the transit alone (no authorization of the purge request or execution) would take over 180ms on average based on submarine cable placement. Purging a smaller network footprint may reduce these speed concerns while making purge times appear faster, but does so at the expense of worse performance for customers who want to make sure that their cached content is fast for everyone.

(iii) Scope
The final thing that makes purge difficult is making sure that only the unneeded web assets are invalidated. Maintaining a cache is important for egress cost savings and response speed. Webmasters’ origins could be knocked over by a thundering herd of requests, if they choose to purge all content needlessly. It’s a delicate balance of purging just enough: too much can result in both monetary and downtime costs, and too little will result in visitors receiving outdated content.

At Cloudflare, what to invalidate in a data center is often dictated by the type of purge. Purge everything, as you could probably guess, purges all cached content associated with a website. Purge by prefix purges content based on a URL prefix. Purge by hostname can invalidate content based on a hostname. Purge by URL or single file purge focuses on purging specified URLs. Finally, Purge by tag purges assets that are marked with Cache-Tag headers. These markers offer webmasters flexibility in grouping assets together. When a purge request for a tag comes into a data center, all assets marked with that tag will be invalidated.

With that overview in mind, the remainder of this blog will focus on putting each element of invalidation together to benchmark the performance of Cloudflare’s purge pipeline and provide context for what performance means in the real-world. We’ll be reviewing how fast Cloudflare can invalidate cached content across the world. This will provide a baseline analysis for how quick our purge systems are presently, which we will use to show how much we will improve by the time we launch our new purge system later this year.

How does purge work currently?

In general, purge takes the following route through Cloudflare’s data centers.

  • A purge request is initiated via the API or UI. This request specifies how our data centers should identify the assets to be purged. This can be accomplished via cache-tag header(s), URL(s), entire hostnames, and much more.
  • The request is received by any Cloudflare data center and is identified to be a purge request. It is then routed to a Cloudflare core data center (a set of a few data centers responsible for network management activities).
  • When a core data center receives it, the request is processed by a number of internal services that (for example) make sure the request is being sent from an account with the appropriate authorization to purge the asset. Following this, the request gets fanned out globally to all Cloudflare data centers using our distribution service.
  • When received by a data center, the purge request is processed and all assets with the matching identification criteria are either located and removed, or marked as stale. These stale assets are not served in response to requests and are instead re-pulled from the origin.
  • After being pulled from the origin, the response is written to cache again, replacing the purged version.

Now let’s look at this process in practice. Below we describe Cloudflare’s purge benchmarking that uses real-world performance data from our purge pipeline.

Benchmarking purge performance design

In order to understand how performant Cloudflare’s purge system is, we measured the time it took from sending the purge request to the moment that the purge is complete and the asset is no longer served from cache.  

In general, the process of measuring purge speeds involves: (i) ensuring that a particular piece of content is cached, (ii) sending the command to invalidate the cache, (iii) simultaneously checking our internal system logs for how the purge request is routed through our infrastructure, and (iv) measuring when the asset is removed from cache (first miss).

This process measures how quickly cache is invalidated from the perspective of an average user.

  • Clock starts
    As noted above, in this experiment we’re using sampled RUM data from our purge systems. The goal of this experiment is to benchmark current data for how long it can take to purge an asset on Cloudflare across different regions. Once the asset was cached in a region on Cloudflare, we identified when a purge request was received for that asset, and at that same instant the clock started for this experiment. We include in this time any retries that we needed to make (due to data centers missing the initial purge request) to ensure that the purge was done consistently across our network. The clock continues as the request transits our purge pipeline (data center > core > fanout > purge from all data centers).
  • Clock stops
    The clock stopped when the purged asset was removed from cache, meaning that the data center no longer serves the asset from cache in response to visitors’ requests. Our internal logging measures the precise moment that the cached content has been removed or expired, and from that data we were able to determine the following benchmarks for our purge types in various regions.
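For readers who want a rough external approximation of this measurement from a single vantage point, the sketch below issues a purge-by-URL call against the public Cloudflare API and then polls the asset until the edge stops reporting a cache HIT. The zone ID, token, and URL are placeholders, and this observes only one data center rather than the network-wide RUM data used above.

import time
import requests

ZONE_ID = "<zone-id>"
API_TOKEN = "<api-token>"
ASSET_URL = "https://example.com/image.jpg"

def purge_and_time() -> float:
    start = time.monotonic()
    response = requests.post(
        f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/purge_cache",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"files": [ASSET_URL]},
        timeout=10,
    )
    response.raise_for_status()
    while True:
        status = requests.get(ASSET_URL, timeout=10).headers.get("CF-Cache-Status", "")
        if status != "HIT":                     # first non-HIT response observed after the purge
            return time.monotonic() - start
        time.sleep(0.05)

if __name__ == "__main__":
    print(f"observed purge latency: {purge_and_time():.3f}s")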

Results

We’ve divided our benchmarks in two ways: by purge type and by region.

We singled out Purge by URL because it identifies a single target asset to be purged. While that asset can be stored in multiple locations, the amount of data to be purged is strictly defined.

We’ve combined all other types of purge (everything, tag, prefix, hostname) together because the amount of data to be removed is highly variable. Purging a whole website or by assets identified with cache tags could mean we need to find and remove a multitude of content from many different data centers in our network.

Second, we segmented our benchmark measurements by region, and we confined the benchmarks to specific data center servers in each region because we were concerned about clock skew between different data centers. Limiting the test to the same cache servers means that even if there was skew, they’d all be skewed in the same way.

We took the latency from the representative data centers in each of the following regions and the global latency. Data centers were not evenly distributed in each region, but in total represent about 90 different cities around the world:  

  • Africa
  • Asia Pacific Region (APAC)
  • Eastern Europe (EEUR)
  • Eastern North America (ENAM)
  • Oceania
  • South America (SA)
  • Western Europe (WEUR)
  • Western North America (WNAM)

The global latency numbers represent the purge data from all Cloudflare data centers in over 270 cities globally. In the results below, global latency numbers may be larger than the regional numbers because it represents all of our data centers instead of only a regional portion so outliers and retries might have an outsized effect.

Below are the results for how quickly our current purge pipeline was able to invalidate content by purge type and region. All times are represented in seconds and divided into P50, P75, and P99 quantiles. For example, “P50” means that 50% of the purges completed at the indicated latency or faster.

Purge By URL

Region		P50	P75	P99
AFRICA		0.95s	1.94s	6.42s
APAC		0.91s	1.87s	6.34s
EEUR		0.84s	1.66s	6.30s
ENAM		0.85s	1.71s	6.27s
OCEANIA		0.95s	1.96s	6.40s
SA		0.91s	1.86s	6.33s
WEUR		0.84s	1.68s	6.30s
WNAM		0.87s	1.74s	6.25s
GLOBAL		1.31s	1.80s	6.35s

Purge Everything, by Tag, by Prefix, by Hostname

Region		P50	P75	P99
AFRICA		1.42s	1.93s	4.24s
APAC		1.30s	2.00s	5.11s
EEUR		1.24s	1.77s	4.07s
ENAM		1.08s	1.62s	3.92s
OCEANIA		1.16s	1.70s	4.01s
SA		1.25s	1.79s	4.106s
WEUR		1.19s	1.73s	4.04s
WNAM		0.9995s	1.53s	3.83s
GLOBAL		1.57s	2.32s	5.97s

A general note about these benchmarks — the data represented here was taken from over 48 hours (two days) of RUM purge latency data in May 2022. If you are interested in how quickly your content can be invalidated on Cloudflare, we suggest you test our platform with your website.

Those numbers are good and much faster than most of our competitors. Even in the worst case, we see the time from when you tell us to purge an item to when it is removed globally is less than seven seconds. In most cases, it’s less than a second. That’s great for most applications, but we want to be even faster. Our goal is to get cache purge to as close as theoretically possible to the speed of light limit for a network our size, which is 200ms.

Intriguingly, LEO satellite networks may be able to provide even lower global latency than fiber optics because of the straightness of the paths between satellites that use laser links. We’ve done calculations of latency between LEO satellites that suggest that there are situations in which going to space will be the fastest path between two points on Earth. We’ll let you know if we end up using laser-space-purge.

Just as we have with network performance, we are going to relentlessly measure our cache performance as well as the cache performance of our competitors. We won’t be satisfied until we verifiably are the fastest everywhere. To do that, we’ve built a new cache purge architecture which we’re confident will make us the fastest cache purge in the industry.

Our new architecture

Through the end of 2022, we will continue this blog series incrementally showing how we will become the fastest, most-scalable purge system in the industry. We will continue to update you with how our purge system is developing  and benchmark our data along the way.

Getting there will involve rearchitecting and optimizing our purge service, which hasn’t received a systematic redesign in over a decade. We’re excited to do our development in the open, and bring you along on our journey.

So what do we plan on updating?

Introducing Coreless Purge

The first version of our cache purge system was designed on top of a set of central core services, including authorization, authentication, request distribution, and filtering, among other features that made it a high-reliability service. These core components ultimately became a bottleneck in terms of scale and performance as our network continued to expand globally. While most of our purge dependencies have been containerized, the message queue used was still running on bare metal servers, which led to increased operational overhead when our system needed to scale.

Last summer, we built a proof of concept for a completely decentralized cache invalidation system using in-house tech – Cloudflare Workers and Durable Objects. Using Durable Objects as a queuing mechanism gives us the flexibility to scale horizontally by adding more Durable Objects as needed and can reduce time to purge with quick regional fanouts of purge requests.

In the new purge system, we’re ripping out the reliance on core data centers and moving all that functionality to every data center. We’re calling it coreless purge.

Here’s a general overview of how coreless purge will work:

  • A purge request will be initiated via the API or UI. This request will specify how we should identify the assets to be purged.
  • The request will be routed to the nearest Cloudflare data center where it is identified to be a purge request and be passed to a Worker that will perform several of the key functions that currently occur in the core (like authorization, filtering, etc).
  • From there, the Worker will pass the purge request to a Durable Object in the data center. The Durable Object will queue all the requests and broadcast them to every data center when they are ready to be processed.
  • When the Durable Object broadcasts the purge request to every data center, another Worker will pass the request to the service in the data center that will invalidate the content in cache (executes the purge).

We believe this re-architecture of our system built by stringing together multiple services from the Workers platform will help improve both the speed and scalability of the purge requests we will be able to handle.

Conclusion

We’re going to spend a lot of time building and optimizing purge because, if there’s one thing we learned here today, it’s that cache invalidation is a difficult problem but those are exactly the types of problems that get us out of bed in the morning.

If you want to help us optimize our purge pipeline, we’re hiring.

We protect entire corporate networks, help customers build Internet-scale applications efficiently, accelerate any website or Internet application, ward off DDoS attacks, keep hackers at bay, and can help you on your journey to Zero Trust.

Visit 1.1.1.1 from any device to get started with our free app that makes your Internet faster and safer.

To learn more about our mission to help build a better Internet, start here. If you’re looking for a new career direction, check out our open positions.

Source :
https://blog.cloudflare.com/part1-coreless-purge/

All the way up to 11: Serve Brotli from origin and Introducing Compression Rules

23/06/2023

Throughout Speed Week, we have talked about the importance of optimizing performance. Compression plays a crucial role by reducing file sizes transmitted over the Internet. Smaller file sizes lead to faster downloads, quicker website loading, and an improved user experience.

Take household cleaning products as a real world example. It is estimated “a typical bottle of cleaner is 90% water and less than 10% actual valuable ingredients”. Removing 90% of a typical 500ml bottle of household cleaner reduces the weight from 600g to 60g. This reduction means only a 60g parcel, with instructions to rehydrate on receipt, needs to be sent. Extrapolated into the gallons, this weight reduction soon becomes a huge shipping saving for businesses. Not to mention the environmental impact.

This is how compression works. The sender compresses the file to its smallest possible size, and then sends the smaller file with instructions on how to handle it when received. By reducing the size of the files sent, compression ensures the amount of bandwidth needed to send files over the Internet is a lot less. Where files are stored in expensive cloud providers like AWS, reducing the size of files sent can directly equate to significant cost savings on bandwidth.

Smaller file sizes are also particularly beneficial for end users with limited Internet connections, such as mobile devices on cellular networks or users in areas with slow network speeds.

Cloudflare has always supported compression in the form of Gzip. Gzip is a widely used compression algorithm that has been around since 1992 and provides file compression for all Cloudflare users. However, in 2013 Google introduced Brotli which supports higher compression levels and better performance overall. Switching from gzip to Brotli results in smaller file sizes and faster load times for web pages. We have supported Brotli since 2017 for the connection between Cloudflare and client browsers. Today we are announcing end-to-end Brotli support for web content: support for Brotli compression, at the highest possible levels, from the origin server to the client.

If your origin server supports Brotli, turn it on, crank up the compression level, and enjoy the performance boost.

Brotli compression to 11

Brotli has 12 levels of compression ranging from 0 to 11, with 0 providing the fastest compression speed but the lowest compression ratio, and 11 offering the highest compression ratio but requiring more computational resources and time. During our initial implementation of Brotli five years ago, we identified that compression level 4 offered the balance between bytes saved and compression time without compromising performance.

Since 2017, Cloudflare has been using a maximum compression of Brotli level 4 for all compressible assets based on the end user’s “accept-encoding” header. However, one issue was that Cloudflare only requested Gzip compression from the origin, even if the origin supported Brotli. Furthermore, Cloudflare would always decompress the content received from the origin before compressing and sending it to the end user, resulting in additional processing time. As a result, customers were unable to fully leverage the benefits offered by Brotli compression.

Old world

With Cloudflare now fully supporting Brotli end to end, customers will start seeing our updated accept-encoding header arriving at their origins. Once available customers can transfer, cache and serve heavily compressed Brotli files directly to us, all the way up to the maximum level of 11. This will help reduce latency and bandwidth consumption. If the end user device does not support Brotli compression, we will automatically decompress the file and serve it either in its decompressed format or as a Gzip-compressed file, depending on the Accept-Encoding header.

Full end-to-end Brotli compression support

End user cannot support Brotli compression

Customers can implement Brotli compression at their origin by referring to the appropriate online materials. For example, customers that are using NGINX, can implement Brotli by following this tutorial and setting compression at level 11 within the nginx.conf configuration file as follows:

brotli on;
brotli_comp_level 11;
brotli_static on;
brotli_types text/plain text/css application/javascript application/x-javascript text/xml 
application/xml application/xml+rss text/javascript image/x-icon 
image/vnd.microsoft.icon image/bmp image/svg+xml;

Cloudflare will then serve these assets to the client at the exact same compression level (11) for the matching file brotli_types. This means any SVG or BMP images will be sent to the client compressed at Brotli level 11.
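One quick way to confirm what encoding is actually reaching clients once the origin is configured this way is to request an asset while advertising Brotli support only, as in the hedged sketch below; the URL is a placeholder, and the requests library needs the optional brotli (or brotlicffi) package installed to transparently decode “br” responses.

import requests

response = requests.get(
    "https://example.com/styles.css",          # placeholder asset URL on your zone
    headers={"Accept-Encoding": "br"},         # advertise Brotli support only
    timeout=10,
)
print("Content-Encoding:", response.headers.get("Content-Encoding"))
print("CF-Cache-Status:", response.headers.get("CF-Cache-Status"))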

Testing

We applied compression against a simple CSS file, measuring the impact of various compression algorithms and levels. Our goal was to identify potential improvements that users could experience by optimizing compression techniques. These results can be seen in the following table:

Test							Size (bytes)	% Reduction of original file (higher % is better)
Uncompressed response (no compression used)		2,747		-
Cloudflare default Gzip compression (level 8)		1,121		59.21%
Cloudflare default Brotli compression (level 4)		1,110		59.58%
Compressed with max Gzip level (level 9)		1,121		59.21%
Compressed with max Brotli level (level 11)		909		66.94%

By compressing Brotli at level 11 users are able to reduce their file sizes by 19% compared to the best Gzip compression level. Additionally, the strongest Brotli compression level is around 18% smaller than the default level used by Cloudflare. This highlights a significant size reduction achieved by utilizing Brotli compression, particularly at its highest levels, which can lead to improved website performance, faster page load times and an overall reduction in egress fees.
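A comparison like the one in the table is easy to reproduce against your own assets. The sketch below compresses a local file (the name is a placeholder) at the same Gzip and Brotli levels discussed above and requires the third-party brotli package.

import gzip
import brotli

with open("styles.css", "rb") as f:            # placeholder: any compressible asset
    data = f.read()

results = {
    "uncompressed": len(data),
    "gzip level 8": len(gzip.compress(data, compresslevel=8)),
    "gzip level 9": len(gzip.compress(data, compresslevel=9)),
    "brotli level 4": len(brotli.compress(data, quality=4)),
    "brotli level 11": len(brotli.compress(data, quality=11)),
}
for name, size in results.items():
    reduction = 100 * (1 - size / len(data))
    print(f"{name:15} {size:8} bytes  ({reduction:5.2f}% reduction)")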

To take advantage of higher end-to-end compression rates, the following Cloudflare proxy features need to be disabled.

  • Email Obfuscation
  • Rocket Loader
  • Server Side Excludes (SSE)
  • Mirage
  • HTML Minification – JavaScript and CSS can be left enabled.
  • Automatic HTTPS Rewrites

This is due to Cloudflare needing to decompress and access the body to apply the requested settings. Alternatively a customer can disable these features for specific paths using Configuration Rules.

If any of these rewrite features are enabled, your origin can still send Brotli compression at higher levels. However, we will decompress, apply the Cloudflare feature(s) enabled, and recompress on the fly using Cloudflare’s default Brotli level 4 or Gzip level 8 depending on the user’s accept-encoding header.

For browsers that do not accept Brotli compression, we will continue to decompress and serve responses either Gzip-compressed or uncompressed.

Implementation

The initial step towards supporting Brotli from the origin was building a decompression module that could be integrated into Cloudflare's software stack. It allows us to efficiently convert the compressed bytes received from the origin back into the original, uncompressed file. This step was crucial because numerous features, such as Email Obfuscation and Cloudflare Workers, rely on accessing the body of a response to apply customizations.

We integrated the decompressor into Cloudflare's core reverse web proxy. This integration ensured that all Cloudflare products and features could access Brotli decompression effortlessly. It also allowed the Cloudflare Workers team to incorporate Brotli directly into Workers, so Workers customers can either interact with responses returned in Brotli or pass them through to the end user unmodified.
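
To illustrate that last point, here is a minimal Worker sketch (not Cloudflare's internal implementation) showing the two behaviors: simply proxying the response lets the encoded body pass through unmodified, while reading the body gives the Worker decompressed content that Cloudflare re-compresses on the way out.

export default {
  async fetch(request) {
    // Fetch the upstream response.
    const response = await fetch(request);

    // Pass-through: returning the response untouched lets the encoded body
    // flow to the end user without the Worker needing to decompress it.
    return response;

    // If the Worker inspected or rewrote the body instead, for example:
    //   const body = await response.text();
    // the runtime would hand it the decompressed content, and the response
    // would be re-compressed before being sent on to the end user.
  },
};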

Introducing Compression rules – Granular control of compression to end users

By default, Cloudflare compresses certain content types based on the Content-Type header of the file. Today we are also announcing Compression Rules for our Enterprise customers. With Compression Rules, you gain enhanced control over Cloudflare's compression behavior, enabling you to customize how and which content Cloudflare compresses to optimize your website's performance.

For example, by using Cloudflare's Compression Rules for .ktx files, customers can optimize the delivery of textures in WebGL applications, enhancing the overall user experience. Enabling compression minimizes bandwidth usage and ensures that WebGL applications load quickly and smoothly, even when dealing with large and detailed textures.

Alternatively, customers can disable compression or specify a preference for how we compress. For example, an infrastructure company might want to support only Gzip for its IoT devices while allowing Brotli compression for all other hostnames.

Compression Rules use the same filters that our other Rules products are built on, with the added fields of Media Type and Extension type, allowing you to easily specify the content you wish to compress.
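
A rough sketch of how a rule like the IoT example above could be created via the Rulesets API that other Rules products use: treat the phase name, action name, and parameter shape shown here as illustrative and check the Compression Rules documentation for the exact values; the zone ID, API token, and hostname are placeholders.

// Node 18+ (global fetch). Sketch only: limits a hostname to Gzip compression.
const zoneId = 'YOUR_ZONE_ID';     // placeholder
const apiToken = 'YOUR_API_TOKEN'; // placeholder

async function createCompressionRule() {
  const rule = {
    description: 'Gzip only for IoT devices',
    expression: 'http.host eq "iot.example.com"',          // placeholder hostname
    action: 'compress_response',                            // illustrative action name
    action_parameters: { algorithms: [{ name: 'gzip' }] },  // illustrative parameter shape
  };

  const res = await fetch(
    // Illustrative phase name for Compression Rules.
    `https://api.cloudflare.com/client/v4/zones/${zoneId}/rulesets/phases/http_response_compression/entrypoint`,
    {
      method: 'PUT',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ rules: [rule] }),
    }
  );
  console.log(await res.json());
}

createCompressionRule();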

Deprecating the Brotli toggle

Brotli has been supported by some web browsers since 2016, and Cloudflare added Brotli support in 2017. As with any new web technology, Brotli was initially unfamiliar, so we gave customers the ability to selectively enable or disable Brotli via the API and our UI.
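
For reference, that per-zone toggle is an ordinary zone setting; a minimal sketch of flipping it via the API, assuming the setting keeps its current "brotli" identifier (the zone ID and API token are placeholders):

// Node 18+ (global fetch). Enable the existing Brotli zone setting.
const zoneId = 'YOUR_ZONE_ID';     // placeholder
const apiToken = 'YOUR_API_TOKEN'; // placeholder

async function setBrotli(value) {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${zoneId}/settings/brotli`,
    {
      method: 'PATCH',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ value }), // 'on' or 'off'
    }
  );
  console.log(await res.json());
}

setBrotli('on');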

Now that Brotli has matured and is supported by all major browsers, we plan to enable Brotli on all zones by default in the coming months, mirroring the Gzip behavior we currently support and removing the toggle from our dashboard. If a browser does not support Brotli, Cloudflare will continue to honor the encodings it does accept, such as Gzip or uncompressed, and Enterprise customers will still be able to use Compression Rules to granularly control how we compress data towards their users.

The future of web compression

We’ve seen great adoption and great performance for Brotli as the new compression technique for the web. Looking forward, we are closely following trends and new compression algorithms such as zstd as a possible next-generation compression algorithm.

At the same time, we’re looking to improve Brotli directly where we can. One development that we’re particularly focused on is shared dictionaries with Brotli. Whenever you compress an asset, you use a “dictionary” that helps the compression to be more efficient. A simple analogy of this is typing OMW into an iPhone message. The iPhone will automatically translate it into On My Way using its own internal dictionary.

[Illustration: "OMW" expanding to "On My Way"]

This internal dictionary has taken three characters and expanded them into nine characters (including spaces). Looked at the other way, sharing that dictionary means only three characters need to be sent instead of nine, a saving of six characters, which translates into performance benefits for users.

By default, the Brotli RFC defines a static dictionary that both clients and origin servers use. The static dictionary was designed to be general purpose and apply to everyone, balancing dictionary size against the best achievable compression results. However, what if an origin could generate a bespoke dictionary tailored to a specific website? For example, a Cloudflare-specific dictionary would allow us to compress the words and phrases that appear repeatedly on our site, such as the word "Cloudflare". The bespoke dictionary would be designed to compress these as heavily as possible, and a browser using the same dictionary would be able to translate them back.
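
Node.js does not expose custom dictionaries for Brotli in its zlib module, but its Deflate implementation does, which is enough to illustrate the idea: when both sides share a dictionary seeded with a site's recurring strings, the same payload usually compresses smaller. A small sketch with a made-up dictionary and payload:

const zlib = require('zlib');

// A made-up dictionary of strings that appear repeatedly on the site.
const dictionary = Buffer.from('Cloudflare compression Brotli performance');

// A sample payload that reuses those strings.
const payload = Buffer.from(
  'Cloudflare performance: Brotli compression at Cloudflare improves performance.'
);

const withoutDict = zlib.deflateSync(payload);
const withDict = zlib.deflateSync(payload, { dictionary });

console.log('no dictionary:  ', withoutDict.length, 'bytes');
console.log('with dictionary:', withDict.length, 'bytes');

// Decompression only works if the receiver supplies the same dictionary,
// which is exactly the property a shared web dictionary would rely on.
const roundTrip = zlib.inflateSync(withDict, { dictionary });
console.log('round trip ok:', roundTrip.equals(payload));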

A new proposal by the Web Incubator CG aims to do just that, allowing you to specify your own dictionaries that browsers can use, so websites can optimize compression even further. We're excited about contributing to this proposal and plan on publishing our research soon.

Try it now

Compression Rules are available now, with end-to-end Brotli being rolled out over the coming weeks, allowing you to improve performance, reduce bandwidth, and granularly control how Cloudflare handles compression for your end users.

Watch on Cloudflare TV

https://customer-rhnwzxvb3mg4wz3v.cloudflarestream.com/f1c71fdb05263b5ec3077e6e7acdb7e2/iframe?preload=true&poster=https%3A%2F%2Fcustomer-rhnwzxvb3mg4wz3v.cloudflarestream.com%2Ff1c71fdb05263b5ec3077e6e7acdb7e2%2Fthumbnails%2Fthumbnail.jpg%3Ftime%3D1s%26height%3D600


Source :
https://blog.cloudflare.com/this-is-brotli-from-origin/

Speeding up your (WordPress) website is a few clicks away

22/06/2023

Every day, website visitors spend far too much time waiting for websites to load in their browsers. This waiting is partially due to browsers not knowing which resources are critically important so they can prioritize them ahead of less-critical resources. In this blog we will outline how millions of websites across the Internet can improve their performance by specifying which critical content loads first with Cloudflare Workers and what Cloudflare will do to make this easier by default in the future.

Popular Content Management Systems (CMS) like WordPress have made attempts to influence website resource priority, for example through techniques like lazy-loading images. When done correctly, the results are magical: performance is optimized between the CMS and browser without needing to implement any changes or code new prioritization strategies. However, we've seen that these default priorities leave significant room for improvement.

In this co-authored blog with Google’s Patrick Meenan we will explain where the opportunities exist to improve website performance, how to check if a specific site can improve performance, and provide a small JavaScript snippet which can be used with Cloudflare Workers to do this optimization for you.

What happens when a browser receives the response?

Before we dive into where the opportunities are to improve website performance, let’s take a step back to understand how browsers load website assets by default.

After the browser sends an HTTP request to a server, it receives an HTTP response containing information like status codes, headers, and the requested content. The browser carefully analyzes the response's status code and headers to ensure proper handling of the content.

Next, the browser processes the content itself. For HTML responses, the browser extracts important information from the <head> section of the HTML, such as the page title, stylesheets, and scripts. Once this information is parsed, the browser moves on to the response <body> which has the actual page content. During this stage, the browser begins to present the webpage to the visitor.

If the response references additional resources like CSS, JavaScript, or other content, the browser may need to fetch and integrate them into the webpage. Typically, browsers like Google Chrome delay loading images until after the resources in the HTML <head> have loaded; those resources are said to "block" the render of the webpage. However, developers can override this default behavior using fetch priority or other methods to boost specific content's priority in the browser. For example, adding fetchpriority="high" to the hero image's <img> tag hints to the browser that it should be fetched earlier. By adjusting an important image's fetch priority, it can be loaded earlier, which can lead to significant improvements in crucial performance metrics like LCP (Largest Contentful Paint).

Images are so central to web pages that they have become an essential element in measuring website performance through Core Web Vitals. LCP measures the time it takes for the largest visible element, often an image, to be fully rendered on the screen. Optimizing the loading of critical images (like LCP images) can greatly enhance performance, improving the overall user experience and page performance.

But here's the challenge: a browser may not know which images are the most important for the visitor experience (like the LCP image) until rendering begins. If the developer can identify the LCP image or other critical elements before the page reaches the browser, their priority can be increased at the server to boost website performance instead of waiting for the browser to discover the critical images on its own.

In our Smart Hints blog, we describe how Cloudflare will soon be able to automatically prioritize content on behalf of website developers, but what happens if there’s a need to optimize the priority of the images right now? How do you know if a website is in a suboptimal state and what can you do to improve?

Using Cloudflare, developers should be able to improve image performance with heuristics that identify likely-important images before the browser parses them, so those images can be given increased priority and loaded sooner.

Identifying Image Priority opportunities

Just increasing the fetch priority of all images won't help if they are lazy-loaded or are not critical/LCP images. Lazy-loading is a method developers use to improve the initial load of a webpage that includes numerous out-of-view elements. For example, on Instagram, as you continually scroll down to see more images, it only makes sense to load those images when the user reaches them; otherwise the page load would be needlessly delayed by the browser eagerly loading out-of-view images. Instead, the highest priority should be given to the LCP image in the viewport to improve performance.

So developers are left needing to know which images are in users' viewports, so their priority can be increased, and which are off-screen, so they can be lazy-loaded.

Recently, we've seen attempts to influence image priority on behalf of developers. For example, since WordPress 5.5, IMG tags with defined dimensions (and therefore known aspect ratios) have been lazy-loaded by default. While there are plugins and other methods WordPress developers can use to boost the priority of LCP images, lazy-loading every image by default without knowing which are LCP images can introduce artificial delays in website performance (they're working on this though, and have partially resolved it for block themes).

So how do we identify the LCP image and other critical assets before they get to the browser?

To evaluate the opportunity to improve image performance, we turned to the HTTP Archive. Of the approximately 22 million desktop pages tested in February 2023, 46% had an LCP element with an IMG tag, meaning the LCP element was an image about half the time. Of these desktop pages, 8.5 million had that image in the static HTML delivered with the page, indicating a total potential improvement opportunity of approximately 39% of the desktop pages within the dataset.

In the case of mobile pages, out of the ~28.5 million tested, 40% had an LCP element that was an IMG tag. Among these mobile pages, 10.3 million had the image in the static HTML delivered with the page, suggesting a potential improvement opportunity in around 36% of the mobile pages within the dataset.

However, as previously discussed, prioritizing an image won't be effective if the image is lazy-loaded, because the directives are contradictory. In the dataset, approximately 1.8 million LCP desktop images and 2.4 million LCP mobile images were lazy-loaded.

Therefore, across the Internet, the opportunity to improve image performance covers roughly 30% of pages: those with an LCP image in the original HTML markup that isn't lazy-loaded. With a more advanced Cloudflare Worker, the additional ~9% of lazy-loaded LCP images can also be improved by removing the lazy-load attribute.

If you’d like to determine which element on your website serves as the LCP element so you can increase the priority or remove any lazy-loading, you can use browser developer tools, or speed tests like Webpagetest or Cloudflare Observatory.
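
For example, pasting a small PerformanceObserver snippet into the browser's developer console while the page loads will log each LCP candidate the browser considers, ending with the final LCP element. This uses the standard largest-contentful-paint entry type and is just a quick diagnostic, separate from the Worker shown later in this post.

// Log each LCP candidate as the browser reports it; the last one logged
// is the element ultimately used for the LCP metric.
new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    // entry.element is the DOM node (often an <img>) for this candidate.
    console.log('LCP candidate:', entry.startTime, entry.element);
  }
}).observe({ type: 'largest-contentful-paint', buffered: true });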

39% of desktop pages seems like a lot of opportunity to improve image performance. So the next question is: how can Cloudflare determine the LCP image across our network and automatically prioritize it?

Image Index

We thought that how soon the LCP image showed up in the HTML would serve as a useful indicator. So we analyzed the HTTP Archive dataset to see where the cumulative percentage of LCP images are discovered based on their position in the HTML, including lazy-loaded images.

We found that approximately 25% of the pages had the LCP image as the first image in the HTML (around 10% of all pages). Another 25% had the LCP image as the second image. WordPress seems to have arrived at a similar conclusion and recently shipped a change that removes the default lazy-load attribute from the first image in block themes, but there are opportunities to go further.

Our analysis revealed that implementing a straightforward rule like “do not lazy-load the first four images,” either through the browser, a content management system (CMS), or a Cloudflare Worker could address approximately 75% of the issue of lazy-loading LCP images (example Worker below).

Ignoring small images

In trying to find other ways to identify likely LCP images we next turned to the size of the image. To increase the likelihood of getting the LCP image early in the HTML, we looked into ignoring “small” images as they are unlikely to be big enough to be a LCP element. We explored several sizes and 10,000 pixels (less than 100×100) was a pretty reliable threshold that didn’t skip many LCP images and avoided a good chunk of the non-LCP images.

By ignoring small images (<10,000px), we found that the first image became the LCP image in approximately 30-34% of cases. Adding the second image increased this percentage to 56-60% of pages.

Therefore, to improve image priority, a potential approach could involve assigning a higher priority to the first four "not-small" images; the example Worker later in this post implements exactly this heuristic.

Chrome 114 Image Prioritization Experiment

An experiment running in Chrome 114 does exactly what we described above. Within the browser there are a few different prioritization knobs that aren't web-exposed, so we have the opportunity to assign a "medium" priority to images we want to boost automatically (directly controlling priority with fetch priority only lets you set high or low). This lets us move those images ahead of other images, async scripts, and parser-blocking scripts late in the body, while still keeping the boosted image priority below any high-priority requests, particularly dynamically-injected blocking scripts.

We are experimenting with boosting the priority of varying numbers of images (2, 5 and 10) and with allowing one of those medium-priority images to load at a time during Chrome's "tight" mode (when it is loading the render-blocking resources in the head), to increase the likelihood that the LCP image will be available when the first paint is done.

The data is still coming in and no "ship" decisions have been made yet, but the early results are very promising, improving the LCP time across the entire web for all arms of the experiment (not by massive amounts, but moving the metrics of the whole web is notoriously difficult).

How to use Cloudflare Workers to boost performance

Now that we've seen that there is a large opportunity across the Internet to prioritize images for performance, and how to identify images on individual pages that are likely LCP images, the question becomes: what would the results be of implementing a network-wide rule that boosts image priority?

We built a test worker and deployed it on some WordPress test sites with our friends at Rocket.net, a WordPress hosting platform focused on performance. This worker boosts the priority of the first four images while removing the lazy-load attribute, if present. When deployed we saw good performance results and the expected image prioritization.

export default {
  async fetch(request) {
    const response = await fetch(request);

    // Only rewrite HTML responses; pass everything else through untouched
    const contentType = response.headers.get('Content-Type');
    if (!contentType || !contentType.includes('text/html')) {
      return response;
    }

    // Return the transformed response (HTMLRewriter rewrites the body as it streams)
    return transformResponse(response);
  },
};

function transformResponse(response) {
  // Create an HTMLRewriter instance and define the image transformation logic
  const rewriter = new HTMLRewriter()
    .on('img', new ImageElementHandler());

  // transform() returns a new Response whose body is rewritten while streaming
  return rewriter.transform(response);
}
 
class ImageElementHandler {
  constructor() {
    this.imageCount = 0;
    this.processedImages = new Set();
  }

  element(element) {
    const imgSrc = element.getAttribute('src');

    // Boost the first four distinct, non-small images: remove any lazy-load
    // attribute and mark them as high fetch priority
    if (imgSrc && this.imageCount < 4 && !this.processedImages.has(imgSrc) && !isImageSmall(element)) {
      element.removeAttribute('loading');
      element.setAttribute('fetchpriority', 'high');
      this.processedImages.add(imgSrc);
      this.imageCount++;
    }
  }
}
 
function isImageSmall(element) {
  // Check if the element has width and height attributes
  const width = element.getAttribute('width');
  const height = element.getAttribute('height');
 
  // If width or height is 0, or width * height < 10000, consider the image as small
  if ((width && parseInt(width, 10) === 0) || (height && parseInt(height, 10) === 0)) {
    return true;
  }
 
  if (width && height) {
    const area = parseInt(width, 10) * parseInt(height, 10);
    if (area < 10000) {
      return true;
    }
  }
 
  return false;
}

When testing the Worker, we saw that the default image priority was boosted to "high" for the first four images, while the fifth image remained at "low." This resulted in an LCP rating of "good" from a speed test. While this initial test is not a dispositive indicator that the Worker will boost performance in every situation, the results are promising and we look forward to continuing to experiment with this idea.

While we’ve experimented with WordPress sites to illustrate the issues and potential performance benefits, this issue is present across the Internet.

Website owners can help us experiment with the Worker above to improve the priority of images on their websites or edit it to be more specific by targeting likely LCP elements. Cloudflare will continue experimenting using a very similar process to understand how to safely implement a network-wide rule to ensure that images are correctly prioritized across the Internet and performance is boosted without the need to configure a specific Worker.

Automatic Platform Optimization

Cloudflare’s Automatic Platform Optimization (APO) is a plugin for WordPress which allows Cloudflare to deliver your entire WordPress site from our network ensuring consistent, fast performance for visitors. By serving cached sites, APO can improve performance metrics. APO does not currently have a way to prioritize images over other assets to improve browser render metrics or dynamically rewrite HTML, techniques we’ve discussed in this post. Although this presents a potential opportunity for future development, it requires thorough testing to ensure safe and reliable support.

In the future we'll look to include the techniques discussed today as part of APO. In the meantime, we recommend using Snippets (and Experiments) to test the code example above and see the performance impact on your website.

Get in touch!

If you are interested in using the JavaScript above, we recommend testing with Workers or using Cloudflare Snippets. We'd love to hear what your results were, so get in touch via social media and share your experiences.


Source :
https://blog.cloudflare.com/speeding-up-your-website-in-a-few-clicks/
