New SEC Cybersecurity Rules: What You Need to Know

By: Greg Young – Trend Micro
August 03, 2023
Read time: 4 min (1014 words)

The US Securities and Exchange Commission (SEC) recently adopted rules regarding mandatory cybersecurity disclosure. Explore what this announcement means for you and your organization.

On July 26, 2023, the US Securities and Exchange Commission (SEC) adopted rules regarding mandatory cybersecurity disclosure. What does this mean for you and your organization? As I understand them, here are the major takeaways that cybersecurity and business leaders need to know:

Who does this apply to?

The rules announced apply only to registrants of the SEC i.e., companies filing documents with the US SEC. Not surprisingly, this isn’t limited to attacks on assets located within the US, so incidents concerning SEC registrant companies’ assets in other countries are in scope. This scope also, not surprisingly, does not include the government, companies not subject to SEC reporting (i.e., privately held companies), and other organizations.

Breach notification for these others will be the subject of separate compliance regimes, which will hopefully, at some point in time, be harmonized and/or unified to some degree with the SEC reporting.

Advice for security leaders: be aware that these new rules could require “double reporting,” such as for publicly traded critical infrastructure companies. Having multiple compliance regimes, however, is not new for cybersecurity.

What are the general disclosure requirements?

Some pundits have said “four days after an incident” but that’s not quite correct. The SEC says that “material breaches” must be reported “four business days after a registrant determines that a cybersecurity incident is material.”

We’ve hit the first squishy bit: materiality. Directing companies to disclose material events shouldn’t be necessary, but there’s a mixed record of companies making materiality determinations in their public company operations. So what kind of cybersecurity incident would be likely to be important to a reasonable investor?

We’ve seen giant breaches that paradoxically did not move stock prices, and minor breaches that did. I’m clearly on the side of compliance and disclosure, but I recognize it is a gray area. Recently we saw some companies that had the MOVEit vulnerability exploited but suffered no data loss. Should they report? And in some cases the cost of responding to the vulnerability ran into the millions: how about then? I expect and hope there will be further guidance.

Advice for security leaders: monitor both the breach investigation and the materiality analysis. Security leaders won’t often make that call but should give guidance and continuous updates to the CxOs who are responsible.

The second squishy bit is that the report must be made four business days after the incident is determined to be material. So not four days after the incident, but four days after the materiality determination. I understand why it was structured this way, as a small indicator of compromise must be followed up before the scope and nature of a breach can be understood, including whether a breach has occurred at all. But this does leave a window for some of the foot-dragging on disclosure we’ve unfortunately seen, including from product companies with vulnerabilities.

Advice for security leaders: make management aware of the four-day reporting requirement and monitor the clock once the material line is crossed or identified.
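
To make that clock concrete, here is a minimal, hypothetical sketch (in TypeScript, not from the SEC rule itself) of the deadline arithmetic a security team might track once materiality is determined. It skips weekends only; a real compliance calendar would also account for US federal holidays.

```typescript
// Hypothetical sketch: the disclosure deadline as four business days
// after the materiality determination (weekends skipped, holidays not handled).
function addBusinessDays(start: Date, businessDays: number): Date {
  const result = new Date(start);
  let added = 0;
  while (added < businessDays) {
    result.setDate(result.getDate() + 1);
    const day = result.getDay(); // 0 = Sunday, 6 = Saturday
    if (day !== 0 && day !== 6) {
      added++;
    }
  }
  return result;
}

// Example: materiality determined on Thursday, August 3, 2023.
const determinedOn = new Date(2023, 7, 3); // months are 0-indexed
const deadline = addBusinessDays(determinedOn, 4);
console.log(`Disclosure due by: ${deadline.toDateString()}`); // Wed Aug 09 2023
```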

Are there extensions?

There are, but not because you need more time. Instead “The disclosure may be delayed if the United States Attorney General determines that immediate disclosure would pose a substantial risk to national security or public safety and notifies the Commission of such determination in writing.” Note that it specifically states that the Attorney General (AG) makes that determination, and the AG communicates this to the SEC. There could be some delegation of this authority within the Department of Justice in the future, but today it is the AG.

How does it compare to other countries and compliance regimes?

Breach and incident reporting and disclosure is not new, and the concept of reporting material events is already commonplace around the world. GDPR breach reporting is 72 hours, HHS HIPAA requires notice not later than 60 days and 90 days to individuals affected, and the UK Financial Conduct Authority (FCA) has breach reporting requirements. Canada has draft legislation in Bill C-26 that looks at mandatory reporting through the lens of critical industries, which includes verticals such as banking and telecoms but not public companies. Many of the world’s financial oversight bodies do not require breach notification for public companies in the exchanges they are responsible for.

Advice to security leaders: consider the new SEC rules as clarification and amplification of existing reporting requirements for material events rather than a new regime, or something harsher than or different from what other geographies require.

Is breach reporting the only new rule?

No, I’ve only focused on incident reporting in this post. There are a few more. The two most noteworthy are:

  • Regulation S-K Item 106, requiring registrants to “describe their processes, if any, for assessing, identifying, and managing material risks from cybersecurity threats, as well as the material effects or reasonably likely material effects of risks from cybersecurity threats and previous cybersecurity incidents.”
  • Also specified is that annual 10-Ks “describe the board of directors’ oversight of risks from cybersecurity threats and management’s role and expertise in assessing and managing material risks from cybersecurity threats.”

Bottom line

SEC mandatory reporting for material cybersecurity events was already covered by the general reporting requirements; now, however, the timelines and nature of the reporting are getting real, with a ticking four-business-day timer attached.

Stepping back from the rules, the importance of visibility and continuous monitoring is the real takeaway. Time to detection can’t be at the speed of your least experienced analyst. Platform means unified visibility rather than a wall of consoles. Finding and stopping breaches means internal visibility must include a rich array of telemetry that is continuously monitored.

Many SEC registrants have operations outside the US, and that means visibility needs to include threat intelligence that is localized to other geographies. These new SEC rules show more than ever that cyber risk is business risk.

To learn more about cyber risk management, check out the following resources:

Source :
https://www.trendmicro.com/en_us/research/23/h/sec-cybersecurity-rules-2023.html

Cybersecurity Threat 1H 2023 Brief with Generative AI

By: Trend Micro
August 08, 2023
Read time: 4 min (1020 words)

How generative AI influenced threat trends in 1H 2023

A lot can change in cybersecurity and criminal marketplaces over the course of just six months. In the first half of 2023, the rapid expansion of generative AI began to be felt both in scams such as virtual kidnapping and in the tooling cybercriminals build: offerings like WormGPT and FraudGPT are already being marketed. The use of AI empowers adversaries to carry out more sophisticated attacks and poses a new set of challenges. The good news is that the same technology can also be used to empower security teams to work more effectively.

As we analyze the major events and patterns observed during this time, we uncover critical insights that can help businesses stay ahead of risk and prepare for the challenges that lie ahead in the second half of the year.

AI-Driven Tools in Cybercrime

The adoption of AI in organizations has increased significantly, offering numerous benefits. However, cybercriminals are also harnessing the power of AI to carry out attacks more efficiently.

As detailed in a Trend research report in June, virtual kidnapping is a relatively new and concerning type of imposter scam. The scammer extorts their victims by tricking them into believing they are holding a friend or family member hostage. In reality, there is no hostage: AI technology known as a “deepfake” enables the fraudster to impersonate the real voice of the supposed hostage while on the phone. Audio harvested from the “hostage’s” social media posts will typically be used to train the AI model.

However, it is generative AI that’s playing an increasingly important role earlier on in the attack chain—by accelerating what would otherwise be a time-consuming process of selecting the right victims. To find those most likely to pay up when confronted with traumatic content, threat groups can use generative AI like ChatGPT to filter large quantities of potential victim data, fusing it with geolocation and advertising analytics. The result is a risk-based scoring system that can show scammers at a glance where they should focus their attacks.

This isn’t just theory. Virtual kidnapping scams are already happening. The bad news is that generative AI could be leveraged to make such attacks even more automated and effective in the future. An attacker could generate a script via ChatGPT to then convert to the hostage’s voice using deepfake and a text-to-speech app.

Of course, virtual kidnapping is just one of a growing number of scams that are continually being refined and improved by threat actors. Pig butchering is another type of investment fraud where the victim is befriended online, sometimes on romance sites, and then tricked into depositing their money into fictitious cryptocurrency schemes. It’s feared that these fraudsters could use ChatGPT and similar tools to improve their conversational techniques and perhaps even shortlist victims most likely to fall for the scams.

What to expect

The emergence of generative AI tools enables cybercriminals to automate and improve the efficiency of their attacks. The future may witness the development of AI-driven threats like DDoS attacks, wipers, and more, increasing the sophistication and scale of cyberattacks.

One area of concern is the use of generative AI to select victims based on extensive data analysis. This capability allows cybercriminals to target individuals and organizations with precision, maximizing the impact of their attacks.

Fighting back

Fortunately, security experts like Trend are also developing AI tools to help customers mitigate such threats. Trend pioneered the use of AI and machine learning for cybersecurity—embedding the technology in products as far back as 2005. From those early days of spam filtering, we began developing models designed to detect and block unknown threats more effectively.

Trend’s defense strategy

Most recently, we began leveraging generative AI to enhance security operations. Companion is a cybersecurity assistant designed to automate repetitive tasks and thereby free up time-poor analysts to focus on high-value tasks. It can also help to fill skills gaps by decoding complex scripts, triaging and recommending actions, and explaining and contextualizing alerts for SecOps staff.

What else happened in 1H 2023?

Ransomware: Adapting and Growing

Ransomware attacks are becoming more sophisticated, with threat actors leveraging AI-enabled tools to automate their malicious activities. One new player on the scene, Mimic, has abused legitimate search tools to identify and encrypt specific files for maximum impact. Meanwhile, the Royal ransomware group has expanded its targets to include Linux platforms, signaling an escalation in its capabilities.

According to Trend data, ransomware groups have been targeting finance, IT, and healthcare industries the most in 2023. From January 1 to July 17, 2023, there have been 219, 206, and 178 successful compromises of victims in these industries, respectively.

Our research findings revealed that ransomware groups are collaborating more frequently, leading to lower costs and increased market presence. Some groups are showing a shift in motivation, with recent attacks resembling those of advanced persistent threat (APT) groups. To combat these evolving threats, organizations need to implement a “shift left” strategy, fortifying their defenses to prevent threats from gaining access to their networks in the first place.

Vulnerabilities: Paring Down Cyber Risk Index

While the Cyber Risk Index (CRI) has dropped to a moderate range, the threat landscape remains concerning. Threat actors are exploiting smaller platforms, as seen with Clop ransomware targeting MOVEit and compromising government agencies. New top-level domains from Google pose risks for concealing malicious URLs. Connected cars create new avenues for hackers. Proactive cyber risk management is crucial.

Campaigns: Evading Detection and Expanding Targets

Malicious actors are continually updating their tactics, techniques, and procedures (TTPs) to evade detection and cast a wider net for victims. APT34, for instance, used DNS-based communication combined with legitimate SMTP mail traffic to bypass security policies. Meanwhile, Earth Preta has shifted its focus to target critical infrastructure and key institutions using hybrid techniques to deploy malware.

Persistent threats like the APT41 subgroup Earth Longzhi have resurfaced with new techniques, targeting firms in multiple countries. These campaigns require a coordinated approach to cyber espionage, and businesses must remain vigilant against such attacks.

To learn more about Trend’s 2023 Midyear Cybersecurity Report, please visit: https://www.trendmicro.com/vinfo/us/security/research-and-analysis/threat-reports/roundup/stepping-ahead-of-risk-trend-micro-2023-midyear-cybersecurity-threat-report

Source :
https://www.trendmicro.com/en_us/research/23/h/cybersecurity-threat-2023-generative-ai.html

The Journey to Zero Trust with Industry Frameworks

By: Alifiya Sadikali – Trend Micro
August 09, 2023
Read time: 4 min (1179 words)

Discover the core principles and frameworks of Zero Trust, NIST 800-207 guidelines, and best practices when implementing CISA’s Zero Trust Maturity Model.

With the growing number of devices connected to the internet, traditional security measures are no longer enough to keep your digital assets safe. To protect your organization from digital threats, it’s crucial to establish strong security protocols and take proactive measures to stay vigilant.

What is Zero Trust?

Zero Trust is a cybersecurity philosophy based on the premise that threats can arise internally and externally. With Zero Trust, no user, system, or service should automatically be trusted, regardless of its location within or outside the network. Providing an added layer of security to protect sensitive data and applications, Zero Trust only grants access to authenticated and authorized users and devices. And in the event of a data breach, compartmentalizing access to individual resources limits potential damage.

Your organization should consider Zero Trust as a proactive security strategy to protect its data and assets better.

The pillars of Zero Trust

At its core, Zero Trust rests on a few fundamental principles:

  • Verify explicitly. Only grant access once the user or device has been explicitly authenticated and verified. By doing so, you can ensure that only those with a legitimate need to access your organization’s resources can do so.
  • Least privilege access. Only give users access to the resources they need to do their job and nothing more. Limiting access in this way prevents unauthorized access to your organization’s data and applications.
  • Assume breach. Act as if a compromise to your organization’s security has occurred. Take steps to minimize the damage, including monitoring for unusual activity, limiting access to sensitive data, and ensuring that backups are up-to-date and secure.
  • Microsegmentation. Divide your organization’s network into smaller, more manageable segments and apply security controls to each segment individually. This reduces the risk of a breach spreading from one part of your network to another.
  • Security automation. Use tools and technologies to automate the process of monitoring, detecting, and responding to security threats. This ensures that your organization’s security is always up-to-date and can react quickly to new threats and vulnerabilities.

A Zero Trust approach is a proactive and effective way to protect your organization’s data and assets from cyber-attacks and data breaches. By following these core principles, your organization can minimize the risk of unauthorized access, reduce the impact of a breach, and ensure that your organization’s security is always up-to-date and effective.

The role of NIST 800-207 in Zero Trust

NIST Special Publication 800-207, Zero Trust Architecture, was developed by the National Institute of Standards and Technology. It provides guidelines and best practices for organizations to manage and mitigate cybersecurity risks.

Designed to be flexible and adaptable for a variety of organizations and industries, the framework supports the customization of cybersecurity plans to meet their specific needs. Its implementation can help organizations improve their cybersecurity posture and protect against cyber threats.

One of the most important recommendations of NIST 800-207 is to establish a policy engine, policy administrator, and policy enforcement point. This will help ensure consistent policy enforcement and that access is granted only to those who need it.
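
As a rough illustration of that recommendation (a sketch of the concept, not code from NIST 800-207), the split between a policy engine that decides and an enforcement point that applies the decision might look like this, with the specific fields, thresholds, and entitlements invented for the example:

```typescript
// Hypothetical sketch of a policy engine plus policy enforcement point.
// Field names, thresholds, and entitlements are invented for illustration.
interface AccessRequest {
  userId: string;
  resource: string;
  mfaVerified: boolean;      // explicit verification of the user
  deviceCompliant: boolean;  // e.g., posture reported by an EDR/MDM agent
  riskScore: number;         // 0 (low) to 100 (high), fed by continuous monitoring
}

type Decision = "allow" | "deny";

// Policy engine: makes the decision but never touches the resource itself.
function policyEngine(req: AccessRequest, entitlements: Set<string>): Decision {
  if (!req.mfaVerified || !req.deviceCompliant) return "deny"; // verify explicitly
  if (!entitlements.has(req.resource)) return "deny";          // least privilege
  if (req.riskScore > 70) return "deny";                       // assume breach: block on elevated risk
  return "allow";
}

// Policy enforcement point: sits in front of the resource and applies the decision.
function policyEnforcementPoint(req: AccessRequest): boolean {
  const entitlements = new Set(["payroll-report", "crm-dashboard"]); // per-user, illustrative
  const decision = policyEngine(req, entitlements);
  console.log(`Access to ${req.resource} for ${req.userId}: ${decision}`);
  return decision === "allow";
}
```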

Another critical recommendation is conducting continuous monitoring and having real-time risk-based decision-making capabilities. This can help you quickly identify and respond to potential threats.

Additionally, it is essential to understand and map dependencies among assets and resources. This will help you ensure your security measures are appropriately targeted based on potential vulnerabilities.

Finally, NIST recommends replacing traditional paradigms built on implicit trust in assets or entities with a “never trust, always verify” methodology. Adopting this approach can better protect your organization’s assets and resources from internal and external threats.

CISA’s Zero Trust Maturity Model

The Zero Trust Maturity Model (ZMM), developed by CISA, provides a comprehensive framework for assessing an organization’s Zero Trust posture. This model covers critical areas including:

  • Identity management: To implement a Zero Trust strategy, it is important to begin with identity. This involves continuously verifying, authenticating, and authorizing any entity before granting access to corporate resources. To achieve this, comprehensive visibility is necessary.
  • Devices, networks, applications: To maintain Zero Trust, use endpoint detection and response capabilities to detect threats and keep track of device assets, network connections, application configurations, and vulnerabilities. Continuously assess and score device security posture and implement risk-informed authentication protocols to ensure only trusted devices, networks and applications can access sensitive data and enterprise systems.
  • Data and governance: To maximize security, implement prevention, detection, and response measures for identity, devices, networks, IoT, and cloud. Monitor legacy protocols and device encryption status. Apply Data Loss Prevention and access control policies based on risk profiles.
  • Visibility and analytics: Zero Trust strategies cannot succeed within silos. By collecting data from various sources within an organization, organizations can gain a complete view of all entities and resources. This data can be analyzed through threat intelligence, generating reliable and contextualized alerts. By tracking broader incidents connected to the same root cause, organizations can make informed policy decisions and take appropriate response actions.
  • Automation and orchestration: To effectively automate security responses, it is important to have access to comprehensive data that can inform the orchestration of systems and manage permissions. This includes identifying the types of data being protected and the entities that are accessing it. By doing so, it ensures that there is proper oversight and security throughout the development process of functions, products, and services.

By thoroughly evaluating these areas, your organization can identify potential vulnerabilities in its security measures and take prompt action to improve your overall cybersecurity posture. CISA’s ZMM offers a holistic approach to security that will enable your organization to remain vigilant against potential threats.
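
One lightweight way to work with a model like the ZMM is simply to record a maturity stage per pillar and flag the weakest areas for investment. The sketch below is illustrative only; the stage names follow the four stages described in CISA's model, and the pillar list and scoring are simplified assumptions:

```typescript
// Illustrative sketch: recording Zero Trust maturity per pillar and
// surfacing the weakest areas. Pillars and scoring are simplified.
type MaturityStage = "traditional" | "initial" | "advanced" | "optimal";

const stageScore: Record<MaturityStage, number> = {
  traditional: 0,
  initial: 1,
  advanced: 2,
  optimal: 3,
};

interface ZtAssessment {
  identity: MaturityStage;
  devices: MaturityStage;
  networks: MaturityStage;
  applications: MaturityStage;
  data: MaturityStage;
}

// Return the pillars with the lowest maturity so improvement work can be prioritized.
function weakestPillars(a: ZtAssessment): string[] {
  const entries = Object.entries(a) as [string, MaturityStage][];
  const min = Math.min(...entries.map(([, s]) => stageScore[s]));
  return entries.filter(([, s]) => stageScore[s] === min).map(([p]) => p);
}

const current: ZtAssessment = {
  identity: "advanced",
  devices: "initial",
  networks: "initial",
  applications: "advanced",
  data: "traditional",
};
console.log(weakestPillars(current)); // ["data"]
```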

Implementing Zero Trust with Trend Vision One

Trend Vision One seamlessly integrates with third-party partner ecosystems and aligns to industry frameworks and best practices, including NIST and CISA, offering coverage from prevention to extended detection and response across all pillars of zero trust.

Trend Vision One is an innovative solution that empowers organizations to identify their vulnerabilities, monitor potential threats, and evaluate risks in real-time, enabling them to make informed decisions regarding access control. With its open platform approach, Trend enables seamless integration with third-party partner ecosystems, including IAM, Vulnerability Management, Firewall, BAS, and SIEM/SOAR vendors, providing a comprehensive and unified source of truth for risk assessment within your current security framework. Additionally, Trend Vision One is interoperable with SWG, CASB, and ZTNA and includes Attack Surface Management and XDR, all within a single console.

Conclusion

CISOs today understand that the journey towards achieving Zero Trust is a gradual process that requires careful planning, step-by-step implementation, and a shift in mindset towards proactive security and cyber risk management. By understanding the core principles of Zero Trust and utilizing the guidelines provided by NIST and CISA to operationalize Zero Trust with Trend Vision One, you can ensure that your organization’s cybersecurity measures are strong and can adapt to the constantly changing threat landscape.

To read more thought leadership and research about Zero Trust, click here.

Source :
https://www.trendmicro.com/en_us/research/23/h/industry-zero-trust-frameworks.html

ChatGPT Highlights a Flaw in the Educational System

By: William Malik – Trend Micro
August 14, 2023
Read time: 4 min (1014 words)

Rethinking learning metrics and fostering critical thinking in the era of generative AI and LLMs

I recently participated in a conversation about artificial intelligence, specifically ChatGPT and its kin, with a group of educators in South Africa. They were concerned that the software would help students cheat.

We discussed two possible responses to ChatGPT. First, teachers could require that students submit handwritten homework. This would force students to at least read the material once before submitting it. Second, teachers could grade the paper submissions no higher than 89 percent (or a “B”); to get an “A,” the student would have to stand in front of the class, verbally discuss the material, their research, and their conclusions, and answer any questions the teacher or other classmates might ask. (With that verbal defense of the ideas, the teacher might even waive the requirement for a paper submission at all!)

The fundamental problem is that the grading system depends on homework. If education aims to teach an individual both a) a body of knowledge and b) the techniques of reasoning with that knowledge, then the metrics meant to prove that achievement are misaligned.

One of the most quoted management scientists is Frederick W. Taylor. He is best known for saying, “If you can’t measure it, you can’t manage it.” Interestingly, he never said that – which is fortunate because it is entirely wrong. People always manage things without metrics – from driving a car to raising children. He said: “If you measure it, you’ll manage it” – and he intended that as a warning. Whenever you adopt a metric, you will adjust your assessment of the underlying process in terms of your chosen metric. His warning is to be very careful about which metrics you choose.

Sometime in the past forty years, we decided that the purpose of education is to do well on tests. Unfortunately, that is also wrong. The purpose of education is to teach people to gather evidence and to think clearly about it. Students should learn how to judge various forms of evidence. They should understand rhetorical techniques (in the classical sense – how to render ideas clearly). They should be aware of common errors in thinking – the cognitive pitfalls we all fall into when rushed or distracted and logical fallacies which rob our arguments of their validity.

Large Language Models (LLMs) aggregate vast troves of text. Those data sources are not curated, so LLMs reflect the biases, logical limitations, and cognitive distortions in so much of what’s online. We are all familiar with early chatbots that were easily corrupted – the Microsoft chatbot Tay was perverted into being a racist resonator. (See “Twitter taught Microsoft’s AI Chatbot to be a Racist A**hole in Less than a Day” from The Verge, March 24, 2016, at https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist accessed Aug 2023.)

LLMs do not think. They scan as much material as possible, then build a set of probabilities about which word is most likely to follow another word. If the word “pterodactyl” occurs in a text, then the next most likely word might be “soaring,” and “flying” might be in second place. If ChatGPT gets the word “pterodactyl” as input, it will put “soaring” next to it. This may look plausible to a person reading the output, but it cannot be correct. Correctness implies some kind of comprehension and judgment. ChatGPT does neither. It merely arranges words based on their statistical likelihood in the LLM’s database. We are now learning that LLMs that ingest computer-generated content become even more skewed – augmenting the likelihood of one word following another by rescanning the previous output. Over time, LLMs fed AI-generated content will drift farther and farther from actual human writing. The oft-mentioned hallucinations that LLMs generate will become more common as the distillation and amplification of the more likely subset of words leads to a contracted pool of possible machine-generated responses. Eventually – if we are not able to prevent LLMs from ingesting already-processed content – the output of ChatGPT will become more and more constrained, which, taken to the extreme, will yield one plot, one answer, one painting, and one outcome regardless of the specific input. Long before then, people will have abandoned LLM-based efforts for any activity that requires creativity.
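
To make that concrete, here is a deliberately toy sketch of the word-probability idea. Real models condition on long contexts and use learned weights rather than a hand-written table, but the core operation is still picking a statistically likely next token:

```typescript
// Toy illustration: choosing the next word purely from relative frequencies.
// Real models use learned probabilities over long contexts, not a lookup table.
const nextWordTable: Record<string, Record<string, number>> = {
  pterodactyl: { soaring: 0.6, flying: 0.3, fossil: 0.1 },
};

function mostLikelyNextWord(word: string): string | undefined {
  const candidates = nextWordTable[word];
  if (!candidates) return undefined;
  // Pick the candidate with the highest probability; no comprehension involved.
  return Object.entries(candidates).sort((a, b) => b[1] - a[1])[0][0];
}

console.log(mostLikelyNextWord("pterodactyl")); // "soaring"
```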

Where can LLMs help? By sorting through bounded sets of information. That means an LLM trained on protein sequences could rapidly develop a most likely model for a protein that could attack a particular disease or interrupt an allergic reaction. In that case, the issue isn’t seeking creativity but rapidly scanning a set of nearly identical data points to find the few that stand out enough to make a difference. A human doing this kind of work would quickly grow bored and likely make errors. LLMs can help science move quickly through vast quantities of data in closed domains. But when looking at an unbounded domain (art, poetry, fiction, movies, music, and the like), LLMs can only build average content, filling in the space between works. Artists seek to reach beyond the space their prior work defined.

The core problem with LLMs may be unsolvable. At this point, various organizations are exploring ways to tag AI-generated content (written and graphic) so humans can spend a moment assessing the accuracy and validity of the material. Of course, message digests can be corrupted and watermarks forged. A bad actor might maliciously tag authentic content as AI-generated. Recent developments include malicious ChatGPT variants designed to create BEC and phishing email content.

Students will always look for a shortcut, and that habit is difficult to overcome. In business, it will also be tempting for bureaucrats to use tools to simplify their tasks. How will your firm incorporate LLMs safely into your business processes? Organizations should consider how they will audit their internal procedures to ensure that LLM outputs are incorporated appropriately into communications. Imagine the potential for harm if some publicly traded company was found to have used an LLM to develop its annual financial report!

What do you think? Let me know in the comments below, or contact me @wjmalik@noc.social

Source :
https://www.trendmicro.com/en_us/research/23/h/chatgpt-flaw.html

OT Security is Less Mature but Progressing Rapidly

By: Kazuhisa Tagaya – Trend Micro
August 14, 2023
Read time: 2 min (638 words)

The latest study found that OT security is less mature than IT security in several capabilities, but most organizations are improving it.

We asked participants whether, with reference to the NIST CSF, their organization’s OT security capabilities are less mature or more mature than IT security.

As an average of all items, 39.5% answered that OT has a lower level of maturity (18% answered that OT security is more mature, and 36.4% that it is at the same level).

Categorizing security capabilities into the five core functions of the NIST CSF and aggregating the answers for each, Detect was the function most often rated as less mature in OT security than in IT (42%).

Figure 1: What security capabilities in OT are lower than IT (NIST CSF 5 Core)

Furthermore, looking at the specific security capabilities, “Cyber event detection” had the highest score (45.7%).

Figure 2: What security capabilities in OT are lower than IT (detail)

The OT environment has more diverse legacy assets and protocol stacks dedicated to ICS/OT, making it difficult to deploy sensors that detect malicious behavior or to apply patches to those assets. The inability to implement uniform measures in the same way as in IT security is an obstacle to raising the maturity level.

Detection in OT: Endpoint and Network

The survey asked respondents about their Endpoint Detection and Response (EDR) and Network Security Monitoring (NSM) implementations to measure their visibility in their OT environments. They answered whether EDR (including antivirus) was implemented in the following three places.

  • Server assets running commercial OS (Windows, Linux, Unix): 41%
  • Engineering (engineering workstations, instrumentation laptops, calibration and test equipment) assets running commercial OS (Windows, Unix, Linux): 34%
  • Operator assets (HMI, workstations) running commercial OS (Windows, Linux, Unix): 33% 

In addition, 76% of organizations that have already deployed EDR said they plan to expand their deployment within 24 months.

Figure 3: EDR deployment

We also asked whether NSM (including IDS) was implemented at the following levels referring to the Purdue model.

  • Purdue Level 4 (Enterprise): 30%
  • Purdue Level 3.5 (DMZ): 36%
  • Purdue Level 3 (Site or SCADA-wide): 38%
  • Purdue Level 2 (Control): 20%
  • Purdue Levels 1/0 (Sensors and Actuators): 8%

As with EDR, 70% of organizations that have already implemented NSM said they have plans to expand implementation within 24 months.

Figure 4: NSM deployment

In this survey, EDR implementation rates tended to vary depending on the respondent’s industry and organization size. The implementation rate of NSM was relatively high in the DMZ and at Level 3, and decreased in the lower layers. But I do not think it is appropriate to draw a decisive trend from the average values alone, because where EDR and NSM are implemented varies by organization. The implementation rates shown here are just a rough benchmark. Where and how much to invest depends on the environment and decision-making of each organization. Asset owners can use the results as a reference for where to implement EDR and NSM and to evaluate their implementation plans.

To learn about how to assess risk in your OT environment to invest appropriately, please refer to our practices of risk assessment in smart factories.

Reference:
Breaking IT/OT Silos with ICS/OT Visibility – 2023 SANS ICS/OT visibility survey

Source :
https://www.trendmicro.com/en_us/research/23/h/ot-security-2023.html

Top 10 AI Security Risks According to OWASP

By: Trend Micro
August 15, 2023
Read time: 4 min (1157 words)

The unveiling of the first-ever Open Worldwide Application Security Project (OWASP) risk list for large language model AI chatbots was yet another sign of generative AI’s rush into the mainstream—and a crucial step toward protecting enterprises from AI-related threats.

For more than 20 years, the Open Worldwide Application Security Project (OWASP) top 10 risk list has been a go-to reference in the fight to make software more secure. So it’s no surprise developers and cybersecurity professionals paid close attention earlier this spring when OWASP published an all-new list focused on large language model AI vulnerabilities.

OWASP’s move is yet more proof of how quickly AI chatbots have swept into the mainstream. Nearly half (48%) of corporate respondents to one survey said that by February 2023 they had already replaced workers with ChatGPT—just three months after its public launch. With many observers expressing concern that AI adoption has rushed ahead without understanding of the risks involved, the OWASP top 10 AI risk list is both timely and essential.

Large language model vulnerabilities at a glance

OWASP has released two draft versions of its AI vulnerability list so far: one in May 2023 and a July 1 update with refined classifications and definitions, examples, scenarios, and links to additional references. The most recent is labeled ‘version 0.5’, and a formal version 1 is reported to be in the works.

We did some analysis and found the vulnerabilities identified by OWASP fall broadly into three categories:

  1. Access risks associated with exploited privileges and unauthorized actions.
  2. Data risks such as data manipulation or loss of services.
  3. Reputational and business risks resulting from bad AI outputs or actions.

In this blog, we take a closer look at the specific risks in each case and offer some suggestions about how to handle them.

1. Access risks

Of the 10 vulnerabilities listed by OWASP, four are specific to access and misuse of privileges: insecure plugins, insecure output handling, permissions issues, and excessive agency.

According to OWASP, any large language model that uses insecure plugins to receive “free-form text” inputs could be exposed to malicious requests, resulting in unwanted behaviors or the execution of unauthorized remote code. On the flipside, plugins or applications that handle large language model outputs insecurely—without evaluating them—could be susceptible to cross-site and server-side request forgeries, unauthorized privilege escalations, hijack attacks, and more.
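
A practical takeaway is to treat model output the same way as any other untrusted input. The sketch below (an illustration, not code from OWASP) HTML-escapes a chatbot reply before rendering it, so a response containing markup or script cannot execute in the user's browser:

```typescript
// Hypothetical sketch: treat LLM output as untrusted before rendering it in a page.
function escapeHtml(untrusted: string): string {
  return untrusted
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderChatbotReply(rawModelOutput: string): string {
  // Never pass model output to eval(), shell commands, or SQL directly;
  // escape it for the context it is used in (HTML here).
  return `<p class="bot-reply">${escapeHtml(rawModelOutput)}</p>`;
}

console.log(renderChatbotReply(`<script>alert("xss")</script>`));
// <p class="bot-reply">&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```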

Similarly, when authorizations aren’t tracked between plugins, permissions issues can arise that open the way for indirect prompt injections or malicious plugin usage.

Finally, because AI chatbots are ‘actors’ able to make and implement decisions, it matters how much free rein (i.e., agency) they’re given. As OWASP explains, “When LLMs interface with other systems, unrestricted agency may lead to undesirable operations and actions.” Examples include personal mail reader assistants being exploited to propagate spam or customer service AI chatbots manipulated into issuing undeserved refunds.

In all of these cases, the large language model becomes a conduit for bad actors to infiltrate systems.

2. Data risks

Poisoned training data, supply chain vulnerabilities, prompt injection vulnerabilities, and denial of service are all data-specific AI risks.

Data can be poisoned deliberately by bad actors who want to harm an organization. It can also be distorted inadvertently when an AI system learns from unreliable or unvetted sources. Both types of poisoning can occur within an active AI chatbot application or emerge from the large language model supply chain, where reliance on pre-trained models, crowdsourced data, and insecure plugin extensions may produce biased data outputs, security breaches, or system failures.

With prompt injections, ill-meaning inputs may cause a large language model AI chatbot to expose data that should be kept private or perform other actions that lead to data compromises.

AI denial of service attacks are similar to classic DOS attacks. They may aim to overwhelm a large language model and deprive users of access to data and apps, or—because many AI chatbots rely on pay-as-you-go IT infrastructure—force the system to consume excessive resources and rack up massive costs.
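
A basic mitigation for that last category is to cap how much work any single caller can ask the model to do. The following sketch is illustrative only: it enforces a per-user request and token budget before a prompt is forwarded to the model, with the limits chosen arbitrarily.

```typescript
// Illustrative sketch: per-user quota check in front of an LLM endpoint.
interface Quota {
  requestsThisMinute: number;
  tokensToday: number;
}

const MAX_REQUESTS_PER_MINUTE = 20;
const MAX_TOKENS_PER_DAY = 100_000;

const usage = new Map<string, Quota>();

function allowPrompt(userId: string, estimatedTokens: number): boolean {
  const q = usage.get(userId) ?? { requestsThisMinute: 0, tokensToday: 0 };
  if (q.requestsThisMinute >= MAX_REQUESTS_PER_MINUTE) return false;
  if (q.tokensToday + estimatedTokens > MAX_TOKENS_PER_DAY) return false;
  q.requestsThisMinute++;
  q.tokensToday += estimatedTokens;
  usage.set(userId, q);
  return true; // counters would be reset by a scheduled job in a real system
}

if (!allowPrompt("user-42", 1_500)) {
  console.log("Request rejected: quota exceeded");
}
```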

3. Reputational and business risks

The final OWASP vulnerability (according to our buckets) is already reaping consequences around the world today: overreliance on AI. There’s no shortage of stories about large language models generating false or inappropriate outputs, from fabricated citations and legal precedents to racist and sexist language.

OWASP points out that depending on AI chatbots without proper oversight can make organizations vulnerable to publishing misinformation or offensive content that results in reputational damage or even legal action.

Given all these various risks, the question becomes, “What can we do about it?” Fortunately, there are some protective steps organizations can take.

What enterprises can do about large language model vulnerabilities

From our perspective at Trend Micro, defending against AI access risks requires a zero-trust security stance with disciplined separation of systems (sandboxing). Even though generative AI has the ability to challenge zero-trust defenses in ways that other IT systems don’t—because it can mimic trusted entities—a zero-trust posture still adds checks and balances that make it easier to identify and contain unwanted activity. OWASP also advises that large language models “should not self-police” and calls for controls to be embedded in application programming interfaces (APIs).

Sandboxing is also key to protecting data privacy and integrity: keeping confidential information fully separated from shareable data and making it inaccessible to AI chatbots and other public-facing systems. (See our recent blog on AI cybersecurity policies for more.)

Good separation of data prevents large language models from including private or personally identifiable information in public outputs, and from being publicly prompted to interact with secure applications such as payment systems in inappropriate ways.
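
In practice, that separation can be backed by a filtering layer between internal data and the chatbot. The sketch below is a simplified, assumption-laden example that strips obvious identifiers (email addresses and US-style SSNs) from text before it is included in a prompt; a production system would use a proper DLP service rather than a pair of regexes.

```typescript
// Simplified sketch: redact obvious personal identifiers before text reaches an LLM prompt.
// Real deployments should rely on a DLP/classification service, not ad-hoc regexes.
function redactPii(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[REDACTED_EMAIL]")
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[REDACTED_SSN]");
}

const internalNote = "Refund approved for jane.doe@example.com, SSN 123-45-6789.";
const promptSafe = redactPii(internalNote);
console.log(promptSafe);
// "Refund approved for [REDACTED_EMAIL], SSN [REDACTED_SSN]."
```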

On the reputational front, the simplest remedies are to not rely solely on AI-generated content or code, and to never publish or use AI outputs without first verifying they are true, accurate, and reliable.

Many of these defensive measures can—and should—be embedded in corporate policies. Once an appropriate policy foundation is in place, security technologies such as endpoint detection and response (EDR), extended detection and response (XDR), and security information and event management (SIEM) can be used for enforcement and to monitor for potentially harmful activity.

Large language model AI chatbots are here to stay

OWASP’s initial work cataloguing AI risks proves that concerns about the rush to embrace AI are well justified. At the same time, AI clearly isn’t going anywhere, so understanding the risks and taking responsible steps to mitigate them is critically important.

Setting up the right policies to manage AI use and implementing those policies with the help of cybersecurity solutions is a good first step. So is staying informed. The way we see it at Trend Micro, OWASP’s top 10 AI risk list is bound to become as much of an annual must-read as its original application security list has been since 2003.

Next steps

For more Trend Micro thought leadership on AI chatbot security, check out these resources:

Source :
https://www.trendmicro.com/en_us/research/23/h/top-ai-risks.html

8 Essential Tips for Data Protection and Cybersecurity in Small Businesses

Michelle Quill — June 6, 2023

Small businesses are often targeted by cybercriminals due to their lack of resources and security measures. Protecting your business from cyber threats is crucial to avoid data breaches and financial losses.

Why is cyber security so important for small businesses?

Small businesses are particularly vulnerable to cyberattacks, which can result in financial loss, data breaches, and damage to IT equipment. To protect your business, it’s important to implement strong cybersecurity measures.

Here are some tips to help you get started:

One important aspect of data protection and cybersecurity for small businesses is controlling access to customer lists. It’s important to limit access to this sensitive information to only those employees who need it to perform their job duties. Additionally, implementing strong password policies and regularly updating software and security measures can help prevent unauthorized access and protect against cyber attacks. Regular employee training on cybersecurity best practices can also help ensure that everyone in the organization is aware of potential threats and knows how to respond in the event of a breach.

When it comes to protecting customer credit card information in small businesses, there are a few key tips to keep in mind. First and foremost, it’s important to use secure payment processing systems that encrypt sensitive data. Additionally, it’s crucial to regularly update software and security measures to stay ahead of potential threats. Employee training and education on cybersecurity best practices can also go a long way in preventing data breaches. Finally, having a plan in place for responding to a breach can help minimize the damage and protect both your business and your customers.

Small businesses are often exposed to cyber attacks, making data protection and cybersecurity crucial. One area of particular concern is your company’s banking details. To protect this sensitive information, consider implementing strong passwords, two-factor authentication, and regular monitoring of your accounts. Additionally, educate your employees on safe online practices and limit access to financial information to only those who need it. Regularly backing up your data and investing in cybersecurity software can also help prevent data breaches.

Small businesses are often at high risk of cyber attacks due to their limited resources and lack of expertise in cybersecurity. To protect sensitive data, it is important to implement strong passwords, regularly update software and antivirus programs, and limit access to confidential information.

It is also important to have a plan in place in case of a security breach, including steps to contain the breach and notify affected parties. By taking these steps, small businesses can better protect themselves from cyber threats and ensure the safety of their data.

Tips for protecting your small business from cyber threats and data breaches are crucial in today’s digital age. One of the most important steps is to educate your employees on cybersecurity best practices, such as using strong passwords and avoiding suspicious emails or links.

It’s also important to regularly update your software and systems to ensure they are secure and protected against the latest threats. Additionally, implementing multi-factor authentication and encrypting sensitive data can add an extra layer of protection. Finally, having a plan in place for responding to a cyber-attack or data breach can help minimize the damage and get your business back on track as quickly as possible.

Small businesses are susceptible to cyber-attacks and data breaches, which can have devastating consequences. To protect your business, it’s important to implement strong cybersecurity measures. This includes using strong passwords, regularly updating software and systems, and training employees on how to identify and avoid phishing scams.

It’s also important to have a data backup plan in place and to regularly test your security measures to ensure they are effective. By taking these steps, you can help protect your business from cyber threats and safeguard your valuable data.

To protect against cyber threats, it’s important to implement strong data protection and cybersecurity measures. This can include regularly updating software and passwords, using firewalls and antivirus software, and providing employee training on safe online practices. Additionally, it’s important to have a plan in place for responding to a cyber attack, including backing up data and having a designated point person for handling the situation.

In today’s digital age, small businesses must prioritize data protection and cybersecurity to safeguard their operations and reputation. With the rise of remote work and cloud-based technology, businesses are more vulnerable to cyber attacks than ever before. To mitigate these risks, it’s crucial to implement strong security measures for online meetings, advertising, transactions, and communication with customers and suppliers. By prioritizing cybersecurity, small businesses can protect their data and prevent unauthorized access or breaches.

Here are 8 essential tips for data protection and cybersecurity in small businesses.

8 Essential Tips for Data Protection and Cybersecurity in Small Businesses

1. Train Your Employees on Cybersecurity Best Practices

Your employees are the first line of defense against cyber threats. It’s important to train them on cybersecurity best practices to ensure they understand the risks and how to prevent them. This includes creating strong passwords, avoiding suspicious emails and links, and regularly updating software and security systems. Consider providing regular training sessions and resources to keep your employees informed and prepared.

2. Use Strong Passwords and Two-Factor Authentication

One of the most basic yet effective ways to protect your business from cyber threats is to use strong passwords and two-factor authentication. Encourage your employees to use complex passwords that include a mix of letters, numbers, and symbols, and to avoid using the same password for multiple accounts. Two-factor authentication adds an extra layer of security by requiring a second form of verification, such as a code sent to a mobile device, before granting access to an account. This can help prevent unauthorized access even if a password is compromised.
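
If you want to enforce that guidance in an internal tool, a minimal password-policy check might look like the sketch below; the exact rules (length and character classes) are illustrative, and two-factor authentication itself should come from your identity provider rather than home-grown code.

```typescript
// Illustrative password-policy check: length plus a mix of character classes.
function meetsPasswordPolicy(password: string): boolean {
  const longEnough = password.length >= 12;
  const hasLower = /[a-z]/.test(password);
  const hasUpper = /[A-Z]/.test(password);
  const hasDigit = /[0-9]/.test(password);
  const hasSymbol = /[^A-Za-z0-9]/.test(password);
  return longEnough && hasLower && hasUpper && hasDigit && hasSymbol;
}

console.log(meetsPasswordPolicy("Sunflower-Battery-42")); // true
console.log(meetsPasswordPolicy("password123"));          // false (too short, no upper case or symbol)
```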

3. Keep Your Software and Systems Up to Date

One of the easiest ways for cybercriminals to gain access to your business’s data is through outdated software and systems. Hackers are constantly looking for vulnerabilities in software and operating systems, and if they find one, they can exploit it to gain access to your data. To prevent this, make sure all software and systems are kept up-to-date with the latest security patches and updates. This includes not only your computers and servers but also any mobile devices and other connected devices used in your business. Set up automatic updates whenever possible to ensure that you don’t miss any critical security updates.

4. Use Antivirus and Anti-Malware Software

Antivirus and anti-malware software are essential tools for protecting your small business from cyber threats. These programs can detect and remove malicious software, such as viruses, spyware, and ransomware before they can cause damage to your systems or steal your data. Make sure to install reputable antivirus and anti-malware software on all devices used in your business, including computers, servers, and mobile devices. Keep the software up-to-date and run regular scans to ensure that your systems are free from malware.

5. Backup Your Data Regularly

One of the most important steps you can take to protect your small business from data loss is to back up your data regularly. This means creating copies of your important files and storing them in a secure location, such as an external hard drive or cloud storage service. In the event of a cyber-attack or other disaster, having a backup of your data can help you quickly recover and minimize the impact on your business. Make sure to test your backups regularly to ensure that they are working properly and that you can restore your data if needed.
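
As a concrete, deliberately simple illustration of the habit, the Node.js sketch below copies a folder of business files into a date-stamped backup directory. The paths are placeholders, and a real plan should also include off-site or cloud copies and regular restore tests.

```typescript
// Simple illustrative backup: copy a folder into a date-stamped directory.
// Paths are placeholders; a real plan also needs off-site copies and restore tests.
import * as fs from "node:fs";
import * as path from "node:path";

function backupFolder(sourceDir: string, backupRoot: string): string {
  const stamp = new Date().toISOString().slice(0, 10); // e.g. "2023-08-15"
  const destination = path.join(backupRoot, `backup-${stamp}`);
  fs.cpSync(sourceDir, destination, { recursive: true }); // requires Node.js 16.7+
  return destination;
}

const saved = backupFolder("./business-files", "/mnt/backup-drive");
console.log(`Backup written to ${saved}`);
```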

6. Carry out a risk assessment

Small businesses are especially at risk of cyber attacks, making it crucial to prioritize data protection and cybersecurity. One important step is to assess potential risks that could compromise your company’s networks, systems, and information. By identifying and analyzing possible threats, you can develop a plan to address security gaps and protect your business from harm.

For small businesses, data protection and cybersecurity are a crucial part of risk management. To start, conduct a thorough risk assessment to identify where and how your data is stored, who has access to it, and the potential threats. If you use cloud storage, consult with your provider to assess risks. Determine the potential impact of breaches and establish risk levels for different events. By taking these steps, you can better protect your business from cyber threats.

7. Limit access to sensitive data

One effective strategy is to limit access to critical data to only those who need it. This reduces the risk of a data breach and makes it harder for malicious insiders to gain unauthorized access. To ensure accountability and clarity, create a plan that outlines who has access to what information and what their roles and responsibilities are. By taking these steps, you can help safeguard your business against cyber threats.

8. Use a firewall

For small businesses, it’s important to protect systems from cyber attacks by strengthening data protection and reducing cybersecurity risk. One effective measure is implementing a firewall, which helps protect both hardware and software. By blocking or deterring malicious traffic from entering the network, a firewall provides an added layer of security. It’s important to note that a firewall differs from an antivirus, which targets software affected by a virus that has already infiltrated the system.

Small businesses can take steps to protect their data and ensure cybersecurity. One important step is to install a firewall and keep it updated with the latest software or firmware. Regularly checking for updates can help prevent potential security breaches.

Conclusion

Small businesses are particularly vulnerable to cyber attacks, so it’s important to take steps to protect your data. One key tip is to be cautious when granting access to your systems, especially to partners or suppliers. Before granting access, make sure they have similar cybersecurity practices in place. Don’t hesitate to ask for proof or to conduct a security audit to ensure your data is safe.

Source :
https://onlinecomputertips.com/support-categories/networking/tips-for-cybersecurity-in-small-businesses/

Export Your Google Sites Website to Your Computer

Preston Mason — January 12, 2023

Everyone who uses a computer knows that Google has a service or an app for just about anything you can think of. And for those who want an easy way to create websites, they also have their free online website creation tool called Google Sites, which doesn’t require any coding skills or HTML knowledge to use. But if you want to back up your website, you will notice that there is no option to do so. In this article, we will show you how to export your Google Sites website to your computer.

If you are using traditional website development tools or software, you would generally be working on your individual HTML files and also be backing them up as needed. But with Google Sites, you are working within the Sites interface online and your website is stored in your Google Drive account.

The Backup Process

When you go to your Google Drive account and find your website, you will see that it appears as a single file just like a document or spreadsheet would appear. Then if you right click on it, there is no download option like you would see for other files.

Google Drive right click options

To back up your Sites website, you will first need to make a copy of it within your Google Drive account and move it to a different folder. Within Drive, click on the New button and then choose New folder. Name this folder something like Website Backup or whatever you would like.

Next, go to your website file in Drive, right click it and then choose the Make a copy option. This will make a copy of your website in the same location and will add “Copy of” in front of the existing name.

Google Drive file right click options make a copy

You can then drag the copy to your new folder or right click on the copy and choose the Move to option and then choose your new folder. You can also move your original website file to this new folder if you don’t want to make the copy for the backup.

Using Google Takeout

After this copy is created, you can go to the Google Takeout website and just make sure you are logged in with the same Google account that you use for your website. Then the first thing you want to do is click on Deselect all since all of the checkboxes will be selected by default.

Google Takeout website

Then scroll down the list and look for Drive (not Classic Sites) and check the box and then click on All Drive data included.

Google Takeout website

Next you will uncheck the box that says Include all files and folders in Drive and check only the box next to the folder name that contains your website file and click OK.


Then you will need to scroll to the bottom of the page and click on the button that says Next step. Now you will be able to configure the export to either send a download link to your email address or add your backup to your Drive, OneDrive, Dropbox or Box account. I prefer the email option.

You can also set up this backup as a one-time export or have it be exported every 2 months for the next year. For the export file type you can choose between zip and tgz file formats. If you have a really large website then the export will be split into 2GB files, but your website is most likely much smaller than 2GB.

Once you have everything set correctly, you can click on the Create export button.


You will then be shown an export progress screen telling you that you will be sent an email when the export is complete. Even though it says it can possibly take days to complete, it’s usually fairly quick and might only take a few minutes.


Once you receive the email, you can click on the Download your files button or the Manage exports button to be taken back to the Google Takeout website to download your files.

Google Takeout Manage exports

After you download the zip file, you can then extract it and navigate to the location of your website files.


Source :
https://onlinecomputertips.com/support-categories/software/export-your-google-sites-website-to-your-computer/

Part 2: Rethinking cache purge with a new architecture

21/06/2023

In Part 1: Rethinking Cache Purge, Fast and Scalable Global Cache Invalidation, we outlined the importance of cache invalidation and the difficulties of purging caches, how our existing purge system was designed and performed, and we gave a high level overview of what we wanted our new Cache Purge system to look like.

It’s been a while since we published the first blog post and it’s time for an update on what we’ve been working on. In this post we’ll be talking about some of the architecture improvements we’ve made so far and what we’re working on now.

Cache Purge end to end

We touched on the high level design of what we called the “coreless” purge system in part 1, but let’s dive deeper into what that design encompasses by following a purge request from end to end:

Step 1: Request received locally

An API request to Cloudflare is routed to the nearest Cloudflare data center and passed to an API Gateway worker. This worker looks at the request URL to see which service it should be sent to and forwards the request to the appropriate upstream backend. Most endpoints of the Cloudflare API are currently handled by centralized services, so the API Gateway worker is often just proxying requests to the nearest “core” data center, which has its own gateway services to handle authentication, authorization, and further routing. But for endpoints which aren’t handled centrally, the API Gateway worker must handle authentication and route authorization, and then proxy to an appropriate upstream. For cache purge requests that upstream is a Purge Ingest worker in the same data center.
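
Cloudflare hasn't published the gateway code itself, but conceptually the routing step can be pictured as a small Workers fetch handler like the hypothetical sketch below: look at the path, pick an upstream, and proxy the request. The upstream hostnames are invented, and authentication and authorization are omitted for brevity.

```typescript
// Hypothetical sketch of path-based routing in an API gateway Worker.
// Upstream hostnames are invented for illustration; auth/authz checks are omitted.
export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);

    // Purge requests are handled locally by a purge ingest service;
    // most other endpoints are proxied to a centralized core API.
    const upstreamHost = url.pathname.includes("/purge_cache")
      ? "purge-ingest.internal.example"
      : "core-api.internal.example";

    const upstreamUrl = new URL(url.pathname + url.search, `https://${upstreamHost}`);
    return fetch(new Request(upstreamUrl.toString(), request));
  },
};
```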

Step 2: Purges tested locally

The Purge Ingest worker evaluates the purge request to make sure it is processable. It scans the URLs in the body of the request to see if they’re valid, then attempts to purge the URLs from the local data center’s cache. This concept of local purging was a new step introduced with the coreless purge system, allowing us to capitalize on existing logic already used in every data center.

By leveraging the same ownership checks our data centers use to serve a zone’s normal traffic on the URLs being purged, we can determine if those URLs are even cacheable by the zone. Currently more than 50% of the URLs we’re asked to purge can’t be cached by the requesting zones, either because they don’t own the URLs (e.g. a customer asking us to purge https://cloudflare.com) or because the zone’s settings for the URL prevent caching (e.g. the zone has a “bypass” cache rule that matches the URL). All such purges are superfluous and shouldn’t be processed further, so we filter them out and avoid broadcasting them to other data centers, freeing up resources to process more legitimate purges.

On top of that, generating the cache key for a file isn’t free; we need to load zone configuration options that might affect the cache key, apply various transformations, et cetera. The cache key for a given file is the same in every data center though, so when we purge the file locally we now return the generated cache key to the Purge Ingest worker and broadcast that key to other data centers instead of making each data center generate it themselves.
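
Putting Step 2 together, a simplified sketch of the ingest flow might look like the following; every name and helper here is hypothetical, and the real checks are far more involved.

```ts
// Simplified sketch of the Purge Ingest flow (hypothetical names and helpers).
export interface Env {
  PURGE_QUEUE: Fetcher; // a Purge Queue worker chosen from the regional pool
}

interface PurgeRequestBody {
  files: string[];
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const body = (await request.json()) as PurgeRequestBody;

    // 1. Drop URLs the requesting zone can't cache anyway (not owned by the
    //    zone, "bypass" cache rules, malformed URLs, ...).
    const cacheable = body.files.filter((u) => isValidUrl(u) && zoneCanCache(u));
    if (cacheable.length === 0) {
      return Response.json({ purged: 0 });
    }

    // 2. Purge locally first; the local cache hands back the computed cache key
    //    so other data centers don't have to regenerate it.
    const keys = await Promise.all(cacheable.map(purgeLocallyAndGetCacheKey));

    // 3. Forward the keys to a Purge Queue worker for global broadcast.
    await env.PURGE_QUEUE.fetch("https://purge-queue.internal/enqueue", {
      method: "POST",
      body: JSON.stringify({ keys }),
    });

    return Response.json({ purged: keys.length });
  },
};

// Placeholder checks; the real ones consult zone ownership and configuration.
function isValidUrl(url: string): boolean {
  try { new URL(url); return true; } catch { return false; }
}
function zoneCanCache(_url: string): boolean { return true; }
async function purgeLocallyAndGetCacheKey(url: string): Promise<string> {
  return `cache-key:${url}`; // hypothetical call into the local cache
}
```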

Step 3: Purges queued for broadcasting

[Diagram: a purge request reaches a small data center; the Purge Ingest worker sends it to a Purge Queue worker in a T1 data center]

Once the local purge is done, the Purge Ingest worker forwards the purge request with the cache key obtained from the local cache to a Purge Queue worker. The queue worker is a Durable Object worker using its persistent state to hold a queue of purges it receives and pointers to how far along in the queue each data center in our network is in processing purges.

The queue is important because it allows us to automatically recover from a number of scenarios such as connectivity issues or data centers coming back online after maintenance. Having a record of all purges since an issue arose lets us replay those purges to a data center and “catch up”.

But Durable Objects are globally unique, so having one manage all global purges would have just moved our centrality problem from a core data center to wherever that Durable Object was provisioned. Instead we have dozens of Durable Objects in each region, and the Purge Ingest worker looks at the load balancing pool of Durable Objects for its region and picks one (often in the same data center) to forward the request to. The Durable Object will write the purge request to its queue and immediately loop through all the data center pointers and attempt to push any outstanding purges to each.
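
As a rough illustration of that queue-plus-pointers design (a sketch of the idea, not Cloudflare's internal implementation), a Purge Queue Durable Object could look something like this:

```ts
// Illustrative Durable Object: a purge queue plus a pointer per data center.
export class PurgeQueue {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const { keys } = (await request.json()) as { keys: string[] };

    // Persist the purge at the tail of the queue.
    const queue = (await this.state.storage.get<string[][]>("queue")) ?? [];
    queue.push(keys);
    await this.state.storage.put("queue", queue);

    // Each data center has a pointer into the queue; push anything outstanding,
    // which is also how a data center "catches up" after an outage.
    const pointers =
      (await this.state.storage.get<Record<string, number>>("pointers")) ?? {};
    for (const [dc, pointer] of Object.entries(pointers)) {
      const outstanding = queue.slice(pointer);
      if (outstanding.length > 0 && (await pushToDataCenter(dc, outstanding))) {
        pointers[dc] = queue.length; // fully caught up
      }
    }
    await this.state.storage.put("pointers", pointers);

    return new Response("queued");
  }
}

// Hypothetical transport to a data center's cache service.
async function pushToDataCenter(dc: string, purges: string[][]): Promise<boolean> {
  const res = await fetch(`https://${dc}.example.internal/purge`, {
    method: "POST",
    body: JSON.stringify(purges),
  });
  return res.ok;
}
```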

While benchmarking our performance we found our particular workload exhibited a “goldilocks zone” of throughput to a given Durable Object. On script startup we have to load all sorts of data like network topology and data center health–then refresh it continuously in the background–and as long as the Durable Object sees steady traffic it stays active and we amortize those startup costs. But if you ask a single Durable Object to do too much at once like send or receive too many requests, the single-threaded runtime won’t keep up. Regional purge traffic fluctuates a lot depending on local time of day, so there wasn’t a static quantity of Durable Objects per region that would let us stay within the goldilocks zone of enough requests to each to keep them active but not too many to keep them efficient. So we built load monitoring into our Durable Objects, and a Regional Autoscaler worker to aggregate that data and adjust load balancing pools when we start approaching the upper or lower edges of our efficiency goldilocks zone.

Step 4: Purges broadcast globally

[Diagram: across multiple regions, the Durable Object sends purges to fanout workers in other regions, and each fanout worker sends them to the small data centers in its region]

Once a purge request is queued by a Purge Queue worker it needs to be broadcast to the rest of Cloudflare’s data centers to be carried out by their caches. The Durable Objects will broadcast purges directly to all data centers in their region, but when broadcasting to other regions they pick a Purge Fanout worker per region to take care of their region’s distribution. The fanout workers manage queues of their own as well as pointers for all of their region’s data centers, and in fact they share a lot of the same logic as the Purge Queue workers in order to do so. One key difference is fanout workers aren’t Durable Objects; they’re normal worker scripts, and their queues are purely in memory as opposed to being backed by Durable Object state. This means not all queue worker Durable Objects are talking to the same fanout worker in each region. Fanout workers can be dropped and spun up again quickly by any metal in the data center because they aren’t canonical sources of state. They maintain queues and pointers for their region but all of that info is also sent back downstream to the Durable Objects who persist that data themselves, reliably.

But what does the fanout worker get us? Cloudflare has hundreds of data centers all over the world, and as we mentioned above we benefit from keeping the number of incoming and outgoing requests for a Durable Object fairly low. Sending purges to a fanout worker per region means each Durable Object only has to make a fraction of the requests it would if it were broadcasting to every data center directly, which means it can process purges faster.
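
Here is a sketch of that target-selection logic; the region names and topology are made up for illustration.

```ts
// Sketch of broadcast-target selection; region names and topology are made up.
interface Topology {
  regions: Record<string, string[]>; // region name -> data center IDs
}

function broadcastTargets(topology: Topology, homeRegion: string): string[] {
  const targets: string[] = [];
  for (const [region, dataCenters] of Object.entries(topology.regions)) {
    if (region === homeRegion) {
      // In-region data centers are contacted directly.
      targets.push(...dataCenters);
    } else {
      // Every other region gets a single fanout worker instead of one
      // request per data center.
      targets.push(`fanout:${region}`);
    }
  }
  return targets;
}
```

With, say, eight regions of roughly 40 data centers each (purely illustrative numbers), a single purge drops from about 320 outgoing requests per Durable Object to about 47: the ~40 in-region data centers plus seven fanout workers.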

On top of that, occasionally a request will fail to get where it was going and require retransmission. When this happens between data centers in the same region it’s largely unnoticeable, but when a Durable Object in Canada has to retry a request to a data center in rural South Africa the cost of traversing that whole distance again is steep. The data centers elected to host fanout workers have the most reliable connections in their regions to the rest of our network. This minimizes the chance of inter-regional retries and limits the latency imposed by retries to regional timescales.

The introduction of the Purge Fanout worker was a massive improvement to our distribution system, reducing our end-to-end purge latency by 50% on its own and increasing our throughput threefold.

Current status of coreless purge

We are proud to say our new purge system has been in production serving purge by URL requests since July 2022, and the results in terms of latency improvements are dramatic. In addition, flexible purge requests (purge by tag/prefix/host and purge everything) share and benefit from the new coreless purge system’s entrypoint workers before heading to a core data center for fulfillment.

The reason flexible purge isn’t also fully coreless yet is because it’s a more complex task than “purge this object”; flexible purge requests can end up purging multiple objects–or even entire zones–from cache. They do this through an entirely different process that isn’t coreless compatible, so to make flexible purge fully coreless we would have needed to come up with an entirely new multi-purge mechanism on top of redesigning distribution. We chose instead to start with just purge by URL so we could focus purely on the most impactful improvements, revamping distribution, without reworking the logic a data center uses to actually remove an object from cache.

This is not to say that the flexible purges haven’t benefited from the coreless purge project. Our cache purge API lets users bundle single file and flexible purges in one request, so the API Gateway worker and Purge Ingest worker handle authorization, authentication and payload validation for flexible purges too. Those flexible purges get forwarded directly to our services in core data centers pre-authorized and validated which reduces load on those core data center auth services. As an added benefit, because authorization and validity checks all happen at the edge for all purge types users get much faster feedback when their requests are malformed.

Next steps

While coreless cache purge has come a long way since the part 1 blog post, we’re not done. We continue to work on reducing end-to-end latency even more for purge by URL because we can do better. Alongside improvements to our new distribution system, we’ve also been working on the redesign of flexible purge to make it fully coreless, and we’re really excited to share the results we’re seeing soon. Flexible cache purge is an incredibly popular API and we’re giving its refresh the care and attention it deserves.

Source :
https://blog.cloudflare.com/rethinking-cache-purge-architecture/

Part 1: Rethinking Cache Purge, Fast and Scalable Global Cache Invalidation

14/05/2022

There is a famous quote attributed to a Netscape engineer: “There are only two difficult problems in computer science: cache invalidation and naming things.” While naming things does oddly take up an inordinate amount of time, cache invalidation shouldn’t.

In the past we’ve written about Cloudflare’s incredibly fast response times, whether content is cached on our global network or not. If content is cached, it can be served from a Cloudflare cache server, which are distributed across the globe and are generally a lot closer in physical proximity to the visitor. This saves the visitor’s request from needing to go all the way back to an origin server for a response. But what happens when a webmaster updates something on their origin and would like these caches to be updated as well? This is where cache “purging” (also known as “invalidation”) comes in.

Customers thinking about setting up a CDN and caching infrastructure consider questions like:

  • How do different caching invalidation/purge mechanisms compare?
  • How many times a day/hour/minute do I expect to purge content?
  • How quickly can the cache be purged when needed?

This blog will discuss why invalidating cached assets is hard, what Cloudflare has done to make it easy (because we care about your experience as a developer), and the engineering work we’re putting in this year to make the performance and scalability of our purge services the best in the industry.

What makes purging difficult also makes it useful

(i) Scale
The first thing that complicates cache invalidation is doing it at scale. With data centers in over 270 cities around the globe, our most popular users’ assets can be replicated at every corner of our network. This also means that a purge request needs to be distributed to all data centers where that content is cached. When a data center receives a purge request, it needs to locate the cached content to ensure that subsequent visitor requests for that content are not served stale/outdated data. Requests for the purged content should be forwarded to the origin for a fresh copy, which is then re-cached on its way back to the user.

This process repeats for every data center in Cloudflare’s fleet. And due to Cloudflare’s massive network, maintaining this consistency when certain data centers may be unreachable or go offline, is what makes purging at scale difficult.

Making sure that every data center gets the purge command and remains up-to-date with its content logs is only part of the problem. Getting the purge request to data centers quickly so that content is updated uniformly is the next reason why cache invalidation is hard.  

(ii) Speed
When purging an asset, race conditions abound. Requests for an asset can happen at any time, and may not follow a pattern of predictability. Content can also change unpredictably. Therefore, when content changes and a purge request is sent, it must be distributed across the globe quickly. If purging an individual asset, say an image, takes too long, some visitors will be served the new version, while others are served outdated content. This data inconsistency degrades user experience, and can lead to confusion as to which version is the “right” version. Websites can sometimes even break in their entirety due to this purge latency (e.g. by upgrading versions of a non-backwards compatible JavaScript library).

Purging at speed is also difficult when combined with Cloudflare’s massive global footprint. For example, if a purge request is traveling at the speed of light between Tokyo and Cape Town (both cities where Cloudflare has data centers), just the transit alone (no authorization of the purge request or execution) would take over 180ms on average based on submarine cable placement. Purging a smaller network footprint may reduce these speed concerns while making purge times appear faster, but does so at the expense of worse performance for customers who want to make sure that their cached content is fast for everyone.
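
As a rough back-of-the-envelope check: light in fiber travels at roughly two-thirds the speed of light in a vacuum, about 200,000 km/s, so transit time is just path length divided by that speed. The cable path length below is an assumption for illustration; real submarine routes (and whether you count one-way or round-trip transit) will change the exact number.

```ts
// Back-of-the-envelope transit time. The cable path length is an assumption for
// illustration; real submarine routes and one-way vs. round-trip accounting vary.
const FIBER_SPEED_KM_PER_S = 200_000; // light in fiber travels at roughly 2/3 of c
const greatCircleKm = 14_700;         // straight-line Tokyo–Cape Town distance
const assumedCablePathKm = 36_000;    // actual routes are much longer than the great circle

const lowerBoundMs = (greatCircleKm / FIBER_SPEED_KM_PER_S) * 1000;       // ≈ 74 ms
const assumedPathMs = (assumedCablePathKm / FIBER_SPEED_KM_PER_S) * 1000; // ≈ 180 ms
console.log({ lowerBoundMs, assumedPathMs });
```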

(iii) Scope
The final thing that makes purge difficult is making sure that only the unneeded web assets are invalidated. Maintaining a cache is important for egress cost savings and response speed. Webmasters’ origins could be knocked over by a thundering herd of requests, if they choose to purge all content needlessly. It’s a delicate balance of purging just enough: too much can result in both monetary and downtime costs, and too little will result in visitors receiving outdated content.

At Cloudflare, what to invalidate in a data center is often dictated by the type of purge. Purge everything, as you could probably guess, purges all cached content associated with a website. Purge by prefix purges content based on a URL prefix. Purge by hostname can invalidate content based on a hostname. Purge by URL or single file purge focuses on purging specified URLs. Finally, Purge by tag purges assets that are marked with Cache-Tag headers. These markers offer webmasters flexibility in grouping assets together. When a purge request for a tag comes into a data center, all assets marked with that tag will be invalidated.
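
For reference, all of these purge types map onto the body of a single API call. Here is a minimal sketch of purging by URL; the zone ID and token are placeholders, and the commented alternatives show how the other purge types are expressed.

```ts
// Minimal sketch of purging a single URL via the Cloudflare API.
// ZONE_ID and API_TOKEN are placeholders for your own values.
const ZONE_ID = "your-zone-id";
const API_TOKEN = "your-api-token";

async function purgeByUrl(urls: string[]): Promise<void> {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_TOKEN}`,
        "Content-Type": "application/json",
      },
      // The other purge types swap this body for { purge_everything: true },
      // { tags: [...] }, { hosts: [...] } or { prefixes: [...] }.
      body: JSON.stringify({ files: urls }),
    }
  );
  if (!res.ok) throw new Error(`Purge failed: ${res.status}`);
}

await purgeByUrl(["https://www.example.com/styles/main.css"]);
```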

With that overview in mind, the remainder of this blog will focus on putting each element of invalidation together to benchmark the performance of Cloudflare’s purge pipeline and provide context for what performance means in the real-world. We’ll be reviewing how fast Cloudflare can invalidate cached content across the world. This will provide a baseline analysis for how quick our purge systems are presently, which we will use to show how much we will improve by the time we launch our new purge system later this year.

How does purge work currently?

In general, purge takes the following route through Cloudflare’s data centers.

  • A purge request is initiated via the API or UI. This request specifies how our data centers should identify the assets to be purged. This can be accomplished via cache-tag header(s), URL(s), entire hostnames, and much more.
  • The request is received by any Cloudflare data center and is identified to be a purge request. It is then routed to a Cloudflare core data center (a set of a few data centers responsible for network management activities).
  • When a core data center receives it, the request is processed by a number of internal services that (for example) make sure the request is being sent from an account with the appropriate authorization to purge the asset. Following this, the request gets fanned out globally to all Cloudflare data centers using our distribution service.
  • When received by a data center, the purge request is processed and all assets with the matching identification criteria are either located and removed, or marked as stale. These stale assets are not served in response to requests and are instead re-pulled from the origin.
  • After being pulled from the origin, the response is written to cache again, replacing the purged version.

Now let’s look at this process in practice. Below we describe Cloudflare’s purge benchmarking that uses real-world performance data from our purge pipeline.

Benchmarking purge performance design

In order to understand how performant Cloudflare’s purge system is, we measured the time it took from sending the purge request to the moment the purge was complete and the asset was no longer served from cache.

In general, the process of measuring purge speeds involves: (i) ensuring that a particular piece of content is cached, (ii) sending the command to invalidate the cache, (iii) simultaneously checking our internal system logs for how the purge request is routed through our infrastructure, and (iv) measuring when the asset is removed from cache (first miss).

This process measures how quickly cache is invalidated from the perspective of an average user.
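
Here is a simplified, client-side version of that measurement loop, using the CF-Cache-Status response header to detect the first miss. The real benchmark reads internal purge-pipeline logs rather than polling from the outside, so treat this purely as an illustration of steps (i) through (iv) above.

```ts
// Simplified, client-side version of the measurement. The real benchmark reads
// internal purge-pipeline logs instead of polling from the outside.
// purgeByUrl is the helper sketched in the API example earlier.
declare function purgeByUrl(urls: string[]): Promise<void>;

async function measurePurgeLatency(assetUrl: string): Promise<number> {
  // (i) Make sure the asset is cached.
  await fetch(assetUrl);
  const warm = await fetch(assetUrl);
  if (warm.headers.get("CF-Cache-Status") !== "HIT") {
    throw new Error("Asset is not cached yet");
  }

  // (ii) Send the purge and start the clock.
  await purgeByUrl([assetUrl]);
  const start = Date.now();

  // (iii)/(iv) Poll until the first miss, i.e. the asset is no longer served from cache.
  while ((await fetch(assetUrl)).headers.get("CF-Cache-Status") === "HIT") {
    await new Promise((resolve) => setTimeout(resolve, 50));
  }
  return Date.now() - start;
}
```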

  • Clock starts
    As noted above, in this experiment we’re using sampled RUM data from our purge systems. The goal of this experiment is to benchmark current data for how long it can take to purge an asset on Cloudflare across different regions. Once the asset was cached in a region on Cloudflare, we identified when a purge request was received for that asset. At that same instant, the clock started for this experiment. We include in this time any retries we needed to make (due to data centers missing the initial purge request) to ensure that the purge was done consistently across our network. The clock continues as the request transits our purge pipeline (data center > core > fanout > purge from all data centers).
  • Clock stops
    The clock stopped when the purged asset was removed from cache, meaning that the data center was no longer serving the asset from cache in response to visitors’ requests. Our internal logging measures the precise moment that the cached content has been removed or expired, and from that data we were able to determine the following benchmarks for our purge types in various regions.

Results

We’ve divided our benchmarks in two ways: by purge type and by region.

We singled out Purge by URL because it identifies a single target asset to be purged. While that asset can be stored in multiple locations, the amount of data to be purged is strictly defined.

We’ve combined all other types of purge (everything, tag, prefix, hostname) together because the amount of data to be removed is highly variable. Purging a whole website or by assets identified with cache tags could mean we need to find and remove a multitude of content from many different data centers in our network.

Secondly, we segmented our benchmark measurements by region, and specifically we confined the benchmarks to particular data center servers in each region because we were concerned about clock skew between different data centers. We limited the test to the same cache servers so that even if there was skew, they’d all be skewed in the same way.

We took the latency from the representative data centers in each of the following regions and the global latency. Data centers were not evenly distributed in each region, but in total represent about 90 different cities around the world:  

  • Africa
  • Asia Pacific Region (APAC)
  • Eastern Europe (EEUR)
  • Eastern North America (ENAM)
  • Oceania
  • South America (SA)
  • Western Europe (WEUR)
  • Western North America (WNAM)

The global latency numbers represent the purge data from all Cloudflare data centers in over 270 cities globally. In the results below, global latency numbers may be larger than the regional numbers because they represent all of our data centers instead of only a regional portion, so outliers and retries might have an outsized effect.

Below are the results for how quickly our current purge pipeline was able to invalidate content by purge type and region. All times are represented in seconds and divided into P50, P75, and P99 quantiles; “P50”, for example, means that 50% of the purges completed at the indicated latency or faster.

Purge By URL

Region    P50     P75     P99
AFRICA    0.95s   1.94s   6.42s
APAC      0.91s   1.87s   6.34s
EEUR      0.84s   1.66s   6.30s
ENAM      0.85s   1.71s   6.27s
OCEANIA   0.95s   1.96s   6.40s
SA        0.91s   1.86s   6.33s
WEUR      0.84s   1.68s   6.30s
WNAM      0.87s   1.74s   6.25s
GLOBAL    1.31s   1.80s   6.35s

Purge Everything, by Tag, by Prefix, by Hostname

Region    P50       P75     P99
AFRICA    1.42s     1.93s   4.24s
APAC      1.30s     2.00s   5.11s
EEUR      1.24s     1.77s   4.07s
ENAM      1.08s     1.62s   3.92s
OCEANIA   1.16s     1.70s   4.01s
SA        1.25s     1.79s   4.106s
WEUR      1.19s     1.73s   4.04s
WNAM      0.9995s   1.53s   3.83s
GLOBAL    1.57s     2.32s   5.97s

A general note about these benchmarks — the data represented here was taken from over 48 hours (two days) of RUM purge latency data in May 2022. If you are interested in how quickly your content can be invalidated on Cloudflare, we suggest you test our platform with your website.

Those numbers are good and much faster than most of our competitors. Even in the worst case, we see the time from when you tell us to purge an item to when it is removed globally is less than seven seconds. In most cases, it’s less than a second. That’s great for most applications, but we want to be even faster. Our goal is to get cache purge as close as theoretically possible to the speed-of-light limit for a network our size, which is 200ms.

Intriguingly, LEO satellite networks may be able to provide even lower global latency than fiber optics because of the straightness of the paths between satellites that use laser links. We’ve done calculations of latency between LEO satellites that suggest that there are situations in which going to space will be the fastest path between two points on Earth. We’ll let you know if we end up using laser-space-purge.

Just as we have with network performance, we are going to relentlessly measure our cache performance as well as the cache performance of our competitors. We won’t be satisfied until we verifiably are the fastest everywhere. To do that, we’ve built a new cache purge architecture which we’re confident will make us the fastest cache purge in the industry.

Our new architecture

Through the end of 2022, we will continue this blog series incrementally showing how we will become the fastest, most-scalable purge system in the industry. We will continue to update you with how our purge system is developing  and benchmark our data along the way.

Getting there will involve rearchitecting and optimizing our purge service, which hasn’t received a systematic redesign in over a decade. We’re excited to do our development in the open, and bring you along on our journey.

So what do we plan on updating?

Introducing Coreless Purge

The first version of our cache purge system was designed on top of a set of central core services including authorization, authentication, request distribution, and filtering among other features that made it a high-reliability service. These core components had ultimately become a bottleneck in terms of scale and performance as our network continues to expand globally. While most of our purge dependencies have been containerized, the message queue used was still running on bare metals, which led to increased operational overhead when our system needed to scale.

Last summer, we built a proof of concept for a completely decentralized cache invalidation system using in-house tech – Cloudflare Workers and Durable Objects. Using Durable Objects as a queuing mechanism gives us the flexibility to scale horizontally by adding more Durable Objects as needed and can reduce time to purge with quick regional fanouts of purge requests.

In the new purge system we’re ripping out the reliance on core data centers and moving all that functionality to every data center; we’re calling it coreless purge.

Here’s a general overview of how coreless purge will work:

  • A purge request will be initiated via the API or UI. This request will specify how we should identify the assets to be purged.
  • The request will be routed to the nearest Cloudflare data center where it is identified to be a purge request and be passed to a Worker that will perform several of the key functions that currently occur in the core (like authorization, filtering, etc).
  • From there, the Worker will pass the purge request to a Durable Object in the data center. The Durable Object will queue all the requests and broadcast them to every data center when they are ready to be processed.
  • When the Durable Object broadcasts the purge request to every data center, another Worker will pass the request to the service in the data center that will invalidate the content in cache (executes the purge).

We believe this re-architecture of our system built by stringing together multiple services from the Workers platform will help improve both the speed and scalability of the purge requests we will be able to handle.

Conclusion

We’re going to spend a lot of time building and optimizing purge because, if there’s one thing we learned here today, it’s that cache invalidation is a difficult problem but those are exactly the types of problems that get us out of bed in the morning.

If you want to help us optimize our purge pipeline, we’re hiring.

Source :
https://blog.cloudflare.com/part1-coreless-purge/