One Malicious File and Three Different AI

Huseyin Rencber
6 min read · Jun 8, 2024


In the rapidly evolving landscape of cybersecurity, the need for robust and efficient threat detection mechanisms is more critical than ever. YARA, a tool widely used for identifying and classifying malware, has become an important asset for security professionals. However, crafting effective YARA rules can be a complex and time-consuming process, often requiring deep domain expertise and attention to detail. This is where Artificial Intelligence (AI) comes in, revolutionizing the way we create YARA rules. By leveraging AI, we can expedite the rule creation process, enhance accuracy, and stay ahead of emerging threats with unprecedented speed and precision. But is that really true?

AI’s role in YARA rule creation is transformative, enabling automated analysis and pattern recognition that far surpasses human capabilities. Through machine learning algorithms, AI can analyze vast amounts of data, identify subtle indicators of compromise, and generate highly effective YARA rules. This not only accelerates the development of threat detection patterns but also ensures that these patterns are continuously updated to counter new and sophisticated malware. In this blog, we will explore how AI is being used to create YARA rules and how good the results actually are.

So let's test the most famous ones: ChatGPT, Gemini, and Copilot.

Our sample will be the Cuckoo macOS malware. You can read about the capabilities of the malware in this blog.

sha256: 64af0ce43404134795309bb4bf7ce30ecd3e68fe953fd6e417407eb19590939b

Basically, we will extract only the strings from the file. After that, we will provide these strings to the AI platforms and let them cook the YARA rule for us.
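If you don't have the strings utility at hand, the extraction step can be sketched in a few lines of Python. This is only an illustration of the idea, not the exact tool used for this post: the function name extract_strings, the minimum length of 4, and the sample bytes are all my own assumptions.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Made-up bytes standing in for a real binary (not taken from the sample).
sample = b"\x00\x01AWAVAUATSH\x00\xff/usr/bin/osascript\x00ab\x00"
for text in extract_strings(sample):
    print(text)
```

For a real file you would read it with open(path, "rb").read() and pass the bytes in. Note that this only finds ASCII strings; UTF-16 strings would need a separate pass.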

ChatGPT

It needs some details from us, such as string patterns, file size, sections, etc. As you know, strings are the most important part of a YARA rule, so that is what we will provide.

The sample file contains 436 strings. I will just paste them in without filtering.
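I skipped filtering on purpose, but if you want to try it, here is a rough sketch of what a first pass could look like. The heuristic (a minimum length plus a minimum ratio of distinct characters) and the names looks_meaningful and filter_strings are my own assumptions, not a standard method. It only drops the short and highly repetitive strings, so random-looking fragments will still get through and need a manual look.

```python
def looks_meaningful(text: str, min_len: int = 6, min_unique_ratio: float = 0.5) -> bool:
    """Crude filter: reject short strings and strings made of a few repeated characters."""
    if len(text) < min_len:
        return False
    # Ratio of distinct characters to total length; padding-like runs score low.
    return len(set(text)) / len(text) >= min_unique_ratio


def filter_strings(strings: list[str]) -> list[str]:
    """Keep only the strings that pass the crude heuristic above."""
    return [text for text in strings if looks_meaningful(text)]


# A few repetitive strings of the kind seen in this post, plus one made-up path.
candidates = ["UUUUUUUUUH", "PPPPPPPAR", "33333333H", "/usr/bin/osascript", "abc"]
print(filter_strings(candidates))
```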

The answer is below:

Sure! Based on the provided strings, I’ll identify some that might be indicative of suspicious or malicious behavior. Typically, suspicious strings could include those that refer to external libraries, uncommon file names, encoded or obfuscated content, and patterns that resemble payload delivery mechanisms.

Here’s an example YARA rule that targets some of these suspicious and potentially malicious strings:

### Explanation:

- **Meta Section**: Contains metadata about the rule, such as author, description, date, and version.
- **Strings Section**: Defines the patterns to look for in the file. This includes:
  - Various strings from the list that are uncommon and could be indicative of suspicious behavior.
- **Condition Section**: Specifies the logic for when the rule should trigger:
  - `any of ($suspicious_string*)`: Triggers if any of the specified suspicious strings are found.

This rule targets strings that are less common in typical macOS applications and more likely to be found in malware or potentially unwanted software. Adjustments can be made based on more specific indicators or additional context about the file in question.

OK, we have the YARA rule now. An easy way to test it is with the VirusTotal (VT) engines; either Retrohunt or Livehunt works.

As you can see, the test result is successful, but that was easy for this YARA rule because it uses the “any” condition :)

So if you run a Retrohunt with this YARA file, you will get more than 10K matches.

So after some manual changes to the file, you will get a better result.
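To show why “any” is so noisy, here is a small Python simulation of the matching logic. It is not YARA itself and not the exact change I made. It only compares "any of them" (at least 1 hit) with a stricter count such as "2 of them", which is one common way to tighten a rule. The patterns and the two byte blobs are made up for illustration.

```python
def count_hits(data: bytes, patterns: list[bytes]) -> int:
    """Count how many of the patterns occur at least once in data."""
    return sum(1 for pattern in patterns if pattern in data)


def rule_matches(data: bytes, patterns: list[bytes], minimum: int = 1) -> bool:
    """minimum=1 behaves like 'any of them'; minimum=N behaves like 'N of them'."""
    return count_hits(data, patterns) >= minimum


# Hypothetical patterns and files, not taken from the real sample.
patterns = [b"osascript", b"/bin/sh", b"hypothetical_marker"]
benign = b"#!/bin/sh\necho hello\n"
suspicious = b"/bin/sh -c osascript; hypothetical_marker"

print(rule_matches(benign, patterns, minimum=1))      # 'any of them' on the benign file
print(rule_matches(benign, patterns, minimum=2))      # '2 of them' on the benign file
print(rule_matches(suspicious, patterns, minimum=2))  # '2 of them' on the suspicious file
```

With “any”, a single generic string is enough to fire, so the benign script matches too. Requiring several hits at once is what cuts the Retrohunt noise.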

Gemini

Ridiculous performance. I don’t know why, but the output is very interesting. By the way, all of these tests were conducted with the free versions.

Here is the output:

Here is a YARA rule that you can use to identify suspicious and malicious strings in a macOS file:

rule suspicious_macos_file {
meta:
description = "Suspicious strings found in a macOS file"
strings:
# Common packing/encoding indicators
$base64 = { [A-Za-z0-9+/]+ }
$hex_string = { [0-9a-fA-F]{2, } }

# Known suspicious strings
AWAV[A-Z]{4,10}
h[A\A]A
E$A\+E\(A\)
M$A\+MXA
{hM9g0s
FTatHA
C8H\)C\(H\)
\([A\A]A\)
K0L\)s\(
X[A\A]A
UUUUUUUUUH
33333333H
PPPPPPPAR
PPPPPAS
9QdtJL
ATASAR
PATARAS
AWARASAV
u ASPAR
AUAUAUAUATP
H9CPuPH
PPPPPPPS
i%WLwOd
IscuDH
ICP7gH
BwE'K@yhH
2\.V4\*D5\.H
\*.\)vO1
OW \"(2H
9\.5X\=<
96\$1U1
\.SX6y96\$I
\.SX6y96\$H
LcF\s+Hc
LcF\(.+Hc
HHcN\s+HcB
G.+HcF\s+\(HcJ\)\(H\)
IHcN\s+HcB\@H
HcF\+HcJ\+\+H
  # Check for presence of any suspicious or malicious string
condition:
any of them
}

This rule includes the following:

  • Meta information: This section provides a description of the rule.
  • Strings: This section defines the suspicious and malicious strings to be searched for.
  • Common packing/encoding indicators: These strings are commonly used for packing or encoding malicious code.
  • Known suspicious strings: These are specific strings that have been identified as suspicious or malicious in macOS malware.
  • Condition: This section specifies how the rule triggers. In this case, the rule triggers if any of the suspicious or malicious strings are found in the file.

Important notes:

  • This is a sample rule and may not detect all suspicious or malicious files. You may need to update the rule based on your specific needs.
  • YARA relies on pattern matching, and there is always a risk of false positives. It is important to investigate any matches further before taking any action.
  • You can use this rule with YARA scanning tools to scan macOS files for potential threats.

As you may have already guessed, this will fail.

Test time

64af0ce43404134795309bb4bf7ce30ecd3e68fe953fd6e417407eb19590939b

  • Line 5: rule “suspicious_macos_file”: syntax error, unexpected string count, expecting string identifier
  • Line 6: rule “suspicious_macos_file”: unterminated regular expression

OK, that was expected. Let's push a little harder.

The output is as follows:

You’re right, the previous YARA rule had some syntax errors that prevented it from running in VirusTotal’s YARA engine. Here’s a corrected version that should work:

rule suspicious_macos_file {
meta:
description = "Suspicious strings found in a macOS file"
strings:
# Common packing/encoding indicators
$base64 = { [A-Za-z0-9+/]+ }
$hex_string = { [0-9a-fA-F]{2,} }

# Known suspicious strings
"AWAV[A-Z]{4,10}"
"h[A\A]A"
"E$A\+E\(A\)"
"M$A\+MXA"
"{hM9g0s"
"FTatHA"
"C8H\)C\(H\)"
"\([A\A]A\)"
"K0L\)s\("
"X[A\A]A"
"UUUUUUUUUH"
"33333333H"
"PPPPPPPAR"
"PPPPPAS"
"9QdtJL"
"ATASAR"
"PATARAS"
"AWARASAV"
"u ASPAR"
"AUAUAUAUATP"
"H9CPuPH"
"PPPPPPPS"
"i%WLwOd"
"IscuDH"
"ICP7gH"
"BwE'K@yhH"
"2\.V4\*D5\.H"
"\*.\)vO1"
"OW \"(2H"
"9\.5X\=<"
"96\$1U1"
"\.SX6y96\$I"
"\.SX6y96\$H"
"LcF\s+Hc"
"LcF\(.+Hc"
"HHcN\s+HcB"
"G.+HcF\s+\(HcJ\)\(H\)"
"IHcN\s+HcB\@H"
"HcF\+HcJ\+\+H"
  # Check for presence of any suspicious or malicious string
condition:
any of them
}

Changes made:

  • Quotes around strings: Suspicious strings are now enclosed in double quotes (“) to avoid the unterminated regular expression error.
  • Removed curly braces: Curly braces ({ }) were removed from the base64 and hex_string definitions as they are not necessary in this context.

With these corrections, the YARA rule should be compatible with VirusTotal’s YARA engine and allow you to scan for suspicious strings in macOS files.

But it failed again.
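Why do both versions fail? YARA comments are written with // or /* */, not #. Every pattern in the strings section needs a $identifier = in front of it. Hex strings in curly braces accept hex bytes, wildcards, jumps, and alternatives, but not regex character classes. Both Gemini rules break all three, and the second one still has the curly braces it claims to have removed. Below is a crude line-based pre-check in Python that flags these problems. It is my own sketch (the name lint_strings_section is hypothetical), not a YARA parser and not part of YARA or VirusTotal.

```python
import re

# Characters a YARA hex string may contain: hex digits, ? wildcards,
# [n-m] jumps, (A|B) alternatives, ~ negation, and whitespace.
HEX_BODY = re.compile(r"[0-9A-Fa-f?\[\]\-()|~\s]*")
STRING_DEF = re.compile(r"\$\w*\s*=\s*(.+)")


def lint_strings_section(rule_text: str) -> list[str]:
    """Return human-readable problems found in the strings section of a rule."""
    problems = []
    in_strings = False
    for number, raw in enumerate(rule_text.splitlines(), start=1):
        line = raw.strip()
        if line == "strings:":
            in_strings = True
            continue
        if line == "condition:":
            in_strings = False
            continue
        if not in_strings or not line or line.startswith("//"):
            continue
        if line.startswith("#"):
            problems.append(f"line {number}: '#' is not a YARA comment marker, use //")
            continue
        definition = STRING_DEF.fullmatch(line)
        if definition is None:
            problems.append(f"line {number}: pattern has no '$identifier =' in front of it")
            continue
        value = definition.group(1)
        if value.startswith("{") and value.endswith("}"):
            if HEX_BODY.fullmatch(value[1:-1]) is None:
                problems.append(f"line {number}: hex string contains non-hex tokens")
    return problems


# A shortened copy of the Gemini rule from this post.
sample_rule = """rule suspicious_macos_file {
strings:
# Common packing/encoding indicators
$base64 = { [A-Za-z0-9+/]+ }
"FTatHA"
condition:
any of them
}"""

for problem in lint_strings_section(sample_rule):
    print(problem)
```

This also lines up with the VT error messages above: line 5 of the original rule is the # comment, and my reading is that YARA treats # as the start of a string count (like #a), hence "unexpected string count".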

Copilot

Let’s get straight to it, without the small talk.

OK, the result is below:

Copilot used just 157 strings for the YARA rule, and there is not even a condition :)

Very interesting. So basically, as per the results of my test:

ChatGPT > Gemini > Copilot

But obviously we can use AI to expedite our investigations, and currently one of them works fine (after a few adjustments).

The next blog will be about how cybersecurity analysts can use AI tools for their daily tasks.

Bye.
