Tuesday, December 23, 2014

PDF Versions Malicious Content Distribution

While attack vectors based on malicious PDFs are a well-known topic (SANS, Didier's tools), understanding how those vectors are spread nowadays is an interesting "research" topic (at least in my personal opinion). Recently, Yoroi's toolset gave me the ability to analyze almost 2k PDFs per hour. I decided to analyze an entire hour of captures harvested from many different sources (mainly emails, repositories and HTTP streams) and to put my findings in this quick and dirty post, just to fix them in my "diary".

Since PDF is one of the most widely used document formats, attackers figured out how to make PDFs malicious. The following image shows a romantic attack vector used to infect a victim through PDF malware. The infected PDF wraps an object whose content eventually downloads a payload from the Internet and runs it. The payload might be, for example, a .exe or a chunk of bytes executed "directly in memory". It might perform several tasks: reading/writing the filesystem, executing objects, sniffing passwords, listening for content, substituting content and so on. This is what makes the original PDF malicious.
Romantic Attack Path driven by PDF
A commonly used way to implement the downloader is through JavaScript, which can run inside a PDF to provide simple effects, anchors and dynamic behavior. The following image shows a simple JavaScript downloader hidden in a PDF.

A Romantic PDF Malware Object
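Because the script sits inside a PDF object, a first triage step is simply pulling its text out. The Python sketch below is not the author's tooling. It assumes an uncompressed object whose script is stored as a literal string right after /JS. Hex strings, indirect references such as /JS 5 0 R, and Flate-compressed streams are not handled. The function name extract_js_literals is hypothetical. The parser balances nested parentheses as PDF syntax allows and keeps backslash escapes verbatim rather than decoding them.

```python
import re

# Matches the /JS key followed by the opening parenthesis of a literal string.
JS_KEY = re.compile(rb"/JS\s*\(")


def extract_js_literals(data: bytes) -> list:
    """Return the raw bytes of every /JS (...) literal string in uncompressed PDF data.

    Nested parentheses are balanced; backslash escapes are copied verbatim,
    not decoded. Hex strings and indirect references are ignored (sketch only).
    """
    results = []
    for match in JS_KEY.finditer(data):
        i = match.end()  # first byte after the opening '('
        depth = 1
        out = bytearray()
        while i < len(data):
            ch = data[i]
            if ch == 0x5C:  # backslash: keep the escape pair as-is
                out += data[i:i + 2]
                i += 2
                continue
            if ch == 0x28:  # '(' opens a nested pair
                depth += 1
            elif ch == 0x29:  # ')' closes a pair
                depth -= 1
                if depth == 0:
                    break
            out.append(ch)
            i += 1
        results.append(bytes(out))
    return results
```

Usage: extract_js_literals(open("sample.pdf", "rb").read()) returns a list of raw script bodies. Here "sample.pdf" is a placeholder file name. The result can be fed to a JavaScript deobfuscator or simply grepped for URLs.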
I was curious to discover how many malicious PDFs I could find among roughly 2k PDFs in total. By using simple scripts, I was able to automate the "first level of analysis", including:
  • Downloading PDF from internal/external sources
  • Automating detection (I borrowed some code from pdfid.py by Didier)
  • Calculating simple statistics on analyzed Malware
A NodeJS downloader script grabs all the files from Yoroi's internal repository as follows. You might use the same code to download from Google or wherever you like.

A simple downloader script
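The original script is NodeJS and is only shown in the image above. As a rough Python equivalent (not the author's code), the sketch below fetches a list of URLs with urllib.request. It keeps only responses that carry a %PDF header within their first 1024 bytes. Each file is stored under its SHA-256 digest, so duplicate samples collapse into one file. The URL list and the name download_pdfs are hypothetical, and error handling is intentionally minimal.

```python
import hashlib
import pathlib
import urllib.request


def download_pdfs(urls, dest_dir):
    """Fetch each URL and save PDFs into dest_dir, named by their SHA-256 digest.

    Returns the list of saved paths. Unreachable URLs and non-PDF responses are skipped.
    """
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                data = resp.read()
        except (OSError, ValueError):
            # URLError/HTTPError derive from OSError; malformed URLs raise ValueError.
            continue
        if b"%PDF" not in data[:1024]:
            continue  # no PDF header near the start: not worth analyzing
        path = dest / (hashlib.sha256(data).hexdigest() + ".pdf")
        path.write_bytes(data)  # identical samples overwrite each other (dedupe)
        saved.append(path)
    return saved
```

Naming files by content hash makes it cheap to tell how many distinct samples were collected from overlapping sources such as emails and HTTP streams.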
Once a PDF has been saved locally, a Python script starts analyzing its content. The following image shows a piece of code taken from Didier's tools (pdfid.py). It was used to build the automatic first-stage analyzer that extracts content.

Analyze PDF content from pdfid.py by Didier Stevens
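At its core, pdfid.py counts how often certain PDF name objects appear in the raw bytes. The simplified Python sketch below imitates that idea, but it is not pdfid.py itself. The SUSPICIOUS_NAMES tuple is a subset chosen for illustration. Like pdfid, it does not decompress streams, so names hidden inside compressed object streams go unseen. It does decode #xx hex escapes in names (for example /J#61vaScript), a classic obfuscation trick that pdfid also reports, and counts such hits separately.

```python
import re
from collections import Counter

# Illustrative subset of the names pdfid.py reports.
SUSPICIOUS_NAMES = (b"/JS", b"/JavaScript", b"/AA", b"/OpenAction", b"/Launch",
                    b"/EmbeddedFile", b"/Encrypt", b"/ObjStm", b"/RichMedia", b"/XFA")

# A PDF name: '/' followed by characters that are neither whitespace nor delimiters.
NAME_RE = re.compile(rb"/[^\x00\t\n\x0c\r /\[\]<>(){}%]+")
# '#' followed by two hex digits encodes one byte inside a name.
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def count_keywords(data: bytes) -> Counter:
    """Count suspicious names in raw (uncompressed) PDF bytes, decoding #xx escapes."""
    counts = Counter()
    for raw in NAME_RE.findall(data):
        name = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
        if name in SUSPICIOUS_NAMES:
            counts[name.decode("ascii")] += 1
            if name != raw:
                counts["obfuscated"] += 1  # the name only matched after decoding
    return counts
```

A non-zero count is only a hint: plenty of benign PDFs use /OpenAction or /AcroForm. That is why a post-processing step is still needed to judge the actual stream content.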
A post-processing static analyzer then runs to assess the "stream content maliciousness". Performance and timing were not an issue in my case, since I did this just out of personal curiosity. After several hours of computational analysis, I came up with the following results:

Total PDF analyzed: 1988
Total Size on Disk: 1.83GB

Figuring out the most affected PDF version was my next step. The following graph shows the distribution of malicious content (JS, Encrypted and Embedded File) found in the 1988 analyzed PDFs.

Malicious Content Over PDF Version
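A graph like this one boils down to two steps. First, read the version from the %PDF-x.y header. Then tally, per version, how many files contain the keywords for each category. The minimal Python sketch below assumes uncompressed keywords and a naive substring search, so /Encrypt would also match a longer name starting with it. The names tally_by_version and CATEGORIES are hypothetical; this is not the author's post-processing code. Each file counts at most once per category.

```python
import re
from collections import defaultdict

# The version lives in the header, which readers accept within the first 1024 bytes.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")

# Category -> name tokens that flag it (naive substring search, no #xx decoding).
CATEGORIES = {
    "JS": (b"/JS", b"/JavaScript"),
    "Encrypted": (b"/Encrypt",),
    "EmbeddedFile": (b"/EmbeddedFile",),
}


def pdf_version(data: bytes) -> str:
    m = HEADER_RE.search(data[:1024])
    return m.group(1).decode("ascii") if m else "unknown"


def tally_by_version(documents):
    """documents: iterable of PDF byte strings -> {version: {category: file count}}."""
    table = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0))
    for data in documents:
        row = table[pdf_version(data)]  # clean files still register their version
        for category, tokens in CATEGORIES.items():
            if any(tok in data for tok in tokens):
                row[category] += 1
    return dict(table)
```

The same table, divided by the number of files seen per version, gives rates instead of raw counts. Rates matter when some versions are much more common than others in the collected set.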

If we assume the analyzed data set is a "significant set of data", we might assert that PDF 1.1 and PDF 1.7 are the safest PDF versions with regard to malicious JS, encrypted content and embedded executables. Fewer than ten (10) malicious contents were found in both versions. Conversely, PDF 1.6 and PDF 1.4 turn out to be the most "affected" PDF versions. However, malicious content might also hide after the EOF and use the PDF as a passive carrier. The following graph shows the distribution of malicious content found after the End Of File.

Malicious Content after EOF
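Checking for data after the End Of File is simple enough to sketch. Find the last %%EOF marker (incremental updates legitimately append several), drop the whitespace and NUL padding that often follows it, and see whether anything remains. The Python functions below are illustrative only. Returning None for a file with no %%EOF at all is a design choice of this sketch, so that case can be reported separately from "clean".

```python
from typing import Optional

EOF_MARKER = b"%%EOF"
PADDING = b"\x00\t\n\x0c\r "  # whitespace/NUL commonly found after %%EOF


def trailing_data(data: bytes) -> Optional[bytes]:
    """Return the bytes after the last %%EOF (padding stripped), or None if no marker."""
    pos = data.rfind(EOF_MARKER)  # the last marker wins: earlier ones end older revisions
    if pos == -1:
        return None
    return data[pos + len(EOF_MARKER):].strip(PADDING)


def has_data_after_eof(data: bytes) -> bool:
    """True when something other than padding follows the last %%EOF."""
    return bool(trailing_data(data))
```

Anything returned by trailing_data is invisible to a PDF reader. That is exactly why it makes a convenient passive carrier and deserves a separate scan.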

Under the same assumption, we might assert that PDF version 1.1 and PDF version 1.2 are the safest versions against malicious content after the End Of File. Surprisingly, PDF version 1.7 is not "so safe" anymore. Comparing the overall data, I came up with the following pie chart. It shows that PDF version 1.4 is the most affected by malicious content, followed by PDF versions 1.3, 1.5 and 1.6.

Overall Malicious Content By Type

There are not many conclusions to draw here. If you work mostly with these versions (1.4, 1.3, 1.5), you'd better watch out: the probability of getting a malicious PDF is higher than with other PDF versions.

Just remember that we are treating the data I collected as significant because it comes from many different organizations in different businesses.

I still have one open question:
  1. Does it make sense for anti-malware engines to weight their use of computational resources according to the PDF version they are processing? For example, if an anti-malware engine is analyzing a PDF version 1.6 file, should it allocate more computational resources (RAM, CPU, IO, etc.) than if it were analyzing a PDF version 1.1 file?
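One possible way to make the question concrete is to turn per-version counts into scan weights proportional to a smoothed malicious rate. This is purely a hypothetical sketch. The function name, the additive smoothing parameter alpha, and the idea of scaling a CPU/RAM/IO budget by these weights are illustrative assumptions. None of it describes how real anti-malware engines behave. Smoothing keeps versions with few samples from being weighted down to zero.

```python
def scan_weights(malicious, total, alpha=1.0):
    """Per-version weights proportional to a Laplace-smoothed malicious rate.

    malicious, total: dicts mapping a PDF version string -> file counts.
    Returns weights that sum to 1 over the versions present in `total`.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")  # also rules out division by zero
    rates = {
        version: (malicious.get(version, 0) + alpha) / (count + 2 * alpha)
        for version, count in total.items()
    }
    norm = sum(rates.values())
    return {version: rate / norm for version, rate in rates.items()}
```

With per-version counts like those behind the graphs as input, a per-file budget could then be base_budget * len(weights) * weights[version]. This keeps the average budget across versions equal to base_budget.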

Thursday, December 4, 2014

Operation Cleaver

I had heard about the "Cleaver" malware, without any real evidence, from a cyber friend of mine who worked with me on malware evasion techniques. At that time I didn't know "Cleaver" would be its future name. I knew Iranian hackers were getting better and better, but what I did not know was the high cyber security level they had reached! (NOTE: PrivEsc is a clear plagiarism of the MS10-015 exploit! I agree with Cylance.) Cylance did a great job of putting all the information and all the spread analysis together, uncovering this incredible targeted cyber attack originating from Iran. Are you wondering when and where we have heard about Iranian hackers before? No problem. Let's take a look at a clear timeline from Cylance showing Iran-centric attacks, with Iran both as a victim (on the left) and as an attacker (on the right).

From Cylance Report
If you are wondering how Cylance knows the attacks' origin... well, the answer is right in the code. If you reverse the Cleaver malware (BTW, you can download it from here), you'll see:
  • Persian names
  • IPs and DNS names hard-coded in it that mostly belong to Iranians
  • ASNs belonging to Iranian companies
  • the entire infrastructure hosted at Netafraz.com, an Iranian provider
and so on.

According to Cylance, the initial compromise techniques were simple and well known, yet having them all together in a single piece of malware makes this attack "spectacular"! Quoting the report:
  • "Initial compromise techniques include SQL injection, web attacks, and creative deception-based attacks – all of which have been implemented in the past by Chinese and Russian hacking teams.
  • Pivoting and exploitation techniques leveraged existing public exploits for MS08-067 and Windows privilege escalations, and were coupled with automated, worm-like propagation mechanisms. 
  • Customized private tools with functions that include ARP poisoning, encryption, credential dumping, ASP.NET shells, web backdoors, process enumeration, WMI querying, HTTP and SMB communications, network interface sniffing, and keystroke logging. "
One of the most difficult questions to answer is "What is the most attacked country?" Answering it in terms of raw numbers would be easy. From an economic point of view, though, almost all the top countries in the world (economy-wise) have been targeted.

Targeted Countries, taken from Cylance Report

Interesting is the way the attackers want to make sure the victims are not located in IRAN. The following image shows how the shell client checks the IP location. The code handles the XML response from freegeoip.net and displays the information in different colors based on different attributes. For instance, if the string “ERROR” is in the response, the text is displayed in magenta. If the string IRAN is in the response, the text is displayed in red. It should be noted that no other country name contains the substring IRAN.

Piece of Shell Creator from Cylance Report
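The coloring rule described above fits in a few lines. The Python sketch below re-creates the logic as described in the report; it is not the attackers' code. It makes three assumptions:
  • ANSI escape codes stand in for the console colors.
  • The ERROR check runs before the IRAN check.
  • The response is upper-cased first, so that a mixed-case country name such as "Iran" also matches.
It performs no network lookup; freegeoip.net is just the service the original client queried. The sketch only classifies a response string.

```python
# ANSI escape codes standing in for the client's console colors (assumption).
COLORS = {"magenta": "\x1b[35m", "red": "\x1b[31m", "default": "\x1b[0m"}


def classify_geo_response(response: str) -> str:
    """Pick a display color for a geolocation lookup response.

    Rules as described in the report: "ERROR" -> magenta, "IRAN" -> red.
    Upper-casing and the check order are assumptions of this sketch.
    """
    text = response.upper()
    if "ERROR" in text:
        return "magenta"
    if "IRAN" in text:
        return "red"
    return "default"


def render(response: str) -> str:
    """Wrap the response in the chosen color and reset the terminal afterwards."""
    color = classify_geo_response(response)
    return f"{COLORS[color]}{response}{COLORS['default']}"
```

Note that a plain substring test fires on IRAN anywhere in the response, not just in the country field. This is why the report points out that no other country name contains that substring.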

The entire system was found to use at least two different proxies: CCProxy (from a China and Middle East based company) and Squid (open source, used worldwide). Interesting is the way the attackers made use of the CCProxy sources [... thinking about it ...]. From the proxy configurations, the Cylance folks figured out the IPs, usernames and passwords belonging to the Command-and-Control infrastructure. They found that the domains, usernames and passwords were attributable to Tarh Andishan. Quoting the Cylance report:

"Tarh Andishan has been suspected in the past of launching attacks in the interest of Iran. The operators of the blog IranRedLine.org, which comments on Iran’s nuclear weapons efforts, has mentioned in multiple posts having been the target of debilitating brute-force authentication attacks from IP addresses registered to the same Tarh Andishan team found in Cleaver. In one of IranRedLine.org’s blog posts8, the author speculates on Tarh Andishan’s involvement with the Iranian government by showing close proximity to SPND, the Organization of Defensive Innovation and Research; however, the phone number listed under the registrant contact information has yet to be completely validated."
The Cleaver malware has many delivery methods, from spear phishing to watering-hole attacks. Once the malware is dropped onto the victim's PC, it grabs local and network credentials (using standard techniques). It then uses them to spread itself through PsExec, SMB shares, DLL injections, etc., which makes it wormable. The malware grabs user information and sends it to external sources through FTP servers, SMTP servers, SOAP-based servers and, if needed, SSH controllers. It uses a common version of TinyZBot (up to 2013) to communicate back to its Command-and-Control servers.

It is a pretty nice piece of malware. In my personal point of view, it shows how easy it can be to mount a worldwide targeted attack with good development skills and wise "underground knowledge". "Underground knowledge" lets attackers reuse pieces of malware, shellcode generators, encryptors, proxies, spreading techniques, infection vectors, multi-stage infections, etc., instead of developing new code or new infection processes. Development skills are then needed to fit all the reused software together and make it work.