Monday, October 22, 2012

A Collective Approach To Fraud Detection

Fraudulent activity in today’s business world is a booming industry in its own right.

More and more organised criminal groups and individuals actively seek new ways to extort, steal and extract valuable intellectual property, funds, business intelligence and confidential data from companies of all sizes, across all industries. Unfortunately, most of this theft takes place without the knowledge of the organisation(s) affected until well after the event has occurred, when the money, data or IP has already been lost, abused, sold on or made public.

Adding further complexity to the fight against these damaging fraudulent activities is the huge and ever-growing reliance on, and advances in, technology that place cyber-crime and hacking expertise at the forefront of this conduct.

How should companies defend themselves against fraud?


What is the best fraud detection practice for businesses today (and for the future) to help them quickly, easily and pro-actively detect fraudulent activity, whether it happens internally or through an attack from an external source?

A common and sometimes successful approach is to assemble an internal fraud detection team.  Hiring experienced security or data specialists to implement a complex, technical and expensive monitoring system that only they can extract the relevant data from is a familiar practice. 

Although this sounds like a rational way to address such an important need within a company, it can instead lead to a costly, restrictive and limited solution.  Despite their best efforts, the security and data specialists may only be in a position to discover ‘known’ methods of fraudulent behaviour.  Other experienced staff within a company, often better placed to notice fraudulent activity in and around their own areas of expertise and responsibility, may never get the chance to detect acts of fraud occurring directly around them.

Key questions can be derived from this method of managing fraud detection:


Can a small team of experts effectively produce the best results in their employer’s fight against fraud?

Or instead could a collective solution, involving many people within an organisation (not just security or data specialists) intuitively highlighting fraudulent behaviour, produce better, more accurate results?

Below is a very recent scenario that puts these key questions to the test.

Head of fraud detection team steals from her own employer


In September 2012, the press reported heavily on the criminal case of a former Head of Online Security who stole £2.4 million from the UK’s Lloyds Bank, where she was a respected, senior member of staff.  In summary, the person in charge of the bank’s own online fraud detection team and systems decided that she was working harder and longer hours than her decent basic salary warranted.  Subsequently, to top up her income, she submitted fake invoices over a three-year period for technology-based projects and services that never actually existed!

There are many things wrong with this very real scenario.  That Lloyds allowed such unsophisticated fraudulent activity to continue undetected for so long is quite incredible.  Furthermore, the person in charge of detecting fraudulent activity for the firm, a person of trust and responsibility, fell into the trap of defrauding her own employer.  Unbelievably, she simultaneously maintained the company’s extensive fraud detection environment!  A ‘collective approach’ to fraud detection might never have allowed this situation to occur.

A company the size of Lloyds Bank may be able to absorb such a financial loss, but the reputational damage caused by the perception that it cannot be trusted to protect its customers’ funds and confidential data could cost it a lot more.  A smaller company, by contrast, might have been put out of business completely by a loss of over £2 million, before reputational damage even became a concern.
 

The 'Collective Approach' - A smarter way to manage fraud detection


Consider involving the majority of staff within an organisation in the fraud detection process rather than limiting it to a select few experts.

Imagine that your company, typically sub-divided into teams, departments and areas of specific business functions, implemented a solution that was relevant and available to each individual business group.  The systems, processes, inputs and outputs commonly used and produced by these individual groups would be accessible via this solution.  The staff within each team, as a slight extension of their existing roles and drawing on the familiarity and experience of their own function, could then actively notice and report suspicious and possibly fraudulent behaviour themselves.  In effect, operating this type of collective solution applies the practice of ‘Neighbourhood Watch’ to fraud detection within a business, with staff keeping an eye on their local environment.

In turn, this practice creates an army of people fighting fraud within a company, all specialists in their own area of employment and readily aware of what to look out for. This collective approach would also reduce the need for a dedicated, isolated and expensive fraud detection team, and establish a greater level of responsibility and awareness among all staff regarding the pitfalls of fraudulent behaviour.

For the detractors out there claiming that this could instead create a distracting culture of blame and distrust within a business, and that some staff may lack the motivation to participate in such an approach: think again.  If some form of undetected fraud causes huge financial losses for a company, putting jobs and salaries at risk, any diligent employee would pro-actively participate in this type of collective solution to ensure that such a situation never occurs.

Here at Picviz, we are working to provide this type of cost-effective, collective solution to empower companies of all sizes to better manage their fraud detection needs.  Be sure to get in touch with us to learn more about our solution and how your fraud detection practices can easily be enhanced for the future.
 
Dean Edwards
Picviz Labs - 2012 Assises de la Sécurité Award Winner for Innovation
@picviz
@deanedwards78

Wednesday, May 02, 2012

A Concise Introduction to using CSS with Qt Classes and Custom-Made Classes

Qt is a really nice, efficient framework, and it's a real pleasure to be able to style objects with CSS-like declarations. However, I had a hard time making my own Qt widgets interact faithfully with my CSS declarations.

I think the main reason is that the most relevant parts of Qt's documentation are a bit scattered and not easily linked together when you browse through them.
This post is a quick introduction and summary of what you should know to work efficiently with Qt's style sheets.

Note!: This post is about Qt version 4.8. It should also be valid, more or less, for older or newer versions of Qt.

Note! #2: This post is not a tutorial. It is intended as a collection of pointers to the most relevant parts of Qt's documentation.

The Very Basics


First of all, if you are not very familiar with CSS stylesheets, or if you think you forgot how selectors work in some cases, you should have a look at Qt's Style Sheet Syntax.

Secondly, if you want some information on the way the Box Model works, and what content/padding/border/margin means, go to the Customizing Qt Widgets Using Style Sheets page.

Don't miss the explanations on sub-controls at the end of the page: they are very specific to Qt widgets.

The Reference Document


You would be wise to print or bookmark the Qt Style Sheets Reference. Whenever you need to know what can be done with Qt Style Sheets, and how to do it, you will find the answers within the reference.

If you need some examples for a specific kind of widget, take a look at the Qt Style Sheets Examples.

 

What to do with your Own Widgets?


Sometimes, in your own project, you need to derive from QWidget. If you do so, pay special attention to what is said in the Qt Style Sheets Reference about QWidget, especially if you still wish to use Qt Style Sheets to customize the look and feel of your own classes:

If you subclass from QWidget, you need to provide a paintEvent for your custom QWidget as below:
 void CustomWidget::paintEvent(QPaintEvent *)
 {
     QStyleOption opt;
     opt.init(this);
     QPainter p(this);
     style()->drawPrimitive(QStyle::PE_Widget, &opt, &p, this);
 }

The above code is a no-operation if there is no stylesheet set.
Warning: Make sure you define the Q_OBJECT macro for your custom widget.

Updating CSS: don't recompile! Reload Style Sheets...


When you are fine-tuning your style sheets and need to check the result of each small modification, it is not always practical to recompile your project every time.

I usually define a global (application-wide) shortcut that dynamically reloads the main Qt style sheet from a specific location on the hard drive (and not from the resources pseudo-file system, because those resources can only be modified by recompiling the project...).

This process can save a lot of time!

For those interested in this way of working, here is a quick'n'dirty way of doing it. It is based on accessing the file directly on the hard drive rather than via the Qt Resource System; otherwise a compilation would be needed, and that is exactly what we are trying to avoid!

Here's how you can trap a specific key event in YourWidget (let's say it is the $ key...):

void YourWidget::keyPressEvent(QKeyEvent *event) {
    switch (event->key()) {
    [...]
        case Qt::Key_Dollar:
        {
            // We access the file and load it
            QFile css_file("/path/to/your/css/file/gui.css");
            if (!css_file.open(QFile::ReadOnly))
                break;
            QTextStream css_stream(&css_file);
            QString css_string(css_stream.readAll());
            css_file.close();

            // We apply the CSS file
            setStyleSheet(css_string);
            setStyle(QApplication::style());
            break;
        }
        default:
            QWidget::keyPressEvent(event);
            break;
    }
}

Good Luck!

[Posted by PhS]

Wednesday, March 14, 2012

Our CanSecWest 2012 slides on passive DNS and Picviz

Alexandre Dulaunoy from CIRCL.LU and Sebastien Tricaud from Picviz Labs spoke at CanSecWest 2012 in Vancouver, Canada, on how to scrutinize a country using passive DNS and Picviz.

It was a great conference, and a good opportunity to share a joint research project: running passive DNS services and ranking BGP ASes on one side, and analyzing huge datasets with Picviz and its visualization on the other.

The slides are available here for download. Enjoy!


Thursday, January 26, 2012

Syrian Bluecoat logs analysis - part 1

Back in October 2011, Telecomix released 54 GB of compressed BlueCoat SG-9000 logs (7 out of 15 proxies) covering the period from July 22nd to August 5th, 2011. The logs can be grabbed from http://tcxsyria.ceops.eu/95191b161149135ba7bf6936e01bc3bb .

Having such logs is really cool, because there aren't many free logs available out there. I mean real, usable logs (not logs containing only attacks or only normal traffic, but both). People are still writing papers using the old DARPA dataset from 1998!

This is a great way for us to demonstrate our technology, as Picviz Inspector is able to handle big log data analysis. As we found some cool stuff during a quick analysis (the whole process took about thirty minutes), we think it is worth sharing.

Computer used for the analysis

We used our ASPI L 192 station, which has two Intel Xeon 2.66GHz CPUs with 12 cores each, 12 RAM sticks of 16GB each, and two graphics cards: an nVidia Quadro 5000 and an nVidia Tesla C2050.

This is a great machine to compile your code in record time :-)

We need such a machine because we want big data visualization with interactivity.



Data overview

 $ file SG_main__420722212535.log
SG_main__420722212535.log: ASCII text, with very long lines, with CRLF line terminators

When looking at the data, raw files show things like (just 2 events):
2011-07-22 20:34:51 282 ce6de14af68ce198 - - - OBSERVED "unavailable" http://www.surfjunky.com/members/sj-a.php?r=44864  200 TCP_NC_MISS GET text/html http www.surfjunky.com 80 /members/sj-a.php ?r=66556 php "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.65 Safari/534.24" 82.137.200.42 1395 663 -
2011-07-22 20:34:51 216 6154d919f8d56690 - - - OBSERVED "unavailable" http://x31.iloveim.com/build_3.9.2.1/comet.html  200 TCP_NC_MISS GET text/html;charset=UTF-8 http x31.iloveim.com 80 /servlets/events ?1122064400327 - "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.18) Gecko/20110614 Firefox/3.6.18" 82.137.200.42 473 1129 -

When formatted properly, one event looks like:


You can see 25 dimensions per event. Some are empty, and one (c-ip) has been replaced with a hash value so that people doing the analysis cannot identify the real users who offended the government!

We open the log in Picviz using our Rapid Log Acquisition, as described in this whitepaper.
While focusing on one field can be handy to build a top-10 pie chart, as you can see here: http://hellais.github.com/syria-censorship/, it is insufficient for a global and detailed view of all those logs.

As Parallel Coordinates is the only technique that can plot data this large, with this many dimensions, without taking the user away from them (via a top-N or least-N selection, or by looking at a maximum of three dimensions at a time), we decided to plot the logs in Picviz so we could start exploring and quickly find cool stuff in them.

If you want more information on Parallel Coordinates, I recommend reading this page.

As this is a rather quick analysis (writing this blog post took more time!), there will of course be more articles on what we can extract from those logs; this one only covers the basics, and the rest will come in other parts.

First of all, let's have a look at the data:

We have here the global structure of the data. Some dimensions were removed because they were entirely empty in the log file: cs-username, cs-auth-group and x-virus-id. Some others were added: the cs(Referer) field was split into 6 dimensions to give a better understanding of the Referer URL (protocol, domain only, TLD, port, URL path, and the variable appended to the URL).

Data analysis

Tracking Zeus

Zeus is a rather famous botnet. For more information on Zeus, you can read what has been written by the Polish CERT.

First, let's have a look at Zeus domains, using the regular expression defined by the excellent Polish CERT:
[a-z0-9]{32,48}\.(ru|com|biz|info|org|net)
This gives the following selection:

Interestingly, we can see that in this period of time only one user is affected; the expression matches the following four domains:
  • df600de61d94e3e43300a2160d3d72f4.info
  • ebook.howtoviewprivatefacebookprofiles.com
  • howtoviewprivatefacebookprofiles.com
  • www.effectivetimemanagementstrategies.com
As for the c-ip field, these events all carry the IP "0.0.0.0" rather than the user hashes we see in most of the log.

Finding funny User Agents

The User-Agent dimension is always full of surprises. We decided to apply a filter on its frequency of appearance, using the log function to clearly separate the small values from the others.

Working with sorted unique values, we found a lot of cool stuff. The list contains about 50k entries. Entries that could look like parser issues have been double-checked, and they are not: these are the real user agents that were placed there, as the other fields are filled in correctly. Among the stuff that we enjoyed, we have:

  • Mozila/4.0 (compatible; MSIE 5.0; LEAKCHECK)
  • %7BPRODUCT_NAME%7D/1.7.6 CFNetwork/485.13.8 Darwin/11.0.0
  • %D8%B1%D8%B3%D8%A7%D8%A6%D9%84%20%D8%A7%D9%84%D8%AD%D8%A8/1.1.0 CFNetwork/485.13.9 Darwin/11.0.0
  • Microsoft(r) Windows(tm) FTP Folder
  • '%22()&%1<ScRiPt >prompt(953201)</ScRiPt>
  • QSP 196:3[0] R{81388-}
  • 䚰�’s://ieframe.dll/background_gradient.jpg
  • 1pB4kE1pB1m1wnG882g5_sxigw002284sn0k85gzEjBARMTEuMC4yLjU1Ng==
We filtered the last one to understand what kind of request could generate something that looks like (but isn't) base64-encoded data or a random hash value. At first we thought it could be a covert channel. It isn't. We found this:

And with the associated data (one event in 645):
2011-08-02,11:21:23,34,0.0.0.0,-,-,-,OBSERVED,unavailable,,-,,,,{NULLCHAR}00,TCP_HIT,GET,application/octet-stream,http,dnl-18.geo.kaspersky.com,80,/index/u0607g.xml.dif,-,dif,1pBqgBumBovkhvCgvk6rx6ssywkr9qo0115t2w0oCUARMTEuMC4xLjQwMA==,82.137.200.42,774,272,-

All those different values were associated with domains matching ".{3}-\d+.geo.kaspersky.com". We wonder why such a user agent is being used.


Conclusion

This is a first attempt at globally analyzing this large volume of logs. It is very fortunate for log analysts to have such a great resource, and we would like to thank Telecomix for sharing it. It is great to see how the Picviz approach to this data quickly surfaces findings, including things we were not looking for.

We will share more analysis on this blog in the future; you will see some interesting domain names (live.com, Yahoo Mail, etc.) that are currently being blocked by the Syrian regime. And as we finally have the pleasure of working interactively with this much data and this many dimensions, we will surely find interesting things we are not yet aware of.

If you have any comments, feedback or questions, do not hesitate to share them; they can help us improve the following articles.

Wednesday, January 25, 2012

Picviz Labs blog grand opening!

Welcome to our fresh new blog where we will post life at Picviz Labs, along with cool data analysis!