Analysis of large log files

Kasper Laursen s093078

Kongens Lyngby 2012 IMM-B.Sc.-2012-37

Technical University of Denmark Informatics and Mathematical Modelling Building 321, DK-2800 Kongens Lyngby, Denmark Phone +45 4525 3351, Fax +45 4588 2673 reception@imm.dtu.dk www.imm.dtu.dk IMM-B.Sc.-2012-37

Summary

This thesis covers pattern recognition in large log files using clustering analysis, in the form of mini-batch K-means clustering and data fitting, to find abnormal traffic in network flows provided by DeIC, formerly The Danish Research Network.

The implementation is a modified clustering algorithm using the Mahalanobis distance. In the analysis, more than 10^9 network flows from a single day were split into different clusters, and outliers were detected. The clustering calculations took less than 13 hours, which means that outliers can be detected the following day. The implementation and analysis could be further improved by selecting a different set of fields from the log files, by a parallel implementation of the mini-batch K-means clustering algorithm, and by a more thorough analysis of the detected outliers.

Preface

This bachelor thesis was prepared at the Department of Informatics and Mathematical Modelling at the Technical University of Denmark in fulfillment of the requirements for acquiring a B.Sc.Eng. degree in Software Technology.

Lyngby, 14 December 2012

Kasper Laursen

Acknowledgements

I would like to thank my supervisor Robin Sharp for weekly meetings and support throughout the whole project.

Thanks to The Danish Research Network for providing the network log files for this analysis.

I would like to give special thanks to Rasmus Jul Hansen for proofreading this project, to Simon Laursen for discussion and for finalizing the report, and to Søren Løvborg for proofreading, help and discussion throughout the whole project phase.

Contents

1 Introduction

2 Preliminaries
2.1 Machine learning
