Interview: An artificial world created to protect human privacy: cybersecurity

24/11/2023
#NANTERRE #MASARYK #EDUC-SHARE #CYBERSECURITY #INTERNSHIP

The article was created within the H2020 EDUC-SHARE project framework (Work Package 8 "Dissemination and communication", Task 8.2 "Internships for student journalists in a genuinely scientific environment").

Social media platforms, whose number and use are increasing day by day, bring many benefits as well as challenges. Chief among these challenges are data security and the sharing of personal information. This interview with Ján Jančár looks at cybersecurity and encryption (the conversion of information or data into a code) and touches on various aspects of data confidentiality.

PROTECTING DIGITAL INFORMATIONAL ASSETS

What is cybersecurity and how does it function? 

In general, cybersecurity is about protecting assets, usually digital informational assets, from attackers, whether you are a company, a state or a person. For example, you might be a person shopping online who doesn't want their personal data leaked. It goes all the way from users and individuals up to governments and states. That, for me, is cybersecurity.

As I said, I deal mostly with very specific parts of it. We mostly do research on attacks, breaking things, and attacks on cryptography. We try to break things like digital signatures theoretically (showing that the math is broken) or in practice, to see the issues before the actual attackers do.

What about cyber-attacks? I've read that Russia has carried out several cyber-attacks on countries like Poland and Latvia in the past years. Do cybersecurity regulations or ethical considerations change from country to country, or is there an international law that governs such attacks or the legal regulation of cybersecurity?

I think the regulations relate to what is and what you can call a cyber-attack, and that varies country by country. From what I know, throughout most of Europe they are very broad. Lots of things can count as a cyber-attack, even open research of the kind we do. Obviously, our goal is not to break things or hurt people, but we discover issues and problems, and sometimes the companies might not like what we find.

I think under Czech and Slovak law they would have some ground to stand on; these laws are broad enough that what we do could be covered as a sort of cyber-attack, even though our aim is not to break anything. We haven't gotten into trouble, but I know groups in other countries that have had problems. For example, around 2008 a group in the Netherlands was looking into the security of a smartcard used for things like public transport (a smartcard with some credit on it), and they found a critical issue. They wanted to publish it and do some interesting research on it to help the company fix it. However, the company didn't like that, and I think it sued them to stop them from publishing. At some point they had to pause but, in the end, they were able to publish it.

So even as a white hat (an ethical security hacker) you usually need some legal backing. At least the university can help you if you are a university researcher.

That being said, is there anything you're not permitted to do? What are the major limits of cryptography?

For the attacks, the only limit is that the work should provide something of value to the public or be reusable for research purposes. It doesn't work like "let's break something and then see what happens". The attacks have to have some value, and the defenders have to think that fixing the issue has some value. That's really the only limit. Some companies might choose not to cooperate, but that hasn't been very limiting for us so far.

From a wider cryptography perspective, I can mention two things. One of them is quantum computers. Cryptography of the past 50 years has a problem: if big, practical quantum computers ever come into existence (some exist right now, but they are tiny and not practically usable), most of cryptography will be broken. Around 99% of current cryptography will not work or will no longer be secure. However, there is a big split between people who think they will come and those who think they never will. It's the bleeding edge of physics research.

The other interesting development that is really pushing the limits right now is cryptocurrencies. Not really the currencies or blockchains themselves, but the technologies created around them. Sometimes the cryptography behind these cryptocurrencies is actually interesting, and it allows people to do things like private payments. You can make sure that if you are paying someone, no one will be able to tell (without quantum computers, of course) to whom and how much you are paying.

What about the use of AI in this field? I've found two AI systems used in cybersecurity: Sec-PaLM (used by Google Cloud) and knowledge graph embeddings. How many different AI systems are used in cybersecurity? Are there only a few, or are there numerous ones? How is artificial intelligence integrated into cybersecurity?

There have been uses of machine learning in cybersecurity, but mostly in network security, where you have the network of a faculty or a university and someone monitoring the traffic on it. There is an AI model that analyzes the traffic and spots patterns.

It is also used to detect viruses. However, it is not really used in cryptography so far. There has been little use of machine learning and AI in cryptography. They are slowly being adopted in some subfields, but even there the use is very basic.

For example, the Sec-PaLM models (if I looked at it correctly) are assistant-style AI language models. This is a very high-level use of a very strong generic model that I hadn't seen before, and I don't think many organizations are using it that much. It is a very new thing; models like ChatGPT are also quite new, just a few months old basically. On the scale of these models, there aren't that many, because they are huge and training them takes an enormous amount of resources. I think Google can afford it, maybe Facebook, but not many other companies.

Are open-source information, cloud storage or shared documents protected by cryptography? And if so, how are they protected?

That depends on the service; different services do different things. Let's pick an example: Google Drive, or the whole suite of Google services. I would assume that somewhere on their servers your data will likely be encrypted, alongside the data of lots of other users. However, when you access your data through your browser, usually only the connection between you and the server (or website) is encrypted, to ensure that you are talking to the right server and not to someone pretending to be Google.com. Basically, Google or the servers can read your stuff if they want to. That is essentially the default. Yes, they do some encryption (also in transit), but they need to work with your data to offer you services. That is the usual default model, and then there is the other model, which is the end-to-end encrypted model.
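
To illustrate the encryption-in-transit part of this default model, here is a minimal Python sketch (the hostname is only an illustrative example; any HTTPS site works). The standard-library ssl module performs the TLS handshake and verifies the server's certificate, which is what stops someone from impersonating the server.

```python
import socket
import ssl

hostname = "www.google.com"  # illustrative example; any HTTPS host works

context = ssl.create_default_context()  # loads the system's trusted CA certificates

with socket.create_connection((hostname, 443)) as raw_sock:
    # The TLS handshake encrypts the connection and checks that the
    # certificate matches the hostname, so an impostor server fails here.
    with context.wrap_socket(raw_sock, server_hostname=hostname) as tls_sock:
        print("Negotiated protocol:", tls_sock.version())
        print("Server certificate subject:", tls_sock.getpeercert()["subject"])
```

Note that this only protects the data on its way to the server; once it arrives, the server can read it, exactly as described above.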

The end-to-end encrypted model means that the service doesn't have access to your data. The data is encrypted once it leaves your device, and the server doesn't have access to it. It is a lot harder to do, because the server wants to do something with your data. There are new, very advanced cryptographic concepts that might be deployable at a large scale, and right now this is getting to the point where it can be done. For messaging, it is already done: WhatsApp, Signal or Facebook servers can't really tell the content of the messages being exchanged. They can of course tell who is messaging whom, how long the messages are, when they were sent and that sort of metadata, but they can't tell the content. So for messaging this is mostly solved. However, for other services like cloud storage, email and so on, it is harder. It is slowly getting there.
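
To make the end-to-end idea concrete, below is a minimal sketch using the PyNaCl library (Python bindings to libsodium). It shows only the core primitive: anyone relaying the ciphertext, such as a server, cannot read the message content. Real messengers such as Signal layer much more on top of this (identity verification, key ratcheting and so on).

```python
from nacl.public import PrivateKey, Box

# Each party generates a keypair on their own device; private keys never leave it.
alice_key = PrivateKey.generate()
bob_key = PrivateKey.generate()

# Alice encrypts for Bob using her private key and Bob's public key.
sending_box = Box(alice_key, bob_key.public_key)
ciphertext = sending_box.encrypt(b"meet at noon")  # a random nonce is attached

# A relaying server sees only `ciphertext`: its length and when it was sent,
# which is exactly the metadata described above, but not the content.

# Bob decrypts with his private key and Alice's public key.
receiving_box = Box(bob_key, alice_key.public_key)
print(receiving_box.decrypt(ciphertext))  # b'meet at noon'
```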

Is the use of cookies on websites related to what you’ve just said?

Cookies are many things, but they are mostly used by websites to remember that they are talking to you. They are also used for tracking and so on, but yes, there is a cryptographic connection.
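
One common cryptographic use of cookies, shown as a hypothetical sketch below: the server attaches an HMAC tag to the session value, so the client can store the cookie but cannot forge or tamper with it. All names and the secret here are made up for illustration.

```python
import hashlib
import hmac

SERVER_SECRET = b"server-side secret key"  # hypothetical; never leaves the server

def make_cookie(session_id: str) -> str:
    # Tag the session value so the server can later recognize its own cookie.
    tag = hmac.new(SERVER_SECRET, session_id.encode(), hashlib.sha256).hexdigest()
    return f"{session_id}.{tag}"

def verify_cookie(cookie: str) -> bool:
    session_id, _, tag = cookie.rpartition(".")
    expected = hmac.new(SERVER_SECRET, session_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)  # constant-time comparison

cookie = make_cookie("user-42")
print(verify_cookie(cookie))                # True: genuine cookie
print(verify_cookie("user-1." + "0" * 64))  # False: a forged tag is rejected
```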

COMBINING THEORY AND PRACTICE: PRIMALITY TESTS

In the article “Fooling primality tests on smart cards” published in “Computer Security – ESORICS 2020”, Ján Jančár and his colleagues explain different primality tests and the methods by which these tests can be tricked. 

What are primality tests and how do they relate to the security of smartcards? 

In cryptography, you often use prime numbers. Prime numbers are numbers that are divisible only by themselves and 1. For example, 5 is a prime number, but 6 is not, because it is divisible by 2 and 3. In cryptography, of course, we are talking about numbers much larger than this, numbers with hundreds of digits. The way it usually works is that you generate some random hundred-digit number and then test whether it is prime; you try again and again until you get a prime. For this you need a primality test. The simplest primality test is to try to divide the number by all the numbers smaller than it. However, for 100-digit numbers that won't really work; you would never finish. You might divide only by the numbers smaller than the square root of the number you are testing, but that still won't get you far, because for 100 digits that bound is also going to be huge.
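
In code, the naive trial-division test described above looks roughly like this (a Python sketch):

```python
def is_prime_trial_division(n: int) -> bool:
    """Try dividing n by every candidate up to its square root."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:      # divisors only need to be checked up to sqrt(n)
        if n % d == 0:
            return False   # found a divisor, so n is composite
        d += 1
    return True

print(is_prime_trial_division(5))  # True
print(is_prime_trial_division(6))  # False: divisible by 2 and 3
```

Even with the square-root bound, this is hopeless for 100-digit numbers, which is why the cleverer tests below are needed.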

Then you have cleverer primality tests, algorithms that determine whether the number is prime: you give them a number, and they tell you whether it is prime or not. These algorithms come in two variants. One variant is so-called deterministic, and by deterministic I mean that the algorithm always gives you the correct answer. It doesn't use any randomness; it always answers correctly, but it might take a long time on larger inputs. It gives you a proof that the number is prime, and if you follow the proof, you can be sure that the number is prime.

And then there are probabilistic algorithms, which use randomness. They give you some assurance that the number is prime: they try hard for a while, and if they are not able to decompose the number or to prove that it is not prime, they say that the number seems prime. You can set how long you want them to try, and that determines the probability that they will fail. In this paper we look at these probabilistic tests. They come with guarantees: if you give them random numbers of some size, their probability of error will be small. There are proofs of this kind, showing that on random inputs they give you the right answer 99.999…% of the time. However, all these proofs only argue about what happens under ideal conditions, meaning the tested numbers are chosen fully at random and nobody is trying to fool the test.
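
The interview doesn't name the specific tests, but the best-known probabilistic test of this kind is Miller-Rabin; the sketch below is a standard textbook version, not the paper's subject matter. Each random round either exposes the number as composite or lets it pass, and every extra round shrinks the error probability (on a composite input, each round errs with probability at most 1/4).

```python
import random

def miller_rabin(n: int, rounds: int = 40) -> bool:
    """Probabilistic primality test: True means 'n seems prime'."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7):           # handle small primes and their multiples
        if n % p == 0:
            return n == p
    d, s = n - 1, 0                  # write n - 1 as 2^s * d with d odd
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)   # a fresh random base each round
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False             # base a proves that n is composite
    return True                      # survived every round: probably prime
```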

However, there is the scenario that the paper follows and builds on: the adversarial scenario, where you try to fool the test. You know how the test is implemented and what algorithm it uses, but you may or may not know the random choices it makes. You can then try to fool it by constructing your inputs in such a way that increases the probability of error. That is what the researchers before us did: they showed that for some of these probabilistic tests, if the random choices are picked from a small, fixed set, it is practically feasible to compute huge numbers that are actually composite but that the tests declare prime. So it is possible to fool these primality tests. Those researchers showed it for cryptographic libraries (software), whereas we tried to look at hardware exploitation. We did similar things, but we looked at smartcards. Cryptography implemented in hardware is often very different from cryptography implemented in software.
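
A toy illustration of the adversarial effect, using the simpler Fermat test rather than the tests actually studied in the paper: 341 = 11 × 31 is composite, yet it passes the base-2 Fermat check. A test whose random choices an attacker can predict can likewise be handed numbers crafted to slip through.

```python
n = 341                  # composite: 341 = 11 * 31
print(pow(2, n - 1, n))  # prints 1, so a base-2 Fermat test says "probably prime"
```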

What is the difference between hardware and software?

In this concrete case, the authors of the previous work on software were analyzing open-source libraries, which means they had access to the source code, so they could see the algorithm, the set it selected from and the amount of randomness used. Apart from the actual random choices, they knew what the algorithm was doing and what its overall structure was. In the smartcard or hardware case, a researcher like us doesn't have access to any of that; you only have the smartcard, and you don't know the algorithm. So it is a very different research model. Even if we had access to some code or to the specification of the hardware, software and hardware would still differ, because the way you implement cryptography is different in these two worlds. In hardware you are designing circuits and doing very low-level work; in software you are programming in some language. You use different tools, techniques and languages. The analysis is also different; the way you look for issues is different.

Are the companies and organizations that use smartcards in their systems the ones who run the primality tests, or are these tests done by researchers or by specific organizations?

Primality tests are used in cryptography quite a lot. They happen, for example, when you are generating a cryptographic key. Let's say I am getting a new electronic ID. Somewhere out there is a company or a government office where these IDs are initialized: they stamp my name on it and so on. At that point the ID generates a key, which will be my ID key. At that moment the smartcard is generating random numbers and running a primality test. Sometimes a primality test might also run during normal operation. Say I am at the airport with my electronic passport. I put it on the scanner, and the passport runs a cryptographic protocol with the reader to assure the reader that I am who I say I am; the reader also needs to assure my passport that it is a valid reader and not just some random person's device. There, too, there might be a primality test, because cryptography uses primes and needs them quite often. Essentially, primality tests run during the lifetime of the smartcard whenever it does some cryptographic operation that needs primes.
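
As a sketch of where this shows up in code: generating an RSA-style key means drawing random candidates of the right size and testing each one until a prime turns up. This reuses the miller_rabin sketch from above; the 512-bit size is only illustrative.

```python
import random

def random_prime(bits: int) -> int:
    """Keep drawing random odd numbers of the given size until one tests prime."""
    while True:
        # Set the top bit (right size) and the bottom bit (odd). Real
        # implementations use a cryptographically secure RNG, not `random`.
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if miller_rabin(candidate):  # the probabilistic test sketched earlier
            return candidate

p = random_prime(512)
q = random_prime(512)
n = p * q  # an RSA modulus: its security rests on p and q being prime
```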

I have one last question, which is not really related to your research topic: what do you think about ChatGPT and the use of ChatGPT in the academic world?

It is very interesting, because when we are teaching, we see that students use it for homework. For example, we give them a short questionnaire at the start of one seminar. It is worth a small number of points, but they do it, and it's essentially an open-book questionnaire; we found out that it's also an open-chat questionnaire. Some of the students tried to use both to solve it. Interestingly, some of the students are good at it, meaning they got a full score, whereas others got 0 points. So the way you use these tools matters quite a lot. I wonder whether it will become a skill you can be trained in, because apparently some people are good at this and some are not.

We also see it in paper writing. Publishers are already putting out statements about whether you can or can't use these tools and how you should do so. You should acknowledge that you used them, for example by stating that the paper was edited using some tool. It is interesting, because it raises the question: if I use an editing service to fix my grammar, does that count too? There might be some machine learning there as well. So I don't even know what to think, but I don't believe it is something that can necessarily be stopped.

Can ChatGPT break some cryptography or do some major operations?

I think it's not there. It can help you write code, as you said, and it's probably very good at that. I've seen some examples; I don't use it personally, but I've seen that you can do it quite well. However, I think cryptography is so far safe from AI.

The bigger threat to cryptography, to be fully honest, is human error. From a theoretical perspective, cryptography is very strong. The schemes people designed 30 years ago are still not broken theoretically. What usually goes wrong is that somebody implements them somewhere and makes a mistake, and then things get broken, or they get used wrongly. So using cryptography properly is also very hard. We have a whole course here to teach bachelor students how to use cryptography, and it's very tricky. We also have several seminars on it, and we barely scratch the surface. One of our main takeaways is: this is really hard, so you shouldn't build things yourself; just use the things other people have built that we know are good.

-----------------------------------------------------------------------------------

Ján Jančár’s Profile

Ján Jančár is a research specialist and a PhD student at Masaryk University in the Department of Computer Systems and Communications. He graduated from the same university, and he has also studied and done research at Radboud University in the Netherlands and at the George Washington University in the United States. Jančár has been teaching at Masaryk University since 2020. His research focuses mainly on the use of cryptography in the field of cybersecurity.

Author: Selin Zobu

“The project EDUC-SHARE has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 101017526.”