How to use attribute-based encryption to sell digital products securely.

Our product

Creating artificial intelligence, and training machine learning algorithms in particular, is a hard task. This is not only because the algorithms are hard to understand: training neural networks is also the most time-consuming part of developing smart software. For example, if you want to build an app that detects different objects in the live stream of a smartphone camera, you have to train the neural network so that it can distinguish between those objects. To do that, you need a dataset containing all the different objects, you have to label them, and you have to feed them to the machine learning model.

This is a very time-consuming task. Imagine you want to detect cats and dogs. To train the model, you have to label about 20,000 images to reach a reasonable accuracy. In other words, you can build a neural network within days, but creating the training data takes weeks to months.

And here comes Labler. Together with four other interested people, I decided to tackle that problem. If you have ever had to create a large dataset, you would probably be very interested in existing datasets that fit your needs. The same goes for buying software: in general, you don't want to "waste" your time building the software you need on your own. You just buy it. This analogy led us to think about a platform that provides datasets for data scientists. And not only that: we thought about a platform on which different individuals can sell the datasets they have made, a multi-vendor system. This platform helps creative people build very interesting products, and it eliminates the bottlenecks caused by time-consuming tasks like labeling image datasets.

The architecture

Now, let's talk about our product in depth. Because this article is about creating a secure mechanism for sharing digital products among customers who want to buy datasets to build their own applications, we have to look at a system that actually sells those kinds of products. As an example, I present the architecture of Labler.

Security analysis

One main problem is the communication between the clients and the data storage. To view the records they have bought, clients need access to specific data stored in the cloud. Moreover, a record can be bought by many clients, not just one. It is therefore important to have a mechanism that checks whether a client is actually allowed to access the data and that prevents unauthorized access. For example, if a customer wants to buy a dataset containing images labeled with different objects, they have to put the product in their shopping cart and complete the transaction. Now the customer should be able to view or download the dataset. But what stops them from viewing or downloading other datasets that they have not bought?

Basics

Access-policies

In general, policies that control who has access to data are represented in one of two formats. First, policies can be expressed as boolean functions. Second, policies can be described using linear secret sharing schemes (LSSS). This section provides a brief introduction to both, based on [3].
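
To make the two formats more concrete, here is a minimal sketch in Python. The tuple encoding of the boolean formula, the function names, and the choice of prime are assumptions made for illustration; they are not taken from [3]. The first part checks whether a set of attributes satisfies a boolean policy. The second part shows an LSSS for the policy "A and B": the secret is split into one share per attribute, and only both shares together reconstruct it.

```python
import secrets

def satisfies(policy, attributes):
    """Return True if the attribute set satisfies the boolean policy.

    A policy is either an attribute name (str) or a tuple
    ("and" | "or", child, child, ...). This encoding is an assumption.
    """
    if isinstance(policy, str):
        return policy in attributes
    op, *children = policy
    results = (satisfies(child, attributes) for child in children)
    return all(results) if op == "and" else any(results)

# LSSS for the policy "A and B" over the integers modulo a prime P.
P = 2**127 - 1            # a Mersenne prime, chosen only for illustration
M = [(1, 1), (0, -1)]     # share-generating matrix, one row per attribute
ROW_LABELS = ["A", "B"]   # row 0 belongs to attribute A, row 1 to B

def share(secret):
    """Compute the shares M * v for v = (secret, r) with random r."""
    r = secrets.randbelow(P)
    v = (secret, r)
    return {
        label: (row[0] * v[0] + row[1] * v[1]) % P
        for label, row in zip(ROW_LABELS, M)
    }

def reconstruct(shares):
    """Combine both shares with coefficients (1, 1).

    1 * (1, 1) + 1 * (0, -1) = (1, 0), so the random part cancels
    and only the secret remains.
    """
    return (shares["A"] + shares["B"]) % P

policy = ("and", "A", ("or", "B", "C"))
print(satisfies(policy, {"A", "C"}))   # True
print(satisfies(policy, {"B", "C"}))   # False
print(reconstruct(share(1234)))        # 1234
```

A single share on its own is a uniformly random value masked by r, which is why an attribute set containing only A or only B learns nothing about the secret.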

Bilinear maps

Attribute-Based Encryption

In contrast to traditional secret-key encryption, attribute-based encryption produces ciphertexts for multiple users instead of just one. Imagine someone wants to upload files to a cloud. To grant access to multiple users with (symmetric) secret-key encryption, every party needs the secret key in order to decrypt the files stored in the cloud. More precisely, every time someone wants to decrypt those files, there must be some mechanism that checks the authenticity of the user. In addition, the keys must be shared using a key exchange scheme. Attribute-based encryption (ABE) combines all of that in a single protocol. Encrypted data can be decrypted based on the user's properties or on access policies, both of which are assigned by an attribute authority. In general, ABE schemes can be divided into two categories.

Key-Policy Attribute-Based Encryption (KP-ABE)

KP-ABE schemes embed access policies in the private keys and compute ciphertexts based on chosen attributes. As an example, we define the access policy A ∧ B, generate a private key from it, and send that key to a user. In addition, we create a ciphertext using the attribute set { A }. If the user tries to decrypt this ciphertext, decryption fails, because the attribute set { A } does not satisfy the policy A ∧ B. But if we encrypt a message using { A, B }, the user can decrypt it, because the attributes used for encryption satisfy the access policy. In summary, attributes are used for encryption and access policies are used for the private keys. A KP-ABE scheme consists of four algorithms, which are defined as follows [4].

  1. Setup(n) → (PK, MSK): This algorithm takes as input the security parameter n and returns a set of public parameters PK and a master secret key MSK.
  2. KeyGen(𝔸, MSK) → SK: The KeyGen algorithm takes as input an access structure 𝔸 and the master secret key MSK. A secret key SK will be calculated and returned.
  3. Enc(m, A, PK) → c: The encryption algorithm takes as input a message m, a non-empty set of attributes A and public parameters PK. It outputs the encrypted ciphertext c based on m.
  4. Dec(c, SK) → {m, ⊥}: The decryption algorithm takes as input the ciphertext c and the private key SK. It outputs the decrypted message m or returns an error ⊥, if the decryption failed.

Ciphertext-Policy Attribute-Based Encryption (CP-ABE)

CP-ABE schemes differ only slightly from KP-ABE schemes. The main difference is that KP-ABE schemes embed access policies in the private keys, whereas CP-ABE schemes use attributes to generate the private keys for users and attach the access policy to the ciphertext. For example, we create a private key for user X using the attribute set { A, B }. In addition, we create another private key for user Y using the attribute set { B }. Now we encrypt a message under the access policy A ∧ B. In this case, user X can decrypt the ciphertext, because their private key was generated from the attribute set { A, B }, which satisfies the policy. User Y is not able to decrypt the message, because { B } does not satisfy it. A CP-ABE scheme consists of four algorithms, which are defined as follows [1, 3].

  1. Setup(n) → (PK, MSK): This algorithm takes as input the security parameter n and returns a set of public parameters PK and a master secret key MSK.
  2. KeyGen(A, MSK) → SK: The KeyGen algorithm takes as input a set of attributes A and the master secret key MSK. A secret key SK will be calculated and returned.
  3. Enc(m, 𝔸, PK) → c: The encryption algorithm takes as input a message m, an access structure 𝔸 and public parameters PK. It outputs the encrypted ciphertext c based on m.
  4. Dec(c, SK) → {m, ⊥}: The decryption algorithm takes as input the ciphertext c and the private key SK. It outputs the decrypted message m or returns an error ⊥, if the decryption failed.

Single-/Multi-Authority

Private keys can be generated from policies or attributes in two different ways. On the one hand, a central authority can generate the keys for all users. Such a system is called single-authority or central-authority. This one instance generates all of the keys involved in the scheme and must therefore be trusted. That can be a risk, because the central authority has access to all private keys. For example, it can decrypt every ciphertext in the running system using the generated private keys, which are actually intended for the users [3]. On the other hand, key generation can be distributed across several authorities, each responsible for its own set of attributes. Such a system is called multi-authority.

Implementation

In this section, attribute-based encryption is applied to our product. The previous sections only presented the interfaces, which now need to be defined more precisely. In particular, we first need to decide which kind of ABE scheme (KP-ABE or CP-ABE) we want to use. Then we need to implement the presented algorithms for the chosen scheme. We also need to decide whether to involve a single authority or multiple authorities to establish trust among users.
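
As a rough illustration of how these decisions could play out for a dataset marketplace, the following Python sketch assumes a CP-ABE design in which every dataset is encrypted under a policy and every user key carries attributes. The attribute names (`purchased:<id>`, `premium`), the helper functions, and the tuple encoding of policies are all hypothetical and are not part of Labler's actual implementation. The sketch only shows which policies and attribute sets one might feed into Enc and KeyGen.

```python
def satisfies(policy, attributes):
    """Policy is an attribute name or ("and" | "or", child, ...)."""
    if isinstance(policy, str):
        return policy in attributes
    op, *children = policy
    results = (satisfies(child, attributes) for child in children)
    return all(results) if op == "and" else any(results)

def user_attributes(purchased_ids, is_premium=False):
    """Hypothetical attribute set that KeyGen would receive for a user."""
    attrs = {f"purchased:{dataset_id}" for dataset_id in purchased_ids}
    if is_premium:
        attrs.add("premium")
    return attrs

def dataset_policy(dataset_id):
    """Hypothetical policy that Enc would receive for a dataset:
    either the user bought this dataset or the user is premium."""
    return ("or", f"purchased:{dataset_id}", "premium")

buyer = user_attributes(["cats-vs-dogs"])
premium = user_attributes([], is_premium=True)

print(satisfies(dataset_policy("cats-vs-dogs"), buyer))     # True
print(satisfies(dataset_policy("street-signs"), buyer))     # False
print(satisfies(dataset_policy("street-signs"), premium))   # True
```

With this layout, a new purchase means issuing the user a key with one more attribute, while the stored ciphertexts stay unchanged.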

Conclusion

In this article, I discussed the problem behind creating intelligent software. As a team of five people, we target the time-consuming task of generating datasets for machine learning algorithms, and together we created the online marketplace Labler. One can use this platform to search for datasets that fit the needs of individual machine learning projects. I then presented the architecture of the platform in order to analyze possible security threats. Because Labler delivers purchased datasets through its own cloud storage, there needs to be a mechanism for sharing datasets based on the users' properties. For example, it should only be possible to access data if the user has actually paid for it. In another possible scenario, a premium user should always be able to access and decrypt data from the cloud. I therefore presented a solution using attribute-based encryption. In particular, I discussed the implementation of ciphertext-policy attribute-based encryption, which uses the attributes of users to decrypt ciphertexts in the system, and the difference between multi- and single-authority schemes.

References

[1] B. Waters. Ciphertext-policy attribute-based encryption: An expressive, efficient, and provably secure realization. Cryptology ePrint Archive, Report 2008/290, 2008. https://eprint.iacr.org/2008/290
