How to use attribute-based encryption to sell digital products securely.
Imagine you run an online marketplace on which people can buy and sell digital products, and a customer wants to buy a large amount of data through your platform. In e-commerce, it matters not only how this data is stored but also how it is shared securely. Customers often receive download links to the data they bought, and the platform must make sure that only those buyers can download it. In other words, the files on the platform must be protected against unauthorized access. This article addresses this problem and presents a mechanism for sharing digital products securely.
Our product
Building artificial intelligence, and training machine learning algorithms in particular, is hard. Not only are the algorithms difficult to understand, but training neural networks is also the most time-consuming part of developing smart software. For example, if you want to build an app that detects different objects in the live stream of a smartphone camera, you have to train a neural network so that it can distinguish between those objects. To do so, you need a dataset containing all of these objects, you have to label them, and then you feed them to the machine learning model as input. This takes a lot of time. Imagine you want to detect cats and dogs: to reach a reasonable accuracy, you have to label about 20,000 images. In other words, you can build a neural network within days, but creating the training data takes weeks to months.

This is where Labler comes in. Together with four other interested people, we decided to tackle this problem. If you have ever had to create a large dataset, you would probably have been very interested in existing datasets that fit your needs. The same goes for software: in general, you don’t want to “waste” your time writing the software you need on your own; you just buy it. This analogy led us to the idea of a platform that provides datasets for data scientists. More than that, we designed a platform on which datasets created by different individuals can be sold, a multi-vendor system. It lets creative people build interesting products and removes bottlenecks caused by time-consuming tasks such as labeling image datasets.
The architecture
Now, let’s look at our product in more depth. Because this article is about a secure mechanism for sharing digital products with customers who buy datasets to build their own applications, we need to look at a system that actually sells such products. As an example, I present the architecture of Labler.
The web server is an Apache web server that hosts all the files needed to render the website. Whenever a client visits a web page, it sends requests to the web server and receives the required files. The website itself is built with WordPress as the backend plus several plugins that enable multi-vendor functionality, so that anybody can sell products through the Labler user interface. Besides the WordPress installation, there is the data storage, which contains all datasets uploaded by vendors. These datasets are meant to be sold, and only the web server should be able to store and modify data in the storage. In other words, if a vendor wants to publish a new dataset, the communication has to go through the web server, and the web server decides what to do. Clients who buy datasets, on the other hand, should be able to download or at least view the data they bought. Therefore, clients need direct access to the files and must communicate with the data storage.
Security analysis
One main problem is the communication between clients and the data storage. To view the records they bought, clients need access to specific data stored in the cloud. Moreover, a record can be bought not just by one client but by many. So there must be a mechanism that checks whether a client is actually allowed to access the data and that prevents unauthorized access. For example, if a customer wants to buy a dataset of images labeled with different objects, he puts the product in his shopping cart and completes the transaction. Now the customer should be able to view or download the dataset. But what stops him from viewing or downloading other datasets that he has not bought?
One possible solution is to encrypt records using traditional symmetric or asymmetric encryption schemes. With symmetric encryption, every record is encrypted under a secret key and can only be decrypted with the same key. Therefore, every customer who has bought a record must receive the key for that particular record.
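A minimal Python sketch of the bookkeeping this approach implies, assuming one random 256-bit key per record. The names `record_keys`, `delivered`, `add_record` and `sell` are hypothetical, and the actual encryption of the files is left out; the point is only which keys have to be handed out.

```python
import secrets

# Hypothetical key store: one random 256-bit secret key per record (dataset).
record_keys: dict[str, bytes] = {}
# Hypothetical ledger of which record keys were delivered to which customer.
delivered: dict[str, set[str]] = {}


def add_record(record_id: str) -> None:
    """Generate the secret key under which the record would be encrypted."""
    record_keys[record_id] = secrets.token_bytes(32)


def sell(record_id: str, customer: str) -> bytes:
    """Hand the record's key to the buyer and remember that we did so."""
    delivered.setdefault(customer, set()).add(record_id)
    return record_keys[record_id]


add_record("dataset-42")
key_alice = sell("dataset-42", "alice")
key_bob = sell("dataset-42", "bob")
print(key_alice == key_bob)  # True: both buyers hold the very same key
```

Because every buyer of a record receives the same key, a single leaked key exposes the record to anyone, and revoking one buyer means re-encrypting the record and distributing a new key to all remaining buyers.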
Another method is to generate a local copy of each bought dataset and encrypt it with the buyer's public key, so that only the buyer's private key can decrypt it. This is clearly inefficient in terms of storage, because every sale creates another copy of the dataset. In addition, there has to be a mechanism that maps customers, keys, products, and further properties, for example whether a user has a premium status, in which case the user is allowed to access all datasets.
Because access to the data depends on the users' attributes, we can consider attribute-based encryption (ABE), a relatively new approach that encrypts data and grants access to it based on specific properties of a user. It can be seen as encryption and access control combined. In the following, I give a brief introduction to the theory of attribute-based encryption and show how it can improve the security of the Labler architecture. Of course, it can be applied to any other system with the same requirements.
Basics
Access-policies
In general, policies that control who has access to data are represented in one of two formats. First, policies can be expressed as boolean functions. Second, policies can be described using linear secret sharing schemes (LSSS). This section gives a brief introduction to both, based on [3].
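Definition 1 (monotone access structure): Let {P1, …, Pn} be a set of parties. A collection 𝔸 ⊆ 2^{P1, …, Pn} is monotone if for all B, C: if B ∈ 𝔸 and B ⊆ C, then C ∈ 𝔸. An access structure is a collection 𝔸 of non-empty subsets of {P1, …, Pn}; the sets in 𝔸 are called authorized sets, all other sets unauthorized sets.
Definition 1 can be interpreted as follows: every superset of an element B ∈ 𝔸 must also be in 𝔸. The following example should make this clearer.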
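Example: Let 𝕌 = { 1, 2, 3, 4 } be a universe and 𝔸 ⊆ 2^𝕌 an access structure. The access structure 𝔸 = { {1, 2}, {3, 4} } is not monotone, because {1, 2} ⊆ {1, 2, 3}, but {1, 2, 3} is not in 𝔸. The access structure 𝔸 = { {3, 4}, {1, 3, 4}, {1, 2, 3, 4} } is not monotone either: {3, 4} ⊆ {2, 3, 4}, but {2, 3, 4} is missing. It is not enough that some superset of each element is in 𝔸; all of them must be. Adding {2, 3, 4} yields the monotone access structure 𝔸 = { {3, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4} }, which contains every superset of {3, 4} within 𝕌.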
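The monotonicity condition can be checked mechanically by enumerating the power set 2^𝕌. The following Python sketch (function names are hypothetical) tests every superset of every authorized set and reproduces the three cases of the example.

```python
from itertools import combinations


def all_subsets(universe):
    """Every subset of the universe, i.e. the power set 2^U, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_monotone(access_structure, universe):
    """True iff every superset (within the universe) of an authorized set is authorized."""
    authorized = {frozenset(s) for s in access_structure}
    return all(
        superset in authorized
        for b in authorized
        for superset in all_subsets(universe)
        if b <= superset
    )


U = {1, 2, 3, 4}
print(is_monotone([{1, 2}, {3, 4}], U))                              # False
print(is_monotone([{3, 4}, {1, 3, 4}, {1, 2, 3, 4}], U))             # False
print(is_monotone([{3, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}], U))  # True
```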
Example: Let 𝕄 be the set of all matrices with l rows and n columns, and let every entry of a matrix M ∈ 𝕄 be in {0, 1}. Let ρ : ℕ × 𝕄 → ℤⁿ denote the function that returns the i-th row of a matrix in 𝕄, where i ∈ ℕ and 0 ≤ i < l. In addition, every row of M ∈ 𝕄 is labeled by a party Pi, so that ρ(i, M) returns the share of the party with index i.
Let’s look at matrix M and the combination { P2, P4 }. Using the function ρ, we obtain the following rows.
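To check whether the combination of both parties is authorized, we compute the sum of their row vectors. If the sum equals (1, 0, 0, 0), the combination is valid. In general, a set of parties is authorized if there exist constants ωi such that Σ ωi · Mi = (1, 0, 0, 0), where Mi ranges over the rows of the parties in the set; summing the rows is the special case in which every ωi = 1.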
This example shows that the combination { P2, P4 } is valid, i.e., authorized.
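Finding the constants ωi is a linear system, so it can be solved with Gaussian elimination. The Python sketch below does this over the rationals (real schemes work modulo a prime p). Because the matrix M of the example is not reproduced here, it uses a hypothetical matrix for the policy (P1 ∧ P3) ∨ (P2 ∧ P4) whose entries are in {-1, 0, 1}; with it, { P2, P4 } is authorized by simply summing rows, just like in the example.

```python
from fractions import Fraction


def reconstruction_coefficients(rows, target):
    """Find w with sum_i w[i] * rows[i] == target, or return None if none exists.

    Gaussian elimination over the rationals on the system M_S^T w = target,
    with one equation per matrix column and one unknown per selected row.
    """
    k, n = len(rows), len(target)
    aug = [[Fraction(rows[i][j]) for i in range(k)] + [Fraction(target[j])] for j in range(n)]
    pivots, r = [], 0
    for c in range(k):
        p = next((i for i in range(r, n) if aug[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: w[c] is free and stays 0
        aug[r], aug[p] = aug[p], aug[r]
        pv = aug[r][c]
        aug[r] = [x / pv for x in aug[r]]
        for i in range(n):
            if i != r and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
        if r == n:
            break
    # A remaining all-zero row with a non-zero right-hand side means no solution.
    if any(aug[i][k] != 0 for i in range(r, n)):
        return None
    w = [Fraction(0)] * k
    for i, c in enumerate(pivots):
        w[c] = aug[i][k]
    return w


# Hypothetical LSSS matrix for (P1 AND P3) OR (P2 AND P4); one row per party.
M = {"P1": [1, 1, 0], "P2": [1, 0, 1], "P3": [0, -1, 0], "P4": [0, 0, -1]}
TARGET = [1, 0, 0]


def is_authorized(parties):
    return reconstruction_coefficients([M[p] for p in parties], TARGET) is not None


print(reconstruction_coefficients([M["P2"], M["P4"]], TARGET))  # [Fraction(1, 1), Fraction(1, 1)]
print(is_authorized(["P1", "P2"]))  # False
```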
Bilinear maps
Example: Let (G, +) denote an additive group and x, y, z ∈ G. Let (H, *) denote a multiplicative group and e : G × G → H a bilinear map. Then the following applies: e(x + y, z) = e(x, z) * e(y, z) and e(x, y + z) = e(x, y) * e(x, z).
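A tiny bilinear map can be built by hand, which makes the rule above concrete. In the Python sketch below, the parameters are assumptions chosen for illustration: G = (ℤ₁₁, +) and H is the subgroup of order 11 in ℤ₂₃* generated by 2, with e(x, y) = 2^(x·y) mod 23. This is bilinear and non-degenerate but offers no security, since discrete logarithms in (ℤ₁₁, +) are trivial; real schemes use pairings on elliptic curves.

```python
# Toy parameters (illustrative only, far too small and structurally insecure):
# G = (Z_Q, +) and H = the order-Q subgroup of Z_P^* generated by GEN.
Q, P, GEN = 11, 23, 2


def e(x: int, y: int) -> int:
    """Toy bilinear map e(x, y) = GEN^(x*y) mod P from Z_Q x Z_Q into H."""
    return pow(GEN, (x * y) % Q, P)


def is_bilinear() -> bool:
    """Check both bilinearity rules exhaustively for all x, y, z in Z_Q."""
    return all(
        e((x + y) % Q, z) == e(x, z) * e(y, z) % P
        and e(x, (y + z) % Q) == e(x, y) * e(x, z) % P
        for x in range(Q) for y in range(Q) for z in range(Q)
    )


print(is_bilinear())  # True
print(e(1, 1) != 1)   # True: the map is non-degenerate
```

A useful consequence of bilinearity is e(a·x, b·y) = e(x, y)^(ab); for instance e(3, 4) equals e(1, 1)^12 in this toy group.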
Attribute-Based Encryption
In contrast to traditional secret-key encryption, attribute-based encryption produces ciphertexts that can be decrypted by multiple users instead of just one. Imagine someone wants to upload files to a cloud. To grant multiple users access with (symmetric) secret-key encryption, every party needs the secret key in order to decrypt the files stored in the cloud. More precisely, every time someone wants to decrypt those files, some mechanism must check that the user is authorized, and the keys must be distributed using key exchange schemes. Attribute-based encryption (ABE) combines all of this in a single protocol: encrypted data can be decrypted depending on the user's attributes or access policies, which are assigned by an attribute authority. In general, ABE schemes fall into two categories.
Key-Policy Attribute-Based Encryption (KP-ABE)
KP-ABE schemes embed access policies in private keys and compute ciphertexts under a chosen set of attributes. As an example, we issue a user a private key for the access policy A ∧ B. In addition, we encrypt a message under the attribute set { A }. If the user tries to decrypt this ciphertext, decryption fails, because { A } does not satisfy A ∧ B. If we instead encrypt a message under { A, B }, the user can decrypt it, because the attributes used for encryption satisfy the access policy. In short, attributes are used for encryption and access policies are embedded in private keys. A KP-ABE scheme consists of four algorithms, defined as follows [4].
- Setup(n) → (PK, MSK): This algorithm takes as input the security parameter n and returns a set of public parameters PK and a master secret key MSK.
- KeyGen(𝔸, MSK) → SK: The KeyGen algorithm takes as input an access structure 𝔸 and the master secret key MSK. A secret key SK will be calculated and returned.
- Enc(m, A, PK) → c: The encryption algorithm takes as input a message m, a non-empty set of attributes A and public parameters PK. It outputs the encrypted ciphertext c based on m.
- Dec(c, SK) → {m, ⊥}: The decryption algorithm takes as input the ciphertext c and the private key SK. It outputs the decrypted message m, or ⊥ if decryption fails.
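The following Python sketch models only *when* KP-ABE decryption succeeds: the policy lives in the key, the attributes live in the ciphertext, and Dec returns m exactly if the attributes satisfy the policy. It is a non-cryptographic mock under that assumption: the payload is stored in the clear, Setup and MSK are omitted, and all names are hypothetical. Policies are written as nested tuples such as `("and", "A", "B")`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple, Union

# A policy is an attribute name or a tuple ("and" | "or", child, child, ...).
Policy = Union[str, Tuple]


def satisfies(policy: Policy, attributes: FrozenSet[str]) -> bool:
    """Evaluate a boolean access policy against a set of attributes."""
    if isinstance(policy, str):
        return policy in attributes
    op, *children = policy
    results = [satisfies(child, attributes) for child in children]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


@dataclass(frozen=True)
class SecretKey:
    policy: Policy  # KP-ABE: the access policy is embedded in the key


@dataclass(frozen=True)
class Ciphertext:
    attributes: FrozenSet[str]  # KP-ABE: the attributes are attached to the ciphertext
    payload: bytes              # NOT encrypted: this mock only models when Dec succeeds


def keygen(policy: Policy) -> SecretKey:
    return SecretKey(policy)


def enc(m: bytes, attributes) -> Ciphertext:
    if not attributes:
        raise ValueError("the attribute set must be non-empty")
    return Ciphertext(frozenset(attributes), m)


def dec(c: Ciphertext, sk: SecretKey) -> Optional[bytes]:
    """Return m if the ciphertext's attributes satisfy the key's policy, else None (⊥)."""
    return c.payload if satisfies(sk.policy, c.attributes) else None


sk = keygen(("and", "A", "B"))
print(dec(enc(b"dataset", {"A"}), sk))       # None
print(dec(enc(b"dataset", {"A", "B"}), sk))  # b'dataset'
```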
Ciphertext-Policy Attribute-Based Encryption (CP-ABE)
CP-ABE schemes differ only slightly from KP-ABE schemes. The main difference is that KP-ABE schemes embed access policies in keys, whereas CP-ABE schemes use attributes to generate the users' private keys and embed the access policy in the ciphertext. For example, we create a private key for user X from the attribute set { A, B } and another private key for user Y from the attribute set { B }. Now we encrypt a message under the access policy A ∧ B. In this case, user X can decrypt the ciphertext, because his private key was generated from the attribute set { A, B }, which satisfies A ∧ B. User Y cannot decrypt the message, because { B } does not satisfy the policy. A CP-ABE scheme consists of four algorithms, defined as follows [1, 3].
- Setup(n) → (PK, MSK): This algorithm takes as input the security parameter n and returns a set of public parameters PK and a master secret key MSK.
- KeyGen(A, MSK) → SK: The KeyGen algorithm takes as input a set of attributes A and the master secret key MSK. A secret key SK will be calculated and returned.
- Enc(m, 𝔸, PK) → c: The encryption algorithm takes as input a message m, an access structure 𝔸 and public parameters PK. It outputs the encrypted ciphertext c based on m.
- Dec(c, SK) → {m, ⊥}: The decryption algorithm takes as input the ciphertext c and the private key SK. It outputs the decrypted message m, or ⊥ if decryption fails.
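For comparison, the same kind of non-cryptographic mock for CP-ABE simply swaps the roles: the attribute set lives in the key and the policy travels with the ciphertext. Again this is only a sketch of the decryption condition under stated assumptions (plaintext payload, no Setup or MSK, hypothetical names); it replays the example with users X and Y.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple, Union

# A policy is an attribute name or a tuple ("and" | "or", child, child, ...).
Policy = Union[str, Tuple]


def satisfies(policy: Policy, attributes: FrozenSet[str]) -> bool:
    """Evaluate a boolean access policy against a set of attributes."""
    if isinstance(policy, str):
        return policy in attributes
    op, *children = policy
    results = [satisfies(child, attributes) for child in children]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


@dataclass(frozen=True)
class SecretKey:
    attributes: FrozenSet[str]  # CP-ABE: the user's attributes are embedded in the key


@dataclass(frozen=True)
class Ciphertext:
    policy: Policy  # CP-ABE: the access policy travels with the ciphertext
    payload: bytes  # NOT encrypted: this mock only models when Dec succeeds


def keygen(attributes) -> SecretKey:
    return SecretKey(frozenset(attributes))


def enc(m: bytes, policy: Policy) -> Ciphertext:
    return Ciphertext(policy, m)


def dec(c: Ciphertext, sk: SecretKey) -> Optional[bytes]:
    """Return m if the key's attributes satisfy the ciphertext's policy, else None (⊥)."""
    return c.payload if satisfies(c.policy, sk.attributes) else None


sk_x = keygen({"A", "B"})  # user X
sk_y = keygen({"B"})       # user Y
c = enc(b"dataset", ("and", "A", "B"))
print(dec(c, sk_x))  # b'dataset'
print(dec(c, sk_y))  # None
```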
Single-/Multi-Authority
Private keys can be generated from policies or attributes in two different ways. On the one hand, a central authority can generate the keys for all users. Such a system is called single-authority or central-authority. This authority generates all keys in the scheme and must be trusted. This can be a risk, because the central authority knows all private keys. For example, it can decrypt every ciphertext in the running system using the generated private keys that are actually intended for the users [3].
To reduce the risk of trusting a single authority, multi-authority schemes have been proposed. Instead of a single central authority, multiple instances jointly generate the private keys. Therefore, no single authority can reconstruct any secret or decrypt any ciphertext by itself, because it does not hold all parts of the actual key [2]. In addition, this approach relieves everyone involved, because key management is distributed across multiple instances [3].
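The principle that no single authority holds the complete key can be illustrated with additive secret sharing. The Python sketch below splits a hypothetical master secret into one share per authority modulo the prime 2^127 - 1: any selection of fewer than all shares is uniformly random and reveals nothing, while all shares together reconstruct the secret. This only illustrates the idea; it is not the multi-authority construction of [2] or [3].

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, used here as a toy modulus


def split(secret: int, n_authorities: int) -> list[int]:
    """Split secret into n additive shares mod PRIME, one per authority."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_authorities - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def combine(shares: list[int]) -> int:
    """Only the sum of all shares reveals the secret."""
    return sum(shares) % PRIME


master_secret = secrets.randbelow(PRIME)
shares = split(master_secret, 3)
print(combine(shares) == master_secret)  # True: all three authorities together
```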
Implementation
In this section, attribute-based encryption is applied to our product. The previous sections only presented the interfaces, which now need to be defined more precisely. In particular, we first need to decide which kind of ABE scheme (KP-ABE or CP-ABE) we want to use. Then we need to implement the presented algorithms for the chosen scheme. Finally, we need to decide whether a single authority or multiple authorities should generate the keys, which determines whom the users have to trust.
We start with the decision between KP-ABE and CP-ABE. As mentioned above, KP-ABE uses access policies to generate private keys and attributes to encrypt data. In other words, users receive policies, which determine whether they can decrypt a given ciphertext. CP-ABE, in contrast, uses attributes to generate the users' private keys and access policies to encrypt data. Here we can assign properties to user accounts and create policies that control which combinations of properties can decrypt a ciphertext. With CP-ABE, we can therefore better control which type of user can access which data, since each ciphertext carries the rule that has to be met for decryption.
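For Labler, the CP-ABE policies could look like the following Python sketch. The attribute names `customer`, `premium` and `purchased:<dataset-id>` and the function `dataset_policy` are hypothetical; the sketch only evaluates a policy against a user's attribute set, which is the condition that CP-ABE decryption enforces cryptographically.

```python
from typing import Tuple, Union

# A policy is an attribute name or a tuple ("and" | "or", child, child, ...).
Policy = Union[str, Tuple]


def satisfies(policy: Policy, attributes: set) -> bool:
    """Evaluate a boolean access policy against a user's attribute set."""
    if isinstance(policy, str):
        return policy in attributes
    op, *children = policy
    results = [satisfies(child, attributes) for child in children]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


def dataset_policy(dataset_id: str) -> Policy:
    """Hypothetical policy attached to a dataset's ciphertext:
    buyers of this dataset or premium users may decrypt it."""
    return ("or", f"purchased:{dataset_id}", "premium")


policy = dataset_policy("dataset-42")
buyer = {"customer", "purchased:dataset-42"}
other_buyer = {"customer", "purchased:dataset-7"}
premium_user = {"customer", "premium"}
print(satisfies(policy, buyer))         # True
print(satisfies(policy, other_buyer))   # False
print(satisfies(policy, premium_user))  # True
```

One consequence of this design: because a purchase adds a new attribute, the authority has to issue the buyer a new private key (KeyGen) that includes `purchased:<dataset-id>` after every sale.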
Now, we need to examine the algorithms and implement the interfaces of CP-ABE. The construction is based on the most efficient construction of [1].
Setup(n):
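A sketch of Setup following the main construction in [1], written multiplicatively as there. The attribute universe U = {1, …, u} is made explicit here as an assumption, because [1] publishes one group element h_x per attribute; the security parameter n determines the size of the prime p.

```latex
\begin{aligned}
&\text{Choose a bilinear group } \mathbb{G} \text{ of prime order } p \text{ (size determined by } n\text{)},
  \text{ a generator } g, \text{ and } e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T.\\
&\text{Choose random } h_1, \dots, h_u \in \mathbb{G} \text{ and random } \alpha, a \in \mathbb{Z}_p.\\
&PK = \big(g,\; e(g,g)^{\alpha},\; g^{a},\; h_1, \dots, h_u\big), \qquad MSK = g^{\alpha}.
\end{aligned}
```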
KeyGen(A, MSK):
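A sketch of KeyGen following [1]. Here MSK = g^α, while g^a and the per-attribute elements h_x are public parameters of that construction, so KeyGen in [1] also reads them from PK.

```latex
\begin{aligned}
&\text{Choose a random } t \in \mathbb{Z}_p.\\
&K = g^{\alpha} g^{a t}, \qquad L = g^{t}, \qquad K_x = h_x^{t} \ \text{ for all } x \in A,\\
&SK = \big(K,\; L,\; \{K_x\}_{x \in A}\big).
\end{aligned}
```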
Enc(m, 𝔸, PK):
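A sketch of Enc following [1]. As there, the access structure 𝔸 is given as an LSSS matrix M with l rows M_1, …, M_l and d columns, and ρ(i) denotes the attribute that labels row i (a slightly different use of ρ than in the LSSS example above). The message m is assumed to be an element of 𝔾_T, and e(g,g)^α, g^a and h_x are the public parameters of [1].

```latex
\begin{aligned}
&\text{Choose a random vector } v = (s, y_2, \dots, y_d) \in \mathbb{Z}_p^{d}
  \text{ and compute the shares } \lambda_i = v \cdot M_i \text{ for } i = 1, \dots, l.\\
&\text{Choose random } r_1, \dots, r_l \in \mathbb{Z}_p.\\
&C = m \cdot e(g,g)^{\alpha s}, \qquad C' = g^{s},\\
&C_i = g^{a \lambda_i}\, h_{\rho(i)}^{-r_i}, \qquad D_i = g^{r_i} \qquad (i = 1, \dots, l),\\
&c = \big((M, \rho),\; C,\; C',\; \{(C_i, D_i)\}_{i=1}^{l}\big).
\end{aligned}
```

In practice, m would typically be a random 𝔾_T element from which a symmetric key for the actual dataset is derived; this hybrid setup is an assumption, not something specified above.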
Dec(c, SK):
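A sketch of Dec following [1], using that construction's key components K = g^α g^{at}, L = g^t, K_x = h_x^t and ciphertext components C = m · e(g,g)^{αs}, C' = g^s, C_i = g^{aλ_i} h_{ρ(i)}^{-r_i}, D_i = g^{r_i}, where the λ_i are shares of s and ρ(i) is the attribute labelling row i. Let I = {i : ρ(i) ∈ A}. If A satisfies the policy, there are constants ω_i with Σ_{i∈I} ω_i M_i = (1, 0, …, 0), the same check as in the LSSS example, and therefore Σ_{i∈I} ω_i λ_i = s. The pairing is assumed to be symmetric, as in [1].

```latex
\begin{aligned}
e(C', K) &= e\big(g^{s},\, g^{\alpha} g^{a t}\big) = e(g,g)^{\alpha s}\, e(g,g)^{a s t},\\
e(C_i, L)\, e\big(D_i, K_{\rho(i)}\big)
  &= e(g,g)^{a t \lambda_i}\, e\big(h_{\rho(i)}, g\big)^{-r_i t}\, e\big(g, h_{\rho(i)}\big)^{r_i t}
   = e(g,g)^{a t \lambda_i},\\
\prod_{i \in I} \Big(e(C_i, L)\, e\big(D_i, K_{\rho(i)}\big)\Big)^{\omega_i}
  &= e(g,g)^{a t \sum_{i \in I} \omega_i \lambda_i} = e(g,g)^{a t s},\\
m &= C \Big/ \frac{e(C', K)}{\prod_{i \in I} \big(e(C_i, L)\, e(D_i, K_{\rho(i)})\big)^{\omega_i}}
   = C \,/\, e(g,g)^{\alpha s}.
\end{aligned}
```

If A does not satisfy the policy, no such constants exist and Dec returns ⊥.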
Now that we have implemented the CP-ABE algorithms, we need to decide whether to use a multi-authority or a single-/central-authority system. Because Labler sells the datasets and customers therefore have to trust the service anyway, a multi-authority scheme is not necessary. Multi-authority schemes are often used in cloud settings in which users want to store and encrypt data such that the service itself cannot read it. To prevent the service from reading the data, multiple authorities are involved in creating the secret keys that users need for decryption. In the case of Labler, the data is meant to be used and is sold by Labler. In particular, Labler must be able to check uploaded datasets and images for copyright infringements. Therefore, a single-authority scheme fits the needs of the system.
Conclusion
In this article, I discussed the problem behind creating intelligent software. As a team of five people, we tackled the time-consuming task of generating datasets for machine learning algorithms and created the online marketplace Labler, on which users can search for datasets that fit the needs of their individual machine learning projects. I then presented the architecture of the platform to analyze possible security threats. Because Labler delivers bought datasets from its own cloud storage, there must be a mechanism for sharing datasets based on the users' properties. For example, it should only be possible to access data if the user actually paid for it. In another scenario, a premium user should always be able to access and decrypt data from the cloud. Therefore, I presented a solution based on attribute-based encryption. In particular, I discussed the implementation of ciphertext-policy attribute-based encryption, in which ciphertexts can be decrypted based on the users' attributes, and the difference between multi- and single-authority schemes.
References
[1] B. Waters. Ciphertext-policy attribute-based encryption: An expressive, efficient, and provably secure realization. Cryptology ePrint Archive, Report 2008/290, 2008. https://eprint.iacr.org/2008/290
[2] S. Kim. Multi-Authority Attribute-Based Encryption from LWE in the OT Model. Cryptology ePrint Archive, Report 2019/280, 2019. https://eprint.iacr.org/2019/280.pdf
[3] S. Belguith, N. Kaaniche, M. Laurent, A. Jemai, and R. Attia. PHOABE: Securely outsourcing multi-authority attribute based encryption with policy hidden for cloud assisted IoT. Computer Networks, Volume 133, 2018. https://doi.org/10.1016/j.comnet.2018.01.036
[4] N. Attrapadung. Fully Secure and Succinct Attribute Based Encryption for Circuits from Multi-linear Maps. Cryptology ePrint Archive, Report 2014/772, 2014. https://eprint.iacr.org/2014/772.pdf