
OpenStack Swift

Storage plays an important role because so many subsystems depend on it. With the variety of distributed systems now on the market, storage has gained even more relevance. These systems replicate data, typically with a minimum replication factor of 3, to make applications more fault tolerant, so the space they consume keeps growing.

Block storage has been the most widely used storage type for all kinds of workloads, because data can be accessed in blocks and that is easy for subsystems to work with. But block storage uses up a lot of space because of the distributed nature of these subsystems. For example, when storing a backup file, the block store replicates each block three times and so uses three times the actual file size. For such use cases object storage is becoming more popular: it stores each file as a whole object together with its metadata rather than as raw blocks. Depending on how the backend protects the data (for example, erasure coding instead of full replicas), this can use far less space than block stores use. However, object stores may not be a good fit for transactional data because of the latency they add on retrieval.

Several storage backends, such as Ceph, provide object storage capabilities.

OpenStack Swift is one of the most widely used object store APIs, and it can be integrated with many of the backend solutions available today. The swift client is a Python-based utility that is part of the OpenStack stack and uses Keystone for authentication.

Accessing the storage is a two-step process:

  • First, swift uses python-keystoneclient to authenticate against Keystone with a user ID, a password and an authentication URL
  • Next, swift uses python-swiftclient to access the backend storage with the storage URL and the authentication token returned by the first step
Keystone is part of OpenStack and provides the authentication and authorization mechanism for all OpenStack services. So the same two-step process applies to every OpenStack service, such as Nova and Cinder.
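
The two steps can be sketched with the swift command line itself. This is only a sketch under assumptions: the function names swift_login and swift_with_token are mine, the endpoint and credentials in the usage comment are placeholders, and your Keystone may need extra settings (for example OS_TENANT_NAME or OS_PROJECT_NAME, or a specific auth version) that are not shown here. It relies on `swift auth`, which prints the storage URL and token as shell export lines.

```shell
# Step 1: authenticate against Keystone. `swift auth` prints shell-friendly
# "export OS_STORAGE_URL=..." and "export OS_AUTH_TOKEN=..." lines.
swift_login() {
    # $1 = authentication URL, $2 = user ID, $3 = password (all placeholders).
    # Assumption: these three are enough; some deployments also need
    # OS_TENANT_NAME or OS_PROJECT_NAME.
    OS_AUTH_URL="$1" OS_USERNAME="$2" OS_PASSWORD="$3" swift auth
}

# Step 2: reuse the storage URL and token from step 1 for any swift command.
swift_with_token() {
    swift --os-auth-token "$OS_AUTH_TOKEN" \
          --os-storage-url "$OS_STORAGE_URL" "$@"
}

# Usage (hypothetical endpoint and credentials):
#   eval "$(swift_login https://keystone.example.com:5000/v2.0 myuser mypassword)"
#   swift_with_token list
```

Keeping the two steps in separate functions mirrors the flow described above: once the token and storage URL are in the environment, Keystone is not contacted again until the token expires.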

I am going to outline the steps to set up python-swiftclient on Linux machines. There are two ways to set it up:
  • Install using pip
  • Install using tarballs with all dependencies
Swiftclient is a Python package, and there is currently no RPM available to set it up on Linux machines, so it has to be installed as a Python package, most simply with the Python package manager pip.

Steps to install python-swiftclient using pip

This is the easiest way to set up the swift client, but it requires internet access on your machine.

Install python-setuptools, which is available as an RPM
$ sudo yum install python-setuptools

Use easy_install to set up pip. Note that this requires internet access, as it downloads external packages.
$ sudo easy_install pip

This installs pip on your machine and lets you start installing packages
$ sudo pip install python-swiftclient
$ sudo pip install python-keystoneclient

The second command installs python-keystoneclient, which can also be used to access the OpenStack services directly.

Steps to install python-swiftclient using tarballs

In many organisations the servers are not exposed to the internet, and a proxy may not give the desired results. I was in that situation and had to find my own way to set this up using individual tarballs with their dependencies.

The steps involve installing each dependency from its tarball. The tarballs can be downloaded from the PyPI website https://pypi.python.org/pypi. Search for the following packages and download their tarballs

  • futures
  • pbr
  • six
  • requests
  • python-swiftclient (the latest version, 3.3, requires Python 2.7)
Download each of these, copy them to the machine and extract them

Change into each extracted directory and run the install command, in the order shown. These are the versions I had at the time of writing.

$ (cd futures-3.0.5 && sudo python setup.py install)
$ (cd pbr-2.0.0 && sudo python setup.py install)
$ (cd six-1.10.0 && sudo python setup.py install)
$ (cd requests-2.13.0 && sudo python setup.py install)
$ (cd python-swiftclient-3.3.0 && sudo python setup.py install)
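
If the tarballs are still packed, the unpack and install sequence can be scripted. A minimal sketch, assuming the downloaded .tar.gz files all sit in one directory and that each one unpacks into a directory with the same name as the archive (which is how PyPI source tarballs are normally laid out). install_tarballs is a hypothetical helper name, and ~/swift-tarballs in the usage comment is a placeholder path.

```shell
# Unpack each tarball and install it, stopping at the first failure.
# $1 = directory holding the *.tar.gz files, remaining args = package-version
# names in dependency order.
install_tarballs() {
    src_dir="$1"
    shift
    for pkg in "$@"; do
        # Extract next to the tarball so everything stays in one place.
        tar -xzf "$src_dir/$pkg.tar.gz" -C "$src_dir" || return 1
        # Run the install in a subshell so the caller's directory is unchanged.
        (cd "$src_dir/$pkg" && sudo python setup.py install) || return 1
    done
}

# Usage (placeholder directory, versions as listed above):
#   install_tarballs ~/swift-tarballs futures-3.0.5 pbr-2.0.0 six-1.10.0 \
#       requests-2.13.0 python-swiftclient-3.3.0
```

Passing the packages as arguments keeps the dependency order explicit, so python-swiftclient is installed last, after everything it needs.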

After this, the swift client should be available for all users. 
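
A quick way to confirm this for a given user is to check both the command and the Python module. The sketch below is my own helper (check_swift_client is a hypothetical name), not something shipped with swiftclient; it assumes the client installs a `swift` command and a Python module named swiftclient, which matches the paths used later in this post.

```shell
# Report whether the current user can reach the swift command and
# whether python can import the swiftclient module.
check_swift_client() {
    if ! command -v swift >/dev/null 2>&1; then
        echo "swift command not found on PATH" >&2
        return 1
    fi
    if ! python -c "import swiftclient" 2>/dev/null; then
        echo "python cannot import swiftclient" >&2
        return 1
    fi
    echo "swift client looks usable"
}

# Usage: log in as a non-root user and run
#   check_swift_client
```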

If other users are unable to run the client, you may have to fix the permissions and set the PYTHONPATH for all users.

The following paths must be on the Python path for users to be able to run swift.

/usr/lib/python2.6/site-packages/futures-3.0.5-py2.6.egg
/usr/lib/python2.6/site-packages/requests-2.13.0-py2.6.egg
/usr/lib/python2.6/site-packages/six-1.10.0-py2.6.egg


This can be validated by running the command below
$ python -c "import sys; print('\n'.join(sys.path))"

The command below sets the permissions on the required directories so that all users can access the packages needed to run swift

$ sudo chmod -R 755 {/usr/lib/python*.*/site-packages/swiftclient,/usr/bin/swift,/usr/lib/python*.*/site-packages/pbr-*,/usr/lib/python*.*/site-packages/requests-*,/usr/lib/python*.*/site-packages/futures-*,/usr/lib/python*.*/site-packages/six-*,/usr/lib/python*.*/site-packages/python_swiftclient-*}

Create a file that sets these paths and place it in /etc/profile.d so that all users can access the packages required by swift.
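
Such a file could look like the sketch below. The file name /etc/profile.d/swiftclient.sh is my assumption (any name ending in .sh in that directory is normally picked up by login shells), and the egg paths are the ones listed above, so adjust them to your Python version and package versions.

```shell
# /etc/profile.d/swiftclient.sh (hypothetical file name)
# Append the egg directories needed by swift to PYTHONPATH, skipping any
# entry that is already present so repeated sourcing does not grow the path.
for egg in \
    /usr/lib/python2.6/site-packages/futures-3.0.5-py2.6.egg \
    /usr/lib/python2.6/site-packages/requests-2.13.0-py2.6.egg \
    /usr/lib/python2.6/site-packages/six-1.10.0-py2.6.egg
do
    case ":${PYTHONPATH:-}:" in
        *":${egg}:"*) ;;  # already on the path
        *) PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${egg}" ;;
    esac
done
unset egg
export PYTHONPATH
```

After the next login, the sys.path check shown earlier should list the three egg paths.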

Note that these steps install only python-swiftclient, not the keystone client. This means you will have to find another way to get the auth token and storage URL. Once you have those, you can pass them to swift as shown below

$ swift --os-auth-token <authtoken> --os-storage-url <storageURL> <command> <options>

Uploading a file into a new bucket (Swift calls a bucket a container)

$ swift --os-auth-token <authtoken> --os-storage-url <storageURL> upload -S <segmentsize> <bucket> <filename>
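
Typing the token and storage URL on every command gets tedious, so a small wrapper can help. This is a sketch under assumptions: swift_upload is a hypothetical function name, it expects the token and storage URL in the OS_AUTH_TOKEN and OS_STORAGE_URL environment variables, and the 1 GiB default segment size is an arbitrary value I picked for illustration (the segment size is given in bytes).

```shell
# Upload one file into a container using a pre-obtained token.
# $1 = container (bucket), $2 = file, $3 = optional segment size in bytes.
swift_upload() {
    if [ -z "${OS_AUTH_TOKEN:-}" ] || [ -z "${OS_STORAGE_URL:-}" ]; then
        echo "OS_AUTH_TOKEN and OS_STORAGE_URL must be set" >&2
        return 1
    fi
    container="$1"
    file="$2"
    segment_size="${3:-1073741824}"  # 1 GiB, an arbitrary illustrative default
    swift --os-auth-token "$OS_AUTH_TOKEN" \
          --os-storage-url "$OS_STORAGE_URL" \
          upload -S "$segment_size" "$container" "$file"
}

# Usage (placeholder values):
#   export OS_AUTH_TOKEN=<authtoken> OS_STORAGE_URL=<storageURL>
#   swift_upload backups db-dump.tar.gz
```

Failing early when either variable is missing avoids a confusing authentication error from the client.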


My next blog post will outline the steps to build an RPM, which can then be used to package the Python swiftclient for easy setup.
