
Distributed Versioning

Commonly used version control systems can be roughly ranked by usability as follows:
  1. Visual Source Safe (Low)
  2. CVS
  3. SVN
  4. Git (High)

Visual Source Safe
Cons:
  • Visual Source Safe doesn't allow "Concurrent Access": only one developer can edit a file at a time. Hence, it is not suitable for a large team with frequent commits.
  • It occasionally produces obscure error messages and can be unstable.
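To make the locking model concrete, here is a minimal Python sketch of exclusive checkout (often called lock-modify-unlock). The class and method names (LockingRepository, check_out, check_in) are hypothetical and do not reflect Visual Source Safe's real interface. The sketch only shows why a second developer is blocked until the first one checks the file back in.

```python
class LockingRepository:
    """Toy model of exclusive checkout (lock-modify-unlock).

    Hypothetical illustration only, not Visual Source Safe's actual API.
    """

    def __init__(self):
        self.files = {}   # path -> current contents
        self.locks = {}   # path -> developer currently holding the checkout

    def check_out(self, path, developer):
        # Refuse the checkout if someone else already holds the lock.
        holder = self.locks.get(path)
        if holder is not None and holder != developer:
            raise PermissionError(f"{path} is checked out by {holder}")
        self.locks[path] = developer
        return self.files.get(path, "")

    def check_in(self, path, developer, contents):
        # Only the lock holder may write the file back.
        if self.locks.get(path) != developer:
            raise PermissionError(f"{developer} does not hold the lock on {path}")
        self.files[path] = contents
        del self.locks[path]  # release the lock so others can edit


repo = LockingRepository()
repo.check_out("main.c", "alice")
try:
    repo.check_out("main.c", "bob")  # bob has to wait for alice
except PermissionError as err:
    print(err)  # main.c is checked out by alice
repo.check_in("main.c", "alice", "int main(void) { return 0; }")
repo.check_out("main.c", "bob")  # succeeds now that the lock is released
```

With a large team, every such collision is a developer waiting, which is why the locking model scales poorly.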
CVS (Concurrent Versions System)
Pro:
  • Allows "Concurrent Access". Multiple developers can edit the same file, and the system can merge their changes. If there is a conflict, the developer is informed and has to resolve it.
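The merge step can be illustrated with a toy three-way merge in Python. Each line of the two edited copies is compared with the common base version, and a line changed differently on both sides is reported as a conflict. This is a simplified sketch, not CVS's actual algorithm. It assumes edits only replace lines (all three versions have the same number of lines), whereas real tools such as diff3 first align the versions with a diff. The function name three_way_merge and the conflict-marker layout are illustrative.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`, line by line.

    Simplifying assumption: all three versions have the same number of
    lines (edits only change lines, never add or remove them).
    Returns (merged_lines, conflicting_line_numbers).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles same-length versions")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (identical change or untouched)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed it differently: the developer must resolve it
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
            conflicts.append(i + 1)  # 1-based line number
    return merged, conflicts


base = ["a = 1", "b = 2", "c = 3"]
alice = ["a = 10", "b = 2", "c = 3"]   # Alice edits line 1
bob = ["a = 1", "b = 2", "c = 30"]     # Bob edits line 3
print(three_way_merge(base, alice, bob))
# (['a = 10', 'b = 2', 'c = 30'], [])
```

Because Alice and Bob touched different lines, both changes are kept automatically. Had they both edited line 1, the merged output would contain conflict markers for them to resolve.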
Cons:
  • Commits are not atomic. Changes are tracked at the file level, not at the repository level. For example, if we commit five files at once and later want to revert that commit, each of the five files has to be reverted individually.
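A toy Python model of file-level versioning shows the problem. Each file keeps its own revision list, and nothing records that five files were committed together, so backing out that change means reverting every file one by one. PerFileRepository and its methods are hypothetical names for illustration. The integer revision numbers stand in for CVS's per-file numbers such as 1.2.

```python
class PerFileRepository:
    """Toy model of file-level versioning: every file has its own history
    and no record links files committed together. Hypothetical names,
    not CVS's interface."""

    def __init__(self):
        self.history = {}  # path -> list of contents, one entry per revision

    def commit(self, changes):
        # Each file is versioned independently; the "commit" leaves no
        # trace that these files changed together.
        for path, contents in changes.items():
            self.history.setdefault(path, []).append(contents)

    def revision(self, path):
        return len(self.history[path])  # per-file revision number (1-based)

    def revert_file(self, path):
        # Undo the latest change to ONE file by committing its previous
        # contents again (assumes the file has at least two revisions).
        versions = self.history[path]
        versions.append(versions[-2])


repo = PerFileRepository()
repo.commit({f"file{i}.c": "v1" for i in range(5)})
repo.commit({f"file{i}.c": "v2" for i in range(5)})  # one logical change, five files
# Backing out that change needs five separate reverts.
for i in range(5):
    repo.revert_file(f"file{i}.c")
print([repo.history[f"file{i}.c"][-1] for i in range(5)])
# ['v1', 'v1', 'v1', 'v1', 'v1']
```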
SVN (Subversion)
Pro:
  • Commits are atomic. The repository records each check-in as a single revision, so files committed together can be reverted together.
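For contrast, here is the same scenario in a toy repository-level model. Each commit creates one global revision containing a snapshot of the whole tree, so a single operation can back out every file touched by that revision. ChangesetRepository and revert_revision are hypothetical names. In Subversion itself, a committed revision is typically backed out with a reverse merge (`svn merge -c -N`, where N is the revision) followed by a commit.

```python
class ChangesetRepository:
    """Toy model of repository-level versioning: each commit is one global
    revision holding a snapshot of the whole tree. Hypothetical names,
    not Subversion's API."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty tree

    @property
    def head(self):
        return len(self.revisions) - 1

    def commit(self, changes):
        # All files in `changes` land together in ONE new revision.
        snapshot = dict(self.revisions[-1])
        snapshot.update(changes)
        self.revisions.append(snapshot)
        return self.head

    def revert_revision(self, rev):
        """Back out every file changed in `rev` with a single new commit.
        (For simplicity, ignores later edits to the same files.)"""
        before, after = self.revisions[rev - 1], self.revisions[rev]
        snapshot = dict(self.revisions[-1])
        for path in after:
            if after[path] != before.get(path):
                if path in before:
                    snapshot[path] = before[path]  # restore the old contents
                else:
                    snapshot.pop(path, None)       # file was added in `rev`
        self.revisions.append(snapshot)
        return self.head


repo = ChangesetRepository()
r1 = repo.commit({f"file{i}.c": "v1" for i in range(5)})
r2 = repo.commit({f"file{i}.c": "v2" for i in range(5)})  # five files, one revision
repo.revert_revision(r2)  # one operation undoes all five
print(r1, r2, repo.head)                          # 1 2 3
print(sorted(set(repo.revisions[-1].values())))   # ['v1']
```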
Cons:
  • The history of a file cannot be viewed when not connected to the server.
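The limitation can be sketched with a toy centralized model in Python. The history lives only on the server, so the client's log request fails as soon as the server is unreachable, much as `svn log` has to contact the repository. CentralServer, WorkingCopy and their methods are hypothetical names, and the `online` flag is just a stand-in for network connectivity.

```python
class CentralServer:
    """Toy centralized server: the only place the history is stored.
    Hypothetical names, not Subversion's API."""

    def __init__(self):
        self.log_entries = []  # (path, message) for every commit
        self.online = True     # stand-in for network reachability

    def commit(self, path, message):
        self.log_entries.append((path, message))

    def history(self, path):
        if not self.online:
            raise ConnectionError("cannot reach the repository server")
        return [msg for p, msg in self.log_entries if p == path]


class WorkingCopy:
    """Client side: in this toy it keeps no history of its own,
    only a reference to the server."""

    def __init__(self, server):
        self.server = server

    def log(self, path):
        # Every history query is a round trip to the server.
        return self.server.history(path)


server = CentralServer()
server.commit("main.c", "initial import")
wc = WorkingCopy(server)
print(wc.log("main.c"))  # ['initial import']
server.online = False    # e.g. working without network access
try:
    wc.log("main.c")
except ConnectionError as err:
    print(err)           # cannot reach the repository server
```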
Git (Distributed Revision Control System (DRCS))
Pro:
  • Along with the master repository, every developer has a full copy of the repository. Hence, even in disconnected mode, developers can see the history of the files, and they can later apply their patches to the master repository.
  • Patches can also be sent directly to peers, which makes Git an excellent fit for distributed teams.
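A toy Python model of the distributed workflow follows. Cloning copies the whole history, so the log is available without the master repository, and new commits can be exported as a patch and applied to a peer or to the master. The names (Repository, clone, format_patch, apply_patch) are hypothetical. They loosely echo real Git commands such as `git clone`, `git format-patch` and `git am`, but the sketch assumes a purely linear history with no divergent branches or merges.

```python
import copy


class Repository:
    """Toy distributed repository: every clone carries the full history.
    Hypothetical names; not Git's API or object model."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])  # list of (message, {path: contents})

    def clone(self):
        # Copy the entire history, not just the latest files.
        return Repository(copy.deepcopy(self.commits))

    def commit(self, message, changes):
        self.commits.append((message, dict(changes)))

    def log(self):
        # Purely local: no server is consulted.
        return [message for message, _ in self.commits]

    def format_patch(self, since):
        # Commits after the first `since` ones, ready to send to someone else.
        return copy.deepcopy(self.commits[since:])

    def apply_patch(self, patch):
        # Assumes a linear history: the receiver has exactly the first
        # `since` commits the patch was built against.
        self.commits.extend(patch)


master = Repository()
master.commit("initial import", {"README": "hello"})
alice = master.clone()
bob = master.clone()

# Alice works offline: committing and reading history need no server.
alice.commit("fix typo", {"README": "Hello"})
print(alice.log())  # ['initial import', 'fix typo']

bob.apply_patch(alice.format_patch(since=len(bob.commits)))        # peer to peer
master.apply_patch(alice.format_patch(since=len(master.commits)))  # back to master
print(bob.log() == master.log() == alice.log())  # True
```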
For a centralized team, SVN is a good choice of version control system, while Git is ideal for distributed teams.
