LinkedIn Network Analysis with ggplot2

In this part, we will demonstrate how to create a network graph like the one you see in the LinkedIn InMap.

You can achieve the same result with igraph, but in this post we will focus on how to do it in ggplot2. We will not cover how to download the LinkedIn data or shape it into an appropriate data structure. We assume that you already have your data in a suitable format, like the example LinkedinData (available for download): an adjacency matrix like the following:

             Jeff Grif  Cody Bolh  Curtis Blag  Eric Wood
Jeff Grif            0          0            0          0
Cody Bolh            1          0            0          0
Curtis Blag          1          0            0          0
Eric Wood            1          0            0          0
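
The plotting code further down indexes a two-column list of connected people rather than the matrix itself, so the matrix has to be flattened first. Here is a minimal sketch of that step in Python (not the R used in this post); the function name `to_edge_list` is made up for illustration, and it assumes that any cell greater than 0 means "connected".

```python
# The four-person adjacency matrix shown above.
names = ["Jeff Grif", "Cody Bolh", "Curtis Blag", "Eric Wood"]
adjacency = [
    [0, 0, 0, 0],  # Jeff Grif
    [1, 0, 0, 0],  # Cody Bolh
    [1, 0, 0, 0],  # Curtis Blag
    [1, 0, 0, 0],  # Eric Wood
]


def to_edge_list(names, matrix):
    """Return (row_name, column_name) pairs for every cell greater than 0.

    Hypothetical helper: it assumes a positive cell marks a connection.
    """
    edges = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value > 0:
                edges.append((names[i], names[j]))
    return edges


edges = to_edge_list(names, adjacency)
for source, target in edges:
    print(source, "-", target)
```

Each pair then becomes one edge to draw in the graph.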

Our output will look like the image below:

Linkedin InMap with R



We will use the following code to draw the graph from that data:



LinkedinList <- [...]
LinkedinList <- [...] 0, ]

## community detection
g <- [...]  ### cluster

# Function to generate paths between each connected node
edgeMaker <- [...]
  index.temp1 <- which(LinkedinList[whichRow, 1] == temp[, 3], arr.ind = TRUE)
  index.temp2 <- which(LinkedinList[whichRow, 2] == temp[, 3], arr.ind = TRUE)
  if (temp[index.temp1, 5] >= temp[index.temp2, 5]) {

  fromC <- [...]
  toC <- [...]

  # Add curve:
  graphCenter <- [...]
  bezierMid <- [...]
  bezierMid <- as.matrix(bezierMid)
  distance1 <- [...]
  if (distance1 < sum((graphCenter - c(toC[1], fromC[2]))^2)) {
    bezierMid <- [...]
  } # To select the best Bezier midpoint
  bezierMid <- [...]
  if (curved == FALSE) {bezierMid <- [...]

  edge <- [...]
    c(fromC[2], bezierMid[2], toC[2]) # X & y
    , evaluation = len)) # Bezier path coordinates
  edge$Sequence <- [...]
  edge$Group <- [...] ")

# Generate a (curved) edge path for each pair of connected nodes
allEdges <- [...]
allEdges <- [...]

# cleaning plot
new_theme_empty <- [...]
new_theme_empty$line <- [...]
new_theme_empty$rect <- [...]
new_theme_empty$strip.text <- [...]
new_theme_empty$axis.text <- [...]
new_theme_empty$plot.title <- [...]
new_theme_empty$axis.title <- [...]
new_theme_empty$plot.margin <- [...] valid.unit = 3L, class = "unit")

zp1 <- [...]
zp1 <- [...]
zp1 <- [...] size = factor(size)), pch = 21)
zp1 <- zp1 + geom_text(data = temp, aes(x = x, y = y, label = name, hjust = 0, vjust = 0))
zp1 <- [...]
zp1 <- zp1 + theme(legend.position = "none")
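
The edge function above draws each connection as a Bezier path through a midpoint. To make the idea concrete, here is a rough Python sketch of a quadratic Bezier curve between two nodes. It is not the code from this post: the names `quadratic_bezier` and `pick_control` are invented, and the rule for choosing the control point (keep the corner that lies farther from the graph center) is an assumption.

```python
def quadratic_bezier(start, control, end, steps=5):
    """Return `steps` (x, y) points along a quadratic Bezier curve."""
    points = []
    for k in range(steps):
        t = k / (steps - 1)  # t runs from 0.0 to 1.0
        x = (1 - t) ** 2 * start[0] + 2 * (1 - t) * t * control[0] + t ** 2 * end[0]
        y = (1 - t) ** 2 * start[1] + 2 * (1 - t) * t * control[1] + t ** 2 * end[1]
        points.append((x, y))
    return points


def pick_control(start, end, center):
    """Choose between the two 'corner' candidates for the control point.

    Assumption: keep the corner farther from the graph center.
    """
    corner_a = (start[0], end[1])
    corner_b = (end[0], start[1])

    def squared_distance(p):
        return (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2

    if squared_distance(corner_a) >= squared_distance(corner_b):
        return corner_a
    return corner_b


# Hypothetical node positions and graph center.
start, end, center = (0.0, 0.0), (1.0, 1.0), (0.9, 0.1)
control = pick_control(start, end, center)
path = quadratic_bezier(start, control, end, steps=5)
for point in path:
    print(point)
```

More steps give a smoother line, and placing the control point at the midpoint between the two nodes gives a straight edge.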


Passion Driven Statistics – Coursera

The association between the measure of diameter and depth among craters with or without layers.



The study of crater properties on Mars, based on the database created by Stuart Robbins (2011), can help us understand crustal properties, surface ages and modification events.

This project is based on a subset of the Robbins Crater Database with 384343 craters. In our project, we studied 2 main problems:

  • The association between the size of craters (diameter) and their depth and number of layers. A crater’s diameter may be larger when the crater has a large number of layers.
  • The depth of craters with layers may have more effect on the diameter than the depth of craters without layers.

We think that a large crater should have a high number of layers, and therefore a greater depth and a bigger diameter. These associations may be stronger when the number of layers is higher.

Research Question

With that in mind, we propose the following 2 questions, which could help us better understand craters’ properties.

  1.  Is the number of layers associated with the diameter of craters on Mars?
  2.  Is the association between depth and diameter similar for craters with and without layers?



There are 384343 observations from the Mars Study database, of which 364612 craters (94.9%) have no layers and the other 19731 (5.1%) have layers.

The Mars Study, conducted by Stuart Robbins, presents a sample of Mars’ craters with their physical properties (e.g. location on Mars, size and depth, 3 kinds of ejecta morphology, number of layers).


Each crater has a Crater_ID, which is assigned internally based on the region of the planet. Crater size is given by the DIAM_CIRCLE_IMAGE variable (in km), which is the diameter of a non-linear least-squares circle fit to selected vertices on the crater rim.

DEPTH_RIMFLOOR_TOPOG is calculated by taking the average elevation of N determined points along (or inside) the crater rim (in km).

NUMBER_OF_LAYERS is the maximum number of cohesive layers in any azimuthal direction. It has 6 levels (from 0 to 5). The new Layer category variable is coded as 0 if the crater has 0 layers (craters without layers) and 1 if the crater has 1 or more layers (craters with layers).

To make the data easier to analyze in this study, the DIAM_CIRCLE_IMAGE variable is divided into 4 categories, each with about the same number of craters. The new variable DIAM takes the following values:

DIAM=1 if DIAM_CIRCLE_IMAGE is less than 1.18

DIAM=2 if DIAM_CIRCLE_IMAGE is from 1.18 to 1.55

DIAM=3 if DIAM_CIRCLE_IMAGE is from 1.55 to 2.55

DIAM=4 if DIAM_CIRCLE_IMAGE is greater than 2.55
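
The analysis itself was done in SAS, but the two recodings described above can be sketched in a few lines of Python. The function names are invented for illustration. The text does not say which category a diameter exactly equal to a cut point belongs to, so this sketch assumes it goes into the upper category.

```python
from bisect import bisect_right

# Cut points in km, taken from the category definitions above.
CUTS = [1.18, 1.55, 2.55]


def diam_category(diameter_km):
    """Map DIAM_CIRCLE_IMAGE to a category from 1 to 4.

    Assumption: a value equal to a cut point falls in the upper category.
    """
    return bisect_right(CUTS, diameter_km) + 1


def layer_flag(number_of_layers):
    """Return 0 for craters without layers and 1 for craters with layers."""
    return 1 if number_of_layers >= 1 else 0


# Hypothetical craters: (diameter in km, number of layers).
sample = [(0.9, 0), (1.3, 0), (2.0, 1), (10.0, 3)]
recoded = [(diam_category(d), layer_flag(n)) for d, n in sample]
print(recoded)
```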

  • Procedures

The data were analyzed with SAS, version 4.3. The source code is presented later in the SAS program post.



75% of craters have a diameter of less than 2.5 km. The mean diameter (3.5 km) is greater than the median (1.5 km), and the standard deviation is 8.6 km. Most craters therefore have small diameters, while some outliers have significantly higher values, up to 1164 km.


Univariate result for diameter

In general, craters are shallow: 99% have a depth of less than 1 km. The maximum depth is 4.95 km and the minimum is -0.42 km.


Univariate Result for Depth

The diam variable has 4 categories, each with nearly the same number of craters.

Nearly 95% of craters have 0 layers. Only a few craters have 4 or 5 layers. The number of craters decreases as the number of layers increases.


Frequency for number of layers


1.  Because of the large number of craters with no layers (94.9%), I can only draw a boxplot for craters with 1 to 5 layers. The figure below shows the box plot of crater diameter for each number of layers. Note that the spread within each group is small and that the box plots overlap very little.


                            Boxplot for different number of layers, 1-5



A bar chart of the mean diameter for each number of layers is then added. Most craters with no layers have a small diameter. We can see that the mean diameter increases significantly as the number of layers increases from 0 to 5.


                         Barchart for different number of layers (0-5)

As expected, the ANOVA showed a positive and significant association between the number of layers (categorical explanatory variable) and the diameter (quantitative response variable), with a small p-value (<0.0001) and a high F value (1494.68). That is, the diameter increases as the number of layers increases.




ANOVA test for the relationship between number of layers and diameter mean
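
For readers who want to see where an F value comes from, here is a small Python sketch of the one-way ANOVA calculation (the real analysis was run in SAS). The three groups are made-up toy numbers, not values from the Robbins database, and `one_way_anova_f` is a name chosen for this example.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of numbers."""
    all_values = [v for group in groups for v in group]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - sum(group) / len(group)) ** 2 for group in groups for v in group
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical diameters (km) for three layer groups.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = one_way_anova_f(toy_groups)
print(round(f_value, 2))
```

A large F value means the group means differ a lot compared to the spread inside each group.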

2. We consider the association between 2 variables, the diameter of craters (DIAM_CIRCLE_IMAGE) and the depth of craters (DEPTH_RIMFLOOR_TOPOG), in 2 cases: the subset of craters without layers (case 1) and the subset of craters with layers (case 2). Based on the scatter plot below, we can deduce that there is an association between the diameter and depth of craters.


                             Scatter plot between diameter and depth

Indeed, the two Pearson correlation calculations give significant p-values (<0.0001) and positive correlation coefficients.





We can therefore conclude that there is a significant positive correlation between diameter and depth for all craters. Furthermore, the correlation coefficient in case 2 (0.73) is greater than in case 1 (0.61). However, we must note that case 1 contains far more data, and that the maximum crater depth lies very far from the rest of the values. We can therefore only conclude that the association between depth and diameter seems to be similar in the 2 cases. Furthermore, among craters with larger diameters, those with layers tend to have greater depths.


Line Plot for the relationship between diameter and depth, with and without layer
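
As a reminder of what the Pearson coefficient measures, here is a short Python sketch that computes it separately for two subsets, in the same way as case 1 and case 2 above. The numbers are invented toy data, not craters from the database, and `pearson_r` is just a name for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical (diameter km, depth km) values for each subset.
diam_without = [1.0, 1.5, 2.0, 3.0, 5.0]
depth_without = [0.10, 0.12, 0.25, 0.20, 0.45]
diam_with = [3.0, 5.0, 8.0, 12.0, 20.0]
depth_with = [0.3, 0.5, 0.7, 1.0, 1.4]

r_without = pearson_r(diam_without, depth_without)
r_with = pearson_r(diam_with, depth_with)
print("without layers:", round(r_without, 2))
print("with layers:", round(r_with, 2))
```

The coefficient only captures linear association, so a few extreme craters can move it a lot.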


What might the results mean?

Craters with a higher number of layers tend to have larger depths.

At large diameters, craters with layers seem to have greater depths than those without.


Results are based on the subset sample of Robbins Crater Database.


The number of craters without layers is too large compared to the number of craters with layers. This might cause difficulty when we run tests on craters with and without layers. Furthermore, it is quite tough to divide the range of diameters, since nearly 75% of craters are smaller than 2.55 km and the rest are larger.

Recommended future research

We can consider the impact of craters’ locations on their properties. More research is needed on the association between craters’ locations and their depths or diameters.



What surprises me about sport business

Sport Business


I have long been a fan of Manchester United, and it is fascinating for me to learn about their business and, in general, the sport business. So I followed a course on Coursera to find out more about this industry.

I always have these big questions:

  • How much do these sport companies earn?

  • How do they justify their big spending on player transfer?

  • What are the different sources of revenue, like advertising, media streaming, player transfers, youth training and transfers, and gate tickets?

  • How can the weak teams survive, compared to big teams like Manchester or Real Madrid?

  • How do people make sure that these teams don’t cheat in their games?

  • How about drug testing? (this question is already answered on this blog post)

Their Revenue and Income

Big sport teams like Manchester United earn very little compared with giant Internet companies like Facebook or Microsoft. Their players can earn a lot compared to other companies’ employees, though.

Manchester United’s market capitalisation is $3.10 billion. That is relatively low compared to Facebook’s $47 billion, and only about three times the $1 billion of Instagram. We will look at each of their sources of revenue to understand where the big money comes from and what the league does to make sure it stays competitive and interesting.

Gate Income

Gate income is sometimes shared between the home and away teams, with the away team normally receiving between 0% and 33%. Here are the top 10 gate receipts. Gate income contributes a significant part of the clubs’ income:

1 Real Madrid – 438.6m

2 Barcelona – 398.1m

3 Man Utd – 349.8m

4 Bayern Munich – 323.0m

5 Arsenal – 274.1m

6 Chelsea – 255.9m

7 AC Milan – 235.8m

8 Liverpool – 225.3m

9 Inter – 224.8m

10 Juventus – 205.0m

Media Income:

The media works really well with the sport business because the share of the audience that watches sport on delay is very low, only 4.4%. The following table shows the proportion of the audience that watches a delayed show.


[Table: % of audience watching delayed, by programme type (Sport Events, Award Ceremonies)]




There are different business models in this area: TV Networks, Cable TV Model, Regional Sport Network. They can benefit from both the Subscriber Fee and the Advertising Revenue.

The audience behind the media income of some big sporting events:

- Super Bowl: 106 million viewers (in 2010)

- UEFA Champions League: 109 million viewers (in 2010)

Other income:

There is other income from training young players and selling them to other teams, and from naming fees for the stadiums and the team. There is also a salary cap for the players inside the league.


Drug Testing and Statistics


I was very surprised that people like Lance Armstrong, or any other athletes, could bypass the drug tests so easily for many years before they got caught. Then I recently read a blog post about the statistics that reveal the truth about this:

Anti-doping tests have a huge false-negative problem. I have been talking about this for years

Because of this huge false-negative problem, dopers escape detection most of the time. But if a player does test positive, it is very likely that they really are doping. Otherwise, they are only caught when somebody suspects them and requests an official, legal and expensive process.

And passing even 500 doping tests is not as impressive as you might think:

The anti-doping agencies are so concerned about not falsely accusing anyone that they leave a gigantic hole for dopers to walk through. . . . While we think about Armstrong’s plight, let’s not forget about this fact: every one of those who now confessed passed hundreds of tests in their careers, just like Armstrong did. In fact, fallen stars like Tyler Hamilton and Floyd Landis also passed lots of tests before they got caught. In effect, dopers face a lottery with high odds of winning and low odds of losing.
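
To see how both claims can be true at once, here is a toy calculation in Python. Every number in it is invented for illustration (the share of dopers, the chance that a single test catches a doper, the false-positive rate), and it assumes that tests are independent of each other, which real testing is not.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Probability that an athlete who tests positive is really doping (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


def chance_of_passing_all(sensitivity, n_tests):
    """Probability that a doper passes n tests, assuming independent tests."""
    return (1 - sensitivity) ** n_tests


# Hypothetical inputs, NOT real anti-doping figures.
prevalence = 0.10             # share of athletes who dope
sensitivity = 0.001           # chance one test catches a doper
false_positive_rate = 0.00001  # chance one test flags a clean athlete

ppv = positive_predictive_value(prevalence, sensitivity, false_positive_rate)
pass_500 = chance_of_passing_all(sensitivity, 500)
print("P(doping | positive test):", round(ppv, 3))
print("P(doper passes 500 tests):", round(pass_500, 3))
```

With inputs like these, a positive test is strong evidence of doping, while a long record of passed tests says very little.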

SnappyCam App looks very good

I just read about this on TechCrunch and tried it out; it is a very handy camera app. I am not sure the resulting images actually look great, though, because they are only 213 KB. I would also love to know how he does it.

Snappy Camera


Your standard iPhone camera app is actually pretty slow, able to take just three to six photos per second at 8 megapixels each. But with SnappyCam 3.0, you can shoot 20 full-resolution photos per second thanks to a breakthrough in discrete cosine transform JPG science by its inventor. Twenty frames per second is fast enough to capture shot-by-shot animations or every gruesome detail of an extreme sports crash.

Debt, Inventory and Revenue



Your code is your debt

You spend money, effort and bug management to control your debt. Code doesn’t automatically generate revenue; user features and user satisfaction do. It doesn’t matter that you wrote 100 000 lines of code in 10 000 hours with a complexity of 1 million (well, it matters to a technical guy) if those efforts don’t acquire new users or generate more revenue. It is like saying: I have borrowed 1 million dollars and spent it all on this project. It sounds cool but it doesn’t bring any benefit to the company. Even worse, it harms the company.


Inventory is what you produce that just sits in some warehouse or storage and does not generate any money. It is even worse if it costs you money to store those things.
As Joel on Software said, inventory can build up in each of the following software processes, with different results:

      Decision-Making Process: documentation, product backlog, feature ideas…
      Design Process: diagrams,
      Implementation Process
      Testing Process
      Debugging Process
      Deployment Process

The products of each stage may never be implemented, may get ignored or may become unrealistic later on. Here, we don’t even talk about the waterfall process, which could make it even worse. Take, for example, a feature backlog written over hundreds of pages, 90% of which is never implemented. Or a bug database that contains all the bugs, with all the effort spent maintaining and understanding them, of which only 10% get fixed after a long time.

As with any kind of inventory, after a while the items in it become obsolete and need cleaning up so that new things can be added. Obsolete inventory costs you the effort and time to create it, maintain it and get rid of it. It is the same in software engineering: bugs that are no longer bugs (after lots of updates), feature documentation that is no longer compatible with the current products…

It is important for managers to understand the similarity between cost, debt, inventory and revenue in a software engineering process. It is easy to measure engineers by how much code they write, but that is the same as measuring how much debt they bring to the team. Higher debt doesn’t mean higher revenue, so be careful.


Top 5 Energy Saving Apps for Small Businesses

Software applications and resources that allow business owners to reduce their overhead and operational costs may be something few startups can afford to do without. Resources that let you track your electric consumption more easily and effectively, curb demand for costly utilities or find the best electric companies in Hamilton, Texas, and surrounding regions have many benefits for smaller businesses and startups seeking to reduce the cost of their energy usage. Finding and using the best apps can ensure that your business benefits from more competitive utility rates and reduces costly waste and excessive energy consumption.

Choosing the Right Energy Provider

Paying too much for any resource could be a serious misstep, one that a business operating with fixed or limited financial resources may not be able to make. Sorting through your options and comparing electric providers and utility services in an effort to find the most cost-effective options and the greatest values can be a daunting task when you lack for the right resources to aid you in your search. Applications able to provide real time pricing information and easier navigation of the providers you have to choose from can ensure you are able to find the most beneficial options for your electrical service.

Tracking Consumption for Greater Energy Efficiency

Having to keep tabs on the operational habits and costs of powering your business is a labor-intensive task, one that may rob you and your staff of energy that would be better spent on more important matters. Software that can speed up or automate the process will keep you informed about the level and cost of your operation’s power consumption. Better information may be required in order to curb your consumption more effectively.

Reducing Electric Waste
Fixtures and appliances that are not as efficient as they could be, devices that are being powered when not required and other habits and situations that could be costing you more than you might realize are all situations that can be more effectively remedied when you have access to the right information. Tracking your electrical consumption with a software application can allow you to target and identify any aspect of your operations or working process that could be made more efficient. Using less energy will reduce your utility bills and overhead expenses and may allow your business to become more profitable as a result.

Finding the Best Resources

With a number of applications to choose from, finding and selecting the best of them could seem like quite the challenge. Conducting a little research and finding the applications and software resources that will be of most benefit for your business will ensure that you do not lack for a superior resource. The tools you need to ensure your business is able to be made more efficient and less costly could make a big impact on the results and success of your efforts.

Startups TDD or not

Startup Trap


This question is not a simple one, so you should not expect a simple answer. If you read his blog post on this, you will see that Uncle Bob Martin is a big fan of TDD, even for startups. I would say that for any startup, or any company, the technology process has to be aligned with the business process, which means it has to satisfy both the short-term and long-term goals of the business.

Think of any business you know, and think of their products. If a product serves a bigger, long-term goal, the team will be given more money and time; otherwise, it will be given much less. If a product is the main cash cow, it is a long-term product; if it is only built to last a couple of weeks or months, it is not.

So a startup, even though it is not a mature business, still needs to deal with this problem on a daily basis. I think the best approach for any startup is to determine how long the project will need to last and to reconsider its decisions every few months to make sure it is still on track. When the startup thinks that the product is going to last long or the number , it needs to add more tests, refactor the source code and raise the requirements on the source code. In economics, there is an important concept that “in the long term, everybody dies”. So, who cares about the long term if we are going to die tomorrow? But if we try to live for 10 years without a plan, I am sure that we will die within the next few weeks. What matters is how long you think your product will live.

And of course, if you keep the same plan or the same process as the product grows, you are sure to die. It is the job of the executives to keep teams aligned with the business goals. And any company that cannot do this will not survive for long.

Windows Phone Potential?

I often hear Windows developers say that the Windows market share is much bigger than that of iOS and Android combined, and that the sales of Windows 8 have surpassed all the sales of iOS and Android from the beginning until now.

Windows Phone 8


It is quite unfair to compare the whole of Windows 8 with iOS or Android. And if we do count the demand from PCs and laptops, then why the hell do we not compare the supply side as well? It is very clear that if Windows 8 could be a shared platform between PCs, tablets and phones, then all the old software and all the old games would become compatible with the Windows 8 system in a short amount of time. And many apps for tablets and phones would be quite different from the ones on PC. No, I don’t mean the technology; I am talking about the business model, the purpose of the software.

I don’t mean that iOS and Android are better markets than Windows Phone for indie businesses. These markets have become very competitive and will continue to be so. You either have to find a market niche or be very lucky.