<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0' version='2.0'><channel><atom:id>tag:blogger.com,1999:blog-4164822870909166302</atom:id><lastBuildDate>Mon, 14 May 2012 04:52:19 +0000</lastBuildDate><title>wardselitelimo.com</title><description></description><link>http://www.wardselitelimo.com/</link><managingEditor>noreply@blogger.com (iptelephony)</managingEditor><generator>Blogger</generator><openSearch:totalResults>7</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4164822870909166302.post-740814537716048310</guid><pubDate>Thu, 04 Nov 2010 12:47:00 +0000</pubDate><atom:updated>2010-11-04T05:47:18.522-07:00</atom:updated><title>COMBINATION FUNCTION</title><description>Posted by Big Joe. Published on 27 January 2009, No Comments Received&lt;br /&gt;&lt;br /&gt;Now that we have a method of determining which records are most similar to the new, unclassified record, we need to establish how these similar records will combine to provide a classification decision for the new record. That is, we need a combination function. The most basic combination function is simple unweighted voting.&lt;br /&gt;&lt;br /&gt;Simple Unweighted Voting&lt;br /&gt;&lt;br /&gt;   1. Before running the algorithm, decide on the value of k, that is, how many records will have a voice in classifying the new record.&lt;br /&gt;   2. Then, compare the new record to the k nearest neighbors, that is, to the k records that are of minimum distance from the new record in terms of the Euclidean distance or whichever metric the user prefers.&lt;br /&gt;   3. 
Once the k records have been chosen, then for simple unweighted voting, their distance from the new record no longer matters. It is simply one record, one vote.&lt;br /&gt;&lt;br /&gt;We observed simple unweighted voting in the examples for Figures 5.4 and 5.5. In Figure 5.4, for k = 3, a classification based on simple voting would choose drugs A and X (medium gray) as the classification for new patient 2, since two of the three closest points are medium gray. The classification would then be made for drugs A and X, with confidence 66.67%, where the confidence level represents the count of records with the winning classification, divided by k.&lt;br /&gt;&lt;br /&gt;On the other hand, in Figure 5.5, for k = 3, simple voting would fail to choose a clear winner since each of the three categories receives one vote. There would be a tie among the three classifications represented by the records in Figure 5.5, and a tie may not be a preferred result.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4164822870909166302-740814537716048310?l=www.wardselitelimo.com' alt='' /&gt;&lt;/div&gt;</description><link>http://www.wardselitelimo.com/2010/11/combination-function.html</link><author>noreply@blogger.com (iptelephony)</author><thr:total>1</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4164822870909166302.post-2692809559523719360</guid><pubDate>Thu, 04 Nov 2010 12:46:00 +0000</pubDate><atom:updated>2010-11-04T05:46:50.381-07:00</atom:updated><title>SIGMOID ACTIVATION FUNCTION</title><description>Why use the sigmoid function? Because it combines nearly linear behavior, curvilinear behavior, and nearly constant behavior, depending on the value of the input. Figure 7.3 shows the graph of the sigmoid function y = f(x) = 1/(1 + e^(-x)), for -5 &lt; x &lt; 5 [although f(x) may theoretically take any real-valued input]. 
Through much of the center of the domain of the input x (e.g., -1 &lt; x &lt; 1), the behavior of f(x) is nearly linear. As the input moves away from the center, f(x) becomes curvilinear. By the time the input reaches extreme values, f(x) becomes nearly constant.&lt;br /&gt;&lt;br /&gt;Moderate increments in the value of x produce varying increments in the value of f(x), depending on the location of x. Near the center, moderate increments in the value of x produce moderate increments in the value of f(x); however, near the extremes, moderate increments in the value of x produce tiny increments in the value of f(x). The sigmoid function is sometimes called a squashing function, since it takes any real-valued input and returns an output bounded between zero and 1.&lt;br /&gt;&lt;br /&gt;BACK-PROPAGATION&lt;br /&gt;&lt;br /&gt;How does the neural network learn? Neural networks represent a supervised learning method, requiring a large training set of complete records, including the target variable. As each observation from the training set is processed through the network, an output value is produced from the output node (assuming that we have only one output node, as in Figure 7.2). This output value is then compared to the actual value of the target variable for this training set observation, and the error (actual - output) is calculated. This prediction error is analogous to the residuals in regression models. Overall fit is measured by the sum of squared errors (SSE), that is, the squared prediction errors summed over all records in the training set.&lt;br /&gt;&lt;br /&gt;The problem is therefore to construct a set of model weights that will minimize the SSE. In this way, the weights are analogous to the parameters of a regression model. The true values for the weights that will minimize SSE are unknown, and our task is to estimate them, given the data. 
However, due to the nonlinear nature of the sigmoid functions permeating the network, there exists no closed-form solution for minimizing SSE as exists for least-squares regression.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4164822870909166302-2692809559523719360?l=www.wardselitelimo.com' alt='' /&gt;&lt;/div&gt;</description><link>http://www.wardselitelimo.com/2010/11/sigmoid-activation-function.html</link><author>noreply@blogger.com (iptelephony)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4164822870909166302.post-4417519951850520337</guid><pubDate>Thu, 04 Nov 2010 12:45:00 +0000</pubDate><atom:updated>2010-11-04T05:46:14.763-07:00</atom:updated><title>Age distribution</title><description>According to China’s Population Report, over 30 per cent of the population were under 20 years of age. On the other hand, the population segment of the people aged over 60 accounted for over 10 per cent, and those above 65 reached 6.95 per cent in the 2000 census. Population aging will become increasingly felt. By 2010 the people born in the baby boom period in the 1950s and 1960s will enter the elderly groupings and the elderly segment will experience the fastest growth. In the next 25 years the elderly population will be double today’s count. (The 10 most populous provinces have 57% of the total population.)&lt;br /&gt;&lt;br /&gt;When studying the age distribution, marketers are invited to look into the two extremes of the population, one of which is the population of teenagers or what is called the ‘little emperors’ generation, and the other is the growing aging population. 
Marketers have already invested efforts to address the needs of the pampered little emperors but little attention has been given to the elderly population.&lt;br /&gt;&lt;br /&gt;The little emperors have enjoyed higher living standards and better education and training than previous generations in China. But they have also displayed distortions in behaviour. Research indicates that the little emperors not only represent a group of consumers, but also have increasing influence over parents’ purchase decision-making. Over time, there will be more ‘only’ children joining the workforce. They will have better education as a result of their parents’ investment in only children’s schooling. Considerable study effort is necessary to understand the behavioural characteristics of only children, as they will soon become the core of Chinese society.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4164822870909166302-4417519951850520337?l=www.wardselitelimo.com' alt='' /&gt;&lt;/div&gt;</description><link>http://www.wardselitelimo.com/2010/11/age-distribution.html</link><author>noreply@blogger.com (iptelephony)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4164822870909166302.post-2634636119698558677</guid><pubDate>Thu, 04 Nov 2010 12:45:00 +0000</pubDate><atom:updated>2010-11-04T05:45:48.791-07:00</atom:updated><title>DANGERS OF EXTRAPOLATION</title><description>Suppose that a new cereal (say, the Chocolate Frosted Sugar Bombs loved by Calvin, the comic strip character written by Bill Watterson) arrives on the market with a very high sugar content of 30 grams per serving. Let us use our estimated regression equation to estimate the nutritional rating for Chocolate Frosted Sugar Bombs: y = 59.4 - 2.42(sugars) = 59.4 - 2.42(30) = -13.2. 
In other words, Calvin’s cereal has so much sugar that its nutritional rating is actually a negative number, unlike any of the other cereals in the data set (minimum = 18) and analogous to a student receiving a negative grade on an exam. What is going on here?&lt;br /&gt;&lt;br /&gt;The negative estimated nutritional rating for Chocolate Frosted Sugar Bombs is an example of the dangers of extrapolation. Analysts should confine the estimates and predictions made using the estimated regression equation (ERE) to values of the predictor variable contained within the range of the values of x in the data set. For example, in the cereals data set, the lowest sugar content is zero grams and the highest is 15 grams, so that predictions of nutritional rating for any value of x (sugar content) between zero and 15 grams would be appropriate. However, extrapolation, making predictions for x-values lying outside this range, can be dangerous, since we do not know the nature of the relationship between the response and predictor variables outside this range.&lt;br /&gt;&lt;br /&gt;Extrapolation should be avoided if possible. If predictions outside the given range of x must be performed, the end user of the prediction needs to be informed that no x-data is available to support such a prediction. The danger lies in the possibility that the relationship between x and y, which may be linear within the range of x in the data set, may no longer be linear outside these bounds.&lt;br /&gt;&lt;br /&gt;Consider Figure 4.4. Suppose that our data set consisted only of the data points in black but that the true relationship between x and y consisted of both the black (observed) and the gray (unobserved) points. Then, a regression line based solely on the available (black dot) data would look similar to the regression line indicated. Suppose that we were interested in predicting the value of y for an x-value located at the triangle. 
The prediction based on the available data would then be represented by the dot on the regression line indicated by the upper arrow. Clearly, this prediction has failed spectacularly, as shown by the vertical line indicating the huge prediction error. Of course, since the analyst would be completely unaware of the hidden data, he or she would hence be oblivious to the massive scope of the error in prediction. Policy recommendations based on such erroneous predictions could certainly have costly results.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4164822870909166302-2634636119698558677?l=www.wardselitelimo.com' alt='' /&gt;&lt;/div&gt;</description><link>http://www.wardselitelimo.com/2010/11/dangers-of-extrapolation.html</link><author>noreply@blogger.com (iptelephony)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4164822870909166302.post-1342616110361114406</guid><pubDate>Thu, 04 Nov 2010 12:44:00 +0000</pubDate><atom:updated>2010-11-04T05:44:48.474-07:00</atom:updated><title>Charitable Gift Annuity</title><description>Similar to a charitable remainder annuity trust, in a charitable gift annuity the donor contributes assets to a not-for-profit organization in exchange for a promise by the organization to pay a fixed amount over a specified period of time to the donor or to other third parties. It is important to note that the third parties are designated by the donor.&lt;br /&gt;&lt;br /&gt;The agreements are similar to charitable remainder annuity trusts except that no trust exists, the assets received are held as general assets of the not-for-profit organization, and the annuity liability is a general obligation of the organization. 
(NFP Audit Guide)&lt;br /&gt;&lt;br /&gt;An example of a charitable gift annuity would be as follows: A donor transfers assets to a not-for-profit organization in exchange for a promise by the organization to pay a specific dollar amount annually to the donor’s wife until the wife dies.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4164822870909166302-1342616110361114406?l=www.wardselitelimo.com' alt='' /&gt;&lt;/div&gt;</description><link>http://www.wardselitelimo.com/2010/11/charitable-gift-annuity.html</link><author>noreply@blogger.com (iptelephony)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4164822870909166302.post-1566900275191964187</guid><pubDate>Thu, 04 Nov 2010 12:43:00 +0000</pubDate><atom:updated>2010-11-04T05:44:11.306-07:00</atom:updated><title>METHODOLOGY FOR SUPERVISED MODELING</title><description>Most supervised data mining methods apply the following methodology for building and evaluating a model. First, the algorithm is provided with a training set of data, which includes the preclassified values of the target variable in addition to the predictor variables. For example, if we are interested in classifying income bracket, based on age, gender, and occupation, our classification algorithm would need a large pool of records, containing complete (as complete as possible) information about every field, including the target field, income bracket. In other words, the records in the training set need to be preclassified. A provisional data mining model is then constructed using the training samples provided in the training data set.&lt;br /&gt;&lt;br /&gt;However, the training set is necessarily incomplete; that is, it does not include the new or future data that the data modelers are really interested in classifying. 
Therefore, the algorithm needs to guard against memorizing the training set and blindly applying all patterns found in the training set to the future data. For example, it may happen that all customers named David in a training set are in the high-income bracket. We would presumably not want our final model, to be applied to new data, to include the pattern “If the customer’s first name is David, the customer has a high income.” Such a pattern is a spurious artifact of the training set and needs to be verified before deployment.&lt;br /&gt;&lt;br /&gt;Therefore, the next step in supervised data mining methodology is to examine how the provisional data mining model performs on a test set of data. In the test set, a holdout data set, the values of the target variable are hidden temporarily from the provisional model, which then performs classification according to the patterns and structure it learned from the training set. The efficacy of the classifications is then evaluated by comparing them against the true values of the target variable. The provisional data mining model is then adjusted to minimize the error rate on the test set.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4164822870909166302-1566900275191964187?l=www.wardselitelimo.com' alt='' /&gt;&lt;/div&gt;</description><link>http://www.wardselitelimo.com/2010/11/methodology-for-supervised-modeling.html</link><author>noreply@blogger.com (iptelephony)</author><thr:total>0</thr:total></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4164822870909166302.post-1936459580096503730</guid><pubDate>Thu, 04 Nov 2010 12:42:00 +0000</pubDate><atom:updated>2010-11-04T05:43:26.485-07:00</atom:updated><title>Complete-Linkage Clustering</title><description>Next, let’s examine whether using the complete-linkage criterion would result in a different clustering of this sample data set. 
Complete linkage seeks to minimize the distance among the records in two clusters that are farthest from each other. Figure 8.3 illustrates complete-linkage clustering for this data set.&lt;br /&gt;&lt;br /&gt;    * Step 1: Since each cluster contains a single record only, there is no difference between single linkage and complete linkage at step 1. The two clusters each containing 33 are again combined.&lt;br /&gt;    * Step 2: Just as for single linkage, the clusters containing values 15 and 16 are combined into a new cluster. Again, this is because there is no difference in the two criteria for single-record clusters.&lt;br /&gt;    * Step 3: At this point, complete linkage begins to diverge from its predecessor. In single linkage, cluster {15,16} was at this point combined with cluster {18}. But complete linkage looks at the farthest neighbors, not the nearest neighbors. The farthest neighbors for these two clusters are 15 and 18, for a distance of 3. This is the same distance separating clusters {2} and {5}. The complete-linkage criterion is silent regarding ties, so we arbitrarily select the first such combination found, therefore combining the clusters {2} and {5} into a new cluster.&lt;br /&gt;    * Step 4: Now cluster {15,16} is combined with cluster {18}.&lt;br /&gt;    * Step 5: Cluster {2,5} is combined with cluster {9}, since the complete-linkage distance is 7, the smallest among remaining clusters.&lt;br /&gt;    * Step 6: Cluster {25} is combined with cluster {33,33}, with a complete-linkage distance of 8.&lt;br /&gt;    * Step 7: Cluster {2,5,9} is combined with cluster {15,16,18}, with a complete-linkage distance of 16.&lt;br /&gt;    * Step 8: Cluster {25,33,33} is combined with cluster {45}, with a complete-linkage distance of 20.&lt;br /&gt;    * Step 9: Cluster {2,5,9,15,16,18} is combined with cluster {25,33,33,45}. 
All records are now contained in this last large cluster.&lt;br /&gt;&lt;br /&gt;Finally, with average linkage, the criterion is the average distance of all the records in cluster A from all the records in cluster B. Since the average of a single record is the record’s value itself, this method does not differ from the earlier methods in the early stages, where single-record clusters are being combined. At step 3, average linkage would be faced with the choice of combining clusters {2} and {5}, or combining the {15, 16} cluster with the single-record {18} cluster. The average distance between the {15, 16} cluster and the {18} cluster is the average of |18 - 15| and |18 - 16|, which is 2.5, while the average distance between clusters {2} and {5} is of course 3. Therefore, average linkage would combine the {15, 16} cluster with cluster {18} at this step, followed by combining cluster {2} with cluster {5}. The reader may verify that the average-linkage criterion leads to the same hierarchical structure for this example as the complete-linkage criterion. In general, average linkage leads to clusters more similar in shape to complete linkage than does single linkage.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4164822870909166302-1936459580096503730?l=www.wardselitelimo.com' alt='' /&gt;&lt;/div&gt;</description><link>http://www.wardselitelimo.com/2010/11/complete-linkage-clustering.html</link><author>noreply@blogger.com (iptelephony)</author><thr:total>0</thr:total></item></channel></rss>
