Introduction

The internet is an amazing resource that allows information to be shared easily and quickly around the globe. It seems our lives are becoming more and more connected each day, providing us with opportunities we didn't have before, and I am fortunate to remember a time when banking was predominantly branch-based and the primary tool for multiplayer gaming was a sofa! Whilst this clearly provides us with many blessings, it can also be a curse, despite the best of intentions 😢. When it comes to dealing with software and the internet, there is one saying that will absolutely SAVE YOUR BACON 🐷 if you swear by it.

If it is on the internet, someone will find it.

Man using his binoculars to find your hidden service on the internet

If you deploy a web-enabled service on ANY network and make it freely accessible, you should EXPECT someone on that network to find it. Whether it is a local, internal network, a wider network or ESPECIALLY the internet, if you spin up a service and allow all inbound connections to said service, don't be surprised when people find it! If you are wondering why my previous sentences are statements akin to "water is wet" and "fire is hot", please know that it pains me that I have to write them in the first place! Unfortunately, there are MANY instances of people and/or organisations putting services on networks (such as the internet) and subsequently trying to wrap them up in a kind of invisibility cloak to pull the wool over everyone's eyes. There are numerous flaws in, let's just say, this questionable approach, and if they had stuck to the aforementioned saying, they would have saved themselves a lot of unnecessary bother!

Hiding in a Public Space

The internet is chock-a-block full of unique, disparate sites and/or services which serve us all sorts of content to be "consumed" as we want. Everything is arranged in a sort of web 🕸 of links, where we can jump from site to site by following links through to their destinations. With so many sites about so many subjects to choose from, how on earth do you find what you want 🤔?! Whilst it is possible to find addresses directly (e.g. by visiting a physical premises, via other media such as TV and radio, or even via social media platforms also available on the internet) and browse through all the content available, most people will fire up their favourite search engine (Google, Bing, DuckDuckGo, Yahoo etc.) and search for a few terms to find what they want. These engines will then suggest the results they think best serve the user's query, as well as maybe throw up a bunch of adverts to make money! Even though this is probably the most popular way to find sites and their content, it isn't the only way.

I think that many people/organisations make the mistake of thinking that search engines are THE ONLY way to find content on a network like the internet, which leads to some very interesting scenarios. As I've alluded to before, it seems that many believe they can put up services on the internet and "hide" them from everyone, meaning that only select individuals know the services exist and use them. This is an AWFUL security and privacy posture, because "Security Through Obscurity" is a shockingly bad tactic to use, and as soon as a link is made known (or, more likely, discovered), everything falls apart!
Just by minding my own business and searching, I have seen sites/services that people/organisations have tried (and failed) to hide on the internet, surfacing on the very search engines they didn't want them to appear on, all thanks to misconfiguration!

Hidden website found using the Google search engine

Hidden website found using the Bing search engine

Hidden website found using the Yahoo search engine

Hidden website found using the DuckDuckGo search engine

Here, the site operators have told search engines NOT to crawl their sites, with the intention of preventing these pages from appearing in search results. This is where the "No information is available for this page" and "We would like to show you a description here but the site won't allow us" messages displayed with the results above come from. The webmasters have done this by modifying the robots.txt file hosted in their root directory to block all crawling by web spiders 🕷 (used by search engines to find pages and index the internet), using the directive "User-agent: * Disallow: /". When this directive is found, the crawler should move on and not add the page content to the searchable index, although honouring it is wholly up to the search engine. Nothing actually stops one from reading and indexing the content anyway and just not showing it in results, but that is a topic for another debate!

About /robots.txt

Google: No Page Information in Search Results

Bing: Missing Page Description in Search Results

In this case, the search engines are working exactly as (the search engine developers) intended, resulting in the sites still showing up in search results 🤦‍♀️🤷‍♂️. Looking at the further information provided by the Google and Bing links, adding the directive "User-agent: * Disallow: /" to your robots.txt doesn't actually stop your site from appearing in search results; rather, it tells the search engines not to read and index the content of the site. To actually stop pages from appearing in search results, you need to add the 'noindex' robots meta tag within the head of your page HTML and REMOVE the "User-agent: * Disallow: /" directive from your robots.txt file. The robots.txt directive stops the crawler from reading your site data at all, so if the 'noindex' attribute lives within that site data, the crawler never reads the 'noindex' attribute and never learns that it shouldn't index the site in the first place 😫! Google explain this in the link below.

Google: Block Search Indexing with 'noindex'
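You can watch this interplay for yourself with Python's built-in robots.txt parser. Below is a minimal sketch (hidden.example.com is a made-up placeholder domain): a compliant crawler asks robots.txt for permission before fetching a page, and a blanket Disallow means permission is always denied, so the 'noindex' tag sitting inside the page never gets read.

```python
from urllib.robotparser import RobotFileParser

# The blanket "hide everything" robots.txt used by the sites above.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /",
])

# hidden.example.com is a made-up placeholder domain.
page = "https://hidden.example.com/super-secret-page.html"
for bot in ("Googlebot", "Bingbot", "DuckDuckBot"):
    # can_fetch() is the question every compliant crawler asks first:
    # am I allowed to download this URL at all?
    print(bot, "may fetch the page:", robots.can_fetch(bot, page))

# All three print False: the crawlers never download the page, so a
# <meta name="robots" content="noindex"> inside its HTML is never seen,
# and the bare URL can still end up listed in search results.
```

The fix Google describe is the counter-intuitive one: open the page up to crawling so that the 'noindex' tag can actually be read.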
Fear of Human Nature

This tendency to "hide" services actually scares me because of what human nature tends to do next. On local machines and networks (isolated from the outside world), it isn't surprising (but is HIGHLY INFURIATING) to find out-of-date tools and services running to fulfil specific tasks. You might have an old version of a productivity tool accessed over plain HTTP 😒, old drivers, and even older, unpatched operating systems. There might even be databases accessible across your network over unencrypted connections with weak or no passwords (sent in plaintext, so let's just ban Wireshark). Yes, this can be a huge risk depending on the context, but mitigations can be put in place to control it: separated networks, email scanning, blocked ports etc. Ultimately (and I hate saying this), because the services are run locally and are not accessible to the outside world, you have "SOME" initial protection (protection that could be nuked by one suspect email being opened/clicked, so don't get comfortable).

Despite what some companies like to say, sometimes they may not be taking privacy and security as seriously as we would like

All of this goes completely out of the window when you put your services on the internet, where ANYONE in the world can REMOTELY access them with ease and use a myriad of different tools to REMOTELY discover vulnerabilities that they can then exploit, REMOTELY, to devastating effect. No specially crafted email, malicious website or hooky malware (posing as legit software) is needed to completely OWN your system from across the internet. Any vulnerability in your service is unlikely to be a closely guarded secret; it will probably have been documented, disclosed and (hopefully) patched by the developer. If documented, it may even have an automated exploit in notable penetration testing tools such as the ubiquitous Metasploit Framework. Don't think your service is somehow "special" and that "oh, it will never happen to me"; you are just as vulnerable as everyone else, so take security as seriously NOW as you will inevitably pontificate about in your eventual breach confession when you are forced to disclose. If you put something on the internet, where it can be REMOTELY accessed by ANYONE, you need to consider the implications (a small taste of how easy this is follows the links below).

Common Vulnerabilities and Exposures Database

Common Vulnerabilities and Exposures Details

Metasploit Framework
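As a flavour of how little effort "REMOTELY" involves, here is a minimal banner-grabbing sketch; the host name is a made-up placeholder, and the banner shown in the comment is only an illustrative example. Many internet-facing services announce their exact software version to the first connection that arrives, and that version string is precisely what gets cross-referenced against CVE entries and Metasploit modules.

```python
import socket

# Placeholder host - only ever probe machines you are AUTHORISED to test.
HOST, PORT = "service.example.com", 22  # 22: SSH, which greets you first

with socket.create_connection((HOST, PORT), timeout=5) as sock:
    # Many services (SSH, FTP, SMTP...) volunteer a banner on connect.
    banner = sock.recv(1024).decode(errors="replace").strip()

# Typically something like "SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.1":
# an exact product and version, ready to be looked up against public
# CVE lists and ready-made exploit modules in seconds.
print(f"{HOST}:{PORT} says: {banner}")
```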
My fear is that when people/organisations decide that a service isn't for wider consumption, put it on the internet anyway and "hide" it in plain sight, they will likely bring this weak mentality along for the ride and not think about the REMOTE dangers they are opening themselves up to. Will the components be regularly updated to combat disclosed CVEs? Will the security be as hardened as it should be? Will access be limited appropriately? Maybe it will, and I'm sure there are tonnes of security-minded people/organisations operating in this manner, much to the benefit of their users 😊. However, there are so many examples of organisations slipping up when it comes to security, leading to breaches, leaks, hacks and total embarrassment, that it wouldn't surprise me if many of the services "hidden" on the internet have a few (little) kinks that need to be ironed out...

Ignore BlueKeep's Damp Squib, People Must Patch Faster

Exposed Database Left Terabyte of Travelers' Data Open to the Public

Adobe Left 7.5 Million Creative Cloud User Records Exposed Online

Vedantu Data Breach Exposes 687,000 Customer Details

Video-Editing Upstart Bares Users' Raunchy Flicks to World+Dog Via Leaky AWS Bucket

1.2 Billion Records Found Exposed Online in a Single Server

Further Information

Following Google's advice to allow crawlers to crawl your page contents for the 'noindex' attribute, so that search engines know NOT to index your site, is, in my humble opinion, a sticking plaster over a much bigger problem. If you look at the missing-info links provided by Google and Microsoft (makers of Bing), they actually suggest REMOVING THE PAGE ENTIRELY as their first solution, and Google suggests putting it behind authentication as their second. This makes a lot of sense if you think about it, and conforms nicely to the saying I mentioned earlier. By putting something on a network, you should expect that everyone who has access to that network will access the service, and you should therefore build your security around that assumption. Remember, "Security Through Obscurity" is a TERRIBLE method that shouldn't be relied on at all! If you don't want people on your network (ESPECIALLY the internet) viewing the content of your site, WHY ON EARTH ARE YOU PUTTING IT THERE WITHOUT ANY AUTHENTICATION? Doing this is trying to have your cake and eat it: whilst the internet is open for all of your intended recipients to access your service, it's also open for EVERYONE ELSE to do the same thing, MY GOSH!

Machines on the internet are connected together in a web

Hiding information from search engines also betrays another fundamental misunderstanding of how computer networks work, namely that search engines AREN'T THE ONLY WAY to find machines, and the services they run, on a network! If you are spending a great deal of time hiding the pages that you put on the PUBLIC internet 🙄, I say you are wasting that time, and it would be far better spent learning a little about cyber security and the ways you can actually protect your information. Yes, configuring a pretty robots.txt file and adding the 'noindex' attribute will keep out Tom, Richard and Sally, who are having a quick browse for cute puppy videos to send to their friends, but you are kidding yourself if you think this will keep out the security researchers who warn companies that they are insecure, LET ALONE make even the slightest dent in those actively trying to cause you damage. Remember also that your robots.txt file is itself publicly visible, and ANY specific paths you put in it can be used by attackers to breach your machine. If you tell search engines not to crawl the /super/secret/data.doc path in your robots.txt file, where do you think PEOPLE reading that file will look first 🕵️‍♀️? To attackers this is LAUGHABLE, and they will waste no time exploiting your lapse in judgement to their gain and your detriment. Actually take privacy and security seriously, because on top of being perfectly capable of using a search engine just like everyone else, hackers have a whole swathe of tools at their disposal to find your machines and the services that you run (a tiny example of which is sketched after the links below).

Shodan

Nmap

BinaryEdge

Censys

ZoomEye

Pentest Tools
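To show what those tools automate at internet scale, here is a minimal sketch of the humble TCP connect scan that underpins them; the target is a made-up placeholder, and real tools like Nmap and Shodan do this far more cleverly, and against the entire public IPv4 address space.

```python
import socket

# Placeholder target - only ever scan machines you are AUTHORISED to scan.
TARGET = "hidden.example.com"
COMMON_PORTS = [21, 22, 23, 25, 80, 443, 3306, 3389, 5432, 8080]

for port in COMMON_PORTS:
    try:
        # One completed TCP handshake is all it takes to reveal a service.
        with socket.create_connection((TARGET, port), timeout=2):
            print(f"{TARGET}:{port} is OPEN - expect it to be found!")
    except OSError:
        pass  # closed or filtered - move on to the next port
```

Services like Shodan run sweeps of this kind continuously and make the results searchable, so "nobody knows my address" simply isn't a defence.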
The sheer number of tools available out there means it is all but impossible to put a machine on the internet and have it NOT be found by some tool doing a scan/sweep of the entire network, so don't waste your time trying. Your machine WILL be scanned, and even penetration tested (for FREE), from all over the web! So if you are putting something on the internet, MAKE SURE that you have considered the security implications and put the appropriate protections in place, ensuring that only the eyeballs you intend see the content you publish. Yes, it will add time, resources and cost to the deployment of your site, but it will pay off big time in the future when malicious individuals can't simply walk up to the unlocked, open front window of your building and steal all of your secrets by looking inside or climbing through unopposed! Such measures could include:

Not Putting ANY Sensitive Information on the Internet that Doesn't Need to be on the Internet

Protecting ANY Sensitive Information that Needs to be Made Available with Authentication and Authorisation

Limiting the IP Addresses that Can Connect to Your Machines Using BOTH Software and Hardware Firewalls

Using Mutual TLS Authentication on Your Webserver so that Only Clients with Specific Certificates can Connect

Using a Web Application Firewall to Block Suspicious Requests (like the excellent, open-source ModSecurity)

Security Via Explicit Action

The internet is a fantastic web of content that allows us to easily connect and share with one another. Anything we put on such a public tool should be properly considered and, when necessary, properly protected. It is a futile endeavour to try and hide such content, because whilst it may remain hidden from the masses, it will be sitting there waiting to be accessed by the very people YOU REALLY DON'T want accessing it. These individuals and organisations have the expanded toolset, acumen and sheer drive to sniff out your sensitive information, and the only way to keep them at bay is to use tried and tested security measures. Once the right controls are in place, we can all enjoy this fantastic resource and the various ways it enriches our lives.

Take care and all the best. Si.